Thursday, January 10, 2013

Discarded Live Batteries

I'm a great fan of my noise canceling headphones, but I don't like the fact that they require a battery. I generally use Eneloop rechargeable batteries at home so that I don't have to act like a conveyor belt for alkaline battery migration from Costco to the landfill.

It seems that most devices that use AAA batteries require two of them. Inevitably, one battery of the two will drain while the other retains a good deal of its charge. Years ago, we used to use the trick of reversing the batteries after the first failure to squeeze the final charge out of the second battery (or was that just a Boy Scout thing?). People don't seem to do that anymore as it is easier to just replace them both and move on.

When I travel to the Mozilla Mothership for work several times per year, I mine the discarded battery bins for useful batteries. About fifty percent of the AAA batteries that I dumpster dive for retain nearly a full charge. The headphones take only one battery and they don't eat much power. I can get hours of music from batteries that would otherwise be discarded. My LED headlamp has been powered by discarded batteries for several years now.

Yeah, I'm smug about this tiny contribution to conservation. Yes, I know that the batteries still end up in the landfill eventually. I've not given up my commitment to rechargeables at home.

Monday, January 07, 2013

HBase as Socorro Crash Storage

Nota Bene: In this blog posting, I'm stretching my area of expertise.  I suspect that there may be inaccuracies in my understanding of the specifics of how HBase, the Hadoop File System and Hadoop work.  I will make corrections to this posting as others point them out to me.

In 2010, Daniel Einspanjer and I conspired to migrate the Socorro primary data store from the file system to HBase. Instead of storing all of our crashes in a monolithic NFS mounted filesystem, we'd save them in the Hadoop File System with HBase. Being a distributed file system, Hadoop FS would give us great headroom for growth as well as fault tolerance. HBase, with techniques for higher level organization and indexing, brings a form of queries to the proverbial table. The ability to also execute analytical map reduce jobs over the corpus of crashes sealed the deal.

At the time of implementation, there was no agreed on standard Crash Storage API. The only thing we had to work with was the existing API for the File System Storage. It really wasn't entirely appropriate for the semantics of interacting with HBase. To expedite implementation, we agreed that Daniel would make a Pythonic API adaptation of the HBase / Thrift API for storing crashes. I would further adapt his work and, if necessary, add another layer to fit into the newly conceived Crash Storage framework (this was the predecessor to the modern Configman Crash Storage System).

Relational database programmers take for granted the indexing tools inherent in the relational data model. HBase programming is exactly unlike that. HBase gives you a primitive data structure, the table, and very little else. If you want indexing, you have to implement the technique yourself using additional HBase tables. Queries happen by setting up scans that test every row of the table of interest. There are no joins. From the perspective of a relational database, this sounds horrific, but in reality it isn't so bad. The data is distributed across many machines, so the scan happens in parallel and can complete relatively quickly.

Rows in our tables in HBase have a key called a 'row_id'.  Just like we assign meaning to regions of our own string crash_id, parts of a string row_id have meaning to HBase.  The first character of the row_id controls the region of the distributed file system in which the row is stored.  It is in our best interest to make sure that these first characters are evenly distributed across the regions.

Our crash_ids start out as a UUID before we alter them.  The possible values of the first character are guaranteed to be evenly distributed across the domain of hex characters.  We use that first character of a crash_id as the first character of the row_id.

We frequently want to look at sets of crashes by date.  The interface that we have for HBase gives us the option of a prefix scan of HBase tables. A prefix scan will return table rows whose row_ids match a given prefix.  We make the second through seventh characters of a row_id the date of submission, identical to the last six characters of the crash_id. This facilitates iterating over date ranges.

For example, if we want to see all the crashes from January 1, 2013, we'd do a prefix scan sixteen times (once for each possible hex digit of the first character):

    0130101
    1130101
    2130101
    ...
    F130101

The composition of an HBase row_id from a crash_id.
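The row_id composition and the sixteen-prefix scan described above can be sketched in a few lines of Python. This is my reading of the scheme from the text (first hex character, then the six-character submission date, then the full crash_id); the function names are hypothetical and the real hbaseclient code may differ in detail.

```python
def crash_id_to_row_id(crash_id):
    """Build an HBase row_id from a Socorro crash_id.

    The first character spreads rows evenly across HBase regions, and
    characters 2 through 7 are the submission date (the last six
    characters of the crash_id), enabling prefix scans by date.
    """
    return crash_id[0] + crash_id[-6:] + crash_id

def date_prefixes(yymmdd):
    """The sixteen prefixes needed to scan every crash for one date,
    one for each possible leading hex digit."""
    return ['%x%s' % (first_char, yymmdd) for first_char in range(16)]
```

Fetching all crashes for a date is then sixteen prefix scans, one per value returned by `date_prefixes`.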



The Socorro HBase Tables


Our HBase schema consists of one data table and several index and statistics tables.
  • crash_reports 
  • crash_reports_index_hang_id_submitted_time 
  • crash_reports_index_hang_id 
  • crash_reports_index_legacy_unprocessed_flag 
  • crash_reports_index_unprocessed_flag 
  • crash_reports_index_signature_ooid 
  • metrics

crash_reports 

the main data table containing all three parts of a crash: raw_crash, dump, and processed_crash. It uses the row_id form outlined above as the row identifier.  The values are stored in rows with columns as in a relational database, but the columns don't have to be the same in every row (which in my mind means that they aren't really columns). Columns are divided into families:

  • ids : the column family for any ids associated with the crash.
    • 'ooid' – the original crash_id assigned by the collector
    • 'hang_id' – the id shared by a hang pair. This is a deprecated concept that ought to be excised from the code. It was replaced by crashes with multiple dumps.
  • raw_data : this is the column family for the binary breakpad dumps. The main raw binary crash dump is in the column 'dump', while the auxiliary dumps use the names originally given to them by breakpad on the client at the time of the crash.
  • meta_data : this is the column family home of the raw_crash data. It contains a single column called 'json'. An opinion: I find it most unfortunate that we weren't more vigilant in naming columns. 'json' is really a type name rather than a meaningful identifier of the purpose of the data.
  • processed_data : this is the family for data regarding the output of the processor.
    • 'json' – the unfortunate name for the processed_crash in json form
    • 'signature' – the signature from the processed_crash
  • flags : this is a set of binary flags that describe the state of the data in the row:
    • 'processed' – 'Y' if the raw_crash has been processed to make a processed_crash, otherwise 'N'
    • 'legacy_processing' – 'Y' if the collector's throttling system indicated that this crash was to be processed rather than deferred, otherwise the column just doesn't exist.
  • timestamps : a column family for timestamps associated with the crash 
    • 'submitted' – the timestamp at the time of the submission to the collector
    • 'processed' – the timestamp of the time of completion of processing by the processor.
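To make the layout above concrete, here is a sketch of how one crash's pieces might land in the crash_reports columns, using HBase's 'family:qualifier' naming. The helper function is hypothetical; the real HBaseConnectionForCrashReports builds these mutations itself through the Thrift API.

```python
import json

def crash_to_columns(raw_crash, dump, processed_crash=None):
    """Map one crash onto the crash_reports column layout.

    Keys use HBase's 'family:qualifier' form. A hypothetical helper
    for illustration only.
    """
    columns = {
        'ids:ooid': raw_crash['uuid'],          # original collector-assigned id
        'raw_data:dump': dump,                  # main binary breakpad dump
        'meta_data:json': json.dumps(raw_crash),
        'flags:processed': 'N',
        'timestamps:submitted': raw_crash['submitted_timestamp'],
    }
    if processed_crash is not None:
        columns['processed_data:json'] = json.dumps(processed_crash)
        columns['processed_data:signature'] = processed_crash['signature']
        columns['flags:processed'] = 'Y'
    return columns
```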

crash_reports_index_hang_id_submitted_time 

This is a deprecated table for hang pairs. Hangs used to be submitted in multiple crashes. As of release 34, Socorro will support multidump crashes instead. 

crash_reports_index_hang_id 

Another deprecated table involved in supporting crash hang pairs superseded by multidump.

crash_reports_index_legacy_unprocessed_flag 

This table serves as an index for the crash_reports table. It consists of a single column family 'ids' with a single column, 'ooid'. Every row in this table corresponds to a row in the 'crash_reports' table. It is used as a queue for crashes that were marked for processing. The monitor scans this table and assigns crash_ids to the processors.

crash_reports_index_unprocessed_flag 

This table serves as the list of crashes that have not been processed across all storage. Having just a single column family called 'ids', only the crash_id of a given unprocessed crash is stored in this table.  It appears that we're not directly using this table in our Socorro code.  However, values from this table are available for analysis using Hadoop tools.

crash_reports_index_signature_ooid 

This table is an indexing scheme for finding signatures. Each row has a unique key consisting of the union of a crash signature and a crash_id. The data in the row is a single column family called 'ids' containing a single column called 'ooid'.
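Because HBase keeps rows sorted by key, concatenating signature and crash_id means a prefix scan on the signature alone returns every crash with that signature. A toy in-memory model of the idea (a sorted list standing in for HBase's key-ordered rows; class and method names are mine, not Socorro's):

```python
import bisect

class SignatureIndex(object):
    """Toy model of crash_reports_index_signature_ooid: the row key is
    signature + crash_id, so a prefix scan on the signature yields every
    crash_id bearing that signature."""

    def __init__(self):
        self._keys = []  # kept sorted, like HBase's key-ordered rows

    def index(self, signature, crash_id):
        bisect.insort(self._keys, signature + crash_id)

    def crash_ids_for(self, signature):
        """Prefix scan: walk keys starting at the signature, stop at the
        first key that no longer matches the prefix.  (A real scheme
        would want a delimiter so 'js::GC' can't match 'js::GCThing'.)"""
        start = bisect.bisect_left(self._keys, signature)
        for key in self._keys[start:]:
            if not key.startswith(signature):
                break
            yield key[len(signature):]
```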

metrics 

This is a table of raw statistics. As crashes are added or removed from HBase, counters are incremented or decremented in concert. This table also holds some statistics about aspects of the flow of crashes: counters for year, month, day, hour, minute, total inserts, number of throttled crashes, number of hangs, number of hangs by hang type, current number of unprocessed crashes, and current number of throttled crashes that are unprocessed.  An opinion: I cannot see anywhere in our codebase that we are using any of these stats.  Perhaps we ought to start using them, or stop collecting them.

The HBase Client API 


The module socorro.external.hbase.hbaseclient defines a connection class for HBase called HBaseConnection. The base class defines the semantics of the connection: establish connection, close connection. The derived class, HBaseConnectionForCrashReports, imbues a connection with Socorro domain specific methods such as saving raw and processed crashes, fetching crashes and scanning for crashes with a given attribute. This module also serves as a standalone HBase query tool that can be used at the command line to execute any of the domain specific methods.  This is very useful for debugging Socorro and HBase.

The HBase Connection Retry System 

In the first month of our deployment of HBase, and then continuing periodically for all the years since, we have had trouble establishing and maintaining connections to HBase. Rather than having users of the module handle connection troubles, we added a retry decorator to all the domain methods of the HBase connection class. This means that if a domain function raises an exception from a list of exceptions eligible for retry (like timeout, or server unavailable), the decorator code catches the exception and retries the method from the beginning. The decorator uses a looping constraint, so that it will only retry a certain number of times. Should the method fail and the number of retries be exhausted, the exception will escape for the client application to deal with.

In the modern Crash Storage System, all HBase actions are placed in a transaction object. This object encapsulates the retry behavior using a time delay back off. When the hbase client module is refactored or rewritten, the native retry behavior will be removed in favor of the transaction object's behavior. Right now in 2013, both systems are in use and they don't interfere with each other.
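The retry-with-back-off idea common to both systems looks roughly like this. A minimal sketch, not Socorro's actual decorator: the exception class, attempt count, and delay schedule are all stand-ins.

```python
import functools
import time

class FakeTimeoutError(Exception):
    """Stand-in for the Thrift/HBase exceptions eligible for retry."""

def retry(retry_exceptions, max_attempts=5, base_delay=0.1):
    """Re-run the decorated method when it raises an eligible exception,
    doubling the delay between attempts; when the retries are exhausted,
    the exception escapes to the caller."""
    def decorator(method):
        @functools.wraps(method)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return method(*args, **kwargs)
                except retry_exceptions:
                    if attempt == max_attempts - 1:
                        raise  # retries exhausted; client deals with it
                    time.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator
```

A method decorated with `@retry((FakeTimeoutError,))` would then survive transient connection failures transparently.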

The Live Command Line Program 

The module also includes a command line tool that allows any of the Socorro domain methods to be invoked from the command line. This is very handy for manual testing or collecting crash data for out-of-system analysis.

To use this feature, invoke it like this, then follow the instructions regarding the methods supported.
    python ./socorro/external/hbase/hbase_client.py --help

    Usage: ./socorro/external/hbase/hbase_client.py [-h host[:port]] command [arg1 [arg2...]]

    Commands:
        Crash Report specific:
          get_report ooid
          get_json ooid
          get_dump ooid
          get_processed_json ooid
          get_report_processing_state ooid
          union_scan_with_prefix table prefix columns [limit]
          merge_scan_with_prefix table prefix columns [limit]
          put_json_dump ooid json dump
          put_json_dump_from_files ooid json_path dump_path
          export_jsonz_for_date YYMMDD export_path
          export_jsonz_tarball_for_date YYMMDD temp_path tarball_name
          export_jsonz_tarball_for_ooids temp_path tarball_name <stdin ooids list>
          export_sampled_crashes_tarball_for_dates sample_size
                   YYMMDD,YYMMDD,... path tarball_name
        HBase generic:
          describe_table table_name
          get_full_row table_name row_id


Up Next - The HBase and the Crash Storage API and the Future of HBase in Socorro

Sunday, January 06, 2013

What's Next For Socorro File System Storage?

The File System Storage System (FSS) plays a critical role in Socorro as a crash storage buffer standing between the collectors and HBase. Because of the potential instability of HBase connections and our mandate to never lose a crash, the collectors write to our very reliable FSS. Once crashes are safely ensconced in a local file system, the crash mover apps will spool the crashes into HBase as they can. At Mozilla, that's the only role for the FSS in our production system.

There are other uses for FSS. Prior to the adoption of HBase, it served as our primary storage scheme. Since other organizations are interested in using Socorro and their storage needs may not be as extreme as ours, the FSS may be perfectly adequate. In addition, while developing for Socorro, it is useful to have a complete live Socorro installation available. If you've ever tried to install and maintain an HBase installation on a virtual machine on a laptop, you likely would kill for an alternative. For the support of the community of Socorro users as well as our own developers, we're investing in maintaining FSS as a completely functional primary storage mechanism for Socorro. It will only take a switch in a configuration file to choose between either storage system (or future alternate implementations).

Work to rewrite the existing FSS code both begins and is slated for completion in Q1 2013. I have several goals in this rewrite:
  1. the public API should match the Crash Storage API exactly with no need for adapters 
  2. there should be a two class inheritance hierarchy for the two file system layouts (with vs. without date branch structure) 
  3. the existing PolyCrashStorage and/or Fallback Storage classes should be subclassed (or used as a model) for the case where separate standard and deferred crash storage is needed 
  4. implementation should proceed in parallel with a full suite of tests

No Need For Adapters 

The existing FSS implementation has an API that was fine at the time of its implementation, but is awkward now in view of the more refined Crash Storage API.

For example, the class underlying the saving of a raw crash (called JsonDumpStorage) has a method for saving a raw crash and its associated binary crash dump. Rather than just accepting a raw crash and dump, the JsonDumpStorage sets up the directory structure and then returns a tuple of open file handles for the raw crash and dump respectively. It expects the client of the module to do the work of actually writing the file contents and then follow through with closing the open handles.

In my proof of concept implementation of FSS shoehorned into the Crash Storage API, I had to write adapter code to do the work of writing the two files under the 'save_raw_crash' and 'save_dump' Crash Storage API methods. Functionality like this ought to be pushed into the implementation of FSS, minimizing responsibilities of the Crash Storage API code.
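The adapter pattern in question looks something like this. The legacy class below imitates the handle-returning behavior described above; the class names, method signatures, and file layout are simplified stand-ins, not the real JsonDumpStorage API.

```python
import json
import os
import tempfile

class LegacyJsonDumpStorage(object):
    """Imitation of the legacy API: sets up storage, then hands back
    open file handles and expects the caller to write and close them."""
    def __init__(self, root):
        self.root = root
    def new_entry(self, crash_id):
        base = os.path.join(self.root, crash_id)
        return open(base + '.json', 'w'), open(base + '.dump', 'wb')

class FileSystemCrashStorageAdapter(object):
    """Adapter doing the file writing that the legacy API leaves to its
    callers, so clients see only modern save_raw_crash semantics."""
    def __init__(self, root):
        self._legacy = LegacyJsonDumpStorage(root)
    def save_raw_crash(self, raw_crash, dump, crash_id):
        json_handle, dump_handle = self._legacy.new_entry(crash_id)
        try:
            json.dump(raw_crash, json_handle)
            dump_handle.write(dump)
        finally:
            json_handle.close()
            dump_handle.close()
```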

Two Related Implementation Classes 

In my previous blog posting, I showed how there are two primary file system layouts in FSS: with and without date branch indexing. The general case, without date branch indexing, should be implemented as the base class. A single derived class should add the date branch to the functionality of the base class.

Standard vs. Deferred Storage 

In the case where separate standard and deferred storage is desired, there is already precedent for storage location decisions to be made in the Crash Storage API implementation. The code within the FSS classes should be moved out to the Crash Storage API level. That will give the flexibility of using totally different storage schemes for standard and deferred storage independent of FSS. For example, standard storage could be HBase while deferred storage could be the FSS. The NullStorage class could even be employed to throw away crashes destined for deferred storage.
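The routing decision described above, made at the Crash Storage API level, can be sketched with small stand-in classes. The names below are hypothetical (only PolyCrashStorage, Fallback, and NullStorage are mentioned as real Socorro precedents), and the throttle flag test is a simplification of the real throttling metadata.

```python
class NullCrashStorage:
    """Throws crashes away; could serve as a deferred target."""
    def save_raw_crash(self, raw_crash, crash_id):
        pass

class InMemoryCrashStorage:
    """Dict-backed stand-in for any real backend (HBase, FSS, ...)."""
    def __init__(self):
        self.crashes = {}
    def save_raw_crash(self, raw_crash, crash_id):
        self.crashes[crash_id] = raw_crash

class StandardDeferredCrashStorage:
    """Routes each crash to standard or deferred storage based on its
    throttle flag.  Because routing happens above the backends, the two
    targets can be entirely different storage schemes."""
    def __init__(self, standard, deferred):
        self.standard = standard
        self.deferred = deferred
    def save_raw_crash(self, raw_crash, crash_id):
        if raw_crash.get('legacy_processing') == 'Y':
            self.standard.save_raw_crash(raw_crash, crash_id)
        else:
            self.deferred.save_raw_crash(raw_crash, crash_id)
```

With this shape, standard storage could be an HBase-backed class while deferred is FSS or even NullCrashStorage, exactly as the paragraph above suggests.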

Tests 

The original code was written entirely without tests or with an ambiguous separation of unit vs. integration tests. This doesn't require much discussion; tests are important and ought to be written in parallel with writing the actual FSS code.

The Future

Having been written in 2008, the File System Storage code is probably the oldest Socorro code still in use.  That makes it about three centuries old in Internet years.  It has been a reliable and critical workhorse for Socorro.  With the upcoming modernizing changes, the FSS code will survive for another few Internet centuries supporting Socorro installations way beyond Mozilla.

Thursday, January 03, 2013

More Socorro File System Storage


In my last blog posting, I discussed the inner workings of the File System Crash storage mechanism. Today, I'm going to talk about how that system has been retrofitted into the modern Crash Storage API. First, however, that's going to require another excursion into the past.

The early installation of Socorro had hardware resource problems. It wasn't clear at the beginning exactly how many crashes we'd be getting or what the size of the crashes were going to be. For the first couple years we were starved for processing power and disk space. If I recall correctly, our machines were surplussed from AMO. Our Postgres server had only 4G of RAM and non-local disk storage unable to rival the performance of laptops given to employees of that era.

Our mandate was for a developer to be able to see the results of processing any crash within sixty seconds of a request. At the time, just running MDSW would take thirty seconds. It was clear to me that we could not afford to process every crash that we received. If we tried, we'd slip further and further behind. I recall at one point having a backlog of over five million crashes. It was taking days between submission and processing.

We decided to start processing a sampling of crashes and arbitrarily chose fifteen percent. I implemented a sampling system that eventually evolved into the selective throttling system that is in use today. This throttling system divided the population of crashes into two sets: processed and deferred. The mandate was to store all crashes. Any crash, even if not initially processed, had to be eligible for processing on demand within one minute. My aforementioned priority processing scheme was to handle the sixty second on demand requirement.
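The original fixed-percentage sampling idea can be sketched deterministically: bucket each crash_id into one of a hundred bins with a stable hash and accept it if the bin falls under the sample rate. This is an illustration of the concept only; the real system grew into the rule-based selective throttler, and its hashing details are not mine to vouch for.

```python
import zlib

def should_process(crash_id, sample_percent=15):
    """Decide whether to process a crash, sampling at sample_percent.

    crc32 gives a stable hash (unlike Python's salted hash()), so the
    same crash_id always gets the same answer -- a repeatable decision
    the deferred/standard split can rely on.
    """
    bucket = zlib.crc32(crash_id.encode('ascii')) % 100
    return bucket < sample_percent
```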

Disk space constraints led us to save the standard and deferred crash populations separately. We used the file system storage scheme twice for raw crash storage. “Standard Storage” was for crashes destined for processing. “Deferred Storage” was for crashes that were not processed unless specifically requested. We used the file system crash storage a third time to store the processing results in “Processed Storage”. The latter two file system storage schemes were never in need of the indexing by date, so the internal file system branch “date” didn't exist for them.

The file system storage was retired in 2010 when we graduated to HBase. Sadly HBase was initially very unstable and we lost crashes when it was down. We called the File System Storage back from the old folks' home to serve as a buffer between the collectors and HBase. The collector pushes the crashes into the file system storage because of its proven stability. A new app, appropriately called the crash mover, then walks the “date” tree and moves crashes into HBase. This allows the collectors to be immune to direct trouble with HBase.

The substitution of the File System Storage within the collector was my inspiration for the Crash Storage API, now being deployed three years later. In this modern Crash Storage world, we have three classes that implement file system crash storage schemes: FileSystemRawCrashStorage, FileSystemThrottledCrashStorage, FileSystemCrashStorage. With these classes, Socorro is able to scale from tiny installations receiving only a handful of crashes per day to huge ones that receive millions per day.

FileSystemRawCrashStorage


This class is the simplest. It has one file system root for all crashes without regard for the throttling status of any given crash. This is the crash storage class used by the collectors. Since the throttle status is saved within the crash itself, and the crash mover doesn't care about that status, we don't need two different storage locations. This class declines to implement processed crash storage.



FileSystemThrottledCrashStorage


This class couples two instances of the file system crash storage. It has file system roots for a standard tree of crashes (to be processed) and deferred storage. Unlike the original system, when a crash from deferred storage is called up for processing, it isn't moved from deferred to standard storage. I'm undecided if that is a flaw or not. The other flaw that I see in this implementation is the “date” branch in the deferred storage. Deferred storage is never indexed by date, so this extra file system wrangling to support it is unnecessary.



FileSystemCrashStorage


This is the complete package – it implements the entire crash storage API: standard, deferred, and processed. In the original file storage system, processed storage was a separate class independent of the raw crash storage. This class takes all storage types and unifies them under the banner of the Crash Storage API.

So why do these all have separate roots?  Can't they be combined? It makes no sense to combine the standard and deferred storage.  The effect would be just like using standard storage alone.  The processed storage can share a root with either the standard or deferred storage without contention.  The main reason that these have separately configurable roots is that I want to give people who deploy Socorro the same opportunity that we had to distribute storage over different file systems.