Sunday, January 06, 2013

What's Next For Socorro File System Storage?

The File System Storage System (FSS) plays a critical role in Socorro as a crash storage buffer standing between the collectors and HBase. Because of the potential instability of HBase connections and our mandate to never lose a crash, the collectors write to our very reliable FSS. Once crashes are safely ensconced in a local file system, the crash mover apps will spool the crashes into HBase as they can. At Mozilla, that's the only role for the FSS in our production system.

There are other uses for FSS. Prior to the adoption of HBase, it served as our primary storage scheme. Since other organizations are interested in using Socorro and their storage needs may not be as extreme as ours, the FSS may be perfectly adequate for them. In addition, while developing for Socorro, it is useful to have a complete live Socorro installation available. If you've ever tried to install and maintain an HBase installation on a virtual machine on a laptop, you likely would kill for an alternative. For the support of the community of Socorro users as well as our own developers, we're investing in maintaining FSS as a completely functional primary storage mechanism for Socorro. It will only take a switch in a configuration file to choose between the two storage systems (or future alternate implementations).

Work to rewrite the existing FSS code both begins and is slated for completion in Q1 2013. I have several goals in this rewrite:
  1. the public API should match the Crash Storage API exactly with no need for adapters 
  2. there should be a two class inheritance hierarchy for the two file system layouts (with vs. without date branch structure) 
  3. the existing PolyCrashStorage and/or Fallback Storage classes should be subclassed (or used as a model) for the case where separate standard and deferred crash storage is needed 
  4. the rewrite should be implemented in parallel with a full suite of tests

No Need For Adapters 

The existing FSS implementation has an API that was fine at the time of its implementation, but is awkward now in view of the more refined Crash Storage API.

For example, the class underlying the saving of a raw crash (called JsonDumpStorage) has a method for saving a raw crash and its associated binary crash dump. Rather than just accepting a raw crash and dump, JsonDumpStorage sets up the directory structure and then returns a tuple of open file handles for the raw crash and dump respectively. It expects the client of the module to do the work of actually writing the file contents and then to follow through by closing the open handles.

In my proof of concept implementation of FSS shoehorned into the Crash Storage API, I had to write adapting code to do the work of writing the two files under the 'save_raw_crash' and 'save_dump' methods of the Crash Storage API. Functionality like this ought to be pushed into the implementation of FSS, minimizing the responsibilities of the Crash Storage API code.
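
To make the awkwardness concrete, here is a minimal sketch of the kind of adapter described above. It is not the real Socorro code: the class names, the 'new_entry' method, the two-level directory layout and the 'save_raw_crash' signature are all assumptions made up for illustration.

```python
import json
import os
import tempfile


class LegacyStyleStorage:
    """Hypothetical stand-in for the old behaviour: build the directory,
    then hand back open file handles for the caller to fill and close."""

    def __init__(self, root):
        self.root = root

    def new_entry(self, crash_id):
        # Illustrative layout only: two directory levels taken from the id.
        crash_dir = os.path.join(self.root, crash_id[:2], crash_id[2:4])
        os.makedirs(crash_dir, exist_ok=True)
        json_handle = open(os.path.join(crash_dir, crash_id + ".json"), "w")
        dump_handle = open(os.path.join(crash_dir, crash_id + ".dump"), "wb")
        return json_handle, dump_handle


class AdaptedStorage:
    """Hypothetical adapter: does the writing and closing that the
    legacy-style class leaves to its caller."""

    def __init__(self, legacy):
        self.legacy = legacy

    def save_raw_crash(self, raw_crash, dump, crash_id):
        json_handle, dump_handle = self.legacy.new_entry(crash_id)
        try:
            json.dump(raw_crash, json_handle)
            dump_handle.write(dump)
        finally:
            # The caller owns the handles, so the caller must close them.
            json_handle.close()
            dump_handle.close()
```

In the rewrite, the body of 'save_raw_crash' would live inside the FSS class itself, and the adapter layer would disappear.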

Two Related Implementation Classes 

In my previous blog posting, I showed how there are two primary file system layouts in FSS: with and without date branch indexing. The general case, without date branch indexing, should be written as the base class. A single derived class should add the date branch to the functionality of the base class.
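
A rough sketch of that two class hierarchy might look like the following. The class names, the "name" and "date" directory names, the marker file used as the date index and the optional 'when' argument are all assumptions for illustration; the real layouts are the ones described in the previous posting.

```python
import datetime
import json
import os
import tempfile


class RadixStorage:
    """Hypothetical base class: crashes are located by crash id alone."""

    def __init__(self, root):
        self.root = root

    def _crash_dir(self, crash_id):
        # Illustrative layout only: two directory levels taken from the id.
        return os.path.join(self.root, "name", crash_id[:2], crash_id[2:4])

    def save_raw_crash(self, raw_crash, dump, crash_id):
        crash_dir = self._crash_dir(crash_id)
        os.makedirs(crash_dir, exist_ok=True)
        with open(os.path.join(crash_dir, crash_id + ".json"), "w") as f:
            json.dump(raw_crash, f)
        with open(os.path.join(crash_dir, crash_id + ".dump"), "wb") as f:
            f.write(dump)
        return crash_dir


class DatedRadixStorage(RadixStorage):
    """Hypothetical derived class: same storage plus a date branch entry."""

    def save_raw_crash(self, raw_crash, dump, crash_id, when=None):
        crash_dir = super().save_raw_crash(raw_crash, dump, crash_id)
        when = when or datetime.datetime.now(datetime.timezone.utc)
        date_dir = os.path.join(
            self.root, "date", when.strftime("%Y%m%d"), when.strftime("%H")
        )
        os.makedirs(date_dir, exist_ok=True)
        # A small marker file stands in for the date index entry here.
        with open(os.path.join(date_dir, crash_id), "w") as f:
            f.write(crash_dir)
        return crash_dir
```

The derived class only adds the date bookkeeping; everything about where the crash itself lives stays in the base class.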

Standard vs. Deferred Storage 

In the case where separate standard and deferred storage is desired, there is already precedent for storage location decisions to be made in the Crash Storage API implementation. The code within the FSS classes that makes this choice should be moved out to the Crash Storage API level. That will give us the flexibility of using totally different storage schemes for standard and deferred storage, independent of FSS. For example, standard storage could be HBase while deferred storage could be the FSS. The NullStorage class could even be employed to throw away crashes destined for deferred storage.
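
As a sketch of what routing at the Crash Storage API level could look like, consider the following. None of these classes are the real Socorro ones: 'InMemoryStorage', 'DiscardingStorage' (a stand-in for the NullStorage idea) and 'StandardDeferredRouter' are invented names, and the 'is_deferred' predicate and the "deferred" key in the raw crash are assumptions about how a crash might be marked.

```python
class InMemoryStorage:
    """Hypothetical storage that keeps crashes in a dict, for illustration."""

    def __init__(self):
        self.crashes = {}

    def save_raw_crash(self, raw_crash, dump, crash_id):
        self.crashes[crash_id] = (raw_crash, dump)


class DiscardingStorage:
    """Hypothetical stand-in for a null storage: accepts and drops crashes."""

    def save_raw_crash(self, raw_crash, dump, crash_id):
        pass


class StandardDeferredRouter:
    """Chooses a storage per crash; the storages know nothing of the choice."""

    def __init__(self, standard, deferred, is_deferred):
        self.standard = standard
        self.deferred = deferred
        self.is_deferred = is_deferred  # caller-supplied predicate

    def save_raw_crash(self, raw_crash, dump, crash_id):
        if self.is_deferred(raw_crash):
            target = self.deferred
        else:
            target = self.standard
        target.save_raw_crash(raw_crash, dump, crash_id)


standard = InMemoryStorage()
router = StandardDeferredRouter(
    standard,
    DiscardingStorage(),
    is_deferred=lambda raw_crash: raw_crash.get("deferred", False),
)
router.save_raw_crash({"deferred": False}, b"dump-1", "crash-1")
router.save_raw_crash({"deferred": True}, b"dump-2", "crash-2")
```

Because the router only depends on 'save_raw_crash', either slot could hold an FSS class, an HBase class or a discarding class without any of them changing.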

Tests 

The original code was written either entirely without tests or with an ambiguous separation of unit vs. integration tests. This doesn't require much discussion: tests are important and ought to be written in parallel with the actual FSS code.

The Future

Having been written in 2008, the File System Storage code is probably the oldest Socorro code still in use. That makes it about three centuries old in Internet years. It has been a reliable and critical workhorse for Socorro. With the upcoming modernizing changes, the FSS code will survive for another few Internet centuries, supporting Socorro installations way beyond Mozilla.