Saturday, December 15, 2012

Socorro Modular Design

Nearly all the back-end applications in Socorro follow the same basic form. Data flows from a source stream into a transformative application and is then streamed out to some destination. For most applications, the source stream consists of crashes and their dumps. A good example of a source stream is the inflow of crashes from deployed applications. The destinations are frequently long-term storage systems: HBase, Postgres, and file system storage schemes are examples. The transformation step can be complicated, as in the processor, which applies minidump_stackwalk and exploitability analysis. Transformation can also be the degenerate case (no change at all). The crash mover uses this Null Transform when it moves crashes from one location to another.
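The basic source, transform, destination shape can be sketched in a few lines. This is a minimal single-threaded illustration, assuming plain dictionaries stand in for the source and destination stores; the names `crash_mover` and `null_transform` are hypothetical and not Socorro's API.

```python
# Minimal sketch of the source -> transform -> destination flow.
# Hypothetical names; plain dicts stand in for real crash stores.
from typing import Callable, Dict, Iterable

Crash = Dict[str, object]


def null_transform(crash: Crash) -> Crash:
    """The degenerate transformation: hand the crash back unchanged."""
    return crash


def crash_mover(
    source: Dict[str, Crash],
    crash_ids: Iterable[str],
    transform: Callable[[Crash], Crash],
    destination: Dict[str, Crash],
) -> None:
    """Fetch each crash from the source, transform it, save it to the destination."""
    for crash_id in crash_ids:
        crash = source[crash_id]            # fetch
        result = transform(crash)           # transform
        destination[crash_id] = result      # save
```

Swapping `null_transform` for a function that, say, annotates each crash turns the same loop from a mover into a (very simplified) processor; only the transform changes.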

Socorro's default implementation of the streaming data flow uses a threaded fetch-transform-save producer/consumer system. A typical back-end application consists of a couple of managerial threads and a flock of worker threads. One of the managerial threads, called the queuing thread, reads a stream of information telling it which crashes it must act on. For each crash, it places a reference to the crash and a transformative function into a queue. Each worker takes an item from the queue and uses the crash reference as a key to request the crash from the source. Once the transformation function has acted on the crash, the worker pushes the result to the destination.
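The queuing thread and worker flock described above can be sketched with Python's standard `queue` and `threading` modules. This is an illustration of the producer/consumer pattern under stated assumptions (dict-like source and destination, hypothetical names `run_pipeline`, `queuing_thread`, `worker`), not Socorro's own thread manager.

```python
# Sketch of a threaded fetch-transform-save producer/consumer system.
# Hypothetical names; dicts stand in for the source and destination.
import logging
import queue
import threading
from typing import Callable, Dict, Iterable, Optional, Tuple

Crash = Dict[str, object]
WorkItem = Optional[Tuple[str, Callable[[Crash], Crash]]]


def run_pipeline(
    crash_ids: Iterable[str],
    source: Dict[str, Crash],
    destination: Dict[str, Crash],
    transform: Callable[[Crash], Crash],
    number_of_workers: int = 4,
) -> None:
    work: "queue.Queue[WorkItem]" = queue.Queue()
    lock = threading.Lock()  # guards the shared destination

    def queuing_thread() -> None:
        # Producer: enqueue a crash reference plus the function to apply.
        for crash_id in crash_ids:
            work.put((crash_id, transform))
        # One sentinel per worker tells each worker to stop.
        for _ in range(number_of_workers):
            work.put(None)

    def worker() -> None:
        # Consumer: fetch by key, transform, save.
        while True:
            item = work.get()
            if item is None:
                break
            crash_id, func = item
            try:
                processed = func(source[crash_id])
                with lock:
                    destination[crash_id] = processed
            except Exception:
                # Log the problem and move on to the next crash.
                logging.exception("failed to process crash %s", crash_id)

    threads = [threading.Thread(target=queuing_thread)]
    threads += [threading.Thread(target=worker) for _ in range(number_of_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

Putting the transform function into the queue alongside the crash reference, as the text describes, lets one queue carry work items that need different treatment.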

The sources, destinations and transformations are all modular and are loaded at run time during configuration. Creating a new source, destination or transformation requires implementing only a handful of API calls. The 2013 version of Socorro implements file system, HBase, and Postgres sources and destinations. Aggregate destinations can save a crash to multiple places or provide fallback storage in case of failures.
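A sketch of what "a handful of API calls" plus run-time loading might look like follows. The method names `save_raw_crash` and `get_raw_crash` and the class `MemoryCrashStorage` are assumptions for illustration, not Socorro's exact signatures; `load_class` imitates, in spirit only, how configman's `class_converter` turns a dotted name from the ini file into a class.

```python
# Sketch of a pluggable crash storage API and run-time class loading.
# Method and class names are illustrative assumptions.
import importlib
from abc import ABC, abstractmethod
from typing import Dict


class CrashStorageBase(ABC):
    """The small API every hypothetical source/destination implements."""

    @abstractmethod
    def save_raw_crash(self, crash_id: str, raw_crash: dict) -> None: ...

    @abstractmethod
    def get_raw_crash(self, crash_id: str) -> dict: ...


class MemoryCrashStorage(CrashStorageBase):
    """A hypothetical in-memory store, handy for testing."""

    def __init__(self) -> None:
        self._crashes: Dict[str, dict] = {}

    def save_raw_crash(self, crash_id: str, raw_crash: dict) -> None:
        self._crashes[crash_id] = dict(raw_crash)

    def get_raw_crash(self, crash_id: str) -> dict:
        return dict(self._crashes[crash_id])


def load_class(dotted_name: str) -> type:
    """Resolve 'package.module.ClassName' to the class object at run time."""
    module_name, _, class_name = dotted_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

Because the application only ever calls the base-class methods, any class named in the configuration that implements them can be dropped in without changing the application code.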

The Sources

  • FileSystem
  • HBase

The Destinations

  • FileSystem
  • HBase
  • Postgres
  • URL POST
  • Elastic Search
  • Aggregate Destinations:
      • Poly Storage
      • Fallback Storage

The Transformations

  • Legacy Processor
  • Null Transformation

Configuring an App From Ground Zero


Hopefully, Socorro will have sensible defaults for all configuration values, so it will just work right out of the box. Here's how to set up an app customized for your installation. For this example, the target app will be the crash mover.
If you want to start at ground zero with no prebuilt configuration files, do this first:
    $ cd $SOCORRO_HOME
    
    $ # stash any original config files
    $ mv config config.original

    $ mkdir config
    $ # create a new empty config file
    $ touch config/crashmover.ini

    $ # tell the crash mover to write its own sample ini file
    $ ./socorro/collector/crashmover_app.py --admin.dump_conf=config/crashmover.ini \
          --source.crashstorage_class='' --destination.crashstorage_class=''

If you look at the configuration file at this point, you will see something like this:

    # name: application
    # doc: the fully qualified module or class of the application
    # converter: configman.converters.class_converter
    # application='CrashMoverApp'
    
    [destination]

        # name: crashstorage_class
        # doc: the destination storage class
        # converter: configman.converters.class_converter
        crashstorage_class=''

    [logging]

        # section omitted for brevity

    [producer_consumer]

        # section omitted for brevity

    [source]

        # name: crashstorage_class
        # doc: the source storage class
        # converter: configman.converters.class_converter
        crashstorage_class=''


Building an ini file is an iterative process. We're going to invoke the crashmover_app several times, adding configuration values with each iteration and having the crashmover_app write out its own configuration at each step. First we're going to set up the [source] section.

When the crashmover_app is invoked, it first reads any existing configuration file that it can find in the default configuration directory, $SOCORRO_HOME/config. Anything it finds in the configuration file overrides the defaults built into the application. After reading the configuration file, the app searches for any environment variables that match the names of the app's configuration options. The app brings those values in, and they supersede both the defaults and the configuration file. Finally, the app brings in the command line arguments; these values supersede any values found by the previous means.
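The precedence order (defaults, then config file, then environment, then command line) can be modeled with `collections.ChainMap`, which returns the value from the first mapping that has the key. This sketch only illustrates the override rule; the function `overlay` is hypothetical, and it does not reproduce how configman actually parses files, environment variables, or arguments.

```python
# Sketch of layered configuration precedence with ChainMap.
# Hypothetical helper; illustrates the override order only.
from collections import ChainMap
from typing import Dict, Mapping, Sequence


def overlay(
    defaults: Mapping[str, str],
    config_file: Mapping[str, str],
    environment: Mapping[str, str],
    argv: Sequence[str],
) -> ChainMap:
    # Parse '--name=value' command line arguments into a dict.
    command_line: Dict[str, str] = {}
    for arg in argv:
        if arg.startswith("--") and "=" in arg:
            name, value = arg[2:].split("=", 1)
            command_line[name] = value
    # Keep only environment variables that match known option names.
    env = {key: value for key, value in environment.items() if key in defaults}
    # ChainMap searches left to right: command line wins, defaults lose.
    return ChainMap(command_line, env, dict(config_file), dict(defaults))
```

For instance, with a default `source.std_fs_root` of `/home/socorro/primaryCrashStore`, passing `--source.std_fs_root=/home/lars/my_crash_source` yields the command line value regardless of what the config file or environment say.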

We're going to tell the crashmover_app that we want the source to be a file system location. This means that the crashmover_app will look at a file system location to find crashes to send to the configured destination. The class that implements a file system storage scheme is 'socorro.external.filesystem.crashstorage.FileSystemRawCrashStorage'. We'll specify that on the command line and then get the crashmover_app to write out another ini file:

    $ ./socorro/collector/crashmover_app.py \
        --source.crashstorage_class=socorro.external.filesystem.crashstorage.FileSystemRawCrashStorage \
        --admin.dump_conf=config/crashmover.ini

Now looking at the ini file, we can see that the filesystem class was loaded and that it added a bunch of new configuration requirements to the ini file:

    # name: application
    # doc: the fully qualified module or class of the application
    # converter: configman.converters.class_converter
    # application='CrashMoverApp'
 
    [destination]
 
        # name: crashstorage_class
        # doc: the destination storage class
        # converter: configman.converters.class_converter
        crashstorage_class=''

    [logging]

        # section omitted for brevity

    [producer_consumer]

        # section omitted for brevity

    [source]

        # name: crashstorage_class
        # doc: the source storage class
        # converter: configman.converters.class_converter
        crashstorage_class='socorro.external.filesystem.crashstorage.FileSystemRawCrashStorage'

        # name: dir_permissions
        # doc: a number used for permissions for directories in the local file system
        # converter: int
        dir_permissions='504'

        # name: dump_dir_count
        # doc: the number of dumps to be stored in a single directory in the local file system
        # converter: int
        dump_dir_count='1024'

        # name: dump_file_suffix
        # doc: the suffix used to identify a dump file
        # converter: str
        dump_file_suffix='.dump'

        # name: dump_gid
        # doc: the group ID for saved crashes in local file system (optional)
        # converter: str
        dump_gid=''

        # name: dump_permissions
        # doc: a number used for permissions crash dump files in the local file system
        # converter: int
        dump_permissions='432'

        # name: json_file_suffix
        # doc: the suffix used to identify a json file
        # converter: str
        json_file_suffix='.json'

        # name: std_fs_root
        # doc: a path to a local file system
        # converter: str
        std_fs_root='/home/socorro/primaryCrashStore'
 
In most cases, the defaults will be acceptable. The one most likely to need changing is the last, 'std_fs_root'. There are several ways to change that value. You could set an environment variable 'source.std_fs_root'. You could edit the value directly in the config file. Or you could have the crashmover_app rewrite the ini file, specifying the new value on the command line:

    $ ./socorro/collector/crashmover_app.py \
         --source.std_fs_root='/home/lars/my_crash_source' \
         --admin.dump_conf=config/crashmover.ini

You can verify that the change was made either by inspecting the config file or by invoking the crashmover_app yet again, this time specifying --help on the command line. The output of help always reflects the values loaded by the stack of configuration value sources (app defaults, config file, environment, command line).

    $ ./socorro/collector/crashmover_app.py --help
    Application: crashmover 2.0
    this app will move crashes from one storage location to another

    Options:
      --admin.conf
      the pathname of the config file (path/filename)
      (default: ./config/crashmover.ini)

      --admin.dump_conf
      a pathname to which to write the current config
      (default: )

      --admin.print_conf

      ... other options omitted for brevity

      --source.std_fs_root
      a path to a local file system
      (default: /home/lars/my_crash_source)

Now we can use the same process to set up the destination storage. This time, our destination will be HBase.

    $ ./socorro/collector/crashmover_app.py \
         --destination.crashstorage_class=socorro.external.hbase.crashstorage.HBaseCrashStorage \
         --admin.dump_conf=config/crashmover.ini

This brings in all the configuration dependencies of HBase. We can use --help to see what they are, or examine the ini file directly:

    [destination]

        # name: crashstorage_class
        # doc: the destination storage class
        # converter: configman.converters.class_converter
        crashstorage_class='socorro.external.hbase.crashstorage.HBaseCrashStorage'

        # name: dump_file_suffix
        # doc: the suffix used to identify a dump file (for use in temp files)
        # converter: str
        dump_file_suffix='.dump'

        # name: forbidden_keys
        # doc: a comma delimited list of keys banned from the processed crash in HBase
        # converter: socorro.external.hbase.connection_context.<lambda>
        forbidden_keys='email, url, user_id, exploitability'

        # name: hbase_connection_pool_class
        # doc: the class responsible for pooling and giving out HBaseconnections
        # converter: configman.converters.class_converter
        hbase_connection_pool_class='socorro.external.hbase.connection_context.HBaseConnectionContextPooled'

        # name: hbase_host
        # doc: Host to HBase server
        # converter: str
        hbase_host='localhost'

        # name: hbase_port
        # doc: Port to HBase server
        # converter: int
        hbase_port='9090'

        # name: hbase_timeout
        # doc: timeout in milliseconds for an HBase connection
        # converter: int
        hbase_timeout='5000'

        # name: number_of_retries
        # doc: Max. number of retries when fetching from hbaseClient
        # converter: int
        number_of_retries='0'

        # name: temporary_file_system_storage_path
        # doc: a local filesystem path where dumps temporarily during processing
        # converter: str
        temporary_file_system_storage_path='/home/socorro/temp'

        # name: transaction_executor_class
        # doc: a class that will execute transactions
        # converter: configman.converters.class_converter
        transaction_executor_class='socorro.database.transaction_executor.TransactionExecutor'

Again, we can set the proper values by directly editing the config file or by continuing this iterative process. At this point the application is fully configured and ready to run: just invoke the app with no command line options.

Other Class Options For Crash Storage


When designating HBase or Postgres as destinations, a few more dependencies were brought in that control connection and transactional behaviors. In the case above, the destination.transaction_executor_class is a class that will tell the system how to react to failures.

The default class 'socorro.database.transaction_executor.TransactionExecutor' gives the HBase storage the behavior of “give up instantly”. If there is a connection problem with HBase, any exception raised will be immediately passed on to the crashmover_app, which will pass it on to the thread manager, which will log the problem and move on. That's not ideal behavior. It would be better if, when the connection to HBase fails, the system had a fallback behavior of retrying the transaction.

There are three transaction executor classes to choose from:
  • TransactionExecutor
  • TransactionExecutorWithLimitedBackoff
  • TransactionExecutorWithInfiniteBackoff

The latter two offer retry behaviors with progressively longer delays between retries (completely configurable). The last one never gives up and keeps retrying forever. To use these alternatives, just specify the class in the configuration file. The latter two classes will bring in a couple more configuration options that specify wait times and logging intervals.
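The three retry policies can be sketched as follows. This is an illustration under assumptions, not Socorro's code: a "transaction" is any zero-argument callable, connection failures are modeled as Python's built-in `ConnectionError`, and the `backoff_delays` and `sleep` parameters are hypothetical stand-ins for the configurable wait times.

```python
# Sketch of give-up-instantly, limited-backoff, and infinite-backoff
# retry policies. Class names echo the article; bodies are illustrative.
import itertools
import logging
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
log = logging.getLogger("transaction_executor_sketch")


class TransactionExecutor:
    """Give up instantly: any exception propagates to the caller."""

    def __call__(self, transaction: Callable[[], T]) -> T:
        return transaction()


class TransactionExecutorWithLimitedBackoff(TransactionExecutor):
    """Retry after progressively longer delays, then give up."""

    def __init__(self, backoff_delays: Iterable[float] = (1, 2, 4, 8),
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.backoff_delays = tuple(backoff_delays)
        self.sleep = sleep  # injectable so tests need not really wait

    def delays(self) -> Iterable[float]:
        return self.backoff_delays

    def __call__(self, transaction: Callable[[], T]) -> T:
        for delay in self.delays():
            try:
                return transaction()
            except ConnectionError as error:
                log.warning("transaction failed (%s); retrying in %ss", error, delay)
                self.sleep(delay)
        # Final attempt: if this one fails, the exception reaches the caller.
        return transaction()


class TransactionExecutorWithInfiniteBackoff(TransactionExecutorWithLimitedBackoff):
    """Never give up: after the listed delays, keep retrying at the last one."""

    def delays(self) -> Iterable[float]:
        return itertools.chain(self.backoff_delays,
                               itertools.repeat(self.backoff_delays[-1]))
```

Making the infinite variant differ only in its delay sequence keeps the retry loop in one place; the policy is entirely a matter of which delays are produced.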

For more discussion of these classes, see The Tale of the Unstable Connection and the Transaction Class.

Crashstorage Collections


There are two other crashstorage classes that act as containers for other crash storage classes: PolyCrashStorage and FallbackCrashStorage. We can use PolyCrashStorage as a destination storage that will save crashes to two locations at the same time:

    $ ./socorro/collector/crashmover_app.py \
         --destination.crashstorage_class=socorro.external.crashstorage_base.PolyCrashStorage \
         --admin.dump_conf=config/crashmover.ini

This will get us an ini file with a destination section that looks like this:

    [destination]
    
        # name: crashstorage_class
        # doc: the destination storage class
        # converter: configman.converters.class_converter
        crashstorage_class='socorro.external.crashstorage_base.PolyCrashStorage'
    
        # name: storage_classes
        # doc: a comma delimited list of storage classes
        # converter: configman.converters.class_list_converter
        storage_classes=''
    
        [[storage0]]
    
            # name: crashstorage_class
            # doc: None
            # converter: configman.converters.class_converter
            crashstorage_class=''

We can get the details filled in for us with this:

    $ ./socorro/collector/crashmover_app.py \
        --destination.storage_classes='socorro.external.hbase.crashstorage.HBaseCrashStorage,socorro.external.filesystem.crashstorage.FileSystemRawCrashStorage' \
        --admin.dump_conf=config/crashmover.ini

After that, both crash storage systems listed in the destination section will receive exact copies of every crash. The ini file will look like the example below. Edit the values to whatever is appropriate for your local system.

    [destination]
    
        # name: crashstorage_class
        # doc: the destination storage class
        # converter: configman.converters.class_converter
        crashstorage_class='socorro.external.crashstorage_base.PolyCrashStorage'

        # name: storage_classes
        # doc: a comma delimited list of storage classes
        # converter: configman.converters.class_list_converter
        storage_classes='socorro.external.hbase.crashstorage.HBaseCrashStorage, socorro.external.filesystem.crashstorage.FileSystemRawCrashStorage'

        [[storage0]]

            # name: crashstorage_class
            # doc: None
            # converter: configman.converters.class_converter
            crashstorage_class='socorro.external.hbase.crashstorage.HBaseCrashStorage'

            # name: dump_file_suffix
            # doc: the suffix used to identify a dump file (for use in temp files)
            # converter: str
            dump_file_suffix='.dump'

            # name: forbidden_keys
            # doc: a comma delimited list of keys banned from the processed 
            #      crash in HBase
            # converter: socorro.external.hbase.connection_context.
            forbidden_keys='email, url, user_id, exploitability'

            # name: hbase_connection_pool_class
            # doc: the class responsible for pooling and giving out
            #      HBaseconnections
            # converter: configman.converters.class_converter
            hbase_connection_pool_class='socorro.external.hbase.connection_context.HBaseConnectionContextPooled'

            # name: hbase_host
            # doc: Host to HBase server
            # converter: str
            hbase_host='localhost'

            # name: hbase_port
            # doc: Port to HBase server
            # converter: int
            hbase_port='9090'

            # name: hbase_timeout
            # doc: timeout in milliseconds for an HBase connection
            # converter: int
            hbase_timeout='5000'

            # name: number_of_retries
            # doc: Max. number of retries when fetching from hbaseClient
            # converter: int
            number_of_retries='0'

            # name: temporary_file_system_storage_path
            # doc: a local filesystem path where dumps temporarily 
            #      during processing
            # converter: str
            temporary_file_system_storage_path='/home/socorro/temp'

            # name: transaction_executor_class
            # doc: a class that will execute transactions
            # converter: configman.converters.class_converter
            transaction_executor_class='socorro.database.transaction_executor.TransactionExecutor'

        [[storage1]]

            # name: crashstorage_class
            # doc: None
            # converter: configman.converters.class_converter
            crashstorage_class='socorro.external.filesystem.crashstorage.FileSystemRawCrashStorage'

            # name: dir_permissions
            # doc: a number used for permissions for directories in the local 
            #      file system
            # converter: int
            dir_permissions='504'

            # name: dump_dir_count
            # doc: the number of dumps to be stored in a single directory in 
            #      the local file system
            # converter: int
            dump_dir_count='1024'

            # name: dump_file_suffix
            # doc: the suffix used to identify a dump file
            # converter: str
            dump_file_suffix='.dump'

            # name: dump_gid
            # doc: the group ID for saved crashes in local file system (optional)
            # converter: str
            dump_gid=''

            # name: dump_permissions
            # doc: a number used for permissions crash dump files in the local 
            #      file system
            # converter: int
            dump_permissions='432'

            # name: json_file_suffix
            # doc: the suffix used to identify a json file
            # converter: str
            json_file_suffix='.json'

            # name: std_fs_root
            # doc: a path to a local file system
            # converter: str
            std_fs_root='/home/socorro/primaryCrashStore'
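The idea behind PolyCrashStorage, saving every crash to each contained store, can be sketched like this. It is a minimal illustration assuming each store offers a `save_raw_crash(crash_id, raw_crash)` method; `DictStore`, `PolyStore`, and `PolyStorageError` are hypothetical names. In this sketch, every store is attempted even if an earlier one fails, and the failures are collected and raised together; that is a design choice of the sketch.

```python
# Sketch of the "save to every store" pattern behind PolyCrashStorage.
# Hypothetical names; not Socorro's implementation.
from typing import Dict, List


class DictStore:
    """A hypothetical in-memory store standing in for HBase or a file system."""

    def __init__(self) -> None:
        self.crashes: Dict[str, dict] = {}

    def save_raw_crash(self, crash_id: str, raw_crash: dict) -> None:
        self.crashes[crash_id] = dict(raw_crash)


class PolyStorageError(Exception):
    """Carries the exceptions raised by the individual stores."""


class PolyStore:
    def __init__(self, stores: List) -> None:
        self.stores = stores

    def save_raw_crash(self, crash_id: str, raw_crash: dict) -> None:
        errors = []
        for store in self.stores:
            try:
                store.save_raw_crash(crash_id, raw_crash)
            except Exception as error:  # keep going so other copies get saved
                errors.append(error)
        if errors:
            raise PolyStorageError(errors)
```

Continuing past a failing store means one broken destination does not prevent the other copies from being written, while the caller still learns that something went wrong.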

The 'FallbackCrashStorage' class is similar, except that it saves to the second crash store only if saving to the first one fails. In conjunction with the retry behavior of the backoff transaction executors, a fault-tolerant application can, for example, try several times to save a crash in HBase. If that ultimately fails, the 'FallbackCrashStorage' will stash the crash into a file system storage for later recovery.
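The fallback behavior can be sketched in the same style. This assumes both stores offer a `save_raw_crash(crash_id, raw_crash)` method; `FallbackStore` and `RawCrashStore` are hypothetical names, not Socorro's classes. In a real deployment the primary would typically be HBase wrapped in a retrying transaction executor and the fallback a file system store.

```python
# Sketch of the "try primary, else fallback" pattern behind FallbackCrashStorage.
# Hypothetical names; not Socorro's implementation.
from typing import Protocol


class RawCrashStore(Protocol):
    def save_raw_crash(self, crash_id: str, raw_crash: dict) -> None: ...


class FallbackStore:
    def __init__(self, primary: RawCrashStore, fallback: RawCrashStore) -> None:
        self.primary = primary
        self.fallback = fallback

    def save_raw_crash(self, crash_id: str, raw_crash: dict) -> None:
        try:
            self.primary.save_raw_crash(crash_id, raw_crash)
        except Exception:
            # Reached only if the primary (after any retries) gave up.
            # If the fallback fails too, its exception propagates.
            self.fallback.save_raw_crash(crash_id, raw_crash)
```

Unlike the Poly sketch, the fallback store sees a crash only when the primary fails, so it stays empty during normal operation and acts as a recovery area.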

Conclusion

All the Socorro back-end applications (the Collector, the Crash Mover, the Submitter, the Monitor, the Processor, the Middleware, the Crontabber, and many of the individual cron applications) use this same system for modular storage. This flexibility allows Socorro to scale from tiny installations receiving a handful of crashes per day to huge installations handling millions of crashes per day.