Monday, February 27, 2012

a Socorro Rule System

It happens over and over again.  I need to solve a problem by transforming data from one form into another.  It can be simple or complicated, but one thing is for sure: if I've hard coded the transformation, I'm going to have to go back and change it.

The quest is for a system that can be changed at runtime so that a simple business logic policy change doesn't mean having to recode.

One example of a system like this is Socorro's throttler system.  It's a rule based system that decides if a crash is going to be processed or deferred by Socorro.  We only need to do a 10% statistical sampling for Firefox.  Other less popular Mozilla products can get by with processing 100% of all the crashes. Think Thunderbird and Camino.

To do this selection, the Socorro Collector holds an ordered list of 3-tuples that it uses as rules.  A tuple is comprised of a key, a predicate and a normalized probability threshold.  The crash is a packet of json data that I'll call the 'json_dict'.  The key is used to fetch a value from the json_dict.  If that value satisfies the predicate and a random number is below the threshold, the crash is accepted for processing.  The predicate is a literal string, a regular expression, or a function.  If the predicate is a string, the value of the json key is tested for equality.  If the predicate is a regular expression, the value of the json key is tested for 'match'.  If the predicate is a function, then the function is executed with the value of the json key as a parameter.  This latter option invites writing lambda functions as quick selector rules:

    ('ProductName', lambda x: x in ('Thunderbird', 'Camino'), 100.0)

This rule, when applied to an incoming crash, tests if the crash is from 'Thunderbird' or 'Camino' and, if true, applies a processing probability of 100%.
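To make the matching concrete, here is a minimal sketch of how a list of these 3-tuples could be applied.  It is not the Collector's actual code: the names 'matches' and 'should_process' are made up for illustration, and I'm assuming the threshold is a percentage, as in the example rule above.

```python
import random
import re


def matches(predicate, value):
    """Test a value against a function, a compiled regex, or a literal."""
    if callable(predicate):
        return bool(predicate(value))
    if hasattr(predicate, 'match'):  # compiled regular expression
        return predicate.match(value) is not None
    return value == predicate  # literal string: plain equality


def should_process(json_dict, rules, rand=random.random):
    """Walk the ordered rules; accept the crash as soon as one rule says so.

    'rand' is injectable so the sampling can be made deterministic in tests.
    """
    for key, predicate, percent in rules:
        if key not in json_dict:
            continue
        if matches(predicate, json_dict[key]) and rand() * 100.0 < percent:
            return True
    return False  # fell off the bottom of the list: defer the crash


rules = [
    ('ProductName', lambda x: x in ('Thunderbird', 'Camino'), 100.0),
    ('Version', re.compile(r'.*pre$'), 100.0),
    ('ProductName', 'Firefox', 10.0),
]
```

Passing a fixed 'rand' makes the 10% sampling rule predictable, which is handy when checking that the rules are ordered the way you intended.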

Another example where a rule based system transforms data is the Signature generation system.  It applies a bunch of regular expressions to a list of C++ function signatures to generate an overall signature for a crash.

Yet another one has come up in the last week.  We're in the unfortunate position of having to adjust the basic crash information that comes in from the field.  In other words, the crashes are lying to us and it is Socorro's responsibility to detect and correct the lies.  We've already got one of these hard coded and another has just cropped up.  We need a rule system so that we don't have to re-code because somebody way above us made an odd business logic decision.

We've hard coded the changing of a ProductName on an incoming crash.  These crashes come from Fennec, the mobile version of Firefox.  All instances of Fennec identify themselves as 'Fennec' to Socorro when they crash.  Cued by other information within the crash, we might be able to see that the Fennec that crashed is actually a special version of Fennec.  When we detect this, we rewrite the ProductName to 'FennecAndroid'.

Now we have a situation where we need to do something similar to a Version number.  If the crash came from a "ReleaseChannel" of 'esr', then the crash came from an instance of Firefox under long term support.  The version number should get rewritten with 'esr' as a suffix.

I didn't want to hard code this; I don't want to revisit this issue in another three months.  I want these rules to be chosen by those with the business case knowledge.

I can foresee several other places where a rule based transformation would be very useful.  I decided to come up with one rule system that could be used in all the rule appropriate places in Socorro.

We'll start by looking back to the Socorro Collector rule system.  That middle parameter, the predicate, is a key to power.  The outer two items are just modifiers to what is done by the middle function.  Generalizing, I see it as a predicate and an action.  The collector has a hard coded action, setting a crash up for processing or not.  It could be an arbitrary function too, just like the predicate.  How would this work for the aforementioned 'esr' version rewrite problem?

The predicate would be some function that could test the json_dict for equality on some key:

    def equality(json_dict, key, value):
        return json_dict[key] == value

The action function could be this:

    def add_suffix(json_dict, key, suffix):
        json_dict[key] = json_dict[key] + suffix

Encoded as a rule, it could look like this:

    (equality, ('ReleaseChannel', 'esr'), add_suffix, ('Version', 'esr'))

When we eventually want to use this rule, we just have to do this:

    if rule[0](json_dict, *rule[1]):
        rule[2](json_dict, *rule[3])

Ok, that's ugly. Ugly but useful.  The first two items in the tuple represent, essentially, a partial function.  'equality' is paired with two of its parameters to be used later when it is invoked.  The 'json_dict' isn't part of the rule, it is added into the function calls at the last minute.
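Since the tuple is really a pair of partial functions, the same idea can be sketched with Python's functools.partial.  This is just an illustration of the concept, not how the rule system is implemented; 'is_esr', 'mark_esr' and 'apply_rule' are names I've invented here.

```python
from functools import partial


def equality(json_dict, key, value):
    return json_dict[key] == value


def add_suffix(json_dict, key, suffix):
    json_dict[key] = json_dict[key] + suffix


# Bind everything except the json_dict, which arrives at the last minute.
is_esr = partial(equality, key='ReleaseChannel', value='esr')
mark_esr = partial(add_suffix, key='Version', suffix='esr')


def apply_rule(json_dict, predicate, action):
    """Run the action only if the predicate accepts the json_dict."""
    if predicate(json_dict):
        action(json_dict)


crash = {'ReleaseChannel': 'esr', 'Version': '10.0.2'}
apply_rule(crash, is_esr, mark_esr)
```

After the call, the 'Version' in 'crash' carries the 'esr' suffix; a crash from any other channel would be left untouched.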

Having the rule as a tuple isn't ideal for all the reasons we've all hated anonymous tuples in the past.  A class will do nicely to name the various parts. 

    esr_rule = TransformRule(predicate=equality,
                             predicate_args=(),
                             predicate_kwargs={'key':'ReleaseChannel',
                                               'value': 'esr'},
                             action=add_suffix,
                             action_args=(),
                             action_kwargs={'key':'Version',
                                            'suffix': 'esr'})

Executing the rule is done by just passing in the json_dict.  The TransformRule class will handle the details.

    esr_rule.act(json_dict)

The return value will be the return value of the action function.
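For the curious, a stripped down version of such a class might look like the following.  Treat it as a sketch under assumptions rather than the code in the pull request: in particular, returning None when the predicate fails is my guess for this example, and the string-loading features discussed later are left out.

```python
class TransformRule(object):
    """Pair a predicate with an action, each with its stored arguments."""

    def __init__(self, predicate, predicate_args, predicate_kwargs,
                 action, action_args, action_kwargs):
        self.predicate = predicate
        self.predicate_args = tuple(predicate_args)
        self.predicate_kwargs = dict(predicate_kwargs)
        self.action = action
        self.action_args = tuple(action_args)
        self.action_kwargs = dict(action_kwargs)

    def act(self, json_dict):
        """Run the action if the predicate passes; return the action's result.

        Assumption for this sketch: None is returned when the predicate fails.
        """
        if self.predicate(json_dict, *self.predicate_args,
                          **self.predicate_kwargs):
            return self.action(json_dict, *self.action_args,
                               **self.action_kwargs)
        return None


def equality(json_dict, key, value):
    return json_dict[key] == value


def add_suffix(json_dict, key, suffix):
    json_dict[key] = json_dict[key] + suffix


esr_rule = TransformRule(predicate=equality,
                         predicate_args=(),
                         predicate_kwargs={'key': 'ReleaseChannel',
                                           'value': 'esr'},
                         action=add_suffix,
                         action_args=(),
                         action_kwargs={'key': 'Version',
                                        'suffix': 'esr'})
```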

Collecting a set of rules is handled by TransformRuleSystem.  It maintains an ordered list of TransformRules.  There are several useful ways the TransformRuleSystem can act on a target with its set of rules:
  1. go through all rules disregarding the success or failure of any given rule.  This method is useful for defining a list of transformation rules all of which must be executed.  This is how a TransformRuleSystem would work for the 'esr' and the 'ProductName' rewriting from above.
  2. go through all the rules until an action succeeds.  This is how the collector works. It goes through its list applying rules until it either falls off the bottom of the rules list or a rule indicates that a crash is to be processed.
  3. go through the rules until a predicate fails.  This is partially how signatures are generated.  We apply a set of transform rules until they fail.  Once they've failed, we've gathered all we can for the generation of a signature.

This same base for a rule system can be used in all three places in Socorro.  Further, the processor itself just has a set of hard coded transformations that turn a crash json_dict into a form suitable for insertion into a database.  Expect future implementations of the Processor to do all of its parameter length limiting and other conversions using this rule system.
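The three ways of walking the rule list can be sketched roughly like this.  To keep the example short, each rule is reduced to a plain (predicate, action) pair of functions that take the target, and the method names are hypothetical; the real class works with TransformRule objects and may name things differently.

```python
class TransformRuleSystem(object):
    """Hold an ordered list of (predicate, action) pairs."""

    def __init__(self, rules=()):
        self.rules = list(rules)

    def apply_all_rules(self, target):
        """Mode 1: try every rule, ignoring success or failure."""
        for predicate, action in self.rules:
            if predicate(target):
                action(target)

    def apply_until_action_succeeds(self, target):
        """Mode 2: stop at the first rule whose action returns a true value."""
        for predicate, action in self.rules:
            if predicate(target) and action(target):
                return True
        return False  # fell off the bottom of the list

    def apply_until_predicate_fails(self, target):
        """Mode 3: stop at the first rule whose predicate rejects the target."""
        for predicate, action in self.rules:
            if not predicate(target):
                return False
            action(target)
        return True


# Toy rules that log their names into a list used as the target.
rule_system = TransformRuleSystem([
    (lambda t: True, lambda t: t.append('a') or False),
    (lambda t: False, lambda t: t.append('b') or True),
    (lambda t: True, lambda t: t.append('c') or True),
])
```

The only difference between the modes is the exit condition of the loop, which is why one class can serve the version rewriting, the collector and the signature generation.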

Another advantage of this rule system that I've not yet discussed is the suitability of systems of rules to be stored externally to a program.  The TransformRule class can take any of its initialization parameters as strings.  This means that rules could be stored in flat text files or database tables.

    ("socorro.processor.processor.equality",
     "",
     "key='ReleaseChannel', value='esr'",
     "socorro.processor.processor.add_suffix",
     "",
     "key='Version', suffix='esr'")

By hijacking the dynamic class loading functionality of configman, the functions can be specified as strings.  The functions will be dynamically loaded at run time.
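To give a feel for what turning those strings back into something callable involves, here is a standard library stand-in.  It does not use configman and is not configman's API; 'resolve' and 'parse_kwargs' are hypothetical helpers built on importlib and ast.

```python
import ast
import importlib


def resolve(dotted_name):
    """Import 'package.module.name' and return the named attribute."""
    module_name, _, attr = dotted_name.rpartition('.')
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def parse_kwargs(text):
    """Turn "key='Version', suffix='esr'" into a dict of literal values."""
    if not text.strip():
        return {}
    # Parse the text as the argument list of a dummy call, then read the
    # keyword arguments back out.  Only literals are accepted as values.
    call = ast.parse('f(%s)' % text, mode='eval').body
    return dict((kw.arg, ast.literal_eval(kw.value)) for kw in call.keywords)


# Example with a function that ships with Python.
sqrt = resolve('math.sqrt')
kwargs = parse_kwargs("key='ReleaseChannel', value='esr'")
```

Using ast.literal_eval rather than eval means a rule stored in a text file or database table can only supply plain values, not arbitrary code, as arguments.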

The first iteration of the rule system will be submitted as a pull request on Wednesday, February 29.  You can see the code under development at my github repo (warning: volatile link, it may not last).  Follow the fun at Socorro Github Pull Requests, specifically Bugzilla Bug 729097.