Monday, February 27, 2012

a Socorro Rule System

It happens over and over again.  I need to solve a problem by transforming data from one form into another.  It can be simple or complicated, but one thing is for sure: if I've hard coded the transformation, I'm going to have to go back and change it.

The quest is for a system that can be changed at runtime so that a simple business logic policy change doesn't mean having to recode.

One example of a system like this is Socorro's throttler system.  It's a rule based system that decides if a crash is going to be processed or deferred by Socorro.  We only need to do a 10% statistical sampling for Firefox.  Other less popular Mozilla products can get by with processing 100% of all the crashes. Think Thunderbird and Camino.

To do this selection, the Socorro Collector holds an ordered list of 3-tuples that it uses as rules.  A tuple is comprised of a key, a predicate and a normalized probability threshold.  The crash is a packet of json data that I'll call the 'json_dict'.  The key is used to fetch a value from the json_dict.  If that value satisfies the predicate and a random number is below the threshold, the crash is accepted for processing.  The predicate is a literal string, a regular expression, or a function.  If the predicate is a string, the value of the json key is tested for equality.  If the predicate is a regular expression, the value of the json key is tested for 'match'.  If the predicate is a function, then the function is executed with the value of the json key as a parameter.  This last option invites writing lambda functions as quick selector rules:

    ('ProductName', lambda x: x in ('Thunderbird', 'Camino'), 100.0)

This rule, when applied to an incoming crash, tests if the crash is from 'Thunderbird' or 'Camino' and, if true, applies a processing probability of 100%.
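Here is a minimal sketch of how an ordered list of such 3-tuples might be evaluated.  It is not the Collector's actual code: the names `rule_matches` and `should_process` are hypothetical, the injectable `roll` parameter exists only to make the example testable, and the threshold is treated as a percentage because that is what the example rule uses.  I'm also assuming that a rule whose random test fails simply lets evaluation continue to the next rule.

```python
import random
import re


def rule_matches(json_dict, key, predicate):
    """Test the value stored under key against one kind of predicate."""
    value = json_dict.get(key)
    if value is None:
        return False
    if callable(predicate):                 # function or lambda
        return bool(predicate(value))
    if hasattr(predicate, 'match'):         # compiled regular expression
        return predicate.match(value) is not None
    return value == predicate               # literal string


def should_process(json_dict, rules, roll=None):
    """Return True if some rule accepts the crash for processing."""
    if roll is None:
        # assumption: thresholds are percentages, as in the example rule
        roll = lambda: random.random() * 100.0
    for key, predicate, percentage in rules:
        if rule_matches(json_dict, key, predicate) and roll() < percentage:
            return True
    return False                            # fell off the bottom: defer


rules = [
    ('ProductName', lambda x: x in ('Thunderbird', 'Camino'), 100.0),
    ('ProductName', 'Firefox', 10.0),
]
accepted = should_process({'ProductName': 'Camino'}, rules)
```

Because the rules are plain data, changing the sampling rate for a product means editing one tuple rather than the code that walks the list.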

Another example where a rule based system transforms data is the Signature generation system.  It applies a bunch of regular expressions to a list of C++ function signatures to generate an overall signature for a crash.

Yet another one has come up in the last week.  We're in the unfortunate position of having to adjust the basic crash information that comes in from the field.  In other words, the crashes are lying to us and it is Socorro's responsibility to detect and correct the lies.  We've already got one of these hard coded and another has just cropped up.  We need a rule system so that we don't have to re-code because somebody way above us made an odd business logic decision.

We've hard coded the changing of a ProductName on an incoming crash.  These crashes come from Fennec, the mobile version of Firefox.  All instances of Fennec identify themselves as 'Fennec' to Socorro when they crash.  Cued by other information within the crash, we might be able to see that the Fennec that crashed is actually a special version of Fennec.  When we detect this, we rewrite the ProductName to 'FennecAndroid'.

Now we have a situation where we need to do something similar to a Version number.  If the crash came from a "ReleaseChannel" of 'esr', then the crash came from an instance of Firefox under long term support.  The version number should get rewritten with 'esr' as a suffix.

I didn't want to hard code this; I don't want to revisit this issue in another three months.  I want these rules to be chosen by those with the business case knowledge.

I can foresee several other places where a rule based transformation would be very useful.  I decided to come up with one rule system that could be used in all the rule appropriate places in Socorro.

We'll start by looking back to the Socorro Collector rule system.  That middle parameter, the predicate, is a key to power.  The outer two items are just modifiers to what is done by the middle function.  Generalizing, I see it as a predicate and an action.  The collector has a hard coded action, setting a crash up for processing or not.  It could be an arbitrary function too, just like the predicate.  How would this work for the aforementioned 'esr' version rewrite problem?

The predicate would be some function that could test the json_dict for equality on some key:

    def equality(json_dict, key, value):
        return json_dict[key] == value

The action function could be this:

    def add_suffix(json_dict, key, suffix):
        json_dict[key] = json_dict[key] + suffix

Encoded as a rule, it could look like this:

    (equality, ('ReleaseChannel', 'esr'), add_suffix, ('Version', 'esr'))

When we eventually want to use this rule, we just have to do this:

    if rule[0](json_dict, *rule[1]):
        rule[2](json_dict, *rule[3])

Ok, that's ugly. Ugly but useful.  The first two items in the tuple represent, essentially, a partial function.  'equality' is paired with two of its parameters to be used later when it is invoked.  The 'json_dict' isn't part of the rule, it is added into the function calls at the last minute.
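The "partial function" idea can be made literal with Python's functools.partial.  This is only a sketch of an alternative encoding, not what the rule system does: `is_esr`, `mark_esr` and `apply_rule` are hypothetical names, and the version string is made up for the example.

```python
from functools import partial


def equality(json_dict, key, value):
    return json_dict[key] == value


def add_suffix(json_dict, key, suffix):
    json_dict[key] = json_dict[key] + suffix


# Bind every argument except json_dict ahead of time.
is_esr = partial(equality, key='ReleaseChannel', value='esr')
mark_esr = partial(add_suffix, key='Version', suffix='esr')


def apply_rule(json_dict, predicate, action):
    """Run the action only when the predicate holds for this crash."""
    if predicate(json_dict):
        return action(json_dict)
    return None


crash = {'ReleaseChannel': 'esr', 'Version': '10.0.2'}
apply_rule(crash, is_esr, mark_esr)
# crash['Version'] is now '10.0.2esr'
```

The json_dict is still supplied at the last minute, but the indexing into an anonymous tuple is gone.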

Having the rule as a tuple isn't ideal for all the reasons we've all hated anonymous tuples in the past.  A class will do nicely to name the various parts. 

    esr_rule = TransformRule(predicate=equality,
                             predicate_args=(),
                             predicate_kwargs={'key':'ReleaseChannel',
                                               'value': 'esr'},
                             action=add_suffix,
                             action_args=(),
                             action_kwargs={'key':'Version',
                                            'suffix': 'esr'})

Executing the rule is done by just passing in the json_dict.  The TransformRule class will handle the details.

    esr_rule.act(json_dict)

The return value will be the return value of the action function.
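For illustration, a class with that constructor and an act method could be as small as the sketch below.  This is a guess at the shape, not the real implementation; in particular, returning None when the predicate fails is my assumption, since only the successful case is described above.

```python
def equality(json_dict, key, value):
    return json_dict[key] == value


def add_suffix(json_dict, key, suffix):
    json_dict[key] = json_dict[key] + suffix


class TransformRule(object):
    """Hypothetical sketch: a predicate and an action with stored arguments."""

    def __init__(self, predicate, predicate_args, predicate_kwargs,
                 action, action_args, action_kwargs):
        self.predicate = predicate
        self.predicate_args = predicate_args
        self.predicate_kwargs = predicate_kwargs
        self.action = action
        self.action_args = action_args
        self.action_kwargs = action_kwargs

    def act(self, json_dict):
        # json_dict is supplied at call time; everything else was stored.
        if self.predicate(json_dict, *self.predicate_args,
                          **self.predicate_kwargs):
            return self.action(json_dict, *self.action_args,
                               **self.action_kwargs)
        return None  # assumption: nothing happens when the predicate fails


esr_rule = TransformRule(predicate=equality,
                         predicate_args=(),
                         predicate_kwargs={'key': 'ReleaseChannel',
                                           'value': 'esr'},
                         action=add_suffix,
                         action_args=(),
                         action_kwargs={'key': 'Version',
                                        'suffix': 'esr'})

json_dict = {'ReleaseChannel': 'esr', 'Version': '10.0.2'}
esr_rule.act(json_dict)
```

Note that add_suffix as written returns nothing, so act returns None here whether or not the rule fired; an action that should report success would need to return a value.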

Collecting a set of rules into a collection is handled by TransformRuleSystem.  It maintains an ordered list of TransformRules.  There are several useful ways the TransformRuleSystem can act on a target with its set of rules:
  1. go through all rules disregarding the success or failure of any given rule.  This method is useful for defining a list of transformation rules all of which must be executed.  This is how a TransformRuleSystem would work for the 'esr' and the 'ProductName' rewriting from above.
  2. go through all the rules until an action succeeds.  This is how the collector works. It goes through its list applying rules until it either falls off the bottom of the rules list or a rule indicates that a crash is to be processed.
  3. go through the rules until a predicate fails.  This is partially how signatures are generated.  We apply a set of transform rules until they fail.  Once they've failed, we've gathered all we can for the generation of a signature.
This same base for a rule system can be used in all three places in Socorro.  Further, the processor itself just has a set of hard coded transformations that turn a crash json_dict into a form suitable for insertion into a database.  Expect future implementations of the Processor to do all of its parameter length limiting and other conversions using this rule system.
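The three traversal strategies can be sketched as follows.  The method names are hypothetical, and to keep the example short it uses a stand-in `Rule` class whose act returns a (predicate result, action result) pair, which is an assumption made so the system can tell a failed predicate from a failed action.

```python
class Rule(object):
    """Stand-in for TransformRule; act reports both outcomes."""

    def __init__(self, predicate, action):
        self.predicate = predicate
        self.action = action

    def act(self, target):
        if self.predicate(target):
            return True, self.action(target)
        return False, None


class TransformRuleSystem(object):
    """Hypothetical sketch of an ordered rule list with three strategies."""

    def __init__(self, rules=()):
        self.rules = list(rules)

    def apply_all_rules(self, target):
        # 1. run every rule, ignoring success or failure
        for rule in self.rules:
            rule.act(target)

    def apply_until_action_succeeds(self, target):
        # 2. stop at the first rule whose action reports success
        for rule in self.rules:
            predicate_result, action_result = rule.act(target)
            if predicate_result and action_result:
                return True
        return False

    def apply_until_predicate_fails(self, target):
        # 3. stop at the first rule whose predicate is false
        for rule in self.rules:
            predicate_result, _ = rule.act(target)
            if not predicate_result:
                return False
        return True


def append(tag):
    """Build an action that records that it ran and reports success."""
    def action(target):
        target.append(tag)
        return True
    return action


system = TransformRuleSystem([
    Rule(lambda t: True, append('a')),
    Rule(lambda t: False, append('b')),
    Rule(lambda t: True, append('c')),
])
```

With these three rules, the first strategy runs 'a' and 'c', while the second and third both stop after 'a', for different reasons.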

Another advantage of this rule system that I've not yet discussed is the suitability of systems of rules to be stored externally to a program.  The TransformRule class can take any of its initialization parameters as strings.  This means that rules could be stored in flat text files or database tables.

    ("socorro.processor.processor.equality",
     "",
     "key='ReleaseChannel', value='esr'",
     "socorro.processor.processor.add_suffix",
     "",
     "key='Version', suffix='esr'")

By hijacking the dynamic class loading functionality of configman, the functions can be specified as strings.  The functions will be dynamically loaded at run time.
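To show the general idea without depending on configman, here is a sketch that resolves a dotted name with the standard library's importlib and parses a keyword-argument string with ast.  The helpers `load_callable` and `parse_kwargs` are hypothetical and are not configman's API; the example loads `os.path.basename` only because it is guaranteed to be importable.

```python
import ast
import importlib


def load_callable(dotted_name):
    """Resolve 'package.module.function' to the function object."""
    module_name, _, attribute = dotted_name.rpartition('.')
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


def parse_kwargs(text):
    """Turn "key='Version', suffix='esr'" into a dict of literal values."""
    if not text.strip():
        return {}
    # Parse the text as the argument list of a dummy call, then evaluate
    # each keyword value as a literal only (no arbitrary code is run).
    call = ast.parse('f(%s)' % text, mode='eval').body
    return dict((kw.arg, ast.literal_eval(kw.value)) for kw in call.keywords)


basename = load_callable('os.path.basename')
action_kwargs = parse_kwargs("key='Version', suffix='esr'")
```

Restricting the stored arguments to literals keeps a rule file from becoming a way to execute arbitrary expressions, though the dotted names themselves still need to come from a trusted source.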

The first iteration of the rule system will be submitted as a pull request on Wednesday, February 29.  You can see the code under development at my github repo (warning: volatile link, it may not last).  Follow the fun at Socorro Github Pull Requests, specifically Bugzilla Bug 729097.




Thursday, February 23, 2012

Another day lost to effin' Linux video drivers

I've been using Suse 12.1 for a while.  The only problems that I've been having are in regards to the video drivers.  After having bad luck trying to use the nVidia drivers with my workstation's Quadro FX 570 video card, I've made peace using the default Nouveau driver.  There are a few artifacts and sometimes some windows freeze, but overall the experience has not been too bad.

Then came Firefox 10.0.1.  While dealing with an unrelated crisis in my own Mozilla project, Firefox became unusably unstable.  I ended up filing this bug: https://bugzilla.mozilla.org/show_bug.cgi?id=729817

In comment #2, it was suggested that I use the nVidia drivers because the Nouveau drivers were known to be problematic.  Dismissing the alarm bells in my head, I decided to try the nVidia drivers again.  Just like last time, I never recovered.

With the nVidia drivers, instability spread to everything.  The screen was excruciatingly slow.  Dragging a window would leave artifacts behind.  The mouse would lag noticeably behind where I thought it should be.  To really irk me, I lost the ability to use the second monitor.  nVidia's X configuration app was of no help.  It wanted to write an xorg.conf file, but xorg apparently doesn't use that file anymore.  Once I managed to get the second screen to light up, I was unable to set rotation because "X doesn't support that".  Bull, I was using a rotated monitor just a few hours ago.  Arrrrgggh.

I couldn't figure out how to revert to the Nouveau drivers.  Nothing that I did would restore what I had earlier.

At the same time as all this crap, I was trying to deal with the crisis in my own project.  Unable to get my machine to work reliably, I pulled the plug on both my workstation and the current push of my own project.  Oh, yeah, then I had to go to the dentist.

It's been a hell of a day.  I reinstalled Suse 12.1 from scratch to restore the Nouveau video.  Now, some thirteen hours later, I've gotten my workstation restored to where it was twenty-four hours ago.  The day was a complete effin' waste.

I'm so done with Linux video.  I just may well be done with Linux.

Sunday, February 05, 2012

of phlegm, delirium and github

I've got the flu.  It's been persistent for about six days now.  The fever broke over the weekend, but not until I had a most interesting experience with delirium.  If mixing phlegm and geek stuff makes you queasy, don't read on.

I went to bed on Friday evening with a fever of 102.5.  I could only doze because about every ten minutes I was wracked with coughing up another wad of foul greenish phlegm from the depths of my lungs.  Once it's been coughed up, what do you do with it through the night?  I felt an urgent need to resolve the phlegm problem.

Reality began to blur with my dream state as my fever broke.  Drenched in sweat, I knew I was going to have to try some experiments to get rid of all this phlegm.  I branched on github and then downloaded a local repo.  What did I branch?  My lungs, of course.  Hey, this is delirium, I get to do this.  I tried over and over to refactor the phlegm from my lungs, but a few minutes later, I'd startle awake, coughing up another wad.

This seemed to go on for hours and hours.  I was dizzy and confused when it dawned on me.  I checked the source of my local repo and the URI was unrecognizable.  I had pulled from the wrong repo.  In fact, it wasn't even my repo: I was coughing up some other person's phlegm!  I remember waking up laughing at this idea, but feeling an urgent need to dump the local repo and try the whole branching idea again.

It wasn't until a while later that I realized that github could not save me from the phlegm.

This is not the first time that this sort of revision control system delirium has gripped my mind.  I recall a similar dream experience in the 90s trying to come up with a ClearCase configspec that would get me back into the correct apartment.  Brains are so weird.