Tuesday, May 21, 2013

The TransactionExecutor vs. HBase


Yesterday, I wrote about the use of the class TransactionExecutor. Today, I'm going to propose a new way of using it within Socorro's HBase crashstorage subsystem.

When used with Postgres, the TransactionExecutor treats all of the individual steps within a transaction collectively as if they were a single atomic operation. This is exactly how we want a database transaction treated. Let's say we have a transaction that consists of steps A, B, C, D. We start a transaction and succeed in steps A and B. However, C fails. The TransactionExecutor rolls back the transaction, essentially undoing steps A and B. Then it retries the whole thing from the beginning after sleeping.

That idea isn't ideal for HBase. Since HBase doesn't support relational database-like transactions, the same strategy of treating multiple steps as one atomic transaction doesn't work as well. Just like the previous example, let's say that for HBase we have four steps A, B, C and D. We start a “transaction” and complete steps A and B before we get a failure in step C. If the transaction executor is used exactly like it is in Postgres, we roll back the transaction. But HBase doesn't support transactions, so steps A and B are not actually undone. The TransactionExecutor then sleeps and, after waking, retries the whole thing from the beginning. We end up redoing steps A and B that have already been done.

What are the consequences of this? It is my understanding that rows in tables in HBase are static. We don't really change them; we just make a new version of the row. The old version of the row still sticks around (the number of old versions that HBase saves is configurable). By starting the transaction over from the beginning, we're increasing the amount of space that HBase needs to save the same information.

Honestly, this isn't much of a problem. During a failure, having multiple copies of the same information will happen only to a single row (or in our case, one row per actively writing thread during the HBase problem). That's not a lot of extra storage wasted in comparison to the huge size of our data storage.

However, if we care to, we can resolve this duplication problem by using the TransactionExecutor in a different manner. Rather than treating all the steps A, B, C, D as if they are collectively atomic, we could use the TransactionExecutor individually on each step. If we don't fail until step C, then only step C will be retried. We can avoid the duplication of steps A and B. 
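The difference between the two strategies can be sketched with a toy example. This is not Socorro's TransactionExecutor: `retry`, `step`, and the `writes` list are hypothetical stand-ins, and step C is rigged to fail exactly once to simulate an HBase outage.

```python
# Hypothetical sketch; retry() stands in for a retrying executor.

def retry(action, attempts=3):
    """Call action() until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        try:
            return action()
        except IOError:
            if attempt == attempts - 1:
                raise


writes = []          # every write that reaches the fake store
failures = {"C": 1}  # step C fails once, then succeeds


def step(name):
    """One non-transactional write; it cannot be undone once recorded."""
    if failures.get(name, 0) > 0:
        failures[name] -= 1
        raise IOError("simulated HBase failure in step %s" % name)
    writes.append(name)


def all_steps():
    for name in "ABCD":
        step(name)


# Strategy 1: treat A, B, C, D as one unit and retry the whole unit.
retry(all_steps)
whole_group_writes = list(writes)

# Strategy 2: retry each step on its own.
del writes[:]
failures["C"] = 1
for name in "ABCD":
    retry(lambda name=name: step(name))
per_step_writes = list(writes)

print(whole_group_writes)  # ['A', 'B', 'A', 'B', 'C', 'D']
print(per_step_writes)     # ['A', 'B', 'C', 'D']
```

Retrying the whole group writes A and B twice; retrying per step writes each exactly once.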

For purity's sake, it makes sense to use the TransactionExecutor only on operations that are truly atomic. In our application with HBase, it really doesn't matter too much, and changing it would likely be more trouble than it is worth.




Monday, May 20, 2013

The history of the TransactionExecutor



Or, "yet another awkwardly named class"

One of the most important tenets of Socorro is to be resilient when external resources fail. The Mozilla Socorro deployment depends on Postgres and HBase to work. However, these are two external resources that can fail.

What happens when we try to write to one of these and we find that the resource is unavailable? Earlier versions of Socorro treated the HBase and Postgres failure cases separately.

For Postgres, since it is a transactional storage system, Socorro employed the native transactional behaviors. Interacting with Postgres involves a series of steps (insert, update, delete, select) followed by commit or rollback. If one of the intervening steps were to fail, we didn't want the program to quit, nor did we want errors to be ignored. Socorro implemented a “backing off retry” behavior. On failure of a step, the code would classify the failure into one of two types: retriable and fatal. In either case, a rollback would be issued. In the retriable case, the code would sleep for a predetermined amount of time and then retry the transaction from the beginning. In the fatal case, there is no choice except to allow the program to shut down.

For HBase, true transactions are not supported. However, the behavior Socorro wanted was just the same as the Postgres case: classify the failure and then, if retriable, repeat the steps until we have success. HBase doesn't have the concept of commit and rollback, but the intervening steps of a transaction may be repeated without negative consequence.

Even though the behavior was similar, the two cases were coded independently and shared no code. In the grand Configman refactoring of Socorro, the two cases were merged into one class to maximize reuse. The Postgres case was used as the canonical example. Dummy no-op commit and rollback methods were added to the HBase connection classes to facilitate the use of the class.

How do the TransactionExecutor classes work? There are three of them with slightly different behaviors:
  • TransactionExecutor
  • TransactionExecutorWithLimitedBackoff
  • TransactionExecutorWithInfiniteBackoff
The code can be found at: https://github.com/mozilla/socorro/blob/master/socorro/database/transaction_executor.py

These classes implement methods that accept a function, a connection context to some resource and arbitrary function parameters. When instantiated and invoked, these classes will call the function, passing it the connection and the additional parameters. The raising of an exception within the function indicates a failure of the transaction: a rollback is automatically issued on the connection context. If the function succeeds and exits normally, then a 'commit' is issued on the connection context.

The first class in the list above is the degenerate single-shot case. It doesn't implement any retry behavior. If the function fails by raising an exception, then a rollback is issued on the connection and the program moves on. Success results in a commit and the program moves on.
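The single-shot behavior can be sketched roughly as follows. This is not the code from transaction_executor.py: `FakeConnection` and `execute_once` are hypothetical names, and re-raising the exception after the rollback is an assumption of this sketch (it leaves the decision about what "moving on" means to the caller).

```python
class FakeConnection(object):
    """Hypothetical stand-in for a connection; it only records calls."""

    def __init__(self):
        self.log = []

    def commit(self):
        self.log.append("commit")

    def rollback(self):
        self.log.append("rollback")


def execute_once(connection, function, *args, **kwargs):
    """Run function(connection, ...) as one transaction with no retry."""
    try:
        result = function(connection, *args, **kwargs)
    except Exception:
        connection.rollback()  # any exception means the transaction failed
        raise                  # assumption: let the caller see the failure
    connection.commit()        # normal exit means success
    return result


def insert_row(connection, value):
    connection.log.append("insert %s" % value)


def broken(connection):
    raise ValueError("boom")


good = FakeConnection()
execute_once(good, insert_row, 42)

bad = FakeConnection()
try:
    execute_once(bad, broken)
except ValueError:
    pass

print(good.log)  # ['insert 42', 'commit']
print(bad.log)   # ['rollback']
```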

The latter two classes implement a retry behavior. If the function raises an exception, the class checks to see if the exception is of a type that is eligible for retry. If it is eligible, then a delay amount is selected and the thread sleeps. When it wakes, it tries to invoke the function again with the same parameters. The time delays are specified by a list of integers representing successive numbers of seconds to wait before trying again. For the class TransactionExecutorWithLimitedBackoff, when the list of time delays is exhausted, the transaction is abandoned and the program moves on. The TransactionExecutorWithInfiniteBackoff will never give up, reusing the last delay in the list over and over until the transaction finally succeeds or somebody kills the program.
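A minimal sketch of the backoff idea, under stated assumptions: `execute_with_backoff`, `limited_delays`, `infinite_delays`, and `FakeConnection` are hypothetical names, not Socorro's API, and the exact number of attempts per delay list may differ from the real classes. The `sleep` parameter is injectable so the example runs without actually waiting.

```python
import itertools
import time


class FakeConnection(object):
    """Hypothetical stand-in for a connection context; it only records calls."""

    def __init__(self):
        self.log = []

    def commit(self):
        self.log.append("commit")

    def rollback(self):
        self.log.append("rollback")


def limited_delays(delays):
    """Each delay is used once; after that the caller gives up."""
    return iter(delays)


def infinite_delays(delays):
    """Each delay is used once, then the last one repeats forever."""
    return itertools.chain(delays, itertools.repeat(delays[-1]))


def execute_with_backoff(connection, function, delays, is_retriable,
                         sleep=time.sleep):
    """Retry function(connection) until it succeeds or the delays run out."""
    while True:
        try:
            result = function(connection)
        except Exception as x:
            connection.rollback()
            if not is_retriable(x):
                raise  # fatal: do not retry
            delay = next(delays, None)
            if delay is None:
                raise  # delay list exhausted: abandon the transaction
            sleep(delay)
        else:
            connection.commit()
            return result


calls = {"count": 0}


def flaky_save(connection):
    """Fails twice with a retriable error, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise IOError("resource unavailable")
    return "saved"


slept = []
connection = FakeConnection()
result = execute_with_backoff(
    connection, flaky_save, limited_delays([1, 5, 10]),
    is_retriable=lambda x: isinstance(x, IOError),
    sleep=slept.append,  # record the delays instead of really sleeping
)
print(result)          # saved
print(slept)           # [1, 5]
print(connection.log)  # ['rollback', 'rollback', 'commit']
```

Passing `infinite_delays([1, 5, 10])` instead would wait 1, 5, 10, 10, 10, ... seconds and never give up on a retriable failure.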

How does the TransactionExecutor determine if an exception is eligible for retry? The connection context object is required to have a couple of instance variables and methods to assist in the determination.

First, operational_exceptions defines a collection of exceptions that are eligible for the retry behavior. If one of the exceptions from this collection is raised, the retry behavior is triggered.

conditional_exceptions is a list of ambiguous exceptions that may or may not be eligible for retry. We encountered this with Postgres using psycopg2 on the ProgrammingError exception. Normally, this type of exception would not be retriable because it indicates a fundamental problem with a query such as a syntax error. Syntax errors are not retriable. However, sometimes we get network errors disguised as ProgrammingErrors; these are retriable.

If an exception found in the conditional_exceptions collection is raised, we have to further examine the error to determine if it should result in a failure or retry. The instance method is_operational_exception implemented by the connection class is used to determine if the current exception is retriable or not. In the case of Postgres, we look to the text of the exception to see if it contains the string “EOF”. We know that's a network error, not really a programming error, so we can do a retry.
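The classification logic described above might look something like this sketch. The names operational_exceptions, conditional_exceptions, and is_operational_exception come from the text; everything else is hypothetical, including the stand-in exception classes (used here so the example runs without psycopg2) and the `is_retriable` helper.

```python
class OperationalError(Exception):
    """Stand-in for a database driver's 'resource unavailable' error."""


class ProgrammingError(Exception):
    """Stand-in for a database driver's 'bad query' error."""


class FakePostgresContext(object):
    """Hypothetical connection context carrying the retry hints."""

    # always eligible for retry
    operational_exceptions = (OperationalError,)
    # ambiguous: need a closer look before deciding
    conditional_exceptions = (ProgrammingError,)

    def is_operational_exception(self, x):
        # a network failure disguised as a ProgrammingError mentions EOF
        return "EOF" in str(x)


def is_retriable(context, x):
    """Decide whether exception x should trigger the retry behavior."""
    if isinstance(x, context.operational_exceptions):
        return True
    if isinstance(x, context.conditional_exceptions):
        return context.is_operational_exception(x)
    return False  # anything else is fatal


context = FakePostgresContext()
print(is_retriable(context, OperationalError("server closed")))   # True
print(is_retriable(context, ProgrammingError("EOF detected")))    # True
print(is_retriable(context, ProgrammingError("syntax error")))    # False
print(is_retriable(context, ValueError("unrelated")))             # False
```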

Is this class named poorly? Now that we've got many more external resources using this retry behavior and only Postgres is truly transactional, it seems that the name may not be right. Perhaps ExternalResourceActionRetrier?







Wednesday, April 24, 2013

12V to 5V USB Charging in a Camper


I've got a pickup truck camper.  The internal electrical system is 12VDC.  I wanted to be able to recharge my phone and tablet while sleeping.  I wanted to install some 12V sockets in the over-the-cab sleeping area.  Not satisfied with just adding chargers, I decided that I wanted some LED strip lighting in the area too.

This is a photo blog on how I used my Makerbot Replicator2, LED lights from Ikea, and components from Radio Shack & Fry's to make a cool addition to my camper.


Using the now-defunct 3D modeling Web site tinkercad.com, I designed a control plate with spots for a 12V utility socket, some switches, a power indicator LED and a socket into which I could plug LED strip lights:


The 3D print of the control plate took about six hours.  It's already fitted with all the holes that I'll need for the 12V socket, switches and power indicator.  The hole on the side will be the port into which the LED strip lights will plug.


This shows how the 12V socket mounts in the center of the control plate.


I want the wiring to be neat and tidy, so I've gone to the trouble of using proper lugs.


The loops in the wires are a style that I learned long ago: make sure you have enough wire to reach anywhere into your project. You never know when you're going to rearrange everything, so you don't want the wires to be too short.


The little LED light that I wanted to use as a power indicator for the 12V socket requires a step down resistor, so I had to do some soldering. 


You'll see in later images that I used shrink tubing on the resistor wire.  While not strictly necessary, I thought it made the project look neater. 


When I discovered that the Ikea Ledberg LED strip lights were 12V, I realized that it was an opportunity for some very inexpensive lighting for the camper.  I started by liberating the end socket from its 120VAC to 12VDC adapter.


Discarding the transformer, I now needed to test to make sure that the +/- labeling on the bottom edge of the socket was to be trusted. 


It was, indeed, properly labeled as I tested a single strip on a 12V battery. 


Wired into the switch, the LED socket was then hot glued into the channel on the underside of the control plate. 


Using a saw on a Leatherman tool, I cut an access hole into the side panel inside the camper.  Not shown was running a 12VDC power connector into that space, too.


The control plate dropped into the hole perfectly and, once wired and powered, I demonstrated that the power indicator light worked correctly. 


I then plugged in the 12VDC to 5V USB converter and I suddenly had a way to recharge my phone and tablet on the bedside in the camper.


The LED lights work, too.  I reproduced this in mirror image on the other side of the bed, giving me great diffuse lighting and the ability to recharge two phones and two tablets at the same time.

Sunday, April 14, 2013

New Socorro Processor Outperforms the Old One

I was beginning to look over my shoulder to see if advancing glaciers were going to overrun me.  It has seemingly taken forever, but my rewrite last year of the Socorro processors has finally made it to production.  The switchover happened on Thursday afternoon (while I was at the dentist) from the old processor (Processor2008) to the new processor (Processor2012).

The new processor was written as a Fetch-Transform-Save Configman app.  This just means that the steps to process a Firefox crash were encoded in a modular manner and, through configuration, attached to the FTS framework.  Many of the Socorro apps follow that basic FTS model allowing them all to share the same multi-threading framework and storage schemes.

What's the advantage of doing this?

First, sharing code between apps allows us to reduce the sheer size of the codebase that we have to maintain.  No more repeating similar mechanisms between apps.  This also allows the steps to accomplish a task in a Socorro application to be isolated and reduced down to a single action.  For example, in the processor, there is now a single method called 'convert_raw_crash_to_processed_crash'.  That method knows nothing about storage mechanisms, queuing, or threading.  Its target action is its only concern.  The external framework handles all the other details.

Second, with that focus on a single action, the methods can be optimized.  Here are some stats comparing last weekend with Processor2008 and this weekend with Processor2012:

Average Time from submission to completed processing:

Processor2008: 40 seconds
Processor2012: 16 seconds

Average Time for actual conversion from raw to processed:

Processor2008: 2.7 seconds
Processor2012: 1.7 seconds

I really don't think I need to say anything more. I'm just smiling today.

Disclaimer: Processor2012 has only been running for a few days. Long-term stats will have to be collected to get a more focused picture of the performance characteristics.


Thursday, April 11, 2013

No, you cannot see my ID

I contend that every time you show your ID to someone, you compound the likelihood that you will suffer identity theft.

I encountered the issue when a doctor's office wanted to see my ID before they checked me in for an appointment.  Further, they wanted to make a copy of my driver's license and save it with my electronic records.  I asked how making a copy of my ID and saving it in a computer system that I do not control protected me from identity theft.  I got an answer that had nothing to do with my question - they said that they wanted to make sure that my insurance wasn't being used fraudulently.  I pointed out that they were protecting the insurance company, not me. 

At that point, I refused, declining to show them anything but my insurance card.  I stated that disseminating the contents of my ID puts me more at risk.  I'd rather protect myself than protect the insurance company.

Getting frustrated with me, the doctor's receptionist asked how showing my ID put me at risk.  I answered like this, "Show me your driver license for five seconds and I'll have your name and license number memorized.  Give me a total of ten seconds with your card and I'll have your address, too.  What kind of damage could I do with that information?"  This disturbed her.  I didn't even have to go into my mistrust of the integrity of their Microsoft computer systems with the wide open WiFi access to the HP Laser Printer behind her. 

"Are you going to refuse me service?" I asked.  She backed down and has never asked for my ID since.

I do not have an eidetic memory; however, I've practiced memorization techniques to help my aging brain.  If I were a professor, I would be categorized as an absent-minded, forgetful one.  I believe that almost anyone could easily learn the memorization trick.

The doctor's office could easily set up a "secret passphrase" system to verify my identity.  Rather than ask for my ID, they could ask me to regurgitate a phrase or password that I set up at my first appointment.  They get verification, I get to protect my private information.

Holding a Grudge Forever

Microsoft burned me in 2002 when I was running a small retail nursery business selling rose bushes online. Through their puppet, the Business Software Alliance, they accused my tiny business of having pirated software. I was threatened with hundreds of thousands of dollars of liability. They offered an amnesty program that would still cost me thousands, forcing me to lay off an employee. I had no pirated software in my business, but I couldn't find any of those "Certificates of Authenticity" that had come with my three Dell computers, so I just about bit their amnesty hook. I had a week of sleepless nights as I agonized over who I was going to lay off and how I was going to get by short an employee for a whole season.

Just as I was about to pay, I figured out that the whole thing was a scam: a protection racket. Microsoft knew nothing about my business; they had bought a marketing mailing list and spammed everyone with the same accusations. They were just trying to pump small businesses for money. I think what they did was a crime.

Microsoft turned an enthusiastic customer into an enemy. I excised all Microsoft products from my business, embracing Open Source Software. Linux replaced Windows.  Microsoft keyboards and mice went to electronic recycling. Open Office trumped MSOffice. Python took over for MS C++. Postgres shoved SQLServer aside.  Never again would Microsoft be allowed to have their hand in my revenue stream. I will hate them and their products forever.

Now some eleven years later, I delight in seeing Microsoft falter. The day that vile company files for bankruptcy, I will pop a champagne bottle. I would love to see a "For Lease" sign on their corporate headquarters building. There are certainly companies in the world that are probably even more deserving of my ire, but my focus is still the Redmond Racketeer.

Articles like these make me smile:

Infoworld: The death of the PC
Windows 8: Epic Fail of the Decade?
Five reasons why Windows 8 has failed

Friday, March 29, 2013

A Mistrust for Software as a Service

Burned twice in the same month, I've suddenly developed mistrust for software as a service. If I cannot load the software on a machine that I control, I'm going to think twice before I decide to use it.

It started with Google Reader. I've used this as my primary method for reading RSS and Atom feeds for years. I've linked it up with Yahoo Pipes and IfThisThenThat to make a wonderfully tailored experience for myself. Each morning when I sit down to my computer, I have a news feed that has automatically filtered out the crap and shows me only that which has a high probability of being interesting.

Poof, it ends. Well, that should be a lesson to me about using free services. If I'm not paying for it, what should I expect? But wait, paying for a service doesn't seem to help as another service that I use extensively is going to vanish.

TinkerCad is 3D modeling done on the Web. It's brilliant. Rather, it was brilliant. They've just announced that as of April 30, they're done. The brilliance vanishes leaving me in the dark. I spent weeks honing my skills on this Web application and I've gotten very good at it. I have a Makerbot Replicator 2 and I use it extensively. TinkerCad was my primary means of making models. With the announcement of the end of TinkerCad, all my effort to learn it has gone "poof".

I will not replace these two services with services from someone else. I will choose to not use any new product from either Google or the people behind TinkerCad unless I can download and run the software on my own hardware. You burn me, I will not extend my trust again.

I'm examining all the services that I use and assessing the feasibility of dropping them before they drop me. I cannot see how any company could attain my trust for their service software offerings. At least these were personal projects. If I ran a company, I certainly wouldn't trust my revenue stream to any software as a service offering without an explicit contract that guaranteed service availability for a predetermined amount of time.

Nope, I'm not interested in a service that can be taken away at any random time.

Tuesday, March 05, 2013

Named Arguments for Configman

I'm proposing a small modification to Configman to add named arguments to its Swiss Army Knife feature set. Named arguments were neglected in the first versions of Configman, and I think this proposal will be very useful. So what am I talking about?
    $ some_app.py  --help
    usage:
        some_app.py [OPTIONS]...  arg1 [ arg2 [ arg3 ]]

It's the arg1, arg2, arg3 that I'm interested in here. Normally, you'd access these with sys.argv without configman or config.args with configman. They come to the programmer as an uninterpreted list of strings. I'm proposing treating them just like switches. We should be able to define what is expected: the position, defaults, conversion functions, actions to take, just like the command line switches. The end result to the programmer should be something that looks like this:
    config = config_manager.get_config()
    print config.arg1, config.arg2, config.arg3

In other words, from the programmer's perspective, they're just values passed into the program with the same access method and priority as command line switches. In fact, with a minor change to the configman Option object, they can be used to specify both switches and arguments:
    n.add_option(
        name='filename',
        doc='the name of the file',
        default=None,
        is_argument=True
    )
    n.add_option(
        name='action',
        doc='the action to take on the file',
        default='echo',
    )

Within the program, this could be accessed as:
    config = config_manager.get_config()
    with open(config.filename) as fp:
        do_something_interesting(fp)

From the user's perspective, the command line can be used like this:
    $ some_app.py --help
    usage:
        some_app.py [OPTIONS]... filename
    OPTIONS:
        --action    the action to take on the file (default: echo)
        --filename  the name of the file

    $ some_app.py my_file.txt
    contents of my file

    $ some_app.py --action=upper my_file.txt
    CONTENTS OF MY FILE

    $ some_app.py --filename=my_file.txt --action=backwards
    elif ym fo stnetnoc

Notice that the actions can be specified as either positional arguments or as switches. If you do one then the other is automatically disallowed to prevent conflicts. By treating positional arguments in the same manner as switches, we get the benefit of configman's dependency injection. Say the first positional argument is the action and that corresponds with the name of a function. Because we specified the converter for the action argument to load a matching Python object from the scope, config.action will be a callable function:
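    from configman import Namespace, ConfigurationManager
    from configman.converters import class_converter

    def echo(x):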
        print x
    
    def backwards(x):
        print x[::-1]
    
    def upper(x):
        print x.upper()
    
    n = Namespace()
    n.add_option(
        'action',
        default=None,
        doc='the action to take [echo, backwards, upper]',
        short_form='a',
        is_argument=True,
        from_string_converter=class_converter
    )
    n.add_option(
        'text',
        default='Socorro Forever',
        doc='the text input value',
        short_form='t',
        is_argument=True,
    )
    c = ConfigurationManager(
        n,
        app_name='demo1',
        app_description=__doc__
    )
    try:
        config = c.get_config()
        config.action(config.text)
    except AttributeError, x:
        print "%s is not a valid command" % x
    except TypeError:
        print "you must specify an action"

Which will yield this user experience:
    $ demo1.py --help
    usage:
        demo1.py [OPTIONS]... action [ text ]
    OPTIONS:
        --action  the action to take [echo, backwards, upper]
        --text    the text input value
                  default: "Socorro Forever"

Notice that because action has no default, it is required, so there are no brackets around it in the help. However, if you specify it as a switch, the necessity to use it as an argument goes away.
    $ demo1.py backwards "Configman is pretty cool"
    looc ytterp si namgifnoC
    $ demo1.py "Socorro uses Configman" --action=upper
    SOCORRO USES CONFIGMAN

I think this is pretty cool because it makes subcommands and subcommand help simple. The list of options and additional arguments is loaded dynamically from the classes (or functions) that the user specifies on the command line.
    $ socorro.py processor --help
    usage:
        socorro.py [OPTIONS]... command
    OPTIONS:
        --command   the socorro subsystem to start (default: processor)
        -- ...   all the options for processor

    $ socorro.py monitor --help
    usage:
        socorro.py [OPTIONS]... command
    OPTIONS:
        --command   the socorro subsystem to start (default: monitor)
        -- ...   all the options for monitor