Friday, May 31, 2013

A Call for Source Code CSS


I've only recently realized how different my mind is from the other programmers around me. I see things they do not, and vice versa. Yet our tools assume that we're all the same. I believe that the concept of CSS ought to be applied to source code. Allowing programmers to view source code in different styles would be revolutionary. Two different minds could look at the same code and each see it in the style optimal for the way they work. We should decouple function from presentation.

Source code today is still just like it was when we used punched cards: plain ASCII text.  Style and functionality are welded together like a Web site from 1995.  Here's what motivates my thoughts on this topic:

It started with comment number 18 on Bug 654567: “It is ironic that I be the one to say this, but we ought to adopt PEP8 coding standards for all new code in the project. I know someone is going to quote me and I'll never hear the end of it...” Of course, the rest of the team pounced on this and with my assent, PEP 8 became the style rule of Socorro. That was on June 16th of 2011, so we can say that we're coming up on the second anniversary of my fateful proclamation.

So where are we with PEP 8? Has it been a revolution improving our code readability and quality? For most people working on the project, I'd imagine that they'd call it a win.

However, for me, I'd say that PEP 8 has wrecked the readability of the code. This is not an indictment of PEP 8, nor do I want to indict myself.  I'm just disappointed that PEP 8 is suboptimal for me and the way that I read code. Read my previous posting for an idea as to why.

There is one rule in particular that causes trouble: the 79 character per line limit. I think Python is a beautiful language. Eschewing braces to delineate blocks of code gives whitespace semantic meaning. With just a glance, I can understand the structure of a method written in Python. I know that indentation reflects structure. In a C-based language, indentation cannot be trusted; it is the braces that define the structure. I'm really bad at visually matching braces to see the blocks of code.

With a 79 character limit on lines, many Python statements have to be continued onto a second, third or fourth line. This overloads the meaning of whitespace: it means the definition of a code block AND it means a line is too long and had to be continued. I can no longer trust glancing at the code to see its structure. I have to parse to see which meaning the whitespace has.

Here's a typical example below: a literal quote of Python code from the Socorro project. PEP 8 dictates a profusion of whitespace that has nothing to do with code blocks. Most of the indentation is continuation lines. The 'if' statement blocks are lost to my eyes. The details of the individual method calls have been promoted to the same importance as block structure. With that promotion, indentation means two different things and it is not obvious to me at a glance which is which.

def __init__(self, config, quit_check_callback=None):
    super(ElasticSearchCrashStorage, self).__init__(
        config,
        quit_check_callback
    )
    self.transaction = config.transaction_executor_class(
        config,
        self,
        quit_check_callback
    )
    if self.config.elasticsearch_urls:
        self.es = pyelasticsearch.ElasticSearch(
            self.config.elasticsearch_urls,
            timeout=self.config.timeout
        )

        settings_json = open(self.config.elasticsearch_index_settings).read()
        self.index_settings = json.loads(
            settings_json % self.config.elasticsearch_doctype
        )
    else:
        config.logger.warning('elasticsearch crash storage is disabled.')


Below is the same method without the 79 character rule. The code blocks are now instantly recognizable. The code is denser, and for some, I'd imagine, less readable. For me, though, structure comes first; if I want to understand the details of the individual method calls, I can pick them out.


def __init__(self, config, quit_check_callback=None):
    super(ElasticSearchCrashStorage, self).__init__(config, quit_check_callback)
    self.transaction = config.transaction_executor_class(config, self, quit_check_callback)
    if self.config.elasticsearch_urls:
        self.es = pyelasticsearch.ElasticSearch(self.config.elasticsearch_urls, timeout=self.config.timeout)
        settings_json = open(self.config.elasticsearch_index_settings).read()
        self.index_settings = json.loads(settings_json % self.config.elasticsearch_doctype)
    else:
        config.logger.warning('elasticsearch crash storage is disabled.')

Different minds work different ways.  PEP 8 is all about standardizing the look of source code. However, making something clearer for one person is obfuscating for another. For the way my brain works, PEP 8 obfuscates more than it helps.

I'm calling for style sheets for source code.  I want to be able to load that method into my editor and see it styled in the manner that works for me.  If someone else wants to see it in the PEP 8 style, that ought to be an option on the editor.  Our source code should express functionality and functionality alone.  Like the relationship between HTML and CSS, source code style should be left to some presentation layer.
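Python's own parser already separates function from presentation. As a minimal sketch, assuming Python 3.9 or later (for `ast.unparse`), two layouts of a statement adapted from the method above parse to the same syntax tree, and that tree can be re-rendered in a single layout. This is not a source code style sheet: `ast.unparse` applies one fixed style and drops comments, which a real presentation layer would have to preserve. The names `PEP8_STYLE`, `LONG_LINE_STYLE`, `same_function` and `restyle` are hypothetical.

```python
import ast

# Two presentations of one statement, adapted from the method above.
PEP8_STYLE = """
if urls:
    es = ElasticSearch(
        urls,
        timeout=timeout
    )
"""

LONG_LINE_STYLE = """
if urls:
    es = ElasticSearch(urls, timeout=timeout)
"""


def same_function(source_a, source_b):
    """True when two sources parse to identical syntax trees.

    ast.dump omits line and column positions by default, so layout
    differences such as continuation lines do not affect the comparison.
    """
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


def restyle(source):
    """Re-render source from its syntax tree (Python 3.9+).

    ast.unparse applies one fixed layout; a style sheet would instead
    let each reader choose the layout.
    """
    return ast.unparse(ast.parse(source))


if __name__ == "__main__":
    print(same_function(PEP8_STYLE, LONG_LINE_STYLE))  # True
    print(restyle(PEP8_STYLE))
```

Parsing never executes the code, so `ElasticSearch`, `urls` and `timeout` need not exist for this to run.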

Temple Grandin has a great TED talk about the diversity in how people's minds work: "The world needs all kinds of minds." I say that software engineering needs all kinds of minds, too. I believe that building a CSS-like system for source code would open up programming to people who might otherwise be unable to contribute.

Thursday, May 30, 2013

Waltzing and Programming are the Same Thing


Have you ever waltzed? I mean truly waltzed, not just going through the steps. When you really waltz, it is an amazing thing: you and your partner lean away until you perfectly counterbalance one another. The three beat movement becomes effortless as the angular momentum of your turns sustains the motion. Suddenly, you're flying, your two bodies intimately interconnected by physics.

I see programming, especially multi-threaded programming, as an intimate dance similar to waltzing. In my mind, programming is motion in many dimensions. The motions trace out patterns. Views from different perspectives reveal orthogonal structures. From one perspective, I can see the structure of execution repeatedly surge outward into complicated recursive fractal curves only to withdraw back to singularities. From another perspective, almost as if I've taken the program and rotated it ninety degrees in my mind, I see the code structures (objects, inheritance, functions) as if they were elements of a crystal lattice. Rotate again and I see the semantic structure of the information transforms within the program. With dimensionality that I cannot hope to project onto this page, I can spin the program on different axes to reveal structures and, most importantly, inconsistencies. Like waltzing, it's all about balance of opposing forces coupled with motion.

So many times in my forty years of programming, I've discovered subtle bugs in software design, not because of problems in data, but because I see in my mind a “discoloration” in the motion in one perspective that isn't evident in another. I may not know what the problem is or how it will manifest, but I see the region as literally emitting the wrong color.

Way back in graduate school in my artificial intelligence class, the professor was showing a recursive pattern matching algorithm in Lisp. I was daydreaming and was jerked back to class by a flash of blue on the overhead projector screen. Confused, since the professor was using only black ink, I interrupted him, saying there was a problem in his algorithm. I couldn't express what the problem was, and I wasn't going to say that it was too blue. I pointed out the location and said something vague about the recursion losing its context at that point. I was right. After staring at his code for a while, the professor looked up and grinned at me: he canned the example that he had been using for years. I rewrote it for him in his office after class.

I truly love multi-threaded programming. It adds more dimensions from which I can view a program. Imagine multiple couples waltzing, each couple swinging and trading chandeliers among themselves as they glide through their turns. The chandeliers pass through one another, perfectly meshing, never hitting. Slower dancers do not hold up faster ones because the dance floor itself warps and bends to keep everyone synchronized. The chandeliers are data structures, the couples are threads of execution, the dance steps are the algorithms, the music is the machine.

This is the first time I've ever tried to put my programming brain processes into words. I've alluded to the dance metaphor in the past; I really do see software systems as interlocking curved motions. You'll often hear me state that programming is performance art. Dancing is pretty clearly performance art, so now you understand the reason behind my metaphor. Good software is like skilled dancers waltzing in perfect balance. Ironically, I rarely ever dance.

I assume the color thing is a form of synesthesia. A good program will have a balanced glow in yellows, greens and browns: earth tones. Problems show as blues and purples. For some reason, software never appears red to me. You know one of the most unfortunate things? Many of the edicts in Python's PEP 8 give code a bluish cast. Truly, that's a sad thing, as it is likely to eventually drive me to some other language.


Tuesday, May 21, 2013

The TransactionExecutor vs. HBase


Yesterday, I wrote about the use of the class TransactionExecutor. Today, I'm going to propose a new way of using it within Socorro's HBase crashstorage subsystem.

When used with Postgres, the TransactionExecutor treats all of the individual steps within a transaction collectively, as if they were a single atomic operation. This is exactly how we want a database transaction treated. Let's say we have a transaction that consists of steps A, B, C, D. We start a transaction and succeed in steps A and B. However, C fails. The TransactionExecutor rolls back the transaction, essentially undoing steps A and B. Then, after sleeping, it retries the whole thing from the beginning.

That approach isn't ideal for HBase. Since HBase doesn't support relational database-like transactions, the same strategy of treating multiple steps as one atomic transaction doesn't work as well. Just like the previous example, let's say that for HBase we have four steps A, B, C and D. We start a “transaction” and complete steps A and B before we get a failure in step C. If the TransactionExecutor is used exactly as it is with Postgres, we roll back the transaction. But HBase doesn't support transactions, so steps A and B are not actually undone. The TransactionExecutor then sleeps and, after waking, retries the whole thing from the beginning. We end up redoing steps A and B, which have already been done.
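The duplication can be simulated with a toy store. This is a sketch under stated assumptions, not Socorro's code: `VersionedStore` is a hypothetical stand-in for HBase (every put adds a version, and commit and rollback do nothing), `ResourceUnavailable` stands in for a retriable error, and `retry_whole_transaction` is a hypothetical simplification of the Postgres-style usage. Step C fails once; the retry starts over from step A, so A and B end up written twice.

```python
import time


class ResourceUnavailable(Exception):
    """Hypothetical stand-in for a retriable HBase error."""


class VersionedStore:
    """Toy stand-in for HBase: each put adds a new version of the row and
    nothing can be undone, so commit and rollback are both no-ops."""

    def __init__(self):
        self.versions = {}  # row key -> stored values, oldest first

    def put(self, row, value):
        self.versions.setdefault(row, []).append(value)

    def commit(self):
        pass

    def rollback(self):
        pass  # earlier puts remain written


def retry_whole_transaction(store, steps, delays=(0.0, 0.0)):
    """Postgres-style usage: run every step; on failure, roll back,
    sleep, and restart from the first step."""
    for delay in list(delays) + [None]:
        try:
            for step in steps:
                step(store)
            store.commit()
            return
        except ResourceUnavailable:
            store.rollback()
            if delay is None:
                raise  # delays exhausted: give up
            time.sleep(delay)


def make_steps(fail_once):
    """Build steps A-D as puts; a step named in fail_once fails on its first try."""
    def make(name):
        def step(store):
            if name in fail_once:
                fail_once.remove(name)
                raise ResourceUnavailable("outage during step " + name)
            store.put(name, "data for " + name)
        return step
    return [make(name) for name in "ABCD"]


store = VersionedStore()
retry_whole_transaction(store, make_steps({"C"}))
print({row: len(values) for row, values in store.versions.items()})
# {'A': 2, 'B': 2, 'C': 1, 'D': 1}
```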

What are the consequences of this? It is my understanding that rows in HBase tables are static. We don't really change them; we just make a new version of the row. The old version of the row still sticks around (the number of old versions that HBase keeps is configurable). By starting the transaction over from the beginning, we're increasing the amount of space that HBase needs to store the same information.

Honestly, this isn't much of a problem. During a failure, having multiple copies of the same information will happen only to a single row (or in our case, one row per actively writing thread during the HBase problem). That's not a lot of extra storage wasted in comparison to the huge size of our data storage.

However, if we care to, we can resolve this duplication problem by using the TransactionExecutor in a different manner. Rather than treating all the steps A, B, C, D as if they are collectively atomic, we could use the TransactionExecutor individually on each step. If we don't fail until step C, then only step C will be retried. We can avoid the duplication of steps A and B. 
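Wrapping each step individually might look like the minimal sketch below. `retry`, `save_crash` and `ResourceUnavailable` are hypothetical names, not Socorro's API, and the list `writes` stands in for HBase puts. When step C fails once, only C is retried, so each step is written exactly once.

```python
import time


class ResourceUnavailable(Exception):
    """Hypothetical stand-in for a retriable HBase error."""


def retry(action, delays=(0.0, 0.0)):
    """Call action(); on ResourceUnavailable, sleep and retry,
    re-raising once the delays are used up."""
    for delay in list(delays) + [None]:
        try:
            return action()
        except ResourceUnavailable:
            if delay is None:
                raise
            time.sleep(delay)


def save_crash(writes, fail_once):
    """Four independent puts (steps A-D), each wrapped in its own retry."""
    for name in "ABCD":
        def put(name=name):  # bind the current name, not the loop variable
            if name in fail_once:
                fail_once.remove(name)  # fail only the first attempt
                raise ResourceUnavailable("outage during step " + name)
            writes.append(name)
        retry(put)


writes = []
save_crash(writes, {"C"})
print(writes)  # ['A', 'B', 'C', 'D']
```

The trade-off is that steps are no longer treated as a unit: if step C never succeeds, A and B stay written, which is already the case with HBase.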

For purity's sake, it makes sense to use the TransactionExecutor only on operations that are truly atomic. In our application with HBase, it really doesn't matter much, and changing it would likely be more trouble than it is worth.




Monday, May 20, 2013

The history of the TransactionExecutor



Or, "yet another awkwardly named class"

One of the most important tenets of Socorro is to be resilient when external resources fail. The Mozilla Socorro deployment depends on Postgres and HBase to work, and both are external resources that can fail.

What happens when we try to write to one of these and we find that the resource is unavailable? Earlier versions of Socorro treated the HBase and Postgres failure cases separately.

For Postgres, since it is a transactional storage system, Socorro employed the native transactional behaviors. Interacting with Postgres involves a series of steps (insert, update, delete, select) followed by commit or rollback. If one of the intervening steps were to fail, we didn't want the program to quit, nor did we want errors to be ignored. Socorro implemented a “backing off retry” behavior. On failure of a step, the code would classify the failure into one of two types: retriable and fatal. In either case, a rollback would be issued. In the retriable case, the code would sleep for a predetermined amount of time and then retry the transaction from the beginning. In the fatal case, there is no choice except to allow the program to shut down.

For HBase, true transactions are not supported. However, the behavior Socorro wanted was just the same as the Postgres case: classify the failure and then, if retriable, repeat the steps until we have success. HBase doesn't have the concept of commit and rollback, but the intervening steps of a transaction may be repeated without negative consequence.

Even though the behavior was similar, the two cases were coded independently and shared no code. In the grand Configman refactoring of Socorro, the two cases were merged into one class to maximize reuse. The Postgres case was used as the canonical example. Dummy no-op commit and rollback methods were added to the HBase connection classes so that they could be used with the class.

How do the TransactionExecutor classes work? There are three of them with slightly different behaviors:
  • TransactionExecutor
  • TransactionExecutorWithLimitedBackoff
  • TransactionExecutorWithInfiniteBackoff
The code can be found at: https://github.com/mozilla/socorro/blob/master/socorro/database/transaction_executor.py

These classes implement methods that accept a function, a connection context to some resource, and arbitrary function parameters. When instantiated and invoked, these classes call the function, passing it the connection and the additional parameters. An exception raised within the function indicates a failure of the transaction: a rollback is automatically issued on the connection context. If the function succeeds and exits normally, then a 'commit' is issued on the connection context.

The first class in the list above is the degenerate single-shot case. It doesn't implement any retry behavior. If the function fails by raising an exception, then a rollback is issued on the connection and the program moves on. Success results in a commit and the program moves on.
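The single-shot behavior can be sketched as below. This is a simplified illustration, not the code in `transaction_executor.py`: the real classes obtain their connection context through configuration, while `SingleShotExecutor`, `RecordingConnection` and `insert_row` here are hypothetical names. The connection only records whether commit or rollback was issued; an HBase-style context would make both true no-ops. The sketch re-raises after rolling back so that the caller decides how to move on.

```python
class RecordingConnection:
    """Hypothetical connection context that records what was issued on it."""

    def __init__(self):
        self.log = []

    def commit(self):
        self.log.append("commit")

    def rollback(self):
        self.log.append("rollback")


class SingleShotExecutor:
    """Sketch of the single-shot case: one attempt, no retry."""

    def __init__(self, connection):
        self.connection = connection

    def __call__(self, function, *args, **kwargs):
        try:
            result = function(self.connection, *args, **kwargs)
        except Exception:
            # Any exception means the transaction failed: roll back, then
            # let the caller see the error.
            self.connection.rollback()
            raise
        # Normal exit means success: commit.
        self.connection.commit()
        return result


def insert_row(connection, table, value):
    """Hypothetical transaction body; a real one would issue SQL or puts."""
    return (table, value)


connection = RecordingConnection()
execute = SingleShotExecutor(connection)
print(execute(insert_row, "reports", 42))  # ('reports', 42)
print(connection.log)                      # ['commit']
```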

The latter two classes implement a retry behavior. If the function raises an exception, the executor checks to see if the exception is of a type that is eligible for retry. If it is eligible, then a delay amount is selected and the thread sleeps. When it wakes, it tries to invoke the function again with the same parameters. The time delays are specified by a list of integers representing successive numbers of seconds to wait before trying again. For the class TransactionExecutorWithLimitedBackoff, when the list of time delays is exhausted, the transaction is abandoned and the program moves on. The TransactionExecutorWithInfiniteBackoff never gives up, reusing the last delay in the list over and over until the transaction finally succeeds or somebody kills the program.
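A minimal sketch of the two retrying variants, folded into one hypothetical `BackoffExecutor` class with an `infinite` flag. For brevity it assumes the connection's `operational_exceptions` tuple alone decides eligibility (the conditional case is covered further on), and it takes a `sleep` parameter so a test can record delays instead of waiting. None of these class or parameter names come from Socorro.

```python
import itertools
import time


class RecordingConnection:
    """Hypothetical connection context; ConnectionError counts as retriable."""

    operational_exceptions = (ConnectionError,)

    def __init__(self):
        self.log = []

    def commit(self):
        self.log.append("commit")

    def rollback(self):
        self.log.append("rollback")


class BackoffExecutor:
    """Retry with backoff. delays must be a non-empty list of seconds."""

    def __init__(self, connection, delays, infinite=False, sleep=time.sleep):
        self.connection = connection
        self.delays = list(delays)
        self.infinite = infinite
        self.sleep = sleep

    def _delay_schedule(self):
        if self.infinite:
            # After the list is used up, repeat its last entry forever.
            return itertools.chain(self.delays, itertools.repeat(self.delays[-1]))
        return iter(self.delays)

    def __call__(self, function, *args, **kwargs):
        schedule = self._delay_schedule()
        while True:
            try:
                result = function(self.connection, *args, **kwargs)
            except self.connection.operational_exceptions:
                self.connection.rollback()
                delay = next(schedule, None)
                if delay is None:
                    raise  # limited backoff: delays exhausted, abandon
                self.sleep(delay)
                continue
            except Exception:
                self.connection.rollback()
                raise  # not eligible for retry
            self.connection.commit()
            return result


attempts = []

def flaky(connection):
    """Hypothetical transaction body that fails twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("database unreachable")
    return "ok"


slept = []
conn = RecordingConnection()
print(BackoffExecutor(conn, [1, 2], sleep=slept.append)(flaky))  # ok
print(slept, conn.log)  # [1, 2] ['rollback', 'rollback', 'commit']
```

Injecting `sleep` is only a testability choice; in production the default `time.sleep` blocks the calling thread, as described above.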

How does the TransactionExecutor determine if an exception is eligible for retry? The connection context object is required to have a couple of instance variables and methods to assist in the determination.

First, operational_exceptions defines a collection of exceptions that are eligible for the retry behavior. If one of the exceptions from this collection is raised, the retry behavior is triggered.

conditional_exceptions is a list of ambiguous exceptions that may or may not be eligible for retry. We encountered this with Postgres using psycopg2 on the ProgrammingError exception. Normally, this type of exception would not be retriable because it indicates a fundamental problem with a query, such as a syntax error. Syntax errors are not retriable. However, sometimes we get network errors disguised as ProgrammingErrors; these are retriable.

If an exception found in the conditional_exceptions collection is raised, we have to further examine the error to determine if it should result in a failure or a retry. The instance method is_operational_exception, implemented by the connection class, is used to determine if the current exception is retriable or not. In the case of Postgres, we look at the text of the exception to see if it contains the string “EOF”. We know that's a network error, not really a programming error, so we can do a retry.
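The decision described above can be sketched as a small function. `ProgrammingError` here is a local stand-in for `psycopg2.ProgrammingError` so the example has no dependencies; `PostgresLikeConnectionContext` and `is_retriable` are hypothetical names, while `operational_exceptions`, `conditional_exceptions` and `is_operational_exception` follow the names used in this post. The choice of `ConnectionError` and `TimeoutError` as operational exceptions is an assumption for illustration.

```python
class ProgrammingError(Exception):
    """Local stand-in for psycopg2.ProgrammingError."""


class PostgresLikeConnectionContext:
    """Hypothetical connection context carrying the retry-classification hooks."""

    operational_exceptions = (ConnectionError, TimeoutError)
    conditional_exceptions = (ProgrammingError,)

    def is_operational_exception(self, exc):
        # A "ProgrammingError" whose text mentions EOF is really a network failure.
        return "EOF" in str(exc)


def is_retriable(connection_context, exc):
    """Always retry operational exceptions; ask the context about conditional
    ones; treat everything else as fatal."""
    if isinstance(exc, connection_context.operational_exceptions):
        return True
    if isinstance(exc, connection_context.conditional_exceptions):
        return connection_context.is_operational_exception(exc)
    return False


context = PostgresLikeConnectionContext()
print(is_retriable(context, ProgrammingError("SSL SYSCALL error: EOF detected")))  # True
print(is_retriable(context, ProgrammingError('syntax error at or near "SELCT"')))  # False
```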

Is this class named poorly? Now that we've got many more external resources using this retry behavior and only Postgres is truly transactional, it seems that the name may not be right. Perhaps ExternalResourceActionRetrier?