Wednesday, October 29, 2014

Judge the Project, Not the Contributors

I recently read a blog posting titled The 8 Essential Traits of a Great Open Source Contributor, and I am disturbed by it. While clearly not the intended effect, I feel the posting just told a huge swath of people that they are neither qualified nor welcome to contribute to Open Source. The intent of the posting was to say that there is a wide range of skills needed in Open Source: even if a potential contributor feels they lack an essential technical skill, here's an enumeration of other skills that are helpful.
Over the years, I’ve talked to many people who have wanted to contribute to open source projects, but think that they don’t have what it takes to make a contribution. If you’re in that situation, I hope this post helps you get out of that mindset and start contributing to the projects that matter to you.
See? The author has completely good intentions. My fear is that the posting has the opposite effect. It raises the bar as if it were an ad for a paid technical position. He uses superlatives that say to me, “we are looking for the top people as contributors, not common people”.

Unfortunately, my interpretation of this blog posting is not that a wide range of skills is needed; it communicates that if you contribute, you'd better be great at doing so. In fact, if you do not have all these skills, you cannot be considered great. So where is the incentive to participate? It makes Open Source sound as if it were an invitation to be judged as either great or inadequate.

Ok, I know this interpretation is through my own jaundiced eyes. So to see if my interpretation was just a reflection of my own bad day, I shared the blog posting with a couple of colleagues. Both colleagues are women who judge their own skills unnecessarily harshly but, in my judgement, are really quite good. I chose these two specifically because I knew both suffer from “impostor syndrome”, a largely unshakable feeling of inadequacy that is quite common among technical people. Both reacted badly to the posting, one saying that it sounded like a job posting for a position she would have no hope of ever landing.

I want to turn this around. Let's not judge the contributors; let's judge the projects instead. In fact, we can take these eight traits and boil them down to one:
Essential trait of a great open source project:
Leaders & processes that can advance the project while marshalling imperfect contributors gracefully.
That's a really tall order. By that standard, my own Open Source projects are not great. However, I feel much more comfortable saying that the project is not great, rather than sorting the contributors.

If I were paying people to work on my project, I'd have no qualms about judging their performance anywhere along a continuum from “great” to “inadequate”. But contributors are NOT employees subject to performance review. In my projects, if someone contributes, I consider both the contribution and the contributor to be “great”. The contribution may not make it into the project, but it was given to me for free, so it is great by that aspect alone.

Contribution: Voluntary Gift

Perhaps if the original posting had said, "these are the eight gifts we need" rather than saying that the gifts are traits of people we consider "great", I would not have been so uncomfortable.

A great Open Source project is one that produces a successful product and is inclusive. An Open Source project that produces a successful product, but is not inclusive, is merely successful.

Monday, June 09, 2014

Crontabber and Postgres


This essay is about Postgres and Crontabber, the we-need-something-more-robust-than-cron job runner that Mozilla uses in Socorro, the crash reporting system.

Sloppy database programming in an environment where autocommit is turned off leads to very sad DBAs. There are a lot of programmers out there who cut their teeth on databases that either had autocommit on by default or didn't even implement transactions. Programmers who are used to working with relational databases in autocommit mode miss out on one of the most powerful features of relational databases. Worse, bringing the cavalier attitude of autocommit into a transactional world will lead to pain.

In autocommit mode, every statement given to the database is committed as soon as it is done. That isn't always the best way to interact with a database, especially if there are multiple steps to a database related task.

For example, say we've got database tables representing monetary accounts. To move money from one account to another requires two steps: deduct from the first account and add to the other. In autocommit mode, there is a danger that the accounts could get out of sync if some disaster happens between the two steps.

To counter that, transactions allow the two steps to be linked together: if something goes wrong during the two steps, we can roll back any changes and not let the accounts get out of sync. However, managing transactions manually requires the programmer to be more careful and to make sure that there is no execution path out of the database code that doesn't pass through either a commit or a rollback. Failing to do so may leave connections idle in transactions. The risk is critical consumption of resources and impending deadlocks.
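The commit-or-rollback shape of that transfer looks like this in plain Python. Here sqlite3 stands in for Postgres so the sketch is runnable, and the accounts table mirrors the hypothetical example above; the pattern is identical with a psycopg2 connection.

```python
import sqlite3

# sqlite3 stands in for Postgres here; the commit/rollback pattern is the same
connection = sqlite3.connect(":memory:")
connection.execute("create table accounts (acc_num text, total integer)")
connection.execute("insert into accounts values ('1', 100), ('2', 100)")
connection.commit()

def transfer(connection, amount, from_acc, to_acc, fail=False):
    try:
        cursor = connection.cursor()
        # step 1: deduct from the first account
        cursor.execute(
            "update accounts set total = total - ? where acc_num = ?",
            (amount, from_acc)
        )
        if fail:
            # simulate a disaster striking between the two steps
            raise RuntimeError("disaster between the two steps")
        # step 2: add to the other account
        cursor.execute(
            "update accounts set total = total + ? where acc_num = ?",
            (amount, to_acc)
        )
        connection.commit()    # both steps become permanent together
    except Exception:
        connection.rollback()  # neither step happened
        raise
```

A successful `transfer(connection, 10, '1', '2')` leaves the totals at 90 and 110; a transfer that raises midway leaves both accounts untouched, which is exactly what autocommit mode cannot promise.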

Crontabber provides a feature to help make sure that database transactions get closed properly and still allow the programmer to be lazy.

When writing a Crontabber application that accesses a database, there are a number of helpers. Let's jump directly to the one that guarantees proper transactional behavior.

# sets up postgres
@using_postgres()
# tells crontabber to control transactions
@as_single_postgres_transaction()
def run(self, connection):
    # connection is a standard psycopg2 connection instance.
    # use it to do the two steps:
    cursor = connection.cursor()
    cursor.execute(
        "update accounts set total = total - 10 where acc_num = '1'"
    )
    do_something_dangerous_that_could_cause_an_exception()
    cursor.execute(
        "update accounts set total = total + 10 where acc_num = '2'"
    )

In this contrived example, the method decorator gave the crontabber job a connection to the database and will ensure that if the job runs to completion, the transaction will be committed. It also guarantees that if the 'run' method exits abnormally (with an exception), the transaction will be rolled back.

Using this decorator declares that this Crontabber job represents a single database transaction. Needless to say, if the job takes twenty minutes to run, you may not want it to be a single transaction.

Say you have a collection of periodic database related scripts that have evolved over years, written by Python programmers long gone. Some of the old crustier ones from the murky past are really bad about leaving database connections “idle in transaction”. In porting one to crontabber, call the ill-behaved function from within the context of a construct like the previous example. Crontabber will take on the responsibility of transactions for that function with these simple rules:

  • If the method ends normally, crontabber will issue the commit on the connection.
  • If an exception escapes from the scope of the function, crontabber will rollback the database connection.
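The two rules above can be sketched as a decorator. This is a simplified illustration of the idea, not crontabber's actual source; `get_connection` is a hypothetical factory that supplies the connection.

```python
import functools

def as_single_transaction(get_connection):
    # a sketch of the commit/rollback rule pair; not crontabber's real code.
    # 'get_connection' is a hypothetical connection factory.
    def decorator(run_method):
        @functools.wraps(run_method)
        def wrapper(self, *args, **kwargs):
            connection = get_connection()
            try:
                result = run_method(self, connection, *args, **kwargs)
            except Exception:
                # rule 2: an escaping exception triggers a rollback
                connection.rollback()
                raise
            # rule 1: normal completion triggers a commit
            connection.commit()
            return result
        return wrapper
    return decorator
```

The job's `run` method never touches `commit` or `rollback` itself; every exit path is forced through one or the other by the wrapper.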

Crontabber provides three dedicated class decorators to assist in handling periodic Postgres tasks. Their documentation can be found here: Read The Docs: Postgres Decorators. The @with_postgres_connection_as_argument decorator will pass the connection to the run method, but does not handle commit and/or rollback. Use that decorator if you'd like to manage transactions manually within the Crontabber job.

Transactional behavior contributes to making Crontabber robust. Crontabber is also robust because of its self-healing behaviors. If a given job fails, dependent jobs will not be run. The next time the periodic job's time to execute comes around, the 'backfill' mechanism will make sure that it makes up for the previous failure. See Read The Docs: Backfill for more details.

The transactional system can also contribute to self healing by retrying failed transactions, if those failures were caused by transient issues. Temporary network glitches can cause failure. If your periodic job runs only once every 24 hours, maybe you'd rather your app retry a few times before giving up and waiting for the next scheduled run time.

Through configuration, the transactional behavior of Postgres, embodied by Crontabber's TransactionExecutor class, can do a “backing off retry”. Here's the log of an example backoff retry; my commentary is in green:

# we cannot seem to connect to Postgres
2014-06-08 03:23:53,101 CRITICAL - MainThread - ... transaction error eligible for retry
OperationalError: ERROR: pgbouncer cannot connect to server
# the TransactorExector backs off, retrying in 10 seconds
2014-06-08 03:23:53,102 DEBUG - MainThread - retry in 10 seconds
2014-06-08 03:23:53,102 DEBUG - MainThread - waiting for retry ...: 0sec of 10sec
# it fails again, this time scheduling a retry in 30 seconds;
2014-06-08 03:24:03,159 CRITICAL - MainThread - ... transaction error eligible for retry
OperationalError: ERROR: pgbouncer cannot connect to server
2014-06-08 03:24:03,160 DEBUG - MainThread - retry in 30 seconds
2014-06-08 03:24:03,160 DEBUG - MainThread - waiting for retry ...: 0sec of 30sec
2014-06-08 03:24:13,211 DEBUG - MainThread - waiting for retry ...: 10sec of 30sec
2014-06-08 03:24:23,262 DEBUG - MainThread - waiting for retry ...: 20sec of 30sec
# it fails a third time, now opting to wait for a minute before retrying
2014-06-08 03:24:33,319 CRITICAL - MainThread - ... transaction error eligible for retry
2014-06-08 03:24:33,320 DEBUG - MainThread - retry in 60 seconds
2014-06-08 03:24:33,320 DEBUG - MainThread - waiting for retry ...: 0sec of 60sec
...
2014-06-08 03:25:23,576 DEBUG - MainThread - waiting for retry ...: 50sec of 60sec
2014-06-08 03:25:33,633 CRITICAL - MainThread - ... transaction error eligible for retry
2014-06-08 03:25:33,634 DEBUG - MainThread - retry in 120 seconds
2014-06-08 03:25:33,634 DEBUG - MainThread - waiting for retry ...: 0sec of 120sec
...
2014-06-08 03:27:24,205 DEBUG - MainThread - waiting for retry ...: 110sec of 120sec
# finally it works and the app goes on its way
2014-06-08 03:27:34,989 INFO  - Thread-2 - starting job: 065ade70-d84e-4e5e-9c65-0e9ec2140606
2014-06-08 03:27:35,009 INFO  - Thread-5 - starting job: 800f6100-c097-440d-b9d9-802842140606
2014-06-08 03:27:35,035 INFO  - Thread-1 - starting job: a91870cf-4d66-4a24-a5c2-02d7b2140606
2014-06-08 03:27:35,045 INFO  - Thread-9 - starting job: a9bfe628-9f2e-4d95-8745-887b42140606
2014-06-08 03:27:35,050 INFO  - Thread-7 - starting job: 07c55898-9c64-421f-b1b3-c18b32140606
The TransactionExecutor can be set to retry as many times as you'd like, with retries at whatever interval is desired. The default is to try only once. If you'd like the backing-off retry behavior, change TransactionExecutor in the Crontabber config file to TransactionExecutorWithLimitedBackOff or TransactionExecutorWithInfiniteBackOff.
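The backing-off behavior in the log above can be sketched as a small retry loop. This is in the spirit of TransactionExecutorWithLimitedBackOff, not crontabber's actual code; the `delays` default mirrors the 10/30/60/120 second waits seen in the log.

```python
import time

def run_with_backoff(operation, delays=(10, 30, 60, 120), sleep=time.sleep):
    # a sketch of limited backing-off retry; not crontabber's real code.
    # 'delays' mirrors the 10/30/60/120 second waits from the log above.
    for delay in delays:
        try:
            return operation()
        except Exception:
            # transaction error eligible for retry: wait, then try again
            sleep(delay)
    # one final attempt; if it fails now, let the exception escape
    return operation()
```

The `sleep` parameter is injectable so the loop can be tested without actually waiting; an "infinite" variant would simply keep repeating the last delay instead of giving up.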

While Crontabber supports Postgres by default, Socorro, the Mozilla Crash Reporter, extends the support of the TransactionExecutor to HBase, RabbitMQ, and Ceph.  It would not be hard to get it to work for MySQL or,  really, any connection based resource.

The TransactionExecutor, coupled with Crontabber's backfilling capabilities, means nobody has to get out of bed at 3am because the crons have failed again. They can take care of themselves.

On Tuesday, June 10, Peter Bengtsson of Mozilla will give a presentation about Crontabber to the SFPUG.  The presentation will be broadcast on AirMozilla.

SFPUG June: Crontabber manages ALL the tasks








Sunday, May 04, 2014

Crouching Argparse, Hidden Configman

I've discovered that people that persist in being programmers over age fifty do not die.  Wrapped in blankets woven from their own abstractions, they're immune to the forces of the outside world. This is the first posting in a series about a pet hacking project of mine so deep in abstractions that not even light can escape.

I've written about Configman several times over the last couple of years as it applies to the Mozilla Socorro Crash Stats project.  It is unified configuration.  Configman strives to wrap all the different ways that configuration information can be injected into a program.  In doing so, it handily passes the event threshold and becomes a configuration manager, a dependency injection framework, a dessert topping and a floor wax.

In my experimental branch of Configman, I've finally added support for argparse.  That's the canonical Python module for parsing the command line into key/value pairs, presumably as configuration.  It includes its own data definition language in the form of calls to a function called add_argument.  Through this method, you define what information you'll accept from the command line.

argparse only deals with command lines.  It won't help you with environment variables, ini files, json files, etc.  There are other libraries that handle those things.  Unfortunately, they don't integrate at all with argparse and may include their own data definition system or none at all.

Integrating Configman with argparse was tough. argparse doesn't play well with being extended in the manner that I want. Configman employs argparse but resorts to deception to get the work done. Take a look at this classic first example from the argparse documentation.

from configman import ArgumentParser

parser = ArgumentParser(description='Process some integers.')
parser.add_argument('integers', metavar='N', type=int, nargs='+',
                    help='an integer for the accumulator')
parser.add_argument('--sum', dest='accumulate', action='store_const',
                    const=sum, default=max,
                    help='sum the integers (default: find the max)')

args = parser.parse_args()
print(args.accumulate(args.integers))

Instead of importing argparse from its own module, I import it from Configman.  That just means that we're going to use my subclass of the argparse parser class.  Otherwise it looks, acts and tastes just like argparse: I don't emulate it or try to reimplement anything that it does, I use it to do what it does best.  Only at the command line, running the 'help' option, is the inner Configman revealed.

$ ./x1.py 0 0 --help
    usage: x1.py [-h] [--sum] [--admin.print_conf ADMIN.PRINT_CONF]
                 [--admin.dump_conf ADMIN.DUMP_CONF] [--admin.strict]
                 [--admin.conf ADMIN.CONF]
                 N [N ...]

    Process some integers.

    positional arguments:
      N                     an integer for the accumulator

    optional arguments:
      -h, --help            show this help message and exit
      --sum                 sum the integers (default: find the max)
      --admin.print_conf ADMIN.PRINT_CONF
                            write current config to stdout (json, py, ini, conf)
      --admin.dump_conf ADMIN.DUMP_CONF
                            a file system pathname for new config file (types:
                            json, py, ini, conf)
      --admin.strict        mismatched options generate exceptions rather than
                            just warnings
      --admin.conf ADMIN.CONF
                            the pathname of the config file (path/filename)

There's a bunch of options with "admin" in them.  Suddenly, argparse supports all the different configuration libraries that Configman understands: that brings a rainbow of configuration files to the argparse world.  While this little toy program hardly needs them, wouldn't it be nice to have a complete system of "ini" or "json" files with no more work than your original argparse argument definitions? 

using argparse through Configman means getting ConfigObj for free

Let's make our example write out its own ini file:

    $ ./x1.py --admin.dump_conf=x1.ini
    $ cat x1.ini
    # sum the integers (default: find the max)
    #accumulate=max
    # an integer for the accumulator
    #integers=
Then we'll edit that file to make it automatically use the sum function instead of the max function: uncomment the "accumulate" line and replace "max" with "sum". Configman will associate an ini file that has the same base name as the program file and trigger automatic loading. From that point on, invoking the program means loading the ini file, so the command line arguments aren't necessary. Rather not have a secret automatically loaded config file? Give it a different name.

    $ ./x1.py 1 2 3
    6
    $ ./x1.py 4 5 6
    15
I can even make the integer arguments get loaded from the ini file.  Revert the "sum" line change and instead change the "integers" line to be a list of numbers of your own choice.

    $ cat x1.ini
    # sum the integers (default: find the max)
    #accumulate=max
    # an integer for the accumulator
    integers=1 2 3 4 5 6
    $ ./x1.py
    6
    $ ./x1.py --sum
    21

By the way, making argparse not have a complete conniption fit over the missing command line arguments was quite the engineering effort. I didn't change it; I fooled it into thinking that the command line arguments are there.


Ini files are supported in Configman by ConfigObj.  Want json files instead of ini files?  Configman figures out what you want by the file extension and searches for an appropriate handler.  Specify that you want a "py" file and Configman will write a Python module of values.  Maybe I'll write an XML reader/writer next time I'm depressed.

Configman does environment variables, too:
    $ export accumulate=sum
    $ ./x1.py 1 2 3
    6
    $ ./x1.py 1 2 3 4 5 6
    21

There is a hierarchy to all this.  Think of it as layers: at the bottom you have the defaults expressed or implied by the arguments defined for argparse.  Then next layer up is the environment.  Anything that appears in the environment will override the defaults.  The next layer up is the config file.  Values found there will override both the defaults and the environment.  Finally, the arguments supplied on the command line override everything else.
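The layering just described can be pictured with Python's collections.ChainMap. This is only an illustration of the precedence idea, not how Configman implements it; the values echo the accumulator example above.

```python
from collections import ChainMap

# four layers, from bottom (defaults) to top (command line)
defaults = {'accumulate': 'max', 'integers': None}
environment = {'accumulate': 'sum'}        # overrides the defaults
config_file = {'integers': '1 2 3 4 5 6'}  # overrides environment and defaults
command_line = {}                          # would override everything

# in a ChainMap the first mapping wins, so the highest-priority source goes first
values = ChainMap(command_line, config_file, environment, defaults)

print(values['accumulate'])  # 'sum', supplied by the environment layer
print(values['integers'])    # '1 2 3 4 5 6', supplied by the config file layer
```

Each lookup walks down the stack and stops at the first layer that defines the key, which is precisely the override behavior described above.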

This hierarchy is configurable; you can make it any order that you want. In fact, you can put anything that conforms to the collections.Mapping API into that hierarchy. However, for this example, as a drop-in augmentation of argparse, the API to adjust the "values source list" is not exposed.

In the next installment, I'll show a more interesting example where I play around with the type in the definition of the argparse arguments.  By putting a function there that will dynamically load a class, we suddenly have a poor man's dependency injection framework.  That idea is used extensively in Mozilla's Socorro to allow us to switch out storage schemes on the fly.

If you want to play around with this, you can pip install Configman. However, what I've talked about here today with argparse is not part of the current release. You can get this version of configman from my github repo: Configman pretends to be argparse - source from github. Remember, this branch is not production code. It is an interesting exercise in wrapping myself in yet another layer of abstraction.

My somewhat outdated previous postings on this topic begin with Configuration, Part 1

Sunday, April 13, 2014

Pegboard Tool Storage - circuit tester & grounding adapter

(this post is part 8 of a longer series.  See  
Pegboard Tool Storage, Part 1 for the beginning,
or the whole series)

These are the most commonly lost tools that I own. In fact, I finally found the left grounding adapter on the floor behind the shelf in the garage, nowhere near an outlet. That's the problem to solve: make these tools easy to find.


The solution is to make fake electrical outlets to put on the pegboard, of course. Now all I have to do is remember to put them away when I'm done with them. If I can manage to do that, finding them again will be trivial because they'll be in plain sight.


This simple design is available for download at: pegboard outlet storage

Saturday, April 12, 2014

Pegboard Tool Storage - hex bits

(this post is part 7 of a longer series.  See
Pegboard Tool Storage, Part 1 for the beginning,
or the whole series)


I've got lots of hex bits for the electric drill, the electric screw driver and various interchangeable tip tools.  I've been keeping them in a jar on a shelf.  I've gotten sick of dumping them out to search for the one that I want.


This design has given me instant visual access to the whole array of hex bits. The photo shows only one of the holders; I've actually made a bunch for all the different types:


This design is available for 3D printing at: Pegboard Hex Bit Holder

Next in this series Circuit Tester and Grounding Adapter

Wednesday, April 09, 2014

Pegboard Tool Storage - the goat hook

(this post is part 6 of a longer series.  See
Pegboard Tool Storage, Part 1 for the beginning,
or the whole series)

And why shouldn't pegboard storage systems be whimsical? This design for a screwdriver holder came as a sudden inspiration while browsing thingiverse.com: I saw the goat head sculpture and decided to scale it down and mix it up with my pegboard blank:



Adding the funnel shaped hole through the top of the head and emerging from the mouth makes for a silly and slightly disturbing tool holder.






This design is available for 3D printing at: http://www.thingiverse.com/thing:293801

Next in this series, the ever exciting Hex Bits

Monday, April 07, 2014

Pegboard Tool Storage - Batteries


(this post is part 5 of a longer series.  See
Pegboard Tool Storage, Part 1 for the beginning,
or the whole series)

I am a heavy user of rechargeable batteries, and there is a trick to the successful use of rechargeables: if you need just a handful of batteries, keep twice as many rechargeables as you have spaces that need batteries. If you need more than a handful, you can get away with about 1.5 times the number of spaces requiring batteries.

The overstock is so that you never have the experience of running out of power and having to wait for batteries to charge.

I keep the battery charger in the breezeway into the old cottage, just beneath the pegboard tool storage. I keep fully charged batteries on the pegboard so they can be grabbed as needed. Batteries in need of a charge go directly into the charger sitting on the shelf below the pegboard. When I walk by, if I notice that there are freshly charged batteries in the charger, I transfer them up to the pegboard holders.

Here's my design of the pegboard battery station:




Those designs translate into reality as these:



You can download this design at my Pegboard Battery Storage entry on thingiverse.com