Monday, June 09, 2014

Crontabber and Postgres

This essay is about Postgres and Crontabber, the we-need-something-more-robust-than-cron job runner that Mozilla uses in Socorro, the crash reporting system.

Sloppy database programming in an environment where autocommit is turned off leads to very sad DBAs. There are a lot of programmers out there that cut their teeth in databases that either had autocommit on by default or didn't even implement transactions.  Programmers that are used to working with relational databases in autocommit mode actually miss out on one of the most powerful features of relational databases. However, bringing the cavalier attitude of autocommit into a transactional world will lead to pain.  

In autocommit mode, every statement given to the database is committed as soon as it is done. That isn't always the best way to interact with a database, especially if there are multiple steps to a database related task.

For example say we've got database tables representing monetary accounts. To move money from one account to another requires two steps, deduct from the first account and add to the other. If using autocommit mode, there is a danger that the accounts could get out of sync if some disaster happens between the two steps.

To counter that, transactions allow the two steps to be linked together. If something goes wrong during the two steps, we can rollback any changes and not let the accounts get out of sync. However, having manual transaction requires the programmer to be more careful and make sure that there is no execution path out of the database code that doesn't pass through either a commit or rollback. Failing to do so may end up leaving connections idle in transactions. The risk is critical consumption of resources and impending deadlocks.

Crontabber provides a feature to help make sure that database transactions get closed properly and still allow the programmer to be lazy.

When writing a Crontabber application that accesses a database, there are a number of helpers. Let's jump directly to the one that guarantees proper transactional behavior.

# sets up postgres
# tells crontabber control transactions
def run(self, connection):
    # connection is a standard psycopg2 connection instance.
    # use it to do the two steps:
    cursor = connection.cursor()
        'update accounts set total = total - 10' where acc_num = '1'
        'update accounts set total = total +10' where acc_num = '2'

In this contrived example, the method decorator gave the crontabber job the a connection to the database and will ensure that that if the job runs to completion, the transaction will be commited. It also guarantees that if the the 'run' method exits abnormally (an exception), the transaction will be rolled back.

Using this class decorator is declaring that this Crontabber job represents a single database transaction.  Needless to say, if the job takes twenty minutes to run, you may not want it to be a single transaction.  

Say you have a collection of periodic database related scripts that have evolved over years by Python programmers long gone. Some of the old crustier ones from the murky past are really bad about leaving database connections “idle in transaction”. In porting it to crontabber, call that ill behaved function from within the context of a construct like that previous example. Crontabber will take on the responsibility of transactions for that function with these simple rules:

  • If the method ends normally, crontabber will issue the commit on the connection.
  • If an exception escapes from the scope of the function, crontabber will rollback the database connection.

Crontabber provides three dedicated class decorators to assist in handling periodic Postgres tasks. Their documentation can be found here: Read The Docs: Postgres Decorators.  The @with_postgres_connection_as_argument decorator will pass the connection the run method, but does not handle commit and/or rollback.  Use that decorator if you'd like to manage transactions manually within the Crontabber job. 

Transactional behavior contributes in making Crontabber robust.  Crontabber is also robust because of self healing behaviors. If a given job fails, dependent jobs will not be run. The next time the periodic job's time to execute comes around, the 'backfill' mechanism will make sure that it makes up for the previous failure. See Read The Docs: Backfill for more details.

The transactional system can also contribute to self healing by retrying failed transactions, if those failures were caused by transient issues. Temporary network glitches can cause failure. If your periodic job runs only once every 24 hours, maybe you'd rather your app retry a few times before giving up and waiting for the next scheduled run time.

Through configuration, the transactional behavior of Postrges, embodied by Crontabber's TransactionExecutor class, can do a “backing off retry”. Here's the log of an example of backoff retry, my commentary is in green:

# we cannot seem to connect to Postgres
2014-06-08 03:23:53,101 CRITICAL - MainThread - ... transaction error eligible for retry
OperationalError: ERROR: pgbouncer cannot connect to server
# the TransactorExector backs off, retrying in 10 seconds
2014-06-08 03:23:53,102 DEBUG - MainThread - retry in 10 seconds
2014-06-08 03:23:53,102 DEBUG - MainThread - waiting for retry ...: 0sec of 10sec
# it fails again, this time scheduling a retry in 30 seconds;
2014-06-08 03:24:03,159 CRITICAL - MainThread - ... transaction error eligible for retry
OperationalError: ERROR: pgbouncer cannot connect to server
2014-06-08 03:24:03,160 DEBUG - MainThread - retry in 30 seconds
2014-06-08 03:24:03,160 DEBUG - MainThread - waiting for retry ...: 0sec of 30sec
2014-06-08 03:24:13,211 DEBUG - MainThread - waiting for retry ...: 10sec of 30sec
2014-06-08 03:24:23,262 DEBUG - MainThread - waiting for retry ...: 20sec of 30sec
# it fails a third time, now opting to wait for a minute before retrying
2014-06-08 03:24:33,319 CRITICAL - MainThread - ... transaction error eligible for retry
2014-06-08 03:24:33,320 DEBUG - MainThread - retry in 60 seconds
2014-06-08 03:24:33,320 DEBUG - MainThread - waiting for retry ...: 0sec of 60sec
2014-06-08 03:25:23,576 DEBUG - MainThread - waiting for retry ...: 50sec of 60sec
2014-06-08 03:25:33,633 CRITICAL - MainThread - ... transaction error eligible for retry
2014-06-08 03:25:33,634 DEBUG - MainThread - retry in 120 seconds
2014-06-08 03:25:33,634 DEBUG - MainThread - waiting for retry ...: 0sec of 120sec
2014-06-08 03:27:24,205 DEBUG - MainThread - waiting for retry ...: 110sec of 120sec
# finally it works and the app goes on its way
2014-06-08 03:27:34,989 INFO  - Thread-2 - starting job: 065ade70-d84e-4e5e-9c65-0e9ec2140606
2014-06-08 03:27:35,009 INFO  - Thread-5 - starting job: 800f6100-c097-440d-b9d9-802842140606
2014-06-08 03:27:35,035 INFO  - Thread-1 - starting job: a91870cf-4d66-4a24-a5c2-02d7b2140606
2014-06-08 03:27:35,045 INFO  - Thread-9 - starting job: a9bfe628-9f2e-4d95-8745-887b42140606
2014-06-08 03:27:35,050 INFO  - Thread-7 - starting job: 07c55898-9c64-421f-b1b3-c18b32140606
The TransactionExecutor can be set to retry as many times as you'd like with retries at whatever interval is desired.  The default is to try only once.  If you'd like the backing off retry behavior, change TransactionExecutor in the Crontabber config file to TransactionExecutorWithLimitedBackOff or TransactionExecutorWithInifiteBackOff

While Crontabber supports Postgres by default, Socorro, the Mozilla Crash Reporter, extends the support of the TransactionExecutor to HBase, RabbitMQ, and Ceph.  It would not be hard to get it to work for MySQL or,  really, any connection based resource.

The TransactionExecutor, Coupled with Crontabber's Backfilling capabilities, nobody has to get out of bed at 3am because the crons have failed again.  They can take care of themselves.

On Tuesday, June 10, Peter Bengtsson of Mozilla will give a presentation about Crontabber to the SFPUG.  The presentation will be broadcast on AirMozilla.

SFPUG June: Crontabber manages ALL the tasks