Wednesday, September 25, 2013

the Socorro Monitor, rest in peace

2013-09-24 14:11:59,347 INFO - MainThread - SIGTERM detected
2013-09-24 14:11:59,347 DEBUG - MainThread - outer detects quit
2013-09-24 14:11:59,347 DEBUG - MainThread - standardLoop done.
2013-09-24 14:11:59,347 DEBUG - MainThread - waiting to join.
2013-09-24 14:11:59,410 DEBUG - jobCleanupThread - got quit message
2013-09-24 14:11:59,410 INFO - jobCleanupThread - jobCleanupLoop done.
2013-09-24 14:12:00,071 DEBUG - priorityLoopingThread - outer detects quit
2013-09-24 14:12:00,072 INFO - priorityLoopingThread - priorityLoop done.
2013-09-24 14:12:00,072 DEBUG - MainThread - calling databaseConnectionPool.cleanup().
2013-09-24 14:12:00,072 DEBUG - MainThread - MainThread - killing database connections
2013-09-24 14:12:00,073 DEBUG - MainThread - MainThread - connection MainThread closed
2013-09-24 14:12:00,073 DEBUG - MainThread - MainThread - connection jobCleanupThread closed
2013-09-24 14:12:00,073 DEBUG - MainThread - MainThread - connection priorityLoopingThread closed
2013-09-24 14:12:00,074 DEBUG - MainThread - crashStore for MainThread closed
2013-09-24 14:12:00,074 DEBUG - MainThread - crashStore for priorityLoopingThread closed
2013-09-24 14:12:00,074 INFO - MainThread - done.

It had touched literally billions of Firefox crashes since it first spun up in 2008. Conceived at a time when resources were scarce, the Socorro Monitor orchestrated the assignment of crashes to processors. Originally, it detected new crashes as they appeared in an NFS-mounted file system. In 2010, it evolved to read from HBase instead.

Always called Socorro's SPOF (Single Point of Failure), it was a critical multithreaded singleton built like a patchwork calico cat: a queuing system, a registrar of processors, a systems status aggregator, a file system cleanup janitor. 

On Tuesday, September 24 at 14:11:59, it received SIGTERM for the last time. Just as it was designed to do, the quit signal propagated to all the threads and they dutifully shut down in an orderly manner, closing their resources, logging their own demise.
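For readers who never saw the monitor's internals, the shutdown pattern described above can be sketched in a few lines of Python. This is a minimal illustration only, not the monitor's actual code: the class name MiniMonitor, its methods, and the polling interval are all hypothetical, and only the thread names are borrowed from the log excerpt.

```python
import signal
import threading


class MiniMonitor:
    """Hypothetical stand-in for a multithreaded singleton with orderly shutdown."""

    def __init__(self, worker_names):
        self.quit_event = threading.Event()
        self.log = []  # list.append is atomic in CPython, so no lock is used here
        self.threads = [
            threading.Thread(target=self._loop, name=name) for name in worker_names
        ]

    def _record(self, message):
        self.log.append(f"{threading.current_thread().name} - {message}")

    def _loop(self):
        # wait() returns True as soon as the quit event is set.
        while not self.quit_event.wait(timeout=0.05):
            pass  # a real loop would do one unit of work per pass
        self._record("got quit message")

    def handle_sigterm(self, signum, frame):
        # The handler only flips the shared flag; each thread notices it itself.
        self.quit_event.set()

    def start(self):
        for thread in self.threads:
            thread.start()

    def join(self):
        # Wait for every worker to finish before declaring the shutdown complete.
        for thread in self.threads:
            thread.join()
        self._record("done.")
```

In a real process the handler would be registered from the main thread with signal.signal(signal.SIGTERM, monitor.handle_sigterm), after which the main thread would call monitor.join(). Having the handler do nothing but set an event keeps the signal path trivial and leaves each thread responsible for closing its own resources.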

RabbitMQ, a real queuing system, replaces most functions of the monitor. The processors now drink from the RabbitMQ firehose as they see fit rather than being assigned jobs by the monitor. It's a simplification of the Socorro data flow that should enable Socorro to scale up to the next level. As a singleton, the monitor had a performance ceiling and therefore blocked our plans to increase the processing volume.
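The shift from assigned work to pulled work can be sketched roughly as follows. This is a toy illustration under loud assumptions: Python's in-process queue.Queue stands in for RabbitMQ, and the function process_crashes and the processor names are hypothetical, not anything from Socorro or the RabbitMQ client libraries.

```python
import queue
import threading


def process_crashes(crash_ids, processor_count):
    """Let each processor pull crash ids for itself until the queue is empty."""
    jobs = queue.Queue()  # stand-in for the real message queue
    for crash_id in crash_ids:
        jobs.put(crash_id)

    processed = {}  # crash_id -> name of the processor that took it

    def processor(name):
        while True:
            try:
                crash_id = jobs.get_nowait()  # pull work; nobody assigns it
            except queue.Empty:
                return  # nothing left to do
            processed[crash_id] = name

    threads = [
        threading.Thread(target=processor, args=(f"processor-{i}",))
        for i in range(processor_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return processed
```

Calling process_crashes(["crash-a", "crash-b", "crash-c"], processor_count=2) returns a mapping of every crash id to whichever processor happened to take it. No central component decides who gets what, so adding processors needs no registrar and no bookkeeping singleton.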

I cannot help but feel a bit wistful: a few thousand lines of my code have just been retired forever. My coding footprint in the world is a bit smaller.

Thanks to Selena Deckelmann, Eric Rose, Brandon Savage, Brandon Burton, et al., for their bravery in wading into my code and welding RabbitMQ to the Socorro juggernaut.