Sunday, May 04, 2014

Crouching Argparse, Hidden Configman

I've discovered that people that persist in being programmers over age fifty do not die.  Wrapped in blankets woven from their own abstractions, they're immune to the forces of the outside world. This is the first posting in a series about a pet hacking project of mine so deep in abstractions that not even light can escape.

I've written about Configman several times over the last couple of years as it applies to the Mozilla Socorro Crash Stats project.  It is unified configuration.  Configman strives to wrap all the different ways that configuration information can be injected into a program.  In doing so, it handily passes the event threshold and becomes a configuration manager, a dependency injection framework, a dessert topping and a floor wax.

In my experimental branch of Configman, I've finally added support for argparse.  That's the canonical Python module for parsing the command line into key/value pairs, presumably as configuration.  It includes its own data definition language in the form of calls to a function called add_argument.  Through this method, you define what information you'll accept from the command line.

argparse only deals with command lines.  It won't help you with environment variables, ini files, json files, etc.  There are other libraries that handle those things.  Unfortunately, they don't integrate at all with argparse and may include their own data definition system or none at all.

Integrating Configman with argparse was tough.  argparse doesn't play well in extending it in the manner that I want.  Configman employs argparse but resorts to deception to get the work done.  Take a look at this classic first example from the argparse documentation.

from configman import ArgumentParser

parser = ArgumentParser(description='Process some integers.')
parser.add_argument('integers', metavar='N', type=int, nargs='+',
                    help='an integer for the accumulator')
parser.add_argument('--sum', dest='accumulate', action='store_const',
                    const=sum, default=max,
                    help='sum the integers (default: find the max)')

args = parser.parse_args()
print(args.accumulate(args.integers))

Instead of importing argparse from its own module, I import it from Configman.  That just means that we're going to use my subclass of the argparse parser class.  Otherwise it looks, acts and tastes just like argparse: I don't emulate it or try to reimplement anything that it does, I use it to do what it does best.  Only at the command line, running the 'help' option, is the inner Configman revealed.

$ ./x1.py 0 0 --help
    usage: x1.py [-h] [--sum] [--admin.print_conf ADMIN.PRINT_CONF]
                 [--admin.dump_conf ADMIN.DUMP_CONF] [--admin.strict]
                 [--admin.conf ADMIN.CONF]
                 N [N ...]

    Process some integers.

    positional arguments:
      N                     an integer for the accumulator

    optional arguments:
      -h, --help            show this help message and exit
      --sum                 sum the integers (default: find the max)
      --admin.print_conf ADMIN.PRINT_CONF
                            write current config to stdout (json, py, ini, conf)
      --admin.dump_conf ADMIN.DUMP_CONF
                            a file system pathname for new config file (types:
                            json, py, ini, conf)
      --admin.strict        mismatched options generate exceptions rather than
                            just warnings
      --admin.conf ADMIN.CONF
                            the pathname of the config file (path/filename)

There's a bunch of options with "admin" in them.  Suddenly, argparse supports all the different configuration libraries that Configman understands: that brings a rainbow of configuration files to the argparse world.  While this little toy program hardly needs them, wouldn't it be nice to have a complete system of "ini" or "json" files with no more work than your original argparse argument definitions? 

using argparse through Configman means getting ConfigObj for free

Let's make our example write out its own ini file:

    $ ./x1.py --admin.dump_conf=x1.ini
    $ cat x1.ini
    # sum the integers (default: find the max)
    #accumulate=max
    # an integer for the accumulator
    #integers=
Then we'll edit that file and make it automatically use the sum function instead of the max function.  Uncomment the "accumulate" line and replace the "max" with "sum".  Configman will associate an ini file with the same base name as a program file to trigger automatic loading.  From that point on, invoking the program means loading the ini file.  That means the command line arguments aren't necessary.  Rather not have a secret automatically loaded config file? Give it a different name.

    $ ./x1.py 1 2 3
    6
    $ ./x1.py 4 5 6
    15
I can even make the integer arguments get loaded from the ini file.  Revert the "sum" line change and instead change the "integers" line to be a list of numbers of your own choice.

    $ cat x1.ini
    # sum the integers (default: find the max)
    #accumulate=max
    # an integer for the accumulator
    integers=1 2 3 4 5 6
    $ ./x1.py
    6
    $ ./x1.py --sum
    21

By the way, making argparse not have a complete conniption fit over the missing command line arguments was quite the engineering effort.  I didn't change it, I fooled it into thinking that the command line arguments are there.


Ini files are supported in Configman by ConfigObj.  Want json files instead of ini files?  Configman figures out what you want by the file extension and searches for an appropriate handler.  Specify that you want a "py" file and Configman will write a Python module of values.  Maybe I'll write an XML reader/writer next time I'm depressed.

Configman does environment variables, too:
    $ export accumulate=sum
    $ ./x1.py 1 2 3
    6
    $ ./x1.py 1 2 3 4 5 6
    21

There is a hierarchy to all this.  Think of it as layers: at the bottom you have the defaults expressed or implied by the arguments defined for argparse.  Then next layer up is the environment.  Anything that appears in the environment will override the defaults.  The next layer up is the config file.  Values found there will override both the defaults and the environment.  Finally, the arguments supplied on the command line override everything else.

This hierarchy is configurable, you can make it any order that you want.  In fact, you can put anything that conforms to the collections.Mapping api into that hierarchy.  However, for this example, as a drop-in augmentation of argparse, the api to adjust the "values source list" is not exposed.

In the next installment, I'll show a more interesting example where I play around with the type in the definition of the argparse arguments.  By putting a function there that will dynamically load a class, we suddenly have a poor man's dependency injection framework.  That idea is used extensively in Mozilla's Socorro to allow us to switch out storage schemes on the fly.

If you want to play around with this, you can pip install Configman.  However, what I've talked about here today with argparse is not part  of the current release.  You can get this version of configman from my github repo: Configman pretends to be argparse - source from github.  Remember, this branch is not production code.  It is an interesting exercise in wrapping myself in yet another layer of abstraction. 

My somewhat outdated previous postings on this topic begin with Configuration, Part 1