Configuration - Part 3

In installment number three, we're going to get to something interesting. In the first part of this series, I claimed that programming has become less about algorithms and more about configuration, and I used Java's XML madness as an example. When I worked in that world I really loathed it, but I understand it. It is a powerful concept and the basis for dependency injection. Here's how to do it with the ConfigurationManager.

import config_manager as cm

n = cm.Namespace()
n.option(
    "storageClass",
    doc="a class name for storage",
    default="socorro.storage.crashstorage.HBaseCrashStorage",
    from_string_converter=cm.class_converter,
)
conf_man = cm.ConfigurationManager([n], application_name="sample")
config = conf_man.get_config()
print(config.storageClass)

On invocation, the ConfigurationManager takes the 'storageClass' option, overlays any replacement values from the environment, config files and the command line, then dynamically loads the module named by the final value and assigns the resulting class to the key 'storageClass' in the mapping 'config'.

$ python sample.py --storageClass=socorro.storage.crashstorage.LegacyCrashStorage
<class 'socorro.storage.crashstorage.LegacyCrashStorage'>
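The post doesn't show what cm.class_converter does internally. Here is a minimal sketch of what such a converter might look like, assuming it simply splits the dotted string at its last dot, imports the module part and fetches the named attribute. The function below is hypothetical, not the ConfigurationManager's code, and it uses standard-library names because the socorro modules may not be installed:

```python
import importlib


def class_converter(dotted_name):
    """Hypothetical stand-in for cm.class_converter.

    Turns 'package.module.ClassName' into the object it names. A string
    with no dot is treated as a plain module name.
    """
    module_name, _, attr_name = dotted_name.rpartition(".")
    if not module_name:
        # e.g. 'json': nothing to split off, so import the whole thing
        return importlib.import_module(dotted_name)
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# standard-library names stand in for socorro.storage.crashstorage.*
storage_class = class_converter("collections.OrderedDict")
print(storage_class)
instance = storage_class()  # the class is picked at runtime, then instantiated
```

Because the choice is just a string, a value from a config file or the command line is enough to swap in a different class.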

We can dynamically load classes from modules, which means that we can select and instantiate objects at runtime.  Programs that use this technique can instantiate and use objects that weren't even conceived of when the programs were originally written.  However, this is just the first step.

In the example, the class HBaseCrashStorage has some configuration requirements of its own. For example, to open a connection to HBase, it needs a host, a port and a timeout. Since the program doesn't know ahead of time which class it will load, it can't know ahead of time which config parameters it will need. The class itself has to cooperate and tell the configuration manager what it needs.

When it dynamically loads the module containing a desired class, the ConfigurationManager interrogates the class for its configuration needs by invoking a method called 'get_config_requirements'. If the class is equipped to respond, it returns its Options. For example, the HBaseCrashStorage class defines its requirements like this:

rc = cm.Namespace()
rc.option(
    name="hbaseHost",
    doc="Hostname for HBase/Hadoop cluster. May be a VIP or load balancer",
    default="localhost",
)
rc.option(name="hbasePort", doc="HBase port number", default=9090)
rc.option(name="hbaseTimeout", doc="timeout in milliseconds for an HBase connection", default=5000)
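On the ConfigurationManager's side, the interrogation step might look roughly like this. It's a sketch, assuming the hook is a classmethod named 'get_config_requirements' (the name from the text) and, to keep it self-contained, that it returns a plain dict of option names to defaults instead of Options. HBaseLikeStorage and PlainStorage are made-up classes:

```python
def get_requirements(cls):
    """Ask a dynamically loaded class for its config needs, if it can answer.

    Returning a plain dict of {option_name: default} is a simplifying
    assumption; the real hook hands back Options.
    """
    hook = getattr(cls, "get_config_requirements", None)
    if hook is None:
        return {}  # class isn't equipped to respond: no extra options
    return dict(hook())


class HBaseLikeStorage(object):
    """Hypothetical class mirroring the HBase options shown above."""

    @classmethod
    def get_config_requirements(cls):
        return {"hbaseHost": "localhost", "hbasePort": 9090, "hbaseTimeout": 5000}


class PlainStorage(object):
    """Hypothetical class with no configuration needs of its own."""


print(get_requirements(HBaseLikeStorage))
print(get_requirements(PlainStorage))
```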

How can this work? By the time the ConfigurationManager has loaded the module, isn't it too late to apply new configuration variables? Well, yes, but the ConfigurationManager knows that if it has loaded a class, it had better do the whole overlay of value sources again. That second overlay may dynamically load more classes, forcing the ConfigurationManager to overlay a third time. In fact, the ConfigurationManager repeats the overlaying until it knows it hasn't loaded any new classes.
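The repeat-until-nothing-new loop is easiest to see in a toy version. This is a sketch of the idea, not the ConfigurationManager's actual code: plain dicts replace Namespaces, each class lists its needs in a 'required_config' dict, and a dict of names to classes stands in for dynamic importing. Any option whose value names a registered class gets converted, which simplifies away per-option converters. All class and option names here are hypothetical:

```python
def resolve(defaults, sources, class_registry):
    """Overlay value sources repeatedly until no new classes get loaded.

    defaults: {option_name: default_value} known before any class is loaded.
    sources: list of {option_name: value} dicts, lowest priority first
        (e.g. environment, config file, command line).
    class_registry: {name: class}, a stand-in for dynamic importing.
    """
    known = dict(defaults)  # every option declared so far
    loaded = set()
    while True:
        # one full overlay pass; later sources override earlier ones, and
        # values for options nobody has declared yet are ignored
        values = dict(known)
        for source in sources:
            for name, value in source.items():
                if name in values:
                    values[name] = value
        found_new_class = False
        for name, value in list(values.items()):
            cls = class_registry.get(value) if isinstance(value, str) else None
            if cls is None:
                continue
            values[name] = cls
            if cls not in loaded:
                loaded.add(cls)
                found_new_class = True
                # the new class declares options of its own
                for option, default in getattr(cls, "required_config", {}).items():
                    known.setdefault(option, default)
        if not found_new_class:
            return values


class HBaseLikeStorage(object):
    required_config = {"hbaseHost": "localhost", "hbasePort": 9090}


class FileLikeStorage(object):
    # this class's requirements name yet another class
    required_config = {"storageRoot": "./std/", "indexClass": "SimpleIndex"}


class SimpleIndex(object):
    required_config = {"indexDepth": 2}


registry = {c.__name__: c for c in (HBaseLikeStorage, FileLikeStorage, SimpleIndex)}
command_line = {"storageClass": "FileLikeStorage", "storageRoot": "/tmp/crashes"}
config = resolve({"storageClass": "HBaseLikeStorage"}, [command_line], registry)
print(sorted(config))
```

Here the command line picks FileLikeStorage, whose own requirements name yet another class, so the loop makes three passes before nothing new turns up. The first pass ignores the 'storageRoot' override because no loaded class has declared that option yet; it only takes effect on the second pass, which is exactly why the overlay has to be repeated.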

Here's the help output for the default run of the sample2.py program:

$ python sample2.py --help
    sample
        --_write
            write config file to stdout (conf, ini, json) (default: None)
        --config_path
            path for config file (not the filename) (default: ./)
        --hbaseHost
            Hostname for HBase/Hadoop cluster. May be a VIP or load balancer (default: localhost)
        --hbasePort
            HBase port number (default: 9090)
        --hbaseTimeout
            timeout in milliseconds for an HBase connection (default: 5000)
        --help
            print this
        --storageClass
            a class name for storage (default: socorro.storage.crashstorage.HBaseCrashStorage)  

Now we run it again, changing the 'storageClass' on the command line:

$ python sample2.py --help   --storageClass=socorro.storage.crashstorage.LegacyCrashStorage
    sample2
        --_write
            write config file to stdout (conf, ini, json) (default: None)
        --config_path
            path for config file (not the filename) (default: ./)
        --deferredStorageRoot
            a file system root for crash storage (default: ./def/)
        --dirPermissions
            the permissions to use in creating directories (decimal) (default: 504)
        --dumpDirCount
            the max number of crashes that can be stored in any single directory (default: 1000)
        --dumpFileSuffix
            the file extention for dump files (default: dump)
        --dumpGID
            the GID to use when storing crashes (leave blank for file system default) (default: None)
        --dumpPermissions
            the permissions to use in storing crashes (decimal) (default: 432)
        --help
            print this
        --jsonFileSuffix
            the file extention for json files (default: json)
        --processedStorageRoot
            a file system root for crash storage (default: ./pro/)
        --storageClass
            a class name for storage (default: socorro.storage.crashstorage.LegacyCrashStorage)
        --storageRoot
            a file system root for crash storage (default: ./std/) 

This time, the help output looks very different because the requirements of the LegacyCrashStorage class were much more extensive than the requirements of the HBaseCrashStorage class.

Just like any other config parameter, 'storageClass' can be overridden in the environment, in an ini, conf or json file, on the command line, or from whatever other source you can think of.
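To illustrate how the sources might stack up, the sketch below layers a few of them with the standard library's ChainMap, where the first mapping that has a key wins. The precedence (command line over a JSON config file over the environment over the defaults) follows the order the sources are listed in this post, but it is an assumption, as is the idea that environment variables use the option names verbatim. gather_values is a hypothetical helper, not part of the ConfigurationManager:

```python
import argparse
import json
from collections import ChainMap


def gather_values(defaults, environ, json_text, argv):
    """Layer value sources over the defaults (hypothetical helper).

    Lookup order: command line, then JSON config text, then environment,
    then defaults. Only names present in 'defaults' are taken from the
    environment and the command line.
    """
    from_env = {k: v for k, v in environ.items() if k in defaults}
    from_json = json.loads(json_text) if json_text else {}
    parser = argparse.ArgumentParser(add_help=False)
    for name in defaults:
        parser.add_argument("--" + name)
    # parse_known_args tolerates switches this sketch doesn't know about
    parsed, _unknown = parser.parse_known_args(argv)
    from_cli = {k: v for k, v in vars(parsed).items() if v is not None}
    # ChainMap searches left to right, so earlier mappings take priority
    return ChainMap(from_cli, from_json, from_env, defaults)


defaults = {"storageClass": "socorro.storage.crashstorage.HBaseCrashStorage"}
values = gather_values(
    defaults,
    {"storageClass": "from.the.Environment"},
    '{"storageClass": "from.the.ConfigFile"}',
    ["--storageClass=socorro.storage.crashstorage.LegacyCrashStorage"],
)
print(values["storageClass"])
```

In a real program you'd pass os.environ, the contents of the config file and sys.argv[1:] instead of the literals used here.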

In the examples that I've given here, the HBaseCrashStorage and LegacyCrashStorage classes derive from a common base class. The configuration manager module defines a mix-in base class called 'RequiredConfig' that provides some structure for hierarchical discovery of required configuration parameters. Classes that derive from this base only need to define a class-level Namespace (or dict) called 'required_config'. The base class provides the method that walks the inheritance tree and collects all the Options into one Namespace.
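Here is a minimal sketch of how such a mix-in could do the walking. It assumes the collected result is exposed through a classmethod with the 'get_config_requirements' name mentioned earlier, and it uses plain dicts where the real module would use Namespaces. The storage classes are hypothetical:

```python
class RequiredConfig(object):
    """Sketch of a mix-in that collects 'required_config' across the class tree."""

    required_config = {}

    @classmethod
    def get_config_requirements(cls):
        collected = {}
        # walk the MRO from the most basic ancestor down to cls, so a
        # subclass's entry replaces a base class entry with the same name
        for klass in reversed(cls.__mro__):
            # vars() looks only at what this class itself defines, so an
            # inherited attribute isn't counted twice
            collected.update(vars(klass).get("required_config", {}))
        return collected


class CrashStorageBase(RequiredConfig):
    required_config = {"storageRoot": "./std/"}


class LegacyLikeStorage(CrashStorageBase):
    required_config = {"dumpDirCount": 1000, "storageRoot": "./legacy/"}


class HBaseLikeStorage(CrashStorageBase):
    required_config = {"hbaseHost": "localhost", "hbasePort": 9090}


print(LegacyLikeStorage.get_config_requirements())
print(HBaseLikeStorage.get_config_requirements())
```

Walking the MRO in reverse is one reasonable choice: it lets a subclass override a default it inherited, and it also copes with classes that mix in more than one base.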

The examples that I've shown here have used classes, but the option could just as well name a module. That could be used, for instance, to switch an application between Postgres and MySQL. I'm even using it in unit testing to 'inject' mock objects into instances of classes.
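The unit-testing trick works because the code under test never names its collaborator; it takes whatever the configuration supplies. A minimal sketch, assuming a consumer that instantiates config["storageClass"] with the config itself (CrashProcessor, FakeStorage and save_raw are hypothetical names), with a plain dict standing in for the config mapping:

```python
import unittest


class CrashProcessor(object):
    """Hypothetical consumer: it never names a storage class itself, it
    just instantiates whatever class the configuration hands it."""

    def __init__(self, config):
        self.storage = config["storageClass"](config)

    def save(self, crash_id, data):
        self.storage.save_raw(crash_id, data)


class FakeStorage(object):
    """Mock storage that records calls instead of talking to HBase."""

    def __init__(self, config):
        self.saved = {}

    def save_raw(self, crash_id, data):
        self.saved[crash_id] = data


class CrashProcessorTest(unittest.TestCase):
    def test_save_goes_to_injected_storage(self):
        # a plain dict plays the role of the config mapping here
        config = {"storageClass": FakeStorage}
        processor = CrashProcessor(config)
        processor.save("abc123", b"dump bytes")
        self.assertEqual(processor.storage.saved, {"abc123": b"dump bytes"})
```

Under a test runner such as `python -m unittest`, no HBase server is needed, because the real storage class is never imported.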

The next topic in this series?  Nested namespaces.