Configuration is eating my brain

I've created a monster and it has come back to eat my brain. I've made several blog posts about Configman, my universal configuration manager that encapsulates command line, configuration file and environment configuration systems. It is a powerful system that gave Socorro a flexible dependency injection framework. It has enabled us to swap out storage schemes and processing algorithms using configuration.

In Socorro, we've chosen to use INI files for configuration. Configman is able to create the canonical INI file for any app that employs Configman. Applications are composed of components that declare what external resources they need. For example, a processor may need an HBase crash storage source, an HBase crash storage destination and a RabbitMQ queue. The processor code for each of these three components declares its needs in a Configman-compatible manner. In turn, Configman will create an INI file for the processor that has three sections: source, destination and queue. Within each of these sections will be the configuration requirements for the external resources:

[source]
    storage_class=socorro.external.hb.crashstorage.HBaseCrashStorage
    host=localhost
    port=9090
[destination]         
    storage_class=socorro.external.hb.crashstorage.HBaseCrashStorage
    host=localhost
    port=9090
[queue]         
    queue_class=socorro.external.rabbitmq.new_crash_source
    host=rabbitmqHost
    user=rabbitmqUser
    password=rabbitmqPassword
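As a rough illustration of how declared requirements turn into the file above, here is a minimal Python sketch. It does not use Configman's real API: the requirement dictionaries and the `render_ini` helper are hypothetical stand-ins, and the only point is that one declaration per component type is enough to generate all three sections.

```python
# Hypothetical stand-ins for component requirement declarations; real
# Configman components declare options through Configman's own classes.
HBASE_REQUIREMENTS = {
    "storage_class": "socorro.external.hb.crashstorage.HBaseCrashStorage",
    "host": "localhost",
    "port": "9090",
}
RABBITMQ_REQUIREMENTS = {
    "queue_class": "socorro.external.rabbitmq.new_crash_source",
    "host": "rabbitmqHost",
    "user": "rabbitmqUser",
    "password": "rabbitmqPassword",
}


def render_ini(sections):
    """Render {section_name: {key: default}} as INI text."""
    lines = []
    for name, requirements in sections.items():
        lines.append("[%s]" % name)
        for key, default in requirements.items():
            lines.append("    %s=%s" % (key, default))
    return "\n".join(lines)


# The processor uses the HBase component twice and the queue once.
processor_ini = render_ini({
    "source": HBASE_REQUIREMENTS,
    "destination": HBASE_REQUIREMENTS,
    "queue": RABBITMQ_REQUIREMENTS,
})
print(processor_ini)
```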
Notice that the source and destination sections both have the same requirements. It is inconvenient to have to specify the HBase connection information twice. To solve that problem, we've chosen to extend the INI file syntax with an +include directive:
[source]
    +include common_hbase.ini
[destination]         
    +include common_hbase.ini
[queue]         
    queue_class=socorro.external.rabbitmq.new_crash_source
    host=rabbitmqHost
    user=rabbitmqUser
    password=rabbitmqPassword
Then we create the file common_hbase.ini with the HBase connection requirements and the information only has to be specified once.
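The effect of the +include directive can be sketched as a simple text expansion. This is only an assumption about the behavior, not the way Configman or ConfigObj actually implements it; `expand_includes` and the in-memory `include_files` mapping are hypothetical names used to keep the example self-contained.

```python
def expand_includes(ini_text, include_files):
    """Replace each '+include NAME' line with the lines of include_files[NAME].

    include_files is a hypothetical {filename: contents} mapping used here
    instead of real file reads so the sketch stays self-contained.
    """
    expanded = []
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("+include "):
            filename = stripped[len("+include "):].strip()
            # Keep the indentation of the +include line for the included lines.
            indent = line[: len(line) - len(line.lstrip())]
            for included in include_files[filename].splitlines():
                expanded.append(indent + included.strip())
        else:
            expanded.append(line)
    return "\n".join(expanded)


COMMON_HBASE = """storage_class=socorro.external.hb.crashstorage.HBaseCrashStorage
host=localhost
port=9090"""

PROCESSOR_INI = """[source]
    +include common_hbase.ini
[destination]
    +include common_hbase.ini"""

result = expand_includes(PROCESSOR_INI, {"common_hbase.ini": COMMON_HBASE})
print(result)
```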

This works great until some other component needs some, but not all, of the information in the common_hbase.ini file. We cannot use the +include in that case because bringing extra symbols into a section is an error as far as Configman is concerned. To get around this problem, we relaxed the requirements to allow unknown symbols in sections. Unfortunately, this immediately sacrifices important error detection: misspell a symbol and Configman won't know if it is misspelled or just unused. This is not ideal.
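The trade-off is easy to see in a small sketch. The `validate` function below is hypothetical and is not Configman's validation code; it only illustrates why allowing unknown symbols makes a typo indistinguishable from an unused extra.

```python
def unknown_symbols(declared, provided):
    """Return the provided keys that no component declared, sorted."""
    return sorted(set(provided) - set(declared))


def validate(declared, provided, allow_unknown=False):
    """Return the declared values found in provided.

    In strict mode any undeclared key raises an error. In relaxed mode
    undeclared keys are silently ignored.
    """
    extras = unknown_symbols(declared, provided)
    if extras and not allow_unknown:
        raise ValueError("unknown symbols: %s" % ", ".join(extras))
    return {key: provided[key] for key in declared if key in provided}


declared = ["host", "port"]
# 'prot' is a typo for 'port'; 'storage_class' is a legitimately unused extra.
provided = {"host": "localhost", "prot": "9090", "storage_class": "X"}

try:
    validate(declared, provided)
    strict_error = None
except ValueError as error:
    strict_error = str(error)

# Relaxed mode accepts the file, but the misspelled port is quietly dropped.
relaxed = validate(declared, provided, allow_unknown=True)
print(strict_error)
print(relaxed)
```

In relaxed mode the misspelled `prot` and the unused `storage_class` are treated exactly the same way, so the intended port value never reaches the application.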

The system of +include also enables multiple applications to share some configuration information. The processor and the crashmover both need to talk to HBase, so we could use one common_hbase.ini file for both applications. That works fine until one application needs different values for one or more of the parameters defined in the include file. This is the case in our production environment, where some applications use different user names to connect to the same resource. We could factor the variable parameters back out of the +include file, or make nested +include files. As we get into it, however, we end up adding a whole new layer of complexity that is hard to manage.

Here is a proposal for getting around the problem. I'm going to mandate that all INI files have a [resources] section. Within that section, each external resource will have its own subsection. Configman will create this resources section automatically when it reads the resource requirements from the loaded application components.
[resources]
    [[hbase]]
        storage_class=socorro.external.hb.crashstorage.HBaseCrashStorage
        host=localhost
        port=9090
    [[rabbitmq]]
        queue_class=socorro.external.rabbitmq.new_crash_source
        host=rabbitmqHost
        user=rabbitmqUser
        password=rabbitmqPassword
[source]
    # storage_class -> resources.hbase.storage_class
    # storage_class=
    # host -> resources.hbase.host
    # host=
    # port -> resources.hbase.port
    # port=
[destination]
    # storage_class -> resources.hbase.storage_class
    # storage_class=
    # host -> resources.hbase.host
    # host=
    # port -> resources.hbase.port
    # port=
[queue]
    # queue_class -> resources.rabbitmq.queue_class
    # queue_class=
    # host -> resources.rabbitmq.host
    # host=
    # user -> resources.rabbitmq.user
    # user=
    # password -> resources.rabbitmq.password
    # password=
For example, the application, when it wants its configuration value for the source storage_class, will reference the configuration object normally: config.source.storage_class. Behind the scenes, Configman knows that this configuration parameter is linked to the resource section. Configman will return the value from the resource section to the application.

In the case where a particular service needs a different value than the one defined in the resource section, it may be overridden in its original location by uncommenting it and providing an alternative value:

[resources]
    [[hbase]]
        storage_class=socorro.external.hb.crashstorage.HBaseCrashStorage
        host=localhost
        port=9090
…
[source]
    # host -> resources.hbase.host
    host=192.168.1.222
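The lookup behavior described above, including the override, could be sketched roughly as follows. This is an assumption about the proposed behavior rather than Configman's implementation; `RESOURCES`, `LINKS`, `OVERRIDES` and `lookup` are hypothetical names, and the values are kept as plain strings as they would appear in the INI file.

```python
# Values from the [resources] section.
RESOURCES = {
    "hbase": {
        "storage_class": "socorro.external.hb.crashstorage.HBaseCrashStorage",
        "host": "localhost",
        "port": "9090",
    },
}

# Which resource each application section is linked to (hypothetical).
LINKS = {"source": "hbase", "destination": "hbase"}

# Values that were uncommented in their original sections.
OVERRIDES = {"source": {"host": "192.168.1.222"}}


def lookup(section, key):
    """Resolve section.key: a local override wins, else follow the link."""
    local = OVERRIDES.get(section, {})
    if key in local:
        return local[key]
    return RESOURCES[LINKS[section]][key]


print(lookup("source", "host"))
print(lookup("destination", "host"))
print(lookup("source", "port"))
```

The application never sees the indirection: it asks for the value by its original name, and only the resolution order changes.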
This new resource system does not preclude the use of +include files. If several applications were to need HBase configuration, a +include common_hbase.ini could be created and used inside the resource section:

[resources]
    [[hbase]]
        +include common_hbase.ini
The values read in from the +include file can be overridden in the original sections, just as in the previous example. However, because Configman employs ConfigObj for INI file processing, an override of a given value within the same section that has the +include is not allowed. This is a restriction imposed by ConfigObj.

How does this resolve the problem that we're having at Mozilla?

It consolidates the resource configuration. Configuration for an app's external resources is done in one place at the top of the INI file for each app. We do not need to maintain the common_*.ini include files. The configuration files for development, staging, and production can be identical except for the resource connection details.

But now we have to repeat the resource connection information in the INI file for each app. Isn't that less convenient?

We can choose to use +include files, but I discourage it. While we may be calling them 'common' files, in our production environment they aren't really common. The processors use a different HBase host than the middleware; the middleware uses a different user and host for Postgres than Crontabber; etc. Coding for exceptions to the common files is a complication. It will be easier to maintain configuration on an app-by-app basis. It minimizes the number of configuration files and completely avoids +includes and their inevitable exceptions.