Wednesday, February 20, 2013

Too many repetitions in INI files

In the Socorro project we've chosen to use INI files for configuration. Because of the way dependency injection works in Configman, Socorro's INI files get crowded with repeated keys when multiple resources need similar configuration options.

    [source]
        some_key=4
        database_host=localhost
        database_user=wilma
    [destination]
        some_other_key=17
        database_host=localhost
        database_user=wilma

If both the source and the destination require the same information, why do we have to repeat the values in two places? Well, neither Configman nor Socorro can know ahead of time that you're going to use the same information for both 'source' and 'destination'. In fact, you could have specified a different class for each one, and those classes might not even share these configuration variables. Plus, this gives the user the most flexibility in case they want to use different database credentials for the two situations.

To help relieve the tedium of this, I devised a way to add include files to the INI syntax. This allowed configurations like this:

    [source]
        some_key=4
        +include ./database_credentials.ini
    [destination]
        some_other_key=17
        +include ./database_credentials.ini

Having the external ./database_credentials.ini file allows the values to be reused in many different places. However, I was horrified to realize that, rather than simplifying things, this actually made them more complex. Suddenly we had a second set of INI files to manage, and then there was the problem of search paths for these extra files.
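The post doesn't show how `+include` lines were processed, so here is a rough, hypothetical sketch of one way to expand them (`expand_includes` and `INCLUDE_DIRECTIVE` are illustrative names, not Configman's code). It assumes an included path is resolved relative to the directory of the file that includes it, which is exactly the kind of search-path decision that made this approach more complicated than it first looked.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical directive prefix matching the "+include ./file.ini" lines above.
INCLUDE_DIRECTIVE = "+include "


def expand_includes(path: Path, _seen: frozenset[Path] = frozenset()) -> list[str]:
    """Return the lines of `path` with each `+include` line replaced by the
    (recursively expanded) lines of the file it names.

    Assumption: an included path is resolved relative to the directory of the
    file that includes it; this is only one answer to the search-path question.
    """
    path = path.resolve()
    if path in _seen:
        raise ValueError(f"include cycle detected at {path}")
    expanded: list[str] = []
    for line in path.read_text().splitlines():
        stripped = line.strip()
        if not stripped.startswith(INCLUDE_DIRECTIVE):
            expanded.append(line)
            continue
        # Re-indent included lines to match the directive so they stay in the current section.
        indent = line[: len(line) - len(line.lstrip())]
        target = path.parent / stripped[len(INCLUDE_DIRECTIVE):].strip()
        expanded.extend(
            indent + included.lstrip()
            for included in expand_includes(target, _seen | {path})
        )
    return expanded
```

Even in this small sketch, the resolver has to pick a base directory and guard against include cycles; a real implementation would also have to decide whether to consult a list of search paths.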

As a new, or at least augmented, solution, I'm borrowing a concept from Zope called acquisition. In this latest version of Configman, any configuration value source may lift common keys out of inner nested namespaces and place them in an outer namespace:

    database_host=localhost
    database_user=wilma
    [source]
        some_key=4
    [destination]
        some_other_key=17

If Configman cannot find a value by keying directly to source.database_host, it walks the nested namespaces outward until it either finds an outer scope that contains the desired key or runs out of scopes. If it finds a matching key in an outer scope, it uses that value. If it does not find the key, Configman assumes that the INI file has declined to define a value, and the default for that configuration option is retained.
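To make the outward walk concrete, here is a minimal sketch of acquisition over a nested dict. It is not Configman's implementation; `acquire`, `resolve`, and the `NOT_FOUND` sentinel are hypothetical names, and the INI file is assumed to have already been parsed into nested dicts (sections as sub-dicts, values as strings).

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import Any

# Hypothetical sentinel meaning "no scope defined this key", i.e. the INI file declined to set it.
NOT_FOUND = object()


def _scope(config: Mapping[str, Any], path: Sequence[str]) -> Mapping[str, Any]:
    """Return the nested namespace at `path`, or an empty mapping if it is missing."""
    scope = config
    for name in path:
        child = scope.get(name)
        if not isinstance(child, Mapping):
            return {}
        scope = child
    return scope


def acquire(config: Mapping[str, Any], namespace: Sequence[str], key: str) -> Any:
    """Look up `key` in `namespace`, walking outward toward the top level."""
    # depth == len(namespace) is the innermost scope; depth == 0 is the top level.
    for depth in range(len(namespace), -1, -1):
        value = _scope(config, namespace[:depth]).get(key, NOT_FOUND)
        # Ignore sub-sections that happen to share the key's name.
        if value is not NOT_FOUND and not isinstance(value, Mapping):
            return value
    return NOT_FOUND


def resolve(config: Mapping[str, Any], namespace: Sequence[str], key: str, default: Any) -> Any:
    """Return the acquired value, or keep the option's default if no scope defines it."""
    value = acquire(config, namespace, key)
    return default if value is NOT_FOUND else value


# The hoisted INI example, as a nested dict.
ini = {
    "database_host": "localhost",
    "database_user": "wilma",
    "source": {"some_key": "4"},
    "destination": {"some_other_key": "17"},
}

print(resolve(ini, ["source"], "database_host", "127.0.0.1"))  # prints: localhost
print(resolve(ini, ["source"], "database_port", 5432))          # prints: 5432
```

Returning a sentinel rather than `None` keeps "the file set this to an empty value" distinct from "the file never mentioned it", which is what lets the option's default survive.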

In that example, the outer namespace defined the values for the inner namespaces. It is perfectly allowable for the inner namespaces to go ahead and define values, too. Any value in an inner namespace will override a value in an outer namespace:

    database_host=localhost
    database_user=wilma
    [source]
        some_key=4
    [destination]
        some_other_key=17
        database_user=dwight

In this example, the source namespace will get 'wilma' as the database user, while the destination namespace will get 'dwight'.
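Here is a small, self-contained sketch of that override rule, assuming a simplified reader for the INI layout used in this post. Both `parse_ini` and `lookup` are hypothetical helpers, not part of Configman: the parser flattens sections into dotted keys, and the lookup tries the innermost dotted name first, so an inner definition always wins over an outer one.

```python
from __future__ import annotations


def parse_ini(text: str) -> dict[str, str]:
    """Flatten the simple INI layout used in this post into dotted keys.

    Hypothetical helper: understands only top-level key=value lines,
    [section] headers, blank lines and # comments.
    """
    values: dict[str, str] = {}
    section = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        values[f"{section}.{key}" if section else key] = value.strip()
    return values


def lookup(values: dict[str, str], dotted_name: str) -> str | None:
    """Return the innermost definition of `dotted_name`, or None if no scope sets it."""
    *namespaces, key = dotted_name.split(".")
    # Try "destination.database_user" first, then "database_user", moving outward.
    for depth in range(len(namespaces), -1, -1):
        candidate = ".".join(namespaces[:depth] + [key])
        if candidate in values:
            return values[candidate]
    return None


EXAMPLE = """
database_host=localhost
database_user=wilma
[source]
    some_key=4
[destination]
    some_other_key=17
    database_user=dwight
"""

values = parse_ini(EXAMPLE)
print(lookup(values, "source.database_user"))       # prints: wilma
print(lookup(values, "destination.database_user"))  # prints: dwight
```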

I'm optimistic that this change will help resolve the issue of repeated keys. To ease adoption, the automatic config file writer will detect candidate duplicates and write those common entries into the top-level namespace. It will leave the originals in place, but commented out:

    # name: database_host
    # doc: host name for the database
    # converter: str
    database_host=localhost
    
    # name: database_user
    # doc: username for the database
    # converter: str
    database_user=wilma
    
    [source]
        # name: database_host
        # doc: host name for the database
        # converter: str
        # database_host=localhost
    
        # name: database_user
        # doc: username for the database
        # converter: str
        # database_user=wilma
    
        # name: some_key
        # doc: the number for something important
        # converter: int
        some_key=4
        
    [destination]
        # name: database_host
        # doc: host name for the database
        # converter: str
        # database_host=localhost
    
        # name: database_user
        # doc: username for the database
        # converter: str
        # database_user=wilma
    
        # name: some_other_key
        # doc: another number for something important
        # converter: int
        some_other_key=17
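The post doesn't spell out how the writer decides what counts as a duplicate, so the following is only a hypothetical sketch of one plausible rule: a key is hoisted when it appears in two or more sections and every occurrence has the same value. `write_ini` is an illustrative name, and for brevity it omits the `# name:`, `# doc:`, and `# converter:` comment lines that the real writer emits.

```python
from __future__ import annotations

from collections import defaultdict


def write_ini(sections: dict[str, dict[str, str]]) -> str:
    """Write sections as INI text, hoisting shared settings to the top level.

    Assumed rule: a key is a candidate duplicate when it appears in two or
    more sections and every occurrence has the same value.
    """
    values_by_key: defaultdict[str, list[str]] = defaultdict(list)
    for options in sections.values():
        for key, value in options.items():
            values_by_key[key].append(value)

    common = {
        key: vals[0]
        for key, vals in values_by_key.items()
        if len(vals) > 1 and len(set(vals)) == 1
    }

    # Hoisted entries go first, in the top-level namespace.
    lines = [f"{key}={value}" for key, value in sorted(common.items())]
    for name, options in sections.items():
        lines.append(f"[{name}]")
        for key, value in sorted(options.items()):
            # Originals stay in place but commented out, so acquisition finds the top-level copy.
            prefix = "# " if key in common else ""
            lines.append(f"    {prefix}{key}={value}")
    return "\n".join(lines)


sections = {
    "source": {"database_host": "localhost", "database_user": "wilma", "some_key": "4"},
    "destination": {"database_host": "localhost", "database_user": "wilma", "some_other_key": "17"},
}
print(write_ini(sections))
```

Requiring identical values everywhere is the conservative choice: if two sections disagree, neither value is hoisted, and both stay active in their own sections.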