It’s not often that I make mistakes, but I made some in the design of Lead, specifically in its use of the Zope 3 Component Architecture (CA). I think there are some useful lessons in these mistakes, and in the way I ended up doing things. Feel free to disagree. :)

Mistake #1 – Inventing an arbitrary new registry

The first mistake was one of not using the CA enough. Lead allows you to set up several databases, essentially with different connection parameters. An SQLAlchemy Engine is instantiated, lazily, based on this information and then made available via a component providing IDatabase, whose job it is to give access to an SQLAlchemy Session and Connection.

My first design had the following interaction pattern:

>>> from collective.lead.interfaces import IDatabases
>>> from zope.component import getUtility
>>> databases = getUtility(IDatabases)
>>> my_db = databases['my_db']

my_db would now be an object providing IDatabase, constructed lazily the first time it was retrieved. The global IDatabases utility maintained a dict of already-constructed IDatabase objects.
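As a rough illustration of that old pattern, here is a minimal pure-Python sketch of a registry that builds and caches its entries on first access. The Database and Databases classes below are hypothetical stand-ins, not Lead's actual code, and no Zope machinery is involved.

```python
class Database(object):
    """Hypothetical stand-in for a component providing IDatabase."""

    def __init__(self, name):
        self.name = name


class Databases(object):
    """Hypothetical stand-in for the old global IDatabases utility."""

    def __init__(self):
        self._cache = {}  # already-constructed databases, keyed by name

    def __getitem__(self, name):
        # Construct the database lazily, the first time it is requested.
        if name not in self._cache:
            self._cache[name] = Database(name)
        return self._cache[name]


databases = Databases()
my_db = databases['my_db']

# A second lookup returns the cached object rather than a new one.
assert databases['my_db'] is my_db
```

The point is that this is a second, hand-rolled registry sitting next to the one the CA already provides.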

Here’s how it works now instead:

>>> from collective.lead.interfaces import IDatabase
>>> from zope.component import getUtility
>>> my_db = getUtility(IDatabase, name='my_database')

This is a much more natural API – the client code is looking for a resource (a database connection) and looks it up by type (IDatabase) and name. It did mean putting the lazy Engine instantiation logic inside IDatabase rather than some factory code, but that’s code that I only had to write once.
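The lazy instantiation can be sketched roughly like this. It is an assumption-laden illustration, not Lead's implementation: LazyDatabase and fake_engine_factory are made-up names, and in real code the factory would presumably be sqlalchemy.create_engine.

```python
class LazyDatabase(object):
    """Hypothetical utility that creates its engine on first use."""

    def __init__(self, url, engine_factory):
        self.url = url
        self._engine_factory = engine_factory
        self._engine = None  # nothing is created at registration time

    @property
    def engine(self):
        # Build the engine the first time it is asked for, then reuse it.
        if self._engine is None:
            self._engine = self._engine_factory(self.url)
        return self._engine


created = []  # records each call, so we can see when creation happens


def fake_engine_factory(url):
    # Stand-in for a real engine constructor.
    created.append(url)
    return object()


db = LazyDatabase('sqlite://', fake_engine_factory)
assert created == []            # registering the utility creates nothing
first = db.engine               # first access triggers creation
assert db.engine is first       # later accesses reuse the same engine
assert created == ['sqlite://']
```

Because the utility itself owns the laziness, registering it is cheap and no separate factory or cache is needed.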

Mistake #2 – Over-componentising

The second mistake was to over-componentise the design. Lead is concerned with the instantiation of Engines and the management of transactions. Applications are supposed to register a new database (by name), providing the code to construct a data model with SQLAlchemy Tables and an ORM model with SQLAlchemy mappers, as well as providing the DSN for the database.

In the old design, the application was responsible for registering three (!) different utilities:

  1. A named utility providing ITables. This was a dict-like mapping of tables, with a method called setUp() which was called by the IDatabases utility to set it all up.
  2. Similarly, a named utility providing IMappers contained mappers, set up from the tables when the IDatabases utility called setUp() on it.
  3. A named utility providing IDatabaseConnectionSettings provided the URL to use in the DSN when constructing the engine.

These all had to have the same name. The first time some client code requested a database by name from the IDatabases utility, it would look up each of these and construct an Engine, initialize the ITables and IMappers utilities and return the IDatabase.

Mostly, this design evolved because I was falling for the great CA design myth:

Component Architecture design means “don’t do subclasses”

Rubbish!

Inheritance in OOP is a fine way of modelling an “is-a” relationship. What proponents of component design actually suggest is that using mix-in classes to support common features across a hierarchy of types leads to hard-to-maintain and difficult-to-extend code.

A database connection, as represented by an IDatabase utility, “is a” database. Using the general utility syntax, we can obtain one by name. All we need is for the application code to register a utility with the specific characteristics of a named database. And since most IDatabase utilities will share the same fundamental logic, it’s appropriate to provide a base class for IDatabase utilities.

Here’s the way you use it now:

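from collective.lead import Database
import sqlalchemy as sa


class MyTable(object):
    """Plain class that gets mapped to the table below."""


class MyDatabase(Database):

    # Connection settings; URL takes 'drivername' and 'username' keywords
    url = sa.engine.url.URL(drivername='sqlite', username='root', host='localhost', database='db')

    def _setup_tables(self, metadata, tables):
        # Register each Table under a key of your choosing
        tables['mytable'] = sa.Table('sometable', metadata)

    def _setup_mappers(self, tables, mappers):
        # Map classes onto the tables registered above
        mappers['mytable'] = sa.mapper(MyTable, tables['mytable'])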

And then you register this as a factory for a named utility providing IDatabase.

You might recognise this as the Template Method design pattern. Of course, being components, there’s nothing to say you can’t register another named utility providing IDatabase, without using this base class, so long as it conforms to the interface. The base class is an implementation detail which helps the utility writer get the code right, nothing more.
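To make the Template Method idea concrete, here is a minimal sketch of what such a base class might look like. It is not the real collective.lead Database class: BaseDatabase, setup() and the plain dict standing in for SQLAlchemy metadata are assumptions made so the example runs on its own.

```python
class BaseDatabase(object):
    """Hypothetical base class that fixes the order of the set-up steps."""

    def __init__(self):
        self.tables = {}
        self.mappers = {}
        self._initialised = False

    def setup(self):
        # The template method: the skeleton lives here, once.
        if not self._initialised:
            metadata = {}  # stand-in for a SQLAlchemy MetaData object
            self._setup_tables(metadata, self.tables)
            self._setup_mappers(self.tables, self.mappers)
            self._initialised = True

    # Hooks that subclasses are expected to override.
    def _setup_tables(self, metadata, tables):
        raise NotImplementedError

    def _setup_mappers(self, tables, mappers):
        raise NotImplementedError


class MyDatabase(BaseDatabase):

    def _setup_tables(self, metadata, tables):
        tables['mytable'] = ('sometable', metadata)

    def _setup_mappers(self, tables, mappers):
        # Tables are already set up by the time this hook is called.
        mappers['mytable'] = ('MyTable', tables['mytable'])


db = MyDatabase()
db.setup()
assert 'mytable' in db.tables and 'mytable' in db.mappers
```

The subclass only says what is specific to its database; the order of the steps and the one-time initialisation stay in the base class.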

I also used an adapter internally to represent the ITransactionAware aspect of a database connection, mostly to keep this out of the public API of the IDatabase class – this is an example of where using components rather than mix-in classes is probably a good idea.
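Stripped of the CA machinery, the adapter idea looks something like the sketch below. The class and method names (TransactionAwareDatabase, begin, commit) are hypothetical and chosen only for illustration. With the CA, the wrapper would normally be obtained by adapting the database to ITransactionAware rather than instantiated directly.

```python
class Database(object):
    """Hypothetical public component: no transaction methods in its API."""

    def __init__(self):
        self.events = []


class TransactionAwareDatabase(object):
    """Hypothetical adapter carrying the transaction-handling aspect."""

    def __init__(self, context):
        self.context = context  # the adapted database
        self.active = False

    def begin(self):
        self.active = True
        self.context.events.append('begin')

    def commit(self):
        self.active = False
        self.context.events.append('commit')


db = Database()
tx = TransactionAwareDatabase(db)
tx.begin()
tx.commit()

assert db.events == ['begin', 'commit']
# The transaction methods never leak onto the database's own API.
assert not hasattr(db, 'commit')
```

Had this been a mix-in, begin() and commit() would have become part of every database object's public surface.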

4 Responses to “Component Architecture design lessons”

  1. [...] Component Architecture design lessons [...]

  2. Alex Martelli: “I would exempt mix-in classes from my general dislike of inheritance.”

    Apparently, one of the most clever Pythonistas thinks mix-in classes are one honking great idea.

    http://video.google.com/videoplay?docid=-3035093035748181693
    (about 40:00)

    That said, I’m not sure if I’d say mix-in classes are bad generally. At 14:38 into the video, Alex says: “Favour object composition over inheritance” and then “inherit only when it’s convenient”.

  3. optilude said

    I think we should distinguish between mix-in classes which are an implementation convenience, and those which are a polymorphic requirement.

    For example, if I can factor some functionality into a mix-in class and use it in a few different places as I see fit, that makes sense.

    If the framework (in this case, Zope 2) requires that certain classes are mixed in, in order for certain functionality to be available, you end up with a very large __dict__ (ever tried to dir() a content object in pdb?) and potential naming clashes.

    For example, see http://dev.plone.org/plone/browser/plone.app.content/trunk/plone/app/content/container.py

    I needed a lot of trial and error to get the right method resolution order with:

    class Container(OFSContainer,
                    CMFCatalogAware,
                    PortalFolderBase,
                    PortalContent,
                    DefaultDublinCoreImpl,
                    Contained):

    Without all those, the class doesn’t behave as expected. If the order is wrong, it doesn’t work properly as a folder or isn’t catalogable, or doesn’t get dublin core and so on.

    We get that pain because of expected mix-in classes: Things don’t work if there’s no reindexObject() or catalog_object() or whatever on the class. So some base classes have to implement them as null operations. I then have to un-override them by re-mixing in things which were mixed in and partially overridden by another base class. Suddenly, method resolution order becomes an art, not a science.

    The actual interface of an object is far too fat to re-implement manually, so you *have* to use mix-in base classes.

  4. Not that I’m totally impressed, but this is more than I expected when I found a link on Digg telling that the info here is awesome. Thanks.
