Wednesday, March 02, 2005

Object-Relational mapping

You may have overheard the question before, whispered clandestinely between curious developers. 'What is the point of using an object-relational tool? Why not just use JDBC?' they ask. 'We know the database, and we can create the objects as we need them.' Let me posit some answers.

In order to address this question, we need to understand the full scope of a persistence layer.

1.) Transparent and/or low impact enabling of persistence on objects.
2.) Caching of objects
3.) Querying of objects
4.) Mapping of database to the objects and usually vice-versa

A lot of developers end up gradually evolving a home-grown and usually only partially functional persistence layer -- piece by piece, layer by layer.

At my current company, we have gone through 2 or 3 different persistence mechanisms over three years, with each one leaving a long trail of problems. This reflects general trends in software development over the past few years, although our codebase does seem to be a few years behind the industry-wide trends.

In the first stage, the code base was written using what is typically called a table access mechanism, i.e., with a more or less 1-to-1 correspondence between the database structure and that of the data access objects. The relationships between tables are embodied in the SQL at the code level (i.e. straightforward JDBC for data retrieval and recording).
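To make the table access style concrete, here is a minimal sketch of what such a class tends to look like. The ORDERS and CUSTOMERS tables, their columns, and the names OrderRow and OrderTableAccess are all hypothetical and chosen only for illustration; this is not our actual code. Note that the relationship between the two tables exists nowhere except in the hand-written SQL string.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical row class: one field per column of a made-up ORDERS table.
final class OrderRow {
    final long id;
    final long customerId;
    final String status;

    OrderRow(long id, long customerId, String status) {
        this.id = id;
        this.customerId = customerId;
        this.status = status;
    }
}

// Hypothetical data access object: one class per table, SQL written by hand.
final class OrderTableAccess {
    // The ORDERS-to-CUSTOMERS relationship lives only in this SQL string.
    static final String FIND_BY_CUSTOMER_NAME =
        "SELECT o.id, o.customer_id, o.status "
      + "FROM orders o JOIN customers c ON c.id = o.customer_id "
      + "WHERE c.name = ?";

    private final Connection connection;

    OrderTableAccess(Connection connection) {
        this.connection = connection;
    }

    // Runs the query and copies each result row into an OrderRow.
    List<OrderRow> findByCustomerName(String name) throws SQLException {
        List<OrderRow> rows = new ArrayList<>();
        try (PreparedStatement ps = connection.prepareStatement(FIND_BY_CUSTOMER_NAME)) {
            ps.setString(1, name);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.add(new OrderRow(rs.getLong(1), rs.getLong(2), rs.getString(3)));
                }
            }
        }
        return rows;
    }
}
```

Every new relationship or column means touching both the SQL and the copying code by hand, which is the maintenance cost an O/R tool is meant to absorb.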

The second generation (and really the final generation, since the third never gained full traction in the company) took an evolutionary approach to persistence. It involved the following stages:

1.) Creating a set of classes similar to the classes above for loading/saving of individual classes
2.) Creating a single mechanism for loading of certain classes based on a unique ID structure (i.e. each class type has a corresponding ID type which is passed to the loader)
3.) Implementing caching for each loader.
4.) Getting the objects to manage their loading (e.g. have classes load database connections in the model)
5.) Creating a separate system for the saving of data.
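As a rough illustration of stages 2 and 3 above, a loader keyed by a per-class ID type with its own cache might look something like the sketch below. The names CustomerId, Customer and CachingLoader are hypothetical, and the fetch function simply stands in for whatever JDBC call does the real loading.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical typed ID: each class type gets its own ID type.
final class CustomerId {
    final long value;

    CustomerId(long value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CustomerId && ((CustomerId) o).value == value;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(value);
    }
}

// Hypothetical domain class loaded by its typed ID.
final class Customer {
    final CustomerId id;
    final String name;

    Customer(CustomerId id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical generic loader with a per-loader cache (stages 2 and 3).
final class CachingLoader<I, T> {
    private final Map<I, T> cache = new HashMap<>();
    private final Function<I, T> fetch; // stands in for the JDBC call

    CachingLoader(Function<I, T> fetch) {
        this.fetch = fetch;
    }

    // Returns the cached object if present, otherwise fetches and caches it.
    T load(I id) {
        T cached = cache.get(id);
        if (cached == null) {
            cached = fetch.apply(id);
            if (cached != null) {
                cache.put(id, cached);
            }
        }
        return cached;
    }
}
```

Loading the same CustomerId twice returns the same Customer instance and calls the fetch function only once. What the sketch leaves out is telling: there is no cache invalidation, no saving, and no querying, and those are exactly the pieces that were bolted on later.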

Now, each of these evolutions came about as a new requirement came in, and the reason that a complete persistence system was never implemented was that it was never absolutely necessary to have a complete persistence domain at any one time. One can see, however, that the final result is a limited and rather poorly designed persistence mechanism. This, at least partially, illustrates why it is useful to think about the long-term picture before patching a limited system.

This addresses at least 3 of the properties listed above, but does not address the querying property. This came about later, when we were doing real-time analysis of data. Since we have a complex object model, including who created the data, who currently owned the data, etc. -- altogether over 40 interconnected classes, ignoring sub-domain specific classes and data tables -- we often found ourselves performing complex queries across the datasets. The manner in which these queries were written down evolved over time, until they came to be written in a language that bore a strange resemblance to HQL. Unfortunately, by the time this came about, we had written at least 3 different libraries for efficiently producing SQL queries, with the final version being only very slightly different from HQL.
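For a flavour of what a home-grown query library of this kind looks like, here is a deliberately tiny sketch. SqlQuery and the records/users tables are hypothetical names made up for this post; it is neither one of our actual libraries nor anything from Hibernate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal SQL builder, of the kind that tends to get rewritten several times.
final class SqlQuery {
    private final String table;
    private final String alias;
    private final List<String> joins = new ArrayList<>();
    private final List<String> conditions = new ArrayList<>();

    private SqlQuery(String table, String alias) {
        this.table = table;
        this.alias = alias;
    }

    static SqlQuery from(String table, String alias) {
        return new SqlQuery(table, alias);
    }

    // Adds an inner join; the caller has to know the foreign key columns.
    SqlQuery join(String joinTable, String joinAlias, String on) {
        joins.add("JOIN " + joinTable + " " + joinAlias + " ON " + on);
        return this;
    }

    // Adds a condition; all conditions are combined with AND.
    SqlQuery where(String condition) {
        conditions.add(condition);
        return this;
    }

    String toSql() {
        StringBuilder sql = new StringBuilder();
        sql.append("SELECT ").append(alias).append(".* FROM ")
           .append(table).append(' ').append(alias);
        for (String join : joins) {
            sql.append(' ').append(join);
        }
        if (!conditions.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // All records owned by a given user, created after a given date.
        System.out.println(SqlQuery.from("records", "r")
            .join("users", "u", "u.id = r.owner_id")
            .where("u.name = ?")
            .where("r.created > ?")
            .toSql());
    }
}
```

The caller still spells out the join condition by hand. In an HQL-style language the same query would be phrased against the object model (roughly, records whose owner has a given name), with the mapping supplying the join, and that is the step each of our rewrites kept inching toward.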

The reasons that these changes were not foreseen (or perhaps just ignored) revolve a great deal around the politics of the company and its reluctance to use new technologies. However, at a deeper level it is a common mistake to evolve a design based on a small problem domain and not grasp the whole picture of where the code is heading. In our case, this had a great deal more to do with the quasi-burlesque view of our software as being different from all other software projects, from the manner in which it was conceived, to its functionality, and finally to its capabilities, but these are topics for another day.

Since this is my first blog post, I would appreciate comments (if anyone happens to read this) with regard to the content: is this too little detail, is it stating the obvious, and does anyone really care? Did this help in the least bit to clarify why O/R layers are actually useful and not just some bizarre metadata-driven form of intellectual masturbation that allows architects to defend their constantly evolving realm?
