Sunday, November 1, 2009

Temporal Data

In the first installment of this series, I mentioned that I unnecessarily complicated the ApparelFactory app by introducing temporal data, thus delving myself into the world of temporal database. A temporal database differs from an ordinary database in that it stores different version of data based on their validity throughout time. Other people simply refers to these versions as “history” of the information. A piece of data is said to be “valid” on a certain time if it is consider to be true at that time.  
“Querying a particular version of a temporal data is not straightforward because you need to consider the time component.”
It’s easy to spot a database which contains temporal data. You usually see two time columns in a table indicating the start and the end during which the data is considered valid. The granularity dictates the data type used. For example, if you don’t consider changes within a day to be relevant, then you can use DATETIME ignoring the time portion. SQL Server 2008 has a DATE data type well-suited for this need. If changes within the year is not relevant, you can use INT for both the start and end years. The granularity is usually dictated by the business but in the case of the ApparelFactory, I simply assumed that changes within the day is not relevant, thus I have a tables like the one below where start_dt and end_dt are the start date and end date respectively.
image
Figure 1 – A table for storing temporal data
Most of the time, users are only concerned with the current valid version of the data. Past versions are of interest only for analysis and evaluation of trends, say the price fluctuation of certain commodity. They may be viewed only through reports and some other data warehousing tools. Querying a particular version of a temporal data is not straightforward because you need to consider the time component. For example, in the t_allowance table, if I were to write a flexible query to retrieve any version an employee’s allowance by date, the WHERE statement would have looked like this
WHERE employee_id = @employee_id AND (@referenceDate BETWEEN start_dt AND end_dt)
Figure 2 – Flexible WHERE statement for a temporal data
A query involving BETWEEN operator is not the most efficient so coders usually cheat when writing one for retrieving the current version. They use = instead, but this presupposes that there is already an established convention in determining the current version. The most common is to make the end date nullable and any row with a null value is considered current. In the application, I used the another variation which sets the end date to a very unrealistically high value. This has the advantage of not requiring you to convert the null end date to a real value when using BETWEEN. This is because using BETWEEN on a null date always evaluates to false.

image
Figure 3 – Sample query utilizing equality operator instead of BETWEEN for temporal data
In a true temporal database, every change in the data is recorded. This is not practical since most of the time, only a subset of the columns are worth tracking. Only the columns  which trends are worth analyzing are tracked. These are then separated in a table together with  the start and end time columns. In the apparel app, I made sure that user can always see the history of the rates of every operation as well as their usage in a style (what operation is involved in a style at a certain point). The queries would have been complicated had I not employed some “acceptable cheats” in my table design. I’ll discuss more about these so-called cheats in the future installment of the series.

image
Figure 4 – Column which need to be tracked are usually taken out and made into a separate table
Temporal data are useless if the coder doesn’t employ an intuitive way of presenting them to the user. All these data should be available to reports of course, but users also appreciate it if they could see them instantly without resorting to reports. This way, they don’t have to be inundated with so many information when all they want is, let’s say the rate history for an operation which rate the company has just decided to change. In this connection, the ability for a user to select a date as a parameter during viewing of the data is very integral when designing the UI. A coder can simply throw in a DateTimePicker or Calendar web control. If the user prefers the actual change dates, then a list is a good choice. This approach involves another query but it always ensures new results during every query. It also presents the user with the exact number of changes; something not available with the date time picker / calendar approach.

In the application, the list approach is used as seen in Figure 5. The number of current operation for the style is 3, one is added sometime on 11/01/2009. The rate of one operation also changed on the same date, thus changing the current rate of the style.

Style history 1
Figure 5 – A UI displaying historical changes on data membership

The addition of another operation and the change in rate of another one can be seen in Figure 6. Since only the rate is tracked for the operation, it makes much more sense to simply display the history right away.

Operation view
igure 6 – A UI displaying historical changes of a single column
Being able to work on temporal database is really a fulfilling experience for a coder, let alone for a novice. A coder feels that his application adds more value to the business because it ensures all the data are there and easy for the viewing. The flipside is, it might not be needed at all and it’s just another unnecessary complexity to an already delayed project. It wouldn’t hurt if the client is asked first because the last thing a coder needs is a flawless feature that no one even dares to use.

Next, I’ll discuss the adjustments, conventions and the cheats I employed.

No comments:

Post a Comment