Nullable Nomad

Tuesday, November 9, 2010

I've moved

I like the formatting capability of the site that's why. You can follow me here.

Thursday, February 11, 2010

Based on my experience, this one is a bit trickier than the LinqToSql counterpart I’ve explored. Whereas LinqToSql is error prone because it requires you to use .NET object within literals (bye bye Intellisense), Entity Framework (EF) is no different with its proprietary Entity SQL and other datatype nuances. To access the columns of a table, you need to use the it keyword. I still haven’t found the documentation (and rationale) behind this. I also found out that DateTime value needs special syntax in the form DATETIME’YYYY-MM-DD HH:mm’. The problem with this is that there is no DateTime.ToString() overload that spits out this format. I had to create an extension just for this. If you use TEXT in your table, prepare for the dreaded The text, ntext, and image data types cannot be compared or sorted, except when using IS NULL or LIKE operator error. This is very pesky but I was able to solve this by replacing text with varchar. Googling this error message yields a couple of discussion but none of them works for me unfortunately. Below is the filter-building logic part of the downloadable code.

Tuesday, December 8, 2009

Filtering LinqDataSource

Codes can be downloaded here

The RAD nature of ASP.NET Webforms makes it the sound choice for creating tools and utilities. I’m talking about small applications which aid development and are not intended for production. Among these features, the one that probably saves the greatest amount of time is data binding. I’ve tried all the data sources and with LINQ elegant syntax involving extension methods and lambda’s, I expected that with LinqDataSource, I can apply these new language features during filtering. I’m very disappointed to find out I have to deal instead with something I hated most – string literals!

The page I created is a log viewer which is nothing but a grid and a bunch of controls for the filter values. The illustration here is simplistic but the technique is not compromised in any way. The data is stored in a table with columns for the date the log was made, the level or severity of the message and of course the message. A grid will display this using LinqDataSource.

Because it’s an ad hoc filter, I decided not to use the configurable WHERE parameters of the LinqDataSource. I thought I could add parameters as needed at run time. It is indeed possible and after some rummaging, I found out that the best place to do this is in the LinqDataSource.Selecting event. In this event, the sender is a LinqDataSourceView which exposes a Where string property. This accepts the body of the lambda expression that one usually put in the Where extension method. The only catch: it’s string. Beautiful! The tough part is when dealing with dates because you have to use DateTime utility methods and all this without the benefit of Intellisense. Shown below are the pertinent codes for the sample app you can download for this post.

Tuesday, November 17, 2009

Temporal Data: Considerations, Conventions and Cheats

I’m not an expert of temporal database, its concepts and principles. The techniques I discuss here are the ones I employed to an application during my early days of programming. For exhaustive information on this subject, you may consult the works of Richard T. Snodgrass or C.J. Date. It’s so easy, just Google them. ;)

Temporal Granularity

The first thing to establish in a temporal database is the shortest length of time by which a change in the data is considered significant. This is commonly referred to as the "temporal granularity”. In my application, I assumed it to be by “day” but SQL Server didn’t have any DATE-only data type until version 10 so this limitation was taken into consideration during the development. A function was created to remove the time portion of a DATETIME.

Figure 1 – A function to remove the time portion of a DATETIME

Index

If you isolate the columns to be tracked in another table, the new table should have a unique clustered index on the combination of the foreign key and the start date time. An example of this is the operation history table which tracks changes of the rate of an operation. In that table, the foreign key and start date combination is part of a unique clustered index to ensure that there is always one valid rate per given date. It also wouldn’t hurt if a check is employed to make sure that the end date is always after the start date.

Figure 2 – Table containing temporal data should have unique clustered index on the FK and start date/time

Querying Strategy

Querying a version of data in a temporal table always involves a date parameter indicating the version to retrieve. This date should fall between the start and end dates of at most one row for every id. The id here refers to the foreign key pointing to the non-temporal parent table. To make this clear, consider the following transactions:

Operation “Button Attachment” with rate of 0.35 is inserted on 10/28/2009
Rate for operation “Button Attachment” is changed to 0.37 on 11/01/2009

These insert 3 rows: 1 for the operation table and 2 to the operation history. The former is not pertinent to the discussion because it does not contain temporal data. In the latter, the rows inserted are highlighted below:

Figure 3 – Sample rows showing the current and previous version of a temporal data

If the user asks for the rate on 10/30/2009, obviously she would retrieve 0.35. The query is straightforward as long as the date fall between the transaction dates. What would she get then if she asks for the rate on 11/01/2009? Again, this is where business should come to the rescue. The business should establish the convention employed during the closing and opening of two contiguous periods, which happens when the data is updated. The convention can be determined by simply asking two questions:
Given an update on date n

What should be the end date of the previous period?
What should be the start date of the current period?

The answers determine the insert and update strategy to employ. If for the first question, Business says n-1, then it also implies that the answer for the second question is n. It can also be an n for the first and n+1 for the second. Important thing is that they should not overlap. After the convention is established, the BETWEEN operator can be safely used in the query as shown here:

SELECT o.name_vc, h.labor_sm FROM t_operation o
JOIN t_ophist h ON o.operation_id = h.operation_id
WHERE o.operation_id = @operation_id
AND (@versionDate BETWEEN h.start_dt AND h.end_dt)

Figure 4 – The most correct way of querying temporal data

In Figure 3 however, one can see that in my application, the end of the previous and the start of the current periods are the same. This is because back the, I didn’t know about BETWEEN operator. To achieve the same effect, I had to use inequalities operator and lines like these were all over the place:

start_dt <= @versionDate AND @versionDate < end_dt

Figure 5 – A condition used in the application in lieu of BETWEEN

Thru that filter statement, I effectively established the convention of n on end of current and n+1 on the start of the new version

Data Modification Logic

One interesting aspect of temporal data is that it changes the conventional semantics of data modification. Update can be any of these 3 cases: update plus insert, update only, delete and update. A delete in the other hand is merely an update on one column. Finally, an insert may only occur in the temporal table and not in the parent (non-temporal) table. Now it should be clear why dealing with temporal data is definitely not a walk in the park.

When a temporal data in my application is updated, the start date of the current version is compared to the date of the transaction. The first case in update happens if the transaction date is not the same as the start date of the current version. The current version is closed or invalidated and a new version is inserted. Invalidating a version is just providing a valid value to the end date depending on the convention used. The “Button Attachment” transactions is an example of this case. The update is later than the start date of the current version therefore a new record is inserted and the current version is invalidated with the date of the transaction.

The second case happens when the transaction occurs on the same day as the start date of the current version. A delete and insert in this case is an overwork because all that is needed is an update on the rate (labor_sm). If we use the “Button Attachment” as an example, this is the case when the rate is raised again to 0.38 on 11/1/2009.

The third case is a special case of the second. This happens when the new value is the same as the value of the previous version. The previous version is the one preceding the current. In the “Button Attachment” example, this happens when the rate is changed back to 0.35 on 11/01/2009. This is like not updating the 10/28/2009-11/01/2009 version at all. So the course of action is to delete the current and open the 10/28/2009-11/01/2009 version.

Figure 6 – An stored procedure illustrating the different cases involved in updating a temporal data

As mentioned before, there is no actual delete in a temporal database. Delete is achieved by simply closing the current version on the temporal table. In the example, if “Button Attachment” is deleted on 11/01/2009, the only action taken is closing the current version. The record “Button Attachment” in the operation table is left intact. This means that during insert, it’s possible that the data is already in the operation table. In such a case, an insert to the operation history for the new valid version of the said data does the job.

Cheats

Later during the development of the application, it became clear that most of the queries in the frontend involved only the current data. I was fully aware of the performance implication innate to temporal queries like that in Figure 4 so I decided to employ a cheat. Instead of BETWEEN, I used = operator to get the current version since a convention is already established that end date for current should always be 12/30/5000. This also means that end date was no longer null-able and was included in the unique clustered index together with the FK and start date.

Another cheat that I employed later in my other projects is what I call the “reverse relationship”. It still uses = but instead of end date, the parent table contains a foreign key of the temporal child table. This also necessitates having a surrogate primary key in the temporal table. Usually, like in Figure 2, the temporal table can use the combination of the FK and the start date as a composite PK but in the case of reverse relationship, this composite key becomes a bulky FK in the parent table. A lean and faster approach is to have a surrogate key as FK like the one illustrated in Figure 7. Of course, it doesn’t have to be a constraint; a column pointer would suffice but careful check should be done to avoid pointing to a non-existent temporal data. The “fast” factor comes into play in there is a very high volume of data in the temporal table. A slight performance benefit can be squeezed out from querying integer instead of a date.

Figure 7 – A sample parent-temporal child table with “reverse relationship” to speed up temporal query

I’ve also seen temporal table that has flag column to indicate if the row is current. Like the reverse relationship, any technique that involves additional column to convey a semantics also open the database to potential synchronization problems. For example, it’s possible that a flaw in the update logic might insert the wrong FK in the parent table or flag the wrong row as current.

I’m sure there are still more “cheats” or techniques out there. A modeler can employ even the most unconventional but it shouldn’t matter as long it’s justified. These kind of practices deviates from the standard so proper documentation of these “deviation” is also a must.

What’s Next

Next post will be about the preliminary efforts I had when translating the VB6 version of the application to .NET version. I’ll discuss the overall architecture and motivation of the approach I chose.

Sunday, November 1, 2009

Temporal Data

In the first installment of this series, I mentioned that I unnecessarily complicated the ApparelFactory app by introducing temporal data, thus delving myself into the world of temporal database. A temporal database differs from an ordinary database in that it stores different version of data based on their validity throughout time. Other people simply refers to these versions as “history” of the information. A piece of data is said to be “valid” on a certain time if it is consider to be true at that time.

“Querying a particular version of a temporal data is not straightforward because you need to consider the time component.”

It’s easy to spot a database which contains temporal data. You usually see two time columns in a table indicating the start and the end during which the data is considered valid. The granularity dictates the data type used. For example, if you don’t consider changes within a day to be relevant, then you can use DATETIME ignoring the time portion. SQL Server 2008 has a DATE data type well-suited for this need. If changes within the year is not relevant, you can use INT for both the start and end years. The granularity is usually dictated by the business but in the case of the ApparelFactory, I simply assumed that changes within the day is not relevant, thus I have a tables like the one below where start_dt and end_dt are the start date and end date respectively.

Figure 1 – A table for storing temporal data

Most of the time, users are only concerned with the current valid version of the data. Past versions are of interest only for analysis and evaluation of trends, say the price fluctuation of certain commodity. They may be viewed only through reports and some other data warehousing tools. Querying a particular version of a temporal data is not straightforward because you need to consider the time component. For example, in the t_allowance table, if I were to write a flexible query to retrieve any version an employee’s allowance by date, the WHERE statement would have looked like this

WHERE employee_id = @employee_id AND (@referenceDate BETWEEN start_dt AND end_dt)

Figure 2 – Flexible WHERE statement for a temporal data

A query involving BETWEEN operator is not the most efficient so coders usually cheat when writing one for retrieving the current version. They use = instead, but this presupposes that there is already an established convention in determining the current version. The most common is to make the end date nullable and any row with a null value is considered current. In the application, I used the another variation which sets the end date to a very unrealistically high value. This has the advantage of not requiring you to convert the null end date to a real value when using BETWEEN. This is because using BETWEEN on a null date always evaluates to false.

Figure 3 – Sample query utilizing equality operator instead of BETWEEN for temporal data

In a true temporal database, every change in the data is recorded. This is not practical since most of the time, only a subset of the columns are worth tracking. Only the columns which trends are worth analyzing are tracked. These are then separated in a table together with the start and end time columns. In the apparel app, I made sure that user can always see the history of the rates of every operation as well as their usage in a style (what operation is involved in a style at a certain point). The queries would have been complicated had I not employed some “acceptable cheats” in my table design. I’ll discuss more about these so-called cheats in the future installment of the series.

Figure 4 – Column which need to be tracked are usually taken out and made into a separate table

Temporal data are useless if the coder doesn’t employ an intuitive way of presenting them to the user. All these data should be available to reports of course, but users also appreciate it if they could see them instantly without resorting to reports. This way, they don’t have to be inundated with so many information when all they want is, let’s say the rate history for an operation which rate the company has just decided to change. In this connection, the ability for a user to select a date as a parameter during viewing of the data is very integral when designing the UI. A coder can simply throw in a DateTimePicker or Calendar web control. If the user prefers the actual change dates, then a list is a good choice. This approach involves another query but it always ensures new results during every query. It also presents the user with the exact number of changes; something not available with the date time picker / calendar approach.

In the application, the list approach is used as seen in Figure 5. The number of current operation for the style is 3, one is added sometime on 11/01/2009. The rate of one operation also changed on the same date, thus changing the current rate of the style.

Figure 5 – A UI displaying historical changes on data membership

The addition of another operation and the change in rate of another one can be seen in Figure 6. Since only the rate is tracked for the operation, it makes much more sense to simply display the history right away.

igure 6 – A UI displaying historical changes of a single column

Being able to work on temporal database is really a fulfilling experience for a coder, let alone for a novice. A coder feels that his application adds more value to the business because it ensures all the data are there and easy for the viewing. The flipside is, it might not be needed at all and it’s just another unnecessary complexity to an already delayed project. It wouldn’t hurt if the client is asked first because the last thing a coder needs is a flawless feature that no one even dares to use.

Next, I’ll discuss the adjustments, conventions and the cheats I employed.

Friday, October 16, 2009

Logging the Stack Trace

Codes can be downloaded here

Stack trace is very important in locating exceptions because unlike the Message property, it can pinpoint the exact location of the line that throws the exception. Unfortunately, developers seldom log the stack trace for 2 reasons:

They can be very long which in turn requires a large storage capacity
The information don’t help that much because the methods are listed in reverse order.

The size of the stack trace depends on where the method that throws the exception is located in the call stack. Take for example adding of record functionality in this application:

The function has a very shallow stack which is initiated by a button click and terminates in the database. This is illustrated by the call stack diagram below:

Button click in Default.aspx -----calls----> Entity Framework Model -----calls------> Database

If a line in the “button click” method causes an exception, say a database integrity violation, the first method in the stack trace is not not the “button click” but the one that detects the violation which is shown here:

at System.Data.Mapping.Update.Internal.UpdateTranslator.Update(IEntityStateManager stateManager, IEntityAdapter adapter) at System.Data.EntityClient.EntityAdapter.Update(IEntityStateManager entityCache) at System.Data.Objects.ObjectContext.SaveChanges(Boolean acceptChangesDuringSave) at System.Data.Objects.ObjectContext.SaveChanges() at WebApp._Default.btnException_OnClick(Object sender, EventArgs e) in C:\Users\Vanni\Documents\Visual Studio 2008\Projects\StackTraceSplit\WebApp\WebApp\Default.aspx.cs:line 28

For the developer searching for the bug in a production environment, the last method in the trace is the most important because that is where he’s going to start the search. To make his life easy, one can write a extension methods for System.Exception that can reverse the order of the stack trace and retrieve only a portion of it, usually the ones which the developer has access to. Extension methods are user-defined methods that can attached to an object which implementation is off-limit to developers. The method should adhere to the following guidelines in order for it to qualify as an extension method

It has to be static
It has to have a return value
The first parameter should be in the form “this <object to extend> parameter value”

The extension methods for the predicament discussed may look like the ones below. Note that it’s a good practice to specify the class as static because this ensures that all methods should be static too.

The first method splits the entire stack trace into arrays by using the characters “ at “ as the separator and reverses the order. The second is an overload which accepts a number of stack to return, starting from the top. The third method returns only one big chunk of the stack trace and is intended for logging.

When running the following codes,

a portion of the result would look like this:

Now that will definitely make a debugger’s life easy!

Sunday, October 11, 2009

Aliasing a class? Whoa!

I accidentally discovered a feature in C# that I thought only possible with C/C++ – aliasing a class. I’ve been coding .NET for years but never have I encountered examples or extensive discussion on this. I will not be a fan of this because I’m sure this also have the same problems inherent to the C/C++ equivalent. One justification for using this may be for shortening of a class name, but again we don’t have to type who needs to type the entire class name? Don't we have Intellisense to help us with that. Besides, if class has very unwieldy long name, it wouldn’t hurt to contract some words or better, redesign it because it may not be cohesive all along.

Tuesday, November 9, 2010

I've moved

Thursday, February 11, 2010

Filtering EntityDataSource

Tuesday, December 8, 2009

Filtering LinqDataSource

Tuesday, November 17, 2009

Temporal Data: Considerations, Conventions and Cheats

Temporal Granularity

Figure 1 – A function to remove the time portion of a DATETIME

Index

Figure 2 – Table containing temporal data should have unique clustered index on the FK and start date/time

Querying Strategy

Figure 3 – Sample rows showing the current and previous version of a temporal data

Figure 4 – The most correct way of querying temporal data

Figure 5 – A condition used in the application in lieu of BETWEEN

Data Modification Logic

Figure 6 – An stored procedure illustrating the different cases involved in updating a temporal data

Cheats

Figure 7 – A sample parent-temporal child table with “reverse relationship” to speed up temporal query

Sunday, November 1, 2009

Temporal Data

Figure 1 – A table for storing temporal data

Figure 2 – Flexible WHERE statement for a temporal data

Figure 3 – Sample query utilizing equality operator instead of BETWEEN for temporal data

Figure 4 – Column which need to be tracked are usually taken out and made into a separate table

Figure 5 – A UI displaying historical changes on data membership

igure 6 – A UI displaying historical changes of a single column

Friday, October 16, 2009

Logging the Stack Trace

Sunday, October 11, 2009

Aliasing a class? Whoa!