Nullable Nomad: February 2009

Sunday, February 8, 2009

IEqualityComparer.Equals() Not Called By Some LINQ Extensions

While I was reviewing the utility program I discussed in my last post, the geek in me suddenly kicked in. It’s trying to find other means of achieving what that program has already achieved. After a few minutes of rummaging the good ole’ Intellisense, it finally found them: Intersect, Except and Union. What I thought was a straightforward refactor turned out to be a one-hour ordeal. Along the way, I learned a fact about some LINQ extensions and a bad practice I’ve been so oblivious after all these years.

The extension methods Intersect, Except and Union are set-based operations that can be used in lieu of subqueries. They might be a little academic but using them makes the queries less complex and easy to read. One notable advantage they have over subqueries is that don’t need lamba’s as you see here:

But I was surprised by the results. The one at the left was what I was expecting which was the result of the implementation in my last post.

It was obvious that queries involving IEqualityComparer failed. Queries for update versions, new files and files to copy - although not the result I expected - were correct based on the operands used. My investigation zeroed in to nothing else but my implementation of IEqualityComparer. The question is: why are my IEqualityComparers not working in set-based operators?

Just like any coder, the first thing I did was create breakpoints in the Equals logic as shown here:

The result of the first run took me aback as much as the result of the query. The Equals methods were never called at all! Could this be a bug? Could it be that those methods are ignoring IEqualityComparer? I ponder on reporting this to MS as a bug but something was telling me I haven’t exhausted all possible causes. That something was that fellow in the corner and his name is GetHashCode().

Shifting the breakpoints to GetHashCode brought me to the next step. They’re indeed the ones called instead of Equals but that still didn’t help me figure out why the logic was failing. Well, I would be damned. It’s failing because I was using the wrong object to generate the hash code. It should be the property used in Equals instead of the object that defines the generic class. Fixing the lines as shown below finally brought end to this ordeal

So what?

Now I know that some extension methods have a special preference on GetHashCode. GetHashCode in turn gained a new level of respect from me. Call me stupid but I didn’t give a hoot about GetHashCode then. As far as I could remember, the only time I did was when I dissected CSLA. Good thing this glitch happened only to a pathetic program. An hour debug session on production codes is just too much for a simple omission like this.

Saturday, February 7, 2009

Pathetic LINQ Programming

One of the tasks I inherited from a colleague after he left for a vacation is the very mundane (and time-consuming) xcopy of source files from DEV site to QA. That entails backing up the old version of those files before replacing them with the new ones just in case something is screwed up afterwards. It’s a very error-prone process. It’s so easy to miss a file or two if the list is quite long. It’s also possible that I back up the wrong file, thus preventing me from reverting to the previous working state. With my manager’s permission, I can definitely end this inefficiency with SVN or VSS but at the same time I thought it seemed to be a good time to flex those LINQ To Object muscles. So I did and the result is one pathetic utility program.

Steps

The steps were pretty much straightforward. It’s better understood if itemized together with the codes. Here they are

1) Get the list of the files from the source and destination folders.

These lists shall be compared to each other to find out which files are candidate for transfer and backup. IEnumerable<T> is a key component of LINQ To Object. This contains extension methods which can be exploited for set-based operations. The good thing is, DirectoryInfo.GetFiles() returns an array which is convertible to IEnumerable.

2) Get the list of existing files.

The existing files are those common in the source and destination. I should issue a LINQ To Object statement with a restriction clause similar to the “WHERE sourcefileName IN (destinationFileName1, destinationFileName2,…destinationFileNameN)” of SQL. And a WHERE method IEnumerable has.

Unlike SQL which mainly deals with a scalar value, LINQ deals with objects. It has to be specified what to match and how to do the match when comparing. In the case of the file comparison, I interested in matching only the file name at this point. The logic for this is contained in the IEqualityComparer which is required the Where overload I used.

The Equals() member dictates the logic of the compare and specifies which property to compare. This one means that a FileInfo object are equal if their names are the same. Had I omitted the IEqualityComparer in the Where statement above, .NET would have compared the two files using default object reference logic. That’s not what I wanted.

3) Get the list of files which are not updates.

These files could just have been copied accidentally and it was not really the intention to deploy them. The logic is similar to step 2 except that this uses another IEqualityComparer to compare only the LastWriteTime property of the files.

The FileLastWriteTimeEqualityComparer dictates that two files should be treated as equal if their LastWriteTime properties are the same. This is not always reliable. Two totally different files can have exactly the same LastWriteTime. In my codes, a better implementation is something that includes another property, the file name perhaps. But since I’m applying the WHERE restriction to the existing files, I’m assured I get files only from the source folder.

4) Get the list of update files.

For this step, I used the result from step 3 to perform something similar to “WHERE existingFile NOT IN (sameVersionFile1, sameVersionFile2…)” restriction of SQL:

Notice that it uses an overload that does not require an IEqualityComparer. This time, it’s safe to do so because the items in existingFiles and sameVersions come from the same set –file from the source folder. An equality in the object reference also implies equality on the properties of the objects so the FileNameEqualityComparer is redundant.

5) Get the list of new files

New files are those found only in the source. Taking away the existing files from the set found in the source yields this list.

6) Combine new files and update files.

These are the files that shall be copied to the destination. The OR operator below means that the source file should be either new or update to qualify.

7) Get the list of files to be backed up

The list of files to be backed up are files in the destination with the same name with the files in step 6.

8) Backup and copy

Need to say more?

Test

All this pathetic program needs are 3 folders like the ones below

After running the pathetic program, Text1 should be copied to the second folder since it’s new. Text2 should stay where they are. Text3 from source should be copied to Destination since it’s an update. Finally, Text3 from Destination should be copied to the Backup folder. The following debug trace clearly shows these:

That’s it! Pathetic as it may seem, it spares me from staying late in the office. I couldn’t imagine what the code would have looked like without LINQ. I know its possible but it would surely be messy and convoluted at its best.

Monday, February 2, 2009

Simple Database Change Log in SQL Server 2005

As promised in my last post, I’m going to tackle DDL triggers in SQL Server 2005. The only way to do this is through the feature SQL-CLR which was first introduced in the said version of SQL Server. Basically, with this feature you can create database objects using your favorite .NET language. I wrote an article a few months back about one of the objects you can create with SQL-CLR. DDL trigger is not as complicated as the one discussed there as you see momentarily.

The .NET Side

SQL Server 2005 DDL trigger is nothing more than a static void function adorned with Microsoft.SqlServer.Server.SqlTriggerAttribute as shown below

If you’re using Visual Studio Express or above, probably the only assembly you need to import is System.Xml. Everything else comes from the common set of assembly referenced by default. Notice that we left out namespace from our codes. I suggest you do this because namespaces have some weird effects to SQL Server when you catalogue your SQL-CLR objects. You may try otherwise but you’ve been warned.

The next step is optional. It merely checks whether or not your function is executed by SQL Server. It checks if the SQL Server context is present. If not, we should exit or else subsequent calls to context-sensitive objects would fail. Context-sensitive objects have values only when the function is running inside SQL Server. They can produce exception or behave unexpectedly when called outside SQL Server. If your 100% sure your functions will not be executed by consumers other than SQL Server,then you can skip this line.

Passing beyond this line guarantees that we are indeed inside SQL Server. Since we’re going to insert a record, we need a connection. Instead of creating a new one, we can use the connection that executes the function. We do this by supplying the argument “context connection = true” in one of the SqlConnection constructor overloads. To guarantee automatic closing and disposal, we use the using construct

Inside this block, we insert the records using a SqlCommand. It’s fine to use inline statement in this case because the codes are never run outside SQL Server and therefore SQL injection is not possible.

The schema of the ChangeLog table can be found in the last post.

The to insert comes from another context object SqlTriggerContext. This object is member of SqlContext and is only meaningful inside a trigger. It has a member EventData which is a SqlXml containing additional info about the event and object that fires the trigger. This acts as your EventArgs. This is exactly the same EventData I discussed in the last post so there’s no need to elaborate.

The final task in the .NET side is to execute the command. The values for the parameters are taken from the elements of the XML document. Navigating to those elements require XPath which like navigating to a folder. You can optionally raise an error for no insertion although this is very rare.

Build your project and copy your assembly to a shallow location. You may close your Visual Studio now; we’re done with the .NET side.

The SQL Server Side

The first step in the SQL Server side is to catalogue the assembly. This can be achieve by the CREATE ASSEMBLY command as shown:

You can catalogue the assembly with a different name but doing so might create confusion later on.

The next step is to catalogue the trigger. The syntax is very similar to that of the T-SQ version illustrated in the last post except for the body which follows after AS.

If you change something in the function other than renaming it, you can use ALTER ASSEMBLY command. This already takes care of the cataloguing of the trigger.

You’re trigger is now ready for reporting. You may issue a statement similar to the following and should get similar results:

You may catalogue your trigger in the model databases if you want this simple logging mechanism to be present in all your future databases.