Sunday, February 8, 2009

IEqualityComparer.Equals() Not Called By Some LINQ Extensions

While I was reviewing the utility program I discussed in my last post, the geek in me suddenly kicked in. It wanted to find other means of achieving what that program had already achieved. After a few minutes of rummaging through good ole’ IntelliSense, it finally found them: Intersect, Except and Union. What I thought was a straightforward refactor turned out to be a one-hour ordeal. Along the way, I learned a fact about some LINQ extensions and a bad practice I’d been oblivious to all these years.

The extension methods Intersect, Except and Union are set-based operations that can be used in lieu of subqueries. They might be a little academic, but using them makes the queries less complex and easier to read. One notable advantage they have over subqueries is that they don’t need lambdas, as you see here:

image
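As a rough sketch of the idea, here are the three operators called directly on two sequences with no lambda in sight. The file names are made up for illustration; the real program worked on the contents of two folders.

```csharp
using System;
using System.Linq;

class SetOperatorsDemo
{
    static void Main()
    {
        // Hypothetical file names standing in for a source and a target folder.
        string[] source = { "a.dll", "b.dll", "c.dll" };
        string[] target = { "b.dll", "c.dll", "d.dll" };

        var inBoth = source.Intersect(target);       // items present in both
        var onlyInSource = source.Except(target);    // items in source but not in target
        var everything = source.Union(target);       // distinct items from both

        Console.WriteLine(string.Join(", ", inBoth.ToArray()));        // b.dll, c.dll
        Console.WriteLine(string.Join(", ", onlyInSource.ToArray()));  // a.dll
        Console.WriteLine(string.Join(", ", everything.ToArray()));    // a.dll, b.dll, c.dll, d.dll
    }
}
```

With plain strings the default equality comparer does the right thing. The trouble described next only starts once a custom IEqualityComparer is passed as the second argument.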

But I was surprised by the results. The one on the left was what I was expecting, which was the result of the implementation in my last post.

image image

It was obvious that the queries involving IEqualityComparer had failed. The queries for update versions, new files and files to copy - although not the result I expected - were correct based on the operands used. My investigation zeroed in on nothing else but my implementation of IEqualityComparer. The question was: why were my IEqualityComparers not working with the set-based operators?

Just like any coder, the first thing I did was set breakpoints in the Equals logic as shown here:

image

The result of the first run took me aback as much as the result of the query. The Equals methods were never called at all! Could this be a bug? Could it be that those methods were ignoring IEqualityComparer? I pondered reporting this to MS as a bug, but something was telling me I hadn’t exhausted all possible causes. That something was that fellow in the corner, and his name is GetHashCode().

Shifting the breakpoints to GetHashCode brought me to the next step. The GetHashCode methods were indeed the ones being called instead of Equals, but that still didn’t help me figure out why the logic was failing. Well, I would be damned. It was failing because I was using the wrong object to generate the hash code. The hash code should come from the property compared in Equals, not from the object that defines the generic class. When two items produce different hash codes, the set operators treat them as different and never bother to call Equals. Fixing the lines as shown below finally brought an end to this ordeal.

image

image
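Since the screenshots did not survive, here is a minimal sketch of what the mistake and the fix could look like. Only the name FileNameEqualityComparer comes from this post's discussion; the FileInfo type, the Name property, the BrokenFileNameComparer class and the sample paths are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Assumed "before": Equals compares names, but the hash comes from the
// FileInfo object itself, so two instances almost never hash alike.
class BrokenFileNameComparer : IEqualityComparer<FileInfo>
{
    public bool Equals(FileInfo x, FileInfo y) { return x.Name == y.Name; }
    public int GetHashCode(FileInfo obj) { return obj.GetHashCode(); }
}

// Assumed "after": the hash is built from the same property Equals compares.
class FileNameEqualityComparer : IEqualityComparer<FileInfo>
{
    public bool Equals(FileInfo x, FileInfo y) { return x.Name == y.Name; }
    public int GetHashCode(FileInfo obj) { return obj.Name.GetHashCode(); }
}

class ComparerDemo
{
    static void Main()
    {
        // Creating a FileInfo does not touch the disk, so these files need not exist.
        var source = new[]
        {
            new FileInfo(Path.Combine("src", "a.dll")),
            new FileInfo(Path.Combine("src", "b.dll"))
        };
        var target = new[]
        {
            new FileInfo(Path.Combine("dst", "b.dll")),
            new FileInfo(Path.Combine("dst", "c.dll"))
        };

        // Hashes differ per instance, so Equals is normally never reached here.
        int brokenCount = source.Intersect(target, new BrokenFileNameComparer()).Count();

        // Both b.dll entries hash alike, Equals confirms the match: one common file.
        int fixedCount = source.Intersect(target, new FileNameEqualityComparer()).Count();

        Console.WriteLine("broken: {0}, fixed: {1}", brokenCount, fixedCount);
    }
}
```

The fixed comparer finds the one file name common to both folders, while the broken one will normally find none. Null handling is left out to keep the sketch short.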

So what?

Now I know that some extension methods have a special preference for GetHashCode. GetHashCode in turn gained a new level of respect from me. Call me stupid, but I didn’t give a hoot about GetHashCode before. As far as I can remember, the only time I did was when I dissected CSLA. Good thing this glitch happened only to a pathetic program. An hour-long debug session on production code is just too much for a simple omission like this.

7 comments:

  1. Thank you very much for your blog post. I've been pulling my hair out for more than an hour and you completely explained the issue and now I'm good to go. Thanks!

  2. This comment has been removed by the author.

  3. Actually, I'm testing Except using code I found here, http://brendan.enrick.com/blog/linq-your-collections-with-iequalitycomparer-and-lambda-expressions/
    to create inline lambdas.

    Equals() is apparently called only when GetHashCode() returns the same value for two items, so Equals() can then do a fine-tooth comparison.

    For instance, you can have a Book object return GetHashCode(BookName),

    But in Equals(Book a, Book b){ compare author, etc. }

  4. Chuck, it doesn't make sense to me why someone would compare different properties or objects in Equals and GetHashCode. This will definitely be confusing to the consumer of the class. For example, in the FileNameEqualityComparer, I can't find any compelling reason why someone should use FileName in Equals and another property in GetHashCode.

  5. Good post.

    How would I implement it if I had, say, N properties that were used to determine equality? Like if the first and last name were the same then return true.

    Could I add the GetHashCodes()? I think I've read somewhere to do a bitwise comparison?


    Thanks for any input,
    Mike D.

  6. Thanks Michael. I haven't implemented GetHashCode on multiple properties because I'm a great fan of surrogate keys. I know sooner or later I'll encounter a situation which will compel me otherwise :D
    A bitwise combination is one way, but you can also implement your own algorithm. The MS documentation suggested XOR.

  7. Nice one, just what I needed
