25.2.08

Using Predicate<T> in Your Methods

If you ever use any of the generic List<T> methods like Find, RemoveAll, et cetera, you should be familiar with Predicate<T>. In short, it is a delegate that returns true or false, based on whatever criteria you need. So if you are looking through a List<int> to find an element that is evenly divisible by 7, you can do it with a foreach (or for) loop:
List<int> myList = new List<int> { 3, 14, 21, 25 }; // example data
int x = -1;
foreach (int i in myList)
{
    if (i % 7 == 0)
    {
        x = i;
        break; // stop at the first match, like Find() does
    }
}
or by using a predicate:
int x = myList.Find(i => i % 7 == 0);

Since my example uses a value type, you have to watch out for confusing no match with an actual match. In the foreach, x is initialized to -1. If -1 were a possible match (it isn't in this case, since -1 % 7 is -1, not 0), you wouldn't be able to tell whether the search succeeded. You can always add a boolean flag, separate from the int value.

The Find() method returns default(T) if there is no match. For numeric value types the default is 0, so if 0 is a possible match you have the same ambiguity. There is a FindIndex() method, which returns -1 when nothing matches, and works more appropriately in this case.
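For instance, with a list that contains no multiples of 7, the two methods report "no match" very differently:

List<int> numbers = new List<int> { 1, 2, 3 };

int value = numbers.Find(n => n % 7 == 0);      // 0 -- default(int), indistinguishable from a real 0
int index = numbers.FindIndex(n => n % 7 == 0); // -1 -- unambiguously "no match"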

If you are using reference types, default is null--so you don't have to worry!

Now that I've illustrated Predicates, let's look at how you can use them in your code.
public T FindSomethingOrOther(Predicate<T> match)
This function will find something or other, either in a static property somewhere or within the instance it lives in--whichever you want.

To use the predicate, all you have to do is use it like a function:
public T FindSomethingOrOther(Predicate<T> match)
{
    foreach (T obj in this.Objects)
    {
        if (match(obj))      // invoke the predicate just like a method
            return obj;
    }
    return default(T);       // null for reference types, 0 for numeric types
}
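Assuming, say, that this method lives in a repository class whose Objects collection holds Customer items (both names are my own illustration), a caller supplies just the matching logic:

// Hypothetical usage; Customer, its Name property, and repository are illustrative.
Customer match = repository.FindSomethingOrOther(c => c.Name == "Smith");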

This is a powerful way to write utility functions that are not limited to one use. The above example is very basic--not really the best illustration. However, if the method weren't iterating through a local collection but through rows in a database, there would be plenty of setup and teardown code around it. Instead of writing a separate method to match on each column, you could write one method that takes a Predicate<DataRow>, along the lines of the sketch below.
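A minimal sketch of that idea, assuming a hypothetical GetTable() helper that wraps the connection setup and teardown (requires System.Data):

public DataRow FindRow(Predicate<DataRow> match)
{
    DataTable table = GetTable(); // hypothetical helper: open connection, query, clean up
    foreach (DataRow row in table.Rows)
    {
        if (match(row))
            return row;
    }
    return null; // DataRow is a reference type, so null clearly means "no match"
}

// Callers only supply the matching logic:
// DataRow customer = FindRow(r => (string)r["LastName"] == "Smith");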

22.2.08

LINQ performance

Oftentimes, I'm not really on the cutting-edge bandwagon. Actually, it seems like I never am. But I like being the late guy to the party--I show up for the good stuff.

Anyway, I haven't actually used LINQ for anything, but I've seen it around. You know, caught a glimpse here and there, so I have a vague idea what it's for and what it looks like. Naturally, the biggest question I have is: how fast is it?

From what I understand, it is most efficient when used on an IQueryable, but works on anything that's IEnumerable. Okay, fair enough. The C# foreach keyword works on IEnumerables too, and every iteration goes through the enumerator: a MoveNext() call, then a read of Current. Compared to a plain for loop, that's extra work--but not dramatically so.
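For reference, a foreach over an IEnumerable<int> expands to roughly this (simplified; source is just a placeholder for whatever you are iterating):

using (IEnumerator<int> e = source.GetEnumerator())
{
    while (e.MoveNext())
    {
        int i = e.Current;
        // loop body goes here
    }
}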

I ran across this post from Steve Hebert today, thanks to Google, and it verified much of what I was thinking and fearing about LINQ: it's inefficient.

But let's face it, it's not LINQ's fault, and I think that article (and the comments) make that abundantly clear. It has limitations, and if you don't know about them, and don't take advantage of LINQ when it has the upper hand, you will fail.

Fair enough. You should always know how your underlying data structures and algorithms work, or you'll end up making bad choices, with or without LINQ. A sorted linked list, for example, would make a wonderful priority queue, because removing the first element is an O(1) operation. However, adding would be O(n), since it has to traverse the list to find the sorted insertion point. If you use a generic List, you can add to the end immediately, but every time you remove from the front, it shifts every remaining element down.
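A rough sketch of that trade-off with the BCL collections (my own illustration, not from the article):

LinkedList<int> queue = new LinkedList<int>();
queue.AddFirst(3);
queue.AddLast(8);     // the insert itself is O(1); walking the list to find the
                      // sorted insertion point is what makes an add O(n)
queue.RemoveFirst();  // removing the head (the highest-priority item) is O(1)

List<int> list = new List<int> { 3, 8 };
list.Add(10);         // appending to the end is O(1) amortized
list.RemoveAt(0);     // removing from the front shifts every remaining element: O(n)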

The real point I'm trying to make here is that LINQ has its limitations. It might be very easy to write, but it can perform very badly when it can't take advantage of what you, as the developer, know about the data structure you are using and the nature of the data inside it.

On the other hand, there are situations where it will compile into the exact same code, in which case you might actually appreciate the concise and straightforward syntax it offers you.

18.2.08

Iteration with Lambda Expressions

I started using lambda expressions recently, and converted all my anonymous delegates. I like the syntax for a lot of things, but not all. Basically, any simple, one or two line predicates or actions are very neatly rewritten into a lambda expression.

But then we hit the question of performance. Anonymous delegates and lambda expressions are essentially the same thing in C#. The transformation between the two is trivial, so it's not surprising that they are compiled into the same IL.

delegate(int i)
{
    DoWork(i);
}

i => DoWork(i)


But let's talk about real-world applications and their performance implications. Let's say I have an array, and I want to iterate through it and call some method with each element. Thanks to C#'s wonderful language constructs, I can think of several ways to accomplish this:

for (int i = 0; i < Program.array.Length; i++)
{
    Program.DoWork(Program.array[i]);
}

foreach (int i in Program.array)
{
    Program.DoWork(i);
}

Array.ForEach(Program.array,
    delegate(int i)
    {
        Program.DoWork(i);
    }
);

Array.ForEach(Program.array, x => Program.DoWork(x));

As I already mentioned, the last two are functionally equivalent, but I'll include both for demonstration and completeness. Here are the results for an array of 100,000,000 integers.

Method                      Time
for loop                    0.6318259 s
foreach                     0.9067668 s
Array.ForEach (delegate)    1.284918 s
Array.ForEach (lambda)      1.2216206 s


As you can see, a simple for loop is the fastest. The foreach loop is more efficient over arrays than over other collections--the compiler actually uses indexing to walk the array rather than an enumerator. Array.ForEach takes about twice as long as the for loop, and most of that slowdown comes from making a delegate call for every element.
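If you want to reproduce numbers in this ballpark yourself, a harness along these lines is enough (my own sketch, not necessarily how these timings were taken; the DoWork body is a placeholder):

using System;
using System.Diagnostics;

static class Program
{
    static int[] array = new int[100000000];

    static void DoWork(int i)
    {
        // placeholder for whatever per-element work you want to measure
    }

    static void Main()
    {
        Stopwatch sw = Stopwatch.StartNew();

        for (int i = 0; i < array.Length; i++)
        {
            DoWork(array[i]);
        }

        sw.Stop();
        Console.WriteLine(sw.Elapsed);  // swap the loop for the other three variants to compare
    }
}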

In conclusion, nothing beats a good ol' for loop. On the other hand, the simplicity and clarity of lambda expressions is similar to what foreach brought us. Foreach always ran slower than a for loop, but removed the need to manage indexes, create a temp variable, etc. I would argue that it reduces errors and makes the code more readable. In a way, lambda expressions are just a continuation of the same.

3.2.08

C# 3.0 - var keyword

There's a new keyword on the block: var. And while many people are praising its wondrous benefits, I don't like it. Let's consider the facts:
  • It can only be used on local variables that are initialized on the same line

  • It is statically typed when initialized, so strong typing is maintained
Visual Studio 2008 is smart enough to show you the inferred type when you go to use a var variable later, which is nice, but it doesn't solve the problem of code readability. There are plenty of articles with good examples of this, so I won't reiterate. However, today I came up with a new scenario that would provide some credibility to the use of var:
for (var i = 0; i < 400; i++)
{
    if (i == 0)
        Console.Write(i.GetType().Name);
}


Since the var i is in the range of 0 to 399, I hoped the compiler would realize that it could use a short (or even a ushort) when it typed it. It does not.

The reason is pretty simple, really. It's not intelligent. Since I assigned 0, which is by default an Int32, the variable i was statically typed as an Int32. If I change it to read var i = 0u, it will type it as a UInt32.
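To spell that out:

var a = 0;   // the literal 0 is an int, so a is statically typed as System.Int32
var b = 0u;  // the u suffix makes the literal a uint, so b is a System.UInt32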

This isn't really surprising, but it's a little disappointing. My basic idea was that if the size of the loop changed (statically) in the future, I could save the trouble of changing the variable type manually. That is, if the compiler automatically picked the smallest variable type required.

In any case, I dislike the use of var immensely. I understand it is required for LINQ, but in any other case, it looks like pure laziness. One article I read said that typing "var" instead of "Dictionary" was a savings. However, any software engineer will tell you the bottleneck is not in the typing, it's in the thinking. If I have to spend more time thinking the next time I look at this code to figure out what my new var type actually is, I've wasted time.
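For what it's worth, this is the kind of declaration that argument is about (the type is my own example):

var lookup = new Dictionary<string, List<int>>();   // less typing...
Dictionary<string, List<int>> lookup2 =
    new Dictionary<string, List<int>>();             // ...than spelling the type out twice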