17.12.07

Trees vs Arrays: Expression Evaluation in C# Part 2

My next idea was to do some sort of precompilation with delegates. Basically the idea is that the Array Evaluator wastes a lot of time in each iteration, comparing the value of each token. The order of tokens never changes. So we can build an anonymous method at runtime which, then execute that ten thousand times.

Something like this..
public long Evaluate()
{
return 2*3+1;
}

It doesn't have any if..else if..else if...etc. It doesn't have any stacks. Sounds good, right? My code to generate it:
public delegate long Del();

public static Del GenerateDelegate(Tree<string> e)
{
Stack<Del> stack = new Stack<Del>(16);
foreach (string t in e.PostFix)
{
if (t == "+")
{
Del b = stack.Pop();
Del a = stack.Pop();
stack.Push(delegate() { return a() + b(); });
}
else if (t == "-")
{
Del b = stack.Pop();
Del a = stack.Pop();
stack.Push(delegate() { return a() - b(); });
}
else if (t == "*")
{
Del b = stack.Pop();
Del a = stack.Pop();
stack.Push(delegate() { return a() * b(); });
}
else if (t == "/")
{
Del b = stack.Pop();
Del a = stack.Pop();
stack.Push(delegate() { return a() / b(); });
}
else
{
long a = Convert.ToInt64(t);
stack.Push(delegate() { return a; });
}
}

return stack.Pop();
}

The first time I tried this, it failed because I wasn't storing the long a as a separate variable--I was just inlining it into the anonymous method. Turns out that when you run it, it uses the t from outermost scope, which was "+". Hmm...

Anyways, this beauty can be executed with the following code:
TreeParse.Evaluator.Del del = Evaluator.GenerateDelegate(a);
del();

For the timing, I put the GenerateDelegate() outside the timer, and then looped on del(). Actual results:

Array Evaluation: 983ms
Tree Evaluation: 2449ms
Delegate Evaluation: 47ms

WHAT! That's 20x faster than the Array Evaluator. That's some serious improvement. Is that even possible?

I checked with the debugger--it does evaluate each delegate each time it loops, so it doesn't appear to be optimizing the entire thing away. Doubling the number of iterations more than doubles the time, which would not be the case if it was only being executed once.

At first I thought one of the likely causes was tail call optimization. Since I'm on x64, it looks like the delegates should be able to reuse the stack frame as they execute the sub-delegates. Unfortunately, I stepped through it and watched the call stack increase and decrease.

In any case, this is amazingly fast. It should be able to handle any number or order of parameters just as well as the other methods. Adding variables should be pretty easy as well.

Now, the only thing left is to emit IL codes and compile (at runtime) into a actual function dedicated to evaluating the inputed expression. This would be exactly identical to the figure 1 up there, in theory.

Should be the fastest, but how much faster than delegates? The only savings should be a function call. The cost might be higher to compile the IL, however, and let's not forget the cost of figuring out what opcodes I need...

No comments: