17.12.07

Anonymous Scope Blocks

I'm picking up the habit of using anonymous scope blocks in C#--that is, braces which don't correspond to anything else:
TreeParse.Node node;
{
node = new Node(new Addition());
node.Left = new Node(new Multiplication());
node.Left.Left = new Node(new Parameter("x"));
node.Left.Right = new Node(new Number(4));
node.Right = new Node(new Number(0));
}

TreeParse.Node actual;
TreeParse.Node expected = new Node(new Multiplication());
{
expected.Left = new Node(new Parameter("x"));
expected.Right = new Node(new Number(4));

}

For this unit test, I instantiate a few trees. The nodes of the trees are instantiated and pieced together inside scope blocks. This doesn't actually affect the execution, but it keeps it cleaner and very organized.

I checked the IL and there isn't much difference. Just two nops.

Of course, you can also have actual scope benefits along with organization. Variables inside { } will be local to that block, so multiple temp variables can be created with the same name, as long as they are in different blocks.

Trees vs Arrays: Expression Evaluation in C# Part 3

Alright, I'm back with the latest news. I implemented the IL emitter, which can be either painfully slow, or jaw-droppingly-ridiculously fast, depending on how it's called.

The code is pretty simple, too:
public static EvaluatorDelegate GetDynamicMethod(Tree e)
{
DynamicMethod m = new DynamicMethod("Evaluation" + DateTime.Now.ToString(), typeof(long), new Type[] { }, typeof(Evaluator), false);

ILGenerator il = m.GetILGenerator();

foreach (string t in e.PostFix)
{
if (t == "+")
{
il.Emit(OpCodes.Add);
}
else if (t == "-")
{
il.Emit(OpCodes.Sub);
}
else if (t == "*")
{
il.Emit(OpCodes.Mul);
}
else if (t == "/")
{
il.Emit(OpCodes.Div);
}
else
{
long a = Convert.ToInt64(t);
il.Emit(OpCodes.Ldc_I8, a);
}
}

il.Emit(OpCodes.Ret);

EvaluatorDelegate del = (EvaluatorDelegate)m.CreateDelegate(typeof(EvaluatorDelegate));
return del;
}
The first thing you should notice is that there are no stacks. The CLR runtime uses an execution stack, so it's all being handled internally. It loads numbers as they appear to the stack, and then operates on them. At least this implementation, if not any of the others, should make it obvious that reverse polish notation is where it's at. Evaluation is about as simple as it gets.

It will certainly get more complicated to code once I add in some function calls, but it will still be much faster than any of the previous methods.

Getting back to performance... The first time I ran it, it ran very slowly, about as fast as the Tree Evaluator. I was justifiably saddened. But I had used Invoke() to start it every time, which is what was killing performance. From what I've read, Invoke is about 100x slower than a direct call, while invoking a delegate is only a few times slower.

I rewrote it with a delegate, and it trompled the competition:

Array Evaluation: 999ms
Tree Evaluation: 2433ms
Delegate Evaluation: 47ms
Dynamic Method Delegate Evaluation: 15ms

That's 1.5% of the original time required for that evaluation. I can calculate 66x more values now! Yessss...

But realistically, is being 3x faster worth emitting my own OpCodes? It's a tough call. In this instance, no. But you add a few trig functions in there, parameter substitution, and twice as many evaluations, and it might be worth more than a few milliseconds.

Trees vs Arrays: Expression Evaluation in C# Part 2

My next idea was to do some sort of precompilation with delegates. Basically the idea is that the Array Evaluator wastes a lot of time in each iteration, comparing the value of each token. The order of tokens never changes. So we can build an anonymous method at runtime which, then execute that ten thousand times.

Something like this..
public long Evaluate()
{
return 2*3+1;
}

It doesn't have any if..else if..else if...etc. It doesn't have any stacks. Sounds good, right? My code to generate it:
public delegate long Del();

public static Del GenerateDelegate(Tree<string> e)
{
Stack<Del> stack = new Stack<Del>(16);
foreach (string t in e.PostFix)
{
if (t == "+")
{
Del b = stack.Pop();
Del a = stack.Pop();
stack.Push(delegate() { return a() + b(); });
}
else if (t == "-")
{
Del b = stack.Pop();
Del a = stack.Pop();
stack.Push(delegate() { return a() - b(); });
}
else if (t == "*")
{
Del b = stack.Pop();
Del a = stack.Pop();
stack.Push(delegate() { return a() * b(); });
}
else if (t == "/")
{
Del b = stack.Pop();
Del a = stack.Pop();
stack.Push(delegate() { return a() / b(); });
}
else
{
long a = Convert.ToInt64(t);
stack.Push(delegate() { return a; });
}
}

return stack.Pop();
}

The first time I tried this, it failed because I wasn't storing the long a as a separate variable--I was just inlining it into the anonymous method. Turns out that when you run it, it uses the t from outermost scope, which was "+". Hmm...

Anyways, this beauty can be executed with the following code:
TreeParse.Evaluator.Del del = Evaluator.GenerateDelegate(a);
del();

For the timing, I put the GenerateDelegate() outside the timer, and then looped on del(). Actual results:

Array Evaluation: 983ms
Tree Evaluation: 2449ms
Delegate Evaluation: 47ms

WHAT! That's 20x faster than the Array Evaluator. That's some serious improvement. Is that even possible?

I checked with the debugger--it does evaluate each delegate each time it loops, so it doesn't appear to be optimizing the entire thing away. Doubling the number of iterations more than doubles the time, which would not be the case if it was only being executed once.

At first I thought one of the likely causes was tail call optimization. Since I'm on x64, it looks like the delegates should be able to reuse the stack frame as they execute the sub-delegates. Unfortunately, I stepped through it and watched the call stack increase and decrease.

In any case, this is amazingly fast. It should be able to handle any number or order of parameters just as well as the other methods. Adding variables should be pretty easy as well.

Now, the only thing left is to emit IL codes and compile (at runtime) into a actual function dedicated to evaluating the inputed expression. This would be exactly identical to the figure 1 up there, in theory.

Should be the fastest, but how much faster than delegates? The only savings should be a function call. The cost might be higher to compile the IL, however, and let's not forget the cost of figuring out what opcodes I need...

Trees vs Arrays: Expression Evaluation in C#

Evaluating an expression repeatedly has much different requirements than if it was meant to be evaluated once. The overhead effort to parse an expression string into a more accessible format (postfix!) becomes negligent if the evaluation is repeated often enough.

For this test, I created a simple Tree data structure. It has a postfix iterator, which is what I use when evaluating it. I created two postfix evaluators, which use a single stack and support the four arithmetic operators. I'll assume all checks are performed in the tokenizer, so it assumes everything is one of those operators, or a long.
public static long Evaluate(Tree e)
{
Stack stack = new Stack(16);
foreach (string t in e.PostFix)
{
if (t == "+")
stack.Push(stack.Pop() + stack.Pop());
else if (t == "-")
stack.Push(stack.Pop() - stack.Pop());
else if (t == "*")
stack.Push(stack.Pop() * stack.Pop());
else if (t == "/")
stack.Push(stack.Pop() / stack.Pop());
else
stack.Push(Convert.ToInt64(t));
}

return stack.Pop();
}

I magically instantiated and filled it, because a tokenizer is going to come later. The Array version was created by iterating through the Tree once and inserting into a list, then ToArray[]'ing that.

I evaluated a simple expression (2*3+1) 10E4 times with both the Tree Evaluator and the Array Evaluator. On average, the Tree version runs at about 230% of the Array time. This is completely unsurprising, but at least it gives me a benchmark to start with.

The Postfix iterator is obviously the bottleneck, since it's code that has been factored out in the Array Evaluator:

private IEnumerable<T> PostFixIterator(Node<T> root)
{
if (root.Left != null)
{
foreach (T n in PostFixIterator(root.Left))
{
yield return n;
}
}
if (root.Right != null)
{
foreach (T n in PostFixIterator(root.Right))
{
yield return n;
}
}

yield return root.Value;
}

It's recursive, so it's incrementing the stack a handful of times even for only 5 nodes.

Now that I've taken care of the obvious implementations, there are a few others I want to try. We'll see if the simplicity of arrays holds up. Of course, that simplicity comes at the price of complexity later when adding more operators, and especially functions. But is convenience worth doubling the time?