17.12.07

Trees vs Arrays: Expression Evaluation in C#

Evaluating an expression repeatedly has much different requirements than if it was meant to be evaluated once. The overhead effort to parse an expression string into a more accessible format (postfix!) becomes negligent if the evaluation is repeated often enough.

For this test, I created a simple Tree data structure. It has a postfix iterator, which is what I use when evaluating it. I created two postfix evaluators, which use a single stack and support the four arithmetic operators. I'll assume all checks are performed in the tokenizer, so it assumes everything is one of those operators, or a long.
public static long Evaluate(Tree e)
{
Stack stack = new Stack(16);
foreach (string t in e.PostFix)
{
if (t == "+")
stack.Push(stack.Pop() + stack.Pop());
else if (t == "-")
stack.Push(stack.Pop() - stack.Pop());
else if (t == "*")
stack.Push(stack.Pop() * stack.Pop());
else if (t == "/")
stack.Push(stack.Pop() / stack.Pop());
else
stack.Push(Convert.ToInt64(t));
}

return stack.Pop();
}

I magically instantiated and filled it, because a tokenizer is going to come later. The Array version was created by iterating through the Tree once and inserting into a list, then ToArray[]'ing that.

I evaluated a simple expression (2*3+1) 10E4 times with both the Tree Evaluator and the Array Evaluator. On average, the Tree version runs at about 230% of the Array time. This is completely unsurprising, but at least it gives me a benchmark to start with.

The Postfix iterator is obviously the bottleneck, since it's code that has been factored out in the Array Evaluator:

private IEnumerable<T> PostFixIterator(Node<T> root)
{
if (root.Left != null)
{
foreach (T n in PostFixIterator(root.Left))
{
yield return n;
}
}
if (root.Right != null)
{
foreach (T n in PostFixIterator(root.Right))
{
yield return n;
}
}

yield return root.Value;
}

It's recursive, so it's incrementing the stack a handful of times even for only 5 nodes.

Now that I've taken care of the obvious implementations, there are a few others I want to try. We'll see if the simplicity of arrays holds up. Of course, that simplicity comes at the price of complexity later when adding more operators, and especially functions. But is convenience worth doubling the time?

No comments: