27.9.09

C# Map function for Generic Lists

I whipped up a quick map function for applying a modifying function to all items in a list, and potentially changing the type of each item:
public static class ListMap
{
    public static List<TOutput> Map<T, TOutput>(List<T> list, Func<T, TOutput> mapFunction)
    {
        if (list == null)
        {
            throw new ArgumentNullException("list");
        }

        if (mapFunction == null)
        {
            throw new ArgumentNullException("mapFunction");
        }

        List<TOutput> outputList = new List<TOutput>();
        list.ForEach(t => outputList.Add(mapFunction(t)));
        return outputList;
    }
}

Usage:
List<string> stringList = new List<string>(new string[] { "1", "2", "3" });
List<int> result = ListMap.Map(stringList, x => int.Parse(x));

This results in a List<int> containing {1, 2, 3}.
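As an aside, the same mapping can also be expressed with APIs already in the framework; List<T>.ConvertAll and LINQ's Select (in .NET 3.5) produce the same result, shown here only for comparison:

List<int> viaConvertAll = stringList.ConvertAll(x => int.Parse(x));
List<int> viaLinq = stringList.Select(x => int.Parse(x)).ToList(); // requires a using System.Linq;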

12.5.09

Using CTEs to Flatten a Hierarchy

In working with the open source software TestLink, I ran across a problem that I couldn't solve with normal SQL—so I turned to Common Table Expressions.

Common Table Expressions (CTEs) are a feature of SQL Server 2005 and later that allows recursive queries to be written without manually maintaining a temp table and all the associated complexity. There are plenty of other sources of information about them on the Internet, so I will get right down to the problem.

TestLink is a test-case management platform, which stores Test Cases. Test Cases exist inside Test Suites, which can be nested within each other, and Test Suites ultimately exist inside a Test Project. This is all stored in a single parent/child table which looks like this:

nodes_hierarchy:
   [id]
   [name]
   [parent_id]
   [node_type_id]
   [node_order]

Test Case nodes have a node_type_id of 3, Test Suites have a node_type_id of 2, and Test Projects have a node_type_id of 1.

My problem was to get a list of all Test Cases and their Test Projects. It's easy to get a list of all the Test Projects, and equally easy to get a list of all of the Test Cases, but the tricky part is traversing the parent_id trail to figure out what the top-level parent Test Project is for each Test Case—since there can be any number of nested Test Suites between the two.

In comes the CTE:
WITH hierarchy (
    id, parent_id,
    [name], hierarchy_name,
    testcase_id, testcase_name, node_type_id
) AS
(
    SELECT
        id,
        parent_id,
        [name],
        CAST([name] AS varchar(1024)),
        id,
        [name],
        node_type_id
    FROM
        [dbo].[nodes_hierarchy]
    WHERE node_type_id = 3 AND id IN (8343, 8361, 8487)

    UNION ALL

    SELECT
        nhtc.id,
        nhtc.parent_id,
        nhtc.[name],
        CAST(h.hierarchy_name + '|' + nhtc.[name] AS varchar(1024)),
        h.testcase_id,
        h.testcase_name,
        nt.id
    FROM
        [dbo].[nodes_hierarchy] nhtc INNER JOIN
        hierarchy h ON (h.parent_id = nhtc.id) INNER JOIN
        [dbo].[node_types] nt ON (nhtc.node_type_id = nt.id)
)

SELECT *
FROM hierarchy p
ORDER BY node_type_id

The first SELECT loads the first level of the recursion: the Test Cases. All the Test Cases (node_type_id = 3) are added, here filtered to three specific ids. Then the second SELECT recursively joins to the previously added rows in the CTE, matching parent nodes to the Test Cases, and continues until all the nodes have been matched.

You end up with a result set like the following (filtered to only three Test Cases):

There are three Test Cases shown, 8343, 8361, and 8487. Then their Test Suites are shown, also with the same Test Case IDs, and finally the Test Project Air Services is listed. It actually appears 3 times, each time with a different testcase_id and testcase_name column. This is perfect, because I can now filter the result set to just show rows with node_type_id = 1, and it will give me all Test Projects and Test Case combinations.

I also threw in the hierarchy_name column, which builds a running concatenation of the node names. It could be useful, but I don't need it for this problem. The key is passing the same testcase_id and testcase_name columns through every level of the recursion.

28.9.08

Is C#'s Emitted IL Optimized?

Using .NET's Reflection.Emit library to dynamically compile methods at runtime is very powerful. However, I want to know whether I have to optimize my opcodes myself, or whether they will be simplified internally where possible. I will test delegates like these:
const int c_i = 2*i;
int constDel_i() { return c_i; }
int mulDel_i()   { return 2*i; }
int addDel_i()   { return 2 + 2 + ... + 2; }  // i terms of 2

For each value of the integer i, I will create a constant delegate that returns the value of 2 times i. I will calculate this while emitting the delegate and inline the value, to avoid any runtime calculation.

I will also create a delegate which actually performs the multiplication of 2 by i and returns the value.

Finally, I will create a delegate which performs i additions of 2, achieving the same value through more operations.

Here's my test code:
class ILOptimizedTest
{
    public const int count = 600000000;

    public static void RunILOptimizedTest()
    {
        for (int i = 2; i < 1000; i += 50)
        {
            Func<int> constDel = GetConstDelegate(i);
            Func<int> mulDel = GetMulDelegate(i);
            Func<int> addDel = GetAddDelegate(i);

            // Run once to get rid of any startup time
            double constResult = constDel();
            Stopwatch constTime = Stopwatch.StartNew();
            for (int c = 0; c < count; c++)
            {
                constResult = constDel();
            }
            constTime.Stop();

            // Run once to get rid of any startup time
            double mulResult = mulDel();
            Stopwatch mulTime = Stopwatch.StartNew();
            for (int c = 0; c < count; c++)
            {
                mulResult = mulDel();
            }
            mulTime.Stop();

            // Run once to get rid of any startup time
            double addResult = addDel();
            Stopwatch addTime = Stopwatch.StartNew();
            for (int c = 0; c < count; c++)
            {
                addResult = addDel();
            }
            addTime.Stop();

            Console.WriteLine("{0}\t{1}\t{2}\t{3}\t{4}\t{5}\t{6}",
                i,
                constTime.ElapsedMilliseconds,
                mulTime.ElapsedMilliseconds,
                addTime.ElapsedMilliseconds,
                constResult, mulResult, addResult);
        }
    }

    // eval(2*i)
    public static Func<int> GetConstDelegate(int i)
    {
        DynamicMethod m = new DynamicMethod(
            "Const" + DateTime.Now.ToString(),
            typeof(int),
            new Type[] { },
            typeof(ILOptimizedTest),
            false);
        ILGenerator il = m.GetILGenerator();

        int total = (int)(2 * i);

        il.Emit(OpCodes.Ldc_I4, total);
        il.Emit(OpCodes.Ret);

        Func<int> del = (Func<int>)m.CreateDelegate(
            typeof(Func<int>));
        return del;
    }

    // 2*i
    public static Func<int> GetMulDelegate(int i)
    {
        DynamicMethod m = new DynamicMethod(
            "Mul" + DateTime.Now.ToString(),
            typeof(int),
            new Type[] { },
            typeof(ILOptimizedTest),
            false);
        ILGenerator il = m.GetILGenerator();

        il.Emit(OpCodes.Ldc_I4, 2);
        il.Emit(OpCodes.Ldc_I4, i);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Ret);

        Func<int> del = (Func<int>)m.CreateDelegate(
            typeof(Func<int>));
        return del;
    }

    // 2+2+2+2 ... i times
    public static Func<int> GetAddDelegate(int i)
    {
        DynamicMethod m = new DynamicMethod(
            "Add" + DateTime.Now.ToString(),
            typeof(int),
            new Type[] { },
            typeof(ILOptimizedTest),
            false);
        ILGenerator il = m.GetILGenerator();

        il.Emit(OpCodes.Ldc_I4, 2);
        for (int n = i; n > 1; n--)
        {
            il.Emit(OpCodes.Ldc_I4, 2);
            il.Emit(OpCodes.Add);
        }
        il.Emit(OpCodes.Ret);

        Func<int> del = (Func<int>)m.CreateDelegate(
            typeof(Func<int>));
        return del;
    }
}

After verifying that all three delegates are returning correct values, I cranked it up and let it go.

Time to calculate 2*i 600,000,000 times (ms)

  i    Const   Mult    Add
  2     4732   4805   4979
 52     4691   4689   5243
102     4643   4670   5255
152     4659   4940   5255
202     4681   4946   5303
252     4716   4990   5368
302     5627   5068   5423
352     4657   5008   5263
402     4698   5316   5401
452     4711   4985   5282
502     4709   4975   5286
552     4702   4960   5261
602     4671   4975   5257
652     4683   4990   5280
702     4671   5562   5323
752     4685   4978   5260
802     4677   4972   5260
852     4668   4959   5261
902     4695   4964   5259
952     4654   4960   5277

The first thing to notice is that these times are the total from running each delegate 600 million times. So the take-home message is... who cares? Still, closer inspection reveals some problems.

It's obvious that the constant delegate runs faster than the multiplication delegate, which runs faster than the addition delegate. However, it's not clear why. As i increases, the addition delegate's time should grow linearly with the number of emitted opcodes, but it doesn't. All three are essentially constant, which is expected for the first two methods, but not for addition.

Overall, I'm not convinced these results show anything. I tried adding a dummy parameter to the delegate and passing in the loop variable so that the result couldn't be cached, but it had no effect. My only conclusion is that something is being optimized. If it were the IL, then all three methods should have taken the same amount of time. Given the number of iterations performed and the total time taken, I'm pretty confident in the numbers I got; I just don't have a clue what they mean.

The next step is to compile the dynamic method to a DLL, then use Reflector to inspect it to see exactly what is being returned. I could also inspect my loops to see if any of the code there is being optimized away.
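A DynamicMethod can't be saved to disk directly, so the plan would be to re-emit the same opcodes through a MethodBuilder in a saveable assembly. Here's a rough sketch using the .NET Framework Reflection.Emit save APIs (the "ILDump"/"Dump"/"Mul" names are just placeholders):

// Re-emit GetMulDelegate's opcodes into an assembly that can be saved and opened
// in Reflector. Requires System.Reflection and System.Reflection.Emit.
public static void SaveMulMethod(int i)
{
    AssemblyName name = new AssemblyName("ILDump");
    AssemblyBuilder ab = AppDomain.CurrentDomain.DefineDynamicAssembly(
        name, AssemblyBuilderAccess.RunAndSave);
    ModuleBuilder mb = ab.DefineDynamicModule("ILDump", "ILDump.dll");
    TypeBuilder tb = mb.DefineType("Dump", TypeAttributes.Public);

    MethodBuilder method = tb.DefineMethod(
        "Mul",
        MethodAttributes.Public | MethodAttributes.Static,
        typeof(int),
        Type.EmptyTypes);
    ILGenerator il = method.GetILGenerator();

    // Same opcodes as GetMulDelegate
    il.Emit(OpCodes.Ldc_I4, 2);
    il.Emit(OpCodes.Ldc_I4, i);
    il.Emit(OpCodes.Mul);
    il.Emit(OpCodes.Ret);

    tb.CreateType();
    ab.Save("ILDump.dll");
}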

So far, results are inconclusive.

Doing Better than Math.Pow()

This is a sequel of sorts to my previous series on expression evaluation and relates to the framework I built there.

The source to C#'s Math.Pow() method isn't publicly available, but it probably uses an exponential approximation to calculate powers, using the following identity:
a^b = e^(b*ln(a))

This makes it extremely dependable, both value-wise and time-wise. A power of .5 takes the same time to calculate as a power of 2, or 2.01, or 53. (This seems to hold for small powers, but not for larger ones. There appears to be a cutoff, determined by the combination of a and b, at which point a different method is used that takes about three times as long.)
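For reference, that identity is easy to express directly; this is just an illustration of the formula, not the actual Math.Pow implementation:

// a^b computed as e^(b*ln(a)); valid for a > 0.
public static double PowViaExp(double a, double b)
{
    return Math.Exp(b * Math.Log(a));
}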

However, when calculating integer powers, multiplication can also be used, and it is much faster for small integers. Consider the following data:

Time to calculate 3^i 800,000 times (ms)

 i    Control   Math.Pow   Mult
 2         11        190     19
 3          8        164     24
 4          8        161     28
 5          8        165     33
 6          8        160     41
 7          8        159     43
 8          8        159     48
 9          9        159     54
10          8        158     58
11          8        165     65
12          8        158     71
13          8        161     74
14          8        159     79
15          8        159     86
16          8        159     91
17          8        158     95
18          8        159     99
19          8        158    106
20          8        159    113
21          8        159    118
22          8        158    123
23          8        159    126
24          8        158    136
25          9        161    134
26          8        158    145
27          8        159    149
28          8        161    158
29          8        159    158


The first column is the exponent, and the other three columns are the total number of milliseconds elapsed while evaluating 3^i 800,000 times with each method. The first result column is actually a control, where I just call a static method which returns -1. This indicates how long the static method call itself takes, in comparison to the actual calculation.

The second result column is the time it took to call Math.Pow(3, i), while the last column is the time for an inline iterative multiplication calculation. It's obvious that until the very highest powers tested, inline multiplication is faster; with a power of 2, it's 10x faster. Obviously inline multiplication cannot be used for non-integer powers, but it would be great if I could take advantage of this speed for integer powers between 2 and 28 (and -2 to -28, by using a^-b = 1/(a^b)).
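The inline multiplication being timed is essentially a loop like the following (a sketch; the exact benchmark harness isn't shown here):

// a^exponent by repeated multiplication; only valid for exponent >= 1.
public static double IntPower(double a, int exponent)
{
    double result = a;
    for (int n = 1; n < exponent; n++)
    {
        result *= a;
    }
    return result;
}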

Easy Pickings

public override Node Simplify(Node node)
{
    Number x = node.Left.Value as Number;
    Number y = node.Right.Value as Number;

    // Both operands are plain numbers: evaluate the power now.
    if (x != null && y != null)
    {
        return new Node(new Number(Math.Pow(x.Value, y.Value)));
    }

    // Only the exponent is a number: handle the trivial powers.
    if (y != null)
    {
        if (y.Value == -1)
        {
            Node n = new Node(new Division());
            n.Left = Number.One;
            n.Right = node.Left;
            return n;
        }
        if (y.Value == 0)
        {
            return Number.One;
        }
        if (y.Value == 1)
        {
            return node.Left;
        }
    }

    // Nothing to simplify.
    return node;
}

After the expression tree is parsed, it goes through a simplification phase, where I iterate through it in postfix order, simplifying each node. This is the Simplify() implementation for a Power node. As the section heading says, these are the easy pickings. If both the argument and the power (a and b) are numbers, I just evaluate the power and replace the node with the numerical result. I also replace a few trivial powers (-1, 0, and 1) with a shorter form. This helps simplify the tree, but won't provide any further benefit.

In-Place Expansion

if (y.Value >= 2 && y.Value <= 28)
{
    // Iteratively construct multiplication tree to replace power operation.
    Node n = new Node(new Multiplication());
    n.Left = node.Left;
    n.Right = node.Left;
    for (int i = 0; i < y.Value - 2; i++)
    {
        Node nParent = new Node(new Multiplication());
        nParent.Left = n;
        nParent.Right = node.Left;

        n = nParent;
    }
    return n;
}

Adding this to the Simplify() method handles integer powers from 2 to 28, which is the range the earlier table indicated would benefit. It converts a Power node into a Multiplication node with nested Multiplication nodes; in essence, converting x^2 into x*x, x^3 into x*(x*x), and so on. Let's see how this compares when I compile the tree and execute it:

Time to evaluate an x^i tree 800,000 times (ms)

 i    Math.Pow   Mult
 2         235     99
 3         226    103
 4         214    116
 5         229    111
 6         233    115
 7         228    155
 8         279    157
 9         230    149
10         221    133
11         211    137
12         223    147
13         218    186
14         236    175
15         266    161
16         215    197
17         255    203
18         259    212
19         224    177
20         218    197
21         219    186
22         227    193
23         234    235
24         254    240
25         263    254
26         266    257
27         263    271
28         270    266
29         261    269

Things are good, as expected. However, there is one huge shortcoming: expanding the tree causes repeated calculations:
x^3 = x*x*x
cos(x)^3 = cos(x)*cos(x)*cos(x)

Time to evaluate a cos(x)^i tree 800,000 times (ms)

 i    Math.Pow   Mult
 2         302    204
 3         283    253
 4         286    303
 5         286    365
 6         287    419
 7         291    482
 8         289    534
 9         289    595
10         291    645
11         283    739
12         280    739
13         285    798
14         285    845

The table above makes it clear that actual tree expansion is a terrible way to convert exponents to multiplications. Instead, I want something like this:
double c = Math.Cos(x);
return c*c*c;

This is actually achievable, since I don't need to evaluate the expression being raised to a power more than once. However, my current system doesn't support this. The way I compile from an expression tree into IL is based on the CLR stack, so by the time I'm emitting the power opcodes, the arguments are already on the stack. I need to work within these confines to achieve this.

Enter the PowerExp


I created a new operator, the PowerExp. It won't be created automatically by parsing, but rather by the simplification engine.

Since all the operands are loaded onto the stack before the operator, I had to remove the ^2 part from the tree. I could have done a conversion like this:
x^2 => Square(x)
x^3 => Cube(x)

That would require quite a few operators, though. Instead, I made one operator that takes the exponent as a Property. Consider these two methods:
private static Node BuildPowerTree(int n)
{
    Node powerNode = new Node(new Power());
    powerNode.Left = Parameter.X;
    powerNode.Right = Number.Node(n);
    return powerNode;
}

private static Node BuildPowerExpTree(int n)
{
    Node powerExpNode = new Node(new PowerExp(n));
    powerExpNode.Left = Parameter.X;
    return powerExpNode;
}

In the second method, I pass n into the operator rather than making it the Right node. This allows me to avoid pushing it to the stack. Now I can implement the PowerExp emitter:
// Eval n^power.  n is already on the stack
public override void Emit(System.Reflection.Emit.ILGenerator ilg)
{
    ilg.Emit(OpCodes.Stloc_0);     // Store n
    ilg.Emit(OpCodes.Ldloc_0);     // Load n
    for (int i = this.Power; i > 1; i--)
    {
        ilg.Emit(OpCodes.Ldloc_0); // Load n
        ilg.Emit(OpCodes.Mul);     // Multiply
    }                              // Repeat to achieve power
}
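For a PowerExp(3) node whose operand is cos(x), that emitted sequence behaves like the following C# (with the temp local written as t):

// Equivalent of the IL emitted for cos(x) cubed via PowerExp:
// the operand is evaluated once, then multiplied Power - 1 times.
static double CosCubed(double x)
{
    double t = Math.Cos(x);
    return t * t * t;
}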

Timings:

Time to evaluate a cos(x)^i tree 800,000 times (ms)

 i    PowerExp   Math.Pow   Mult
 2         183        306    221
 3         157        301    270
 4         154        304    333
 5         160        298    366
 6         161        298    432
 7         164        299    500
 8         165        289    556
 9         166        289    601
10         165        290    658
11         170        293    712
12         173        292    770
13         170        291    829
14         178        294    877
15         175        283    909
16         175        288    952
17         184        279   1007
18         175        285   1055
19         175        282   1105
20         181        278   1172
21         182        281   1219
22         188        291   1280
23         186        278   1330
24         189        281   1400
25         195        283   1446
26         196        281   1499
27         201        285   1547
28         198        287   1596
29         200        288   1670

The results basically speak for themselves. In fact, PowerExp is faster than Math.Pow() up to around x^60. That's 59 multiplication opcodes in a row! Plus, this method doesn't clutter my tree with extra nodes. For evaluating f(x)^i:

  1. Basic multiplication expansion of the tree decreases performance due to redundant evaluations of f(x)—never a good idea.

  2. Math.Pow is roughly constant across f(x) and i (true for small values).

  3. PowerExp is linear with respect to i, regardless of the complexity of f(x).

To summarize, I found a way to greatly improve Power evaluation performance for small integer exponents. By writing rules into my simplification engine, I can convert the right cases from Power into PowerExp and take advantage of this speed improvement.
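For completeness, here's a sketch of what such a rule might look like inside the Power node's Simplify(), reusing the names from the snippets above (the whole-number check is an assumption on my part):

// Replace x^k with PowerExp(k) when k is a whole number in the profitable range.
if (y != null && y.Value >= 2 && y.Value <= 28 && y.Value == Math.Floor(y.Value))
{
    Node n = new Node(new PowerExp((int)y.Value)); // exponent carried as a property
    n.Left = node.Left;                            // operand subtree, evaluated only once
    return n;
}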

27.9.08

Dynamic C# Compilation

This is a sequel to my previous series on expression evaluation and follows up on what I talked about there. I recommend reading those articles first.

In my previous post, I determined that the fastest way to repeatedly evaluate a math expression supplied at runtime was to dynamically compile it, by parsing it into a tree and emitting opcodes. The sample code given in that post is functional, but hardly complete. Having every operation in a giant switch is not practical considering there are over 25 operations that I should be supporting. I need to refactor. I also need a way to pass in parameters, so I can graph equations in x and y.
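One note before the code: the EvaluatorDelegate type isn't shown below, but given how the compiled delegate is invoked later, it's presumably declared something like this:

// Presumed shape of the delegate type, based on its usage further down.
public delegate double EvaluatorDelegate(double[] parameters);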

Here's the new version:
public static EvaluatorDelegate GetDynamicMethod(Tree<IToken> e, string[] parameters)
{
    DynamicMethod m = new DynamicMethod(
        "Evaluation" + DateTime.Now.ToString(),
        typeof(double),
        new Type[] { typeof(double[]) },
        typeof(Evaluator),
        false);
    ILGenerator il = m.GetILGenerator();

    LocalBuilder temp1 = il.DeclareLocal(typeof(double));

    foreach (IToken t in e.PostFix)
    {
        // Try op
        IOperator op = t as IOperator;
        if (op != null)
        {
            op.Emit(il);
            continue;
        }

        // Try number
        Number n = t as Number;
        if (n != null)
        {
            il.Emit(OpCodes.Ldc_R8, n.Value);
            continue;
        }

        // Try parameter
        Parameter p = t as Parameter;
        if (p != null)
        {
            // Check to see if we know which parameter that is.
            int pindex = Array.IndexOf<string>(parameters, p.Name);
            if (pindex < 0)
            {
                throw new ApplicationException(
                    "Did not receive information on parameter: " + p.ToString());
            }

            il.Emit(OpCodes.Ldarg_0);
            il.Emit(OpCodes.Ldc_I4, pindex);
            il.Emit(OpCodes.Ldelem_R8);

            continue;
        }

        throw new ApplicationException(
            "Invalid parse tree element: " + t.ToString());
    }

    il.Emit(OpCodes.Ret);

    EvaluatorDelegate del = (EvaluatorDelegate)m.CreateDelegate(
        typeof(EvaluatorDelegate));
    return del;
}


The first thing to notice is that the large operator switch has been replaced with some type switching. Each operator now implements a base interface, IOperator, and takes care of emitting its own opcodes.

Numbers are emitted the same as before—it just pushes them to the stack as a 64-bit floating-point type.

Parameters are handled through an array of doubles. The generated method now takes a double array, and GetDynamicMethod() takes an array of parameter names that's used as a lookup while emitting. To make this a bit easier to understand, I'll start with how this code is used:
EvaluatorDelegate del = GetDynamicMethod(tree, new string[] { "x", "y" });
double result = del(new double[] { x, y });

The delegate created by this method takes a double array whose elements correspond to the string array passed into GetDynamicMethod(). Looking at the code, when it finds a parameter, it checks the string array for the name of the parameter and saves that index. Then it loads the parameter value from the double array with the following:
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldc_I4, pindex);
il.Emit(OpCodes.Ldelem_R8);

These opcodes push the double array (which is passed into the delegate as the first method argument) to the stack, followed by the desired index (found by searching the string array). The last opcode pops both of those and replaces them with the desired array element.

This could probably be improved by creating a delegate with an argument for each desired parameter. The DynamicMethod constructor takes a Type[] describing the arguments, so I could build that from the string array. This would remove the array indexing and reduce each parameter load to a single opcode, increasing performance. I'll do that later.
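For illustration, here's a minimal, self-contained version of that idea (hypothetical names, apart from the Evaluator owner type used above): a dynamic method for x*y where each parameter is its own double argument, so loading one is a single Ldarg instead of the three-opcode array access.

public delegate double TwoParamEval(double x, double y);

public static TwoParamEval BuildXTimesY()
{
    DynamicMethod m = new DynamicMethod(
        "XTimesY",
        typeof(double),
        new Type[] { typeof(double), typeof(double) }, // one argument per parameter
        typeof(Evaluator),
        false);
    ILGenerator il = m.GetILGenerator();

    il.Emit(OpCodes.Ldarg_0); // x: a single load, no Ldc_I4/Ldelem_R8 needed
    il.Emit(OpCodes.Ldarg_1); // y
    il.Emit(OpCodes.Mul);
    il.Emit(OpCodes.Ret);

    return (TwoParamEval)m.CreateDelegate(typeof(TwoParamEval));
}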

Those are basically the only changes to GetDynamicMethod(). The nature of postfix notation (where an operator comes after its arguments) is exactly how .NET's CLR evaluation stack works. Assuming an expression is syntactically correct and parsed into a tree correctly, an operator is guaranteed to have its operands already on the stack by the time Emit() is called. Look at this implementation:
public class Addition : BaseOperator
{
    public override void Emit(System.Reflection.Emit.ILGenerator ilg)
    {
        ilg.Emit(OpCodes.Add);
    }
}

Painfully easy. Method calls are also possible:
public class Sine : BaseFunction
{
    public override void Emit(System.Reflection.Emit.ILGenerator ilg)
    {
        ilg.Emit(OpCodes.Call, typeof(System.Math).GetMethod("Sin"));
    }
}

Some are not quite as simple:
public class Secant : BaseFunction
{
    public override void Emit(System.Reflection.Emit.ILGenerator ilg)
    {
        ilg.Emit(OpCodes.Call, typeof(System.Math).GetMethod("Cos"));
        ilg.Emit(OpCodes.Stloc_0);
        ilg.Emit(OpCodes.Ldc_R8, 1.0D);
        ilg.Emit(OpCodes.Ldloc_0);
        ilg.Emit(OpCodes.Div);
    }
}

Secant is defined as 1/cos(x), so I evaluate cosine, then store the result in the first local variable (this is the temp1 variable created above). I push 1 onto the stack, followed by the temp1 value, and divide. Floating-point Div follows IEEE 754 rules, so dividing by zero yields infinity (or NaN for 0/0) rather than throwing, and no error checking is required.
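For comparison, a function that needs its operand twice, say a hypothetical Tangent implemented as sin(x)/cos(x), can be handled in the same style with Dup and the scratch local (this one isn't in my evaluator, and in practice Math.Tan exists and would be a single Call; it's just to show the pattern):

public class Tangent : BaseFunction
{
    public override void Emit(System.Reflection.Emit.ILGenerator ilg)
    {
        ilg.Emit(OpCodes.Dup);                                        // duplicate x for the second call
        ilg.Emit(OpCodes.Call, typeof(System.Math).GetMethod("Cos")); // cos(x)
        ilg.Emit(OpCodes.Stloc_0);                                    // stash cos(x) in temp1
        ilg.Emit(OpCodes.Call, typeof(System.Math).GetMethod("Sin")); // sin(x) from the duplicated x
        ilg.Emit(OpCodes.Ldloc_0);                                    // reload cos(x)
        ilg.Emit(OpCodes.Div);                                        // sin(x)/cos(x)
    }
}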