This is a sequal of sorts to my previous series on expression evaluation and relates to the framework I built there.The source to C#'s Math.Pow() method isn't publically available, but it probably uses an exponential approximation to calculate powers of numbers, using the following:
a^b = e^(b*ln(a))
This makes it extremely dependable, value-wise and time-wise. A power of .5 takes the same time to calculate as a power of 2, or 2.01, or 53 (This seems to hold for small powers, but not for larger ones. There appears to be a cutoff determined by the combination of a and b, at which point a different method is used, which takes about three times as long).
However, when calculating integer powers, multiplication can also be used, and it is much faster for small integers. Consider the following data:
Time to calculate 3^i 800,000 times
i | Control | Math.Pow | Mult |
---|
2 | 11 | 190 | 19 |
3 | 8 | 164 | 24 |
4 | 8 | 161 | 28 |
5 | 8 | 165 | 33 |
6 | 8 | 160 | 41 |
7 | 8 | 159 | 43 |
8 | 8 | 159 | 48 |
9 | 9 | 159 | 54 |
10 | 8 | 158 | 58 |
11 | 8 | 165 | 65 |
12 | 8 | 158 | 71 |
13 | 8 | 161 | 74 |
14 | 8 | 159 | 79 |
15 | 8 | 159 | 86 |
16 | 8 | 159 | 91 |
17 | 8 | 158 | 95 |
18 | 8 | 159 | 99 |
19 | 8 | 158 | 106 |
20 | 8 | 159 | 113 |
21 | 8 | 159 | 118 |
22 | 8 | 158 | 123 |
23 | 8 | 159 | 126 |
24 | 8 | 158 | 136 |
25 | 9 | 161 | 134 |
26 | 8 | 158 | 145 |
27 | 8 | 159 | 149 |
28 | 8 | 161 | 158 |
29 | 8 | 159 | 158 |
The first column is the exponent, and other three columns are the total number of milliseconds elapsed while evaluating 3^i 800000 times for each method. The first result is actually a control, where I just call a static method which returns -1. This indicates how long the static method call actually takes, in comparison to the actual calculation.
The second result is the time it took to call Math.Pow(3,i), while the last column is the result of an inline interative multiplication calculation. It's obvious that until the very highest powers tested, inline multiplication is faster--with a power of 2, it's 10x faster. Obviously inline multiplication cannot be used for decimal powers, but it would be great if I could take advantage of this speed for integer powers between 2 and 28 (and -2..-28, by using a^-b = 1/(a^b))
Easy Pickings
public override Node Simplify(Node node)
{
Number x = node.Left.Value as Number;
Number y = node.Right.Value as Number;
if (x != null && y != null)
{
return new Node(new Number(Math.Pow(x.Value, y.Value)));
}
if (y != null)
{
if (y.Value == -1)
{
Node n = new Node(new Division());
n.Left = Number.One;
n.Right = node.Left;
}
if (y.Value == 0)
{
return Number.One;
}
if (y.Value == 1)
{
return node.Left;
}
}
}
After the expression tree is parsed, it goes through a simplification phase, where I iterate through it in postfix order, simplifying each node. This is the SimplifyNode() implementation for a Power Node. As the section heading says, these are the easy pickings. If both the argument and the power (a and b) are numbers, I just evaluate the power and replace this node with the numerical result. I also replace some simple powers with a shorter form. This will help simplify the tree, but won't provide any further benefit.
In-Place Expansion
if (y.Value >= 2 && y.Value <= 28)
{
// Iteratively construct multiplication tree to replace power operation.
Node n = new Node(new Multiplication());
n.Left = node.Left;
n.Right = node.Left;
for (int i = 0; i < y.Value - 2; i++)
{
Node nParent = new Node(new Multiplication());
nParent.Left = n;
nParent.Right = node.Left;
n = nParent;
}
return n;
}
Adding this to the SimplifyNode() method will handle the cases of integer powers greater than 2 and less than 28, which is what the earlier table indicated would benefit. It actually converts a Power node into a Multiplication node with nested Multiplication nodes--in essence, converting x^2 into x*x, and x^3 into x*(x*x), etc. Let's see how this compares when I compile the tree and execute it:
Time to evaluate an x^i tree 800,000 times
i | Math.Pow | Mult |
---|
2 | 235 | 99 |
3 | 226 | 103 |
4 | 214 | 116 |
5 | 229 | 111 |
6 | 233 | 115 |
7 | 228 | 155 |
8 | 279 | 157 |
9 | 230 | 149 |
10 | 221 | 133 |
11 | 211 | 137 |
12 | 223 | 147 |
13 | 218 | 186 |
14 | 236 | 175 |
15 | 266 | 161 |
16 | 215 | 197 |
17 | 255 | 203 |
18 | 259 | 212 |
19 | 224 | 177 |
20 | 218 | 197 |
21 | 219 | 186 |
22 | 227 | 193 |
23 | 234 | 235 |
24 | 254 | 240 |
25 | 263 | 254 |
26 | 266 | 257 |
27 | 263 | 271 |
28 | 270 | 266 |
29 | 261 | 269 |
Things are good, as expected. However, there is one huge shortcoming: expanding the tree causes repeated calculations:
x^3 = x*x*x
cos(x)^3 = cos(x)*cos(x)*cos(x)
Time to evaluate an cos(x)^i tree 800,000 times
i | Math.Pow | Mult |
---|
2 | 302 | 204 |
3 | 283 | 253 |
4 | 286 | 303 |
5 | 286 | 365 |
6 | 287 | 419 |
7 | 291 | 482 |
8 | 289 | 534 |
9 | 289 | 595 |
10 | 291 | 645 |
11 | 283 | 739 |
12 | 280 | 739 |
13 | 285 | 798 |
14 | 285 | 845 |
The table above makes it clear that actual tree expansion is a terrible way to convert exponents to multiplications. Instead, I want something like this:
double x = Math.Cos(x);
return x*x*x
This is actually achievable, since I don't need to evaluate the expression being raised to a power more than once. However, my current system doesn't support this. The way I compile from an expression tree into IL is based on the CLR stack, so by the time I'm emitting the power opcodes, the arguments are already on the stack. I need to work within these confines to achieve this.
Enter the PowerExp
I created a new operator, the PowerExp. It won't be created automatically by parsing, but rather by the simplification engine.
Since all the operands are loaded to the stack before the operator, I had to remove the ^2 part from the tree. I could have done a conversion like this:
x^2 => Square(x)
x^3 = Cube(x)
That would require quite a few operators, though. Instead, I made one operator that takes the exponent as a Property. Consider these two methods:
private static Node BuildPowerTree(int n)
{
Node powerNode = new Node(new Power());
powerNode.Left = Parameter.X;
powerNode.Right = Number.Node(n);
return powerNode;
}
private static Node BuildPowerExpTree(int n)
{
Node powerExpNode = new Node(new PowerExp(n));
powerExpNode.Left = Parameter.X;
return powerExpNode;
}
In the second method, I pass n into the operator rather than making it the Right node. This allows me to avoid pushing it to the stack. Now I can implement the PowerExp emitter:
// Eval n^power. n is already on the stack
public override void Emit(System.Reflection.Emit.ILGenerator ilg)
{
ilg.Emit(OpCodes.Stloc_0); // Store n
ilg.Emit(OpCodes.Ldloc_0); // Load n
for (int i = this.Power; i > 1; i--)
{
ilg.Emit(OpCodes.Ldloc_0); // Load n
ilg.Emit(OpCodes.Mul); // Multiply
} // Repeat to achieve power
}
Timings:
Time to evaluate an cos(x)^i tree 800,000 times
i | PowerExp | Math.Pow | Mult |
---|
2 | 183 | 306 | 221 |
3 | 157 | 301 | 270 |
4 | 154 | 304 | 333 |
5 | 160 | 298 | 366 |
6 | 161 | 298 | 432 |
7 | 164 | 299 | 500 |
8 | 165 | 289 | 556 |
9 | 166 | 289 | 601 |
10 | 165 | 290 | 658 |
11 | 170 | 293 | 712 |
12 | 173 | 292 | 770 |
13 | 170 | 291 | 829 |
14 | 178 | 294 | 877 |
15 | 175 | 283 | 909 |
16 | 175 | 288 | 952 |
17 | 184 | 279 | 1007 |
18 | 175 | 285 | 1055 |
19 | 175 | 282 | 1105 |
20 | 181 | 278 | 1172 |
21 | 182 | 281 | 1219 |
22 | 188 | 291 | 1280 |
23 | 186 | 278 | 1330 |
24 | 189 | 281 | 1400 |
25 | 195 | 283 | 1446 |
26 | 196 | 281 | 1499 |
27 | 201 | 285 | 1547 |
28 | 198 | 287 | 1596 |
29 | 200 | 288 | 1670 |
The results basically speak for themselves. In fact, PowerExp is faster than Math.Pow() up to around x^60. That's 59 multiplication opcodes in a row! Plus, this method doesn't clutter my tree with extra nodes. For evaluating f(x)^i...
- Basic multiplying decreases performance due to redundant evaluations—never a good idea
- Math.Pow is constant across f(x) and i (true for small values)
- PowerExp is linear with respect to i, regardless of complexity of f(x)
To summarize, I found a way to greatly improve Power evaluation performance for small integer values. By writing rules into my simplification engine, I can convert certain optimal cases from Power into PowerExp, taking advantage of this speed improvement.