28.9.08

Is C#'s Emitted IL Optimized?

Using .NET's Reflection library to dynamically compile methods at runtime is very powerful. However, I want to know if I have to optimize my opcodes, or if they will be internally simplified if possible. I will test delegates like such:
const int ci = 2*i;
int constDeli () { return ci; }
int mulDeli () { return 2*i; }
int addDeli () { return 21+22+...+2i

For an set of the integer i, I will create a constant delegate that returns the value of 2 times i. I will calculate this while emitting the delegate and inline the value, to avoid any calculations.

I will also create a delegate which actually performs the multiplication of 2 by i and returns the value.

Finally, I will create a delegate which performs i additions of 2, achieving the same value through more operations.

Here's my test code:
class ILOptimizedTest
{
public const int count = 600000000;

public static void RunILOptimizedTest()
{
for (int i = 2; i < 1000; i+=50)
{
Func constDel = GetConstDelegate(i);
Func mulDel = GetMulDelegate(i);
Func addDel = GetAddDelegate(i);

// Run once to get rid of any startup time
double constResult = constDel();
Stopwatch constTime = Stopwatch.StartNew();
for (int c = 0; c < count; c++)
{
constResult = constDel();
}
constTime.Stop();

// Run once to get rid of any startup time
double mulResult = mulDel();
Stopwatch mulTime = Stopwatch.StartNew();
for (int c = 0; c < count; c++)
{
mulResult = mulDel();
}
mulTime.Stop();

// Run once to get rid of any startup time
double addResult = addDel();
Stopwatch addTime = Stopwatch.StartNew();
for (int c = 0; c < count; c++)
{
addResult = addDel();
}
addTime.Stop();

Console.WriteLine("{0}\t{1}\t{2}\t{3}\t{4}\t{5}\t{6}",
i,
constTime.ElapsedMilliseconds,
mulTime.ElapsedMilliseconds,
addTime.ElapsedMilliseconds,
constResult, mulResult, addResult);
}
}

// eval(2*i)
public static Func GetConstDelegate(int i)
{
DynamicMethod m = new DynamicMethod(
"Const" + DateTime.Now.ToString(),
typeof(int),
new Type[] { },
typeof(ILOptimizedTest),
false);
ILGenerator il = m.GetILGenerator();

int total = (int) (2*i);

il.Emit(OpCodes.Ldc_I4, total);
il.Emit(OpCodes.Ret);

Func del = (Func)m.CreateDelegate(
typeof(Func));
return del;
}

// 2*i
public static Func GetMulDelegate(int i)
{
DynamicMethod m = new DynamicMethod(
"Mul" + DateTime.Now.ToString(),
typeof(int),
new Type[] { },
typeof(ILOptimizedTest),
false);
ILGenerator il = m.GetILGenerator();

il.Emit(OpCodes.Ldc_I4, (int)2);
il.Emit(OpCodes.Ldc_I4, i);
il.Emit(OpCodes.Mul);
il.Emit(OpCodes.Ret);

Func del = (Func)m.CreateDelegate(
typeof(Func));
return del;
}

// 2+2+2+2 ... i times
public static Func GetAddDelegate(int i)
{
DynamicMethod m = new DynamicMethod(
"Add" + DateTime.Now.ToString(),
typeof(int),
new Type[] { },
typeof(ILOptimizedTest),
false);
ILGenerator il = m.GetILGenerator();

il.Emit(OpCodes.Ldc_I4, (int)2);
for (int n = i; n > 1; n--)
{
il.Emit(OpCodes.Ldc_I4, (int)2);
il.Emit(OpCodes.Add);
}
il.Emit(OpCodes.Ret);

Func del = (Func)m.CreateDelegate(
typeof(Func));
return del;
}
}

After verifying that all three delegates are returning correct values, I cranked it up and let it go.
























Time to calculate 2*i 600,000,000 times (ms)
iConstMultAdd
2473248054979
52469146895243
102464346705255
152465949405255
202468149465303
252471649905368
302562750685423
352465750085263
402469853165401
452471149855282
502470949755286
552470249605261
602467149755257
652468349905280
702467155625323
752468549785260
802467749725260
852466849595261
902469549645259
952465449605277

The first thing to notice is that these times are a sum of running the delegate 600 million times. So the take home message is..who cares? Closer inspection reveals some problems:

It's obvious that the constant delegate runs faster than the multiplication delegate, which runs faster than the addition delegate. However, it's not clear why. As i increases, the addition delegate should linearly increase due to the increased opcodes—it doesn't. All three are actually pretty constant, which is expected for the first two methods, but not addition.

Overall, I'm not convinced these results show anything. I tried adding a dummy parameter to the delegate, and passing in the loop variable so that the result couldn't be cached, but it had no effect. My only conclusion is that something is being optimized. If it was the IL, then all three of the methods should have taken the same amount of time. Due to the number of iterations performed and the total time taken, I'm pretty confident in the numbers I got—I just don't have a clue what they mean.

The next step is to compile the dynamic method to a DLL, then use Reflector to inspect it to see exactly what is being returned. I could also inspect my loops to see if any of the code there is being optimized away.

So far, results are inconclusive.

Doing Better than Math.Pow()

This is a sequal of sorts to my previous series on expression evaluation and relates to the framework I built there.

The source to C#'s Math.Pow() method isn't publically available, but it probably uses an exponential approximation to calculate powers of numbers, using the following:
a^b = e^(b*ln(a))

This makes it extremely dependable, value-wise and time-wise. A power of .5 takes the same time to calculate as a power of 2, or 2.01, or 53 (This seems to hold for small powers, but not for larger ones. There appears to be a cutoff determined by the combination of a and b, at which point a different method is used, which takes about three times as long).

However, when calculating integer powers, multiplication can also be used, and it is much faster for small integers. Consider the following data:

































Time to calculate 3^i 800,000 times
iControlMath.PowMult
21119019
3816424
4816128
5816533
6816041
7815943
8815948
9915954
10815858
11816565
12815871
13816174
14815979
15815986
16815991
17815895
18815999
198158106
208159113
218159118
228158123
238159126
248158136
259161134
268158145
278159149
288161158
298159158


The first column is the exponent, and other three columns are the total number of milliseconds elapsed while evaluating 3^i 800000 times for each method. The first result is actually a control, where I just call a static method which returns -1. This indicates how long the static method call actually takes, in comparison to the actual calculation.

The second result is the time it took to call Math.Pow(3,i), while the last column is the result of an inline interative multiplication calculation. It's obvious that until the very highest powers tested, inline multiplication is faster--with a power of 2, it's 10x faster. Obviously inline multiplication cannot be used for decimal powers, but it would be great if I could take advantage of this speed for integer powers between 2 and 28 (and -2..-28, by using a^-b = 1/(a^b))

Easy Pickings

public override Node Simplify(Node node)
{
Number x = node.Left.Value as Number;
Number y = node.Right.Value as Number;

if (x != null && y != null)
{
return new Node(new Number(Math.Pow(x.Value, y.Value)));
}

if (y != null)
{
if (y.Value == -1)
{
Node n = new Node(new Division());
n.Left = Number.One;
n.Right = node.Left;
}
if (y.Value == 0)
{
return Number.One;
}
if (y.Value == 1)
{
return node.Left;
}
}
}

After the expression tree is parsed, it goes through a simplification phase, where I iterate through it in postfix order, simplifying each node. This is the SimplifyNode() implementation for a Power Node. As the section heading says, these are the easy pickings. If both the argument and the power (a and b) are numbers, I just evaluate the power and replace this node with the numerical result. I also replace some simple powers with a shorter form. This will help simplify the tree, but won't provide any further benefit.

In-Place Expansion

if (y.Value >= 2 && y.Value <= 28)
{
// Iteratively construct multiplication tree to replace power operation.
Node n = new Node(new Multiplication());
n.Left = node.Left;
n.Right = node.Left;
for (int i = 0; i < y.Value - 2; i++)
{
Node nParent = new Node(new Multiplication());
nParent.Left = n;
nParent.Right = node.Left;

n = nParent;
}
return n;
}

Adding this to the SimplifyNode() method will handle the cases of integer powers greater than 2 and less than 28, which is what the earlier table indicated would benefit. It actually converts a Power node into a Multiplication node with nested Multiplication nodes--in essence, converting x^2 into x*x, and x^3 into x*(x*x), etc. Let's see how this compares when I compile the tree and execute it:































Time to evaluate an x^i tree 800,000 times
iMath.PowMult
223599
3226103
4214116
5229111
6233115
7228155
8279157
9230149
10221133
11211137
12223147
13218186
14236175
15266161
16215197
17255203
18259212
19224177
20218197
21219186
22227193
23234235
24254240
25263254
26266257
27263271
28270266
29261269

Things are good, as expected. However, there is one huge shortcoming: expanding the tree causes repeated calculations:
x^3 = x*x*x
cos(x)^3 = cos(x)*cos(x)*cos(x)
















Time to evaluate an cos(x)^i tree 800,000 times
iMath.PowMult
2302204
3283253
4286303
5286365
6287419
7291482
8289534
9289595
10291645
11283739
12280739
13285798
14285845

The table above makes it clear that actual tree expansion is a terrible way to convert exponents to multiplications. Instead, I want something like this:
double x = Math.Cos(x);
return x*x*x

This is actually achievable, since I don't need to evaluate the expression being raised to a power more than once. However, my current system doesn't support this. The way I compile from an expression tree into IL is based on the CLR stack, so by the time I'm emitting the power opcodes, the arguments are already on the stack. I need to work within these confines to achieve this.

Enter the PowerExp


I created a new operator, the PowerExp. It won't be created automatically by parsing, but rather by the simplification engine.

Since all the operands are loaded to the stack before the operator, I had to remove the ^2 part from the tree. I could have done a conversion like this:
x^2 => Square(x)
x^3 = Cube(x)

That would require quite a few operators, though. Instead, I made one operator that takes the exponent as a Property. Consider these two methods:
private static Node BuildPowerTree(int n)
{
Node powerNode = new Node(new Power());
powerNode.Left = Parameter.X;
powerNode.Right = Number.Node(n);
return powerNode;
}
private static Node BuildPowerExpTree(int n)
{
Node powerExpNode = new Node(new PowerExp(n));
powerExpNode.Left = Parameter.X;
return powerExpNode;
}

In the second method, I pass n into the operator rather than making it the Right node. This allows me to avoid pushing it to the stack. Now I can implement the PowerExp emitter:
// Eval n^power.  n is already on the stack
public override void Emit(System.Reflection.Emit.ILGenerator ilg)
{
ilg.Emit(OpCodes.Stloc_0); // Store n
ilg.Emit(OpCodes.Ldloc_0); // Load n
for (int i = this.Power; i > 1; i--)
{
ilg.Emit(OpCodes.Ldloc_0); // Load n
ilg.Emit(OpCodes.Mul); // Multiply
} // Repeat to achieve power
}

Timings:































Time to evaluate an cos(x)^i tree 800,000 times
iPowerExpMath.PowMult
2183306221
3157301270
4154304333
5160298366
6161298432
7164299500
8165289556
9166289601
10165290658
11170293712
12173292770
13170291829
14178294877
15175283909
16175288952
171842791007
181752851055
191752821105
201812781172
211822811219
221882911280
231862781330
241892811400
251952831446
261962811499
272012851547
281982871596
292002881670

The results basically speak for themselves. In fact, PowerExp is faster than Math.Pow() up to around x^60. That's 59 multiplication opcodes in a row! Plus, this method doesn't clutter my tree with extra nodes. For evaluating f(x)^i...
  1. Basic multiplying decreases performance due to redundant evaluations—never a good idea

  2. Math.Pow is constant across f(x) and i (true for small values)

  3. PowerExp is linear with respect to i, regardless of complexity of f(x)

To summarize, I found a way to greatly improve Power evaluation performance for small integer values. By writing rules into my simplification engine, I can convert certain optimal cases from Power into PowerExp, taking advantage of this speed improvement.

27.9.08

Dynamic C# Compilation

This is a sequal to my previous series on expression evaluation follows up what I talked about there. I recommend reading those articles first.

In my previous post, I determined that the fastest way to repeatedly evaluate a math expression given at runtime was to dynamically compile it, by parsing it into a tree and emitting opcodes. The sample code given in that post is functional, but hardly complete. Having every operation in a giant switch is not practical considering there are over 25 operations that I should be supporting. I need to refactor. I also need a way to enter paramters, so I can graph equations in x and y.

Here's the new version:
public static EvaluatorDelegate GetDynamicMethod(Tree<IToken> e, string[] parameters)
{
DynamicMethod m = new DynamicMethod(
"Evaluation" + DateTime.Now.ToString(),
typeof(double),
new Type[] { typeof(double[]) },
typeof(Evaluator),
false);
ILGenerator il = m.GetILGenerator();

LocalBuilder temp1 = il.DeclareLocal(typeof(double));

foreach (IToken t in e.PostFix)
{
// Try op
IOperator op = t as IOperator;
if (op != null)
{
op.Emit(il);
continue;
}

// Try number
Number n = t as Number;
if (n != null)
{
il.Emit(OpCodes.Ldc_R8, n.Value);
continue;
}

// Try parameter
Parameter p = t as Parameter;
if (p != null)
{
// Check to see if we know which parameter that is.
int pindex = Array.IndexOf<string>(parameters, p.Name);
if (pindex < 0)
{
throw new ApplicationException(
"Did not receive information on parameter: " + p.ToString());
}

il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldc_I4, pindex);
il.Emit(OpCodes.Ldelem_R8);

continue;
}

throw new ApplicationException(
"Invalid parse tree element: " + t.ToString());
}

il.Emit(OpCodes.Ret);

EvaluatorDelegate del = (EvaluatorDelegate)m.CreateDelegate(
typeof(EvaluatorDelegate));
return del;
}


The first thing to notice is that the large operator switch has been replaced with some type switching. I have a base interface IOperator, which takes care of emitting its own opcodes.

Numbers are emitted the same as before—it just pushes them to the stack as a 64-bit floating-point type.

Parameters are handled through an array of doubles. The method now takes a parameter array which is used as a reference. To make this a bit easier to understand, I'll start with how this code is used:
EvaluatorDelegate del = GetDynamicMethod(tree, new string[] { "x", "y" });
double result = del(new double[] { x, y });

The delegate created by this method takes a double array whose elements correspond to the string array passed into GetDynamicMethod(). Looking at the code, when it finds a parameter, it checks the string array for the name of the parameter and saves that index. Then it loads the parameter value from the double array with the following:
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldc_I4, pindex);
il.Emit(OpCodes.Ldelem_R8);

These opcodes push the double array (which is passed into the delegate as the first method argument) to the stack, followed by the desired index (found by searching the string array). The last opcode pops both of those and replaces them with the desired array element.

This could probably be improved by creating a delegate with an argument for each desired parameter. The DynamicMethod constructor takes a Type[] detailing these, so I could create that from the string array. This would remove the array indexing and reduce the opcodes to a single load—increasing performance. I'll do that later.

That's basically the only changes to this method. The nature of postfix notation—where an operator comes after its arguments, is exactly how .NET's CLR works. Assuming an expression is syntactically correct and parsed to a tree correctly, an operator is guaranteed to have its operands already on the stack by the time Emit() is called. Look at this implementation:
public class Addition : BaseOperator
{
public override void Emit(System.Reflection.Emit.ILGenerator ilg)
{
ilg.Emit(OpCodes.Add);
}
}

Painfully easy. Method calls are also possible:
public class Sine : BaseFunction
{
public override void Emit(System.Reflection.Emit.ILGenerator ilg)
{
ilg.Emit(OpCodes.Call, typeof(System.Math).GetMethod("Sin"));
}
}

Some are not quite as simple:
public class Secant : BaseFunction
{
public override void Emit(System.Reflection.Emit.ILGenerator ilg)
{
ilg.Emit(OpCodes.Call, typeof(System.Math).GetMethod("Cos"));
ilg.Emit(OpCodes.Stloc_0);
ilg.Emit(OpCodes.Ldc_R8, 1.0D);
ilg.Emit(OpCodes.Ldloc_0);
ilg.Emit(OpCodes.Div);
}
}

Secant is defined as 1/cos(x), so I evaluate cosine, then store that in the first local variable (this is the temp1 variable created above). I push 1 on the stack, followed by the temp1 variable, and divide. The Div opcode returns double.NaN when dividing by zero, so no error checking is required.

21.9.08

Translating Manually-Buffered Graphics

This is a sequel of sorts to my previous post on double-buffering and builds on what I did there.

When I was 14 or so, I wrote a game in Visual Basic 3. It was a snake game, very originally named Nybbles, where multiple players steered their hungry, ever-increasing snake around the board to eat targets. Each target caused the snake to increase in size, making it more difficult to move around the board without hitting itself (or the other players!). When I was making this, I ran into a classic redraw problem: it took too long to redraw the entire board.

Perhaps it wasn't the most efficient method, but I stored the snakes' position on the game board in a large 2-dimensional array, where each position had a number, 1-4 for a player, or 0 if the square was blank. I used this for drawing the board and collision detection. Drawing the board was technically simple: iterate through every square and draw the correct colour segment in each one.

Of course, since each snake moved 1 square per turn, I quickly realized I could merely draw the new head segment, and redraw the tail segment as a normal square. This method only required drawing two squares per snake, which was much more responsive.

Fast-forward a decade or so, and I have returned to the same problem. I'm working on my math graphing program, and just added zoom and translation control via the mouse.



The goal: To enable dragging the graph around to view a different part of it.

The problem: Recalculating the points and redrawing the entire screen can potentially be too slow.

The solution: Translate the valid portion of the already-drawn graph, and invalidate and redraw the rest. Since the MouseMove event can be fired for every pixel change if the mouse is moving slow enough, this should provide serious time savings.



Here's a basic look at step 1, which is shown above. This graph was dragged to the down and right, leaving the black area as an invalided region. First I need to translate the graph in the direction of the mouse movement, and second, I can fill in the black area to complete the graph.
protected override void OnMouseMove(MouseEventArgs e)
{
if (this.MouseMode == MouseMode.Move && e.Button == MouseButtons.Left)
{
int xOffset = (e.X - this.startDragLocation.X);
int yOffset = (e.Y - this.startDragLocation.Y);

Rectangle translated = new Rectangle(new Point(xOffset, yOffset), this.Size);
using (BufferedGraphics newBuffer =
this.context.Allocate(this.buffer.Graphics, translated))
{
// Draw the current image translated into the newBuffer,
// then copy it back, but with everything shifted.
this.buffer.Render(newBuffer.Graphics);
newBuffer.Render(this.buffer.Graphics);
}

using (Graphics graphics = this.CreateGraphics())
{
buffer.Render(graphics);

This is (slightly simplified) my code for moving the buffer. I create a new buffer with the offset passed in as the targetRectangle, and when I render the old buffer into it, it draws it with an offset. Then I render that back into the original buffer, overwriting the previous image. After this swap, the original buffer has been translated, so I can discard the new buffer.

This may not be the best or most efficient way to achieve this, but it's the only way I could find. From reading the documentation on MSDN, one would have have a hard time believing this actually works..but it does. I looked into the TranslateTransform of the buffer's graphics, but that didn't yield any results. This appears to be very fast, so until I hear otherwise, I'm keeping it.

Rendering the original (now current) buffer to the screen gives something like the screenshot above, except remnants of the previous screen is visible instead of the black border. This is the next phase, where I draw the invalidated regions.
Region invalidedRegion = new Region(new Rectangle(Point.Empty, this.Size));
invalidedRegion.Exclude(translated);

this.Invalidate(invalidedRegion); <---

The region to invalidate is pretty easy to calculate. I start with the footprint of the control, and exclude the location of the translated image. I planned to use the built-in handling for invalidated regions, but it turned out to be too slow. The issue is simply that it queues up OnPaint() messages to redraw all the rectangles contained in the region--but these will execute after any pending OnMouseMove messages. Rapid movement will cause multiple OnMouseMove events, and the invalid region for each needs to be redrawn in sync with the location of the graph.
RectangleF[] invalidRectangles = invalidedRegion.GetRegionScans(new Matrix());
foreach (RectangleF ir in invalidRectangles)
{
this.OnPaint(new PaintEventArgs(graphics,
new Rectangle((int)ir.X, (int)ir.Y,
(int)ir.Width, (int)ir.Height)));
}

This piece of code replaces the Invalidate() call. It breaks up the region into rectangles (only 1 or 2 rectangles can be created from the moves we are doing), and forces an immediate OnPaint() of that rectangle.

Of course, this entire operation requires me to rewrite the DrawAxis and DrawGraph methods to take a region, otherwise they will redraw everything, which defeats the purpose of these changes. OnPaint takes a PaintEventArgs which contains a clipping rectangle--the same one I constructed above. I will pass that to the Draw methods, so that they draw everything whenever the entire control is invalidated, and they will redraw selectively when dragging the image around.

The result: This is much faster than before. There is still some fine-tuning required, as it slows down a bit if I maximize the window. This causes some glitching at the edges, but overall it looks pretty smooth thanks to the double buffering.

20.9.08

Manual Double-Buffering in C#

I believe double buffering is enabled by default on most .NET controls, which is convenient for the majority of simple graphics. I intend to draw once and use repeatedly, which makes manual buffering a requisite.

This program plots mathematical equations (i.e. y=sin(x)) on the surface of a custom Control. The actual rendering of a plot could take from 14ms to 200ms or even more, depending on the quality and complexity. However, I want to draw tangent lines to follow the curve in real-time--I cannot afford 14ms between MouseMove events.



The process is simple:
  1. Render the graph to a buffer stored in memory

  2. Copy the buffer to the screen

  3. When the mouse is clicked and moved, re-copy the buffer to the screen, then draw a new tangent line


Here's an excerpt from the constructor. Note, this class extends Control.
private BufferedGraphicsContext context;
private BufferedGraphics buffer;

public GraphPanel()
{
this.DoubleBuffered = false;
this.SetStyle(ControlStyles.UserPaint, true);
this.SetStyle(ControlStyles.OptimizedDoubleBuffer, true);
this.SetStyle(ControlStyles.AllPaintingInWmPaint, true);

this.context = new BufferedGraphicsContext();
using (Graphics graphics = this.CreateGraphics())
{
this.buffer = this.context.Allocate(graphics,
new Rectangle(new Point(0, 0), this.Size));
}
}

The first four lines disable automatic double buffering. I found out the hard way that all of these are actually needed.

The context allows me to manage my buffers. I use it to create a new buffer which matches the graphics of the Control, and size it to the control. I can just reuse this buffer every time I recalculate the graph, or when I'm drawing tangent lines on top of it.

Of course, I need to put similar code in the OnResize method, so I can adjust the buffer size when the window is resized.
protected override void OnPaint(PaintEventArgs e)
{
this.buffer.Graphics.FillRectangle(Brushes.White, this.ClientRectangle);

this.DrawAxis();
this.DrawGraph();

this.buffer.Render(e.Graphics);
}

The OnPaint gets fired everytime the window (or any subregion) gets invalidated. Since I set all those styles in the constructor, this is the place for things to get drawn. This event occurs when the program starts up, is resized, comes back from being minimized, or if it is invalidated programatically. It will not be fired if I move the window around, or click on things in it. Consequently, I will do a complete recalculation and drawing of the axis and graph in this method. This is important to keep in mind, since this is the most time-intensive part of the entire process.

The first thing I do is whitewash the buffer. Then I have two methods, DrawAxis and DrawGraph, which do exactly what they say. They are written to use this.buffer as well, so by the time they are finished, I can copy the buffer to the screen using the Graphics object provided in the EventArgs.
protected override void OnMouseMove(MouseEventArgs e)
{
if (this.MouseMode == MouseMode.Tangent && e.Button == MouseButtons.Left)
{
// Restore the clean version of the graph
using (Graphics graphics = this.CreateGraphics())
{
this.buffer.Render(graphics);
}

// Verify mouse is inside control
if (this.ClientRectangle.Contains(e.Location))
{
// Plot a line tangent to the graph at the location of the mouse
this.DrawTangentLine(Graph.ConvertToRealCoords(e.Location));
}
}
}

If the user presses the left button and moves the mouse, I want to draw a tangent line. I have to ensure the left button is pressed, since the same event is also fired if the mouse is moving idly across the screen. The first step is to copy the graph from the buffer onto the screen, to remove any previous tangent line. Next, I verify the mouse is inside the control and draw the tangent. The DrawTangentLine method, unlike the DrawGraph and DrawAxis methods, does not draw to the buffer, but rather to the screen directly.

The screenshot above shows this code in action. Using a buffer like this reduces drawing a tangent to just copying the buffer and drawing a line--of course I have to calculate the tangent, but the majority of the work has been removed.

Generating C# at Runtime: Casting Your Args

I spent the morning (like so many morning before this one), trying to debug some Reflection.Emit code. Naturally, the actual implementation is so much more complicated than this example I whipped up, which explains why it took so long to isolate the issue. Take a look at this sample code:

DynamicMethod m = new DynamicMethod("Demo", typeof(double), 
new Type[] { typeof(double[]) }, typeof(Program), false);
ILGenerator il = m.GetILGenerator();

il.Emit(OpCodes.Ldc_R8, 1.0);
il.Emit(OpCodes.Ldc_R8, 1); // <---
il.Emit(OpCodes.Add);
il.Emit(OpCodes.Ret);

Func del = (Func)m.CreateDelegate(typeof(Func));
Console.WriteLine("Result: " + del.Invoke());


When executed, the dynamically-compiled delegate throws an InvalidProgramException, with the message, "Common Language Runtime detected an invalid program." The problem here is that the literal 1 is not actually a 64-bit floating-point number. This can be quickly rectified by either casting 1 as a double, or using type specifiers such as 1.0 or 1D.

That's basically the moral of this story. Perhaps Emit is at fault, in that it doesn't check or typecast arguments for you, but until that gets fixed, if ever, make sure you type your arguments correctly!

I was further surprised while debugging this issue, when it allowed to me to change the Ldc_R8 to Ldc_I4, and keep the value of 1. According to the documentation on .Add, adding floats and integers is not allowed and SHOULD throw an exception. However, it does not.

il.Emit(OpCodes.Ldc_I4, 1);
il.Emit(OpCodes.Ldc_R8, 1D);


Perhaps there is another explaination for this, or perhaps the documentation is just plain incorrect.