Sean Blakemore's Blog

Like trying to fit a square peg in a round hole

5. April 2012 16:20
by Sean
0 Comments

Doing indecent things with Mono.Cecil - An alternative to ILMerge for WPF

Background

ILMerge is a great tool for merging .NET assemblies. When we want to deploy an application as a single executable file, we can merge in the dependencies and we’re good to go.

Unfortunately ILMerge doesn’t work on WPF applications because according to the ILMerge page on Microsoft Research:

They contain resources with encoded assembly identities. ILMerge is unable to deserialize the resources, modify the assembly identities, and then re-serialize them.

The Fix

An alternative has been floating around for some time, which can be seen here. In short, you add the DLLs your application depends upon as embedded resources and then load them in a handler for the AppDomain’s AssemblyResolve event. Here is the code reproduced; it’s pretty short and to the point:

AppDomain.CurrentDomain.AssemblyResolve += (sender, args) => 
{
   // Resource names are prefixed with the project's default namespace.
   String resourceName = "AssemblyLoadingAndReflection." +
      new AssemblyName(args.Name).Name + ".dll";

   using (var stream = Assembly.GetExecutingAssembly()
                          .GetManifestResourceStream(resourceName)) 
   {
      // No embedded resource with that name: we cannot resolve this assembly.
      if (stream == null) return null;

      Byte[] assemblyData = new Byte[stream.Length];
      stream.Read(assemblyData, 0, assemblyData.Length);
      return Assembly.Load(assemblyData);
   }
};

The End?

Great, so why am I writing another blog post about this? Well, there are a few problems with this solution on its own.

  1. Adding these assemblies to your Visual Studio solution and setting them to Embedded Resource feels pretty messy and on a large project you can easily build up quite a few referenced DLLs. This is even worse when trying to embed libraries which are a project in the same solution.
  2. You can’t easily add the AssemblyResolve hook in your WPF application early enough, before it tries to load your referenced libraries. You have to shim the WPF App class with another class that has a Main method, make this the program startup object, then from there you can add the hook and run the WPF App.Main method. Obviously not ideal.
  3. If you’re distributing a library rather than an application, and have other libraries merged in as resources, there isn’t anywhere you can properly attach the hook and have everything work as expected.
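To make point 2 concrete, the shim might look roughly like the sketch below. This is only an illustration, not code from the original sample: the StartupShim class, its Run method and the "MyApp." resource prefix are all hypothetical names, and the resolver is the same embedded-resource lookup shown earlier.

```csharp
using System;
using System.IO;
using System.Reflection;

// Sketch of the startup shim from point 2. "MyApp." is a hypothetical resource
// prefix; Visual Studio normally uses the project's default namespace.
public static class StartupShim
{
    private const string ResourcePrefix = "MyApp.";

    // Looks for the requested assembly among this assembly's embedded resources.
    public static Assembly OnAssemblyResolve(object sender, ResolveEventArgs args)
    {
        string resourceName = ResourcePrefix + new AssemblyName(args.Name).Name + ".dll";

        using (Stream stream = Assembly.GetExecutingAssembly()
                                       .GetManifestResourceStream(resourceName))
        {
            // Not embedded here: returning null tells the runtime this handler
            // could not resolve the assembly.
            if (stream == null)
                return null;

            using (var buffer = new MemoryStream())
            {
                stream.CopyTo(buffer);
                return Assembly.Load(buffer.ToArray());
            }
        }
    }

    // Attach the hook first, then hand control to the real application.
    public static void Run(Action startApp)
    {
        AppDomain.CurrentDomain.AssemblyResolve += OnAssemblyResolve;
        startApp();
    }
}
```

In a WPF project the new startup object would have an [STAThread] Main method that calls something like StartupShim.Run(() => App.Main()). That extra class is exactly the ceremony the rest of this post tries to get rid of.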

The Redux

There is a little-known feature of the CLR called a module initializer: a global function that runs when a managed module first loads and is guaranteed to run before any other code in that module. This looks like the perfect place to hook the AssemblyResolve event. It would allow us to use this technique in libraries, and it would also remove the need to shim the WPF App class.

Wait a second, that sounds really neat, so how come I’ve never heard of this before? Simple: although module initializers are a feature of the CLR, they are not exposed in C#. There is no way to create them! We can still use them though, by injecting the IL for the implementation using Mono.Cecil. There is a great post about how to do that here.

Things get slightly hairy when you start talking about rewriting assemblies and injecting IL but it really isn’t too bad and Mono.Cecil makes things easy.

But just hang on one second. If we’re doing this post-build and modifying the assembly, why can’t we also inject the implementation of the AssemblyResolve handler, not just the hook? For that matter, why don’t we embed the assemblies at this point as well? If we could do all this, surely we could take a pre-built application which has no special code to enable any of this, just as when using ILMerge, then inject the required code, embed the assemblies, and everything should just work!

The Code

At this point I think I’m just going to dump the code on you. There isn’t too much of it, but it does deal with some complex things, so you can dig into it if you want to. All up there are about 150 lines of code, which is pretty nuts if you think about what it’s actually doing…

First up we have a console application with the following Main method:

static void Main(string[] args)
{
    // Resolve to a full path so GetDirectoryName never returns an empty string.
    var path = Path.GetFullPath(args[0]);
    var dlls = Directory.GetFiles(Path.GetDirectoryName(path), "*.dll");
    var targetAssembly = AssemblyDefinition.ReadAssembly(path);

    var packer = new Packer(targetAssembly);
    packer.Embed(dlls);
    packer.Inject();

    targetAssembly.Write(path);
}

So we expect to be passed the path to an executable and we assume that we should merge all of the DLL files in the same directory as the executable. Then we use the nifty Packer class to embed the DLLs and inject the required IL.

Take a deep breath because here is the Packer class in full:

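using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reflection;
using Mono.Cecil;
using Mono.Cecil.Cil;
// Both System.Reflection and Mono.Cecil define MethodAttributes; we want Cecil's.
using MethodAttributes = Mono.Cecil.MethodAttributes;

internal class Packer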
{
    private readonly AssemblyDefinition assembly;

    public Packer(AssemblyDefinition assembly)
    {
        this.assembly = assembly;
    }

    public void Embed(IEnumerable<string> files)
    {
        foreach (var file in files)
        {
            var data = File.ReadAllBytes(file);
            var resourceName = string.Format("{0}.{1}", assembly.Name.Name, Path.GetFileName(file));
            var resource = new EmbeddedResource(resourceName, ManifestResourceAttributes.Private, data);
            assembly.MainModule.Resources.Add(resource);
        }
    }

    public void Inject()
    {
        var assemblyResolve = DefineOnAssemblyResolveMethod();
        var ctor = DefineModuleCtor(assemblyResolve);

        var moduleType = assembly.MainModule.Types.Single(x => x.Name == "<Module>");
        moduleType.Methods.Add(assemblyResolve);
        moduleType.Methods.Add(ctor);
    }

    private MethodDefinition DefineOnAssemblyResolveMethod()
    {
        var method = new MethodDefinition("OnAssemblyResolve",
            MethodAttributes.Private |
            MethodAttributes.HideBySig |
            MethodAttributes.Static,
            ImportType<Assembly>());
        method.Parameters.Add(new ParameterDefinition(ImportType<object>()));
        method.Parameters.Add(new ParameterDefinition(ImportType<ResolveEventArgs>()));

        method.Body.Variables.Add(new VariableDefinition(ImportType<Assembly>()));
        method.Body.Variables.Add(new VariableDefinition(ImportType<string>()));
        method.Body.Variables.Add(new VariableDefinition(ImportType<Stream>()));
        method.Body.Variables.Add(new VariableDefinition(ImportType<byte[]>()));
        method.Body.InitLocals = true;

        var il = method.Body.GetILProcessor();
        il.Emit(OpCodes.Call, ImportMethod<Assembly>("GetEntryAssembly"));
        il.Emit(OpCodes.Stloc_0);
        il.Emit(OpCodes.Ldstr, "{0}.{1}.dll");
        il.Emit(OpCodes.Ldloc_0);
        il.Emit(OpCodes.Callvirt, ImportMethod<Assembly>("GetName", new Type[0]));
        il.Emit(OpCodes.Callvirt, ImportMethod<AssemblyName>("get_Name"));
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Callvirt, ImportMethod<ResolveEventArgs>("get_Name"));
        il.Emit(OpCodes.Newobj, ImportCtor<AssemblyName>(typeof(string)));
        il.Emit(OpCodes.Call, ImportMethod<AssemblyName>("get_Name"));
        il.Emit(OpCodes.Call, ImportMethod<string>("Format", typeof(string), typeof(object), typeof(object)));
        il.Emit(OpCodes.Stloc_1);
        il.Emit(OpCodes.Ldloc_0);
        il.Emit(OpCodes.Ldloc_1);
        il.Emit(OpCodes.Callvirt, ImportMethod<Assembly>("GetManifestResourceStream", typeof(string)));
        il.Emit(OpCodes.Stloc_2);
        il.Emit(OpCodes.Ldloc_2);
        // Create the "return null" target once, so the branch below and the
        // instruction appended at the end are the same Instruction object.
        var returnNull = il.Create(OpCodes.Ldnull);
        il.Emit(OpCodes.Brfalse_S, returnNull);
        il.Emit(OpCodes.Ldloc_2);
        il.Emit(OpCodes.Callvirt, ImportMethod<Stream>("get_Length"));
        il.Emit(OpCodes.Conv_Ovf_I);
        il.Emit(OpCodes.Newarr, ImportType<byte>());
        il.Emit(OpCodes.Stloc_3);
        il.Emit(OpCodes.Ldloc_2);
        il.Emit(OpCodes.Ldloc_3);
        il.Emit(OpCodes.Ldc_I4_0);
        il.Emit(OpCodes.Ldloc_3);
        il.Emit(OpCodes.Ldlen);
        il.Emit(OpCodes.Conv_I4);
        il.Emit(OpCodes.Callvirt, ImportMethod<Stream>("Read", typeof(byte[]), typeof(int), typeof(int)));
        il.Emit(OpCodes.Pop);
        il.Emit(OpCodes.Ldloc_2);
        il.Emit(OpCodes.Callvirt, ImportMethod<Stream>("Dispose"));
        il.Emit(OpCodes.Ldloc_3);
        il.Emit(OpCodes.Call, ImportMethod<Assembly>("Load", typeof(byte[])));
        il.Emit(OpCodes.Ret);
        il.Append(returnNull);
        il.Emit(OpCodes.Ret);

        return method;
    }

    private MethodDefinition DefineModuleCtor(MethodDefinition assemblyResolveMethod)
    {
        var ctor = new MethodDefinition(".cctor",
            MethodAttributes.Static |
            MethodAttributes.SpecialName |
            MethodAttributes.RTSpecialName,
            assembly.MainModule.Import(typeof (void)));

        var il = ctor.Body.GetILProcessor();
        il.Emit(OpCodes.Call, ImportMethod<AppDomain>("get_CurrentDomain"));
        il.Emit(OpCodes.Ldnull);
        il.Emit(OpCodes.Ldftn, assemblyResolveMethod);
        il.Emit(OpCodes.Newobj, ImportCtor<ResolveEventHandler>(typeof(object), typeof(IntPtr)));
        il.Emit(OpCodes.Callvirt, ImportMethod<AppDomain>("add_AssemblyResolve"));
        il.Emit(OpCodes.Ret);

        return ctor;
    }

    private TypeReference ImportType<T>()
    {
        return assembly.MainModule.Import(typeof(T));
    }
    private MethodReference ImportMethod<T>(string methodName)
    {
        return assembly.MainModule.Import(typeof(T).GetMethod(methodName));
    }
    private MethodReference ImportMethod<T>(string methodName, params Type[] types)
    {
        return assembly.MainModule.Import(typeof(T).GetMethod(methodName, types));
    }
    private MethodReference ImportCtor<T>(params Type[] types)
    {
        return assembly.MainModule.Import(typeof(T).GetConstructor(types));
    }
}

If you just let your eyes glaze over the il.Emit calls, this is pretty easy to follow. You can just assume that all the IL is the implementation of the AssemblyResolve handler and the hook for it in the module initializer. We just add the definitions of these methods, add the DLLs as resources, and Mono.Cecil does the rest.
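If you would rather not read IL at all, the emitted instructions correspond roughly to the C# below. Treat it as my reading of the il.Emit calls rather than decompiler output. The InjectedCodeSketch class and the ResourceNameFor helper are names made up for this illustration; the real methods live on the <Module> type and the string formatting is inlined.

```csharp
using System;
using System.IO;
using System.Reflection;

// Approximate C# equivalent of the IL emitted by Packer. The class name is
// illustrative only; the real methods are injected into the <Module> type.
public static class InjectedCodeSketch
{
    // Same format string as the Ldstr in the emitted IL.
    public static string ResourceNameFor(Assembly host, string requestedFullName)
    {
        return string.Format("{0}.{1}.dll",
            host.GetName().Name,
            new AssemblyName(requestedFullName).Name);
    }

    // Counterpart of the injected OnAssemblyResolve method.
    public static Assembly OnAssemblyResolve(object sender, ResolveEventArgs args)
    {
        Assembly host = Assembly.GetEntryAssembly();
        string resourceName = ResourceNameFor(host, args.Name);
        Stream stream = host.GetManifestResourceStream(resourceName);
        if (stream == null)
            return null;

        byte[] data = new byte[stream.Length];
        stream.Read(data, 0, data.Length);
        stream.Dispose();
        return Assembly.Load(data);
    }

    // Counterpart of the injected module initializer (.cctor on <Module>).
    public static void ModuleInitializer()
    {
        AppDomain.CurrentDomain.AssemblyResolve += new ResolveEventHandler(OnAssemblyResolve);
    }
}
```

For a host assembly named MyApp (a hypothetical name) and a request for Mono.Cecil, the lookup name is MyApp.Mono.Cecil.dll. That is the same name Embed produces for a file called Mono.Cecil.dll, which is why the two halves line up.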

The proof is in the pudding

I was able to run the above code against a moderately large WPF application which has been under development for quite some time, and in production for over 6 months, without any thought of merging dependencies. The EXE file got considerably larger. I then deleted all the DLLs from the directory and somewhat to my surprise the application ran without any issues!
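A less dramatic check than deleting the DLLs is to list what actually ended up inside the packed assembly. The sketch below uses plain reflection for that. PackedAssemblyInspector is a hypothetical helper written for this post, not part of the sample.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical helper: lists the manifest resources that look like embedded DLLs.
public static class PackedAssemblyInspector
{
    public static string[] EmbeddedDllNames(Assembly assembly)
    {
        return assembly.GetManifestResourceNames()
            .Where(name => name.EndsWith(".dll", StringComparison.OrdinalIgnoreCase))
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToArray();
    }
}
```

Passing it the packed EXE, for example one loaded with Assembly.LoadFrom, should return one entry per DLL that was sitting next to the executable when the packer ran.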

The ultimate test of something like this is whether it can merge itself. Obviously the Packer class makes heavy use of Mono.Cecil to get the job done, so my console application has a reference to that DLL. Happily, I can report that running this 150-line application on itself successfully produces a single EXE file with Mono.Cecil packed inside, and everything works as it should.

Conclusion

I was actually playing around with this almost exactly a year ago. When I ran across the code again, I thought it might be a bit of fun to write it up as a blog post. It has no real error handling and has not been thoroughly tested or used in production. As such, please consider it a sample and proof of concept only. Don’t blame me if it kills your pet or anything.

Here is the sample:
