Saturday, November 29, 2008

IronPython and Parallel Imports

In the latest release of Resolver One we improved startup time by about 20-30%. Most of the startup time in Resolver One is spent in importing Python modules, which is considerably more expensive in IronPython 1 than it is in CPython (even when compiled in binary form). We achieved most of our speedup by delaying the creation of certain objects until they are needed (if ever), and other standard techniques for performance improvement.
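As a rough illustration of what delaying object creation can look like in Python, here is a minimal sketch. It is not Resolver One's actual code: the `Lazy` descriptor, the `Workbook` class and its `formula_engine` attribute are all hypothetical names invented for this example.

```python
class Lazy(object):
    """Non-data descriptor: build a value on first access, then cache it."""

    def __init__(self, factory):
        self.factory = factory
        self.name = factory.__name__

    def __get__(self, instance, owner):
        if instance is None:
            return self
        value = self.factory(instance)
        # Store the result in the instance dict. Because this descriptor
        # defines no __set__, later lookups find the cached value directly
        # and never call the factory again.
        instance.__dict__[self.name] = value
        return value


class Workbook(object):
    """Hypothetical class with one expensive attribute."""

    build_count = 0  # counts how often the expensive object is built

    @Lazy
    def formula_engine(self):
        # Stand-in for something costly to construct at startup.
        Workbook.build_count += 1
        return {"functions": ["SUM", "AVERAGE"]}
```

Creating a `Workbook` costs nothing extra at startup; the engine is only built the first time `workbook.formula_engine` is read, and never if nothing asks for it.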

There is good news for IronPython 2. Binary compilation is much more efficient: we can compile multiple packages into a single binary and then ngen it (pre-JIT it to native code), which makes imports faster.

Whilst exploring how to improve our startup time, Kamil Dworakowski experimented with a system for performing parallel imports on multiple threads. IronPython doesn't have the same import lock as CPython, so you need to ensure that parallel imports don't pull in the same modules simultaneously, or some of the imports may fail. To get around this, Kamil created a modified modulefinder that analyses a codebase and generates a dependency graph for all the imports (done at 'compile time', not at runtime). So long as you start at the leaves of the graph and work upwards, you know you are safe. You can then configure how many threads the parallel importer should use in its thread pool.
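To make the leaves-first idea concrete, here is a minimal sketch in modern Python. It is not Kamil's code: the function name `import_in_waves`, its parameters and the use of `concurrent.futures` are assumptions made for illustration, and the dependency graph is taken as a ready-made dict rather than generated by a modulefinder. Each "wave" contains only modules whose dependencies have already been imported, so no two threads ever pull in the same module at once.

```python
import importlib
from concurrent.futures import ThreadPoolExecutor


def import_in_waves(dependencies, num_threads=4,
                    importer=importlib.import_module):
    """Import modules leaves-first, one wave at a time.

    dependencies maps each module name to the set of module names it
    imports. Returns the list of waves in the order they were imported.
    """
    remaining = {name: set(deps) for name, deps in dependencies.items()}
    # Modules that only appear as dependencies are leaves as well.
    for deps in dependencies.values():
        for dep in deps:
            remaining.setdefault(dep, set())

    done = set()
    waves = []
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        while remaining:
            # A module is ready once everything it imports is already loaded.
            ready = sorted(name for name, deps in remaining.items()
                           if deps <= done)
            if not ready:
                raise ValueError("circular import among: %s"
                                 % sorted(remaining))
            # Import the whole wave in parallel; list() waits for all of
            # them and re-raises any import error.
            list(pool.map(importer, ready))
            done.update(ready)
            for name in ready:
                del remaining[name]
            waves.append(ready)
    return waves
```

A circular import leaves no module ready, so the sketch raises rather than guessing an order; in practice such cycles have to be broken or the graph adjusted by hand. Importing strictly wave by wave is the simplest safe schedule, though a real importer could start a module as soon as its own dependencies finish instead of waiting for the whole wave.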

On a multi-core machine this yields substantial improvements in import time. Unfortunately there are a few thread-safety issues with importing in IronPython 1, which led to intermittent crashes when starting Resolver One. We weren't able to use Kamil's code for Resolver One 1.3, but the specific problems we encountered are fixed in IronPython 2, so we might be able to use it in our next release...

Even though we haven't yet been able to take advantage of Kamil's work, others have. Dan Eloff, who is building a Silverlight gaming platform with IronPython and C#, has used it. He had this to say on the IronPython mailing list:

Thanks to the excellent work of Kamil and the help of Jimmy I now have importing being done in parallel in 4 threads in Silverlight.

I had to restructure my code a little to reduce circular imports (not a bad thing to do anyway) and I had to hand-tweak the dependency graph to deal with conditional imports and other difficult to track

The result was better than I could have hoped for. I'm seeing 43% faster loading on my dual-core development pc. I used to have blank screen at start that made you wonder if the browser has frozen. But now the browser is responsive and I have a nice working progress bar. But that's not the best news.

The kicker is that the benefit is even larger on slow single core machines with dialup internet. I posted a while back on this list that IE does not immediately send an asynchronous request. It waits until its UI thread is idle (which doesn't happen if your importing modules like crazy on it.) So the net effect was the application loads (about 28 seconds on a P4 2ghz) and then it sends the request, waits for the response (which is big in my case), and processes it - all in order.

So if that takes another 30 seconds, you have nearly 1 minute of load time. With the UI thread mostly idle now, this happens in parallel to the importing, and the overall load time drops by a whopping 45%.

In the words of my generation - w00t!
