Microsoft introduced a new set of libraries, diagnostic tools, and runtime features in .NET 4.0 to enhance support for parallel computing. The main objective of these features is to simplify parallel development, i.e., to let developers write parallel code in a natural idiom without having to work directly with threads. For .NET 4.5, Microsoft has been working on ways to improve the performance of parallel applications, specifically those using the Task Parallel Library. Here is a preview of what you can expect to see:



Task, Task<TResult>
At the core of .NET’s parallel programming APIs is the Task object. Because this class is so important, Microsoft took great pains to ensure it is as small as possible. Most of the properties for Task are stored not in the class itself, but rather in a secondary object called ContingentProperties. This secondary object is created on an as-needed basis, thus reducing the memory footprint for the most common scenarios.

When .NET 4.0 was released, the most common scenario was fork-join style programming, as seen with Parallel.ForEach and Parallel LINQ. With .NET 4.5 and the introduction of async, continuation-style programming takes the forefront. Microsoft is so confident that this will be the predominant style that they are moving ContinuationObject into Task and the other fields into ContingentProperties. This makes continuations faster and the Task object smaller.

The net result was a 49 to 55% reduction in the time it takes to create a Task<Int32> and a 52% reduction in size.


Task.WaitAll, Task.WaitAny
Imagine waiting for 100,000 tasks at the same time. On an x64 machine that would introduce 12,000,000 bytes of overhead above and beyond the size of the tasks themselves. With .NET 4.5 that overhead has dropped to a mere 64 bytes. WaitAny likewise dropped from 23,200,000 bytes of overhead to 152 bytes.

This dramatic change came about due to a change in how kernel synchronization primitives are used. In previous versions one primitive was needed per task. This has been reduced to one per wait operation, regardless of the number of tasks involved.
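
As a rough illustration of the scenario described above, the following sketch starts 100,000 trivial tasks and blocks on all of them with a single Task.WaitAll call. The names WaitAllSketch and RunAndWait are invented for this example. The code only shows the calling pattern; it does not measure the memory overhead quoted above.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class WaitAllSketch
{
    // Starts `count` trivial tasks, waits for all of them in one call,
    // and returns the sum of their results.
    public static int RunAndWait(int count)
    {
        Task<int>[] tasks = Enumerable.Range(0, count)
            .Select(_ => Task.Factory.StartNew(() => 1))
            .ToArray();

        // One wait operation covering every task in the array.
        Task.WaitAll(tasks);

        return tasks.Sum(t => t.Result);
    }

    public static void Main()
    {
        Console.WriteLine(RunAndWait(100000));
    }
}
```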

ConcurrentDictionary

In .NET, only reference types and small value types can be assigned atomically. Larger value types such as Guid are not read and written atomically. To work around this in .NET 4.0, the node objects used by ConcurrentDictionary are recreated each time the value associated with a key is changed. In .NET 4.5, new nodes are only created if the values cannot be written atomically.

To Improve Performance, Reduce Memory Allocations

One way to reduce memory usage is to avoid using closures. Rather than capturing a local variable inside an anonymous function, one can pass in that information to the Task’s constructor as its “state object”. Starting with .NET 4.5, Task.ContinueWith will also support state objects.
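
A minimal sketch of the difference between the two styles follows. The class and method names (StateObjectSketch, LengthWithClosure, LengthWithState, Describe) are hypothetical and exist only for illustration. The ContinueWith overload that takes a state object requires .NET 4.5.

```csharp
using System;
using System.Threading.Tasks;

public static class StateObjectSketch
{
    // Closure version: the lambda captures `name`, so each call allocates
    // a compiler-generated closure object and a new delegate.
    public static Task<int> LengthWithClosure(string name)
    {
        return Task.Factory.StartNew(() => name.Length);
    }

    // State-object version: the lambda captures nothing, and `name`
    // is handed to the task as its state argument instead.
    public static Task<int> LengthWithState(string name)
    {
        return Task.Factory.StartNew<int>(
            state => ((string)state).Length,
            name);
    }

    // Continuation with a state object (a .NET 4.5 overload).
    public static Task<string> Describe(Task<int> antecedent, string label)
    {
        return antecedent.ContinueWith<string>(
            (t, state) => (string)state + "=" + t.Result,
            label);
    }

    public static void Main()
    {
        Task<int> length = LengthWithState("hello");
        Console.WriteLine(Describe(length, "length").Result); // length=5
    }
}
```

Because the lambdas in the state-object versions capture no locals, the C# compiler can typically reuse a single delegate instance across calls. The cast back from object is the price paid for that saving.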

Another technique to reduce memory usage is to cache commonly used tasks. For example, consider a function that accepts an array and returns a Task<int>. Since the result for the empty array case will always be the same, it would make sense to cache the Task representing the empty array.
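
The caching idea might look something like the sketch below. It assumes the function sums the array, which the text above does not specify; SumAsync and CachedTaskSketch are hypothetical names. Task.FromResult is used to build the already-completed task and is available starting with .NET 4.5.

```csharp
using System;
using System.Threading.Tasks;

public static class CachedTaskSketch
{
    // A single completed task shared by every call that gets an empty array.
    private static readonly Task<int> EmptyResult = Task.FromResult(0);

    public static Task<int> SumAsync(int[] values)
    {
        if (values.Length == 0)
        {
            // No new Task<int> is allocated for the empty case.
            return EmptyResult;
        }

        // The array is passed as the state object rather than captured.
        return Task.Factory.StartNew<int>(state =>
        {
            int total = 0;
            foreach (int value in (int[])state)
            {
                total += value;
            }
            return total;
        }, values);
    }

    public static void Main()
    {
        Console.WriteLine(SumAsync(new[] { 1, 2, 3 }).Result); // 6
        Console.WriteLine(
            ReferenceEquals(SumAsync(new int[0]), SumAsync(new int[0]))); // True
    }
}
```

Sharing a completed task this way is only safe because a finished Task<int> is effectively immutable from the caller's point of view.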

The next tip is to avoid unnecessarily “inflating” tasks. A task is inflated when something triggers the creation of its ContingentProperties object. The most common causes for this are:

  • The Task is created with a CancellationToken
  • The Task is created from a non-default ExecutionContext
  • The Task is participating in “structured parallelism” as a parent Task
  • The Task ends in the Faulted state
  • The Task is waited on via ((IAsyncResult)task).AsyncWaitHandle.WaitOne()

It should be noted that task inflation isn’t necessarily a bad thing. Rather, it is something to be aware of so that one doesn’t do unnecessary things, such as passing in a CancellationToken that is never used.