Monday, October 5, 2009

Easy Working with Threads

.NET 4.0 introduces the Task Parallel Library (the System.Threading.Tasks namespace), which will simplify working with threads. So, as long as we are just .NET 3.5 coders, do we have to wait and keep our hands off this complicated multi-threading? We don't! I'll show you some approaches to working cleanly and simply in a multi-threaded environment.

If you just need to start threads that do some work without returning any information to the caller, you usually won't run into problems. Start your threads and let them go.

Unfortunately, this is rarely how threads are used. Usually you have an application thread which dispatches a set of tasks but needs the results of those tasks to continue its work. We'll simulate this with a "Task" class whose instances are passed to the worker methods in the following samples.

Our Task Class

In the following samples we will work with a "Task" class which will be provided to our threads to simulate asynchronous work. Here is the definition of this class.
// represents a task which shall be handled by 
// asynchronous working thread
class Task
{
   public int Id { get; set; }
   public int ReturnValue { get; set; }
}

Classic Thread Approach

Let's start with the old-school solution, which uses lock to keep the shared counter thread-safe.
// monitor the count of working threads
static int _workingTasks;

static void Main(string[] args)
{
   // count of tasks to be simulated
   int taskCount = 10;
   // hold the dispatched tasks to work with the results
   List<Task> tasks = new List<Task>();

   // first set the complete count of working tasks
   _workingTasks = taskCount;

   for (int i = 0; i < taskCount; i++)
   {
      // create a new thread
      Thread thread = new Thread(new ParameterizedThreadStart(DoWork));

      // create and remember the task
      Task task = new Task { Id = i };
      tasks.Add(task);
      // start the thread
      thread.Start(task);
   }

   while (_workingTasks != 0)
   {
      // wait until all tasks have been done
      Thread.Sleep(1);
   }

   // show the return values after all threads finished
   tasks.ForEach(t => 
      Console.WriteLine("Thread {0} returned: {1}", t.Id, t.ReturnValue));

   Console.ReadKey();
}

// work method
static void DoWork(object o)
{
   Task task = (Task)o;
   int id = task.Id;

   for (int i = 0; i < 10; i++)
   {
      Console.WriteLine("Thread {0} is working", id);
      // simulate some long work
      Thread.Sleep(200);

      task.ReturnValue++;
   }

   // we have to lock the monitoring variable to
   // ensure nobody else can work with it until we have decremented it
   lock (typeof(Program))
   {
      _workingTasks--;
   }
}
As you see, there are quite a lot of things to keep in mind.

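We have to use the C# lock statement to ensure thread-safe access to our monitoring variable. Usually you don't have to lock a complete System.Type; any reference type instance will do. I just used lock(typeof(Program)) because I worked with static methods. In production code a dedicated private object is the safer choice, because a Type object or this is visible to other code, which could lock on it too and cause unexpected blocking or deadlocks.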
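A minimal sketch of locking on a dedicated private object could look like this. The names _syncRoot and LockObjectSample are made up for illustration and do not come from the sample above; the counter logic is the same.

```csharp
using System;
using System.Threading;

class LockObjectSample
{
    // hypothetical name: a private object that is used for locking only
    private static readonly object _syncRoot = new object();

    // monitor the count of working threads
    static int _workingTasks;

    static void Main()
    {
        const int threadCount = 10;
        _workingTasks = threadCount;
        Thread[] threads = new Thread[threadCount];

        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(DoWork);
            threads[i].Start();
        }

        // wait until every thread has finished
        foreach (Thread thread in threads)
        {
            thread.Join();
        }

        Console.WriteLine("Remaining tasks: {0}", _workingTasks);
    }

    static void DoWork()
    {
        // only code inside this class can reach _syncRoot,
        // so nobody else can accidentally take the same lock
        lock (_syncRoot)
        {
            _workingTasks--;
        }
    }
}
```

Because the lock object is private, the locking behavior stays completely under the control of this class.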

We use Thread.Sleep(1) to poll the state of the dispatched tasks.

Now we'll start to simplify this work.

Using volatile

The first thing we can do to slightly simplify our method is to define our monitoring variable as

static volatile int _workingTasks;

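The volatile keyword tells the compiler and the runtime that a field may be changed by several threads, so reads and writes always work with the current value instead of a cached copy. That is exactly what the polling loop in our Main method needs. Be careful, though: volatile does not make a read-modify-write operation such as _workingTasks-- atomic, so it cannot replace lock on its own. Two threads could read the same value, both write back the same decremented result, and the counter would never reach zero.

For a simple counter we can still get rid of the lock, because the Interlocked class provides atomic operations. This changes the call of
lock (typeof(Program))
{
   _workingTasks--;
}
to this

Interlocked.Decrement(ref _workingTasks);

(The compiler reports warning CS0420 when a volatile field is passed by reference. For the Interlocked methods this warning can be ignored.)

Does this seem pointless for a single line? Keep in mind, this is a very simple sample. Not needing lock becomes really handy if several threads read and update shared member fields in many different places.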
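To illustrate atomic updates of a shared counter, here is a small sketch, assuming a plain int field and the Interlocked class from System.Threading. The class name, field name and numbers are invented for this example.

```csharp
using System;
using System.Threading;

class InterlockedSample
{
    const int ThreadCount = 8;
    const int IncrementsPerThread = 100000;

    // shared counter which is updated by all threads
    static int _counter;

    static void Main()
    {
        Thread[] threads = new Thread[ThreadCount];

        for (int i = 0; i < ThreadCount; i++)
        {
            threads[i] = new Thread(Work);
            threads[i].Start();
        }

        // wait until every thread has finished
        foreach (Thread thread in threads)
        {
            thread.Join();
        }

        Console.WriteLine(
            "Expected {0}, got {1}",
            ThreadCount * IncrementsPerThread,
            _counter);
    }

    static void Work()
    {
        for (int i = 0; i < IncrementsPerThread; i++)
        {
            // atomic read-modify-write, no lock required
            Interlocked.Increment(ref _counter);
        }
    }
}
```

If you replace Interlocked.Increment(ref _counter) with a plain _counter++, the final value will usually be lower than expected, even if the field is declared volatile.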

Avoid the explicit Polling

The next step to simplify your multi-threading is to remove the explicit polling. What does this mean? Until now we worked with a member field "_workingTasks" which monitored the state of our working threads. Imagine a larger class with several multi-threading implementations. In this case you would need several member fields to monitor the different threading activities. Another way to wait for the execution of a thread is to Join() it.
static void Main(string[] args)
{
   // count of tasks to be simulated
   int taskCount = 10;
   // hold the dispatched tasks to work with the results
   List<Task> tasks = new List<Task>(taskCount);
   // remember all threads
   List<Thread> threads = new List<Thread>(taskCount);

   for (int i = 0; i < taskCount; i++)
   {
      // create and remember a new thread
      Thread thread = new Thread(new ParameterizedThreadStart(DoWork));
      threads.Add(thread);

      // create and remember the task
      Task task = new Task { Id = i };
      tasks.Add(task);
      // start the thread
      thread.Start(task);
   }

   // --> HERE <--
   // wait until all threads are finished
   threads.ForEach(thread => thread.Join());

   // show the return values after all threads finished
   tasks.ForEach(task =>
      Console.WriteLine(
         "Thread {0} returned: {1}", 
         task.Id, 
         task.ReturnValue));

   Console.ReadKey();
}

// work method
static void DoWork(object o)
{
   Task task = (Task)o;
   int id = task.Id;

   for (int i = 0; i < 10; i++)
   {
      Console.WriteLine("Thread {0} is working", id);
      // simulate some long work
      Thread.Sleep(200);

      task.ReturnValue++;
   }
}
As you see, we don't need our monitoring variable any more. Using Join() keeps the waiting logic local to the method which started the threads, so the multi-threading is much better encapsulated.
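Join() can also be called with a timeout, which helps if a worker might hang. The following is only a sketch: the 5 second limit, the class name and the SlowWork method are arbitrary assumptions for the example.

```csharp
using System;
using System.Threading;

class JoinTimeoutSample
{
    static void Main()
    {
        Thread worker = new Thread(SlowWork);
        // a background thread does not keep the process alive
        // if it is still running when Main returns
        worker.IsBackground = true;
        worker.Start();

        // Join(int) returns true if the thread finished within
        // the given number of milliseconds, otherwise false
        bool finished = worker.Join(5000);

        Console.WriteLine(
            finished
               ? "Worker finished in time"
               : "Worker is still running");
    }

    static void SlowWork()
    {
        // simulate some work
        Thread.Sleep(200);
    }
}
```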

Working with the ThreadPool

Keep in mind, threads are a system resource which is expensive to create. If you have to do many small tasks, creating a thread for each of them can cost more than doing the work single-threaded. If you have complex tasks which have to coordinate their state with your application thread or other threads, you should use the System.Threading.Thread class because it gives you the most flexibility. If you just have to dispatch tasks and wait for them (as in our sample), always creating a new thread can be the wrong way. A better approach in this scenario is to reuse threads for the next tasks.

Q: Okay, cool!! Let's start to code a custom thread manager!

A: Er... nope. It's already available.

For the kind of work we do with our "Task" objects you can use the System.Threading.ThreadPool class, which is made for exactly this. It's a pool of reusable threads on which work items can be scheduled.

To schedule means you can queue a large number of small jobs. Whenever a pooled thread becomes available, the next queued task is started. By default .NET determines the number of initially available threads from your environment (for example the number of processors). It also starts new threads if they seem to be useful. However, you can inspect and customize these limits with static methods (e.g. ThreadPool.GetMaxThreads, ThreadPool.SetMinThreads).
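A short sketch of how these static methods can be used to look at the pool limits follows. The values depend on your machine and .NET version, so no output is shown, and raising the minimum by two threads is just an arbitrary example.

```csharp
using System;
using System.Threading;

class ThreadPoolInfoSample
{
    static void Main()
    {
        int minWorker, minIo;
        int maxWorker, maxIo;
        int freeWorker, freeIo;

        // each method returns the count of worker threads and
        // the count of asynchronous I/O threads
        ThreadPool.GetMinThreads(out minWorker, out minIo);
        ThreadPool.GetMaxThreads(out maxWorker, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorker, out freeIo);

        Console.WriteLine("Min worker threads:       {0}", minWorker);
        Console.WriteLine("Max worker threads:       {0}", maxWorker);
        Console.WriteLine("Available worker threads: {0}", freeWorker);

        // ask the pool to keep two more worker threads ready;
        // SetMinThreads returns false if the request is rejected
        bool changed = ThreadPool.SetMinThreads(minWorker + 2, minIo);
        Console.WriteLine("Minimum changed: {0}", changed);
    }
}
```

In most applications the defaults are fine, so change them only if measurements show a problem.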

So let's use the ThreadPool to do our tasks. For this purpose I extended our "Task" class with an additional property.
class Task
{
   public int Id { get; set; }
   public int ReturnValue { get; set; }
   public AutoResetEvent AutoResetEvent { get; set; }
}
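The AutoResetEvent is a special type of WaitHandle. A thread can block on it with WaitOne() until another thread signals it with Set(). After it has released one waiting thread, it automatically resets to the non-signaled state.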
static void Main(string[] args)
{
   // count of tasks to be simulated
   int taskCount = 10;
   // hold the dispatched tasks to work with the results
   List<Task> tasks = new List<Task>(taskCount);

   for (int i = 0; i < taskCount; i++)
   {
      // create and remember the task
      Task task = 
         new Task 
         { 
            Id = i, 
            AutoResetEvent = new AutoResetEvent(false) 
         };
      tasks.Add(task);

      // queue the task in ThreadPool
      ThreadPool.QueueUserWorkItem(new WaitCallback(DoWork), task);
   }

   // wait until all queued tasks are finished
   tasks.ForEach(task => task.AutoResetEvent.WaitOne());

   // show the return values after all threads finished
   tasks.ForEach(task =>
      Console.WriteLine(
         "Thread {0} returned: {1}", 
         task.Id, 
         task.ReturnValue));

   Console.ReadKey();
}

// work method
static void DoWork(object o)
{
   Task task = (Task)o;
   int id = task.Id;

   for (int i = 0; i < 10; i++)
   {
      Console.WriteLine("Thread {0} is working", id);
      // simulate some long work
      Thread.Sleep(200);
      
      task.ReturnValue++;
   }

   // notify the application thread that the task finished
   task.AutoResetEvent.Set();
}
As you see, we don't need the monitoring member fields any more. We don't even need a Thread object any more. The AutoResetEvent class provides a WaitOne() method which blocks our application thread until the event is signaled; calling it on the event of each task in turn waits until all tasks have finished. To signal an AutoResetEvent and release the waiting thread, the worker calls the Set() method.
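If you queue many tasks, one wait handle per task can become heavy. A possible variation, sketched here under my own assumptions and not part of the sample above, is a single ManualResetEvent combined with an atomically decremented counter. All names in this sketch are invented for illustration.

```csharp
using System;
using System.Threading;

class CountdownSample
{
    // count of tasks which have not finished yet
    static int _pending;

    // signaled by the task which finishes last
    static readonly ManualResetEvent _allDone =
        new ManualResetEvent(false);

    static void Main()
    {
        const int taskCount = 10;
        int[] results = new int[taskCount];
        _pending = taskCount;

        for (int i = 0; i < taskCount; i++)
        {
            // copy the loop variable so each work item
            // captures its own value
            int index = i;

            ThreadPool.QueueUserWorkItem(delegate(object state)
            {
                // simulate some work
                Thread.Sleep(50);
                results[index] = index * index;

                // the task which brings the counter to zero
                // releases the application thread
                if (Interlocked.Decrement(ref _pending) == 0)
                {
                    _allDone.Set();
                }
            });
        }

        // wait until all queued tasks are finished
        _allDone.WaitOne();

        for (int i = 0; i < taskCount; i++)
        {
            Console.WriteLine("Task {0} returned: {1}", i, results[i]);
        }
    }
}
```

The trade-off is that the application thread only learns when everything is done, not when an individual task finishes.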

Conclusion

I hope I could show you some tricks to simplify working with multiple threads in your application.

Almost every new computer has two or more CPU cores, so start to use them ;-).
