
Threadpool in Vista




I recently updated my Thread class sample. It is a very simplistic wrapper around the Windows Thread API, and it just gives you the ability to launch a thread that runs some code from a C++ class.

Obviously if you are really going to take full advantage of the multiple CPU cores on the hardware you're running on, you'll need to consider scheduling your work on all the CPUs available and in the most efficient manner. This is where threadpools offer one solution.

In Windows Vista we now have several new APIs that deal with threads and threadpools, as well as enhancements to the synchronization primitives.
This sample demonstrates how to work with the new threadpool APIs. I've wrapped them in a few C++ classes, but it's the usual kind of thin WTL-style wrapper that takes care of automatic cleanup.

The new threadpool is constructed like this:

CThreadpool pool;
pool.Create();
pool.SetMaximumThreadCountToCPUs();
...
pool.Destroy();
By default the threadpool reserves from 1 to 500 threads for the pool. If you submit 500 tasks to the pool they could end up all running simultaneously, and when you only have 2 physical CPUs in the machine, the scheduling overhead of 500 threads alone would deprive you of the benefits of running things in parallel. This is why we limit the number of threads available to the pool by calling SetMaximumThreadCountToCPUs(). As a general rule of thumb, jobs with no idle time (i.e. no waiting for a file read operation or similar) run most efficiently when you have just one thread per CPU (core). In most cases, though, you might want to do a performance test to see if the threadpool chooses a more appropriate number of threads by itself: unless you restrict the number of threads, the pool object will adjust the thread count as it pleases towards the most optimal scheduling.

Previous incarnations of the threadpool were designed solely for running a series of small work pieces, but the Windows Vista version allows you to set the maximum number of threads without sharing these threads with the rest of the system. Though threadpools are most efficient when they schedule small work pieces, you can use a separate threadpool to run lengthy tasks in parallel; just set the thread number limit and consider using the MarkTasksAsLongRunning() method to tell the pool scheduler to expect your tasks to be long running.
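For reference, the wrapper methods above presumably map onto the native Vista calls. A minimal sketch of doing the same with the raw API (the variable names are just for illustration):

PTP_POOL pPool = ::CreateThreadpool(NULL);

SYSTEM_INFO si = { 0 };
::GetSystemInfo(&si);
::SetThreadpoolThreadMinimum(pPool, 1);
::SetThreadpoolThreadMaximum(pPool, si.dwNumberOfProcessors);

TP_CALLBACK_ENVIRON env;
::InitializeThreadpoolEnvironment(&env);
::SetThreadpoolCallbackPool(&env, pPool);   // submit callbacks to our private pool
::SetThreadpoolCallbackRunsLong(&env);      // hint that tasks may be long running
...
::DestroyThreadpoolEnvironment(&env);
::CloseThreadpool(pPool);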

To submit a task to the threadpool, we shall wrap its work in a C++ class.

class CMyTask : public CThreadpoolWorkerTask<CMyTask>
{
public:
  void Run() 
  { 
    printf("Task is running...\n"); 
  }
};
First of all, the task derives from the CThreadpoolWorkerTask class, which wraps the basic functionality of a schedulable task. The Run() method gets called once the task has been scheduled for execution on a worker thread in the threadpool. It would perhaps have been nice not to have to create a C++ class for each task and use member function pointers instead, but that rather depends on how you group your work, and these wrapper classes do not offer this ability.
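Under the hood a wrapper like CThreadpoolWorkerTask presumably registers a static thunk with the native CreateThreadpoolWork API and forwards the callback to your Run() method. A minimal sketch of that glue (the class layout and member names are my assumption, not the actual wrapper source):

template< class T >
class CWorkerTaskSketch
{
public:
  PTP_WORK m_pWork;

  bool Init(PTP_CALLBACK_ENVIRON pEnv)
  {
    // Hand the native work object a pointer back to the C++ task instance
    m_pWork = ::CreateThreadpoolWork(_WorkCallback, static_cast<T*>(this), pEnv);
    return m_pWork != NULL;
  }

  static VOID CALLBACK _WorkCallback(PTP_CALLBACK_INSTANCE /*pInstance*/, PVOID pContext, PTP_WORK /*pWork*/)
  {
    // Recover the instance and dispatch to the derived class' Run() method
    static_cast<T*>(pContext)->Run();
  }
};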

Worker, Wait, Timer, IO

There are several types of tasks you can derive from. The WorkerTask is the simplest, but there are also tasks that are triggered by timers, waitable events and I/O:

  WorkerTask - A basic task.
  WaitTask   - A task which waits for an event to become signalled.
  TimerTask  - A task which is periodically triggered by a timer.
  IoTask     - A task which is controlled by I/O Completion Ports. Great for reading files.
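These presumably map onto the four native threadpool object types. A quick sketch of the raw creation calls, where the callback and context names are just placeholders:

PTP_WORK  pWork  = ::CreateThreadpoolWork(WorkCallback, pContext, &env);    // WorkerTask
PTP_WAIT  pWait  = ::CreateThreadpoolWait(WaitCallback, pContext, &env);    // WaitTask
PTP_TIMER pTimer = ::CreateThreadpoolTimer(TimerCallback, pContext, &env);  // TimerTask
PTP_IO    pIo    = ::CreateThreadpoolIo(hFile, IoCallback, pContext, &env); // IoTask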

And we should not forget to run our task...
CThreadpool pool;
pool.Create();

CMyTask task;
pool.AddTask(&task);
task.WaitForTask();
task.CloseTask();

pool.Destroy();

Now, in the above sample we submitted one task. That's a little trivial. If we're going to submit several tasks, keeping track of them all to manually wait and close them would be cumbersome. Luckily, the threadpool can group the tasks together and handle the waiting and shutdown bit internally.

CThreadpool pool;
pool.Create();

pool.InitGroup();

CMyTask1 task1;
CMyTask2 task2;
pool.AddTask(&task1);
pool.AddTask(&task2);

pool.WaitForGroup();

pool.Destroy();
This allows us to submit many tasks to the pool and, with the WaitForGroup() call, wait for the completion of all of them.
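The group handling presumably builds on the native cleanup-group API, where every object created through the callback environment becomes a member of the group and a single call waits for them all. A sketch:

PTP_CLEANUP_GROUP pGroup = ::CreateThreadpoolCleanupGroup();
::SetThreadpoolCallbackCleanupGroup(&env, pGroup, NULL);
...
// FALSE = wait for all outstanding callbacks; TRUE would cancel pending ones
::CloseThreadpoolCleanupGroupMembers(pGroup, FALSE, NULL);
::CloseThreadpoolCleanupGroup(pGroup);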

When you submit a task to the threadpool, you cannot assume anything about which thread it is going to be scheduled on. There are also no dependency rules for tasks beyond the rudimentary, so you'll have to coordinate dependencies yourself, perhaps by using a WaitTask that depends on another task signalling its completion.

You can even submit the same worker task multiple times:

...
pool.InitGroup();

CMyTask task;
pool.AddTask(&task);
pool.AddTask(&task);
pool.AddTask(&task);
pool.WaitForGroup();

pool.AddTask(&task);
pool.WaitForGroup();
When submitting a task multiple times, its Run() method gets called multiple times. This allows you to split a large piece of work inside CMyTask into several chunks, which in turn will run in parallel when more CPUs are available. Also notice how you can submit tasks after waiting for group completion: a new group is automatically started once you add new tasks.
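As an example of that chunking idea, a task could claim the next chunk with an interlocked counter each time Run() is invoked. A sketch using the wrapper class from this article (the data members are made up for the example):

class CSumChunksTask : public CThreadpoolWorkerTask<CSumChunksTask>
{
public:
  const int* m_pData;       // shared read-only input buffer
  int m_nChunkSize;         // number of items per chunk
  LONG m_lNextChunk;        // next chunk index, claimed atomically; start at 0
  LONG m_lResults[4];       // one result slot per chunk; no locking needed

  void Run()
  {
    LONG iChunk = ::InterlockedIncrement(&m_lNextChunk) - 1;   // claim a chunk
    LONG lSum = 0;
    for( int i = 0; i < m_nChunkSize; i++ ) lSum += m_pData[iChunk * m_nChunkSize + i];
    m_lResults[iChunk] = lSum;
  }
};

Submitting this task 4 times would process the 4 chunks in parallel and leave one partial sum per chunk, ready to be combined once WaitForGroup() returns.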

Creating a dependency, or combining different task types, could be done like this. First we'll create a task type that depends on a waitable event.

class CWaitTask : public CThreadpoolWaitTask<CWaitTask>
{
public:
  void Run(TP_WAIT_RESULT WaitResult) 
  { 
    printf("WaitTask was activated\n"); 
  }
};
As you can see, the inherited class type has changed, and the Run() method now takes an argument that tells you why your task was called (WAIT_OBJECT_0, WAIT_TIMEOUT and the other constants known from the ::WaitForSingleObject API are possible here).
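So a slightly more complete Run() for the class above might distinguish the two common outcomes:

void Run(TP_WAIT_RESULT WaitResult)
{
  if( WaitResult == WAIT_OBJECT_0 ) {
    printf("The event was signalled\n");
  }
  else if( WaitResult == WAIT_TIMEOUT ) {
    printf("The wait timed out before the event was signalled\n");
  }
}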

To schedule the tasks, we could do the following...

HANDLE hEvent = ::CreateEvent(NULL, TRUE, FALSE, NULL);

pool.InitGroup();

CMyTask task1;
task1.SetEventWhenTaskReturns(hEvent);

CWaitTask task2;
task2.SetTaskWaitInfo(hEvent, 1000UL);

pool.AddTask(&task1);
pool.AddTask(&task2);

::Sleep(2000);        // Allow tasks to do their work
pool.WaitForGroup();  // Release pool memory

This first creates our old worker task, which we instruct to signal the event when it has completed its work.
We then create the WaitTask, which we instruct to wait for the event to become signalled before it can run. We've also added a timeout value in the SetTaskWaitInfo() call so the task also triggers in case the task it depends on takes too long to complete its job.

There is no direct way to cancel a running task. You must wait for the completion of tasks already running, but you do have the option to instruct the waiting function to remove all pending tasks from the pool's queue.
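In the raw API the same effect comes from the cancel flag on the wait calls. A short sketch, where pWork and pGroup are native objects created earlier:

// TRUE removes callbacks that have not started yet; running ones still finish
::WaitForThreadpoolWorkCallbacks(pWork, TRUE);
// ...or, when a cleanup group is in use:
::CloseThreadpoolCleanupGroupMembers(pGroup, TRUE, NULL);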

Dynamic class creation

So far our samples have created worker tasks on the stack and in the same scope as the threadpool instance, but we may wish to just throw tasks at the queue in a less structured manner. We support this by creating task classes dynamically but we'll need a custom Cleanup Group to ensure that instances get destroyed properly. Like this...
CThreadpool m_pool;
CThreadpoolAutoDeleteCleanupGroup m_group;
...
m_pool.Create();
m_pool.ActivateCleanupGroup(m_group, m_group.CleanupCallback);
...
CTask3* pTask3 = new CTask3();
m_pool.AddTask(pTask3);
...
m_group.CancelGroup();  // ... or WaitForGroup()
m_group.Close();
m_pool.Destroy();
Here we create and activate the custom CThreadpoolAutoDeleteCleanupGroup group, which ensures that the tasks are properly deleted in C++. The Cleanup Group expects all tasks to be created with the C++ new operator. Tasks are deleted when they finish their work or when the group is cancelled, so it is important to always cancel or wait for the group completion before shutting down the pool.
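A cleanup callback for such an auto-delete group might look roughly like this (the base-class name is an assumption; the real wrapper may cast differently):

// The threadpool invokes this once per member object when the group is
// cancelled or closed; the context is the heap-allocated C++ task.
static VOID CALLBACK _CleanupCallback(PVOID pObjectContext, PVOID /*pCleanupContext*/)
{
  delete static_cast<CThreadpoolTaskBase*>(pObjectContext);
}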

Parallelism

So using a threadpool doesn't seem to be all that difficult, does it?
Well, the hard part is not getting your code to run on a different thread. The problem with these constructs has always been managing the resources shared by your main thread and the worker threads. Now that it even pays to split up a large piece of work into several tasks that run in parallel, you may end up with much more complicated resource locking and synchronization problems than you imagined.
Your threads will read and write their results back to memory. If this memory is shared and can be accessed by multiple threads, the danger of corrupting memory structures is high. Great care and planning must be taken to ensure the integrity of your data before you start to deploy a threadpool.
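One option on Vista, shown purely as an example, is the new slim reader/writer lock for guarding a shared result:

SRWLOCK g_lock;          // call ::InitializeSRWLock(&g_lock) once at startup
LONG g_lTotal = 0;       // shared result written by several tasks

void AddResult(LONG lValue)
{
  ::AcquireSRWLockExclusive(&g_lock);   // writers take the lock exclusively
  g_lTotal += lValue;
  ::ReleaseSRWLockExclusive(&g_lock);
}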

There are several experiments with adding parallelism to most programming languages these days. Recent versions of .NET have experimental extensions to the framework available. For C++ you can find the compiler extensions of OpenMP. If you find that your projects could benefit much more from parallelism, then such libraries will help you. I recommend OpenMP, and Intel's Threading Building Blocks would probably also be worth investigating.
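A taste of OpenMP in C++ (Visual Studio 2008 supports it with the /openmp switch); the pragma splits the loop iterations across the available cores:

void ScaleBuffer(float* pData, int nCount)
{
  #pragma omp parallel for
  for( int i = 0; i < nCount; i++ )
    pData[i] *= 2.0f;     // each core handles a slice of the iterations
}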

And in closing: if you have a design where you need to submit very many small tasks to the threadpool, then the C++ wrappers presented here may add too much overhead for your design. You should really consider using the bare APIs then.
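For that kind of fire-and-forget load, the leanest raw call is probably TrySubmitThreadpoolCallback, which needs no work object to create, wait for or close:

static VOID CALLBACK SimpleCallback(PTP_CALLBACK_INSTANCE /*pInstance*/, PVOID pContext)
{
  printf("Small task %d\n", (int) (INT_PTR) pContext);
}
...
for( int i = 0; i < 1000; i++ )
  ::TrySubmitThreadpoolCallback(SimpleCallback, (PVOID) (INT_PTR) i, NULL);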

Source Code Dependencies

Windows Vista
Microsoft Visual Studio 2008

Download Files

Download: Source Code (4 Kb)
