







Part 3: Story And Program Examples







What do we have in this Module?

  1. Context Switches

  2. Priority Boosts

  3. Priority Inversion

  4. Multiple Processors

  5. Thread Affinity

  6. Thread Ideal Processor

  7. Multiple Threads

  8. Creating Threads Example

  9. Thread Stack Size

  10. Thread Handles and Identifiers

  11. Suspending Thread Execution

  12. Synchronizing Execution of Multiple Threads

  13. Multiple Threads and GDI Objects

  14. Thread Local Storage

  15. Creating Windows in Threads



















The expected skills include:

  • Able to understand, starting from the basics, how processes and threads are implemented in Windows operating systems.

  • Able to understand the terms used in discussions of processes and threads.

  • Able to understand process scheduling and the scheduler.

  • Able to use Windows Task Manager to view and control processes.

  • Able to understand synchronization.

  • Able to understand the objects used to synchronize multiple threads, such as mutexes, semaphores and timers.

  • Able to understand and differentiate between multitasking and multithreading.

  • Able to understand inter-process communication (IPC) mechanisms such as Remote Procedure Call (RPC), pipes and Windows Sockets.

Context Switches


The scheduler maintains a queue of executable threads for each priority level. These are known as ready threads. When a processor becomes available, the system performs a context switch. The steps in a context switch are:


  1. Save the context of the thread that just finished executing.

  2. Place the thread that just finished executing at the end of the queue for its priority.

  3. Find the highest priority queue that contains ready threads.

  4. Remove the thread at the head of the queue, load its context, and execute it.
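The four steps above can be modelled in a few lines of portable C. This is only a sketch of the idea, not how the Windows scheduler is actually implemented: the names Scheduler, ReadyQueue, enqueue_ready(), dequeue_next() and context_switch() are hypothetical, the fixed queue capacity is an assumption made to keep the example short, and saving or loading a real thread context is left out.

```c
#include <stddef.h>

#define PRIORITY_LEVELS 32   /* priority levels 0..31 */
#define QUEUE_CAPACITY  16   /* hypothetical fixed capacity for this sketch */

/* One FIFO ring buffer of thread IDs per priority level. */
typedef struct {
    int    ids[QUEUE_CAPACITY];
    size_t head;
    size_t count;
} ReadyQueue;

typedef struct {
    ReadyQueue queues[PRIORITY_LEVELS];
} Scheduler;

/* Step 2: place a thread at the end of the queue for its priority.
   Returns 0 on success, -1 if the priority is invalid or the queue is full. */
int enqueue_ready(Scheduler *s, int priority, int thread_id)
{
    ReadyQueue *q;

    if (priority < 0 || priority >= PRIORITY_LEVELS)
        return -1;
    q = &s->queues[priority];
    if (q->count == QUEUE_CAPACITY)
        return -1;
    q->ids[(q->head + q->count) % QUEUE_CAPACITY] = thread_id;
    q->count++;
    return 0;
}

/* Steps 3 and 4: find the highest-priority queue that contains ready
   threads and remove the thread at its head.
   Returns the thread ID, or -1 if no thread is ready. */
int dequeue_next(Scheduler *s)
{
    int p;

    for (p = PRIORITY_LEVELS - 1; p >= 0; p--) {
        ReadyQueue *q = &s->queues[p];
        if (q->count > 0) {
            int id = q->ids[q->head];
            q->head = (q->head + 1) % QUEUE_CAPACITY;
            q->count--;
            return id;
        }
    }
    return -1;
}

/* A whole switch: requeue the thread that just ran, then pick the next one.
   Returns the ID of the thread to run next, or -1 on error. */
int context_switch(Scheduler *s, int finished_id, int finished_priority)
{
    if (enqueue_ready(s, finished_priority, finished_id) != 0)
        return -1;
    return dequeue_next(s);
}
```

A Scheduler must be zero-initialized (for example, Scheduler s = {0};) before use. A thread that blocks is simply not passed back to enqueue_ready(), which mirrors the point that waiting threads are not in any ready queue.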


The following classes of threads are not ready threads.

  1. Threads created with the CREATE_SUSPENDED flag.
  2. Threads halted during execution with the SuspendThread() or SwitchToThread() function.
  3. Threads waiting for a synchronization object or input.


Until threads that are suspended or blocked become ready to run, the scheduler does not allocate any processor time to them, regardless of their priority.

The most common reasons for a context switch are:

  1. The time slice has elapsed.
  2. A thread with a higher priority has become ready to run.
  3. A running thread needs to wait.


When a running thread needs to wait, it relinquishes the remainder of its time slice.


Priority Boosts


Each thread has a dynamic priority. This is the priority the scheduler uses to determine which thread to execute. Initially, a thread's dynamic priority is the same as its base priority. The system can boost and lower the dynamic priority, to ensure that it is responsive and that no threads are starved for processor time. The system does not boost the priority of threads with a base priority level between 16 and 31. Only threads with a base priority between 0 and 15 receive dynamic priority boosts.  The system boosts the dynamic priority of a thread to enhance its responsiveness as follows.

  1. When a process that uses NORMAL_PRIORITY_CLASS is brought to the foreground, the scheduler boosts the priority class of the process associated with the foreground window, so that it is greater than or equal to the priority class of any background processes. The priority class returns to its original setting when the process is no longer in the foreground.
  2. When a window receives input, such as timer messages, mouse messages, or keyboard input, the scheduler boosts the priority of the thread that owns the window.
  3. When the wait conditions for a blocked thread are satisfied, the scheduler boosts the priority of the thread. For example, when a wait operation associated with disk or keyboard I/O finishes, the thread receives a priority boost.

You can disable the priority-boosting feature by calling the SetProcessPriorityBoost() or SetThreadPriorityBoost() function. To determine whether this feature has been disabled, call the GetProcessPriorityBoost() or GetThreadPriorityBoost() function.


After raising a thread's dynamic priority, the scheduler reduces that priority by one level each time the thread completes a time slice, until the thread drops back to its base priority. A thread's dynamic priority is never less than its base priority.
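The boost-and-decay rule can be written down as two small functions. This is a simplified model for illustration only: apply_boost() and decay_after_time_slice() are hypothetical names, the size of a boost is left to the caller, and capping a boosted priority at 15 is an assumption of this sketch, chosen so that a boost never moves a thread into the 16 to 31 range.

```c
#define MAX_BOOSTED_PRIORITY 15  /* assumption: boosts stay within 0..15 */

/* Returns the dynamic priority after a boost of 'boost' levels.
   Threads with a base priority of 16..31 are never boosted. */
int apply_boost(int base_priority, int boost)
{
    int boosted;

    if (base_priority > MAX_BOOSTED_PRIORITY || boost <= 0)
        return base_priority;
    boosted = base_priority + boost;
    if (boosted > MAX_BOOSTED_PRIORITY)
        boosted = MAX_BOOSTED_PRIORITY;
    return boosted;
}

/* Called each time the thread completes a time slice: drop one level,
   but never below the base priority. */
int decay_after_time_slice(int dynamic_priority, int base_priority)
{
    if (dynamic_priority > base_priority)
        return dynamic_priority - 1;
    return base_priority;
}
```

For example, a thread with base priority 8 that receives a two-level boost runs at 10, then 9 after one time slice, then 8 after the next, and stays at 8 from then on.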


Priority Inversion


Priority inversion occurs when two or more threads with different priorities are in contention to be scheduled. Consider a simple case with three threads: thread 1, thread 2, and thread 3. Thread 1 is high priority and becomes ready to be scheduled. Thread 2, a low-priority thread, is executing code in a critical section. Thread 1, the high-priority thread, begins waiting for a shared resource from thread 2. Thread 3 has medium priority. Thread 3 receives all the processor time, because the high-priority thread (thread 1) is waiting for shared resources from the low-priority thread (thread 2). Thread 2 won't leave the critical section, because it does not have the highest priority and won't be scheduled.

The scheduler solves this problem by randomly boosting the priority of the ready threads (in this case, the low-priority lock holders). The low-priority threads run long enough to exit the critical section, and the high-priority thread can enter the critical section. If the low-priority thread doesn't get enough CPU time to exit the critical section the first time, it will get another chance during the next round of scheduling.

For Windows Me/98/95: If a high-priority thread is dependent on a low-priority thread that will not be allowed to run because a medium-priority thread is getting all of the CPU time, the system recognizes that the high-priority thread is dependent on the low-priority thread. It will boost the low-priority thread's priority up to the priority of the high-priority thread. This will allow the thread that formerly had the lowest priority to run and release the high-priority thread that was waiting for it.


Multiple Processors


Windows NT uses a symmetric multiprocessing (SMP) model to schedule threads on multiple processors. With this model, any thread can be assigned to any processor. Therefore, scheduling threads on a computer with multiple processors is similar to scheduling threads on a computer with a single processor. However, the scheduler has a pool of processors, so that it can schedule threads to run concurrently. Scheduling is still determined by thread priority. However, on a multiprocessor computer, you can also affect scheduling by setting thread affinity and thread ideal processor, as discussed in the following sections.


Thread Affinity


Thread affinity forces a thread to run on a specific subset of processors. Use the SetProcessAffinityMask() function to specify thread affinity for all threads of the process. To set the thread affinity for a single thread, use the SetThreadAffinityMask() function. The thread affinity must be a subset of the process affinity. You can obtain the current process affinity by calling the GetProcessAffinityMask() function.  Setting thread affinity should generally be avoided, because it can interfere with the scheduler's ability to schedule threads effectively across processors. This can decrease the performance gains produced by parallel processing. An appropriate use of thread affinity is testing each processor.
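An affinity mask is a bit vector in which bit n stands for processor n, so the rule that the thread affinity must be a subset of the process affinity is a single bitwise test. The sketch below is portable C that models only the mask arithmetic; it does not call the Windows API. AffinityMask is a stand-in type for this example, and single_processor_mask() and is_valid_thread_affinity() are hypothetical helper names.

```c
#include <stdint.h>

/* Stand-in for the mask type used by the affinity functions (assumption). */
typedef uint64_t AffinityMask;

/* Mask that selects exactly one processor; 0 if cpu is out of range. */
AffinityMask single_processor_mask(unsigned cpu)
{
    return cpu < 64 ? (AffinityMask)1 << cpu : 0;
}

/* A thread mask is acceptable when it is non-empty and sets no bit
   that is missing from the process mask. Returns 1 if valid, else 0. */
int is_valid_thread_affinity(AffinityMask thread_mask,
                             AffinityMask process_mask)
{
    return thread_mask != 0 && (thread_mask & ~process_mask) == 0;
}
```

With a process mask of 0x0F (processors 0 to 3), a thread mask of 0x05 passes the check, while 0x10 (processor 4) does not. Running such a check before calling SetThreadAffinityMask() is one way to catch a bad mask early.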


Thread Ideal Processor


When you specify a thread ideal processor, the scheduler runs the thread on the specified processor when possible. Use the SetThreadIdealProcessor() function to specify a preferred processor for a thread. This does not guarantee that the ideal processor will be chosen, but provides a useful hint to the scheduler.


Multiple Threads


Each process is started with a single thread, but can create additional threads from any of its threads.


Creating Threads


The CreateThread() function creates a new thread for a process. The creating thread must specify the starting address of the code that the new thread is to execute. Typically, the starting address is the name of a function defined in the program code. This function takes a single parameter and returns a DWORD value. A process can have multiple threads simultaneously executing the same function.

The following example demonstrates how to create a new thread that executes the locally defined function, MyThreadFunction().


// For WinXp
#define _WIN32_WINNT 0x0501
#include <windows.h>
#include <stdio.h>

// Thread function: prints the DWORD value that lpParam points to.
DWORD WINAPI MyThreadFunction(LPVOID lpParam)
{
    printf("The parameter: %lu.\n", *(DWORD*)lpParam);
    return 0;
}

int main(void)
{
    DWORD dwThreadId, dwThrdParam = 1;
    HANDLE hThread;

    hThread = CreateThread(
        NULL,              // default security attributes
        0,                 // use default stack size
        MyThreadFunction,  // thread function
        &dwThrdParam,      // argument to thread function
        0,                 // use default creation flags
        &dwThreadId);      // returns the thread identifier

    // Check the return value for success. If something wrong...
    if (hThread == NULL)
    {
        printf("CreateThread() failed, error: %lu.\n", GetLastError());
        return 1;
    }

    // else, gives some prompt...
    printf("The thread ID: %lu.\n", dwThreadId);
    printf("It seems the CreateThread() is OK lol!\n");

    // Wait for the new thread to finish, so that dwThrdParam
    // (a local variable) stays valid while the thread reads it.
    WaitForSingleObject(hThread, INFINITE);

    if (CloseHandle(hThread) != 0)
        printf("Handle to thread closed successfully.\n");
    return 0;
}



A sample output:


Program output


You can terminate a process through Windows Task Manager: select the process in the Image Name column, right-click it, and choose the required task from the context menu.


Windows Task Manager: Terminating a process


A warning message appears, as shown below. If you know and understand the process, it is safe to click the Yes button. Otherwise, do not terminate it: doing so may stop related programs from running or, in the worst case, crash your Windows machine.


Windows Task Manager: Terminating a process


For simplicity, this example passes a pointer to a value as an argument to the thread function. This could be a pointer to any type of data or structure, or it could be omitted altogether by passing a NULL pointer and deleting the references to the parameter in MyThreadFunction(). It is risky to pass the address of a local variable if the creating thread exits before the new thread, because the pointer becomes invalid. Instead, either pass a pointer to dynamically allocated memory or make the creating thread wait for the new thread to terminate. Data can also be passed from the creating thread to the new thread using global variables. With global variables, it is usually necessary to synchronize access by multiple threads.

In processes where a thread might create multiple threads to execute the same code, it is inconvenient to use global variables. For example, a process that enables the user to open several files at the same time can create a new thread for each file, with each of the threads executing the same thread function. The creating thread can pass the unique information (such as the file name) required by each instance of the thread function as an argument. You cannot use a single global variable for this purpose, but you could use a dynamically allocated string buffer.

The creating thread can use the arguments to CreateThread() to specify the following:

  1. The security attributes for the handle to the new thread. These security attributes include an inheritance flag that determines whether the handle can be inherited by child processes. The security attributes also include a security descriptor, which the system uses to perform access checks on all subsequent uses of the thread's handle before access is granted.
  2. The initial stack size of the new thread. The thread's stack is allocated automatically in the memory space of the process; the system increases the stack as needed and frees it when the thread terminates.
  3. A creation flag that enables you to create the thread in a suspended state. When suspended, the thread does not run until the ResumeThread() function is called.


You can also create a thread by calling the CreateRemoteThread() function. This function is used by debugger processes to create a thread that runs in the address space of the process being debugged.


Thread Stack Size


Each new thread or fiber receives its own stack space consisting of both reserved and initially committed memory. The reserved memory size represents the total stack allocation in virtual memory. As such, the reserved size is limited to the virtual address range. The initially committed pages do not utilize physical memory until they are referenced; however, they do remove pages from the system total commit limit, which is the size of the page file plus the size of the physical memory. The system commits additional pages from the reserved stack memory as they are needed, until either the stack reaches the reserved size minus one page (which is used as a guard page to prevent stack overflow) or the system is so low on memory that the operation fails.

It is best to choose as small a stack size as possible and commit the stack that is needed for the thread or fiber to run reliably. Every page that is reserved for the stack cannot be used for any other purpose. A stack is freed when its thread exits. It is not freed if the thread is terminated by another thread.

The default size for the reserved and initially committed stack memory is specified in the executable file header. Thread or fiber creation fails if there is not enough memory to reserve or commit the number of bytes requested. To specify a different default stack size for all threads and fibers, use the STACKSIZE statement in the module definition (.def) file.

To change the initially committed stack space, use the dwStackSize parameter of the CreateThread(), CreateRemoteThread(), or CreateFiber() function. This value is rounded up to the nearest page. Generally, the reserve size is the default reserve size specified in the executable header. However, if the initially committed size specified by dwStackSize is larger than the default reserve size, the reserve size is this new commit size rounded up to the nearest multiple of 1 MB.

To change the reserved stack size, set the dwCreationFlags parameter of CreateThread() or CreateRemoteThread() to STACK_SIZE_PARAM_IS_A_RESERVATION and use the dwStackSize parameter. In this case, the initially committed size is the default size specified in the executable header. For fibers, use the dwStackReserveSize parameter of CreateFiberEx(). The committed size is specified in the dwStackCommitSize parameter.
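The rounding rules for dwStackSize are plain integer arithmetic, sketched below in portable C. This models only the rules stated above and does not query the system: the 4096-byte page size is an assumption (a real program would obtain it from GetSystemInfo()), and round_up(), committed_size() and reserved_size() are hypothetical helper names.

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES ((size_t)4096)          /* assumed page size */
#define ONE_MB          ((size_t)1024 * 1024)

/* Round value up to the next multiple of 'multiple' (multiple > 0). */
size_t round_up(size_t value, size_t multiple)
{
    return ((value + multiple - 1) / multiple) * multiple;
}

/* Initially committed size: 0 selects the executable's default,
   anything else is rounded up to the nearest page. */
size_t committed_size(size_t requested, size_t default_commit)
{
    if (requested == 0)
        return default_commit;
    return round_up(requested, PAGE_SIZE_BYTES);
}

/* Reserved size when dwStackSize is treated as a commit size: the default
   reserve is used unless the commit size is larger, in which case the
   commit size is rounded up to the nearest multiple of 1 MB. */
size_t reserved_size(size_t requested_commit, size_t default_reserve)
{
    size_t commit = round_up(requested_commit, PAGE_SIZE_BYTES);

    if (commit > default_reserve)
        return round_up(commit, ONE_MB);
    return default_reserve;
}
```

For example, with a default reserve of 1 MB, a request for 5000 bytes commits 8192 bytes and leaves the reserve at 1 MB, while a request for 1 MB plus one byte raises the reserve to 2 MB.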


Thread Handles and Identifiers


When a new thread is created by the CreateThread() or CreateRemoteThread() function, a handle to the thread is returned. By default, this handle has full access rights, and — subject to security access checking — can be used in any of the functions that accept a thread handle. This handle can be inherited by child processes, depending on the inheritance flag specified when it is created. The handle can be duplicated by DuplicateHandle(), which enables you to create a thread handle with a subset of the access rights. The handle is valid until closed, even after the thread it represents has been terminated. The CreateThread() and CreateRemoteThread() functions also return an identifier that uniquely identifies the thread throughout the system. A thread can use the GetCurrentThreadId() function to get its own thread identifier. The identifiers are valid from the time the thread is created until the thread has been terminated. Note that no thread identifier will ever be 0. If you have a thread identifier, you can get the thread handle by calling the OpenThread() function. OpenThread() enables you to specify the handle's access rights and whether it can be inherited.

For Windows NT, Windows Me/98/95: There is no way to get the thread handle from the thread identifier. If the handles were made available this way, the owning process could fail because another process unexpectedly performed an operation on one of its threads, such as suspending it, resuming it, adjusting its priority, or terminating it. Instead, you must request the handle from the thread creator or the thread itself.

A thread can use the GetCurrentThread() function to retrieve a pseudo handle to its own thread object. This pseudo handle is valid only for the calling process; it cannot be inherited or duplicated for use by other processes. To get the real handle to the thread, given a pseudo handle, use the DuplicateHandle() function.


Suspending Thread Execution


A thread can suspend and resume the execution of another thread using the SuspendThread() and ResumeThread() functions. While a thread is suspended, it is not scheduled for time on the processor. The SuspendThread() function is not particularly useful for synchronization because it does not control the point in the code at which the thread's execution is suspended. However, you might want to suspend a thread in a situation where you are waiting for user input that could cancel the work the thread is performing. If the user input cancels the work, have the thread exit; otherwise, call ResumeThread().

If a thread is created in a suspended state (with the CREATE_SUSPENDED flag), it does not begin to execute until another thread calls ResumeThread() with a handle to the suspended thread. This can be useful for initializing the thread's state before it begins to execute. Suspending a thread at creation can be useful for one-time synchronization, because this ensures that the suspended thread will execute the starting point of its code when you call ResumeThread().

A thread can temporarily yield its execution for a specified interval by calling the Sleep() or SleepEx() functions. This is useful particularly in cases where the thread responds to user interaction, because it can delay execution long enough to allow users to observe the results of their actions. During the sleep interval, the thread is not scheduled for time on the processor.

The SwitchToThread() function is similar to Sleep() and SleepEx(), except that you cannot specify the interval. SwitchToThread() allows the thread to give up its time slice.


Synchronizing Execution of Multiple Threads


To avoid race conditions and deadlocks, it is necessary to synchronize access by multiple threads to shared resources. Synchronization is also necessary to ensure that interdependent code is executed in the proper sequence.  There are a number of objects whose handles can be used to synchronize multiple threads. These objects include:

  1. Console input buffers.
  2. Events.
  3. Mutexes.
  4. Processes.
  5. Semaphores.
  6. Threads.
  7. Timers.


The state of each of these objects is either signaled or not signaled. When you specify a handle to any of these objects in a call to one of the wait functions, the execution of the calling thread is blocked until the state of the specified object becomes signaled. Some of these objects are useful in blocking a thread until some event occurs. For example, a console input buffer handle is signaled when there is unread input, such as a keystroke or mouse button click. Process and thread handles are signaled when the process or thread terminates. This allows a process, for example, to create a child process and then block its own execution until the new process has terminated.

Other objects are useful in protecting shared resources from simultaneous access. For example, multiple threads can each have a handle to a mutex object. Before accessing a shared resource, the threads must call one of the wait functions to wait for the state of the mutex to be signaled. When the mutex becomes signaled, only one waiting thread is released to access the resource. The state of the mutex is immediately reset to not signaled so any other waiting threads remain blocked. When the thread is finished with the resource, it must set the state of the mutex to signaled to allow other threads to access the resource.

For the threads of a single process, critical-section objects provide a more efficient means of synchronization than mutexes. A critical section is used like a mutex to enable one thread at a time to use the protected resource. A thread can use the EnterCriticalSection() function to request ownership of a critical section. If it is already owned by another thread, the requesting thread is blocked. A thread can use the TryEnterCriticalSection() function to request ownership of a critical section, without blocking upon failure to obtain the critical section. After it receives ownership, the thread is free to use the protected resource. The execution of the other threads of the process is not affected unless they attempt to enter the same critical section.

The WaitForInputIdle() function makes a thread wait until a specified process is initialized and waiting for user input with no input pending. Calling WaitForInputIdle() can be useful for synchronizing parent and child processes, because CreateProcess() returns without waiting for the child process to complete its initialization.

Synchronization of processes and threads is discussed further in Windows Processes & Threads Synchronization.


Multiple Threads and GDI Objects


To enhance performance, access to graphics device interface (GDI) objects (such as palettes, device contexts, regions, and the like) is not serialized. This creates a potential danger for processes that have multiple threads sharing these objects. For example, if one thread deletes a GDI object while another thread is using it, the results are unpredictable. This danger can be avoided simply by not sharing GDI objects. If sharing is unavoidable (or desirable), the application must provide its own mechanisms for synchronizing access.


Thread Local Storage


All threads of a process share its virtual address space. The local variables of a function are unique to each thread that runs the function. However, the static and global variables are shared by all threads in the process. With thread local storage (TLS), you can provide unique data for each thread that the process can access using a global index. One thread allocates the index, which can be used by the other threads to retrieve the unique data associated with the index. The constant TLS_MINIMUM_AVAILABLE defines the minimum number of TLS indexes available in each process. This minimum is guaranteed to be at least 64 for all systems. The limits are as follows:




Windows version                  TLS index limit
New Windows Version...           ...
Windows XP and Windows 2000      1088 indexes per process
Windows 98/Me                    80 indexes per process
Windows NT and Windows 95        64 indexes per process


Table 8
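The split between a process-wide index and a per-thread slot can be modelled in portable C. This is a toy model meant only to show the data layout, not the real implementation: ProcessModel, ThreadModel and the model_tls_*() functions are hypothetical names, and the slot count of 64 simply reuses the guaranteed minimum mentioned above.

```c
#include <stddef.h>

#define MODEL_TLS_SLOTS    64             /* the guaranteed minimum */
#define MODEL_TLS_NO_INDEX ((unsigned)-1) /* returned when no index is free */

/* Shared by all threads: which indexes have been allocated. */
typedef struct {
    unsigned char in_use[MODEL_TLS_SLOTS];
} ProcessModel;

/* Private to each thread: one pointer-sized slot per index. */
typedef struct {
    void *slots[MODEL_TLS_SLOTS];
} ThreadModel;

/* Allocate the first free index for the whole process. */
unsigned model_tls_alloc(ProcessModel *p)
{
    unsigned i;

    for (i = 0; i < MODEL_TLS_SLOTS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = 1;
            return i;
        }
    }
    return MODEL_TLS_NO_INDEX;
}

/* Store a value in the calling thread's slot. Returns 1 on success. */
int model_tls_set(const ProcessModel *p, ThreadModel *t,
                  unsigned index, void *value)
{
    if (index >= MODEL_TLS_SLOTS || !p->in_use[index])
        return 0;
    t->slots[index] = value;
    return 1;
}

/* Read the calling thread's slot; NULL if unset or the index is invalid. */
void *model_tls_get(const ProcessModel *p, const ThreadModel *t,
                    unsigned index)
{
    if (index >= MODEL_TLS_SLOTS || !p->in_use[index])
        return NULL;
    return t->slots[index];
}
```

Both structures must start zeroed, which matches slots being initialized to NULL. Two ThreadModel values that use the same index hold independent pointers, which is the property that makes the single global index safe to share between threads.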


When the threads are created, the system allocates an array of LPVOID values for TLS, which are initialized to NULL. Before an index can be used, it must be allocated by one of the threads. Each thread stores its data for a TLS index in a TLS slot in the array. If the data associated with an index will fit in an LPVOID value, you can store the data directly in the TLS slot. However, if you are using a large number of indexes in this way, it is better to allocate separate storage, consolidate the data, and minimize the number of TLS slots in use.  The following diagram illustrates how TLS works.


Thread Local Storage (TLS) operation


Figure 12


The process has two threads, Thread 1 and Thread 2. It allocates two indexes for use with TLS, gdwTlsIndex1 and gdwTlsIndex2. Each thread allocates two memory blocks (one for each index) in which to store the data, and stores the pointers to these memory blocks in the corresponding TLS slots. To access the data associated with an index, the thread retrieves the pointer to the memory block from the TLS slot and stores it in the lpvData local variable.  It is ideal to use TLS in a dynamic-link library (DLL). Use the following steps to implement TLS in a DLL:


  1. Declare a global variable to contain the TLS index. For example:

static DWORD gdwTlsIndex;

  2. Use the TlsAlloc() function during initialization to allocate the TLS index. For example, include the following call in the DllMain() function during DLL_PROCESS_ATTACH:

gdwTlsIndex = TlsAlloc();

  3. For each thread using the TLS index, allocate memory for the data, then use the TlsSetValue() function to store the address of the memory block in the TLS slot associated with the index. For example, include the following code in your DllMain() during DLL_THREAD_ATTACH:

LPVOID lpvBuffer;
lpvBuffer = (LPVOID) LocalAlloc(LPTR, 256);
TlsSetValue(gdwTlsIndex, lpvBuffer);

  4. When a function requires access to the data associated with a TLS index, specify the index in a call to the TlsGetValue() function. This retrieves the contents of the TLS slot for the calling thread, which in this case is a pointer to the memory block for the data. For example, include the following code in any of the functions in your DLL:

LPVOID lpvData;
lpvData = TlsGetValue(gdwTlsIndex);

  5. When each thread no longer needs to use a TLS index, it must free the memory whose pointer is stored in the TLS slot. When all threads have finished using a TLS index, use the TlsFree() function to free the index. For example, use the following code in your DllMain() during DLL_THREAD_DETACH:

lpvBuffer = TlsGetValue(gdwTlsIndex);
LocalFree((HLOCAL) lpvBuffer);

And the following code during DLL_PROCESS_DETACH:

TlsFree(gdwTlsIndex);



Creating Windows in Threads


Any thread can create a window. The thread that creates the window owns the window and its associated message queue. Therefore, the thread must provide a message loop to process the messages in its message queue. In addition, you must use MsgWaitForMultipleObjects() or MsgWaitForMultipleObjectsEx() in that thread, rather than the other wait functions, so that it can process messages. Otherwise, the system can become deadlocked when the thread is sent a message while it is waiting. The AttachThreadInput() function can be used to allow a set of threads to share the same input state. By sharing input state, the threads share their concept of the active window. By doing this, one thread can always activate another thread's window. This function is also useful for sharing focus state, mouse capture state, keyboard state, and window Z-order state among windows created by different threads whose input state is shared.























