Asynchronous I/O request support for Windows Embedded Compact 7
Windows NT and its successors, including Windows 7, have supported asynchronous I/O requests as part of the device driver model. This feature lets application developers perform other tasks while a lengthy I/O operation completes, instead of being blocked by it. Applications access device driver services via file system APIs such as ReadFile or WriteFile. While Windows CE had a microkernel architecture, the lack of asynchronous I/O request support in the device driver model was not really a problem. However, the move to a monolithic kernel architecture in Windows CE 6 exposed the need for this support, and the capability became a reality in Windows Embedded Compact 7.
In Windows CE versions prior to version 7, a call to ReadFile made the application wait for the I/O request to complete before ReadFile returned. Although ReadFile, for example, had an argument for passing in an OVERLAPPED structure, the implementation ignored it and it had to be set to NULL. This meant that an application developer who wanted to perform other tasks while waiting for a lengthy I/O operation had to implement a simple thread for this purpose, along these lines:
boolResult = ReadFile(….);
SetEvent // to notify caller thread of IO completion
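To make the worker-thread workaround concrete, here is a minimal sketch in portable C++ rather than the Win32 API. The names CompletionEvent, BlockingRead and ReadOnWorkerThread are hypothetical stand-ins: CompletionEvent plays the part of the event that SetEvent would signal, and BlockingRead plays the part of a blocking ReadFile call with a simulated device delay.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a manual-reset event object.
class CompletionEvent {
public:
    void Set() {
        {
            std::lock_guard<std::mutex> lock(m_);
            signaled_ = true;
        }
        cv_.notify_all();
    }
    void Wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return signaled_; });
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Hypothetical stand-in for a blocking ReadFile call: it returns only
// after the whole buffer has been filled.
std::size_t BlockingRead(std::vector<unsigned char>& buffer) {
    assert(!buffer.empty());
    std::this_thread::sleep_for(std::chrono::milliseconds(20)); // simulated device latency
    for (std::size_t i = 0; i < buffer.size(); ++i) {
        buffer[i] = static_cast<unsigned char>(i);
    }
    return buffer.size();
}

// Runs the blocking read on a worker thread while the caller does other work.
// Returns the number of bytes read; otherWorkDone counts the caller's own work items.
std::size_t ReadOnWorkerThread(std::size_t length, std::size_t& otherWorkDone) {
    std::vector<unsigned char> buffer(length);
    std::size_t bytesRead = 0;
    CompletionEvent done;

    std::thread worker([&] {
        bytesRead = BlockingRead(buffer); // blocks only the worker thread
        done.Set();                       // notify the caller thread of I/O completion
    });

    otherWorkDone = 0;
    for (int i = 0; i < 1000; ++i) {
        otherWorkDone += 1;               // the caller's non-I/O processing
    }

    done.Wait();                          // pick up the result once the read has finished
    worker.join();
    return bytesRead;
}
```

Calling ReadOnWorkerThread(256, work) returns 256 once the worker has signaled completion. The point of the sketch is that the application itself has to own the thread and the event, which is exactly the burden that driver-level asynchronous I/O removes.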
Clearly, I/O requests that complete in a short time span do not need this capability in a device driver, because a call by the application to an API such as ReadFile returns almost immediately. A lengthy I/O operation, however, means that the application is blocked while it takes place, and valuable time is lost for non-I/O processing. Letting the kernel device driver take charge of this process in parallel with the user mode application relieves the application designer of the complexity of handling I/O. In this case the same ReadFile API returns immediately while the device driver goes about handling the I/O request; the application is not blocked and is free to perform non-I/O processing while the read operation continues in the background. The difference between synchronous and asynchronous I/O request handling can be seen in figure 1.
For Windows Embedded Compact 7 stream device drivers to support asynchronous I/O requests, Device Manager has to pass the OVERLAPPED structure on to the device driver through a coherent protocol and provide the application with APIs to handle I/O request completion notification and cancellation. To this end, Device Manager has a new I/O packet manager component built in.
Figure 1 Asynchronous vs. Synchronous I/O request handling
Device Manager in Windows Embedded Compact 7 handles all I/O requests via the I/O Packet Manager, which creates an I/O request packet for every such request. Depending on whether or not a valid OVERLAPPED structure has been passed in, the I/O packet is created either as an asynchronous request packet or as a free synchronous request packet. The I/O packet itself does most of the processing of a request. To better understand how this works, study the three files located in %WINCEROOT%\private\winceos\COREOS\device\devcore: “iopck.hpp”, “iopck.cpp” and “filemgr.cpp”.
The following flow diagram in figure 2 shows how Device Manager processes the OVERLAPPED structure and passes the information to the device driver.
Figure 2 Asynchronous I/O call handling by Device Manager
Not every device driver has to support asynchronous I/O request handling. The only reason to add this capability to a device driver is that the data transfers between the caller process and the device driver are so large that the caller process would be stalled while a lengthy I/O request is processed. This is the motivation you would have as a device driver designer to take on the extra complexity needed to support asynchronous I/O request handling.
One of the complexities of implementing asynchronous I/O request support is setting up threads to handle the requests. Such requests are initiated when the caller process passes an initialized OVERLAPPED structure to the device driver, and the driver has to respond correctly to multiple such requests, so each request needs to be handled separately. As tempting as it may be to use the IST as a handler for input requests, beware of this practice. The reason is that an IST is initialized during device driver initialization and has no information about the event object passed in the OVERLAPPED structure by the caller process. More than that, an IST is triggered by an interrupt from the peripheral hardware and not by a caller process request.
The IST has no business taking on the role of handling I/O requests from the caller process, because the caller process has no prior knowledge of the hardware's interaction with the device driver. Once the IST has finished handling an interrupt raised by the hardware, its work is done. When a caller process asks to read this data, the device driver returns the data buffer to the caller. If the buffers are large enough to warrant asynchronous I/O request handling, the driver should create a dedicated thread for the request and terminate the thread on completion. The complexity of trying to avoid a dedicated thread for an asynchronous I/O request outweighs the code required to implement such a thread, which is extremely simple.
In Windows Embedded Compact 7 the Device Manager implementation has been extended to support this asynchronous mode of I/O handling. If a device driver is implemented to support it, an application can call ReadFile, which returns immediately while the device driver launches the read I/O operation. Device Manager then sets an event, created by the calling application and provided to the device driver through the OVERLAPPED structure, to notify the application that the read operation has completed.
Device Manager exposes three new functions that can be used by kernel mode device drivers to support asynchronous I/O requests:
· CreateAsyncIoHandle - This function is used by a device driver to change the implementations of the XXX_Read, XXX_Write and XXX_IOControl entry points to support asynchronous I/O
· CompleteAsyncIo - This function signals completion of asynchronous I/O and updates the final count of bytes that were completed by asynchronous I/O
· SetIoProgress - This function is used by an asynchronous device driver to update the status and progress on an asynchronous I/O request
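The division of labor between these three functions can be illustrated with a small portable C++ model. This is only a sketch under stated assumptions, not the Device Manager API: AsyncRequestModel, ReportProgress, Complete, CopyInChunks and RunModelRequest are hypothetical names, and the 4096-byte chunk size is arbitrary. ReportProgress plays the role described for SetIoProgress, and Complete plays the role described for CompleteAsyncIo.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical model of one asynchronous request; NOT the Device Manager API.
class AsyncRequestModel {
public:
    // Role of SetIoProgress: publish how many bytes have been handled so far.
    void ReportProgress(std::size_t bytesSoFar) { progress_.store(bytesSoFar); }

    // Role of CompleteAsyncIo: record the final byte count and signal the caller.
    void Complete(std::size_t finalBytes) {
        {
            std::lock_guard<std::mutex> lock(m_);
            finalBytes_ = finalBytes;
            completed_ = true;
        }
        cv_.notify_all();
    }

    std::size_t Progress() const { return progress_.load(); }

    // Caller side: block until Complete() has run, then return the final count.
    std::size_t WaitForCompletion() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return completed_; });
        return finalBytes_;
    }

private:
    std::atomic<std::size_t> progress_{0};
    std::mutex m_;
    std::condition_variable cv_;
    bool completed_ = false;
    std::size_t finalBytes_ = 0;
};

// Driver-side worker: copies the input in chunks and reports progress per chunk.
void CopyInChunks(AsyncRequestModel& request,
                  const std::vector<unsigned char>& input,
                  std::vector<unsigned char>& driverBuffer,
                  std::size_t chunkSize) {
    assert(chunkSize > 0 && driverBuffer.size() >= input.size());
    std::size_t done = 0;
    while (done < input.size()) {
        const std::size_t n = std::min(chunkSize, input.size() - done);
        std::memcpy(driverBuffer.data() + done, input.data() + done, n);
        done += n;
        request.ReportProgress(done);
    }
    request.Complete(done);
}

// Starts the worker for one request. A real driver entry point would return
// right after starting the thread; this model waits so the result is observable.
std::size_t RunModelRequest(const std::vector<unsigned char>& input,
                            std::vector<unsigned char>& driverBuffer) {
    AsyncRequestModel request;
    std::thread worker([&] { CopyInChunks(request, input, driverBuffer, 4096); });
    const std::size_t finalBytes = request.WaitForCompletion();
    worker.join();
    return finalBytes;
}
```

With a 10000-byte input, RunModelRequest returns 10000 and the driver buffer matches the input. The model keeps progress and completion separate on purpose: progress can be sampled at any time without locking, while completion is a one-time event that carries the final byte count.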
The following code example shows how to implement an asynchronous I/O operation within a device driver and one possible way of processing it. In this example a device driver implements a long write operation (not really that long in the example, still) within the XXX_IOControl entry point. To perform the long write operation the device driver creates a special thread that actually copies the input buffer to the device driver’s local buffer. Not too smart, but it really drives the point home. The only code that distinguishes this device driver from any other kernel mode stream device driver is this particular behavior.
Listing 1 is a snippet of code that demonstrates the implementation of the XXX_IOControl entry point. The code marked in red signifies the special additions that Device Manager provides for device driver developers to support asynchronous I/O handling. The first thing to notice is the new argument passed to the XXX_IOControl entry point, the hAsyncRef handle that points to an I/O packet object. The next offering by Device Manager is the function CreateAsyncIoHandle. This function connects the I/O packet to an asynchronous buffer to allocate a copy buffer, in this instance just the input buffer.
In Listing 2 the code sets up the parameters to be passed on to the thread that will process the I/O operation, then creates the thread and runs it. The rest is up to the thread. The code here is simple and leaves the thread at priority level 251. When designing your driver you may want to change the priority of this thread if the I/O operation is huge. However, you should remember the impact this will have on the system, and consider the priority of the user mode caller thread.
The thread itself performs the I/O operation, and when the operation completes, the code calls another function exposed by Device Manager, CompleteAsyncIo, to notify the caller process that the I/O operation has completed. As a result the event that was created by the caller process is signaled.
The caller process itself is a very simple console application that opens the device driver, creates an OVERLAPPED structure and an event, and calls DeviceIoControl; it is shown in Listing 3. For the sake of clarity, the code listings omit most debug messages. The code in bold red font emphasizes the code specific to asynchronous I/O request support.
BOOL TST_IOControl(DWORD hOpenContext, DWORD dwCode, PBYTE pBufIn, DWORD dwLenIn, PBYTE pBufOut, DWORD dwLenOut, PDWORD pdwActualOut, HANDLE hAsyncRef)
PTST_DEVCONTXT pDevContxt = (PTST_DEVCONTXT)hOpenContext;
BOOL bRet = TRUE;
DWORD dwErr = 0;
HANDLE hAsyncIO = NULL;
hAsyncIO = CreateAsyncIoHandle(hAsyncRef,
g_AsyncIOParams.pSrcBufIn = (PBYTE)pBufIn;
(PVOID)pBufIn, dwLenIn, ARG_I_PTR, TRUE);
g_AsyncTestParams.hAsyncIO = hAsyncIO;
g_AsyncTestParams.dwInLen = dwLenIn;
g_hAsyncThread = CreateThread(NULL,163840,
CREATE_SUSPENDED | STACK_SIZE_PARAM_IS_A_RESERVATION,
if (g_hAsyncThread == NULL)
// Raise the async thread priority if you want the I/O operation
// not to last an eternity
// pass back appropriate response codes
Listing 1 Adding asynchronous I/O handling support
The following code shows the thread parameters and the thread itself. Of course this is a very simple example, and you may want to consider having an array of parameter structures to allow for more than one thread at a time. You may also want the structure to hold an output buffer pointer and length, so that the structure is generic for both input and output I/O request handling threads. The pSrcBufIn field is there to allow closing the pointer to the marshalled caller buffer that was created in the TST_IOControl function.
// Parameters structure for I/O thread
typedef struct _IOCTLWTPARAMS_tag
volatile HANDLE hAsyncIO;
volatile PBYTE pBufIn;
volatile DWORD dwInLen;
// Global variables
DWORD AsyncTestThread(LPVOID lpParameter)
PIOCTLWTPARAMS pParam = (PIOCTLWTPARAMS)lpParameter;
BOOL bComplete = 0;
PBYTE pBuf = pParam->pBufIn;
for (int i = 0; i < (int)pParam->dwInLen; i++)
buf[i] = *pBuf++;
if (pParam->hAsyncIO != NULL)
bComplete = CompleteAsyncIo(pParam->hAsyncIO,
// Remember that we didn’t close the caller buffer in TST_IOControl
hRes = CeCloseCallerBuffer(pParam->pBufIn, pParam->pSrcBufIn,
Listing 2 Thread for I/O handling
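The suggestion of allowing more than one I/O thread at a time, with a structure that carries both input and output information, can be sketched in portable C++ as follows. Instead of an array of parameter structures, this sketch swaps in one heap-allocated parameter block per request, owned by the worker thread that processes it. IoThreadParams, IoThread and RunConcurrentRequests are hypothetical names and do not appear in the driver code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical per-request parameter block (replaces a single global structure).
struct IoThreadParams {
    int requestId = 0;                             // illustrative identifier only
    std::vector<unsigned char> input;              // this request's input data
    std::vector<unsigned char>* output = nullptr;  // where this request writes its result
};

// Each worker owns its parameter block, so concurrent requests never share state.
void IoThread(std::unique_ptr<IoThreadParams> params) {
    assert(params && params->output);
    *params->output = params->input;               // stand-in for the lengthy I/O
}

// Launches one worker per request, each with its own parameter block,
// and returns the per-request results after all workers have finished.
std::vector<std::vector<unsigned char>> RunConcurrentRequests(std::size_t count,
                                                              std::size_t length) {
    std::vector<std::vector<unsigned char>> results(count);
    std::vector<std::thread> workers;
    workers.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        auto params = std::make_unique<IoThreadParams>();
        params->requestId = static_cast<int>(i);
        params->input.assign(length, static_cast<unsigned char>(i));
        params->output = &results[i];              // results is not resized while workers run
        workers.emplace_back(IoThread, std::move(params));
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return results;
}
```

Handing ownership of the block to the thread means the block is released automatically when the thread function returns, and no request can overwrite the parameters of another request that is still in flight, which is the risk with a single global structure.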
The following code is a very simple demonstration application that calls for the asynchronous I/O operation. In figure 3 you can see the output resulting from running this demo application.
#define WRITE_TEST_STRING_SIZE 65536
int _tmain(int argc, TCHAR *argv[], TCHAR *envp[])
BOOL bRet = FALSE;
volatile OVERLAPPED ovlpd;
HANDLE hCompltEvent = NULL;
HANDLE hTstDrvr = INVALID_HANDLE_VALUE;
DWORD dwBytes = 0;
memset((void*)&ovlpd, 0, sizeof(ovlpd));
// Try open an instance of TestDrvr
hTstDrvr = CreateFile(L"TST1:",
GENERIC_READ | GENERIC_WRITE,
if (INVALID_HANDLE_VALUE == hTstDrvr)
DWORD bdw = GetLastError();
// Format message and printf it
// Create a completion event for IOControl IO operation
ovlpd.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
for (int i = 0; i < WRITE_TEST_STRING_SIZE; i++)
szBuf[i] = i;
bRet = DeviceIoControl(hTstDrvr,IOCTL_TESTDRVR_ASYNCTEST,
szBuf, WRITE_TEST_STRING_SIZE, NULL,
while (!bRet) // I/O is not done yet
bRet = GetOverlappedResult(hTstDrvr, (LPOVERLAPPED)&ovlpd,
if (!bRet )
_tprintf(_T("Asynch IO is not yet completed %d bytes
_tprintf(_T("DeviceIoControl has completed operation\r\n"));
Listing 3 User mode application to demonstrate asynchronous I/O handling
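The polling pattern used by the demo application, issuing the request and then repeatedly checking whether it has completed, can be modeled in portable C++ with std::async and a timed wait. This is a sketch only: SlowWrite and WriteAndPoll are hypothetical names, the 30 ms delay simulates the driver's lengthy operation, and the 5 ms poll interval is an arbitrary choice.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Hypothetical stand-in for the driver's lengthy write; returns the bytes consumed.
std::size_t SlowWrite(const std::vector<unsigned char>& data) {
    std::this_thread::sleep_for(std::chrono::milliseconds(30)); // simulated I/O time
    return data.size();
}

// Issues the request, then polls for completion with a short timed wait.
// pollCount reports how many times the request was found still pending.
std::size_t WriteAndPoll(const std::vector<unsigned char>& data, int& pollCount) {
    assert(!data.empty());
    std::future<std::size_t> pending =
        std::async(std::launch::async, [&data] { return SlowWrite(data); });

    pollCount = 0;
    while (pending.wait_for(std::chrono::milliseconds(5)) != std::future_status::ready) {
        ++pollCount; // the application is free to do non-I/O work here
    }
    return pending.get(); // final byte count, available once the request has completed
}
```

A timed wait between polls keeps the loop from spinning at full speed while the request is pending. In a Win32 application, waiting on the completion event with a timeout would serve a similar purpose.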
Figure 3 Running the asynchronous IO request demo application
 Note that a context switch may occur