I recently came across a situation where a driver was taking too long to process multi-threaded I/O requests. While investigating, it became apparent that only one of the pending requests was actually reaching the driver's dispatch routine. All the other threads ended up blocking in nt!IopAcquireFileObjectLock (via a call to KeWaitForSingleObject).
My first suspicion was that the device was configured as an exclusive[1] device. The exclusive bit (DO_EXCLUSIVE) is set in the Flags member of DEVICE_OBJECT and can be checked by running the WinDbg command !devobj on the device object. The device did turn out to be exclusive. However, turning off the exclusive bit made no difference: a large number of threads waiting for the driver to process something were still getting blocked in IopAcquireFileObjectLock.
It was time to check the documentation for the exclusive bit, and I quickly realized that my initial suspicion was baseless: DO_EXCLUSIVE does not cause multiple requests to be serialized; it only prevents a second handle from being opened. Since all these requests were going through the same handle (the device was exclusive), it was time to turn to the bits on the handle that might lead the I/O Manager to serialize requests to the device[2].
The first parameter to IopAcquireFileObjectLock turned out to come from the FILE_OBJECT corresponding to the device handle, and running !fileobj on that file object revealed that the FO_SYNCHRONOUS_IO bit was set. This bit is set when the handle is opened for synchronous I/O (the default in Win32), i.e. a read/write/ioctl returns only when the I/O is actually complete. Asynchronous or overlapped I/O, on the other hand, returns immediately, and there are Win32 APIs to check the I/O completion status later.
Could it be that this bit was forcing serialization of all I/O? There was one way to find out: specify FILE_FLAG_OVERLAPPED when opening the device via CreateFile and rerun the tests. And voilà, the driver started processing multiple requests concurrently instead of one at a time.
It did not take long to find direct confirmation of this in the documentation. Here is a quote from "Getting your driver to handle more than one I/O request at a time":
You might think that your driver is blocking in some obscure way or that you need more threads in your application, but the solution is often much simpler: Make sure your application has opened the device for overlapped I/O. Otherwise, the I/O Manager serializes I/O requests by synchronizing through a lock in the file object before dispatching the IRP. Even if your application uses multiple threads, only one request at a time (per file handle) will get through.
The lock in the file object is actually a KEVENT member called Lock[3].
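The behaviour described in the quote can be mimicked with a toy model. The sketch below is not kernel code and not how the I/O Manager is implemented; it is a small Python simulation under my own assumptions, and the names `FileObject`, `Driver`, `send_request` and `max_concurrency` are all hypothetical. It only shows the shape of the mechanism: when the handle is synchronous, every request takes a per-file-object lock before reaching dispatch, so the driver never sees more than one request at a time.

```python
import threading
import time


class FileObject:
    """Toy stand-in for a FILE_OBJECT: one flag plus one per-object lock."""

    def __init__(self, synchronous_io):
        self.synchronous_io = synchronous_io  # models the FO_SYNCHRONOUS_IO bit
        self.lock = threading.Lock()          # models the Lock member


class Driver:
    """Hypothetical driver that records how many requests are in dispatch at once."""

    def __init__(self):
        self._guard = threading.Lock()
        self.in_flight = 0
        self.max_in_flight = 0

    def dispatch(self, hold_seconds):
        with self._guard:
            self.in_flight += 1
            self.max_in_flight = max(self.max_in_flight, self.in_flight)
        time.sleep(hold_seconds)  # pretend the request takes a while to process
        with self._guard:
            self.in_flight -= 1


def send_request(file_object, driver, hold_seconds):
    """Model of the serialization rule: lock first, but only for synchronous handles."""
    if file_object.synchronous_io:
        with file_object.lock:
            driver.dispatch(hold_seconds)
    else:
        driver.dispatch(hold_seconds)


def max_concurrency(synchronous_io, thread_count=4, hold_seconds=0.1):
    """Issue one request per thread on a single shared handle; return the peak overlap."""
    file_object = FileObject(synchronous_io)
    driver = Driver()
    threads = [
        threading.Thread(target=send_request, args=(file_object, driver, hold_seconds))
        for _ in range(thread_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return driver.max_in_flight


if __name__ == "__main__":
    print("synchronous handle:", max_concurrency(True))
    print("overlapped handle: ", max_concurrency(False))
```

For the synchronous handle the peak is always 1, because the lock admits one request at a time. For the overlapped handle the requests overlap in dispatch; exactly how many overlap depends on thread scheduling.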
This was a surprise to me. After all, what does overlapped I/O have to do with request serialization[4]? I had used the FILE_FLAG_OVERLAPPED flag before but never realized that it is required for multiple I/O requests to be processed concurrently. Note that all I/O requests were still being made synchronously, i.e. no OVERLAPPED structures were allocated or passed in calls to ReadFile, WriteFile or DeviceIoControl. So in a sense FILE_FLAG_OVERLAPPED is actually two flags in one:
- It makes overlapped I/O possible. If you do not specify this flag, then even if you pass an OVERLAPPED structure in your I/O requests, the calls still complete synchronously.
- It enables multiple requests to be processed concurrently. If you do not specify this flag, then even if you issue multiple requests on a handle from different threads, they are processed one at a time.
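This has two consequences:
- You cannot have overlapped I/O serialized for you: requests on a handle opened with FILE_FLAG_OVERLAPPED do not go through the file object lock.
- If you want to do synchronous I/O from multiple threads and you want those requests to run in parallel, you must specify FILE_FLAG_OVERLAPPED even though you are not planning to do overlapped I/O.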
[1] When a driver sets up a device for exclusive access (by calling IoCreateDevice or IoCreateDeviceSecure and passing TRUE for the BOOLEAN Exclusive parameter), only one handle can be opened to the device. An attempt to open an already-open exclusive device leads to CreateFile returning INVALID_HANDLE_VALUE, and GetLastError returns 5 (ERROR_ACCESS_DENIED).
[2] I liked this new direction. After all, it was a file object lock that was blocking all these threads, wasn't it?
[3] At offset 0x4c of FILE_OBJECT on x86 XP SP3 and x86 Vista SP2.
[4] I would realize later that request serialization would, in general, be a bad thing for overlapped I/O.