Published Wednesday, July 20, 2005 6:34 AM by robertvv

Demystifying IRQL_NOT_LESS_OR_EQUAL

Anyone working with Windows NT/2000/XP/2003 has probably run into a BSOD (Blue Screen Of Death) on more than one occasion. Some BSODs (also called bug checks or stop screens) appear more often than others. One of the most common bug checks is 0x0000000A, aka IRQL_NOT_LESS_OR_EQUAL. In this blog I'll try to dispel some of the myth surrounding this bug check. But before I explain what this bug check is all about, I'll talk a bit about drivers, interrupts, driver threads and the dispatcher.

Drivers:
The main responsibility of a driver is to communicate with I/O devices on behalf of a process (or thread). These devices are either hardware devices or "virtual" devices. In the first case we speak of a function driver, in the latter case of a filter driver. Actually it's a bit more complicated, since there are upper-filter drivers and lower-filter drivers, bus drivers, class drivers and port drivers, but for the sake of clarity I'll stick with function drivers and filter drivers. An example of a function driver is atapi.sys, which is responsible for accessing ATA-based devices. An example of a filter driver is ndis.sys, which is NT's Network Driver Interface Specification (NDIS) library driver.

Every process/thread runs on the CPU at one time or another. A CPU communicates with the outside world using interrupts, and this is true for all commonly known CPU architectures today. An interrupt tells the CPU to stop what it is doing and start doing something else. When that task is finished, the CPU returns to the point where it was before it got interrupted. There are two main types of interrupts:

  1. Hardware-initiated interrupts, usually raised externally by a device.
  2. Software-initiated interrupts, raised internally by the kernel.

Interrupts:
In either case, the CPU enters an ISR (Interrupt Service Routine) and starts doing what it should do. For physical devices the interrupt is acknowledged and a Deferred Procedure Call (DPC) is queued to complete the I/O operation. When an interrupt is raised (either externally or internally), the CPU must determine whether it should grant the interrupt or ignore it for the time being. This is done by setting a value in its interrupt mask and, on multiprocessor systems, acquiring a system spinlock (in case of an external interrupt), so that the same interrupt is not serviced by another CPU at the same time. All interrupts with a level higher than the value set in the interrupt mask are granted. All others are ignored, but stay raised until either the CPU allows them or the initiator cancels its request. When the CPU is in an ISR and another interrupt with a higher Interrupt Request Level (IRQL) becomes pending, that interrupt is granted and its ISR is executed first. This is normal behaviour for all processor architectures.
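
To make the masking rule a bit more tangible, here is a toy model in Python. It is only a sketch of the idea, not how the HAL or the kernel implements it: the class ToyCpu, the device names and the levels 3, 5 and 8 are all made up for illustration.

```python
"""Toy model of IRQL-based interrupt masking. Not real kernel code."""

PASSIVE_LEVEL = 0


class ToyCpu:
    def __init__(self):
        self.irql = PASSIVE_LEVEL  # level the CPU is currently running at
        self.pending = []          # (level, name, isr) raised but not yet serviced
        self.log = []              # order in which ISRs are entered and left

    def raise_interrupt(self, level, name, isr=None):
        """A device (or the kernel) requests an interrupt at the given level."""
        self.pending.append((level, name, isr))
        self._deliver()

    def _deliver(self):
        """Service every pending interrupt whose level is above the current IRQL."""
        while True:
            ready = [p for p in self.pending if p[0] > self.irql]
            if not ready:
                return  # everything left is masked and stays pending
            entry = max(ready, key=lambda p: p[0])  # highest level first
            self.pending.remove(entry)
            level, name, isr = entry
            previous = self.irql
            self.irql = level              # mask this level and everything below
            self.log.append("enter " + name)
            if isr is not None:
                isr(self)                  # the ISR body may itself be interrupted
            self.log.append("leave " + name)
            self.irql = previous           # unmask again, then re-check pending


def disk_isr(cpu):
    # Hypothetical ISR body: two more interrupts arrive while it is running.
    cpu.raise_interrupt(3, "keyboard")  # lower level: stays pending
    cpu.raise_interrupt(8, "timer")     # higher level: preempts this ISR


cpu = ToyCpu()
cpu.raise_interrupt(5, "disk", disk_isr)
for line in cpu.log:
    print(line)
```

In this model the timer ISR runs in the middle of the disk ISR, while the keyboard interrupt is only serviced after the disk ISR has returned and the level has dropped again.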

For software interrupts the kernel uses several IRQLs to prioritize the code running on the system. The Windows kernel uses three levels for this: PASSIVE_LEVEL, APC_LEVEL and DISPATCH_LEVEL, the first three rows of the table below.

IRQL                            IRQL value      Description
                                x86  IA64 AMD64
PASSIVE_LEVEL                   0    0    0     User threads and most kernel-mode operations
APC_LEVEL                       1    1    1     Asynchronous procedure calls and page faults
DISPATCH_LEVEL                  2    2    2     Thread scheduler and deferred procedure calls (DPCs)
CMC_LEVEL                       N/A  3    N/A   Correctable machine-check level (IA64 platforms only)
Device interrupt levels (DIRQL) 3-26 4-11 3-11  Device interrupts
PC_LEVEL                        N/A  12   N/A   Performance counter (IA64 platforms only)
PROFILE_LEVEL                   27   15   15    Profiling timer for releases earlier than Windows 2000
SYNCH_LEVEL                     27   13   13    Synchronization of code and instruction streams across processors
CLOCK_LEVEL                     N/A  13   13    Clock timer
CLOCK2_LEVEL                    28   N/A  N/A   Clock timer for x86 hardware
IPI_LEVEL                       29   14   14    Interprocessor interrupt for enforcing cache consistency
POWER_LEVEL                     30   15   14    Power failure
HIGH_LEVEL                      31   15   15    Machine checks and catastrophic errors; profiling timer for Windows XP and later

It's the responsibility of an ISR to run at IRQL = DIRQL for as short a time as possible. Preferably, driver code runs at IRQL = PASSIVE_LEVEL. The dispatcher itself runs at IRQL = DISPATCH_LEVEL, so any code running at IRQL >= DISPATCH_LEVEL prevents the dispatcher from running and effectively blocks all other threads on that processor. Code that runs at IRQL >= DISPATCH_LEVEL can never be pageable, since the system thread handling page faults runs at IRQL = APC_LEVEL.
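
The usual way to keep the time spent at DIRQL short is the ISR/DPC split mentioned earlier: the ISR only acknowledges the device and queues a DPC, and the real work happens later at DISPATCH_LEVEL. The following Python sketch models that hand-over. ToyKernel, disk_isr, disk_dpc and the DIRQL value 5 are hypothetical names and numbers chosen for the example; a real driver would use the kernel's own DPC routines instead.

```python
"""Toy model of an ISR deferring its work to a DPC. Not real kernel code."""
from collections import deque

PASSIVE_LEVEL, APC_LEVEL, DISPATCH_LEVEL = 0, 1, 2
DIRQL = 5  # hypothetical device interrupt level


class ToyKernel:
    def __init__(self):
        self.irql = PASSIVE_LEVEL
        self.dpc_queue = deque()  # DPCs waiting to run
        self.trace = []           # (irql, message) pairs, in order

    def lower_irql(self, new_irql):
        """Lower the IRQL; drain queued DPCs before dropping below DISPATCH_LEVEL."""
        if new_irql < DISPATCH_LEVEL and self.dpc_queue:
            self.irql = DISPATCH_LEVEL
            while self.dpc_queue:
                dpc = self.dpc_queue.popleft()
                dpc(self)
        self.irql = new_irql

    def device_interrupt(self, isr):
        """Run an ISR at DIRQL, then return to the interrupted level."""
        previous = self.irql
        self.irql = DIRQL
        isr(self)
        self.lower_irql(previous)


def disk_isr(kernel):
    # Keep it short: acknowledge the device and defer the rest.
    kernel.trace.append((kernel.irql, "ISR: acknowledge device"))
    kernel.dpc_queue.append(disk_dpc)


def disk_dpc(kernel):
    # Runs later, at DISPATCH_LEVEL, where device interrupts are not masked.
    kernel.trace.append((kernel.irql, "DPC: complete I/O request"))


kernel = ToyKernel()
kernel.device_interrupt(disk_isr)
for irql, message in kernel.trace:
    print(irql, message)
```

Note that in this model the DPC still runs at DISPATCH_LEVEL, so it blocks the dispatcher as well. It only frees up the device interrupt levels sooner.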

Driver threads:
Although a driver can create a new thread by calling PsCreateSystemThread(), drivers rarely do so, because switching thread context is a relatively time-consuming operation that can degrade driver performance. A driver might create a dedicated thread only for continually repeated or long-term activities. For temporary, short-term tasks, a driver can use a system-supplied thread by queuing a work item with IoQueueWorkItem().
Whenever a driver thread needs to raise its IRQL (usually to DIRQL), it calls the KeSynchronizeExecution() function. But this also means that a buggy driver can "hang" the system, since it then runs at IRQL >= DISPATCH_LEVEL, where the dispatcher cannot preempt it. It's another way of saying that the preemptive NT kernel now runs in cooperative mode... But Microsoft doesn't like to call it cooperative multitasking...

Dispatcher:
The NT dispatcher is NT's main scheduler for processes and threads. It is the dispatcher that handles all preemptive multitasking functions supplied by the kernel, not the Task Manager. ;-) It runs at IRQL = DISPATCH_LEVEL and preempts everything running at IRQL < DISPATCH_LEVEL, which is effectively all user threads and most kernel threads.

An IRQL_NOT_LESS_OR_EQUAL occurs when one of the following conditions is true:

  1. IRQL >= DISPATCH_LEVEL when calling KeWaitForSingleObject() or KeWaitForMultipleObjects() with a nonzero timeout, so that the call may actually block.
  2. IRQL > DISPATCH_LEVEL when acquiring a spinlock. Spinlocks (except the system spinlock for entering an ISR) run at IRQL = DISPATCH_LEVEL.
  3. IRQL >= DISPATCH_LEVEL when a page fault occurs. Page faults are handled at IRQL = APC_LEVEL.
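
The three conditions in this list boil down to simple comparisons against DISPATCH_LEVEL. The Python sketch below writes them out as checks. It is a model of the rules as stated here, not of what the kernel actually executes: the function names, the IrqlViolation exception (standing in for the bug check) and the DIRQL value 5 are assumptions made for the example.

```python
"""The three conditions above as plain checks. Toy model, not kernel code."""

PASSIVE_LEVEL, APC_LEVEL, DISPATCH_LEVEL = 0, 1, 2
DIRQL = 5  # hypothetical device interrupt level


class IrqlViolation(Exception):
    """Stands in for the bug check in this model."""


def check_wait(irql, timeout):
    # Condition 1. timeout=None models an infinite wait,
    # timeout=0 models a poll that never blocks.
    if irql >= DISPATCH_LEVEL and timeout != 0:
        raise IrqlViolation("blocking wait at IRQL %d" % irql)


def check_acquire_spinlock(irql):
    # Condition 2. Ordinary spinlocks are held at DISPATCH_LEVEL.
    if irql > DISPATCH_LEVEL:
        raise IrqlViolation("spinlock acquired at IRQL %d" % irql)


def check_page_fault(irql):
    # Condition 3. Page faults are handled at APC_LEVEL.
    if irql >= DISPATCH_LEVEL:
        raise IrqlViolation("page fault at IRQL %d" % irql)


def violates(check, *args):
    """Return True if the given check raises IrqlViolation."""
    try:
        check(*args)
    except IrqlViolation:
        return True
    return False


print("infinite wait at DISPATCH_LEVEL:", violates(check_wait, DISPATCH_LEVEL, None))
print("zero-timeout wait at DISPATCH_LEVEL:", violates(check_wait, DISPATCH_LEVEL, 0))
print("spinlock at DISPATCH_LEVEL:", violates(check_acquire_spinlock, DISPATCH_LEVEL))
print("spinlock at DIRQL:", violates(check_acquire_spinlock, DIRQL))
print("page fault at APC_LEVEL:", violates(check_page_fault, APC_LEVEL))
print("page fault at DISPATCH_LEVEL:", violates(check_page_fault, DISPATCH_LEVEL))
```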

There are other circumstances in which an IRQL_NOT_LESS_OR_EQUAL occurs, but the most common ones are listed above. The most famous one is when a kernel thread, usually driver code, tries to allocate or access memory in the paged pool and the page has to be paged in (either a hard or a soft page fault) while the thread's IRQL >= DISPATCH_LEVEL. Page faults are handled by the page fault thread, which runs at IRQL = APC_LEVEL. Now we have a problem. At kernel level the system runs in cooperative mode, which means that a system/driver thread (still running at IRQL >= DISPATCH_LEVEL) has to wait for an event handled by the page fault thread (running at IRQL = APC_LEVEL). That event will never happen, since the page fault thread's IRQL is lower than the driver thread's IRQL. As a result, the system bug checks with an IRQL_NOT_LESS_OR_EQUAL.

I will write something on how to debug a bugcheck in a future blog.

 

