Upgraded the site

For those who haven't noticed yet: I've upgraded my site to a new version of Telligent's Community Server.

I will be tweaking the configuration over the coming days, so stay tuned.

R.

Calculating single disk performance for random I/O

I have been doing a lot of performance analysis lately. Nine times out of ten, bad performance is caused by a storage solution that isn't optimized. Why? When data doesn't fit into a server's memory, it has to come from disk sooner or later. In my experience it's usually later (due to non-optimized solutions), where sooner was to be expected. So it's time to write down some disk performance basics.

I used a Seagate ST-373307FC 10,000 Rotations Per Minute (RPM) disk with a Fibre Channel (FC) interface for my calculations. You can find its specification sheet here. All the information used for calculating raw disk performance is taken from this spec sheet.

                             Random I/O (R/W)
Average Seek (mSec)               4.7/5.3   
Average latency (mSec)           2.99/2.99  
Command and data transfer (mSec)  0.2/0.2   
------------------------------------------- +
Total access time (mSec)          7.9/8.5   

These numbers were obtained from the spec sheet

So the total number of I/O operations per second, better known as IOPS, can be calculated by dividing 1 second by the total access time. Using the numbers from the table above: 1/0.0079 = ~126 IOPS for read operations and 1/0.0085 = ~117 IOPS for write operations.
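
To make the arithmetic easy to repeat for another disk, here is a minimal Python sketch of the same calculation. The function and constant names are my own choice for illustration; the numbers are the ones from the table above. Note that it adds up the unrounded values (7.89 and 8.49 mSec), so the results come out a fraction higher than with the rounded totals.

```python
# Spec-sheet values in milliseconds from the table above, as (read, write).
SEEK_MS = (4.7, 5.3)
ROTATIONAL_LATENCY_MS = (2.99, 2.99)
TRANSFER_MS = (0.2, 0.2)


def access_time_ms(seek_ms, latency_ms, transfer_ms):
    """Total time for one random I/O: seek + rotational latency + transfer."""
    return seek_ms + latency_ms + transfer_ms


def iops(access_ms):
    """I/O operations per second for a given access time in milliseconds."""
    return 1000.0 / access_ms


read_ms = access_time_ms(SEEK_MS[0], ROTATIONAL_LATENCY_MS[0], TRANSFER_MS[0])
write_ms = access_time_ms(SEEK_MS[1], ROTATIONAL_LATENCY_MS[1], TRANSFER_MS[1])

print(f"read:  {read_ms:.2f} ms per I/O, about {int(iops(read_ms))} IOPS")
print(f"write: {write_ms:.2f} ms per I/O, about {int(iops(write_ms))} IOPS")
```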

Now we know how to calculate the maximum random I/O performance for a single disk. Why is it so important to be able to calculate random I/O performance and not sequential I/O performance? The answer is quite simple. The more clients/processes a server has to serve, the bigger the chance its data access pattern will be random. Do you really think that 4000 concurrent users connected to a file server generate sequential I/O, and that the server will be able to predict the next piece of data a client is going to request? No, it won't. And the fact that it can't predict it limits the positive performance impact of caching algorithms in the process.

To make things even more complicated, it hardly matters whether a disk has to read one sector of 512 bytes or 32 sectors (a 16 KB block). Its platters rotate at 10,000 RPM (in the case of our Seagate disk), remember, so it reads 32 sectors almost as fast as one sector. Only when a track boundary is reached does the performance drop a bit, due to the time needed to seek to the adjacent track.

Knowing this, let's calculate the maximum random I/O performance for reading 8 KB blocks (16 sectors) and 16 KB blocks (32 sectors).

The formula is quite simple: total number of IOPS x block size
Max random I/O (8 KB): 126 x 8 KB = 1008 KB per second!!!
Max random I/O (16 KB): 126 x 16 KB = 2016 KB per second!!!

Max random I/O (32 KB): 126 x 32 KB = 4032 KB per second
Max random I/O (64 KB): 126 x 64 KB = 8064 KB per second
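
The same table can be generated for any block size with a few lines of Python. This is only a sketch of the formula above: it assumes, like the text does, that the access time does not change with the block size, and the function name is mine.

```python
READ_IOPS = 126  # random read IOPS of the reference disk, as calculated above


def random_throughput_kb_s(iops, block_kb):
    """Maximum random throughput in KB per second: IOPS x block size."""
    return iops * block_kb


for block_kb in (8, 16, 32, 64):
    kb_s = random_throughput_kb_s(READ_IOPS, block_kb)
    print(f"Max random I/O ({block_kb} KB): {kb_s} KB per second")
```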

You might want to increase block sizes, segment sizes, stripe size etc. in order to increase the performance... Please, consider the impact of RAID-levels on this and choose the right one. For more information on the impact of RAID-levels, read this blog.

As a baseline, use the following numbers for your calculations:
for a 10,000 RPM disk: 125 IOPS.
for a 15,000 RPM disk: 175 IOPS.

What about latency then? The killing factor for disk performance is the actual time needed to perform an I/O operation. Most disks run optimally with 2-3 outstanding I/Os in their queue. Using 7.9 mSec random latency for read operations, this gives us 7.9 mSec x 3 = 23.7 mSec maximum latency. For write operations this is 8.5 mSec x 3 = 25.5 mSec maximum latency (still using our Seagate disk as reference). As a rule of thumb, 21 mSec is considered the maximum; any average latency above that is considered a bottleneck. Any recurring spike far above that value is also considered a bottleneck.
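
As a quick check of these numbers, the sketch below multiplies the access time by the queue depth and compares a measured average against the 21 mSec rule of thumb. It assumes the queued requests are served strictly one after the other, which is the simplification used above; the function names are only illustrative.

```python
RULE_OF_THUMB_MS = 21.0  # maximum acceptable average latency from the text


def max_latency_ms(access_ms, outstanding_ios):
    """Latency of the last request when queued I/Os are served one by one."""
    return access_ms * outstanding_ios


def is_bottleneck(avg_latency_ms, threshold_ms=RULE_OF_THUMB_MS):
    """True when a measured average latency exceeds the rule of thumb."""
    return avg_latency_ms > threshold_ms


print(f"read:  {max_latency_ms(7.9, 3):.1f} mSec with 3 outstanding I/Os")
print(f"write: {max_latency_ms(8.5, 3):.1f} mSec with 3 outstanding I/Os")
print("15 mSec average is a bottleneck:", is_bottleneck(15.0))
print("30 mSec average is a bottleneck:", is_bottleneck(30.0))
```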

Next time, I will talk about sequential disk performance.



To be or not to be... Anti Virus solutions on a server

Not to be... (at least not always)

Why? Simple. Try to imagine you are a real-time mail server doing 5000 I/Os per second, and you're interrupted by Anti Virus (AV) software every time you try to write to or, even worse, read from a storage system. The attempt made by the AV solution fails after a number of retries, since you hold a non-shared lock on the file you are accessing. In one of my previous posts, I talked a bit about drivers and DPCs and how Windows schedules system threads/drivers using the dispatcher (effectively doing cooperative multitasking at kernel level). Imagine the serialization this causes, not to mention the fact that viruses only infect files during a write operation, not during a read operation.

To all people out there in the world: please look closely at the standard AV configs you deploy on servers. Make sure scan policies are set to scan write operations only. When this is not enough, schedule as many manual scan tasks as you see fit, and please exclude any files that are in exclusive use by an application or the operating system (database files and cluster configuration files immediately come to mind).

Don't say I didn't warn you...

P.S.: Yes, I understand the necessity of having an AV solution. I'm not saying never to install one, just be careful with its configuration, because AV solutions can be as dangerous for system stability as viruses can be for security.

The sense and nonsense of scalability by adding CPU's

I'm often faced with the following customer situation: "I added CPUs to the system because the average CPU utilization exceeded 80%, but the application doesn't run any smoother. Why is that?" Well, let me try to explain.

To determine how busy a CPU is in terms of the amount of work (threads) waiting to be executed, take a closer look at the following Performance Monitor counter: "System\Processor queue length". If this one exceeds 10 per installed CPU, you definitely need more CPUs. But... take a close look at the following two Task Manager screenshots. I used the Task Manager because it makes it easier to visualize my point.


A two CPU system at approx 80%.


A quad CPU (the same) system at approx. 40% (with extra memory added).

What do you see there? You'd expect to see what percentage of the total 100% of CPU time the CPU spends executing threads... Wrong. You see the amount of time the Windows "idle thread" runs, subtracted from 100 (it counts the number of times the "idle thread" is called). Why is this important? Because it shows you one important thing: two CPUs at approx. 80% is roughly the same as four CPUs at 40%. 80 x 2 = 40 x 4. Simple mathematics, right? Right. So this tells you one thing: the application doesn't run any smoother or "faster" than on a system equipped with two CPUs. This tells you a lot about the application, not about the system. It tells you the application doesn't scale from two CPUs to four CPUs. So one should have looked at the "System\Processor queue length" counter. If that one indicates more than 20 threads (more than 10 per processor), the application generates enough work to take advantage of extra CPUs. If the "System\Processor queue length" counter is far below 20, don't bother adding CPUs; chances are the application won't take advantage of them anyway.
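
Both checks from this paragraph fit in a few lines of Python. This is just a sketch of the rule of thumb, not a monitoring tool: the function names are made up for the example, and the queue length is a value you would read from the Performance Monitor counter yourself.

```python
def total_cpu_work(cpu_count, utilization_pct):
    """Utilization summed over all CPUs, in percent of a single CPU."""
    return cpu_count * utilization_pct


def more_cpus_likely_to_help(queue_length, cpu_count, per_cpu_threshold=10):
    """Rule of thumb from the text: more than 10 queued threads per CPU."""
    return queue_length > per_cpu_threshold * cpu_count


# Two CPUs at 80% do the same amount of work as four CPUs at 40%.
print(total_cpu_work(2, 80) == total_cpu_work(4, 40))

# A processor queue length of 8 on a two CPU system: extra CPUs won't help.
print(more_cpus_likely_to_help(queue_length=8, cpu_count=2))

# A processor queue length of 25 on a two CPU system: extra CPUs may help.
print(more_cpus_likely_to_help(queue_length=25, cpu_count=2))
```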

So, the moral of the story... If you look at CPU utilization, double the number of CPUs and see the utilization drop by 50%, run away as fast as you can... You have just encountered one of the many badly scaling applications, and Windows or the system gets the blame for not scaling.



Microsoft Cluster Service configuration backup/restore utility

Several weeks ago I wrote a small utility, based on MSDN sample code, to back up and restore Microsoft Cluster Service configuration database information. I had experienced some problems with restoring information, so I wanted to be sure a backup existed that could be restored without causing problems afterwards.
Anyway, I wrote this small tool using Visual Studio 2005, only to find out several days later that a similar tool from Microsoft already existed. I had looked for a similar tool before I wrote one, but obviously I overlooked it...

You can download clusbck32.zip from my FTP site.



Windows Server Performance considerations - introduction

The last few months I have been engaged more and more with accounts that experience serious performance issues. By the time I am called onto the scene, the status of the account is usually at a critical level. After I have done my research, I often write a report on what is causing the performance issues, what to do to solve them and how to prevent them from happening again. I have been doing this, what I call, "one-man squad team" kind of job for several years now. The environments I am engaged in are getting bigger and bigger. I am talking here about Storage Area Networks (SANs), usually with terabytes (TB) of storage connected to tens and even hundreds of systems with tens of thousands of users. Analyzing these kinds of environments takes more than specializing in, e.g., Windows, or storage, or networking. It's in the area that connects these technologies where problems arise. 999 times out of 1000, this is caused by a lack of experience or, to be more precise, a lack of understanding of the involved technologies when designing and implementing these environments. To give an example, consider the following:

You drive a truck loaded with 2000 crates of pineapples, and you need to transport them from point A to point B over a distance of 400 km at an average speed of 80 kph. You need to make a decision: A) you drive the truck via the beach (on the sand) or B) via the highway. Which one would you choose? It's obvious, right? Option "B".

Now imagine the following environment. You have a fully loaded file server with 1000 concurrent users, and you need to choose between a file server connected to a storage solution with A) 3x 300 GB SCSI disks or B) 12x 75 GB SCSI disks. Which one would you choose? The answer depends on who makes the decision: the person who owns the budget, or the specialist who happens to know a bit about performance. In the field, option "A" is chosen most of the time because of the cost involved. Why is that important, you think? The answer is as simple as it is obvious. You don't expect a truck to do 80 kph driving on the beach, but you do expect the truck to make 80 kph on a highway. So why do you expect 3x 300 GB SCSI disks to perform as well as 12x 75 GB SCSI disks?

The point I am trying to make is that most performance problems are caused by a lack of proper insight into the technologies used (truck on the beach vs. truck on the highway), which results in degraded performance. Using our truck as an example: the truck on the beach doesn't make 80 kph, so people want to buy a bigger one, or one with a bigger engine, but most people forget to consider that a beach is simply not the right environment for the truck to operate in. A truck loaded with 2000 crates of pineapples has other requirements than a buggy loaded with just one crate of pineapples. A buggy happens to thrive on a beach; the truck needs a proper infrastructure in order to complete its task: a decent highway, preferably without traffic jams. The same applies to servers that need to serve a lot of clients at the same time.
A truck is made to transport many things at the same time instead of just one thing very fast. If you want that, buy a Ferrari instead.

A server is designed to serve many clients at the same time instead of completing one task very fast. Then why don't you give it the means to do so?
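
To put a rough number on the two storage options, here is a small Python sketch. It rests on two assumptions that are not stated in this post: that all disks are 10,000 RPM drives, and that each of them delivers about 125 random IOPS. It also ignores RAID overhead and controller cache, so treat the outcome as an order of magnitude, nothing more.

```python
IOPS_PER_DISK = 125  # assumed baseline for a single 10,000 RPM disk


def storage_option(disk_count, disk_size_gb, iops_per_disk=IOPS_PER_DISK):
    """Raw capacity and aggregate random IOPS for a set of identical disks."""
    return {
        "capacity_gb": disk_count * disk_size_gb,
        "random_iops": disk_count * iops_per_disk,
    }


option_a = storage_option(disk_count=3, disk_size_gb=300)
option_b = storage_option(disk_count=12, disk_size_gb=75)

print("A) 3x 300 GB:", option_a)
print("B) 12x 75 GB:", option_b)
```

Same raw capacity, but option "B" has four times as many spindles to spread the random I/O over.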

In my next post on this topic, I will discuss some Windows performance considerations on a technical level.



Reading SQL Server logs using stored procedures

While I was searching for a way to determine the backup status of SQL databases, I came across the following stored procedure: "sp_readerrorlog". However, it's not documented in SQL Server Books Online. After some googling, I decided to use trial and error to find out how it can be used. It was obvious the stored procedure accepts up to 4 parameters, and after some 15 minutes of trial and error with the different combinations, I came to the following conclusion about the stored procedure's syntax:

exec sp_readerrorlog @1, @2, @3, @4

where:
@1 = Log file number (0=current).
@2 > 1 = Info.
     2 = Warning/Error
@3 = Search param #1
@4 = Search param #2
 

Of course since it's not documented, at least I couldn't find it, it's an assumed syntax.

 



System hang on a cluster

For the last couple of days I have been setting up and tuning Windows Server 2003 clusters. While performing the various fail-over tests as part of a test plan, one of the cluster nodes suddenly experienced a complete system hang. This happened when I tried to start the Cluster Server service on the node I was testing. My terminal session appeared to be dead, but the TCP/IP session remained open and, as a result, the RDP client did not time out. I could ping the system, but everything else appeared to be dead. Opening an MMC snap-in to see whether the Cluster Server service had actually started didn't do much either, except give me an empty right pane.

Since the servers were physically located on the other side of the country (The Netherlands), and the remote supervisor adapter had not yet been installed because the part was on back-order, I was facing the possibility of having to drive to the server myself. Not something one wants to do in the evening after a long day. So I decided to wait and deal with it in the morning. Some 15 minutes later, I wanted to check the logs on the remaining cluster nodes to see if there was anything out of the ordinary in there, when I noticed the terminal session of the failing node was back again. I started checking logs and noticed the system had a time gap of 10 minutes in every log I could think of (event log, cluster log, etc.).

I wanted to see whether I could reproduce the behavior on the other node as well, so I repeated the same tasks: stopping the Cluster Server service and starting it again. Exactly the same thing happened on the other node. I decided to wait and see what would happen. After 10 minutes, the system came back and responded as if nothing had happened. This system also had a gap of 10 minutes in its log files. My suspicion grew that the anti-virus program (Sophos) was interfering with cluster operations.

When strange problems occur on a server, it's usually one of: lack of disk space, a security rights issue, a driver issue or anti-virus software. In my case, anti-virus software. I confirmed this by stopping the Sophos services and running the tests again. This time everything went as it was supposed to. Within minutes, I found a workaround: simply exclude the %SystemRoot%\cluster folder from on-access scanning. This is a best practice anyway, but it didn't satisfy my need to know...

Why did Sophos hang the system for 10 minutes?

Before I explain what happened, you need to understand what kinds of drivers are common on Windows platforms. Basically, you can categorize drivers into two groups. One group is called device drivers, which communicate with hardware; the second group is called filter drivers, which usually act as a layer/interface between the OS components and other drivers. All drivers have one thing in common: they're called via an interrupt. There are two types of interrupts.

  • Hardware initiated interrupts, which are handled by device drivers.
  • Software initiated interrupts.

Interrupts have interrupt levels assigned to them. In the Intel world, the higher the interrupt level, the higher the priority. Below is an overview from MSDN of the interrupt priority levels:

IRQL                            IRQL value      Description
                              x86  IA64 AMD64
PASSIVE_LEVEL                   0    0    0     User threads and most kernel-mode operations
APC_LEVEL                       1    1    1     Asynchronous procedure calls and page faults
DISPATCH_LEVEL                  2    2    2     Thread scheduler and deferred procedure calls (DPCs)
CMC_LEVEL                       N/A  3    N/A   Correctable machine-check level (IA64 platforms only)
Device interrupt levels (DIRQL) 3-26 4-11 3-11  Device interrupts
PC_LEVEL                        N/A  12   N/A   Performance counter (IA64 platforms only)
PROFILE_LEVEL                   27   15   15    Profiling timer for releases earlier than Windows 2000
SYNCH_LEVEL                     27   13   13    Synchronization of code and instruction streams across processors
CLOCK_LEVEL                     N/A  13   13    Clock timer
CLOCK2_LEVEL                    28   N/A  N/A   Clock timer for x86 hardware
IPI_LEVEL                       29   14   14    Interprocessor interrupt for enforcing cache consistency
POWER_LEVEL                     30   15   14    Power failure
HIGH_LEVEL                      31   15   15    Machine checks and catastrophic errors; profiling timer for Windows XP and later

I highlighted the three most important ones. Most threads run at IRQL (Interrupt Request Level) = PASSIVE_LEVEL, the lowest available interrupt level. The Windows thread scheduler, a.k.a. the dispatcher, runs at IRQL = DISPATCH_LEVEL. Device interrupts run at IRQL >= 3. IRQL = DISPATCH_LEVEL is the highest software interrupt level available.

An ISR (Interrupt Service Routine) is entered whenever an external interrupt is pending. For example, when a network packet is received by the NIC (Network Interface Card), the payload is received through its driver into the TCP/IP receive buffers, and a Deferred Procedure Call (DPC) is made and queued for execution. This is done to minimize driver code execution time and to speed things up. The DPC is executed later by the dispatcher, as soon as the ISR exits and no other high-priority threads are scheduled.

The dispatcher is the core thread scheduler of the Windows NT family. It's responsible for scheduling threads to run on a processor. These can be driver threads, system threads (DPCs for instance) or user threads. It runs threads in order of priority. This is important to know, since it will point us in the right direction.

So the following order of execution is maintained by the dispatcher and the system:

  1. Hardware interrupts (IRQL >= 3).
  2. Software interrupts at DISPATCH_LEVEL (IRQL = 2).
  3. Software interrupts at APC_LEVEL (IRQL = 1).
  4. Software interrupts at PASSIVE_LEVEL (IRQL = 0).
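
The ordering above can be mimicked with a simple priority queue. The Python sketch below is a toy model only: the level numbers are the x86 values from the table, the work item names are invented for the example, and a real kernel obviously does a lot more than sort a list.

```python
import heapq

# x86 IRQL values from the table above; DIRQL starts at 3.
IRQL = {"PASSIVE_LEVEL": 0, "APC_LEVEL": 1, "DISPATCH_LEVEL": 2, "DIRQL": 3}


def execution_order(pending):
    """Return work item names, highest IRQL first; ties keep arrival order."""
    heap = [(-IRQL[level], seq, name) for seq, (name, level) in enumerate(pending)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


pending = [
    ("user thread", "PASSIVE_LEVEL"),
    ("DPC", "DISPATCH_LEVEL"),
    ("NIC interrupt", "DIRQL"),
    ("APC", "APC_LEVEL"),
]
print(execution_order(pending))
```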

Actually, there is an "IRQL = 0.5" level, better known as a critical region. This one is available so that a portion of the code within a thread cannot be interrupted by code that runs at a higher IRQL. Except, of course, by hardware interrupts. The API (Application Programming Interface) associated with critical regions is KeEnterCriticalRegion().

Now that I explained the basics, it's time to analyze what happened.

When the Cluster Server service tries to start, it needs to establish communication with the cluster it's about to join, and it needs to determine whether the cluster is in a valid state. To do that, it needs to execute the following steps:

  1. Open its local copy of the cluster configuration database to read the cluster configuration parameters, like cluster name, cluster IP address, etc.
  2. Join the cluster using the information obtained from the cluster configuration database.

However, when the Cluster Server service was trying to open its configuration database, Sophos intervened. It did so by using its filter driver to intercept the I/O API calls made by the system, probably API calls like CreateFile() with exclusive access (dwShareMode=0). To know this for sure, a debugger would have been required, but time and resources prevented me from investigating. So this is the guessing part.

The following sequence of events probably happened:

  1. IRQL = 0 : Request by the Cluster Server service account to open its configuration database.
  2. IRQL = 0 : System thread that opens the file exclusively.
  3. IRQL = 2 : Sophos filter driver interrupts and opens the file itself to scan its content for viruses.
  4. IRQL = 2 : Sophos probably tries to acquire a spin lock, but can't, since the Cluster Server service owns one on the file. Which is fair, since the Cluster Server service probably doesn't want any other program to be able to manipulate the configuration database while it's being used.
  5. IRQL > 2 : Network packet (my RDP session to the system) is received by the NIC and put into the DPC queue by the driver.
  6. IRQL = 2 : System resumes normal operation; the dispatcher runs and determines the DPC needs to be processed, and probably some other system threads too.
  7. IRQL = 0 : User threads are executed if no other higher-priority threads are waiting.

At "4" the Sophos filter driver thread loops until a) the lock is acquired, or b) a timer at thread level expires. Since the network packets for my RDP session to the system initiate hardware interrupts at "5", the ISR for the NIC is entered and the payload is put into the DPC queue. The system then returns to "4", and since the DPC queue runs at IRQL = DISPATCH_LEVEL, the same level at which the Sophos filter driver runs, which happens to loop for approximately 10 minutes, no DPCs are executed and the system appears to hang. After these 10 minutes, the Sophos filter driver thread lowers its IRQL to PASSIVE_LEVEL. The dispatcher then determines it's time to process the DPC queue at "6" and schedules its thread. After that, any other user/system threads that need to run are executed, and the system returns to normal operation at "7" as if nothing happened.
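
The starvation effect can be illustrated with a toy simulation in Python. This is a sketch of my guess above, not of what Sophos or Windows actually do: it models a single CPU in "ticks", assumes code spinning at DISPATCH_LEVEL is only preempted by a hardware interrupt, and assumes the interrupt arrives while the driver is still spinning. All names are invented for the example.

```python
def simulate(spin_ticks, nic_interrupt_tick):
    """Toy single-CPU timeline; returns what runs at each tick."""
    timeline = []
    remaining_spin = spin_ticks
    dpc_queued = False
    tick = 0
    while True:
        if tick == nic_interrupt_tick:
            # Hardware interrupt: higher IRQL, so it preempts the spinning driver.
            timeline.append("ISR")
            dpc_queued = True
        elif remaining_spin > 0:
            # Back at DISPATCH_LEVEL: the driver keeps spinning, the DPC waits.
            timeline.append("spin")
            remaining_spin -= 1
        elif dpc_queued:
            # The spinner has dropped its IRQL, so the queued DPC finally runs.
            timeline.append("DPC")
            dpc_queued = False
        else:
            # Nothing left at a higher level: user threads run again.
            timeline.append("user")
            return timeline
        tick += 1


timeline = simulate(spin_ticks=5, nic_interrupt_tick=2)
print(timeline)
print("DPC waited", timeline.index("DPC") - timeline.index("ISR") - 1, "ticks")
```

Replace the 5 ticks by 10 minutes and you have a system that answers pings but does nothing else.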

Of course more is involved, since it was a multiprocessor system, but that would make this explanation unnecessarily complicated.

My two cents...

 

Windows Vista beta 1 available for download

Microsoft has posted Windows Vista beta 1 on MSDN subscriptions.

 



Longhorn Client becomes Windows Vista

Microsoft has announced the official name of the Windows version codenamed "Longhorn"...

From the Microsoft site.

Bringing clarity to your world.

Today we live in a world of more information, more ways to communicate, more things to do. There is more you can do and even more you can discover.

Every day, millions of people around the globe rely on their Windows PC to manage their increasingly digital lives. While familiar tools for managing digital information are powerful, today's world requires more.

In today's digital world, you want the PC to adapt to you, so you can cut through the clutter and focus on what's important to you.

For more, visit the Microsoft Windows Vista homepage.



Demystifying IRQL_NOT_LESS_OR_EQUAL

When working with Windows NT/2000/XP/2003, one has probably run into BSODs (Blue Screens Of Death) on more than one occasion. Some BSODs (or bug checks, or stop screens) appear more often than others. One of the most frequently occurring bug checks is 0x0000000A, a.k.a. IRQL_NOT_LESS_OR_EQUAL. In this blog I'll try to dispel a bit of the myth surrounding this bug check. But before I explain what this bug check is all about, I'll talk a bit about drivers, interrupts, driver threads and dispatchers.

Drivers:
The main responsibility of drivers is to communicate with I/O devices on behalf of a process (or thread). These devices are either hardware devices or "virtual" devices. In the first case we speak of a function driver, in the latter case of a filter driver. Actually it's a bit more complicated, since there are upper-filter drivers and lower-filter drivers, bus drivers, class drivers and port drivers, but for the sake of clarity I'll stick with function driver and filter driver. An example of a function driver is atapi.sys (which is responsible for accessing ATA-based devices), and an example of a filter driver is ndis.sys, which is NT's Network Driver Interface Specification (NDIS) library driver.

Every process/thread runs on the CPU at one time or another. A CPU communicates with the outside world using interrupts, and this is true for all commonly known CPU architectures today. An interrupt tells the CPU to stop what it was doing and start doing something else. When that task is finished, the CPU returns to the point where it was before it got interrupted. There are mainly two types of interrupts:

  1. Hardware initiated interrupts, also known as exceptions, usually externally raised.
  2. Software initiated interrupts, internally raised by the kernel.

Interrupts:
In either case, the CPU enters an ISR (Interrupt Service Routine) and starts doing what it should do. For physical devices the interrupt will be acknowledged and a Deferred Procedure Call (DPC) will be queued to complete its I/O operation. When an interrupt is raised (either external or internal), the CPU must determine whether it should grant the interrupt or ignore it (for the time being). This is done by setting a value in its interrupt mask and, on multiprocessor systems, setting a system spinlock (in case of an external interrupt), so the same interrupt is not serviced by another CPU at the same time. All interrupts equal to or higher than the number set by the interrupt mask are granted. All others are ignored, but stay raised until either the CPU allows them or the initiator cancels its request. When the CPU enters an ISR and another interrupt with a higher Interrupt Request Level (IRQL) is pending, that interrupt will be granted and its ISR will be executed. This is normal behaviour for all processor architectures.

For software interrupts the kernel uses several IRQLs to prioritize any thread running on the system. There are three levels used by the Windows kernel: PASSIVE_LEVEL, APC_LEVEL and DISPATCH_LEVEL, the first three rows of the table below.

IRQL                            IRQL value      Description 
                               x86 IA64 AMD64
PASSIVE_LEVEL                   0    0    0     User threads and most kernel-mode operations
APC_LEVEL                       1    1    1     Asynchronous procedure calls and page faults
DISPATCH_LEVEL                  2    2    2     Thread scheduler and deferred procedure calls (DPCs)
CMC_LEVEL                       N/A  3    N/A   Correctable machine-check level (IA64 platforms only)
Device interrupt levels (DIRQL) 3-26 4-11 3-11  Device interrupts
PC_LEVEL                        N/A  12   N/A   Performance counter (IA64 platforms only)
PROFILE_LEVEL                   27   15   15    Profiling timer for releases earlier than Windows 2000
SYNCH_LEVEL                     27   13   13    Synchronization of code and instruction streams across processors
CLOCK_LEVEL                     N/A  13   13    Clock timer
CLOCK2_LEVEL                    28   N/A  N/A   Clock timer for x86 hardware
IPI_LEVEL                       29   14   14    Interprocessor interrupt for enforcing cache consistency
POWER_LEVEL                     30   15   14    Power failure
HIGH_LEVEL                      31   15   15    Machine checks and catastrophic errors; profiling timer for Windows XP and later

It's the responsibility of an ISR to run at IRQL = DIRQL for as short a time as possible. Preferably, as much driver code as possible runs at IRQL = PASSIVE_LEVEL. The dispatcher itself runs at IRQL = DISPATCH_LEVEL, so any ISR running at IRQL >= DISPATCH_LEVEL prevents the dispatcher from running and effectively blocks all other threads in the system. Whenever an ISR runs at IRQL >= DISPATCH_LEVEL, its code can never be pageable, since the system thread handling page faults runs at IRQL = APC_LEVEL.

Driver threads:
Although a driver can create a new thread by calling PsCreateSystemThread(), drivers rarely do so, because switching thread context is a relatively time-consuming operation that can degrade driver performance. A driver might create a dedicated thread only to perform continually repeated or long-term activities. For temporary, short-term tasks, a driver can use a system-supplied thread by queuing a work item using IoQueueWorkItem().
Whenever a driver thread needs to raise its IRQL (usually to DIRQL), it calls the KeSynchronizeExecution() function. But this also means that a buggy driver can "hang" the system, since it then runs at IRQL >= DISPATCH_LEVEL. It's also another way of saying the preemptive NT kernel now runs in cooperative mode... But Microsoft doesn't like to call it cooperative multitasking...

Dispatcher:
The NT dispatcher is NT's main scheduler for processes and threads. It is the thread that handles all preemptive multitasking functions supplied by the kernel, not the Task Manager. ;-) It runs at IRQL = DISPATCH_LEVEL and preempts all threads at IRQL < DISPATCH_LEVEL, effectively all user threads and most kernel threads.

An IRQL_NOT_LESS_OR_EQUAL occurs when one of the following conditions is true:

  1. IRQL >= DISPATCH_LEVEL when calling KeWaitForSingleObject() or KeWaitForMultipleObjects() with a non-zero wait time.
  2. IRQL > DISPATCH_LEVEL when acquiring a spinlock. Spinlocks (except the system spinlock for entering an ISR) run at IRQL = DISPATCH_LEVEL.
  3. IRQL >= DISPATCH_LEVEL when a page fault occurs. Page faults are handled at IRQL = APC_LEVEL.
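
The three conditions can be written down as a small Python check. This is purely an illustration of the list above: the operation names are made up for the example, and it says nothing about the other circumstances in which this bug check can occur.

```python
PASSIVE_LEVEL, APC_LEVEL, DISPATCH_LEVEL = 0, 1, 2


def violates_irql_rule(operation, irql, nonzero_wait=True):
    """True when the operation at this IRQL matches one of the three conditions."""
    if operation == "wait":
        # KeWaitForSingleObject() / KeWaitForMultipleObjects() with a wait time.
        return irql >= DISPATCH_LEVEL and nonzero_wait
    if operation == "acquire_spinlock":
        # Spinlocks run at DISPATCH_LEVEL, so anything above that is too high.
        return irql > DISPATCH_LEVEL
    if operation == "page_fault":
        # Page faults are handled at APC_LEVEL, below DISPATCH_LEVEL.
        return irql >= DISPATCH_LEVEL
    raise ValueError(f"unknown operation: {operation}")


print(violates_irql_rule("page_fault", PASSIVE_LEVEL))
print(violates_irql_rule("page_fault", DISPATCH_LEVEL))
print(violates_irql_rule("acquire_spinlock", DISPATCH_LEVEL))
```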

There are other circumstances in which an IRQL_NOT_LESS_OR_EQUAL occurs, but the most common ones are listed above. The most famous one is when a kernel thread, usually driver code, tries to allocate or access memory from the paged pool and the page needs to be paged in (either a hard or a soft page fault) while the thread's IRQL >= DISPATCH_LEVEL. Page faults are handled by the page fault thread, which runs at IRQL = APC_LEVEL. Now we have a problem. At kernel level, the system runs in cooperative mode, which means that a system/driver thread (still running at IRQL >= DISPATCH_LEVEL) has to wait for an event handled by the page fault thread (running at IRQL = APC_LEVEL). That event will never happen, since the page fault thread's IRQL < the driver thread's IRQL. As a result, the system bug checks with an IRQL_NOT_LESS_OR_EQUAL.

I will write something on how to debug a bugcheck in a future blog.

 



SQL 2005 DBO

While experimenting with SQL 2005 I found out the following:

  1. SQL 2005 does not grant the user SA, DB Owner (DBO) role when a new database is created.
  2. When creating a SQL 2005 user account, the option "User must change password at next logon" is checked by default.

Save yourself some headaches and keep this in mind when using SQL 2005. Obviously the above points are good things, but I tend to find out about these kinds of "features/things" the hard way. ;-)



Metaframe Presentation Server 4.0 for x64 rocks... first impression

Yesterday I installed MPS 4.0 x64 Edition. In my previous post I quoted part of the readme from the CD stating the following: "(Windows Server 2003 R2 Enterprise x64 Edition supports up to eight CPUs on one server)". Since the R2 beta code requires a trial version of Windows Server 2003, I installed a trial version just in case MPS 4.0 x64 does require a R2 installation, but it doesn't.

MPS 4.0 x64 obviously needs a license server. This license server needs to be installed on a 32-bit version of Windows. So, after installing the license server and creating a published application or two, I installed the Citrix client on a desktop and launched it.

I made some screenshots, you can view them in the Photo Gallery.

In the coming weeks I will be participating in a PoC for Metaframe Presentation Server 4.0 for x64, and I will be in a position to do some real testing on some real x64 hardware, so stay tuned...

P.S.: I have seen some Citrix confidential numbers on scalability of MPS 4.0 x64 edition vs. MPS 4.0 ia32 edition. I cannot disclose them, let's just say don't miss this TCO saving train (technology)...



Citrix Presentation Server 4.0 for Microsoft Windows Server 2003 x64 Edition

Yes, I have it! Probably one of the first here in The Netherlands. As part of a PoC (Proof of Concept) I'll be doing some testing and preparing demos on an x64-based Windows environment. According to the readme, Microsoft suggests that one needs (or should prefer?) Windows Server 2003 R2 in order to run MPS 4.0 for x64 (see below). I'll test whether it's just a suggestion or also a requirement, since I happen to have the latest beta of Windows Server 2003 R2. I'll post my experiences later.

 

Readme for Citrix Presentation Server 4.0 for Microsoft Windows Server 2003 x64 Edition

Early Release

June 2005

Introduction

Installing the Early Release Software

System Requirements

Microsoft recommends the following:

  • Minimum CPU: x64 architecture-based computer with Intel Pentium or Xeon family with Intel Extended Memory 64 Technology, or AMD Opteron family, AMD Athlon 64 family, or compatible processor (Windows Server 2003 R2 Enterprise x64 Edition supports up to eight CPUs on one server)
  • Minimum RAM: 512MB
  • Multiprocessor Support: Up to eight
  • Minimum Disk Space for Setup: 4GB


CS 1.1 not sending mail

I had problems with the e-mail function of CS 1.1. The cs_Exceptions table in the CS database had the following exception entry logged:

System.Runtime.InteropServices.COMException (0x80040211): The message could not be sent to the SMTP server. The transport error code was 0x800ccc15. The server response was not available.

According to Microsoft Support, 0x800ccc15 means:

0x800CCC15   SOCKET_CONNECT_ERROR           Unable to open Windows Socket.

So, the system couldn't set up a TCP socket to my SMTP relay box, which was confirmed by the output of a netstat -an | findstr ":25".
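
The same check can be done from a script instead of reading netstat output. The Python sketch below simply tries to open a TCP connection to port 25 and reports whether that worked; the host name in the comment is a placeholder, not my actual relay box.

```python
import socket


def can_connect(host, port=25, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, blocked or unresolvable: no socket either way.
        return False


# Example (placeholder host name):
# print(can_connect("smtp-relay.example.com"))
```

Keep in mind that a port blocking rule works per process, so a test like this only tells you something when it runs under a process the rule applies to.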

Then it hit me... I have McAfee VirusScan 8 running on my server, including the Access Protection rules, and guess what: VirusScan thought a mass-mailing worm/virus was spamming my SMTP relay box and refused the socket connect attempt made by aspnet_wp.exe (IIS 6.0 worker process). See the log entry:

<timestamp removed> Blocked by port blocking rule  aspnet_wp.exe Prevent mass mailing worms from sending mail a.b.c.d

Adjusting this rule proved to be the solution and this is confirmed by the log of my SMTP relay box.

 

