NT Reliability - Feedback from Nat Tate
Wed, 13 May 1998 14:09:01 +0100
Hi John
I'm an IT professional working in the UK and have been using Unix since 1984 and NT since 1993. I have a few comments to make with regard to the section on reliability.
Many people I meet who use NT claim that they have never seen a BSOD. To my mind this can only mean one thing... they don't push NT hard enough. However, I think some of NT's unreliability is reminiscent of the problems Unix faced in its early days.
In previous jobs I have worked as a kernel developer on Unix System V.3 and V.4 (the V.4 work was a joint project between Siemens and AT&T to port it to Intel hardware) and on OSF/1 (at the OSF), and I have seen plenty of Unix panics in that time. Early versions of Unix, up to V.3, could also be quite unstable if pushed hard; this was especially true where the hardware, particularly CPUs such as the 286, didn't offer sufficient protection between user-level and kernel-level code.
One famous Unix panic is "freeing free frag"; I last saw one of those circa 1987. NT's equivalents could be said to be the inexplicable (try searching DejaNews for causes) "IRQL_NOT_LESS_OR_EQUAL" and "PAGE_FAULT_IN_NONPAGED_AREA".
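For anyone who hasn't met that panic, it is essentially the kernel catching a double free of a filesystem fragment. A rough user-space sketch of the idea, with the names and sizes invented rather than taken from any real kernel, would be:

    /* Hypothetical sketch (not real kernel source) of the sanity check
     * behind a "freeing free frag" panic: if the fragment being freed
     * is already marked free in the bitmap, the filesystem state is
     * corrupt and the only safe action is to stop dead. */

    #include <stdio.h>
    #include <stdlib.h>

    #define NFRAGS 1024
    static unsigned char frag_free[NFRAGS];   /* 1 = fragment is free */

    static void panic(const char *msg)
    {
        fprintf(stderr, "panic: %s\n", msg);
        abort();
    }

    static void free_frag(int fragno)
    {
        if (frag_free[fragno])                /* already free? */
            panic("freeing free frag");       /* refuse to continue */
        frag_free[fragno] = 1;
    }

    int main(void)
    {
        free_frag(42);     /* fine */
        free_frag(42);     /* double free -> panic */
        return 0;
    }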
As you stated, an NT BSOD can occur simply because the system has been up too long. Being up "too long" may also result in performance degradation. Some earlier incarnations of Unix had similar problems, often caused by memory or buffer leaks.
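A typical culprit is the kind of slow leak sketched below. This is purely illustrative code of my own, not taken from any particular system, but it shows how one forgotten free() on an error path lets memory use creep upward for as long as the process (or kernel) stays up:

    /* Illustrative only: each "request" allocates a buffer, but the
     * error path returns without freeing it, so a long-lived system
     * bleeds memory in proportion to its uptime. */

    #include <stdio.h>
    #include <stdlib.h>

    #define BUFSZ 4096

    static int process(char *buf)
    {
        return (buf[0] == 'x') ? -1 : 0;   /* pretend some requests fail */
    }

    static int handle_request(char first_byte)
    {
        char *buf = malloc(BUFSZ);
        if (buf == NULL)
            return -1;
        buf[0] = first_byte;

        if (process(buf) != 0)
            return -1;                     /* BUG: leaks buf on failure */

        free(buf);
        return 0;
    }

    int main(void)
    {
        long i;
        for (i = 0; i < 100000; i++)       /* a long-lived daemon's lifetime */
            handle_request(i % 10 ? 'a' : 'x');
        printf("done; every failed request left its buffer allocated\n");
        return 0;
    }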
In terms of reliability, NT would appear to be at around 1985 in Unix terms. What is really sad is that, in not learning from Unix's history, Microsoft has condemned itself to repeat it. Witness the many TCP/IP-related denial-of-service (DoS) attacks that have afflicted NT in the last 18 months. Most of these were fixed years ago in Unix TCP/IP stacks; the SYN attack is even documented in an RFC!
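The SYN attack in particular needs no cleverness at all. Leaving all the real networking out of it, the listener keeps a small fixed-size table of half-open connections, and SYNs from unreachable addresses fill it so that genuine clients are turned away. The sizes and names below are invented for illustration; only the bookkeeping is shown:

    #include <stdio.h>

    #define BACKLOG 128               /* a typically small half-open queue */

    struct half_open {
        unsigned long peer_addr;      /* who sent the SYN */
        int in_use;
    };

    static struct half_open queue[BACKLOG];

    static int on_syn(unsigned long peer_addr)
    {
        int i;
        for (i = 0; i < BACKLOG; i++) {
            if (!queue[i].in_use) {
                queue[i].in_use = 1;      /* wait, for tens of seconds, for the ACK */
                queue[i].peer_addr = peer_addr;
                return 0;
            }
        }
        return -1;                        /* queue full: SYN dropped */
    }

    int main(void)
    {
        unsigned long spoofed;
        for (spoofed = 1; spoofed <= BACKLOG; spoofed++)
            on_syn(spoofed);              /* the ACKs never come back */

        if (on_syn(0xC0A80001UL) < 0)     /* a genuine client */
            printf("legitimate SYN dropped: backlog exhausted\n");
        return 0;
    }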
Unix panics are most frequently found in badly written device drivers. Writing device drivers is nontrivial insofar as the driver is the interface between the virtual and real worlds: device timings and interrupts can wreak unexpected havoc with it. NT's problem is that it attempts to support too wide a range of hardware, and that driver writers, especially among the peripheral manufacturers, are more interested in the mass Windows 95 marketplace than in the minority and more onerous NT one. Unix driver writers have always had to worry about reliability, and I believe it is this difference in heritage that accounts for Unix's better record in this respect.
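To illustrate the sort of thing that goes wrong, here is a rough fragment, with the structure and names invented for the purpose, showing the classic race between mainline driver code and an interrupt handler sharing a buffer. Without spl()/splx() (or a spinlock on SMP) around the shared update, the count drifts and the driver eventually indexes garbage:

    /* Sketch of a serial-style receive path; not taken from any real driver. */

    struct ring {
        char buf[64];
        int  head;      /* next slot the interrupt handler fills  */
        int  tail;      /* next slot mainline code reads           */
        int  count;     /* bytes buffered, updated from BOTH sides */
    };

    static struct ring rx;

    /* interrupt context: the device has a byte ready */
    void rx_interrupt(char c)
    {
        if (rx.count < 64) {
            rx.buf[rx.head] = c;
            rx.head = (rx.head + 1) % 64;
            rx.count++;                 /* read-modify-write */
        }
    }

    /* process context: hand one byte to the caller */
    int driver_read(char *out)
    {
        if (rx.count == 0)
            return -1;                  /* nothing buffered */
        *out = rx.buf[rx.tail];
        rx.tail = (rx.tail + 1) % 64;

        /* BUG: count-- is a read-modify-write that can interleave with
         * the count++ in the interrupt handler; with interrupts left
         * enabled here, count ends up wrong and the indices go with it. */
        rx.count--;
        return 0;
    }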
The usual answer from Microsoft to any problem is to install the latest service pack or hot fix. With NT 4.0 we have seen three service packs and numerous hot fixes. Most of these address security holes that permit, among other things:
- Internet-based hackers to cause a BSOD on remote NT systems (POD, POD II, WinNuke, Teardrop, Boink)
- hackers to gain admin privileges (GetAdmin)
- hackers to break NT's supposedly impregnable password encryption techniques (L0phtCrack)
The problem is that, in the rush to patch up the latest security hole, Microsoft cannot test these fixes against all the hardware used by customers. Applying hot fixes can cause as many problems as it solves on certain hardware combinations. In any event, system administrators can't simply go around each and every machine applying the fixes; some period of testing and evaluation must take place first.
The consequence of all this is that sites wishing to ensure a modicum of reliability when using NT are severely restricted in their hardware choices, even though freedom from particular hardware manufacturers was one of Microsoft's major selling points.
From being something of a mild NT advocate, I have lost all confidence in the operating system, and I see only more trouble in store with NT 5.0. As Steve Bellovin has said, all code has bugs, and the more code there is, the more bugs.
regards,
Nat Tate
PS: around our office we have a motto for NT --- surprisingly unreliable!