HP-UX Reliability - Feedback from Adrian Filipi-Martin

Fri, 08 May 1998 19:48:02 -0400

> If none of the above occurs, then a UNIX system's uptime can be
> measured in years. Reports of uptimes reaching 3 years are not uncommon in
> the Linux community. 

Ummm... you switch from how reliable UNIX is to how reliable Linux is. Aside from slighting the non-Linux unices out there, this is an argument by analogy, which is simply not proof of anything.

FYI, I have had HP-UX servers in production use with 100% uptime requirements that achieved uptimes of 460 days or more. This was invaluable to us because the machine was routing data from our hospital's labs to the Neuro/Natal Intensive Care Unit.

I'd also like to point out that when you buy non-consumer-grade UNIX hardware, i.e. anything but Intel, you often get much more for your dollar. To the best of my knowledge, no PC motherboard does anything intelligent with error-correcting memory. When the ECC logic detects a multiple-bit error it cannot correct, the board usually just signals a non-maskable interrupt, which typically halts every Intel-based OS. Compare this with an HP-based workstation. I occasionally get log messages about single-bit errors being corrected in system RAM; that's bound to happen when you have a lot of memory. Then one day I see that a particular address is getting hit with a higher frequency. I get some replacement RAM, shut down the box, swap it in, and it is up and running again with no need to go through a messy crash/reboot/diagnose/crash-again cycle.

Similarly, I started getting log entries indicating that the instruction cache on the CPU was failing. It was certainly hurting performance, but the system kept running without real problems until we had a replacement. You just don't get that kind of hardware support on Intel. The lack of such high-quality hardware alone makes NT a poor choice for a mission-critical OS.

One last anecdote and I'll leave you alone. The HP server in our NNICU that had a 460-day uptime had three disks removed from its SCSI bus about 10 days into those 460. We knew there was enough slack on the system to coalesce all the filesystem data onto the remaining disks, so we moved the data, then unmounted the emptied disks, powered them off, and unhooked them. The fact that it then ran more than a full year without incident is a testament to the overengineering that goes into top-dollar hardware for running UNIX-like OSes.

cheers,

Adrian

