Why I Prefer HP over IBM
For almost 15 years now I’ve heavily favoured HP over IBM
for a number of reasons. My negative
impression of IBM first came about with an experience at one of Canada’s large
banks where IBM was the supplier of record at the time. We were deploying a brand new financial
application that was critical for managing high profile investor accounts. The implementation mandated stability for the
application so we decided to use Microsoft’s clustering technology on IBM
hardware and host MS SQL Server on the cluster.
I was responsible for the implementation of the equipment, and both the
software and hardware technology were relatively new at the time. During functional and load testing, the equipment
and the overall clustering solution performed well.
One evening after we went live, during a maintenance failover, the active
node was switched to the passive one, and what happened next turned out to be a
living hell for an entire weekend. At
the end of the failover a check disk operation occurred and during this
operation the entire disk contents were destroyed (I can’t do a check disk now
without thinking of this). The check
disk was trying to repair what it thought were damaged files, and ended up
corrupting the entire disk!
Of course when something like this happens it gets the
attention of both technical and line of business management and valid questions
start coming in about the technology choices.
Coming up with an explanation quickly was impossible, and the immediate
conclusion was that the hardware was improperly configured, which rested
squarely on my shoulders. Prudent
follow-up to the event had experts from IBM come in and evaluate the
configuration. There were minor changes
and optimizations that were suggested by IBM, and subsequent testing seemed to
indicate that the system was healthy and working.
Unfortunately the next time a failover occurred on the
production environment the same issue occurred where the disk was completely
wiped out again. Back to restoring from
tape, a slow and onerous process. At this
point I was determined to find out what was going on and try to figure a way to
reproduce the situation on demand. I
felt I had no choice as the pressure was enormous to come up with the root
cause. No one was trusting the
technology (and they shouldn’t have), and it was affecting the reputation of
the project as a whole.
After a substantial amount of testing I could finally
reproduce the problem on demand. What a
relief. During certain write operations
while a failover was taking place the disk became marked as bad and all of the
links to the Master File Table were recreated, and the volume was unusable.
At this point we involved software developers from both IBM
and Microsoft to work together on the problem, which in my mind is always a bad
sign, but also a very good sign. A good
software developer can look at exactly what’s going on and determine what the
potential issue is. They have the luxury
of using tools that can get to the lowest level of the problem. Fortunately because of the bank’s profile we
were able to get the developers from both Microsoft and IBM with clustering
development experience.
During a remote debug session, Microsoft loaded up the debug
symbols for the clustering drivers, and I explained to them how to recreate the
situation. I’ll never forget the words
that this Microsoft developer used just before we got started: “Let’s
see if their driver is doing half of what it’s supposed to do.” Sure enough, the IBM driver was doing almost nothing
to shut down all write operations during the failover sequence of operations,
and this is what turned out to be the problem.
After a few minutes trying various additional operations we conferenced
in an IBM representative to discuss the situation. This discussion was brief, and IBM promised
to rectify the situation by writing updated clustering drivers. We also had a Microsoft KB article created
based on this situation.
Since that time I’ve had a number of additional experiences
with IBM hardware that are memorable and negative, but I’ll share those another
time.