Friday, 11 July 2014

Why I Prefer HP over IBM

For almost 15 years now I’ve heavily favoured HP over IBM for a number of reasons. My negative impression of IBM first came about with an experience at one of Canada’s large banks, where IBM was the supplier of record at the time. We were deploying a brand new financial application that was critical for managing high-profile investor accounts. The implementation mandated stability for the application, so we decided to use Microsoft’s clustering technology on IBM hardware and host MS SQL Server on the cluster. I was responsible for the implementation of the equipment, and both the software and hardware technology were relatively new at the time. During functional and load testing, the equipment and the overall clustering solution performed well. After we went live, the active node was switched to the passive one evening during a maintenance failover, and what happened next turned out to be a living hell for an entire weekend. At the end of the failover a check disk operation ran, and during this operation the entire disk contents were destroyed (I can’t do a check disk now without thinking of this). The check disk was trying to repair what it thought were damaged files, and ended up corrupting the entire disk!

Of course, when something like this happens it gets the attention of both technical and line-of-business management, and valid questions start coming in about the technology choices. Coming up with an explanation quickly was impossible, and the immediate conclusion was that the hardware was improperly configured, which rested squarely on my shoulders. As a prudent follow-up to the event, experts from IBM came in and evaluated the configuration. IBM suggested minor changes and optimizations, and subsequent testing seemed to indicate that the system was healthy and working.

Unfortunately, the next time a failover occurred in the production environment the same issue occurred, and the disk was completely wiped out again. Back to restoring from tape, a slow and onerous process. At this point I was determined to find out what was going on and to figure out a way to reproduce the situation on demand. I felt I had no choice, as the pressure to come up with the root cause was enormous. No one was trusting the technology (and they shouldn’t have), and it was affecting the reputation of the project as a whole.

After a substantial amount of testing I could finally reproduce the problem on demand. What a relief. During certain write operations while a failover was taking place, the disk became marked as bad, all of the links to the Master File Table were recreated, and the volume was left unusable.

At this point we involved software developers from both IBM and Microsoft to work together on the problem, which in my mind is always a bad sign, but also a very good sign. A good software developer can look at exactly what’s going on and determine what the potential issue is. They have the luxury of using tools that can get to the lowest level of the problem. Fortunately, because of the bank’s profile, we were able to get developers from both Microsoft and IBM with clustering development experience.

During a remote debug session, Microsoft loaded up the debug symbols for the clustering drivers, and I explained to them how to recreate the situation. I’ll never forget the words that this Microsoft developer used just before we got started: “Let’s see if their driver is doing half of what it’s supposed to do”. Sure enough, the IBM driver was doing almost nothing to shut down write operations during the failover sequence, and this turned out to be the problem. After a few minutes of trying various additional operations, we conferenced in an IBM representative to discuss the situation. This discussion was brief, and IBM promised to rectify the situation by writing updated clustering drivers. We also had a Microsoft KB article created based on this situation.

Since that time I’ve had a number of additional experiences with IBM hardware that are memorable and negative, but I’ll share those another time.

