Wednesday, March 11, 2009

Adventures in SQL 2008 Clustering on Windows 2008

So I’m sure most of you (who follow these things) were already aware of much of what I’m saying, but it was news to me.
What’s different in 2008 clustering (compared with what I was used to in 2000 clustering):
Each application gets its own ‘virtual server’ and both Windows nodes are “active”, meaning that I can have cluster-aware applications running on different nodes. This is a good and a bad thing. Good because you can “share the wealth” and, in ‘normal’ operation, have redundancy and gain performance. Bad because a failover adds burden to the surviving node and can cause performance issues.
Each application ‘group’ gets its own set of disk resources. This simplifies things to some degree; however, it makes my old habit of logging into the virtual server a bad one. At least on my cluster, if I log into the virtual node I get the node that is hosting the witness disk (what used to be the quorum). This is ‘bad’ because my network admin and I discovered that we get better failover behavior if we keep the witness disk on one node and SQL Server on the other, so I never land on the right node off the bat.
Odd settings (to me) that caused me grief for testing:
One ‘pain’ is the dreaded “Invalid SKU” error. Yup, when adding a 2nd node to my cluster I got this error when using the Setup.exe from a mounted ISO downloaded from MSDN. Well the ONLY way to deal with this is via the command line…
The bad news is that IF you make a mistake with the username or password, the setup takes 10 minutes before it errors out. Yup, and it took me far too many tries to figure out WHAT exactly I wasn’t telling it right…
A 15-minute retry time for network failover. I was assured by my network admin this was a default setting. If so, it is not a very good one. I mean really, I am not waiting 15 minutes to fail over if my NIC dies.
Another setting limits each application service to 2 failovers during a 6-hour period. This hampers testing quite a bit. It also left me very confused, because once testing has used up those two failovers the cluster simply stops failing over: Mordac the Network Admin unplugged both network cables (non-iSCSI) from the server, the node disappeared from sight... and yet NO FAILOVER?!? Yeah, that wasted 4 hours.
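For anyone else who gets bitten by this, here is a toy sketch of how a “2 failovers per 6 hours” rule behaves. It is only my mental model, not how the cluster service actually implements it: the class name, the method name and the sliding-window bookkeeping are all my own assumptions, and only the 2 and the 6 hours come from the setting described above.

```python
from datetime import datetime, timedelta


class FailoverPolicy:
    """Toy model of a 'max N failovers per period' rule (hypothetical names)."""

    def __init__(self, max_failovers=2, period=timedelta(hours=6)):
        self.max_failovers = max_failovers
        self.period = period
        self.history = []  # timestamps of failovers that were allowed

    def should_fail_over(self, now):
        # Forget failovers that are older than the period (assumed sliding window).
        cutoff = now - self.period
        self.history = [t for t in self.history if t > cutoff]
        # Once the limit is reached, the resource is left failed instead of moved.
        if len(self.history) >= self.max_failovers:
            return False
        self.history.append(now)
        return True


start = datetime(2009, 3, 11, 9, 0)
policy = FailoverPolicy()
# Simulated failures at 0, 1, 2 and 7 hours after the start of testing.
results = [policy.should_fail_over(start + timedelta(hours=h)) for h in (0, 1, 2, 7)]
print(results)
```

In this model the third failure inside the window is exactly the “node is gone and nothing happens” situation: the first two tests pass, the third one quietly does nothing, and things only start working again once the earlier failovers have aged out.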
Configuring DTC to use the same disk as SQL Server. There are some SQL Server blogs that go over this in full detail. What all the reading I did failed to say is that IF you add the DTC clustering service before SQL Server is installed, and THEN move the DTC service under the SQL cluster service, the resources say they have moved in the Fail Over Cluster administrator. HOWEVER, SQL Server cannot ‘see’ the disks. I had to repair the installation to get that to work.
Once set up it worked like a champ. We added a couple of non-default dependencies but other than that nada.
I am now in the throes of upgrading some dated SQL 2000 databases, DTS packages and what not on the cluster. It is mildly interesting. Well, no, not really.
I’m not really looking forward to the next big step… replication setup on the cluster. I’ve never replicated on SQL 2008, AND I only got my toes wet with SQL 2005 replication, so I’ll be interested in seeing if it is any better. Well, I won’t be using my beloved merge replication either; we will be doing transactional replication, which should be much less complicated to set up. But that is quite a bit later.