Russ Kaufmann

Blog


June 19

Tech-Ed and the High Availability Pre-Conference Session

I have learned over the years that a successful presentation depends on solid planning, good input from many sources, and preparation. So, what do you do when things go wrong despite all preparations going right?

What do you do when:

  1. Three computers fail during the presentation and one of the three catches fire
  2. The computer used for displaying the PowerPoints reboots five times during the presentation
  3. The rack holding demonstration equipment makes tons of clacking noises as power spikes hit the PDU and force it to reset continuously
  4. The spot lights flicker on and off continuously
  5. There are seven to nine technicians on stage trying to fix everything during the presentation
  6. There are technicians replacing hardware during the presentation
  7. Demos have to be copied multiple times between computers because of hardware failures

Yes, it was challenging. Would you believe that it was still a great deal of fun, and that everyone I saw during the rest of TechEd who had been in the session said they still learned a great deal?

I am shocked that I didn't burst out in a tirade of obscene statements. [:D]

Somebody asked me if I would do it again knowing that the same circumstances would come up, and I said that I would.

Really, I had a great time, and it appears that the attendees were still happy despite all of the facility issues.

BTW, I heard that another Pre-Conference session was cancelled during the first few minutes because of problems that they had.

Windows Server 2008 Failover Clustering - Microsoft Official Courseware

Microsoft has released its first Windows Server 2008 course based on the RTM version. Lucky for us high availability geeks, it happens to be the course on Failover Clustering.

The course will be available May 15th, 2008. In the meantime, I strongly suggest everyone take a look at the syllabus for the class. You can find it here.

April 8

CCR and Multi-Site Environments

I have been hearing more and more people talk about the virtues of using Cluster Continuous Replication (CCR) with a node in each site. This talk has escalated now that Windows Server 2008 has been released to manufacturing. With Windows Server 2008, Failover Cluster environments can now have nodes in multiple sites without having to use Virtual LANs (VLANs) to provide the networking support.

On the surface, CCR and Windows Server 2008 in a multi-site cluster sounds like the answer to many organizations' needs. Obviously, I am setting up the argument against this kind of implementation. OK, maybe it wasn't obvious to some of you. <G>

Anyways, here is a rough sketch (this means that lots of non-discussed components are not shown, e.g. CAS, DC/GC, DNS, etc.) of how this would look if you had two physical locations, both in the same AD site to support CCR. In the drawing, Node1 is the active node and replication traffic flows over the WAN link to Node2, which is the passive node. If you look at the drawing, you should immediately see some issues.

[Drawing: CCR - Multi Site]

Consideration number 1. Where should you put the File Share Witness (FSW)? In this drawing, it is in the site on the left. Well, what if that is the site that goes down in a flood, tornado, meteor strike, or whatever? If the FSW is lost along with one of the nodes, there will not be an automated failover. OK, this is fixable since we can manually force the cluster to start, but it will impact life in the real world if there is a major disaster, especially if you lose your administrators along with the site. Make sure you document the process in your DR documentation, as somebody else might need to perform the task.
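
To make the FSW placement problem concrete, here is a minimal sketch of the vote count for a two-node cluster with a file share witness. The site names and helper functions are hypothetical, placing the FSW in Node1's site is an assumption about the drawing, and the "more than half of the votes" rule is a simplification of what the cluster service actually does.

```python
# Hypothetical layout: each voter (node or file share witness) lives in one site.
VOTERS = {
    "Node1": "SiteA",
    "Node2": "SiteB",
    "FSW": "SiteA",  # assumption: the witness shares a site with Node1
}


def surviving_votes(voters, lost_site):
    """Return the voters that are still reachable after a whole site is lost."""
    return [name for name, site in voters.items() if site != lost_site]


def has_majority(voters, lost_site):
    """Simplified rule: the cluster stays up only with more than half of all votes."""
    return len(surviving_votes(voters, lost_site)) > len(voters) / 2


for site in ("SiteA", "SiteB"):
    outcome = "keeps quorum" if has_majority(VOTERS, site) else "needs a manual forced start"
    print(f"Lose {site}: cluster {outcome}")
```

Whichever site holds the FSW is the site whose loss leaves the surviving node with one vote out of three, which is why the forced start belongs in the DR documentation.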

Consideration number 2. How do you know which Hub Transport to use for the transport dumpster in order to back fill the surviving node? After all HT1 and HT2 are in the same AD site, which means that they would be used in a load balanced manner, so it is not possible to use one of them to provide full replay of lost transactions. Yes, you can hard code which HT to use, but that makes no sense to me in an HA environment as if you did that, you would lose the redundancy/load balancing functionality gained by having multiple HTs in a site. Of course, you might even have two in the same physical location. Also, let's say you hard code HT1 for the CMS and it is active on Node1. If you do that, then you lose the transport dumpster along with the location in the event of a major disaster. OK, so let's say you hard code HT2 for the CMS which is active on Node1. That would mean all of your traffic would be going across the WAN link, which is not exactly a good idea.

Consideration number 3. What about the use of the Wide Area Network (WAN) and its uncontrolled use by many different services? After all, if both physical locations are in the same AD site, will you have issues with clients logging on and authenticating across the WAN link? Will you have problems with the Clustered Mailbox Server (CMS) using the Hub Transport (HT) on the other side of the WAN link? What about the HT using the wrong Domain Controller/Global Catalog server and thus all of its queries being run over the WAN link? Again, you can hard code some of these settings for some applications and services, but even if you do that, there is again the issue of potentially losing redundancy/load balancing.

Consideration number 4. Using Windows Server 2008 and its multi-site improvements impacts DNS and name resolution. For example, when Node1 is active, its VIP address is registered with the CMS name. If there is a failover, then the other VIP (for the physical location of Node2) must be registered within DNS, and the DNS updates need to be replicated to all DNS servers in the organization. During the updates and shortly after, some clients will still have the old VIP address in their caches, so the name will resolve incorrectly until the cache is updated on those clients. This is not an Exchange issue, but something else that should be considered.
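
The stale-cache window can be illustrated with a toy client-side cache. This is only a sketch: the class, the CMS name, the two VIP addresses, and the 1200-second TTL are all made up for illustration, and it ignores DNS server replication entirely.

```python
class ClientDnsCache:
    """Toy client-side cache: remembers an answer until its TTL runs out."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._entries = {}  # name -> (address, time the answer was cached)

    def resolve(self, name, dns_records, now):
        cached = self._entries.get(name)
        if cached is not None and now - cached[1] < self.ttl_seconds:
            return cached[0]  # still within the TTL, so possibly stale
        address = dns_records[name]
        self._entries[name] = (address, now)
        return address


# Hypothetical CMS name and VIPs, one per physical location.
dns = {"CMS1": "10.1.0.50"}
cache = ClientDnsCache(ttl_seconds=1200)

before = cache.resolve("CMS1", dns, now=0)     # cached while Node1 is active
dns["CMS1"] = "10.2.0.50"                      # failover: Node2's VIP is registered
during = cache.resolve("CMS1", dns, now=600)   # TTL not expired: old VIP returned
after = cache.resolve("CMS1", dns, now=1300)   # TTL expired: new VIP returned
print(before, during, after)
```

The lookup at 600 seconds still returns Node1's VIP even though DNS has already been updated, which is the window described above.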

So, what do I recommend? I am glad you asked that question. If you didn't, too bad, I will answer it anyways.

I highly recommend using CCR within a single physical site that is also an AD site. For disaster recovery reasons, I recommend using Standby Continuous Replication (SCR) to copy transactions to a remote site's Exchange mailbox server.

FYI, I updated based on some of Scott Schnoll's comments to me. Scott had some excellent points regarding my concerns listed above. I won't go through them one by one, but it basically came down to my making the assumption that CCR in a multi-site (stretched AD site) environment would be configured for automatic failover. I did make this assumption because if we were looking for a manual process that would require administrator intervention to get it up and running, then we should be talking SCR, not CCR. High Availability (HA) and Disaster Recovery (DR) are very different in my mind. HA means that processes are automated to reduce downtime to a minimal amount. DR is something that is done when there is a major disaster that requires steps to be taken to recover the environment. CCR is an HA technology and SCR is a DR technology, in my opinion.

January 28

TechEd 2008 PreCon: High Availability Planning with Windows Server 2008

I just wanted to point out that there will be an excellent Pre-Conference session on Windows Server 2008 High Availability with several demonstrations at the upcoming TechEd in Orlando.

You can see the pre-conference sessions here: https://www.msteched.com/itpro/public/precons.aspx

I have multiple reasons for promoting this session:

PRC18 High Availability Planning with Windows Server 2008

Speaker(s): Manish Kalra
This pre-conference seminar is designed to help you build a highly available infrastructure for all your organizational needs. We cover the deep technical issues to be addressed and all the possible issues you could run into when designing a Messaging, Database, Virtualized, Web, and File/Print infrastructure.
Reason Number 1: It will have lots of great information on Failover Clustering and using Failover Clustering with key applications such as Exchange and SQL.
Reason Number 2: While Manish Kalra is not the speaker, he is the Business Lead for High Availability for all of Microsoft, and he is responsible for making this seminar happen. Manish will absolutely be involved in the presentation, and will make himself available during the presentation time as well as during the entire time of TechEd.
Reason Number 3: You will get the best bang for the buck as you, the attendee, will get a chance to see several demos and get to hear lots of pointers on the proper way to deploy Failover Clustering as well as Network Load Balancing.
 
The real reason I am promoting this, besides the fact that it is great stuff on clustering? OK, I confess! I will be presenting the content of this session along with my good friend Rodney R. Fournier. While I am no longer a Microsoft MVP (because I am now a Microsoft employee), you can still see my MVP Profile for some more info about me.
 
Updated: Feb 3, 2008. I also wanted to add that Microsoft will have some of its key players available during and after the Pre-Con seminar to answer questions. At this time, it looks like we will have some excellent hardware as part of the demo. I will provide more information as it is solidified.
August 3

eLearning - Windows Server 2008 Failover Clustering

Microsoft recently published a two-hour online course on Failover Clustering for Windows Server 2008 (formerly known as Longhorn).

You can access this content here and for $39.99, you can spend as much time as you want for the next three years reviewing the content.

This course, Course 6051: Implementing High Availability and Virtualization in Windows Server 2008, is part of a larger group of courses that can be purchased together, or you can purchase this individual course separately from any other eLearning course.

Rod Fournier and I are working on very similar material, with much more depth for our ClusterHelp.com course. However, we will not be releasing it until Windows Server 2008 is released to manufacturing. Look for more information here as Windows Server 2008 comes closer to release.

Windows Server 2008 Failover Clustering - Top Questions

The number one question by far at the cluster booth during TechEd was, "What are the differences between Windows Server 2003 Server Clustering and Windows Server 2008 Failover Clustering?"

The major differences that I can discuss off the top of my head include:

Service Account - There is no longer a need for a service account for the cluster service. It now uses the local system account.

Validate - The new validate tool tests the complete configuration and provides a report of items that need to be fixed or a nice report saying that the cluster passed. If the cluster passes the validate test tool, it will be fully supported by Microsoft.

Quorum - The quorum has changed considerably. I will discuss these changes in a blog in the next few days.

SAN Support - This also deserves its own blog entry. Basically, there is no longer a 2TB disk limit imposed by MBR formatting. 2008 supports GPT so you can get 16 exabytes, theoretically, but really only about 160 TB or so based on today's technology. Also, the nodes can use the virtual storage API to create LUNs if the storage vendor supports the APIs. Finally, the new clustering technology uses persistent reservations, which are not supported by many current iSCSI implementations. Most iSCSI vendors are working to update to meet these requirements.

Configurable Heartbeat timeout - There is no longer a 500ms roundtrip limitation for heartbeat communications. 

OR - Networking - The addition of the OR logic is a huge change in my opinion as now we can use the OR with two TCP/IP addresses and the addresses can be from different subnets. This option enables geo clustering without the use of a VLAN. Yes, you got that right, no more need for a VLAN for geo clustering.

Migration - Yes, the new 2008 is x64. You cannot combine 32 bit and 64 bit nodes, so you can't do a rolling upgrade from 2003 to 2008. All upgrades will have to be migrations.

Scoping - One of the complaints with 2003 was that if you had multiple file share virtual servers and they were on the same physical node, all of the shares would show up in all of the virtual servers. Now scoping allows control so these other shares are not visible.

Ease of Use - The ability to create a cluster group with its appropriate resources takes about 5 screens and you are done. It is incredibly simple to cluster resources now. I will provide a comparison in a blog in the next week or two.

The second most asked question, it appeared (yeah, yeah, I didn't bother actually tracking), was about Virtual Server 2005 R2 and whether virtual servers could be clustered. The answer is yes, they can be clustered. The process is called Host clustering since the host machines are clustered, and the virtualized servers can be moved (or failed) over to other hosts. You can read all about it at http://technet2.microsoft.com/windowsserver/en/library/9a3de6d0-c820-41ac-860c-de950d271f8d1033.mspx?mfr=true.

Windows Server 2008 Failover Clustering - The New Quorum Model

One of the big changes in Windows Server 2008 Failover Clustering is the new quorum model. In Windows Server 2003, we had only two choices, either the single disk quorum that has been around since NT 4.0 or Majority Node Set (MNS). Actually, there are three if you consider MNS with the File Share Witness (FSW) as a separate option.

In Windows Server 2008 Failover Clustering, administrators now have four choices on how to implement the quorum.

  • One option is to use Node majority. In this option, a vote is given to each node of the cluster and the cluster will continue to run so long as a majority of the nodes are up and running.
  • A second option is to use both the nodes and the standard quorum disk. In this option, a common option for two node clusters, each node gets a vote and the quorum (now called a witness disk) also gets a vote. So long as two of the three are running, the cluster will continue. In this situation, the cluster can actually lose the witness disk and still run.
  • A third option is to use the classic/legacy model and assign a vote to the witness disk only. This type of quorum equates to the well known, tried, and true model that has been used for years.
  • A fourth option is, of course, to use the MNS model with a file share witness.
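
The four options above boil down to counting votes. Here is a minimal sketch of that counting; the function name and the model labels are my own shorthand, not the names used in the product, and the "more than half of the votes" rule is a simplification.

```python
def has_quorum(model, nodes_up, total_nodes, witness_up=False):
    """Return True if the cluster keeps running under a simplified vote count."""
    if model == "witness_only":
        # Classic/legacy model: only the witness disk carries a vote.
        return witness_up and nodes_up >= 1
    votes_total = total_nodes
    votes_up = nodes_up
    if model in ("node_and_disk", "node_and_file_share"):
        # The witness (disk or file share) adds one more vote.
        votes_total += 1
        votes_up += 1 if witness_up else 0
    elif model != "node_majority":
        raise ValueError(f"unknown quorum model: {model}")
    return votes_up > votes_total / 2


# Two node cluster with a witness disk: any two of the three votes are enough.
print(has_quorum("node_and_disk", nodes_up=2, total_nodes=2, witness_up=False))  # True
print(has_quorum("node_and_disk", nodes_up=1, total_nodes=2, witness_up=True))   # True
print(has_quorum("node_and_disk", nodes_up=1, total_nodes=2, witness_up=False))  # False
```

The first call shows the case mentioned in the second option: the witness disk is gone, but both nodes are up, so the cluster keeps running.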

It has been a few days since I have seen the GUI, so I can't tell you off the top of my head which order they appear in within the GUI.

Two notes caught my attention the other day when talking about these options. First, it is not possible to use DFS as the file share witness. Second, with the changes to the quorum there aren't any checkpoints, so there is no longer a need for the -resetquorumlog switch when starting the cluster service.

 

Windows Server 2008 Failover Clustering - Storage Changes

In a previous blog, I talked a little bit about some of the major changes to disk storage for Windows Server 2008 Failover Clustering. Now, while I wait for dinner to cook, is a good time to cover some of the changes.

  • 2TB Limit - The biggest change, in my opinion, has to be the elimination of the Master Boot Record (MBR) requirement for clustered disks. MBR limited the physical disk resource size to 2TB. Now, Windows Server 2008 supports the use of the GUID (sometimes called Global) Partition Table (GPT), which allows up to 16 exabytes. The practical limit is really around 160-200 TB based on today's technology. Keep in mind that just because you can do it doesn't mean you should do it. Can you imagine how long it would take to defrag a 100TB disk? How about reindexing it? How about chkdsk?
  • SCSI Bus Resets - In Windows Server 2003 Server Clustering, SCSI bus resets are used to break disk reservations forcing it to become disconnected so that another controller can take control of the disks. The problem with SCSI bus resets is that they require all devices on the same bus to lose their connection. These resets were not exactly warmly received by the disks that were impacted without any reason for it. In Windows Server 2008 Failover Clustering, SCSI bus resets are no longer used as persistent reservations are now required.
  • Persistent Reservations - Windows Server 2008 Failover Clustering supports the use of persistent reservations. This means that directly attached SCSI storage will no longer be supported in 2008 for Failover Clustering. Serial Attached SCSI (SAS), Fibre Channel, and iSCSI will be the only supported technologies. However, not all vendors support persistent reservations, so this will be a problem as organizations move to Windows Server 2008 Failover Clustering without full and proper testing.
  • Maintenance Mode - Maintenance mode allows administrators to gain exclusive control of clustered disks.
  • Disk Signatures - In Windows Server 2003 (and earlier), administrators would come to our classes at http://www.clusterhelp.com/ and would cringe whenever we talked about disk signatures. I even had one burst into tears from reliving past stress. OK, maybe not real tears. Anyways, clustering used disk signatures to identify each clustered disk. The disk signature, at sector 0, is often an issue in disaster recovery scenarios (in class I show how to totally get around disk signature issues). To make it easier, Failover Clustering makes use of SCSI Inquiry Data, which is often written to a LUN by a SAN. In the event the disk signature gets out of whack, the disk signature can be reset once the disk has been verified by the SCSI Inquiry Data. If, for some reason, both the disk signature and the SCSI Inquiry Data are unavailable or are misconfigured/corrupted, a Repair button is available on the General page of the Physical Disk Resource properties. The Repair button can be used to point the resource to the actual disk.
  • Disk Management - The virtual disk service (VDS) APIs first became available with 2003 R2. Using the disk tools in Windows Server 2003 R2 (and now 2008), an administrator can build, delete, and extend volumes in the SAN.
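
The 2TB figure in the list above is simple arithmetic, sketched below. It assumes 512-byte sectors and the 32-bit sector fields that MBR uses. The second calculation only shows that the 16 exabyte figure is what a 64-bit byte count works out to; it is an arithmetic check, not a statement about where that limit comes from.

```python
SECTOR_BYTES = 512  # assumed sector size
TIB = 2 ** 40
EIB = 2 ** 60

# MBR stores sector counts in 32-bit fields.
mbr_limit_bytes = (2 ** 32) * SECTOR_BYTES
print(mbr_limit_bytes / TIB)  # 2.0

# A 64-bit byte count tops out at 16 exabytes.
max_64bit_bytes = 2 ** 64
print(max_64bit_bytes / EIB)  # 16.0
```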

All in all, there have been some pretty significant changes when it comes to the way Windows Server 2008 Failover Clusters work with disk storage.

Standby Continuous Replication for Exchange Server 2007

As previously discussed, when SP1 for Exchange Server 2007 ships, it will include some new technologies, too. One is Standby Continuous Replication (SCR). I am completely psyched by this new technology.

Myself, I see SCR as the perfect remote site Disaster Recovery solution for Exchange Server 2007. What would make it the perfect solution would be having a hot site available for the implementation.

Please read more about SCR on the Exchange Team's blog. Scott Schnoll wrote a wonderful post about it last week. I am sure you will love it, too.

Building a Windows Server 2008 Failover Cluster - finally!

I have been dying to get my first Windows Server 2008 Failover Cluster built. Of course, I want to do it on the cheap. That means using virtualization. The problem is that Virtual Server 2005 R2 does not provide support for Serial Attached SCSI (SAS), and there is just a complete dearth of virtualized SANs out there with virtualized HBAs. :)  OK, that isn't going to happen anytime soon.

So, my first Failover Cluster didn't have any shared disks. That makes it pretty worthless for testing.

Microsoft, through its acquisition of Stringbean Software (WinTarget), put an iSCSI target into the new version of Windows Storage Server, but it has not been released and is not available outside of Microsoft. So, Microsoft has been the only real source of testing outside of using real hardware.

Well, our partner, Rocket Division Software, is in the final stages of upgrading their iSCSI target software (Starwind) to support persistent reservations. It works wonderfully, so it should be released pretty soon. The only problems I have found have to do with a user interface issue that they already know about.

Anyways, now I have my first Failover Cluster using iSCSI. It passed the validate tests and runs like a charm.

Keep an eye out for Rocket Division's release and then you will be able to try it, too.

Windows Server 2008 Default Share Permissions

 

I was working on the June CTP for Windows Server 2008 when I created a basic file share. One of my students, looking over my shoulder, asked me to check the default share permissions.

  • In Windows 2000 Server, the default is Everyone with Full Control
  • In Windows Server 2003, the default is Everyone with Read

In Windows Server 2008, there are no default permissions. The only permission that is there when you create the share is the Administrator with Ownership, but that is it. When you create the share, it brings up the window to create the permissions right away.

I tell you, the more I see of Windows Server 2008, the more I like it.

Windows Server 2008 Failover Clustering Virtual Lab

Microsoft posted a virtual lab today for Failover Clustering. This lab walks through the following procedures:

  1. Adding a second node to a cluster
  2. Creating a File Share in the cluster
  3. Creating a Print Share in the cluster
  4. Configuring Failback Policies
  5. Installing WINS in the cluster
  6. Installing DHCP in the cluster
  7. Bonus - Adding Physical Disk resources to the cluster

You can get to the virtual lab here: http://msevents.microsoft.com/CUI/WebCastEventDetails.aspx?EventID=1032345932&EventCategory=3&culture=en-US&CountryCode=US 

The lab should take about an hour to an hour and a half to complete. That, of course, depends on how many times you get interrupted with the kids wanting you to take them to the store to buy candy.

I went through part of the lab, and it looks like it will work just fine for everyone except it will not pass validate. This is because of an issue that seems to have come up in the June CTP.

BTW, I recorded a step-by-step today using Camtasia. I will probably put it someplace on the web for download early next week. Keep an eye out for it.

 

July 18

Upper and Lower and Mixed Case in SQL Cluster

I had a well known geek in my cluster class last week. Ben Miller, a former Microsoft MVP lead and SQL expert, sat in on our cluster class (www.clusterhelp.com).
 
During our many conversations, Ben told me about a problem he ran into recently in SQL Server 2005 clustering. Basically, the issue is that the way the server names were recognized differed depending on what tool pulled the name since the different applications did not pull the name from the same place in the registry. So, if one node was all upper case, and the other node was mixed case or lower case, clustering would install and work just fine. However, SQL Server 2005, which pulls the name from a completely separate key in the registry, does not install properly unless both nodes are all upper case.
 
Ben says he is going to blog more about the issue after he has a chance to do some further testing and documents the issue completely. During class, he was able to replicate the problem with complete predictability.
 
See Ben Miller's blog for more detail.
 
 
June 12

TechEd 2007 - Orlando

I am still in Orlando trying to figure out how I missed my plane. Oh well, there is always tomorrow.
 
Anyways, putting that aside, it has been a fantastic week. It was great to see old friends again, it was great to see current friends, and it was fantastic meeting some new people that I have been dying to meet. For example, Evan Dodds has always been a great person when it comes to getting information on some off-the-wall attribute setting, or even more mainstream stuff. Evan has been great to talk to on the phone and via email, but I had never met him until this last week. Evan is definitely a great guy, and I am glad to say that I finally met him and did my best to not blow cigar smoke on him. Another person that I have been wanting to meet for a few years is Eileen Brown. I was walking around the Technical Learning Center (Yellow) area saying hello to several people that I know when I was accosted by Jane. OK, accosted is a bit strong, but she stopped me to introduce herself. While talking, I asked her if she had seen Eileen around at all, as I was dying to meet her. About 20 minutes passed, and poof, there was Eileen. She is just like I imagined her. What a wonderful lady and just chock full of knowledge.
 
OK, side question to over 50% of the world, "Why are there so few women in the IT field?"  The question is addressed to women in general. I just don't get why there are so few women in the field, when women have been proven to excel in the field without having to have any specialized scientific or math skills.
 
I have to admit that this was a strange TechEd for me as I didn't seem to like any of the parties and spent many evenings with close friends smoking cigars by the pool of one hotel or another. I was a little ticked that I missed out on a meeting with the MCTs and Ken Rosen which was basically a goodbye roast for Ken. I missed out on that one because I was working the booth for Windows Server 2008 Failover Clustering. We had a great time at the booth helping all sorts of people. One question that I dreaded was the, "I know nothing about clustering, can you tell me everything I need to know?"  This has got to be the worst question to answer as I can go on for several days, but I wasn't able to get some people to be more specific. My favorite question was the, "So, what is different between Windows Server 2003 Server Clustering and Windows Server 2008 Failover Clustering?" At least this one could be answered with some basic information and why the changes are important.
 
I am already looking forward to next year. Bring on the geeks!
February 20

Fibre Channel Information Tool (fcinfo)

Released just yesterday (Feb 19th, 2007), this tool can be run from Windows Server 2003 or Windows 2000 to enumerate disks and configuration information for attached SANs.

Download it here: http://www.microsoft.com/downloads/details.aspx?familyid=73d7b879-55b2-4629-8734-b0698096d3b1&displaylang=en&tm

February 6

Cluster Prep

YES! It is finally released to the public. The Microsoft Cluster Configuration Wizard released today to production.

Clusprep (Clus Prep), as it is known with affection, can be used to test the configuration before configuring for clustering. The tool can be installed on either node or on another computer altogether. It should be installed on a 32 bit server; even then, it can still inventory and test the configuration of both 32 bit and 64 bit systems. Clusprep tests the hardware configuration and evaluates the OS, patches, and hot fixes.

While clusprep is not 100% fool proof, if the potential nodes all pass through the tool properly, you can be pretty confident that clustering will configure without any issues.

Good luck everyone!

January 14

Cluster Training in London

The contract is finally complete. ClusterHelp.com will be partnering with Global Knowledge in the UK to provide a cluster training class. Come join us March 6-9, this year.

Global Knowledge issued this press release today: http://www.trainingpressreleases.com/newsstory.asp?NewsID=2487. The actual location has not yet been selected. I have requested that it be someplace close to Heathrow airport to make it easier for those coming from other countries in Europe. It may end up in downtown London, though, depending on availability and classroom size.

So, anyone in Europe that would like to attend the training provided by ClusterHelp.com can now save some money and attend the class in the UK instead of flying across the pond. We are all excited about doing this class and hope to fill up the classroom like we do in New York at Netlan and in Denver at Ameriteach.

I am really looking forward to this trip.

October 19

DFS and Clustering

There seems to be some confusion around the Distributed File System (DFS) and Windows Server 2003 server clustering.

First, let's look at the terms for DFS so we start from the same foundation in this discussion.

  • DFS Root - Think of this as the name space or share name. This is the name that you connect to as a client computer. Underneath the root are the many different folders and files that may be on a single server or may be distributed around to multiple servers. There are two types of roots. There are domain based roots and there are server based roots. Clustering only supports server based roots. Since a domain based root can be created on multiple servers, it is more highly available. A server root does not have that capability as it can only be created on a single server.
  • DFS Link - This is a "leaf" type of object that goes under the root. For example, the root (let's call it CompanyFiles) may be hosted on Server1, but Server2 may have a file share space (call it accounting) that is linked under the root. Once linked, you can access the accounting share two ways. 1. You can connect to the UNC path of \\server2\accounting or 2. You can connect to the DFS at \\CompanyFiles\Accounting or even browse from \\CompanyFiles and find the leaf object of Accounting underneath it.
  • DFS Replica (also called a target) - This is another root or even possibly a link in another DFS tree. What we can do is we can use the source and target information and build replicas so that certain leaf objects or entire DFS trees can replicate between locations.

OK, granted this is just some very high level and basic information, but let's get rolling with it. What does this all have to do with clustering?

Clustering is used to achieve high availability for certain resources. As a business requirement, we may be told to provide solutions that can help us achieve our goals for the company. One of the requirements is to make certain files highly available as they are needed all the time to keep the business running smoothly.

We can achieve our goal a few different ways:

  1. We can use DFS and replicas to make copies of the file structures that we deem to be extremely important. The problem with using DFS in this manner is that if there is a great deal of change, the replication process may not be efficient enough to keep up. It is not a good idea to use DFS replication in cases where there is constant change. DFS replication works wonderfully where there is little change, e.g. hosting application source files and drivers.
  2. We can use server clustering and create file share resources hosted in our cluster environment. One of the nice options of using file server clusters is that we can use the cluster to host a server based DFS root. Because the root is held in the cluster, it makes it highly available. Also, any data stored on the file share on the cluster is also highly available. The value of using DFS in this implementation is that the name space and the link are highly available and we can connect to many links around the organization to help build an easy to navigate file server structure while only hosting the most important files on the cluster itself.
  3. We can deploy a domain root and use it to link to a server cluster running a file share resource and use it as a leaf in our domain root DFS. The domain root can be made highly available because it can be built on multiple servers, and the most important of our files can be hosted on the server cluster file share.

It is important to note that clusters cannot host domain roots. They can only host server roots.

Damn, I hope I got that right. If not, email me.

DNS Round Robin and IIS

Open up the attachment below. It demonstrates how DNS round robin works.

DNS Round robin is a common solution for enabling load balancing for Internet server farms. Consider the following example in which there are three IP address entries for the same host name on a DNS server. 

In DNS, there are three entries:

   192.168.1.100   WebApp1

   192.168.1.101   WebApp1

   192.168.1.102   WebApp1

You can also replicate this example some time just by playing with your own DNS server. If you create three host records with the same name but with three different IP addresses, you will have implemented DNS round robin as a solution. What happens is that the first client receives the first address, the second client receives the second address, the third client receives the third address, the fourth client receives the first address, and they continue to loop. Using DNS round robin, it is possible to spread the load among multiple servers.
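
The rotation described above can be sketched in a few lines. This is a simplification with a made-up helper name: a real DNS server rotates the order of the answers it returns, and client and resolver caching can change what any one client actually sees.

```python
from itertools import cycle

# The three host records from the example above.
RECORDS = ["192.168.1.100", "192.168.1.101", "192.168.1.102"]


def hand_out_addresses(records, client_count):
    """Give each client the next address in the rotation, looping at the end."""
    rotation = cycle(records)
    return [next(rotation) for _ in range(client_count)]


for client, address in enumerate(hand_out_addresses(RECORDS, 4), start=1):
    print(f"client {client} -> {address}")
```

The fourth client lands back on 192.168.1.100, which is the loop described above.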

The problem with round robin DNS is that it is completely unable to handle a down server. In the event one of the servers fails, its address will continue to be given to clients, so a portion of the clients will be pointed to an invalid address and will fail to connect.

 

Round Robin DNS is not a high availability solution.

DNS Round Robin and File and Print Servers

A common question comes up in the public newsgroups on Windows clustering all the time: "Can I use DNS round robin to provide high availability for printers or file shares?"

The answer is usually, "No."

The reason is that NetBIOS names are used for these types of connections and the client must know the NetBIOS name of the target server. So, when you try to connect to a UNC path, i.e. \\servername\sharename, this is treated the same as if you were to run the Net command, i.e. "net use * \\servername\sharename" to connect. The * in the command is normally replaced with a specific drive letter and then the drive is mapped. The result is that an attempt is made to resolve the name using normal NetBIOS resolution methods. If those processes fail, then it is possible to use DNS for the resolution if the "Enable DNS for Windows Name Resolution" check box is enabled.

Another reason that you would not want to use DNS round robin for highly available printer or file shares is that DNS resolution of the name to an IP address will continue even if the server is not available. For example, if two client computers connect using DNS round robin, one will get the IP address of the first server, the second client will get the IP address of the second server, and so on. If the second server goes down, DNS will continue to supply every other client request with the IP address of the server that is down. What you get, instead of highly available resources, is halfly available resources. I wonder if I can trademark that term, "halfly available," for this kind of discussion. Of course, the fraction of failures will change depending on the number of servers used.
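
How the fraction of failures changes with the number of servers can be sketched as follows. The server names and the helper are hypothetical, and the sketch assumes a strict rotation with no client-side caching or retries.

```python
def failed_fraction(servers, down, requests):
    """Fraction of requests that are handed the address of a down server."""
    failures = sum(
        1 for i in range(requests) if servers[i % len(servers)] in down
    )
    return failures / requests


two_servers = ["server1", "server2"]
three_servers = ["server1", "server2", "server3"]

print(failed_fraction(two_servers, {"server2"}, 100))    # 0.5, "halfly available"
print(failed_fraction(three_servers, {"server2"}, 300))  # one request in three fails
```

With two servers and one down, every other request fails. With three servers and one down, it drops to one in three, but it never reaches zero.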

If your organization needs highly available file resources, you can look at a couple of solutions. First would be DFS with DFS replicas and the second would be server clustering. For highly available printing solutions, server clustering should be your main focus.