Chapter 6. Clustering and Load Balancing

One of the vital features of ArcGIS for Server is load balancing. Having a GIS server take over when others are busy or heavily loaded is essential to maintaining good response times. When a request is made to consume a GIS service, the web server (whether dedicated or built in) keeps track, through logs in the Server site, of which GIS servers are free and which are busy, and the decision about which GIS server should receive the next request is made accordingly. That server then executes the request efficiently using the optimization tools we discussed in Chapter 5, Optimizing GIS Services. The load balancing module is a closed box: you get to enjoy the experience of Server balancing requests between your GIS servers, but you cannot peek under the hood and configure it. It would be really useful if Esri were to expose this part for us to play with; it did, however, enable us to tap into something really interesting: clustering.

In this chapter, we will discuss clustering technology along with its benefits and limitations. By implementing clustering, you will see how easy it is to scale up a Server site and add machines. You will learn how to group and categorize GIS servers based on their characteristics to ensure proper load balancing on your Server site. Despite its advantages, clustering does come with some limitations, which we will discuss as well.

Clustering

For any service you publish on ArcGIS for Server, one or more instances will start on the GIS servers to represent that service. Each instance takes resources from the machine it is running on, and the number of instances on each server can be configured when you publish the service. GIS services differ in memory usage and processing consumption, and the same applies to the GIS servers: you might have different generations of servers with different specs and resources, so it makes sense to have some way of specifying which services will run on which servers. To manage this efficiently, Esri came up with a technique to group GIS servers into clusters and then let you configure which service goes to which cluster of machines. Clustering is an advanced technique that can prove useful if configured correctly. For instance, you might have some unused workstations or standard-issue PCs lying around in your inventory; you can format them so they are fresh and ready, add them to your Server site, and place them into a commodity computing cluster. You can then assign low-priority services to run on this cluster and free up your more powerful GIS servers to host services with higher affinity. Within a cluster, the GIS servers need to communicate with each other and exchange vital information to help in the load balancing process. This communication happens over the Transmission Control Protocol (TCP), by default on a unique port: each cluster gets assigned a dedicated port, and if there are any firewalls in place, the port on which the GIS servers communicate must be opened, or an exception must be added to the firewall rules, so that the servers can exchange information freely.

Note

Commodity computing

Commodity computing is the grouping of a large number of readily available, average-power machines into a cluster to obtain high computing power at a lower cost.

Creating clusters

Before you start creating clusters, you have to determine what types of GIS services you possess. This is done by properly planning and designing GIS services, analyzing their nature, and predicting what kind of resources they require, which you already did back in Chapter 4, Planning and Designing GIS Services. Once you identify your GIS services, you can decide what kind of clusters you want to create. You might not require any clustering at all; however, sometimes you need to group your GIS servers by certain factors. GIS servers can be grouped by resources and computing power, where you put the most resourceful GIS servers into one cluster and your typical ones in another.

You can group your servers by security level; you can assign high-profile and sensitive GIS services running on servers with tightened cyber and physical security to a dedicated cluster. Some even create clusters by networking area, where servers within the same area network and subnet are grouped together and remote servers are put into a separate cluster. Of course, there is always the ownership factor to consider; you can group servers by owner, making them easily manageable.

Take a look at the following network diagram: there are two high-power servers, GIS-SERVER01 and GIS-SERVER02, connected directly to the database that hosts the GIS data, SDE-SERVER01. Another five PCs, GIS-PC01 to GIS-PC05, are connected to the database via 1 Gbps Ethernet. Finally, there is one powerful server, GIS-REMOTE01, leased in the cloud and reached over a VPN connection in China with 42 Mbps of Internet bandwidth.

Creating clusters

All eight of these GIS servers are joined to an ArcGIS for Server site and are load balanced. You have four services running on the Server site: Buildings, Parcels, Electricity, and Geoprocessing. Users are frequently experiencing slow performance across the services despite the high-spec configuration and networking setup, and management is not happy to hear this, especially after spending a large sum of money purchasing and leasing servers. You were asked to solve the slow performance problem by first identifying its cause. On investigation, you will find that the load balancing is not intelligent enough to take the resource and networking factors into consideration. For example, if you run a geoprocessing task, you might get diverted to one of the commodity PCs, which are not designed for such tasks. To prevent this, the first thing we need to do is create clusters for our machines and put each machine in the right cluster.

Log in to your ArcGIS Server Manager, activate the Site tab, and from the left-hand pane, click on Clusters. By default, there is always one cluster, named default, which is created when you set up your Server site, and all the machines are placed into it. Manager looks as follows:

Creating clusters
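
If you prefer scripting over clicking through Manager, the same information is exposed by the ArcGIS Server Administrator API on port 6080. The following minimal Python sketch generates a token and lists the clusters on a site; it assumes the third-party requests library is installed, the server name and credentials are placeholders, and the endpoint and field names follow the 10.x Admin API documentation, so verify them against your version.

import requests

ADMIN = "http://GIS-SERVER01:6080/arcgis/admin"            # hypothetical site URL

# Ask the site for a short-lived administrative token
token = requests.post(ADMIN + "/generateToken", data={
    "username": "siteadmin", "password": "secret",          # placeholder credentials
    "client": "requestip", "expiration": 60, "f": "json"
}).json()["token"]

# Read the clusters resource and print each cluster with its machines
clusters = requests.get(ADMIN + "/clusters",
                        params={"f": "json", "token": token}).json()
for cluster in clusters.get("clusters", []):
    print(cluster["clusterName"], cluster.get("machineNames", []))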

Adding machines to the default cluster

First, we need to create a new cluster for the five PCs. Click on New Cluster, and in the New Cluster form, type Commodity Computers in the Cluster Name field. Normally, all the available machines get listed in the Machines box; in this case, you might not see any machines, and that is okay, since they are all already assigned to the default cluster. Click on Create to add the new cluster. The New Cluster form now looks as follows:

Adding machines to the default cluster
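
The same step can be scripted through the Administrator API's clusters/create operation. This is only a sketch: the site URL, credentials, and TCP cluster port are assumptions you would replace, and the parameter names should be checked against the Admin API reference for your release.

import requests

ADMIN = "http://GIS-SERVER01:6080/arcgis/admin"            # hypothetical site URL

token = requests.post(ADMIN + "/generateToken", data={
    "username": "siteadmin", "password": "secret",          # placeholder credentials
    "client": "requestip", "expiration": 60, "f": "json"
}).json()["token"]

# Create the cluster empty, exactly like the Manager workflow above;
# the TCP port is an assumed free port that your firewalls must allow
result = requests.post(ADMIN + "/clusters/create", data={
    "clusterName": "Commodity Computers",
    "machineNames": "",
    "tcpClusterPort": 4010,
    "f": "json", "token": token
}).json()
print(result)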

Note that there are no machines registered on your new cluster; this is expected because you haven't added any machines to it yet. That is why we need to rearrange the machines, and to do that, we need to edit the default cluster, which we will do shortly. Our Clusters form now looks as follows:

Adding machines to the default cluster

Grouping machines by resources

Now that all machines are in the default cluster, we need to group them into separate clusters by their resources, as we discussed earlier. The Commodity Computers cluster is meant for the five PCs that are currently assigned to the default cluster, so we need to remove those machines from the default cluster and assign them to Commodity Computers. To start editing a cluster, click on the edit icon next to the default cluster. In fact, let us remove all GIS servers from the default cluster and turn them into available machines so that we can easily assign them later.

From the Added Machines list, remove all the servers so that they move to the Available Machines list, and then click on Apply. The Edit Cluster Machines page looks as shown in the following screenshot.

Tip

Best practice

It is a good idea to implement clustering if you have three or more GIS servers on your Server site.

Grouping machines by resources

Note that the default cluster no longer has any machines, and now that the machines are free, we can reassign them to other clusters. The Clusters page now appears as follows:

Grouping machines by resources

Go ahead and edit the Commodity Computers cluster, and move the five PCs from the Available Machines list to the Added Machines list using the arrow icon. Click on Apply when you finish, as shown in the following screenshot:

Grouping machines by resources
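
If you would rather script this rearrangement, the Administrator API also exposes remove and add machine operations on each cluster (machines/remove and machines/add in the 10.x documentation). The sketch below moves the five PCs out of the default cluster and into Commodity Computers; the URL, credentials, and machine names are placeholders to adapt to your site.

import requests
from urllib.parse import quote

ADMIN = "http://GIS-SERVER01:6080/arcgis/admin"            # hypothetical site URL

token = requests.post(ADMIN + "/generateToken", data={
    "username": "siteadmin", "password": "secret",          # placeholder credentials
    "client": "requestip", "expiration": 60, "f": "json"
}).json()["token"]

# The names must match how the machines are registered with the site
# (often fully qualified, for example GIS-PC01.MYDOMAIN.COM)
pcs = ",".join("GIS-PC0%d" % i for i in range(1, 6))

# Free the PCs from the default cluster...
print(requests.post(ADMIN + "/clusters/default/machines/remove",
                    data={"machineNames": pcs, "f": "json", "token": token}).json())

# ...then place them in the Commodity Computers cluster
print(requests.post(ADMIN + "/clusters/" + quote("Commodity Computers") + "/machines/add",
                    data={"machineNames": pcs, "f": "json", "token": token}).json())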

The process of moving GIS servers from one cluster to another can take a long time, especially if there are services already running on those servers, as they need to be restarted on the new cluster. The clusters now appear as shown in the following screenshot:

Grouping machines by resources

We will not require the default cluster anymore, so you can delete it; however, there are system services, such as publishing and caching, that were using that cluster, and they have to be reassigned to one of your new clusters as well. In the next topic, you will learn how to reassign a service to a new cluster. If you do not want to change your system services, keep the default cluster and assign one powerful machine to it.

This cluster is now up and running, hosting the five standard PCs as GIS servers, so we must be careful to assign only low-priority, lightweight services to it. It can even be used for testing GIS services: if you want to publish a new service but want to see how it performs first, you can run it on the commodity cluster, and once you see it is fine, decide whether to keep it there or migrate it to a more powerful cluster. We will discuss an application of this in the next topic.

Now we need to create two more clusters. According to our network diagram, we have three more machines left; one is remote, hosted externally, and connected over the Internet, and the other two are local with a direct connection to the database. Logically, the remote server should go into a separate cluster for obvious reasons; security is one of them, control and management another. The remaining two servers are powerful pieces of hardware with high-bandwidth connections to the database, so they are gems to us. We can put them in a new cluster called the Power cluster. Go ahead and create the Power and Remote clusters and assign GIS-SERVER01 and GIS-SERVER02 to the Power cluster and GIS-REMOTE01 to the Remote cluster. After doing this, the Clusters form looks as follows:

Grouping machines by resources
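
Scripted with the same assumptions as before (placeholder URL and credentials, arbitrarily chosen free ports), the remaining two clusters can be created in one pass, this time passing the machine names directly to clusters/create:

import requests

ADMIN = "http://GIS-SERVER01:6080/arcgis/admin"            # hypothetical site URL

token = requests.post(ADMIN + "/generateToken", data={
    "username": "siteadmin", "password": "secret",          # placeholder credentials
    "client": "requestip", "expiration": 60, "f": "json"
}).json()["token"]

# Desired layout from the network diagram; the ports are assumed free ports
layout = {
    "Power":  {"machines": "GIS-SERVER01,GIS-SERVER02", "port": 4011},
    "Remote": {"machines": "GIS-REMOTE01",              "port": 4012},
}

for name, cfg in layout.items():
    result = requests.post(ADMIN + "/clusters/create", data={
        "clusterName": name,
        "machineNames": cfg["machines"],   # machines can be assigned at creation time
        "tcpClusterPort": cfg["port"],
        "f": "json", "token": token
    }).json()
    print(name, result)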

Mapping GIS services to a cluster

We managed to set up three clusters of machines: we took our network diagram and extracted a clustering pattern out of it, then created the clusters and assigned the machines to them based on different factors: resources, network cost, and locality. That was one side of the coin. We should now assign services to each cluster; since you have multiple clusters, you will be asked to select which cluster you want to publish each GIS service on. To do this, we should first analyze the nature of the service, see how it behaves, and accordingly select the right environment for it. We have four services that we need to manage and map to clusters: Parcels, Buildings, Electricity, and an extraction geoprocessing service.

Mapping a simple map service

The Parcels service consists of only simple operations, such as pan, zoom, and search, functions that do not require massive resources. It would be a waste of resources to assign such a service to the Power or Remote cluster; the commodity computers can take care of it easily.

Note

You may decide to cache this service; however, hosting it on the commodity cluster will then be expensive, as the whole cache needs to be copied to all five PCs. The next section covers the mapping of a cached service.

You can assign a service to a cluster either from the Service Editor form or from ArcGIS Server Manager. Open Manager and edit the Parcels service. From the left-hand pane, select Parameters, and then select Commodity Computers from the Cluster drop-down list. Click on Save to save your changes. The changes will be as shown in the following screenshot:

Mapping a simple map service
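
Behind the scenes, this is simply a change to the service's clusterName property, so it can also be scripted: read the service JSON from the Administrator API, retarget it, and save it back with the edit operation. The sketch below assumes the Parcels service sits in the root folder and uses placeholder URL and credentials; check the endpoint names against your version before relying on it.

import json
import requests

ADMIN = "http://GIS-SERVER01:6080/arcgis/admin"            # hypothetical site URL

token = requests.post(ADMIN + "/generateToken", data={
    "username": "siteadmin", "password": "secret",          # placeholder credentials
    "client": "requestip", "expiration": 60, "f": "json"
}).json()["token"]

svc_url = ADMIN + "/services/Parcels.MapServer"             # service assumed in the root folder

# Read the service definition, retarget its cluster, and write it back;
# note that editing a service restarts its instances
service = requests.get(svc_url, params={"f": "json", "token": token}).json()
service["clusterName"] = "Commodity Computers"
result = requests.post(svc_url + "/edit", data={
    "service": json.dumps(service), "f": "json", "token": token
}).json()
print(result)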

Mapping a cached map service

The second service is Buildings, a service with an extensive number of features. The fact that this service is queried frequently made us decide to cache it so that we can save expensive calls to the database. You can imagine that plenty of disk space is required for caching, especially if you are going to implement it at many scale levels. Let us examine our clusters again. They are as follows:

· Remote cluster: Here I have leased a powerful server reached over a VPN connection on the Internet with a bandwidth of 42 Mbps

· Power cluster: It has two resourceful servers sitting right on my 1 Gbps local area network, connected directly to my database

· Commodity cluster: It has a set of five good workstations offering us their CPU cycles

Now, which one of these three clusters do you think is suitable to host our Buildings GIS service? That would definitely be the Remote cluster. The service does not require a database connection because it is cached, and since I'm renting this server, I can put its terabytes of disk space to good use by storing the cache there. It doesn't matter how far away this remote server is because it won't need to connect to my database; all the bandwidth will be used to download cached images only, which are compressed PNG files and are therefore fast to fetch.

To assign the service to the cluster, you can again use either the Service Editor or ArcGIS Server Manager. Open Manager and edit the Buildings service. From the left-hand pane, select Parameters, and then select Remote from the Cluster drop-down list, as shown in the following screenshot. Click on Save to save your changes.

Mapping a cached map service

Once we map the Buildings service to the Remote cluster, the cache will initially be created on the remote server; this process will take some time. As discussed in the previous chapter, it is recommended to run it overnight or when the database is less busy. You can also create the cache locally and upload it to the remote server. Either way, the advantage of this mapping is that you took a very busy service and managed to run it efficiently and remotely without depleting your own resources. The folks in your finance department will be very happy to learn that you are fully utilizing the rented overseas server. The disadvantage is that you are exposing your data remotely and less securely; anyone who taps into the channel could copy these images and gain access to your data, and there is always the physical risk that your remote server gets infiltrated on site and your data is stolen.

Mapping a high-affinity map service

Let us take the third service, Electricity. This service is not requested very often, but when it is, the request is always complex: a trace task that taps into the geometric network and traverses downstream or upstream assets. Such requests require powerful machines and, unfortunately, cannot be cached because the data is dynamic and constantly changing; the service requires continuous communication with the database. The Commodity cluster obviously wouldn't work; it is not powerful enough to handle such requests. We could assign the service to the Remote cluster just like Buildings, but we would hit a bottleneck due to the limited network bandwidth and latency; communicating with the database over a VPN drives the latency very high, which makes the service perform atrociously. The Power cluster seems to be the one that can handle this. We tried to use low-cost solutions to outsource CPU cycles and resources, but this service requires a resourceful server running on a local, high-bandwidth network right next to the database server. Go ahead and assign the Electricity service to the Power cluster as shown in the following screenshot:

Mapping a high-affinity map service

Mapping a geoprocessing service

The last service is an extraction geoprocessing service, developed to query the database and extract the data within a user-defined boundary. You're probably thinking that this is a very bandwidth-heavy service; this is true, and there will be a massive transfer of raw data. You also require a powerful machine capable of crunching and processing this data. The Remote cluster therefore cannot handle such a service, however powerful it may be; the fact that it is hosted over a 42 Mbps connection excludes it as an option. You could run the service on the Commodity cluster, since those machines are on the same network as the database, which makes it a tempting candidate. However, when a geoprocessing job is received, the entire job is assigned to a single commodity PC, and that average-power PC will struggle with the job by itself. It might be able to pull it off eventually, but it will take a significant amount of time.

Mapping a geoprocessing service

However, if this geoprocessing request were divided into blocks and then executed on a group of machines in parallel, as in the Hadoop architecture, it would definitely make sense to assign it to the Commodity cluster.

Note

Hadoop

Hadoop is an open source framework for large-scale storage and distributed processing on commodity computing clusters.

Unfortunately, Server does not work this way, which leaves us with the Power cluster as the best candidate to map this geoprocessing service to.

Good job! You have managed to fix the performance problem on this ArcGIS for Server setup by properly balancing requests across the GIS Servers. The users are experiencing a great performance boost, thanks to your newly acquired skills.

Note

If you have deleted the default cluster and you face problems publishing your services, make sure that your system services in ArcGIS Server Manager are running and assigned to a working cluster. Remember that system services need to be close to the database and need to run on powerful machines.

Scaling clusters

You have finished your initial ArcGIS for Server setup and determined your GIS servers, GIS services, and clusters. Now your user base is growing, and the initial configuration no longer supports the increased volume of requests. This is when you need to decide whether to scale a cluster by adding more machines. The good news is that, once your clusters are created, they scale very easily. Adding a GIS server to a cluster is simple: all you have to do is install ArcGIS for Server on the new machine and then join the machine to the Server site, which we learned back in Chapter 1, Best Practices for Installing ArcGIS for Server. If you have a single cluster on your Server site, the new machine will be added directly to that default cluster. However, if your Server site has multiple clusters, you will be prompted to select one. It is important to mention that by scalability we mean adding physical machines, not virtual machines; this gives you more computing power from each machine, whereas adding virtual machines to a cluster will not give you any added value.

To add a machine to a cluster, install the GIS server component on your new machine and launch ArcGIS Server Manager; you will be prompted to either create a new Server site or join an existing one, as shown in the following screenshot. The URL for Manager is http://GIS-SERVER04:6080/arcgis/manager, where GIS-SERVER04 is the new machine you want to join.

Scaling clusters

Click on Join Existing Site to attach the machine to your site and type in your Server site URL; remember that the URL of any machine already joined to the Server site will point you to the same site. Specify the username and password for the primary administrator account and click on Next as shown in the following screenshot.

Note

If you have any problem connecting to the Server site, use the IP address instead and make sure that you can ping the server and that port 6080 is allowed to receive connections on the firewall.
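
A quick way to test that last condition from a script is a plain TCP connection attempt to port 6080; the following standard-library sketch uses a placeholder host name.

import socket

host, port = "GIS-SERVER01", 6080          # replace with the machine you are joining to

try:
    # If this succeeds, the port is reachable through any firewalls in between
    socket.create_connection((host, port), timeout=5).close()
    print("%s:%s is reachable" % (host, port))
except OSError as error:
    print("Cannot reach %s:%s -> %s" % (host, port, error))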

Scaling clusters

Since the Server site has more than one cluster, you will be prompted to select an existing cluster in order to join this machine. The machine will then be added to that cluster. Select the cluster and then click on Next as shown in the following screenshot.

Scaling clusters

Note

As we discussed earlier, any cluster you select communicates through a TCP port, and that port should be opened for the information to flow in and out of each machine on that cluster. If that port is closed, you will not be able to join the cluster.

That is all. You will now get a summary of your configuration as a confirmation as shown in the following screenshot; click on Finish to join the Server site.

Scaling clusters
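
The wizard steps can also be scripted. The sketch below calls the joinSite operation that the Administrator API exposes on the new machine and then adds that machine to a chosen cluster; all host names, credentials, and the cluster name are placeholders, and the operation names should be verified against the Admin API reference for your release before use.

import requests
from urllib.parse import quote

NEW_MACHINE = "http://GIS-SERVER04:6080/arcgis/admin"      # machine being added
SITE        = "http://GIS-SERVER01:6080/arcgis/admin"      # existing Server site
CREDS       = {"username": "siteadmin", "password": "secret"}   # placeholders

# Ask the new machine to join the existing site (this can take a few minutes)
join = requests.post(NEW_MACHINE + "/joinSite",
                     data=dict(adminURL=SITE, f="json", **CREDS)).json()
print("joinSite:", join)

# Then place the machine in the cluster of your choice
token = requests.post(SITE + "/generateToken", data=dict(
    client="requestip", expiration=60, f="json", **CREDS)).json()["token"]
add = requests.post(SITE + "/clusters/" + quote("Commodity Computers") + "/machines/add",
                    data={"machineNames": "GIS-SERVER04",   # use the name as registered with the site
                          "f": "json", "token": token}).json()
print("add to cluster:", add)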

Tip

Best practice

Make sure to allow all cluster ports on all your GIS server firewalls before joining them; this will save you a huge amount of work. If you have a domain-level firewall, it will be easier to configure.
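
To know exactly which ports to allow, you can read each cluster's TCP port from the Administrator API and prepare the matching Windows firewall rules in advance. The sketch below only prints the netsh commands rather than running them, and it relies on the same assumptions as the earlier sketches (placeholder URL and credentials, 10.x field names).

import requests

ADMIN = "http://GIS-SERVER01:6080/arcgis/admin"            # hypothetical site URL

token = requests.post(ADMIN + "/generateToken", data={
    "username": "siteadmin", "password": "secret",          # placeholder credentials
    "client": "requestip", "expiration": 60, "f": "json"
}).json()["token"]

clusters = requests.get(ADMIN + "/clusters",
                        params={"f": "json", "token": token}).json()
for cluster in clusters.get("clusters", []):
    port = cluster.get("clusterProtocol", {}).get("tcpClusterPort")
    # Print (do not run) the Windows firewall rule to add on every machine in this cluster
    print('netsh advfirewall firewall add rule name="ArcGIS cluster %s" '
          'dir=in action=allow protocol=TCP localport=%s'
          % (cluster["clusterName"], port))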

Limitations

Unfortunately, adding more machines to a cluster doesn't automatically mean better performance. Yes, it can improve overall response times under load; however, the time to execute a particular request against a service will remain the same. That is because the request will eventually be piped to a single machine, and that machine is responsible for fetching the required data from the database, be it a few records or thousands, and processing it alone. Distributed computing architectures such as Hadoop utilize the power of parallel processing across all the machines by breaking the data into parts and distributing them to cluster machines to be processed in parallel using the MapReduce concept. The power of Hadoop resides in the concept of data locality, where the database is accessed once and the result is fetched, divided into parts, and distributed to each machine for processing. The machines, in this case, do not need to query the database, which avoids networking latency; instead, they work on the data locally on disk, which gives a huge performance boost. I would like to see Esri implement Hadoop one day as its clustering platform; that would be a breakthrough in the GIS industry.

Note

MapReduce

MapReduce is a programming model for processing large amounts of data on a group of machines using the concept of data locality. The model consists of two functions: Map, which filters the input, removes unnecessary items, and prepares intermediate results, and Reduce, which summarizes the filtered results into the final output.
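
The idea is easy to see in miniature. The following Python sketch, which uses only the standard library, splits a list of toy parcel records into blocks, maps a counting function over the blocks in parallel, and reduces the partial counts into one result; it illustrates the programming model only and is not something ArcGIS for Server does for you.

from collections import Counter
from functools import reduce
from multiprocessing import Pool

def map_block(block):
    # Map: count parcels per zoning code within one block of records
    return Counter(record["zoning"] for record in block)

def merge_counts(left, right):
    # Reduce: fold two partial counts into one
    return left + right

if __name__ == "__main__":
    records = [{"zoning": code} for code in "RRCCIRRC" * 1000]   # toy stand-in for raw data
    blocks = [records[i::4] for i in range(4)]                   # split the data into four blocks
    with Pool(4) as pool:
        partial_counts = pool.map(map_block, blocks)             # map phase, in parallel
    totals = reduce(merge_counts, partial_counts)                # reduce phase
    print(totals)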

Another limitation is that having too many clusters can threaten security, since you will be poking a lot of holes into your firewall to enable communication between the machines. You could end up with a vulnerable, sponge-like system, full of holes and very attractive prey for hackers.

Summary

In this chapter, you have learned about a new technique that ArcGIS for Server offers; clustering can be useful if applied with caution and on the right GIS servers. You now know how to create clusters on the Server site and how these clusters partition your GIS servers. You have learned that clusters can be organized by machine power, memory, or even networking factors. You have learned how to assign a service to a cluster of machines so that its instances run only on those machines. You also know the limitations of clustering: what it can and can't do and when to use it.

In addition to the Server site optimization skills you acquired in the previous chapter, you now know how to take advantage of individual machines and organize them so that they perform more effectively. There are, of course, other factors that affect Server, but they are outside the scope of this book.

In the next chapter, we move from optimizing to securing ArcGIS for Server, where you will learn how to protect your GIS services from unauthorized access.