Chapter 6. Choosing the Right Hardware

Users who deploy OpenStack Swift as a private cloud face the task of hardware selection. This chapter walks you through all the hardware you need to select, the criteria to use, and finally a vendor-selection strategy. If you are using a public cloud, the only hardware you can select is the cloud gateway, so you may skip this chapter.

The hardware list

The list of minimal hardware required to implement Swift is as follows:

· Storage servers: These are physical servers that run the object server software and generally also run the account and container server software. Storage servers require disks to store objects.

· Proxy server(s): These are physical servers that run the proxy server software. At least one is required.

· Network switch(es): Chapter 3, Installing OpenStack Swift, describes the various networks required. At a minimum, one switch is required.

The following is a list of optional hardware that may need to be purchased:

· Account servers: For large installations where container listings and updates overwhelm the storage servers, separate account servers may be needed.

· Container servers: For large installations where object listings and updates overwhelm the storage servers, separate container servers may be needed.

· Auth servers: For large installations where user authentication overwhelms the proxy servers, separate auth servers may be needed.

· JBODs: For installations where disk density is important, a storage server may be connected to a JBOD (just a bunch of disks) enclosure over a SAS connection to increase the disk density.

· Load balancer/SSL acceleration: A load balancer is useful to provide a single IP address for the entire cluster (there are software mechanisms to accomplish this as well, but they are not covered in this book). The SSL functionality in the load balancer offloads software SSL from the proxy servers.

· Firewall and security appliances: For public, community, and some private networks, firewall and security appliances such as intrusion detection/prevention systems may be required, depending on your company's security policies.

· On-premise cloud gateway: To adapt applications that have not yet been ported to REST HTTP APIs, you will need a protocol translation device that converts familiar file and block protocols to REST APIs. This device is called a cloud gateway, and it is the only piece of hardware that you may need even with a public cloud.

To complicate things even further, each server has the following design elements to configure:

· CPU performance: The CPU performance is specified in terms of the number of processors and the number of cores per processor. This has the most direct impact on the server's performance.

· Memory: The next important consideration is the amount of DRAM memory, which is specified in GB.

· Flash memory: Flash memory is another critical performance consideration and is typically in the TB range.

· Disk/JBOD: For storage servers, you need to specify the number of disks and types of disks (interface, speed, rating, and so on). These disks could be in the server, connected via a JBOD, or a combination.

· Network I/O: A server needs network I/O connectivity via a LAN-on-motherboard (LOM) or an add-on network interface card (NIC). This is typically 1 Gbps or 10 Gbps in terms of speed.

· Hardware management: Servers vary widely in hardware-management features, ranging from rudimentary monitoring through the operating system only, through OS-independent IPMI, to sophisticated remote KVM and remote storage.

The hardware selection criteria

Clearly, the universe of hardware to choose from and the elements within each server are mind-boggling. Furthermore, the ratio of proxy to account to container to storage servers is yet another complication. Before we go through the systematic selection criteria, you need to determine the following characteristics of your environment:

· Point of optimization for your environment: You will need to decide whether you care more about performance or cost.

· Scale: Scale also has a huge impact on hardware selection. For simplicity, let's say small is in the hundreds of TB range, medium is in the PB range, and large is in the tens of PB range and beyond. You will need to determine what range you are in.

The process for choosing hardware is as follows.

Step 1 – choosing the storage server configuration

For small and medium installations, the storage server can include the object, account, and container server software. For large installations, we would recommend separate account and/or container servers. For performance-optimized clusters, the aggregate disk performance must match the total performance of the other server components (CPU, memory, flash, and I/O). For cost-optimized clusters, the disk performance can exceed the performance of the other components (in other words, you save money by letting the disks throttle performance). In fact, consider attaching JBODs to get really high disk density.
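One rough way to check this balance is to compare the aggregate sequential throughput of the disks against the server's network bandwidth. The following is a minimal sketch; the per-drive throughput (roughly 100 MB/s sequential for a 7.2K RPM drive) and the server shape are illustrative assumptions, not measurements:

```python
# Rough balance check for a storage server: compare the aggregate
# sequential throughput of the disks against the server's network
# bandwidth. All figures are illustrative assumptions.

def disks_gbps(num_disks, mb_per_sec_per_disk=100.0):
    """Aggregate sequential disk throughput converted from MB/s to Gbps."""
    return num_disks * mb_per_sec_per_disk * 8 / 1000.0

nic_gbps = 10.0       # one 10 Gbps NIC
num_disks = 60        # a dense server, or a server plus JBOD

print("Disks: %.0f Gbps, network: %.0f Gbps" % (disks_gbps(num_disks), nic_gbps))
if disks_gbps(num_disks) > nic_gbps:
    print("Disk bandwidth exceeds network: acceptable for cost-optimized")
else:
    print("Network can keep all disks busy: suits performance-optimized")
```

With these assumed numbers, 60 drives deliver about 48 Gbps, far more than a single 10 Gbps NIC, which is exactly the cost-optimized, dense configuration described above.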

A higher disk density also results in a slight reliability degradation, since a node failure takes longer to self-heal, and two additional failures (if you have three copies) have a slightly higher probability of occurring during this longer window. Of course, the probability of two failures occurring in one self-heal window is very low in either case. The following figure denotes a storage server with disks (an optional JBOD may also be connected to it):

A storage server

The OpenStack configuration guide (http://docs.openstack.org/havana/install-guide/install/apt/content/object-storage-system-requirements.html) recommends the following server specifications:

· Processor: Dual quad-core.

· Memory: 8 to 12 GB.

· Network I/O: 1 x 1 Gbps NIC. Cost permitting, our recommendation would be to go beyond the official recommendation and use 10 Gbps.

RAID should not be turned on due to performance degradation (there is an exception: if you want to ensure consistency even with a full power loss, you may need to consider RAID).

Finally, a key consideration is the type of disk: enterprise or desktop. Within enterprise disks, there are 15K, 10K, and 7.2K rotations per minute (RPM) drives and a variety of capacity configurations. For small and medium installations, you might want to consider enterprise drives as they are more reliable than desktop drives. Most small and medium installations are typically not set up to deal with the higher failure rate of desktop drives. The performance and capacity that you choose for an enterprise drive obviously depend on your specific requirements.

For large installations that are also very cost-sensitive, you may want to consider desktop drives. The density of desktop drives (up to 6 TB at the time of writing) also contributes favorably to large installations. In addition to being less reliable, desktop drives are not specified to run 24 x 7. This means that your IT staff has to be sophisticated enough to deal with a large number of failures and/or spin down drives to conform to the specification.

Step 2 – determining the region and zone configuration

Next, we need to decide on regions and zones. The number of regions stems from the desire to protect data from a disaster or to be closer to the sources that consume data. Once you have decided on the number of regions, pick the number of zones for each region. You need to have at least as many zones as replicas. We would recommend no less than three zones and Rackspace recommends five (http://docs.openstack.org/havana/install-guide/install/apt/content/example-object-storage-installation-architecture.html). Small clusters may be fine with four. Please refer to Chapter 2, OpenStack Swift Architecture, for a refresher on the definition of regions and zones.
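To see why you need at least as many zones as replicas, consider this toy placement sketch. It uses simple round-robin placement, which is not Swift's actual ring-balancing algorithm; it only illustrates the failure-domain argument:

```python
# Toy replica placement (round-robin across zones). This is NOT Swift's
# actual ring-balancing algorithm; it only illustrates why you need at
# least as many zones (failure domains) as replicas.

replicas = 3

for num_zones in (2, 3):
    zones = ["z%d" % (i + 1) for i in range(num_zones)]
    print("%d zones:" % num_zones)
    for partition in range(3):
        placement = [zones[(partition + r) % num_zones] for r in range(replicas)]
        note = "" if len(set(placement)) == replicas else "  <- a zone holds 2 copies"
        print("  partition %d -> %s%s" % (partition, placement, note))
```

With two zones and three replicas, every partition necessarily has two copies in the same zone, so a single zone failure can take out two of the three replicas at once.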

Step 3 – choosing the account and container server configuration

As previously mentioned, unless you are installing a large configuration, you don't need to worry about separate account and container servers. For separate account and/or container servers, you need to ensure that the SQLite performance is adequate to meet your database listing and update needs by selecting the right amount of memory and flash. The OpenStack configuration guide recommends the following specifications (you may be able to reduce the requirements based on your cluster's size and performance requirements):

An optional account and a container server

· Processor: Dual quad-core.

· Memory: 8 to 12 GB.

· Network I/O: 1 x 1 Gbps NIC. Cost permitting, our recommendation would be to go beyond the official recommendation and use 10 Gbps.

· Flash: Not specified. This depends on your performance requirements.

Step 4 – choosing the proxy server configuration

In general, the proxy server needs to keep up with the number of API requests. As discussed in Chapter 2, OpenStack Swift Architecture, additional middleware modules may also be running on the proxy server, so it needs a performance level that can keep up with this workload. Using a few powerful proxy servers, as opposed to a large number of "wimpy" servers, was shown to be more cost-effective by Zmanda (http://www.zmanda.com/blogs/?p=774). The OpenStack configuration guide seems to concur and recommends the following specifications:

A proxy server

· Processor: Dual quad-core.

· Network I/O: 1 x 1 Gbps NIC. Our recommendation would be to have at least two NICs: one for internal storage cluster connectivity and one for client (API) facing traffic. Cost permitting, our recommendation would be to go beyond the official recommendation and use 10 Gbps, at least for internal storage cluster connectivity. Also see the related SSL discussion in the Step 7 – choosing additional networking equipment section, which affects network I/O.

If your proxy server is running a lot of middleware modules, consider moving some of them to dedicated servers. The most common middleware to be separated is the auth software.

Step 5 – choosing the network hardware

There are three networks, as mentioned earlier: client (API) facing, internal storage cluster, and replication. See Chapter 3, Installing OpenStack Swift, for an architecture view of the three networks. These might be a combination of 1 Gbps, 10 Gbps, or hybrid 1/10 Gbps Ethernet switches. The following are some performance-related sizing techniques (a worked sketch of this arithmetic follows the list):

· Client-facing network: The throughput requirement of the overall cluster dictates the network I/O for this network. For example, if your cluster has 10 proxy servers and is sized to satisfy 10,000 requests per second of 1 MB each, that is roughly 80 Gbps in aggregate, so each proxy server needs 10 Gbps network I/O capability.

· Internal storage cluster network: The requirements depend on the overall cluster throughput and the size of the cluster. The size of the cluster matters because it drives a large amount of traffic from the postprocessing software components (see Chapter 2, OpenStack Swift Architecture). As mentioned, cost permitting, we recommend the use of a 10 Gbps network.

· Replication network: This depends on the overall write throughput and the size of the cluster. For example, if you expect 1,000 write requests per second of 1 KB each, which is only about 8 Mbps, a 10 Mbps network might just work.
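The arithmetic behind these estimates is simple enough to capture in a few lines. This is a back-of-the-envelope sketch using the example workloads above; substitute your own request rates, object sizes, and server counts:

```python
# Back-of-the-envelope network sizing using the example workloads above.
# All inputs are illustrative; substitute your own workload figures.

def gbps(requests_per_sec, object_size_kb):
    """Aggregate bandwidth in Gbps for a request rate and object size."""
    return requests_per_sec * object_size_kb * 8 / 1e6

# Client-facing: 10,000 requests/s of 1 MB each, spread over 10 proxies
client = gbps(10000, 1024)
print("Client-facing: %.1f Gbps total, %.1f Gbps per proxy" % (client, client / 10))

# Replication: 1,000 writes/s of 1 KB each
print("Replication: %.1f Mbps" % (gbps(1000, 1) * 1000))
```

This prints roughly 82 Gbps in total for the client-facing network (about 8 Gbps per proxy, hence the 10 Gbps NIC) and 8 Mbps for the replication example.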

An additional consideration is the durability model. Since a failed network switch takes down an entire zone or region, unless you can service failed switches rapidly, you might want to consider dual-redundant configurations. The following figure denotes a network switch:

A network switch

Step 6 – choosing the ratios of various server types

After selecting the individual server configurations, the ratios of the different server types have to be chosen. Since most configurations will have only two types, that is, proxy and storage, we will only discuss the ratio of these two. According to the work done by Zmanda, the proxy server should neither be underutilized nor overutilized. For example, if the throughput of one storage server is 1 Gbps and that of a proxy server is 10 Gbps, then the ratio is 10 storage servers per proxy server. (This simple calculation applies to large objects, where throughput dominates; for smaller objects, the calculation needs to focus on the number of requests.)

Instead of buying hardware piecemeal, this ratio exercise allows you to define a "unit" of purchase. The unit may be a full rack of hardware, multiple racks, or a few rack units. A unit of hardware is orthogonal to Swift zones, and typically you would want each unit to add capacity to every zone in a symmetric fashion. Each unit can have a set of proxy servers, storage servers, network switches, and so on, defined in detail. Scaling the Swift cluster as data grows becomes a lot simpler with this purchasing technique. As mentioned earlier, you need to start with at least two proxies to provide adequate durability.

For example, assume you want to grow your cluster in roughly 1 PB raw-storage increments with dense configurations. You might consider a unit of hardware with one proxy server, 2 x 10 Gbps switches, one management switch, and five storage servers, each with 60 drives of 4 TB (that is, 240 TB x 5 = 1.2 PB). Given the previous comment regarding the need for at least two proxy servers, the initial installation would have to be 2.4 PB. With triple replication, the 1.2 PB of raw storage translates to 400 TB of usable storage. This example is not perfect because it may not fit cleanly within rack boundaries, but it illustrates the point.
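Here is a minimal sketch of the unit-of-purchase arithmetic, using the example figures above together with the 1 Gbps/10 Gbps throughput ratio from the start of this step; none of these numbers are recommendations:

```python
# Unit-of-purchase arithmetic for the example above; every figure here
# is illustrative, not a recommendation.

drives_per_server = 60
drive_tb = 4
storage_servers_per_unit = 5
replicas = 3

raw_tb = drives_per_server * drive_tb * storage_servers_per_unit
print("Raw per unit: %d TB (%.1f PB)" % (raw_tb, raw_tb / 1000.0))   # 1.2 PB
print("Usable per unit: %d TB" % (raw_tb // replicas))               # 400 TB

# Ratio sanity check: a 10 Gbps proxy can feed ten 1 Gbps storage
# servers, so one proxy per five-server unit leaves ample headroom.
proxy_gbps, storage_gbps = 10.0, 1.0
print("Storage servers one proxy can feed: %.0f" % (proxy_gbps / storage_gbps))
```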

Step 7 – choosing additional networking equipment

The final step is to choose the load balancer, SSL acceleration hardware, and security appliances. A load balancer is required if there is more than one proxy node, and you need to ensure that it is not a performance bottleneck. SSL hardware acceleration is required if most of the traffic is over secure HTTP (HTTPS) and the software SSL operation is overwhelming the proxy servers. Finally, security appliances such as IPS and IDS are required if the cloud is on the public Internet. Like the load balancer, these additional pieces of hardware must have enough performance to keep up with the aggregate performance of the proxy servers. The following figure denotes additional networking equipment needed for your Swift cluster:

Additional networking equipment

Step 8 – choosing a cloud gateway

This piece of equipment is the odd one out. It is not required to build an OpenStack Swift cluster. Instead, it is needed on premises (in the case of a public cloud) or near the application (in the case of a private cloud) if your application has not yet been ported to REST HTTP APIs. In this situation, the application expects traditional block or file storage, which is the interface exposed by these cloud gateways. The gateway performs protocol translation and interfaces with the cloud on the other side. In addition to protocol translation, cloud gateways often add numerous other features such as WAN optimization, compression, deduplication, and encryption.

While most of this section has dealt with performance, there are other considerations as well, and these are covered in the next section.

Additional selection criteria

In addition to the previous criteria, the following items also need to be considered before finalizing hardware selection:

· Durability: Durability is a measure of reliability and is defined as 100 percent minus the probability of losing a 1 KB object in one year. Therefore, 99.999999999 percent durability (stated simply as eleven nines in this case) would imply that every year you statistically lose one object if you have 100 billion 1 KB objects or, given 10,000 objects, expect to lose one object every 10,000,000 years (a short sketch of this arithmetic follows the list). Calculating the durability of a Swift cluster is outside the scope of this book, but the selected hardware needs to meet your durability requirements. For users who require a high level of durability, low-density enterprise-class disk drives, servers with dual fans and power supplies, and so on are some considerations.

· Availability: Availability is defined as the percentage of time that requests are successfully processed by the cluster. Availability mostly impacts frontend network architecture in terms of having a single network (that is, a single point of failure) versus dual-redundant networks. As mentioned earlier, networks in a given zone can be single points of failure as long as your IT staff have the ability to troubleshoot them quickly.

· Serviceability: The serviceability of the various pieces of hardware depends heavily on your strategy. If you choose fail-in-place (typical for large installations), serviceability is not a big concern. If you choose a repair/servicing strategy (typical for small and medium installations), serviceability is a concern, and each device should lend itself to repair or servicing. A smaller-scale installation may also force the choice of more expensive hardware in terms of dual-redundant fans, power supplies, and so on. The reason is that, after a failure, there simply will not be many spare devices available for the Swift ring to choose from.

· Manageability: As previously discussed, servers vary widely in their hardware-management features and the associated software. You should choose servers with management features that match your overall IT strategy.
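As promised in the durability bullet, here is a minimal sketch of how a durability figure translates into an expected annual object loss; the durability value and object counts are the examples used above:

```python
# How a durability figure becomes an expected annual object loss.
# The durability value and object counts are the chapter's examples.

durability = 0.99999999999           # eleven nines
p_loss = 1 - durability              # per-object annual loss probability

big, small = 100 * 10**9, 10**4      # 100 billion objects; 10,000 objects

print("Expected losses per year with %d objects: %.2f" % (big, big * p_loss))
print("With %d objects: roughly one loss every %.0f years"
      % (small, 1 / (small * p_loss)))
```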

The vendor selection strategy

If you really want to be like a web giant, you should buy hardware from ODMs and other commodity hardware manufacturers (either directly or through a systems integrator). However, in reality, the decision is not that simple. Ask yourself the following questions:

· Can you specify the configuration of each server, taking into account performance, durability, availability, serviceability, and manageability (versus needing vendor sales engineers to help)?

· Can you self-support (that is, if you get a 2 a.m. call, are you prepared to root-cause what happened versus calling the vendor)?

· Are you prepared to accept less sophisticated warranties, lead times, end-of-life policies, and other terms?

· Can you live with minimal vendor-provided hardware-management capabilities and software?

If you answered yes to all of these questions, you are ready for commodity hardware! If you answered no to even one, you should stick with branded hardware.

Branded hardware

If you choose branded hardware, the process is fairly simple and involves issuing RFQs to your favorite server manufacturers, such as HP, Dell, IBM, and FTS, or to networking manufacturers, such as Cisco, Juniper, and Arista, and choosing the one you like.

Commodity hardware

If you go down this route, there are numerous manufacturers to consider: Taiwanese ODMs and other storage hardware specialists such as Xyratex and Sanmina. Perhaps the most interesting option to look at is an open source hardware movement called the Open Compute Project (OCP).

According to its website, http://www.opencompute.org, OCP's mission is to design and enable the delivery of the most efficient server, storage, and datacenter hardware designs for scalable computing. All of OCP's work is open source. A number of manufacturers sell OCP-compliant hardware, and this compliance makes it somewhat simpler for users to choose consistent hardware across manufacturers.

The OCP Intel Motherboard Hardware v2.0, for example, supports two CPUs, four channels of DDR3 memory per CPU, a mini-SAS port for potential JBOD expansion, 1 Gbps network I/O, and a number of hardware-management features. It can also accept a PCIe mezzanine NIC card for a 10 Gbps network I/O. This server would be suitable for both the proxy and storage server (with different items populated).

The OCP OpenVault JBOD, as another example, is a 2U chassis that can hold up to 30 drives. This would make it a suitable companion for dense storage servers.

Summary

In this chapter, we have looked at the complex process of selecting hardware for an OpenStack Swift installation and the various trade-offs that can be made. In the next chapter, we will look at how to benchmark and tune our Swift cluster.