Appendix B. Cloud Computing, Amazon Web Services, and Their Impacts

Though cloud computing was originally conceived in the 1960s by pioneering thinkers like J.C.R. Licklider, who believed computing resources would become a public utility like electricity, it is only recently, with the launch of AWS in 2006 and Windows Azure in 2008, that businesses have begun seriously moving many of their core services outside of private data centers. There have been many discussions and descriptions of what cloud computing is and its value to businesses. In general, however, we characterize it as a set of computing resources (CPU, memory, disk, and the like) available to an end user, along with the interactions that user has with those resources.

AWS Service Delivery Models

There are a number of delivery models for cloud services, differing in how the end user accesses the resources in the cloud. We will focus on the delivery models specific to AWS and the resources used in this book for Elastic MapReduce.

Platform as a Service

Platform as a Service (PaaS) allows the deployment of custom-built applications within the cloud provider’s infrastructure. Elastic MapReduce is an example of an Amazon cloud service that is delivered as a PaaS. As a user, you can deploy a number of preconfigured Amazon EC2 instances with the EMR software preinstalled. You can specify the compute capacity and memory for these instances, and you can make configuration changes to the EMR software. Amazon takes care of much of the customization needed for the EMR software to work in its data centers and with other Amazon services. As a user of EMR, you can tune the configuration to your application’s needs and install much of your application through Amazon’s APIs and tools.
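
For example, the following is a minimal sketch of launching a preconfigured EMR cluster (a Job Flow) through Amazon’s APIs, here using the boto3 Python SDK rather than the tools used elsewhere in this book. The cluster name, release label, instance types, and log bucket are placeholder assumptions, and the default EMR IAM roles are assumed to exist in your account.

    import boto3

    # Connect to the EMR service (region chosen for illustration).
    emr = boto3.client('emr', region_name='us-east-1')

    # Launch a small, preconfigured Hadoop cluster; Amazon installs and
    # configures the EMR software on each EC2 instance it starts.
    response = emr.run_job_flow(
        Name='example-emr-cluster',                 # placeholder name
        ReleaseLabel='emr-5.36.0',                  # assumed EMR software release
        LogUri='s3://my-example-bucket/emr-logs/',  # placeholder log bucket
        Applications=[{'Name': 'Hadoop'}],
        Instances={
            'MasterInstanceType': 'm5.xlarge',      # assumed instance types
            'SlaveInstanceType': 'm5.xlarge',
            'InstanceCount': 3,                     # 1 master + 2 core nodes
            'KeepJobFlowAliveWhenNoSteps': True,
        },
        JobFlowRole='EMR_EC2_DefaultRole',          # default EMR roles assumed
        ServiceRole='EMR_DefaultRole',
    )
    print(response['JobFlowId'])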

Infrastructure as a Service

Infrastructure as a Service (IaaS) is probably the simplest cloud delivery method, and the one that feels most familiar to professionals who have developed solutions to run in private data centers. As a consumer of IaaS services, you have access to computing resources such as CPU, memory, disk, and network. Amazon’s EC2 is an example of a cloud service delivered in the IaaS model. You can specify the size of an EC2 instance and the operating system it runs, but it is up to you as a consumer of an EC2 instance to install OS patches, configure OS settings, and install third-party applications and software components.
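
As a brief illustration, here is a sketch of requesting a single EC2 instance with the boto3 Python SDK; the AMI ID, instance type, and key pair name are placeholders. Once the instance is running, patching and configuring the operating system is up to you.

    import boto3

    ec2 = boto3.client('ec2', region_name='us-east-1')  # region chosen for illustration

    # Request one instance: you choose the machine image (operating system)
    # and the instance size; Amazon supplies the underlying hardware.
    response = ec2.run_instances(
        ImageId='ami-12345678',    # placeholder AMI ID for your chosen OS
        InstanceType='m1.large',   # instance size determines CPU and memory
        MinCount=1,
        MaxCount=1,
        KeyName='my-keypair',      # placeholder SSH key pair name
    )
    print(response['Instances'][0]['InstanceId'])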

Storage as a Service

Storage as a Service (SaaS) allows you to store files or data in the provider’s data center. Amazon S3 and Amazon Glacier are the storage services we use throughout this book. Amazon charges on a per-gigabyte basis for these services and has replication and durability options.
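
Storing and retrieving a file in S3, for instance, is a simple API call. A minimal sketch with the boto3 Python SDK follows; the bucket and key names are placeholders.

    import boto3

    s3 = boto3.client('s3')

    # Upload a local file; Amazon bills for the gigabytes stored.
    s3.upload_file('logs.tsv', 'my-example-bucket', 'input/logs.tsv')

    # Download it again later.
    s3.download_file('my-example-bucket', 'input/logs.tsv', 'logs-copy.tsv')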

We have discussed some of the benefits of AWS throughout the book, but we would be remiss if we did not cover many of the key issues businesses must consider when moving critical business data and infrastructure into the cloud.

Performance

Performance in cloud computing can vary widely between cloud providers. This variability can be due to the time of day, the applications running, and how many customers have signed up for service from the cloud provider. It is a result of how the provider’s physical hardware, its CPU and memory, is shared among all of its customers.

Most cloud providers operate in a multitenancy model where a single physical server may run many instances of virtual computers. Each virtual instance uses some amount of memory and CPU from the physical server on which it resides. The sharing and allocation of the physical resources of a server to each virtual instance is the job of a piece of software installed by the cloud provider called the hypervisor. Amazon uses a highly customized version of the Xen hypervisor for AWS. As a user of EC2 and other AWS services, you may have your EC2 instance running on the same physical hardware as many other Amazon EC2 customers.

Let’s look at a number of scenarios at a cloud provider to understand why variability in performance can occur. Let’s assume we have three physical servers, each with four virtual instances running. Figure B-1 shows a number of virtual instances running in a cloud provider.

Figure B-1. Physical servers in the cloud with no hypervisor vacancies

Multiple customers run on the same physical server and are kept separated virtually by the hypervisor. In Figure B-1, Physical Computer A has four virtual instances running, with Customers B, C, and D running at 100% utilization. Physical Computer B has a different load profile: only one instance, belonging to Customer A, is running at 100% utilization. Physical Computer C has no instances with high resource utilization; all of its instances are running at 25% utilization or less. Even though Customer A has virtual instances running at low utilization on both server A and server C, in this scenario the software running on server A may run noticeably slower than the software on server C due to the high load placed on server A by the other virtual instances on the same physical hardware. This issue is commonly referred to as the “noisy neighbor” problem.

Cloud providers rarely run at 100% utilization, and because of the elasticity built into cloud infrastructure, vacancies on an individual server occur from time to time. Figure B-2 shows the same physical servers at a later time.

Figure B-2. Physical servers in the cloud with three hypervisor vacancies

Now a number of vacancies have appeared because some customers have turned off excess capacity. The software on server A may now perform significantly better, with performance similar to server C, because server A now has a 50% vacancy in its hypervisor.

This variability is often an initial shock to businesses that first move to the cloud, accustomed as they are to dedicated physical servers for their applications. AWS provides a number of ways to tailor cloud services to meet performance needs.

Auto scaling

Amazon allows you to quickly scale up and down additional instances of many of its AWS services. This allows you to meet variable traffic and compute needs quickly and to pay only for what you use. In a traditional data center, businesses have to estimate potential demand, and they typically find themselves purchasing too much or too little capacity.

Multiple EC2 configuration options

Amazon has a wide variety of configurations for EC2 instances. They range from micro instances all the way up to double extra-large instances. Each of the instance types has a defined allocation of memory and CPU capacity. Amazon lists compute capacity in terms of EC2 Compute Units. An EC2 Compute Unit is a rough measure of the CPU performance of an early-2006 1.7 GHz Xeon processor and allows businesses to translate current physical hardware requirements into cloud capacity. Elastic MapReduce uses these EC2 instance types to execute MapReduce jobs. You can find more information on Amazon EC2 instance types on the AWS website under Amazon EC2 Instance Types.

EC2 dedicated instances

Businesses may have very specialized needs for which they would like greater control over the variable aspects of cloud computing. Amazon offers EC2 dedicated instances as an option for customers with these needs. EC2 dedicated instances are Amazon EC2 instances that run on hardware dedicated to a single customer. This is similar to the traditional data center hosting model, where customers have their own dedicated hardware that runs only their software. A key difference, though, is that customers still pay only for the services they use and can scale these dedicated resources up and down as needed. However, there is an extra per-hour cost for this service that can greatly increase the cost of cloud services. You can find more information on dedicated EC2 instances on the AWS website under Amazon EC2 Dedicated Instances.

Provisioned IOPS

Some applications require a high amount of disk read and write capacity. This is typically measured in input/output operations per second (IOPS). Database systems and other specialized applications are typically bound more by IOPS performance than by CPU and memory. Amazon has recently added the ability to provision a specific IOPS capacity for the EBS storage volumes attached to EC2 instances.
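
Provisioned IOPS is requested when you create the EBS volume that will be attached to an instance. The sketch below uses the boto3 Python SDK, with placeholder values for the availability zone, volume size, IOPS rate, and instance ID.

    import boto3

    ec2 = boto3.client('ec2', region_name='us-east-1')  # region chosen for illustration

    # Create a 100 GiB volume with a guaranteed rate of 1,000 IOPS.
    volume = ec2.create_volume(
        AvailabilityZone='us-east-1a',  # must match the instance's zone
        Size=100,
        VolumeType='io1',               # Provisioned IOPS SSD volume type
        Iops=1000,
    )

    # Attach it to an existing instance (placeholder instance ID).
    ec2.attach_volume(
        VolumeId=volume['VolumeId'],
        InstanceId='i-0123456789abcdef0',
        Device='/dev/sdf',
    )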

We explored the performance of Elastic MapReduce throughout this book and helped you understand how to size your AWS capacity. Chapter 6, in particular, looked at the costs and trade-offs of different AWS options for our Elastic MapReduce application.

Elasticity and Growth

IT elasticity and the ability to quickly scale up and scale down are major reasons why many enterprises begin to look at moving resources to the cloud. In the traditional IT model, operations and engineering management need to evaluate what they believe will be expected demand and scale up IT infrastructure many months before the launch of a project or a major initiative. Throughout the lifetime of an application there is an ongoing cycle of estimating future IT resource demand and comparing those estimates against actual application demand growth. This typically creates periods of excess capacity and undercapacity throughout the lifetime of an application, due to the lag between estimating demand and bringing new capacity online in the data center.

AWS and cloud services reduce the time between increased demand for services and capacity being available to meet that demand. Amazon Elastic MapReduce allows you to scale capacity in the following ways.

Fixed Capacity

You can specify the size and number of the EC2 instances used in your EMR Job Flows by setting the instance type and instance count for each of the EMR Job Flow components. Figure B-3 shows an example of the New Cluster, or Job Flow, configuration screen where the number of EC2 instances is specified.

Figure B-3. Configuring compute capacity for an Amazon EMR Job Flow

The size and number of instances will affect the amount of data you can process over time. This is the capacity the Job Flow will use throughout its lifetime, but it can be adjusted with Amazon’s command-line tools or the EMR Console to increase the instance counts while the job is running, as in the sketch below. You will be charged reserved or on-demand hourly rates unless you choose to request spot instances.
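
For example, the core or task instance group of a running Job Flow can be resized through the API. The following is a minimal sketch using the boto3 Python SDK; the cluster ID is a placeholder, and the code assumes the Job Flow has a task group to grow.

    import boto3

    emr = boto3.client('emr', region_name='us-east-1')  # region chosen for illustration

    cluster_id = 'j-EXAMPLE12345'  # placeholder Job Flow (cluster) ID

    # Find the task instance group of the running Job Flow.
    groups = emr.list_instance_groups(ClusterId=cluster_id)['InstanceGroups']
    task_group = next(g for g in groups if g['InstanceGroupType'] == 'TASK')

    # Increase its instance count; EMR adds the new EC2 instances while
    # the job continues to run.
    emr.modify_instance_groups(
        InstanceGroups=[{
            'InstanceGroupId': task_group['Id'],
            'InstanceCount': 8,
        }]
    )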

Variable Capacity

Amazon offers spot instance capacity for a number of the AWS services. Spot instances allow customers to bid for spare compute capacity by naming the price they are willing to pay for additional capacity. When the bid price exceeds the current spot price, the additional EC2 instances are launched. Figure B-4 shows an example of bidding for spot capacity for an EMR Job Flow.

We explored spot capacity in greater detail in Chapter 6, where we reviewed the cost analysis of EMR configurations.

Figure B-4. Bidding for spot capacity for an Amazon EMR Job Flow
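
In the API, a bid is expressed by marking an instance group as a spot group and supplying a bid price. The sketch below uses the boto3 Python SDK; the instance types, counts, bid price, and release label are placeholder assumptions, and the default EMR IAM roles are assumed to exist.

    import boto3

    emr = boto3.client('emr', region_name='us-east-1')  # region chosen for illustration

    response = emr.run_job_flow(
        Name='example-spot-cluster',           # placeholder name
        ReleaseLabel='emr-5.36.0',             # assumed EMR software release
        Instances={
            'InstanceGroups': [
                # Keep the master and core nodes on on-demand capacity.
                {'Name': 'master', 'InstanceRole': 'MASTER', 'Market': 'ON_DEMAND',
                 'InstanceType': 'm5.xlarge', 'InstanceCount': 1},
                {'Name': 'core', 'InstanceRole': 'CORE', 'Market': 'ON_DEMAND',
                 'InstanceType': 'm5.xlarge', 'InstanceCount': 2},
                # Bid for additional task capacity on the spot market.
                {'Name': 'task-spot', 'InstanceRole': 'TASK', 'Market': 'SPOT',
                 'InstanceType': 'm5.xlarge', 'InstanceCount': 4,
                 'BidPrice': '0.10'},          # bid in US dollars per instance-hour
            ],
            'KeepJobFlowAliveWhenNoSteps': True,
        },
        JobFlowRole='EMR_EC2_DefaultRole',     # default EMR roles assumed
        ServiceRole='EMR_DefaultRole',
    )
    print(response['JobFlowId'])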

Security

Concern about security is one of the biggest inhibitors to using cloud services in most organizations. According to a 2009 Forrester survey of North American and European businesses, 50% said their chief reason for avoiding cloud computing was security concerns. Within five years, however, Forrester expects cloud security to be one of the primary drivers for adopting cloud computing.

So why has there been such a change in the view of security in the cloud? A lot of this has come from the cloud providers themselves realizing that a key to increasing cloud adoption is a focus on security. In recent years, providers such as IBM and Amazon have published a robust set of details on how they protect their cloud services, along with the results of independent evaluations of their security and their responses to independent organizations like the Cloud Security Alliance.

Security Is a Shared Responsibility

Amazon has an impressive set of compliance and security credentials on its AWS Security and Compliance Center. Delving deeper into the AWS security whitepapers, clients will note that Amazon has clearly stated that security is a shared responsibility in AWS. Amazon certifies the infrastructure, physical security, and host operating system. This takes a significant portion of the burden of maintaining compliance and security off of AWS customers. However, AWS customers are still responsible for patching the software they install into the infrastructure, guest operating system updates, and firewall and access policies in AWS. AWS customers will need to evaluate their in-house policies and how they translate to cloud services.

Data Security in Elastic MapReduce

Amazon EMR makes heavy use of S3 for data input and output with Job Flows. All data transfers to and from S3 are performed via SSL. The data read and written by EMR is also subject to the permissions set on the data in the form of access control lists (ACLs). An EMR job only has access to the data written by the same user. You can control access by editing the S3 bucket’s permissions so that only the applications that need the data can use it.
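
As an illustration, a bucket policy can restrict reads to the IAM role that the Job Flow’s EC2 instances run under. The following is a sketch using the boto3 Python SDK; the bucket name, account ID, and role ARN are placeholders that should be replaced with your own.

    import json
    import boto3

    s3 = boto3.client('s3')

    # Allow only a specific IAM role (here, the default EMR EC2 instance
    # role) to read objects from the bucket.
    policy = {
        'Version': '2012-10-17',
        'Statement': [{
            'Sid': 'AllowEmrRead',
            'Effect': 'Allow',
            'Principal': {'AWS': 'arn:aws:iam::123456789012:role/EMR_EC2_DefaultRole'},
            'Action': ['s3:GetObject'],
            'Resource': 'arn:aws:s3:::my-example-bucket/*',
        }]
    }

    s3.put_bucket_policy(Bucket='my-example-bucket', Policy=json.dumps(policy))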

TIP

Amazon has a number of excellent whitepapers at its Security and Compliance Center. Review the security overview with your internal security team before you move critical components and data to AWS services. Every project should also review the list of security best practices prior to launch to verify that it complies with Amazon’s recommendations. If your organization works with medical and patient data, make sure to also check out the AWS HIPAA and HITECH compliance whitepapers.

Uptime and Availability

As applications and services are moved to the cloud, businesses need to evaluate and determine the risk of an outage of their cloud services. This is a concern even with private data centers, but many organizations fear a lack of control when they no longer have physical access to their data center resources. For some, this fear has been validated by a number of high-profile outages at cloud service providers, including Amazon AWS. The most recent was the infamous Christmas Eve AWS outage that took Netflix services offline during the holiday season.

AWS has a number of resources to help customers manage availability and uptime risks to their cloud services.

Regions and availability zones

Amazon has data centers located in the United States and around the globe. These locations are organized into regions, and customers can pick multiple regions when setting up AWS services to reduce the risk of an outage in any one Amazon region. Each region has redundancy built in, with multiple data centers organized into what Amazon calls availability zones. Amazon’s architecture center details how to make use of these features to build fault-tolerant applications on the AWS platform.
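
In practice, you choose a region when you create a connection to a service and, optionally, an availability zone when you launch resources. A brief sketch with the boto3 Python SDK, using placeholder regions and zone:

    import boto3

    # Clients are bound to a region; create one per region you use.
    emr_west = boto3.client('emr', region_name='us-west-2')
    emr_eu = boto3.client('emr', region_name='eu-west-1')

    # When launching a Job Flow, the 'Placement' setting in the Instances
    # parameter pins the cluster to a specific availability zone, e.g.:
    #   Instances={..., 'Placement': {'AvailabilityZone': 'us-west-2a'}}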

Service level agreement (SLA)

Amazon provides uptime guarantees for a number of the AWS services we covered in this book. These SLAs provide for 99.95% uptime and availability for EC2 instances and 99.9% availability for the S3 data services. Businesses are eligible for service credits of up to 25% when availability drops below certain thresholds.