Cloud Computing - Information Management: Strategies for Gaining a Competitive Advantage with Data (2014)


Chapter 13. Cloud Computing

On-Demand Elasticity

Nothing will change the way IT operates like cloud computing. The cloud can be any combination of deployment model, cloud service, and application domain. No two companies will approach it the same way, but it is important to take deliberate, proactive steps toward an enterprise cloud strategy for information.

Keywords

cloud computing; virtual data center; public cloud; private cloud; virtualization; internet-based services; utility computing; business intelligence; information technology

Everyone is talking about the cloud—and for good reason. Nothing will change the way IT operates like cloud computing. Those of us who have been deploying applications for a long time are used to being able, at some point, to perform maintenance and upgrades on, or perhaps simply view, our physical servers. Those days are passing us by.

Server management is one of those functions that is not considered a core competency in developing applications. It is increasingly being delivered by service level to applications, including the information management applications that need servers to function, which is most of them.

Service levels will include things like uptime, response time, data size, tiered performance levels, level of acceptable risk (vs. cost), resource management needs, etc. Though these may seem to cover a quite reasonable range of the needs of an information management application, if there is any temptation to turn platform selection over to the cloud provider, let me dissuade you from that type of thinking. The information management owner needs to make those decisions, the decisions that this book is geared toward supporting.

The Changing Nature of IT

IT was once thought to fit best as a central organization, providing all of the technical services that the organization requires. As of this writing, it still is in many organizations. However, many other organizations are taking a more balanced approach with IT. There is still a central IT, but there is latitude for distributed IT organizations (most likely not in name, but in function). Wherever the technical roles reside is what I mean when I reference “IT.”

Defining Cloud Computing

So what is “the cloud” and how does it fit into this discussion of turning server management over to service levels? The cloud is perhaps the most nebulous term in technology. At one end of the spectrum, the cloud is erroneously used as a term to simply mean the Systems Administration group no longer wants to allow you to see your server, whether due to difficulties with security access, the questions that the visits raise, or as a means of emphasizing the split in responsibilities between Systems Administration and the application teams. This can be referred to as a “private cloud.” However, I will give some criteria later that I think a true private cloud adheres to—some good functionality that provides true value to an organization.

At the other end of the spectrum, you have established public clouds like Amazon Web Services, Softlayer (an IBM Company), Rackspace, GoGrid, etc. that provide exhaustive services that go beyond the basic cloud criteria I will give you later.

The best definition for cloud computing I’ve found is the one by the U.S. National Institute of Standards and Technology (NIST).

It defines cloud computing as “a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”1 NIST defines the five characteristics of cloud computing as:

• On-Demand and Self-Service

• Broad Network Access

• Resource Pooling

• Rapid Elasticity

• Measured Service

On-demand and self-service disintermediate IT and others from an application owner and her ability to provision servers. Broad Network Access means access to a wide variety of modern platforms—much more than one could get on one’s own. Resource Pooling is gaining the benefits of economies of scale. Rapid Elasticity is the “accordion effect” of resources growing and shrinking seamlessly as needs dictate. Measured Service refers to the reporting infrastructure of the cloud provider, providing detailed reports to you on usage.

We are all consumers of cloud services, whether we refer to them as such or not. We might use a version of Microsoft Office in the cloud (Office 365), Dropbox or Google Docs for file sharing, iCloud from Apple, etc. Public email services are cloud based. Many organizations use one of the first pieces of software ready for the enterprise from the cloud: salesforce.com (whose phone number is 1-800-NO-SOFTWARE). Salesforce.com software is for contact management, and many see this as an enterprise function that can be easily isolated. However, the idea of offloading not only the servers referenced above, but also the software (installation, maintenance, upgrades, etc.), is catching on in many enterprise domains, including all of the ones related to information management—data storage, data integration, data access, etc.

The cloud is a metaphor for the internet and telephone networks.

You are completely hamstrung these days selecting information management software and hardware if you patently rule out the cloud option. In many areas, the cloud options are the rule, not the exception. You may end up with pockets of software, hardware, and data in many clouds—private and public—and your new dominant role in IT is to choose and integrate wisely.2

Information management is often a leading domain in bringing a cloud strategy to an organization, or in taking that strategy to new levels. The cloud is essential for the information manager to understand, which is why I must talk about it in this book on information management.

Moving into the Cloud3

The most elaborate public cloud that I’ve visited is the Switch SuperNAP in Las Vegas (www.switchlv.com). This is where companies like eBay, Google, DreamWorks, Sony, Cisco, Pixar, Joyent, HP, VMware, Nirvanix, and many others call home for their server infrastructure.

The SuperNAP is the world’s most efficient, high-density data center. It is (currently) 2,200,000 square feet of data center space, with 520,000 square feet of operational data space, 500 megavolt amperes (MVA) of power capacity, 567 MVA of generator capacity, and 294 MVA of uninterruptible power supply.

The highly redundant systems and service levels for on-call power delivery would seem to make it the last building to lose any power if the worst were to happen. Electrical power inside Switch facilities is delivered on System+System design so each primary and failover system is completely separated from the other. The fault-tolerant power design enables Switch to deliver 100 percent power uptime to its customers. As needed, the SuperNAP can be power self-sufficient with its inclusion of up to fifty 2.8 megawatt diesel generators onsite for a total of 140 megawatts of generator capacity.

The cooling system is a work of art itself. With 202,000 tons of cooling, it has 22,000,000 CFM (cubic feet per minute) of airflow. Depending on weather conditions, the system chooses among the four cooling methods at its disposal. Each customer has a private secure caged area called t-scif™ that enables higher power and cooling densities per cabinet than any other data center in the world.

They also believe the location supports, rather than competes with, the cooling system. As hot as Las Vegas is understood to be, 70% of the time the temperature is 72 degrees or less (i.e., nighttime). The temperatures are consistent. As well, Las Vegas is one of the few places without a real threat of natural disasters like floods, hurricanes, and earthquakes.

The cleanliness rivals the clean rooms where chips are made. Our guards said the physical aspects were the most important element of security. The armed escort guards at either end of our group certainly attested to that.

And in those more than 31,000 cabinets resides the data that is supported by this infrastructure and made available in many forms, including colocation, connectivity, and cloud computing. This is data that many of us interact with daily. SuperNAP clients are built on data and face huge downsides if transactions are not collected, hospital records are compromised, or television content fails.

The SuperNAP is simply unattainable infrastructure for a company to build for itself. Yet, it reflects the pseudo-requirements that I see all the time. This is what it takes to provide true “99.9999%” availability. I was impressed with the decision-making process, across so many factors, that the designers must have gone through.

Similarly, many companies are looking to build their own private clouds and evaluate public cloud providers and will need to determine their requirements. The SuperNAP sets the bar. While designs and public clouds state high availability, greenness, efficiency, security, performance (eBay’s numbers are outstanding), and elasticity, it’s the infrastructure that delivers.


FIGURE 13.1 The Switch SuperNAP.


3Reference: http://mike2.openmethodology.org/wiki/Going_into_the_premier_cloud

Benefits of the Cloud

Why are companies embracing arms-length arrangements with hardware and software? Why do CIOs repeatedly tell me that they need to get more and more of their databases to the cloud in short order?

One of the important categories is cost. You pay only for what you use, and you pay a third party for the resources on a periodic basis—usually monthly. The cloud lowers in-house staff requirements as well as the physical requirements for a proper data center, which brings down costs for power, floor space, and network systems—not the core competency for most organizations.

More importantly, the costs are treated, for tax purposes, as operational costs (OPEX), whereas internally managed systems are usually capital expenses (CAPEX) and written off over time. The whole OPEX vs. CAPEX debate is outside the scope of this book, but there is no doubt that the cloud is an enabler of OPEX, which is highly favorable to the CFO.

VMS found that “The overall 5-year TCO savings for SAP systems running on cloud range between 15% and 22% over identical systems that are deployed on-premises or outsourced. You pay only for what you use. When using cloud, you turn infrastructure costs from being capital expenses into variable operational costs.”4
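The TCO comparison above can be made concrete with some simple arithmetic. This is a minimal sketch using entirely hypothetical numbers (the dollar figures and utilization rate are illustrative assumptions, not from the VMS study), but it shows the mechanics of comparing an up-front CAPEX purchase against pay-per-use:

```python
# Illustrative only: hypothetical numbers comparing a 5-year on-premises
# CAPEX purchase against cloud pay-per-use, in the spirit of the TCO
# comparison quoted above.

def on_prem_tco(hardware_capex, annual_opex, years=5):
    """Up-front hardware purchase plus yearly power/staff/maintenance."""
    return hardware_capex + annual_opex * years

def cloud_tco(monthly_rate, avg_utilization, years=5):
    """Pay-per-use: billed only for the capacity actually consumed."""
    return monthly_rate * avg_utilization * 12 * years

on_prem = on_prem_tco(hardware_capex=500_000, annual_opex=80_000)
cloud = cloud_tco(monthly_rate=15_000, avg_utilization=0.85)

savings = (on_prem - cloud) / on_prem
print(f"On-prem 5-year TCO: ${on_prem:,.0f}")   # $900,000
print(f"Cloud 5-year TCO:   ${cloud:,.0f}")     # $765,000
print(f"Savings:            {savings:.0%}")     # 15%
```

With these made-up inputs the savings land at 15%, the low end of the quoted range; the point is the shape of the calculation, not the specific figures.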

The other area of major benefit of the cloud is in deployment or provisioning speed. It should be ready when you are. When I’ve laid out timetables for projects recently, I’ve had CIOs tell me the cloud provisioning is so fast, it’s to the point of being uncomfortable. They are used to lining up resources, support, and project details during the time it usually takes to provision the hardware, but here I am saying the software is ready for us to use this week.

Certainly, the fast provisioning of the cloud will require some getting used to!

The cloud also obviates the whole guesstimation process that projects go through early on to try to determine how much space and how many resources they are going to need. Such estimates have seldom turned out to be within 25% of actual. Either the project does extraordinarily well, in which case it exceeds its estimates, or it bombs and dramatically underutilizes them.

I have done provisioning only to have it padded (by 2X) by one level of management (“just in case”) only to be padded (by another 2X) by another level of management (also “just in case”). Disk costs keep coming down, but this approach is overkill.

With its dynamic provisioning capability, cloud resources are going to expand and contract according to need, like an accordion. I still recommend the projects go through a rough estimating exercise, if only to ensure a proper platform selection. However, with the cloud, companies will not be saddled with unused resources. Also, if worse comes to worst for a project (or company), it will be a major benefit in deprovisioning to be in the cloud.
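The padding problem and the elastic alternative can be illustrated with a quick sketch. All numbers here are hypothetical, chosen only to show how two layers of "just in case" doubling compound, and how pay-for-usage sidesteps the padded guess entirely:

```python
# Hypothetical illustration of the "just in case" padding described above:
# each management layer doubles the estimate, so provisioned capacity
# ends up at 4x the original guess.
estimate_tb = 10                       # project's own storage estimate
after_first_padding = estimate_tb * 2  # one manager pads "just in case"
provisioned = after_first_padding * 2  # a second manager pads again
print(provisioned)  # 40 TB provisioned for a 10 TB estimate

# With elastic cloud provisioning you pay for actual usage as it grows
# and shrinks (the accordion effect), not for the padded up-front guess.
actual_usage_tb = [4, 6, 9, 12, 11]    # usage over five periods
cost_per_tb = 100                      # hypothetical monthly $/TB
static_cost = provisioned * cost_per_tb * len(actual_usage_tb)
elastic_cost = sum(u * cost_per_tb for u in actual_usage_tb)
print(static_cost, elastic_cost)  # 20000 vs 4200
```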

Certainly with the data sizes involved in big data (Chapters 10 and 11), the cloud makes the utmost sense—large, fast-growing amounts of non-sensitive data for which it is extremely difficult to estimate the ultimate size.

Challenges with the Cloud

Although the arguments are slowing down, security remains the biggest challenge to using the cloud for information management. After all, you are turning data over to a third party.

Early cloud deployments fueled the security concern by commingling data, even for large companies, in the same servers. Now, “single tenant” (no commingling) and “multi-tenant” (commingling) is a decision point in a cloud deployment. Initially, single tenant came with prohibitive extra cost, but, as it has become the norm, additional costs for single tenant are minimal or nothing.

Performance of a prospective information management cloud deployment must be tested—load, query, failover, etc. Performance can be an issue, although recent tests are showing performance for most tasks equivalent to or better than in-house deployments.

The multiple clouds that inevitably will support an organization cause issues as well, which is a large reason for thinking about the company’s cloud strategy from a holistic, top-down perspective as well as tactically project-by-project. If cloud platforms need data from each other, how will they get it? While it’s usually possible (worst case: cloud 1 to on-premises, on-premises to cloud 2), the performance could be an issue. Again, test performance across the clouds you are considering.
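The cross-cloud performance testing recommended above can start as simply as a timing harness. This is a minimal sketch; `run_query` is a hypothetical stand-in (here it just sleeps) for whatever real client call—JDBC/ODBC, REST, etc.—your platform uses, and the endpoint names are placeholders:

```python
# A minimal sketch of a cross-cloud query timing harness. run_query is a
# hypothetical placeholder; substitute your platform's actual client call.
import statistics
import time

def run_query(endpoint, query):
    # Placeholder for a real call to the cloud platform at `endpoint`.
    time.sleep(0.01)  # simulate network + query latency for this sketch

def benchmark(endpoint, query, runs=5):
    """Return the median wall-clock time of repeated runs of one query."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query(endpoint, query)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

for endpoint in ["cloud-1.example.com", "cloud-2.example.com"]:
    median_s = benchmark(endpoint, "SELECT COUNT(*) FROM sales")
    print(f"{endpoint}: median {median_s * 1000:.1f} ms")
```

The same harness can be pointed at the load, failover, and cloud-to-cloud transfer paths, not just queries.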

Moving to the cloud can also feel like buying a franchise business. You can change some things, but not the big things.5 Standards could be different. While you’re seldom locked into a cloud contract for long, should you wish to bring the data back in-house or go with another cloud provider, that process is effort-laden and time-intensive. I believe the industry will evolve toward more portability, much like the telecommunications industry mandates that allowed portability of phone numbers in 1996 and 2003.

Availability can be an issue that may dictate which applications are put in the cloud. Reliability and performance are also variable around the world, depending on the country’s networking infrastructure, among other things.

Finally, there are regulatory requirements that prohibit some cross-country storage of data. This affects the smaller European countries more than companies based in the U.S. The Patriot Act,6 for example, effectively bars many Canadian organizations from using US clouds.

Cloud Availability

Availability can be an issue as well with the cloud. Not all cloud providers are like the SuperNAP. Amazon Web Services (AWS) famously provides 99.95% availability, far less than the 99.9999% availability that many applications aspire to. While this may seem like a lower class of service (and occasionally it’s not met), CIOs are rationalizing that the aspiration of 99.9999% availability is simply that—an aspiration that is not realized by most in-house implementations.
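It helps to translate these availability percentages into actual downtime. The arithmetic below shows why CIOs view "six nines" as aspirational: the gap between 99.95% and 99.9999% is the gap between hours and seconds of downtime per year.

```python
# Downtime implied by an availability figure, to put "99.95%" and
# "six nines" aspirations in concrete terms.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability_pct):
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.95, 99.99, 99.9999):
    print(f"{pct}% availability -> "
          f"{downtime_minutes_per_year(pct):,.2f} min/yr")
# 99.95%   -> 262.80 min/yr (about 4.4 hours)
# 99.99%   -> 52.56 min/yr
# 99.9999% -> 0.53 min/yr (about 32 seconds)
```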

When, occasionally, AWS has gone down,7 the anticipated exodus from AWS did not happen. Instead, companies opted in to more failover from AWS!


7AWS credits 5% of the fee for every 30 minutes of downtime

Image naming, server clocks, and time zones become a real challenge with the cloud if not managed well.

While I’ve found public clouds usually measure up to the cloud criteria above quite well, private clouds need to evolve to meet the criteria. With captive private clouds (i.e., your former System Administration group), old habits die hard. All of the characteristics could be compromised unless there is a corresponding mindset change (very challenging, see the chapter on Organizational Change Management). Usually the last characteristic to accomplish for a private cloud is sufficiently broad network access. By starting out with in-house servers, there is seldom broad network access. A private cloud must provide inventory and choice to its customers. Servers to accommodate most of the data stores referenced in this book, including Hadoop, need to be made available.

Cloud Provider Agreements

There are many “line item” costs in a cloud provider agreement. Several have to do with the level of knowledge the provider will give to you about the servers provisioned for your needs. Even though you may not know what to do with them immediately, it is best to pay for all of these insights. You’re already saving dollars. Pay the nickels for this insight.

Cloud Deployment Models

Throughout this chapter, I have referenced the public and private cloud options. I will now say more about them and the other major options you have in developing a cloud strategy for information management.

The public versus private cloud is the dimension with the most leverage in a cloud strategy. The direction of public or private will absolutely influence not only every cloud decision, but perhaps every software decision the company will make henceforth. Public clouds are made available to a broad audience—not just your company. Public clouds are in the business of providing virtual infrastructure to companies. While providers certainly compete on features and functions, they are much more homogeneous than private clouds.

The term “private cloud” is highly abused, as it can be used to refer to, as I’ve said, the System Administration group creating some space and distance for itself. However, I’ve seen just as many excellent and quickly matured private clouds that offer all five characteristics of cloud computing.

Sometimes they get so good that they become a cloud that can be shared with other companies. At some point in that progression, the cloud can become a “hybrid” cloud. Hybrid can also be used to refer to a company’s cloud strategy of utilizing BOTH public and private clouds. This I see as an inevitable path for information management.

The Path to Hybrid Cloud

Enterprises need to evolve their current IT infrastructure to become more “cloud-like”—to become a better internal service provider to the lines of business and departments and to provide greater agility and responsiveness to business needs, higher quality of service in terms of latency and availability, lower costs, and higher utilization.

The first step that many enterprises are taking is to move to a virtualized environment—a private cloud that moves from a dedicated, rigid, physical structure for each application to a virtual environment with shared services, dynamic provisioning, and standardized configurations or appliances.

This is a self-service and pay-per-use environment. A user goes to the employee portal, signs in, requests one or more virtual machines with a certain amount of CPU, memory, and disk, picks a VM image for database or middleware, then clicks “submit.” If that employee’s role and entitlements allow that amount of IT resource, it gets provisioned without an IT person being involved. If not, the request may be routed to the employee’s manager and/or IT for workflow approval.
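The entitlement check at the heart of that self-service flow can be sketched as follows. The role names, limits, and image names are hypothetical; a real private cloud would back this with a directory service and a workflow engine rather than an in-memory table:

```python
# Sketch of the role/entitlement check in a self-service provisioning
# portal. All roles, limits, and images here are hypothetical.
from dataclasses import dataclass

@dataclass
class VMRequest:
    user: str
    role: str
    cpus: int
    memory_gb: int
    disk_gb: int
    image: str  # e.g., a database or middleware VM image

ROLE_LIMITS = {  # hypothetical per-role entitlements
    "developer": {"cpus": 4, "memory_gb": 16, "disk_gb": 200},
    "architect": {"cpus": 16, "memory_gb": 64, "disk_gb": 1000},
}

def handle(request: VMRequest) -> str:
    limits = ROLE_LIMITS.get(request.role)
    if limits and (request.cpus <= limits["cpus"]
                   and request.memory_gb <= limits["memory_gb"]
                   and request.disk_gb <= limits["disk_gb"]):
        return "provisioned"          # no IT person involved
    return "routed-for-approval"      # manager and/or IT workflow

print(handle(VMRequest("pat", "developer", 2, 8, 100, "postgres")))
# -> provisioned
print(handle(VMRequest("pat", "developer", 8, 64, 500, "hadoop")))
# -> routed-for-approval
```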

Ultimately, the best arrangement will be “hybrid clouds,” combining both a private cloud or clouds and multiple public clouds.

The second major dimension of emphasis in a cloud strategy is what category of service the cloud is expected to provide. There are three categories:

• Software-as-a-Service (SaaS),8

• Platform-as-a-Service (PaaS), and

• Infrastructure-as-a-Service (IaaS).

IaaS is the lowest category of service and gives you access to servers, storage, and networking over the internet. Amazon famously provides this level of service. A way to look at IaaS is that it’s just VMs in the cloud. Adding the middleware and operating system to the servers, storage, and networking is PaaS—platform-as-a-service.

SaaS covers the cloud services we are all familiar with, such as email, collaboration tools, productivity tools, Google applications, etc. The previously mentioned salesforce.com is also SaaS. SaaS provides “everything”—just sign up and use the software today. There is limited customization, especially at the server level, and limited to no visibility at the server level because it is controlled by the software provider. Usually these providers are aware that an enterprise runs on much more software than theirs and provide APIs to access your data should you want to distribute it to other servers and clouds.

As individuals, many of us consume SaaS applications on the cloud daily, such as Google Apps, Dropbox, and iCloud. Perhaps the most prominent enterprise package using a SaaS model is salesforce.com.
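The division of responsibility across the three service categories can be summarized in a small lookup. The layer breakdown below is a common industry framing consistent with the descriptions above, not an official standard, and the boundaries in real offerings can blur:

```python
# Rough summary of who manages which layers under each service category
# (a common industry framing; exact boundaries vary by provider).
LAYERS = ["networking", "storage", "servers", "os", "middleware",
          "application", "data"]

PROVIDER_MANAGES = {
    "IaaS": {"networking", "storage", "servers"},
    "PaaS": {"networking", "storage", "servers", "os", "middleware"},
    "SaaS": set(LAYERS),  # "everything" -- just sign up and use it
}

def you_manage(model):
    """Layers left to the customer under a given service model."""
    return [layer for layer in LAYERS
            if layer not in PROVIDER_MANAGES[model]]

for model in ("IaaS", "PaaS", "SaaS"):
    print(f"{model}: you manage {you_manage(model)}")
# IaaS: you manage ['os', 'middleware', 'application', 'data']
# PaaS: you manage ['application', 'data']
# SaaS: you manage []
```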

The third dimension of cloud computing is also addressing “what” you are putting on the cloud and this is where the possibilities have become numerous. Take a data warehouse, with its sourcing/integration/transformation, data storage, and data access layers, broadly speaking. Any one, two, or three of these can be placed into the cloud, irrespective of the others. Many are starting with the database and working “out” to the data access layer.

The voyage into the cloud can begin from any of the combinations of the dimensions: public/private, IaaS/PaaS/SaaS, and the chosen layer. It should be a mindful evolution to multiple forms—with lessons learned and absorbed along the way—that the organization can take through the worthwhile journey into the cloud.

A Look at Platform-as-a-Service

Of the services choices, PaaS provides the strong combination of commoditized services and application flexibility that makes sense for many information management needs. In many ways, it essentially becomes the organization’s operating system.

PaaS must provide support for the programming languages to be used by the developers. Microsoft .NET, Java, Perl, Python, and Ruby developers will need those capabilities in the PaaS APIs they program to. Many shops will need multiple programming languages available.

The nice stress remover for the organization with PaaS is that applications talk to the infrastructure through APIs. These programmatic calls between layers remove much of the integration challenge in the components between the hardware and the operating system, such as networks, ports, and disks. Resource allocation to the application layer is also handled by PaaS, focusing the organization on its application development, not its infrastructure.

For example, Jaspersoft, a prominent business intelligence software company, has ported its commercial open source business intelligence solutions to OpenShift Flex by Red Hat, which uses Amazon Elastic Compute Cloud (EC2)9 for the IaaS layer and supports a wide range of development languages. The suite now provides convenient tools to work with the applications and platforms there, MongoDB (a document store, see Chapter 10) being notable due to its integration into OpenShift as well as being previously supported by Jaspersoft.

Some PaaS characteristics to keep in mind are whether it is closed or open source, public or private cloud, and what programming languages are supported.

Although most PaaS solutions are public cloud, some are designed such that they could be taken to the private cloud. Here again, the larger enterprise cloud decision will affect the PaaS solution choice. The important aspect of this decision is the proximity and interaction with the other cloud(s) of the organization. The inevitable integration with the on-premises systems must be considered, especially the performance of the integration. Smart deployments take this into account and colocate systems with a high degree of sharing needed.


9http://aws.amazon.com/ec2

Open Source software is often associated with the cloud, since you are often looking for inexpensive software when you’re looking for a quick-provisioned, inexpensive environment. For an excellent treatment of Open Source in Business Intelligence, see “Using Open Source Platforms for Business Intelligence: Avoid Pitfalls and Maximize ROI,” by Lyndsay Wise.

Information Management in the Cloud

Any or all layers of an information management stack can be placed in the cloud. Consider a data warehouse stack of data platform, data integration, and data access. Most will opt to put the data platform in the cloud first, whereas others would start with the software running in the cloud, with the data following.

Among cloud benefits, the largest ones for information management are:

• Flexibility to scale computing resources

• Ability to shorten implementation windows

• Reduced cost

• Testing, development, sandboxing environments

• Geographic scalability

In order to keep one project on-time and on-budget, we decided to build the environment in the cloud. In five days, we were able to build the complete BI landscape in the cloud, load the data, schedule the ongoing loads, and build the data access layer. In a traditional on-premises hosting environment, this landscape would have taken weeks to months to create.

Action Plan

• Inventory organizationally proximate uses of the cloud today

• Understand your company’s abilities and aspirations for a private cloud

• Evaluate public cloud providers and understand their services for your information management needs

• During the next information management software contract review, mentally compute what it would look like to deploy that software in the cloud

• Determine your desired SaaS, IaaS, and PaaS footprint profile

www.mcknightcg.com/bookch13


1http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf

2Since it interacts with so many systems, master data management is a challenging choice for the cloud.

4http://www.vms.net/content/e457/e458/e1707/items1711/VMS_EN_TCO_Study_AWS_with_CWI_EXTRACT_205_ger.pdf

5Especially with public cloud. Some captive private clouds are malleable

6Full content of the Patriot Act: http://www.gpo.gov/fdsys/pkg/PLAW-107publ56/html/PLAW-107publ56.htm

8yes, the “aa” is lowercase in the acronyms