Client-Server Web Apps with JavaScript and Java (2014)

Chapter 12. Virtualization

People don’t appreciate the substance of things. Objects in space. People miss out on what’s solid.

— Jubal Early (Firefly, Objects in Space)

The word “computer” immediately brings to mind some image of machinery, perhaps a monitor and keyboard, a laptop, or a stack of servers. In each case, the image is a tangible, solid, material object. This common impression is not particularly useful in software development when the physical details of hardware are masked by additional abstraction layers. Virtualization is a term used to describe a technology that hides details and specific implementation characteristics. It applies to a range of technologies related to hardware platforms, operating systems, storage devices, and network resources. While it is not directly tied to client-server web development or any other paradigm, it is interesting because many such applications are intended for large-scale deployments built on virtualization solutions. Virtualization is a powerful concept that impacts practices related to active development, deployment, scaling, and disaster recovery.

Full Virtualization

Without virtualization, a server is defined by and limited to physical constraints. Server administrators build and configure servers with a specific set of options suited to their particular hardware and purpose. Individuals on a development team each install software and configure their machine to conform to the target server. Hardware limitations, performance bottlenecks, and capacity issues are often overcome by adding hardware to a given machine. Automation might be used to some degree, but in many cases, manual processes are the order of the day due to the unique details of a particular machine’s hardware. In situations where there are a number of different servers to maintain, complex backup and recovery practices become imperative to system availability and reliability.

The type of virtualization that alleviates these challenges involves virtual machines that take the place of physical servers. Full virtualization seeks to provide a complete simulation of underlying hardware to the extent that a distinct unmodified operating system can run within a virtual machine. Figure 12-1 shows the continuum of virtualization, ranging from traditional, persistent, manually configured physical hardware to highly transient virtual machine instances that are automatically created and might only exist for a few seconds in a cloud.

Figure 12-1. Server types

Challenges to maintaining physical servers are often alleviated using virtual machines. Since full virtualization involves a full simulation of physical hardware, individual developers can be given isolated environments for testing new software without the expense of purchasing and maintaining additional machines. Virtual machines can be created to test software on multiple operating systems on a single physical server. Entire software installations can be shipped in preconfigured virtual machines. The ability to create a snapshot that captures the state of a machine at a point in time opens up many possibilities related to managing disaster recovery. Rather than managing many physical machines with specific hardware concerns, servers can be replaced with virtual machines to reduce energy consumption and hardware costs.

One downside to virtual machines is that the extra layer comes at a cost. Virtualization software must be purchased in some cases, and running it consumes additional processing power. Projects that require low-level hardware access are not good candidates for virtualization, but most web applications are. The trade-off between virtualization and physical hardware is similar to that between a high-level programming language and a lower-level language like C or assembly. Tasks that demand the highest performance are often best implemented “close to the metal” using specific, highly tailored solutions. A myriad of other tasks do not have such stringent requirements.

One other cost to consider is additional complexity. Rather than being understood in isolation like a single physical server, a virtual machine’s behavior must also be understood in relation to its host machine. A performance problem could be due to an issue on the VM itself or on the host machine. Virtualization layers can be nested several levels deep, adding further complexity. Virtualization removes entire classes of problems, but does require specific technical awareness and know-how to use effectively.

The popularity of VMs in place of physical servers has resulted in a significant shift in server management and scaling practices. So-called cloud providers host virtualized computer resources in one form or another. In some cases, the virtualization provided is in the form of specific machines. In this case, the need to actively manage physical servers is eliminated altogether.

WHAT ABOUT THE “V” IN JVM?

The Java Virtual Machine provides partial virtualization, allowing class files containing bytecode to be interpreted and executed on any hardware where a virtual machine can be run. It does not provide any sort of distinct container for isolating code and does not result in the creation of a distinct machine that can be managed as an independent server on a network. Any type of virtualization can be extremely effective in masking underlying implementation details, but different levels are applicable to specific problems.

Virtual Machine Implementations

Virtualization dates back to an IBM research system created in the 1960s called the CP-40. It was followed in 1967 by the CP-67, which was a virtual-machine operating system developed for the IBM System/360-67. Other virtualization solutions were introduced in the years that followed, most specific to particular operating systems. The applicability and popularity of virtualization grew as powerful hardware became available at lower costs to a larger number of developers.

Of the myriad virtual machine implementations available today, VMware, VirtualBox, and Amazon EC2 are among the most popular. They are also specific targets of provisioned servers created with tools such as Vagrant and Packer.

VMware

The first encounter many web developers had with virtual machines was VMware Workstation, released in 1999. VMware now offers a range of related virtualization, cloud management, backup, and desktop products. Open source versions of VMware software are available today as well.

VirtualBox

In January 2007, an open source version of VirtualBox was released as a full virtualization solution that runs on Windows, Linux, and OS X. It was acquired by Sun and later by Oracle, where it was rebranded as Oracle VM VirtualBox. It is comparable to VMware’s offerings in general functionality, but differs most substantially in non-technical fine points like licensing terms, paid features, ease of use, and availability of documentation.

Amazon EC2

An Amazon Machine Image (AMI) is a template that defines the server configuration that can be run on Amazon Elastic Compute Cloud (Amazon EC2). An AMI is selected when an instance is launched, and the resulting instance is then available as a virtual server in the cloud. As such, AMIs are only relevant for deployments targeted for Amazon Web Services.

Management of Virtual Machines

Management of VMs becomes a significant undertaking as their number and complexity increase. Each implementation has proprietary mechanisms for defining and maintaining virtual machines. Open Virtualization Format (OVF) is an open standard for packaging and distributing VMs, and there are a number of noteworthy projects designed to assist in creating and maintaining them.

Vagrant

One challenge is that each virtualization technology has unique processes, scripts, and utilities for creating and maintaining an environment. Vagrant provides mechanisms to configure reproducible and portable VMs provisioned on top of VirtualBox, VMware, AWS, or any other provider. The vagrant command is used to complete all related operations.

A Vagrantfile contains configuration for a given machine. Once this file is configured, a box (base image used to create VMs) must be available or added. It is an initial image that is cloned but never actually modified. With a box added, a machine can be started by running vagrant up, and the new machine can be accessed via ssh using vagrant ssh. Additional commands can be used to provision a machine as well as to stop and clean up old machines and boxes. Mitchell Hashimoto, the creator of Vagrant, has written a book that covers Vagrant in depth. More recently, he authored Packer, another project that further simplifies configuration across VM implementations.
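The Vagrant workflow described above can be sketched as a short shell session. This is a minimal sketch rather than a definitive setup: the box name hashicorp/precise64 and the scratch directory are assumptions chosen for illustration, and the Vagrantfile uses the version "2" configuration syntax found in newer Vagrant releases. The vagrant commands are left as comments so that the script itself runs without Vagrant installed.

```shell
# Create a scratch project directory (its location is arbitrary).
project_dir=$(mktemp -d)

# Write a minimal Vagrantfile. The box name is an assumed example;
# substitute any box that is available to you.
cat > "$project_dir/Vagrantfile" <<'EOF'
Vagrant.configure("2") do |config|
  config.vm.box = "hashicorp/precise64"
end
EOF

# Show the file that Vagrant would read.
cat "$project_dir/Vagrantfile"

# With Vagrant and a provider such as VirtualBox installed, the usual
# cycle from inside the project directory would be:
#   vagrant up        # fetch the box if needed, then boot a VM cloned from it
#   vagrant ssh       # open a shell on the running VM
#   vagrant halt      # shut the VM down
#   vagrant destroy   # delete the VM; the box itself is left untouched
```

Keeping the Vagrantfile under version control alongside the application is one way to give every developer the same machine definition.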

Packer

As useful as Vagrant is, the creation and management of images remains a tedious, difficult, and largely manual process. Packer uses a template written in a single portable input format to generate images for multiple platforms in parallel. Packer is used to automate the creation of base boxes for various VM providers. Components of Packer called builders create machine images for a given platform in a form known as artifacts. For example, Packer’s VirtualBox builder can create VirtualBox VMs and export them in OVF format. Artifacts consist of IDs or files that represent a virtual-machine image. Packer also complements Vagrant’s functionality, as it can take the artifact and turn it into a Vagrant box using post-processors.

A consistent syntax and workflow for configuring VMs for different providers does not address provisioning and maintenance concerns. Additional automation can initially be provided through a few shell scripts. Terminal enhancements like csshX for OS X to run ssh commands on multiple machines or a tool like Capistrano might suffice to manage multiple servers in a small-scale environment. These solutions are not sufficient for general-purpose systems administration when the number of servers grows beyond a handful.

DevOps Configuration Management

Simple Vagrant machines can be set up with individual commands or shell scripts referenced in a file named Vagrantfile. More complex configurations can use Chef or Puppet to automatically install and configure VMs. Both Puppet and Chef are written in Ruby, but Puppet uses its own declarative language to determine what to install based on the dependencies defined, while Chef requires an install script written in Ruby itself. More recently, Ansible and Salt have emerged as alternatives (or in some cases complements) to these. There is a tremendous amount of overlap between what can be accomplished with these tools, but each is particularly suited for certain projects and administrators.

Table 12-1 lists DevOps configuration management tools.

Table 12-1. DevOps configuration management tools

Tool	Initial release	Notes
CFEngine	1993	C-based, fast, lightweight, steep learning curve
Capistrano	2005	Focus on Rails app deployment
Puppet	2005	Inspired by CFEngine
Chef	2009	Ruby for configuration
Salt	2011	Fast, large-scale orchestration and admin
Ansible	2012	Simple, agentless administration

The year of initial release is helpful for understanding the role of each tool and its relation to the existing ones. Puppet is inspired by CFEngine. The authors of Chef used and learned from Puppet but took a somewhat different approach based on their admin experiences. More recently, Ansible and Salt have been gaining traction as simplified, streamlined tools akin to Chef or Puppet. They perform both initial configuration and provisioning of a server as well as execution of commands to retrieve results from arbitrary nodes.

DEVOPS

It often seems like as soon as the number of tools, techniques, and acronyms in a technical area gains a certain critical mass, a new job title appears. DevOps is the one that was introduced in 2009 to represent the role filled by professionals whose responsibility spans traditional development and operations tasks. While individual developers might not use the tools listed in this section in depth, it is beneficial to understand them and be able to interact intelligently with the DevOps professionals who do.

Containers

While full virtualization was an early goal with many useful applications, more limited forms have also had a great impact. Partial virtualization only attempts to emulate a portion of an entire operating system and does not provide a full-blown virtual machine. Instead, operating-system-specific container technology allows a limited form of virtualization.

Development of container technology was driven by the problem of obtaining process isolation and security beyond what is possible through other operating-system mechanisms. Traditional user and group management is cumbersome and incomplete for many situations. A limited form of isolation, available since the late 1970s, is the chroot utility. Though useful in some circumstances, it stops short of providing the capability of running a fully functioning independent container. Containers can be considered from a high level as partial VMs or from a low level as enhanced chroots.

Containers might be described as operating-system-level virtualization or with other vendor-specific terms. They provide user-space instances that allocate private resources within a container but execute commands against the host’s kernel. Rather than emulating an entire machine, container technologies are focused on virtualizing an individual operating-system process such that it runs in an isolated, secure environment independent of the rest of the server.

LXC

Linux Containers (LXC) virtualization is available on Linux. It allows one or more isolated containers to run on a single server. This provides a better balance between resource usage and security than is possible in a single monolithic system running standard OS-level processes. Containers run instructions native to the core CPU without intermediate steps required by standard virtualization techniques, and so are better performing. Since they do not have all of the overhead included in a full operating system, they are lighter weight and take up fewer resources than would be required in a full-blown VM. Because containers share the host’s kernel, they are largely independent of the host system’s distribution, provided the kernel supports the required isolation features.

Docker

Docker extends LXC with a high-level API. Like other container technologies, Docker is intended to simplify application packaging and deployment and the creation of individual private environments for end users. In large part, Docker makes the functionality available through LXC much easier to use. In this respect, Docker is to LXC as Vagrant is to the underlying virtual machine implementations it supports, as shown in Figure 12-2.

Figure 12-2. Docker and Vagrant high-level APIs

In Docker parlance, you run containers that are based on images. When a container exits, its filesystem state and exit value are preserved but its memory state is not. Containers can be started, stopped, or restarted. A container can also be promoted to an image using the docker commit command. The image can then be used as a parent for new containers.

Docker images can have parent images. Base images have no parent. A collection of images used to create containers is stored in a Docker repository. Repositories can be referenced in a registry. The implicit top-level registry is index.docker.io, and Docker includes mechanisms to publish and share images.

Although it is a very new project, Docker has tremendous potential with its promise of standard containers to distribute environments. Used properly, the time spent by individual developers and administrators setting up machines could be eliminated. The popularity of Linux on a wide range of hardware suggests new possibilities for distribution of applications to anything from a fellow developer’s machine, to a cloud service, to an embedded device.

Project

The project uses several of the virtualization technologies mentioned previously. The project requires Git, VirtualBox, and Vagrant as prerequisites. In just a few steps, Docker will be set up (in a Vagrant-managed VirtualBox). Figure 12-3 shows the Vagrantfile that resides on an OS X host system and defines the configuration of a VirtualBox instance where Docker instances run. A Java SDK will be installed in a Docker container. The container will then be used to compile and run a Java program. This simple example includes all the steps required in larger, more extensive Docker installations.

Figure 12-3. Java running on Docker for OS X

The directions for setting up Docker using Vagrant vary slightly by operating system. In general:

1. Docker sources including the Vagrantfile for machine setup are fetched using Git.

2. The VM is started using Vagrant.

3. The user logs into the new VM using ssh and switches to the Docker user.

4. Docker is available to create and maintain containers.

From the host machine, you can initially download the Docker project using Git:

git clone https://github.com/dotcloud/docker.git

cd docker

Docker is under heavy development and is changing quickly. After initial installation, you can update your version of Docker using Git as well:

git pull

The Docker project for OS X consists of a Vagrant-managed VirtualBox VM. To start up and log into the VM hosting Docker:

vagrant up

vagrant ssh

Once logged in, you can log out of the Vagrant VM at any time by entering exit. From the Vagrant-managed box, Docker can be called:

sudo docker

The ubuntu base image, which includes a set of standard Linux utilities, can be downloaded and run:

sudo docker pull ubuntu

sudo docker run ubuntu /bin/echo Docker is running!

Docker Help

You can learn a great deal about Docker by using the built-in help. Since the project is changing so rapidly, there is a chance that documentation available online or elsewhere is not applicable to the version you are using. To list available commands, simply type docker. Options available for each command can be listed by adding the -help argument (later Docker releases spell long options with two dashes, as in --help):

docker

docker build -help

Image and Container Maintenance

Once you have been working with Docker for a while, you will amass a number of images and containers. The info command can be used to view a report describing system-wide information, including the total number of containers and images:

docker info

These totals include exited containers and intermediate images, but a smaller subset is often of immediate interest. Each container remains available after exit until it is removed. Images tend to accumulate quickly because an image is created during each step defined in a Dockerfile. The ps command lists running Docker containers. The images command lists images (excluding intermediate images used to build):

docker ps

docker images

To list all containers or images, include -a. A .dot diagram of images that can be viewed using GraphViz can be created by specifying the -viz option:

docker images -viz > docker1.dot

Much more detailed information is available on a given container by running docker inspect <container name>. The rm command is used to clean up containers, and the rmi command to clean up images. IDs can be passed as a list to either command:

docker rm $(docker ps -a -q)

docker rmi $(docker images -q)
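When there is nothing to remove, the command substitutions above expand to nothing, and docker rm or docker rmi is invoked without arguments and reports an error. A small guard avoids that. The following is only a sketch: remove_containers and the DOCKER variable are hypothetical names, the latter added so that the function can be dry-run with echo in place of the real client.

```shell
# Remove the containers whose IDs arrive on standard input, one per line.
# DOCKER is a hypothetical override; it defaults to the real docker client.
remove_containers() {
  ids=$(cat)
  if [ -z "$ids" ]; then
    echo "nothing to remove"
    return 0
  fi
  # $ids is deliberately unquoted so that each ID becomes its own argument.
  ${DOCKER:-docker} rm $ids
}

# Dry run: substitute echo for docker to see the command that would be issued.
printf '4c01db0b339c\nd7886598dbe2\n' | DOCKER=echo remove_containers
```

Against a real Docker daemon the call would be docker ps -a -q | remove_containers.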

Java on Docker

The Docker Git repository included a Vagrantfile used by Vagrant to configure and provision the VM. Docker uses a Dockerfile to configure and provision a container. The FROM instruction indicates the base image for the new machine. There are public repositories of images available, or you can use one of your own. The MAINTAINER instruction indicates the author of the image. The RUN instruction executes commands on the current image and commits the results. The following steps install Oracle’s Java 7 SDK and accept the license as presented. The ADD instruction will copy a file named Hello.java to the container where it will be compiled and available for execution:

# Start from the Ubuntu 12.04 (precise) base image
FROM ubuntu:precise
MAINTAINER Casimir Saternos
# Register the PPA that packages the Oracle Java installer
RUN echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu precise main" \
    | tee -a /etc/apt/sources.list
RUN echo "deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu precise main" \
    | tee -a /etc/apt/sources.list
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys EEA14886
RUN apt-get update
# Accept the Oracle license ahead of time so the install can run unattended
RUN echo oracle-java7-installer shared/accepted-oracle-license-v1-1 select true \
    | /usr/bin/debconf-set-selections
RUN apt-get -y install oracle-java7-installer
RUN update-alternatives --display java
RUN echo "JAVA_HOME=/usr/lib/jvm/java-7-oracle" >> /etc/environment
# Copy the source file into the image and compile it
ADD Hello.java Hello.java
RUN javac Hello.java

A simple Hello World needs to be created in the directory where the Dockerfile resides:

public class Hello {
    public static void main(String[] args) {
        System.out.println("hey there from java");
    }
}

With the Dockerfile and Java class in place, the image can be built from the Dockerfile. The -t option specifies a repository name to be applied to the resulting image and identifies it when listing available images:

docker build -t cs/jdk7 .

The JDK installed during the build is now available, and the program that we copied to the Docker container can be run:

docker run cs/jdk7 java -version

docker run cs/jdk7 java Hello

To be clear, these commands ran within the Docker container. Try running them on the Vagrant VM to see a different result (indicating that Java is not installed on the VM):

java -version

java Hello
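The difference can also be checked without triggering a "command not found" error. The following sketch, in which has_command is a hypothetical helper name, uses the shell's command -v builtin to report whether a program is on the PATH of whichever environment it runs in.

```shell
# Report whether a program can be found in the current environment.
has_command() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: installed"
  else
    echo "$1: not installed"
    return 1
  fi
}

# On the Vagrant VM from this project, java is expected to be reported as
# missing; the same check run inside the cs/jdk7 container should find it.
has_command java || true
```

Inside the container, an equivalent check could be run as docker run cs/jdk7 sh -c 'command -v java'.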

A Dockerfile to run a web application as a WAR on Jetty can be created by appending to the Dockerfile defined above. The ADD command can be used to copy a file from the Vagrant VM to the Docker container, while the RUN command can use wget or another utility to download a needed file from a referenced URL:[4]

ADD rest-jersey-server.war rest-jersey-server.war

RUN wget http://repo2.maven.org/[...see the footnote...]/jetty-runner.jar

COPYING FILES TO VAGRANT

You might be wondering how rest-jersey-server.war ended up on the Vagrant VM, or you might have just cleverly downloaded it using curl or wget. While downloading is a fine option, it is possible to copy files through a file share or using scp as well.

By default, Vagrant shares the directory containing the Vagrantfile as the /vagrant directory in the VM. In addition, by default, Vagrant forwards SSH (from port 22 on the VM to port 2222 on the host), so files can be copied between the VM and the host machine using scp. For example, docker.png can be copied from the VM to the host machine by running the following command on the host and typing vagrant for the password when prompted:

scp -P2222 vagrant@localhost:docker.png .

The image can then be built and a container run from it interactively:

docker build -t cas/restwar .

docker run -p 49005:49005 -name restwarcontainer cas/restwar \
    java -jar jetty-runner-8.1.9.v20130131.jar \
    --port 49005 rest-jersey-server.war

Note that with the container running interactively, other vagrant ssh sessions can be opened to run additional commands. If you want to run the web app noninteractively, the command to launch the app server would be included as a final CMD instruction in the Dockerfile (a RUN instruction executes while the image is being built, not when a container starts).

By default, Docker invents a name for a newly started container. The -name argument is used above to name the container in a meaningful way, but does introduce the need to take additional manual steps. If you decide to rerun the container with the command listed above, you must either specify a different container name or delete the one previously created:

ID=$(docker ps -a | grep restwar | awk '{print $1}')

docker rm $ID
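Because grep restwar matches anywhere on the line, the pipeline above also selects any other container created from the cas/restwar image. A stricter variant compares only the name column. The sketch below assumes a docker ps -a layout in which the ID is the first column and the name is the last; the sample listing is made up for illustration, and id_by_name is a hypothetical helper.

```shell
# Print the ID of the container whose NAMES column equals the given name.
# Reads a `docker ps -a` listing on standard input.
id_by_name() {
  awk -v name="$1" 'NR > 1 && $NF == name { print $1 }'
}

# Illustrative listing (not real output): two containers share an image,
# but only one carries the name we are looking for.
sample='CONTAINER ID  IMAGE        COMMAND      CREATED     STATUS     NAMES
4c01db0b339c  cas/restwar  "java -jar"  2 min ago   Exited(0)  restwarcontainer
d7886598dbe2  cas/restwar  "java -jar"  1 hour ago  Exited(0)  other'

printf '%s\n' "$sample" | id_by_name restwarcontainer
```

On a live system the listing would come from docker ps -a | id_by_name restwarcontainer. Docker releases that support container names also accept the name directly, as in docker rm restwarcontainer.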

Docker and Vagrant Networking

One of the confusing bits of working with Docker on OS X or Windows is that it involves a physical machine and two levels of virtualization, as depicted in Table 12-2. The physical machine hosts a Vagrant instance providing a fully virtualized VM on which Docker containers are run. There are several different IP addresses visible from different locations and a number of ports which must be open.

Table 12-2. Project servers

Server	Description
Base machine	Includes VirtualBox software maintained using Vagrant
Vagrant instance	A VirtualBox Linux VM with Docker software installed
Docker instance	Hosts a Jetty server running the web application

A port on the base machine needs to be available to Vagrant. This port is opened by configuring the Vagrantfile. It is also included in the command used to run the Docker instance. From the outside, it appears that the base machine is simply listening and responding on the port. The networking possibilities are extensive, but this example can be set up in only a few steps.

To start, we will open a port in Vagrant so that your host machine will be able to see things running on it. Port forwarding is the practice of specifying ports on the VM to share through a port on the host machine. The port on the VM may be the same as or different from the port on the host machine. In this example, we will forward port 49005 on the Vagrant VM through to port 49005 on the host machine by modifying the Vagrantfile that comes with Docker:

...
Vagrant::Config.run do |config|
  ...
  config.vm.forward_port 49005, 49005
  ...
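The Vagrant::Config.run block above is the original version 1 configuration style. As a hedged sketch for readers on a newer Vagrant, the same forwarding rule in the version "2" syntax is written with config.vm.network. The shell snippet below only prints the fragment (forwarding_fragment is a hypothetical helper name); how it is merged into the Vagrantfile that ships with Docker depends on your setup.

```shell
# Print the version "2" equivalent of the forward_port line shown above.
# guest is the port inside the Vagrant VM; host is the port on the base machine.
forwarding_fragment() {
  cat <<'EOF'
Vagrant.configure("2") do |config|
  config.vm.network "forwarded_port", guest: 49005, host: 49005
end
EOF
}

forwarding_fragment
```

Naming the guest and host ports explicitly makes it harder to swap them by accident when the two numbers differ.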

With the single container running the WAR on Jetty as listed above, the ID and IP address of the container can be determined, and the page accessed, by running a few commands from within the Vagrant VM:

ID=$(docker ps | awk '{print $1}' | grep -v CONTAINER)

IP_ADDRESS=$(docker inspect $ID | grep IPAddress | awk -F'"' '{print $4}')

echo $ID

echo $IP_ADDRESS

curl $IP_ADDRESS:49005/
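The grep and awk pipeline depends on the exact layout of the docker inspect report, which has changed between releases; newer reports can contain more than one IPAddress line. The sketch below takes the first match only. The JSON excerpt is a hand-written sample rather than real output, and first_ip is a hypothetical helper name.

```shell
# Extract the first IPAddress value from a `docker inspect` report on stdin.
first_ip() {
  grep '"IPAddress"' | head -n 1 | awk -F'"' '{ print $4 }'
}

# Hand-written excerpt in the general shape of an inspect report.
sample='[{
    "ID": "4c01db0b339c",
    "NetworkSettings": {
        "IPAddress": "172.17.0.2",
        "Networks": { "bridge": { "IPAddress": "172.17.0.2" } }
    }
}]'

printf '%s\n' "$sample" | first_ip
```

Against a running container, the equivalent would be IP_ADDRESS=$(docker inspect "$ID" | first_ip), with the ID taken from docker ps -q, which prints container IDs without the header row.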

The IP address here is meaningful within the Vagrant VM itself. It is not visible to the outside world. This is where the port forwarding specified in the Vagrantfile comes into play. From the host machine, view http://localhost:49005/ in a browser and you will see the main page from the WAR displayed.

Conclusion

In the 1980s, the term “virtual reality” was popularized by Jaron Lanier. Movies, video games, and sophisticated simulations have benefitted from VR advances since then, but the world of virtualization that has had a larger scale impact among computer professionals is the virtualization of computer hardware itself.

Java’s success is largely due to the Java Virtual Machine, an abstraction layer that hides underlying operating system details. Servlet containers and JEE application servers provide an additional level of abstraction. The development of higher levels of abstraction allows a higher degree of specialization by removing entire classes of problems from the immediate problem space. Client-server applications easily run on highly scalable solutions using modern virtualization and can be discretely packaged for easy deployment due to their structured, compartmentalized architecture. There is obviously tremendous benefit to be found in technologies that are in essence the same as—yet not formally equivalent to—some underlying layer of functionality.

It seemed fitting to open a chapter on virtualization with a quote from a fictional character. A compelling character in a movie, no matter how engaging, is distinct from a real person. Even so, virtualization in its many forms imitates some underlying technology in a way that can make it appear from certain vantage points indistinguishable from the physical representation it emulates.

[4] This URL was too long to fit, and line breaks in URLs are not supported. The actual reference is http://repo2.maven.org/maven2/org/mortbay/jetty/jetty-runner/8.1.9.v20130131/jetty-runner-8.1.9.v20130131.jar.