Request Handlers and Instances - Programming Google App Engine (2012)

Programming Google App Engine

Chapter 4. Request Handlers and Instances

When a request arrives intended for your application code, the frontend routes it to the application servers. If an instance of your app is running and available to receive a user request, App Engine sends the request to the instance, and the instance invokes the request handler that corresponds with the URL of the request. If none of the running instances of the app are available, App Engine starts up a new one automatically. App Engine will also shut down instances it no longer needs.

The instance is your app’s unit of computing power. It provides memory and a processor, isolated from other instances for both data security and performance. Your application’s code and data stay in the instance’s memory until the instance is shut down, providing an opportunity for local storage that persists between requests.

Within the instance, your application code runs in a runtime environment. The environment includes the language interpreter, libraries, and other environment features you selected in your app’s configuration. Your app can also access a read-only filesystem containing its files (those that you did not send exclusively to the static file servers). The environment manages all the inputs and outputs for the request handler, setting up the request at the beginning, recording log messages during, and collecting the response at the end.

If you have multithreading enabled, an instance can handle multiple requests concurrently, with all request handlers sharing the same environment. With multithreading disabled, each instance handles one request at a time. Multithreading is one of the best ways to utilize the resources of your instances and keep your costs low. But it’s up to you to make sure your request handler code runs correctly when handling multiple requests concurrently.

The runtime environment and the instance are abstractions. They rest above, and take the place of, the operating system and the hardware. It is these abstractions that allow your app to scale seamlessly and automatically on App Engine’s infrastructure. At no point must you write code to start or stop instances, load balance requests, or monitor resource utilization. This is provided for you.

In fact, you could almost ignore instances entirely and just focus on request handlers: a request comes in, a request handler comes to life, a response goes out. During its brief lifetime, the request handler makes a few decisions and calls a few services, and leaves no mark behind. The instance only comes into play to give you more control over efficiency: local memory caching, multithreading, and warm-up initialization. You can also configure the hardware profile and parameters of instance allocation, which involve trade-offs of performance and cost.

In this chapter, we discuss the features of the runtime environments. We introduce a way of thinking about request handlers, and how they fit into the larger notion of instances and the App Engine architecture. We also cover how to tune your instances for performance and resource utilization.

The Runtime Environment

All code execution occurs in the runtime environment you have selected for your app. There are four major runtime environments: Java, Python 2.5, Python 2.7, and Go. For this edition of this book, we’re focusing on the Python 2.7 and Java environments.

The runtime environment manages all the interaction between the application code and the rest of App Engine. To invoke an application to handle a request, App Engine prepares the runtime environment with the request data, calls the appropriate request handler code within the environment, then collects and returns the response. The application code uses features of the environment to read inputs, call services, and calculate the response data.

The environment isolates and protects your app to guarantee consistent performance. Regardless of what else is happening on the physical hardware that’s running the instance, your app sees consistent performance as if it is running on a server all by itself. To do this, the environment must restrict the capabilities normally provided by a traditional server operating system, such as the ability to write to the local filesystem.

An environment like this is called a “sandbox”: what’s yours is yours, and no other app can intrude. This sandbox effect also applies to your code and your data. If a piece of physical hardware happens to be running instances for two different applications, the applications cannot read each other’s code, files, or network traffic.

App Engine’s services are similarly partitioned on an app-by-app basis, so each app sees an isolated view of the service and its data. The runtime environment includes APIs for calling these services in the form of language-specific libraries. In a few cases, portions of standard libraries have been replaced with implementations that make service calls.

The Sandbox

The runtime environment does not expose the complete operating system to the application. Some functions, such as the ability to create arbitrary network connections, are restricted. This “sandbox” is necessary to prevent other applications running on the same server from interfering with your application (and vice versa). Instead, an app can perform some of these functions using App Engine’s scalable services, such as the URL Fetch service.

The most notable sandbox restrictions include the following:

§ An app cannot spawn additional processes. All processing for a request must be performed by the request handler’s process. Multiple threads within the process are allowed, but a request handler is not considered finished (and does not return a response) until all threads have exited.

§ An app cannot make arbitrary network connections. Networking features are provided by the App Engine services, such as URL Fetch and Mail.

§ The app does not manipulate the socket connection with the client directly. Instead, the app prepares the response data, then exits. App Engine takes care of returning the response. This isolates apps from the network infrastructre, at the expense of preventing some niceties like streaming partial results data.

§ An app can only read from the filesystem, and can only read its own code and resource files. It cannot create or modify files. Instead of files, an app can use the datastore to save data.

§ An app cannot see or otherwise know about other applications or processes that may be running on the server. This includes other request handlers from the same application that may be running simultaneously.

§ An app cannot read another app’s data from any service that stores data. More generally, an app cannot pretend to be another app when calling a service, and all services partition data between apps.

These restrictions are implemented on multiple levels, both to ensure that the restrictions are enforced and to make it easier to troubleshoot problems that may be related to the sandbox. For example, some standard library calls have been replaced with behaviors more appropriate to the sandbox.

Quotas and Limits

The sandboxed runtime environment monitors the system resources used by the application and limits how much the app can consume. For the resources you pay for, such as running time and storage, you can lift these limits by allocating a daily resource budget in the Administration Console. App Engine also enforces several system-wide limits that protect the integrity of the servers and their ability to serve multiple apps.

In App Engine parlance, “quotas” are resource limits that refresh at the beginning of each calendar day (at midnight, Pacific Time). You can monitor your application’s daily consumption of quotas using the Administration Console, in the Quota Details section.

Since Google may change how the limits are set as the system is tuned for performance, we won’t state some of the specific values of these limits in this book. You can find the actual values of these limits in the official App Engine documentation. Google has said it will give 90 days’ notice before changing limits in a way that would affect existing apps.

Request limits

Several system-wide limits specify how requests can behave. These include the size and number of requests over a period of time, and the bandwidth consumed by inbound and outbound network traffic.

One important request limit is the request timer. An application has 60 seconds to respond to a user request.

Near the end of the 60 seconds, the server raises an exception that the application can catch for the purposes of exiting cleanly or returning a user-friendly error message. For Python, the request timer raises a google.appengine.runtime.DeadlineExceededError. For Java, the request timer throws a com.google.apphosting.api.DeadlineExceededException.

If the request handler has not returned a response or otherwise exited after 60 seconds, the server terminates the process and returns a generic system error (HTTP code 500) to the client.

The 60-second limit applies to user web requests, as well as requests for web hooks such as incoming XMPP and email requests. A request handler invoked by a task queue or scheduled task can run for up to 10 minutes in duration. Tasks are a convenient and powerful tool for performing large amounts of work in the background. We’ll discuss tasks in Chapter 16.

The size of a request is limited to 32 megabytes, as is the size of the request handler’s response.

Service limits

Each App Engine service has its own set of quotas and limits. As with system-wide limits, some can be raised using a billing account and a budget, such as the number of recipients the application has sent emails to. Other limits are there to protect the integrity of the service, such as the maximum size of a response to the URL Fetch service.

In Python, when an app exceeds a service-specific limit or quota, the runtime environment raises a ...runtime.apiproxy_errors.OverQuotaError.

In Java, the service call throws a com.google.apphosting.api.ApiProxy.OverQuotaException when a service limit is exceeded. (Note the apphosting package name here, not appengine.)

With a few notable exceptions, the size of a service call and the size of the service response are each limited to 1 megabyte. This imposes an inherent limit on the size of datastore entities and memcache values. Even though an incoming user request can contain up to 32 megabytes, only 1 megabyte of that data can be stored using a single datastore entity or memcache value.

The datastore has a “batch” API that allows you to store or fetch multiple data objects in a single service call. The total size of a batch request to the datastore is unlimited: you can attempt to store or fetch as many entities as can be processed within an internal timing limit for datastore service calls. Each entity is still limited to 1 megabyte in size.

The memcache also has a batch API. The total size of the request of a batch call to the memcache, or its response, can be up to 32 megabytes. As with the datastore, each memcache value cannot exceed 1 megabyte in size.

The URL Fetch service, which your app can use to connect to remote hosts using HTTP, can issue requests up to 5 megabytes, and receive responses up to 32 megabytes.

We won’t list all the service limits here. Google raises limits as improvements are made to the infrastructure, and numbers printed here may be out-of-date. See the official documentation for a complete list, including the latest values.

Deployment limits

Two limits affect the size and structure of your application’s files. A single application file cannot be larger than 32 megabytes. This applies to resource files (code, configuration) as well as static files. Also, the total number of files for an application cannot be larger than 10,000, including resource files and static files. The total size of all files must not exceed 150 megabytes.

These limits aren’t likely to cause problems in most cases, but some common tasks can approach these numbers. Some third-party libraries or frameworks can be many hundreds of files. Sites consisting of many pages of text or images (not otherwise stored in the datastore) can reach the file count limit. A site offering video or software for download might have difficulty with the 32-megabyte limit.

The Python runtime offers two ways to mitigate the application file count limit. If you have many files of Python code, you can store the code files in a ZIP archive file, then add the path to the ZIP archive to sys.path at the top of your request handler scripts. The request handler scripts themselves must not be in a ZIP archive. Thanks to zipimport, a feature built into Python, the Python interpreter recognizes the ZIP file automatically and unpacks it as needed when importing modules. Unpacking takes additional CPU time, but since imports are cached, the app only incurs this cost the first time the module is imported in a given app instance:

import sys

sys.path.insert(0, 'locales.zip')

import locales.es

The Python App Engine runtime includes a similar mechanism for serving static files from a ZIP archive file, called zipserve. Unlike zipimport, this feature is specific to App Engine. To serve static files from a ZIP archive, add the zipserve request handler to your app.yaml, associated with a URL path that represents the path to the ZIP file:

- url: /static/images/.*

script: $PYTHON_LIB/google/appengine/ext/zipserve

This declares that all requests for a URL starting with /static/images/ should resolve to a path in the ZIP file /static/images.zip.

The string $PYTHON_LIB in the script path refers to the location of the App Engine libraries, and is the only such substitution available. It’s useful precisely for this purpose, to set up a request handler whose code is in the App Engine Python modules included with the runtime environment. (zipserve is not a configurable built-in because it needs you to specify the URL mapping.)

When using zipserve, keep in mind that the ZIP archive is uploaded as a resource file, not a static file. Files are served by application code, not the static file infrastructure. By default, the handler advises browsers to cache the files for 20 minutes. You can customize the handler’s cache duration using the wrapper WSGIApplication. See the source code for google/appengine/ext/zipserve/__init__.py in the SDK for details.

Java applications have a common solution for reducing the file count for application code: JARs. If your app has too many .class files, simply put them in a JAR file by using the jar utility included with the Java development kit. As with Python, app caching reduces the overhead of unpacking JARs. There is no equivalent to zipserve for static files in Java included with the SDK, but it’s easy to write something that behaves similarly using JAR files.

Also, with Java, make sure to use <static-files> and <resource-files> directives in your appengine-web.xml file to exclude the appropriate files. By default, all files outside of WEB-INF/ belong to both groups, and so are counted twice, once for each group. The file count limit is the total count for both groups.

An application can only be uploaded a limited number of times per day, currently 1,000. You may not notice this limit during normal application development. If you are using app deployments to upload data to the application on a regular schedule, you may want to keep this limit in mind.

Billable quotas

Every application gets a limited amount of computing resources for free, so you can start developing and testing your application right away. You can purchase additional computing resources at competitive rates. You only pay for what you actually use, and you specify the maximum amount of money you want to spend.

You can create an app by using the free limits without setting up a billing account. Free apps never incur charges, but are constrained by the free quotas. When you are ready for your app to accept live traffic or otherwise exceed the free quotas, you enable billing for the app, and set a resource budget.

There are two categories of billing accounts: standard, and Premier. Standard account holders pay a minimum weekly charge for each app with billing enabled, in addition to resource charges. (Apps without billing enabled never incur a charge.) Standard accounts are paid via Google Wallet, which uses a credit card or a signature debit card. Premier account holders pay a flat monthly rate for unlimited apps, and only pay additionally for resource charges. Premier accounts are also eligible for monthly invoicing, a service-level agreement, and technical support from Google staff. See the App Engine website for information on pricing and Premier accounts.

Each app gets its own maximum daily resource budget. To enable billing and set the budget for an app, sign in to the Administration Console with the developer account that is to be the billing account. Select Billing Settings from the sidebar. Click the Enable Billing button, and follow the prompts.

For standard accounts, when you set a budget, you are prompted to go through Google Checkout to approve the maximum charge. Setting a budget does not apply a charge to your account, it only sets the maximum.

The budget specifies the amount of money App Engine can “spend” on resources, at the posted rates, over the course of a day. This budget is in addition to the free quotas: the budget is not consumed until after a resource has exceeded its free quota. After the budget for the calendar day is exhausted, service calls that would require more resources raise an exception. If there are not enough resources remaining to invoke a request handler, App Engine will respond to requests with a generic error message. The budget resets at the beginning of each calendar day (Pacific Time).

TIP

It’s worth repeating: you are only charged for the resources your app uses. If you set a high daily resource budget and App Engine only uses a portion of it, you are only charged for that portion. Typically, you would test your app to estimate resource consumption, then set the budget generously so every day comes in under the budget. The budget maximum is there to prevent unexpected surges in resource usage from draining your bank account—a monetary surge protector, if you will. If you’re expecting a spike in traffic (such as for a product launch), you may want to raise your budget in advance of the event.

The owner of the billing account can change the budget and the resource allocations for the app at any time using the Administration Console. A change to your budget takes about 10 minutes to complete, and you will not be able to change the setting again during those 10 minutes.

The official documentation includes a complete list of the free quota limits, the increased free quota limits with billing enabled, the maximum allocation amounts, and the latest billing rates.

The Python Runtime Environment

When an app instance receives a request intended for a Python application, it compares the URL path of the request to the URL patterns in the app’s app.yaml file. As we saw in Configuring a Python App, each URL pattern is associated with either the Python import path for a WSGI application instance, or a file of Python code (a “script”). The first pattern to match the path identifies the code that will handle the request.

If the handler is a WSGI instance, the runtime environment prepares the request and invokes the handler according to the WSGI standard. The handler returns the response in kind.

If the handler is a file of Python code, the runtime environment uses the Common Gateway Interface (CGI) standard to exchange request and response data with the code. The CGI standard uses a combination of environment variables and the application process’s input and output streams to handle this communication.

You’re unlikely to write code that uses the WSGI and CGI interfaces directly. Instead, you’re better off using an established web application framework. Python developers have many web frameworks to choose from. Django, Pyramid (of the Pylons Project), Flask, and web2py are several “full-stack” frameworks that work well with App Engine. For convenience, App Engine includes Django as part of the runtime environment. You can include other frameworks and libraries with your application simply by adding them to your application directory. As we saw in Chapter 2, App Engine also includes a simple framework of its own, called webapp2.

By the time an app instance receives the request, it has already fired up the Python interpreter, ready to handle requests. If the instance has served a request for the application since it was initialized, it may have the application in memory as well, but if it hasn’t, it imports the appropriate Python module for the request. The instance invokes the handler code with the data for the request, and returns the handler’s response to the client.

When you run a Python program loaded from a .py file on your computer, the Python interpreter compiles the Python code to a compact bytecode format, which you might see on your computer as a .pyc file. If you edit your .py source, the interpreter will recompile it the next time it needs it. Since application code does not change after you’ve uploaded your app, App Engine precompiles all Python code to bytecode one time when you upload the app. This saves time when a module or script is imported for the first time in each instance of the app.

The Python interpreter remains in the instance memory for the lifetime of the instance. The interpreter loads your code according to Python’s module import semantics. Typically, this means that once a module is imported for the first time on an instance, subsequent attempts to import it do nothing, since the module is already loaded. This is true across multiple requests handled by the same instance.

The Python 2.7 runtime environment uses a modified version of the official Python 2.7 interpreter, sometimes referred to as “CPython” to distinguish it from other Python interpreters. The application code must run entirely within the Python interpreter. That is, the code must be purely Python code, and cannot include or depend upon extensions to the interpreter. Python modules that include extensions written in C cannot be uploaded with your app or otherwise added to the runtime environment. The “pure Python” requirement can be problematic for some third-party libraries, so be sure that libraries you want to use operate without extensions.

A few popular Python libraries, including some that depend on C code, are available within the runtime environment. Refer back to Python Libraries for more information.

App Engine sets the following environment variables at the beginning of each request, which you can access using os.environ:

APPLICATION_ID

The ID of the application. The ID is preceded by s~ when running on App Engine, and dev~ when running in a development server.

CURRENT_VERSION_ID

The ID of the version of the app serving this request.

AUTH_DOMAIN

This is set to gmail.com if the user is signed in using a Google Account, or the domain of the app if signed in with a Google Apps account; not set otherwise.

SERVER_SOFTWARE

The version of the runtime environment; starts with the word Development when running on the development server.

For example:

import os

# ...

if os.environ['SERVER_SOFTWARE'].startswith('Development'):

# ... only executed in the development server ...

The Python interpreter prevents the app from accessing illegal system resources at a low level. Since a Python app can consist only of Python code, an app must perform all processing within the Python interpreter.

For convenience, portions of the Python standard library whose only use is to access restricted system resources have been disabled. If you attempt to import a disabled module or call a disabled function, the interpreter raises an ImportError. The Python development server enforces the standard module import restrictions, so you can test imports on your computer.

Some standard library modules have been replaced with alternate versions for speed or compatibility. Other modules have custom implementations, such as zipimport.

The Java Runtime Environment

The Java runtime environment behaves like a J2EE servlet container. When the app instance receives a request, it determines the servlet class to call by comparing the URL path to the servlet mappings in the deployment descriptor. The server uses the standard servlet interface to invoke the servlet, passing it a populated request object and an empty response object. The application’s servlet code populates the response object and exits, and App Engine returns the response to the client.

The Java runtime environment uses the Java 6 virtual machine (JVM). The JVM runs Java bytecode, which is what you get from passing Java code to a Java compiler. It’s also what you get from compilers for other languages that produce Java bytecode, such as Scala, and from interpreters for other languages implemented in Java bytecode, such as JRuby (Ruby), Rhino (JavaScript), Groovy, and even Jython (a Python interpreter implemented in Java). You can use any language that compiles to or has an interpreter for the JVM to write applications for App Engine, as long as the result implements a servlet interface.

Having a complete JVM also means you can use many third-party libraries with your application. Some restrictions apply—we’ll look at a few in a moment—but in most cases, using a library is a simple matter of including the JAR or class files in the application’s WAR.

An app can ask for information about the current environment by using the SystemProperty API, in the com.google.appengine.api.utils package. App Engine sets static fields of this class to the application ID (applicationId), the application version (applicationVersion), the version of the runtime environment (version), and whether the app is running in the development environment or on App Engine (environment):

import com.google.appengine.api.utils.SystemProperty;

// ...

String applicationId = SystemProperty.applicationId.get();

if (SystemProperty.environment.value() ==

SystemProperty.Environment.Value.Development) {

// ... only executed in the development server ...

}

In the Java runtime environment, sandbox restrictions are enforced within the JVM. These restrictions are implemented using a combination of JVM permissions, a Java Runtime Environment (JRE) class whitelist, and alternate implementations for specific functions. This fine-grained approach allows more third-party libraries to work and makes other code easier to port than relying on JVM permissions alone.

The Java runtime environment includes a subset of the JRE classes. You can find a complete list of supported JRE classes in the official documentation. The development server enforces this list, so if your code (or some library code) crosses a line, the development server throws an exception.

Reflection is supported for all the app’s own classes. Custom class loaders are supported, but with all classes granted the same permissions. Native JNI code is not supported.

The Request Handler Abstraction

Let’s review what we know so far about request handlers. A request handler is an entry point into the application code, mapped to a URL pattern in the application configuration. Here is a section of configuration for a request handler in a Python app, which would appear in the app.yaml file:

handlers:

- url: /profile/.*

script: users.profile.app

A Python source file named users/profile.py contains a WSGI application instance in a variable named app. This code knows how to invoke the webapp2 framework to handle the request, which in turn calls our code:

import jinja2

import os

import webapp2

from google.appengine.api import users

from google.appengine.ext import db

class UserProfile(db.Model):

user = db.UserProperty()

template_env = jinja2.Environment(

loader=jinja2.FileSystemLoader(os.path.dirname(__file__))

class ProfileHandler(webapp2.RequestHandler):

def get(self):

# Call the Users service to identify the user making the request,

# if the user is signed in.

current_user = users.get_current_user()

# Call the Datastore service to retrieve the user's profile data.

profile = None

if current_user:

profile = UserProfile.all().filter('user', current_user).fetch(1)

# Render a response page using a template.

template = template_env.get_template('profile.html')

self.response.out.write(template.render({'profile': profile})

app = webapp2.WSGIApplication([('/profile/?', ProfileHandler)], debug=True)

When a user visits the URL path /profile/ on this application’s domain, App Engine matches the request to users.profile.app via the application configuration, and then invokes it to produce the response. The WSGIApplication creates an object of the ProfileHandler class with the request data, then calls its get() method. The method code makes use of two App Engine services, the Users service and the Datastore service, to access resources outside of the app code. It uses that data to make a web page, then exits.

In theory, the application process only needs to exist long enough to handle the request. When the request arrives, App Engine figures out which request handler it needs, makes room for it in its computation infrastructure, and creates it in a runtime environment. Once the request handler has created the response, the show is over, and App Engine is free to purge the request handler from memory. If the application needs data to live on between requests, it stores it by using a service like the datastore. The application itself does not live long enough to remember anything on its own.

Figure 4-1 illustrates this abstract lifecycle of a request handler.

Request handlers in the abstract: 1. A request arrives; 2. A request handler is created; 3. The request handler calls services and computes the response; 4. The request handler terminates, the response is returned

Figure 4-1. Request handlers in the abstract: 1. A request arrives; 2. A request handler is created; 3. The request handler calls services and computes the response; 4. The request handler terminates, the response is returned

On App Engine, a web application can handle many requests simultaneously. There could be many request handlers active at any given moment, in any stage of its lifecycle. As shown in Figure 4-2, all these request handlers access the same services.

A web application handles many requests simultaneously; all request handlers access the same services

Figure 4-2. A web application handles many requests simultaneously; all request handlers access the same services

Each service has its own specification for managing concurrent access from multiple request handlers, and for the most part, a request handler doesn’t have to think about the fact that other request handlers are in play. The big exception here is datastore transactions, which we’ll discuss in detail in Chapter 7.

The request handler abstraction is useful for thinking about how to design your app, and how the service-oriented architecture is justified. App Engine can create an arbitrary number of request handlers to handle an arbitrary number of requests simultaneously, and your code barely has to know anything about it. This is how your app scales with traffic automatically.

Introducing Instances

The idea of a web application being a big pot of bubbling request handlers is satisfying, but in practice, this abstraction fails to capture an important aspect of real-world system software. Starting a program for the first time on a fresh system can be expensive: code is read into RAM from disk, memory is allocated, data structures are set up with starting values, and configuration files are read and parsed. App Engine initializes new runtime environments prior to using them to execute request handlers, so the environment initialization cost is not incurred during the handler execution. But application code often needs to perform its own initialization that App Engine can’t do on its own ahead of time. The JVM and Python interpreters are designed to exploit local memory, and many web application frameworks perform initialization, expecting the investment to pay off over multiple requests. It’s wasteful and impractical to do this at the beginning of every request handler, while the user is waiting.

App Engine solves this problem with instances, long-lived containers for request handlers that retain local memory. At any given moment, an application has a pool of zero or more instances allocated for handling requests. App Engine routes new requests to available instances. It creates new instances as needed, and shuts down instances that are excessively idle. When a request arrives at an instance that has already handled previous requests, the instance is likely to have already done the necessary preparatory work, and can serve the response more quickly than a fresh instance.

The picture now looks something like Figure 4-3. The request handler still only lives as long as it takes to return the response, but its actions can now affect instance memory. This instance memory remains available to the next request handler that executes inside the instance.

An instance handles a request, while another instance sits idle

Figure 4-3. An instance handles a request, while another instance sits idle

Keep in mind that instances are created and destroyed dynamically, and requests are routed to instances based purely on availability. While instances are meant to live longer than request handlers, they are as ephemeral as request handlers, and any given request may be handled by a new instance. There is no guarantee that requests of a particular sort will always be handled by the same instance, nor is it assured that an instance will still be around after a given request is handled. Outside of a request handler, the application is not given the opportunity to rescue data from local memory prior to an instance being shut down. If you need to store user-specific information (such as session data), you must use a storage service. Instance memory is only suitable for local caching.

Instances can provide another crucial performance benefit: multithreading. With multithreading enabled in your application configuration, an instance will start additional request handlers in separate threads as local resources allow, and execute them concurrently. All threads share the same instance memory just like any other multithreaded application—which means your code must take care to protect shared memory during critical sections of code. Java, Python, and Go have language and library features for synchronizing access to shared memory.

Figure 4-4 illustrates an instance with multithreading enabled. Refer to Multithreading for information on how to enable or disable multithreading in application configuration.

A multithreaded instance handles multiple requests concurrently

Figure 4-4. A multithreaded instance handles multiple requests concurrently

Instance uptime is App Engine’s billable unit for computation, measured in fractions of an instance hour. This makes multithreading an important technique for maximizing throughput and minimizing costs. Most request handlers will spend a significant amount of time waiting for service calls, and a multithreaded instance can use the CPU for other handlers during that time.

Request Scheduling and Pending Latency

App Engine routes each request to an available instance. If all instances are busy, App Engine starts a new instance.

App Engine considers an instance to be “available” for a request if it believes the instance can handle the request in a reasonable amount of time. With multithreading disabled, this definition is simple: an instance is available if it is not presently busy handling a request.

With multithreading enabled, App Engine decides whether an instance is available based on several factors. It considers the current load on the instance (CPU and memory) from its active request handlers, and its capacity. It also considers historical knowledge of the load caused by previous requests to the given URL path. If it seems likely that the new request can be handled effectively in the capacity of an existing instance, the request is scheduled to that instance.

Incoming requests are put on a pending queue in preparation for scheduling. App Engine will leave requests on the queue for a bit of time while it waits for existing instances to become available, before deciding it needs to create new instances. This waiting time is called the pending latency.

You can configure a maximum pending latency and a minimum pending latency for your app. By default, App Engine will determine appropriate latency bounds automatically.

The maximum pending latency is the most amount of time a request will wait on the pending queue before App Engine decides more instances are needed to handle the current level of traffic. Lowering the maximum pending latency potentially reduces the average wait time, at the expense of activating more instances. Conversely, raising the maximum favors reusing existing instances, at the expense of potentially making the user wait a bit longer for a response.

The minimum pending latency specifies a minimum amount of time a request must be on the pending queue before App Engine can conclude a new instance needs to be started. Raising the minimum encourages App Engine to be more conservative about creating new instances. (This minimum only refers to creating new instances. Naturally, if an existing instance is available for a pending request, the request is scheduled immediately.)

To adjust the pending latency bounds for your app, go to the Administration Console. Select the Application Settings panel, then scroll down to the Performance heading to find the Pending Latency widget (pictured in Figure 4-5). Adjust the minimum and maximum sliders as needed, and then click the Save Settings button.

The Pending Latency control in the Administration Console, under Application Settings

Figure 4-5. The Pending Latency control in the Administration Console, under Application Settings

Warm-up Requests

There is a period of time between the moment App Engine decides it needs a new instance and the moment the instance is available to handle the next request off the request queue. During this time, App Engine initializes the instance on the server hardware, sets up the runtime environment, and makes the app files available to the instance. App Engine takes this preparation period into account when scheduling request handlers and instance creation.

The goal is to make the instance as ready as possible prior to handling the first request, so when the request handler begins, the user only waits on the request handler logic, not the initialization. But App Engine can only do so much on its own. Many initialization tasks are specific to your application code. For instance, App Engine can’t automatically import every module in a Python app, because imports execute code, and an app may need to import modules selectively.

App-specific initialization potentially puts undue burden on the first request handler to execute on a fresh instance. A “loading request” typically takes longer to execute than subsequent requests handled by the same instance. This is common enough that App Engine will add a log message automatically when a request is the first request for an instance, so you can detect a correlation between performance issues and app initialization.

You can mitigate the impact of app initialization with a feature called warm-up requests. With warm-up requests enabled, App Engine will attempt to issue a request to a specific warm-up URL immediately following the creation of a new instance. You can associate a warm-up request handler with this URL to perform initialization tasks that are better performed outside of a user-facing request handler.

To enable warm-up requests, activate the warmup inbound service in your app configuration. (Refer to Inbound Services.) In Python, set this in your app.yaml file:

inbound_services:

- warmup

In Java, set this in your appengine-web.xml file:

<inbound-services>

<service>warmup</service>

</inbound-services>

Warm-up requests are issued to this URL path:

/_ah/warmup

You bind your warm-up request handler to this URL path in the usual way.

NOTE

There are a few rare cases where an instance will not receive a warm-up request prior to the first user request even with warm-up requests enabled. Make sure your user request handler code does not depend on the warm-up request handler having already been called on the instance.

Resident Instances

Instances stick around for a while after finishing their work, in case they can be reused to handle more requests. If App Engine decides it’s no longer useful to keep an instance around, it shuts down the instance. An instance that is allocated but is not handling any requests is considered anidle instance.

Instances that App Engine creates and destroys as needed by traffic demands are known as dynamic instances. App Engine uses historical knowledge about your app’s traffic to tune its algorithm for dynamic instance allocation to find a balance between instance availability and efficient use of resources.

You can adjust how App Engine allocates instances by using two settings: minimum idle instances and maximum idle instances.

The minimum idle instances setting ensures that a number of instances are always available. The actual number of idle instances will still fluctuate with traffic, but increasing this number will cause App Engine to be more aggressive about starting new instances as it tries to ensure the minimum.

Setting a nonzero minimum for idle instances also ensures that at least this many instances are never terminated due to low traffic. Because App Engine does not start and stop these instances due to traffic fluctuations, these instances are not dynamic; instead, they are known as resident instances.

You must enable warm-up instances to set the minimum idle instances to a nonzero value.

Reserving resident instances can help your app handle sharp increases in traffic. For example, you may want to increase the resident instances prior to launching your product or announcing a new feature. You can reduce them again as traffic fluctuations return to normal. You might also reserve instances prior to executing a large batch job, or keep instances available for task queues, so spikes in load are less likely to affect end users.

App Engine only maintains resident instances for the default version of your app. While you can make requests to nondefault versions, only dynamic instances will be created to handle those requests. When you change the default version (in the Versions panel of the Administration Console), the previous resident instances are allowed to finish their current request handlers, then they are shut down and new resident instances running the new default version are created.

TIP

Resident instances are billed at the same rate as dynamic instances. Be sure you want to pay for 24 instance hours per day per resident instance before changing this setting.

The maximum idle instances setting adjusts how aggressively App Engine terminates idle instances above the minimum. Increasing the maximum causes idle dynamic instances to live longer; decreasing the maximum causes them to die more quickly. A larger maximum is useful for keeping more dynamic instances available for rapid fluctuations in traffic, at the expense of greater unused (dynamic) capacity. The name “maximum idle instances” is not entirely intuitive, but it opposes “minimum idle instances” in an obvious way: the maximum can’t be lower than the minimum.

To adjust the minimum and maximum idle instances settings for your app, go to the Administration Console. Select the Application Settings panel, then scroll down to the Performance heading to find the Idle Instances widget (pictured in Figure 4-6). Adjust the minimum and maximum sliders as needed, and then click the Save Settings button.

The Idle Instances control in the Administration Console, under Application Settings; the Console warns that warm-up requests must be enabled before setting a minimum

Figure 4-6. The Idle Instances control in the Administration Console, under Application Settings; the Console warns that warm-up requests must be enabled before setting a minimum

The Instances Console

The Administration Console includes a panel for inspecting your app’s currently active instances. A portion of such a panel is shown in Figure 4-7.

An excerpt of the Instances panel of the Administration Console

Figure 4-7. An excerpt of the Instances panel of the Administration Console

You can use this panel to inspect the general behavior of your application code running in an instance. This includes summary information about the number of instances, and averages for QPS, latency, and memory usage per instance over the last minute of activity. Each active instance is also itemized, with its own QPS and latency averages, total request and error counts over the lifetime of the instance, the age of the instance, current memory usage, and whether the instance is resident or dynamic. You can query the logs for requests handled by the individual instance.

You can also shut down an instance manually from this panel. If you shut down a resident instance, a new resident instance will be started in its place, effectively like restarting the instance. If you shut down a dynamic instance, a new instance may or may not be created as per App Engine’s algorithm and the app’s idle instance settings.

As with several other Console panels, the Instances panel is specific to the selected version of your app. If you want to inspect instances handling requests for a specific app version, be sure to select it from the Console’s app version drop-down at the top of the screen.

Instance Hours and Billing

Instance use is a resource measured in instance hours. An instance hour corresponds to an hour of clock time that an instance is alive. An instance is on the clock regardless of whether it is actively serving traffic or is idle, or whether it is resident or dynamic.

Each instance incurs a mandatory charge of 15 minutes, added to the end of the instance’s lifespan. This accounts for the computational cost of instance creation and other related resources. This is one reason why you might adjust the minimum pending latency and maximum idle instances settings to avoid excess instance creation.

The free quotas include a set of instance hours for dynamic instances. The free quota for dynamic instances is enough to power one instance of the most basic class (“F1”) continuously, plus a few extra hours per day.

When you set a resource budget for your app, you can also set up a plan to prepurchase instance hours at a discounted rate on a weekly basis. Once your app is up and receiving live traffic, you may be able to establish a lower bound for the number of instance hours your app consumes on a regular basis. You can save money by prepurchasing these hours at the discounted rate.

Instance Classes

Each instance gets a fixed amount of memory and CPU resources. You can select from one of several configurations of memory and CPU, called the instance class. The most basic instance class, the “F1” class, provides the equivalent of a 600 MHz CPU with 128 megabytes of memory.

Each class is defined as a multiple of the “F1” class, and is billed accordingly. An instance of the “F2” class provides a 1,200 MHz CPU and 256 megabytes of memory, twice as much of each resource as an “F1” instance. An instance hour of the “F2” class costs twice as much as an instance hour of the “F1” class. There’s also an “F4” class, which is twice as powerful, and twice as expensive, as the “F2” class. (As usual, see the official App Engine website for the latest prices.)

The free quota and prepurchased discount hours are measured in the units of the “F1” class. Instances of higher classes consume these resources at multiplied rates.

You can select the instance class for user requests in the Administration Console: select the Application Settings screen, and scroll down to Performance. Select the desired Frontend Instance Class from the drop-down menu, then click the Save Settings button. When you change this setting, all resident instances are restarted to use the new class, and all new dynamic instances are created using this class.