Configuring an Application - Programming Google App Engine with Python (2015)

Chapter 3. Configuring an Application

Many of App Engine’s features can be tailored and controlled using configuration files that you deploy alongside your code. A few of these features apply to the entire project’s use of a service, such as datastore index configuration (which we’ll cover in Chapter 7). The most important configuration file controls how App Engine manages incoming requests, how App Engine runs your code on its scalable servers, and how App Engine routes requests to your code on those servers.

To build an App Engine application, you write code for one or more request handlers, and provide configuration describing to App Engine which requests go to which handlers. The life of a request handler begins when a single request arrives, and ends when the handler has done the necessary work and computed the response. App Engine does all the heavy lifting of accepting incoming TCP/IP connections, reading HTTP request data, ensuring that an instance of your app is running on an application server, routing the request to an available instance, calling the appropriate request handler code in your app, and collecting the response from the handler and sending it back over the connection to the client.

The system that manages and routes requests is known generally as the App Engine frontend. You can configure the frontend to handle different requests in different ways. For instance, you can tell the frontend to route requests for some URLs to App Engine’s static file servers instead of the application servers, for efficient delivery of your app’s images, CSS, or JavaScript code. If your app takes advantage of Google Accounts for its users, you can tell the frontend to route requests from signed-in users to your application’s request handlers, and to redirect all other requests to the Google Accounts sign-in screen. The frontend is also responsible for handling requests over secure connections, using HTTP over SSL/TLS (sometimes called “HTTPS,” the URL scheme for such requests). Your app code only sees the request after it has been decoded, and the frontend takes care of encoding the response.

In this chapter, we take a look at App Engine’s request handling architecture, and follow the path of a web request through the system. We discuss how to configure the system to handle different kinds of requests, including requests for static content, requests for the application to perform work, and requests over secure connections. We also cover other frontend features such as custom error pages, and application features you can activate called “built-ins.”

We’ll also take this opportunity to discuss related features that you manage from the Cloud Console, including setting up custom domain names and SSL/TLS certificates.

The App Engine Architecture

The architecture of App Engine—and therefore an App Engine application—can be summarized as shown in Figure 3-1. (There are some lines missing from this simplified diagram. For instance, frontends have direct access to Cloud Storage for serving large data objects from app URLs. We’ll take a closer look at these in later chapters.)

Figure 3-1. The App Engine request handling architecture

The first stop for an incoming request is the App Engine frontend. A load balancer, a dedicated system for distributing requests optimally across multiple machines, routes the request to one of many frontend servers. The frontend determines the app for which the request is intended from the request’s domain name, either the associated custom domain or subdomain, or the appspot.com subdomain. It then consults the app’s configuration to determine the next step.

The app’s configuration describes how the frontends should treat requests based on their URL paths. A URL path may map to a static file that should be served to the client directly, such as an image or a file of JavaScript code. Or, a URL path may map to a request handler, application code that is invoked to determine the response for the request. You upload this configuration data along with the rest of your application.

If the URL path for a request does not match anything in the app’s configuration, the frontends return an HTTP 404 Not Found error response to the client. By default, the frontends return a generic error response. If you want clients to receive a custom response when accessing your app, such as a friendly HTML message along with the error code, you can configure the frontend to serve a static HTML file. (In the case of Not Found errors, you can also just map all unmatched URL paths to an application handler, and respond any way you like.)

If the URL path of the request matches the path of one of the app’s static files, the frontend routes the request to the static file servers. These servers are dedicated to the task of serving static files, with network topology and caching behavior optimized for fast delivery of resources that do not change often. You tell App Engine about your app’s static files in the app’s configuration. When you upload the app, these files are pushed to the static file servers.

If the URL path of the request matches a pattern mapped to one of the application’s request handlers, the frontend sends the request to the app servers. The app server pool starts up an instance of the application on a server, or reuses an existing instance if there is one already running. The server invokes the app by calling the request handler that corresponds with the URL path of the request, according to the app configuration.

An instance is a copy of your application in the memory of an application server. The instance is isolated from whatever else is running on the machine, set up to perform equivalently to a dedicated machine with certain hardware characteristics. The code itself executes in a runtime environment prepared with everything the request handler needs to inspect the request data, call services, and evaluate the app’s code. There’s enough to say about instances and the runtime environment that an entire chapter is dedicated to the subject (Chapter 4).

You can configure the frontend to authenticate the user with Google Accounts. The frontend can restrict access to URL paths with several levels of authorization: all users, users who have signed in, and users who are application administrators. The frontend checks whether the user is signed in, and redirects the user to the Google Accounts sign-in screen if needed.

The frontend takes the opportunity to tailor the response to the client. Most notably, the frontend compresses the response data, using the gzip format, if the client gives some indication that it supports compressed responses. This applies to both app responses and static file responses, and is done automatically. The frontend uses several techniques to determine when it is appropriate to compress responses, based on web standards and known browser behaviors. If you are using a custom client that does not support compressed content, simply omit the “Accept-Encoding” request header to disable the automatic gzip behavior. Similarly, for clients that support the SPDY protocol, App Engine will use SPDY instead of HTTP 1.1, automatically and invisibly.

The frontends, app servers, and static file servers are governed by an “app master.” Among other things, the app master is responsible for deploying new versions of application software and configuration, and updating the “default” version served on an app’s user-facing domain. Updates to an app propagate quickly, but are not atomic in the sense that only code from one version of an app is running at any one time. If you switch the default version to new software, all requests that started before the switch are allowed to complete using their version of the software. If you have an app that makes an HTTP request to itself, you might run into a situation where an older version is calling a newer version or vice versa, but you can manage this in code if needed.

Configuring a Python App

The files for a Python application include Python code for request handlers and libraries, static files, and configuration files. On your computer, these files reside in the application root directory. Static files and application code may reside in the root directory or in subdirectories. Configuration files always reside in fixed locations in the root directory.

You configure the frontend for a Python application using a file named app.yaml in the application root directory. This file is in a format called YAML, a concise human-readable data format with support for nested structures like sequences and mappings.

Example 3-1 shows an example of a simple app.yaml file. We’ll discuss these features in the following sections. For now, notice a few things about the structure of the file:

§ The file is a mapping of values to names. For instance, the value python27 is associated with the name runtime.

§ Values can be scalars (python27, 1), sequences of other values, or mappings of values to names. The value of handlers in Example 3-1 is a sequence of two values, each of which is a mapping containing two name-value pairs.

§ Order is significant in sequences, but not mappings.

§ YAML uses indentation to indicate scope.

§ YAML supports all characters in the Unicode character set. The encoding is assumed to be UTF-8 unless the file uses a byte order mark signifying UTF-16.

§ A YAML file can contain comments. All characters on a line after a # character are ignored, unless the # is in a quoted string value.

Example 3-1. An example of an app.yaml configuration file

application: ae-book
version: 1
runtime: python27
api_version: 1
threadsafe: true

handlers:
- url: /css
  static_dir: css
- url: /.*
  script: main.application

libraries:
- name: webapp2
  version: "2.5.2"

Runtime Versions

Among other things, this configuration file declares that this application (or, specifically, this version of this application) uses the Python 2.7 runtime environment. It also declares which version of the Python 2.7 runtime environment to use. Currently, there is only one version of this environment, so api_version is always 1. If Google ever makes changes to the runtime environment that may be incompatible with existing applications, the changes may be released using a new version number. Your app will continue to use the version of the runtime environment specified in your configuration file, giving you a chance to test your code with the new runtime version before upgrading your live application.

You specify the name and version of the runtime environment in app.yaml, using the runtime and api_version elements, like so:

runtime: python27
api_version: 1

Google originally launched App Engine with a runtime environment based on Python 2.5. You can use this older environment by specifying a runtime of python. Note that this book mostly covers the newer Python 2.7 environment. You’ll want to use Python 2.7 for new apps, as many recent features only work with the newer environment.

App IDs and Versions

Every App Engine application has an application ID that uniquely distinguishes the app from all other applications. As described in Chapter 2, you can register an ID for a new application using the Cloud Console. Once you have an ID, you add it to the app’s configuration so the developer tools know that the files in the app root directory belong to the app with that ID. This ID appears in the appspot.com domain name:

app-id.appspot.com

The app’s configuration also includes a version identifier. Like the app ID, the version identifier is associated with the app’s files when the app is uploaded. App Engine retains one set of files and frontend configuration for each distinct version identifier used during an upload. If you do not change the app version in the configuration before you upload files, the upload replaces the existing files for that version.

Each distinct version of the app is accessible at its own domain name, of the following form:

version-id.app-id.appspot.com

When you have multiple versions of an app uploaded to App Engine, you can use the Cloud Console to select which version is the one you want the public to access. The Console calls this the “default” version. When a user visits your custom domain or the appspot.com domain without the version ID, she sees the default version.

The appspot.com domain containing the version ID supports an additional domain part, just like the default appspot.com domain:

anything.version-id.app-id.appspot.com

NOTE

Unless you explicitly prevent it, anyone who knows your application ID and version identifiers can access any uploaded version of your application using the appspot.com URLs. You can restrict access to nondefault versions of the application by using code that checks the domain of the request and only allows authorized users to access the versioned domains. You can’t restrict access to static files this way.

Another way to restrict access to nondefault versions is to use Google Accounts authorization, described later in this chapter. You can restrict access to app administrators while a version is in development, then replace the configuration to remove the restriction just before making that version the default version.

All versions of an app access the same datastore, memcache, and other services, and all versions share the same set of resources. Later on, we’ll discuss other configuration files that control these backend services. These files are separate from the configuration files that control the frontend because they are not specific to each app version.

There are several ways to use app versions. For instance, you can have just one version, and always update it in place. Or you can have a “dev” version for testing and a “live” version that is always the public version, and do separate uploads for each. Some developers generate a new app version identifier for each upload based on the version numbers used by a source code revision control system.

You can have up to 60 active versions, if billing is enabled for the app. You can delete previous versions, using the Cloud Console or the appcfg.py command.

Application IDs and version identifiers can contain numbers, lowercase letters, and hyphens.
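
As a minimal sketch of the character rule just stated, the following check accepts only digits, lowercase letters, and hyphens. The function name is hypothetical, and App Engine may enforce further constraints (such as length limits) that this sketch does not cover.

```python
import re

# Only the character rule from the text: digits, lowercase letters, hyphens.
# Any additional rules App Engine applies are NOT modeled here.
_ID_CHARS = re.compile(r'[a-z0-9-]+\Z')


def has_valid_id_characters(identifier):
    """Return True if identifier uses only characters allowed in IDs."""
    return bool(_ID_CHARS.match(identifier))
```

For instance, has_valid_id_characters('ae-book') is true, while an identifier containing uppercase letters or underscores is rejected.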

The application ID and version identifier appear in the app.yaml file. The app ID is specified with the name application. The version ID is specified as version.

Here is an example of app.yaml using dev as the version identifier:

application: ae-book
version: dev

This would be accessible using this domain name:

http://dev.ae-book.appspot.com/

Multithreading

The Python 2.7 runtime environment supports handling multiple requests concurrently within each instance. This is a significant way to make the most of your instances, and is recommended. However, your code must be written with the knowledge that it will be run concurrently, and take the appropriate precautions with shared data. You must declare whether your code is “threadsafe” in your application configuration.

To set this preference, specify the threadsafe value in app.yaml, either true or false:

threadsafe: true

Request Handlers

The app configuration tells the frontend what to do with each request, routing it to either the application servers or the static file servers. The destination is determined by the URL path of the request. For instance, an app might send all requests whose URL paths start with /images/ to the static file server, and all requests for the site’s home page (the path /) to the app servers. The configuration specifies a list of patterns that match URL paths, with instructions for each pattern.

For requests intended for the app servers, the configuration also specifies the request handler responsible for specific URL paths. A request handler is an entry point into the application code. With the Python runtime environment, this entry point is an object that conforms to the Web Server Gateway Interface (WSGI). All Python web application frameworks provide this object for use with application containers such as App Engine.

NOTE

The URL /form is reserved by App Engine and cannot be used by the app. The explanation for this is historical and internal to App Engine, and unfortunately this is easy to stumble upon by accident. This URL will always return a 404 Not Found error.

All URL paths under /_ah/ are reserved for use by App Engine libraries and tools.

All URL paths for Python apps are described in the app.yaml file, using the handlers element. The value of this element is a sequence of mappings, where each item includes a pattern that matches a set of URL paths and instructions on how to handle requests for those paths. Here is an example with four URL patterns:

handlers:
- url: /profile/.*
  script: userprofile.application

- url: /css
  static_dir: css

- url: /info/(.*\.xml)
  static_files: datafiles/\1
  upload: datafiles/.*\.xml

- url: /.*
  script: main.application

The url element in a handler description is a regular expression that matches URL paths. Every path begins with a forward slash (/), so a pattern can match the beginning of a path by also starting with this character. This URL pattern matches all paths:

url: /.*

If you are new to regular expressions, here is the briefest of tutorials: the . character matches any single character, and the * character says the previous symbol, in this case any character, can occur zero or more times. There are several other characters with special status in regular expressions. All other characters, like /, match literally. So this pattern matches any URL that begins with a / followed by zero or more of any character.

If a special character is preceded by a backslash (\), it is treated as a literal character in the pattern. Here is a pattern that matches the exact path /home.html:

- url: /home\.html

See the Python documentation for the re module for an excellent introduction to regular expressions. The actual regular expression engine used for URL patterns is not Python’s, but it’s similar.

App Engine attempts to match the URL path of a request to each handler pattern in the order the handlers appear in the configuration file. The first pattern that matches determines the handler to use. If you use the catchall pattern /.*, make sure it’s the last one in the list, as a later pattern will never match.
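
The first-match rule can be illustrated with a short sketch. It uses Python's re module as a stand-in for App Engine's own (similar but not identical) pattern engine, and it assumes a pattern must match the entire path. The handler table, the home.application target, and the function name are hypothetical.

```python
import re

# Hypothetical handler table: (URL pattern, target) pairs in app.yaml order.
HANDLERS = [
    (r'/profile/.*', 'userprofile.application'),
    (r'/home\.html', 'home.application'),
    (r'/.*', 'main.application'),  # catchall, deliberately last
]


def find_handler(path, handlers=HANDLERS):
    """Return the target of the first pattern that matches the whole path."""
    for pattern, target in handlers:
        # Assumption: the pattern must cover the entire path, not a prefix.
        if re.match('(?:%s)\\Z' % pattern, path):
            return target
    return None  # no pattern matched; the frontend would answer 404
```

With this table, find_handler('/profile/alice') returns 'userprofile.application'. If the catchall pattern were moved to the top of the list, every path would resolve to 'main.application' and the later patterns would never be consulted.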

To map a URL path pattern to application code, you provide a script element. The value is the Python import path (with dots) to a global variable containing a WSGI application instance.1 The application root directory is in the lookup path, so in the preceding example, main.application could refer to the application variable in a Python source file named main.py:

import webapp2

class MainPage(webapp2.RequestHandler):
    def get(self):
        # ...
        pass

application = webapp2.WSGIApplication([('/', MainPage)], debug=True)

If the frontend gets a request whose path matches a script handler, it routes the request to an application server to invoke the script and produce the response.

In the previous example, the following handler definition routes all URL paths that begin with /profile/ to the application defined in a source file named userprofile.py:

- url: /profile/.*
  script: userprofile.application

The URL pattern can use regular expression groups to determine other values, such as the script path. A group is a portion of a regular expression inside parentheses, and the group’s value is the portion of the request URL that matches the characters within (not including the parentheses). Groups are numbered starting with 1 from left to right in the pattern. You can insert the value of a matched group into a script path or other values with a backslash followed by the group number (\1). For example:

- url: /project/(.*?)/home
  script: apps.project_code.\1.app

With this pattern, a request for /project/registration/home would be handled by the WSGI application at apps.project_code.registration.app.
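
The group substitution can be reproduced with Python's re module, which is only an approximation of the engine App Engine actually uses. This is a sketch under that assumption; the function name is hypothetical, and the pattern is assumed to match the whole path.

```python
import re


def expand_script_path(url_pattern, script_template, path):
    """Fill \\1, \\2, ... in script_template from groups matched in path."""
    # The non-capturing wrapper keeps group numbering unchanged.
    match = re.match('(?:%s)\\Z' % url_pattern, path)
    if match is None:
        return None  # this handler does not apply to the path
    return match.expand(script_template)


script = expand_script_path(r'/project/(.*?)/home',
                            r'apps.project_code.\1.app',
                            '/project/registration/home')
# script is 'apps.project_code.registration.app'
```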

Static Files and Resource Files

Most web applications have a set of files that are served verbatim to all users, and do not change as the application is used. These can be media assets like images used for site decoration, CSS stylesheets that describe how the site should be drawn to the screen, JavaScript code to be downloaded and executed by a web browser, or HTML for full pages with no dynamic content. To speed up the delivery of these files and improve page rendering time, App Engine uses dedicated servers for static files. Using dedicated servers also means the app servers don’t have to spend resources on requests for static files.

Static files are uploaded with your code when you deploy the application. This makes them well suited for web support files like images of icons, but not so well suited for content files like photos to accompany a magazine article. In most cases, content served by your web application belongs in a content management system built into your app that separates the content publishing workflow from the application deployment workflow.

Locally, static files sit with your app code in the app’s root directory. You tell the deployment process and the frontend which of the application’s files are static files using app configuration. The deployment process reads the configuration and delivers the static files to the dedicated static file servers. The frontend remembers which URL paths refer to static files, so it can route requests for those paths to the appropriate servers.

The static file configuration can also include a recommendation for a cache expiration interval. App Engine returns the cache instructions to the client in the HTTP header along with the file. If the client chooses to heed the recommendation (and most web browsers do), it will retain the file for up to that amount of time, and use its local copy instead of asking for it again. This reduces the amount of bandwidth used, but at the expense of clients retaining old copies of files that may have changed.

To save space and reduce the amount of data involved when setting up new app instances, static files are not pushed to the application servers. This means application code cannot access the contents of static files by using the filesystem.

The files that do get pushed to the application servers are known as “resource files.” These can include app-specific configuration files, web page templates, or other static data that is read by the app but not served directly to clients. Application code can access these files by reading them from the filesystem. The code itself is also accessible this way.

We’ve seen how request handlers defined in the app.yaml file can direct requests to scripts that run on the app servers. Handler definitions can also direct requests to the static file servers.

There are two ways to specify static file handlers. The easiest is to declare a directory of files as static, and map the entire directory to a URL path. You do this with the static_dir element, as follows:

handlers:
- url: /images
  static_dir: myimgs

This says that all the files in the directory myimgs/ are static files, and the URL path for each of these files is /images/ followed by the directory path and filename of the file. If the app has a file at the path myimgs/people/frank.jpg, App Engine pushes this file to the static file servers, and serves it whenever someone requests the URL path /images/people/frank.jpg.

Notice that with static_dir handlers, the url pattern does not include a regular expression to match the subpath or filename. The subpath is implied: whatever appears in the URL path after the URL pattern becomes the subpath to the file in the directory.
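
The implied subpath rule can be sketched as a small function. This is an illustration only, not App Engine's implementation: the function name is hypothetical, and the sketch does not normalize paths or guard against segments such as "..".

```python
import posixpath


def static_dir_file(url_prefix, static_dir, request_path):
    """Map a request path to a file path under static_dir, or None."""
    prefix = url_prefix.rstrip('/') + '/'
    if not request_path.startswith(prefix):
        return None  # the request is outside this handler's URL prefix
    # Whatever follows the prefix is the subpath inside the directory.
    subpath = request_path[len(prefix):]
    return posixpath.join(static_dir, subpath)
```

With the handler above, static_dir_file('/images', 'myimgs', '/images/people/frank.jpg') returns 'myimgs/people/frank.jpg'.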

The other way to specify static files is with the static_files element. With static_files, you use a full regular expression for the url. The URL pattern can use regular expression groups to match pieces of the path, then use those matched pieces in the path to the file. The following is equivalent to the static_dir handler shown earlier:

- url: /images/(.*)
  static_files: myimgs/\1
  upload: myimgs/.*

The parentheses in the regular expression identify which characters are members of the group. The \1 in the file path is replaced with the contents of the group when looking for the file. You can have multiple groups in a pattern, and refer to each group by number in the file path. Groups are numbered in the order they appear in the pattern from left to right, where \1 is the leftmost group, \2 is the next, and so on.

When using static_files, you must also specify an upload element. This is a regular expression that matches paths to files in the application directory on your computer. App Engine needs this pattern to know which files to upload as static files, because it cannot determine this from the static_files pattern alone (as it can with static_dir).

While developing a Python app, you keep the app’s static files in the application directory along with the code and configuration files. Application code files and static files are separated based on the configuration during the deployment process.

The Python SDK treats every file as either a resource file or a static file. If you have a file that you want treated as both a resource file (available to the app via the filesystem) and a static file (served verbatim from the static file servers), you can create a symbolic link in the project directory to make the file appear twice to the deployment tool under two separate names. The file will be uploaded twice, and count as two files toward the file count limit.

MIME Types

When the data of an HTTP response is of a particular type, such as a JPEG image, and the web server knows the type of the data, the server can tell the client the type of the data by using an HTTP header in the response. The type can be any from a long list of standard type names, known as MIME types. If the server doesn’t say what the type of the data is, the client has to guess, and may guess incorrectly.

By default, for static files, App Engine will guess the file type based on the last few characters of the filename (such as .jpeg). If the filename does not end in one of several known extensions, App Engine serves the file as the MIME type application/octet-stream, a generic type most web browsers treat as generic binary data.
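
Extension-based guessing with a generic fallback can be sketched with Python's standard mimetypes module. The module's table stands in for App Engine's own list of known extensions, which may differ, and the function name is hypothetical.

```python
import mimetypes

# Generic type used when the extension is not recognized.
DEFAULT_MIME_TYPE = 'application/octet-stream'


def guess_static_mime_type(filename):
    """Guess a MIME type from the filename extension, with a fallback."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type or DEFAULT_MIME_TYPE
```

A filename ending in .jpeg yields image/jpeg, while a filename with an unrecognized extension falls back to application/octet-stream.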

If this is not sufficient, you can specify the MIME type of a set of static files by using the mime_type element in the static file handler configuration. For example:

- url: /docs/(.*)\.ps
  static_files: psoutput/\1.dat
  upload: psoutput/.*\.dat
  mime_type: application/postscript

This says that the application has a set of datafiles in a directory named psoutput/ whose filenames end in .dat, and these should be served using URL paths that consist of /docs/, followed by the filename with the .dat replaced with .ps. When App Engine serves one of these files, it declares that the file is a PostScript document.

You can also specify mime_type with a static_dir handler. All files in the directory are served with the declared type.

Cache Expiration

It’s common for a static file to be used on multiple web pages of a site. Because static files seldom change, it would be wasteful for a web browser to download the file every time the user visits a page. Instead, browsers can retain static files in a cache on the user’s hard drive, and reuse the files when they are needed.

To do this, the browser needs to know how long it can safely retain the file. The server can suggest a maximum cache expiration in the HTTP response. You can configure the cache expiration period App Engine suggests to the client.

To set a default cache expiration period for all static files for an app, you specify a default_expiration value. This value applies to all static file handlers, and belongs at the top level of the app.yaml file, like so:

application: ae-book
version: 1
runtime: python27
api_version: 1

default_expiration: "5d 12h"

handlers:
  # ...

The value is a string that specifies a number of days, hours, minutes, and seconds. As shown here, each number is followed by a unit (d, h, m, or s), and values are separated by spaces.
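
To make the format concrete, here is a minimal sketch that converts such a string to a number of seconds. It assumes exactly the format described above (whole numbers, each followed by d, h, m, or s, separated by spaces); the function name is hypothetical and this is not App Engine's own parser.

```python
import re

# Seconds per unit, for the four units named in the text.
_UNIT_SECONDS = {'d': 86400, 'h': 3600, 'm': 60, 's': 1}


def expiration_to_seconds(value):
    """Convert a string such as "5d 12h" to a number of seconds."""
    total = 0
    for part in value.split():
        match = re.match(r'(\d+)([dhms])\Z', part)
        if match is None:
            raise ValueError('bad expiration part: %r' % part)
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    return total
```

For the value shown above, expiration_to_seconds("5d 12h") returns 475200, which is 5 × 86,400 plus 12 × 3,600 seconds.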

You can also specify an expiration value for static_dir and static_files handlers individually, using an expiration element in the handler definition. This value overrides the default_expiration value, if any. For example:

handlers:
- url: /docs/latest
  static_dir: /docs
  expiration: "12h"

If the configuration does not suggest a cache expiration period for a set of static files, App Engine does not give an expiration period when serving the files. Browsers will use their own caching behavior in this case, and may not cache the files at all.

Sometimes you want a static file to be cached in the browser as long as possible, but then replaced immediately when the static file changes. A common technique is to add a version number for the file to the URL, then use a new version number from the app’s HTML when the file changes. The browser sees a new URL, assumes it is a new resource, and fetches the new version.

You can put the version number of the resource in a fake URL parameter, such as /js/code.js?v=19, which gets ignored by the static file server. Alternatively, in Python, you can use regular expression matching to match all versions of the URL and route them to the same file in the static file server, like so:

handlers:
- url: /js/(.*)/code.js
  static_files: js/code.js
  upload: js/code.js
  expiration: "90d"

This handler serves the static file js/code.js for all URLs such as /js/v19/code.js, using a cache expiration of 90 days.

TIP

If you’d like browsers to reload a static file resource automatically every time you launch a new major version of the app, you can use the multiversion URL handler just discussed, then use the CURRENT_VERSION_ID environment variable as the “version” in the static file URLs:

# Requires "import os" at the top of the module.
self.response.out.write('<script src="/js/' +
                        os.environ['CURRENT_VERSION_ID'] +
                        '/code.js"></script>')

Domain Names

Every app gets a free domain name on appspot.com, based on the application ID. Requests for URLs that use your domain name are routed to your app by the frontend:

http://app-id.appspot.com/path...

But chances are, you want to use a custom domain name with your app. You can register your custom domain name with any Internet domain registrar. With your domain name, you will also need Domain Name Service (DNS) hosting, a service that advertises the destination associated with your name (in this case, App Engine). Name registrars such as Hover include DNS hosting with the cost of the registration. Alternatively, you can use Google Cloud DNS, a high-performance DNS solution with powerful features.

You can configure your domain name so that all requests for the name (example.com) go to App Engine, or so only requests for a subdomain (such as www.example.com) go to App Engine. You might use a subdomain if the root domain or other subdomains are pointing to other services, such as a company website hosted on a different service.

TIP

If you intend to support secure connections (SSL/TLS, aka “HTTPS”) on your custom domain, skip ahead to the next section, “Google Apps”. You must use Google Apps to set up your custom domain to use SSL/TLS with the domain.

The appspot.com domain supports SSL/TLS. See “Configuring Secure Connections” for more information.

To set up a custom domain, go to Cloud Console, select the project, then select Compute, App Engine, Settings. From the tabs along the top, select “Custom domains.” This panel is shown in Figure 3-2.

Figure 3-2. The “Custom domains” settings panel

The setup procedure involves three main steps:

1. Verify that you own the domain. You can verify the domain by adding a verification code to the DNS record, or if the domain is already pointing to a web host, by adding a verification code to a file on the web host.

2. Add the domain or subdomain to the project.

3. Configure the DNS record to point to App Engine.

Cloud Console will walk you through these steps with specific instructions.

The appspot.com domain has a couple of useful features. One such feature is the ability to accept an additional domain name part:

anything.app-id.appspot.com

Requests for domain names of this form, where anything is any valid single domain name part (that cannot contain a dot, .), are routed to the application. This is useful for accepting different kinds of traffic on different domain names, such as for allowing your users to serve content from their own subdomains.

You can determine which domain name was used for the request in your application code by checking the Host header on the request. Here’s how you check this header using Python and webapp2:

import webapp2

class MainHandler(webapp2.RequestHandler):
    def get(self):
        # The Host header carries the domain name the client requested.
        host = self.request.headers['Host']
        self.response.out.write('Host: %s' % host)
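To act on the extra domain name part, the app has to pull it out of the Host header value. The following is a minimal sketch of one way to do that; the function name split_appspot_host and its return conventions are hypothetical, not part of webapp2 or the App Engine SDK. Note that a version ID can also appear in this position, so an app that relies on the prefix may need to tell the two apart.

```python
def split_appspot_host(host, app_id):
    """Return the extra domain part in front of app-id.appspot.com.

    Returns '' when the host is exactly app-id.appspot.com, and None when
    the host is not a single-part subdomain of the app's appspot.com name.
    (Hypothetical helper for illustration only.)
    """
    # Drop an optional port, as seen on a development server, and normalize case.
    hostname = host.split(':')[0].lower()
    base = app_id.lower() + '.appspot.com'
    if hostname == base:
        return ''
    suffix = '.' + base
    if hostname.endswith(suffix):
        prefix = hostname[:-len(suffix)]
        # Only a single domain name part (no dots) is accepted.
        if prefix and '.' not in prefix:
            return prefix
    return None
```

A handler could call split_appspot_host(self.request.headers['Host'], 'ae-book') and choose which user's content to serve based on the result.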

Google Apps

Google Apps is a service that gives your organization its own suite of Google’s productivity applications, such as Gmail, Docs, Drive, and Hangouts. These apps live on subdomains of your organization’s Internet domain name (such as Google Drive on drive.example.com), and your organization’s employees all get Google accounts using the domain name (juliet@example.com). Access to all of the apps and accounts can be managed by the domain administrator, making the suite suitable for businesses, schools, and government institutions. Google Apps for Work is available for a per user per month fee. If your organization is a school, be sure to look for Google Apps for Education, which is free of charge.

A compelling feature of Google Apps is the ability to add an App Engine application on a subdomain (yourapp.example.com, or even www.example.com). You can configure App Engine’s Google accounts features to support domain accounts specifically, making it easy to build intranet apps that only your organization’s members can see. You can also make the app on your domain accessible to the public, with no per-user fee for doing so. (The per-user fee only applies to accounts on the domain. You will still need one administrator account.) You can configure a public app on a domain to accept regular Google accounts, or you can implement your own account mechanism.

TIP

Google Apps is currently the only way to use secure connections (SSL/TLS, aka “HTTPS”) with custom domains on App Engine. This has the advantage of using the Google Apps SSL/TLS infrastructure. In exchange, you lose the ability to serve the App Engine app from the “naked” domain (http://example.com/): all Google Apps applications must be associated with a subdomain (such as www.example.com). We’ll discuss that next, in “Secure Connections with Custom Domains”.

Google Apps can perform a redirect from the naked domain to any desired subdomain. For example, you can set the naked domain to redirect to www, and put the app on that subdomain.

To get started, go to the Google Apps for Work website, or if you’re part of an educational institution, use the Google Apps for Education website.

Follow the instructions to create a Google Apps account. You must already have registered your domain name to set up Google Apps. This process will include the opportunity to create an “administrator” account for the domain, which is a new Google account with an email address on the domain (your.name@example.com).

Next, add the App Engine app to the domain, as follows:

1. From the Google Admin console (the Google Apps console), sign in using the Apps domain’s administrator account.

2. Expand the “More controls” panel at the bottom, then locate App Engine Apps and click it. You may need to click the right arrow to find it.

3. Click “Add services to your domain,” or click the plus symbol (+).

4. Under Other Services, in the Enter App ID field, enter the project ID for your App Engine app. Click “Add it now.” Follow the prompts to accept the terms of service.

5. When prompted, under “Web address,” click “Add new URL,” and enter the subdomain you wish to use. If you try to use the www subdomain and it complains “Already used, please remove previous mapping first,” this is likely because Google Sites is configured to use www. Navigate to Google Apps, click Sites, then click Web Address Mapping. Check the www mapping in this list, then click Delete Mapping(s). Navigate back to App Engine Apps, select the app, then try again.

6. As instructed, in another browser window, go to your domain’s DNS service, and create a CNAME record for the subdomain. Set the destination to ghs.googlehosted.com. Return to the Google Admin panel window, then click “I’ve completed these steps.” Google verifies your CNAME record. It may take a few minutes for your DNS service to update its records.

Your app can now be accessed using the custom domain, with the subdomain you configured.

While you’re here, you should make the Apps domain’s administrator account an owner of the app. This is required for setting up secure connections. There are three parts to this: adding the Cloud Console as an “app” that the domain admin can use, inviting the domain admin to be an owner of the app, and finally accepting the invitation as the domain admin. (You must add the Cloud Console as a service before the domain admin can accept the invitation.)

To enable the Cloud Console as a service for the domain:

1. While signed in as the domain administrator, return to the Google Admin console’s dashboard.

2. Locate and click Apps, then select Additional Google services.

3. In the list, locate Google Developers Console, then click the pop-up menu icon on the right. Select the “On for everyone” option from the menu. Confirm that you want domain users to be able to access the Google Developers Console.

To make the domain administrator an owner of the app:

1. Sign out of Google, then sign in with the account you used to create the App Engine app.

2. From the Cloud Console, select the app, then click Permissions.

3. Add the Apps domain’s administrator account as a member, and set its permission to “Is owner.”

4. Sign out of Google again, then sign in again using the Apps domain’s administrator account.

5. Go to the account’s Gmail inbox, find the invitation email, then click the link to accept the invitation to join the project.

TIP

App Engine developer invitations do not work well with Google’s multiple sign-in feature. If you click an invitation link, it will attempt to accept the invitation on behalf of the first (“primary”) account you’re using, then fail because the signed-in account is not the intended recipient of the invitation. To perform this self-invitation maneuver, you must sign out of Google completely then sign in again with the invited account. Alternatively, you can use a Chrome Incognito window to sign in with the invited account and visit the invitation link.

If you intend to use the Google accounts features of App Engine with accounts on your organization’s domain, go to Cloud Console, then select Compute, App Engine, Settings. Change the Google Authentication option to “Google Apps domain,” then click Save. This ensures that only your domain accounts can be authorized with the app via these features. See “Authorization with Google Accounts” for more information.

As you can see, Google Apps is a sophisticated service and requires many steps to set up. With Apps on your domain, not only can you run your App Engine app on a subdomain, but you get a customized instance of Google’s application suite for your company or organization. Take a deep breath and congratulate yourself on getting this far. Then proceed to the next section.

Configuring Secure Connections

When a client requests and retrieves a web page over an HTTP connection, every aspect of the interaction is transmitted over the network in its final intended form, including the URL path, request parameters, uploaded data, and the complete content of the server’s response. For web pages, this usually means human-readable text is flying across the wire, or through the air if the user is using a wireless connection. Anyone else privy to the network traffic can capture and analyze this data, and possibly glean sensitive information about the user and the service.

Websites that deal in sensitive information, such as banks and online retailers, can use a secure alternative for web traffic. With servers that support it, the client can make an HTTPS connection (HTTP over SSL/TLS: the Secure Sockets Layer or its successor, Transport Layer Security). All data sent in either direction over the connection is encrypted by the sender and decrypted by the recipient, so only the participants can understand what is being transmitted even if the encrypted messages are intercepted. Web browsers usually have an indicator that tells the user when a connection is secure.

App Engine supports secure connections for incoming web requests. By default, App Engine accepts HTTPS connections for all URLs, and otherwise treats them like HTTP requests. You can configure the frontend to reject or redirect HTTP or HTTPS requests for some or all URL paths, such as to ensure that all requests not using a secure connection are redirected to their HTTPS equivalents. The application code itself doesn’t need to know the difference between a secure connection and a standard connection: it just consumes the decrypted request and provides a response that is encrypted by App Engine.

All URL paths can be configured to use secure connections, including those mapped to application code and those mapped to static files. The frontend takes care of the secure connection on behalf of the app servers and static file servers.

App Engine only supports secure connections over TCP port 443, the standard port used by browsers for https:// URLs. Similarly, App Engine only supports standard connections over port 80. The App Engine frontend returns an error for URLs that specify a port other than the standard port for the given connection method.

The development server does not support secure connections, and ignores the security settings in the configuration. You can test these URLs during development by using the nonsecure equivalent URLs.

To configure secure connections for a URL handler in a Python application, add a secure element to the handler’s properties in the app.yaml file:

handlers:
- url: /profile/.*
  script: userprofile.py
  secure: always

The value of the secure element can be always, never, or optional:

§ always says that requests to this URL path should always use a secure connection. If a user requests the URL path over a nonsecure connection, the App Engine frontend issues an HTTP redirect telling the client to try again using HTTPS. Browsers follow this redirect automatically.

§ never says that requests to this URL path should never use a secure connection, and requests for an HTTPS URL should be redirected to the HTTP equivalent. Note that some browsers display a warning when a secure page is redirected to a nonsecure page.

§ optional allows either connection method for the URL path, without redirects. The app can use the HTTPS environment variable to determine which method was used for the request, and produce a custom response.

If you don’t specify a secure element for a URL path, the default is optional.
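For a handler configured as optional, the check against the HTTPS environment variable might look like the following sketch. It assumes the runtime sets the variable to 'on' for secure requests and 'off' otherwise; the helper name is_secure_request is hypothetical, and the environ parameter exists only so the function can be exercised outside App Engine.

```python
import os


def is_secure_request(environ=None):
    """Return True if the current request arrived over HTTPS.

    Assumption: the runtime sets HTTPS to 'on' for secure requests and
    'off' otherwise. Pass a dict as `environ` to test without App Engine.
    """
    if environ is None:
        environ = os.environ
    # A missing variable is treated as a nonsecure request.
    return environ.get('HTTPS', 'off').lower() == 'on'
```

A handler could use this to omit sensitive details from the response, or to show a link to the secure version of the page, when the request was not secure.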

When configured to allow (or require) SSL, you can access the default version of your app using the HTTPS version of the appspot.com URL:

https://ae-book.appspot.com/

Because HTTPS uses the domain name to validate the secure connection, requests to versioned appspot.com URLs, such as https://3.ae-book.appspot.com/, will display a security warning in the browser saying that the domain does not match the security certificate, which only applies to the immediate subdomains (*.appspot.com). To prevent this, App Engine has a trick up its sleeve: replace the dots (.) between the version and app IDs with -dot- (that’s hyphen, the word “dot,” and another hyphen), like this:

https://3-dot-ae-book.appspot.com/

A request to this domain uses the certificate for *.appspot.com, and avoids the security warning.
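If the app generates links to specific versions of itself, the -dot- substitution can be applied in code. This is a small sketch; the function name versioned_https_url is hypothetical, and it does nothing more than format the host name as described above.

```python
def versioned_https_url(app_id, version=None, path='/'):
    """Build an HTTPS appspot.com URL, using -dot- for a versioned host.

    (Hypothetical helper for illustration only.)
    """
    host = app_id + '.appspot.com'
    if version is not None:
        # Join the version ID with -dot- instead of a dot so the host name
        # stays an immediate subdomain covered by the *.appspot.com certificate.
        host = '%s-dot-%s' % (version, host)
    return 'https://%s%s' % (host, path)
```

For example, versioned_https_url('ae-book', '3') produces https://3-dot-ae-book.appspot.com/.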

TIP

Secure connections are an increasingly important part of the Web. Even if you do not explicitly collect user data or perform sensitive transactions, secure connections protect the data that is implicitly exchanged whenever a user visits a web page. They’re so important that Google’s search engine considers HTTPS a ranking signal when evaluating the quality of a website for appearing in search results.2

When possible, use secure: always. It’s worth it.

Secure Connections with Custom Domains

To enable secure connections over a custom domain for your app, you need to set up Google Apps for the domain, with the app added on a subdomain and the domain administrator account set up as an owner of the app. If you haven’t done this yet, see “Google Apps”.3

The protocol for secure connections depends on an SSL/TLS certificate,4 a document that says who you are and that you are responsible for traffic served from the domain. You acquire a certificate from a certificate authority (CA). The CA may also be certified, and this certification can be traced back to a list of known CAs built in to your user’s web browser. Browsers will only make secure connections with websites whose certificates can be traced back to known authorities, thereby assuring the user that the connection to your app is genuine, and not being intercepted by a third party.

You can purchase a certificate valid for a limited time from any of a number of CAs, much like registering a domain name from a registrar. CAs offer certificates at different levels of assurance, and some CAs, such as StartSSL, offer free certificates at the lowest level. Some browsers attempt to communicate the assurance level to the user in various ways. Be sure to follow your CA’s procedures for verifying your domain name and adding it to the certificate, and for creating a TLS/SSL certificate for a web server.

For example, StartSSL initially grants you an “S/MIME and Authentication” certificate to authenticate with its website. After you have used the StartSSL website to validate your email address and domain name, you can generate a “Web Server TLS/SSL certificate,” with a private key protected by a password. StartSSL then prompts you to copy and paste the encrypted private key into a text file (ssl.key), then run the following command to encode it using the RSA method:

openssl rsa -in ssl.key -out ssl.key

Enter the password you used to encrypt the private key when prompted. The key is decrypted, then encoded using RSA, suitable for uploading to Google.

TIP

The openssl command is installed on most Mac OS X and Linux systems. Windows users can get it from the OpenSSL website.

Whatever process your CA uses, the end result should be a TLS/SSL certificate associated with your root domain name and your app’s subdomain, as well as the unencrypted RSA-encoded private key. You will upload both of these files to Google in the next step.

Before you can complete this process, you must decide which method App Engine should use to serve your secure traffic. There are two choices: Server Name Indication (SNI) or Virtual IP (VIP). SNI associates one or more certificates with your app’s domain name. SNI is a relatively new standard, and only modern web clients (most browsers) support it. If you need broader support for SSL-capable clients, Google also offers a virtual IP (VIP) solution, which ties your certificate and application to an IP address. This expensive resource comes with a monthly fee.

You are now ready to activate SSL for your domain, using the Google Apps Admin console:

1. Open the Google Admin console. Expand “More controls” (located at the bottom of the page), then locate and select Security. Select SSL for Custom Domains, clicking “Show more” if necessary to reveal it.

2. In the panel that opens, enter the project ID for the app. This confirms that the app will be responsible for SSL-related computation.

3. On the following screen,5 click Enable SSL. You are returned to the Google Apps Admin console to complete the process. Now that SSL is enabled, you can get to this screen at any point in the future by navigating to Security, SSL for Custom Domains.

4. If you wish to use SNI for the certificate, click the “Increase SNI certificate slots by 5” button. If you need the VIP solution, look for the Add a VIP button. If it is disabled with a message prompting you to increase the budget for the app, do so in the Cloud Console, under Compute, App Engine, Settings. The VIP option needs a nonzero budget for its resources.

5. Still in the SSL for Custom Domains screen, click Configure SSL Certificates. In the subsequent screen, click “Upload a new certificate.” For the “PEM encoded X.509 certificate,” select the certificate file. For the “Unencrypted PEM encoded RSA private key,” select the ssl.key file. Click Upload. The certificate information appears in the window.

6. In the box that has appeared, under “Current state,” change “Serving mode” to the method you have chosen, either SNI or VIP. An Assigned URLs section appears. Use it to assign your subdomain to the certificate. For VIP, use your domain’s DNS hosting to add a CNAME record with the value shown. (No DNS change is needed for SNI only.)

7. Click “Save changes.”

That’s it! It was a long haul, but you now have full HTTPS support for your app on a custom domain. Give it a try: visit your subdomain using the https:// method in your browser. The browser indicates that a secure connection is successful, usually with an icon in the address bar. In Chrome and other browsers, you can click the icon to get more information about the certificate.

Authorization with Google Accounts

Back in Chapter 2, we discussed how an App Engine application can integrate with Google Accounts to identify and authenticate users. We saw how an app can use library calls to check whether the user making a request is signed in, access the user’s email address, and calculate the sign-in and sign-out URLs of the Google Accounts system. With this API, application code can perform fine-grained access control and customize displays.

Another way to do access control is to leave it to the frontend. With just a little configuration, you can instruct the frontend to protect access to specific URL handlers such that only signed-in users can request them. If a user who is not signed in requests such a URL, the frontend redirects the user to the Google Accounts sign-in and registration screen. Upon successfully signing in or registering a new account, the user is redirected back to the URL.

You can also tell the frontend that only the registered developers of the application can access certain URL handlers. This makes it easy to build administrator-only sections of your website, with no need for code that confirms the user is an administrator. You can manage which accounts have developer status in the Cloud Console, in the Developers section. If you revoke an account’s developer status, that user is no longer able to access administrator-only resources, effective immediately.

Later on, we will discuss App Engine services that call your application in response to events. For example, the scheduled tasks service (the “cron” service) can be configured to trigger a request to a URL at certain times of the day. Typically, you want to restrict access to these URLs so not just anybody can call them. For the purposes of access control enforced by the frontend, these services act as app administrators, so restricting these URLs to administrators effectively locks out meddling outsiders while allowing the services to call the app.

This coarse-grained access control is easy to set up in the frontend configuration. And unlike access control in the application code, frontend authentication can restrict access to static files as well as application request handlers.

You establish frontend access control for a URL handler with the login element in app.yaml, like so:

handlers:
- url: /myaccount/.*
  script: account.py
  login: required

The login element has two possible values: required and admin.

If login is required, then the user must be signed in to access URLs for this handler. If the user is not signed in, the frontend returns an HTTP redirect code to send the user to the Google Accounts sign-in and registration form.

If login is admin, then the user must be signed in and must be a registered developer for the application.

If no login is provided, the default policy is to allow anyone to access the resource, regardless of whether the client represents a signed-in user, and regardless of whether or not the app is set to use a members-only access policy.

You can use the login element with both script handlers and static file handlers.
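The rules for the login element can be summarized as a small decision function. This is only a model of the behavior described above, not code that runs in the frontend: the function name and the result strings are hypothetical, and it assumes that a signed-out user is sent to the sign-in screen for both required and admin, and that a signed-in user who is not a developer is refused for admin.

```python
def frontend_login_decision(login, signed_in, is_developer):
    """Model how the frontend treats a request for a handler.

    `login` is None (no login element), 'required', or 'admin'.
    Returns 'allow', 'redirect_to_sign_in', or 'deny'.
    (Illustrative model only; the result names are made up.)
    """
    if login is None:
        # No login element: anyone can access the resource.
        return 'allow'
    if login not in ('required', 'admin'):
        raise ValueError('unknown login value: %r' % (login,))
    if not signed_in:
        # Assumption: signed-out users are redirected in both cases.
        return 'redirect_to_sign_in'
    if login == 'admin' and not is_developer:
        # Assumption: signed-in non-developers are refused.
        return 'deny'
    return 'allow'
```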

Environment Variables

You can use app configuration to specify a list of environment variables to be set prior to calling any request handlers. This is useful to control components that depend on environment variables, without having to resort to hacks in your code to set them.

To set environment variables, provide the env_variables element in app.yaml with a mapping value:

env_variables:
  DJANGO_SETTINGS_MODULE: 'gnero.prod.settings'
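Application code reads these values the same way it reads any other environment variable. The sketch below looks up the DJANGO_SETTINGS_MODULE variable from the example; the helper name and the fallback argument are hypothetical, and the environ parameter exists only so the function can be tried outside App Engine.

```python
import os


def get_settings_module(environ=None, default=None):
    """Return the configured Django settings module name, or `default`.

    (Hypothetical helper for illustration only.)
    """
    if environ is None:
        environ = os.environ
    # The variable is present only if app.yaml (or the local shell) set it.
    return environ.get('DJANGO_SETTINGS_MODULE', default)
```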

Inbound Services

Some App Engine services call an application’s request handlers in response to external events. For example, the Mail service can call a request handler at a fixed URL when it receives an email message at an email address associated with the app. This is a common design theme in App Engine: all application code is in the form of request handlers, and services that need the app to respond to an event invoke request handlers to do it.

Each service capable of creating inbound traffic must be enabled in app configuration, to confirm that the app is expecting traffic from those services on the corresponding URL paths. To enable these services, provide the inbound_services element in app.yaml with a list of service names:

inbound_services:
- mail
- warmup

Table 3-1 lists the services that can be enabled this way, and where to find more information about each service.

| Service | Description | Name | Handler URLs |
|---|---|---|---|
| Channel Presence | Receive channel connection notifications | channel_presence | /_ah/channel/.* |
| Mail | Receive email at a set of addresses; see Chapter 14 | mail | /_ah/mail/.* |
| XMPP Messages | Receive XMPP chat messages; for all XMPP services, see Chapter 15 | xmpp_message | /_ah/xmpp/message/chat/ |
| XMPP Presence | Receive XMPP presence notifications | xmpp_presence | /_ah/xmpp/presence/.* |
| XMPP Subscribe | Receive XMPP subscription notifications | xmpp_subscribe | /_ah/xmpp/subscription/.* |
| XMPP Error | Receive XMPP error messages | xmpp_error | /_ah/xmpp/error/ |
| Warmup Requests | Initialize an instance, with warmup requests enabled; see “Warmup Requests” | warmup | /_ah/warmup |

Table 3-1. Services that create inbound traffic for an app, which must be enabled in service configuration
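When debugging inbound traffic, it can help to see which service a given request path belongs to. The sketch below encodes the names and URL patterns from Table 3-1 and matches a path against them; the function name is hypothetical, and treating each pattern as a regular expression that must match the whole path is an assumption carried over from how handler URL patterns behave.

```python
import re

# Service names and handler URL patterns, copied from Table 3-1.
INBOUND_SERVICE_URLS = [
    ('channel_presence', r'/_ah/channel/.*'),
    ('mail', r'/_ah/mail/.*'),
    ('xmpp_message', r'/_ah/xmpp/message/chat/'),
    ('xmpp_presence', r'/_ah/xmpp/presence/.*'),
    ('xmpp_subscribe', r'/_ah/xmpp/subscription/.*'),
    ('xmpp_error', r'/_ah/xmpp/error/'),
    ('warmup', r'/_ah/warmup'),
]


def inbound_service_for_path(path):
    """Return the inbound service name whose URL pattern matches `path`.

    Returns None if the path does not belong to any inbound service.
    (Hypothetical helper for illustration only.)
    """
    for name, pattern in INBOUND_SERVICE_URLS:
        # Assumption: the pattern must match the entire path.
        if re.match('(?:%s)$' % pattern, path):
            return name
    return None
```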

Custom Error Responses

When your application serves a status code that represents an error (such as 403 Forbidden or 500 Internal Server Error) in a response to a browser, it can also include an HTML page in the body of the response. The browser typically shows this HTML to the user if the browser expected to render a full page for the request. Serving an error page can help prevent the user from being disoriented by a generic error message—or no message at all.

There are cases when an error condition occurs before App Engine can invoke your application code, and must return an error response. For example, if none of the request handler mappings in the app’s configuration match the request URL, App Engine has no request handler to call and must return a 404 Not Found message. By default, App Engine adds its own generic HTML page to its error responses.

You can configure custom error content to be used instead of App Engine’s error page. You provide the response body in a file included with your app, and mention the file in your application configuration.

To set error pages, add an error_handlers element to your app.yaml. Its value is a list of mappings, one per error file:

error_handlers:
- file: error.html
- error_code: over_quota
  file: busy_error.html
- error_code: dos_api_denial
  file: dos_denial.txt
  mime_type: text/plain

The file value specifies the path from the application root directory to the error file. The optional mime_type specifies the MIME content type for the file, which defaults to text/html.

The error_code value associates the error file with a specific error condition. If omitted, the file is associated with every error condition that doesn’t have a specific error file of its own. Error codes include the following:

over_quota

The request cannot be fulfilled because the app has temporarily exceeded a resource quota or limit.

dos_api_denial

The origin of the request is blocked by the app’s denial-of-service protection configuration. (See the App Engine documentation for more information about this feature.)

timeout

The request handler did not return a response before the request deadline.
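The way an error condition is paired with an error file can be modeled in a few lines. This sketch represents the error_handlers list as Python dictionaries and picks the entry for a given error code, falling back to the entry with no error_code; the function name is hypothetical, and App Engine performs this selection itself, so the code is only a model of the rules described above.

```python
def select_error_file(error_handlers, error_code):
    """Return (file, mime_type) for an error code, or None if nothing applies.

    `error_handlers` is a list of dicts mirroring the app.yaml entries.
    (Illustrative model only.)
    """
    default = None
    for handler in error_handlers:
        code = handler.get('error_code')
        if code == error_code:
            # A specific entry wins; mime_type defaults to text/html.
            return handler['file'], handler.get('mime_type', 'text/html')
        if code is None and default is None:
            # Remember the first entry without an error_code as the fallback.
            default = handler
    if default is not None:
        return default['file'], default.get('mime_type', 'text/html')
    return None


# The configuration from the example, expressed as Python data.
EXAMPLE_ERROR_HANDLERS = [
    {'file': 'error.html'},
    {'error_code': 'over_quota', 'file': 'busy_error.html'},
    {'error_code': 'dos_api_denial', 'file': 'dos_denial.txt',
     'mime_type': 'text/plain'},
]
```

With this configuration, a timeout condition has no entry of its own, so it falls back to error.html.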

WARNING

Custom error files must be stored on application servers. They must not be static files. Be careful not to configure static file handlers that match these files.

Python Libraries

On your own computer, Python programs run in an environment with access to many libraries of modules. Some of these modules (quite a few, actually) come with Python itself, in the Python standard library. Others you may have installed separately, such as with pip install or easy_install. Perhaps you use virtualenv to create multiple isolated Python environments, each with its own set of available libraries. A Python program can import any module within its environment, and a module must be available in this environment (or elsewhere on the Python library load path) to be importable.

On App Engine, a Python app also runs in an environment with access to libraries. This environment includes a slightly modified version of the Python 2.7 standard library. (The modifications account for restrictions of the App Engine runtime environment, which we’ll discuss in Chapter 4.) App Engine adds to this the libraries and tools included with the App Engine SDK, such as APIs for accessing the services (such as google.appengine.api.urlfetch), and utilities such as the data modeling libraries (google.appengine.ext.ndb).

Naturally, the environment also includes any Python modules you provide in your application directory. In addition to your own code, you might add a copy of a third-party library your app uses to your app directory, where it is uploaded as part of your app. Note that this method only works for “pure Python” libraries, and not libraries that have portions written in C.

For convenience, the Python runtime environment includes several third-party libraries popular for web development. We’ve already seen Jinja, the templating library. The Django web application framework is also included. You can use NumPy for data processing and numerical analysis. The Python Cryptography Toolkit (PyCrypto) provides strong encryption capabilities.

To use one of the provided third-party libraries, you must declare it in the app.yaml file for the app, like so:

libraries:
- name: django
  version: "1.3"

This declaration is necessary to select the version of the library your app will use. When a new version of a third-party library becomes available, your app will continue to use the declared version until you change it. You’ll want to test your app to make sure it’s compatible with the new version before making the switch with your live app. With this declaration in place, import django will load the requested version of the library. (Without it, the import will fail with an ImportError.)

TIP

For more information about using Django with App Engine, see Chapter 18.

You can specify a version of latest to always request the latest major version of the library. This may be desired for small libraries, where new versions are typically backward compatible. For larger packages like Django, you almost certainly want to select a specific version, and upgrade carefully when a new version is added:

libraries:
- name: jinja2
  version: latest
- name: markupsafe
  version: latest

While the App Engine runtime environment provides these libraries, the Python SDK does not. You must install third-party libraries in your local Python environment yourself, and make sure your version matches the one requested in your app.yaml. Installation instructions are specific to each library.

Table 3-2 lists third-party libraries available as of SDK version 1.9.18. Check the official documentation for an up-to-date list.

| Library | Description | Name | Versions |
|---|---|---|---|
| Django | A web application framework; see the Django website for installation instructions | django | 1.5, 1.4, 1.3, 1.2 |
| Endpoints | The Google Endpoints library | endpoints | 1.0 |
| Jinja2 | A templating library; MarkupSafe is recommended with Jinja2; to install, use: sudo easy_install jinja2 | jinja2 | 2.6 |
| lxml | An XML parsing and production toolkit; see the lxml website for installation instructions, including the libxml2 and libxslt libraries | lxml | 2.3.5, 2.3 |
| MarkupSafe | Fast HTML-aware string handler; to install, use: sudo easy_install markupsafe | markupsafe | 0.15 |
| matplotlib | A 2D mathematical plotting package | matplotlib | 1.2.0 |
| MySQLdb | A common interface to MySQL databases, useful with Google Cloud SQL; see Chapter 11 | MySQLdb | 1.2.4b4 |
| NumPy | Data processing and numerical analysis; see the SciPy website for installation | numpy | 1.6.1 |
| Python Imaging Library (PIL) | Image manipulation toolkit; see the PIL website for installation | pil | 1.1.7 |
| protorpc | An efficient remote procedure call bundling format, used by Google | protorpc | 1.0 |
| PyAMF | For manipulating Action Message Format (AMF) messages, used for server messaging from Adobe Flash Player | PyAMF | 0.6.1 |
| Python Cryptography Toolkit (PyCrypto) | Cryptographic routines; see the PyCrypto website for installation; export restrictions may apply | pycrypto | 2.6, 2.3 |
| setuptools | For discovering which packages are installed (you don’t use this for installing packages) | setuptools | 0.6c11 |
| webapp2 | The webapp2 web application framework | webapp2 | 2.5.2, 2.5.1, 2.3 (deprecated) |
| WebOb | An object-oriented interface to HTTP requests and responses; used by (and included automatically with) the webapp framework; included in the SDK | webob | 1.1.1 |
| YAML | Library for parsing the YAML message serialization format; used by the SDK for the config files; included in the SDK | yaml | 3.10 |

Table 3-2. Third-party Python libraries available by request in the runtime environment

Built-in Handlers

Some of the utilities included with the Python runtime environment use their own request handlers to provide functionality, such as a web-based administrative UI or web service endpoints. Typically, these handlers map to URLs with paths beginning with /_ah/, which are reserved for App Engine use. Because this code runs within your application, you must enable this functionality by setting up these request handlers.

To make it easy to do (and difficult to do incorrectly), many of these tools are available as “built-ins.” You enable a built-in feature by naming it in your app.yaml file, in a mapping named builtins:

builtins:
- appstats: on
- remote_api: on

Table 3-3 lists the built-ins available as of SDK version 1.9.18. As usual, check the official documentation for an up-to-date list.

| Feature | Description | Name |
|---|---|---|
| AppStats | Sets up the AppStats control panel at /_ah/stats; see “Visualizing Calls with AppStats” | appstats |
| Deferred work | Sets up the task queue handler for the deferred library; see “Deferring Work” | deferred |
| Remote API | Establishes the web service endpoint for remote API access; see “Remote Controls” | remote_api |

Table 3-3. Built-in features that must be enabled using the builtins directive in app.yaml

Includes

An app.yaml file can get rather large, especially if you use it to route your app’s URLs to multiple handlers. You can organize your app’s configuration into separate component files by using the includes directive. This also makes it easy to write App Engine components that can be installed in other apps, regardless of which frameworks the apps are using.

The includes value is a list of file or directory paths, like so:

includes:
- lib/component/ae_config.yaml

The path can be an absolute path, a path relative to the app root directory, or a path relative to the file that contains the includes. If the path is to a file, the file is parsed as a YAML file. If the path is to a directory, the filename is assumed to be include.yaml in the given directory.
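As a rough illustration of these lookup rules, the following Python sketch resolves one includes entry to a YAML file. The function name resolve_include and its parameters are hypothetical. The order in which it tries the app root and the including file’s directory is an assumption of the sketch, because the rules above do not say which takes precedence.

```python
import os


def resolve_include(entry, app_root, including_file):
    """Return the YAML file that an includes entry points to, or None.

    Hypothetical helper for illustration only; it is not part of the SDK.
    """
    if os.path.isabs(entry):
        candidates = [entry]
    else:
        # Assumed order: try the app root first, then the directory of the
        # file that contains the includes directive.
        candidates = [
            os.path.join(app_root, entry),
            os.path.join(os.path.dirname(including_file), entry),
        ]
    for candidate in candidates:
        # A directory path stands for the include.yaml file inside it.
        if os.path.isdir(candidate):
            candidate = os.path.join(candidate, "include.yaml")
        if os.path.isfile(candidate):
            return candidate
    return None
```

With an app root that contains lib/component_one/include.yaml, calling resolve_include with the entry lib/component_one, the app root, and the path to app.yaml would return the path to that include.yaml file.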

An included file can contain builtins, includes, handlers, and admin_console values. Each of these list values is prepended to the corresponding list in the current file.

For handlers, this means that handler URL patterns from includes are tested before those in the current file. If your main app.yaml file has a handler mapped to the URL pattern /.*, handlers from includes will be tested first, and only those that don’t match will fall to the catch-all handler. Notice that if an included file maps a handler to /.*, none of the handlers in the current file (or any file that includes the current file) will ever match a request! So don’t do that.

Includes are aggregated in the order they appear in the list. For example, consider this app.yaml:

handlers:
- url: /.*
  script: main.app

includes:
- lib/component_one
- lib/component_two

Here, App Engine tests a request URL against each of the handlers in lib/component_one/include.yaml in the order they appear in that file, then against each of the handlers in lib/component_two/include.yaml, and finally against the /.* handler in app.yaml.
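To make the ordering concrete, here is a minimal Python sketch that models it. It is a simulation, not App Engine’s routing code. The helper names merge_handlers and route, the URL patterns, and the script names for the two components are all hypothetical. The sketch also assumes that a URL pattern must match the entire request path.

```python
import re


def merge_handlers(current, included):
    """Build the effective handler list for one configuration file.

    `included` holds one handler list per entry of `includes`, in the order
    the entries appear. Included handlers come first, the current file's last.
    """
    merged = []
    for handlers in included:
        merged.extend(handlers)
    merged.extend(current)
    return merged


def route(path, handlers):
    """Return the script of the first handler whose pattern matches `path`."""
    for pattern, script in handlers:
        # Assumption of this sketch: a pattern must match the whole URL path.
        if re.match("(?:%s)$" % pattern, path):
            return script
    return None


# Hypothetical contents of the two include.yaml files and of app.yaml.
component_one = [("/one/.*", "component_one.app")]
component_two = [("/two/.*", "component_two.app")]
app_yaml = [("/.*", "main.app")]

effective = merge_handlers(app_yaml, [component_one, component_two])

print(route("/one/settings", effective))  # component_one.app
print(route("/two/", effective))          # component_two.app
print(route("/about", effective))         # main.app

# The pitfall: a catch-all in an included file shadows everything after it.
greedy = merge_handlers(app_yaml, [[("/.*", "component_one.app")]])
print(route("/about", greedy))            # component_one.app
```

The last call shows why an included file should not map a handler to /.*: because included handlers are tested first, the catch-all in app.yaml is never reached.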

1 The word “script” is a misnomer: the value is a Python path to a variable. In the legacy Python 2.5 runtime environment, this is a filesystem path to a Python CGI script. The object path more accurately represents the resident WSGI app.

2 See HTTPS as a Ranking Signal.

3 The procedure for setting up secure connections via Google Apps is a bit convoluted. If your only interest is to use SSL with a custom domain, check the Cloud Console and the official documentation for an easier way in case one was added since this book was published.

4 TLS, or Transport Layer Security, refers to the latest standard, and is a successor to SSL, or Secure Socket Layer. SSL is still sometimes used as an umbrella term for secure connections.

5 As of August 2014, this screen appears on appengine.google.com, a website we haven’t mentioned yet. This is the old App Engine console, the one App Engine launched with in 2008. It is in the process of being replaced by the Cloud Console. Once the last few features (such as this screen) have been moved to Cloud Console, this old site will be decommissioned. For now, you can use either console to access any of these features.