
Chapter 8. Datastore Administration

Your data is the heart of your application, so you’ll want to take good care of it. You’ll want to watch it, and understand how it grows and how it affects your app’s behavior. You’ll want to help it evolve as your app’s functionality changes. You’ll probably want to make periodic backups, because nothing is perfect. You may even want up-to-date information about data types and sizes. And you’ll want to poke at it, and prod it into shape using tools not necessarily built into your app.

App Engine provides a variety of administrative tools for learning about, testing, protecting, and fixing your datastore data. In this chapter, we look at a few of these tools, and their associated best practices.

Inspecting the Datastore

The first thing you might want to do with your app’s datastore is see what’s in it. Your app provides a natural barrier between you and how your data is stored physically in the datastore, so many data troubleshooting sessions start with pulling back the covers and seeing what’s there.

The Datastore Viewer panel of the Administration Console (shown in Figure 8-1) is your main view of individual datastore entities. You can browse entities by kind, and you can perform queries using GQL by expanding the Options in the Query tab.

Figure 8-1. The Datastore Viewer panel of the Administration Console

The GQL query you type into the datastore viewer is executed the same way it would be in an application running live. You can only perform queries for which there are indexes. If you enter a query that needs a custom index that isn’t already in the uploaded index configuration, the query will return an error.

The GQL query parser is finicky about syntax and strict about the typing of values. A few things to keep in mind, especially if your GQL queries are not returning the expected results:

§ Quoted string values always use single quotes ('...'), never double quotes ("...").

§ Key values must always use the KEY() constructor. This constructor has two forms: KEY('...') with one argument takes a string-encoded key, like you might see elsewhere in the datastore viewer. KEY('...', ...) with an even number of arguments takes alternating pairs of a kind name (in single quotes) followed by either a string key name (in single quotes), or a numeric ID (no quotes). Only the full key, with the entire ancestor path, will match. KEY('Foo', 897) and KEY('Bar', 123, 'Foo', 897) are distinct keys.

§ User values must always use the USER() constructor. Its sole argument is an email address (in single quotes). A query for just the email address string ('foo@example.com') will not match a User value of that address.

§ In general, all datastore values are typed, and most typed literals use a constructor form. The DATETIME(), DATE(), and TIME() types are all considered the same type when matching property values.

Refer to GQL for more information on the GQL syntax.
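To illustrate the constructor syntax, here are a few queries as you might type them into the viewer. (The Book and Message kinds and their properties are hypothetical examples, not anything built into the datastore.)

SELECT * FROM Book WHERE author = 'John Steinbeck'
SELECT * FROM Book WHERE __key__ = KEY('Book', 'tgow')
SELECT * FROM Message WHERE sender = USER('juliet@example.com')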

You can click on any entity key in the results list to view the entity in detail, as in Figure 8-2.

Figure 8-2. A single entity in the Datastore Viewer

You can change the value of the entity’s properties directly from this screen, with a few limitations. You can update a value to another value of the same type by entering the new value in the field. Each datastore value type has a special syntax in this panel, such as YYYY-MM-DD HH:MM:SS for date-times. (The panel provides instructions for the current type.)

You can set any property to the null value. When a property is null, you can update it to a value of any core datastore type. This lets you change the type of a property in two steps: set it to null and save, then select the new type and enter a value.

You can’t add or delete (unset) a property on an existing entity in the datastore viewer. You also can’t convert an indexed property to an unindexed property, or vice versa. Multivalued properties are not supported.

On the main viewer screen, there is a Create tab next to the Query tab. From here, you can create a new entity of any kind that already exists in the datastore. The viewer determines which properties the entity ought to have and what their types ought to be based on properties of existing entities of that kind. If you have entities with varying property sets or multiple value types for a property, this is going to be a best guess, and you may have to create the entity with null values, then edit the entity to set the desired type. Again, multivalued properties and blob values are not supported in this interface. You can’t create entities of a kind that does not already exist.

TIP

The Datastore Viewer is useful for inspecting entities and troubleshooting data issues, and may be sufficient for administrative purposes with simple data structures. However, you will probably want to build a custom administrative panel for browsing app-specific data structures and performing common administrative tasks.

The Datastore Statistics panel gives an overview of your data by type and size. This is especially useful for evaluating costs, and catching cases of excessive unused data. Statistics represent the cost of storing an entity and the cost of indexing an entity separately, so you can evaluate whether indexing a property is worth the storage cost. Metadata (information about each entity, such as keys and property names) is also measured separately. You can browse these statistics for all data, or for just entities of a given kind. Statistics are updated regularly, about once a day.

Figure 8-3 shows an example of the Datastore Statistics panel for entities of a kind.

Figure 8-3. The Datastore Statistics panel of the Administration Console

Managing Indexes

When you upload the datastore index configuration for an app, the datastore begins building indexes that appear in the configuration but do not yet exist. This process is not instantaneous, and may take many minutes for new indexes that contain many rows. The datastore needs time to crawl all the entities to build the new indexes.
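As a reminder of what that configuration looks like, here is a minimal index.yaml entry for a single custom index, in the Python SDK’s format. (The Book kind and its properties are hypothetical.)

indexes:
- kind: Book
  properties:
  - name: author
  - name: pub_date
    direction: desc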

You can check on the build status of new indexes using the Administration Console, in the Indexes section. An index being built appears with a status of “Building.” When it is ready, the status changes to “Serving.” Figure 8-4 shows a simple example of the Indexes section.

Figure 8-4. The Datastore Indexes panel of the Administration Console, with two indexes in the “Serving” status

If an index’s build status is “Error,” the index build failed. It’s possible that the failure was due to a transient error. To clear this condition, you must first remove the index from your configuration and then upload the new configuration. It is also possible for an index build to fail due to an entity reaching its index property value limit. In these cases, you can delete the entities that are causing the problem. Once that is done, you can add the index configuration back and upload it again.

If your application performs a query while the index for that query is building, the query will fail. You can avoid this by uploading the index configuration first, waiting until the indexes are built, and only then making the app code that uses the query available. The most convenient way to do this depends on whether you upload the new application as a new version:

§ If you are uploading the new application with the version identifier that is currently the “default” version, upload the index configuration alone using the appcfg.py update_indexes command. When the indexes are built, upload the app.

§ If you are uploading the application as a new version, or as a version that isn’t the default and that nobody is actively using, you can safely upload the application and index configuration together (appcfg.py update). Wait until the indexes are built before making the new version the default.
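A sketch of the first sequence, using the commands named above (app-dir is your application’s directory):

# Upload only the datastore index configuration:
appcfg.py update_indexes app-dir

# ...wait until the Indexes panel shows the new indexes as "Serving"...

# Then upload the application itself:
appcfg.py update app-dir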

If you upload index configuration that does not mention an index that has already been built, the datastore does not delete the unused index, since it might still be in use by an older version of the app. You must tell App Engine to purge unused indexes. To do this, run the AppCfg command with the vacuum_indexes option. For instance, in Python:

appcfg.py vacuum_indexes app-dir

App Engine will purge all custom indexes not mentioned in the index configuration uploaded most recently. This reclaims the storage space used by those indexes.

TIP

As we saw earlier, the development server tries to be helpful by creating new index configuration entries for queries that need them as you’re testing your app. The development server will never delete an index configuration. As your app’s queries change, this can result in unnecessary indexes being left in the file. You’ll want to look through this file periodically, and confirm that each custom index is needed. Remove the unused index configuration, upload the file, and then vacuum indexes.

The Datastore Admin Panel

The fourth datastore-related panel of the Administration Console is the Datastore Admin panel. You can do three things from this panel: download a backup of all datastore entities or entities of a selected set of kinds, upload and restore data from a backup, and delete every entity of a given kind. For Python apps, the backup and restore feature can also be used to migrate large quantities of datastore data between two different apps. The panel also summarizes statistics about the sizes of entities, so you can estimate the scale of these operations before doing them. Figure 8-5 shows an example of this panel.

Figure 8-5. The Datastore Admin panel of the Administration Console (enabled)

Unlike other Administration Console panels, this panel is implemented to run within your app, so major data operations are billed to your account. Because of this, you must enable the panel when you visit it for the first time. You can enable the panel by visiting it and following the prompt, or by enabling it in the Application Settings panel.

When you request a backup, the backup job crawls the datastore and aggregates the data in a Blobstore value (or, if you set it up when initiating the job, a Google Cloud Storage value). After it is finished, you can download the data, as prompted. The backup feature only covers datastore data; it does not include Blobstore data in the backup.

Before doing a backup or restore, you may want to use another App Engine feature to prevent your application from writing to the datastore during the job and potentially corrupting data. To disable writing to the datastore, visit the Application Settings panel, then find the Disable Datastore Writes section and click the Disable Writes button. Confirm the prompt. With writes disabled, any attempt to write to the datastore by any version of your app will raise an exception. Reads and queries will still succeed. Note that the restore operation gets special permission to write to the datastore when writes are otherwise disabled. When the job is complete, you can re-enable writes from the Application Settings panel.
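If you’d like the app itself to detect this condition and degrade gracefully, you can test for it with the capabilities API. Here’s a brief sketch in Python; using it this way is a suggestion, not a required part of the backup procedure:

from google.appengine.api import capabilities

def datastore_writes_enabled():
    # Returns False while datastore writes are disabled for the app
    # (including administrator-initiated maintenance like this one).
    caps = capabilities.CapabilitySet('datastore_v3', capabilities=['write'])
    return caps.is_enabled()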

If you intend to disable writes for backups and restores, you may want to implement a “curtain” feature of your app that you activate prior to starting this process, to display a message to your users about the service interruption.
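Such a curtain can be as simple as a flag that request handlers consult before doing any real work. Here’s one possible sketch; the memcache key name and the handler are made-up examples, and memcache is a convenient place for the flag because it remains writable while datastore writes are disabled:

from google.appengine.api import memcache
from google.appengine.ext import webapp

CURTAIN_KEY = 'curtain_is_down'  # hypothetical flag name

class CurtainedHandler(webapp.RequestHandler):
    def get(self):
        # Lower the curtain before maintenance by setting the flag:
        #   memcache.set(CURTAIN_KEY, True)
        if memcache.get(CURTAIN_KEY):
            self.error(503)
            self.response.out.write(
                'Down briefly for maintenance. Please check back soon.')
            return
        # ... normal request handling ...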

Backup, restore, and bulk delete are large jobs involving very many datastore operations. When you execute these jobs, the cost of the operations is charged to the app. You will want to do small tests before using these features with very large sets of data. These jobs also take a significant amount of time, scaling linearly with the size of your data. The jobs use task queues to throttle and parallelize the work running on your instances.

TIP

As of this writing, there’s a bug that affects users with multiple active accounts and Google’s multi-login feature. If you use multi-login and see a blank panel when trying to access Datastore Admin, the panel may be trying to display the multi-login selector, but failing because the panel is rendered in an iframe, and the multi-login selector uses a security policy that prevents login-related screens from displaying in frames. The workaround is to view the source of the Console page, get the iframe’s URL, visit it in a separate window, and then acknowledge the multi-login selector. You can then reload the panel in the Administration Console.

I wouldn’t normally mention a bug like this in a printed book (since it might get fixed before you read this), but as a multi-login user I run into it all the time, and it’s good to know the workaround. It’s also a good illustration of how the Datastore Admin panel works: it runs separately from the Administration Console so its resource usage gets billed to your app. The Datastore Admin panel is served from a reserved URL path in your app.

The Datastore Admin panel is currently described as an “experimental” feature, and its features or details of its implementation may change. See the official App Engine documentation for more information.

Accessing Metadata from the App

There are several ways to get information about the state of your datastore from within the application itself. The information visible in the Datastore Statistics panel of the Administration Console is also readable and queryable from entities in your app’s datastore. Similarly, the facilities that allow the Datastore Viewer panel to determine the current kinds and property names are available to your app in the form of APIs and queryable metadata. You can also use APIs to get additional information about entity groups, index build status, and query planning.

We won’t describe every metadata feature here. Instead, we’ll look at a few representative examples. You can find the complete details in the official App Engine documentation.

Querying Statistics

App Engine gathers statistics about the contents of the datastore periodically, usually about once a day. It stores these statistics in datastore entities in your application. The Datastore Statistics panel of the Administration Console gets its information from these entities. Your app can also fetch and query these entities to access the statistics.

In Python, each statistic has an ext.db data model class, in the google.appengine.ext.db.stats module. You use these model classes to perform queries, like you would any other query. The actual kind name differs from the model class name.

In Java, you query statistics entities by setting up queries on the statistics entity kind names, just like any other datastore query. The query returns entities with statistics as property values.

Here’s an example in Python that queries storage statistics for each entity kind:

import logging

from google.appengine.ext.db import stats

# ...

kind_stats = stats.KindStat.all()
for kind_stat in kind_stats:
    logging.info(
        'Stats for kind %s: %d entities, '
        'total %d bytes (%d entity bytes)',
        kind_stat.kind_name, kind_stat.count,
        kind_stat.bytes, kind_stat.entity_bytes)

Here’s another example that reports the properties per kind taking up more than a terabyte of space:

import logging

from google.appengine.ext.db import stats

# ...

q = stats.KindPropertyNameStat.all()
q.filter('bytes >', 1024 * 1024 * 1024 * 1024)
for kind_prop in q:
    logging.info(
        'Large property detected: %s:%s total size %d',
        kind_prop.kind_name, kind_prop.property_name,
        kind_prop.bytes)

Every statistic entity has a count property, a bytes property, and a timestamp property. count and bytes represent the total count and total size of the unit represented by the entity. As above, the statistic for a kind has a bytes property equal to the total amount of storage used by entities of the kind for properties and indexes. The timestamp property is the last time the statistic entity was updated. Statistic entity kinds have additional properties specific to the kind.

The __Stat_Total__ kind (represented in Python by the GlobalStat class) represents the grand total for the entire app. The count and bytes properties represent the number of all entities, and the total size of all entities and indexes. These numbers are broken down further in several properties: entity_bytes is the storage for just the entities (not indexes), builtin_index_bytes and builtin_index_count are the total size and number of indexed properties in just the built-in indexes, and composite_index_bytes and composite_index_count are the same for just custom (composite) indexes. There is only one __Stat_Total__ entity for the app.
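Since there is only one such entity, fetching it is straightforward. A short sketch (statistics entities won’t exist until App Engine first computes them, so the result may be None in a brand-new app):

import logging

from google.appengine.ext.db import stats

# ...

global_stat = stats.GlobalStat.all().get()
if global_stat:
    logging.info(
        'Total storage: %d bytes (%d entity bytes)',
        global_stat.bytes, global_stat.entity_bytes)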

The __Stat_Kind__ kind (KindStat) represents statistics for each datastore kind individually, as they existed at the time the statistics were last updated. There is one of these statistic entities for each kind. The kind_name property is set to the kind name, so you can query for a specific kind’s statistics, or you can iterate over all kinds to determine which kinds there are. These entities have the same statistic properties as __Stat_Total__.

The __Stat_PropertyName_Kind__ kind (KindPropertyNameStat) represents each named property of each kind. The property_name and kind_name properties identify the property and kind for the statistic. The statistic properties are count, bytes, entity_bytes, builtin_index_bytes, and builtin_index_count, defined as above.

For a complete list of the statistics entity kinds, see the official App Engine website.

Querying Metadata

The datastore always knows which namespaces, kinds, and property names are in use by an application. Unlike statistics, this metadata is available immediately. Querying this metadata can be slower than querying a normal entity, but the results reflect the current state of the data.

Each namespace has an entity of the kind __namespace__. Each kind is a __kind__, and each property name (regardless of kind) is a __property__. These entities have no properties: all information is stored in the key name. For example, a __kind__ entity uses the kind name as its key name. (The full key is __kind__ / KindName.) A __property__ entity has both the kind name and the property name encoded in its key name.

This information is derived entirely from the built-in indexes. As such, only indexed properties have corresponding __property__ entities.

In Python, these all have ext.db model classes defined in the google.appengine.ext.db.metadata module. They are named Namespace, Kind, and Property. The classes include Python property methods for accessing names, as if they were datastore properties. The module also provides several convenience functions for common queries.

Here’s a simple example in Python that lists all the kinds for which there is an entity, using a convenience function to get the list of kind names:

import logging

from google.appengine.ext.db import metadata

# ...

kinds = metadata.get_kinds()
for k in kinds:
    logging.info('Found a datastore kind: %s', k)
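The other convenience functions follow the same pattern. For example, this sketch uses the module’s get_properties_of_kind() function to list the indexed property names of a hypothetical Book kind:

import logging

from google.appengine.ext.db import metadata

# ...

# Only indexed properties appear here, since this metadata is
# derived from the built-in indexes.
for prop_name in metadata.get_properties_of_kind('Book'):
    logging.info('Book has an indexed property: %s', prop_name)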

Index Status and Queries

The Datastore Indexes panel of the Administration Console reports on the indexes configured for the app, and the serving status of each. The app can get this same information by using the datastore API. A Python app can also ask the datastore which index was used to resolve a query, after the query has been executed.

In Python, you ask for the state of indexes by calling the get_indexes() function of the google.appengine.ext.db module. This function returns a list of tuples, each representing an index. Each tuple contains an index object and a state value. The index object has the methods kind(), has_ancestor(), and properties(), representing the latest uploaded index configuration. The state value is one of several constants representing the index build states: db.Index.BUILDING, db.Index.SERVING, db.Index.DELETING, or db.Index.ERROR:

import logging

from google.appengine.ext import db

# ...

for index, state in db.get_indexes():
    if state != db.Index.SERVING:
        kind = index.kind()
        ancestor_str = ' (ancestor)' if index.has_ancestor() else ''
        index_props = []
        for name, dir in index.properties():
            dir_str = 'ASC' if dir == db.Index.ASCENDING else 'DESC'
            index_props.append(name + ' ' + dir_str)
        index_property_spec = ', '.join(index_props)
        index_spec = '%s%s %s' % (kind, ancestor_str, index_property_spec)
        logging.info('Index is not serving: %s', index_spec)

A Python Query (or GqlQuery) instance has an index_list() method. This returns a list of index objects representing the indexes used to resolve the query. You must execute the query before calling this method.
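Here’s a brief sketch of how that might look; the Book query is a hypothetical example:

import logging

from google.appengine.ext import db

# ...

query = db.GqlQuery(
    "SELECT * FROM Book WHERE author = :1 ORDER BY pub_date DESC",
    'John Steinbeck')
results = query.fetch(20)   # the query must execute before index_list()

for index in query.index_list():
    logging.info('Query used an index on kind: %s', index.kind())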

In Java, the datastore service method getIndexes() returns a Map<Index, Index.IndexState>. An Index has accessor methods getKind(), isAncestor(), and getProperties(). getProperties() returns a List<Index.Property>, where each Index.Property provides getName() and getDirection() (a Query.SortDirection). The index state is one of Index.IndexState.BUILDING, Index.IndexState.SERVING, Index.IndexState.DELETING, or Index.IndexState.ERROR:

import java.io.IOException;
import java.util.Map;
import java.util.logging.Logger;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Index;
import com.google.appengine.api.datastore.Query;

public class MyServlet extends HttpServlet {
    private static final Logger log =
        Logger.getLogger(MyServlet.class.getName());

    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // ...

        DatastoreService datastore =
            DatastoreServiceFactory.getDatastoreService();
        Map<Index, Index.IndexState> indexes = datastore.getIndexes();
        for (Index index : indexes.keySet()) {
            if (indexes.get(index) != Index.IndexState.SERVING) {
                StringBuffer indexPropertySpec = new StringBuffer();
                for (Index.Property prop : index.getProperties()) {
                    indexPropertySpec.append(prop.getName());
                    indexPropertySpec.append(
                        prop.getDirection() == Query.SortDirection.ASCENDING ?
                        " ASC " : " DESC ");
                }
                log.info(
                    "Index is not serving: " +
                    index.getKind() +
                    (index.isAncestor() ? " (ancestor) " : " ") +
                    indexPropertySpec.toString());
            }
        }
    }
}

There is not currently a way to inspect the indexes used by a query in the Java API.

Entity Group Versions

In Chapter 7, we described the datastore as using multiversioned optimistic concurrency control, with the entity group as the unit of transactionality. Each time any entity in an entity group is updated, the datastore creates a new version of the entity group. If any process reads an entity in the entity group before the new version is fully stored, the process simply sees the earlier version.

Each of these versions gets an ID number, and these numbers increase strictly monotonically. You can use the metadata API to get the entity group version number for an entity.

In Python, this is the get_entity_group_version() function in the google.appengine.ext.db.metadata module. It takes an ext.db model instance or db.Key as an argument, and returns an integer, or None if the given entity group doesn’t exist:

from google.appengine.ext import db
from google.appengine.ext.db import metadata

class MyKind(db.Expando):
    pass

# ...

# Write to an entity group, and get its version number.
parent = MyKind()
parent.put()
version = metadata.get_entity_group_version(parent)

# Update the entity group by creating a child entity.
child = MyKind(parent=parent)
child.put()

# The version number of the entire group has been incremented.
version2 = metadata.get_entity_group_version(parent)

In Java, you get this information by fetching a fake entity with a specific key. You get the entity group key by calling the static method Entities.createEntityGroupKey(), passing it the Key of an entity in the group. The Entity that corresponds to the entity group key has a __version__ property with an integer value:

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entities;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;

// ...

DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

// Write to an entity group, and get its version number.
// (datastore.get() throws EntityNotFoundException, which the
// enclosing method must declare or catch.)
Entity parent = new Entity("MyKind");
datastore.put(parent);
Key groupKey = Entities.createEntityGroupKey(parent.getKey());
Long version = (Long) datastore.get(groupKey).getProperty("__version__");

// Update the entity group by creating a child entity.
Entity child = new Entity("MyKind", parent);
datastore.put(child);

// The version number of the entire group has been incremented.
Long version2 = (Long) datastore.get(groupKey).getProperty("__version__");

Remote Controls

One of the nice features of a relational database running in a typical hosting environment is the ability to connect directly to the database to perform queries and updates on a SQL command line or to run small administrative scripts. App Engine has a facility for doing something similar, and it works for more than just the datastore: you can call any live service on behalf of your application using tools running on your computer. The tools do this using a remote proxy API.

The proxy API is a request handler that you install in your app. It is restricted to administrators. You run a client tool that authenticates as an administrator, connects to the request handler, and issues service calls over the connection. The proxy performs the calls and returns the results.

App Engine includes versions of the proxy handler for Python and for Java. The client library and several tools that use it are implemented in Python only, but they can be used with Java apps. If you’re primarily a Java developer, you will be installing the Python SDK to get the tools and client library.

The remote shell tool opens a Python command prompt, with the App Engine Python service libraries modified to use the remote API. You type Python statements as they would appear in app code, and all calls to App Engine services are routed to the live app automatically. This is especially useful in conjunction with Python apps, where you can import your app’s own data models and request handler modules, and do interactive testing or data manipulation. You can also write your own tools in Python using the remote API, for repeated tasks.

WARNING

The remote API is clever and useful, but it’s also slow: every service call is going over the network from your local computer to the app, then back. It is not suitable for running large jobs over arbitrary amounts of data. For large data transformation jobs, you’re better off building something that runs within the app, using task queues.

Let’s take a look at how to set up the proxy for Python and Java, how to use the remote Python shell, and how to write a Python tool that calls the API.

Setting Up the Remote API for Python

The Python remote API request handler is included in the runtime environment. To set it up, you activate a built-in in app.yaml, like so:

builtins:
- remote_api: on

This establishes a web service endpoint at the URL /_ah/remote_api. Only clients authenticated using application administrator accounts can use this endpoint.

You can test this URL in a browser using the development server. Visit the URL (such as http://localhost:8080/_ah/remote_api), and make sure it redirects to the fake authentication form. Check the box to sign in as an administrator, and click Submit. You should see this message:

This request did not contain a necessary header.

The remote API expects an HTTP header identifying the remote API protocol version to use, which the browser does not provide. But this is sufficient to test that the handler is configured correctly.

Setting Up the Remote API for Java

To use the remote API tools with a Java application, you set up a URL path with a servlet provided by the SDK, namely com.google.apphosting.utils.remoteapi.RemoteApiServlet. You can choose any URL path; you will give this path to the remote API tools in a command-line argument. Be sure to restrict access to the URL path to administrators.

The following excerpt for your deployment descriptor (web.xml) associates the remote API servlet with the URL path /_ah/remote_api, and restricts it to administrator accounts:

<servlet>
  <servlet-name>remoteapi</servlet-name>
  <servlet-class>
    com.google.apphosting.utils.remoteapi.RemoteApiServlet
  </servlet-class>
</servlet>
<servlet-mapping>
  <servlet-name>remoteapi</servlet-name>
  <url-pattern>/_ah/remote_api</url-pattern>
</servlet-mapping>
<security-constraint>
  <web-resource-collection>
    <web-resource-name>remoteapi</web-resource-name>
    <url-pattern>/_ah/remote_api</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>admin</role-name>
  </auth-constraint>
</security-constraint>

Using the Remote Shell Tool

With the remote API handler installed, you can use a tool included with the Python SDK to manipulate a live application’s services from an interactive Python shell. You interact with the shell by using Python statements and the Python service APIs. This tool works with both Java and Python applications by using the remote API handler.

To start a shell session, run the remote_api_shell.py command. As with the other Python SDK commands, this command may already be in your command path:

remote_api_shell.py app-id

The tool prompts for your developer account email address and password. (Only registered developers for the app can run this tool, or any of the remote API tools.)

By default, the tool connects to the application via the domain name app-id.appspot.com, and assumes the remote API handler is installed with the URL path /_ah/remote_api. To use a different URL path, provide the path as an argument after the application ID:

remote_api_shell.py app-id /admin/util/remote_api

To use a different domain name, such as to use a specific application version, or to test the tool with the development server, give the domain name with the -s argument:

remote_api_shell.py -s dev.app-id.appspot.com app-id

The shell can use any service API that is supported by the remote API handler. This includes URL Fetch, memcache, Images, Mail, Google Accounts, and of course the datastore. (As of this writing, XMPP is not supported by the remote API handler.) Several of the API modules are imported by default for easy access.

The tool does not add the current working directory to the module load path by default, nor does it know about your application directory. You may need to adjust the load path (sys.path) to import your app’s classes, such as your data models.

Here is an example of a short shell session:

% remote_api_shell.py clock
Email: juliet@example.com
Password:
App Engine remote_api shell
Python 2.5.1 (r251:54863, Feb 6 2009, 19:02:12)
[GCC 4.0.1 (Apple Inc. build 5465)]
The db, users, urlfetch, and memcache modules are imported.
clock> import os.path
clock> import sys
clock> sys.path.append(os.path.realpath('.'))
clock> import models
clock> books = models.Book.all().fetch(6)
clock> books
[<models.Book object at 0x7a2c30>, <models.Book object at 0x7a2bf0>,
 <models.Book object at 0x7a2cd0>, <models.Book object at 0x7a2cb0>,
 <models.Book object at 0x7a2d30>, <models.Book object at 0x7a2c90>]
clock> books[0].title
u'The Grapes of Wrath'
clock> from google.appengine.api import mail
clock> mail.send_mail('juliet@example.com', 'test@example.com',
           'Test email', 'This is a test message.')
clock>

To exit the shell, press Ctrl-D.

Using the Remote API from a Script

You can call the remote API directly from your own Python scripts by using a library from the Python SDK. This configures the Python API to use the remote API handler for your application for all service calls, so you can use the service APIs as you would from a request handler directly in your scripts.

Here’s a simple example script that prompts for a developer account email address and password, then accesses the datastore of a live application:

#!/usr/bin/python

import getpass
import os.path
import sys

# Add the Python SDK to the package path.
# Adjust these paths accordingly.
sys.path.append(os.path.expanduser('~/google_appengine'))
sys.path.append(os.path.expanduser('~/google_appengine/lib/yaml/lib'))

from google.appengine.ext.remote_api import remote_api_stub
from google.appengine.ext import db

import models

# Your app ID and remote API URL path go here.
APP_ID = 'app_id'
REMOTE_API_PATH = '/_ah/remote_api'

def auth_func():
    email_address = raw_input('Email address: ')
    password = getpass.getpass('Password: ')
    return email_address, password

def initialize_remote_api(app_id=APP_ID,
                          path=REMOTE_API_PATH):
    remote_api_stub.ConfigureRemoteApi(
        app_id,
        path,
        auth_func)
    remote_api_stub.MaybeInvokeAuthentication()

def main(args):
    initialize_remote_api()

    books = models.Book.all().fetch(10)
    for book in books:
        print book.title

    return 0

if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))

The ConfigureRemoteApi() function (yes, it has a TitleCase name) sets up the remote API access. It takes as arguments the application ID, the remote API handler URL path, and a callable that returns a tuple containing the email address and password to use when connecting. In this example, we define a function that prompts for the email address and password, and pass the function to ConfigureRemoteApi().

The function also accepts an optional fourth argument specifying an alternate domain name for the connection. By default, it uses app-id.appspot.com, where app-id is the application ID in the first argument.
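For example, to connect to the version-specific domain used with the shell tool earlier (a sketch; the domain name is a placeholder):

remote_api_stub.ConfigureRemoteApi(
    APP_ID,
    REMOTE_API_PATH,
    auth_func,
    'dev.app-id.appspot.com')   # alternate domain, the fourth argument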

The MaybeInvokeAuthentication() function sends an empty request to verify that the email address and password are correct, and raises an exception if they are not. (Without this, the script would wait until the first remote call to verify the authentication.)

Remember that every call to an App Engine library that performs a service call does so over the network via an HTTP request to the application. This is inevitably slower than running within the live application. It also consumes application resources like web requests do, including bandwidth and request counts, which are not normally consumed by service calls in the live app.

On the plus side, since your code runs on your local computer, it is not constrained by the App Engine runtime sandbox or the 30-second request deadline. You can run long jobs and interactive applications on your computer without restriction, using any Python modules you like—at the expense of consuming app resources to marshal service calls over HTTP.