Datastore Queries - Programming Google App Engine (2012)

Programming Google App Engine

Chapter 6. Datastore Queries

Inevitably, an application that manages data must do more than store and retrieve that data one record at a time. It must also answer questions about that data: which records meet certain criteria, how records compare to one another, what a set of records represents in aggregate. Web applications in particular are expected not only to know the answers to questions about large amounts of data, but to provide them quickly in response to web requests.

Most database systems provide a mechanism for executing queries, and the App Engine datastore is no exception. But App Engine’s technique differs significantly from that of traditional database systems. When the application asks a question, instead of rifling through the original records and performing calculations to determine the answer, App Engine simply finds the answer in a list of possible answers prepared in advance. App Engine can do this because it knows which questions are going to be asked.

This kind of list, or index, is common to many database technologies, and some relational databases can be told to maintain a limited set of indexes to speed up some kinds of queries. But App Engine is different: it maintains an index for every query the application is going to perform. Since the datastore only needs to do a simple scan of an index for every query, the application gets results back quickly. And for large amounts of data, App Engine can spread the data and the indexes across many machines, and get results back from all of them without an expensive aggregate operation.

This indexing strategy has significant drawbacks. The datastore’s built-in query engine is downright weak compared to some relational databases, and is not suited to sophisticated data processing applications that would prefer slow but powerful runtime queries to fast simple ones. But most web applications need fast results, and the dirty secret about those powerful query engines is that they can’t perform at web speeds with large amounts of data distributed across many machines. App Engine uses a model suited to scalable web applications: calculate the answers to known questions when the data is written, so reading is fast.

In this chapter, we explain how queries and indexes work, how the developer tools help you configure indexes automatically, and how to manage indexes as your application evolves. We also discuss several powerful features of the query engine, including cursors and projection queries. By understanding indexes, you will have an intuition for how to design your application and your data to make the most of the scalable datastore.

Queries and Kinds

You’ve seen how to retrieve an entity from the datastore given its key. But in most cases, the application does not know the keys of the entities it needs; it has only a general idea that it needs entities that meet certain criteria. For example, a leaderboard for the game app would need to retrieve the 10 Player entities with the highest score property values.

To retrieve entities this way, the app performs a query. A query includes:

§ The kind of the entities to query

§ Zero or more filters, criteria that property values must meet for an entity to be returned by the query

§ Zero or more sort orders that determine the order in which results are returned based on property values

A query based on property values can only return entities of a single kind. This is the primary purpose of kinds: to determine which entities are considered together as possible results for a query. In practice, kinds correspond to the intuitive notion that each entity of the same nominal kind represents the same kind of data. But unlike other database systems, it’s up to the app to enforce this consistency if it is desired, and the app can diverge from it if it’s useful.

It is also possible to perform a limited set of queries on entities regardless of kind. Kindless queries can use a filter on the ID or key name, or on ancestors. We’ll discuss ancestors and kindless queries in Chapter 7.

Query Results and Keys

When retrieving results for an entity query, the datastore returns the full entity for each result to the application.

For large entities, this may mean more data is transmitted between the datastore and the app than is needed. It’s possible to fetch only a subset of properties under certain circumstances (we’ll see this in Projection Queries), but these are not full entities. Another option is to store frequently accessed properties on one entity, and less popular properties on a separate related entity. The first entity stores the key of the second entity as a property, so when the app needs the less popular data, it queries for the first entity, then follows the reference to the second entity.

The datastore can return just the keys for the entities that match your query instead of fetching the full entities. A keys-only query is useful for determining the entities that match query criteria separately from when the entities are used. Keys can be remembered in the memcache or in a datastore property, and vivified as full entities when needed. We’ll look at keys-only queries when we discuss each query API.

GQL

The Python and Java runtime environments provide several ways to formulate queries. They all do the same thing: they ask the datastore to return the entities whose keys and properties meet the given filter criteria, returned in an order determined by sorting the given properties.

One way to formulate a query is with GQL. GQL is a text-based query language that is intended to resemble SQL, the query language of relational databases. It supports only the features of the datastore’s query engine, and therefore lacks many features common to SQL. But it is expressive and concise enough to be useful for datastore queries.

Say we have entities of the kind Player representing players in an online role-playing game. The following GQL query retrieves all Player entities whose “level” property is an integer between 5 and 20, sorted by level in ascending order, then by score in descending order:

SELECT * FROM Player

WHERE level >= 5

AND level <= 20

ORDER BY level ASC, score DESC

You can use GQL to browse the contents of the datastore of your live application by using the Administration Console. To do this, select Data Viewer from the sidebar. You can browse entities by kind, or enter a GQL query. You can also create new entities through the Console, although new entities are limited to the kinds, properties, and value types of existing entities. You can also edit the properties of an entity by clicking on its ID. Figure 6-1 shows the results of running a GQL query in the Console.

The Data Viewer panel of the Administration Console

Figure 6-1. The Data Viewer panel of the Administration Console

NOTE

You can only perform a query in the Administration Console if the query does not require a custom index, or if the app already has the requisite index for the query. We discuss indexes in depth later in this chapter.

Python apps can also perform queries using GQL from code. We’ll look at this API in a moment. The Java datastore API does not yet support GQL. Java apps using JDO or JPA have other textual query languages at their disposal.

The first part of the SQL-like syntax in the preceding GQL query, SELECT * FROM Player, says that the datastore should return complete entities as results, and that it should only consider entities of the kind Player. This is the most common kind of query. Unlike a SQL database, the datastore cannot “join” properties from entities of multiple kinds into a single set of results. Only one kind can appear after the FROM.

Despite its similarity with SQL, GQL can only represent queries, and cannot perform updates, inserts, or deletes. In other words, every GQL query begins with SELECT.

The rest of the query syntax also resembles SQL, and translates directly to filters and sort orders. The WHERE clause, if present, represents one or more filter conditions, separated by AND. The ORDER BY clause, if present, represents one or more sort orders, separated by commas, applied from left to right.

Each condition in a WHERE clause consists of the name of a property, a comparison operator, and a value. You can also specify a condition that matches an entity’s ID or key name by using the reserved name __key__ like a property name. (That’s two underscores, the word “key,” and two underscores.)

The datastore query engine supports five comparison operators for filters: =, <, <=, >, and >=. GQL also supports two additional operators: !=, meaning “not equal to,” and IN, which tests that the value equals any in a set of values.

For IN, the values are represented by a comma-delimited list of values surrounded by parentheses:

SELECT * FROM Player WHERE level IN (5, 6, 7)

Internally, GQL translates the != and IN operators into multiple datastore queries that have the same effect. If a query contains an IN clause, the query is evaluated once for each value in the clause, using an = filter. The results of all the queries are aggregated. The != operator performs the query once using a < filter and once using a >, then aggregates the results. Using these operators more than once in a GQL statement requires a query for each combination of the required queries, so be judicious in their use.

The WHERE clause is equivalent to one or more filters. It is not like SQL’s WHERE clause, and does not support arbitrary logical expressions. In particular, it does not support testing the logical-OR of two conditions.

The value on the righthand side of a condition can be a literal value that appears inside the query string. Seven of the datastore value types have string literal representations, as shown in Table 6-1.

Table 6-1. GQL value literals for datastore types

Type

Literal syntax

Examples

String

Single-quoted string; escape the quote by doubling it

'Haven''t You Heard'

Integer or float

Sign, digits; float uses decimal point

-7 3.14

Boolean

True or false keywords

TRUE FALSE

Date-time, date or time

Type, and value as numbers or a string

DATETIME(year, month, day, hour, minute, second)

DATETIME('YYYY-MM-DD HH:MM:SS')

DATE(year, month, day)

DATE('YYYY-MM-DD')

TIME(hour, minute, second)

TIME('HH:MM:SS')

DATETIME(1999, 12, 31, 23, 59, 59)

DATETIME('1999-12-31 23:59:59')

DATE(1999, 12, 31)

DATE('1999-12-31')

TIME(23, 59, 59)

TIME('23:59:59')

Entity key

Entity kind, and name or ID; can be a path

KEY('kind', 'name'/id)

KEY('kind', 'name'/id, 'kind', 'name'/id, ...)

KEY('Player', 1287)

User object

User('email-address')

User('edward@example.com')

GeoPt object

GEOPT(lat, long)

GEOPT(37.4219, -122.0846)

The type of a filter’s value is significant in a datastore query. If you use the wrong type for a value, such as by specifying a 'string_key' when you meant to specify a KEY('kind', 'key_name'), your query may not return the results you intend, or may unexpectedly return no results.

A GQL query can specify a LIMIT, a maximum number of results to return. A query can also specify an OFFSET, a number of results to skip before returning the remaining results. The following example returns the third, fourth, and fifth results, and may return fewer if the query has fewer than five results:

SELECT * FROM Player LIMIT 3 OFFSET 2

GQL keywords, shown here in uppercase as is traditional with SQL, are case-insensitive. Kind and property names, however, are case-sensitive.

The Python Query API

Recall that the Python datastore API represents entities using objects of classes named after kinds. Example 6-1 shows Python code that creates three Player entities for an online role-playing game.

Example 6-1. Python code to create several entities of the kind Player

from google.appengine.ext import db

import datetime

class Player(db.Expando):

pass

player1 = Player(name='wizard612',

level=1,

score=32,

charclass='mage',

create_date=datetime.datetime.now()

player1.put()

player2 = Player(name='druidjane',

level=10,

score=896,

charclass='druid',

create_date=datetime.datetime.now()

player2.put()

player3 = Player(name='TheHulk',

level=7,

score=500,

charclass='warrior',

create_date=datetime.datetime.now()

player3.put()

Once again, we’ll use Expando to keep the examples simple. As we start talking about queries, the importance of using a consistent layout, or schema, for entities of a kind will become apparent.

The Python API provides two ways to formulate queries, one using an object-oriented interface and one based on GQL.

The Query Class

The first way to formulate a query is with an instance of the Query class. A Query object can be constructed in one of two ways, with equivalent results:

q = db.Query(Player)

q = Player.all()

In both cases, q is assigned a new Query instance that represents all entities of the kind Player. The query is not executed right away; right now it’s just a question waiting to be asked. Without filters or sort orders, the object represents a query for all objects of the given kind.

To apply a filter to the query, you call the filter() method on the Query object. It takes two arguments. The first argument is a string containing the name of a property, a space character, and a comparison operator. The second argument is the value for the comparison, of a type appropriate for the situation:

q.filter('level >', 5)

You specify multiple filters by calling the filter() method multiple times. An entity must meet all filter criteria in order to be a result for the query. That is, filters have a logical-AND relationship with one another:

q.filter('level >', 5)

q.filter('level <', 20)

For convenience, the filter() method returns the Query object, so you can chain multiple calls to filter() in a single line:

q.filter('level >', 5).filter('level <', 20)

The filter() method supports the equality (=) operator, and the four inequality operators (<, <=, >, and >=). The Query class also supports the != (not-equal) and IN operators from GQL, and does so in the same way, using multiple datastore queries and aggregating the results.

To apply a sort order, you call the order() method. This method takes one argument, a string containing the name of a property to sort by. If the string begins with a hyphen character (-), the sort will be in descending order; otherwise, the sort will be in ascending order. Like filter(),order() returns the Query object, for chaining:

q.order('-score')

The datastore can sort query results by multiple properties. First, it sorts by the property and order from the first call to order(), then it sorts entities with equal values for the first property, using the second order, and so on. For example, the following sorts first by level ascending, then byscore descending:

q.order('level').order('-score')

The query engine supports limiting the number of results returned, and skipping ahead a number of results. You specify the limit and offset when you retrieve the results, described later.

GQL in Python

The Python environment includes a rich API for preparing queries, using GQL. In addition to the pure textual syntax, the Python API supports additional syntax for parameterized substitution of filter values.

To use GQL from Python, you instantiate an instance of the GqlQuery class with the text of the query:

q = db.GqlQuery("""SELECT * FROM Player

WHERE level > 5

AND level < 20

ORDER BY level ASC, score DESC""")

You can also instantiate a GqlQuery from the kind class directly, using its gql() class method. When using this method, omit the SELECT * FROM Kind from the string, since this is implied by the use of the method:

q = Player.gql("""WHERE level > 5

AND level < 20

ORDER BY level ASC, score DESC""")

You can specify values for conditions by using parameter substitution. With parameter substitution, the query string contains a placeholder with either a number or a name, and the actual value is passed to GqlQuery() or gql() as either a positional argument or a keyword argument. The number or name appears in the query string preceded by a colon (:1 or :argname):

q = db.GqlQuery("""SELECT * FROM Player

WHERE level > :1

AND level < :2""",

5, 20)

q = db.GqlQuery("""SELECT * FROM Player

WHERE level > :min_level

AND level < :max_level""",

min_level=5, max_level=20)

One advantage to parameter substitution is that each argument value is of the appropriate datastore type, so you don’t need to bother with the string syntax for specifying typed values. For the datastore types that do not have string literal syntax (mostly the Google Data value types mentioned in Table 5-1), parameter substitution is the only way to use values of those types in a GQL query.

You can rebind new values to a GQL query by using parameter substitution after the GqlQuery object has been instantiated using the bind() method. This means you can reuse the GqlQuery object for multiple queries that have the same structure but different values. This saves time because the query string only needs to be parsed once. The bind() method takes the new values as either positional arguments or keyword arguments.

You can save more time by caching parameterized GqlQuery objects in global variables. This way, the query string is parsed only when a server loads the application for the first time, and the application reuses the object for all subsequent calls handled by that server:

_LEADERBOARD_QUERY = db.GqlQuery(

"""SELECT * FROM Player

WHERE level > :min_level

AND level < :max_level

ORDER BY level ASC, score DESC""")

class LeaderboardHandler(webapp.RequestHandler):

def get(self):

_LEADERBOARD_QUERY.bind(min_level=5, max_level=20)

# ...

Retrieving Results

Once you have a Query or GqlQuery object configured with filters, sort orders, and value bindings (in the case of GqlQuery), you can execute the query, using one of several methods. The query is not executed until you call one of these methods, and you can call these methods repeatedly on the same query object to re-execute the query and get new results.

The fetch() method returns a number of results, up to a specified limit. fetch() returns a list of entity objects, which are instances of the kind class:

q = db.Query(Player).order('-score')

results = q.fetch(10)

When serving web requests, it’s always good to limit the amount of data the datastore might return, so an unexpectedly large result set doesn’t cause the request handler to exceed its deadline. The fetch() method requires an argument specifying the number of results to fetch (the limitargument).

The fetch() method also accepts an optional offset parameter. If provided, the method skips that many results, then returns subsequent results, up to the specified limit:

q = db.Query(Player).order('-score')

results = q.fetch(10, offset=20)

For a GqlQuery, the fetch result limit and offset are equivalent to the LIMIT and OFFSET that can be specified in the query string itself. The arguments to fetch() override those specified in the GQL statement. The limit argument to fetch() is required, so it always overrides a GQLLIMIT clause.

NOTE

In order to perform a fetch with an offset, the datastore must find the first result for the query and then scan down the index to the offset. The amount of time this takes is proportional to the size of the offset, and may not be suitable for large offsets.

If you just want to retrieve results in batches, such as for a paginated display, there’s a better way: query cursors. See Query Cursors.

It’s common for an app to perform a query expecting to get just one result, or nothing. You can do this with fetch(1), and then test whether the list it returns is empty. For convenience, the query object also provides a get() method, which returns either the first result, or the Python valueNone if the query did not return any results:

q = db.Query(Player).filter('name =', 'django97')

player = q.get()

if player:

# ...

The count() method executes the query, then returns the number of results that would be returned instead of the results themselves. count() must perform the query to count the results in the index, and so takes time proportional to what it is counting. count() accepts a limit as a parameter, and providing one causes the method to return immediately once that many results have been counted:

q = db.Query(Player).filter('level >', 10)

if q.count(100) == 100:

# 100 or more players are above level 10.

TIP

To test whether a query would return a result, without retrieving any results:

if q.count(1) == 1:

# The query has at least one result.

Query objects provide one other mechanism for retrieving results, and it’s quite powerful. In the Python API, query objects are iterable. If you use the object in a context that accepts iterables, such as in a for loop, the object executes the query, then provides a Python standard iterator that returns each result one at a time, starting with the first result:

q = db.Query(Player).order('-score')

for player in q:

# ...

As the app uses the iterator, the iterator fetches the results in small batches. The iterator will not stop until it reaches the last result. It’s up to the app to stop requesting results from the iterator (to break from the for loop) when it has had enough.

Treating the query object as an iterable uses default options for the query. If you’d like to specify options but still use the iterable interface, call the run() method. This method takes the same options as fetch(), but returns the batch-loading iterator instead of the complete list of results.

Unlike fetch(), the limit argument is optional for run(). But it’s useful! With the limit argument, results are fetched in batches with the iterator interface, but the query knows to stop fetching after reaching the limit:

for player in q.run(limit=100):

# ...

This is especially useful for passing the iterator to a template, which may not have a way to stop the iteration on its own.

You can adjust the size of the batches fetched by the iterator interface by passing the batch_size argument to the run() method. The default is 20 results per batch.

TIP

When your application calls the fetch() method, your application must wait until all the results are retrieved before continuing. When using the iterator interface, results are retrieved in the background, as needed. This can make the overall running time of your request handler faster, because the datastore is working simultaneously with your app.

When using the query object as an iterator directly, the first call to the datastore does not happen until the first time the iterator is accessed, and the app must wait for the first batch of results before continuing. You can cause this first call to happen in the background by calling the run() method before using the iterator it returns:

player_iter = q.run()

# ... Do other things while the first batch is being fetched...

for player in player_iter:

# ...

Keys-Only Queries

Instead of returning complete entities, the datastore can return just the keys of the entities that match the query. This can be useful in cases when a component only needs to know which entities are results, and doesn’t need the results themselves. The component can store the keys for later retrieval of the entities, or pass the keys to other components, or perform further operations on the key list before fetching entities.

In the Python query API, you can query for keys by using either the Query interface or the GqlQuery interface. To request just the keys with Query, provide the keys_only argument set to True to the Query constructor:

q = db.Query(Player, keys_only=True)

To specify a keys-only query with GQL, begin the GQL query with SELECT __key__ (that’s two underscores, the word key, and two more underscores) instead of SELECT *:

q = db.GqlQuery('SELECT __key__ FROM Player')

When performing a keys-only query, each result returned by the query object is a Key object instead of an instance of the model class:

q = db.Query(Player, keys_only=True)

for result_key in q:

# result_key is a Key...

There is no way to perform keys-only queries by using the Model class methods all() and gql(). This makes sense: the return values of the Model methods represent collections of instances of the model class, not keys.

The Java Query API

If you are using the JPA interface or the JDO interface, you will use the query facilities of those interfaces to perform datastore queries: JPQL or JDOQL, respectively. The concepts of those interfaces map nicely to the concepts of datastore queries: a query has a kind (a class), filters, and sort orders. We’ll look at the calling conventions for JPQL when we look at JPA in Chapter 10.

Naturally, the low-level Java datastore API includes a query interface as well, and it has more features. Here is a brief example:

import com.google.appengine.api.datastore.DatastoreService;

import com.google.appengine.api.datastore.DatastoreServiceFactory;

import com.google.appengine.api.datastore.Entity;

import com.google.appengine.api.datastore.PreparedQuery;

import com.google.appengine.api.datastore.Query;

// ...

DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

Query q = new Query("Book");

q.setFilter(

new Query.FilterPredicate(

"copyrightYear",

Query.FilterOperator.LESS_THAN_OR_EQUAL,

1950);

q.addSort("title");

PreparedQuery pq = ds.prepare(q);

for (Entity result : pq.asIterable() {

String title = (String) result.getProperty("title");

// ...

}

To perform a query, you instantiate the Query class (from the com.google.appengine.api.datastore package), providing the name of the kind of the entities to query as an argument to the constructor. You call methods on the query object to add filters and sort orders. To perform the query, you pass the query object to a method of the DatastoreService instance. This method returns a PreparedQuery object, which you can manipulate to retrieve the results.

Let’s take a closer look at building queries, and then fetching results.

Building the Query

You start a query by constructing a Query instance. To query entities of a kind, you provide the kind name as a constructor argument, as a string:

Query q = new Query("Book");

Without calling any additional methods, this represents a query for all entities of the kind "Book".

You can tell the query to filter the results by calling the setFilter() method. A query has one filter, which can have one or more predicates. A predicate is a single property name, a comparison operator, and a value. If your query filter only has one predicate, you can construct aQuery.FilterPredicate, and pass it directly to setFilter():

q.setFilter(

new Query.FilterPredicate(

"copyrightYear",

Query.FilterOperator.LESS_THAN_OR_EQUAL,

1950);

Filter predicate operators include LESS_THAN, LESS_THAN_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_THAN_OR_EQUAL, GREATER_THAN, and IN.

To build a filter with more than one predicate, you use a Query.CompositeFilter. A composite filter is a collection of filters united by a logical operator, either “and” or “or.” Each filter in the collection can be a filter predicate or another composite filter, allowing you to build complex filters.

The first argument to the Query.CompositeFilter constructor is the composite operator, either Query.CompositeFilterOperator.AND or Query.CompositeFilterOperator.OR. The second argument is a Collection of filters, each of which can be either aQuery.FilterPredicate or a Query.CompositeFilter instance. The verbose object-oriented syntax looks like this:

q.setFilter(

new Query.CompositeFilter(

Query.CompositeFilterOperator.AND,

new ArrayList<Query.Filter>(Arrays.asList(

new Query.FilterPredicate(

"copyrightYear",

Query.FilterOperator.LESS_THAN_OR_EQUAL,

1950),

new Query.FilterPredicate(

"category",

Query.FilterOperator.EQUAL,

"Science Fiction")));

That can get a little unwieldy, so there’s also a shortcut API using methods on the composite and predicate operator classes. Query.CompositeFilterOperator.and() and Query.CompositeFilterOperator.or() are static methods that take one or more filters and return a composite filter with the appropriate operator. Each filter predicate operator constant has an of() method that takes a property name and a value. The example above shortens to this:

q.setFilter(

Query.CompositeFilterOperator.and(

Query.FilterOperator.LESS_THAN_OR_EQUAL.of(

"copyrightYear", 1950),

Query.FilterOperator.EQUAL.of(

"category", "Science Fiction"));

TIP

The filter predicate operators IN or NOT_EQUAL and the composite operator OR are not supported by the datastore natively. They are implemented by performing multiple datastore queries and processing the results. While it’s possible to build arbitrarily complex query filters with the Java Query API, it’s best to limit the use of these operators. See Not-Equal and IN Filters.

You can specify that the results of a query be sorted by calling the addSort() method for each sort order. The method takes a property name, and an optional Query.SortDirection, either Query.SortDirection.ASCENDING or Query.SortDirection.DESCENDING. The default is ASCENDING:

q.addSort("title");

Fetching Results with PreparedQuery

The query is not actually performed until you attempt to access results, using the PreparedQuery object. If you access the results by using an iterator via the asIterable() or asIterator() methods, the act of iterating causes the API to fetch the results in batches. When these methods are called without arguments, the resulting iterator keeps going until all results for the query have been returned. If a query has a large number of results, this may take longer than the time allowed for the request.

The asIterable() and asIterator() methods accept an optional argument, a FetchOptions object, that controls which results are returned. Options can include an offset, a number of results to skip prior to returning any, and a limit, a maximum number of results to return.FetchOptions uses a builder-style interface, as follows:

import com.google.appengine.api.datastore.Entity;

import com.google.appengine.api.datastore.FetchOptions;

import com.google.appengine.api.datastore.PreparedQuery;

import com.google.appengine.api.datastore.Query;

// ...

// Query q = ...

PreparedQuery pq = ds.prepare(q);

Iterable<Entity> results =

pq.asIterable(FetchOptions.Builder.withLimit(10).offset(20);

for (Entity result : results) {

String title = (String) result.getProperty("title");

// ...

}

This tells the datastore to skip the first 20 results, and return up to the next 10 results (if any).

NOTE

In order to perform a fetch with an offset, the datastore must find the first result for the query and then scan down the index to the offset. The amount of time this takes is proportional to the size of the offset, and may not be suitable for large offsets.

If you just want to retrieve results in batches, such as for a paginated display, there’s a better way: query cursors. See Query Cursors.

Instead of fetching results in batches, you can get all results in a list by calling the asList() method of the PreparedQuery class. The method returns a List<Entity>. Unlike the iterator interface, which gets results in batches, this method retrieves all results with a single service call. The method requires that a limit be specified using FetchOptions.

If a query is likely to have only one result, or if only the first result is desired, calling the asSingleEntity() method retrieves the result and returns an Entity object, or null.

If you just want a count of the results and not the entities themselves, you can call the countEntities() method of the PreparedQuery. Because the datastore has to perform the query to get the count, the speed of this call is proportional to the count, although faster than actually fetching the results.

To test whether a query would return a result without retrieving the result, call countEntities() with a limit of 1:

if (pq.countEntities(FetchOptions.Builder.withLimit(1) == 1) {

// The query has at least one result.

}

This is the fastest possible such test: it finds the first place in the index where there might be a result for the query, then attempts to count one result, and reports whether that count was successful.

Keys-Only Queries in Java

You can fetch just the keys for the entities that match a query instead of the full entities by using the low-level Java datastore API. To declare that a query should return just the keys, call the setKeysOnly() method on the Query object:

Query q = new Query("Book");

q.setKeysOnly();

When a query is set to return only keys, the results of the query are Entity objects without any properties set. You can get the key from these objects by using the getKey() method:

PreparedQuery pq = ds.prepare(q);

for (Entity result : pq.asIterable() {

Key k = result.getKey();

// ...

}

You can also perform keys-only queries using the JDO and JPA interfaces. See Chapter 10.

Introducing Indexes

For every query an application performs, App Engine maintains an index, a single table of possible answers for the query. Specifically, it maintains an index for a set of queries that use the same filters and sort orders, possibly with different values for the filters. Consider the following simple query:

SELECT * FROM Player WHERE name = 'druidjane'

To perform this query, App Engine uses an index containing the keys of every Player entity and the value of each entity’s name property, sorted by the name property values in ascending order. Such an index is illustrated in Figure 6-2.

An index of Player entity keys and “name” property values, sorted by name in ascending order, with the result for WHERE name = ‘druidjane’

Figure 6-2. An index of Player entity keys and “name” property values, sorted by name in ascending order, with the result for WHERE name = ‘druidjane’

To find all entities that meet the conditions of the query, App Engine finds the first row in the index that matches, then it scans down to the first row that doesn’t match. It returns the entities mentioned on all rows in this range (not counting the nonmatching row), in the order they appear in the index. Because the index is sorted, all results for the query are guaranteed to be on consecutive rows in the table.

App Engine would use this same index to perform other queries with a similar structure but different values, such as the following query:

SELECT * FROM Player WHERE name = 'duran89'

This query mechanism is fast, even with a very large number of entities. Entities and indexes are distributed across multiple machines, and each machine scans its own index in parallel with the others. Each machine returns results to App Engine as it scans its own index, and App Engine delivers the final result set to the app, in order, as if all results were in one large index.

Another reason queries are fast has to do with how the datastore finds the first matching row. Because indexes are sorted, the datastore can use an efficient algorithm to find the first matching row. In the common case, finding the first row takes approximately the same amount of time regardless of the size of the index. In other words, the speed of a query is not affected by the size of the data set.

App Engine updates all relevant indexes when property values change. In this example, if an application retrieves a Player entity, changes the name, and then saves the entity with a call to the put() method, App Engine updates the appropriate row in the previous index. It also moves the row if necessary so the ordering of the index is preserved. The call to put() does not return until all appropriate indexes are updated.

Similarly, if the application creates a new Player entity with a name property, or deletes a Player entity with a name property, App Engine updates the index. In contrast, if the application updates a Player but does not change the name property, or creates or deletes a Player that does not have a name property, App Engine does not update the name index because no update is needed.

App Engine maintains two indexes like the previous example above for every property name and entity kind, one with the property values sorted in ascending order and one with values in descending order. App Engine also maintains an index of entities of each kind. These indexes satisfy some simple queries, and App Engine also uses them internally for bookkeeping purposes.

For other queries, you must tell App Engine which indexes to prepare. You do this using a configuration file, which gets uploaded along with your application’s code. For Python apps, this file is named index.yaml. For Java, this file is WEB-INF/datastore-indexes.xml.

It’d be a pain to write this file by hand, but thankfully you don’t have to. While you’re testing your application in the development web server from the SDK, when the app performs a datastore query, the server checks that the configuration file has an appropriate entry for the needed index. If it doesn’t find one, it adds one. As long as the app performs each of its queries at least once during testing, the resulting configuration file will be complete.

The index configuration file must be complete, because when the app is running on App Engine, if the application performs a query for which there is no index, the query returns an error. You can tell the development web server to behave similarly if you want to test for these error conditions. (How to do this depends on which SDK you are using; see Configuring Indexes.)

Indexes require a bit of discipline to maintain. Although the development tools can help add index configuration, they cannot know when an index is unused and can be deleted from the file. Extra indexes consume storage space and slow down updates of properties mentioned in the index. And while the version of the app you’re developing may not need a given index, the version of the app still running on App Engine may still need it. The App Engine SDK and the Administration Console include tools for inspecting and maintaining indexes. We’ll look at these tools inChapter 20.

Before we discuss index configuration, let’s look more closely at how indexes support queries. We just saw an example where the results for a simple query appear on consecutive rows in a simple index. In fact, this is how most queries work: the results for every query that would use an index appear on consecutive rows in the index. This is both surprisingly powerful in some ways and surprisingly limited in others, and it’s worth understanding why.

Automatic Indexes and Simple Queries

As we mentioned, App Engine maintains two indexes for every single property of every entity kind, one with values in ascending order and one with values in descending order. App Engine builds these indexes automatically, whether or not they are mentioned in the index configuration file. These automatic indexes satisfy the following kinds of queries using consecutive rows:

§ A simple query for all entities of a given kind, no filters or sort orders

§ One filter on a property using the equality (=) operator

§ Filters using greater-than or less-than operators (>, >=, <, <=) on a single property

§ One sort order, ascending or descending, and no filters, or with filters only on the same property used with the sort order

§ Filters or a sort order on the entity key

§ Kindless queries with or without key filters

Let’s look at each of these in action.

All Entities of a Kind

The simplest datastore query asks for every entity of a given kind, in any order. Stated in GQL, a query for all entities of the kind Player looks like this:

SELECT * FROM Player

App Engine maintains an index mapping kinds to entity keys. This index is sorted using a deterministic ordering for entity keys, so this query returns results in “key order.” The kind of an entity cannot be changed after it is created, so this index is updated only when entities are created and deleted.

Since a query can only refer to one kind at a time, you can imagine this index as simply a list of entity keys for each kind. Figure 6-3 illustrates an example of this index.

An index of all Player entity keys, with results for SELECT * FROM Player

Figure 6-3. An index of all Player entity keys, with results for SELECT * FROM Player

When the query results are fetched, App Engine uses the entity keys in the index to find the corresponding entities, and returns the full entities to the application.

One Equality Filter

Consider the following query, which asks for every Player entity with a level property with a value of the integer 10:

SELECT * FROM Player WHERE level = 10

This query uses an index of Player entities with the level property, ascending—one of the automatic indexes. It uses an efficient algorithm to find the first row with a level equal to 10. Then it scans down the index until it finds the first row with a level not equal to 10. The consecutive rows from the first matching to the last matching represent all the Player entities with a level property equal to the integer 10. This is illustrated in Figure 6-4.

An index of the Player entity “level” properties, sorted by level then by key, with results for WHERE level = 10

Figure 6-4. An index of the Player entity “level” properties, sorted by level then by key, with results for WHERE level = 10

Greater-Than and Less-Than Filters

The following query asks for every Player entity with a score property whose value is greater than the integer 500:

SELECT * FROM Player WHERE score > 500

This uses an index of Player entities with the score property, ascending, also an automatic index. As with the equality filter, it finds the first row in the index whose score is greater than 500. In the case of greater-than, since the table is sorted by score in ascending order, every row from this point to the bottom of the table is a result for the query. See Figure 6-5.

An index of the Player entity “score” properties, sorted by “score” then by key, with results for WHERE score > 500

Figure 6-5. An index of the Player entity “score” properties, sorted by “score” then by key, with results for WHERE score > 500

Similarly, consider a query that asks for every Player with a score less than 1,000:

SELECT * FROM Player WHERE score < 1000

App Engine uses the same index (score, ascending), and the same strategy: it finds the first row that matches the query, in this case the first row. Then it scans to the next row that doesn’t match the query, the first row whose score is greater than or equal to 1000. The results are represented by everything above that row.

Finally, consider a query for score values between 500 and 1,000:

SELECT * FROM Player WHERE score > 500 AND score < 1000

Once again, the same index and strategy prevail: App Engine scans from the top down, finding the first matching and next nonmatching rows, returning the entities represented by everything in between. This is shown in Figure 6-6.

If the values used with the filters do not represent a valid range, such as score < 500 AND score > 1000, the query planner notices this and doesn’t bother performing a query, since it knows the query has no results.

One Sort Order

The following query asks for every Player entity, arranged in order by level, from lowest to highest:

SELECT * FROM Player ORDER BY level

As before, this uses an index of Player entities with level properties in ascending order. If both this query and the previous equality query were performed by the application, both queries would use the same index. This query uses the index to determine the order in which to returnPlayer entities, starting at the top of the table and moving down until the application stops fetching results, or until the bottom of the table. Recall that every Player entity with a level property is mentioned in this table. See Figure 6-7.

An index of the Player entity “score” properties, sorted by score, with results for WHERE score > 500 AND score < 1000

Figure 6-6. An index of the Player entity “score” properties, sorted by score, with results for WHERE score > 500 AND score < 1000

An index of the Player entity “level” properties sorted by level in ascending order, with results for ORDER BY level

Figure 6-7. An index of the Player entity “level” properties sorted by level in ascending order, with results for ORDER BY level

The following query is similar to the previous one, but asks for the entities arranged by level from highest to lowest:

SELECT * FROM Player ORDER BY level DESC

This query cannot use the same index as before, because the results are in the wrong order. For this query, the results should start at the entity with the highest level, so the query needs an index where this result is in the first row. App Engine provides an automatic index for single properties in descending order for this purpose. See Figure 6-8.

An index of the Player entity “level” properties sorted by level in descending order, with results for ORDER BY level DESC

Figure 6-8. An index of the Player entity “level” properties sorted by level in descending order, with results for ORDER BY level DESC

If a query with a sort order on a single property also includes filters on that property, and no other filters, App Engine still needs only the one automatic index to fulfill the query. In fact, you may have noticed that for these simple queries, the results are returned sorted by the property in ascending order, whether or not the query specifies the sort order explicitly. In these cases, the ascending sort order is redundant.

Queries on Keys

In addition to filters and sort orders on properties, you can also perform queries with filters and sort orders on entity keys. You can refer to an entity’s key in a filter or sort order using the special name __key__.

An equality filter on the key isn’t much use. Only one entity can have a given key, and if the key is known, it’s faster to perform a get() than a query. But an inequality filter on the key can be useful for fetching ranges of keys. (If you’re fetching entities in batches, consider using query cursors. See Query Cursors.)

App Engine provides automatic indexes of kinds and keys, sorted by key in ascending order. The query returns the results sorted in key order. This order isn’t useful for display purposes, but it’s deterministic. A query that sorts keys in descending order requires a custom index.

App Engine uses indexes for filters on keys in the same way as filters on properties, with a minor twist: a query using a key filter in addition to other filters can use an automatic index if a similar query without the key filter could use an automatic index. Automatic indexes for properties already include the keys, so such queries can just use the same indexes. And of course, if the query has no other filters beyond the key filter, it can use the automatic key index.

Kindless Queries

In addition to performing queries on entities of a given kind, the datastore lets you perform a limited set of queries on entities of all kinds. Kindless queries cannot use filters or sort orders on properties. They can, however, use equality and inequality filters on keys (IDs or names).

Kindless queries are mostly useful in combination with ancestors, which we’ll discuss in Chapter 7. They can also be used to get every entity in the datastore. (If you’re querying a large number of entities, you’ll probably want to fetch the results in batches. See Query Cursors.)

Using the Python Query class, you perform a kindless query by omitting the model class argument from the constructor:

q = db.Query()

q.filter('__key__ >', last_key)

Similarly in the Java low-level API, you perform a kindless query by instantiating the Query class using the no-argument constructor.

In GQL, you specify a kindless query by omitting the FROM Kind part of the statement:

q = db.GqlQuery('SELECT * WHERE __key__ > :1', last_key)

The results of a kindless query are returned in key order, ascending. Kindless queries use an automatic index.

NOTE

The datastore maintains statistics about the apps data in a set of datastore entities. When the app performs a kindless query, these statistics entities are included in the results. The kind names for these entities all begin with the characters __Stat_ (two underscores, Stat, and another underscore). Your app will need to filter these out if they are not desired.

The Python ext.db interface expects there to be a db.Model (or db.Expando) class defined or imported for each kind of each query result. When using kindless queries in Python, you must define or import all possible kinds. To load classes for the datastore statistics entity kinds, import the db.stats module:

from google.appengine.ext.db import stats

For more information about datastore statistics, see Accessing Metadata from the App.

Custom Indexes and Complex Queries

All queries not covered by the automatic indexes must have corresponding indexes defined in the app’s index configuration file. We’ll refer to these as “custom indexes,” in contrast with “automatic indexes.” App Engine needs these hints because building every possible index for every combination of property and sort order would take a gargantuan amount of space and time, and an app isn’t likely to need more than a fraction of those possibilities.

In particular, the following queries require custom indexes:

§ A query with multiple sort orders

§ A query with an inequality filter on a property and filters on other properties

§ Projection queries

A query that uses just equality filters on properties does not need a custom index in most cases thanks to a specialized query algorithm for this case, which we’ll look at in a moment. Also, filters on keys do not require custom indexes; they can operate on whatever indexes are used to fulfill the rest of the query.

Let’s examine these queries and the indexes they require. We’ll cover projection queries in Projection Queries.

Multiple Sort Orders

The automatic single-property indexes provide enough information for one sort order. When two entities have the same value for the sorted property, the entities appear in the index in adjacent rows, ordered by their entity keys. If you want to order these entities with other criteria, you need an index with more information.

The following query asks for all Player entities, sorted first by the level property in descending order, then, in the case of ties, sorted by the score property in descending order:

SELECT * FROM Player ORDER BY level DESC, score DESC

The index this query needs is straightforward: a table of Player entity keys, level values, and score values, sorted according to the query. This is not one of the indexes provided by the datastore automatically, so it is a custom index, and must be mentioned in the index configuration file. If you performed this query in the Python development web server, the server would add the following lines to the index.yaml file:

- kind: Player

properties:

- name: level

direction: desc

- name: score

direction: desc

The order the properties appear in the configuration file matters. This is the order in which the rows are sorted: first by level descending, then by score descending.

This configuration creates the index shown in Figure 6-9. The results appear in the table, and are returned for the query in the desired order.

An index of the Player entity “level” and “score” properties, sorted by level descending, then score descending, then by key ascending

Figure 6-9. An index of the Player entity “level” and “score” properties, sorted by level descending, then score descending, then by key ascending

Filters on Multiple Properties

Consider the following query, which asks for every Player with a level greater than the integer 10 and a charclass of the string 'mage':

SELECT * FROM Player WHERE charclass='mage' AND level > 10

To be able to scan to a contiguous set of results meeting both filter criteria, the index must contain columns of values for these properties. The entities must be sorted first by charclass, then by level.

For Python, the index configuration for this query would appear as follows in the index.yaml file:

- kind: Player

properties:

- name: charclass

direction: asc

- name: level

direction: asc

This index is illustrated in Figure 6-10.

An index of the Player entity “charclass” and “level” properties, sorted by charclass, then level, then key, with results for WHERE charclass = “mage” AND level > 10

Figure 6-10. An index of the Player entity “charclass” and “level” properties, sorted by charclass, then level, then key, with results for WHERE charclass = “mage” AND level > 10

The ordering sequence of these properties is important! Remember: the results for the query must all appear on adjacent rows in the index. If the index for this query were sorted first by level then by charclass, it would be possible for valid results to appear on nonadjacent rows.Figure 6-11 demonstrates this problem.

An index of the Player entity “charclass” and “level” properties, sorted first by level then by charclass, which cannot satisfy WHERE charclass = “mage” AND level > 10 with consecutive rows

Figure 6-11. An index of the Player entity “charclass” and “level” properties, sorted first by level then by charclass, which cannot satisfy WHERE charclass = “mage” AND level > 10 with consecutive rows

The index ordering requirement for combining inequality and equality filters has several implications that may seem unusual when compared to the query engines of other databases. Heck, they’re downright weird. The first implication, illustrated previously, can be stated generally:

The First Rule of Inequality Filters: If a query uses inequality filters on one property and equality filters on one or more other properties, the index must be ordered first by the properties used in equality filters, then by the property used in the inequality filters.

This rule has a corollary regarding queries with both an inequality filter and sort orders. Consider the following possible query:

SELECT * FROM Player WHERE level > 10 ORDER BY score DESC

What would the index for this query look like? For starters, it would have a column for the level, so it can select the rows that match the filter. It would also have a column for the score, to determine the order of the results. But which column is ordered first?

The First Rule implies that level must be ordered first. But the query requested that the results be returned sorted by score, descending. If the index were sorted by score, then by level, the rows may not be adjacent.

To avoid confusion, App Engine requires that the correct sort order be stated explicitly in the query:

SELECT * FROM Player WHERE level > 10 ORDER BY level, score DESC

In general:

The Second Rule of Inequality Filters: If a query uses inequality filters on one property and sort orders of one or more other properties, the index must be ordered first by the property used in the inequality filters (in either direction), then by the other desired sort orders. To avoid confusion, the query must state all sort orders explicitly.

There’s one last implication to consider with regard to inequality filters. The following possible query attempts to get all Player entities with a level less than 10 and a score less than 500:

SELECT * FROM Player WHERE level < 10 AND score < 500

Consider an index ordered first by level, then by score, as shown in Figure 6-12.

Neither possible index of the Player entity “level” and “score” properties can satisfy WHERE level < 10 AND score < 500 with consecutive rows

Figure 6-12. Neither possible index of the Player entity “level” and “score” properties can satisfy WHERE level < 10 AND score < 500 with consecutive rows

In fact, there is no possible index that could satisfy this query completely using consecutive rows. This is not a valid App Engine datastore query.

The Third Rule of Inequality Filters: A query cannot use inequality filters on more than one property.

A query can use multiple inequality filters on the same property, such as to test for a range of values.

Multiple Equality Filters

For queries using just equality filters, it’s easy to imagine custom indexes that satisfy them. For instance:

SELECT * FROM Player WHERE charclass='mage' AND level=10

A custom index containing these properties, ordered in any sequence and direction, would meet the query’s requirements. But App Engine has another trick up its sleeve for this kind of query. For queries using just equality filters and no sort orders, instead of scanning a single table of all values, App Engine can scan the automatic single-property indexes for each property, and return the results as it finds them. App Engine can perform a “merge join” of the single-property indexes to satisfy this kind of query.

In other words, the datastore doesn’t need a custom index to perform queries using just equality filters and no sort orders. If you add a suitable custom index to your configuration file, the datastore will use it. But a custom index is not required, and the development server’s automatic index configuration feature will not add one if it doesn’t exist.

Let’s consider how the algorithm would perform the following query, using single-property indexes:

SELECT * FROM Kind WHERE a=1 AND b=2 AND c=3

Recall that each of these tables contains a row for each entity with the property set, with fields for the entity’s key and the property’s value. The table is sorted first by the value, then by the key. The algorithm takes advantage of the fact that rows with the same value are consecutive, and within that consecutive block, rows are sorted by key.

To perform the query, the datastore uses the following steps:

1. The datastore checks the a index for the first row with a value of 1. The entity whose key is on this row is a candidate, but not yet a confirmed result.

2. It then checks the b index for the first row whose value is 2 and whose key is greater than or equal to the candidate’s key. Other rows with a value of 2 may appear above this row in the b index, but the datastore knows those are not candidates because the first a scan determined the candidate with the smallest key.

3. If the datastore finds the candidate’s key in the matching region of b, that key is still a candidate, and the datastore proceeds with a similar check in the index for c. If the datastore does not find the candidate in the b index but does find another larger key with a matching value, that key becomes the new candidate, and it proceeds to check for the new candidate in the c index. (It’ll eventually go back to check a with the new candidate before deciding it is a result.) If it finds neither the candidate nor a matching row with a larger key, the query is complete.

4. If a candidate is found to match all criteria in all indexes, the candidate is returned as a result. The datastore starts the search for a new candidate, using the previous candidate’s key as the minimum key.

Figure 6-13 illustrates this zigzag search across the single-property indexes, first with a failed candidate, then two successful candidates.

The merge join algorithm finding two entities WHERE a=1 AND b=2 AND c=3

Figure 6-13. The merge join algorithm finding two entities WHERE a=1 AND b=2 AND c=3

A key feature of this algorithm is that it finds results in the order in which they are to be returned: key order. The datastore does not need to compile a complete list of possible results for the query—possibly millions of entities—then sort them to determine which results ought to be first. Also, the datastore can stop scanning as soon as it has enough results to fulfill the query, which is always a limited number of entities.

Of course, this query could also use a custom index with all the filter properties in it. If you provide configuration for such an index, the query will use the custom index instead of doing the zigzag join. This can result in a query faster than the zigzag join, at the expense of added time to update the indexed entities.

A zigzag-capable query using equality filters on properties can also use inequality filters on keys without needing a custom index. This is useful for fetching a large number of results in key ranges. (But if you’re fetching batches, cursors might be more effective. See Query Cursors.)

Not-Equal and IN Filters

The Python and Java query APIs support two operators we haven’t discussed yet: != (not-equal) and IN. These operators are not actually supported by the datastore itself. Instead, they are implemented by the datastore API as multiple queries in terms of the other operators.

The filter prop != value matches every entity whose property does not equal the value. The datastore API determines the result set by performing two queries: one using prop < value in place of the not-equal filter, and one using prop > value in place of the filter. It returns both sets of results as one result set, which it can do reasonably quickly because the results are already in order.

Because not-equal is actually implemented in terms of the inequality operators, it is subject to the three rules of inequality operators:

1. The query’s index must be ordered by the property used with the not-equal filter before other sort orders.

2. If the query uses other explicit sort orders, the not-equal filter’s property must be explicitly ordered first.

3. And finally, any query using a not-equal filter cannot also use inequality or not-equal filters on other properties.

A not-equal filter will never return an entity that doesn’t have the filtered property. This is true for all filters, but can be especially counterintuitive in the case of not-equal.

The filter prop IN (value1, value2, value3) matches every entity whose property equals any of the values. The datastore API implements this as a series of equality queries, one for each value to test. The more values that appear in the list, the longer the full set of queries will take to execute.

If a single query includes multiple IN filters on multiple properties, the datastore API must perform equality queries for every combination of values in all filters. prop1 IN (value1, value2, value3, value4) AND prop2 IN (value5, value6, value7) is equivalent to 12 queries using equality filters.

The != and IN operators are useful shortcuts. But because they actually perform multiple queries, they take longer to execute than the other operators. It’s worth understanding their performance implications before using them.

Unset and Nonindexed Properties

As you’ve seen, an index contains columns of property values. Typically, an app creates entities of the same kind with the same set of properties: every Player in our game has a name, a character class, a level, and a score. If every entity of a kind has a given property, then an index for that kind and property has a row corresponding to each entity of the kind.

But the datastore neither requires nor enforces a common layout of properties across entities of a kind. It’s quite possible for an entity to not have a property that other entities of the same kind have. For instance, a Player entity might be created without a character class, and go without until the user chooses one.

It is possible to set a property with a null value (Python’s None, Java’s null), but a property set to the null value is distinct from the property not being set at all. This is different from a tabular database, which requires a value (possibly null) for every cell in a row.

If an entity does not have a property used in an index, the entity does not appear in the index. Stated conversely, an entity must have every property mentioned in an index to appear in the index. If a Player does not have a charclass property, it does not appear in any index with acharclass column.

If an entity is not mentioned in an index, it cannot be returned as a result for a query that uses the index. Remember that queries use indexes for both filters and sort orders. A query that uses a property for any kind of filter or any sort order can never return an entity that doesn’t have that property. The charclass-less Player can never be a result for a Player query that sorts results by charclass.

In Strings, Text, and Blobs, we mentioned that text and blob values are not indexed. Another way of saying this is that, for the purposes of indexes, a property with a text or blob value is treated as if it is unset. If an app performs a query by using a filter or sort order on a property that is always set to a text or blob value, that query will always return no results.

It is sometimes useful to store property values of other types, and exempt them from indexes. This saves space in index tables, and reduces the amount of time it takes to save the entity.

In the Python API, the only way to declare a property as unindexed is with the Model API. There is currently no other way to set a specific property as unindexed in this API. We’ll look at this feature when we discuss the modeling API in Chapter 9.

In the Java API, you can set a property as unindexed by using the setUnindexedProperty() method of the Entity object, instead of the setProperty() method. An entity can only have one property of a given name, so an unindexed property overwrites an indexed one, and vice versa. You can also declare properties as unindexed in the JDO and JPA interfaces; see Chapter 10.

If you need an entity to qualify as a result for a query, but it doesn’t make sense in your data model to give the entity every property used in the query, use the null value to represent the “no value” case, and always set it. The Python modeling API and the Java JDO and JPA interfaces make it easy to ensure that properties always have values.

Sort Orders and Value Types

App Engine keeps the rows of an index sorted in an order that supports the corresponding query. Each type of property value has its own rules for comparing two values of the same type, and these rules are mostly intuitive: integers are sorted in numeric order, strings in Unicode order, and so forth.

Two entities can have values of different types for the same property, so App Engine also has rules for comparing such values, although these rules are not so intuitive. Values are ordered first by type, then within their type. For instance, all integers are sorted above all strings.

One effect of this that might be surprising is that all floats are sorted below all integers. The datastore treats floats and integers as separate value types, and so sorts them separately. If your app relies on the correct ordering of numbers, make sure all numbers are stored using the same type of value.

The datastore stores eight distinct types of values, not counting the nonindexed types (text and blob). The datastore supports several additional types by storing them as one of the eight types, then marshaling them between the internal representation and the value your app sees automatically. These additional types are sorted by their internal representation. For instance, a date-time value is actually stored as an integer, and will be sorted amongst other integer values in an index. (When comparing date-time values, this results in chronological order, which is what you would expect.)

Table 6-2 describes the eight indexable types supported by the datastore. The types are listed in their relative order, from first to last.

Table 6-2. How the datastore value types are sorted

Data type

Python type

Java type

Ordering

The null value

None

null

-

Integer and date-time

long, datetime.datetime, db.Rating

long (other integer types are widened), java.util.Date, datastore.Rating

Numeric (date-time is chronological)

Boolean

bool (True or False)

boolean (true or false)

False, then true

Byte string

db.ByteString

datastore.ShortBlob

Byte order

Unicode string

unicode, db.Category, db.Email, db.IM, db.Link, db.PhoneNumber,db.PostalAddress

java.lang.String, datastore.Category, datastore.Email, datastore.IMHandle, datastore.Link, datastore.PhoneNumber,datastore.PostalAddress

Unicode character order

Floating-point number

float

double

Numeric

Geographical point

db.GeoPt

datastore.GeoPt

By latitude, then longitude (floating-point numbers)

A Google account

users.User

users.User

By email address, Unicode order

Entity key

db.Key

datastore.Key

Kind (byte string), then ID (numeric) or name (byte string)

Queries and Multivalued Properties

In a typical database, a field in a record stores a single value. A record represents a data object, and each field represents a single, simple aspect of the object. If a data object can have more than one of a particular thing, each of those things is typically represented by a separate record of an appropriate kind, associated with the data object by using the object’s key as a field value. App Engine supports both of these uses of fields: a property can contain a simple value or the key of another entity.

But the App Engine datastore can do something most other databases can’t: it can store more than one value for a single property. With multivalued properties (MVPs), you can represent a data object with more than one of something without resorting to creating a separate entity for each of those things, if each thing could be represented by a simple value.

One of the most useful features of multivalued properties is how they match an equality filter in a query. The datastore query engine considers a multivalued property equal to a filter value if any of the property’s values is equal to the filter value. This ability to test for membership means MVPs are useful for representing sets.

Multivalued properties maintain the order of values, and can have repeated items. The values can be of any datastore type, and a single property can have values of different types.

A Simple Example

Consider the following example. The players of our online game can earn trophies for particular accomplishments. The app needs to display a list of all the trophies a player has won, and the app needs to display a list of all the players who have won a particular trophy. The app doesn’t need to maintain any data about the trophies themselves; it’s sufficient to just store the name of the trophy. (This could also be a list of keys for trophy entities.)

One option is to store the list of trophy names as a single delimited string value for each Player entity. This makes it easy to get the list of trophies for a particular player, but impossible to get the list of players for a particular trophy. (A query filter can’t match patterns within string values.)

Another option is to record each trophy win in a separate property named after the trophy. To get the list of players with a trophy, you just query for the existence of the corresponding property. However, getting the list of trophies for a given player would require either coding the names of all the trophies in the display logic, or iterating over all the Player entity’s properties looking for trophy names.

With multivalued properties, we can store each trophy name as a separate value for the trophies property. To access a list of all trophies for a player, we simply access the property of the entity. To get a list of all players with a trophy, we use a query with an equality filter on the property.

Here’s what this example looks like in Python:

p = Player.get_by_key_name(user_id)

p.trophies = ['Lava Polo Champion',

'World Building 2008, Bronze',

'Glarcon Fighter, 2nd class']

p.put()

# List all trophies for a player.

for trophy in p.trophies:

# ...

# Query all players that have a trophy.

q = Player.gql("WHERE trophies = 'Lava Polo Champion'")

for p in q:

# ...

MVPs in Python

The Python API represents the values of a multivalued property as a Python list. Each member of the list must be of one of the supported datastore types:

class Entity(db.Expando):

pass

e = Entity()

e.prop = [ 'value1', 123, users.get_current_user() ]

Remember that list is not a datastore type; it is only the mechanism for manipulating multivalued properties. A list cannot contain another list.

A property must have at least one value, otherwise the property does not exist. To enforce this, the Python API does not allow you to assign an empty list to a property. Notice that the API can’t do otherwise: if a property doesn’t exist, then the API cannot know to represent the missing property as an empty list when the entity is retrieved from the datastore. (This being Python, the API could return the empty list whenever a nonexistent property is accessed, but that might be more trouble than it’s worth.)

Because it is often useful for lists to behave like lists, including the ability to contain zero items, the Python data modeling API provides a mechanism that supports assigning the empty list to a property. We’ll look at this mechanism in Chapter 9.

MVPs in Java

The Java low-level datastore API represents multivalued properties as java.util.List objects, parameterized with a native datastore type. When you call setProperty() on an Entity, the value can be any Collection of a native type. The values are stored in iterator order. When you retrieve the property value with getProperty(), the values are returned in a List:

Entity trophyCase = new Entity("TrophyCase");

List<String> trophyNames = new ArrayList<String>();

trophyNames.add("Goblin Rush Bronze");

trophyNames.add("10-Hut! (built 10 huts)");

trophyNames.add("Moon Landing");

trophyCase.setProperty("trophyNames", trophyNames);

ds.put(trophyCase);

// ...

// Key tcKey = ...

Entity newTrophyCase = ds.get(tcKey);

@SuppressWarnings("unchecked")

List<String> newTrophyNames = (List<String>) trophyCase.getProperty("trophyNames");

To represent a multivalued property with diverse value types, use List<Object>. Storing an unsupported type in such a List is a runtime error.

A property must have at least one value, otherwise the property does not exist. To enforce this, the Java API throws a runtime error if you assign an empty collection to a property. When using the low-level datastore API, your code must handle this as a special case, and represent zero values by not setting (or by deleting) the property.

Remember that List (or Collection) is not a datastore type. It can only contain native types, and not another Collection.

MVPs and Equality Filters

As you’ve seen, when a multivalued property is the subject of an equality filter in a query, the entity matches if any of the property’s values are equal to the filter value:

e1 = Entity()

e1.prop = [ 3.14, 'a', 'b' ]

e1.put()

e2 = Entity()

e2.prop = [ 'a', 1, 6 ]

e2.put()

# Returns e1 but not e2:

q = Entity.gql('WHERE prop = 3.14')

# Returns e2 but not e1:

q = Entity.gql('WHERE prop = 6')

# Returns both e1 and e2:

q = Entity.gql("WHERE prop = 'a'")

Recall that a query with a single equality filter uses an index that contains the keys of every entity of the given kind with the given property and the property values. If an entity has a single value for the property, the index contains one row that represents the entity and the value. If an entity has multiple values for the property, the index contains one row for each value. The index for this example is shown in Figure 6-14.

An index of two entities with multiple values for the “prop” property, with results for WHERE prop=‘a’

Figure 6-14. An index of two entities with multiple values for the “prop” property, with results for WHERE prop=‘a’

This brings us to the first of several odd-looking queries that nonetheless make sense for multivalued properties. Since an equality filter is a membership test, it is possible for multiple equality filters to use the same property with different values and still return a result. An example in GQL:

SELECT * FROM Entity WHERE prop = 'a' AND prop = 'b'

App Engine uses the “merge join” algorithm, described in Multiple Equality Filters for multiple equality filters, to satisfy this query, using the prop single-property index. This query returns the e1 entity because the entity key appears in two places in the index, once for each value requested by the filters.

The way multivalued properties appear in an index gives us another way of thinking about multivalued properties: an entity has one or more properties, each with a name and a single value, and an entity can have multiple properties with the same name. The API represents the values of multiple properties with the same name as a list of values associated with that name.

The datastore does not have a way to query for the exact set of values in a multivalued property. You can use multiple equality filters to test that each of several values belongs to the list, but there is no filter that ensures that those are the only values that belong to the list, or that each value appears only once.

MVPs and Inequality Filters

Just as an equality filter tests that any value of the property is equal to the filter value, an inequality filter tests that any value of the property meets the filter criterion:

e1 = Entity()

e1.prop = [ 1, 3, 5 ]

e1.put()

e2 = Entity()

e2.prop = [ 4, 6, 8 ]

e2.put()

# Returns e1 but not e2:

q = Entity.gql("WHERE prop < 2")

# Returns e2 but not e1:

q = Entity.gql("WHERE prop > 7")

# Returns both e1 and e2:

q = Entity.gql("WHERE prop > 3")

Figure 6-15 shows the index for this example, with the results of prop > 3 highlighted.

An index of two entities with multiple values for the “prop” property, with results for WHERE prop > 3

Figure 6-15. An index of two entities with multiple values for the “prop” property, with results for WHERE prop > 3

In the case of an inequality filter, it’s possible for the index scan to match rows for a single entity multiple times. When this happens, the first occurrence of each key in the index determines the order of the results. If the index used for the query sorts the property in ascending order, the first occurrence is the smallest matching value. For descending, it’s the largest. In the example above, prop > 3 returns e2 before e1 because 4 appears before 5 in the index.

MVPs and Sort Orders

To summarize things we know about how multivalued properties are indexed:

§ A multivalued property appears in an index with one row per value.

§ All rows in an index are sorted by the values, possibly distributing property values for a single entity across the index.

§ The first occurrence of an entity in an index scan determines its place in the result set for a query.

Together, these facts explain what happens when a query orders its results by a multivalued property. When results are sorted by a multivalued property in ascending order, the smallest value for the property determines its location in the results. When results are sorted in descending order, the largest value for the property determines its location.

This has a counterintuitive—but consistent—consequence:

e1 = Entity()

e1.prop = [ 1, 3, 5 ]

e1.put()

e2 = Entity()

e2.prop = [ 2, 3, 4 ]

e2.put()

# Returns e1, e2:

q = Entity.gql("ORDER BY prop ASC")

# Also returns e1, e2:

q = Entity.gql("ORDER BY prop DESC")

Because e1 has both the smallest value and the largest value, it appears first in the result set in ascending order and in descending order. See Figure 6-16.

Indexes of two entities with multiple values for the “prop” property, one ascending and one descending

Figure 6-16. Indexes of two entities with multiple values for the “prop” property, one ascending and one descending

MVPS AND THE QUERY PLANNER

The query planner tries to be smart by ignoring aspects of the query that are redundant or contradictory. For instance, a = 3 AND a = 4 would normally return no results, so the query planner catches those cases and doesn’t bother doing work it doesn’t need to do. However, most of these normalization techniques don’t apply to multivalued properties. In this case, the query could be asking, “Does this MVP have a value that is equal to 3 and another value equal to 4?” The datastore remembers which properties are MVPs (even those that end up with one or zero values), and never takes a shortcut that would produce incorrect results.

But there is one exception. A query that has both an equality filter and a sort order will drop the sort order. If a query asks for a = 3 ORDER BY a DESC and a is a single-value property, the sort order has no effect because all values in the result are identical. For an MVP, however, a = 3 tests for membership, and two MVPs that meet that condition are not necessarily identical.

The datastore drops the sort order in this case anyway. To do otherwise would require too much index data and result in exploding indexes in cases that could otherwise survive. As always, the actual sort order is deterministic, but it won’t be the requested order.

Exploding Indexes

There’s one more thing to know about indexes when considering multivalued properties for your data model.

When an entity has multiple values for a property, each index that includes a column for the property must use multiple rows to represent the entity, one for each possible combination of values. In a single property index on the multivalued property, this is simply one row for each value, two columns each (the entity key and the property value).

In an index of multiple properties where the entity has multiple values for one of the indexed properties and a single value for each of the others, the index includes one row for each value of the multivalued property. Each row has a column for each indexed property, plus the key. The values for the single-value properties are repeated in each row.

Here’s the kicker: if an entity has more than one property with multiple values, and more than one multivalued property appears in an index, the index must contain one row for each combination of values to represent the entity completely.

If you’re not careful, the number of index rows that need to be updated when the entity changes could grow very large. It may be so large that the datastore cannot complete an update of the entity before it reaches its safety limits, and returns an error.

To help prevent “exploding indexes” from causing problems, App Engine limits the number of property values—that is, the number of rows times the number of columns—a single entity can occupy in an index. The limit is 5,000 property values, high enough for normal use, but low enough to prevent unusual index sizes from inhibiting updates.

If you do include a multivalued property in a custom index, be careful about the possibility of exploding indexes.

Query Cursors

A query often has more results than you want to process in a single action. A message board may have thousands of messages, but it isn’t useful to show a user thousands of messages on one screen. Better would be to show the user a dozen messages at a time, and let the user decide when to look at more, such as by clicking a “next” link, or scrolling to the bottom of the display.

A query cursor is like a bookmark in a list of query results. After fetching some results, you can ask the query API for the cursor that represents the spot immediately after the last result you fetched. When you perform the query again at a later time, you can include the cursor value, and the next result fetched will be the one at the spot where the cursor was generated.

Cursors are fast. Unlike with the “offset” fetch parameter, the datastore does not have to scan from the beginning of the results to find the cursor location. The time to fetch results starting from a cursor is proportional to the number of results fetched. This makes cursors ideal for paginated displays of items.

The following is code for a simple paginated display, in Python:

import jinja2

import webapp2

from google.appengine.ext import db

template_env = jinja2.Environment(

loader=jinja2.FileSystemLoader(os.getcwd())

PAGE_SIZE = 10

class Message(db.Model):

create_date = db.DateTimeProperty(auto_now_add=True)

# ...

class ResultsPageHandler(webapp2.RequestHandler):

def get(self):

query = Message.all().order('-create_date')

cursor = self.request.get('c', None)

if cursor:

query.with_cursor(cursor)

results = query.fetch(PAGE_SIZE)

new_cursor = query.cursor()

query.with_cursor(new_cursor)

has_more_results = query.count(1) == 1

template = template_env.get_template('results.html')

context = {

'results': results,

}

if has_more_results:

context['next_cursor'] = new_cursor

self.response.out.write(template.render(context)

app = webapp2.WSGIApplication([

('/results', ResultsPageHandler),

# ...

],

debug=True)

The results.html template is as follows. Note the Next link in particular:

<html><body>

{% if results %}

<p>Messages:</p>

<ul>

{% for result in results %}

<li>{{ result.key().id() }}: {{ result.create_date }}</li>

{% endfor %}

</ul>

{% if next_cursor %}

<p><a href="/results?c={{ next_cursor }}">Next</a></p>

{% endif %}

{% else %}

<p>There are no messages.</p>

{% endif %}

</body></html>

This example displays all of the Message entities in the datastore, in reverse chronological order, in pages of 10 messages per page. If there are more results after the last result displayed, the app shows a “Next” link back to the request handler with the c parameter set to the cursor pointing to the spot after the last-fetched result. This causes the next set of results fetched to start where the previous page left off.

A cursor is a base64-encoded string. It is safe to use as a query parameter this way: it cannot be corrupted by the user to fetch results outside of the expected query range. Note that, like the string form of datastore keys, the base64-encoded value can be decoded to reveal the names of kinds and properties, so you may wish to further encrypt or hide the value if these are sensitive.

The Python API shown here is simple: c = q.cursor() returns the cursor after the last-fetched result for the Query object. q.with_cursor(c) sets a parameter on the query to start fetching results at the given cursor. We’ll review the complete Python and Java APIs in the next two sections.

This example uses a trick to determine whether there are any results after the cursor, so the template can hide the “Next” link if there are no further results. After establishing the cursor at the point after the last-fetched result, we immediately reinitialize the query with that cursor. Then we attempt to perform a count of one element after that point in the results:

new_cursor = query.cursor()

query.with_cursor(new_cursor)

has_more_results = query.count(1) == 1

This illustrates the “bookmark” nature of cursors: using a cursor does not move the cursor. The cursor just identifies a spot in the results.

TIP

This example shows how to use cursors to set up a “Next” link in a paginated display. Setting up a “Previous” link is left as an exercise for the reader. Hint: the cursor used to display the current page, if any, is the one the next page needs for the link.

A cursor is only valid for the query used to generate the cursor value. All query parameters, including kinds, filters, sort orders, and whether or not the query is keys-only, must be identical to the query used to generate the cursor.

A cursor remains valid over time, even if results are added or removed above or below the cursor in the index. A structural change to the indexes used to satisfy the query invalidates cursors for that query. This includes adding or removing fields of a custom index, or switching between built-in indexes and custom indexes for a query. These only occur as a result of your updating your index configuration. Rare internal changes to datastore indexes may also invalidate cursors. Using a cursor that is invalid for a given query (for whatever reason) will raise an exception.

Because a cursor stays valid even after the data changes, you can use cursors to watch for changes in some circumstances. For example, consider a query for entities with creation timestamps, ordered by timestamp. A process traverses the results to the end, and then stores the cursor for the end of the list (with no results after it). When new entities are created with later timestamps, those entities are added to the index after the cursor. Running the query with the cursor pick ups the new results.

The preceding example shows using a cursor to set the start position for fetching results. You can also use a cursor to set an end position for the fetch, instead of fetching a fixed number of results. We’ll see how to do that in the next sections.

NOTE

Conceptually, a cursor is a position in a list of results. Actually, a cursor is a position in the index (or path through multiple indexes) that represents the results for the query. This has a couple of implications.

You can’t use a cursor on a query that uses not-equal (!=) or set membership (IN) queries. These queries are performed by multiple primitive queries behind the scenes, and therefore do not have all results in a single index (or index path).

A query with a cursor may result in a duplicate result if the query operates on a property with multiple values. Recall that an entity with multiple values for a property appears once for each value in the corresponding indexes. When a query gathers results, it ignores repeated results (which are consecutive in the index). Because a cursor represents a point in an index, it is possible for the same entity to appear both before and after the cursor. In this case, the query up to the cursor and the query after the cursor will both contain the entity in their results. It’s a rare edge case—the entity would have to change after the cursor is created—but it’s one to be aware of if you’re using multivalued properties with queries and cursors.

Cursors in Python

To get a cursor for an executed query in Python, call the cursor() method of the Query (or GqlQuery) object. It returns the cursor value as a base64-encoded string:

query = Message.all().order('-create_date')

results = query.fetch(10)

cursor = query.cursor()

When you call cursor() on a query object whose results are being fetched by the iterator interface, this may invoke a datastore RPC call to get the cursor for the last result seen by the iterator. Behind the scenes, the iterator fetches results in batches, and only knows the cursor after the last result in the batch, which may not be the last result generated by the iterator.

There are several ways to fetch results for a query starting at a cursor. One way is to make the start cursor a parameter of the Query, by calling the with_cursor() method:

query.with_cursor(cursor)

You can use this method to set both a start cursor (the point before the first result to fetch) and an end cursor (the point after the last result to fetch), using the start_cursor and end_cursor keyword arguments. You can specify a start_cursor, an end_cursor, or both:

query.with_cursor(start_cursor=start)

query.with_cursor(end_cursor=end)

query.with_cursor(start_cursor=start, end_cursor=end)

You can also pass start_cursor and end_cursor arguments to any of the query execution methods: fetch(), get(), run(), and count():

start_cursor = self.request.get('start', None)

end_cursor = self.request.get('end', none)

results_iterable = query.run(

limit=20,

start_cursor=start_cursor,

end_cursor=end_cursor)

template_context = {

'results': results_iterable,

}

Cursors in Java

As we saw earlier, the Java PreparedQuery object has several methods that return results. The methods asQueryResultIterable(), asQueryResultIterator(), and asQueryResultList() return QueryResultIterable, QueryResultIterator, and QueryResultListinstances, respectively. These three classes provide access to the cursor that points to the location after the last fetched result. The getCursor() method returns a Cursor instance:

import com.google.appengine.api.datastore.Cursor;

import com.google.appengine.api.datastore.Entity;

import com.google.appengine.api.datastore.FetchOptions;

import com.google.appengine.api.datastore.PreparedQuery;

import com.google.appengine.api.datastore.Query;

import com.google.appengine.api.datastore.QueryResultList;

// ...

Query q = new Query("Message").addSort("create_date",

Query.SortDirection.DESCENDING);

PreparedQuery pq = ds.prepare(q);

QueryResultList<Entity> results = pq.asQueryResultList(

FetchOptions.Builder.withLimit(10);

Cursor cursor = results.getCursor();

You can convert the Cursor to a web-safe string by using its toWebSafeString() method, and back again with the static class method fromWebSafeString() method:

String cursorString = req.getParameter("c");

Cursor cursor = null;

if (cursorString != null) {

cursor = Cursor.fromWebSafeString(cursorString);

}

// ...

String newCursorString = cursor.toWebSafeString();

You use cursors with the query execution methods via FetchOptions. You can specify a startCursor(), an endCursor(), or both:

QueryResultList<Entity> results = pq.asQueryResultList(

FetchOptions.Builder.withLimit(10)

.startCursor(cursor);

Projection Queries

We’ve described the datastore as an object store. You create an entity with all of its properties. When you fetch an entity by key, the entire entity and all its properties come back. When you want to update a property of an entity, you must fetch the complete entity, make the change to the object locally, then save the entire object back to the datastore.

The types of queries we’ve seen so far reflect this reality as well. You can either fetch entire entities that match query criteria:

SELECT * FROM Kind WHERE ...

Or you can fetch just the keys:

SELECT __key__ FROM Kind WHERE ...

And really, an entity query is just a key query that fetches the entities for those keys. The query criteria drive a scan of an index (or indexes) containing property values and keys, and the keys are returned by the scan.

But sometimes you only need to know one or two properties of an entity, and it’s wasteful to retrieve an entire entity just to get at those properties. For times like these, App Engine has another trick up its sleeve: projection queries. Projection queries let you request specific properties of entities, instead of full entities, under certain conditions:

SELECT prop1, prop2 FROM Kind WHERE ...

The entity objects that come back have only the requested properties (known as the “projected properties”) and their keys set. Only the requested data is returned by the datastore service.

The idea of projection queries is based on how indexes are used to resolve queries. While a normal query uses indexes of keys and property values to look up keys then fetch the corresponding entities, projection queries take the requested values directly from the indexes themselves.

Every projection query requires a custom index of the properties involved in the query. The development server adds index configuration for each combination of projected properties, kind, and other query criteria.

Projection queries tend to be faster than full queries, for several reasons. Result entities are not fetched separately after a key lookup. Instead, the result data comes directly from the index. And naturally, less data means less RPC communication. The entire procedure is to find the first row of the custom index and scan to the last row, returning the columns for the projected properties.

Several restrictions fall out of this trick, and they’re fairly intuitive. Only indexed properties can be in a projection. Another way of saying this is, only entities with all the projected properties set to an indexable value can be a result for a projection query. A projection query needs a custom index even if the equivalent full-entity query does not.

There’s another weird behavior we need to note here, and yes, once again it involves multivalued properties. As we saw earlier, if an entity contains multiple values for a property, an index containing that property contains a row for each value (and a row for each combination of values if you’re indexing more than one multivalued property), repeating each of the single-valued properties in the index. Unlike an entity or key query, a projection query makes no attempt to de-duplicate the result list in this case. Instead, a projection on a multivalued property returns a separate result for each row in the index. Each result contains just the values of the properties on that row.

TIP

Each projection requires its own custom index. If you have two queries with the same kind and filter structures, but different projected properties, the queries require two custom indexes:

SELECT prop1, prop2 FROM Kind ...

SELECT prop1, prop3 FROM Kind ...

In this case, you can save space and time-to-update by using the same projection in both cases, and ignoring the unused properties:

SELECT prop1, prop2, prop3 FROM Kind ...

The SQL-like syntax we’re using is indeed the GQL syntax for projection queries. You can perform projection queries in the Administration Console (for both Python and Java apps). Remember that, as with other queries, the app must already have the appropriate index configured for the query to succeed in the Console.

Projection Queries in Python

In the Python interface, you request a projection query by listing the properties for the projection. In the all() model class method and the db.Query() constructor, you provide an iterable of property names as the projection argument:

q = MyModel.all(projection=('prop1', 'prop2')

q = db.Query(MyModel, projection=('prop1', 'prop2')

You can also use the GQL syntax for projection queries with the db.GqlQuery() constructor:

q = db.GqlQuery('SELECT prop1, prop2 FROM MyModel')

The gql() model class method only supports entity queries, and does not support projection queries.

The results of a projection query are instances of the model class with only the projected properties set. The model class is aware that the result is from a projection query, and alters its behavior slightly to accommodate. Specifically, it will allow a property modeled as required=True to be unset if it isn’t one of the projected properties. (We’ll cover property modeling in Chapter 9.) Also, it won’t allow the partial instance to be put() back to the datastore, even if you set all of its properties to make it complete.

You can test whether a model instance is a projection result using the db.model_is_projection(obj) function. The function takes a model instance and returns True if it is the result of a projection query.

As sanity checks, the query API will not let you project a property and use it in an equality filter in the same query. It also won’t allow you to project the same property more than once.

Projection Queries in Java

In Java, you request a projection query by calling the addProjection() method on the Query object for each property in the projection. (If you never call this method, the query is an entity query, and full entities are returned.) The addProjection() method takes aPropertyProjection, whose constructor takes the name of the property and an optional (but recommended) value type class:

import com.google.appengine.api.datastore.PropertyProjection;

// ...

Query query = new Query("Message");

query.addProjection(new PropertyProjection("prop1", String.class);

Each query result is an Entity with only the projected properties set. As usual, calling getProperty() will return the value as an Object, which you can then cast to the expected value type.

If projected properties may be of different types, you can pass null as the second argument of the PropertyProjection constructor. If you do this, the getProperty() method on the result will return the raw value wrapped in a RawValue instance. This wrapper provides several accessor methods with varying amounts of type control. getValue() returns the raw value as an Object; you can use introspection on the result to determine its type. asType() takes a java.lang.Class and returns a castable Object, as if you provided the class to thePropertyProjection constructor. And finally, <T> asStrictType(java.lang.Class<T>) provides an additional level of type control at compile time.

TIP

The Java Persistence API supports projection queries in its own query language, comparable to GQL or field selection in SQL. See Chapter 10 for more information on JPA.

Configuring Indexes

An application specifies the custom indexes it needs in a configuration file. Each index definition includes the kind, and the names and sort orders of the properties to include. A configuration file can contain zero or more index definitions.

Most of the time, you can leave the maintenance of this file to the development web server. The development server watches the queries the application makes, and if a query needs a custom index and that index is not defined in the configuration file, the server adds appropriate configuration automatically.

The development server will not remove index configuration. If you are sure your app no longer needs an index, you can edit the file manually and remove it.

You can disable the automatic index configuration feature. Doing so causes the development server to behave like App Engine: if a query doesn’t have an index and needs one, the query fails. How to do this is particular to the runtime environment, so we’ll get to that in a moment.

Index configuration is global to all versions of your application. All versions of an app share the same datastore, including indexes. If you deploy a version of the app and the index configuration has changed, App Engine will use the new index configuration for all versions.

Index Configuration for Python

For Python apps, the index configuration file is named index.yaml, and is in the YAML format (similar to app.yaml). It appears in the application root directory.

The structure is a single YAML list named indexes, with one element per index. Each index definition has a kind element (the kind name, a string) and a properties element. If the index supports queries with ancestor filters, it has an ancestor element with a value of yes.

properties is a list, one element per column in the index, where each column has a name and an optional direction that is either asc (ascending order, the default) or desc (descending order). The order of the properties list is significant: the index is sorted by the first column first, then by the second column, and so on.

Here’s an example of an index.yaml file, using indexes from earlier in this chapter:

indexes:

- kind: Player

properties:

- name: charclass

- name: level

direction: desc

- kind: Player

properties:

- name: level

direction: desc

- name: score

direction: desc

By default, the development server adds index configuration to this file as needed. When it does, it does so beneath this line, adding it (and a descriptive comment) if it doesn’t find it:

# AUTOGENERATED

You can move index configuration above this line to take manual control over it. This isn’t strictly necessary, since the development server will never delete index configuration, not even that which was added automatically.

To disable automatic index configuration in the development server, start the server with the --require_indexes command-line option. If you are using the Launcher, select the application, then go to the Edit menu and select Application Settings. Add the command-line option to the Extra Flags field and then click Update.

Index Configuration for Java

For Java apps, you add index configuration to a file named datastore-indexes.xml, in the directory WEB-INF/ in the WAR. This is an XML file with a root element named <datastore-indexes>. This contains zero or more <datastore-index> elements, one for each index.

Each <datastore-index> specifies the kind by using the kind attribute. It also has an ancestor attribute, which is true if the index supports queries with ancestor filters, and false otherwise.

A <datastore-index> contains one or more <property> elements, one for each column in the index. Each <property> has a name attribute (the name of the property) and a direction attribute (asc for ascending, desc for descending). The order of the <property> elements is significant: the index is sorted by the first column first, then by the second column, and so on.

An example:

<datastore-indexes autoGenerate="true">

<datastore-index kind="Player" ancestor="false">

<property name="charclass" direction="asc" />

<property name="level" direction="desc" />

</datastore-index>

<datastore-index kind="Player" ancestor="false">

<property name="level" direction="desc" />

<property name="score" direction="desc" />

</datastore-index>

</datastore-indexes>

The <datastore-indexes> root element has an attribute named autoGenerate. If it’s true, or if the app does not have a datastore-indexes.xml file, the Java development server generates new index configuration when needed for a query. If it’s false, the development server behaves like App Engine: if a query needs an index that is not defined, the query fails.

The development server does not modify datastore-indexes.xml. Instead, it generates a separate file named datastore-indexes-auto.xml, in the directory WEB-INF/appengine-generated/. The complete index configuration is the total of the two configuration files.

The Java server will never remove index configuration from the automatic file, so if you need to delete an index, you may need to remove it from the automatic file. You can move configuration from the automatic file to the manual file if that’s easier to manage, such as to check it into a revision control repository.