Datastore Entities - Programming Google App Engine (2012)

Chapter 5. Datastore Entities

Most scalable web applications use separate systems for handling web requests and for storing data. The request handling system routes each request to one of many servers. The server handles the request without knowledge of other requests going to other servers. Each request handler behaves as if it is stateless, acting solely on the content of the request to produce the response. But most web applications need to maintain state, whether it’s remembering that a customer ordered a product, or just remembering that the user who made the current request is the same user who made an earlier request handled by another server. For this, request handlers must interact with a central database to fetch and update the latest information about the state of the application.

Just as the request handling system distributes web requests across many machines for scaling and robustness, so does the database. But unlike the request handlers, databases are by definition stateful, and this poses a variety of questions. Which server remembers which piece of data? How does the system route a data query to the server or servers that can answer the query? When a client updates data, how long does it take for all servers that know that data to get the latest version, and what does the system return for queries about that data in the meantime? What happens when two clients try to update the same data at the same time? What happens when a server goes down?

As with request handling, Google App Engine manages the scaling and maintenance of data storage automatically. Your application interacts with an abstract model that hides the details of managing and growing a pool of data servers. This model and the service behind it provide answers to the questions of scalable data storage specifically designed for web applications.

App Engine’s abstraction for data is easy to understand, but it is not obvious how to best take advantage of its features. In particular, it is surprisingly different from the kind of database most of us know best, the relational database. It’s different enough, in fact, that Google doesn’t call it a “database,” but a “datastore.”

The App Engine datastore is a robust, scalable data storage solution. Your app’s data is stored in several locations by using a best-of-breed consensus protocol (similar to the “Paxos” protocol), making your app’s access to this data resilient to most service failures and all planned downtime. When we discuss queries and transactions, we’ll see how this affects how data is updated. For now, just know that it’s a good thing.

We dedicate the next several chapters to this important subject.

TIP

In 2011–2012, App Engine transitioned from an older datastore infrastructure, known as the “master/slave” (M/S) datastore, to the current one, known as the “high replication” datastore (HR datastore, or HRD). The two architectures differ in how data is updated, but the biggest difference is that the M/S datastore requires scheduled maintenance periods during which data cannot be updated, and is prone to unexpected failures. The HR datastore stays available during scheduled maintenance, and is far more resistant to system failure.

All new App Engine applications use the HR datastore, and the M/S datastore is no longer an option. I only mention it because you’ll read about it in older articles, and may see occasional announcements about maintenance of the M/S datastore. You may also see mentions of a datastore migration tool, which old apps still using the M/S datastore can use to switch to the new HR datastore. In this book, “the datastore” always refers to the HR datastore.

Entities, Keys, and Properties

The App Engine datastore is best understood as an object database. An object in the datastore is known as an entity.

An entity has a key that uniquely identifies the object across the entire system. If you have a key, you can fetch the entity for the key quickly. Keys can be stored as data in entities, such as to create a reference from one entity to another. A key has several parts, some of which we’ll discuss here and some of which we’ll cover later.

One part of the key is the application’s ID, which ensures that nothing else about the key can collide with the entities of any other application. It also ensures that no other app can access your app’s data, and that your app cannot access data for other apps. This feature of keys is automatic.

An important part of the key is the kind. An entity’s kind categorizes the entity for the purposes of queries, and for ensuring the uniqueness of the rest of the key. For example, a shopping cart application might represent each customer order with an entity of the kind “Order.” The application specifies the kind when it creates the entity.

The key also contains an entity ID. This can be an arbitrary string specified by the app, or it can be generated automatically by the datastore. The API calls an entity ID given by the app a key name, and an entity ID generated by the datastore an ID. An entity has either a key name or an ID, but not both.

App-assigned key names are strings, while system-assigned IDs are integers. System-assigned IDs are generally increasing, although they are not guaranteed to be monotonically increasing. If you want a strictly increasing ID, you must maintain this yourself in a transaction. (See Chapter 7.) If you purposefully do not want an increasing ID, such as to avoid exposing data sizes to users, you can either generate your own key name, or allow the system to generate a numeric ID, then encrypt and store it with other data.
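One way to get a non-sequential identifier is to have the app generate a random key name. The following is a minimal standalone Python 3 sketch that uses only the standard library, not the App Engine SDK. The helper name make_key_name and the "k" prefix are assumptions made for illustration; the resulting string would be supplied as the entity's key name.

```python
import secrets


def make_key_name(num_bytes=16):
    """Return a random, URL-safe string that could serve as a key name.

    Hypothetical helper. The leading 'k' is an assumption of this sketch:
    it guarantees the name never starts with an underscore or a hyphen.
    """
    return 'k' + secrets.token_urlsafe(num_bytes)


name_a = make_key_name()
name_b = make_key_name()
print(name_a)
print(name_b)
```

Because the names are random, they reveal nothing about how many entities exist or in what order they were created.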

Once an entity has been created, its key cannot be changed. This applies to all parts of its key, including the kind and the key name or ID.

The data for the entity is stored in one or more properties. Each property has a name and at least one value. Each value is of one of several supported data types, such as a string, an integer, a date-time, or a null value. We’ll look at property value types in detail later in this chapter.

A property can have multiple values, and each value can be of a different type. As you will see in Multivalued Properties, multivalued properties have unusual behavior, but are quite useful for modeling some kinds of data, and surprisingly efficient.

NOTE

It’s tempting to compare these concepts with similar concepts in relational databases: kinds are tables; entities are rows; properties are fields or columns. That’s a useful comparison, but watch out for differences.

Unlike a table in a relational database, there is no relationship between an entity’s kind and its properties. Two entities of the same kind can have different properties set or not set, and can each have a property of the same name but with values of different types. You can (and often will) enforce a data schema in your own code, and App Engine includes libraries to make this easy, but this is not required by the datastore.

Also unlike relational databases, keys are not properties. You can perform queries on key names just like properties, but you cannot change a key name after the entity has been created.

A relational database cannot store multiple values in a single cell, while an App Engine property can have multiple values.

Introducing the Python Datastore API

In the Python API for the App Engine datastore, Python objects represent datastore entities. The class of the object corresponds to the entity’s kind, where the name of the class is the name of the kind. You define kinds by creating classes that extend one of the provided base classes.

Each attribute of the object corresponds with a property of the entity. To create a new entity in the datastore, you call the class constructor, set attributes on the object, then call a method to save it. To update an existing entity, you call a method that returns the object for the entity (such as via a query), modify its attributes, and then save it.

Example 5-1 defines a class named Book to represent entities of the kind Book. It creates an object of this class by calling the class constructor, and then sets several property values. Finally, it calls the put() method to save the new entity to the datastore. The entity does not exist in the datastore until it is put() for the first time.

Example 5-1. Python code to create an entity of the kind Book

from google.appengine.ext import db
import datetime

class Book(db.Expando):
    pass

obj = Book()
obj.title = 'The Grapes of Wrath'
obj.author = 'John Steinbeck'
obj.copyright_year = 1939
obj.author_birthdate = datetime.datetime(1902, 2, 27)

obj.put()

The Book class inherits from the class Expando in App Engine’s db package. The Expando base class says Book objects can have any of their properties assigned any value. The entity “expands” to accommodate new properties as they are assigned to attributes of the object. Python does not require that an object’s member variables be declared in a class definition, and this example takes advantage of this by using an empty class definition—the pass keyword indicates the empty definition—and assigns values to attributes of the object after it is created. The Expando base class knows to use the object’s attributes as the values of the corresponding entity’s properties.

The Expando class has a funny name because this isn’t the way the API’s designers expect us to create new classes in most cases. Instead, you’re more likely to use the Model base class with a class definition that ensures each instance conforms to a structure, so a mistake in the code doesn’t accidentally create entities with malformed properties. Here is how we might implement the Book class using Model:

class Book(db.Model):
    title = db.StringProperty()
    author = db.StringProperty()
    copyright_year = db.IntegerProperty()
    author_birthdate = db.DateTimeProperty()

The Model version of Book specifies a structure for Book objects that is enforced while the object is being manipulated. It ensures that values assigned to an object’s properties are of appropriate types, such as string values for title and author properties, and raises a runtime error if the app attempts to assign a value of the wrong type to a property. With Model as the base class, the object does not “expand” to accommodate other properties: an attempt to assign a value to a property not mentioned in the class definition raises a runtime error. Model and the various Property definitions also provide other features for managing the structure of your data, such as automatic values, required values, and the ability to add your own validation and serialization logic.

It’s important to notice that these validation features are provided by the Model class and your application code, not the datastore. Even if part of your app uses a Model class to ensure a property’s value meets certain conditions, another part of your app can still retrieve the entity without using the class and do whatever it likes to that value. The bad value won’t raise an error until the app tries to load the changed entity into a new instance of the Model class. This is both a feature and a burden: your app can manage entities flexibly and enforce structure where needed, but it must also be careful when those structures need to change. Data modeling and the Model class are discussed in detail in Chapter 9.
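To make the point concrete, here is a minimal standalone Python 3 sketch of validation that lives in application code. It does not use the App Engine SDK and is not how db.Model is implemented; the names TypedProperty and SketchBook are hypothetical. The check runs only when a value is assigned through the class, so data written by other means is never inspected.

```python
class TypedProperty:
    """Descriptor that rejects values of the wrong type on assignment."""

    def __init__(self, expected_type):
        self.expected_type = expected_type

    def __set_name__(self, owner, name):
        # Remember the attribute name this descriptor was assigned to.
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        if value is not None and not isinstance(value, self.expected_type):
            raise TypeError('%s must be of type %s'
                            % (self.name, self.expected_type.__name__))
        instance.__dict__[self.name] = value


class SketchBook:
    title = TypedProperty(str)
    copyright_year = TypedProperty(int)


book = SketchBook()
book.title = 'The Grapes of Wrath'
book.copyright_year = 1939

try:
    book.copyright_year = 'nineteen thirty-nine'
    rejected = False
except TypeError:
    rejected = True

# Data that bypasses the class is not checked by anything.
raw_properties = {'title': 'The Grapes of Wrath',
                  'copyright_year': 'not a number'}
```

The bad assignment is rejected and the earlier value is kept, while raw_properties holds a malformed value that nothing has validated.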

The Book constructor accepts initial values for the object’s properties as keyword arguments. The constructor code earlier could also be written like this:

obj = Book(title='The Grapes of Wrath',
           author='John Steinbeck',
           copyright_year=1939,
           author_birthdate=datetime.datetime(1902, 2, 27))

As written, this code does not set a key name for the new entity. Without a key name, the datastore generates a unique ID when the object is saved for the first time. If you prefer to use a key name generated by the app, you call the constructor with the key_name parameter:

obj = Book(key_name='0143039431',
           title='The Grapes of Wrath',
           author='John Steinbeck',
           copyright_year=1939,
           author_birthdate=datetime.datetime(1902, 2, 27))

WARNING

Because the Python API uses keyword arguments, object attributes, and object methods for purposes besides entity properties, there are several property names that are off-limits. For instance, you cannot use the Python API to set a property named key_name, because this could get confused with the key_name parameter for the object constructor. Names reserved by the Python API are enforced in the API, but not in the datastore itself. Google’s official documentation lists the reserved property names.

The datastore reserves all property names beginning and ending with two underscores (such as __internal__). This is true for the Python API and the Java API, and will be true for future APIs as well.

The Python API ignores all object attributes whose names begin with a single underscore (such as _counter). You can use such attributes to attach data and functionality to an object that should not be saved as properties for the entity.
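The underscore rule can be pictured with a small standalone Python 3 sketch. It does not use the App Engine SDK; SketchEntity and properties_to_save are hypothetical names, and the filtering shown is only an illustration of the behavior described above, not the library's code.

```python
class SketchEntity:
    """Stand-in for an entity object; not part of the App Engine SDK."""


def properties_to_save(obj):
    """Return the attributes that would be stored as entity properties.

    Attributes whose names start with an underscore are skipped.
    """
    return {name: value
            for name, value in vars(obj).items()
            if not name.startswith('_')}


e = SketchEntity()
e.title = 'The Grapes of Wrath'
e._counter = 3  # bookkeeping for the app only; not saved

props = properties_to_save(e)
print(props)
```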

The complete key of an entity, including the key name and kind, must be unique. (We’ll discuss another part to keys that contributes to a key’s uniqueness, called ancestors, in Chapter 7.) If you build a new object with a key that is already in use, and then try to save it, the save will replace the existing object. When you don’t want to overwrite existing data, you can use a system-assigned ID in the key, or you can use a transaction to test for the existence of an entity with a given key and create it if it doesn’t exist.

The Python API provides a shortcut for creating entities with app-assigned key names. The get_or_insert() class method takes a key name and either returns an existing entity with that key name, or creates a new entity with that key name and no properties and returns it. Either way, the method is guaranteed to return an object that represents an entity in the datastore:

obj = Book.get_or_insert('0143039431')

if obj.title:
    # Book already exists.
    # ...
    pass
else:
    obj.title = 'The Grapes of Wrath'
    obj.author = 'John Steinbeck'
    obj.copyright_year = 1939
    obj.author_birthdate = datetime.datetime(1902, 2, 27)

    obj.put()
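The difference between saving with a key that is already in use and calling get_or_insert() can be modeled with a dictionary. This is a standalone Python 3 sketch, not the App Engine API: the store dictionary and the two functions are hypothetical stand-ins, and the sketch ignores concurrency entirely.

```python
# In-memory stand-in for the datastore: maps (kind, key_name) to properties.
store = {}


def put(kind, key_name, properties):
    """Save an entity, replacing any entity already stored under the key."""
    store[(kind, key_name)] = dict(properties)


def get_or_insert(kind, key_name):
    """Return the stored entity for the key, creating an empty one if absent."""
    return store.setdefault((kind, key_name), {})


put('Book', '0143039431', {'title': 'The Grapes of Wrath'})
put('Book', '0143039431', {'title': 'East of Eden'})  # replaces the first

existing = get_or_insert('Book', '0143039431')  # found, returned unchanged
fresh = get_or_insert('Book', '0000000000')     # absent, created empty
```

The second put() silently replaces the first entity, while get_or_insert() never overwrites what is already stored.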

TIP

The Python datastore code shown in this book uses the ext.db library, provided in the App Engine SDK. The App Engine team recently added a new Python datastore library to the SDK, called NDB (ext.ndb). NDB is similar to ext.db, but adds powerful features for structured data, automatic caching, and efficient use of service calls. See the App Engine website for more information on NDB.

Introducing the Java Datastore API

App Engine for Java includes support for two major standard interfaces for databases: Java Data Objects (JDO) and the Java Persistence API (JPA). Like the other standards-based interfaces in the App Engine Java API, using one of these interfaces makes it easier to move your application from and to another platform. JDO and JPA support different kinds of databases, including object databases and relational databases. They provide an object-oriented interface to your data, even if the underlying database is not an object store.

Many of the concepts of these interfaces translate directly to App Engine datastore concepts: classes are kinds, objects are entities, fields are properties. App Engine’s implementation also supports several advanced features of these interfaces, such as object relationships. Inevitably, some concepts do not translate directly and have behaviors that are specific to App Engine.

We’ll discuss one of these interfaces, JPA, in Chapter 10. For now, here is a simple example of a data class using JPA:

import java.util.Date;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

@Entity
public class Book {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String title;
    private String author;
    private int copyrightYear;
    private Date authorBirthdate;

    public Long getId() {
        return id;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getAuthor() {
        return author;
    }

    public void setAuthor(String author) {
        this.author = author;
    }

    public int getCopyrightYear() {
        return copyrightYear;
    }

    public void setCopyrightYear(int copyrightYear) {
        this.copyrightYear = copyrightYear;
    }

    public Date getAuthorBirthdate() {
        return authorBirthdate;
    }

    public void setAuthorBirthdate(Date authorBirthdate) {
        this.authorBirthdate = authorBirthdate;
    }
}

The JDO and JPA implementations are built on top of a low-level API for the App Engine datastore. The low-level API exposes all of the datastore’s features, and corresponds directly to datastore concepts. For instance, you must use the low-level API to manipulate entities with properties of unknown names or value types. You can also use the low-level API directly in your applications, or use it to implement your own data management layer.

The following code creates a Book entity by using the low-level API:

import java.io.IOException;

import java.util.Calendar;

import java.util.Date;

import java.util.GregorianCalendar;

import javax.servlet.http.HttpServlet;

import javax.servlet.http.HttpServletRequest;

import javax.servlet.http.HttpServletResponse;

import com.google.appengine.api.datastore.DatastoreService;

import com.google.appengine.api.datastore.DatastoreServiceFactory;

import com.google.appengine.api.datastore.Entity;

// ...

DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

Entity book = new Entity("Book");

book.setProperty("title", "The Grapes of Wrath");

book.setProperty("author", "John Steinbeck");

book.setProperty("copyrightYear", 1939);

Date authorBirthdate =
    new GregorianCalendar(1902, Calendar.FEBRUARY, 27).getTime();

book.setProperty("authorBirthdate", authorBirthdate);

ds.put(book);

// ...

Notice that the application code, not the datastore, is responsible for managing the structure of the data. JDO and JPA impose this structure by using classes whose fields are persisted to the datastore behind the scenes. This can be both a benefit and a burden when you need to change the structure of existing data.

To illustrate the datastore concepts, we will use the low-level API for Java examples in the next few chapters. In Chapter 10, we reintroduce JPA, and discuss how JPA concepts correspond with App Engine concepts. For more information on the Java Data Objects interface, see the official App Engine documentation.

TIP

If you’d prefer object-oriented management of datastore entities but do not need to use JPA for portability purposes, consider Objectify, a third-party open source project. Objectify is specific to App Engine, and supports most of the features of the low-level API. See the Objectify website for more information:

http://code.google.com/p/objectify-appengine/

Property Values

Each value data type supported by the datastore is represented by a primitive type in the language for the runtime or a class provided by the API. The data types and their language-specific equivalents are listed in Table 5-1. In this table, db is the Python package google.appengine.ext.db, and datastore is the Java package com.google.appengine.api.datastore.

Table 5-1. Datastore property value types and equivalent language types

| Data type | Python type | Java type |
|---|---|---|
| Unicode text string (up to 500 bytes, indexed) | unicode or str (converted to unicode as ASCII) | java.lang.String |
| Long Unicode text string (not indexed) | db.Text | datastore.Text |
| Short byte string (up to 500 bytes, indexed) | db.ByteString | datastore.ShortBlob |
| Long byte string (not indexed) | db.Blob | datastore.Blob |
| Boolean | bool | boolean |
| Integer (64-bit) | int or long (converted to 64-bit long) | byte, short, int, or long (converted to long) |
| Float (double precision) | float | float or double (converted to double) |
| Date-time | datetime.datetime | java.util.Date |
| Null value | None | null |
| Entity key | db.Key | datastore.Key |
| A Google account | users.User | ...api.users.User |
| A category (GD) | db.Category | datastore.Category |
| A URL (GD) | db.Link | datastore.Link |
| An email address (GD) | db.Email | datastore.Email |
| A geographical point (GD) | db.GeoPt | datastore.GeoPt |
| An instant messaging handle (GD) | db.IM | datastore.IMHandle |
| A phone number (GD) | db.PhoneNumber | datastore.PhoneNumber |
| A postal address (GD) | db.PostalAddress | datastore.PostalAddress |
| A user rating (GD) | db.Rating | datastore.Rating |
| A Blobstore key | ext.blobstore.BlobKey | blobstore.BlobKey |

The datastore types in this table labeled “(GD)” are types borrowed from the Google Data protocol. These are supported as distinct native data types in the datastore, although most of them are implemented as text strings. Notable exceptions are GeoPt, which is a pair of floating-point values for latitude (–90 to +90) and longitude (–180 to +180), and Rating, which is an integer between 0 and 100.

Blobstore keys refer to values in the Blobstore. See Chapter 12 for more information.

Example 5-2 demonstrates the use of several of these data types in Python.

Example 5-2. Python code to set property values of various types

from google.appengine.ext import webapp
from google.appengine.ext import db
from google.appengine.api import users
import datetime

class Comment(db.Expando):
    pass

class CommentHandler(webapp.RequestHandler):
    def post(self):
        c = Comment()
        c.commenter = users.get_current_user()  # returns a users.User object
        c.message = db.Text(self.request.get('message'))
        c.date = datetime.datetime.now()
        c.put()

        # Display the result page...

TIP

When you use Python’s db.Expando or Java’s low-level datastore API, types that are widened to other types when stored come back as the wider datastore types when you retrieve the entity. For instance, a Java Integer comes back as a Long. If you use these APIs in your app, it’s best to use the native datastore types, so the value types stay consistent.

The data modeling interfaces offer a way to store values in these alternate types and convert them back automatically when retrieving the entity. See Chapters 9 and 10.

Strings, Text, and Blobs

The datastore has two distinct data types for storing strings of text: short strings and long strings. Short strings are indexed; that is, they can be the subject of queries, such as a search for every Person entity with a given value for a last_name property. Short string values must be less than 500 bytes in length. Long strings can be longer than 500 bytes, but are not indexed.

Text strings, short and long, are strings of characters from the Unicode character set. Internally, the datastore stores Unicode strings by using the UTF-8 encoding, which represents some characters using multiple bytes. This means that the 500-byte limit for short strings is not necessarily the same as 500 Unicode characters. The actual limit on the number of characters depends on which characters are in the string.
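The gap between character count and byte count is easy to check. The following standalone Python 3 sketch (standard library only, not the App Engine SDK) measures the UTF-8 length of a few strings; the helper name utf8_length and the constant LIMIT_BYTES are introduced here for illustration.

```python
LIMIT_BYTES = 500  # the short-string limit discussed in the text


def utf8_length(text):
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode('utf-8'))


ascii_text = 'a' * 300         # one byte per character
accented_text = '\u00e9' * 300  # e with acute accent: two bytes per character
euro_text = '\u20ac' * 300      # euro sign: three bytes per character

for label, text in [('ascii', ascii_text),
                    ('accented', accented_text),
                    ('euro', euro_text)]:
    size = utf8_length(text)
    print(label, len(text), 'characters,', size, 'bytes,',
          'over limit' if size > LIMIT_BYTES else 'within limit')
```

All three strings have 300 characters, but only the ASCII one stays within 500 bytes.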

The Python API distinguishes between short strings and long strings, using Python data types. The Python built-in types unicode and str represent short string values. str values are assumed to be text encoded as ASCII, and are treated as UTF-8 (which is equivalent to ASCII for the first 128 characters in the character set). For long strings, the Python API includes a db.Text class, which takes a unicode or str value as an argument for its constructor:

# Short strings.

e.prop = "a short string, as an ASCII str"

e.prop = unicode("a short string, as a unicode value")

# A long string.

e.prop = db.Text("a long string, can be longer than 500 bytes")

The Java API makes a similar distinction, treating String values as short strings, and using the datastore.Text class to represent long text strings.

The datastore also supports two additional classes for strings of bytes, or “blobs.” Blobs are not assumed to be of any particular format, and their bytes are preserved. This makes them good for nontext data, such as images, movies, or other media. As with text strings, the blob types come in indexed and nonindexed varieties. The Python API provides the db.Blob class to represent blob values, which takes a str value as an argument for its constructor:

# A blob. self.request.body is the body of the request in a

# webapp request handler, such as an uploaded file.

e.prop = db.Blob(self.request.body)

In Java, the blob types are datastore.ShortBlob and datastore.Blob.

Unset Versus the Null Value

One possible value of a property is the null value. In Python, the null value is represented by the Python built-in value None. In Java, this value is null.

A property with the null value is not the same as an unset property. Consider the following Python code:

class Entity(db.Expando):
    pass

a = Entity()

a.prop1 = 'abc'

a.prop2 = None

a.put()

b = Entity()

b.prop1 = 'def'

b.put()

This creates two entities of the kind Entity. Both entities have a property named prop1. The first entity has a property named prop2; the second does not.

Of course, an unset property can be set later:

b.prop2 = 123

b.put()

# b now has a property named "prop2."

Similarly, a set property can be made unset. In the Python API, you delete the property by deleting the attribute from the object, using the del keyword:

del b.prop2

b.put()

# b no longer has a property named "prop2."

In Java, the low-level datastore API’s Entity class has methods to set properties (setProperty()) and unset properties (removeProperty()).
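The distinction between a null value and an unset property mirrors the difference between a dictionary key that maps to None and a key that is absent. This standalone Python 3 sketch uses plain dictionaries as stand-ins for the two entities; it does not use the App Engine SDK.

```python
# Plain dictionaries standing in for the property sets of two entities.
a = {'prop1': 'abc', 'prop2': None}  # prop2 is set, to the null value
b = {'prop1': 'def'}                 # prop2 is unset

a_has_prop2 = 'prop2' in a
b_has_prop2 = 'prop2' in b

b['prop2'] = 123        # an unset property can be set later
b_after_set = dict(b)

del b['prop2']          # and a set property can be made unset again
b_after_delete = dict(b)
```

Looking up prop2 with a default of None would return None for both entities, which is why membership, not the value, is what distinguishes them.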

Multivalued Properties

As we mentioned earlier, a property can have multiple values. We’ll discuss the more substantial aspects of multivalued properties when we talk about queries and data modeling. But for now, it’s worth a brief mention.

A property can have one or more values. A property cannot have zero values; a property without a value is simply unset. Each value for a property can be of a different type, and can be the null value.

The datastore preserves the order of values as they are assigned. The Python API returns the values in the same order as they were set.

In Python, a property with multiple values is represented as a single Python list value:

e.prop = [1, 2, 'a', None, 'b']

NOTE

Because a property must have at least one value, it is an error to assign an empty list ([] in Python) to a property on an entity whose Python class is based on the Expando class:

class Entity(db.Expando):
    pass

e = Entity()

e.prop = [] # ERROR

In contrast, the Model base class includes a feature that automatically translates between the empty list value and “no property set.” You’ll see this feature in Chapter 9.

In the Java low-level datastore API, you can store multiple values for a property by using a Collection type. The low-level API returns the values as a java.util.List. The items are stored in the order provided by the Collection type’s iterator. For many types, such as SortedSet or TreeSet, this order is deterministic. For others, such as HashSet, it is not. If the app needs the original data structure, it must convert the List returned by the datastore to the appropriate type.
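The same three ideas (at least one value, order preserved as given, and values returned as a list) can be sketched in standalone Python 3. The function to_stored_values is a hypothetical stand-in for what happens when values are saved; it is not part of any App Engine API.

```python
def to_stored_values(values):
    """Return the values as an ordered list, as a multivalued property would.

    Hypothetical helper: raises ValueError for an empty collection, since a
    property must have at least one value.
    """
    stored = list(values)
    if not stored:
        raise ValueError('a property must have at least one value')
    return stored


tags = {'novel', 'classic', 'fiction'}

# Sorting first gives a deterministic order; iterating the set directly
# would not.
stored = to_stored_values(sorted(tags))

# What comes back is a list; convert it if the app needs a set again.
restored = set(stored)
```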

Keys and Key Objects

The key for an entity is a value that can be retrieved, passed around, and stored like any other value. If you have the key for an entity, you can retrieve the entity from the datastore quickly, much more quickly than with a datastore query. Keys can be stored as property values, as an easy way for one entity to refer to another.

The Python API represents an entity key value as an instance of the Key class, in the db package. To get the key for an entity, you call the entity object’s key() method. The Key instance provides access to its several parts by using accessor methods, including the kind, key name (if any), and system-assigned ID (if the entity does not have a key name).

The Java low-level API is similar: the getKey() method of the Entity class returns an instance of the Key class.

When you construct a new entity object and do not provide a key name, the entity object has a key, but the key does not yet have an ID. The ID is populated when the entity object is saved to the datastore for the first time. You can get the key object prior to saving the object, but it will be incomplete:

e = Entity()

e.prop = 123

k = e.key() # key is incomplete, has neither key name nor ID

kind = k.kind() # 'Entity'

e.put() # ID is assigned

k = e.key() # key is complete, has ID

id = k.id() # the system-assigned ID

If the entity object was constructed with a key name, the key is complete before the object is saved—although, if the entity has not been saved, the key name is not guaranteed to be unique. (In Python, the entity class method get_or_insert(), mentioned earlier, always returns a saved entity, either one that was saved previously or a new one created by the call.)

You can test whether a key is complete by using a method on the Key object. In Python, this is the has_id_or_name() method. The id_or_name() method returns either the object’s key name or its ID, whichever one it has.

In Java, you can call isComplete() to test the Key for completeness, and getId() or getName() to get the numeric ID or the string name.
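As a mental model, a key for the entities seen so far is a kind plus either a name or an ID, and it is complete once one of the two is present. The standalone Python 3 sketch below captures just that. SketchKey is a hypothetical class written for illustration; it borrows the method names has_id_or_name() and id_or_name() from the text but is not the db.Key class and omits ancestor paths.

```python
from typing import NamedTuple, Optional, Union


class SketchKey(NamedTuple):
    """Illustrative key: a kind plus an app-assigned name or a system ID."""
    kind: str
    name: Optional[str] = None
    id: Optional[int] = None

    def has_id_or_name(self) -> bool:
        """Return True if the key is complete."""
        return self.name is not None or self.id is not None

    def id_or_name(self) -> Union[str, int, None]:
        """Return the key name or the ID, whichever one the key has."""
        return self.name if self.name is not None else self.id


incomplete = SketchKey('Entity')                # before the first save
named = SketchKey('Entity', name='alphabeta')   # complete from the start
saved = incomplete._replace(id=4001)            # ID filled in on save
```

The ID value 4001 is made up; in the datastore the ID is chosen by the system when the entity is first saved.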

Once you have a complete key, you can assign it as a property value on another entity to create a reference:

e2 = Entity()

e2.ref = k

e2.put()

If you know the kind and either the key name or ID of an entity in the datastore, you can construct the key for that entity without its object. In Python, you use the from_path() class method of the Key class. A complete explanation of this feature involves another feature we haven’t mentioned yet (ancestor paths), but the following suffices for the examples you’ve seen so far:

e = Entity(key_name='alphabeta')

e.prop = 123

e.put()

# ...

k = db.Key.from_path('Entity', 'alphabeta')

In Java, you can build a Key object from parts using KeyFactory. The static method KeyFactory.createKey() takes the kind and the ID or name as arguments:

Key k = KeyFactory.createKey("Entity", "alphabeta");

Ancestor paths are related to how the datastore does transactions. We’ll get to them in Chapter 7. For the entities we have created so far, the path is just the kind followed by the ID or name.

Keys can be converted to string representations for the purposes of passing around as textual data, such as in a web form or cookie. The string representation avoids characters considered special in HTML or URLs, so it is safe to use without escaping characters. The encoding of the value to a string is simple and easily reversed, so if you expose the string value to users, be sure to encrypt it, or make sure all key parts (such as kind names) are not secret. When accepting an encoded key string from a client, always validate the key before using it.

To convert between a key object and an encoded key string in Python:

k_str = str(k)

# ...

k = db.Key(k_str)

And in Java:

String k_str = KeyFactory.keyToString(k);

// ...

Key k = KeyFactory.stringToKey(k_str);

The Java Key class’s toString() method does not return the key’s string encoding. You must use KeyFactory.keyToString() to get the string encoding of a key.
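To see why an encoded key string should not be treated as a secret, consider the following standalone Python 3 sketch. It uses URL-safe Base64 from the standard library as a stand-in encoding. This is an assumption made purely for illustration and is not the encoding the datastore uses; the function names and the allowed_kinds parameter are hypothetical.

```python
import base64


def encode_sketch_key(kind, name):
    """Encode a kind and key name as a URL-safe string (illustrative only)."""
    raw = ('%s/%s' % (kind, name)).encode('utf-8')
    return base64.urlsafe_b64encode(raw).decode('ascii')


def decode_sketch_key(encoded, allowed_kinds):
    """Decode a string from a client and validate it before use.

    Raises ValueError if the string is malformed or names an unexpected kind.
    """
    raw = base64.urlsafe_b64decode(encoded.encode('ascii')).decode('utf-8')
    kind, separator, name = raw.partition('/')
    if not separator or kind not in allowed_kinds:
        raise ValueError('unexpected key: %r' % raw)
    return kind, name


token = encode_sketch_key('Entity', 'alphabeta')

# Anyone holding the token can reverse it, so the parts are not secret.
decoded = decode_sketch_key(token, allowed_kinds={'Entity'})

# A client can also submit a token for a kind the handler never intended.
forged = encode_sketch_key('AdminSettings', 'root')
try:
    decode_sketch_key(forged, allowed_kinds={'Entity'})
    forged_accepted = True
except ValueError:
    forged_accepted = False
```

Checking the decoded kind against an allow-list is one simple form of the validation the text recommends.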

Using Entities

Let’s look briefly at how to retrieve entities from the datastore by using keys, how to inspect the contents of entities, and how to update and delete entities. The API methods for these features are straightforward.

Getting Entities Using Keys

Given a complete key for an entity, you can retrieve the entity from the datastore.

In the Python API, you can call the get() function in the db package with the Key object as an argument:

from google.appengine.ext import db

k = db.Key.from_path('Entity', 'alphabeta')

e = db.get(k)

If you know the kind of the entity you are fetching, you can also use the get() class method on the appropriate entity class. This does a bit of type checking, ensuring that the key you provide is of the appropriate kind:

class Entity(db.Expando):

    pass

e = Entity.get(k)

To fetch multiple entities in a batch, you can pass the keys to get() as a list. Given a list, the method returns a list containing entity objects, with None values for keys that do not have a corresponding entity in the datastore:

entities = db.get([k1, k2, k3])

Getting a batch of entities in this way performs a single service call to the datastore for the entire batch. This is faster than getting each entity in a separate call. There is no limit on the number of entities a batch get can return.
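Because the result list lines up with the list of keys, with None marking the gaps, it is often handy to pair the two back up. The helper below is a minimal sketch: split_found_missing() is a hypothetical name, and the plain strings stand in for Key objects and entity instances so that the snippet runs without the SDK.

```python
def split_found_missing(keys, entities):
    """Pair each key with its batch-get result.

    Returns (found, missing): a dict of key -> entity for the hits, and a
    list of the keys whose slot in the result list was None.
    """
    found = {}
    missing = []
    for key, entity in zip(keys, entities):
        if entity is None:
            missing.append(key)
        else:
            found[key] = entity
    return found, missing


# Stand-in data: in an app, entities would come from db.get([k1, k2, k3]).
keys = ['k1', 'k2', 'k3']
entities = ['entity-1', None, 'entity-3']
found, missing = split_found_missing(keys, entities)
```

The sketch assumes the keys are hashable, so that they can serve as dictionary keys.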

For convenience, entity classes include methods that take just the IDs or key names and retrieve the corresponding entities, inferring the kind from the class name. See get_by_id() and get_by_key_name() in the official reference documentation.

In the Java low-level API, you get an entity by its key, using a DatastoreService instance (returned by DatastoreServiceFactory.getDatastoreService()). The instance provides a get() method that takes a Key for a single entity get, or an Iterable<Key> for a batch get. If given an iterable of keys, get() returns a Map of Key to Entity:

DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

Map<Key, Entity> entities = ds.get(new ArrayList<Key>(Arrays.asList(k1, k2, k3)));

Entity e1 = entities.get(k1);

Of course, you won’t always have the keys for the entities you want to fetch from the datastore. To retrieve entities that meet other criteria, you use datastore queries. Queries are discussed in Chapter 6.

Inspecting Entity Objects

Entity objects have methods for inspecting various aspects of the entity.

In the Java API, the methods of the Entity class provide straightforward access to the key (getKey()) and kind (getKind()) of the entity. The getProperty() method returns the value of a property given its name. The hasProperty() method tests whether a property is set. setProperty() takes a name and a value and sets the property, replacing any existing value.

The Python API has several features for inspecting entities worth mentioning here. You’ve already seen the key() method of an entity object, which returns the db.Key.

The is_saved() method returns False if the object has not been saved to the datastore since the object was constructed. If the object has been saved since it was constructed, or if the object was retrieved from the datastore, the method returns True. The method continues to return True even if the object’s properties have been modified, so do not rely on this method to track changes to properties of previously saved entities:

e = Entity()

# e.is_saved() == False

e.put()

# e.is_saved() == True

In Java, you can tell if an entity with a system-assigned ID has been saved by calling the isComplete() method of its Key (entity.getKey().isComplete()). A key is complete if it has either a name or a system-assigned ID. A new entity created without a key name has an incomplete key if it has not been saved; saving it populates the system ID and completes the key.

In Python, entity properties can be accessed and modified just like object attributes:

e.prop1 = 1

e.prop2 = 'two'

print 'prop2 has the value ' + e.prop2

You can use Python built-in functions for accessing object attributes to access entity properties. For instance, to test that an entity has a property with a given name, use the hasattr() built-in:

if hasattr(e, 'prop1'):

    # ...

To get or set a property whose name is defined in a string, use getattr() and setattr(), respectively:

# Set prop1, prop2, ..., prop9.

for n in range(1, 10):

    value = n * n

    setattr(e, 'prop' + str(n), value)

value = getattr(e, 'prop' + str(7))

While entity objects support accessing properties by using these methods, the objects do not actually store property values as object attributes. For instance, you cannot use Python’s dir() built-in to get a list of an entity’s properties. Instead, entity objects provide their own method, instance_properties(), for this purpose:

for name in e.instance_properties():

    value = getattr(e, name)
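This loop can be wrapped into a helper that copies an entity’s properties into a plain dictionary. Comparing two such snapshots is one way to notice the property changes that is_saved() does not report. What follows is only a sketch: snapshot() and changed_properties() are hypothetical helpers, and FakeEntity is a stand-in that mimics nothing more than instance_properties() and attribute access, so the example runs without the SDK.

```python
def snapshot(entity):
    """Copy the entity's properties into a plain dict of name -> value."""
    return dict((name, getattr(entity, name))
                for name in entity.instance_properties())


def changed_properties(before, after):
    """Return the sorted names whose values differ, were added, or were removed."""
    names = set(before) | set(after)
    absent = object()  # sentinel distinct from any real value, including None
    return sorted(n for n in names
                  if before.get(n, absent) != after.get(n, absent))


class FakeEntity(object):
    """Stand-in for an Expando entity; not part of the App Engine API."""

    def instance_properties(self):
        return list(vars(self))


e = FakeEntity()
e.prop1 = 1
e.prop2 = 'two'
before = snapshot(e)

e.prop2 = 'deux'   # modify an existing property
e.prop3 = 3        # add a new one
changes = changed_properties(before, snapshot(e))
```

The snapshot is shallow, so a list value that is modified in place will look unchanged; copy such values if that matters to your app.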

Saving Entities

In Python, calling the put() method on an entity object saves the entity to the datastore. If the entity does not yet exist in the datastore, put() creates the entity. If the entity exists, put() updates the entity so that it matches the object:

e = Entity()

e.prop = 123

e.put()

When you update an entity, the app sends the complete contents of the entity to the datastore. The update is all or nothing: there is no way to send just the properties that have changed to the datastore. There is also no way to update a property on an entity without retrieving the complete entity, making the change, and then sending the new entity back.

You use the same API to create an entity as you do to update an entity. The datastore does not make a distinction between creates and updates. If you save an entity with a complete key (such as a key with a kind and a key name) and an entity already exists with that key, the datastore replaces the existing entity with the new one.
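The replace-on-save behavior is easy to model in a few lines. ToyDatastore below is a hypothetical in-memory stand-in, not the App Engine API: it exists only to show that a save with an existing key replaces the entire entity, and that changing one property therefore means reading, modifying, and writing back the whole thing.

```python
class ToyDatastore(object):
    """In-memory stand-in; each key maps to a complete set of properties."""

    def __init__(self):
        self._entities = {}

    def put(self, key, properties):
        # No create/update distinction: whatever was stored under this key
        # is replaced by a copy of the new, complete set of properties.
        self._entities[key] = dict(properties)

    def get(self, key):
        stored = self._entities.get(key)
        return None if stored is None else dict(stored)


ds = ToyDatastore()
key = ('Entity', 'alphabeta')

ds.put(key, {'prop1': 1, 'prop2': 'two'})  # creates the entity
ds.put(key, {'prop1': 100})                # replaces it; prop2 is gone
after_replace = ds.get(key)

# To change one property and keep the rest: read, modify, write back.
current = ds.get(key)
current['prop2'] = 'restored'
ds.put(key, current)
final = ds.get(key)
```

Note that the second put() silently discards prop2. Nothing in the model (or, per the text above, in the datastore) warns you that an entity was already there.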

TIP

If you want to test that an entity with a given key does not exist before you create it, you can do so using a transaction. You must use a transaction to ensure that another process doesn’t create an entity with that key after you test for it and before you create it. For more information on transactions, see Chapter 7.

If you have several entity objects to save, you can save them all in one call using the put() function in the db package. The put() function can also take a single entity object:

db.put(e)

db.put([e1, e2, e3])

In Java, you can save entities by using the put() method of a DatastoreService instance. As with get(), the method takes a single Entity for a single put, or an Iterable<Entity> for a batch put.

When the call to put() returns, the datastore is up-to-date, and all future queries in the current request handler and other handlers will see the new data. The specifics of how the datastore gets updated are discussed in detail in Chapter 7.

Deleting Entities

Deleting entities works similarly to putting entities. In Python, you can call the delete() method on the entity object, or you can pass entity objects or Key objects to the delete() function:

e = db.get(db.Key.from_path('Entity', 'alphabeta'))

e.delete()

db.delete(e)

db.delete([e1, e2, e3])

# Deleting without first fetching the entity:

k = db.Key.from_path('Entity', 'alphabeta')

db.delete(k)

In Java, you call the delete() method of the DatastoreService with either a single Key or an Iterable<Key>.

As with gets and puts, a delete of multiple entities occurs in a single batch call to the service, and is faster than making multiple service calls. Delete calls only send the keys to the service, even if you pass entire entities to the function.

Allocating System IDs

When you create a new entity without specifying an explicit key name, the datastore assigns a numeric system ID to the entity. Your code can read this system ID from the entity’s key after the entity has been created.

Sometimes you want the system to assign the ID, but you need to know what ID will be assigned before the entity is created. For example, say you are creating two entities, and the property of one entity must be set to the key of the other entity. One option is to save the first entity to the datastore, then read the key of the entity, set the property on the second entity, and then save the second entity:

class Entity(db.Expando):

    pass

e1 = Entity()

e1.put()

e2 = Entity()

e2.reference = e1.key()

e2.put()

This requires two sequential calls to the datastore, which costs valuable clock time. It also leaves a window during which the first entity is in the datastore but the second entity isn’t.

We can’t read the key of the first entity before we save it, because it is incomplete: calling e1.key() before e1.put() raises a NotSavedError instead of returning a key. We could use a key name instead of a system ID, giving us a complete key, but it’s often the case that we can’t easily calculate a unique key name, which is why we’d rather have a system-assigned ID.

To solve this problem, the datastore provides a method to allocate system IDs ahead of creating entities. You call the datastore to allocate an ID (or a range of IDs for multiple entities), then create the entity with an explicit ID. Note that this is not the same as using a key name string: you give the entity the allocated numeric ID, and it knows the ID came from the system.

In Python, you call the db.allocate_ids() function. The first argument is either a key or a model instance of the (path and) kind for which the IDs are intended. The second argument is the number of IDs to allocate. The function returns a tuple containing the first and last IDs of the allocated range. After the IDs are allocated, the system will not assign those IDs to any entity of the given (path and) kind. To use an allocated ID, you construct a Key object with it, then construct the entity instance, using the key argument, like so:

# Allocate 1 system ID for entities of kind "Entity".

# The "0" in the representative key is ignored.

ids = db.allocate_ids(db.Key.from_path('Entity', 0), 1)

# Make a key of kind Entity with the allocated system ID.

e1_key = db.Key.from_path('Entity', ids[0])

e1 = Entity(key=e1_key)

e2 = Entity()

e2.reference = e1_key

db.put([e1, e2])
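The reservation idea behind this pattern can be modeled without the SDK. ToyIdAllocator below is a hypothetical stand-in, not the datastore’s allocator: it assumes IDs are handed out per kind as consecutive integers starting at 1, which real system IDs need not be. What it does illustrate is that a reserved range is never handed out again, and that a range of IDs lets you wire up several related records in memory before saving them in a single batch.

```python
class ToyIdAllocator(object):
    """Hands out numeric IDs per kind and never hands out the same ID twice."""

    def __init__(self):
        self._next_id = {}

    def allocate_ids(self, kind, count):
        """Reserve `count` consecutive IDs for `kind`; return (first, last)."""
        if count < 1:
            raise ValueError('count must be at least 1')
        first = self._next_id.get(kind, 1)
        last = first + count - 1
        self._next_id[kind] = last + 1
        return first, last


allocator = ToyIdAllocator()
first, last = allocator.allocate_ids('Entity', 3)
reserved = list(range(first, last + 1))

# Build two related records entirely in memory. Each already knows its
# complete key, so one can refer to the other before anything is saved.
e1 = {'key': ('Entity', reserved[0])}
e2 = {'key': ('Entity', reserved[1]), 'reference': e1['key']}
batch = [e1, e2]  # ready for a single batch save

# A later allocation for the same kind starts after the reserved range.
later_first, later_last = allocator.allocate_ids('Entity', 1)
```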

In Java, you call the allocateIds() service method. It takes a kind (as a String) and the number of IDs to allocate. If the new key will have an ancestor in its path, the parent Key must be the first argument. The method returns a KeyRange, an iterable object that generates Key objects in the allocated ID range. KeyRange also has a getStart() method, which returns the first Key. To create an entity with a given Key, you provide the Key as the sole argument to the Entity constructor:

import java.util.ArrayList;

import java.util.Arrays;

import com.google.appengine.api.datastore.DatastoreService;

import com.google.appengine.api.datastore.Entity;

import com.google.appengine.api.datastore.Key;

import com.google.appengine.api.datastore.KeyRange;

// ...

// DatastoreService ds = ...;

KeyRange range = ds.allocateIds("Entity", 1);

Key e1Key = range.getStart();

Entity e1 = new Entity(e1Key);

Entity e2 = new Entity("Entity");

e2.setProperty("reference", e1Key);

ds.put(new ArrayList<Entity>(Arrays.asList(e1, e2)));

WARNING

A batch put of two entities does not guarantee that both entities are saved together. If your app logic requires that either both entities are saved or neither is saved, you must use a transaction. See Chapter 7. (As you can probably tell by now, that’s an important chapter.)

The Development Server and the Datastore

The development server simulates the datastore service on your local machine while you’re testing your app. All datastore entities are saved to a local file. This file is associated with your app, and persists between runs of the development server, so your test data remains available until you delete it.

The Python development server stores datastore data in a file named dev_appserver.datastore, in a temporary location. You can tell the development server to reset this data when it starts. From the command line, you pass the --clear_datastore argument to dev_appserver.py:

dev_appserver.py --clear_datastore appdir

You can specify an explicit datastore file for the Python development server to use with the --datastore_path=... argument. To see the default location of this file, run dev_appserver.py --help and look for the description of this argument.

In the Python Launcher, you can specify this option in the Application Settings (in the Edit menu). Under Launch Settings, check the box labeled “Clear datastore on launch,” then start the server. Remember to uncheck it again if you do not want it to clear the datastore every time.

The Java development server stores datastore data in a file named local_db.bin in your application’s war/WEB-INF/appengine-generated/ directory. To reset your development datastore, stop your server, delete this file, and then start the server again.