Chapter 9. Data Modeling with ndb

Data modeling is the process of translating the data requirements of your application to the features of your data storage technology. While the application deals in players, towns, weapons, potions, and gold, the datastore knows only entities, entity groups, keys, properties, and indexes. The data model describes how the data is stored and how it is manipulated. Entities represent players and game objects, while properties describe the status of objects and the relationships between them. When an object changes location, the data is updated in a transaction, so the object cannot be in two places at once. When a player wants to know about the weapons in her inventory, the application performs a query for all weapon objects whose location is the player.

In the last few chapters, we’ve been using the Python class ndb.Expando to create and manipulate entities and their properties. As we’ve been using it, this class illustrates the flexible nature of the datastore. The datastore itself does not impose or enforce a structure on entities or their properties, giving the application control over how individual entities represent data objects. This flexibility is also an essential feature for scalability: changing the structure of millions of records is a large task, and the proper strategy for doing this is specific to the task and the application.

But structure is needed. Every player has a number of health points, and a Player entity without a health property, or with a health property whose value is not an integer, is likely to confuse the battle system. The data ought to conform to a structure, or schema, to meet the expectations of the code. Because the datastore does not enforce this schema itself—the datastore is schemaless—it is up to the application to ensure that entities are created and updated properly.

App Engine includes a data modeling library for defining and enforcing data schemas in Python called ndb.1 This library resides in the google.appengine.ext.ndb package. It includes several related classes for representing data objects, including ndb.Model, ndb.Expando, and ndb.PolyModel. To give structure to entities of a given kind, you create a subclass of one of these classes. The definition of the class specifies the properties for those objects, their allowed value types, and other requirements.

In this chapter, we’ll introduce the ndb data modeling library and discuss how to use it to enforce a schema for the otherwise schemaless datastore. We’ll also discuss how the library works and how to extend it. Finally, we’ll look at the ndb library’s powerful performance-related features for managing data caches.

Models and Properties

The ndb.Model superclass lets you specify a structure for every entity of a kind. This structure can include the names of the properties, the types of the values allowed for those properties, whether the property is required or optional, and a default value. Here is a definition of a Book class similar to the one we created in Chapter 6:

from google.appengine.ext import ndb

import datetime

class Book(ndb.Model):
    title = ndb.StringProperty(required=True)
    author = ndb.StringProperty(required=True)
    copyright_year = ndb.IntegerProperty()
    author_birthdate = ndb.DateProperty()

obj = Book(title='The Grapes of Wrath',
           author='John Steinbeck')
obj.copyright_year = 1939
obj.author_birthdate = datetime.date(1902, 2, 27)
obj.put()

This Book class inherits from ndb.Model. In the class definition, we declare that all Book entities have four properties, and we declare their value types: title and author are strings, copyright_year is an integer, and author_birthdate is a date. If someone tries to assign a value of the wrong type to one of these properties, the assignment raises a datastore_errors.BadValueError (from the google.appengine.api.datastore_errors module).

We also declare that title and author are required properties. If someone tries to create a Book without these properties set as arguments to the Book constructor, the attempt will raise a datastore_errors.BadValueError. Both copyright_year and author_birthdate are optional, so we can leave them unset on construction, and assign values to the properties later. If these properties are not set by the time the object is saved, the resulting entity will have a value of None for these properties—and that’s allowed by this model.

A property declaration ensures that the entity created from the object has a value for the property, possibly None. As we’ll see in the next section, you can further specify what values are considered valid using arguments to the property declaration.

A model class that inherits from ndb.Model ignores all attributes that are not declared as properties when it comes time to save the object to the datastore. In the resulting entity, all declared properties are set, and no others.

This is the sole difference between ndb.Model and ndb.Expando. An ndb.Model class ignores undeclared properties. An ndb.Expando class saves all attributes of the object as properties of the corresponding entity. That is, a model using an ndb.Expando class “expands” to accommodate assignments to undeclared properties.

You can use property declarations with ndb.Expando just as with ndb.Model. The result is a data object that validates the values of the declared properties, and accepts any values for additional undeclared properties.
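For example, here is a quick sketch of such a mixed model (the Song kind and its properties here are just for illustration): the declared title property is validated, while the undeclared composer attribute becomes a dynamic property:

class Song(ndb.Expando):
    title = ndb.StringProperty(required=True)  # declared, validated

s = Song(title=u'Amapola')
s.composer = u'Joseph Lacalle'  # undeclared, stored as a dynamic property
s.title = 99                    # datastore_errors.BadValueError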

Properties with declarations are sometimes called static properties, while properties on an ndb.Expando without declarations are dynamic properties. These terms have a nice correspondence with the notions of static and dynamic typing in programming languages. Property declarations implement a sort of runtime validated static typing for model classes, on top of Python’s own dynamic typing.

As we’ll see, property declarations are even more powerful than static typing, because they can validate more than just the type of the value.

For both ndb.Model and ndb.Expando, object attributes whose names begin with an underscore (_) are always ignored. You can use these private attributes to attach transient data or functions to model objects.2
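For example, a model object can carry a transient cached value in an underscore-prefixed attribute (the _rank_cache name here is arbitrary), and the stored entity is unaffected:

class Player(ndb.Model):
    score = ndb.IntegerProperty()

p = Player(score=280)
p._rank_cache = 42  # transient; not a declared property, never stored
p.put()             # the stored entity has only a score property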

Beyond the model definition, ndb.Model and ndb.Expando have the same interface for saving, fetching, and deleting entities, and for performing queries and transactions. ndb.Expando is a subclass of ndb.Model.

Property Declarations

You declare a property for a model by assigning a property declaration object to an attribute of the model class. The name of the attribute is the name of the datastore property. The value is an object that describes the terms of the declaration. The ndb.StringProperty object assigned to the title class attribute says that an instance of the class, and therefore the entity it represents, can only have a string value for its title property. The required=True argument to the ndb.StringProperty constructor says that the object is not valid unless it has a value for the title property.

This can look a little confusing if you’re expecting the class attribute to shine through as an attribute of an instance of the class, as it normally does in Python. Instead, the ndb.Model class hooks into the attribute assignment mechanism so it can use the property declaration to validate a value assigned to an attribute of the object. In Python terms, the model uses property descriptors to enhance the behavior of attribute assignment.

Property declarations act as intermediaries between the application and the datastore. They can ensure that only values that meet certain criteria are assigned to properties. They can assign default values when constructing an object. They can even convert values between a data type used by the application and one of the datastore’s native value types, or otherwise customize how values are stored.

Property Value Types

ndb.StringProperty is an example of a property declaration class. There are several property declaration classes included with the Python SDK, one for each native datastore type. Each one ensures that the property can only be assigned a value of the corresponding type:

class Book(ndb.Model):
    title = ndb.StringProperty()

b = Book()
b.title = 99                     # datastore_errors.BadValueError, title must be a string
b.title = 'The Grapes of Wrath'  # OK

Table 9-1 lists the datastore native value types and their corresponding property declaration classes.

Data type                                            Python type                                     Property class
Unicode text string (up to 500 characters, indexed)  unicode or str (converted to unicode as ASCII)  ndb.StringProperty
Long Unicode text string (not indexed)               unicode or str (converted to unicode as ASCII)  ndb.TextProperty
Byte string (up to 500 bytes if indexed)             str                                             ndb.BlobProperty
Boolean                                              bool                                            ndb.BooleanProperty
Integer (64-bit)                                     long or int (converted to 64-bit long)          ndb.IntegerProperty
Float (double precision)                             float                                           ndb.FloatProperty
Date-time                                            datetime.datetime                               ndb.DateTimeProperty
Date-time                                            datetime.date                                   ndb.DateProperty
Date-time                                            datetime.time                                   ndb.TimeProperty
Entity key                                           ndb.Key                                         ndb.KeyProperty
A Google account                                     users.User                                      ndb.UserProperty

Table 9-1. Datastore property value types and the corresponding property declaration classes

The ndb library includes several additional property declaration classes that perform special functions. We’ll look at these later in this chapter.

There is one special property declaration class worth mentioning now: ndb.GenericProperty. This declaration accepts any type of value that can be stored directly to the datastore. (Refer back to Table 6-1.) This is useful for declaring properties whose values can be any of multiple types. (This class also comes in handy when forming query expressions based on undeclared properties and ndb.Expando, as we saw in Chapter 7.)
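For example, a hypothetical LogEntry kind could accept several core value types for the same property:

class LogEntry(ndb.Model):
    value = ndb.GenericProperty()

e1 = LogEntry(value=99)           # OK, an integer
e2 = LogEntry(value=u'a string')  # OK, a string
e3 = LogEntry(value=True)         # OK, a Boolean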

Property Validation

You can customize the behavior of a property declaration by passing arguments to the declaration’s constructor. We’ve already seen one example: the required argument.

All property declaration classes support the required argument. If True, the property must have a value other than None before the entity can be saved. If you attempt to store an entity without setting a required property, the attempt raises datastore_errors.BadValueError:

class Book(ndb.Model):
    title = ndb.StringProperty(required=True)

b = Book()
b.put()  # datastore_errors.BadValueError, title is required

b = Book(title='The Grapes of Wrath')
b.put()  # OK

b = Book()
b.title = 'The Grapes of Wrath'
b.put()  # OK

The datastore makes a distinction between a property that is not set and a property that is set to the null value (None). Property declarations do not make this distinction, because all declared properties must be set (possibly to None). Unless you say otherwise, the default value for declared properties is None, so the required validator treats the None value as an unspecified property value.

You can change the default value with the default argument. When you create an object without a value for a property that has a default value, the constructor assigns the default value to the property.

A property that is required and has a default value uses the default if constructed without an explicit value. The value can never be None. For example:

class Book(ndb.Model):
    rating = ndb.IntegerProperty(default=1)

b = Book()          # b.rating == 1
b = Book(rating=5)  # b.rating == 5

You can declare that a property should contain only one of a fixed set of values by providing a list of possible values as the choices argument. If None is not one of the choices, this acts as a more restrictive form of required: the property must be set to one of the valid choices before the entity can be saved. For example:

_KEYS = ['C', 'C min', 'C 7',
         'C#', 'C# min', 'C# 7',
         # ...
         ]

class Song(ndb.Model):
    song_key = ndb.StringProperty(choices=_KEYS)

s = Song()
s.song_key = 'H min'   # datastore_errors.BadValueError
s.song_key = 'C# min'  # OK

All of these features validate the value assigned to a property, and raise a datastore_errors.BadValueError if the value does not meet the appropriate conditions. For even greater control over value validation, you can define your own validation function and assign it to a property declaration as the validator argument. The function must take the property declaration object and the value as arguments, and either raise an exception if the value should not be allowed, or return the value to use:

from google.appengine.api import datastore_errors

def is_recent_year(prop, val):
    if val < 1923:
        raise datastore_errors.BadValueError
    return val

class Book(ndb.Model):
    copyright_year = ndb.IntegerProperty(validator=is_recent_year)

b = Book(copyright_year=1922)  # datastore_errors.BadValueError
b = Book(copyright_year=1924)  # OK

A validator function can return a different value than what was assigned to the property to act as a filter. However, the value must survive repeated calls to the validator. This is because values get revalidated each time the object is marshaled to the datastore. For example, a validator that returns val.lower() is fine, because passing the result through the validator again does not produce a new value. A validator that returns '"' + val + '"' would add quote marks around the string every time the object is saved, which isn’t the desired effect.
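For example, a validator can normalize tag strings to lowercase; because lowercasing is idempotent, revalidation on every save returns the same value (the tag property here is illustrative):

def normalize_tag(prop, val):
    return val.lower()

class Book(ndb.Model):
    tag = ndb.StringProperty(validator=normalize_tag)

b = Book(tag=u'FICTION')
# b.tag == u'fiction'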

Nonindexed Properties

In Chapter 7, we mentioned that you can set properties of an entity in such a way that they are available on the entity, but are considered unset for the purposes of indexes. In ndb, you establish a property as nonindexed using a property declaration. If the property declaration is given an indexed argument of False, entities created with that model class will set that property as nonindexed:

class Book(ndb.Model):
    first_sentence = ndb.StringProperty(indexed=False)

b = Book()
b.first_sentence = "On the Internet, popularity is swift and fleeting."
b.put()

# Count the number of Book entities with
# an indexed first_sentence property...
c = Book.query().order(Book.first_sentence).count(1000)
# c = 0

Some property declarations are nonindexed by default, namely ndb.TextProperty and ndb.BlobProperty. In the case of ndb.BlobProperty, you can set indexed=True, as long as the value is no more than 500 bytes in length. (For a short indexed text string, just use ndb.StringProperty.)
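For example, a short byte string can be made queryable by overriding the default (the cover_hash property is illustrative, and its values must stay within the 500-byte limit):

class Book(ndb.Model):
    cover_hash = ndb.BlobProperty(indexed=True)  # values must be <= 500 bytes

b = Book(cover_hash='\x12\x34\x56\x78')
b.put()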

When using undeclared properties with an ndb.Expando model, all properties are considered indexed by default, unless the assigned value is a unicode longer than 500 characters or a str longer than 500 bytes (which cannot be indexed). As with ndb.Model, you can use property declarations in the class definition to override this default for specific properties. To change this default for all properties assigned to an ndb.Expando instance, set the _default_indexed = False class property:

class NonindexedEntity(ndb.Expando):
    _default_indexed = False

entity = NonindexedEntity()
entity.foo = 'bar'  # The foo property is stored as a nonindexed value.
entity.put()

Automatic Values

Several property declaration classes include features for setting values automatically.

The ndb.DateProperty, ndb.DateTimeProperty, and ndb.TimeProperty classes can populate the value automatically with the current date and time. To enable this behavior, you provide the auto_now or auto_now_add arguments to the property declaration.

If you set auto_now=True, the declaration class overwrites the property value with the current date and time when you save the object. This is useful when you want to keep track of the last time an object was saved:

class Book(ndb.Model):
    last_updated = ndb.DateTimeProperty(auto_now=True)

b = Book()
b.put()  # last_updated is set to the current time

# ...

b.put()  # last_updated is set to the current time again

If you set auto_now_add=True, the property is set to the current time only when the object is saved for the first time. Subsequent saves do not overwrite the value:

class Book(ndb.Model):
    create_time = ndb.DateTimeProperty(auto_now_add=True)

b = Book()
b.put()  # create_time is set to the current time

# ...

b.put()  # create_time stays the same

The ndb.UserProperty declaration class also includes an automatic value feature. If you provide the argument auto_current_user=True, the value is set to the user accessing the current request handler if the user is signed in. If you provide auto_current_user_add=True, the value is only set to the current user when the entity is saved for the first time, and left untouched thereafter. If the current user is not signed in, the value is set to None:

class BookReview(ndb.Model):
    created_by_user = ndb.UserProperty(auto_current_user_add=True)
    last_edited_by_user = ndb.UserProperty(auto_current_user=True)

br = BookReview()
br.put()  # created_by_user and last_edited_by_user set

# ...

br.put()  # last_edited_by_user set again

At first glance, it might seem reasonable to set a default for an ndb.UserProperty this way:

from google.appengine.api import users

class BookReview(ndb.Model):
    created_by_user = ndb.UserProperty(
        default=users.get_current_user())
    # WRONG

This would set the default value to be the user who is signed in when the class is imported. Subsequent requests handled by the instance of the application will use a previous user instead of the current user as the default.

To guard against this mistake, ndb.UserProperty does not accept the default argument. You can use only auto_current_user or auto_current_user_add to set an automatic value.

Repeated Properties

The ndb library includes support for multivalued properties, which it just calls “repeated” properties. You declare a multivalued property by specifying a property declaration with the repeated=True argument. Its value is a Python list, possibly empty, containing values of the corresponding type:

class Book(ndb.Model):
    tags = ndb.StringProperty(repeated=True)

b = Book()
b.tags = ['python', 'app engine', 'data']

The datastore does not distinguish between a multivalued property with no elements and no property at all. As such, an undeclared property on an ndb.Expando object can’t store the empty list. If it did, when the entity is loaded back into an object, the property simply wouldn’t be there, potentially confusing code that’s expecting to find an empty list. To avoid confusion, ndb.Expando disallows assigning an empty list to an undeclared property.

When you declare a repeated property, the declaration takes care of translating between the absence of a property and the empty list representation in your code. A repeated property declaration makes it possible to keep an empty list value on a multivalued property. The declaration interprets the state of an entity that doesn’t have the declared property as the property being set to the empty list, and maintains that distinction on the object. This also means that you cannot assign None to a declared list property—but this isn’t of the expected type for the property anyway. (None can be one of the values in the list if it is allowed by the type of the declaration.)

The datastore does distinguish between a property with a single value and a multivalued property with a single value. An undeclared property on an ndb.Expando object can store a list with one element, and represent it as a list value the next time the entity is loaded.

A repeated property declaration cannot have a default value or required=True. If the declaration specifies a validator function, this function will be called once for each value in the list. This means the validator cannot act on the length of the list, only the individual values in the list.

TIP

To declare a repeated property whose values can be of disparate types, use ndb.GenericProperty(repeated=True).
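For example, a grab-bag property on a hypothetical GameItem kind:

class GameItem(ndb.Model):
    misc = ndb.GenericProperty(repeated=True)

item = GameItem()
item.misc = [42, u'a string', True]  # values of disparate core types
item.put()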

Serialized Properties

The simple property types we’ve seen so far let you store basic values and index them for the purpose of queries. When you have a more sophisticated data value, such as a dictionary, and you don’t need to index it, one way to store it on a property is to serialize it, then store the serialized blob as an ndb.BlobProperty. This has the disadvantage that your application code must manually serialize and deserialize the value when it is used and updated.

ndb has two property declaration classes that help with this common case: ndb.JsonProperty and ndb.PickleProperty. The ndb.JsonProperty class takes a Python data object that can be represented as a JSON data record, and converts it to and from the JSON format automatically. Similarly, ndb.PickleProperty does the same thing, but uses Python’s pickle library to perform the serialization and deserialization. In both cases, the value is stored in the datastore as an unindexed blob value. To your application code, the value appears in its original form:

class Player(ndb.Model):
    # ...
    theme = ndb.JsonProperty()

p = Player()
p.theme = {
    'background-color': '#000033',
    'color': 'white',
    'spirit_animal': 'phoenix'
}
p.put()

if p.theme['spirit_animal'] == 'cougar':
    # ...
    pass

p.theme['color'] = 'yellow'
p.put()

Both property declaration classes accept an optional compressed argument. If True, the value will be compressed as well as serialized when stored, and decompressed and deserialized when loaded. (ndb.BlobProperty also accepts the compressed argument.)

Remember that ndb.JsonProperty takes a data object, not a JSON string, as its value. If you need the JSON string form of the value, or you need to parse a JSON string into a data object, use the json module in the Python standard library.
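For example, to produce or consume the serialized form yourself:

import json

theme_json = json.dumps(p.theme)  # the JSON string form of the value
p.theme = json.loads(theme_json)  # back to a Python data object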

TIP

Use ndb.JsonProperty when the object can be represented as JSON and you might need to access the serialized value from non-Python code. ndb.PickleProperty can accept a wider range of values, but uses a serialization format that is exclusive to Python.

Structured Properties

Using serialized property classes for storing data structures is easy enough, but the values cannot participate in queries (at least not meaningfully). When you need to store structured data across multiple properties that can participate in queries, ndb has yet another powerful feature: structured properties.

A structured property has a value that is an instance of an ndb.Model class. This value can have attributes, and these attributes are modeled using the same property declarations that you use to model an entity. When ndb stores the value, it converts the value into multiple properties, one for each inner property of the value. For example:

class NotificationPrefs(ndb.Model):
    news = ndb.BooleanProperty(default=False)
    messages = ndb.BooleanProperty(default=False)
    raids = ndb.BooleanProperty(default=False)
    last_updated = ndb.DateTimeProperty(auto_now=True, indexed=False)

class Player(ndb.Model):
    notifications = ndb.StructuredProperty(NotificationPrefs)

p = Player()
p.notifications = NotificationPrefs()
p.notifications.news = True
p.put()

In the preceding example, the Player model uses a structured property based on the NotificationPrefs class. To set this property, we construct a NotificationPrefs instance, and can access and set values on the instance as with any other instance variable (such as p.notifications.news).

When the Player object is stored, ndb creates one entity of kind Player with four properties: notifications.news, notifications.messages, notifications.raids (all Boolean values), and notifications.last_updated (a datetime value). Because the Boolean values are described as indexable by the NotificationPrefs model, they are indexed by the datastore. notifications.last_updated has indexed=False and so is not indexed.

The indexed properties of the inner model can participate in queries, like so:

q = Player.query().filter(Player.notifications.news == True)

for player in q:
    # Send news notification...
    pass

Structured properties can be repeated (repeated=True). ndb simply repeats the inner properties as needed when storing the entity to the datastore. It stores enough information so that it can reassemble the inner structured values correctly when reading the entity. This poses a modest restriction: when a structured property’s model has its own structured property, only one of the properties can be repeated.

If you like using model classes for declaring typed structures but do not need to index the inner properties, ndb also provides an ndb.LocalStructuredProperty property declaration class. Instead of storing inner properties as separate indexable properties of the entity, it serializes the structure and stores it as a blob value, much like ndb.JsonProperty or ndb.PickleProperty.
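For example, swapping the declaration in the Player model above makes the preferences opaque to queries while keeping the same application-side interface (a sketch):

class Player(ndb.Model):
    # Stored as a single serialized value; notifications.news and the
    # other inner properties can no longer be used in query filters.
    notifications = ndb.LocalStructuredProperty(NotificationPrefs)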

Computed Properties

It is often useful to have more than one view of an entity’s property data, such as a normalized version of a value, or a summary of data on the entity. Within your application code, you might implement this calculation as a method or a property accessor on the model class:

class Player(ndb.Model):
    level = ndb.IntegerProperty()
    score = ndb.IntegerProperty()

    @property
    def score_per_level(self):
        if self.level == 0:
            return 0
        return float(self.score) / float(self.level)

p = Player(level=4, score=280)
# p.score_per_level == 70.0

Implemented this way, this calculated value is only available to your application code. It is not stored in the datastore, and therefore cannot be used in datastore queries.

With ndb, you can declare a computed property that calls a method to calculate the value of the property whenever the entity is stored. The ndb.ComputedProperty property declaration class takes a function as its argument. This function is passed the model instance as its only argument, and returns a value of one of the base datastore value types:

class Player(ndb.Model):
    level = ndb.IntegerProperty()
    score = ndb.IntegerProperty()
    score_per_level = ndb.ComputedProperty(lambda self: self._score_per_level())

    def _score_per_level(self):
        if self.level == 0:
            return 0
        return float(self.score) / float(self.level)

p = Player(level=4, score=280)

# Store the Player entity with level=4, score=280, score_per_level=70.
p.put()

The computed property value is calculated fresh every time it is accessed, as well as when the entity is stored.

Models and Schema Migration

Property declarations prevent the application from creating an invalid data object, or assigning an invalid value to a property. If the application always uses the same model classes to create and manipulate entities, then all entities in the datastore will be consistent with the rules you establish using property declarations.

In real life, it is possible for an entity that does not fit a model to exist in the datastore. When you change a model class—and you will change model classes in the lifetime of your application—you are making a change to your application code, not the datastore. Entities created from a previous version of a model stay the way they are.

If an existing entity does not comply with the validity requirements of a model class, you’ll get a datastore_errors.BadValueError when you try to fetch the entity from the datastore. Fetching an entity gets the entity’s data, then calls the model class constructor with its values. This executes each property’s validation routines on the data.

Some model changes are “backward compatible” such that old entities can be loaded into the new model class and be considered valid. Whether it is sufficient to make a backward-compatible change without updating existing entities depends on your application. Changing the type of a property declaration or adding a required property are almost always incompatible changes. Adding an optional property will not cause a datastore_errors.BadValueError when an old entity is loaded, but if you have indexes on the new property, old entities will not appear in those indexes (and therefore won’t be results for those queries) until the entities are loaded and then saved with the new property’s default value.

The most straightforward way to migrate old entities to new schemas is to write a script that queries all of the entities and applies the changes. We’ll discuss how to implement this kind of batch operation in a scalable way using task queues, in “Task Chaining”.
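Here is a minimal sketch of such a migration pass over Book entities; the batch size is arbitrary, and the set_new_defaults() helper is hypothetical:

def migrate_all_books():
    cursor = None
    more = True
    while more:
        books, cursor, more = Book.query().fetch_page(
            100, start_cursor=cursor)
        for book in books:
            set_new_defaults(book)  # hypothetical: apply the new schema
        ndb.put_multi(books)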

Modeling Relationships

You can model relationships between entities by storing entity keys as property values. In ndb, you use ndb.KeyProperty to declare that a property contains a datastore key value. You can optionally provide the kind argument to validate that all keys assigned to the property have the same kind. As with other property types, you can use the repeated=True argument to store multiple keys:3

class Author(ndb.Model):
    surname = ndb.StringProperty(required=True)
    # ...

class Book(ndb.Model):
    author_keys = ndb.KeyProperty(kind=Author, repeated=True)

a1 = Author(surname='Aniston')
a2 = Author(surname='Boggs')
a3 = Author(surname='Chavez')
ndb.put_multi([a1, a2, a3])

b = Book()
b.author_keys = [a1.key, a2.key, a3.key]
b.put()

The kind argument to the ndb.KeyProperty constructor takes either a model class or the kind name as a string. (This is generally true in several places in ndb that take a model class. Using a class instead of a string gives some added protection against typing errors.) You can use a string if you want an entity to store keys for its own kind.
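For example, a hypothetical Category kind whose entities refer to other Category entities must name its own kind as a string, because the class is not yet fully defined at that point in its own body:

class Category(ndb.Model):
    parent_category = ndb.KeyProperty(kind='Category')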

A datastore key is just a value. Storing one does not require that an entity exists with that key. Because an entity’s key does not change, a key reference to an existing entity will remain valid until the entity is deleted.

Key property values are queryable and orderable like any other value. In the data created in the preceding example, you can find every Book written by an Author with a query on entities of the kind Book and a filter on the author_keys property.
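With a repeated key property, an equality filter matches any entity whose list of values contains the given key. Continuing the preceding example:

books_by_boggs = Book.query(Book.author_keys == a2.key).fetch()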

Storing a list of keys is one way to model a many-to-many relationship, like the relationship between books and authors. Another method is to store a separate entity to represent the relationship itself, with the keys of both parties. This method can be a bit more cumbersome, especially if relationships need to be updated in transactions. But it avoids having to extend existing entities when introducing new relationships. It also makes it easier to form queries about the relationships themselves.
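Here is a sketch of this second method for the books-and-authors data above; the BookAuthor kind and its role property are illustrative:

class BookAuthor(ndb.Model):
    book_key = ndb.KeyProperty(kind=Book)
    author_key = ndb.KeyProperty(kind=Author)
    role = ndb.StringProperty()  # e.g., 'author' or 'editor'

rel = BookAuthor(book_key=b.key, author_key=a1.key, role='author')
rel.put()

# All of a1's relationships, without touching Book or Author entities:
rels = BookAuthor.query(BookAuthor.author_key == a1.key).fetch()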

Model Inheritance

In data modeling, it’s often useful to derive new kinds of objects from other kinds. The game world may contain many different kinds of carryable objects, with shared properties and features common to all objects you can carry. Because you implement classes from the data model as Python classes, you’d expect to be able to use inheritance in the implementation to represent inheritance in the model. And you can, sort of.

If you define a class based on either ndb.Model or ndb.Expando, you can create other classes that inherit from that data class, like so:

class Carryable(ndb.Model):
    weight = ndb.IntegerProperty()
    location = ndb.KeyProperty(kind=Location)

class Bottle(Carryable):
    contents = ndb.StringProperty()
    amount = ndb.IntegerProperty()
    is_closed = ndb.BooleanProperty()

The subclass inherits the property declarations of the parent class. A Bottle has five property declarations: weight, location, contents, amount, and is_closed.

Objects based on the child class will be stored as entities whose kind is the name of the child class. The datastore has no notion of inheritance, and so by default will not treat Bottle entities as if they are Carryable entities. This is mostly significant for queries, and we have a solution for that in the next section.

If a child class declares a property already declared by a parent class, the child class declaration overrides the parent class. Take care when doing this that your code is using the correct value type for the child class.

A model class can inherit from multiple classes, using Python’s own support for multiple inheritance:

class Pourable(ndb.Model):
    contents = ndb.StringProperty()
    amount = ndb.IntegerProperty()

class Bottle(Carryable, Pourable):
    is_closed = ndb.BooleanProperty()

The rules for inheriting property declarations correspond with the rules for class member variables in Python. When a property declaration is accessed, parent classes are searched in the order they are specified, left to right. For example, if Pourable had a weight property declared as an ndb.FloatProperty and Carryable declared weight as an ndb.IntegerProperty, Bottle would use the weight declaration from the leftmost class mentioned in its list of parents:

class Carryable(ndb.Model):
    location = ndb.KeyProperty(kind=Location)
    weight = ndb.IntegerProperty()

class Pourable(ndb.Model):
    contents = ndb.StringProperty()
    amount = ndb.IntegerProperty()
    weight = ndb.FloatProperty()

class Bottle(Pourable, Carryable):
    # Pourable.weight (float) wins over Carryable.weight (int)
    # ...
    pass

b = Bottle()
b.weight = 3.4  # OK

class Bottle(Carryable, Pourable):
    # Carryable.weight (int) wins over Pourable.weight (float)
    # ...
    pass

b = Bottle()
b.weight = 3.4  # datastore_errors.BadValueError: Expected integer

Parents that themselves share a common ancestor class form a “diamond inheritance” pattern. This is supported, and resolved in the usual way:

class GameObject(ndb.Model):
    name = ndb.StringProperty()
    location = ndb.KeyProperty(kind='Location')

class Carryable(GameObject):
    weight = ndb.IntegerProperty()

class Pourable(GameObject):
    contents = ndb.StringProperty()
    amount = ndb.IntegerProperty()

class Bottle(Carryable, Pourable):
    is_closed = ndb.BooleanProperty()

Queries and PolyModels

The datastore knows nothing of our modeling classes and inheritance. Instances of the Bottle class are stored as entities of the kind 'Bottle', with no inherent knowledge of the parent classes. It’d be nice to be able to perform a query for Carryable entities and get back Bottle entities and others. That is, it’d be nice if a query could treat Bottle entities as if they were instances of the parent classes, as Python does in our application code. We want polymorphism in our queries.

For this, the data modeling API provides a special base class: ndb.PolyModel. Model classes using this base class support polymorphic queries. Consider the Bottle class defined previously. Let’s change the base class of GameObject to ndb.PolyModel, like so:

from google.appengine.ext.ndb import polymodel

class GameObject(polymodel.PolyModel):
    name = ndb.StringProperty()
    location = ndb.KeyProperty(kind='Location')

We can now perform queries for any kind in the hierarchy, and get the expected results:

location_key = ndb.Key('Location', 'babbling brook')

b = Bottle(location=location_key, weight=125)
b.put()

# ...

q = Carryable.query()
q = q.filter(GameObject.location == location_key)
q = q.filter(Carryable.weight > 100)

for obj in q:
    # obj is a carryable object that is at the babbling brook
    # and weighs more than 100 kilos.
    # ...
    pass

This query can return any Carryable, including Bottle entities. The query can use filters on any property of the specified class (such as weight from Carryable) or parent classes (such as location from GameObject).

Behind the scenes, polymodel.PolyModel does three clever things differently from its cousins:

§ Objects of the class GameObject or any of its child classes are all stored as entities of the kind 'GameObject'.

§ All such objects are given a property named class_ that represents the inheritance hierarchy starting from the root class. This is a multivalued property, where each value is the name of an ancestor class, in order.

§ Queries for objects of any kind in the hierarchy are translated by the polymodel.PolyModel class into queries for the base class, with additional equality filters that compare the class being queried to the class property’s values.

In short, polymodel.PolyModel stores information about the inheritance hierarchy on the entities, then uses it for queries to support polymorphism.
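To make this concrete, the polymorphic query above behaves roughly like a query on the root kind with an equality filter on the class property; class_ is how polymodel.PolyModel exposes that property in Python:

# Roughly what Carryable.query() does under the hood:
q = GameObject.query(GameObject.class_ == 'Carryable')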

Each model class that inherits directly from polymodel.PolyModel is the root of a class hierarchy. All objects from the hierarchy are stored as entities whose kind is the name of the root class. As such, your data will be easier to maintain if you use many root classes to form many class hierarchies, as opposed to putting all classes in a single hierarchy. That way, the datastore viewer and bulk loading tools can still use the datastore’s built-in notion of entity kinds to distinguish between kinds of objects.

Creating Your Own Property Classes

The property declaration classes serve several functions in your data model:

Value validation

The model calls the class when a value is assigned to the property, and the class can raise an exception if the value does not meet its conditions.

Type conversion

The model calls the class to convert from the value type used by the app (the user value) to one of the core datastore types for storage (the base value), and back again.

Automatic values

The model calls the class to store the final value on the entity, giving it an opportunity to calculate the final value to be stored.

Every property declaration class inherits from the ndb.Property base class. This class implements features common to all property declarations, including support for the common constructor arguments (such as required and indexed). Declaration classes override methods and members to specialize the validation and type conversion routines.

When making your own property declaration classes, it’s easiest to inherit from one of the built-in classes, such as ndb.StringProperty, instead of inheriting from ndb.Property directly. Choose the class whose type is most like the base value you wish to store in the datastore.

Validating Property Values

Here is a simple property declaration class. It accepts any unicode value, and stores it as a datastore short string (the default behavior for Python unicode values). Because it stores a short string, it inherits from ndb.StringProperty:

class PlayerNameProperty(ndb.StringProperty):
    def _validate(self, value):
        if not isinstance(value, unicode):
            raise datastore_errors.BadValueError(
                'Expected unicode, got {}'.format(value))
        return value

And here is how you would use the new property declaration:

class Player(ndb.Model):
    player_name = PlayerNameProperty()

p = Player()
p.player_name = u'Ned Nederlander'
p.player_name = 12345  # BadValueError: int is not unicode
p.player_name = 'Ned'  # BadValueError: str is not unicode

The PlayerNameProperty class overrides the default _validate() method, which takes the value to validate as its argument and either returns the value or raises an exception. Just like a custom validator function set by the declaration, the default validator can return a different value to act as a filter. The property class has other opportunities to perform value conversions, so this is best left to just perform validation.

Importantly, _validate() does not call the parent class’s base method (super), and it must not call it even if the parent class is another property class with its own validator. The property API is “stackable.” The ndb.Property base class invokes the validators for all classes in the inheritance chain automatically, starting with the closest class and working its way up to the base class. Each validator receives the result of the previous validator. This makes it easy to define new properties in terms of existing properties without worrying about the details of how the base property is implemented.

The _validate() method is not called if the property value is None. Any property that is not declared as required accepts a None value, and this value is not validated. As we’ll see later, the property class can substitute a different value for None in a different method.

So far, this example doesn’t do much beyond ndb.StringProperty other than require a unicode as the user value. This by itself can be useful to give the property type a class for future expansion. Let’s add a requirement that player names be between 6 and 30 characters in length:

class PlayerNameProperty(ndb.StringProperty):
    def _validate(self, value):
        if not isinstance(value, unicode):
            raise datastore_errors.BadValueError(
                'Expected unicode, got {}'.format(value))
        if len(value) < 6 or len(value) > 30:
            raise datastore_errors.BadValueError(
                'Value must be between 6 and 30 characters.')
        return value

The new validation logic disallows strings with an inappropriate length:

p = Player()
p.player_name = u'Ned'              # BadValueError: length < 6
p.player_name = u'Ned Nederlander'  # OK

p = Player(player_name=u'Ned')      # BadValueError: length < 6

Marshaling Value Types

The datastore supports a fixed set of core value types for properties, listed in Table 6-1. A property declaration can support the use of other types of values in the attributes of model instances by marshaling between the desired type and one of the core datastore types. The value provided by and to the application code is called the user value, and the value sent to and received from the datastore is called the base value.

The _to_base_type() method takes a user value and returns a base value. The _from_base_type() method takes a base value and returns a user value. You can override these to customize their default behavior, which is to return the value unmodified.

Like the _validate() method, the property API stacks these methods. They must not call their parent methods themselves. When ndb needs to convert a user value to a base value, it calls the closest class’s _to_base_type() with the user value, then passes the result of that to the parent method, and so on up to the base class. When it needs to convert a base value to a user value, it goes in the opposite direction: first it calls the base class’s _from_base_type() method, then it passes the result to the next child class in line, and so on down to the bottommost child class.

Say we wanted to represent player name values within the application using a PlayerName value class instead of a simple string. Each player name has a surname and an optional first name. We can store this value as a single property, using the property declaration to convert between the user value (PlayerName) and an appropriate base value (such as unicode):

class PlayerName(object):
    def __init__(self, first_name, surname):
        self.first_name = first_name
        self.surname = surname

    def is_valid(self):
        return (isinstance(self.first_name, unicode)
                and isinstance(self.surname, unicode)
                and len(self.surname) >= 6)

class PlayerNameProperty(ndb.StringProperty):
    def _validate(self, value):
        if not isinstance(value, PlayerName):
            raise datastore_errors.BadValueError(
                'Expected PlayerName, got {}'.format(value))

        # Let the data class have a say in validity.
        if not value.is_valid():
            raise datastore_errors.BadValueError(
                'Must be a valid PlayerName')

        # Disallow the serialization delimiter in the fields.
        if value.surname.find('|') != -1 or value.first_name.find('|') != -1:
            raise datastore_errors.BadValueError(
                'PlayerName surname and first_name cannot contain a "|".')

        return value

    def _to_base_type(self, value):
        return '|'.join([value.surname, value.first_name])

    def _from_base_type(self, value):
        (surname, first_name) = value.split('|')
        return PlayerName(first_name=first_name, surname=surname)

And here’s how you’d use it:

p = Player()
p.player_name = PlayerName(u'Ned', u'Nederlander')

p.player_name = PlayerName(u'Ned', u'Neder|lander')
# BadValueError, surname contains serialization delimiter

p.player_name = PlayerName(u'Ned', u'Neder')
# BadValueError, PlayerName.is_valid() == False, surname too short

p.player_name = PlayerName('Ned', u'Nederlander')
# BadValueError, PlayerName.is_valid() == False, first_name is not unicode

Here, the application value type is a PlayerName instance, and the datastore value type is that value encoded as a Unicode string. The encoding format is the surname field, followed by a delimiter, followed by the first_name field. We disallow the delimiter character in the fields using the _validate() method. (Instead of disallowing it, we could also escape it in _to_base_type() and unescape it in _from_base_type().)

In this example, PlayerName(u'Ned', u'Nederlander') is stored as this Unicode string:

Nederlander|Ned

The datastore value puts the surname first so that the datastore will sort PlayerName values first by surname, then by first name. In general, you choose a serialization format that has the desired ordering characteristics for your custom property type. The core type you choose also impacts how your values are ordered when mixed with other types, though if you’re modeling consistently this isn’t usually an issue.

If the conversion from the application type to the datastore type may fail, put a check for the conversion failure in the _validate() method. This way, the error is caught when the bad value is assigned, instead of when the object is saved.

Accepting Arguments

As we’ve seen in several of the built-in property declaration classes, it is useful to allow the user to customize the behavior of a property declaration on a per-use basis by providing arguments to the constructor.

Let’s extend PlayerNameProperty with a require_first_name customization argument that defaults to False. When True, the _validate() method rejects PlayerName values that do not have a first_name:

from google.appengine.ext.ndb import utils

class PlayerNameProperty(ndb.StringProperty):
    _attributes = ndb.Property._attributes + ['require_first_name']

    @utils.positional(1 + ndb.Property._positional)
    def __init__(self, name=None, require_first_name=False, **kwds):
        super(PlayerNameProperty, self).__init__(name=name, **kwds)
        self._require_first_name = require_first_name

    def _validate(self, value):
        # ...
        if self._require_first_name and not value.first_name:
            raise datastore_errors.BadValueError(
                'PlayerName must have a first_name')
        # ...

We made three changes:

§ Set the _attributes class member variable (a list of strings) to ndb.Property._attributes extended with the new attribute

§ Overrode the __init__() initializer to accept a require_first_name argument, call the parent initializer, then set the _require_first_name instance variable (shadowing the class member variable)

§ Extended _validate() to test the instance variable and require that value.first_name be nonempty, if requested by the declaration

The _attributes class member variable is only used to generate a string representation of the property instance for debugging purposes. It doesn’t affect how you use the argument, but it’s a convenient way to extend internal error messages involving custom properties.

There is no magic in how we’re collecting and storing the argument. We’re overriding the initializer, adding a keyword argument, and storing it on the instance for later use. There is a small amount of crazy stuff here to comply with requirements internal to the Property class, specifically a name argument that must be the first positional argument to the initializer. The model class populates this with the property name (the class attribute) when setting up the declaration. Briefly:

§ @utils.positional(1 + ndb.Property._positional) tells Property’s internal argument handling that self and name are positional arguments. Our custom argument is a keyword-only argument.

§ def __init__(self, name=None, …, **kwds): preserves the self and name arguments as the first positional arguments, followed by our custom keyword arguments, then a dict to consume the remaining keyword arguments.

§ super(PlayerNameProperty, self).__init__(name=name, **kwds) calls the parent class’s initializer with the name in the first spot, followed by the keyword arguments we didn’t reserve for ourselves. In typical Python 2.7 fashion, PlayerNameProperty is the name of the property class we are defining.

You’d use this feature as follows:

class Player(ndb.Model):

player_name = PlayerNameProperty(require_first_name=True)

p = Player()

p.player_name = PlayerName(

first_name=u'Ned', surname=u'Nederlander')

# OK

p.player_name = PlayerName(u'', u'Madonna')

# BadValueError: first_name is empty, but required by the declaration

Implementing Automatic Values

As we saw earlier, the ndb.DateTimeProperty has a special feature where if the user sets auto_now=True, the value is automatically updated to the current system time when the entity is saved. It does this in a hook called _prepare_for_put(). You can define this method in your property class to achieve a similar effect.

Why not put this in _validate() or _to_base_type()? This might work, depending on what you’re trying to do. Keep in mind that these methods are not called if the value is None (or just isn’t set). In the case of auto_now, it is important that the automatic value be set even if the app hasn’t set a value for the property.

Continuing our PlayerNameProperty example, let’s add an auto_ned parameter. If True, always use the name “Ned Nederlander” for the value, regardless of whether or how it was set by the application:

class PlayerNameProperty(ndb.StringProperty):
    _attributes = ndb.Property._attributes + ['require_first_name', 'auto_ned']

    def __init__(self, name=None,
                 require_first_name=False,
                 auto_ned=False,
                 **kwds):
        super(PlayerNameProperty, self).__init__(name=name, **kwds)
        self._require_first_name = require_first_name
        self._auto_ned = auto_ned

    # ...

    def _prepare_for_put(self, entity):
        if self._auto_ned:
            self._store_value(entity, PlayerName(u'Ned', u'Nederlander'))

ndb calls the _prepare_for_put() method of each declared property prior to storing the entity in the datastore. This is another stacking API, and must not call the parent class. The method takes the entire entity (not just the property) as its argument, and is allowed to examine the entity and update the property value.

To update the value, we call the property class’s own self._store_value(entity, value) method. The value in this case is a user value. ndb will pass this value through _to_base_type() as needed. Alternatively, you can pass a base value by wrapping it in ndb._BaseValue(val). This tells ndb that it doesn’t need converting. In this case, we create a PlayerName user value, and let _to_base_type() do its thing.

Here’s how you’d use this silly new option:

class Player(ndb.Model):
    player_name = PlayerNameProperty(auto_ned=True)

p = Player()
p.player_name = PlayerName(u'', u'Madonna')
p.put()  # p.player_name is now PlayerName(u'Ned', u'Nederlander')

That gets us behavior similar to ndb.DateTimeProperty’s auto_now. What if we want something like auto_now_add, which only sets the value if the value is not set? We can use self._has_value(entity) to test whether a value is present. Here is an implementation of an auto_ned_add feature for PlayerNameProperty:

class PlayerNameProperty(ndb.StringProperty):
    _attributes = ndb.Property._attributes + ['require_first_name',
                                              'auto_ned', 'auto_ned_add']

    def __init__(self, name=None,
                 require_first_name=False,
                 auto_ned=False,
                 auto_ned_add=False,
                 **kwds):
        super(PlayerNameProperty, self).__init__(name=name, **kwds)
        self._require_first_name = require_first_name
        self._auto_ned = auto_ned
        self._auto_ned_add = auto_ned_add

    # ...

    def _prepare_for_put(self, entity):
        if (self._auto_ned or
                (self._auto_ned_add and not self._has_value(entity))):
            self._store_value(entity, PlayerName(u'Ned', u'Nederlander'))

And here is how to use it:

class Player(ndb.Model):
    player_name = PlayerNameProperty(auto_ned_add=True)

p = Player()
p.put()  # p.player_name is PlayerName(u'Ned', u'Nederlander')

p = Player()
p.player_name = PlayerName(u'', u'Madonna')
p.put()  # p.player_name is PlayerName(u'', u'Madonna')

Automatic Batching

Calling the datastore service each time your application reads or writes an entity takes a significant amount of time. Especially when making synchronous calls, where your app code waits for the call to succeed before proceeding, calling the datastore efficiently can make a big difference to your application’s performance. (See Chapter 17 for more information about calling services asynchronously.) The ndb library knows how to batch calls to the datastore automatically, and does so invisibly to your app.

We’ve already seen how to initiate a batch call from the application code. Consider this example:

class Entity(ndb.Model):
    pass

e1 = Entity()
e1.put()

e2 = Entity()
e2.put()

e3 = Entity()
e3.put()

You could paraphrase this to use an explicit batch call to ndb.put_multi(), like so:

class Entity(ndb.Model):
    pass

e1 = Entity()
e2 = Entity()
e3 = Entity()
ndb.put_multi([e1, e2, e3])

The batch call makes one remote procedure call to store all three entities, which is more efficient than making a remote procedure call for each entity.

Brilliantly, ndb knows better than to make three RPCs in the first example. ndb keeps track of how your application code manipulates datastore data and will optimize calls to the datastore as batch calls automatically whenever it can. It even knows how to do this with more complex calling patterns, such as code that interleaves gets and puts of overlapping data. The library preserves all transactional guarantees, and is generally more careful with calls that participate in a transaction.

In other words, thanks to ndb, the first example is generally equivalent to the second example with regards to how the datastore is called. Your code can simply call ndb when it is convenient and obvious to do so, and ndb will make it as fast as possible.

Automatic Caching

Another major technique for optimizing datastore calls is to use a cache: keep the data in memory, and prefer to read from memory than from the datastore. ndb uses two separate strategies for caching datastore data: an in-context cache and the distributed memcache.

The in-context cache is a temporary cache that lives in RAM, and lasts as long as the request handler is running, from request to response. Each request handler gets its own in-context cache. (It does not persist on the instance, and is not shared between handlers.) Like automatic batching, the in-context cache is something you might build yourself to save on calls to the datastore if ndb weren’t doing it for you. For example:

def add_props(id):
    entity = ndb.Key('Entity', id).get()
    return entity.prop1 + entity.prop2

def mult_props(id):
    entity = ndb.Key('Entity', id).get()
    return entity.prop1 * entity.prop2

sum = add_props('foo')
product = mult_props('foo')

Here we have two call paths that want to operate on the same entity. These call paths don’t know anything about each other, and probably shouldn’t: they’re easily understood as separate functions that, internally, access the datastore and perform some calculation. Without ndb’s help, calling each path would result in two calls to the datastore to read the same data twice. You could try to fix this yourself with a little in-memory cache of your own:

# (This is unnecessary with ndb.)
CACHE = {}

def get_entity(id):
    if id not in CACHE:
        CACHE[id] = ndb.Key('Entity', id).get()
    return CACHE[id]

def add_props(id):
    entity = get_entity(id)
    return entity.prop1 + entity.prop2

def mult_props(id):
    entity = get_entity(id)
    return entity.prop1 * entity.prop2

sum = add_props('foo')
product = mult_props('foo')

If you trace through these calls, you’ll see that ndb.Key('Entity', 'foo').get() is only called once, and the second access uses CACHE. This puts undue burden on your code to perform a basic bookkeeping task.

By default, ndb will handle the first example like the second example, using its in-context cache for the second call to get() with the same key. This is usually what you want, at least for short-lived request handlers. It assumes that the request handler prefers to see the same version of an entity when the entity is requested twice, even if another process has updated the entity between the calls. Once again, ndb is smart enough to do the right thing when updating data or performing transactions. It’s a free performance benefit when the benefit makes sense, and you don’t have to contort your code in weird ways to get it.

In addition to the in-context cache, ndb uses App Engine’s distributed memcache service for further performance benefits across multiple request handlers. We’ll discuss this service in more detail in Chapter 12. For common cases where you would normally store a datastore entity in memcache by its key to avoid a call to the datastore, it’s best to rely on ndb’s built-in automatic support for doing just that. ndb has robust logic for determining when to read from memcache and when to invalidate the cached data and go straight to the datastore. And again, ndb is careful when it comes to transactions.

Setting the Cache Policy for a Kind

You don’t always want to use a cache. Especially with memcache, your code may need more direct control over when the app goes to the datastore for fresh data versus reading from the cache, which may have stale data. You can tell ndb not to use the in-context cache, memcache, or both under certain circumstances.

Most often, you’ll want to set this cache policy based on the kind of the data. The easiest way to do this is to set special class variables on the ndb.Model subclass for the kind. For example:

class Entity(ndb.Model):
    _use_cache = False
    _use_memcache = False

e1 = ndb.Key('Entity', 'foo').get()  # calls the datastore

# ...

e2 = ndb.Key('Entity', 'foo').get()  # calls the datastore

If cls._use_cache is False, ndb never uses the in-context cache for entities of this kind. If cls._use_memcache is False, ndb never uses memcache automatically for entities of this kind. As in this example, if both are False, ndb goes straight to the datastore when asked to do so.

Putting memcache in front of the datastore means that when a datastore entity changes, there is a period of time where memcache may have old data, and therefore request handlers reading that data from memcache will not see the change. When ndb stores an entity in memcache, it sets an upper bound for this period of time, sometimes called a timeout, an expiration time, or the time-to-live (TTL). After that period has elapsed (or possibly sooner), memcache evicts the value. The next process to use ndb to read the entity will call the datastore, and update memcache with the latest data.

To adjust the memcache timeout for entities of a kind, set the cls._memcache_timeout class member, as a number of seconds:

class Entity(ndb.Model):
    _memcache_timeout = 20 * 60  # 20 minutes

ndb does attempt to invalidate memcache values when the corresponding datastore entity is updated or deleted. However, this is not guaranteed. The memcache does not participate in datastore transactions, and so the datastore update may succeed while the memcache update may fail. The timeout ensures that memcache will eventually forget its stale data.

TIP

Even short cache timeouts can be useful. If an entity is read once per second, a timeout of two minutes replaces 99.16% of datastore calls with memcache calls. The data that is read will be at most two minutes old.

In addition to disabling the in-context cache or memcache (or both) for a kind, you can also disable the datastore. Why would you do that? With one or both caches enabled and the datastore disabled, ndb becomes a great way to cache structured data. Caches are not persistent storage, but they are useful to avoid repeating calculations or expensive network operations, just as they are useful to avoid unnecessary trips to the datastore. To disable datastore storage for a kind, use the cls._use_datastore class member.
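For example, a sketch of a cache-only kind (the CachedSummary name and payload property are assumptions):

class CachedSummary(ndb.Model):
    _use_datastore = False        # never written to the datastore
    _memcache_timeout = 60 * 60   # keep cached data for up to an hour
    payload = ndb.JsonProperty()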

More Complex Cache Policies

You can set a cache policy based on more complex criteria than just the kind of an entity. To do so, you define a function that takes the ndb.Key of an entity being manipulated and returns True if caching is allowed for that entity, or False if not. You install this cache policy using a method on the context object, which you obtain from the function ndb.get_context():

def never_cache_test_data(key):
    id = key.string_id()
    return not id or not id.startswith('test_')

ctx = ndb.get_context()
ctx.set_memcache_policy(never_cache_test_data)

This example sets a global cache policy for memcache that says any entity (of any kind) that has a string ID that begins with test_ should never be stored in memcache. This policy applies to all uses of ndb by this request handler, and only applies to the use of memcache, not the in-context cache or the datastore.

As written, this example overrides the default global policy for memcache, which is to test the _use_memcache member variable of the kind’s model class. If a model class sets _use_memcache, it will be ignored. Your global cache policy function can fall back to this behavior by calling ndb.Context.default_memcache_policy():

def never_cache_test_data(key):
    id = key.string_id()
    if id and id.startswith('test_'):
        return False
    return ndb.Context.default_memcache_policy(key)

ctx = ndb.get_context()
ctx.set_memcache_policy(never_cache_test_data)

The equivalent functions for the in-context cache are ctx.set_cache_policy() to set the policy, and ndb.Context.default_cache_policy() to fall back on the default global policy. For the datastore, these are ctx.set_datastore_policy() and ndb.Context.default_datastore_policy(), respectively.

You can also set a global policy for the memcache timeout. The policy function takes a key and returns the timeout as a number of seconds. To register this policy, pass the function to ctx.set_memcache_timeout_policy().
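For example, here is a sketch of a timeout policy that caches hypothetical test entities only briefly; we assume that returning None falls back to the default timeout:

def test_data_timeout(key):
    id = key.string_id()
    if id and id.startswith('test_'):
        return 30  # cache test entities for only 30 seconds
    return None

ctx = ndb.get_context()
ctx.set_memcache_timeout_policy(test_data_timeout)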

Ignoring Caches per Call

Finally, there’s an easy way to tell ndb to ignore one of the caches on a per-call basis. Methods that call the datastore accept use_cache, use_memcache, use_datastore, and memcache_timeout keyword arguments. Set these to override the established cache policy for the purposes of the call:

entity = ndb.Key('Entity', 'foo').get(use_memcache=False)

TIP

Remember that the default cache policies use all three of the datastore, in-context cache, and memcache. If you don’t want one or more of these in some cases, you must establish an alternative cache policy using one of these techniques.

1 The ndb library is a successor to an older library called ext.db, which is still distributed with the App Engine SDK. While it carries on many of the ideas of ext.db, the ndb library is not backwards compatible. You can read a version of this chapter that covers ext.db for free on the website for this book.

2 It is possible to create an entity with a property whose name starts with an underscore. This convention only applies to object attributes in the modeling API.

3 If you’ve used the older ext.db library, you may be familiar with its “reference properties” feature. This feature was dropped for ndb to make key handling easier to understand. Some use cases for reference properties are better met by ndb’s structured properties feature.