The Memory Cache - Programming Google App Engine (2012)

Chapter 11. The Memory Cache

Durable data storage requires a storage medium that retains data through power loss and system restarts. Today's medium of choice is the hard drive, a storage device composed of circular platters coated with magnetic material on which data is encoded. The platters spin at a constant rate while a sensor moves along the radius, reading and writing bits on the platters as they travel past. Reading or writing a specific piece of data requires a disk seek: positioning the sensor at the proper radius, then waiting for the platter to rotate until the desired data is underneath. All things considered, hard drives are astonishingly fast, but for web applications, disk seeks can be costly. Fetching an entity from the datastore by key can take time on the order of tens of milliseconds.

Most high-performance web applications mitigate this cost with a memory cache. A memory cache uses a volatile storage medium, usually the RAM of the cache machines, for very fast read and write access to values. A distributed memory cache provides scalable, consistent temporary storage for distributed systems, so many processes on many machines can access the same data. Because memory is volatile—it gets erased during an outage—the cache is not useful for long-term storage, or even short-term primary storage for important data. But it’s excellent as a secondary system for fast access to data also kept elsewhere, such as the datastore. It’s also sufficient as global high-speed memory for some uses.

The App Engine distributed memory cache service, known as memcache in honor of the original memcached system that it resembles, stores key-value pairs. You can set a value with a key, and get the value given the key. A value can be up to a megabyte in size. A key can be up to 250 bytes; the API accepts larger keys, and uses a hash algorithm to convert them to 250 bytes.

The memcache does not support transactions like the datastore does, but it does provide several atomic operations. Setting a single value in the cache is atomic: the key either gets the new value or retains the old one (or remains unset). You can tell memcache to set a value only if it hasn’t changed since it was last fetched, a technique known as “compare and set” in the API. The App Engine memcache also includes the ability to increment and decrement numeric values as an atomic operation.

A common way to use the memcache with the datastore is to cache datastore entities by their keys. When you want to fetch an entity by key, you first check the memcache for a value with that key, and use it if found (known as a cache hit). If it’s not in the memcache (a cache miss), you fetch it from the datastore, then put it in the memcache so future attempts to access it will find it there. At the expense of a small amount of overhead during the first fetch, subsequent fetches become much faster.
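The steps of this pattern can be sketched in Python with a plain dict standing in for the memcache service and a hypothetical fetch_from_store() standing in for a datastore get by key (both names are illustrative, not App Engine APIs):

```python
# Cache-aside sketch: 'cache' stands in for memcache, and fetch_from_store()
# is a hypothetical stand-in for a datastore get by key.
cache = {}
store = {'greeting:1': 'Hello, memcache!'}
fetch_count = {'store': 0}

def fetch_from_store(key):
    # The "slow" fetch; count how often we pay for it.
    fetch_count['store'] += 1
    return store.get(key)

def get_entity(key):
    value = cache.get(key)            # 1. check the cache first
    if value is None:                 # 2. cache miss: go to the datastore
        value = fetch_from_store(key)
        if value is not None:
            cache[key] = value        # 3. populate the cache for next time
    return value

get_entity('greeting:1')   # miss: pays for one store fetch
get_entity('greeting:1')   # hit: served from the cache
```

The second call returns the cached copy without touching the store, which is the whole point of the pattern.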

If the entity changes in the datastore, you can attempt to update the memcache when the entity is updated in the datastore, so subsequent requests can continue to go to the cache but see fresh data. This mostly works, but it has two minor problems. For one, it is possible that the memcache update will fail even if the datastore update succeeds, leaving old data in the cache. Also, if two processes update the same datastore entity, then update the memcache, the datastore will have correct data (thanks to datastore transactions), but the memcache update will have the value of whichever update occurs last. Because of this possibility, it’s somewhat better to just delete the memcache key when the datastore changes, and let the next read attempt populate the cache with a current value. Naturally, the delete could also fail.

Because there is no way to update both the datastore and the memcache in a single transaction, there is no way to avoid the possibility that the cache may contain old data. To minimize the duration that the memcache will have a stale value, you can give the value an expiration time when you set it. When the expiration time elapses, the cache unsets the key, and a subsequent read results in a cache miss and triggers a fresh fetch from the datastore.

Of course, this caching pattern works for more than just datastore entities. You can use it for datastore queries, web service calls made with URL Fetch, expensive calculations, or any other data that can be regenerated by a slow operation, where the benefits of fast access outweigh the possibility of staleness.

This is so often the case with web applications that a best practice is to cache aggressively. Look through your application for opportunities to make this trade-off, and implement caching whenever the same value is needed an arbitrary number of times, especially if that number increases with traffic. Site content such as an article on a news website often falls into this category. Caching speeds up requests and saves CPU time.

The APIs for the memcache service are straightforward. Let’s look at each of the memcache features, in Python and Java.

Calling Memcache from Python

The Python API for the memcache service is provided by the google.appengine.api.memcache package. The API comes in two flavors: a set of simple functions (such as set() and get()), and a Client class whose methods are equivalent to the corresponding functions.

This API is intended to be compatible with the Python memcached library, which existing code or third-party libraries might use. When a feature of this library does not apply to App Engine, the method or argument for the feature is supported, but does nothing. We won’t discuss the compatibility aspects here, but see the official documentation for more information.

The Client class supports one feature that the simple functions do not: compare-and-set. This mechanism needs to store state between calls, and does so on the Client instance. This class is not threadsafe, so be sure to create a new Client instance for each request handler, and don’t store it in a global variable.

Here’s a simple example that fetches a web feed by using the URL Fetch service (via App Engine’s version of the urllib2 library), stores it in the memcache, and uses the cached value until it expires five minutes (300 seconds) later. The key is the feed URL, and the value is the raw data returned by URL Fetch:

import urllib2

from google.appengine.api import memcache

def get_feed(feed_url):
    feed_data = memcache.get(feed_url)
    if not feed_data:
        feed_data = urllib2.urlopen(feed_url).read()
        memcache.set(feed_url, feed_data, time=300)
    return feed_data

Calling Memcache from Java

App Engine supports two Java APIs to memcache. The first is a proprietary interface, which will be the subject of the Java portions of this chapter. App Engine also includes an implementation of JSR 107, known as JCache, an interface standard for memcache services. You can find the JCache interface in the package net.sf.jsr107cache.

The Java API to the memcache service is in the com.google.appengine.api.memcache package. As with the other service APIs, you get a service implementation by calling a static method on the MemcacheServiceFactory class, then interact with the service by calling methods on this instance.

To make synchronous calls to the memcache service, you use an implementation of the MemcacheService interface, which you get by calling the getMemcacheService() method of the factory:

import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

// ...

MemcacheService memcache = MemcacheServiceFactory.getMemcacheService();

To make asynchronous calls, you use an implementation of the AsyncMemcacheService interface, obtained by calling the getAsyncMemcacheService() method. For more information about asynchronous service calls, see Calling Services Asynchronously.

Memcache values can be partitioned into namespaces. To use a namespace, provide the namespace as a String argument to the factory method. All calls to the resulting service implementation will use the namespace.

Here’s a simple example that fetches a web feed by using the URL Fetch service, stores it in the memcache, and uses the cached value until it expires five minutes (300 seconds) later. The key is the feed URL, and the value is the raw data returned by URL Fetch:

import java.net.URL;

import com.google.appengine.api.memcache.Expiration;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;
import com.google.appengine.api.urlfetch.URLFetchService;
import com.google.appengine.api.urlfetch.URLFetchServiceFactory;

// ...

public byte[] getFeed(URL feedUrl) {
    MemcacheService memcache = MemcacheServiceFactory.getMemcacheService();
    byte[] feedData = (byte[]) memcache.get(feedUrl);
    if (feedData == null) {
        URLFetchService urlFetch = URLFetchServiceFactory.getURLFetchService();
        try {
            feedData = urlFetch.fetch(feedUrl).getContent();
            memcache.put(feedUrl, feedData, Expiration.byDeltaSeconds(300));
        } catch (Exception e) {
            return null;
        }
    }
    return feedData;
}

Keys and Values

The memcache service stores key-value pairs. To store a value, you provide both a key and a value. To get a value, you provide its key, and memcache returns the value.

Both the key and the value can be data of any type that can be serialized. In Python, the key and value are serialized using the pickle module in the standard library. In Java, the key and value can be of any class that implements the java.io.Serializable interface, which includes the (auto-boxed) primitive types.

The key can be of any size. App Engine converts the key data to 250 bytes by using a hash algorithm, which makes for a number of possible unique keys larger than a 1 followed by 600 zeroes. You generally don’t have to think about the size of the key.
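The service performs this hashing transparently, but the idea can be sketched client-side; the use of SHA-256 below is an assumption for illustration, not the algorithm App Engine actually uses:

```python
import hashlib

MAX_KEY_BYTES = 250

def normalize_key(key):
    # Keys at or under the limit pass through unchanged; longer keys are
    # replaced by a fixed-size digest (64 hex characters for SHA-256).
    if len(key) <= MAX_KEY_BYTES:
        return key
    return hashlib.sha256(key.encode('utf-8')).hexdigest()

short_key = normalize_key('feed:http://example.com/rss')  # unchanged
long_key = normalize_key('x' * 1000)                      # hashed to 64 chars
```

Because the digest is deterministic, the same over-long key always maps to the same stored key.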

The value can be up to 1 megabyte in its serialized form. In practice, this means that pretty much anything that can fit in a datastore entity can also be a memcache value.

Setting Values

The simplest way to store a value in memcache is to set it. If no value exists for the given key, setting the value will create a new value in memcache for the key. If there is already a value for the key, it will be replaced with the new value.

In Python, you call either the set() function, or the equivalent method on a Client instance. The method returns True on success. It only returns False if there was an issue reaching the memcache service. Since memcache may evict (delete) values at any time, it is typical to ignore this return value:

success = memcache.set(key, value)

# Or:
memcache_client = memcache.Client()
success = memcache_client.set(key, value)

if not success:
    # There was a problem accessing memcache...

In Java, you call the put() method of the service implementation. When called with just the key and value, this method sets the key-value pair, and the method has no return value:

memcache.put(key, value);

Setting Values that Expire

By default, a memcache value stays in the memcache until it is deleted by the app with a service call, or until it is evicted by the memcache service. The memcache service will evict a value if it runs out of space, or if a machine holding a value goes down or is turned down for maintenance.

When you set a value, you can specify an optional expiration time. If provided, the memcache service will make an effort to evict the value when the expiration time is reached. The timing may not be exact, but it’ll be close. Setting an expiration time encourages a cache-backed process to refresh its data periodically, without the app having to track the age of a cached value and forcibly delete it.

To set an expiration for a value in Python, you include a time argument to set(). Its value is either a number of seconds in the future relative to the current time up to one month (2,592,000 seconds), or it is an absolute date and time as a Unix epoch date:

success = memcache.set(key, value, time=300)

In Java, you set the expiration as an optional third argument to the put() method. This value is an instance of the Expiration class, which you construct by calling a static class method. To set an expiration time in the future relative to the current time, you call Expiration.byDeltaSeconds(int). You can also specify this time in milliseconds using Expiration.byDeltaMillis(int); this value will be rounded down to the nearest second. To set an expiration time as an absolute date and time, you call Expiration.onDate(java.util.Date):

memcache.put(key, value, Expiration.byDeltaSeconds(300));

A value’s expiration date is updated every time the value is updated. If you replace a value with an expiration date, the new value does not inherit the old date. There is no way to query a key for “time until expiration.”

Adding and Replacing Values

There are two subtle variations on setting a value: adding and replacing.

When you add a value with a given key, the value is created in memcache only if the key is not already set. If the key is set, adding the value will do nothing. This operation is atomic, so you can use the add operation to avoid a race condition between two request handlers doing related work.
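For instance, an add can act as a one-shot lock so that only one of several concurrent handlers starts a piece of work. The sketch below stands in a dict for the service and checks membership before storing; the real add performs this test-and-store atomically on the server:

```python
# A dict stands in for memcache in this single-process sketch.
cache = {}

def memcache_add(key, value):
    # Mimics the add operation: store only if the key is absent.
    # (The real service does this check-and-store atomically.)
    if key in cache:
        return False
    cache[key] = value
    return True

def try_start_job(job_id):
    # Only the first handler to add the lock key performs the work.
    if memcache_add('lock:' + job_id, 'locked'):
        return 'running job'
    return 'already running'

first = try_start_job('rebuild-index')    # acquires the lock
second = try_start_job('rebuild-index')   # lock already held
```

Pairing the lock key with an expiration time keeps a crashed handler from holding the lock forever.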

Similarly, when you replace a value with a given key, the value is updated in memcache only if the key is set. If the key is not set, the replace operation does nothing, and the key remains unset. Replacing a value is useful if the absence of the value is meaningful to another process, such as to inspire a refresh after an expiration date. Note that, as with replacing values with set, the replaced value will need its own expiration date if the previous value had one, and there is no way to preserve the previous expiration after a replacement.

In Python, you invoke these variants using separate functions: add() and replace(). As with set(), these functions have equivalent methods on the Client class. Both of these methods accept the time argument for setting an expiration date on the added or replaced value. The return value is True on success—and unlike set(), the add or replace may fail due to the existence or absence of the key, so this might be useful to know:

success = memcache.add(key, value)
if not success:
    # The key is already set, or there was a problem accessing memcache...

success = memcache.replace(key, value)
if not success:
    # The key is not set, or there was a problem accessing memcache...

In Java, the distinction between set, add, and replace is made using a fourth argument to put(). This argument is from the enum MemcacheService.SetPolicy, and is either SET_ALWAYS (set, the default), ADD_ONLY_IF_NOT_PRESENT (add), or REPLACE_ONLY_IF_PRESENT (replace). If you want to add or replace but do not want to set an expiration, you can set the third argument to null. The four-argument form of put() returns true on success, so you can test whether the add or replace failed, possibly due to the existence or absence of the key:

boolean success = memcache.put(key, value, null,
    MemcacheService.SetPolicy.ADD_ONLY_IF_NOT_PRESENT);

Getting Values

You can get a value out of the memcache by using its key.

In Python, you call the get() function (or method). If the key is not set, it returns None:

value = memcache.get(key)
if value is None:
    # The key was not set...

In Java, you call the get() method of the service implementation. Its return value is of type Object, so you’ll need to cast it back to its original type. If the key is not set, the method returns null:

String value = (String) memcache.get(key);
if (value == null) {
    // The key was not set...
}

Deleting Values

An app can force an eviction of a value by deleting its key. The deletion is immediate, and atomic.

In Python, you pass the key to the delete() function or method. This returns one of three values: memcache.DELETE_SUCCESSFUL if the key existed and was deleted successfully, memcache.DELETE_ITEM_MISSING if there was no value with the given key, or memcache.DELETE_NETWORK_FAILURE if the delete could not be completed due to a service failure. These constants are defined such that if you don't care about the distinction between a successful delete and a missing key, you can use the result as a conditional expression. (DELETE_NETWORK_FAILURE is 0.)

success = memcache.delete(key)
if not success:
    # There was a problem accessing memcache...

In Java, you call the delete() method with the key to delete. This method returns true if the key was deleted successfully or if it was already unset, or false if the service could not be reached:

boolean success = memcache.delete(key);

Locking a Deleted Key

When you delete a value, you can tell memcache to lock the key for a period of time. During this time, attempts to add the key will fail as if the key is set, while attempts to get the value will return nothing. This is sometimes useful to give mechanisms that rely on an add-only policy some breathing room, so an immediate reading of the key doesn’t cause confusion.

Only the add operation is affected by a delete lock. The set operation will always succeed, and will cancel the delete lock. The replace operation will fail during the lock period as long as the key is not set; it otherwise ignores the lock.
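These rules can be modeled with a toy cache that tracks a lock deadline per key (a single-process illustration of the semantics, not the service's implementation):

```python
import time

cache = {}          # key -> value
delete_locks = {}   # key -> lock expiry timestamp

def locked(key):
    return delete_locks.get(key, 0) > time.time()

def cache_delete(key, lock_seconds=0):
    cache.pop(key, None)
    if lock_seconds:
        delete_locks[key] = time.time() + lock_seconds

def cache_add(key, value):
    # add fails while the delete lock is active, or if the key is set.
    if locked(key) or key in cache:
        return False
    cache[key] = value
    return True

def cache_set(key, value):
    # set always succeeds, and cancels any delete lock.
    delete_locks.pop(key, None)
    cache[key] = value
    return True

cache_set('k', 'v1')
cache_delete('k', lock_seconds=60)
add_ok = cache_add('k', 'v2')   # False: blocked by the delete lock
set_ok = cache_set('k', 'v3')   # True: set cancels the lock
```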

To lock the key when deleting in Python, you specify the optional seconds argument. Its value is either a number of seconds in the future up to a month, or an absolute Unix epoch date-time. The default is 0, which says not to use a lock:

success = memcache.delete(key, seconds=20)

In Java, you lock the key with a second argument to delete(). Its value, a long, is a number of milliseconds in the future. (You can’t set an absolute date and time for a delete lock in Java.)

boolean success = memcache.delete(key, 20000);

Atomic Increment and Decrement

Memcache includes special support for incrementing and decrementing numeric values as atomic operations. This allows for multiple processes to contribute to a shared value in the cache without interfering with each other. With just the get and set operations we’ve seen so far, this would be difficult: incrementing a value would involve reading then setting the value with separate operations, and two concurrent processes might interleave these operations and produce an incorrect result. The atomic increment operation does not have this problem.
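The lost-update hazard is easy to reproduce by interleaving two read-modify-write sequences by hand. In this sketch a plain dict stands in for the cache, and the interleaving is written out explicitly; in production the interference would come from concurrent requests:

```python
cache = {'hits': 0}

# Two handlers each read the counter before either writes it back.
a = cache['hits']        # handler A reads 0
b = cache['hits']        # handler B reads 0
cache['hits'] = a + 1    # A writes back 1
cache['hits'] = b + 1    # B also writes back 1 -- A's increment is lost
lost_update_result = cache['hits']   # 1, not the expected 2

def incr(key, delta=1):
    # Stand-in for the atomic operation: in the service, this
    # read-modify-write happens as a single step on the server.
    cache[key] += delta
    return cache[key]

cache['hits'] = 0
incr('hits')
incr('hits')
atomic_result = cache['hits']        # 2: no interleaving is possible
```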

When considering using memcache for counting, remember that memcache is nondurable storage. Your process must be resilient to the counter value being evicted at any time. But there are many forms this resilience can take. For instance, the app can periodically save the counter value to the datastore, and detect and recover if the increment fails due to the key being unset. In other cases, the counter may be helpful but not strictly necessary, and the work can proceed without it. In practice, unexpected cache evictions are rare, but it’s best to code defensively.

You can use the increment and decrement operations on any unsigned integer value. Memcache integers are 64 bits in size. Incrementing beyond the maximum 64-bit integer causes the value to wrap around to 0, and decrementing has the same behavior in reverse. If the value being incremented is not an integer, nothing changes.

When you call the increment operation, you can specify an optional initial value. Normally, the increment does nothing if the key is not set. If you specify an initial value and the key being incremented is not set, the key is set to the initial value, and the initial value is returned as the result of the operation.

The Python API provides two functions: incr() and decr(). Given a key as its sole argument, the functions will increment or decrement the corresponding integer value by 1, respectively. You can specify a different amount of change with the optional delta argument, which must be a nonnegative integer. You can also specify an initial_value, which sets the value if the key is unset. Without an initial value, incrementing or decrementing an unset key has no effect. The function returns the new value, or None if the increment does not occur:

# Increment by 1, if key is set.  v = v + 1
result = memcache.incr(key)
if result is None:
    # The key is not set, or another error occurred...

# Increment by 9, or initialize to 0 if not set.
result = memcache.incr(key, delta=9, initial_value=0)

# Decrement by 3, if key is set.  v = v - 3
result = memcache.decr(key, delta=3)

In the Java API, there is only one method: increment(). It takes as arguments the key and the amount of change, which can be negative. An optional third argument specifies an initial value, which sets the value if the key is unset. The method returns a java.lang.Long equal to the new value, or null if the increment does not occur:

// Increment by 1, if key is set.  v = v + 1
Long result = memcache.increment(key, 1);
if (result == null) {
    // The key is not set, or another error occurred...
}

// Increment by 9, or initialize to 0 if not set.
result = memcache.increment(key, 9, 0);

// Decrement by 3, if key is set.  v = v + (-3)
result = memcache.increment(key, -3);

Compare and Set

While memcache does not support general purpose transactions across multiple values, it does have a feature that provides a modest amount of transactionality for single values. The “compare and set” primitive operation sets a value if and only if it has not been updated since the last time the caller read the value. If the value was updated by another process, the caller’s update does not occur, and the operation reports this condition. The caller can retry its calculation for another chance at a consistent update.

This is a simpler version of the optimistic concurrency control we saw with datastore transactions, with some important differences. “Compare and set” can only operate on one memcache value at a time. Because the value is retained in fast nondurable storage, there is no replication delay. Read and write operations occur simply in the order they arrive at the service.

The API for this feature consists of two methods: a different get operation that returns both the value and a unique identifier (the compare-and-set ID, or CAS ID) for the value that is meaningful to the memcache, and the compare-and-set operation that sends the previous CAS ID with the updated value. The CAS ID for a key in memcache changes whenever the key is updated, even if it is updated to the same value as it had before the update. The memcache service uses the provided CAS ID to decide whether the compare-and-set operation should succeed. The Python and Java APIs represent this functionality in slightly different ways.
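A toy model of this protocol makes the mechanics concrete. Here gets() and cas() are stand-ins for the service operations, and the CAS ID is a counter bumped on every update, as the text describes:

```python
import itertools

_cas_counter = itertools.count(1)
cache = {}   # key -> (value, cas_id)

def gets(key):
    # Returns (value, cas_id), or (None, None) if the key is unset.
    return cache.get(key, (None, None))

def set_value(key, value):
    # Every update assigns a fresh CAS ID.
    cache[key] = (value, next(_cas_counter))

def cas(key, new_value, expected_cas_id):
    # Succeeds only if the key's CAS ID is unchanged since gets().
    _, current_id = cache.get(key, (None, None))
    if current_id != expected_cas_id:
        return False
    cache[key] = (new_value, next(_cas_counter))
    return True

set_value('greeting', 'Hello')
value, cas_id = gets('greeting')
set_value('greeting', 'Howdy')     # another process updates the key
stale_ok = cas('greeting', value + '!', cas_id)   # False: CAS ID changed

value, cas_id = gets('greeting')
fresh_ok = cas('greeting', value + '!', cas_id)   # True on retry
```

The failed first attempt followed by a successful retry is exactly the retry loop the APIs below implement.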

In Python, the CAS IDs of retrieved values are kept internal to the Client instance you use to interact with the service. You call a slightly different method for getting values, gets(), which knows to ask for and remember the CAS ID for the key. To update with “compare and set,” you call the cas() method on a key previously retrieved using gets(). Arguments for these methods are similar to get() and set(). There are no function-style equivalents to these Client methods because the methods store the CAS IDs for keys in the client instance:

memcache_client = memcache.Client()

# Attempt to append a string to a memcache value.
retries = 3
while retries > 0:
    retries -= 1
    value = memcache_client.gets(key) or ''
    value += 'MORE DATA!\n'
    if memcache_client.cas(key, value):
        break

The Client instance keeps track of all CAS IDs returned by calls to the gets() method. You can reset the client’s CAS ID store by calling the cas_reset() method.

In the Java API, the getIdentifiable() method accepts a key, and returns the value and its CAS ID wrapped in an instance of the MemcacheService.IdentifiableValue class. You can access the value with its getValue() method. To perform a compare-and-set update, you call the putIfUntouched() method with the key, the original MemcacheService.IdentifiableValue instance, the new value, and an optional Expiration value. This method returns true on success:

MemcacheService.IdentifiableValue idValue;
int retries = 3;

// Attempt to append a string to a memcache value.
while (retries-- > 0) {
    idValue = memcache.getIdentifiable(key);
    String value = "";
    if (idValue != null) {
        value = (String) idValue.getValue();
    }
    value += "MORE DATA!\n";
    if (memcache.putIfUntouched(key, idValue, value)) {
        break;
    }
}

Batching Calls to Memcache

The memcache service includes batching versions of its API methods, so you can combine operations in a single remote procedure call. As with the datastore’s batch API, this can save time in cases where the app needs to perform the same operation on multiple independent values. And as with the datastore, batching is not transactional: some operations may succeed while others fail. The total size of the batch call parameters can be up to 32 megabytes, as can the total size of the return values.
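The savings come from round trips: one batch call replaces N sequential calls. The sketch below counts calls to a fake RPC layer to make the difference visible (all names are illustrative):

```python
rpc_calls = {'count': 0}
backend = {'a': 1, 'b': 2, 'c': 3}

def get(key):
    rpc_calls['count'] += 1    # one round trip per key
    return backend.get(key)

def get_multi(keys):
    rpc_calls['count'] += 1    # one round trip for the whole batch
    return {k: backend[k] for k in keys if k in backend}

for key in ['a', 'b', 'c']:
    get(key)                   # three round trips
single_trips = rpc_calls['count']

rpc_calls['count'] = 0
values = get_multi(['a', 'b', 'c'])   # one round trip
batch_trips = rpc_calls['count']
```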

The details of the batch API differ between Python and Java, so we’ll consider them separately.

Memcache Batch Calls in Python

The Python API includes separate batch functions for each operation, both as standalone functions and as Client methods. The names of the batch methods all end with _multi.

set_multi() sets multiple values. It takes a mapping of keys and values as its first argument, and an optional expiration time argument that applies to all values set. The method returns a list of keys not set. An empty list indicates that all values were set successfully:

value_dict = {}
for result in results:
    value_dict[key_for_result(result)] = result

keys_not_set = memcache.set_multi(value_dict)
if keys_not_set:
    # Keys in keys_not_set were not set...

add_multi() and replace_multi() behave similarly. They take a mapping argument and the optional time argument, and return a list of keys not set. As with add() and replace(), these methods may fail to set keys because they are already set, or are not set, respectively.

get_multi() takes a list of keys, and returns a mapping of keys to values for all keys that are set in memcache. If a provided key is not set, it is omitted from the result mapping:

value_dict = memcache.get_multi(keys)
for key in keys:
    if key not in value_dict:
        # key is unset...
    else:
        value = value_dict[key]
        # ...

delete_multi() takes a list of keys, and an optional seconds argument to lock all the keys from adds for a period of time. The method returns True if all keys were deleted successfully or are already unset, or False if any of the keys could not be deleted. Unlike delete(), delete_multi() does not distinguish between a successful delete and an unset key:

success = memcache.delete_multi(keys)

Batch increments and decrements are handled by a single method, offset_multi(). This method takes a mapping of keys to delta values, where positive delta values are increments and negative delta values are decrements. You can also provide a single initial_value argument, which applies to all keys in the mapping. The return value is a mapping of keys to updated values. If a key could not be incremented, its value in the result mapping is None:

increments = {}
for key in keys:
    increments[key] = increment_for_key(key)

value_dict = memcache.offset_multi(increments, initial_value=0)

To get multiple values for later use with “compare and set,” you call the get_multi() method with an additional argument: for_cas=True. This returns a mapping of results just as it would without this argument, but it also stores the CAS IDs in the Client:

memcache_client = memcache.Client()
value_dict = memcache_client.get_multi(keys, for_cas=True)

To batch “compare and set” multiple values, you call the cas_multi() method with a mapping of keys and their new values. As with the other methods that update values, this method returns a list of keys not set successfully, with an empty list indicating success for all keys. If a key was not updated because it was updated since it was last retrieved, the key appears in the result list:

keys_not_set = memcache_client.cas_multi(value_dict)

Each Python batch function takes an optional key_prefix argument, a bytestring value. If provided, this prefix is prepended to every key sent to the service, and removed from every key returned by the service. This is useful as an inexpensive way to partition values. Note that key prefixes are distinct from namespaces:

prefix = 'alphabeta:'
value_dict = {'key1': 'value1',
              'key2': 'value2',
              'key3': 'value3'}

# Set 'alphabeta:key1', 'alphabeta:key2', 'alphabeta:key3'.
memcache.set_multi(value_dict, key_prefix=prefix)

keys = ['key1', 'key2', 'key3']
value_dict = memcache.get_multi(keys, key_prefix=prefix)
# value_dict['key1'] == 'value1'
# ('alphabeta:' does not appear in the value_dict keys.)

Memcache Batch Calls in Java

The Java API includes methods to perform operations in batches.

The putAll() method takes a java.util.Map<T, ?> of keys and values, where T is the key type. As with put(), other forms of this method accept optional Expiration and SetPolicy values. The three-argument form of this method returns a java.util.Set<T> containing all the keys not set by the call. If the set is empty, then all keys were set successfully. (Without both the Expiration and SetPolicy arguments, the method's return type is void.)

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// ...

Map<String, String> valueMap = new HashMap<String, String>();
// ... (populate valueMap) ...

Set<String> keysNotSet = memcache.putAll(
    valueMap, null, MemcacheService.SetPolicy.SET_ALWAYS);
if (!keysNotSet.isEmpty()) {
    // Keys in keysNotSet were not set...
}

The getAll() method takes a java.util.Collection<T> of keys, and returns a Map<T, Object> of keys and values that are set. If a provided key is not set, then it is omitted from the result Map:

Map<String, Object> values = memcache.getAll(keys);

The incrementAll() method takes a java.util.Collection<T> of keys and an amount to change each value (a positive or negative long). Another form of the method takes an initial value as a third argument, which is used for all keys. The method returns a Map<T, Long>containing keys and updated values for all keys set or incremented successfully:

Map<String, Long> newValues = memcache.incrementAll(keys, 1);
newValues = memcache.incrementAll(keys, 10L, 0L);

To get multiple values that can be used with “compare and set” in a batch, you call the getIdentifiables() method (with the plural s at the end of the method name). It takes a Collection of keys and returns a Map of keys to MemcacheService.IdentifiableValue instances:

Map<String, MemcacheService.IdentifiableValue> values;
values = memcache.getIdentifiables(keys);

// Get a value for key k from the result map.
String v = (String) values.get(k).getValue();

The putIfUntouched() method has a batch calling form for performing a "compare and set" with multiple values. It takes a Map of keys to instances of the wrapper class MemcacheService.CasValues, each of which holds the original MemcacheService.IdentifiableValue, the new value, and an optional Expiration. You can also provide an optional Expiration to putIfUntouched(), which applies to all values in the batch. The return value is the Set of keys that were stored successfully:

MemcacheService.IdentifiableValue oldIdentValue1;
// ...

Map<String, MemcacheService.CasValues> updateMap =
    new HashMap<String, MemcacheService.CasValues>();
updateMap.put(key1, new MemcacheService.CasValues(oldIdentValue1, newValue1));
// ...

Set<String> successKeys = memcache.putIfUntouched(updateMap);

Memcache and the Datastore

The most common use of the memcache on App Engine is as a fast access layer in front of the datastore. We’ve discussed the general pattern several times already: when the app needs to fetch an entity by key, it checks the cache first, and if it’s not there, it fetches from the datastore and puts it in the cache for later. This exchanges potential update latency (cached data may be old) for access speed. The memcache is well suited for this purpose: it can use datastore entity keys as keys, and serialized datastore entity structures as values.

In Java, caching a datastore entity is a simple matter of making sure your entity class is Serializable. The Entity class of the low-level datastore API is already Serializable, as are all property value types. If you’re using JPA, you can usually just declare that your model class implements Serializable. The resulting memcache value contains the data in the properties of your entity, or the fields of your model class.

In Python, using ext.db, you could just drop an instance of a db.Model subclass into the memcache. The memcache library will use pickle to convert the instance to a memcache value, and convert it back when you fetch it. This works, and this is what we did in Chapter 2. However, pickle will attempt to serialize the entire object, including temporary data structures internal to the db.Model class. This wastes space and risks the value being incompatible with future changes to the internal logic of the class.

One solution to this problem is to use the db.to_dict() function. This function takes a model instance and returns a mapping of property names to property values for the instance. You can reconstruct the model instance from this mapping by using your model’s class constructor. Note that the mapping does not include the key, so you’ll need to take care of that separately. The following code updates the example from Chapter 2 to use this technique:

from google.appengine.api import memcache
from google.appengine.ext import db

class UserPrefs(db.Model):
    tz_offset = db.IntegerProperty(default=0)
    user = db.UserProperty(auto_current_user_add=True)

def get_userprefs(user_id):
    userprefs = None
    userprefs_dict = memcache.get('UserPrefs:' + user_id)
    userprefs_key = db.Key.from_path('UserPrefs', user_id)
    if userprefs_dict:
        userprefs = UserPrefs(key=userprefs_key, **userprefs_dict)
    else:
        userprefs = db.get(userprefs_key)
        if userprefs:
            memcache.set(
                'UserPrefs:' + user_id,
                db.to_dict(userprefs))
        else:
            userprefs = UserPrefs(key_name=user_id)
    return userprefs

Handling Memcache Errors

By default, if the memcache service is unavailable or there is an error accessing the service, the memcache API behaves as if keys do not exist. Attempts to set, add, or replace values report failure as if the put failed due to the set policy. Attempts to get values will behave as cache misses.

In the Java API, you can change this behavior by installing an alternate error handler. The setErrorHandler() method of MemcacheService takes an object that implements the ErrorHandler interface. Two such implementations are provided: LogAndContinueErrorHandler and StrictErrorHandler. The default is LogAndContinueErrorHandler with its log level set to FINE (the “debug” level in the Administration Console). StrictErrorHandler throws MemcacheServiceException for all transient service errors:

import com.google.appengine.api.memcache.StrictErrorHandler;

// ...

memcache.setErrorHandler(new StrictErrorHandler());

Error handlers can have custom responses for invalid values and service errors. Other kinds of exceptions thrown by the API behave as usual.

Memcache Administration

With the memcache service playing such an important role in the health and wellbeing of your application, it’s important to understand how your app is using it under real-world conditions. App Engine provides a Memcache Viewer in the Administration Console, which shows you up-to-date statistics about your app’s memcache data, and lets you query values by key. You can also delete the entire contents of the cache from this panel, a drastic but sometimes necessary act.

The viewer displays the number of hits (successful attempts to get a value), the number of misses (attempts to get a value using a key that was unset), and the ratio of these numbers. The raw counts accumulate roughly over the lifetime of the app, but the ratio is the more useful number: the higher the hit ratio, the more time is being saved by using a cached value instead of performing a slower query or calculation.
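As a concrete example of what the ratio means, suppose the viewer reports 1,800 hits and 200 misses (numbers invented for illustration):

```python
# Hit ratio = hits / (hits + misses).
hits, misses = 1800, 200
hit_ratio = hits / float(hits + misses)
print(hit_ratio)  # 0.9: nine of every ten gets avoided slower storage
```

A ratio of 0.9 means nine out of every ten get requests were served from the cache rather than requiring a datastore fetch or recalculation.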

Also shown is the total number of items and the total size of all items. These numbers mostly serve as vague insight into the overall content of the cache. They don’t apply to any fixed limits or billable quotas, and there’s no need to worry if these numbers are large. Understanding the average item size might be useful if you’re troubleshooting why small items used less frequently than very large items are getting evicted.

A particularly interesting statistic is the “oldest item age.” This is a bit of a misnomer: it’s actually the amount of time since the last access of the least recently accessed item, not the full age of that item. Under moderate load, this value approximates the amount of time a value can go without being accessed before it is evicted from the cache to make room for hotter items. You can think of it as a lower bound on the usefulness of the cache. Note that more popular cache items live longer than less popular ones, so this age refers to the least popular item in the cache.

You can use the Memcache Viewer to query, create, and modify a value in the memcache, if you have the key. The Python and Java APIs let you use any serializable data type for keys, and the Viewer can’t support all possible types, so this feature is only good for some key types. String keys are supported for Python, Java, and Go apps. You can also query keys of several Java primitive types, such as integers. Similarly, updating values from the Viewer is limited to several data types, including bytestrings, Unicode text strings, Booleans, and integers.

Lastly, the Memcache Viewer has a big scary button to flush the cache, evicting (deleting) all of its values. Hopefully you’ve engineered your app to not depend on a value being available in the cache, and clicking this button would only inconvenience the app while it reloads values from primary storage or other computation. But for an app of significant size under moderate traffic with a heavy reliance on the cache, flushing the cache can be disruptive. You may need this button to clear out data inconsistencies caused by a bug after deploying a fix (for example), but you may want to schedule the flush during a period of low traffic.

The Python development server includes a version of the memcache viewer in the development console, so you can inspect statistics, query specific (string) keys, and flush the contents of the simulated memcache service. With your development server running, visit the development console (/_ah/admin), then select Memcache Viewer from the sidebar. As of version 1.7.0, the Java development server does not yet have a memcache viewer.

Cache Statistics

The memcache statistics shown in the Administration Console are also available to your app through a simple API.

In Python, you fetch memcache statistics with the get_stats() method. This method returns a dictionary containing the statistics:

import logging

stats = memcache.get_stats()

logging.info('Memcache statistics:')

for stat in stats.iteritems():

logging.info('%s = %d' % stat)

In Java, you call the getStatistics() service method. This returns an instance of the Stats class, a read-only object with getters for each of the statistics.

Available statistics include the following:

hits / getHitCount()

The number of cache hits counted.

misses / getMissCount()

The number of cache misses counted.

items / getItemCount()

The number of items currently in the cache.

bytes / getTotalItemBytes()

The total size of items currently in the cache.

byte_hits / getBytesReturnedForHits()

The total number of bytes returned in response to cache hits, including keys and values.

oldest_item_age / getMaxTimeWithoutAccess()

The amount of time since the last access of the least recently accessed item in the cache, in milliseconds.

Flushing the Memcache

You can delete every item in the memcache for your app, using a single API call. Just like the button in the Administration Console, this action is all or nothing: there is no way to flush a subset of keys, beyond deleting known keys individually or in a batch call.

If your app makes heavy use of memcache to front-load datastore entities, keep in mind that flushing the cache may cause a spike in datastore traffic and slower request handlers as your app reloads the cache.

To flush the cache in Python, you call the flush_all() function. It returns True on success:

memcache.flush_all()

To flush the cache in Java, you call the clearAll() service method. This method has no return value:

memcache.clearAll();