Programming Google App Engine
Chapter 19. Managing Request Logs
Activity and message logs are an essential part of a web application. They are your view into what happens with your application over time as it is being used, who is using it and how, and what problems, if any, your users are having.
App Engine logs all incoming requests for your application, including application requests, static file requests, and requests for invalid URLs (so you can determine whether there is a bad link somewhere). For each request, App Engine logs the date and time, the IP address of the client, the URL requested (including the path and parameters), the domain name requested, the browser’s identification string (the “user agent”), the referring URL if the user followed a link, and the HTTP status code in the response returned by the app or by the frontend.
App Engine also logs several important statistics about each request: the amount of time it took to handle each request, the amount of “CPU time” that was spent handling the request, and the size of the response. The CPU time measurement is particularly important to watch because requests that consistently consume a large amount of CPU may be throttled, such that the CPU use is spread over more clock time.
Your application code can log the occurrence of notable events and data by using a logging API. Logging a message associates a line of text with the request that emitted it, including all the data logged for the request. Each message has a log level indicating the severity of the message to make it easier to find important messages during analysis. App Engine supports five log levels: debug, info, warning, error, and critical.
You can browse your application’s request and message logs, using the Administration Console, under Logs. You can also download your log data for offline analysis and recordkeeping. An app can query log data programmatically using the log service API.
In this brief but important chapter, we’ll look at all of these features of the logging system.
TIP
If you’re new to web programming, you can ignore the advanced features of the logging system for now. But be sure to read the first couple of sections right away. Writing log messages and finding them in the Administration Console are important methods for figuring out what’s going on in a web application.
Writing to the Log
App Engine writes information about every request to the application log automatically. The app can write additional messages during the request to note application-specific details about what happened during the request handler.
An application log message has a log level that indicates the importance of the message. App Engine supports five levels: debug, info, warning, error, and critical. These are in order of “severity,” where “debug” is the least severe. When you browse or query log data, you can filter for messages above a given log level, such as to see just the requests where an error condition occurred.
App Engine will occasionally write its own messages to the log for a request. Uncaught application exceptions are written to the log as errors, with traceback information. When a handler exceeds its request deadline, App Engine writes an explicit message stating this fact. App Engine may also write informational messages, such as to say that the request was the first request served from a newly started instance, and so may have taken more time than usual.
In the development server, log messages are printed to the terminal (if run in a terminal window), the Logs window (the Python Launcher), or the Console window (Eclipse). During development, you can use log messages to see what’s going on inside your application, even if you decide not to keep those log messages in the live version of the app.
Logging in Python
Python applications can use the logging module from the standard library to log messages. App Engine hooks into this module to relay messages to the logging system, and to get the log level for each message. Example 19-1 shows this module in action.
Example 19-1. The use of the logging Python module to emit messages at different log levels
import logging
# ...
logging.debug('debug level')
logging.info('info level')
logging.warning('warning level')
logging.error('error level')
logging.critical('critical level')
sys.stderr.write('stderr write, logged at the error level\n')
In addition to messages logged with the logging module, each line of text written to the standard error stream (sys.stderr) is logged at the “error” level. (Because Python uses CGI, anything written to the standard output stream becomes part of the response data.)
In a traditional application using the logging module, you would configure the module to output only messages above a given level of severity. When running on App Engine, the level of the output is always the “debug” level, and it cannot be changed. You can filter messages by severity after the fact in the Administration Console, or when downloading logs with appcfg.py.
When running in the development web server, log messages are written to the Console, and data written to sys.stderr is written to the server’s error stream. If you are running your server in the Launcher, you can open a window to view log messages by clicking the Logs button.
The development server sets its log level to INFO by default. You can change this to DEBUG by giving the server the command-line argument --debug.
Logging in Java
For Java applications, App Engine supports the java.util.logging library from the JRE. App Engine recognizes log levels of messages logged using this library. Example 19-2 illustrates the use of the Logger class and its convenience methods.
Example 19-2. The use of the java.util.logging package to emit messages at different log levels
import java.io.IOException;
import javax.servlet.http.*;
import java.util.logging.Logger;
public class LoggingServlet extends HttpServlet {
private static final Logger log = Logger.getLogger(LoggingServlet.class.getName();
public void doGet(HttpServletRequest req, HttpServletResponse resp)
throws IOException {
log.finest("finest level"); // DEBUG
log.finer("finer level"); // DEBUG
log.fine("fine level"); // DEBUG
log.config("config level"); // DEBUG
log.info("info level"); // INFO
log.warning("warning level"); // WARNING
log.severe("severe level"); // ERROR
System.out.println("stdout level"); // INFO
System.err.println("stderr level"); // WARNING
}
}
The seven log levels of java.util.logging correspond to four of App Engine’s log levels: “finest,” “finer,” “fine,” and “config” all correspond to the App Engine debug level; “info” is info, “warning” is warning, and “severe” is error. The “critical” log level is reserved for exceptions that are not caught by the servlet; when this happens, the runtime environment logs a message at this level.
If the application writes any data to the standard output or error streams (System.out and System.err), App Engine adds that data to the log. Each line of text written to standard output becomes a log message at the “info” level, and each line written to standard error is logged at the “warning” level.
You can control which level of message should be written to the log by using configuration for java.util.logging. This allows you to leave detailed low-level logging statements in your code without having all that information clutter up the logs unnecessarily in a high-traffic app.
Configuring the log level requires two things: a configuration file and a system property that identifies the configuration file. For the configuration, create a resource file, such as war/WEB-INF/logging.properties, containing a line like this:
.level=INFO
You can configure the log level on a per-class basis by adding lines like this with the package path before the .level. This allows you to turn on fine-grained messaging for some components without turning it on for all components. For example, the Google Plugin for Eclipse creates alogging.properties configuration file with per-component settings for DataNucleus, the JDO/JPA interface package, so you can use verbose logging for your app code without cluttering up your output with messages from the DataNucleus component.
NOTE
Be sure to use the logging level name (such as FINEST) and not the App Engine level name for values in logging.properties. App Engine log levels only affect how messages are represented in the Admin Console.
Next, set a system property telling the logging library where to find its configuration file. You do this by including a <system-properties> element in your appengine-web.xml file, like so:
<system-properties>
<property name="java.util.logging.config.file"
value="WEB-INF/logging.properties" />
</system-properties>
If you created your Java project using the Google Plugin for Eclipse, your app already has this configuration file and this system property. This configuration comes preloaded with log levels for the DataNucleus interface (an interface for the datastore), so you can leave those set to the “warning” level while the rest of your app uses another level.
The java.util.logging and the standard output and error streams are the only ways to log messages at specific log levels. If you or a component of your app prefers a different logging library, such as log4j, messages emitted by that library will work as long as the library can write to the standard streams. If you want to be able to use the Administration Console to filter logs by levels other than “info” and “warning,” you will need an adapter of some kind that calls java.util.logging behind the scenes. You get complete log messages when you download log data, so you can always analyze alternate log formats in downloaded data.
When running in the development web server, log messages are written to the console, and text written to the standard streams is written to the corresponding streams for the server. In Eclipse, these messages appear in the Console pane.
Viewing Recent Logs
You can browse and search your application’s request logs and messages from the Logs panel of the Administration Console. Figure 19-1 shows the Logs panel with a request opened to reveal the detailed request data.
Figure 19-1. The Logs viewer in the Administration Console
The panel opens showing the 20 most recent requests. To filter this display to show only requests with messages above a particular log level, select the “Logs with minimum severity” radio button, then select the severity from the adjacent drop-down menu. The display updates automatically as soon as you change these settings.
To perform a more specific search, expand the Options button. This panel looks like Figure 19-2.
Figure 19-2. The search panel of the Logs viewer with expanded options
You can specify a filter in one of two ways. The simplest is the Regex filter, where the filter value is a regular expression that matches against all request log fields and application log messages.
The other kind of filter is the Labels filter. This filter consists of one or more request log fields and patterns to match against the field values. Each field filter is the field name followed by a colon, then the regular expression for the pattern. Field filters are delimited by spaces. The valid field names are listed on the screen; some useful examples are path (the URL path, starting with a slash) and user (a user signed in with a Google Account; the pattern matches the Google username). For example, with Labels selected, this query shows requests by the user dan.sanderson for paths beginning with /admin/:
path:/admin/.* user:dan\.sanderson
The Logs panel shows log data for the application version currently selected in the Administration Console version drop-down menu. If you’re having a problem with a live app, a useful technique is to deploy a separate version of the app with additional logging statements added to the code near the problem area, and then reproduce the issue using the version-specific URL (or temporarily make the new version the default, then switch it back). Then you can view and search the logs specific to the version with the added logging messages.
If a specific long-running instance appears to be having trouble, you can view logs just for that instance. Select the Instances panel, then click View Logs for the instance you wish to inspect.
The Logs panel is useful for digging up more information for problems with the application code. For broader analysis of traffic and user trends, you’ll want to download the log data for offline processing, or use a web traffic analytics product like Google Analytics.
TIP
For apps with a large amount of traffic, the Logs panel may return empty or incomplete results for some queries that otherwise have results. This is because the log retrieval service stops returning results after a period of time. The date and time of the oldest log record searched is displayed at the top of the results list. You can click Next Page to continue searching older log messages, even if the results page appears empty.
Downloading Logs
You can download log data for offline analysis and archiving by using the AppCfg command-line tool. (The Python Launcher and Google Plugin for Eclipse do not offer this feature.) To use it, run appcfg.py (Python) or appcfg.sh (Java) with the request_logs command, with the application directory and log output filename as arguments.
The following command downloads request logs for the app in the development directory clock, and saves them to a file named logs.txt:
appcfg.py request_logs clock logs.txt
This command takes many of the same arguments as appcfg.py update, such as those used for authentication.
The command fetches log data for the application ID and version described in the application config file. As with appcfg.py update, you can override these with the --application=... and --version=... arguments, respectively.
The command above downloads request data only. To download log messages emitted by the application, include a minimum severity level specified as a number, where 0 is all log messages (“debug” level and up) and 5 is only “critical” messages, using the --severity argument:
appcfg.py request_logs clock logs.txt --severity=1
Application messages appear in the file on separate lines immediately following the corresponding request. The format for this line is a tab, the severity of the message as a number, a colon, a numeric timestamp for the message, then the message:
1:1246801590.938119 get_published_entries cache HIT
Log data is ordered chronologically by request, from earliest to latest. Application messages are ordered within each request by their timestamps.
Request data appears in the file in a common format known as the Apache Combined (or “NCSA Combined”) logfile format, one request per line (shown here as two lines to fit on the page):
127.0.0.1 - - [05/Jul/2009:06:46:30 -0700] "GET /blog/ HTTP/1.1" 200 14598 -
"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-us)...,gzip(gfe)"
From left to right, the fields are:
§ The IP address of the client
§ A - (an unused field retained for backward compatibility)
§ The email address of the user who made the request, if the user is signed in using Google Accounts; - otherwise
§ The date and time of the request
§ The HTTP command string in double quotes, including the method and URL path
§ The HTTP response code returned by the server
§ The size of the response, as a number of bytes
§ The “Referrer” header provided by the client, usually the URL of the page that linked to this URL
§ The “User-Agent” header provided by the client, usually identifying the browser and its capabilities
By default, the command fetches the last calendar day’s worth of logs, back to midnight, Pacific Time. You can change this with the --num_days=... argument. Set this to 0 to get all available log data. You can also specify an alternate end date with the --end_date=... option, whose value is of the form YYYY-MM-DD (such as 2009-11-04).
You can specify the --append argument to extend the log data file with new data, if the logfile exists. By default, the command overwrites the file with the complete result of the query. The append feature is smart: it checks the data file for the date and time of the most recent log message, then only appends messages from after that time.
Logs Retention
By default, App Engine stores up to 1 gigabyte of log data, or up to 90 days worth of messages, whichever is less. Once the retention limit is reached, the oldest messages are dropped in favor of new ones.
You can increase the maximum amount and maximum age in the Application Settings panel of the Administration Console. Scroll down to Logs Retention, enter new values, and then click Save Settings.
The first gigabyte and 90 days of retention are included with the cost of your application. Additional storage and retention time is billed at a storage rate specific to logs. See the official website for the latest rates. If you’re paying for log storage, you can retain logs for up to 365 days (one year).
Querying Logs from the App
App Engine provides a simple API for querying log data directly from the application. With this API, you can retrieve log data by date-time ranges, filter by log level and version ID, and page through results. You can use this API to build custom interactive log data inspectors for your app, or implement log-based alerts.
This is the same API that the Administration Console uses to power the Logs panel. You’ll notice that the API does not include filters based on regular expressions. Instead, the Logs panel simply pages through unfiltered results, and only displays those that match a given pattern. Your app can use a similar technique.
The development server can retain log data in memory to help you test the use of this API. The Java server does this automatically. If you’re using the Python development server, you must enable this feature with the --persist_logs flag:
dev_appserver.py --persist_logs
Querying Logs in Python
In Python, you fetch log data by calling the fetch() function in the google.appengine.api.logservice module. The function takes query parameters as arguments:
include_app_logs
True if the log records returned should include application messages.
minimum_log_level
The minimum severity a request’s application log messages should have to be a result. The value is an integer from 0 (debug) to 4 (critical), represented by constants named like logservice.LOG_LEVEL_INFO. The default is to return all requests; specifying a log level limits the results to just those requests with application log messages at or above the specified level.
start_time
The earliest timestamp to consider as a Unix epoch time. The default is None, no starting bound.
end_time
The latest timestamp to consider as a Unix epoch time. The default is None, no ending bound.
version_ids
A list of version IDs whose logs to fetch. The default is None, fetch the calling app’s version.
include_incomplete
If True, include incomplete requests in the results. (See Flushing the Log Buffer.)
batch_size
The number of results to fetch per service call when iterating over results.
offset
The offset of the last-seen result, for paging through results. The next result returned follows the last-seen result.
The function returns an iterable that acts as a stream of log results. Each result is an object with attributes for the fields of the request data, such as method, resource, and end_time. See the official documentation for the complete list of fields.
If application log messages are requested (include_app_logs=True), the app_logs attribute of a result is a list of zero or more objects, one for each log message. The attributes of this object are time (an epoch time), level (an integer), and message.
Here’s a simple example:
import time
from google.appengine.api import logservice
# ...
self.response.write('<pre>')
count = 0
for req_log in logservice.fetch(include_app_logs=True):
# Stop after 20 results.
count += 1
if count > 20:
break
self.response.write(
'%s %s %s\n' %
(time.ctime(req_log.end_time),
req_log.method,
req_log.resource)
for app_log in req_log.app_logs:
self.response.write(
' %s %s %s\n' %
(time.ctime(app_log.time),
['DEBUG', 'INFO', 'WARNING',
'ERROR', 'CRITICAL'][app_log.level],
app_log.message)
self.response.write('</pre>')
Each result includes an offset attribute, a web-safe string you can use to make a “next page” button in a paginated display. Simply pass the offset of the last result on a page to the fetch() function, and the first result returned will be the next result in the sequence.
Querying Logs in Java
In Java, you fetch log data by building a LogQuery object, then passing it to the fetch() method of a LogService instance. These classes are provided by the package com.google.appengine.api.log.
You get a LogQuery by calling LogQuery.Builder.withDefaults(). You can then call builder methods to specify the query parameters. Each method returns the LogQuery instance, so you can stack calls:
includeAppLogs(boolean)
true if the log records returned should include application messages.
minLogLevel(LogService.LogLevel)
The minimum severity a request’s application log messages should have to be a result. The value is an enum constant from LogService.LogLevel, one of DEBUG, INFO, WARN, ERROR, or FATAL (critical). The default is to return all requests; specifying a log level limits the results to just those requests with application log messages at or above the specified level.
startTimeUsec(long)
The earliest timestamp to consider as a Unix epoch time. If not specified, there is no starting bound.
endTimeUsec(long)
The latest timestamp to consider as a Unix epoch time. If not specified, there is no ending bound.
majorVersionIds: a java.util.List<String>
A list of version IDs whose logs to fetch. If not specified, this fetches the calling app’s version.
includeIncomplete(boolean)
If true, include incomplete requests in the results. (See Flushing the Log Buffer.)
batchSize(int)
The number of results to fetch per service call when iterating over results.
offset(String)
The offset of the last-seen result, for paging through results. The next result returned follows the last-seen result.
You get a LogService instance by calling LogServiceFactory.getLogService(). The instance’s fetch() method takes the LogQuery as an argument, and returns an iterable of RequestLogs objects, one for each request that matches the query. RequestLogs has getters for each request data field, such as getMethod(), getResource(), and getEndTimeUsec(). See the official documentation for the complete list of fields.
If application log messages are requested (includeAppLogs(true), the getAppLogLines() method of a RequestLogs returns a List of zero or more AppLogLine objects, one for each log message. AppLogLine has the getter methods getLogLevel() (returns aLogService.LogLevel), getLogMessage(), and getTimeUsec().
Here’s a simple example:
import com.google.appengine.api.log.AppLogLine;
import com.google.appengine.api.log.LogQuery;
import com.google.appengine.api.log.LogService;
import com.google.appengine.api.log.LogServiceFactory;
import com.google.appengine.api.log.RequestLogs;
import java.util.Calendar;
// ...
LogQuery query = LogQuery.Builder.withDefaults();
query.includeAppLogs(true);
Calendar cal = Calendar.getInstance();
int count = 0;
LogService logSvc = LogServiceFactory.getLogService();
for (RequestLogs reqLog : logSvc.fetch(query) {
count++;
if (count > 20) {
break;
}
cal.setTimeInMillis(reqLog.getEndTimeUsec() / 1000);
resp.getOutputStream().println(
cal.getTime().toString() + " " +
reqLog.getMethod() + " " +
reqLog.getResource();
for (AppLogLine appLog : reqLog.getAppLogLines() {
cal.setTimeInMillis(appLog.getTimeUsec() / 1000);
resp.getOutputStream().println(
" " +
cal.getTime().toString() + " " +
appLog.getLogLevel() + " " +
appLog.getLogMessage();
}
}
Each result includes a getOffset() method, which returns a web-safe string you can use to make a “next page” button in a paginated display. Simply add the offset string of the last result on a page as the offset() parameter of the query, and the first result returned will be the next result in the sequence.
Flushing the Log Buffer
In the log fetch API, an “incomplete request” is a request that has not yet finished, but may have written some messages to the log. The API lets you optionally fetch log data for incomplete requests, such as to include the logged activity of a long-running task in the log data.
In Python, application log messages accrue in a log buffer. Typically, the contents of the buffer are written to permanent log storage when the request handler exits. Since most request handlers are short-lived, this is sufficient for capturing log data in real time. For long-running request handlers (such as task handlers), you may wish to flush the log buffer periodically to make log messages available to the fetch API.
To flush the log buffer manually, call the flush() function in the google.appengine.api.logservice module:
from google.appengine.api import logservice
# ...
logservice.flush()
You can also enable automatic log flushing for the duration of the request. To do this, you modify global variables in the logservice module. To flush the logs after a certain number of seconds:
logservice.AUTOFLUSH_ENABLED = True
logservice.AUTOFLUSH_EVERY_SECONDS = 10
To flush the logs after a certain number of bytes have been accrued in the buffer:
logservice.AUTOFLUSH_ENABLED = True
logservice.AUTOFLUSH_EVERY_BYTES = 4096
To flush the logs after a certain number of lines have been accrued in the buffer:
logservice.AUTOFLUSH_ENABLED = True
logservice.AUTOFLUSH_EVERY_LINES = 50
You can combine these settings. The flush occurs after any limit is reached. To disable a limit, set it to None.
Java apps write log data immediately, and do not use a log buffer.