
Chapter 15. A Functional Approach to Web Services

We'll step away from Exploratory Data Analysis and look closely at web servers and web services. These are, to an extent, a cascade of functions. We can apply a number of functional design patterns to the problem of presenting web content. Our goal is to look at ways in which we can approach Representational State Transfer (REST). We want to build RESTful web services using functional design patterns.

We don't need to invent yet another Python web framework; there are plenty of frameworks to choose from. We'll avoid creating a large, general-purpose solution.

We don't want to select among the available frameworks, either. There are many, each with a distinct set of features and advantages.

We'll present some principles that can be applied to most of the available frameworks. We should be able to leverage functional design patterns for presenting web content. This will allow us to build web-based applications that have the advantages of a functional design.

For example, when we look at extremely large datasets, or extremely complex datasets, we might want a web service which supports subsetting or searching. We might want a web site which can download subsets in a variety of formats. In this case, we might need to use functional designs to create RESTful web services to support these more sophisticated requirements.

The most complex web applications often have stateful sessions that make the site easier to use. The session information is updated with data provided via HTML forms or fetched from databases, or recalled from caches of previous interactions. While the overall interaction involves state changes, the application programming can be largely functional. Some of the application functions can be non-strict in their use of request data, cache data, and database objects.

In order to avoid details of a specific web framework, we'll focus on the Web Server Gateway Interface (WSGI) design pattern. This will allow us to implement a simple web server. A great deal of information is present at the following link:

http://wsgi.readthedocs.org/en/latest/

Some important background on WSGI can be found at:

https://www.python.org/dev/peps/pep-0333/

We'll start by looking at the HTTP protocol. From there, we can consider servers such as Apache httpd to implement this protocol and see how mod_wsgi becomes a sensible extension to a base server. With this background, we can look at the functional nature of WSGI and how we can leverage functional design to implement sophisticated web search and retrieval tools.

The HTTP request-response model

The essential HTTP protocol is, ideally, stateless. A user agent or client can take a functional view of the protocol. We can build a client using the http.client or urllib library. An HTTP user agent essentially executes something similar to the following:

import urllib.request

with urllib.request.urlopen("http://slott-softwarearchitect.blogspot.com") as response:
    print(response.read())

A program like wget or curl does this at the command line; the URL is taken from the arguments. A browser does this in response to the user pointing and clicking; the URL is taken from the user's actions, in particular, the action of clicking on linked text or images.

The practical considerations of the internetworking protocols, however, lead to some implementation details which are stateful. Some of the HTTP status codes indicate that an additional action on the part of the user agent is required.

Many status codes in the 3xx range indicate that the requested resource has been moved. The user agent is then required to request a new location based on information sent in the Location header. The 401 status code indicates that authentication is required; the user agent can respond with an authorization header that contains credentials for access to the server. The urllib library implementation handles this stateful overhead. The http.client library doesn't automatically follow 3xx redirect status codes.

The techniques for a user agent to handle 3xx and 401 codes aren't deeply stateful. A simple recursion can be used. If the status doesn't indicate a redirection, it is the base case, and the function has a result. If redirection is required, the function can be called recursively with the redirected address.
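Here's a minimal sketch of that recursion using only http.client and urllib.parse; the fetch() function and its depth limit are our own illustrative additions, not part of any library API:

import http.client
import urllib.parse

def fetch(url, depth=0):
    """Follow 3xx redirects recursively; any non-3xx status is the base case."""
    if depth > 10:
        raise RuntimeError("Too many redirections")
    parts = urllib.parse.urlparse(url)
    connection = http.client.HTTPConnection(parts.netloc)
    connection.request("GET", parts.path or "/")
    response = connection.getresponse()
    if 300 <= response.status < 400:
        # Redirection: recurse with the new address from the Location header.
        location = urllib.parse.urljoin(url, response.getheader("Location"))
        return fetch(location, depth + 1)
    # Base case: this response is the result.
    return response.read()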

Looking at the other end of the protocol, a static content server should also be stateless. There are two layers to the HTTP protocol: the TCP/IP socket machinery and a higher layer HTTP structure that depends on the lower level sockets. The lower level details are handled by the socketserver library. The Python http.server library is one of the libraries that provide a higher level implementation.

We can use the http.server library as follows:

from http.server import HTTPServer, SimpleHTTPRequestHandler

running = True
httpd = HTTPServer(('localhost', 8080), SimpleHTTPRequestHandler)
while running:
    httpd.handle_request()
httpd.shutdown()

We created a server object, and assigned it to the httpd variable. We provided the address and port number to which we'll listen for connection requests. The TCP/IP protocol will spawn a connection on a separate port. The HTTP protocol will read the request from this other port and create an instance of the handler.

In this example, we provided SimpleHTTPRequestHandler as the class to instantiate with each request. This class must implement a minimal interface, which will send headers and then send the body of the response to the client. This particular class will serve files from the local directory. If we wish to customize this, we can create a subclass, which implements methods such as do_GET() and do_POST() to alter the behavior.
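As a sketch of such a subclass, here's a hypothetical handler (the HelloHandler name and its fixed response are ours) that could be passed to HTTPServer in place of SimpleHTTPRequestHandler:

from http.server import BaseHTTPRequestHandler

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Send a fixed plain-text body instead of serving a local file.
        content = "Hello, world!".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(content)))
        self.end_headers()
        self.wfile.write(content)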

Often, we use the serve_forever() method instead of writing our own loop. We've shown the loop here to clarify that the server must, generally, be crashed. If we want to close the server down politely, we'll require some way in which we can change the value of the running variable. The Ctrl + C signal, for example, is commonly used for this.

Injecting a state via cookies

The addition of cookies changes the overall relationship between a client and server to become stateful. Interestingly, it involves no change to the HTTP protocol itself. The state information is communicated via headers on the request and the reply. The user agent will send cookies in request headers that match the host and path. The server will send cookies to the user agent in response headers.

The user agent or browser must, therefore, retain a cache of cookie values and include appropriate cookies in each request. The web server must accept cookies in the request header and send cookies in the response header. The web server doesn't need to cache cookies. A server merely uses cookies as additional arguments in a request and additional details in a response.
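The standard library's http.cookies module handles both directions of this exchange. Here's a small sketch; the session cookie name and its value are invented for illustration:

from http.cookies import SimpleCookie

# Parse an incoming Cookie request header.
request_cookies = SimpleCookie("session=41b9a32f")
session_id = request_cookies["session"].value  # '41b9a32f'

# Build the value for an outgoing Set-Cookie response header.
response_cookies = SimpleCookie()
response_cookies["session"] = session_id
header_value = response_cookies["session"].OutputString()  # 'session=41b9a32f'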

While a cookie can, in principle, contain almost anything, the use of cookies has rapidly evolved to contain just an identifier for a session state object. The server can then use the cookie information to locate session state in some kind of persistent storage. This means the server can also update the session state based on user agent requests. It also means the server can discard sessions which are old.

The concept of a "session" exists outside the HTTP protocol. It is commonly defined as a series of requests with the same session cookie. When an initial request is made, no cookie is available, and a new session is created. Every subsequent request would include the cookie. The cookie would identify the session state object on the server; this object would have the information required by the server to provide consistent web content gracefully.

The REST approach to web services, however, does not rely on cookies. Each REST request is distinct and does not fit into an overall session framework. This makes it less "user-friendly" than an interactive site that uses cookies to simplify a user's interactions.

This also means that each individual REST request is, in principle, separately authenticated. In many cases, a simple token is generated by the server to avoid the client sending more complex credentials with every request. This leads to having the REST traffic secured using Secure Sockets Layer (SSL) protocols; the https scheme is then used instead of http. We'll call both schemes HTTP throughout this chapter.

Considering a server with a functional design

One core idea behind HTTP is that the daemon's response is a function of the request. Conceptually, a web service should have a top-level implementation that can be summarized as follows:

response = httpd(request)

However, this is impractical. It turns out that an HTTP request isn't a simple, monolithic data structure. It actually has some required parts and some optional parts. A request may have headers, there's a method and a path, and there may be attachments. The attachments may include forms or uploaded files or both.

To make things more complex, a browser's form data can be sent as a query string in the path of a GET request. Alternatively, it can be sent as an attachment to a POST request. While there's a possibility for confusion, most web application frameworks will create HTML form tags that provide their data via a "method=POST" statement in the <form> tag; the form data will then be an attachment.
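Either way, the form data arrives as an encoded string. The urllib.parse.parse_qs() function, which we'll use later in this chapter, decodes such a string into a dictionary of lists; here's a quick interactive example:

>>> import urllib.parse
>>> urllib.parse.parse_qs("name=Anscombe&form=json")
{'name': ['Anscombe'], 'form': ['json']}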

Looking more deeply into the functional view

Both HTTP response and request have headers and a body. The request can have some attached form data. Therefore, we can think of a web server like this:

headers, content = httpd(headers, request, [uploads])

The request headers may include cookie values, which can be seen as adding yet more arguments. Additionally, a web server is often dependent on the OS environment in which it's running. This OS environment data can be considered as yet more arguments being provided as part of the request.

There's a large but reasonably well defined spectrum of content. The Multipurpose Internet Mail Extension (MIME) types define the kinds of content that a web service might return. This can include plain text, HTML, JSON, XML, or any of the wide variety of non-text media that a website might serve.

As we look more closely at the processing required to build a response to an HTTP request, we'll see some common features that we'd like to reuse. This idea of reusable elements is what leads to the creation of web service frameworks that fill a spectrum from simple to sophisticated. The ways that functional designs allow us to reuse functions indicate that the functional approach seems very appropriate to build web services.

We'll look at functional design of web services by examining how we can create a pipeline of the various elements of a service response. We'll do this by nesting the functions for request processing so that inner elements are free from the generic overheads, which are provided by outer elements. This also allows the outer elements to act as filters: invalid requests can yield error responses, allowing the inner function to focus narrowly on the application processing.

Nesting the services

We can look at web request handling as a number of nested contexts. An outer context, for example, might cover session management: examining the request to determine if this is another request in an existing session or a new session. An inner context might provide tokens used for form processing that can detect Cross-Site Request Forgeries (CSRF). Another context might handle user authentication within a session.

A conceptual view of the functions explained previously is something like this:

response = content(authentication(csrf(session(headers, request, [forms]))))

The idea here is that each function can build on the results of the previous function. Each function either enriches the request or rejects it because it's invalid. The session function, for example, can use headers to determine if this is an existing session or a new session. The csrf function will examine form input to ensure that proper tokens were used. The CSRF handling requires a valid session. The authentication function can return an error response for a session that lacks valid credentials; it can enrich the request with user information when valid credentials are present.

The content function is free from worrying about sessions, forgeries, and non-authenticated users. It can focus on parsing the path to determine what kind of content should be provided. In a more complex application, the content function may include a rather complex mapping from path elements to functions that determine the appropriate content.

The nested function view, however, still isn't quite right. The problem is that each nested context may also need to tweak the response instead of or in addition to tweaking the request.

We really want something more like this:

def session(headers, request, forms):
    # Pre-process: determine the session
    content = csrf(headers, request, forms)
    # Post-process the content
    return content

def csrf(headers, request, forms):
    # Pre-process: validate CSRF tokens
    content = authenticate(headers, request, forms)
    # Post-process the content
    return content

This concept points toward a functional design for creating web content via a nested collection of functions that provide enriched input or enriched output or both. With a little bit of cleverness, we should be able to define a simple, standard interface that various functions can use. Once we've standardized an interface, we can combine functions in different ways and add features. We should be able to meet our functional programming objectives of having succinct and expressive programs that provide web content.

The WSGI standard

The Web Server Gateway Interface (WSGI) defines a relatively simple, standardized design pattern for creating a response to a web request. The Python library's wsgiref package includes a reference implementation of WSGI.

Each WSGI "application" has the same interface:

def some_app(environ, start_response):
    return content

The environ parameter is a dictionary that contains all of the arguments of the request in a single, uniform structure. The headers, the request method, the path, and any attachments for forms or file uploads will all be in the environment. In addition to this, the OS-level context is also provided along with a few items that are part of WSGI request handling.

The start_response is a function that must be used to send the status and headers of a response. The portion of a WSGI server that has final responsibility for building the response will use a start_response function to send the headers and the status as well as to build the response text. For some applications, this function might need to be wrapped with a higher-order function so that additional headers can be added to the response.

The return value is a sequence of strings or string-like file wrappers that will be returned to the user agent. If an HTML template tool is used, then the sequence may have a single item. In some cases, like the Jinja2 templates, the template can be rendered lazily as a sequence of text chunks, interleaving template filling with downloading to the user agent.

Due to the way they nest, WSGI applications can also be viewed as a chain. Each application will either return an error or will hand the request to another application that will determine the result.

Here's a very simple routing application:

import wsgiref.util

SCRIPT_MAP = {
    "demo": demo_app,
    "static": static_app,
    "": welcome_app,
}

def routing(environ, start_response):
    top_level = wsgiref.util.shift_path_info(environ)
    app = SCRIPT_MAP.get(top_level, SCRIPT_MAP[''])
    content = app(environ, start_response)
    return content

This app will use the wsgiref.util.shift_path_info() function to tweak the environment. This does a "head/tail split" on the items in the request path, available in the environ['PATH_INFO'] item. The head of the path, up to the first "/", will be moved into the SCRIPT_NAME item in the environment; the PATH_INFO item will be updated to have the tail of the path. The returned value is also the head of the path. In the case where there's no path to parse, the return value is None and no environment updates are made.

The routing() function uses the first item on the path to locate an application in the SCRIPT_MAP dictionary. We use the SCRIPT_MAP[''] entry as a default in case the requested path doesn't fit the mapping. This seems a little better than an HTTP 404 NOT FOUND error.

This WSGI application is a function that chooses between a number of other functions. It's a higher-order function, since it evaluates functions defined in a data structure.

It's easy to see how a framework could generalize the path-matching process using regular expressions. We can imagine configuring the routing() function with a sequence of regular expressions (REs) and WSGI applications instead of a mapping from a string to the WSGI application. The enhanced routing() function would evaluate each RE looking for a match. In the case of a match, any match.groups() function could be used to update the environment before calling the requested application.
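Here's a minimal sketch of that enhancement, reusing the same demo_app, static_app, and welcome_app placeholders as in the SCRIPT_MAP example; the ROUTE_PATTERNS name and the 404 fallback are our own choices:

import re

ROUTE_PATTERNS = [
    (re.compile(r"^/demo"), demo_app),
    (re.compile(r"^/static"), static_app),
    (re.compile(r"^/"), welcome_app),
]

def regex_routing(environ, start_response):
    for pattern, app in ROUTE_PATTERNS:
        match = pattern.match(environ['PATH_INFO'])
        if match:
            # Any named groups enrich the environment before delegation.
            environ.update(match.groupdict())
            return app(environ, start_response)
    start_response('404 NOT FOUND', [])
    return [b"no application matched the path"]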

Throwing exceptions during WSGI processing

One central feature of WSGI applications is that each stage along the chain is responsible for filtering the requests. The idea is to reject faulty requests as early in the processing as possible. Python's exception handling makes this particularly simple.

We can define a WSGI application that provides static content as follows:

def static_app(environ, start_response):
    try:
        # CONTENT_HOME is a module-level setting naming the static file directory.
        with open(CONTENT_HOME + environ['PATH_INFO']) as static:
            content = static.read().encode("utf-8")
        headers = [
            ("Content-Type", 'text/plain; charset="utf-8"'),
            ("Content-Length", str(len(content))),
        ]
        start_response('200 OK', headers)
        return [content]
    except IsADirectoryError as e:
        return index_app(environ, start_response)
    except FileNotFoundError as e:
        start_response('404 NOT FOUND', [])
        return [repr(e).encode("utf-8")]

In this case, we simply tried to open the requested path as a text file. There are two common reasons why we can't open a given file, both of which are handled as exceptions:

· If the file is a directory, we'll use a different application to present directory contents

· If the file is simply not found, we'll return an HTTP 404 NOT FOUND response

Any other exceptions raised by this WSGI application will not be caught. The application that invoked this should be designed with some generic error response capability. If it doesn't handle the exceptions, a generic WSGI failure response will be used.

Note

Our processing involves a strict ordering of operations. We must read the entire file so that we can create a proper HTTP Content-Length header.

Further, we must provide the content as bytes. This means that the Python strings must be properly encoded and we must provide the encoding information to the user agent. Even the error message, repr(e), is properly encoded before being downloaded.

Pragmatic WSGI applications

The intent of the WSGI standard is not to define a complete web framework; the intent is to define a minimum set of standards that allow flexible interoperability of web-related processing. A framework can take a wildly different internal approach to providing web services. However, its outermost interface should be compatible with WSGI so that it can be used in a variety of contexts.

Web servers such as Apache httpd and Nginx have adapters, which provide a WSGI-compatible interface from the web server to Python applications. For more information on WSGI implementations, visit

https://wiki.python.org/moin/WSGIImplementations.

Embedding our applications in a larger server allows us to have a tidy separation of concerns. We can use Apache httpd to serve completely static content, such as .css, .js, and image files. For HTML pages, though, we can use Apache's mod_wsgi interface to hand off requests to a separate Python process, which handles only the interesting HTML portions of the web content.

This means that we must either create a separate media server, or define our website to have two sets of paths. If we take the second approach, some paths will have the completely static content and can be handled by Apache httpd. Other paths will have dynamic content, which will be handled by Python.

When working with WSGI functions, it's important to note that we can't modify or extend the WSGI interface in any way. For example, it seems like a good idea to provide an additional parameter with a sequence of functions that define the chain of processing. Each stage would pop the first item from the list as the next step in the processing. An additional parameter like this might be typical for functional design, but the change in the interface defeats the purpose of WSGI.

A consequence of the WSGI definition is that configuration is either done with global variables, the request environment, or with a function, which fetches some global configuration objects from a cache. Using module-level globals works for small examples. For more complex applications, a configuration cache might be required. It might also be sensible to have a WSGI app, which merely updates the environ dictionary with configuration parameters and passes control to another WSGI application.
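Here's a sketch of that last idea; the configured() wrapper and its keyword settings are our own invention, not a WSGI feature:

def configured(app, **settings):
    """Wrap a WSGI app so that each request's environ carries the settings."""
    def wrapped(environ, start_response):
        environ.update(settings)  # the configuration rides along with the request
        return app(environ, start_response)
    return wrapped

# Hypothetical usage:
# application = configured(anscombe_app, data_path="Anscombe.txt")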

Defining web services as functions

We'll look at a RESTful web service, which can "slice and dice" a source of data and provide downloads as JSON, XML, or CSV files. We'll provide an overall WSGI-compatible wrapper, but the functions which do the "real work" of the application won't be narrowly constrained to fit WSGI.

We'll use a simple dataset with four subcollections: the Anscombe Quartet. We looked at ways to read and parse this data in Chapter 3, Functions, Iterators, and Generators. It's a small set of data, but it can be used to show the principles of a RESTful web service.

We'll split our application into two tiers: a web tier, which will be a simple WSGI application, and the rest of the processing, which will be more typical functional programming. We'll look at the web tier first so that we can focus on a functional approach to provide meaningful results.

We need to provide two pieces of information to the web service:

· The quartet that we want—this is a "slice and dice" operation. For this example, it's mostly just a "slice".

· The output format we want.

The data selection is commonly done via the request path. We can request "/anscombe/I/" or "/anscombe/II/" to pick specific datasets from the quartet. The idea is that a URL defines a resource, and there's no good reason for the URL to ever change. In this case, the dataset selectors aren't dependent on dates, or some organizational approval status or other external factors. The URL is timeless and absolute.

The output format is not a first-class part of the URL. It's just a serialization format, not the data itself. In some cases, the format is requested via the HTTP Accept header. This is hard to use from a browser but easy to use from an application using a RESTful API. When extracting data from the browser, a query string is commonly used to specify the output format. We'll use a query string of ?form=json at the end of the path to specify the JSON output format.

A URL we can use will look like this:

http://localhost:8080/anscombe/III/?form=csv

This would request a CSV download of the third dataset.

Creating the WSGI application

First, we'll use a simple URL pattern-matching expression to define the one and only routing in our application. In a larger or more complex application, we might have more than one such pattern:

import re

path_pat = re.compile(r"^/anscombe/(?P<dataset>.*?)/?$")

This pattern allows us to define an overall "script" in the WSGI sense at the top level of the path. In this case, the script is "anscombe". We'll take the next level of the path as a dataset to select from the Anscombe Quartet. The dataset value should be one of I, II, III, or IV.

We used a named parameter for the selection criteria. In many cases, RESTful APIs are described using a syntax, as follows:

/anscombe/{dataset}/

We translated this idealized pattern into a proper regular expression and preserved the name of the dataset selector in the path.

Here's the kind of unit test that demonstrates how this pattern works:

test_pattern = """
>>> m1 = path_pat.match("/anscombe/I")
>>> m1.groupdict()
{'dataset': 'I'}
>>> m2 = path_pat.match("/anscombe/II/")
>>> m2.groupdict()
{'dataset': 'II'}
>>> m3 = path_pat.match("/anscombe/")
>>> m3.groupdict()
{'dataset': ''}
"""

We can include the three previously mentioned examples as part of the module's doctests using the following __test__ mapping:

__test__ = {
    "test_pattern": test_pattern,
}

This will ensure that our routing works as expected. It's important to be able to test this separately from the rest of the WSGI application. Testing a complete web server means starting the server process and then trying to connect with a browser or a test tool, such as Postman or Selenium. Visit http://www.getpostman.com or http://www.seleniumhq.org to get more information on the usage of Postman and Selenium. We prefer to test each feature in isolation.
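A minimal runner for these isolated tests might look like the following; doctest.testmod() picks up both docstrings and the __test__ mapping:

if __name__ == "__main__":
    import doctest
    doctest.testmod()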

Here's the overall WSGI application; the two key lines are the calls to the anscombe_filter() and serialize() functions:

import traceback
import urllib.parse

def anscombe_app(environ, start_response):
    log = environ['wsgi.errors']
    try:
        match = path_pat.match(environ['PATH_INFO'])
        set_id = match.group('dataset').upper()
        query = urllib.parse.parse_qs(environ['QUERY_STRING'])
        print(environ['PATH_INFO'], environ['QUERY_STRING'],
              match.groupdict(), file=log)
        log.flush()
        dataset = anscombe_filter(set_id, raw_data())
        content, mime = serialize(query['form'][0], set_id, dataset)
        headers = [
            ('Content-Type', mime),
            ('Content-Length', str(len(content))),
        ]
        start_response("200 OK", headers)
        return [content]
    except Exception as e:
        traceback.print_exc(file=log)
        tb = traceback.format_exc()
        # error_page is a string.Template for the HTML error display.
        page = error_page.substitute(title="Error", message=repr(e), traceback=tb)
        content = page.encode("utf-8")
        headers = [
            ('Content-Type', "text/html"),
            ('Content-Length', str(len(content))),
        ]
        start_response("404 NOT FOUND", headers)
        return [content]

This application will extract two pieces of information from the request: the PATH_INFO and QUERY_STRING values. The PATH_INFO value will define which set to extract. The QUERY_STRING value will specify an output format.

The application processing is broken into three functions. A raw_data() function reads the raw data from a file. The result is a dictionary with lists of Pair objects. The anscombe_filter() function accepts a selection string and the dictionary of raw data and returns a single list of Pair objects. The list of pairs is then serialized into bytes by the serialize() function. The serializer is expected to produce bytes, which can then be packaged with an appropriate header and returned.

We elected to produce an HTTP Content-Length header. This isn't required, but it's polite for large downloads. Because we decided to emit this header, we are forced to materialize the results of the serialization so that we can count the bytes.

If we elected to omit the Content-Length header, we could change the structure of this application dramatically. Each serializer could be changed to a generator function, which would yield bytes as they are produced. For large datasets, this can be a helpful optimization. For the user watching a download, however, it might not be so pleasant because the browser can't display how much of the download is complete.
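As a sketch of that alternative, a generator-based CSV serializer might look like the following; the serialize_csv_iter name is ours, and it assumes the same Pair rows used throughout this chapter:

def serialize_csv_iter(series, data):
    """Yield encoded CSV chunks lazily instead of materializing the content."""
    yield "x,y\r\n".encode("utf-8")
    for row in data:
        yield "{0},{1}\r\n".format(row.x, row.y).encode("utf-8")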

All errors are treated as a 404 NOT FOUND error. This could be misleading, since a number of individual things might go wrong. A more sophisticated error handling would provide more try:/except: blocks to provide more informative feedback.

For debugging purposes, we've provided a Python stack trace in the resulting web page. Outside the context of debugging, this is a very bad idea. Feedback from an API should be just enough to fix the request and nothing more. A stack trace provides too much information to potentially malicious users.

Getting raw data

The raw_data() function is largely copied from Chapter 3, Functions, Iterators, and Generators. We included some important changes. Here's what we're using for this application:

from Chapter_3.ch03_ex5 import series, head_map_filter, row_iter, Pair

def raw_data():
    """
    >>> raw_data()['I'] #doctest: +ELLIPSIS
    (Pair(x=10.0, y=8.04), Pair(x=8.0, y=6.95), ...
    """
    with open("Anscombe.txt") as source:
        data = tuple(head_map_filter(row_iter(source)))
    mapping = dict(
        (id_str, tuple(series(id_num, data)))
        for id_num, id_str in enumerate(['I', 'II', 'III', 'IV'])
    )
    return mapping

We opened the local data file, and applied a simple row_iter() function to return each line of the file parsed into a row of separate fields. We applied the head_map_filter() function to remove the heading from the file. The result created a tuple-of-tuple structure with all of the data.

We transformed the tuple-of-tuple into a more useful dict() function by selecting particular series from the source data. Each series will be a pair of columns. For series "I," it's columns 0 and 1. For series "II," it's columns 2 and 3.

We used the dict() function with a generator expression for consistency with the list() and tuple() functions. While it's not essential, it's sometimes helpful to see the similarities with these three data structures and their use of generator expressions.

The series() function creates the individual Pair objects for each x,y pair in the dataset. In retrospect, we can see that this function could be improved by making the resulting namedtuple class an argument to this function, rather than an implicit feature of it. We'd prefer a series(id_num, Pair, data) signature that makes it clear where the Pair objects are created. This extension requires rewriting some of the examples in Chapter 3, Functions, Iterators, and Generators. We'll leave that as an exercise for the reader.

The important change here is that we're showing the formal doctest test case. As we noted earlier, web applications—as a whole—are difficult to test. The web server must be started and then a web client must be used to run the test cases. Problems then have to be resolved by reading the web log, which can be difficult unless complete tracebacks are displayed. It's much better to debug as much of the web application as possible using ordinary doctest and unittest testing techniques.

Applying a filter

In this application, we're using a very simple filter. The entire filter process is embodied in the following function:

def anscombe_filter(set_id, raw_data):
    """
    >>> anscombe_filter("II", raw_data()) #doctest: +ELLIPSIS
    (Pair(x=10.0, y=9.14), Pair(x=8.0, y=8.14), Pair(x=13.0, y=8.74), ...
    """
    return raw_data[set_id]

We made this trivial expression into a function for three reasons:

· The functional notation is slightly more consistent and a bit more flexible than the subscript expression

· We can easily expand the filtering to do more

· We can include separate unit tests in the docstring for this function

While a simple lambda would work, it wouldn't be quite as convenient to test.
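For contrast, the lambda version is a single line; it behaves identically, but has nowhere to hold a doctest:

anscombe_filter = lambda set_id, raw_data: raw_data[set_id]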

For error handling, we've done exactly nothing. We've focused on what's sometimes called the "happy path:" an ideal sequence of events. Any problems that arise in this function will raise an exception. The WSGI wrapper function should catch all exceptions and return an appropriate status message and error response content.

For example, it's possible that the set_id argument will be wrong in some way. Rather than obsess over all the ways it could be wrong, we'll simply allow Python to throw an exception. Indeed, this function follows the Pythonic advice that "it's better to seek forgiveness than to ask permission." This advice is materialized in code by avoiding "permission-seeking": there are no preparatory if statements that seek to qualify the arguments as valid. There is only "forgiveness" handling: an exception will be raised and handled in the WSGI wrapper. This essential advice applies to the preceding raw data and the serialization that we will see now.

Serializing the results

Serialization is the conversion of Python data into a stream of bytes, suitable for transmission. Each format is best described by a simple function that serializes just that one format. A top-level generic serializer can then pick from a list of specific serializers. The picking of serializers leads to the following collection of functions:

serializers = {
    'xml': ('application/xml', serialize_xml),
    'html': ('text/html', serialize_html),
    'json': ('application/json', serialize_json),
    'csv': ('text/csv', serialize_csv),
}

def serialize(format, title, data):
    """json/xml/csv/html serialization.

    >>> data = [Pair(2,3), Pair(5,7)]
    >>> serialize("json", "test", data)
    (b'[{"x": 2, "y": 3}, {"x": 5, "y": 7}]', 'application/json')
    """
    mime, function = serializers.get(format.lower(), ('text/html', serialize_html))
    return function(title, data), mime

The overall serialize() function locates a specific serializer and a specific MIME type that must be used in the response to characterize the results. It then calls one of the specific serializers. We've also shown a doctest test case here. We didn't patiently test each serializer, since showing that one works seems adequate.

We'll look at the serializers separately. What we'll see is that the serializers fall into two groups: those that produce strings and those that produce bytes. A serializer that produces a string will need to have the string encoded as bytes. A serializer that produces bytes doesn't need any further work.

For the serializers, which produce strings, we need to do some function composition with a standard convert-to-bytes. We can do functional composition using a decorator. Here's how we can standardize the conversion to bytes:

from functools import wraps

def to_bytes(function):
    @wraps(function)
    def decorated(*args, **kw):
        text = function(*args, **kw)
        return text.encode("utf-8")
    return decorated

We've created a small decorator named @to_bytes. This will evaluate the given function and then encode the results using UTF-8 to get bytes. We'll show how this is used with JSON, CSV, and HTML serializers. The XML serializer produces bytes directly and doesn't need to be composed with this additional function.

We could also do the functional composition in the initialization of the serializers mapping. Instead of decorating the function definition, we could decorate the reference to the function object:

serializers = {
    'xml': ('application/xml', serialize_xml),
    'html': ('text/html', to_bytes(serialize_html)),
    'json': ('application/json', to_bytes(serialize_json)),
    'csv': ('text/csv', to_bytes(serialize_csv)),
}

Though this is possible, this doesn't seem to be helpful. The distinction between serializers that produce strings and those that produce bytes isn't an important part of the configuration.

Serializing data into the JSON or CSV format

The JSON and CSV serializers are similar functions because both rely on Python's libraries to serialize. The libraries are inherently imperative, so the function bodies are strict sequences of statements.

Here's the JSON serializer:

import json

@to_bytes
def serialize_json(series, data):
    """
    >>> data = [Pair(2,3), Pair(5,7)]
    >>> serialize_json("test", data)
    b'[{"x": 2, "y": 3}, {"x": 5, "y": 7}]'
    """
    obj = [dict(x=r.x, y=r.y) for r in data]
    text = json.dumps(obj, sort_keys=True)
    return text

We created a list of dictionaries structure and used the json.dumps() function to create a string representation. The JSON module requires a materialized list object; we can't provide a lazy generator function. The sort_keys=True argument value is essential for unit testing. However, it's not required for the application and represents a bit of overhead.

Here's the CSV serializer:

import csv
import io

@to_bytes
def serialize_csv(series, data):
    """
    >>> data = [Pair(2,3), Pair(5,7)]
    >>> serialize_csv("test", data)
    b'x,y\\r\\n2,3\\r\\n5,7\\r\\n'
    """
    buffer = io.StringIO()
    wtr = csv.DictWriter(buffer, Pair._fields)
    wtr.writeheader()
    wtr.writerows(r._asdict() for r in data)
    return buffer.getvalue()

The CSV module's readers and writers are a mixture of imperative and functional elements. We must create the writer, and properly create headings in a strict sequence. We've used the _fields attribute of the Pair namedtuple to determine the column headings for the writer.

The writerows() method of the writer will accept a lazy generator function. In this case, we used the _asdict() method of each Pair object to return a dictionary suitable for use with the CSV writer.

Serializing data into XML

We'll look at one approach to XML serialization using the built-in libraries. This will build a document from individual tags. A common alternative approach is to use Python introspection to examine and map Python objects and class names to XML tags and attributes.

Here's our XML serialization:

import xml.etree.ElementTree as XML

def serialize_xml(series, data):
    """
    >>> data = [Pair(2,3), Pair(5,7)]
    >>> serialize_xml("test", data)
    b'<series name="test"><row><x>2</x><y>3</y></row><row><x>5</x><y>7</y></row></series>'
    """
    doc = XML.Element("series", name=series)
    for row in data:
        row_xml = XML.SubElement(doc, "row")
        x = XML.SubElement(row_xml, "x")
        x.text = str(row.x)
        y = XML.SubElement(row_xml, "y")
        y.text = str(row.y)
    return XML.tostring(doc, encoding='utf-8')

We created a top-level element, <series>, and placed <row> subelements underneath that top element. Within each <row> subelement, we've created <x> and <y> tags and assigned text content to each tag.

The interface for building an XML document using the ElementTree library tends to be heavily imperative. This makes it a poor fit for an otherwise functional design. In addition to the imperative style, note that we haven't created a DTD or XSD. We have not properly assigned a namespace to our tags. We also omitted the <?xml version="1.0"?> processing instruction that is generally the first item in an XML document.

A more sophisticated serialization library would be helpful. There are many to choose from. Visit https://wiki.python.org/moin/PythonXml for a list of alternatives.

Serializing data into HTML

In our final example of serialization, we'll look at the complexity of creating an HTML document. The complexity arises because in HTML, we're expected to provide an entire web page with some context information. Here's one way to tackle this HTML problem:

import string

data_page = string.Template("""<html><head><title>Series ${title}</title></head><body><h1>Series ${title}</h1><table><thead><tr><td>x</td><td>y</td></tr></thead><tbody>${rows}</tbody></table></body></html>""")

@to_bytes
def serialize_html(series, data):
    """
    >>> data = [Pair(2,3), Pair(5,7)]
    >>> serialize_html("test", data) #doctest: +ELLIPSIS
    b'<html>...<tr><td>2</td><td>3</td></tr>\\n<tr><td>5</td><td>7</td></tr>...'
    """
    text = data_page.substitute(
        title=series,
        rows="\n".join(
            "<tr><td>{0.x}</td><td>{0.y}</td></tr>".format(row)
            for row in data
        )
    )
    return text

Our serialization function has two parts. The first part is a string.Template() object that contains the essential HTML page. It has two placeholders where data can be inserted into the template: ${title} shows where title information is inserted, and ${rows} shows where the data rows are inserted.

The function creates individual data rows using a simple format string. These are joined into a longer string, which is then substituted into the template.

While workable for simple cases like the preceding example, this isn't ideal for more complex result sets. There are a number of more sophisticated template tools to create HTML pages. A number of these include the ability to embed the looping in the template, separate from the function that initializes serialization. Visit https://wiki.python.org/moin/Templating for a list of alternatives.

Tracking usage

Many publicly available APIs require the use of an "API Key". The supplier of the API requests you to sign up and provide an email address or other contact information. In exchange for this, they provide an API Key which activates the API.

The API Key is used to authenticate access. It may also be used to authorize specific features. Finally, it's also used to track usage. This may include throttling requests if an API Key is used too often in a given time period.

The variations in the business models are numerous. For example, use of the API Key is a billable event and charges are incurred. For other businesses, traffic must reach some threshold before payments are required.

What's important is non-repudiation of the use of the API. This, in turn, means creating API Keys that can act as a user's authentication credentials. The key must be difficult to forge and relatively easy to verify.

One easy way to create API Keys is to use a cryptographic random number to generate a difficult-to-predict key string. A small function, like the following, should be good enough:

import random
import base64

rng = random.SystemRandom()

def make_key_1(rng=rng, size=1):
    key_bytes = bytes(rng.randrange(0, 256) for i in range(18*size))
    key_string = base64.urlsafe_b64encode(key_bytes)
    return key_string

We've used the random.SystemRandom class as the class for our secure random number generator. This will seed the generator with the os.urandom() bytes, which assures a reliably unpredictable seed value. We've created this object separately so that it can be reused each time a key is requested. Best practice is to get a number of keys from a generator using a single random seed.

Given some random bytes, we used a base 64 encoding to create a sequence of characters. Using a multiple of three in the initial sequence of random bytes, we'll avoid any trailing "=" signs in the base 64 encoding. We've used the URL safe base 64 encoding, which won't include the "/" or "+" characters in the resulting string, characters that might be confusing if used as part of a URL or query string.
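We can see both properties in a quick interactive check: 18 input bytes yield 24 URL-safe characters with no padding:

>>> import base64
>>> base64.urlsafe_b64encode(bytes(range(18)))
b'AAECAwQFBgcICQoLDA0ODxAR'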

Note

The more elaborate methods won't lead to more random data. The use of random.SystemRandom assures that no one can counterfeit a key assigned to another user. We're using 18×8 random bits, giving us a large number of random keys.

How many random keys? Take a look at the following command and its output:

>>> 2**(18*8)

22300745198530623141535718272648361505980416

The odds of someone successfully forging a duplicate of someone else's key are small.

Another choice is to use uuid.uuid4() to create a random Universally Unique Identifier (UUID). This will be a 36-character string that has 32 hex digits and four "-" punctuation marks. A random UUID is also difficult to forge. A UUID that includes data such as username or host IP address is a bad idea because this encodes information, which can be decoded and used to forge a key. The reason for using a cryptographic random number generator is to avoid encoding any information.
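A small sketch of that alternative; the make_key_uuid name is our own:

import uuid

def make_key_uuid():
    # A version-4 UUID is built from random bits, not from host or user data.
    return str(uuid.uuid4())  # 36 characters: 32 hex digits and four hyphens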

The RESTful web server will then need a small database with the valid keys and perhaps some client contact information. If an API request includes a key that's in the database, the associated user is responsible for the request. If the API request doesn't include a known key, the request can be rejected with a simple 401 UNAUTHORIZED response. Since the key itself is a 24-character string, the database will be rather small and can easily be cached in memory.
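Such a check fits naturally into the nested WSGI style shown earlier. Here's a sketch that assumes the client sends a hypothetical Api-Key header (which WSGI exposes as HTTP_API_KEY) and that the valid keys are cached in a set:

VALID_KEYS = {"vtneYajZtpvWY0n4GVZW1Ln9"}  # hypothetical keys loaded at startup

def authenticated(app):
    """Wrap a WSGI app so that requests without a known key are rejected."""
    def wrapped(environ, start_response):
        key = environ.get('HTTP_API_KEY', '')
        if key not in VALID_KEYS:
            start_response('401 UNAUTHORIZED', [('Content-Type', 'text/plain')])
            return [b"invalid or missing API key"]
        return app(environ, start_response)
    return wrapped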

Ordinary log-scraping might be sufficient to show the usage for a given key. A more sophisticated application might record API requests in a separate logfile or database to simplify analysis.

Summary

In this chapter, we looked at ways in which we can apply functional design to the problem of serving content with REST-based web services. We looked at the ways that the WSGI standard leads to somewhat functional overall applications. We also looked at how we can embed a more functional design into a WSGI context by extracting elements from the request for use by our application functions.

For simple services, the problem often decomposes down into three distinct operations: getting the data, searching or filtering, and then serializing the results. We tackled this with three functions: raw_data(), anscombe_filter(), and serialize(). We wrapped these functions in a simple WSGI-compatible application to divorce the web services from the "real" processing around extracting and filtering the data.

We also looked at the way that web services functions can focus on the "happy path" and assume that all of the inputs are valid. If inputs are invalid, the ordinary Python exception handling will raise exceptions. The WSGI wrapper function will catch the errors and return appropriate status codes and error content.

We avoided looking at more complex problems associated with uploading data or accepting data from forms to update a persistent data store. These are not significantly more complex than getting data and serializing the results, and they are already well solved by existing frameworks.

For simple queries and data sharing, a small web service application can be helpful. We can apply functional design patterns and assure that the website code is succinct and expressive. For more complex web applications, we should consider using a framework that handles the details properly.

In the next chapter, we'll look at a few optimization techniques that are available to us. We'll expand on the @lru_cache decorator from Chapter 10, The Functools Module. We'll also look at some other optimization techniques that were presented in Chapter 6, Recursions and Reductions.