Building Applications - eXist: A NoSQL Document Database and Application Platform (2015)


Chapter 9. Building Applications

eXist is a great platform not only for storing and querying XML data, but also for building web applications. Returning pages in (X)HTML is a breeze, since XHTML is XML and XML is eXist’s bread and butter. Conveniently, because your data is in XML and the programming language is XML-aware, there is no “impedance” between the layers (data storage, processing, presentation). Piece of cake, isn’t it?

Well, maybe a marketing department (if one existed) would try to sell you this high-tech fairy tale. But as we all know, programming is hard work, whatever your environment or toolset. Silver-bullet environments don’t exist.

However, that being said, eXist does represent an excellent platform on which to build web applications. For applications that work with XML (or can be redesigned to work with XML), increases in development speed and decreases in code base size are often realized when compared to undertaking the same projects in, for example, Java or PHP. It’s not a panacea, but it can certainly help you in specific situations.

Overview

A web application is an application that is accessed over the Internet (or an intranet) and uses a web browser as its client. URLs passed from the browser are mapped to something on the server (a static page, a script that accesses a database, a CSS file, an image, etc.). Together this constitutes some meaningful functionality to the application’s users, such as accounting, playing games, or making friends. An application residing on a server typically consists of multiple files/resources. Some of these are scripts that, once executed, do something like querying or updating a database. Others are static, like CSS files and images. Another category is internal data, which is not shown directly to the user but used internally for configuration, lookup tables, and more.

Which Technology to Use?

Perhaps somewhat confusingly, eXist has two very different technologies for mapping URLs in HTTP requests to functionality (e.g., executing XQuery or XSLT code or retrieving a particular resource). They are URL rewriting and RESTXQ:

§ URL rewriting works by intercepting every HTTP request and passing it first through a centralized controller XQuery script that you provide. This script decides what to do with the request and either passes it on to another specific XQuery script, performs a redirect, or rejects it. Read more about this in “URL Mapping Using URL Rewriting”.

§ RESTXQ works with XQuery 3.0 function annotations that tell eXist the function that must be executed when certain HTTP requests come along. Read more about this in “Building Applications with RESTXQ”.

URL rewriting is the older and more mature of the two. RESTXQ is younger and easier to use, but might not provide all the necessary functionality yet. The approaches cannot be mixed!

So, which one to choose? RESTXQ is probably the simpler approach to get started with, and also the more platform-independent choice. However, there are some limitations in RESTXQ at the moment:

§ RESTXQ allows little nondeclarative access to the HTTP request: the eXist HTTP request module is not supported.

§ RESTXQ has no session (in other words, the eXist session module is not supported).

§ There is no support for processing HTTP multipart requests or responses in RESTXQ.

Therefore, you can handle more advanced tasks with URL rewriting (currently). However, few people need such advanced functionality, and missing features are likely to be added to RESTXQ in the near future.

Application Aspects

A web application is a multifaceted thing, and there are many aspects you have to deal with to make everything run smoothly. This is the case for all web technologies, and eXist is no exception. What are the eXist-specific aspects we have to talk about? In this chapter, we’ll explore the following topics:

Where to store your files/resources

Older versions of eXist gave you the choice of storing the application’s files/resources in the database or on the filesystem. Although the option of using the filesystem still exists, using the database is absolutely preferred. See “Where to Store Your Application?”.

The URL rewriting mapping mechanism

How does the URL rewriting mechanism map HTTP requests to functionality? See “URL Mapping Using URL Rewriting”.

Cleaning up URLs for URL rewriting

Up to now we have only seen ugly URLs like http://localhost:8080/exist/rest/db/myapp/. For a real application, you probably want to change these into something like http://www.myapp.com/. How to achieve this with URL rewriting is described in “Changing the URL for URL Rewriting”.

Requests, sessions, and responses

Inside your application you’ll want to inspect the requests, keep data alive between requests, and control the server’s responses. Information about this can be found in “Requests, Sessions, and Responses”.

Security for applications

How to handle the user base and add an extra layer of security using eXist’s native mechanisms is described in “Application Security”.

Global error pages

How to create specific pages that handle HTTP 400 (bad request), 404 (not found), and other response error codes is explained in “Global Error Pages”.

Using RESTXQ

RESTXQ is substantially different from URL rewriting. You can read more about it in “Building Applications with RESTXQ”.

Packaging

How to use the eXist packaging mechanism to easily distribute your application is described in “Packaging”.

Getting Started, Quickly?

This chapter will teach you the basic mechanisms for eXist applications: how they work and how to customize them to your needs. However, this might be too much information if you want to write an application quickly and are not really interested in what lies underneath.

In that case, we advise you to use the application framework of eXide (eXist’s built-in IDE). This allows you to quickly set up small- to medium-sized applications without any fuss. Please refer to “eXide” for more information.

Where to Store Your Application?

A real-world application consists of two parts. The first is its data, which for eXist is always stored in the database. The second is the application itself, which consists of many files (not only XQuery scripts but also images, stylesheets, static HTML pages, and more). In times past (v1.4 and before), eXist gave you a choice of where to store these files: either on the filesystem (in a subdirectory of $EXIST_HOME/webapp, such as $EXIST_HOME/webapp/myapp) or in the database.

Although the option to use the filesystem for your application is still supported, it has been deprecated for some time and should not be used. Security and deployment are the major drivers for this, and additionally, applications stored in the database are part of standard database backups. The database’s security system lets you tag certain resources for general usage and others for use by specific users and/or groups. This provides an extra layer of security on top of what is built into the application logic. Deployment on a live database is a breeze; a simple backup/restore does the trick. If you need more functionality, packaging mechanisms are available (see “Packaging”). There are several tools that make working with resources in the database feel almost exactly like working with files on the filesystem (as you’ll read about in Chapter 14). RESTXQ (see “Building Applications with RESTXQ”) only works from within the database. The only remaining issue is version control, but with a bit of scripting that can be solved too (for instance, using eXist Ant scripting, as described in “Ant and eXist”).

So, develop your application in the database and reap the benefits of the security and deployment mechanisms offered. Do not use the filesystem (any longer).

URL Mapping Using URL Rewriting

URL mapping is all about providing meaningful URLs to your users and keeping them consistent. For instance, in a wiki, you might want the user to visit a subject with a URL like http://…/wiki/subjectname. But, of course, there will not be an XQuery script for every subject. Likely, there will be a single script handling all subjects based on some parameter, like in http://…/wiki/handlepage?subject=subjectname.

eXist’s oldest and most mature mechanism for mapping URLs to functionality is called URL rewriting. URL rewriting works by intercepting the HTTP request and passing control to a single entry point. This entry point is an XQuery script, always called controller.xql. Because it is XQuery, you can do whatever you want in it. It must return an XML fragment that describes what eXist should do next.

Anatomy of a URL Rewriting-Based Application

This section will take you through the anatomy of a mini demo application that uses URL rewriting. Although tiny, it shows you the important components and characteristics.

The demo application is part of the example code for this book; if you have installed that correctly you can start it with this URL (don’t forget the terminating slash): http://localhost:8080/exist/apps/exist-book/building-applications/mini-application/.

NOTE

Don’t let the URL format annoy you. We’ll talk about creating more user-friendly URLs soon, in “Changing the URL for URL Rewriting”.

You should see something like Figure 9-1.

Figure 9-1. The home screen of our example mini application

Notice that the URL visible in your browser changed and now ends in /home. It brings up a page generated by an XQuery script, but strangely enough, the URL doesn’t end in .xq.

Typing your name and pressing Submit brings up a similar “Hello <name>” screen; nothing particularly fancy is going on. So what makes this a typical eXist URL rewriting application?

A URL rewriting-based application has a central XQuery script as a single point of entrance for all requests. This script is always called controller.xql and is located in the root collection of your application. For our example application, whose controller is shown in Example 9-1, it is stored in /db/apps/exist-book/building-applications/mini-application/controller.xql.

Example 9-1. The URL rewriting controller code for the example application

xquery version "1.0" encoding "UTF-8";

(:~
 : Example URL Rewriting Controller
 :)

(: External variables available to the controller: :)
declare variable $exist:path external;
declare variable $exist:resource external;
declare variable $exist:controller external;

(: Other variables :)
declare variable $home-page-url := "home";

(: Function to get the extension of a filename: :)
declare function local:get-extension($filename as xs:string) as xs:string {
    let $name := replace($filename, ".*[/\\]([^/\\]+)$", "$1")
    return
        if (contains($name, "."))
        then replace($name, ".*\.([^\.]+)$", "$1")
        else ""
};

(: If there is no resource specified, go to the home page.
   This is a redirect, forcing the browser to perform a redirect. So this request
   will pass through the controller again... :)
if ($exist:resource eq "") then
    <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
        <redirect url="{$home-page-url}"/>
    </dispatch>

(: Check if there is no extension. If not, assume it is an XQuery file and forward
   to this. Because we use forward here, the browser will not be informed of the
   change and the user will still see a URL without a .xq extension. :)
else if (local:get-extension($exist:resource) eq "") then
    <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
        <forward url="{concat($exist:controller, $exist:path, ".xq")}"/>
    </dispatch>

(: Anything else, pass through: :)
else
    <ignore xmlns="http://exist.sourceforge.net/NS/exist">
        <cache-control cache="yes"/>
    </ignore>

This example is intended to give you a first rough idea about what’s going on. We’re going to talk about URL rewriting in detail later, but here are some important characteristics:

§ Notice that the result of the script is always an XML fragment in an eXist-specific namespace. This fragment determines what eXist will do next. For instance, it may redirect the browser to another page, or silently (and invisibly to the user) forward to some other URL. The examples here are quite simple, but you can do some amazingly complex things, like pipelining results through XSLT stylesheets.

§ The top of the script declares a number of external variables. eXist uses these to pass important information about the call to the script.

§ There is a function that extracts the extension part from a filename (e.g., xq from home.xq), by using a regular expression. This is, of course, not unique to URL rewriting, but you’ll often see regular expressions in a controller.xql inspecting parts of the URL.

§ The main part of the script examines the request by inspecting the external variables. The first if clause determines if the URL contains a resource name. If not, it redirects the browser to the home page. This redirect is visible to the user, because the URL shown in the browser changes. So again, an HTTP request travels through our URL rewriting controller, but now with /home appended.

§ The second if clause checks whether the resource part of the URL has an extension (like .xq or .png). If not, it assumes that it is an XQuery script and forwards the request, invisibly to the browser, to an XQuery script in the database. So, for example, the second time around, after /home has been appended to the URL, home is interpreted as home.xq, and therefore home.xq is called.

§ The last else is a catchall that passes on the full URL to the appropriate handler, necessary for displaying images and more.
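If you want to see what the regular expressions in local:get-extension actually return, you can run a standalone copy of the function (taken unchanged from Example 9-1) in eXide or any other XQuery processor. The sample filenames are just illustrations. Note that "v1.0/readme" yields no extension, because the first replace strips the directory part before the function looks for a dot.

```xquery
xquery version "1.0";

(: Standalone copy of local:get-extension from Example 9-1 :)
declare function local:get-extension($filename as xs:string) as xs:string {
    (: Keep only the part after the last / or \ :)
    let $name := replace($filename, ".*[/\\]([^/\\]+)$", "$1")
    return
        if (contains($name, "."))
        (: Greedy .* makes this pick the text after the LAST dot :)
        then replace($name, ".*\.([^\.]+)$", "$1")
        else ""
};

(: Inputs without an extension ("home") are what Example 9-1 forwards to .xq :)
for $f in ("home", "home.xq", "images/existdb.png", "archive.tar.gz", "v1.0/readme")
return concat($f, " -> '", local:get-extension($f), "'")
```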

You’ll also want to look at the security settings of the application files. You can view these by, for instance, using eXist’s Java Admin Client:

§ Other users have execute permissions for controller.xql. This will always be the case for a URL rewriting controller, because it is the general entry point to your application. Without this permission, you’ll see an error message like this:

Subject 'guest' does not have '--------x' access to resource '/db/apps/exist-book/building-applications/mini-application/controller.xql'

§ For our application, other users have execute permissions for other XQuery files too, so anybody can use them.

§ Other users also have execute permissions for the images subcollection. If you were to revoke this permission, the browser would not be able to load the eXist logo.

§ Finally, other users have read permissions to view the eXist logo in the images/existdb.png file. This makes it viewable by everyone.

For more restricted applications, you could limit these execute and other permissions to certain database users and/or groups. It’s easy to provide the user with a login page that changes its database identity, allowing you to fine-tune access. See “Application Security” for more about this.

Our last comment on this example involves inspecting the request and passing parameters. When you have a look at the code of the hello.xq file, you can see that it calls the eXist extension function request:get-parameter to read the name of the person invoking the request:

<p>Hello <i>{request:get-parameter('personname', '?')}</i></p>

The request extension module can get you a lot more information; see “The request Extension Module”.

How eXist Finds the Controller

To test a URL controller, you can use the URL http://localhost:8080/exist/apps/<pathtoyourapp>, as in the beginning of the previous section. We’ll do a sneak preview here of information to come (in “The controller-config.xml Configuration File”) to make sure you understand how this works and how eXist finds the controller:

eXist has a configuration file called $EXIST_HOME/webapp/WEB-INF/controller-config.xml. In it are entries like this:

<root pattern="/apps" path="xmldb:exist:///db/apps"/>

When you request a page that starts with http://localhost:8080/exist/apps/, the following happens:

§ Jetty recognizes the eXist prefix /exist and passes control to the eXist main servlet.

§ This servlet sees a URL starting with /apps. It tries to match this with an entry in controller-config.xml.

§ If a match is found, the value of its path attribute is used as the starting point for locating a controller. So, in this case, eXist will look for a controller underneath xmldb:exist:///db/apps.

§ To find the controller, eXist uses the rest of the URL. It starts at the most specific path and works backward until it finds a controller.

So, for instance, a URL ending with /apps/myapp/a/b/c.xq will have eXist looking for a controller.xql file in /db/apps/myapp/a/b, /db/apps/myapp/a, and finally in /db/apps/myapp (where it will most probably be).

§ If eXist does not find a controller, it uses the URL as a path into the database and tries to find a matching resource.
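The search order described above can be modeled as a small XQuery function. This is only a sketch of the behavior as described here, not eXist’s actual (Java) implementation. The function name local:controller-candidates is hypothetical. The sketch also assumes three things: the path starts with a slash, the last path step names the requested resource rather than a collection, and the root collection itself is checked last.

```xquery
xquery version "1.0";

(:~ Hypothetical model of the controller search order.
    $root: the path attribute from controller-config.xml (database part only)
    $path: the rest of the URL, starting with a slash
    Returns candidate controller.xql locations, most specific first. :)
declare function local:controller-candidates($root as xs:string,
        $path as xs:string) as xs:string* {
    (: Strip the last step, which is assumed to name the resource :)
    let $collection := replace($path, "/[^/]*$", "")
    let $dirs := tokenize($collection, "/")[. ne ""]
    (: Walk backward from the deepest collection up to $root itself :)
    for $i in reverse(0 to count($dirs))
    return string-join(($root, subsequence($dirs, 1, $i), "controller.xql"), "/")
};

local:controller-candidates("/db/apps", "/myapp/a/b/c.xq")
```

For the URL from the example, this lists /db/apps/myapp/a/b, /db/apps/myapp/a, /db/apps/myapp, and finally /db/apps, each with controller.xql appended.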

Only one controller will be applied to a given request. It is not possible to pass control from one controller to another (or back to the same).

However, be aware that when your controller asks for a redirect (using the redirect element, as discussed in “Redirecting the request”), the browser will fire a new request and the whole circus of finding and possibly running a controller will start again. This creates the potential for redirect loops, so be careful!
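One common way to avoid such a loop is to make sure the redirect target never triggers the same redirect again. The sketch below assumes a hypothetical application in which every page except login requires a logged-in user. The $logged-in flag stands in for whatever check your application really performs (see “Application Security”), and local:require-login is an illustrative name, not an eXist function. In a real controller you would pass in $exist:controller and $exist:resource.

```xquery
xquery version "1.0";

(:~ Hypothetical routing rule: send anonymous users to the login page.
    The test on "login" prevents an endless redirect loop. The redirected
    request comes back through the controller, but now with login as its
    resource, so it is forwarded instead of being redirected again. :)
declare function local:require-login($controller as xs:string,
        $resource as xs:string, $logged-in as xs:boolean) as element() {
    if (not($logged-in) and $resource ne "login") then
        <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
            <redirect url="login"/>
        </dispatch>
    else
        <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
            <forward url="{concat($controller, '/', $resource, '.xq')}"/>
        </dispatch>
};

local:require-login("/myapp", "orders", false())
```

Drop the `$resource ne "login"` condition and the browser would bounce between redirects to login forever.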

The URL Rewriting Controller’s Environment

The URL rewriting controller in controller.xql gets information about the request through five external variables. You do not need to explicitly declare them, but if you do it should look like this:

declare variable $exist:path external;
declare variable $exist:resource external;
declare variable $exist:controller external;
declare variable $exist:prefix external;
declare variable $exist:root external;

NOTE

Besides using the special controller external variables, you can also use the functions in the request extension module (see “The request Extension Module”) to find out more about the request and the URL.

If you want to play with these variables, the collection /db/apps/exist-book/building-applications/show-controller-variables contains an example that passes the values of the external variables to the show-controller-variables.xq script, which displays them on an HTML page.

You can use this little application to inspect the values of the URL rewriting controller’s external variables. Use your browser to visit http://localhost:8080/exist/apps/exist-book/building-applications/show-controller-variables/<any-path-you-like>, and the values of the variables will be displayed.

Here are the definitions of the variables. For the examples, we assume you have browsed to http://localhost:8080/exist/apps/exist-book/building-applications/show-controller-variables/a/b/c.xq:

$exist:path

The part of the URL after the part that led to the controller, including its leading slash. For example: /a/b/c.xq.

$exist:resource

The part of the URL after the last / character, usually pointing to a resource. For example: c.xq.

$exist:controller

The part of the URL leading from the prefix (see below) to the controller script. For example: /exist-book/building-applications/show-controller-variables.

$exist:prefix

The URL prefix that caused the URL rewriting controller to become active. This is defined in the controller-config.xml configuration file. For example: /apps.

$exist:root

The root path used for finding the controller, as defined in the controller-config.xml configuration file. This path can be on the filesystem or in the database. In our example it is xmldb:exist:///db/apps.

Figure 9-2 summarizes all this.

Figure 9-2. Going from a URL to controller variables
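The relationship between $exist:path and $exist:resource can be sketched as plain string manipulation. The function below is hypothetical and only mirrors the definitions above; inside a real controller, eXist computes these values for you. It assumes that the part of the URL after the /exist context has already been isolated. Note that the path keeps its leading slash, which is why concat($exist:controller, $exist:path, ".xq") in Example 9-1 yields a valid path.

```xquery
xquery version "1.0";

(:~ Hypothetical: derive the path and resource values from the request
    path (the part after /exist), the prefix, and the controller path. :)
declare function local:controller-vars($request-path as xs:string,
        $prefix as xs:string, $controller as xs:string) as element(vars) {
    let $path := substring-after($request-path, concat($prefix, $controller))
    (: Whatever follows the last slash; empty for URLs ending in a slash :)
    let $resource := tokenize($path, "/")[last()]
    return <vars path="{$path}" resource="{$resource}"/>
};

local:controller-vars(
    "/apps/exist-book/building-applications/show-controller-variables/a/b/c.xq",
    "/apps",
    "/exist-book/building-applications/show-controller-variables")
```

The empty resource for a URL ending in a slash is exactly the case that the first if clause of Example 9-1 catches.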

The Controller’s Output XML Format

A URL rewriting controller must output an XML fragment. This fragment determines what eXist will do next.

Ignoring the request

If you don’t want the controller to do anything and simply pass the request on for normal processing, either output nothing or use an ignore element. Skipping any URL rewriting is mostly used for “miscellaneous” requests, like for images or stylesheets. The format is:

<ignore xmlns="http://exist.sourceforge.net/NS/exist">
    cache-control?
</ignore>

Cache control is explained in “URL rewrite caching”. When a request is ignored, cache control is usually on.
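A typical use is to let static files through untouched. The sketch below is a variant of Example 9-1 that uses an explicit list of static extensions instead of the "has no extension" test. The extension list and the function name local:dispatch-static are illustrative assumptions. In a real controller, you would pass in $exist:controller, $exist:path, and $exist:resource.

```xquery
xquery version "1.0";

(:~ Hypothetical controller rule: pass static files straight through
    (with URL rewrite caching on) and forward anything else to an
    XQuery script of the same name. :)
declare function local:dispatch-static($controller as xs:string,
        $path as xs:string, $resource as xs:string) as element() {
    (: Case-insensitive check on a few common static file extensions :)
    if (matches($resource, "\.(css|js|png|jpg|gif|ico)$", "i")) then
        <ignore xmlns="http://exist.sourceforge.net/NS/exist">
            <cache-control cache="yes"/>
        </ignore>
    else
        <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
            <forward url="{concat($controller, $path, '.xq')}"/>
        </dispatch>
};

local:dispatch-static("/myapp", "/images/existdb.png", "existdb.png")
```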

Redirecting the request

If you want the controller to redirect the client to another URL, use a dispatch element with a redirect child element. This will cause the client to issue a new request, potentially triggering the controller again. The format is:

<dispatch xmlns="http://exist.sourceforge.net/NS/exist">
    redirect
    cache-control?
</dispatch>

The redirect element is defined as:

<redirect url = string />

Cache control is explained in “URL rewrite caching”.

Forwarding the request

If you want the request forwarded to a specific resource on the server, use a dispatch element with a forward child. The format is:

<dispatch xmlns="http://exist.sourceforge.net/NS/exist">
    <forward url = string                                      (1)
             servlet? = string                                 (2)
             absolute? = "yes" | "no"                          (3)
             method? = "POST" | "GET" | "PUT" | "DELETE" >     (4)
        ( add-parameter | set-attribute | clear-attribute | set-header )*
    </forward>
</dispatch>

(1) url directs the request to a new request path. This is equivalent to directly requesting this path, but without a controller present.

A relative path will be resolved relative to the original request path.

An absolute path will be resolved relative to the path that triggered the controller. For example, if the original URL started with http://localhost:8080/exist/apps/… and a forward was done to /ui/login.xq, the resulting request would be to http://localhost:8080/exist/apps/ui/login.xq.

(2) servlet passes control to another servlet. Read more about this in “Advanced URL Control”.

(3) If absolute is set to "yes", the url attribute is interpreted as a path on the filesystem, relative to the $EXIST_HOME/webapp directory, even when the controller is stored in the database. The default is "no".

For instance, <forward url="/extra/admin.xq" absolute="yes"/> will forward control to $EXIST_HOME/webapp/extra/admin.xq.

(4) method sets the HTTP method to use when passing the request to the next step in a pipeline. More about pipelines can be found in “Advanced URL Control”. The default is "POST".
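The two resolution rules for the url attribute can be paraphrased in a few lines of XQuery. This is a model of the rules as stated above, not eXist’s implementation. The function name local:resolve-forward and the way the triggering prefix is passed in are assumptions made for illustration.

```xquery
xquery version "1.0";

(:~ Hypothetical model of how a forward url is resolved.
    $prefix:       the path that triggered the controller (e.g., /apps)
    $request-path: the original request path after /exist
    $url:          the value of the forward element's url attribute :)
declare function local:resolve-forward($prefix as xs:string,
        $request-path as xs:string, $url as xs:string) as xs:string {
    if (starts-with($url, "/")) then
        (: Absolute: relative to the path that triggered the controller :)
        concat($prefix, $url)
    else
        (: Relative: replace the last step of the original request path :)
        concat(replace($request-path, "/[^/]*$", "/"), $url)
};

local:resolve-forward("/apps", "/apps/myapp/home", "/ui/login.xq")
```

With these inputs, the absolute url /ui/login.xq ends up under /apps, matching the login.xq example above. A relative home.xq would instead land next to the original request, in /apps/myapp/.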

A forward element can contain the following additional children:

§ The add-parameter element lets you add or override a request parameter:

<add-parameter name = string value = string />

§ The set-attribute element sets a request attribute:

<set-attribute name = string value = string />

You can inspect request attributes through eXist’s request extension module. There’s more about request attributes and how they differ from parameters in “The request Extension Module”.

§ The clear-attribute element clears a request attribute:

<clear-attribute name = string />

In rare circumstances this is necessary when constructing a pipeline. Read more about pipelines in “Advanced URL Control”.

§ The set-header element sets an HTTP header field:

<set-header name = string value = string />
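Going back to the wiki example at the start of “URL Mapping Using URL Rewriting”, a forward combined with add-parameter is enough to map http://…/wiki/subjectname onto a single script. In this sketch, the script name handlepage.xq, the parameter name subject, and the function name local:wiki-dispatch are hypothetical. The controller is assumed to live in the wiki collection, so the subject name arrives as $exist:resource.

```xquery
xquery version "1.0";

(:~ Hypothetical wiki rule: every .../wiki/<subject> URL is forwarded to one
    script, handlepage.xq, which receives the subject as a request parameter. :)
declare function local:wiki-dispatch($controller as xs:string,
        $resource as xs:string) as element() {
    <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
        <forward url="{concat($controller, '/handlepage.xq')}">
            <add-parameter name="subject" value="{$resource}"/>
        </forward>
    </dispatch>
};

local:wiki-dispatch("/wiki", "subjectname")
```

Inside handlepage.xq, request:get-parameter('subject', '') would then return the subject name, exactly as if the user had requested handlepage?subject=subjectname.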

URL rewrite caching

You can enable URL rewrite caching by adding a cache-control child element to the dispatch element:

<cache-control cache = "yes" | "no" />

Setting the cache attribute to "yes" adds an entry for the dispatch rule to an internal map and prevents the controller from being triggered again for the input URL. For instance:

<dispatch xmlns="http://exist.sourceforge.net/NS/exist">
    <redirect url="home"/>
    <cache-control cache="yes"/>
</dispatch>

NOTE

URL rewrite caching has nothing to do with HTTP caching; only the dispatch rule is cached, not the response.

Advanced URL Control

URL rewriting is capable of more than just passing on or redirecting a request. It can also pass on the results of a forwarded request to a pipeline (a.k.a. sequence or view) of additional processing steps (usually XQuery and/or XSLT scripts).

The most common use case for this is probably the Model-View-Controller or MVC pattern, separating the application logic from its presentation. In the case of URL rewriting, controller.xql is the controller in the MVC pattern. Then we create an XML document, describing the contents of the response (but not its presentation). This becomes the model in the MVC pattern. Subsequent processing steps add the presentation to this, usually by transforming it to (X)HTML. This is the view in the MVC pattern.

URL rewriting allows you to specify such actions in the XML fragment output of the URL rewriting controller. To do this, add a view element after the forward element, containing the additional processing steps.

Here is a simple example of such an XML fragment. You can see this example in action by browsing to http://localhost:8080/exist/apps/exist-book/building-applications/views/:

<dispatch xmlns="http://exist.sourceforge.net/NS/exist">
    <forward url="{concat($exist:controller, "/createmodel.xq")}"/>
    <view>
        <forward servlet="XSLTServlet">
            <set-attribute name="xslt.stylesheet"
                value="{concat($exist:root, $exist:controller, "/xslt/view1.xslt")}"/>
        </forward>
    </view>
</dispatch>

In this example, the request is first passed to the createmodel.xq script. This creates some XML that is subsequently passed to the view1.xslt XSLT stylesheet for transformation into HTML.

Another example uses two stylesheets in a pipeline:

<dispatch xmlns="http://exist.sourceforge.net/NS/exist">
    <forward url="{concat($exist:controller, "/createmodel.xq")}"/>
    <view>
        <forward servlet="XSLTServlet">
            <set-attribute name="xslt.stylesheet"
                value="{concat($exist:root, $exist:controller, "/xslt/view2a.xslt")}"/>
        </forward>
        <forward servlet="XSLTServlet">
            <set-attribute name="xslt.stylesheet"
                value="{concat($exist:root, $exist:controller, "/xslt/view2b.xslt")}"/>
        </forward>
    </view>
</dispatch>

This will first create XML by calling createmodel.xq. This is passed to the view2a.xslt XSLT stylesheet and processed into something else. Finally, the view2b.xslt XSLT stylesheet transforms the result into HTML.

We pass the name of the stylesheet by setting the xslt.stylesheet request attribute. Notice that we do a bit of filename juggling there: concat($exist:root, $exist:controller, "/xslt/view1.xslt"). This is necessary because stylesheets are expected to be on the filesystem by default. To execute stylesheets from the database, we have to explicitly prepend their paths with xmldb:exist:///db/, and $exist:root starts with this. You can, of course, hardcode this, but eXist passes enough information in the controller variables to build this path dynamically, which somewhat isolates you from possible changes in the future.

eXist has multiple servlets, but the one that is useful in this scenario is the XSLT servlet, named XSLTServlet. It is controlled by means of the following attributes:

xslt.stylesheet

The path and name of the XSLT stylesheet to execute. By default, the filesystem is used. If you want to use a stylesheet stored in the database, prepend this value with xmldb:exist:///db/.

xslt.user, xslt.password

The username and password of a database user, used during execution of the XSLT script when it accesses the database.

xslt.*

Any other attributes starting with xslt. will be passed as stylesheet parameters. For instance, an attribute called xslt.extra will be available to the stylesheet as global parameter $xslt.extra. Not all XDM types are supported, so it’s best to limit yourself to strings.

Changing the URL for URL Rewriting

We’ve only seen ugly URLs for referencing our application so far, like:

http://localhost:8080/exist/rest/db/myapp/

For a resource stored in the database underneath /db/myapp

http://localhost:8080/exist/apps/myapp/

For an application with a URL rewriting controller stored in the database underneath /db/apps/myapp

It’s time to clean up our act and make way for nice URLs like http://localhost/myapp that use port 80 and don’t need the /exist prefix—or even better, use a DNS name like http://www.myapp.com/.

Changing the URL has everything to do with how eXist processes a URL:

§ The Jetty web server is the main receiver of the request. It listens on a certain TCP port (by default, 8080) for HTTP requests. It examines the request and, based on its URL, passes it on to a servlet. The default configuration tells Jetty that all requests with a URL starting with /exist should be passed to the XQueryUrlRewrite servlet, which serves as the central entry point.

§ The XQueryUrlRewrite servlet matches the remainder of the URL (the part after /exist) to entries in the mapping file $EXIST_HOME/webapp/WEB-INF/controller-config.xml. This tells XQueryUrlRewrite what to do: look for a URL rewriting controller somewhere or pass it directly to another servlet.

§ If a URL rewriting controller is involved, it inspects the URL and passes control for further processing (or tells the browser to redirect to another page).

We talked about this last step first (see “URL Mapping Using URL Rewriting”), because it’s so crucial for understanding how applications work in eXist. Now we’re going to talk about the first two stages.

Changing Jetty Settings: Port Number and URL Prefix

The Jetty settings determine the TCP port number used (by default, 8080) and the prefix of the URL (by default, /exist). These settings are configured in $EXIST_HOME/tools/jetty/etc/jetty.xml.

Change TCP port number

To change the TCP port number eXist listens on, find the following entries, change the port numbers, and restart eXist:

<SystemProperty name="jetty.port" default="8080"/>

<SystemProperty name="jetty.port.ssl" default="8443"/>

Be aware that the second entry appears twice!

NOTE

On Unix and Linux systems, only a process running under the root user’s account can open ports beneath 1024. While web servers typically operate on port 80 and/or 443, as discussed in Chapter 8, it is better to run eXist as an unprivileged user (see “Hardening”) and instead reverse proxy eXist through an existing web server (see “Reverse proxying”).

URL prefix

To remove the /exist URL prefix, find the entry <Set name="contextPath">/exist</Set>, change its value to /, and restart eXist.

The controller-config.xml Configuration File

The next step eXist takes is examining the remainder of the URL (the part after the URL prefix, if any). This is done by the XQueryUrlRewrite servlet using the entries in the $EXIST_HOME/webapp/WEB-INF/controller-config.xml file.

NOTE

If you want to, you can change the location of the controller-config.xml file, even to somewhere inside the database. This can be beneficial for security or backup reasons.

Open $EXIST_HOME/webapp/WEB-INF/web.xml and search for the entry that mentions controller-config.xml. It should look like <param-value>WEB-INF/controller-config.xml</param-value>. Change this to, for instance, <param-value>xmldb:exist:///db/controller-config.xml</param-value> and store your controller-config.xml in the /db database collection. Restart eXist.

Example 9-2 is a simplified and annotated version of this file.

Example 9-2. Example controller-config.xml file

<configuration xmlns="http://exist.sourceforge.net/NS/exist">
    <!-- Forward URLs starting with rest or servlet to the REST servlet: -->
    <forward pattern="/(rest|servlet)/" servlet="EXistServlet"/>
    <!-- Patterns starting with /apps should look for a URL rewriting controller: -->
    <root pattern="/apps" path="xmldb:exist:///db/apps"/>
    <!-- My URL www.myapp.com should map to my application stored underneath
         /db/apps/myapp in the database: -->
    <root server-name="www.myapp.com" pattern="/*"
          path="xmldb:exist:///db/apps/myapp/"/>
    <!-- Any other XQuery file: pass it to the XQueryServlet, which by default
         executes it from the filesystem: -->
    <forward pattern=".*\.(xq|xql|xqy|xquery)$" servlet="XQueryServlet"/>
</configuration>

The content of the controller-config.xml file must be in the http://exist.sourceforge.net/NS/exist namespace. The format is:

<configuration xmlns="http://exist.sourceforge.net/NS/exist">
    ( forward | root )+
</configuration>

eXist examines the entries in the controller-config.xml file from top to bottom. The first entry whose pattern attribute (a regular expression) matches the remainder of the URL (the part after /exist) is used.

§ A forward element passes control directly to a given servlet:

<forward pattern = string servlet = string />

§ A root element triggers the URL rewriting controller:

<root pattern = string server-name? = string path = string />

The path attribute tells eXist where to look for the URL rewriting controller, as explained in “How eXist Finds the Controller”. The default location is within the filesystem, but if you want it to point to a location in the database, start its value with xmldb:exist:///db/.

When a server-name attribute is present (e.g., server-name="www.myapp.com"), the server name of the request must match it as well. This allows you to associate a DNS name with your application.
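To make the top-to-bottom, first-match rule concrete, here is a small Python model of how an entry is selected, using the entries of Example 9-2. It is a sketch, not eXist's implementation: we assume that a pattern is matched against the start of the remaining path (as the patterns in Example 9-2 suggest) and that server-name is compared with the host name of the request.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    kind: str                          # "forward" or "root"
    pattern: str                       # regular expression from the pattern attribute
    target: str                        # servlet name (forward) or path (root)
    server_name: Optional[str] = None  # optional server-name attribute (root only)


def select_entry(entries: List[Entry], remainder: str, host: str) -> Optional[Entry]:
    """Return the first entry, top to bottom, that matches the request."""
    for entry in entries:
        # An entry with a server-name only applies to requests for that host.
        if entry.server_name is not None and entry.server_name != host:
            continue
        # Assumption: the pattern is matched from the start of the remainder.
        if re.match(entry.pattern, remainder):
            return entry
    return None  # no entry matched


# The entries of Example 9-2, in document order.
EXAMPLE_9_2 = [
    Entry("forward", r"/(rest|servlet)/", "EXistServlet"),
    Entry("root", r"/apps", "xmldb:exist:///db/apps"),
    Entry("root", r"/*", "xmldb:exist:///db/apps/myapp/", server_name="www.myapp.com"),
    Entry("forward", r".*\.(xq|xql|xqy|xquery)$", "XQueryServlet"),
]

if __name__ == "__main__":
    for remainder, host in [("/rest/db/test.xml", "localhost"),
                            ("/test.xq", "localhost"),
                            ("/test.xq", "www.myapp.com")]:
        entry = select_entry(EXAMPLE_9_2, remainder, host)
        print(remainder, host, "->", entry.target if entry else None)
```

Note how the same /test.xq request ends up in different places depending on the host name: the server-name entry comes before the XQueryServlet entry, so for www.myapp.com it wins.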

Proxying eXist Behind a Web Server

Another way of cleaning up your URLs is to run eXist behind another web server that acts as a proxy. This web server (we’ll use Apache as an example) catches requests for eXist, passes them on, and sends the responses back to the user.

Although this sounds like a bit of a detour, it is actually quite useful in certain situations:

§ Sometimes you’re running on a server in a mixed environment. Besides the eXist application there can be other applications active, based on PHP, CGI, Perl, and more. The easiest way to handle this is to use a workhorse like Apache and proxy eXist behind Apache.

§ Apache is more flexible in configuration than Jetty + eXist and contains more functionality as a web server.

§ System managers are probably more used to running Apache as a frontend than Jetty. They have things like tools and scripts hanging around and know the commands by heart. Keep them happy!

§ Apache is more widely used than Jetty, so there are lots of third-party tools for things like analyzing web traffic.

There is more than one way to handle this, but here is a recipe for proxying an eXist application behind Apache:

1. Leave the Jetty settings at the defaults (i.e., TCP port 8080 and a URL prefix of /exist).

2. Adapt the controller-config.xml file so that the URL to your application points to the right collection (or directory). For example:

<root server-name="www.myapp.com" pattern=".*"
      path="xmldb:exist:///db/myapp/"/>

3. Enable the mod_proxy and mod_proxy_http modules in Apache (plus mod_rewrite, which the RewriteRule in Example 9-3 relies on).

4. Add the configuration shown in Example 9-3 to Apache (this is a minimal example; you’ll probably want to add logging and other functionality).

Example 9-3. Apache configuration for proxying eXist

<VirtualHost *:80>

ServerAdmin your-admin-email@your-domain.com

# The URLs for this application:

ServerName www.myapp.nl

ServerAlias myapp.com

ProxyRequests Off

<Proxy *>

Order deny,allow

Allow from all

</Proxy>

ProxyPass / http://www.myapp.nl:8080/exist/

ProxyPassReverse / http://www.myapp.nl:8080/exist/

# Cookies must be adapted to allow the session mechanism to work:

ProxyPassReverseCookiePath /exist /

ProxyPassReverseCookieDomain localhost myapp.com

RewriteEngine on

RewriteRule ^/(.*)$ /$1 [PT]

</VirtualHost>
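What this configuration does to each request and response can be sketched in a few lines of Python. This is a simplified model, not Apache's implementation. It shows how ProxyPass maps a public path onto the eXist URL, and why the session cookie, which eXist issues with Path=/exist (its context path), has to be rewritten so that the browser sends it back for paths under /.

```python
BACKEND = "http://www.myapp.nl:8080/exist/"  # target of ProxyPass in Example 9-3


def forward_url(path: str, backend: str = BACKEND) -> str:
    """ProxyPass / <backend>: replace the leading "/" of the public path with the backend URL."""
    if not path.startswith("/"):
        raise ValueError("request paths start with '/'")
    return backend + path[1:]


def map_cookie_path(path: str, backend_prefix: str = "/exist", public_prefix: str = "/") -> str:
    """Simplified model of ProxyPassReverseCookiePath /exist /."""
    if path == backend_prefix:
        return public_prefix
    if path.startswith(backend_prefix + "/"):
        return public_prefix.rstrip("/") + path[len(backend_prefix):]
    return path  # cookies for other paths are left alone


def rewrite_set_cookie(header: str) -> str:
    """Rewrite the Path attribute of a Set-Cookie value on its way back to the browser."""
    rewritten = []
    for part in (p.strip() for p in header.split(";")):
        if part.lower().startswith("path="):
            part = "Path=" + map_cookie_path(part[len("path="):])
        rewritten.append(part)
    return ";".join(rewritten)


if __name__ == "__main__":
    print(forward_url("/index.html"))
    print(rewrite_set_cookie("JSESSIONID=bzhhe8x66jqb1wremvc814vah;Path=/exist"))
```

Without the cookie rewrite, the browser would only return the session cookie for URLs starting with /exist, which it never requests through the proxy, so every request would start a new session.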

There was another example of proxying earlier in the book (using the Nginx web server), which focused on security; see “Reverse proxying”.

Requests, Sessions, and Responses

Every interaction with a web application starts with a request from a web client. A request consists of a URL but might also contain, for example, parameters or an uploaded file. In between requests you probably want to keep information for the current user in a session. The answer to a request is called a response, and there are several things you might want to control here too.

To work with requests, sessions, and responses, eXist uses extension modules. This section will provide you with an overview of the functionality found in these modules (for the full details, please refer to the function documentation browser). Along the way we’ll reveal some tips and tricks.

The request Extension Module

All details about an incoming HTTP request can be accessed through the request extension module. This module is really just a very simple XQuery wrapper around the underlying HttpServletRequest Java class that eXist handles for you. For instance:

§ request:get-uri will give you the original URI as received from the client.

§ There are other functions for inspecting details, like request:get-remote-port for checking the TCP port number.

§ The functions request:get-parameter-names and request:get-parameter give you access to the request parameters.

§ request:get-cookie-names and request:get-cookie-value let you access the data stored in cookies.

Request parameters and attributes

If you browse the functions of the request extension module, you might notice that both request parameters and attributes are mentioned:

§ A request parameter is a name/value pair that was passed in from the client—for instance, as part of the URL or as an input field of an HTML form. Parameter values are always strings.

§ A request attribute is a name/value pair that was set on the server. This was most likely done by the URL controller (see “URL Mapping Using URL Rewriting”), but if needed you can do it anywhere in your code using the request:set-attribute function. Attribute values can be anything from simple strings to complex XML fragments.

Request attributes are useful for internal communication between parts of your application code when processing a request. They are also used by some internal mechanisms as parameters to servlets (for an example of this, see “Advanced URL Control”).
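Why parameter values are always strings becomes clear when you look at what the client actually sends: a URL-encoded query string (the same encoding is used for the body of a form posted with the default enctype). The following Python snippet decodes one with the standard library; the URL, the script name search.xq, and the parameter names are made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def query_parameters(url: str) -> dict:
    """Return the request parameters encoded in the query string of a URL."""
    return parse_qs(urlsplit(url).query)


# Hypothetical search URL for an application under /exist/apps/myapp.
params = query_parameters(
    "http://localhost:8080/exist/apps/myapp/search.xq?id=42&tag=xml&tag=nosql")
print(params)  # {'id': ['42'], 'tag': ['xml', 'nosql']}

# Even "42" arrives as text; converting it is up to your code
# (in XQuery you would write something like xs:integer(...)).
item_id = int(params["id"][0])
```

Note also that a parameter can occur more than once, so a single name may carry several values.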

Uploading files

The request extension module can also be used for uploading files to the server. For example, assume you want to upload a binary file to your server and store this in the database. The page that offers this functionality must contain a form with encoding type multipart/form-data, as in this HTML fragment:

<form enctype="multipart/form-data" method="post" action="upload1-process.xq">

<p>Upload binary file:

<input type="file" size="80" name="FileUpload"/>

<br/>

<input type="submit"/>

</p>

</form>

Access to the uploaded file is via the request:get-uploaded-file-data function. You can store the result in the database by using the xmldb:store function, as in this XQuery fragment:

let $stored-file as xs:string? := xmldb:store($store-collection, $store-resource,

request:get-uploaded-file-data($field-name), 'application/octet-stream')

This returns the path of the file as stored in the database. Other functions that might be of interest here are request:get-uploaded-file-name for getting the original file name and request:get-uploaded-file-size for getting the size of the file (and optionally rejecting it if it is too large).
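When testing an upload script, it can be handy to send the same multipart/form-data request a browser would send, without a browser. The following Python sketch builds such a request with the standard library. The field name must match the name you pass to request:get-uploaded-file-data (FileUpload in the form above); the URL of upload1-process.xq and the file name are hypothetical.

```python
import urllib.request
import uuid


def build_upload_request(url: str, field_name: str, filename: str,
                         data: bytes) -> urllib.request.Request:
    """Build the multipart/form-data POST a browser sends for the upload form."""
    boundary = uuid.uuid4().hex  # random, so very unlikely to occur in the data
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    request = urllib.request.Request(url, data=head + data + tail, method="POST")
    request.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    return request


if __name__ == "__main__":
    # Hypothetical location of upload1-process.xq; adjust to your own application.
    req = build_upload_request(
        "http://localhost:8080/exist/apps/myapp/upload1-process.xq",
        "FileUpload", "logo.png", b"\x89PNG...")
    print(req.get_method(), req.get_header("Content-type"))
    # urllib.request.urlopen(req) would actually send it.
```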

The session Extension Module

A session represents the interaction with a server for a specific client over a period of time. You may store data in the session that is available across requests for the same client. Each client may have a distinct session with the server. A session is accessed with the session extension module. This module is really just a very simple XQuery wrapper around the underlying HttpSession Java class that eXist handles for you. Some usage hints:

§ A session must be created with the session:create function. Lots of other functions that do something with a session create it implicitly for you, but it can never hurt to create it explicitly with session:create. If the session already exists, the call is ignored.

§ A session can hold attributes that are name/value pairs. Attribute values can be anything from simple strings to complex XML fragments. Use the functions session:set-attribute and session:get-attribute to work with these.

§ Sessions invalidate automatically after not being accessed for a certain amount of time. You can control this interval using the session:get-max-inactive-interval and session:set-max-inactive-interval functions.
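The session behavior described in the list above (create or reuse, attributes, expiry after a period of inactivity) can be illustrated with a toy Python model. It is not eXist's or Jetty's code; the class names are hypothetical and the 1800-second default is an arbitrary choice for the sketch.

```python
import secrets
from typing import Any, Dict, Optional


class Session:
    def __init__(self, now: float, max_inactive_interval: float):
        self.id = secrets.token_hex(16)       # sent to the client in a cookie
        self.attributes: Dict[str, Any] = {}  # like session:set-attribute/get-attribute
        self.last_accessed = now
        self.max_inactive_interval = max_inactive_interval  # seconds


class SessionStore:
    """Toy model of server-side sessions that expire after inactivity."""

    def __init__(self, max_inactive_interval: float = 1800):
        self.max_inactive_interval = max_inactive_interval
        self._sessions: Dict[str, Session] = {}

    def get(self, session_id: Optional[str], now: float) -> Optional[Session]:
        session = self._sessions.get(session_id) if session_id else None
        if session is None:
            return None
        if now - session.last_accessed > session.max_inactive_interval:
            del self._sessions[session_id]  # invalidated after inactivity
            return None
        session.last_accessed = now  # each access restarts the clock
        return session

    def create(self, session_id: Optional[str], now: float) -> Session:
        """Like session:create: reuse the existing session if there is one."""
        session = self.get(session_id, now)
        if session is None:
            session = Session(now, self.max_inactive_interval)
            self._sessions[session.id] = session
        return session
```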

The response Extension Module

You control the response to a request via the response extension module. This module is really just a very simple XQuery wrapper around the underlying HttpServletResponse Java class that eXist handles for you. Useful functionality here includes:

§ Setting cookies with the response:set-cookie function

§ Explicitly setting response headers and the overall status code with the response:set-header and response:set-status-code functions

§ Redirecting the client to another page with response:redirect-to

§ Streaming data directly to the output with the response:stream and response:stream-binary functions (useful for creating download functionality, as described next)

Creating “download XML file” functionality

Creating a download function for an XML file, in which the browser asks you where to store it instead of displaying it, is not as easy as it may sound. You have to trick the browser into believing the file is not XML. The following code fragment forces an XML download:

response:stream-binary(

util:string-to-binary(

util:serialize(<Hello/>, 'method=xml'),

'UTF-8'

),

'application/octet-stream',

'download.xml'

)

§ An XML fragment (here, simply <Hello/>) is forced into a string via util:serialize.

§ This string is then forced into binary data via util:string-to-binary.

§ This is passed to response:stream-binary with an Internet media type set to application/octet-stream.

§ A filename (in this example, download.xml) is passed as the preferred filename for storage (the user can change this).

As a result, the browser sees a binary response, which it cannot display. It therefore asks the user where it should be stored.
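The same trick can be expressed at the plain HTTP level as a minimal Python WSGI application (port 8000 is arbitrary). Besides the binary media type, the sketch sends a Content-Disposition: attachment header, which is the standard HTTP way to suggest a file name. We assume, without having checked eXist's source, that this is also how response:stream-binary passes on its filename argument.

```python
from wsgiref.simple_server import make_server


def download_app(environ, start_response):
    """Serve <Hello/> so that the browser offers to save it instead of showing it."""
    body = '<?xml version="1.0" encoding="UTF-8"?>\n<Hello/>'.encode("utf-8")
    start_response("200 OK", [
        # A media type the browser cannot display...
        ("Content-Type", "application/octet-stream"),
        # ...plus a suggested file name for storing it.
        ("Content-Disposition", 'attachment; filename="download.xml"'),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    with make_server("localhost", 8000, download_app) as server:
        server.serve_forever()
```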

Application Security

Unless you’re creating a fully public website, your application will have to deal with security. Such functionality may include creating and maintaining a user base, managing login and logout, and restricting access to parts of the application to certain user groups.

The usual way to implement this for an eXist application is to concentrate the security checks in the central controller (see “URL Mapping Using URL Rewriting”). The controller can check the user’s identity, restrict access, map URLs to different pages based on the user’s credentials, and more. Because the controller handles it all, your other code can be relatively security-code-free and concentrate on what it should be doing.

To make this all happen you need some kind of user/group administration system, and you might be tempted to set up one of your own: just some XML file with users, passwords, and additional information, plus a few functions of your own that let users log in and store the identity of the current user in the session. The application would then consult this information whenever a page is requested, to allow or deny access.

However, we strongly advise against this approach. eXist already has an excellent security system that allows you to create users, log them in, organize them in groups, restrict access to scripts and data based on their credentials, and so on. This security system and how to work with it are described in detail in Chapter 8.

If you base your application’s security on top of eXist’s security, you have to write, debug, and maintain less code. It also creates two levels of security:

§ Your controller or other parts of your application can work with eXist’s security settings through functions in the xmldb and securitymanager extension modules. This allows for programmatically asking questions like “Is this user allowed to execute this XQuery module?” or “Is the current user allowed to see this data?” If not, you could redirect the user to the appropriate error or login page.

§ On top of that, eXist itself stands guard. So, if your application is flawed and tries to access a page or data file that the user is not authorized for, eXist simply refuses.

Therefore, our advice is to base your application’s security on top of eXist’s security. Here are some tips and tricks:

§ Create at least one specific user group for your application, and make all the application’s users a member of this group. Nonpublic pages and data should be accessible by members of this group only. You can extend this mechanism with multiple user groups if your application needs more fine-grained authorization.

§ When you log somebody in, check whether this user is a member of the right user group(s) first! Sometimes you have multiple applications running on the same server, and you don’t want users of Application A to be able to log in to Application B (and run into trouble afterward because the security settings won’t allow them to do anything).

Here is a little login function that checks whether a user is part of a list of user groups before attempting the login:

declare function local:login(
    $user-groups as xs:string*,
    $user as xs:string,
    $password as xs:string
) as xs:boolean
{
    (: Collect the members of all groups that may use this application :)
    let $users-in-groups as xs:string* :=
        for $group in $user-groups return xmldb:get-users($group)
    return
        (: An empty group list means no restriction :)
        if (empty($user-groups) or ($user = $users-in-groups)) then
            xmldb:login('/db', $user, $password, true())
        else
            false()
};

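§ A handy function for checking the access rights of the current user for a certain resource or collection is sm:has-access. You can check against a partial mode string like r-x or x. For instance, in a controller (the forward and redirect targets are placeholders):

if (sm:has-access(xs:anyURI('/db/myapp/securepage.xq'), 'r-x')) then
    (: forward to this page :)
    <dispatch xmlns="http://exist.sourceforge.net/NS/exist"><forward url="securepage.xq"/></dispatch>
else
    (: redirect to error page; error.html is a placeholder :)
    <dispatch xmlns="http://exist.sourceforge.net/NS/exist"><redirect url="error.html"/></dispatch>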

§ There is no explicit logout function. The safest way to log out is to reset the current user’s identity to guest and to invalidate the session:

xmldb:login('/db', 'guest', 'guest'),

session:invalidate()
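A partial mode string lists the permissions that must all be present; positions marked with a dash are not checked. The following Python sketch is a simplified Unix-style model of such a check. It ignores things the real check also has to consider (such as DBA users and access control lists), and the owner, group, and user names are hypothetical.

```python
def has_access(permissions: str, owner: str, group: str,
               user: str, user_groups: set, mode: str) -> bool:
    """Does `user` hold every permission letter named in `mode` (dashes are ignored)?"""
    if len(permissions) != 9:
        raise ValueError("expected a mode string such as 'rwxr-x---'")
    if user == owner:
        granted = permissions[0:3]      # owner bits
    elif group in user_groups:
        granted = permissions[3:6]      # group bits
    else:
        granted = permissions[6:9]      # other bits
    required = {c for c in mode if c != "-"}
    return required <= {c for c in granted if c != "-"}


# Hypothetical resource owned by "admin", group "appusers", mode rwxr-x---
print(has_access("rwxr-x---", "admin", "appusers", "erik", {"appusers"}, "r-x"))  # True
```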

Running with Extra Permissions

You’ve set up an application and paid special attention to security, so when a user runs an XQuery, it runs with minimum permissions and is not allowed to access those parts of the database that it doesn’t need to. However, suddenly you realize this user has to create/update the user base, a global logfile, or some other part of the database you don’t want to make accessible in normal circumstances. What to do?

This is a frequently occurring problem. Luckily, eXist allows you to switch to another user for a single XQuery statement (which can, of course, also be a function call, so you can do whatever complicated stuff you like).

The function call for this is in eXist’s system extension module:

system:as-user($username as xs:string, $password as xs:string?,

$code-block as item()*) as item()*

system:as-user runs $code-block with the credentials of the given user. It returns whatever $code-block returns.

So, you set up a user with enough privileges and run the offending command with system:as-user. For example, the following creates a new user group called appusers with a member erik:

let $create-group-result := system:as-user('privuser', 'verysecret',

xmldb:create-group('appusers', 'erik') )

As you probably have noticed, this creates a new security problem: you’ll have to provide the system:as-user function with the username and password of a privileged user, so this data must be defined somewhere in your XQuery code or read from a data file. Unfortunately, there is not (yet) a watertight solution for this. The best you can do now is store this information somewhere in the database and set the security measures for the resource as tight as possible.

Global Error Pages

When something goes wrong, eXist generates an error page with the appropriate HTTP status code—for instance, a page with status 500 for an XQuery script that contains an error, or a status 404 for a nonexistent resource. You might want to prevent the user from seeing this and redirect these error responses to some kind of “Oops, sorry” page.

Unfortunately, eXist has no means of defining these kinds of error pages on an application level. You can only define them at the Jetty level, making them global for the full eXist instance.

To add an error page, edit the $EXIST_HOME/webapp/WEB-INF/web.xml file and add the following XML fragment as a child of the root web-app node:

<error-page>

<error-code>http-error-code</error-code> 1

<location>uri-to-error-page</location> 2

</error-page>

1

The error-code element contains the integer HTTP status code you want to catch (e.g., 500 or 404).

2

The location element contains the URL to the page you want to display if such an error pops up. This must be the part of a valid eXist URL that comes after /exist. For instance: /rest/db/central/page404.xq.

So, if your web.xml file contains:

<error-page>

<error-code>404</error-code>

<location>/rest/db/central/page404.xq</location>

</error-page>

all responses with an HTTP 404 status code will be forwarded to the page404.xq script.

Note that you have to restart eXist for the changes to take effect.

Building Applications with RESTXQ

RESTXQ is a standard developed by the EXQuery community that allows you to declare interactions between HTTP requests and XQuery functions. RESTXQ takes a very different approach from that of XQuery URL rewriting in eXist, instead using XQuery 3.0 annotations to declare your HTTP intentions within function declarations. XQuery functions that declare RESTXQ annotations are known as resource functions due to the fact that they expose some sort of resource over HTTP.

RESTXQ was inspired by the JAX-RS specification JSR-311. RESTXQ attempts to be nondisruptive by allowing you to annotate existing functions, which will then become HTTP-aware in a web-enabled XQuery processor or continue to work fine in a standalone processor. While XQuery URL rewriting is specific to eXist, RESTXQ attempts to create a standard XQuery 3.0 approach to servicing HTTP requests with XQuery, thereby allowing you to execute your XQuery web applications on any RESTXQ-compatible XQuery processor.

RESTXQ is a relatively young project: an implementation for eXist started in early 2012, and a beta version became part of eXist 2.0. Progress is still being made toward a final RESTXQ 1.0 version, but it is already very usable in eXist, and many people are using it. There are also implementations available in BaseX, Zorba, and MarkLogic.

RESTXQ offers great potential, and need not be solely limited to HTTP in the future; ultimately, RESTXQ might enable XQuery URL rewriting and the REST Server to be reimplemented in XQuery as a set of resource functions. Documentation for RESTXQ is fairly limited at the moment, and the best source of information is most likely Chapter 4 of the paper “RESTful XQuery” from the conference proceedings of XML Prague 2012.1 Next, we will demonstrate how to make use of the features of RESTXQ, and you’ll find a more complete example of using it in “RESTXQ”.

Configuring RESTXQ

RESTXQ monitors the eXist database, and when XQueries containing RESTXQ annotations are stored, RESTXQ configures itself to route matching HTTP requests to the identified resource functions. RESTXQ accomplishes this monitoring by means of a trigger, which is enabled by default on all database collections via the collection configuration in /db/system/config/db/collection.xconf. You may enable or disable RESTXQ monitoring by adding its trigger configuration to or removing it from the configuration for a specific collection. For more details, see “System Collections” and “Database Triggers”.

TIP

RESTXQ maintains a registry of resource functions that it has detected. In eXist this registry is persisted on disk in the file $EXIST_HOME/webapp/WEB-INF/data/restxq.registry. You can see the list of known resource functions by looking in this file or by executing the XQuery function rest:resource-functions. While the format of this file is plain text, it is not recommended that you modify the file manually. However, removing this file when eXist is not running can be a good way to clear out the RESTXQ registry during development and testing.

When eXist starts up, RESTXQ reads its registry of known resource functions by means of a startup trigger (see “Startup Triggers”), which is enabled globally by default in $EXIST_HOME/conf.xml. Disabling this startup trigger along with removing references to RESTXQ from all collection configuration documents effectively disables RESTXQ in eXist.

TIP

All RESTXQ resource functions are relative to an implementation-defined base URI. In eXist, the default base URI is typically /restxq (i.e., http://localhost:8080/exist/restxq). You can change the base URI by editing the forward pattern for RestXqServlet in $EXIST_HOME/webapp/WEB-INF/controller-config.xml. If you wish to map this into an existing domain space, one option is to use reverse proxying, as described in “Proxying eXist Behind a Web Server” and “Reverse proxying”. This would, for example, allow you to map http://www.something.com/customer/1234 onto http://localhost:8080/exist/restxq/customer/1234 (assuming that you have a resource function with a path annotation like %rest:path("/customer/{$id}")).

RESTXQ Annotations

RESTXQ defines a set of XQuery 3.0 annotations that, when added to an XQuery function, produce a resource function. This resource function can service an HTTP request and return an HTTP response. The exact mechanics of marshaling and demarshaling HTTP to XQuery are implementation-specific; RESTXQ just defines how various HTTP properties should be mapped into and out of an XQuery function. RESTXQ annotations can be used on any XQuery function; that is, functions in a main module or library module.

RESTXQ provides two classes of XQuery 3.0 annotations for use on resource functions:

Constraint annotations

Constraint annotations identify and limit the scope of HTTP requests that may be processed by a resource function. Constraint annotations allow you to specify, for example, the URI, HTTP method, and Internet media types that your function is interested in processing.

Parameter annotations

Parameter annotations extract properties of an HTTP request (matching the constraint annotations) and inject the values as parameters to your resource function. Parameter annotations allow you to extract parameters from the URI query, HTTP header, HTTP cookie, and POSTed HTML forms.

HTTP method constraint annotations

A resource function may have one or more method constraint annotations. A method constraint annotation constrains the HTTP methods that a resource function may process. RESTXQ currently supports the HTTP methods GET, HEAD, POST, PUT, and DELETE. See Example 9-4.

Example 9-4. Simple resource function that services all incoming GET requests

xquery version "3.0";

module namespace ex = "http://example/restxq/1";

import module namespace rest = "http://exquery.org/ns/restxq";

declare

%rest:GET 1

function ex:not-found() {

<result>The requested page could not be found!</result>

};

1

A RESTXQ resource constraint annotation for the HTTP method GET.

The XQuery in Example 9-4 is perhaps the simplest example of using RESTXQ; it will simply return a response for any HTTP GET request to eXist’s RESTXQ Server.

By storing the XQuery anywhere in the database and granting it execute rights, you can then access it by sending an HTTP GET request for any URI under http://localhost:8080/exist/restxq. For example, using cURL:

$ curl -v http://localhost:8080/exist/restxq/any/thing/at/all

results in:

* About to connect() to localhost port 8080 (#0)

* Trying ::1...

* Adding handle: conn: 0x7f8091007200

* Adding handle: send: 0

* Adding handle: recv: 0

* Curl_addHandleToPipeline: length: 1

* - Conn 0 (0x7f8091007200) send_pipe: 1, recv_pipe: 0

* Connected to localhost (::1) port 8080 (#0)

> GET /exist/restxq/any/thing/at/all HTTP/1.1

> User-Agent: curl/7.32.0

> Host: localhost:8080

> Accept: */*

>

< HTTP/1.1 200 OK 1

< Date: Sun, 20 Oct 2013 13:01:17 GMT

< Set-Cookie: JSESSIONID=bzhhe8x66jqb1wremvc814vah;Path=/exist

< Expires: Thu, 01 Jan 1970 00:00:00 GMT

< Content-Type: application/xml 2

< Transfer-Encoding: chunked

* Server Jetty(8.1.9.v20130131) is not blacklisted

< Server: Jetty(8.1.9.v20130131)

<

* Connection #0 to host localhost left intact

<result>The requested page could not be found!</result> 3

1

Note that the HTTP response is 200 OK. This is not ideal when the requested document was not found; 404 Not Found would be more appropriate!

2

The response media type is application/xml, which is the default of RESTXQ.

3

The result of the XQuery function.

Example 9-4 has some shortcomings, in that it only handles HTTP GET requests, it returns the wrong HTTP status code when a document is not found, and it assumes that the client wants an XML response when a document is not found. Example 9-5 shows an improved version.

Example 9-5. Simple resource function that creates an HTML response

xquery version "3.0";

module namespace ex = "http://example/restxq/2";

import module namespace rest = "http://exquery.org/ns/restxq";

declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";

declare namespace http = "http://expath.org/ns/http-client";

declare

%rest:GET 1

%rest:HEAD

%rest:POST

%rest:PUT

%rest:DELETE

%output:method("html5") 2

function ex:not-found() {

( 3

<rest:response>

<http:response status="404"/> 4

</rest:response>

, 5

<html> 6

<head><title>Document not found!</title></head>

<body>

<p>Sorry, we could not find the document that you requested :-(</p>

</body>

</html>

)

};

1

We declare that we wish to process all supported HTTP methods.

2

We use an output annotation to specify that the body of the response should be serialized as HTML5. Output annotations are part of the RESTXQ specification and simply provide an annotation syntax for the XSLT and XQuery 3.0 Serialization specification.

3

We start a sequence, which allows us to control the response from RESTXQ and provide a response body.

4

We declare that the HTTP response code should be set to 404 Not Found.

5

Note the comma that separates the first item in the sequence, which controls the RESTXQ response, from our response body (the second item in the sequence).

6

We construct an HTML document for our response body.

Example 9-5 addresses the problems of Example 9-4 by handling all methods and explicitly defining properties of the HTTP response. We can store it in the database, replacing the first example, and then access it by requesting any URI (by any HTTP method) under http://localhost:8080/exist/restxq. For example, using cURL:

$ curl -v -X POST http://localhost:8080/exist/restxq/any/thing/at/all

results in:

* About to connect() to localhost port 8080 (#0)

* Trying ::1...

* Adding handle: conn: 0x7f947b007200

* Adding handle: send: 0

* Adding handle: recv: 0

* Curl_addHandleToPipeline: length: 1

* - Conn 0 (0x7f947b007200) send_pipe: 1, recv_pipe: 0

* Connected to localhost (::1) port 8080 (#0)

> POST /exist/restxq/any/thing/at/all HTTP/1.1

> User-Agent: curl/7.32.0

> Host: localhost:8080

> Accept: */*

>

< HTTP/1.1 404 Not Found 1

< Date: Sun, 20 Oct 2013 13:29:14 GMT

< Set-Cookie: JSESSIONID=1dkp1kr1w2zdbrdjfycz9qtaa;Path=/exist

< Expires: Thu, 01 Jan 1970 00:00:00 GMT

< Content-Type: text/html;charset=UTF-8 2

< Transfer-Encoding: chunked

* Server Jetty(8.1.9.v20130131) is not blacklisted

< Server: Jetty(8.1.9.v20130131)

<

<!DOCTYPE html> 3

<html> 4

<head>

<title>Document not found!</title>

</head>

<body>

<p>Sorry, we could not find the document that you requested :-(</p>

</body>

</html>

* Connection #0 to host localhost left intact

1

The HTTP response is now 404 Not Found, as declared in our rest:response.

2

The response media type is text/html, which is set by RESTXQ by default when an HTML output serialization is declared.

3

The HTML5 doctype has been inserted by the HTML5 output serializer, as declared by our output:method annotation.

4

The response body aspect of the sequence results from our XQuery function.

So far, each of the examples that we have looked at has used simple HTTP requests, but what happens when a POST or PUT request is received that contains a request body? If you wish to extract the body of the HTTP request, you can declare this intention on POST and PUT methods by specifying the name of the function parameter that the body should be injected into. See Example 9-6.

Example 9-6. Resource function extracting a request body

xquery version "3.0";

module namespace ex = "http://example/restxq/3";

import module namespace rest = "http://exquery.org/ns/restxq";

declare

%rest:POST("{$body}") 1

function ex:echo($body) { 2

<received>{$body}</received> 3

};

1

We declare that we wish to process only HTTP POST requests, and that any request body should be extracted and injected into the function parameter named $body.

2

When RESTXQ invokes the function, this parameter is set to the request body.

3

The request body will be output as part of the response.

By storing the XQuery anywhere in the database and granting it execute rights, you can then access it by sending an HTTP POST request to any URI under http://localhost:8080/exist/restxq. For example, given the following simple XML file, saved as /tmp/test.xml:

<test>123</test>

using cURL to POST the XML file:

$ curl -X POST -H 'Content-Type: application/xml' -d @/tmp/test.xml \
    http://localhost:8080/exist/restxq/something

results in:

<received>

<test>123</test> 1

</received>

1

Note that the content of test.xml has been received by the server and placed inside the received element for the response by our XQuery function.

TIP

When extracting the HTTP request body for a POST or PUT, RESTXQ will attempt to automatically process the request body and provide the correct data type for you. The process for automatically converting the request body is as follows:

1. Is there an HTTP Content-Type header indicating that the content is of a binary type (looked up in $EXIST_HOME/mime-types.xml)? If so, return an xs:base64Binary value of the request body.

2. Try to parse the request body as XML; is it XML? If so, return it as a document-node() value.

3. Return the body as an xs:string.

URI path constraint annotation

A resource function may have a path constraint annotation, %rest:path, as shown in Example 9-7. A path constraint annotation constrains the URI path of an HTTP request that a resource function may process. The URI path may itself contain templates that are extracted and injected as parameters to the function. A URI path constraint may not be used by itself; it always requires at least one HTTP method constraint annotation to also be present on the resource function. The URI path is always relative to the base URI of the RESTXQ Server (see “Configuring RESTXQ”).

Example 9-7. Resource function saying hello with URI templating

xquery version "3.0";

module namespace ex = "http://example/restxq/4";

import module namespace rest = "http://exquery.org/ns/restxq";

declare

%rest:GET

%rest:path("/hello/{$name}") 1

function ex:say-hello($name) { 2

<greeting>Hi there {$name}!</greeting> 3

};

1

We declare that we wish to only process paths that start with /hello followed by a path segment template, which should be extracted and injected into the function parameter $name.

2

When RESTXQ invokes the function, this parameter is set to the path segment matched by the {$name} template in %rest:path.

3

The value of the path segment will be output as part of the response.

This simple example declares the URI path to service, extracts a single URI segment using templating, and returns a result showing the value extracted from the URI.

By storing the XQuery anywhere in the database and granting it execute rights, you may then access it by an HTTP GET to the URI http://localhost:8080/exist/restxq/hello/myName. For example, using cURL:

$ curl http://localhost:8080/exist/restxq/hello/Liz

<greeting>Hi there Liz!</greeting>

Note the name has been extracted from the URI; changing the last segment of the URI changes the greeting!

URI paths may be much more complicated and have several templates within them; for example:

%rest:path("/country/{$country-code}/organization/{$org-id}/person/{$person-id}")
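To show how the values from such a path reach the function, here is a minimal sketch of a resource function using that path. The module namespace and the function name ex:person are hypothetical; each template variable in the path is injected into the function parameter of the same name.

```xquery
xquery version "3.0";

(: Hypothetical module namespace, for illustration only :)
module namespace ex = "http://example/restxq/multi-path";

import module namespace rest = "http://exquery.org/ns/restxq";

(: Each {$...} template in the path is injected into the parameter with the same name :)
declare
    %rest:GET
    %rest:path("/country/{$country-code}/organization/{$org-id}/person/{$person-id}")
function ex:person($country-code, $org-id, $person-id) {
    <person id="{$person-id}">
        <organization id="{$org-id}" country="{$country-code}"/>
    </person>
};
```

A GET to http://localhost:8080/exist/restxq/country/nl/organization/42/person/7 should then return a person element with the id 7, inside which the organization carries the id 42 and the country nl.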

Consumes constraint annotation

A resource function may be constrained by the media types of HTTP requests that it is willing to process. You can achieve this by using one or more consumes constraint annotations, %rest:consumes (see Example 9-8). If no consumes constraint annotations are present on a resource function, then the function is assumed to process all content types. Consumes constraint annotations make the most sense in the context of POST and PUT requests, where you wish to control the POSTed/PUTed resources that your resource function processes. A consumes annotation is compared against the Content-Type header from an incoming HTTP request.

Example 9-8. Resource function restricting request processing by Content-Type

xquery version "3.0";

module namespace ex = "http://example/restxq/5";

import module namespace rest = "http://exquery.org/ns/restxq";

declare
    %rest:POST("{$body}")
    %rest:consumes("application/xml", "text/xml")  (: 1 :)
function ex:echo($body) {
    <received>{$body}</received>
};

1

We declare that we wish to only process incoming HTTP requests that have a Content-Type of either application/xml or text/xml. You may specify as many media types as you wish within a consumes constraint annotation, or use multiple consumes constraint annotations.
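Because each resource function declares the media types it consumes, several functions can share one path and RESTXQ can route each request by its Content-Type. The following is a minimal sketch, assuming a hypothetical /notes path, module namespace, and function names. Following the body-handling rules described earlier, an XML body should arrive as a document node, while a plain-text body that does not parse as XML arrives as an xs:string.

```xquery
xquery version "3.0";

(: Hypothetical module namespace, for illustration only :)
module namespace ex = "http://example/restxq/notes";

import module namespace rest = "http://exquery.org/ns/restxq";

(: Matched when the request's Content-Type is application/xml or text/xml :)
declare
    %rest:POST("{$body}")
    %rest:path("/notes")
    %rest:consumes("application/xml", "text/xml")
function ex:add-xml-note($body) {
    <accepted format="xml">{$body}</accepted>
};

(: Matched when the request's Content-Type is text/plain :)
declare
    %rest:POST("{$body}")
    %rest:path("/notes")
    %rest:consumes("text/plain")
function ex:add-text-note($body) {
    <accepted format="text">{$body}</accepted>
};
```

A POST to /notes with a Content-Type that neither function consumes (for example, application/json) will not be matched by either function.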

Produces constraint annotation

The produces constraint annotation, %rest:produces (see Example 9-9), is the counterpart to the consumes constraint annotation: a resource function may be constrained by the media types that a client is willing to accept in an HTTP response. If no produces constraint annotations are present on a resource function, then the function is assumed to create a response that is compatible with any client. Produces constraint annotations are used for content negotiation scenarios, where the client informs the server which media types it accepts. A produces constraint annotation is compared against the Accept header from an incoming HTTP request.

Example 9-9. Resource function restricting request processing by Accept

xquery version "3.0";

module namespace ex = "http://example/restxq/6";

import module namespace rest = "http://exquery.org/ns/restxq";

declare
    %rest:POST("{$body}")
    %rest:consumes("application/xml", "text/xml")
    %rest:produces("application/xml")  (: 1 :)
function ex:echo($body) {
    <received>{$body}</received>
};

1

We declare that we wish to only process incoming HTTP requests that will accept a response of type application/xml. You may specify as many media types as you wish within a produces constraint annotation, or use multiple produces constraint annotations.
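Content negotiation becomes useful when one path is served by several functions with different produces constraints; RESTXQ can then pick the function that matches the client's Accept header. Here is a minimal sketch with a hypothetical /greeting path and module namespace. The %output serialization annotations reflect our understanding of how eXist's RESTXQ implementation controls serialization; treat them as an assumption and check the documentation for your version.

```xquery
xquery version "3.0";

(: Hypothetical module namespace, for illustration only :)
module namespace ex = "http://example/restxq/negotiate";

import module namespace rest = "http://exquery.org/ns/restxq";
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";

(: Chosen for clients whose Accept header asks for XML :)
declare
    %rest:GET
    %rest:path("/greeting")
    %rest:produces("application/xml")
function ex:greeting-xml() {
    <greeting>Hello</greeting>
};

(: Chosen for clients whose Accept header asks for HTML, such as browsers;
   assumed serialization annotations ask for HTML output :)
declare
    %rest:GET
    %rest:path("/greeting")
    %rest:produces("text/html")
    %output:method("html")
    %output:media-type("text/html")
function ex:greeting-html() {
    <html>
        <body><p>Hello</p></body>
    </html>
};
```

With cURL, sending the header Accept: text/html to http://localhost:8080/exist/restxq/greeting should select the HTML variant, while Accept: application/xml should select the XML one.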

Parameter annotations

RESTXQ provides four different parameter annotations. Their behavior is almost identical; they differ mainly in the source from which the parameter is extracted. The annotations are:

Query parameters

%rest:query-param extracts a parameter from the URI query string of the HTTP request. The value extracted may be an empty sequence (if the parameter is not present) or a sequence of one or more values, as it is possible to have URI query parameters with the same name and different values.

Header parameters

%rest:header-param extracts a parameter from an HTTP header of the HTTP request. The value extracted may be an empty sequence (if the header is not present), or the value of the header.

Cookie parameters

%rest:cookie-param extracts a parameter from a cookie in the HTTP header of the HTTP request. The value extracted may be an empty sequence (if the cookie is not present), or the value of the cookie.

Form field parameters

%rest:form-param extracts a parameter from a POSTed or GETed HTML form, so this can only be used in combination with %rest:POST and/or %rest:GET. The value extracted may be an empty sequence (if the form field is not present) or a sequence of one or more values, as it is possible to have form fields with the same name and different values!

All of the parameter annotations have the same two forms, where source is one of query, header, cookie, or form:

§ %rest:source-param(parameter-name, function-parameter-reference)

§ %rest:source-param(parameter-name, function-parameter-reference, default-value)

The second form allows you to specify a default value to be injected into the named function parameter in case no matching parameter is available in the HTTP request (see Example 9-10).

Example 9-10. Resource function extracting request parameters

xquery version "3.0";

module namespace ex = "http://example/restxq/7";

import module namespace rest = "http://exquery.org/ns/restxq";

declare
    %rest:GET
    %rest:path("/hello")                                  (: 1 :)
    %rest:query-param("name", "{$name}", "stranger")      (: 2 :)
function ex:say-hello($name) {                            (: 3 :)
    <greeting>Hi there {$name}!</greeting>                (: 4 :)
};

1

We declare that we wish to process only the path /hello relative to the RESTXQ base URI.

2

We declare that we wish to extract the value of the name URI query parameter and, if it is not available, to use the default value stranger. We declare that the value should be injected into the function parameter $name.

3

This parameter will be set to the value of the URI query parameter declared by %rest:query-param when invoked by RESTXQ.

4

The value of the URI query parameter will be output as part of the response.

By storing the XQuery anywhere in the database and granting it execute rights, you may then access it by an HTTP GET to the URI http://localhost:8080/exist/restxq/hello. For example, using cURL:

curl -v http://localhost:8080/exist/restxq/hello

results in:

<greeting>Hi there stranger!</greeting>

Note that the default value of stranger is provided in the response because we did not specify a name URI query parameter.

Alternatively, the following:

curl -v "http://localhost:8080/exist/restxq/hello?name=Adam"

results in:

<greeting>Hi there Adam!</greeting>
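Example 9-10 uses only a query parameter. The following sketch exercises the other parameter annotations as well: a header, a cookie, a multi-valued query parameter, and form fields. The module namespace, the paths, the theme cookie, and the form field names are all hypothetical.

```xquery
xquery version "3.0";

(: Hypothetical module namespace, for illustration only :)
module namespace ex = "http://example/restxq/params";

import module namespace rest = "http://exquery.org/ns/restxq";

declare
    %rest:GET
    %rest:path("/request-info")
    (: The User-Agent header, or "unknown" if it is absent :)
    %rest:header-param("User-Agent", "{$agent}", "unknown")
    (: A hypothetical "theme" cookie, defaulting to "light" :)
    %rest:cookie-param("theme", "{$theme}", "light")
    (: Every value of ?tag=...; an empty sequence if none is given :)
    %rest:query-param("tag", "{$tags}")
function ex:request-info($agent, $theme, $tags) {
    <request agent="{$agent}" theme="{$theme}">{
        for $tag in $tags
        return <tag>{$tag}</tag>
    }</request>
};

declare
    %rest:POST
    %rest:path("/subscribe")
    (: Fields of a submitted HTML form; "topic" may occur several times :)
    %rest:form-param("email", "{$email}")
    %rest:form-param("topic", "{$topics}")
function ex:subscribe($email, $topics) {
    <subscription email="{$email}">{
        for $topic in $topics
        return <topic>{$topic}</topic>
    }</subscription>
};
```

A request such as curl "http://localhost:8080/exist/restxq/request-info?tag=a&tag=b" should yield two tag elements, and theme="light" because no theme cookie was sent.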

A more complete example of using RESTXQ can be seen in “RESTXQ”.

RESTXQ XQuery Extension Functions

RESTXQ attempts to take a minimal approach to providing extensions to XQuery, so it currently defines only three external XQuery functions:

rest:base-uri() as xs:anyURI

Returns the base URI of the RESTXQ Server

rest:uri() as xs:anyURI

Returns the URI of the HTTP request that led to the resource function being invoked

rest:resource-functions() as document-node(element(rest:resource-functions))

Returns an XML document describing the resource functions that are known to the RESTXQ Server
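A minimal sketch of how these functions might be used, assuming a hypothetical module namespace and the hypothetical paths /about-request and /registry. The first resource function reports where the request arrived; the second exposes the list of known resource functions, which can be handy for checking that a newly stored module was picked up.

```xquery
xquery version "3.0";

(: Hypothetical module namespace, for illustration only :)
module namespace ex = "http://example/restxq/introspect";

import module namespace rest = "http://exquery.org/ns/restxq";

(: Report the RESTXQ base URI alongside the URI of this request :)
declare
    %rest:GET
    %rest:path("/about-request")
function ex:about-request() {
    <request>
        <base-uri>{rest:base-uri()}</base-uri>
        <uri>{rest:uri()}</uri>
    </request>
};

(: Return the description of every resource function RESTXQ knows about :)
declare
    %rest:GET
    %rest:path("/registry")
function ex:registry() {
    rest:resource-functions()
};
```

Comparing the two values returned by /about-request is a quick way to see how a request's URI path relates to the RESTXQ base URI.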

Packaging

Once an application or library is finished, there is often a need to distribute it to others. For instance, something meant for the public should be easy to download and install for everyone who wants to use it. Likewise, private applications more often than not need some kind of distribution too: for instance, moving from a development server to a test server, and after that to production.

To aid in this, eXist can work with packages. A package is an application (or library) bundled into a single ZIP archive file, together with machine-processable information on how to distribute and install it. You can work with packages through eXist’s Package Repository, a core component of eXist since version 2.0. The Package Repository can install, uninstall, update, and launch packages.

eXist also contains an extension module for working with the repository from within your XQuery scripts (see repo). Most of the time, however, you will work with the Package Repository through its user interface, the Package Manager (Figure 9-3), which is available through the dashboard.

Figure 9-3. The eXist Package Manager
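As a small taste of the repo extension module, the following sketch lists the installed packages. It assumes the repo:list and repo:get-root functions as we know them from eXist 2.x (repo:get-root also appears later in this chapter); check the function documentation browser for the exact signatures in your version.

```xquery
xquery version "3.0";

import module namespace repo = "http://exist-db.org/xquery/repo";

(: repo:list() is assumed to return the name URI of each installed package;
   repo:get-root() is the collection packages are deployed under :)
<installed-packages root="{repo:get-root()}">{
    for $name in repo:list()
    order by $name
    return <package name="{$name}"/>
}</installed-packages>
```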

This whole packaging idea is based on the EXPath packaging system. This specification was designed to work across different XQuery implementations and is targeted at managing extension libraries (including XQuery, Java, or XSLT code modules). eXist extends this by adding a facility for the automatic deployment of entire applications into the database.

eXist packages come in two categories:

Applications

An application is anything with a web interface. It will produce a tile on the dashboard and will start when the user clicks the tile.

Library packages

Library packages contain data, libraries, or resources used by other packages. They can also contain Java JAR files to load into eXist’s classpath.

You might not have realized it, but you’ve already made heavy use of the packaging system: most dashboard applications are packages. When you browse through the list of installed packages, you’ll likely recognize some of them from the dashboard.

Examples

We need not provide you with extensive examples of packages in this section, because there are many practical examples already in existence that you can learn from:

§ The example code for this book is distributed as a package, and you can look inside the package file or at the installed version in the database (in the /db/apps/exist-book collection) to see how it is structured. Besides examples of all the packaging configuration files, the installer subcollection also contains some interesting code.

§ The directory $EXIST_HOME/webapp/WEB-INF/data/expathrepo holds the unpacked files of each installed package and is a good source of varied examples.

The Packaging Format

An eXist/EXPath package is a ZIP archive file, containing all the package’s resources in directories which follow their collection structure. The file extension is, by convention, not .zip but .xar.

The root of the archive contains some packaging configuration files:

expath-pkg.xml

The standard EXPath descriptor file. It contains information on things like the package’s name, version, and dependencies. See “The expath-pkg.xml file”.

repo.xml

The eXist-specific deployment descriptor file. It contains additional metadata and controls how the package will be deployed into the database. See “The repo.xml file”.

exist.xml

An eXist adaptation for loading extension modules written in Java. For more information about this, refer to the Package Repository documentation (available through the dashboard’s Function Documentation browser).

The expath-pkg.xml file

The eXist-specific version of the EXPath descriptor file expath-pkg.xml is as follows. The full definition of expath-pkg.xml offers some further options, but these are currently ignored by eXist:

<package xmlns = "http://expath.org/ns/pkg" 1

name = uri 2

abbrev = string 3

version = string 4

spec = "1.0" > 5

title

dependency*

xquery*

</package>

1

An expath-pkg.xml file is an XML document whose content is in the http://expath.org/ns/pkg namespace.

2

name is a URI which is used to globally and uniquely identify the package.

3

abbrev contains a short abbreviation for the package. Since the Package Manager uses this for filename creation, it is best to choose something without spaces or punctuation characters.

4

version contains the version number or name of the package. To allow the Package Manager to work with this to its fullest extent, you should use what is called the semantic version number format: x.y.z (where x, y, and z are integers; e.g., 1.2.3). See also the upcoming description of the dependency element.

5

spec is the version of the EXPath specification and always contains, for now, 1.0.

The child elements of the package root element are:

title

The title element contains a descriptive title of the package. This is what will be displayed to the user in the dashboard.

dependency

The dependency element defines other packages that this package is dependent on:

<dependency package = uri

version = string

- OR -

semver = string

- OR -

semver-min = string

semver-max = string

/>

The package attribute holds the URI of a package that this package depends upon (the value of the necessary package’s name attribute from its expath-pkg.xml).

You can specify the version in one of three ways:

§ Define the absolute version of the dependency with the version attribute.

§ Define the version of the dependency in semantic version number format (x.y.z) using the semver attribute. This allows the packaging system, for instance, to select the highest version within a release (e.g., semver="1.2" will satisfy all versions starting with 1.2, like 1.2.3, 1.2.16, etc.).

§ Use one or both of the semver-min and semver-max attributes to set the minimum and maximum version number of the dependency using semantic version number format (x.y.z).

Using the semantic version number format is highly recommended.

xquery

Use the xquery child element to register one or more library modules with eXist. These modules then become globally available for your XQuery scripts, and your code or other packages can use them without knowing where they are stored. In other words, you don’t have to use the at clause within the import module declaration for this module in your XQuery script’s prolog:

<xquery>

<namespace>namespace of xquery module</namespace>

<file>filename without path of xquery module</file>

</xquery>

For a package library module like this, the XQuery module itself must be stored in the /content subdirectory of the package.
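The dependency version rules described above can be made concrete with a small XQuery sketch. The local: functions are hypothetical helpers written for illustration, not part of eXist or the EXPath packaging API. They also simplify semver-min and semver-max to inclusive bounds given as full x.y.z versions.

```xquery
xquery version "3.0";

(: Illustration only: a simplified reading of the dependency version rules,
   not eXist's own dependency resolver :)

(: "1.2.16" -> (1, 2, 16) :)
declare function local:parts($version as xs:string) as xs:integer* {
    for $part in tokenize($version, "\.")
    return xs:integer($part)
};

(: semver="1.2" is satisfied by any version whose leading components are 1 and 2 :)
declare function local:semver-matches($template as xs:string,
                                      $candidate as xs:string) as xs:boolean {
    let $t := local:parts($template)
    let $c := local:parts($candidate)
    return count($c) ge count($t)
        and (every $i in 1 to count($t) satisfies $t[$i] eq $c[$i])
};

(: Compare component by component, treating missing components as 0;
   returns -1, 0, or 1 :)
declare function local:compare($a as xs:integer*, $b as xs:integer*) as xs:integer {
    if (empty($a) and empty($b)) then 0
    else
        let $x := ($a[1], 0)[1]
        let $y := ($b[1], 0)[1]
        return
            if ($x lt $y) then -1
            else if ($x gt $y) then 1
            else local:compare(subsequence($a, 2), subsequence($b, 2))
};

(: semver-min / semver-max, simplified here to inclusive x.y.z bounds :)
declare function local:in-range($min as xs:string, $max as xs:string,
                                $candidate as xs:string) as xs:boolean {
    let $c := local:parts($candidate)
    return local:compare($c, local:parts($min)) ge 0
       and local:compare($c, local:parts($max)) le 0
};

(
    local:semver-matches("1.2", "1.2.16"),     (: true :)
    local:semver-matches("1.2", "1.3.0"),      (: false :)
    local:in-range("1.0.0", "2.0.0", "1.4.2")  (: true :)
)
```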

The repo.xml file

The repo.xml file contains additional metadata which eXist uses to determine how to install and present the package:

<meta xmlns = "http://exist-db.org/xquery/repo" > 1

description 2

author+

website

status

license

copyright

type 3

target 4

prepare 5

finish

permissions 6

</meta>

1

A repo.xml file is an XML document whose content is in the http://exist-db.org/xquery/repo namespace.

2

Child elements description, author, website, status, license, and copyright contain additional (string-type) metadata about the package.

3

The type child element tells eXist what kind of package this is. It contains either the value library or the value application.

4

The target child element tells eXist where to store the package in the database. This must be a relative path; eXist prefixes it with the root collection under which the repository manager stores installed packages, which is /db/apps by default. So, a package with <target>xyz</target> specified in repo.xml will be stored in /db/apps/xyz.

NOTE

If you want to change this root directory, you can find its definition in $EXIST_HOME/conf.xml: <repository root="/db/apps"/>.

5

The prepare and finish child elements can contain the name of an XQuery script that runs before and after the package is installed, respectively. Further information about these scripts can be found in the next section.

Allowed values are either empty (no script) or a filename relative to the root of the package. Traditionally these scripts were called pre-install.xql and post-install.xql and stored in the root of the package, but you’re free to deviate from this convention, and we would now recommend using the .xq file extension instead (see “XQuery Filename Conventions”).

6

The permissions child element sets the owner, group, and permissions with which the package’s files are loaded:

<permissions user = string

password = string

group = string

mode = string />

§ user, password, and group define the ownership of the loaded package files.

§ mode defines the permissions on the loaded files in permission string format (e.g., rw-rw-r--). For XQuery files, the x permission will be automatically set in addition.

In most cases you’ll want your package loaded under admin privileges. Since the Package Manager runs as admin anyway, you don’t have to specify the admin password. So, the usual contents of the permissions element are:

<permissions user="admin" password="" group="dba" mode="rw-rw-r--"/>
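To see both descriptor files in action, you can query them in an installed package. This sketch assumes, as with the book’s example package in /db/apps/exist-book, that expath-pkg.xml and repo.xml are deployed into the application’s collection; change the $app path to inspect another application. It deliberately leaves out the password attribute.

```xquery
xquery version "3.0";

declare namespace pkg = "http://expath.org/ns/pkg";
declare namespace repo = "http://exist-db.org/xquery/repo";

(: The book's example package; point this at any installed application :)
declare variable $app := "/db/apps/exist-book";

let $package := doc(concat($app, "/expath-pkg.xml"))/pkg:package
let $meta := doc(concat($app, "/repo.xml"))/repo:meta
return
    <package-summary name="{$package/@name}" version="{$package/@version}">
        <title>{string($package/pkg:title)}</title>
        <type>{string($meta/repo:type)}</type>
        <target>{string($meta/repo:target)}</target>
        {
            (: One entry per dependency, copying whichever version attributes it uses :)
            for $dep in $package/pkg:dependency
            return
                <dependency package="{$dep/@package}">{
                    $dep/(@version | @semver | @semver-min | @semver-max)
                }</dependency>
        }
        {
            (: Library modules registered through xquery child elements :)
            for $module in $package/pkg:xquery
            return <module namespace="{$module/pkg:namespace}" file="{$module/pkg:file}"/>
        }
        <permissions>{$meta/repo:permissions/(@user | @group | @mode)}</permissions>
    </package-summary>
```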

The Prepare and Finish Scripts

In repo.xml, in the prepare and finish child elements, you can define two optional XQuery scripts that run directly before (pre-install) and directly after (post-install) a package is installed. Typical tasks for these scripts include:

§ Installing index (or other) definitions in a collection.xconf resource under /db/system/config. There are two approaches you can take:

§ Do this in the pre-install script (before loading the data). The data will then be indexed on load.

§ Do it in the post-install script and programmatically reindex (using the extension function xmldb:reindex). This is how the package for the book’s example files works. There is some generic code for this in the /db/apps/exist-book/installer/installer.xqm module that could be reused in other packages if so desired.

§ Creating users and groups for your application (or checking whether they exist).

§ Creating data collections and resources outside the application’s collection structure.

Both the pre- and the post-install scripts can make use of some external variables whose values will be provided by eXist during execution. The variables available for you to declare in your query’s prolog are:

declare variable $home external;

declare variable $dir external;

declare variable $target external;

$home

The directory where eXist is installed (i.e., $EXIST_HOME)

$dir

The directory containing the unpacked version of your package’s .xar file

$target

The collection where your package will be installed

The pre- and post-install scripts can be difficult to debug as their output is not visible to the end user, and errors are logged only when they are very severe (which usually isn’t the case). The advice here is to test them standalone (as much as possible) before you try them out as part of a package install. If you need to see what your scripts are doing, consider writing some XML to the database from your script to act as a logfile during execution, or make frequent calls to util:log.
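Putting these pieces together, here is a minimal sketch of a finish (post-install) script that follows the second indexing approach above: it stores a collection.xconf for the package’s collection under /db/system/config, reindexes the collection, and logs its progress with util:log. The index definition (a range index on a hypothetical title element) and the helper local:create-path are assumptions for illustration; $home and $dir are declared only to show that they are available.

```xquery
xquery version "3.0";

(: Bound by eXist when it runs the script :)
declare variable $home external;
declare variable $dir external;
declare variable $target external;

(: Hypothetical index definition; replace "title" with elements from your own data :)
declare variable $xconf :=
    <collection xmlns="http://exist-db.org/collection-config/1.0">
        <index xmlns:xs="http://www.w3.org/2001/XMLSchema">
            <range>
                <create qname="title" type="xs:string"/>
            </range>
        </index>
    </collection>;

(: Hypothetical helper: create $parent/a/b/c one collection level at a time :)
declare function local:create-path($parent as xs:string, $segments as xs:string*) {
    if (empty($segments)) then ()
    else (
        xmldb:create-collection($parent, $segments[1]),
        local:create-path(concat($parent, "/", $segments[1]), subsequence($segments, 2))
    )
};

(: For $target = /db/apps/myapp this is /db/system/config/db/apps/myapp :)
let $config-collection := concat("/db/system/config", $target)
return (
    util:log("info", concat("post-install: configuring indexes for ", $target)),
    local:create-path("/db/system/config", tokenize($target, "/")[. ne ""]),
    xmldb:store($config-collection, "collection.xconf", $xconf),
    xmldb:reindex($target),
    util:log("info", "post-install: reindex finished")
)
```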

Creating Packages

A .xar file is a ZIP file, so it’s easy enough to create one manually: simply create the right directory structure and contents on disk, and then zip it all up.

However, you’ll probably develop your package inside eXist, so the collection structure, the code, and the data will initially all be in the database. To create a package manually, you’ll have to export it all to disk, zip it, and so on. Not exactly impossible; just boring, repetitious work.

There is an easier solution: from the dashboard, start eXide. In eXide, open one of the resources in your package’s home collection (e.g., /db/apps/mypackage/repo.xml). Choose the menu option Application→Download App, and voilà, you are presented with your package.

Additional Remarks About Packages

Packages must be developed in such a way that they’re independent of their final location in the database. So, your application’s code must be able to find out where it is (the path to itself). There are two ways to achieve this:

§ You can use the extension function system:get-module-load-path. Unfortunately for our purposes, this function returns a collection path with the string embedded-eXist-server prefixed, for instance:

xmldb:exist://embedded-eXist-server/db/apps/myapp/installer/

installer.xqm

To turn this into a usable collection path, you can use this regular expression code:

replace(system:get-module-load-path(),
        '^(xmldb:exist://)?(embedded-eXist-server)?(.+)$', '$3')

This code will work even if this strange string should disappear in a future release.

§ You can also search the repository manager’s root collection for the right package. Here is an example that returns the path to the book example application:

declare namespace expath = "http://expath.org/ns/pkg";

import module namespace repo = "http://exist-db.org/xquery/repo";

let $descriptor := collection(repo:get-root())
        //expath:package[@name eq "http://www.exist-db.org/exist-book"]
return
    util:collection-name($descriptor)

1 Adam Retter, “RESTful XQuery: Standardised XQuery 3.0 Annotations for REST,” XML Prague 2012—Conference Proceedings (2012): 91-123.