Other XML Technologies - eXist: A NoSQL Document Database and Application Platform (2015)


Chapter 10. Other XML Technologies

In previous chapters we have focused on eXist combined with XQuery, with only lip service paid to other XML technologies. But eXist is a full-blown XML application platform and has many other interesting and useful technologies available. One of its greatest strengths is the ability to mix and match different approaches, using the right technology for the problem at hand.

This chapter delves into technologies such as XSLT, XSL-FO, XInclude, XML validation, collations, and XForms and explains how to use them in eXist.

NOTE

We do not explain the technologies themselves; that is to say, this chapter does not contain crash courses on XSLT, XInclude, XForms, and so on. Rather, we assume that if you need one of the aforementioned technologies, you already know how to use it (or are able to learn how elsewhere). Only the relationship with eXist is explained. If you need more information about the technologies themselves, please refer to “Additional Resources”.

A notable missing technology in this chapter is XProc. Although eXist does contain some support for using XProc pipelines, this is still rather experimental and subject to change. There is a connector to the open source XProc processor XML Calabash available; see xmlcalabash.

NOTE

As of early 2014, there is an XProc module under development, but this will not run on eXist v2.1. You’ll need to wait for v2.2 (or use the development branch from GitHub) to be able to use it.

XSLT

XQuery is a powerful language, but there are tasks that can be solved just as well with XSLT, and sometimes even more easily. For instance, for complex XML transforms you can either use XQuery (with typeswitch constructions) or XSLT. Whether transformations are more easily achieved in XQuery or XSLT is a contentious issue, with many experts firmly preferring one over the other. Fortunately eXist supports both, so you may decide for yourself which you find easier. The most basic examples of using XSLT in eXist can be found in “Hello XSLT”.

For executing XSLT, eXist 2.1 uses Saxon HE (Home Edition) version 9.4.0.7 by default. If you need to upgrade to Saxon’s commercial PE (Professional Edition) or EE (Enterprise Edition), you can replace the existing Saxon libraries in $EXIST_HOME/lib/endorsed with their respective PE or EE counterparts and the accompanying license file. If you need a different XSLT processor, you can configure it in $EXIST_HOME/conf.xml.

An important consequence of using an external XSLT processor (and not one that is truly part of the eXist core) is that XSLT scripts run in isolation from the rest of the environment. The documents the XSLT processor works on are passed wholesale, but their database context (most importantly, indexes) is lost. No index-based query optimization is performed. So, be careful in designing the interaction between XQuery and XSLT: it’s best to leave the querying to your XQuery scripts and use XSLT for transformation only.

Embedding Stylesheets or Not

Stylesheets can be fully embedded in your XQuery code. Example 10-1 shows an XQuery script that runs an embedded stylesheet for checking the XSLT system property information (and can find out whether the Saxon version changed in the eXist version you’re using).

Example 10-1. Get the XSLT processor information

xquery version "1.0" encoding "UTF-8";

declare option exist:serialize "method=html media-type=text/html indent=no";

declare variable $page-title as xs:string := "XSLT processor information";

declare variable $xslt as document-node() := document {

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

xmlns:xs="http://www.w3.org/2001/XMLSchema"

xmlns:fn="http://www.w3.org/2005/xpath-functions"

exclude-result-prefixes="#all">

<xsl:variable name="SystemProperties" as="xs:string+"

select="('xsl:vendor',

'xsl:vendor-url',

'xsl:product-name',

'xsl:product-version')"/>

<xsl:template match="/">

<XsltInfo>

<xsl:for-each select="$SystemProperties">

<Info property="{{.}}">

<xsl:value-of select="system-property(.)"/>

</Info>

</xsl:for-each>

</XsltInfo>

</xsl:template>

</xsl:stylesheet>

};

<html>

<head>

<meta HTTP-EQUIV="Content-Type" content="text/html; charset=UTF-8"/>

<title>{$page-title}</title>

</head>

<body>

<h1>{$page-title}</h1>

<ul>

{

for $info in transform:transform(<dummy/>, $xslt, ())//Info

return

<li>{string($info/@property)} = {string($info)}</li>

}

</ul>

</body>

</html>

NOTE

Note the double curly braces in the XSLT stylesheet in <Info property="{{.}}">. We want to use the XSLT attribute-value template mechanism here, but a single curly brace would make XQuery kick in and try to interpret the contents as an XQuery expression. Doubling the curly braces works around this: they are passed as single curly braces to the XML document we’re defining.

We embedded the XSLT stylesheet in our XQuery script here to show you how this works. This is useful for small scripts, but if your XSLT is longer, it is better to store it in a resource of its own and reference it (see “Invoking XSLT with the Transform Extension Module” for details). There are also other advantages to not embedding XSLT within your XQuery:

§ Most XML IDEs provide content suggestion/completion when editing XSLT stylesheets (e.g., by proposing elements or showing function declarations). If you write your stylesheet embedded in an XQuery script, the IDE most likely cannot provide such help due to the mixed-content model.

§ When your code becomes sufficiently complex, you will probably want to test the stylesheet separately from the surrounding XQuery code. This is much easier when the stylesheet is a separate resource; for example, you may want to use XSpec to execute your stylesheet against a series of behavior-driven development (BDD)–style tests.

§ When your stylesheet is separate, it is possible to run it through an XSLT debugger when you are trying to diagnose a problem. Such a debugger is available for Saxon in the oXygen XML Editor.

§ Separate XSLT stylesheets can often have their compiled form cached, making repeated invocations faster.

Invoking XSLT with the Transform Extension Module

Performing XSLT transformations from your XQuery code can be done with eXist’s transform extension module. For instance:

transform:transform(

<input><text>hello XSLT</text></input>,

'xmldb:exist:///db/myapp/convertinput.xslt',

<parameters><param name="type" value="basic"/></parameters>

)

The first argument is the node tree to transform; the second is the URI or document element of the stylesheet. The third parameter passes the external parameter type=basic to the stylesheet (which you can reference in the XSLT with a global <xsl:param name="type"/>).

The transform extension module offers two approaches for doing a transformation:

transform:stream-transform

Directly streams the result of the transformation to the output stream, returning the empty sequence (). It is most commonly used as the final transformation step for converting XML into HTML.

The only thing you’ll see in your output is the output of transform:stream-transform; everything else is ignored. So, this is usually the last statement in a script.

WARNING

transform:stream-transform works only from within the REST Server; it does not work in a RESTXQ context. See “Building Applications with RESTXQ”.

transform:transform

Passes the result of the transformation back to you as a node tree.

Both functions have the same parameter list:

$node-tree as node()*

The node tree to transform.

WARNING

At present eXist relies on an external XSLT processor, so the node tree has to be serialized to a byte stream, passed to the XSLT processor, and reparsed before it can be processed. This adds some overhead to the transformation and can have an impact when you’re using XSLT on very large documents from eXist.

$stylesheet as item()

The stylesheet to apply. This can be either a node tree containing a valid XSLT stylesheet, or a URI referencing an XSLT stylesheet. URIs to stylesheets residing in the database must be specified as XMLDB URIs (i.e., start with xmldb:exist://).

When you’re passing stylesheets by URIs, the stylesheet is cached, speeding up performance of the invocations that follow.

$parameters as node()

Optional parameters for the stylesheet, as described in the next section.

$serialization-options as xs:string (optional)

Optional serialization options to apply to the result. Must be in the same format as the exist:serialize option (refer to “Serialization”).

There is one additional serialization option: xinclude-path. This specifies the base path for expanding XIncludes (if any). More information about XInclude can be found in “XInclude”.
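As a minimal sketch of the streaming variant, the following script passes a stored document through a stored stylesheet and streams the result to the client. The collection paths, the stylesheet name, and the type parameter are hypothetical placeholders, and the script assumes it is called through the REST Server:

```xquery
xquery version "1.0";

(: Hypothetical locations; replace them with resources from your own application. :)
let $input := doc('/db/myapp/data/report.xml')
let $stylesheet := 'xmldb:exist:///db/myapp/report-to-html.xslt'
let $parameters :=
    <parameters>
        <param name="type" value="basic"/>
    </parameters>
return
    (: Streams the transformation result to the output stream and returns (). :)
    transform:stream-transform($input, $stylesheet, $parameters)
```

Because the stylesheet is referenced by an XMLDB URI rather than passed as a node tree, repeated calls should benefit from the stylesheet caching described above.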

Passing XSLT Parameters

You can pass (string) parameters to your stylesheet by constructing an XML fragment as follows and passing it as the third argument to the transform function:

<parameters>

<param name="par1" value="value of par1"/>

<param name="par2" value="value of par2"/>

<!-- ...any further parameters -->

</parameters>

You reference these in your stylesheet by specifying global parameters with the same names. For instance:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:param name="par1"/>

<xsl:param name="par2"/>

<xsl:template match="/">

<p>Values passed were <xsl:value-of select="$par1"/>

and <xsl:value-of select="$par2"/></p>

</xsl:template>

</xsl:stylesheet>

There are two special parameters defined:

exist:stop-on-error

If this parameter is present and set to yes, an XQuery error is generated if the XSLT processor reports an error.

exist:stop-on-warning

If this parameter is present and set to yes, an XQuery error is generated if the XSLT processor reports a warning.

Most errors emitted by the XSLT processor are of the category fatal and will stop the processing anyway.
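The special parameters are passed like any other parameter. The following sketch assumes an XQuery 3.0 environment (for try/catch) and a hypothetical stylesheet location; it turns an error reported by the XSLT processor into a small XML message instead of letting the query fail:

```xquery
xquery version "3.0";

let $parameters :=
    <parameters>
        <param name="par1" value="value of par1"/>
        <!-- Ask eXist to raise an XQuery error when the XSLT processor reports an error -->
        <param name="exist:stop-on-error" value="yes"/>
    </parameters>
return
    try {
        transform:transform(
            <input><text>hello XSLT</text></input>,
            (: hypothetical stylesheet location :)
            'xmldb:exist:///db/myapp/convertinput.xslt',
            $parameters
        )
    } catch * {
        (: $err:description is the standard XQuery 3.0 error description variable :)
        <error>Transformation failed: {$err:description}</error>
    }
```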

Invoking XSLT by Processing Instruction

Another way of invoking XSLT is by adding a <?xml-stylesheet type="text/xsl" href="..."?> processing instruction at the top of your output. The href attribute should contain a reference to an XSLT stylesheet. Relative names are taken from the location of the originating XML.

Although normal use for this processing instruction is for triggering client-side XSLT processing (in the browser), in an eXist context it triggers server-side XSLT processing. This means the client will see the output of the transformation.

There is a nice example of this in the eXist demo application. If you look in /db/apps/demo/data, you’ll find some Shakespeare plays marked up in XML. At the top of these XML files is the processing instruction <?xml-stylesheet href="shakes.xsl" type="text/xsl"?>. The same collection holds the referenced shakes.xsl document, which contains an XSLT stylesheet that renders a play in HTML. See this in action by browsing to, for instance, http://localhost:8080/exist/rest/db/apps/demo/data/macbeth.xml.

Checking for and invoking an XSLT stylesheet by processing instruction is enabled by default. You can, however, disable it with serialization options; see “Post-processing serialization options”.

Stylesheet Details

Here are some final details about using XSLT from within eXist:

§ If you get a puzzling Java NullPointerException when trying to transform something using a URI for the stylesheet, it probably means that the URI was incorrect. This rather unspecific error message makes it hard to find out why the code isn’t working. A common mistake here is to forget to start the URI with xmldb:exist:// when you want to specify a stylesheet in the database!

§ The serialization options set in the XSLT stylesheet itself (with xsl:output) will not be used.

§ xsl:include and xsl:import work as expected. Relative filenames are resolved against the location of the stylesheet.

§ The XPath doc function in a stylesheet works as expected: it loads a document. Relative paths are resolved against the location of the stylesheet (not the location of the calling XQuery file). Like in XQuery, doc silently returns an empty sequence, (), when the referenced document is not an XML file.

§ The collection function does not behave as you would expect: it does not return any direct information about the collections in the database. It can, however, be used as Saxon intended it. For more information about this, please refer to the Saxon documentation.

§ You can use the XSLT xsl:result-document instruction to create files on the filesystem only; writing to the database (with an XMLDB URI) results in an error. Relative paths are resolved against $EXIST_HOME (so <xsl:result-document href="test.xml"> will result in a file $EXIST_HOME/test.xml, which is probably not where you want an output file to be written).

XInclude

By default, eXist performs XInclude processing during the serialization phase. In a nutshell, this means that xi:include elements are replaced with what they refer to (the namespace prefix xi must be bound to the namespace http://www.w3.org/2001/XInclude). XInclude is an official W3C standard.

XInclude is primarily intended for reusing XML or XHTML code fragments. Therefore, a use case for XInclude is an application that outputs pages with a fixed menu and navigation bar: you can insert the XHTML code for these parts using XInclude. However, in eXist more complicated scenarios are also possible, like including the output of XQuery scripts, or partial documents.

A first simple XInclude example can be found in “Hello XInclude”.

XInclude processing is switched on by default. If you want to turn it off, you can do so by using serialization options; see “Post-processing serialization options”.

eXist’s implementation of the XInclude standard is not complete. Its limitations are:

§ You can’t use it to include raw text. Only XML is supported.

§ XPointers are restricted to XPath. Additional features, like points and ranges, are not supported.

Including Documents

Including a document is easy—just add a reference to it in the href attribute, as in the following examples:

<xi:include href="includethisxml.xml"/>

<xi:include href="file:///file/from/filesystem.xml"/>

<xi:include href="http://somewhere.com/xmlfeed"/>

If you don’t include a scheme (like http:// or file://), the document is included using the same scheme as the master document (either database or filesystem). Relative paths are resolved from the location of the master document.

You can limit the output of the include by using an xpointer attribute holding a limited XPointer expression. eXist supports two constructions:

Node identifier

If you only specify the identifier of a node, the output will be limited to that node. An identifier can be set by an xml:id attribute or an attribute that is marked as type ID by an attached DTD. For instance, if we have the document:

<Lines>

<Line xml:id="L1">Line 1</Line>

<Line xml:id="L2">Line 2</Line>

</Lines>

Specifying the XInclude as <xi:include href="..." xpointer="L1"/> (assuming the href is correct) will only include the first Line element.

XPath

The other construction supported is passing an XPath expression with the so-called xpointer scheme. Here are some examples:

<xi:include href="includethisxml.xml" xpointer="xpointer(//Line[1])"/>

<xi:include href="file:///file/from/filesystem.xml"

xpointer="xpointer(//customer[@id eq '123'])"/>

Including Query Results

If the href attribute references an XQuery script stored in the database, the script will be executed and the results included.

The executing script can get information about the master document with two variables (you don’t have to explicitly declare them as external unless you wish to):

$xinclude:current-doc

The name of the master document, without the collection path

$xinclude:current-collection

The collection for the master document

WARNING

These variables are implicitly declared by the XInclude processor. This means that if you want to use the same script outside of the XInclude context, you can’t use (reference) them!

Passing your own parameters is also possible. For instance, when you have an XInclude that looks like this:

<xi:include href="script.xq?par1=abcdef"/>

you can reference the par1 parameter in your script as $par1 by declaring it as an external variable:

declare variable $par1 external;

Limiting the output by using XPointer, as when including documents, is not possible for XQuery results.
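A minimal sketch of such an included script is shown here. It assumes the script is stored as script.xq next to the master document and is included with href="script.xq?par1=abcdef"; the p element it returns is only an illustration:

```xquery
xquery version "1.0";

(: Bound by the XInclude processor from the query string of the href attribute. :)
declare variable $par1 external;

(: Whatever the script returns replaces the xi:include element in the output. :)
<p>The value passed by the XInclude was: {$par1}</p>
```

Note that this script deliberately avoids the $xinclude:current-doc and $xinclude:current-collection variables, so the only thing tying it to the XInclude context is the external declaration of $par1.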

Error Handling and Fallback

If you try to include a resource that doesn’t exist, an error will be generated. You can prevent this by specifying an xi:fallback element:

<xi:include href="includethisxml.xml">

<xi:fallback><p>XML not found!</p></xi:fallback>

</xi:include>

Validation

Validation involves checking an XML document against a grammar document, like a DTD (document type definition) or an XML schema, and determining whether it conforms to this grammar. eXist can validate documents in two ways:

Implicit validation

Validates an XML document when it is stored into the database. Any parsing error stops the document from being stored.

Explicit validation

Validates documents from within XQuery code, using the validation extension module.

Implicit Validation

Implicit validation is (if turned on) performed when a document is being stored into the database. eXist will search for an appropriate grammar document, validate the incoming XML document against it, and reject or accept the incoming document based on the validation results. You can turn implicit validation on or off for the full database or specific collections.

Implicit validation is useful when you want to make absolutely sure that all the stored content is valid. However, some interesting limitations apply:

§ Implicit validation can only be performed using XML schemas or DTDs, and not for instance with RELAX NG.

§ The catalogs used for finding the grammar documents are globally defined for the entire database. There is no way to use a specific catalog for a particular collection or application.

§ You cannot specify that a certain collection should only accept documents validating against a specific grammar. The only thing you can specify is that the documents must be valid according to the global set of grammar documents available in the catalog. So, a scenario forcing a collection to hold, for instance, only DocBook files is not possible with this mechanism.

This makes implicit validation a somewhat coarse-grained mechanism. However, there are use cases where it can help make your application more robust, for instance, when your database is serving a single application and all data files must validate against a specific set of XML schemas or DTDs.

Controlling implicit validation

To turn implicit validation on or off for the full database, you have to edit the $EXIST_HOME/conf.xml configuration file (and restart eXist afterward). Search for the following fragment:

<validation mode="no">

<entity-resolver>

<catalog uri="..."/>

</entity-resolver>

</validation>

The mode attribute determines whether implicit validation is on or off. It can have one of the following three values:

no

Implicit validation is off.

yes

Implicit validation is on. All XML documents are validated and rejected if they do not pass. If an appropriate XML schema or DTD cannot be found (see “Specifying catalogs for implicit validation”), the document is rejected also.

auto

Implicit validation is applied only when an appropriate XML schema or DTD can be found. Otherwise, the document is accepted.

To tune implicit validation for a specific collection (and its subcollections), you have to do the following:

1. The database has a system collection, /db/system/config. Repeat the database collection structure here, leading up to the collection for which you want to specify the implicit validation. So, when you want to turn on implicit validation for /db/myapp/data, create the collection /db/system/config/db/myapp/data.

2. Create an XML file here called collection.xconf with the following contents:

<collection xmlns="http://exist-db.org/collection-config/1.0">

<validation mode="..."/>

</collection>

3. Fill in the appropriate value for the mode attribute, as described earlier.

Specifying catalogs for implicit validation

The validation element in the $EXIST_HOME/conf.xml file also contains the URIs of the catalog files that eXist uses for implicitly validating documents. A catalog file specifies:

§ Mappings from system or public IDs to DTD grammar documents

§ Mappings from namespaces to XML schema grammar documents

eXist works with v1.0 OASIS catalog files. For an example, have a look at the default eXist catalog file in $EXIST_HOME/webapp/WEB-INF/catalog.xml.

Catalog files must be specified in the $EXIST_HOME/conf.xml file as in this example:

<validation mode="no">

<entity-resolver>

<catalog uri="${WEBAPP_HOME}/WEB-INF/catalog.xml"/>

<catalog uri="xmldb:exist:///db/myapp/myapp-catalog.xml"/>

</entity-resolver>

</validation>

All uri attributes must point to valid OASIS catalog files:

§ By default, the catalogs are on the filesystem. Use an XMLDB URI (like in the second catalog element in the preceding example) to specify a catalog that is stored in the database.

§ You can use ${EXIST_HOME} to point to $EXIST_HOME and ${WEBAPP_HOME} to point to $EXIST_HOME/webapp.

Explicit Validation

Explicit validation allows you to perform validation from your code using the validation extension module. eXist provides three different parsers for this:

JAXP

This parser (called JAXP because internally it uses the Java javax.xml.parsers interface) validates documents using Xerces2. XML schemas (v1.0) and DTDs are supported through Xerces. Implicit validation uses the JAXP validator.

JAXV

This parser (called JAXV because internally it uses the Java javax.xml.validation interface) validates documents using the validation facility built into the Java standard library. Only XML schemas are supported.

Jing

This validation is based upon James Clark’s Jing parser. It supports XML Schema, RELAX NG (both full and compact), Schematron (v1.5), and Namespace-based Validation Dispatching Language (NVDL).

This leaves you with the problem of which one to use. If you are unsure, the general advice is to use JAXP for XML schema− and DTD-based validation and Jing for all other types.

Performing explicit validation

You can perform explicit validation using the functions from the validation extension module. This module has separate functions for all three parsers (JAXP, JAXV, and Jing).

There are also generic validation functions that try to select the best parser for you, based on the type of grammar document that you provide. We’ll describe these generic functions here; the functions for the specific parsing types are more or less the same. For specifics, please refer to the online XQuery function documentation.

There are two generic validation functions:

validation:validate

This will validate your input document and return true or false depending on the result.

validation:validate-report

This will validate your input document and return an XML fragment describing the result. Use this if, for instance, you have to provide detailed feedback to the end user about the validity of some input.

These functions will choose a parser based on the type of grammar document: if this is a DTD or XML schema, the JAXP (Xerces) parser is used, otherwise Jing is used.

WARNING

These functions do not produce the correct results for XML where the root element is not in a namespace. In that case, use the specific functions instead (e.g., validation:jing-report).

Both functions have the following arguments:

$instance as item()

This is the input document to validate. You can specify either a URI (data type xs:anyURI), an element, or a document node.

$grammar as xs:anyURI

This argument is optional. It is used to determine the grammar document. Specifying the grammar in $grammar can be done in one of four ways:

§ If you don’t specify $grammar, the catalogs defined for implicit validation (see “Specifying catalogs for implicit validation”) are used.

§ If the $grammar URI ends with .dtd (DTD), .xsd (XML Schema), .rng (RELAX NG), .rnc (RELAX NG Compact), .sch (Schematron), or .nvdl (NVDL), it is assumed to reference a grammar document of that type.

§ If the $grammar URI ends with .xml it is assumed to be an OASIS catalog file, which is used to further determine the grammar document.

§ If the $grammar URI ends with a /, it is assumed to be the name of a collection. eXist will search for an appropriate grammar in that collection and its subcollections.

The report returned by validation:validate-report for a valid document will look like this:

<report>

<status>valid</status>

<namespace>http://myapp.com/namespace</namespace>

<duration unit="msec">51</duration>

</report>

For an invalid document, it will contain one or more error messages. For example:

<report>

<status>invalid</status>

<namespace>http://myapp.com/namespace</namespace>

<duration unit="msec">6</duration>

<message level="Error" line="10" column="29">

cvc-complex-type.2.4.a: Invalid content was found ...

</message>

</report>
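As a sketch of how such a report might be used, the following script validates a stored document against an XML schema and turns the report into a short HTML fragment. The document and schema paths are hypothetical, and the script assumes the root element of the instance is in a namespace (see the earlier warning about the generic functions):

```xquery
xquery version "1.0";

(: Hypothetical locations of the instance document and its grammar. :)
let $instance := xs:anyURI('/db/myapp/data/order.xml')
let $grammar := xs:anyURI('/db/myapp/schemas/order.xsd')
let $report := validation:validate-report($instance, $grammar)
return
    if ($report//status = 'valid') then
        <p>The document is valid.</p>
    else
        <ul>
        {
            (: One list item per message element in the report. :)
            for $message in $report//message
            return
                <li>Line {string($message/@line)}, column {string($message/@column)}: {normalize-space($message)}</li>
        }
        </ul>
```

If only a yes/no answer is needed, validation:validate can be called with the same two arguments.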

Grammar management in the JAXP (Xerces) parser

This applies to the JAXP (Xerces) parser type only: to speed up validation, grammar documents are loaded, compiled, and held in the cache. This is usually fine, but there might be situations where you want a little bit more control over this (for instance, when you’re developing grammars).

eXist provides the following XQuery functions for working with the grammar cache:

validation:clear-grammar-cache

Clears the grammar cache and returns the number of deleted grammars

validation:pre-parse-grammar

Parses one or more XML schema documents or DTDs and adds them to the grammar cache

validation:show-grammar-cache

Returns an XML fragment describing the contents of the grammar cache
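While developing a schema, a sequence like the following could be used to force a changed grammar to be reloaded. This is a sketch only: the schema path is hypothetical, and the results of the first two calls are bound to variables merely to ignore them:

```xquery
xquery version "1.0";

(: Drop all cached grammars; the function returns the number of deleted grammars. :)
let $deleted := validation:clear-grammar-cache()
(: Load the current version of a hypothetical schema into the cache. :)
let $parsed := validation:pre-parse-grammar(xs:anyURI('/db/myapp/schemas/order.xsd'))
return
    (: Return an XML description of what the cache now contains. :)
    validation:show-grammar-cache()
```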

Collations

Collations are the mechanism used for comparing strings. By specifying a collation, you can make comparing strings language-specific (a.k.a. locale-specific). For instance, in a default comparison of 'në' with 'ni', 'ni' comes first, because the Unicode code for an ë is greater than the Unicode code for an i. However, if you compare these words with a collation for a language that uses diacriticals (like Dutch, German, or French), things get reversed because the ë is treated like an e.

Supported Collations

eXist supports the following collations:

http://www.w3.org/2005/xpath-functions/collation/codepoint

This is the default collation that uses the Unicode code points. Internally, the basic Java string comparison and search functions are used.

http://exist-db.org/collation?lang=...&strength=...&decomposition=...

Or for short: ?lang=...&strength=...&decomposition=... (the strength and decomposition parameters are optional). This specifies a language-specific collation:

§ The lang parameter selects the language using an ISO 639-1 language code like en, en-US, de, nl-NL, or fr.

You can find out which languages are supported by calling the util:collations extension function.

§ The strength parameter value must be one of primary, secondary, tertiary, or identical.

§ The decomposition parameter value must be one of none, full, or standard.

What exactly these parameters do is a deep and rather separate subject that we’re not going to handle here. It has to do with the way Unicode is built up, and with the canonicalization of accented Unicode characters. If you don’t know what this is about, you most likely don’t need to. A good place to start looking for more information is the Unicode site.

Specifying Collations

There are several ways to work with collations:

§ You can specify a default collation for your XQuery script in its prolog:

declare default collation "?lang=de-DE";

§ The FLWOR expression’s order by clause has a collation keyword for specifying the collation:

for $w in $list-of-words

order by $w collation "http://exist-db.org/collation?lang=nl-NL"

return

$w

§ Several functions have collation arguments—for instance, contains and ends-with.

WARNING

Lots of standard string functions, like contains and ends-with, accept an optional third collation parameter. Although you can certainly use this functionality, it may stop the expression from being optimized and indexes from being exploited!
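To see the effect of a collation, you could sort the same words twice, as in the following sketch. The word list and the wrapper elements are illustrative only. Note that when a collation URI with more than one parameter is written as an XQuery string literal, the ampersand has to be written as the entity reference &amp;amp;:

```xquery
xquery version "1.0";

let $words := ('ni', 'në')
return
    <comparison>
        <codepoint>
        {
            (: Default collation: plain Unicode code point order. :)
            for $w in $words
            order by $w collation "http://www.w3.org/2005/xpath-functions/collation/codepoint"
            return $w
        }
        </codepoint>
        <dutch>
        {
            (: Language-specific collation; & must be escaped as &amp; in the literal. :)
            for $w in $words
            order by $w collation "http://exist-db.org/collation?lang=nl-NL&amp;strength=primary"
            return $w
        }
        </dutch>
    </comparison>
```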

XSL-FO

XSL Formatting Objects (XSL-FO) is an XML vocabulary to transform XML into formatted media, often PDF. To turn XSL-FO XML into PDF you need an XSL-FO formatter (or renderer), such as the open source Apache FOP or a commercial one like Antenna House Formatter or RenderX XEP. eXist has the ability to connect directly with several XSL-FO renderers. Our examples will use the open source Apache FOP formatter, but they should work for any formatter supported by eXist.

NOTE

eXist has standard connectors for the Apache FOP, RenderX, and Antenna House formatters. You can change formatters by placing the JAR files in $EXIST_HOME/lib/user and changing the processor adapter within the definition of the XSLFOModule module configuration in $EXIST_HOME/conf.xml.

It is also possible to add support for any third-party FO processor to eXist by writing a simple SAX adapter in Java that implements org.exist.xquery.modules.xslfo.ProcessorAdapter and making it available on the classpath (e.g., adding it to $EXIST_HOME/lib/user).

Usually, an (XML) application that wants to present something in PDF creates, from some data source, the XSL-FO XML. This can be done by XSLT, XQuery, or any other way you like. The resulting XSL-FO document is then passed to the XSL-FO formatter for final processing. If your XSL-FO document doesn’t contain any errors, a PDF is produced.

Performing the final XSL-FO transformation is trivial. We assume here that the XSL-FO document creation is already complete and that the final XSL-FO document is available from somewhere in the database. The following example transforms this document to PDF and displays it in your browser:

let $xsl-fo-document as document-node() := doc('some-xsl-fo.xml')

let $media-type as xs:string := 'application/pdf'

return

response:stream-binary(

xslfo:render($xsl-fo-document, $media-type, ()),

$media-type,

'output.pdf'

)

The xslfo:render function does the trick: it transforms the XSL-FO instructions into a (binary, xs:base64binary) PDF document and returns this. This is picked up by response:stream-binary, which sends it to your browser. Because the Internet media type is set to application/pdf, it will (hopefully) show up as a nicely formatted PDF document. There is an example in the accompanying source code (see “Accompanying Source Code”) that does exactly this.

The parameters for the xslfo:render function are:

$document as node()

The XSL-FO document to render.

$mime-type as xs:string

The requested output’s Internet media type. In most cases this will be application/pdf. Please refer to your XSL-FO formatter’s documentation to find out if other Internet media types are supported also.

$parameters as node()

Parameters to pass to the formatter. The format, a parameters element with param children, is exactly the same as that used for passing parameters to an XSLT transformation (see “Passing XSLT Parameters”). Recognized rendering parameters are author, title, keywords, and dpi.

$config-file as node() (optional)

An optional formatter configuration file. Please refer to your XSL-FO formatter’s documentation for more information about this.
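The following sketch extends the earlier rendering example with some of the recognized rendering parameters. The parameter values and the document name are placeholders, and how the formatter uses these parameters is an assumption that depends on the formatter configured:

```xquery
xquery version "1.0";

let $xsl-fo-document := doc('some-xsl-fo.xml')
let $media-type := 'application/pdf'
(: Same format as the parameters for an XSLT transformation; values are illustrative. :)
let $parameters :=
    <parameters>
        <param name="author" value="Jane Example"/>
        <param name="title" value="Quarterly report"/>
        <param name="keywords" value="report, example"/>
    </parameters>
return
    response:stream-binary(
        xslfo:render($xsl-fo-document, $media-type, $parameters),
        $media-type,
        'output.pdf'
    )
```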

XForms

XML Forms (XForms) is an XML standard developed by the W3C to provide the next generation of forms for the Web: it splits apart the data model from the presentation of that data model, so that you may focus on each independently. XForms are a major component of XRX web application architectures.

XForms are not freestanding expressions; rather, they must be embedded into a host document. Originally, XForms were expected to become the forms for XHTML 2.0. While the XHTML 2.0 Working Group has expired and XHTML 2.0 has been superseded by HTML5 and its XHTML expression, the XForms standard is still being actively developed and of course may be embedded into HTML5 documents as an alternative or complement to HTML forms.

You may be wondering when you would consider using XForms instead of HTML forms. Our advice would be to use XForms when you need to collect anything more than a couple of trivial fields. XForms can provide automated validation and correction hinting of form values and enable you to collect your form responses into a complex, structured XML document that can be saved directly into eXist and/or further processed with XQuery or XSLT. One of the great advantages of XForms is that the same form that is used for collecting data can later be used to edit that data (when provided with the collected XML document as its instance). Another advantage is that if your forms need to perform calculations, either for display or within an instance for submission (such as calculating totals in a spreadsheet), XForms provides clever dependency rules that enable these values to be automatically recalculated when a dependency in the graph changes.

There are two main classes of XForms processors:

Server-side processors

A server-side XForms processor renders XForms markup into another form (such as HTML, CSS, or JavaScript) on demand, when a request for a form is made to the server. The main advantage of the server-side approach is that you only transmit a rendered representation of the XForm to the client; you don’t need to transmit all of the data of the model behind the form. Server-side processors often blur the strict client/server boundary, as the JavaScript (or other code) that some of them generate on the server in fact often runs on the client.

Client-side processors

A client-side XForms processor instead typically operates inside a web browser, as either a plug-in or a JavaScript library. When the browser receives XForms markup, the XForms processor modifies the page DOM to enable the browser to render the XForm as intended and handle interactions and events. The main advantages of the client-side approach are that you do not require any special server-side processing, and you distribute the processing of forms to the client.

This chapter is not a comprehensive explanation of XForms itself, but instead is meant to show how you can use XForms with eXist. For in-depth information about XForms, check out Micah Dubinko’s book XForms Essentials (O’Reilly) and Dan McCreary’s XForms wikibook.

eXist provides facilities for both server-side processing through betterForm, which is embedded in eXist, and client-side processing through XSLTForms, which is available as an EXPath package for eXist. We will take a look at how each of these may be configured and used shortly.

At this point it is also worth mentioning the excellent Orbeon Forms. Orbeon is an open source (LGPL v2.1–licensed) server-side XForms processor that ships with an embedded eXist instance. There is also a commercial and supported version available. One of the major features of Orbeon is that it provides a pipeline language called XPL that enables you to easily create XForms, deliver them over the Web, and then save the results into eXist. It is also possible to configure Orbeon to use a separate eXist server instead of its own embedded instance. Orbeon is a separate project that deserves a book in its own right, so it will not be discussed further here; however, if you are interested in XForms and eXist, it is well worth evaluating.

XForms Instances

An XForm may have one or more instances within its model; these instances define the model aspect of the MVC architecture behind XForms. Simply put, each instance can be considered a standalone XML document that provides data to the form, for the purposes of display, capture, or influencing behavior. Ultimately, it is usually an instance (XML document) that is stored into eXist when the user submits the form. Typically, in a simple XForm the instances are hardcoded as either documents that have structure but no content, or documents with content that is to be edited; however, with eXist, you have several ways to make the instance data available to your form in a more dynamic manner.

Instances and the REST Server

Each instance within the model of an XForm need not be inlined. Rather, an instance can be retrieved from an external URI—and what better place to retrieve your XML instance documents from than a native XML database like eXist?

So, rather than constructing your instance inline like so:

<xf:model>

<xf:instance xmlns="">

<company>

<name>eXist Solutions GmbH</name>

<registration>HRB 89454, Amtsgericht Darmstadt</registration>

<vatId>DE273180763</vatId>

<taxNum>007 232 51397 </taxNum>

</company>

</xf:instance>

...

</xf:model>

you could instead store your instance document into eXist and construct your instance like so:

<xf:model>

<xf:instance xmlns=""

src="http://localhost:8080/exist/rest/db/companies/exist-solutions.xml"/>

...

</xf:model>

While the end result is the same, there are several advantages to be gained from using the latter approach:

Content reuse

Your instance data can be reused in different applications, and may not necessarily be exclusive to your XForm.

Security

eXist provides an extensive security system and offers authorization and authentication for resources in the database. Therefore, you can separately manage the security constraints of your data and your forms, which may have different requirements.

Architecture

While referencing the URL of the instance still provides a static instance, it is a pattern that we can reuse to provide a dynamic instance instead.
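As a rough illustration of this pattern, the following Python sketch maps a database path to the REST Server URI used in this chapter's examples and produces the matching xf:instance declaration. The function names are hypothetical, and the host, port, and /exist/rest prefix are simply the defaults shown above; adjust them for your own installation.

```python
from xml.sax.saxutils import quoteattr

# REST Server base URI as used in this chapter's examples (an assumption
# about your installation, not a fixed value).
REST_BASE = "http://localhost:8080/exist/rest"


def rest_uri(db_path: str) -> str:
    """Map a database path such as /db/companies/x.xml to its REST URI."""
    if not db_path.startswith("/db"):
        raise ValueError("expected an absolute database path starting with /db")
    return REST_BASE + db_path


def instance_element(db_path: str) -> str:
    """Return an xf:instance element whose src points at the stored document."""
    # quoteattr adds the surrounding quotes and escapes characters such as "&".
    return '<xf:instance xmlns="" src=' + quoteattr(rest_uri(db_path)) + "/>"


print(instance_element("/db/companies/exist-solutions.xml"))
```

Generating the declaration from the database path keeps the form and the stored data loosely coupled: moving the document only requires changing one path.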

Instances and XQuery

We have seen how you may request an instance from eXist’s REST Server with XForms rather than inlining the instance content, but up to this point the instances have been static. Here we look at generating an instance dynamically using XQuery.

Imagine that in the database we have a collection of XML documents (/db/weather), one for each day, that describes the weather for that day. In our form we may wish to display some information about today’s weather. By sending a small piece of XQuery to the REST Server as part of an HTTP GET request, we can retrieve the correct weather document for our instance. Such a request to the REST Server may look like:

http://localhost:8080/exist/rest/db/weather?_query=

/weather[@date eq current-date()]

An XForms instance declaration to retrieve this instance would then look like Example 10-2.

Example 10-2. Instance retrieval by query submission

<xf:instance xmlns=""

src="http://localhost:8080/exist/rest/db/weather?_query=%2Fweather%5B%40date%20eq%20current-date()%5D&amp;_wrap=no"/>

NOTE

In the URL used in xf:instance/@src, we need to URL-encode the XQuery passed in the _query parameter to ensure that it is transmitted correctly. We have also added the parameter _wrap=no, as we want the matched XML document itself for our instance; by default, the REST Server would have wrapped it in an exist:result element.

See “Querying the database” for further information on submitting an XQuery to the REST Server.
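Encoding the query by hand is error-prone, so it can help to let a library do it. The following Python sketch, which assumes the same localhost REST Server as the examples above and uses a hypothetical function name, reproduces the encoded URI of Example 10-2 from the plain XQuery text.

```python
from urllib.parse import quote

# REST Server base URI from this chapter's examples (assumed installation).
REST_BASE = "http://localhost:8080/exist/rest"


def query_uri(collection: str, xquery: str, wrap: bool = False) -> str:
    """Build a REST Server URI that runs `xquery` against `collection`."""
    # safe="()" leaves the parentheses readable, as in Example 10-2; every
    # other reserved character, including "/", is percent-encoded.
    encoded = quote(xquery, safe="()")
    wrap_value = "yes" if wrap else "no"
    return f"{REST_BASE}{collection}?_query={encoded}&_wrap={wrap_value}"


uri = query_uri("/db/weather", "/weather[@date eq current-date()]")
print(uri)
```

When the resulting URI is placed in an XML attribute such as xf:instance/@src, the & separator has to be written as &amp;.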

Our next example is a reworking of Example 10-2, but rather than sending the XQuery to the database, it instead relies on the fact that we have already stored the XQuery into the database. Doing so allows us to later invoke the query from the REST Server by URI and have it executed.

So, if you were to store the following XQuery into the database at /db/weather.xq:

xquery version "1.0";

collection("/db/weather")/weather[@date eq current-date()]

then an XForms instance declaration to retrieve this instance would look like Example 10-3:

Example 10-3. Instance retrieval by stored query

<xf:instance xmlns=""

src="http://localhost:8080/exist/rest/db/weather.xq"/>

You have now seen how you can bring in instance data dynamically, but this is really just scratching the surface of what is possible. You can also send parameters to your stored XQuery to influence the XML it will produce for your instance. For further information, see “Executing stored queries” and “The request Extension Module”. As an alternative to stored query execution via the REST Server, you could retrieve an instance from a URI provided by a RESTXQ resource function; for further details, see “RESTXQ”.
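To make the idea of parameterized stored queries concrete, here is a small Python sketch that builds the URI for a stored query with request parameters. The parameter name date is purely hypothetical; the stored query would have to read it itself (for instance through the request extension module) for it to have any effect.

```python
from urllib.parse import urlencode

# REST Server base URI from this chapter's examples (assumed installation).
REST_BASE = "http://localhost:8080/exist/rest"


def stored_query_uri(query_path: str, **params: str) -> str:
    """Return the REST URI of a stored query, with optional request parameters."""
    uri = REST_BASE + query_path
    if params:
        # urlencode percent-encodes parameter names and values.
        uri += "?" + urlencode(params)
    return uri


# "date" is a hypothetical parameter that /db/weather.xq would need to handle.
print(stored_query_uri("/db/weather.xq", date="2014-01-31"))
```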

TIP

It is also possible to dynamically calculate the URI from which to retrieve instance data. Unfortunately, this cannot be done through the xf:instance directly, as XPath expressions are not allowed in the src attribute. However, this is possible through clever use of an xf:submission and event handling to replace instance content, as described at https://en.wikibooks.org/wiki/XForms/Read_and_write_with_get_and_put#Discussion.

XForms Submissions

Typically, you will want to store the completed result of your XForm somewhere, either for posterity or for further processing. The responsibility of an xf:submission is typically to submit an instance from the model using a method to a resource. Almost all XForms implementations support submission by HTTP GET, POST, and PUT, which is a great fit for use with eXist’s REST Server or RESTXQ APIs. Given that, you can easily have the result of your completed XForm stored into the database.

Submission to the REST Server

It is relatively trivial to design your XForm to store its result into an XML document in the eXist database, by simply modifying its xf:submission to HTTP PUT the instance into an XML document in a collection within the eXist database.

For example, the xf:submission shown in Example 10-4, when fired, would place the result of the XForm into the document /db/registration/result.xml within eXist.

Example 10-4. XForms submission to REST Server

<xf:submission id="s-save" method="put"

resource="http://localhost:8080/exist/rest/db/registration/result.xml"

replace="none">

<xf:action ev:event="xforms-submit-error">

<xf:message>Registration failed. Please fill in valid values.</xf:message>

</xf:action>

<xf:action ev:event="xforms-submit-done">

<xf:message>You have been registered successfully.</xf:message>

</xf:action>

</xf:submission>

Some advantages of this approach are:

Automatic collection creation

As we are doing an HTTP PUT to the REST Server, eXist will create any collections that do not yet exist but are required to store the document.

Create or update

If a document with the same URI does not yet exist in the database, it will be created. However, if a document with the same URI is already present, it will be overwritten with the new instance content.

The major disadvantage of this approach is that we can only create a single document in the database, when it’s likely we’ll want many users to fill out our form and the results to be stored into the database and/or further processed. Solving this will be discussed next.
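It may help to see what the submission in Example 10-4 amounts to at the HTTP level. The following Python sketch builds, without sending, an equivalent PUT request; the instance content and the function name are made up for illustration.

```python
import urllib.request


def build_put(resource: str, instance_xml: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that stores an instance document."""
    return urllib.request.Request(
        resource,
        data=instance_xml.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/xml"},
    )


request = build_put(
    "http://localhost:8080/exist/rest/db/registration/result.xml",
    "<registration><name>Example</name></registration>",  # hypothetical instance
)
print(request.get_method(), request.full_url)
# Against a running eXist instance, the request could be sent with:
#   urllib.request.urlopen(request)
```

Because the target URI is fixed, a second PUT from another user goes to exactly the same document, which is the overwrite behavior described above.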

Submission via XQuery

You have seen how you may store the result of an XForm directly into eXist via the REST Server without having to know anything more than XForms. However, this approach is quite limited, so we will now look at submission via XQuery to dynamically store and/or post-process the instance content.

By submitting the instance content to a stored XQuery via the REST Server, we have the full power of XQuery at our fingertips to help us decide how to then store the document into the database. Of course, we may also do some post-processing and assert some control over the result of the XForms submission by having the XQuery return an appropriate HTTP response to the submission. Now we will look at storing each instance submission into its own document in the database collection /db/registration.

Say you create the collection /db/registration, and then store the XQuery shown in Example 10-5 into the database at /db/registration.xq.

Example 10-5. Submission via stored query

xquery version "1.0";

import module namespace request = "http://exist-db.org/xquery/request";

import module namespace xmldb = "http://exist-db.org/xquery/xmldb";

let $doc-db-uri := xmldb:store( (: 1 :)

"/db/registration", (: 2 :)

(), (: 3 :)

request:get-data() (: 4 :)

)

return

(: 5 :)

<stored>

<dbUri>{$doc-db-uri}</dbUri>

<uri>http://{request:get-server-name()}:{request:get-server-port()}

{request:get-context-path()}/rest{$doc-db-uri}</uri>

</stored>

1

The XQuery function xmldb:store will store a document into a database collection in eXist.

2

We specify the collection /db/registration as the first argument to xmldb:store, which is the collection in which to store the document.

3

Note that the second argument to xmldb:store is the empty sequence; this tells eXist that we do not know the name of the document we wish to store. eXist will create a name for the document on our behalf; a random-number generator is used and the result is encoded into a hexadecimal string for use as the filename.

4

The function request:get-data will retrieve the body of an incoming HTTP POST or PUT request; in this case, it will be the instance content from the XForm.

5

We return to the XForm submission a simple XML document, which, although we do not act on it in our form here, could be used for instructing the XForm further.

An XForms submission declaration to submit the instance to this XQuery would then look like:

<xf:submission id="s-save" method="post"

resource="http://localhost:8080/exist/rest/db/registration.xq"

replace="none">

<xf:action ev:event="xforms-submit-error">

<xf:message>

Registration failed. Please fill in valid values.

</xf:message>

</xf:action>

<xf:action ev:event="xforms-submit-done">

<xf:message>You have been registered successfully.</xf:message>

</xf:action>

</xf:submission>

NOTE

We use the POST method here for our xf:submission instead of the PUT method used in Example 10-4, because the REST Server in eXist does not support stored query execution via HTTP PUT.
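For testing the stored query outside of a form, a client can POST an instance and read the stored element that Example 10-5 returns. The Python sketch below builds such a request and parses a sample response; the sample document name is invented, since eXist chooses the real one, and the function names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

QUERY_URI = "http://localhost:8080/exist/rest/db/registration.xq"


def build_post(instance_xml: str) -> urllib.request.Request:
    """Build (but do not send) the POST that submits an instance to the query."""
    return urllib.request.Request(
        QUERY_URI,
        data=instance_xml.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/xml"},
    )


def parse_stored(response_body: str) -> dict:
    """Extract dbUri and uri from the <stored> document the query returns."""
    root = ET.fromstring(response_body)
    return {"dbUri": root.findtext("dbUri"), "uri": root.findtext("uri")}


# A made-up response in the shape produced by Example 10-5.
sample = (
    "<stored>"
    "<dbUri>/db/registration/0a1b2c3d.xml</dbUri>"
    "<uri>http://localhost:8080/exist/rest/db/registration/0a1b2c3d.xml</uri>"
    "</stored>"
)
result = parse_stored(sample)
print(result["dbUri"])
```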

The disadvantage of the approach in Example 10-5 is that if the same user completes the form twice or hits Submit twice, you will end up with two XML documents in the database collection containing (most likely) the same instance content. There are many different ways to attack this problem, but they all involve being able to identify the user. Two possible approaches are as follows:

§ Have the user add a uniquely identifiable piece of information to a form field. When the form is submitted, you can check if this information from the instance is already present in the database; if so, you can fail the submission by returning an HTTP 403 Forbidden error code (e.g., response:set-status-code(403)).

§ Set up authentication and appropriate permissions, and have the user log in via XQuery before allowing her to access the XForm. In this way a session will be created on the server. You can use the username to determine if the user has already created a document in the /db/registration collection. If she already has a submission in the collection, simply return a 403 Forbidden response to the form submission.

In combination with this approach, you could generate the XForm using XQuery (as simply as calling doc). If the user already has a submission in the collection, instead of showing her the form, you can display a message or redirect the user to another page.
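Both approaches come down to the same decision: look up an identifier, and reject the submission if it has been seen before. The Python sketch below models only that decision with an in-memory dictionary standing in for the /db/registration collection; it is not eXist code, and the class name, identifiers, and the 200 success status are assumptions made for illustration.

```python
class RegistrationStore:
    """In-memory stand-in for the /db/registration collection."""

    def __init__(self):
        # Maps a user identifier to the instance that user submitted.
        self._by_user = {}

    def submit(self, user_id: str, instance_xml: str) -> int:
        """Return the HTTP status the stored query would send back."""
        if user_id in self._by_user:
            # A submission already exists for this user: refuse the new one.
            return 403
        self._by_user[user_id] = instance_xml
        return 200


store = RegistrationStore()
first = store.submit("alice@example.org", "<registration/>")
second = store.submit("alice@example.org", "<registration/>")
print(first, second)
```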

For further details on calling stored queries via the REST Server, see “Executing stored queries”. As an alternative to stored query execution via the REST Server, you could instead submit an XForm instance to a URI provided by a RESTXQ resource function, as discussed in “RESTXQ”.

Submission authentication

So far, all of our submission examples that store or update documents in eXist have ignored the issue of security (or assume that you have manually authenticated). Unfortunately, support in XForms 1.1 for authentication is terribly lacking. You should really be able to do basic HTTP authentication at the very least, but there is no function in XForms to Base64-encode your authentication credentials. There is function support in XForms for creating digests, so even better, you would hope that you could perform HTTP digest authentication. Alas, there is no way to handle the challenge from the server that provides the nonce that you need to reuse as part of your digest!

At present there is only one mechanism in XForms that is not eXist-specific and can be reliably used to authenticate with eXist (see Example 10-6). That mechanism involves your passing your username and password in clear text as part of the submission resource URI. Obviously, sending this information in clear text is not at all ideal! If you are using the betterForm processor, the processing happens on the server side, so this information will never leave your server. However, it is still not ideal, so for a betterForm-specific solution, see the next section. If you are using the XSLTForms processor, this information will be sent in clear text, but there is an alternative option covered in “XSLTForms”.

Example 10-6. Statically coded authentication

<xf:submission id="s-save" method="put"

resource="http://username:password@localhost:8080/exist/rest/db/registration

/result.xml" replace="none"/>

Perhaps slightly better is that through the use of an xf:resource element in the xf:submission, you could dynamically encode the username and password into the URI from form fields that the user has completed and that are present in an instance. See Example 10-7.

Example 10-7. Constructed authentication from form

<xf:submission id="s-save" method="put" replace="none">

<xf:resource

value="concat('http://', instance('auth')/Username,

':', instance('auth')/Password,

'@localhost:8080/exist/rest/db/registration/result.xml')"/>

</xf:submission>
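Two details of these examples are easy to overlook. First, a plain concat does not escape anything, so a password containing a character such as @ or : would produce a malformed URI. Second, the Basic authentication header that XForms cannot build is only a Base64 encoding of username:password, which is an encoding and not encryption. The Python sketch below illustrates both points; the function names and the sample credentials are hypothetical.

```python
import base64
from urllib.parse import quote


def credentialed_uri(user: str, password: str, host_and_path: str) -> str:
    """Embed credentials in a URI, percent-encoding reserved characters."""
    # safe="" also encodes ":", "@", and "/", which would otherwise be
    # misread as URI delimiters.
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host_and_path}"


def basic_auth_header(user: str, password: str) -> str:
    """Return the Authorization header value for HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


uri = credentialed_uri(
    "username", "p@ss:word", "localhost:8080/exist/rest/db/registration/result.xml"
)
header = basic_auth_header("username", "password")
print(uri)
print(header)
```

Either way, the credentials travel in a trivially reversible form, so these techniques are only reasonable over a connection you trust.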

betterForm

betterForm is an open source (BSD and Apache2 licensed) server-side XForms 1.1 processor written in Java. To say that betterForm only runs on the server side would be unfair; the majority of the processing happens server-side, but the server also generates JavaScript and HTML5 (or XHTML) for the web browser to represent your XForm UI. All UI interaction in the browser and subsequent incremental updating is processed by JavaScript with Ajax calls back to the server.

betterForm comes already bundled with eXist, and thus there is no installation or configuration required to start immediately working with XForms using betterForm and eXist. It really could not be simpler!

How does this work, you ask? The simple explanation is that betterForm acts as a filter between eXist and all HTTP traffic. If betterForm detects that eXist is returning an XML document that contains an XForm, it will transparently intercept this and replace the XForm with an HTML form, CSS, and JavaScript. Likewise, when the form UI is interacted with, or instances need to be submitted or updated, betterForm intercepts these requests to eXist and takes care of processing the state of the XForm before passing the request on to eXist.

By default eXist and betterForm are configured such that any documents stored into the database that are delivered over the URI /exist/apps can be intercepted and processed by betterForm. Remember that the URI /exist/apps is mapped onto the collection /db/apps by the XQuery URL rewriting controller (see “The controller-config.xml Configuration File”). Therefore, any documents containing XForms stored into the database collection /db/apps (or a subcollection of it) and requested by a URI starting with /exist/apps will be processed by betterForm.

ADDITIONAL TIPS FOR WORKING WITH BETTERFORM

Here are some hints and tips for working with betterForm effectively:

§ You can change the URI path that betterForm post-processes by adjusting the XFormsFilter url-pattern in $EXIST_HOME/webapp/WEB-INF/web.xml, after which you must restart eXist for the change to take effect. For example:

<filter-mapping>

<filter-name>XFormsFilter</filter-name>

<url-pattern>/apps/*</url-pattern>

</filter-mapping>

§ Should you wish to entirely disable betterForm post-processing, you may do so by changing filter.ignoreResponseBody to true in $EXIST_HOME/webapp/WEB-INF/betterform-config.xml, after which you must restart eXist for the change to take effect. For example:

<property name="filter.ignoreResponseBody" value="true"/>

§ If you wish to see what your XForm is doing within betterForm, you can enable the betterForm debugger. This will add an additional toolbar to your rendered XForms page, allowing you to introspect the host document, instances, and events. To enable the debugger, set betterform.debug-allowed to true in $EXIST_HOME/webapp/WEB-INF/betterform-config.xml, after which you must restart eXist for the change to take effect. For example:

<property name="betterform.debug-allowed"

value="true"

description="if true enables debug bar and event log viewer"/>

In addition, if you wish to monitor betterForm on the server and how it processes your XForms host documents, you can find its logfile in $EXIST_HOME/webapp/WEB-INF/logs/betterform.log.

§ Since version 2.1 of eXist, betterForm has provided an eXist connector. This connector may be used when you are running betterForm embedded in eXist. The connector enables betterForm to participate in the current user session of eXist. Instead of using http:// in the scheme of your URIs for talking to eXist, you can instead use exist:// and remove the hostname, port, and context from the URIs. For example, if you previously used http://localhost:8080/exist/rest/db/registration.xq as the URI for the src of your instance, you could now instead use exist://db/registration.xq.
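If you have many existing forms, rewriting their URIs for the eXist connector is a mechanical change. The Python sketch below performs the rewrite described in the last tip, dropping the scheme, hostname, port, and /exist/rest context. The function name is hypothetical, and the sketch deliberately handles only the path, since how the connector treats query strings is not covered here.

```python
from urllib.parse import urlsplit

# Context path of the REST Server in a default installation (an assumption).
REST_PREFIX = "/exist/rest"


def to_exist_uri(http_uri: str) -> str:
    """Rewrite an http:// REST Server URI into the exist:// form."""
    parts = urlsplit(http_uri)
    if not parts.path.startswith(REST_PREFIX + "/"):
        raise ValueError("not a REST Server URI: " + http_uri)
    # Keep only the database path, without its leading slash.
    db_path = parts.path[len(REST_PREFIX) + 1:]
    return "exist://" + db_path


print(to_exist_uri("http://localhost:8080/exist/rest/db/registration.xq"))
```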

There are many excellent betterForm examples at http://demo.betterform.de/exist/apps/betterform/dashboard.html. In particular, the Feature Explorer is useful for anyone learning XForms, regardless of whether you are using betterForm.

The source code of a small XForm for capturing details of someone using betterForm is provided at chapters/other-xml-technologies/better-form/test-xform.xhtml in the book-code Git repository (see “Getting the Source Code”).

To use the example, simply store the better-form/test-xform.xhtml document into the /db/apps collection in eXist. You can then display the form in a web browser by calling the document from eXist’s REST Server using a URI like http://localhost:8080/exist/apps/test-xform.xhtml. If all goes well, you should see the result of the XForm (after being processed by betterForm) rendered in your web browser (see Figure 10-1).

Figure 10-1. Address XForm with betterForm (debug mode)

XSLTForms

XSLTForms is an open source (LGPL v2.1–licensed) client-side XForms 1.1 processor written in XSLT 1.0 and JavaScript. XSLTForms relies on your web browser to execute the XSLT and JavaScript code that will translate your XForm XML into HTML that the browser can render and handle interactions and events for.

NOTE

While XSLTForms is described as a client-side processor and this is most often how it is deployed, it is entirely possible to process the XSLT part of XSLTForms server-side with eXist. You can do so either by using an XSLT transformation function from XQuery (see “XSLT”), or by using URL rewriting (see http://www.exist-db.org/exist/apps/doc/xforms.xml#D1.2.5.3). As these topics are covered elsewhere, they will not be discussed further here.

To install XSLTForms with eXist, you must install the EXPath package provided for use with eXist via the eXist dashboard. Visit http://localhost:8080/exist/apps/dashboard/ on your machine, and then install XSLTForms by clicking the Package Manager app and selecting Install for the XSLTForms Files package (see Figure 10-2).

Installing the XSLTForms Files package will download and extract the XSLTForms EXPath package into a new database collection at /db/apps/xsltforms.

Once the XSLTForms Files package is installed into eXist, you may simply store your XForm host document into the database (typically, if it is an HTML document, you should use the .xhtml extension to ensure it is delivered to the web browser with an application/xml Content-Type) and add the following processing instruction to the top of it:

<?xml-stylesheet

href="http://localhost:8080/exist/apps/xsltforms/xsltforms.xsl"

type="text/xsl"?>

As opposed to the absolute URI used in the preceding processing instruction, you may use a relative URI. For example, if you had stored your XForm into a subcollection of /db, you might use the following processing instruction:

<?xml-stylesheet href="../apps/xsltforms/xsltforms.xsl" type="text/xsl"?>
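Working out the relative href by hand gets tedious once forms live at different depths. As a small sketch, the following Python function computes it from the database path of the host document, assuming the stylesheet is at /db/apps/xsltforms/xsltforms.xsl as installed above and that both are requested through the same URI space. The /db/forms collection and the function name are hypothetical.

```python
import posixpath

# Location of the stylesheet after installing the XSLTForms Files package.
XSLTFORMS_XSL = "/db/apps/xsltforms/xsltforms.xsl"


def stylesheet_pi(host_document_path: str) -> str:
    """Return an xml-stylesheet PI with an href relative to the host document."""
    collection = posixpath.dirname(host_document_path)
    href = posixpath.relpath(XSLTFORMS_XSL, start=collection)
    return f'<?xml-stylesheet href="{href}" type="text/xsl"?>'


# /db/forms is a hypothetical subcollection of /db holding the XForm.
print(stylesheet_pi("/db/forms/test-xform.xhtml"))
```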

Figure 10-2. Package Manager: installing the XSLTForms Files package

ADDITIONAL TIPS FOR WORKING WITH XSLTFORMS

Here are some hints and tips for working with XSLTForms effectively:

§ By default, when directly requesting an XForms host document from the REST Server, eXist will attempt to process XML stylesheet processing instructions server-side. You should be able to disable this in $EXIST_HOME/conf.xml by changing the enable-xsl attribute on the serializer config element (in fact, the default in eXist is to disable this), but the setting does not seem to currently affect behavior in eXist 2.1. Alternatively, you may manually disable server-side processing on a per-request basis via the REST Server by adding ?_xsl=no to the URI of your XForm.

Likewise, should you wish to generate your XForm using an XQuery, you need to instruct the serializer not to expand the processing instruction when it serializes your XQuery by adding the following option to the prolog of the XQuery:

declare option exist:serialize

"method=xhtml media-type=text/xml process-xsl-pi=no";

§ eXist is configured such that betterForm will automatically try to render any XForm content serialized from eXist that was accessed via the /exist/apps URI of the REST Server. To disable this behavior, see “betterForm”.

§ If you wish to see what your XForm is doing in XSLTForms, you can add this processing instruction below the xml-stylesheet processing instruction in your XForm host document:

<?xsltforms-options debug="yes"?>

This will enable the XSLTForms debugger. From there, launching the Profiler is a useful way to view the current state of your model instances (via the built-in Instance Viewer).

§ eXist includes an authentication mechanism that is not specific to XForms but will work for XSLTForms served from eXist. The approach is for you to create an XQuery that logs users into eXist using xmldb:login. A user must visit this XQuery before being served the XForm host document from eXist. In this way the browser is furnished with an HTTP cookie representing the current user’s logged-in session, and any submissions subsequently performed by an XForm within the same session will have the same access rights to the database as that user would. With this mechanism, there is no need to encode a username and password into the resource URI of the xf:submission!
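The cookie-based mechanism in the last tip is what a browser does automatically, and it can be imitated from a script when testing. The Python sketch below prepares a cookie-aware opener and a login request; it assumes a stored query at /db/login.xq that reads user and password parameters and calls xmldb:login, all of which are hypothetical names, and nothing is sent here.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# REST Server base URI from this chapter's examples (assumed installation).
REST_BASE = "http://localhost:8080/exist/rest"


def session_opener():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login_request(user: str, password: str) -> urllib.request.Request:
    """Build a POST to a hypothetical /db/login.xq that calls xmldb:login."""
    body = urllib.parse.urlencode({"user": user, "password": password})
    return urllib.request.Request(
        REST_BASE + "/db/login.xq", data=body.encode("ascii"), method="POST"
    )


opener, jar = session_opener()
request = login_request("alice", "secret")
# Against a running eXist instance you would then call, in order:
#   opener.open(request)        # the response stores the session cookie in jar
#   opener.open(other_request)  # later requests resend that cookie
```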

Once you have XSLTForms installed and configured, you can simply store your XForms host documents into the database (typically as XHTML documents) and request them via the REST Server.

The source code of a small XForm for capturing details of someone using XSLTForms is provided at chapters/other-xml-technologies/xslt-forms/test-xform.xhtml in the book-code Git repository (see “Getting the Source Code”).

To use the example, simply install and configure XSLTForms as described previously and then store the xslt-forms/test-xform.xhtml document into the /db collection. You can then display the form in a web browser by calling the document from eXist’s REST Server using a URI like http://localhost:8080/exist/rest/db/test-xform.xhtml?_xsl=no. If all goes well, you should see the result of the XForm (after being processed by XSLTForms) rendered in your web browser with the XSLTForms debugger enabled, as shown in Figure 10-3.

Figure 10-3. Address XForm with XSLTForms (debug mode)