XQuery for eXist - eXist: A NoSQL Document Database and Application Platform (2015)

Chapter 6. XQuery for eXist

Most of your work using eXist will be done in the XQuery programming language. This chapter covers which parts of XQuery eXist does and does not support. It also describes some eXist-specific XQuery features, such as controlling serialization and the available pragmas.

eXist’s XQuery Implementation

Currently, 2.0+ versions of eXist support almost the full XQuery 1.0 specification (as eXist has done for years) and quite a lot of XQuery 3.0. This section will provide you with the details.

XQuery 1.0 Support

eXist implements almost all of the XQuery 1.0 specification, with the following exceptions:

§ eXist’s XQuery processor does not support the schema import and schema validation features. This is perfectly reasonable as they are defined as optional in the XQuery specification (validate and import schema). The database does not store type information along with the (values of) nodes; consequently it cannot know the typed value of a node and has to assume xs:untypedAtomic. This is compliant with the behavior defined by the XQuery specification.

§ You cannot specify a data type in an element or attribute test. eXist supports the node test element(test-node), but the test element(test-node, xs:integer) results in a syntax error.

NOTE

The absence of these features does not mean that eXist is not type-safe; it is, very much so. It only means that type checking based on schema imports is not implemented.

eXist tested its implementation against the official XQuery Test Suite (XQTS version 1.0.2). Of the more than 14,000 tests, it passed over 99%.

WARNING

eXist does not yet type-check the name of an element or attribute. So, strangely enough, you can write let $elm as element(a) := <b/> and eXist will accept it, although this is a relaxation of the XQuery specification. The advice is therefore not to use name tests in element or attribute data type specifications: use element() or attribute() instead of element(a) or attribute(b), since specifying a name implies type checking that, alas, never occurs.

XQuery 3.0 Support

New since version 2.0 is eXist’s support for XQuery 3.0. As of writing, this specification had reached Proposed Recommendation status and several partial implementations were available.

To enable the XQuery 3.0 support, start your XQuery program with:

xquery version "3.0";

XQuery 3.0 is a relatively new and probably not yet very well known standard. Therefore, the support eXist offers is handled in somewhat more detail next. For the exact details, please refer to the standard.

XPath 3.0 functions

Many of the extra functions defined in XPath and XQuery 3.0 are implemented. Among them are some very useful ones, like format-dateTime.

An exact list of what is available and what isn’t can be found with the XQuery Function Documentation browser in the dashboard. Browse the http://www.w3.org/2005/xpath-functions module.

try/catch

The XQuery 3.0 try/catch mechanism allows you to catch errors raised during execution. These can be errors raised by the XQuery engine (meaning your code did something wrong), or errors you’ve explicitly raised with the error function. The following example shows a try/catch expression that sets a variable to -1 if a division-by-zero error occurs:

let $result as xs:decimal :=

try

{

$something div $something-else

}

catch err:FOAR0001 { -1 }

This example tests for a very specific error, which is good, because we would like a warning when something unexpectedly goes wrong. If you want to test for all errors that can occur, change the err:FOAR0001 into an *.

NOTE

If you want to test for a specific error condition but don’t know its code, probably the easiest way to find it is to force the error and copy the error code reported back into the XQuery.

Inside the catch operand you have access to information about the error through a number of implicitly declared variables—$err:code, $err:line-number, $err:column-number, $err:description, $err:value, $err:module, and $err:additional. Please refer to the XQuery 3.0 specification for full details.
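As a minimal sketch of how these variables might be used, the following query forces a division-by-zero error and reports its details as an XML fragment. The element and attribute names (error, code, line) are made up for illustration, and the err namespace is declared explicitly on the assumption that your version of eXist may not predeclare the prefix:

```xquery
xquery version "3.0";

(: Standard namespace for XQuery error codes and the $err:* variables :)
declare namespace err = "http://www.w3.org/2005/xqt-errors";

try {
    (: Integer division by zero raises err:FOAR0001 :)
    1 div 0
}
catch * {
    (: Hypothetical report format; only the $err:* variables are standard :)
    <error code="{$err:code}" line="{$err:line-number}">
        { $err:description }
    </error>
}
```

Catching * is convenient for a diagnostic like this one; in production code, a specific error code such as err:FOAR0001 is usually the safer choice, for the reason given earlier.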

switch expression

The XQuery 3.0 switch expression implements that which in other languages is often called a case expression or, in XSLT, an xsl:choose. This example was copied from the XQuery 3.0 specification:

switch ($animal)

case "Cow" return "Moo"

case "Cat" return "Meow"

case "Duck" return "Quack"

default return "What's that odd noise?"

Higher-order functions

A higher-order function is a function that takes another function as a parameter or returns a function. The normal use case for this is mapping or filter functions. Here is an example:

declare function local:map-example($func, $list) { (: 1 :)

for $item in $list

return

$func($item)

};

let $f := upper-case#1 (: 2 :)

return

local:map-example($f, ("Hello", "world!")) (: 3 :)

1

We first define a function, local:map-example, that applies the function passed in its first parameter, $func, to every member of its second parameter, $list.

2

We then assign the upper-case function to $f. The #1 after the function name selects the version of upper-case that takes exactly one argument (functions with the same name can differ in their number of parameters).

3

Finally, we call local:map-example with our function and some input strings. It returns the expected sequence: "HELLO" followed by "WORLD!".

Higher-order functions are a serious subject in their own right and include topics such as inline functions, partial function application, closures, currying, and more. For further information, you can refer to the XQuery 3.0 specification and to this excellent eXist wiki article.
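To give a flavor of two of these topics, here is a small sketch of an inline function and a partial function application. The variable names ($add, $add-ten) are arbitrary and chosen only for this illustration:

```xquery
xquery version "3.0";

(: An inline (anonymous) function bound to a variable :)
let $add := function($a as xs:integer, $b as xs:integer) as xs:integer {
    $a + $b
}

(: Partial application: fix the first argument and leave the second
   open with the ? placeholder; the result is a one-argument function :)
let $add-ten := $add(10, ?)

return
    for $n in (1, 2, 3)
    return $add-ten($n)
```

This should return 11, 12, and 13. The partially applied function remembers the value 10, which is also a simple example of a closure.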

The simple map operator

The XQuery 3.0 bang operator ! (or simple map operator, as it is officially called) can be seen as a shorthand for simple FLWOR expressions. It applies the right-hand expression to each item in the sequence obtained by evaluating the left-hand expression. For instance:

(1 to 10) ! (. + 1)

is the same as:

for $i in (1 to 10) return $i + 1
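Simple map steps can also be chained. The following sketch (the input strings are arbitrary) uppercases each string and then appends an exclamation mark. Because ! binds more tightly than arithmetic and string operators, a right-hand expression that uses such an operator has to be wrapped in parentheses:

```xquery
xquery version "3.0";

(: Each step receives the items produced by the previous step as "." :)
("apple", "banana") ! upper-case(.) ! (. || "!")
```

This should return the two strings APPLE! and BANANA!.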

The string concatenation operator

The string concatenation operator || is a shorthand replacement for the concat function: it concatenates strings. For example, the following expression will be true:

'Hello ' || 'world' eq concat('Hello ', 'world')

Annotations

XQuery 3.0 allows annotations for functions and variables. This is used, for instance, to make them private (visible only in the enclosing module) or public:

declare %private variable $myns:only-i-can-see-this := 'secret';

declare

%public

function myns:do-something-public() {

(: some function body here:)

};

Within eXist, annotations are also used for RESTXQ (see “Building Applications with RESTXQ”).

Serialization

eXist now supports the new XQuery 3.0 manner of controlling serialization. For instance, this:

declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";

declare option output:method "xml";

declare option output:media-type "text/xml";

is exactly the same as (eXist’s incumbent nonstandard mechanism):

declare option exist:serialize "method=xml media-type=text/xml";

The supported options are also the same. More about serialization and the full list of options supported can be found in “Controlling Serialization”.

The group by clause

eXist has had a group by clause for its FLWOR expressions since 2006. Unfortunately, this was not compatible with the XQuery 3.0 group by clause, and so it was replaced in the 2.0 release with the official version. Here is an example:

let $data as element()* := (

<item>Apples</item>,

<item>Bananas</item>,

<item>Apricots</item>,

<item>Pears</item>,

<item>Brambles</item>

)

return

<GroupedItems>

{

for $item in $data

group by $key := upper-case(substring($item, 1, 1))

order by $key

return

<Group key="{$key}">

{$item}

</Group>

}

</GroupedItems>

The fruits are grouped and sorted based upon the uppercased first characters of their names. This returns:

<GroupedItems>

<Group key="A">

<item>Apples</item>

<item>Apricots</item>

</Group>

<Group key="B">

<item>Bananas</item>

<item>Brambles</item>

</Group>

<Group key="P">

<item>Pears</item>

</Group>

</GroupedItems>
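A detail worth knowing is that after the group by clause, a variable bound before it (such as $item above) no longer holds a single item but the sequence of all items in the current group. A minimal sketch that uses this to count the members of each group, with plain strings instead of elements and a made-up count attribute:

```xquery
xquery version "3.0";

for $name in ("Apples", "Bananas", "Apricots", "Pears", "Brambles")
(: Group on the uppercased first letter of each name :)
group by $key := upper-case(substring($name, 1, 1))
order by $key
return
    (: $name now holds every name in the current group :)
    <Group key="{$key}" count="{count($name)}"/>
```

This should produce three Group elements, with counts of 2 for A, 2 for B, and 1 for P.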

Other XQuery Extras

Besides eXist’s support for XQuery 1.0 and the majority of XQuery 3.0, it also has a few interesting features which are currently specific to its XQuery implementation.

The map data type proposed for XQuery 3.1

The map data type in eXist is essentially a key/value lookup table. Keys must be atomic values (e.g., xs:string, xs:integer, xs:date, etc.). Values can be anything from simple numbers to complete XML documents. Here is a basic example of creating and using a map:

let $map1 := map {

"a" := 1,

"b" := <XML>this is <i>cool</i></XML>

} return

($map1("a"), $map1("b"))

This will return:

1

<XML>this is <i>cool</i></XML>

Working programmatically with maps is possible through the map extension module. This module allows you to do everything from checking for the existence of keys up to constructing maps on the fly. Please refer to the online function documentation for more information.

NOTE

Maps are immutable, like any other XQuery variables. So, changing a map using the functions from the map extension module (e.g., calling map:remove) will create a new map.

There is also an article about maps on the eXist wiki.
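The immutability of maps can be illustrated with a short sketch. It uses the same := map syntax as the example above and the map:contains and map:remove functions; the explicit module import and its namespace URI are assumptions that may need adjusting for your version of eXist, and the map contents are arbitrary:

```xquery
xquery version "3.0";

(: Assumption: the map module is bound to this namespace in your eXist version :)
import module namespace map = "http://www.w3.org/2005/xpath-functions/map";

let $fruit := map { "apples" := 3, "pears" := 5 }

(: map:remove returns a new map; $fruit itself is unchanged :)
let $without-pears := map:remove($fruit, "pears")

return (
    map:contains($fruit, "pears"),
    map:contains($without-pears, "pears")
)
```

If the assumptions hold, this returns true followed by false: the original map still contains the key, while the new map does not.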

Java binding

eXist allows you to make arbitrary calls to Java libraries using the so-called Java binding. For example:

declare namespace javasystem="java:java.lang.System";

declare namespace javamath="java:java.lang.Math";

javasystem:getenv('JAVA_HOME'),

javamath:sqrt(2)

For security reasons, the Java binding is disabled by default. If you want to use it, edit $EXIST_HOME/conf.xml, search for the enable-java-binding attribute, set its value to "yes", and restart eXist for the change to take effect.

There are some specifics you need to know about when using the Java binding:

§ If the function name in XQuery contains a hyphen, the hyphen is removed and the character following it is converted to uppercase. So, a call in XQuery to to-string will call the Java method toString.

§ Java constructors can be called using the new function.

§ eXist adds a generic type, object, to its data-model, which is used for all Java objects.

§ Instance methods of a class (methods that work on a specific object, like most of the Java methods) must get the object reference as their first parameter.

§ When a method returns an array, it is converted to a sequence and you can iterate over it using a FLWOR expression.

Here is an example that will return a list containing the names of all files and subdirectories in the $EXIST_HOME directory:

declare namespace javafile="java:java.io.File";

let $fobject as object := javafile:new(system:get-exist-home())

return

for $file in javafile:list($fobject)

return

$file

NOTE

If you only want to get a list of files and directories, it is probably easier to use the file extension module instead of the Java binding.

XQuery Execution

There are some details you should be aware of regarding XQuery execution in eXist. These include:

Transaction boundaries

eXist is transactional only during updates to the database; that is, a single update either succeeds or fails atomically, not something in between, even if a crash occurs in the middle of the operation.

eXist is not transactional during the execution of a full XQuery script (like some other XQuery engines are). An XQuery script does not run in isolation, and updates made by it or by concurrently running neighbor scripts will immediately be visible. However, you can group updates into a single transaction; see the exist:batch-transaction pragma in “eXist XQuery Pragmas”.

Evaluation of expressions

eXist does not lazily evaluate expressions. For instance, a series of let expressions will all be evaluated, from top to bottom, even if some of the variables are never used again.

The reasoning behind this has to do with side effects, which XQuery officially doesn’t have, but which (as we all know) in a real-world program are a necessity. For instance, when you have a function that adds to a logfile, you want it executed even if you don’t do anything with its return value.

As a consequence, be careful computing expensive values that might never be used. It’s better to either defer this until you really need them (e.g., nesting them inside an if-then-else structure) or do something along the lines of:

let $expensive-value :=

if (...decide-whether-value-is-really-needed...)

then compute-value...

else ()
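A filled-in version of this pattern might look as follows. Both the local:expensive-value function and the $value-needed flag are hypothetical stand-ins for whatever your application actually computes and decides:

```xquery
xquery version "3.0";

(: Hypothetical stand-in for a costly computation :)
declare function local:expensive-value() as xs:integer {
    sum(1 to 1000000)
};

(: Hypothetical flag deciding whether the value is needed at all :)
let $value-needed := false()

(: The function is only called when the flag is true :)
let $expensive-value :=
    if ($value-needed)
    then local:expensive-value()
    else ()

return
    if ($value-needed)
    then $expensive-value
    else "skipped the expensive computation"
```

Since the let clause is always evaluated, the guard has to sit inside the expression that is bound to the variable, as it does here.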

Serialization

Although it may seem as though eXist works directly with XML, as in “text with a lot of angle brackets,” it does not. Internally, XML is represented as an efficient tree-structured data type. Only on the way out, in the final step, are the angle brackets added and the XML displayed once again as we know it. This process of changing the internal representation into something suitable for the outside world is known as serialization.

Controlling serialization is important: sometimes you may want XML, while at other times you want HTML and/or JSON. You may perhaps also want to set the Internet media type explicitly, or control indentation.

For the XSLT programmers among us who think this sounds familiar: you’re right. In XSLT, serialization is controlled likewise through the xsl:output element.

Controlling Serialization

There are a number of ways you can control serialization from within your XQuery scripts:

option exist:serialize

You can control serialization by adding a declare option exist:serialize statement to the XQuery prolog. For instance:

declare option exist:serialize

"method=html media-type=text/html indent=no";

The contents of the exist:serialize option are a whitespace-separated list of name/value pairs, as described in the following section. You do not have to define the exist namespace prefix. eXist automatically binds this to the appropriate http://exist.sourceforge.net/NS/exist namespace.

util:get-option, util:declare-option

These extension functions allow you to inspect and set the value of an XQuery script option programmatically. For instance, setting the serialization options can be done with:

util:declare-option("exist:serialize",

"method=html media-type=text/html indent=no")

XQuery 3.0 serialization settings

eXist now also supports the standard XQuery 3.0 way of controlling serialization. This is described in “Serialization”.

Serialization Options

This section will list all the serialization options that eXist supports.

General serialization options

The more general serialization options closely mimic the options of the same name available on the XSLT xsl:output command:

method=xml|microxml|xhtml|html5|text|json

Sets the principal serialization method.

The microxml method produces MicroXML as opposed to full XML. You can find out more about MicroXML from the W3C MicroXML Community Group.

The xhtml method makes sure that only the short form is used for elements that are declared empty in the XHTML specification. For instance, a br element is always returned as <br/>. In addition, if you omit the XHTML namespace from your XML, you can have the XHTML serializer inject it for you by setting the serialization option enforce-xhtml=yes.

If you specify the text method, only the atomized content of elements is returned: for example, <foo>this is content</foo> will return this is content. Namespaces, attributes, processing instructions, and comments are ignored.

For JSON and JSONP serialization options, see “JSON serialization”.

media-type=...

Indicates the Internet media type of the output. This is used to set the HTTP Content-Type header if the query is running in an HTTP context.

encoding=...

Specifies the character encoding used for serialization. The default is the encoding set in the XQuery declaration at the top of the program. If that is not set, the default is UTF-8.

indent=yes|no

Indicates whether the output should be indented.

omit-xml-declaration=yes|no

Specifies whether the XML declaration (<?xml version="1.0"?>) at the top of the output should be omitted.

doctype-public=... doctype-system=...

When at least one of these is present, a doctype declaration is included at the top of the output.

enforce-xhtml=yes|no

Forces all output to be in the XHTML (http://www.w3.org/1999/xhtml) namespace.

Post-processing serialization options

eXist can do post-processing of the XQuery result by processing xi:include elements and <?xml-stylesheet?> processing instructions referencing XSLT stylesheets. You can control this with the following options:

expand-xincludes=yes|no

Indicates whether the serializer should process any xi:include elements (see “XInclude”). The default is yes.

process-xsl-pi=yes|no

Indicates whether the serializer should process any <?xml-stylesheet type="text/xsl" href="..."?> processing instructions (see “Invoking XSLT by Processing Instruction”). The default is yes.

eXist-specific serialization options

eXist-specific options include the following:

add-exist-id=element|all

If you output elements that come from the database, eXist will add an attribute exist:id to them, showing the internal node identifier of each element. Setting this option to element will show only the node identifier of the top-level element; setting it to all will show all node identifiers.

There are functions in the util extension module to work with these identifiers.

highlight-matches=both|elements|attributes|none

When querying text with the full-text or NGram extensions, the query engine tracks the exact position of all matches inside text content. The serializer can later use this information to mark those matches by wrapping them into an exist:match element. Find more information about this in “Locating Matches”.

JSON serialization

JSON (JavaScript Object Notation) is a lightweight data-interchange format. eXist has a JSON serializer built in that you can enable by setting the serialization method to json (see “General serialization options”). There is one related serialization option:

jsonp=...

Produces JSONP (JSON with padding) output by wrapping the JSON output in the named function. For example, specifying jsonp=abc causes the output to be wrapped in the JavaScript function abc like so: abc({"hello": "world"}). This can be useful when you’re working around same-origin policies in some web browsers.

It is also possible to set the JSONP function dynamically by calling the function util:declare-option and passing in the function name; for example, util:declare-option("exist:serialize", "method=json jsonp=myFunctionName").

Here is a summary of how eXist performs the JSON serialization (see also the wiki entry on this subject):

§ The root element is absorbed: <root>A</root> becomes "A".

§ Attributes are serialized as properties, with the attribute name and its value.

§ An element with a single text child becomes a property whose value is the text child: <e>text</e> becomes {"e": "text"}.

§ Sibling elements with the same name within a parent element are added to an array: <A><b>1</b><b>2</b></A> becomes { "b" : ["1", "2"] }.

§ In mixed-content nodes, text nodes are dropped.

§ If an element has attribute and text content, the text content becomes a property: <A a="b">1</A> becomes { "A" : { "a" : "b", "#text" : "1" } }.

§ An empty element becomes null: <e/> becomes {"e": null}.

§ An element with name <json:value> is serialized as a simple value, not an object: <json:value>my-value</json:value> becomes "my-value".

Sometimes it is necessary to ensure that a certain property is serialized as an array, even if there’s only one corresponding element in the XML input. You can use the attribute json:array="true|false" for this.

By default, all values are strings. If you want to output a literal value—for example, to serialize a number—use the attribute json:literal="true".

The JSON prefix json should be bound to the namespace http://www.json.org. As an example, here is some XML:

<Root xmlns:json="http://www.json.org">

<Items>

<Item id="1">Bananas</Item>

<Item>CPU motherboards</Item>

</Items>

<Items>

<Item json:array="yes">Bricks</Item>

</Items>

<Mixed>This is <i>mixed</i> content</Mixed>

<Empty/>

<Literal json:literal="yes">1</Literal>

</Root>

And here is its JSON serialization:

{ "Items" : [{ "Item" : [{ "id" : "1", "#text" : "Bananas" },

"CPU motherboards"] }, {"Item" : ["Bricks"] }],

"Mixed" : { "i" : "mixed" }, "Empty" : null, "Literal" : 1 }

In addition to the JSON serializer in eXist, which attempts to convert XML into JSON with as little effort from the developer as possible, there are three other XQuery modules that enable you to work with JSON. The first two modules, JSON XQuery (see json) and JSONP XQuery (see jsonp), work in much the same way as the JSON serializer. The third module, XQJSON (see xqjson), which was written by John Snelson and adapted by Joe Wicentowski, is the newest JSON addition to eXist; it allows you to serialize XML to JSON as well as parse JSON back into XML so that you can round-trip your data.
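To try the built-in JSON serializer described earlier in this section, a query only needs to set the serialization method and return some XML. The following is a minimal sketch; the element names and the application/json media type are choices made for this illustration:

```xquery
xquery version "3.0";

declare namespace json = "http://www.json.org";

(: Serialize the query result as JSON instead of XML :)
declare option exist:serialize "method=json media-type=application/json";

<Result>
    <!-- Force an array even though there is only one Item -->
    <Item json:array="true">Bricks</Item>
    <!-- Output the value as a literal number rather than a string -->
    <Count json:literal="true">3</Count>
</Result>
```

Going by the rules listed earlier, the output should be an object with an Item property holding a one-element array and a Count property holding the number 3. The exact whitespace of the result may differ between versions.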

Controlling XQuery Execution

There are several mechanisms which give you control over the execution of your XQuery scripts.

eXist XQuery Pragmas

With XQuery pragmas, you can set implementation-specific options for parts of your code. The general syntax is:

(# pragmaname #) {

(: Your XQuery code block :)

}

eXist has the following pragmas:

exist:batch-transaction

Provides a way to combine multiple updates on the database into a single transaction. Only works for updates done through eXist’s XQuery update extension. For example:

(# exist:batch-transaction #) {

update delete $document/*/LogEntry[position() ge 10],

update insert $new-entry preceding $document/*/LogEntry[1]

}

exist:force-index-use

Useful for debugging index usage (see Chapters 11 and 12). Will raise an error if there is no index available for the given XQuery expression. This can help you to check whether indexes are correctly defined.

exist:no-index

Prevents the use of indexes on the given XQuery expression. Useful for debugging or for curiosity purposes (“How long does my query take without indexes?”). Also, sometimes it is more efficient to run without indexes than with—for instance, when a search isn’t very selective.

exist:optimize

Enables optimization for the given XQuery expression. If you’ve turned optimization off (with declare option exist:optimize "enable=no";, as discussed in “Other Options”), you can turn it on again for specific expressions with this pragma.

exist:timer

Measures the time it takes to execute the XQuery expressions within the pragma—for instance, (# exist:timer #) { count(//TEST) }.

To see the timer, you need to enable tracing in the $EXIST_HOME/log4j.xml configuration file (set <priority value="trace"/> for the root logger). You’ll see an entry like this in the $EXIST_HOME/webapp/WEB-INF/logs/exist.log file:

2012-09-12 15:01:29,846 [eXistThread-31] TRACE (TimerPragma.java [after]:63)

- Elapsed: 171ms. for expression: count([root-node]/descendant::{}TEST)

Limiting Execution Time and Output Size

You can control execution time and query output size by adding the correct declare option exist:... statement to the XQuery prolog:

declare option exist:timeout "time-in-msecs";

Indicates the maximum amount of time (specified in milliseconds) that a query is allowed to execute for. If this is exceeded, an error will be raised.

declare option exist:output-size-limit "size-hint";

Defines a limit on the maximum size of created document fragments. This limit is an estimation, specified in terms of the accumulated number of nodes contained in all generated fragments. If this is exceeded, an error will be raised.
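Put together, a query prolog using both limits might look like the following sketch. The values 5000 and 10000 and the query body are arbitrary choices for illustration, not recommended settings:

```xquery
xquery version "3.0";

(: Abort the query if it runs for more than 5 seconds (5000 ms) :)
declare option exist:timeout "5000";

(: Raise an error once generated fragments exceed roughly 10,000 nodes :)
declare option exist:output-size-limit "10000";

(: Example body: count every element stored in the database :)
<Elements>{ count(collection("/db")//*) }</Elements>
```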

Other Options

Here are some miscellaneous options you can set by adding a declare option exist:... statement to the XQuery prolog:

declare option exist:implicit-timezone "duration";

Specifies the implicit time zone for the XQuery context as defined in the XQuery standard. More information is available at http://www.w3.org/TR/xquery/#dt-timezone.

declare option exist:current-dateTime "dateTime";

If for some reason you don’t want to use your operating system’s date/time, you can specify your own using this option (it is merely there to enable some of the XQuery test suite cases to run).

declare option exist:optimize "enable=yes|no";

Use this to disable the query optimizer in eXist (the default is yes, of course). This is linked to the exist:optimize pragma; see “eXist XQuery Pragmas”.

XQuery Documentation with xqDoc

xqDoc is an effort to standardize XQuery documentation, much as Javadoc has done for Java. xqDoc works by reading specialized comments you insert into your XQuery code. A parser can then use these to extract additional information about your module, its (global) variables, and its functions. This information could then, for example, be used to display details about a module to the user. The eXist function browser is a good example of an implementation which uses xqDoc to achieve exactly that.

Here is an example of a little module containing xqDoc information:

xquery version "1.0" encoding "UTF-8";

(:~

: Example module with xqDoc information

:

: @version 1.0

: @author Erik Siegel

:)

module namespace xquerydoc="http://www.exist-db.org/book/XQueryDoc";

(:~

: Example dummy function

:

: @param $in The input to the function

:)

declare function xquerydoc:test($in as xs:string+) as xs:string

{

'Dummy'

};

All comments starting with (:~ are parsed by the xqDoc parser. Keywords in these comments start with an @ character. The exact syntax can be found on the xqDoc website.

eXist has an inspect extension module to work with xqDoc. The functions in this module return an XML representation of the module’s content, including possible annotations by the xqDoc comments. For instance, running inspect:inspect on the preceding example module returns:

<module uri="http://www.exist-db.org/book/XQueryDoc" prefix="xquerydoc">

<description> Example module with xqDoc information </description>

<author> Erik Siegel </author>

<version> 1.0 </version>

<variable name="xquerydoc:global" type="xs:string" cardinality="exactly one"/>

<function name="xquerydoc:test"

module="http://www.exist-db.org/book/XQueryDoc">

<argument type="xs:string" cardinality="one or more" var="in">

The input to the function</argument>

<returns type="xs:string" cardinality="exactly one"/>

<description> Example dummy function </description>

</function>

</module>

From this XML you could easily create any required HTML or PDF documentation.

eXist does not support the full xqDoc specification. If you need some specific xqDoc feature, please run some tests to see if it is present.