Working with the Database - eXist: A NoSQL Document Database and Application Platform (2015)

Chapter 5. Working with the Database

At its core, eXist is an XML database. It stores XML efficiently and makes fast querying possible. Besides XML, it is also capable of storing other file types. Although in its default configuration it doesn’t do much with them besides storing and retrieving, this capability is useful when you are building applications with eXist.

This chapter is about eXist’s database: what’s in it, how it’s structured, how you access and update its content, and (of course) how you query it.

The Database’s Content

In this section we will dive into the contents of the database.

Help: Where Is My XML?

Superficially, when you’re accessing the database using a WebDAV client, eXist’s database looks like a filesystem: you’ll see directories and files, and you can work with them directly as you would in any other filesystem.

Of course, under the surface things look very different. XML files are “ripped apart” (or “shredded”), indexed, and stored in a way that makes searching, indexing, and retrieval efficient. You can see this in action when you store a file in the database and reopen it: the indentation after reopening will probably look different. This is because the document was not stored as is, but rather as a tree structure according to the XML data model. The document was likely recreated using different indentation rules than those with which the original was created.

Related to this is one of the most frequently asked questions from eXist newcomers: “Where is my XML?” People look in the database storage directory $EXIST_HOME/webapp/WEB-INF/data/ (if the installation defaults were used) and see a bunch of *.dbx files. The stored XML is nowhere in sight. What makes things even more confusing is that stored non-XML files (binaries and queries) can be found in the fs subdirectory.

So where is the XML? Don’t worry. Although you can’t find it as a file, the XML document is stored inside the *.dbx files. eXist ensures that you can access the XML as though it is still a file and access and query it in an efficient way.

If you’re interested in how eXist performs this trick, please refer to Chapter 4.

Terminology

Let’s define some important terminology:

Collections

As noted previously, what is called a directory in a filesystem is called a collection inside eXist’s database. When you use, for instance, the WebDAV interface to look inside the database, you’ll see no difference.

The reason it is called a collection is linked to the definition of the XPath collection function. This function retrieves documents from something it calls a collection, but the XPath specification does not say exactly what this collection concept actually is. eXist uses an internal directory-like structure as the basis for its collections. Read more about this in “Collections” and “The collection Function”.

Resources

What is called a file in a filesystem is called a resource inside eXist’s database. Resources can be anything you usually store in a file: images, CSS files, XQuery scripts, and, of course, XML documents.

Documents

A resource containing a (well-formed) XML document is called a document.

Properties of Collections and Resources

Anything stored in the database has several properties attached. You can view and change these properties using, for instance, the Java Admin Client tool (see Figure 5-1).

Figure 5-1. Viewing and modifying properties using the Java Admin Client tool

Alternatively, you can work with the collection and resource properties through the dashboard’s collection browser or from within eXide. You can also view or change these properties programmatically from XQuery code with functions from eXist’s xmldb extension module. See more about this in “Controlling the Database from Code”.

On the Client tool’s Properties screen, you’ll see all of the following for the selected resource:

Internet Media Type

The Internet media type of a resource is first set by eXist when the resource is created, based on its content and/or file extension. However, if you create resources with the xmldb extension module (see “Controlling the Database from Code”) you have the option to control this yourself.

After creation, you can change the Internet media type of a resource programmatically. However, you can’t turn a non-XML resource into an XML one, or vice versa.

Collections do not have an Internet media type.

NOTE

The Internet media type of a resource (when not set explicitly) is determined from the configuration file $EXIST_HOME/mime-types.xml. This file maps file extensions to Internet media types (or MIME types, as they used to be known) and tells eXist when to treat a file as XML instead of binary data.

Created and Last Modified

Resources have both a created and a last-modified date and time; collections, however, have only a created date and time.

Owner, Group, Base Permissions, and Access Control List

The security settings for this resource. Find more information about this in Chapter 8.
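
As a rough sketch of the programmatic route, the following query reads a few of these properties for a single resource. It assumes that a resource test.xml exists in the collection /db/test (both names are purely illustrative) and that the xmldb functions used here (xmldb:get-mime-type, xmldb:created, and xmldb:last-modified) are available with these signatures in your eXist version; check the function documentation of your installation before relying on them.

```xquery
(: Illustrative location; replace with a resource that exists in your database :)
let $collection := "/db/test"
let $resource := "test.xml"
return
    <properties resource="{$collection}/{$resource}">
        <!-- Internet media type as stored in the database -->
        <media-type>{
            xmldb:get-mime-type(xs:anyURI(concat($collection, "/", $resource)))
        }</media-type>
        <!-- Creation and last-modification timestamps of the resource -->
        <created>{xmldb:created($collection, $resource)}</created>
        <last-modified>{xmldb:last-modified($collection, $resource)}</last-modified>
    </properties>
```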

System Collections

The database’s collection /db/system contains eXist system-specific information. Most of the information underneath this collection is maintained by eXist itself and there is usually no need for a “normal” user to access it, because there are extension functions for this.

For instance, in /db/system/security/exist/accounts you’ll find user information, and by querying the resources found there you could create a list of registered users and the user groups they participate in. However, we definitely advise against doing this! eXist does not guarantee that these files, or the format of their content, will remain stable in the future. This is internal configuration information and as such is subject to change without notice whenever a new version of eXist is released. The preferred alternative is to use the appropriate functions from the Security Manager and/or the xmldb extension modules, which should remain stable.

There is one important exception to this rule: the /db/system/config collection must be used to configure important properties for collections, such as indexes, triggers, and validation. Underneath /db/system/config you’ll find a partial copy of the database’s collection structure, with collection.xconf resources in some of the collections. We’ll give you more information about this in the chapters to come.

Addressing Collections, Resources, and Files

To work with documents and collections stored in the database, you need to be able to address them, point to them, and know their names. There are several ways to do this. Extra care should be taken when your resource and/or collection names contain spaces or special (e.g., accented) characters.

Use URIs

There is one extremely important thing you should be aware of when addressing resources and collections inside eXist’s database: eXist uses URL-encoded URIs for naming. This means that all reserved characters, according to the URL encoding rules (for more information, see http://tools.ietf.org/html/rfc3986#page-12), must be percent-encoded!

For instance:

§ A document that shows up as “an example.xml” (note the space) in the WebDAV browser must be addressed programmatically as an%20example.xml.

§ The document test.xml in the collection “/db/my app (test)” must be addressed as /db/my%20app%20%28test%29/test.xml.

§ When you request a list of documents inside a collection (with the function xmldb:get-child-resources), the names returned are URIs, and as such they are URL-encoded.

NOTE

If you are not extremely careful with this, your application will quickly become a mess. Always distinguish internally between names (useful for displaying document/collection names to the user) and URIs (useful for addressing the documents/collections).

Conveniently, eXist contains standard functions to transform a name into a URI, and vice versa:

xmldb:decode-uri

Decodes a URI into a name; that is, it changes all percent-encoded characters into their UTF-8 equivalent characters.

xmldb:encode-uri

Encodes a name into a URI; that is, it checks for reserved characters and changes them into the equivalent URL percent-encoding.

WARNING

Unfortunately, xmldb:encode-uri does not check for the optional xmldb:exist:// database prefix (see “XMLDB URIs”) and erroneously encodes xmldb:exist:///a/b/c into xmldb%3Aexist%3A/a/b/c, which is probably not what you want.
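
To make the distinction between names and URIs concrete, here is a minimal sketch. It assumes a collection whose display name is “my app (test)” exists directly under /db (an illustrative name; the query will fail if the collection is missing). The name is encoded once and the encoded form is used for all addressing; names are decoded again only for display.

```xquery
(: A display name with a space and parentheses, as a user would see it :)
let $name := "my app (test)"

(: Encode once, then use the encoded form for all addressing :)
let $encoded := xmldb:encode-uri($name)
let $collection-uri := concat("/db/", $encoded)

return
    <collection uri="{$collection-uri}"
                display-name="{xmldb:decode-uri(xs:anyURI($encoded))}">{
        (: Child names come back URL-encoded; decode them only for display :)
        for $child in xmldb:get-child-resources($collection-uri)
        return
            <resource uri="{$collection-uri}/{$child}"
                      display-name="{xmldb:decode-uri(xs:anyURI($child))}"/>
    }</collection>
```

Keeping both forms side by side in the output, as this sketch does, is one way to avoid mixing them up later in an application.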

Relative versus absolute paths

Using relative paths in eXist can be confusing. To help clarify when to use relative versus absolute paths, we have to distinguish between two situations:

Paths in a static context

These are paths resolved at compile time—for instance, an XQuery import module at clause, or an xsl:include or xsl:import in an XSLT stylesheet. These paths are resolved, as expected, against the location of the code that does the import or include. We strongly advise using relative paths here because it makes moving your code around much easier.

Paths in a dynamic context

These are paths resolved at run time—for instance, in code like doc("/db/myapp/data/data.xml"). These paths are resolved using what is called the base collection of a query. How this base collection is determined unfortunately depends on the way the query came to life. The rules are difficult to remember, confusing, and subject to change as eXist evolves over time.

Because the invocation of a query matters, a relative path is not guaranteed to always work the same way. So, our advice is to use absolute paths for addressing collections and resources in code whenever possible.

XMLDB URIs

Instead of writing a direct path to a database resource, such as /db/mycollection/test.xml, you can use a so-called XMLDB URI. For this, add the prefix xmldb:exist:// in front of the resource, as in xmldb:exist:///db/mycollection/test.xml. (Note that there are now three slashes after xmldb:exist:.) Used like this, both notations are equivalent and point to the same document. It’s a matter of preference which one to use: the XMLDB URI is somewhat longer but more specific, and therefore you might consider it more self-documenting. It is worth noting that when using the Java Admin Client to execute queries in embedded mode, you have to use the XMLDB URI. Likewise, when writing XQuery using oXygen with eXist, depending on the version, you may also need to use the XMLDB URI for eXist to recognize module import sources.

Accessing files

If you want to access a file on the filesystem (not in the database), use the file:// prefix, as in:

doc("file:///home/erik/test.xml")

WARNING

To manipulate the filesystem, eXist has a file extension module. In contrast to what you might expect, this module does not use the file:// prefix syntax.

The XPath Collection and Doc Functions in eXist

The XPath collection function is defined as returning a sequence of (usually document) nodes. The doc function is defined as returning a document node, or the empty sequence if it cannot find the document indicated by the passed URI. Both use a URI as a parameter. These are important functions for XML databases because they allow you to easily address subsets of your database content for further inspection or processing. The XPath standard defines their behavior as implementation-dependent, so we need to know how eXist handles them.

The collection Function

The collection function in eXist returns the set of resources residing in the collection identified by its URI parameter, including those in its subcollections, recursively. In other words, it will return a sequence containing all resources underneath a certain collection path in the database. If you want only the resources in the collection itself (without those in subcollections) you can use the extension function xmldb:xcollection instead.

Now, maybe to your surprise, collection returns not only the XML documents found, but all resources. To illustrate this, assume we have a collection called /db/test in which there are two files: test.xml (an XML file) and test.pdf (a PDF file). Now run the following query:

for $doc in collection("/db/test")

return

base-uri($doc)

The result will be /db/test/test.xml and /db/test/test.pdf.

Of course, besides getting their URIs with the base-uri function, you won’t be able to do much with the non-XML nodes returned by collection. However, getting a list of all resources can be quite useful in some cases—for instance, for showing the user a list of available content.

NOTE

You can also use functions from the xmldb extension module for iterating over the contents of collections. Read more about this in “Controlling the Database from Code”.

You can easily check whether a node points to an XML document by calling exists($doc/*), where $doc is a member of a sequence returned by collection as in the preceding example. This will only return true for XML documents.

WARNING

Remember, the behavior of the collection function is implementation-defined. This means that your XQuery’s calls to fn:collection may not behave the same on different platforms, which can introduce issues for code portability.
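
The differences described in this section can be explored with a small query such as the following sketch. It assumes the /db/test collection from the example above exists. It counts what collection returns recursively, counts what xmldb:xcollection returns for the collection itself, and lists only those results that pass the exists($doc/*) check. The actual numbers depend on your database contents and eXist version.

```xquery
let $path := "/db/test"
return
    <summary collection="{$path}">
        <!-- Everything underneath $path, including subcollections -->
        <recursive>{count(collection($path))}</recursive>
        <!-- Only what is stored directly in $path -->
        <direct-only>{count(xmldb:xcollection($path))}</direct-only>
        <!-- Keep only results that have a root element, i.e., XML documents -->
        <xml-documents>{
            for $doc in collection($path)
            where exists($doc/*)
            return <document uri="{base-uri($doc)}"/>
        }</xml-documents>
    </summary>
```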

The doc Function

If you know (or have computed) the URI to an XML document, the easiest and most straightforward way to address its content is using the XPath doc function. For example:

let $documenturi := "/db/myapp/contents.xml"

for $item in doc($documenturi)//Item

return ...

An interesting (and sometimes useful) behavior of the doc function in eXist is that when it gets passed the URI of a nonexistent or non-XML document, it silently returns the empty sequence (without throwing an error).
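
Because a missing document yields an empty result rather than an error, it can be worth making the check explicit when the difference matters. The sketch below uses the standard doc-available function for this; the document path and the Item element name are taken from the example above and are assumptions about your data.

```xquery
let $documenturi := "/db/myapp/contents.xml"
return
    if (doc-available($documenturi)) then
        (: The document exists and is XML: process it as usual :)
        <items count="{count(doc($documenturi)//Item)}"/>
    else
        (: Make the "nothing found" case visible instead of silently empty :)
        <missing uri="{$documenturi}"/>
```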

Querying the Database Using REST

The simplest and most often used way to access the database’s content is using eXist’s REST (a.k.a. REST-style or RESTful) interface. It allows you to query the database by firing HTTP requests at it. You can do this programmatically from another application or, for GET requests, by hand using a web browser. This section will examine eXist’s REST interface at a fairly basic level; a more thorough explanation (from a system integration point of view) can be found in “REST Server API”.

By default, to access the REST interface for a standard eXist setup, start the URLs with:

http://localhost:8080/exist/rest/

NOTE

You might be rightfully worried now about how your URLs are going to look when you’re building an application in eXist. The /exist/rest looks awful! Don’t worry: you can tune the URLs to your liking; Chapter 9 explains how to do this.

The database operations are, as in a true REST interface, mapped to the HTTP request methods GET, PUT, DELETE, and POST. GET is the one used most often.

Security

Of course, everything you do through the REST interface is subject to the strict security rules of eXist. For instance, if you’re not allowed to see a file (that is, you have no read permission), it will not show up when you request the contents of the containing collection in a GET request.

The default identity when you’re firing a request through this interface is the limited guest account. HTTP authentication is supported to change identity (see, for instance, http://www.httpwatch.com/httpgallery/authentication/).

Even with eXist’s strict security in place, allowing REST access on a production server can be scary: it is a powerful tool with lots of opportunities for misuse or inadvertent damage. You can turn it off completely, which is probably something you’d want to do on production servers (unless you really need direct REST access). See “Disabling direct access to the REST Server” for more information.

GET Requests

HTTP GET requests are the workhorses for accessing data and running XQuery scripts. Using GET requests is probably the most convenient way to execute queries stored in the database and/or retrieve XML and other contents from it: you can simply do it from your web browser.

For an HTTP GET request, eXist examines the remainder of the URL (by default the part after /exist/rest) and reacts in one of the following ways:

§ If a _query URL parameter is present (see the following), it will use this as the XQuery to execute and will return its result.

§ If the remainder of the URL points to a collection, it will return an XML fragment describing its content. For instance:

<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist">

    <exist:collection name="/db/test"

        created="2012-09-13T08:18:02.35+02:00"

        owner="guest"

        group="guest" permissions="rwxr-xr-x">

        <exist:resource name="test.xml"

            created="2012-09-13T08:18:25.088+02:00"

            last-modified="2012-09-13T08:18:25.088+02:00" owner="guest"

            group="guest" permissions="rw-rw-rw-"/>

        <exist:resource name="test.pdf"

            created="2012-09-13T08:18:29.9+02:00"

            last-modified="2012-09-13T08:18:29.9+02:00" owner="guest"

            group="guest" permissions="rw-rw-rw-"/>

    </exist:collection>

</exist:result>

§ If the remainder of the URL points to a file (XML or otherwise), the contents of the file are returned using the Internet media type stored in the database.

§ If the remainder of the URL points to an XQuery script (Internet media type application/xquery) with execute permission, it will be executed and the results returned.

A GET request accepts the following parameters:

_xsl=xsl-stylesheet-reference | no

Applies an XSLT stylesheet to the result of getting an XML resource or executing an XQuery script. The path is considered an internal database path, unless it contains an external URI (e.g., starts with http://). Setting this parameter to no disables all stylesheet processing.

WARNING

Applying an XSLT stylesheet in this manner always changes the response’s Internet media type to text/html.

_indent=yes | no

Specifies whether to indent (pretty-print) the returned XML. The default is yes.

_encoding=character-encoding

Indicates the character encoding to use. The default is UTF-8.

_query=XQuery-expression

Executes the given XQuery expression against the resource or collection that the URL points to.

_howmany=number-of-items

When you pass a query by the _query parameter and the result is a sequence, specifies how many items to return from the sequence. The default is 10.

_start=starting-position

When you pass a query by the _query parameter and the result is a sequence, specifies at which position to start returning results from the sequence. The default is 1.

_wrap=yes | no

Indicates whether returned query results (and collection contents) should be wrapped in an exist:result root element (the namespace prefix exist is bound to the namespace http://exist.sourceforge.net/NS/exist). The default is yes for collection contents and queries passed in the _query parameter, and no otherwise.

_source=yes | no

Indicates whether the query should display its source code. You must explicitly allow this by adding the name of the query file to $EXIST_HOME/descriptor.xml in the allow-source section (and then restart eXist for the changes to take effect).
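
As an illustration of how these parameters combine, the following sketch builds a GET URL that runs an ad hoc query and asks for items 11 through 15 of the result. The /db/test collection and the //Item query are assumptions for the sake of the example. The request is sent with the httpclient extension module; from a browser you would simply paste the resulting URL.

```xquery
let $base := "http://localhost:8080/exist/rest/db/test"
let $query := "//Item"

(: The query text must be percent-encoded before it goes into the URL.
   Inside an XQuery string literal, an ampersand is written as &amp; :)
let $url := concat(
    $base,
    "?_query=", encode-for-uri($query),
    "&amp;_start=11",
    "&amp;_howmany=5",
    "&amp;_wrap=yes"
)
return
    httpclient:get(xs:anyURI($url), false(), ())
```

Stepping _start forward by the value of _howmany on each request gives a simple form of paging through a large result.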

PUT Requests

HTTP PUT requests can be used to store or update documents in the database. The remainder of the URI (the part after /exist/rest) is used as the target location of the document.

As an example of how to do this, we will use eXist’s own httpclient extension module to store an XML document into the database:

let $URI :=

'http://localhost:8080/exist/rest/db/apps/exist-book/data/put-example.xml'

return

httpclient:put(xs:anyURI($URI), <new-file-by-rest-put/>, false(), ())

NOTE

Be aware that this is an example to show you how to use HTTP PUT. If you want to store a document into the database from your own XQuery program, it is more efficient to avoid the HTTP overhead and use xmldb:store instead (see “Creating Resources and Collections”).
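
For comparison, a minimal sketch of the same store operation without HTTP might look like this. It assumes the target collection /db/apps/exist-book/data already exists and that the current user may write to it.

```xquery
(: Store the same document directly: collection URI, resource name, content :)
xmldb:store(
    "/db/apps/exist-book/data",
    "put-example.xml",
    <new-file-by-rest-put/>
)
```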

DELETE Requests

An HTTP DELETE request does exactly what its name implies: it deletes the collection or resource pointed to by the remainder of the URL (the part after /exist/rest) from the database. The returned HTTP status code will indicate whether the deletion was successful.
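
A hedged sketch of a DELETE request, again using the httpclient extension module, is shown below. It targets the document stored by the PUT example and inspects the statusCode attribute of the response. Since the request runs as the guest account unless you pass authentication headers, expect a failure status if guest is not allowed to remove the resource.

```xquery
let $URI :=
    'http://localhost:8080/exist/rest/db/apps/exist-book/data/put-example.xml'
let $response := httpclient:delete(xs:anyURI($URI), false(), ())

(: The HTTP status is reported on the response wrapper element :)
let $status := xs:integer($response/@statusCode)
return
    if ($status ge 200 and $status lt 300) then
        <deleted uri="{$URI}"/>
    else
        <failed uri="{$URI}" status="{$status}"/>
```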

POST Requests

HTTP POST requests can be used for three distinct purposes:

§ If the remainder of the URI (the part after /exist/rest) references an XQuery program stored in the database, it will be executed.

§ If the body of the POST request is a valid XUpdate document, the XUpdate processor will be invoked to update the database. How this works, along with the XUpdate XML format, is described in “XUpdate”.

§ If the body of the POST request is XML and uses the http://exist.sourceforge.net/NS/exist namespace, it is interpreted as a so-called extended query request. These can be used to post complex XQuery scripts that are too large or too unwieldy to pass in a _query parameter of a GET request. The result will be wrapped in an exist:result element. The XML format for extended query requests is described in “Extended query request XML format”.

For example, let’s fire an HTTP POST request containing an extended query request at our own database using the eXist httpclient extension module:

let $URI := 'http://localhost:8080/exist/rest/doesnotmatter'

let $query := 'for $i in 1 to 10 return <Result index="{$i}"/>'

let $request :=

<query xmlns="http://exist.sourceforge.net/NS/exist" start="3" max="3">

<text>{$query}</text>

</query>

return

httpclient:post(xs:anyURI($URI), $request, false(), ())

The result will be something like:

<httpclient:response xmlns:httpclient="http://exist-db.org/xquery/httpclient"

statusCode="200">

<httpclient:headers>

<httpclient:header name="Date" value="Mon, 17 Sep 2012 12:45:02 GMT"/>

<httpclient:header name="Set-Cookie"

value="JSESSIONID=4mpvajj2ez99sa2ik31kzjvq;Path=/exist"/>

<httpclient:header name="Expires" value="Thu, 01 Jan 1970 00:00:00 GMT"/>

<httpclient:header name="Content-Type" value="application/xml;

charset=UTF-8"/>

<httpclient:header name="Transfer-Encoding" value="chunked"/>

<httpclient:header name="Server" value="Jetty(7.5.4.v20111024)"/>

</httpclient:headers>

<httpclient:body mimetype="application/xml; charset=UTF-8" type="xml">

<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist"

exist:hits="10"

exist:start="3" exist:count="3">

<Result index="3"/>

<Result index="4"/>

<Result index="5"/>

</exist:result>

</httpclient:body>

</httpclient:response>

The major part of the output is generated by the httpclient:post function. For the result of our extended query POST request, look at the contents of the httpclient:body element. Notice that since the query returns a sequence and we specified a start position of 3 and a maximum of 3 items, we get only the third through fifth items of the sequence.

Extended query request XML format

An extended query request has the following format:

<query xmlns = "http://exist.sourceforge.net/NS/exist"

start? = integer

max? = integer

cache? = "yes" | "no"

session-id? = string >

text

properties?

</query>

§ start contains the index (counting from 1) of the first item to be returned.

§ max is the maximum number of items to return. Together with start, it allows you to control which part of the results you’ll see.

§ Setting cache to yes will have the query start a session. The session ID will be returned in the result (in the exist:session attribute on the exist:result element) and must be passed in the session-id attribute on subsequent requests.

§ session-id allows you to pass a previously created session identifier.

The text element is used to pass the query. You will most likely want to enclose its contents in a CDATA section (<text><![CDATA[ ... ]]></text>) to avoid XML parsing errors.

The properties element can be used to set serialization properties like indenting and character encoding:

<properties>

<property name = string

value = string >*

</properties>

A list of serialization properties can be found in “Serialization Options”.
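
Putting the pieces of this format together, a request might be sketched as follows. The query text sits in a CDATA section, so its angle brackets and curly braces reach the server untouched instead of being evaluated while the request is constructed. The indent property is used as an example of a serialization property, and the target URL is an assumption based on the default setup.

```xquery
let $request :=
    <query xmlns="http://exist.sourceforge.net/NS/exist" start="1" max="5">
        <text><![CDATA[
            for $i in 1 to 20
            where $i mod 2 = 0
            return <Even n="{$i}"/>
        ]]></text>
        <properties>
            <property name="indent" value="yes"/>
        </properties>
    </query>
return
    httpclient:post(
        xs:anyURI('http://localhost:8080/exist/rest/db'),
        $request,
        false(),
        ()
    )
```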

Ad Hoc Querying

There are several ways to query the database on a more “ad hoc” basis; that is, not really to create applications with, but rather to explore and experiment.

Querying using eXide

To query using eXide, simply fire up eXide from the dashboard, type your query in an empty document, and click Run (or press Ctrl-Return), as shown in Figure 5-2.

Figure 5-2. Ad hoc querying in eXide

Querying using the eXist Client tool

eXist’s Java Admin Client tool can be used to query the database too. Click on the binoculars icon and the XQuery dialog will open, as Figure 5-3 shows.

Figure 5-3. The XQuery dialog in eXist’s Java Admin Client tool

You can type queries (or open them from files), view the results, and even get a trace of their execution.

Updating Documents

A database wouldn’t be of much use if you couldn’t update its contents. Of course, you can always replace complete documents with new, updated ones, but for larger documents that’s not very efficient. Therefore, XML databases—and eXist is no exception—provide mechanisms to update specific nodes within XML documents directly.

A document update in eXist has a relatively large overhead. It creates a transaction, updates the XML, and updates the relevant indexes. This makes updating expensive. For this reason, try to do as much as possible in a single update. For instance, creating a node and then adding its attributes all in separate updates is not a good idea. So, try to avoid this (we’ll get to the exact syntax later):

let $elm as element() := doc('/db/Path/To/Some/Document.xml')/*

return (

update insert <NEW/> into $elm,

update insert attribute x { 'y' } into $elm/*[last()],

update insert attribute a { 'b' } into $elm/*[last()]

)

Instead, create the node with its attributes as an XML fragment first and update your document in one go:

let $elm as element() := doc('/db/Path/To/Some/Document.xml')/*

return

update insert <NEW x="y" a="b"/> into $elm

Or:

let $elm as element() := doc('/db/Path/To/Some/Document.xml')/*

let $new-element as element() := <NEW x="y" a="b"/>

return

update insert $new-element into $elm

Also be aware that updating documents is not done in isolation. Updating something that another running XQuery script is using at the same time may lead to unanticipated results.

eXist has two mechanisms to directly update XML documents:

The eXist XQuery update extension

This is an eXist-specific extension to XQuery (based on an early draft of the XQuery update specification) that allows you to write XQuery statements that alter the database. This is the preferred mechanism for performing in-place updates.

XUpdate

You specify your desired alterations to the database in an XML document using the XUpdate syntax, and then pass it to the XUpdate processor to have them applied. The XUpdate specification is no longer maintained and so using this mechanism is discouraged, but it’s still present mainly for backward compatibility reasons.

eXist’s XQuery Update Extension

eXist defines its own XQuery language constructs to update documents in the database. These language constructs follow a proposal from Patrick Lehti.

To get a feeling for it, here is an example of an XQuery update adding a log message to an XML logfile:

update insert <LogEntry>Something happened...</LogEntry> into

doc('/db/logs/mainlog.xml')/*

The following (somewhat more complicated) example makes sure that the number of messages does not exceed 10, and that the most recent message is on top:

let $document := doc('/db/logs/mainlog.xml')

let $newentry := <LogEntry>Something happened...</LogEntry>

return (

update delete $document/*/LogEntry[position() ge 10],

if (exists($document/*/LogEntry[1]))

then update insert $newentry preceding $document/*/LogEntry[1]

else update insert $newentry into $document/*

)

All eXist XQuery update statements start with the keyword update, followed by an update action. Available actions are delete, insert, rename, replace, and value.

The return type of an update statement is always the empty sequence ().

An eXist update statement can only be used to update persistent XML documents stored in the database. It cannot be used to update temporary documents or document fragments stored in memory. For example, the following code fragment is illegal and will result in an error:

(: This is invalid: :)

let $document as document-node() := document { <Root><a/></Root> }

return

update insert <b/> into $document/*

You can use eXist’s update statements anywhere in your XQuery main code or function bodies. However, take care when using them inside a FLWOR expression return statement. Update statements take effect immediately, and changing the structure your query is looping over may lead to some unexpected results!

update delete

update delete simply deletes nodes. Its syntax is:

update delete expr1

where expr1 is an XQuery expression resolving to any kind of node. All nodes in expr1 will be deleted by this statement.

Note that you cannot delete document root elements with update delete.
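
For example, the following sketch removes all log entries of a particular kind from the logfile used earlier. The level attribute is hypothetical and only serves to show that expr1 can be any path expression selecting the nodes to remove.

```xquery
(: Assumes LogEntry elements carry a (hypothetical) level attribute :)
update delete doc('/db/logs/mainlog.xml')/*/LogEntry[@level = 'debug']
```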

update insert

update insert inserts content into an element node. Its syntax is:

update insert expr1 [ into | following | preceding ] expr2

where expr1 is an XQuery expression resolving to the content sequence to insert, and expr2 is an XQuery expression resolving to the target of the insertion. expr2 must resolve to one or more element nodes. If it resolves to more than one element node, the insertion takes place for each of them.

Where to insert is determined by the keywords into, following, and preceding:

into

Appends the content in expr1 after the last child element node of expr2

following

Inserts the content in expr1 immediately after the element node expr2

preceding

Inserts the content in expr1 immediately before the element node expr2
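
The three keywords can be compared side by side in a small sketch. It assumes a hypothetical document /db/test/list.xml whose content is <List><Item id="a"/></List>. Following the rules above, the List element should afterward contain the items 0, a, b, and z, in that order. The separate statements are for illustration only; in real code a single update is cheaper.

```xquery
let $list := doc('/db/test/list.xml')/List
return (
    (: into: appended after the last child of List :)
    update insert <Item id="z"/> into $list,

    (: preceding: placed immediately before the selected Item :)
    update insert <Item id="0"/> preceding $list/Item[@id = 'a'],

    (: following: placed immediately after the selected Item :)
    update insert <Item id="b"/> following $list/Item[@id = 'a']
)
```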

update rename

update rename renames nodes. Its syntax is:

update rename expr1 as expr2

Here, expr1 is an XQuery expression resolving to element or attribute nodes. These are the nodes to rename. expr2 is an XQuery expression. From the result of this expression, the string value of the first item is used as the new name.

Note that you cannot rename document root elements using update rename. Only nodes with a parent element node can be renamed.
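
As a short illustration, this sketch renames an attribute and then the elements that carry it. The logfile structure and the level attribute are assumptions carried over from the earlier logging examples. The attribute is renamed first so that the second path still matches the original element name.

```xquery
let $log := doc('/db/logs/mainlog.xml')/*
return (
    (: Rename the (hypothetical) level attribute on every entry :)
    update rename $log/LogEntry/@level as 'severity',

    (: Then rename the entries themselves; they have a parent, so this is allowed :)
    update rename $log/LogEntry as 'Entry'
)
```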

update replace

update replace replaces element, attribute, or text nodes. Its syntax is:

update replace expr1 with expr2

where expr1 is an XQuery expression resolving to a single element, attribute, or text node, and expr2 is an XQuery expression. Rules and treatment depend on the type of expr1:

§ When expr1 is an element node, expr2 must be an element node too.

§ When expr1 is an attribute or text node, the value of expr1 is replaced by the concatenated string value of expr2.

As an example of the second case, the following update statement replaces the value of the name attribute on the root element of the given document with aaabbb:

update replace doc('/db/test/test.xml')/*/@name with

<a>aaa<b>bbb</b></a>

Note that you cannot replace document root nodes using update replace. Only nodes with a parent element node can be replaced.

update value

update value updates the content of nodes. Its syntax is:

update value expr1 with expr2

Its syntax and rules mirror those of update replace, but it replaces only the content (value) of the nodes in expr1; the nodes themselves stay in place.
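
A minimal sketch of update value, assuming the logfile from the earlier examples and a hypothetical level attribute on its entries:

```xquery
let $entry := doc('/db/logs/mainlog.xml')/*/LogEntry[1]
return (
    (: Set a new value for an existing attribute :)
    update value $entry/@level with 'warning',

    (: Set new text content for the element; the element itself is kept :)
    update value $entry with 'Something else happened...'
)
```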

XUpdate

XUpdate is an older mechanism to change the contents of the database in an indirect way. You first specify the required changes in an XML document using the XUpdate syntax. You then pass this document to an XUpdate processor that applies them. The full XUpdate specification can be found at http://xmldb-org.sourceforge.net/xupdate/xupdate-wd.html. Using the XUpdate mechanism is mostly discouraged, but it’s still there for backward compatibility.

In Example 5-1, the XUpdate document inserts a log message in a logfile, before any existing ones.

Example 5-1. XUpdate document

let $xupdate-specification :=

<xupdate:modifications version="1.0" xmlns:xupdate="http://www.xmldb.org/xupdate">

<xupdate:insert-before select="/Log/LogEntry[1]" >

<LogEntry>Something happened...</LogEntry>

</xupdate:insert-before>

</xupdate:modifications>

let $update-count := xmldb:update('/db/logs', $xupdate-specification)

return

<p>The number of log entries added was {$update-count}</p>

XUpdate XML format

An XUpdate document is an XML document that contains one or more XUpdate XML fragments. An XUpdate XML fragment always uses the namespace http://www.xmldb.org/xupdate:

<modifications xmlns = "http://www.xmldb.org/xupdate"

version = "1.0">

( insert-before | insert-after | append | update | remove )*

</modifications>

§ The version attribute always has the fixed value "1.0".

Inside a modifications element, you can specify the updates using certain XML elements (all of these elements have a select attribute that must contain an XPath expression evaluating to a node set):

insert-before, insert-after

Inserts content before or after the node(s) specified in the select attribute

append

Appends content as a child of the node(s) specified in the select attribute

update

Updates the contents of the node(s) specified in the select attribute

remove

Removes the node(s) specified in the select attribute

To specify new content, you can use a direct XML fragment as shown in Example 5-1 or use one of the following constructions (all elements in the http://www.xmldb.org/xupdate namespace):

element name="..."

Creates an element node.

attribute name="..."

Creates an attribute node. The contents of the element will become the attribute’s value.

text

Creates a text node. The contents of the element will become the node’s value (a.k.a. the text).

NOTE

The original XUpdate specification also mentions processing-instruction, comment, and using variables with variable and value-of. These are not supported in eXist.
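
To illustrate the element, attribute, and text constructions, here is a sketch that appends a new log entry to the logfile from Example 5-1. The /db/logs collection and the Log root element come from that example; the level attribute and the message text are assumptions:

```xquery
xquery version "1.0";

let $xupdate-specification :=
    <xupdate:modifications version="1.0" xmlns:xupdate="http://www.xmldb.org/xupdate">
        <xupdate:append select="/Log">
            <xupdate:element name="LogEntry">
                <xupdate:attribute name="level">info</xupdate:attribute>
                <xupdate:text>Something else happened...</xupdate:text>
            </xupdate:element>
        </xupdate:append>
    </xupdate:modifications>
return
    (: Applies the modifications to the documents in /db/logs :)
    xmldb:update('/db/logs', $xupdate-specification)
```

The intent is that a LogEntry element with a level attribute and the given text is added as the last child of each Log root element.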

Executing XUpdate

To execute an XUpdate document, eXist contains the following XQuery extension function:

xmldb:update($collection-uri as xs:string, $modifications as node())

as xs:integer

Here, $collection-uri is the collection the XUpdate is applied to. Notice that this is a collection, so the XUpdate is performed against all documents in the collection (and any subcollections, recursively). Make sure your XUpdate specification targets the right XML document(s) and/or design your collection structure carefully to constrain this.

$modifications contains your XUpdate document.

The function returns the number of updates applied.

Controlling the Database from Code

Learning how to control the database from XQuery code is important because when writing an application, sooner or later you’ll want to create or delete documents and collections, change their properties, interrogate them, and so on. It’s all part of the game.

Most functions for controlling the database are in eXist’s xmldb extension module. To find the exact details of these functions, refer to the online eXist module documentation in the XQuery Function Documentation app in the dashboard. In this section, we will provide you with an overview.

This subject is somewhat dependent on that of security. For instance, if you try to create a document but the parent collection doesn’t allow the current user to do so, you have a problem. Also, after you’ve created something you’ll probably want to set its security properties (owner, group, permissions), right? Security is a big subject and is handled in depth in Chapter 8. This section provides you only with the basic information of how to work with the various security-related settings, not their meaning. In general, eXist’s security system closely mimics that of UNIX systems.

Specifying Collections and Resources for the xmldb Extension Module

The xmldb extension module is somewhat fickle in how it handles addressing collections and resources:

§ A collection must always be passed as a URL-escaped URI (of type xs:string), as specified in “Use URIs”. For instance:

xmldb:get-child-resources("/db/new%20collection")

§ A resource name can be passed as a URL-escaped URI, but surprisingly also as a normal, nonescaped string. Therefore, the following function calls are equivalent:

xmldb:size("/db/new%20collection", "new document.xml")

xmldb:size("/db/new%20collection", "new%20document.xml")
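
When a collection name comes from user input or another variable, one way to produce the escaped form is the standard encode-for-uri function, applied to each path segment separately. This is only a sketch; the collection name is hypothetical:

```xquery
xquery version "1.0";

(: "new collection" is a hypothetical collection name containing a space :)
let $collection-name := "new collection"
(: encode-for-uri is a standard XPath function; it turns the space into %20 :)
let $collection-uri := concat("/db/", encode-for-uri($collection-name))
return
    xmldb:get-child-resources($collection-uri)
```

Escaping each segment on its own, rather than the whole path, keeps the / separators intact.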

Accessing external databases using extended XMLDB URIs

The collection parameters of the xmldb extension functions all accept the extended XMLDB URI syntax described in “XMLDB URIs”:

xmldb:exist://username:password@server:port/exist/xmlrpc/db/...

You can use this to access remote servers and update their databases directly.
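
For instance, listing the resources of a collection on another server could look like the following sketch. The host name, port, credentials, and collection are all placeholders, not values taken from a real installation:

```xquery
xquery version "1.0";

(: Hypothetical host, port, and credentials; replace with real values :)
let $remote-collection :=
    "xmldb:exist://someuser:somepassword@remote.example.org:8080/exist/xmlrpc/db/logs"
return
    xmldb:get-child-resources($remote-collection)
```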

Getting Information

There are many functions in eXist’s xmldb extension module that can provide you with information about the database’s content. Here is an overview of the most important ones:

xmldb:last-modified, xmldb:size, xmldb:get-mime-type

Retrieve the basic properties of a resource.

NOTE

Most xmldb functions take two parameters—a collection and a document URI—but xmldb:get-mime-type is an exception (for unknown reasons). It takes the full URI to the document as its input.

Additionally, you should note that xmldb:size will not give you the exact size of an XML resource, but an estimate based on the number of database pages the document occupies.

xmldb:get-owner, xmldb:get-group, xmldb:get-permissions

Retrieve the security settings for a collection or resource.

Permissions are returned as integer values; the function xmldb:permissions-to-string turns these into something more readable. However, it is strongly recommended that you use the newer sm:get-permissions function from the Security Manager module instead.

xmldb:get-child-collections, xmldb:get-child-resources

Provide you with sequences of the child collections or resources of a given parent collection. You can use these functions to traverse and inspect the database’s collection/resource structure. Example 5-2 is a little XQuery program that displays the database’s content. It uses a recursive function to traverse the collection tree, which is a pattern you’ll see quite often in XQuery code.

Example 5-2. Traversing and displaying the database structure

xquery version "1.0" encoding "UTF-8";

declare option exist:serialize "method=html media-type=text/html indent=no";

declare function local:traverse-collection($collection as xs:anyURI,

$indent as xs:integer) as element(p)*

{

(

(: First list each subcollection, followed by its own content :)

for $sub-collection in xmldb:get-child-collections($collection)

return

(

<p style="margin-left: {$indent}pt"><b>{$sub-collection}</b></p>,

local:traverse-collection(xs:anyURI(concat($collection, '/',

$sub-collection)), $indent + 10)

),

(: Then list the documents of this collection, once :)

for $document in xmldb:get-child-resources($collection)

return

<p style="margin-left: {$indent + 5}pt">{$document}</p>

)

};

<body>

{

local:traverse-collection(xs:anyURI('/db/apps/exist-book'), 0)

}

</body>
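
The property and security functions described earlier in this section can be combined into a small report for a single resource. The following is a sketch only; the collection and resource names are assumptions:

```xquery
xquery version "1.0";

(: Hypothetical collection and resource names :)
let $collection := "/db/test"
let $resource := "test.xml"
let $resource-uri := xs:anyURI(concat($collection, "/", $resource))
let $last-modified := xmldb:last-modified($collection, $resource)
(: For XML resources this is an estimate based on database pages :)
let $size := xmldb:size($collection, $resource)
(: Takes the full URI rather than a collection/resource pair :)
let $mime-type := xmldb:get-mime-type($resource-uri)
(: Returns an XML description of owner, group, and mode :)
let $permissions := sm:get-permissions($resource-uri)
return
    <resource-info name="{$resource}">
        <last-modified>{$last-modified}</last-modified>
        <size>{$size}</size>
        <mime-type>{$mime-type}</mime-type>
        {$permissions}
    </resource-info>
```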

Creating Resources and Collections

For creating resources and collections, the following functions are available:

xmldb:create-collection

As its name implies, it creates a new collection in the database. It will return the path to the new collection when successful, or the empty sequence otherwise.

xmldb:store

Creates a new resource, storing some data passed as a parameter. It will return the path to the new resource when successful, or throw an error when unsuccessful.

Specifying what to store is quite flexible: you can pass data directly (for instance, as an XML fragment or a string), and it will be stored as the new resource. However, when the data is passed as type xs:anyURI, this is taken as the URI to the data and eXist will try to read it from there.

There are two variants for this function: one where eXist will try to guess the Internet media type of the data to store, and one where you can explicitly specify this.

xmldb:store-files-from-pattern

Bulk-loads files from the filesystem. There are several variants of this function that allow you more or less control over what is stored.
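
A minimal sketch of creating a collection and storing two resources in it follows. The parent collection /db/test, the collection name logs, and the resource names are hypothetical:

```xquery
xquery version "1.0";

(: Hypothetical parent collection, collection name, and resource names :)
let $new-collection := xmldb:create-collection("/db/test", "logs")
return
    if (empty($new-collection)) then
        <p>Could not create the collection</p>
    else
        (: Three-argument variant: eXist tries to guess the media type :)
        let $xml-path := xmldb:store($new-collection, "log.xml", <Log/>)
        (: Four-argument variant: the media type is specified explicitly :)
        let $text-path :=
            xmldb:store($new-collection, "notes.txt", "Some notes", "text/plain")
        return
            <p>Stored {$xml-path} and {$text-path}</p>
```

Checking for the empty sequence covers the failure case of xmldb:create-collection; a failing xmldb:store raises an error instead, so it would need a different kind of handling.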

Setting Permissions

Every collection or resource that you create will be assigned the current user, group, and default security permissions. That might not be what you want, so it’s quite common to change these after creating a collection or resource. Of course, there might also be other situations where you have to change some security setting. You can use the following functions to do this:

xmldb:set-collection-permissions, xmldb:set-resource-permissions

Change the user, group, and/or permissions for a collection or resource.

Notice that permissions must be passed in as integer values. The function xmldb:string-to-permissions, which converts a permission string into such an integer, might be of help here.

WARNING

While the xmldb:set-collection-permissions and xmldb:set-resource-permissions functions are still available, they are deprecated in favor of the newer sm:chmod and sm:chown functions in the Security Manager module (see “Executing XQuery functions”).
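
Using the Security Manager functions, changing the mode, group, and owner of a resource might look like this sketch. The resource path, user name, and group name are assumptions, and the query is assumed to run as a user with sufficient rights (for instance, a member of the dba group):

```xquery
xquery version "1.0";

(: Hypothetical resource path, user name, and group name :)
let $resource := xs:anyURI("/db/test/logs/log.xml")
return
(
    (: Owner may read and write, group may read, others get no access :)
    sm:chmod($resource, "rw-r-----"),
    sm:chgrp($resource, "loggers"),
    sm:chown($resource, "logwriter")
)
```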

Moving, Removing, and Renaming

The following functions can be used to move, rename, and remove collections and resources:

xmldb:move

Moves collections and resources from one location (parent collection) to another

xmldb:rename

Renames collections or resources

xmldb:remove

Removes (deletes) collections or resources

WARNING

Be warned, removing collections or resources is permanent: there is no such thing as a trash can collection in eXist!
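
The following sketch combines the three functions. All collection and resource names are hypothetical, and both collections are assumed to exist already:

```xquery
xquery version "1.0";

(: Hypothetical collection and resource names :)
let $source := "/db/test/logs"
let $archive := "/db/test/archive"
return
(
    (: Move one resource from the source collection to the archive collection :)
    xmldb:move($source, $archive, "log.xml"),
    (: Rename the moved resource within the archive collection :)
    xmldb:rename($archive, "log.xml", "log-2015.xml"),
    (: Permanently remove the source collection and everything left in it :)
    xmldb:remove($source)
)
```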