Advanced Topics - eXist: A NoSQL Document Database and Application Platform (2015)

Chapter 16. Advanced Topics

This chapter introduces some of the more advanced things that can be done with eXist, such as scheduled tasks, triggers, and internal modules. Many, although not all, of the examples in this chapter are developed in Java as plug-ins to eXist. We do not attempt to teach Java in any way in this chapter; rather, where the examples are in Java, we have tried to make them simple, self-explanatory, and easily usable so that even those without Java experience can learn from and make use of them. However, some programming experience will undoubtedly assist you.

WARNING

If you wish to extend and use additional external libraries for the Java projects in this chapter whose code is deployed into $EXIST_HOME/lib/user, be aware that if eXist also uses the same libraries (consult the files in the subfolders of $EXIST_HOME/lib), you must ensure that you use exactly the same versions of the libraries as provided with eXist. To achieve this, you must set the correct version on the dependency in your pom.xml file, and also set the scope of the dependency to provided. This is required because your Java code will be running in the same JVM as eXist and therefore uses the same class loader; in this manner of operation, it is only possible to have a single version of a specific class.
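As a minimal sketch of such a declaration, the fragment below marks a shared library as provided. The commons-io coordinates and the property name exist.commons-io.version are illustrative assumptions rather than anything taken from eXist's own build; substitute the library you actually share with eXist and the exact version of the JAR found under $EXIST_HOME/lib.

```xml
<!-- Illustrative pom.xml fragment; the coordinates and property name are assumptions -->
<properties>
    <!-- Set this to the exact version of the JAR shipped in $EXIST_HOME/lib -->
    <exist.commons-io.version>REPLACE-WITH-SHIPPED-VERSION</exist.commons-io.version>
</properties>

<dependencies>
    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>${exist.commons-io.version}</version>
        <!-- provided: compile against the library, but do not package it with your code -->
        <scope>provided</scope>
    </dependency>
</dependencies>
```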

XQuery Testing

Creating tests to assert the correctness of your code and to prevent regressions in the future can be desirable for many reasons. Over the last decade there has been a serious focus within software engineering to provide better testing tools for programmers. There have also been a number of testing philosophies and methodologies developed, such as test-driven development (TDD) and behavior-driven development (BDD), which provide guidance on how to deliver robust and well-tested software.

Many programming languages offer various integrated or third-party tools and libraries for facilitating the structured testing of applications. While XQuery is a very high-level functional language, it should certainly not be considered exempt from good testing practices. Thus far, there have been many attempts at creating frameworks for testing XQuery modules and functions, including XTC, XQUT, XSpec, XRay, and Unit Module. Most of these frameworks are, unfortunately, implementation-specific, due to the current lack of reflective capabilities in XQuery. There are many different types of tests that can be constructed, but all of the XQuery test frameworks to date have focused on the unit testing type of tests. Unit tests are designed to allow you to assert the behavior of a small unit of code, ideally in isolation from the rest of the system. In this chapter, we will also focus on writing unit tests in XQuery.

For many years eXist has provided an XQuery testing mechanism within its own test suite to its core developers, allowing them to write tests in XQuery to assert the correct behavior of eXist. Called XQuery Test Runner, it was never particularly designed with the needs of other XQuery implementations or developers in mind, and it proved somewhat clunky as test suite descriptions had to be written in a separate file using a specific XML DSL.

As of version 2.1 eXist now officially provides XQSuite, a unit test framework designed to be used by any XQuery developer and in such a way that it could be implemented by any XQuery 3.0 vendor. XQSuite provides a standard set of XQuery 3.0 function annotations that set up test parameters and make assertions about the results. One of the most interesting characteristics of XQSuite is that it allows you to place your tests directly onto the function you wish to test. Should you wish, however, you can also choose to keep your tests separate from your code (in a different module), by creating wrapper functions that have the XQSuite annotations that simply call your functions under test. Even if you do choose to place the XQSuite annotations onto the functions under test, your code is still potentially portable to platforms other than eXist, as the XQuery 3.0 specification states that if an implementation does not understand an annotation, it can just ignore it.

Perhaps the best way to learn how to use XQSuite is to look at some examples. The examples used here are supplied in the chapters/advanced-topics/xquery-testing folder of the book-code Git repository (see “Getting the Source Code”). Imagine that you have an XQuery library module for producing identifiers, as shown in Example 16-1 (see the file id.xqm).

Example 16-1. Simple module for producing identifiers (id.xqm)

xquery version "3.0";

module namespace id = "http://example.com/record/id";

import module namespace util = "http://exist-db.org/xquery/util";

declare variable $id:ERR-PRESENT := xs:QName("id:ERR-PRESENT");

declare function id:insert($record as element(record)) as element(record) {

if($record/id)then

fn:error($id:ERR-PRESENT, "<id> is already present in record!", $record/id)

else

<record>

{

$record/@*,

<id>{id:generate()}</id>,

$record/node()

}

</record>

};

declare function id:generate() as xs:string {

let $id := id:random()

return

if(exists(collection("/db/records")/record/id[. eq $id]))then

id:generate()

else

$id

};

declare function id:random() as xs:string {

codepoints-to-string(

((1 to 8) ! util:random(26))

! (.[. lt 10] + 48, .[. ge 10] + 87)

)

};

Let’s first write some test cases for the function id:insert. Here are three things that leap to mind that we might like to write test cases for:

§ If we send it a document that already has an id element, it raises the error id:ERR-PRESENT.

§ If we send it a document without an id element, it adds an id element for us with some sort of identifier.

§ If we send it a document without an id element, it adds an id element for us with some sort of identifier; otherwise, the result document is indiscernible from the original.

That is to say, it has the same descendant nodes (we check this as the function is really creating a copy of the input and modifying it).

TIP

Arguably, the second and third test cases just described have some overlap and could be merged into one. However, having two separate tests gives us more granularity in understanding the problems if or when tests fail. For instance, it is entirely possible that the second test case could pass while the third test case fails, which tells us the issue is with copying the descendant nodes and not with the generation of the id element. Writing fine-grained tests that allow you to quickly discover where bugs or regressions occur can help expedite bug fixes.

Let’s now look at how we might write our first test case, which was:

If we send it a document that already has an id element, it raises the error id:ERR-PRESENT.

We will begin by modifying the id:insert function by adding some XQSuite annotations (see the file id-1.xqm), as shown in Example 16-2.

Example 16-2. id.xqm with first test case (id-1.xqm)

xquery version "3.0";

module namespace id = "http://example.com/record/id";

import module namespace util = "http://exist-db.org/xquery/util";

declare namespace test = "http://exist-db.org/xquery/xqsuite";

declare variable $id:ERR-PRESENT := xs:QName("id:ERR-PRESENT");

declare

%test:args("<record><id>existing</id></record>") (: 1 :)

%test:assertError("id:ERR-PRESENT") (: 2 :)

function id:insert($record as element(record)) as element(record) {

if($record/id)then

fn:error($id:ERR-PRESENT, "<id> is already present in record!", $record/id)

else

<record>

{

$record/@*,

<id>{id:generate()}</id>,

$record/node()

}

</record>

};

(: Unchanged remaining code omitted for brevity... :)

1

The %test:args annotation provides values to the function’s arguments when the test case is run. If your function takes more than one argument, you can supply them one after another—for example, %test:args("arg1", "arg2", "argN").

2

The %test:assertError annotation asserts that, for the previously provided args, the function must throw the error (code) that is named—in this case, id:ERR-PRESENT.

OK, so now we have seen how to annotate our function with some parameters for our test case and an assertion about the expected result of executing that function using those parameters. However, we have not yet run our test case—so how can we actually have eXist execute our test case and return a report of whether it succeeded or failed? Well, of course, we have to write another little bit of XQuery! We’ll create an XQuery main module so we can directly execute it, and from this XQuery we will invoke XQSuite against the functions in our XQuery library module, id-1.xqm. Such an XQuery main module is known as a test runner (see the file test-runner.xq) and is demonstrated in Example 16-3.

Example 16-3. Test runner XQuery (test-runner.xq)

xquery version "3.0";

import module namespace inspect = "http://exist-db.org/xquery/inspection"; (: 1 :)

import module namespace test = "http://exist-db.org/xquery/xqsuite" at

"resource:org/exist/xquery/lib/xqsuite/xqsuite.xql"; (: 2 :)

let $modules := (

xs:anyURI("/db/apps/exist-book/chapters/advanced-topics/xquery-testing/id-1.xqm")

) (: 3 :)

let $functions := $modules ! inspect:module-functions(.) (: 4 :)

return

test:suite($functions) (: 5 :)

1

We import the eXist XQuery inspection module, so that we can reflectively find all the functions in our module we wish to test.

2

We import the XQSuite test framework XQuery module, so that we can subsequently run our test suite.

3

We prepare a sequence of URIs of all modules that we are interested in testing. You can add more modules to this sequence if you wish.

4

We call inspect:module-functions for each module we wish to test, to get a list of all functions in all our modules.

5

We call test:suite, passing in all the functions. XQSuite will operate only on those functions that have XQSuite annotations, so we need not worry specifically about which functions we pass in.

Running the test runner should yield an xUnit result document that looks something like:

<testsuites>

<testsuite package="http://example.com/record/id"

timestamp="2014-07-02T17:55:43.747+01:00"

failures="0"

tests="1"

time="PT0.023S">

<!-- Callouts: 1 = package, 2 = failures, 3 = tests, 4 = time -->

<testcase name="insert" class="id:insert"/>

</testsuite>

</testsuites>

1

Note that the package name used in the xUnit output matches the namespace of our module under test.

2

Here we can see how many of our tests failed. In this case there were 0 failures, so everything must have passed (succeeded).

3

Here we can see the total number of tests run. In this case there was 1 test run. The number of tests passed is calculated by passed = tests - failures; therefore, there was 1 passed test.

4

Here you can see the time taken to run a specific test suite. This can be useful for measuring performance loss or gains when refactoring code.
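The passed count described above can be derived directly from the report. The following is a minimal sketch that works on an inline copy of the result document shown earlier; in practice you would bind $report to the value returned by test:suite instead. The summary element is a made-up output format for illustration, and only the attributes visible in the sample report are used.

```xquery
xquery version "3.0";

(: Inline copy of the sample report; normally this would be the result of test:suite(...) :)
let $report :=
    <testsuites>
        <testsuite package="http://example.com/record/id"
            timestamp="2014-07-02T17:55:43.747+01:00"
            failures="0" tests="1" time="PT0.023S">
            <testcase name="insert" class="id:insert"/>
        </testsuite>
    </testsuites>
for $suite in $report/testsuite
let $tests := xs:integer($suite/@tests)
let $failures := xs:integer($suite/@failures)
return
    (: summary is a hypothetical element name, used only for this illustration :)
    <summary package="{$suite/@package}"
        tests="{$tests}"
        failures="{$failures}"
        passed="{$tests - $failures}"/>
```

For the sample report this yields a single summary element with passed="1".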

Great! Now we have one test case that we can run that helps us prove the correctness of our code and guard against regressions. What if we want more than one test case per function? Let’s now look at how we would add our second test case, which was:

If we send it a document without an id element, it adds an id element for us with some sort of identifier.

We again modify our id:insert function to insert the additional test case (see the file id-2.xqm), as shown in Example 16-4.

Example 16-4. id.xqm with two test cases (id-2.xqm)

declare

%test:args("<record><id>existing</id></record>")

%test:assertError("id:ERR-PRESENT")

%test:args("<record/>") (: 1 :)

%test:assertXPath("$result/exists(id)") (: 2 :)

%test:assertXPath("not($result/empty(id))") (: 3 :)

function id:insert($record as element(record)) as element(record) {

if($record/id)then

fn:error($id:ERR-PRESENT, "<id> is already present in record!", $record/id)

else

<record>

{

$record/@*,

<id>{id:generate()}</id>,

$record/node()

}

</record>

};

1

We have added a second %test:args annotation to our function for our second test case. Every %test:args or set of %test:arg annotations, delimited by one or more assert annotations, forms a distinct test case.

2, 3

For this test case we have two assertions—first that the result contains an element id, and second that the id has some child content. You may have as many assertions about the result of the function for each test case as you wish.

In this test case we are using %test:assertXPath instead of %test:assertError, as before. %test:assertXPath allows us to evaluate an arbitrary XPath expression against the $result of the function.
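To illustrate how test cases are delimited by their assertions, here is a small sketch of a two-argument function carrying two test cases: one supplying its arguments positionally with %test:args, the other by parameter name with %test:arg. The module namespace and the function ex:join are hypothetical and exist only for this illustration.

```xquery
xquery version "3.0";

(: Hypothetical module, used only to illustrate the annotations :)
module namespace ex = "http://example.com/xqsuite/args";

declare namespace test = "http://exist-db.org/xquery/xqsuite";

declare
    (: Test case 1: positional arguments :)
    %test:args("Hello", "World")
    %test:assertEquals("Hello World")
    (: Test case 2: arguments supplied by parameter name :)
    %test:arg("first", "Good")
    %test:arg("second", "Morning")
    %test:assertEquals("Good Morning")
function ex:join($first as xs:string, $second as xs:string) as xs:string {
    concat($first, " ", $second)
};
```

Each assertion closes the test case opened by the argument annotations before it, so the runner should report two tests for this function.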

Finally, let’s look at how we would add our final test case, which was:

If we send it a document without an id element, it adds an id element for us with some sort of identifier; otherwise, the result document is indiscernible from the original.

We again modify our id:insert function to add the final test case (see the file id-3.xqm), as shown in Example 16-5.

Example 16-5. id.xqm with three test cases (id-3.xqm)

declare

%test:args("<record><id>existing</id></record>")

%test:assertError("id:ERR-PRESENT")

%test:args("<record/>")

%test:assertXPath("$result/exists(id)")

%test:assertXPath("not($result/empty(id))")

%test:args("<record a='1'><child1>text1</child1></record>") (: 1 :)

%test:assertXPath("$result/exists(id)") (: 2 :)

%test:assertXPath("$result/@a eq '1'") (: 3 :)

%test:assertXPath("local-name(($result/child::element())[1]) eq 'id'") (: 4 :)

%test:assertXPath("local-name(($result/child::element())[2]) eq 'child1'") (: 5 :)

%test:assertXPath("$result/child1/text() eq 'text1'") (: 6 :)

function id:insert($record as element(record)) as element(record) {

if($record/id)then

fn:error($id:ERR-PRESENT, "<id> is already present in record!", $record/id)

else

<record>

{

$record/@*,

<id>{id:generate()}</id>,

$record/node()

}

</record>

};

1

We set the argument for the function for our test case to be an XML element that contains both attributes and descendant nodes, as we want to make sure the result is properly constructed.

2

This is the same assertion as from our last test case, to ensure that an id element is added to the record.

3, 4, 5, 6

These assertions ensure that the record element returned by the function contains all of the nodes from the record element passed as the argument, in their original order.

Executing our test runner again (test-runner.xq) now produces an xUnit result document similar to the following:

<testsuites>

<testsuite package="http://example.com/record/id"

timestamp="2014-07-02T18:26:29.784+01:00"

failures="0" tests="3" time="PT0.064S"> <!-- 1 -->

<testcase name="insert" class="id:insert"/> <!-- 2 -->

<testcase name="insert" class="id:insert"/> <!-- 3 -->

<testcase name="insert" class="id:insert"/> <!-- 4 -->

</testsuite>

</testsuites>

1

We can now see that all three of our test cases are executing.

2, 3, 4

Unfortunately, at present XQSuite shows the same name for all of our test cases.
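One way to tell the test cases apart is the wrapper-function approach mentioned earlier: keep the tests in a separate module, with one annotated function per test case. The sketch below assumes a hypothetical test module namespace and that id.xqm sits alongside it; since the report above appears to take the testcase name from the function name, each case should then be listed under its own name.

```xquery
xquery version "3.0";

(: Hypothetical test module; the namespace and the relative import location are assumptions :)
module namespace idt = "http://example.com/record/id/test";

import module namespace id = "http://example.com/record/id" at "id.xqm";

declare namespace test = "http://exist-db.org/xquery/xqsuite";

(: Test case 1: an existing id element raises id:ERR-PRESENT :)
declare
    %test:assertError("id:ERR-PRESENT")
function idt:rejects-existing-id() {
    id:insert(<record><id>existing</id></record>)
};

(: Test case 2: an id element with some text content is added :)
declare
    %test:assertXPath("$result/exists(id)")
    %test:assertXPath("string-length($result/id) gt 0")
function idt:adds-id() as element(record) {
    id:insert(<record/>)
};

(: Test case 3: attributes and children of the input are preserved :)
declare
    %test:assertXPath("$result/@a eq '1'")
    %test:assertXPath("$result/child1/text() eq 'text1'")
function idt:preserves-attributes-and-children() as element(record) {
    id:insert(<record a="1"><child1>text1</child1></record>)
};
```

To run these, the test runner's $modules sequence would point at this test module rather than at id.xqm.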

We have now written our three test cases for our id:insert function, but what about the id:generate and id:random functions? Well, of course, we could write some further test cases for these using XQSuite annotations—but wait a minute, maybe there are some difficulties in testing these functions! These two functions have in fact been written to illustrate problems that can arise in writing test cases.

First, let’s consider the issues with the id:random function. This function, as its name implies, will return something random each time it is called (in this instance, an eight-character alphanumeric string). This function exhibits a trait known as nondeterministic behavior, which means that each time the function is called, it may produce a different result. This makes testing harder, as in this case we cannot make assertions about the exact return value of the function using the XQSuite annotation %test:assertEquals. Instead, we can only make generalizations such as “The result must be a string of 8 characters” and “The result can only contain the characters a to z and 0 to 9.” Unfortunately, this nondeterminism is spread throughout our identifier module, as each function eventually calls id:random. We could solve this in some test frameworks in other languages by introducing mocks that would effectively replace the underlying call to util:random with some static value, which we could then use as the basis for assertions about the deterministic results of our tests (that is, asserting that our algorithm for encoding from numbers into a charset is entirely correct). Regrettably, XQSuite does not yet support function mocks.
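Without mocks, one possible workaround is to split the deterministic encoding step from the random number generation, so that the encoding can be tested with exact values while id:random is tested only for general properties. The following is a sketch of such a refactoring, not the code from the book's repository: id:encode and id:_test-encode are hypothetical names introduced here.

```xquery
xquery version "3.0";

module namespace id = "http://example.com/record/id";

import module namespace util = "http://exist-db.org/xquery/util";

declare namespace test = "http://exist-db.org/xquery/xqsuite";

(: Hypothetical helper: the deterministic number-to-character mapping,
   using the same arithmetic as the original id:random :)
declare function id:encode($numbers as xs:integer*) as xs:string {
    codepoints-to-string(
        $numbers ! (.[. lt 10] + 48, .[. ge 10] + 87)
    )
};

(: Deterministic test: 0 -> '0', 9 -> '9', 10 -> 'a', 25 -> 'p' :)
declare
    %test:assertEquals("09ap")
function id:_test-encode() as xs:string {
    id:encode((0, 9, 10, 25))
};

(: Nondeterministic part: only general properties of the result can be asserted :)
declare
    %test:assertXPath("string-length($result) eq 8")
    %test:assertXPath("matches($result, '^[a-z0-9]{8}$')")
function id:random() as xs:string {
    id:encode((1 to 8) ! util:random(26))
};
```

The design choice is simply to push the randomness to the outermost layer, leaving as much logic as possible in functions whose results can be asserted exactly.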

So, what about testing our id:generate function? Well, again, we have the nondeterminism problems of id:random, but here they are amplified by the contents of a database collection because the database itself is in a mutable state, which can lead to even worse nondeterministic results. Again, if only function mocking were supported, we could provide a deterministic substitute for the database collection.

WARNING

There are also other potential problems with how id:generate is implemented and would be used in a multiuser system, but we will leave that as a mental exercise for the reader.

However, all is not lost! If we were to change our id:generate function to take a path to a collection as an argument rather than the currently hardcoded one, perhaps we could create a test collection and prime it with deterministic data for our test case, before we run it. Afterward, when we are done with the test data, we would be courteous and remove our test collection, as is good practice. Indeed, we can do such a thing by providing custom setup and teardown functions that run before and after our test functions, respectively. To declare a function as a setup or teardown function, you can use the %test:setUp and %test:tearDown annotations. So let’s look briefly at a refactored id:generate function with test cases (see the file id-4.xqm), as shown in Example 16-6.

Example 16-6. Refactored id.xqm to inject collection path (id-4.xqm)

declare

%test:setUp (: 1 :)

function id:_test-setup() {

xmldb:create-collection("/db", "test-records"),

xmldb:store("/db/test-records", (), <record><id>12345678</id></record>),

xmldb:store("/db/test-records", (), <record><id>abcdefgh</id></record>)

};

declare

%test:tearDown (: 2 :)

function id:_test-teardown() {

xmldb:remove("/db/test-records")

};

declare

%test:args("/db/test-records") (: 3 :)

%test:assertXPath("$result ne '12345678'") (: 4 :)

%test:assertXPath("$result ne 'abcdefgh'") (: 5 :)

function id:generate($records-collection as xs:string) as xs:string { (: 6 :)

let $id := id:random()

return

if(exists(collection($records-collection)/record/id[. eq $id]))then

id:generate($records-collection)

else

$id

};

1

The %test:setUp annotation will cause the id:_test-setup function to be executed once before each function under test.

In this instance, we create the test data collection /db/test-records and place two records in it containing the ids 12345678 and abcdefgh.

2

The %test:tearDown annotation will cause the id:_test-teardown function to be executed once after each function under test.

In this instance, we clean up the test data we created in the setup step by removing the collection /db/test-records.

3

We inject the test data collection /db/test-records into our function under test as an argument.

4, 5

We make assertions that the function should never return the ids of the records in the test data collection.

Note that this is quite a brittle set of test assertions, as the ids are generated randomly, so we may never hit the edge cases. Hopefully, though, it shows you what is possible.

6

Our newly refactored function now accepts the path to the records collection as a parameter, which allows us to use our test data collection when testing, and the real collection otherwise.

We have really only scratched the surface of both the larger topic of testing and the specifics of XQSuite here, but hopefully we’ve provided enough information to start you thinking about testing and XQuery code quality. XQSuite provides several other annotations that allow you to pass parameters in different ways and make different types of assertions about the results of a function. For further information, consult the XQSuite documentation.

Versioning

eXist provides two simple versioning mechanisms, which, while not applicable to all use cases, can be useful to those who wish to track the revision history of certain types of documents. Both mechanisms can be configured on a per-collection hierarchy basis, which enables you to switch versioning on and off for various document collections within your database.

Historical Archiving

The historical archival facility will make an archival copy of any document before it is deleted or overwritten. The archival copy will be placed into a mirror of the collection tree under the archival collection /db/history by default.

While this is not versioning in the strictest sense, it can be valid for many applications. Arguably, you could also achieve basic versioning, if all updates to a document are performed by the user as document replacements. However, the real purpose of this facility is to effectively make documents immutable for archival purposes.

Historical archiving is implemented in a document trigger written in Java called org.exist.collections.triggers.HistoryTrigger. For more information on database triggers and configuring them, see “Database Triggers”. If you wish to configure the HistoryTrigger for a collection in your database, you need to add the following to the triggers section of your collection’s configuration document (collection.xconf):

<triggers>

<trigger class="org.exist.collections.triggers.HistoryTrigger">

<!--

Collection to store the archival copies in;

if omitted, then the collection /db/history

is used.

-->

<parameter name="root" value="/db/system/archival"/>

</trigger>

</triggers>

Archival copies will be stored into a mirror of their original collection path within the archival collection. In addition, a collection is created for each document using its name, and the archival copy is stored within this collection using a timestamp for its name that reflects the previous last-modified date of the document.
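Given that layout, the archival copies of a document could be listed with a short query. This is only a sketch under stated assumptions: it assumes the archival root configured in the example above, a hypothetical document /db/actors/michael-rennie.xml, and that the mirrored path includes the leading /db segment of the original path; check the layout in your own database before relying on it.

```xquery
xquery version "3.0";

import module namespace xmldb = "http://exist-db.org/xquery/xmldb";

(: Archival root as configured via the "root" parameter in the example above :)
let $archive-root := "/db/system/archival"

(: Hypothetical document whose history we want to inspect :)
let $original := "/db/actors/michael-rennie.xml"

(: Assumed layout: <root>/<original collection path>/<document name>/<timestamp> :)
let $copies-collection := concat($archive-root, $original)
return
    if (xmldb:collection-available($copies-collection)) then
        (: Each resource name is the timestamp of one archival copy :)
        xmldb:get-child-resources($copies-collection)
    else
        ()
```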

Document Revisions

The document revision versioning mechanism in eXist is much more comprehensive than the historical archiving mechanism (described in the previous section).

However, you should note that it is not well suited to large collections of data-centric XML or binary documents; rather, it is designed with human editors in mind and for use with modest collections of XML documents. For example, if you are working as part of a team with published articles or humanities texts, then the versioning mechanism may be useful for you, but if you are streaming gigabytes of server log data into eXist it may not scale to your needs.

The versioning mechanism in eXist works well enough for many cases, but while eXist knows the version history of your documents, there is little support from XML editor tools that connect to eXist to expose this information to the end user. eXist does provide some basic tools for you to visually examine the version history of documents and also to interrogate this from XQuery, but these tools are rudimentary at best. If you wish to use the versioning mechanism in a larger enterprise, you might want to consider building your own tools atop the XQuery functions that eXist exposes.

The document revision versioning mechanism has two main parts:

§ The versioning actions are implemented in a document trigger written in Java called org.exist.versioning.VersioningTrigger. For more information on database triggers and configuring them, see “Database Triggers”.

§ Version write conflict avoidance is implemented in a serialization filter written in Java called org.exist.versioning.VersioningFilter. The use of the filter is optional, but it potentially stops two users from editing the same document outside of eXist and overwriting each other’s changes.

When the VersioningTrigger is configured on a collection, it will store information about revisions to documents within that collection hierarchy in a mirrored collection hierarchy, under /db/system/versions. Specifically, the version trigger performs the following operations on various document events:

§ The first time a document is changed:

§ Before it is changed, a copy will be made and stored into the appropriate mirrored collection within /db/system/versions; it will have the same filename as the original but with the additional suffix .base. This is known as the document base revision.

§ After it is changed, a document describing the differences from the document base revision to the new revision is also stored into the appropriate mirrored collection within /db/system/versions; it will have the same filename as the original but with an additional ordinal suffix indicating the revision number. This is known as a diff document. These documents use an XML format that is specific to eXist. For binary documents we can only say that one document replaced another, so we also store a copy of the new binary document with the ordinal suffix and an additional .binary suffix. For XML documents the diff document also describes node-level insertion, deletion, and append operations; a change is modeled as a delete and insert.

§ Each subsequent (1+n) time a document is changed:

§ After it is changed, like the first time it is changed, a document describing the differences from the document base revision to the new revision is again stored into the versions collection, with an ordinal suffix (and if it is a binary document, then that is likewise stored again with the .<ordinal>.binary suffix). Just as before, this is known as a diff document; however, it is worth noting that every diff is between the current document and its document base revision, not the most recent revision.

NOTE

Diff documents are absolute rather than incremental. The advantage of this is that it is very simple to understand the changes from the original document to the current document. The downside, however, is that diff documents become increasingly large over time, and you have to replay each diff independently to see the changes over time (which repeats many operations).

§ When a document is copied or moved, the behavior just described is applied to the destination document (assuming it is also in a versioned collection).

For example, say we have the collection /db/actors for which we have enabled the VersioningTrigger, and we have a document stored in that collection called michael-rennie.xml, which looks similar to:

<actor>

<name>

<given>Michael</given>

<family>Rennie</family>

</name>

<born>1909</born>

<desceased>1971</desceased>

<abstract>The British actor Michael Rennie worked as a car salesman and factory

manager before he turned to acting.</abstract>

</actor>

Ah, but then we realize that we have spelled deceased wrong in our element, and decide to update the document to fix this. Subsequently, we then see the documents shown in Figure 16-1 in the collection /db/system/versions/db/actors.

Figure 16-1. Versioning trigger, first revision documents

While the michael-rennie.xml.base document is just a copy of the original XML document, the michael-rennie.xml.1 document is the diff document, whose ordinal suffix immediately indicates to you that it is the first revision. The content of the document looks like:

<v:version xmlns:v="http://exist-db.org/versioning">

<v:properties>

<v:document>michael-rennie.xml</v:document>

<v:user>admin</v:user>

<v:date>2014-03-09T14:59:18.954Z</v:date>

<v:revision>1</v:revision>

</v:properties>

<v:diff>

<v:delete event="start" ref="1.3"/>

<v:append ref="1.3">

<v:end name="deceased"/>

</v:append>

<v:delete event="end" ref="1.3"/>

<v:insert ref="1.3">

<v:start name="deceased"/>

</v:insert>

</v:diff>

</v:version>

As expected, the diff document clearly shows the rename as a process of deleting the element with the incorrect name and then replacing it with a new one with the correct name.

Finally, let’s update the document by adding some extra information to the abstract element. Following the update, we then see the documents shown in Figure 16-2 in the collection /db/system/versions/db/actors.

Figure 16-2. Versioning trigger, second revision documents

There is now a second diff document (michael-rennie.xml.2), and if we examine it, its content should look something like this:

<v:version xmlns:v="http://exist-db.org/versioning">

<v:properties>

<v:document>michael-rennie.xml</v:document>

<v:user>admin</v:user>

<v:date>2014-03-09T15:15:28.342Z</v:date>

<v:revision>2</v:revision>

</v:properties>

<v:diff>

<v:delete event="start" ref="1.3"/>

<v:append ref="1.3">

<v:end name="deceased"/>

</v:append>

<v:delete event="end" ref="1.3"/>

<v:delete ref="1.4.1"/>

<v:insert ref="1.3">

<v:start name="deceased"/>

</v:insert>

<v:insert ref="1.4.1">The British actor Michael Rennie worked as a car

salesman and factory manager before he turned to acting. A meeting

with a Gaumont-British Studios casting director led to Rennie's first

acting job - that of stand-in for Robert Young in Secret Agent

(1936), directed by Alfred Hitchcock.</v:insert>

</v:diff>

</v:version>

We can see that the second diff document contains both the changes we made in this revision (i.e., updated text in the abstract element) as well as all changes from previous revisions. This is because, as previously mentioned, each diff document is between the current document and the document base revision.
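Because diff documents are ordinary XML, their growth can be observed with a plain query over the versions collection. The sketch below relies only on the element names and namespace visible in the diff documents shown in this section; the revision output element is a made-up format, and the query assumes your user is permitted to read /db/system/versions.

```xquery
xquery version "3.0";

declare namespace v = "http://exist-db.org/versioning";

(: Summarize every diff document recorded for michael-rennie.xml :)
for $version in collection("/db/system/versions/db/actors")/v:version
where $version/v:properties/v:document eq "michael-rennie.xml"
order by xs:integer($version/v:properties/v:revision)
return
    (: revision is a hypothetical output element, used only for this illustration :)
    <revision number="{$version/v:properties/v:revision}"
        user="{$version/v:properties/v:user}"
        date="{$version/v:properties/v:date}"
        operations="{count($version/v:diff/*)}"/>
```

For the two diff documents listed in this section, the operations count should be 4 for revision 1 and 6 for revision 2, which reflects the fact that each diff is taken against the document base revision.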

If you wish to configure the VersioningTrigger for a collection in your database, you need to add the following to the triggers element within your collection’s configuration document (collection.xconf):

<triggers>

<trigger class="org.exist.versioning.VersioningTrigger">

<!--

Whether to try and avoid

write conflicts on versioned

documents.

Set to 'false' to avoid write conflicts,

or 'true' to ignore write conflicts and

overwrite the later revision.

-->

<parameter name="overwrite" value="false"/>

</trigger>

</triggers>

It is worth noting that you can also interrogate document revisions from XQuery by using the versioning XQuery module, as discussed in “versioning”.

Write conflict avoidance

Being able to edit a document and have a history of revisions created for you is all well and good, but what happens if multiple people start editing the document at the same time, and all wish to save their changes independently?

When two users open the same document—which is, say, at revision 1—and then make changes, when the first user saves the document he is effectively creating revision 2. What should happen when the second user attempts to save her update of revision 1? Mitigating the issue of revision 2 being overwritten with a new version based on revision 1 is known in eXist as write conflict avoidance.

As most XML editor clients that connect to eXist have no awareness of resource versioning within eXist, if you enable write conflict avoidance, eXist solves that problem by prohibiting later revisions from being overwritten by updates to earlier revisions. Figure 16-3 attempts to show how eXist solves this.

As shown in Figure 16-3, write conflict avoidance comes in two parts:

1. The versioning trigger can be configured to avoid changes being overwritten by changes to an earlier revision. However, the versioning trigger can respond only to changes that it knows about. As the changes are potentially coming from a third-party client application, we need some mechanism to identify revisions within the documents that are being accessed; this is where the versioning filter comes in.

2. The versioning filter can be configured to add versioning attributes to the root element of any document that is retrieved from eXist and is from a collection that has the versioning trigger enabled on it. When this document is sent back to eXist, the versioning trigger will see the versioning attributes on the root element, act on them, and, if there is no conflict, remove them before storing the document into the database. If there is a conflict, it will reject the operation and prohibit the document from being stored.
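The decision described in these two steps can be pictured as a simple comparison of revision numbers. The function below is a conceptual sketch only, not eXist's implementation: local:can-store is a hypothetical helper, it considers just the v:revision attribute, and it treats a document without versioning attributes as storable, since the trigger then has nothing to compare against.

```xquery
xquery version "3.0";

declare namespace v = "http://exist-db.org/versioning";

(: Hypothetical helper: may a document based on some revision replace
   the stored document, which is currently at $current-revision? :)
declare function local:can-store(
    $incoming as element(),
    $current-revision as xs:integer
) as xs:boolean {
    let $based-on := $incoming/@v:revision
    return
        (: No versioning attribute means no conflict can be detected :)
        empty($based-on) or xs:integer($based-on) ge $current-revision
};

(: A document that was retrieved when the stored document was at revision 1 :)
let $edited :=
    <actor xmlns:v="http://exist-db.org/versioning" v:revision="1">
        <born>1909</born>
    </actor>
return (
    local:can-store($edited, 1),  (: true: nobody has saved in the meantime :)
    local:can-store($edited, 2)   (: false: another user already created revision 2 :)
)
```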

If you wish to enable write conflict avoidance, you need to set the overwrite parameter on the versioning trigger to false in the collection configuration document (collection.xconf), enable the versioning filter in eXist’s configuration file ($EXIST_HOME/conf.xml), and then restart eXist. To enable the versioning filter, make sure the following definition is present and uncommented in the serializer element of the configuration file:

<custom-filter class="org.exist.versioning.VersioningFilter"/>

Figure 16-3. Write conflict avoidance between two users operating on the same versioned document

Once the versioning filter is enabled, if you request a versioned document from the database you will see three additional attributes on its root element: v:revision, v:key, and v:path. For example:

<actor xmlns:v="http://exist-db.org/versioning" v:revision="2"

v:key="144a903cec62" v:path="/db/actors/michael-rennie.xml">

<name>

<given>Michael</given>

<family>Rennie</family>

</name>

<born>1909</born>

<deceased>1971</deceased>

<abstract>The British actor Michael Rennie worked as a car salesman

and factory manager before he turned to acting. A meeting with a

Gaumont-British Studios casting director led to Rennie's first acting

job - that of stand-in for Robert Young in Secret Agent (1936),

directed by Alfred Hitchcock.</abstract>

</actor>

While adding the versioning attributes to the root element of the document is not ideal, it is really the only way that eXist can attempt to ensure that this information is preserved during a round-trip of the document out of the database, into a client editor, and back in again. The versioning attributes, however, are in their own namespace, so they should not interfere with your document's own content.

WARNING

If you are making use of the versioning filter, you must not remove the versioning attributes from the document if you plan to store it back into eXist in the same location, or you risk losing previous changes from the revision history!

Scheduled Jobs

eXist has a scheduler built into its core that enables you to schedule jobs to be executed at some point(s) in the future. Internally eXist wraps the Quartz scheduler, but it exposes a much simpler interface to the user and allows jobs to interact with the database. You can write your jobs in either XQuery or Java. XQuery jobs are simpler to implement, while Java jobs give you more control over how the job is executed. There are two types of job that can be executed by the scheduler:

User jobs

This is the standard job type that users will typically implement in either XQuery or Java. User jobs may execute concurrently, and the same job may overlap with a previously scheduled execution if it is long-running.

System task jobs

System task jobs are solely for executing system tasks and may only be implemented in Java. System tasks execute when the database is switched into protected mode; no other transactions are permitted. System task jobs do not execute concurrently due to the restrictions on system tasks, and they cannot overlap with a previously scheduled execution. eXist makes use of the system task job both to schedule its synchronization task, which flushes the database journal to persistent storage, and to execute scheduled backups.

Scheduling Jobs

You can schedule jobs by setting up their configuration with the scheduler in eXist’s configuration file ($EXIST_HOME/conf.xml): you add jobs to the scheduler element indicated by the XPath /exist/scheduler. Remember that changes to the configuration file are only read when eXist is started.

TIP

So, what can you do if you want to schedule a job to run immediately, without restarting eXist?

Simply, you must add it to $EXIST_HOME/conf.xml so that it is persisted across restarts, and also submit it using the scheduler XQuery extension module (see scheduler) so that it is scheduled immediately without needing to restart.

ENABLING THE SCHEDULER XQUERY EXTENSION MODULE

The scheduler XQuery extension module (see scheduler) in eXist is not compiled or enabled by default. To enable it, you need to:

1. Stop eXist.

2. Set include.module.scheduler to true in $EXIST_HOME/extensions/build.properties.

3. Recompile eXist’s extension modules in place (see “Building eXist from Source”). Make sure to use the extension-modules Ant target!

4. Enable the module in eXist’s configuration file ($EXIST_HOME/conf.xml) by uncommenting the line <module uri="http://exist-db.org/xquery/scheduler" class="org.exist.xquery.modules.scheduler.SchedulerModule"/>.

5. Start eXist.

Table 16-1 lists the scheduled job arguments.

| Argument | Description | Mandatory/optional |
|---|---|---|
| type | The type of the job to schedule. Either system for system task jobs or user for user jobs. | Mandatory. |
| class | Used for specifying the fully qualified class name of your Java class that implements either org.exist.scheduler.UserJavaJob or org.exist.storage.SystemTask. [a] | Mandatory if job is implemented in Java. |
| xquery | Used for specifying the database path to an XQuery if you have implemented your scheduled job in XQuery. The syntax uses the simple database path (e.g., /db/my-collection/my-job.xq). | Mandatory if job is implemented in XQuery. |
| name | When scheduling an XQuery job, you can provide a friendly name for the scheduled job, but it must be unique across all scheduled jobs. Java jobs implement their own name function. | Optional if XQuery job, and otherwise ignored. The default is the name of the XQuery file. |
| unschedule-on-exception | When you’re executing an XQuery job, if the job causes some sort of exception it can be unscheduled so that it does not run again in the future. Must be either yes or no. | Optional if XQuery job, and otherwise ignored. The default is yes. |
| cron-trigger | A description of when the scheduled job is run, using a cron-like syntax. For the exact syntax, see http://www.quartz-scheduler.org/documentation/quartz-1.x/tutorials/crontrigger. | Mandatory, or use period instead. |
| period | A period in milliseconds defining the frequency after any delay with which the job is run. | Mandatory, or use cron-trigger instead. |
| delay | A startup delay in milliseconds after which the scheduled job is first run. If unspecified, the job will be executed immediately after eXist has initialized. | Optional; use with period. |
| repeat | The number of times to repeat execution of the job. If unspecified, the job will be executed periodically indefinitely. | Optional; use with period. |

Table 16-1. Scheduled job arguments

[a] Your Java class must be available on eXist’s classpath, which you typically accomplish by placing your JAR and any JAR dependencies into $EXIST_HOME/lib/user.
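To make the interplay of the period, delay, and repeat arguments more concrete, the following Java sketch computes when a periodic job would fire, in milliseconds after eXist has initialized. It is a plain arithmetic illustration and does not call the scheduler. It assumes that the job fires once after the delay and is then repeated the given number of further times; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that illustrates the period, delay, and repeat arguments. */
final class PeriodicSchedule {

    /**
     * Returns the fire times in milliseconds after startup.
     * Assumption: one initial execution after the delay, followed by
     * `repeat` further executions spaced `period` milliseconds apart.
     */
    static List<Long> fireTimes(final long delay, final long period, final int repeat) {
        final List<Long> times = new ArrayList<>();
        for (int i = 0; i <= repeat; i++) {
            times.add(delay + i * period);
        }
        return times;
    }

    public static void main(final String[] args) {
        // Corresponds to a job configured with delay="5000" period="60000" repeat="3"
        System.out.println(fireTimes(5_000, 60_000, 3));
        // prints [5000, 65000, 125000, 185000]
    }
}
```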

You may view the currently scheduled and executing jobs either by using the Scheduler dashboard application (Figure 16-4) or by executing the scheduler:get-scheduled-jobs XQuery extension function.

Figure 16-4. Dashboard Scheduler application, showing scheduled jobs

XQuery Jobs

Writing user jobs in XQuery that are executed periodically by the scheduler is not much different from writing normal XQuery. The scheduler can execute only XQuery main modules—that is, it cannot directly call a specific XQuery function—but this should not be a problem as you can always write a one-line XQuery main module that acts as a wrapper for your library function.

Sometimes you need to be able to parameterize an XQuery that the scheduler will execute, perhaps with some configuration settings. The scheduler fits well with the standard mechanism for passing external configuration into XQuery, which is to use variable declarations with external binding. In the configuration of the XQuery scheduled job, you may set parameters, each of which will attempt to bind to the equivalently named external variable declared in your XQuery. By default, the external variables are expected to be bound to the local namespace of the XQuery and use the binding prefix local. If you wish to change this, you can use the special parameter bindingPrefix in the configuration of the XQuery scheduled job.

Scheduled weather retrieval (XQuery)

Supplied alongside this chapter is an XQuery file called weather.xq, in the chapters/advanced-topics folder of the book-code Git repository (see “Getting the Source Code”). This XQuery has been designed as a simple example of what you can achieve with a scheduled XQuery job. The XQuery simply connects to a public web service and downloads the current weather for a particular city, parses the results, and stores them into the database. By scheduling this XQuery, you can build up a dataset of weather over time, which you can then later query to understand how the weather changed.

To use the example, you must store the query into the database (for example, at /db/weather.xq), set it as executable by the guest user, create a collection for storing weather data, and make that writable by the guest user. You then need to add the scheduled job configuration shown in Example 16-7 to $EXIST_HOME/conf.xml and restart eXist.

Example 16-7. Scheduled configuration for the weather example

<job type="user" xquery="/db/weather.xq" name="hourly-weather"

cron-trigger="0 0 0/1 * * ?">

<parameter name="city" value="Exeter"/>

<parameter name="country" value="United Kingdom"/>

<parameter name="weather-collection" value="/db/weather"/>

</job>

This scheduler configuration will cause the XQuery /db/weather.xq to be executed every hour, on the hour.
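The cron-trigger value used here follows the Quartz field order: seconds, minutes, hours, day-of-month, month, day-of-week, and an optional year. As a reading aid, the following Java sketch labels each field of an expression. It only splits the string and does not validate or evaluate it; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that labels the fields of a Quartz-style cron expression. */
final class CronFields {

    // Field order used by Quartz cron expressions; the year field is optional.
    private static final String[] NAMES = {
        "seconds", "minutes", "hours", "day-of-month", "month", "day-of-week", "year"
    };

    /** Splits the expression on whitespace and pairs each part with its field name. */
    static Map<String, String> label(final String expression) {
        final String[] parts = expression.trim().split("\\s+");
        if (parts.length < 6 || parts.length > 7) {
            throw new IllegalArgumentException(
                    "Expected 6 or 7 fields but found " + parts.length);
        }
        final Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < parts.length; i++) {
            fields.put(NAMES[i], parts[i]);
        }
        return fields;
    }

    public static void main(final String[] args) {
        System.out.println(label("0 0 0/1 * * ?"));
        // prints {seconds=0, minutes=0, hours=0/1, day-of-month=*, month=*, day-of-week=?}
    }
}
```

Read field by field, the expression fires at second 0 of minute 0, in every hour starting from hour 0, on any day of any month, with no particular day of the week specified.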

Alternatively, you could schedule it using the scheduler:schedule-xquery-cron-job XQuery extension function, as shown in Example 16-8.

Example 16-8. Immediate scheduling for the weather example

scheduler:schedule-xquery-cron-job(

"/db/weather.xq",

"0 0 0/1 * * ?",

"hourly-weather",

<parameters>

<param name="city" value="Exeter"/>

<param name="country" value="United Kingdom"/>

<param name="weather-collection" value="/db/weather"/>

</parameters>

)

The code for the weather.xq scheduled weather web service query is listed in Example 16-9.

Example 16-9. The weather web service scheduled query (weather.xq)

xquery version "3.0";

import module namespace http = "http://expath.org/ns/http-client";
import module namespace util = "http://exist-db.org/xquery/util";
import module namespace xmldb = "http://exist-db.org/xquery/xmldb";

declare namespace wsx = "http://www.webserviceX.NET";

(: Configuration :)
declare variable $local:city external;               (: 1 :)
declare variable $local:country external;            (: 2 :)
declare variable $local:weather-collection external; (: 3 :)

let $webservice := "http://www.webservicex.net/globalweather.asmx/GetWeather",
    $url := $webservice || "?CityName=" || encode-for-uri($local:city) ||
        "&amp;CountryName=" || encode-for-uri($local:country),                (: 4 :)
    $result := http:send-request(<http:request href="{$url}" method="get"/>)  (: 5 :)
return
    let $doc := if($result[1]/@status eq "200" and $result[2]/wsx:string) then  (: 6 :)
            (: reconstruct XML, the webservice provides it as a string
               for some reason! :)
            util:parse($result[2]/wsx:string/text())  (: 7 :)
        else
            (: record failure :)
            <failed at="{current-dateTime()}">{$result}</failed>  (: 8 :)
    return
        let $stored := xmldb:store($local:weather-collection, (), $doc)  (: 9 :)
        return
            (: log that we ran! :)
            util:log("debug", "Stored hourly weather to: " || $stored)  (: 10 :)

1, 2, 3

These externally bound variables are filled by the parameters to the scheduled job configuration.

4

We prepare the full URL for the weather web service call.

5

We make a call to the weather web service using the EXPath HTTP Client extension module (see http).

6

We check whether the web service call succeeded or failed.

7

If the web service call succeeded, we extract the weather data.

8

If the web service call failed, we prepare some failure data.

9

We store the result into the database.

10

We write a message to the logfile indicating that the task ran to completion.

WARNING

While you are typically interested in the final computed result of your XQuery, when your query is running within the scheduler there is nowhere for the result to be implicitly sent to. If you wish to understand or retain the result of your query, you are responsible for either explicitly storing it into the database from the query using the xmldb:store XQuery extension function or writing it to a logfile using the fn:trace or util:log XQuery functions.

Java Jobs

As noted previously, if you wish to write a scheduled job in Java, you can implement either a user job or a system task job. For each option, eXist provides appropriate interfaces and abstract classes to assist in your implementation. Once you have implemented the appropriate class, you need to place your compiled class onto eXist’s classpath, which you typically do by placing a JAR file of your code and any dependent JAR files into $EXIST_HOME/lib/user. Remember that you may have to restart eXist for the JVM class loader to see your new JAR files!

WARNING

When writing a Java job, you must be careful to return any brokers that you borrow from the broker pool you are provided with and ensure that you release any locks that you have taken on collections or resources. Failure to do so can reduce the connections available to the database and leave resources inaccessible until eXist is restarted. It is strongly recommended to use the Java try/catch/finally pattern to release any acquired resources in the finally block. In addition, you should be aware that you cannot keep state in member variables of your Job class across invocations; instead, you should keep any nontransient state in a singleton (remember to synchronize access where needed) or store it into the database.
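As a minimal sketch of the singleton approach mentioned in this warning, the following self-contained Java class holds state that outlives any single job instance. The class JobState and its methods are hypothetical names chosen for illustration and are not part of eXist. Note that such in-memory state is still lost when the JVM restarts, so anything that must survive a restart belongs in the database.

```java
/**
 * Hypothetical holder for state shared across job invocations.
 * There is one instance per class loader, so it outlives individual job objects.
 */
final class JobState {

    private static final JobState INSTANCE = new JobState();

    private long runCount = 0;
    private String lastResult = "";

    private JobState() {
        // Not instantiable from outside; use getInstance().
    }

    static JobState getInstance() {
        return INSTANCE;
    }

    /** Records the outcome of one run and returns the total number of runs so far. */
    synchronized long recordRun(final String result) {
        lastResult = result;
        runCount++;
        return runCount;
    }

    synchronized long runCount() {
        return runCount;
    }

    synchronized String lastResult() {
        return lastResult;
    }

    public static void main(final String[] args) {
        // Each call stands in for a separate invocation of a scheduled job.
        JobState.getInstance().recordRun("stored /db/weather/1.xml");
        JobState.getInstance().recordRun("stored /db/weather/2.xml");
        System.out.println(JobState.getInstance().runCount() + " runs, last: "
                + JobState.getInstance().lastResult());
    }
}
```

All accessors are synchronized because user jobs may execute concurrently, so two overlapping executions of the same job could otherwise update the state at the same time.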

Java user job

Most often, developers will want to implement user jobs, which you do by extending the abstract class org.exist.scheduler.UserJavaJob. When extending this class, you must implement the following job-naming functions, defined in org.exist.scheduler.JobDescription:

/**
 * Get the name of the job.
 * If a name has not yet been set, you must create one!
 *
 * @return The job's name
 */
public String getName();

/**
 * Set the name of the job.
 * After being set, the job should return this name.
 *
 * @param name The job's new name
 */
public void setName(final String name);

These functions simply round-trip the name of the job. The job’s name itself must be unique across all job instances. For Java jobs, you simply need to create an initial name and return it each time getName is called, or store the incoming name when setName is called and return that from then on.
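A minimal sketch of this round-trip is shown here. So that it compiles without eXist on the classpath, it declares a local stand-in interface, NamedJob, that mirrors only the two naming functions; it is not the eXist interface itself, and the class name and the initial job name are likewise hypothetical.

```java
/** Local stand-in that mirrors the two naming functions; not the eXist interface. */
interface NamedJob {
    String getName();
    void setName(String name);
}

/** Hypothetical job that round-trips its name as described. */
final class SimpleNamedJob implements NamedJob {

    // Initial name, returned until a different name is set.
    // Assumption: only one schedule uses this job, so the name stays unique.
    private String name = "simple-named-job";

    @Override
    public String getName() {
        return name;
    }

    @Override
    public void setName(final String name) {
        this.name = name;
    }

    public static void main(final String[] args) {
        final SimpleNamedJob job = new SimpleNamedJob();
        System.out.println(job.getName());  // prints simple-named-job
        job.setName("hourly-weather");
        System.out.println(job.getName());  // prints hourly-weather
    }
}
```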

You also need to provide the actual work of the job in the form of implementing the abstract function defined in org.exist.scheduler.UserJavaJob:

/**

* The actual work/task of the scheduled job.

* This function is called each time the job is executed by the scheduler

* according to its schedule.

*

* @param brokerpool The database BrokerPool.

* @param params Any parameters passed to the job, or null otherwise.

*

* @throws JobException You may throw a JobException to control the

* cleanup of the job and affect rescheduling in case of a problem.

*/

public abstract void execute(BrokerPool brokerpool, Map<String, ?> params)

throws JobException;

When your code is executing, should you encounter a problem or exception, you must catch it, and if you wish to end processing you may throw an org.exist.scheduler.JobException. The JobException class takes an enumerated value of type org.exist.scheduler.JobException.JobExceptionAction, which controls how the job is aborted by the scheduler and whether its schedule should be adjusted. The available job exception actions when aborting a running job are outlined in Table 16-2.

| Action | Description |
|---|---|
| JOB_ABORT | Instructs the scheduler to stop the current job. It may still be executed in the future according to its schedule. |
| JOB_ABORT_THIS | Instructs the scheduler to stop the current job. It also removes from the scheduler the schedule that triggered this job. |
| JOB_ABORT_ALL | Instructs the scheduler to stop the current job. It also removes from the scheduler the schedule that triggered this job and all other schedules that refer to this job. |
| JOB_REFIRE | Instructs the scheduler to stop the current job. It also reschedules the job for immediate execution. The schedule that triggered this job remains scheduled. |

Table 16-2. Job exception actions

Scheduled weather retrieval (Java)

An example class implementing UserJavaJob called exist.book.example.scheduler.user.WeatherJob is supplied with this chapter in the folder chapters/advanced-topics/scheduler-java-job/weather-user-java-job-example of the book-code Git repository (see “Getting the Source Code”). The example is an indirect port of the XQuery code in Example 16-9, and hopefully serves as a good comparison while showing that it is much simpler to implement a user job for the scheduler in XQuery than in Java (when possible). The example uses the Jersey client library for talking to the public weather web service.
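For comparison with step 4 of the XQuery version, the following Java sketch builds the same web service URL using only the standard library. It is not taken from the supplied WeatherJob class, which uses Jersey; the class and method names here are hypothetical. It covers only simple names like the ones shown and is not a full equivalent of fn:encode-for-uri.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds the weather web service URL. */
final class WeatherUrl {

    private static final String WEBSERVICE =
            "http://www.webservicex.net/globalweather.asmx/GetWeather";

    /**
     * Percent-encodes a query parameter value.
     * URLEncoder targets HTML form encoding and emits "+" for a space, so the
     * "+" is replaced with "%20" to match what fn:encode-for-uri produces.
     */
    static String encode(final String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** Builds the full request URL for the given city and country. */
    static String build(final String city, final String country) {
        return WEBSERVICE + "?CityName=" + encode(city)
                + "&CountryName=" + encode(country);
    }

    public static void main(final String[] args) {
        System.out.println(build("Exeter", "United Kingdom"));
    }
}
```

Unlike in the XQuery string literal, the ampersand needs no escaping in a Java string.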

To compile the example, enter the scheduler-java-job folder and run mvn package; the Java user job example can be found in the subfolder weather-user-java-job-example.

To deploy the WeatherJob to eXist, you need to create the collection /db/weather (with write access by the guest user), stop eXist, and copy all of the files from scheduler-java-job/weather-user-java-job-example/target/weather-user-java-job-example-1.0-assembly to $EXIST_HOME/lib/user. You can then schedule the WeatherJob class to run hourly in $EXIST_HOME/conf.xml by adding the following job definition to the scheduler configuration before restarting eXist:

<job type="user" class="exist.book.example.scheduler.user.WeatherJob"

name="hourly-weather" cron-trigger="0 0 0/1 * * ?">

<parameter name="city" value="Exeter"/>

<parameter name="country" value="United Kingdom"/>

<parameter name="weather-collection" value="/db/weather"/>

</job>

As an alternative to adding the WeatherJob to eXist’s configuration file, you could schedule it nonpersistently by executing the following XQuery:

scheduler:schedule-java-cron-job(

"exist.book.example.scheduler.user.WeatherJob",

"0 0 0/1 * * ?",

"hourly-weather",

<parameters>

<param name="city" value="Exeter"/>

<param name="country" value="United Kingdom"/>

<param name="weather-collection" value="/db/weather"/>

</parameters>

)

Java system task job

It is unlikely that most developers will ever implement system task jobs, as you can meet the vast majority of use cases by instead implementing a user job. However, should you want to implement a scheduled system task job in Java, you do not actually need to implement the scheduler job aspect. Instead, you can just implement org.exist.storage.SystemTask, as eXist provides a generic system task job org.exist.scheduler.impl.SystemTaskJobImpl that enables any system task to be scheduled.

WARNING

Remember that when your system task runs, you have exclusive access to the database, as the database will be in protected mode. All other operations will be blocked until your task completes. Accordingly, you should ensure that your task executes quickly and efficiently!

When implementing org.exist.storage.SystemTask, you must implement the following configuration functions:

/**

* Called to configure the system task.

* Enables you to configure your system task!

* Note: If the system task is managed by the scheduler,

* this happens before scheduling.

*

* @param config A reference to the parsed in-memory

* representation of eXist's configuration file $EXIST_HOME/conf.xml.

* @param properties A property set containing any parameters passed

* to the scheduled job.

*

* @throws EXistException You may throw this to abort configuration of

* the system task. If the task is managed by the scheduler,

* it will not be scheduled.

*/

void configure(Configuration config, Properties properties)

throws EXistException;

/**

* Indicates whether a checkpoint should be generated

* before the execute function is called. A checkpoint

* guarantees that any outstanding changes are flushed to

* persistent storage.

*

* @return true if a checkpoint should be generated, false otherwise

*/

boolean afterCheckpoint();

You also need to provide the actual work of the system task job in the form of implementing the function:

/**

* Called when the system task is executed in protected mode.

* Constitutes the work unit of the system task.

*

* @param broker A database broker that can be used to access the database.

*

* @throws EXistException You may throw this to indicate an error and abruptly

* abort execution of the system task. If the task is managed by the

* scheduler, this will not affect its future schedule.

*/

void execute(DBBroker broker) throws EXistException;

WARNING

System tasks do not need to acquire locks on collections and documents, as they are operating in protected mode—in fact, they should not attempt to do so! Invalid management of locks in a system task can lead to deadlock situations.

If you wish to acquire collections and documents without locks, you can use the getCollection(XmldbURI) method on org.exist.storage.DBBroker and then subsequently the collectionIteratorNoLock(DBBroker) and iteratorNoLock(DBBroker) methods on org.exist.collections.Collection.

Database stats scheduled system task

An example class implementing SystemTask called exist.book.example.scheduler.system.StatsSystemTask is supplied in the folder chapters/advanced-topics/scheduler-java-job/stats-system-task-example of the book-code Git repository (see “Getting the Source Code”). The example simply generates statistics about the current content of the database and stores them into a new document in a configured database collection. When the system task is scheduled, this allows you to build a collection of statistics about the content of the database over time, which could potentially be used by other XQueries to generate reports and/or graphs about database usage. The use of a system task here allows us to guarantee that our statistics represent the exact state of the database at a particular point in time, due to execution happening while the database is in protected mode.

To compile the example, enter the scheduler-java-job folder and run mvn package; the system task job example can be found in the subfolder stats-system-task-example.

To deploy the StatsSystemTask to eXist, you need to create the collection /db/stats (the permissions are not important, as each system task will be executed as the SYSTEM user), stop eXist, and copy all of the files from scheduler-java-job/stats-system-task-example/target/stats-system-task-example-1.0-assembly to $EXIST_HOME/lib/user. You can then schedule the StatsSystemTask class to run hourly in $EXIST_HOME/conf.xml by adding the following job definition to the scheduler configuration before restarting eXist:

<job type="system" class="exist.book.example.scheduler.system.StatsSystemTask"

name="hourly-stats" cron-trigger="0 0 0/1 * * ?">

<parameter name="stats-collection" value="/db/stats"/>

</job>

You cannot schedule a system task using the XQuery scheduler extension module. However, as an alternative to adding the StatsSystemTask to eXist’s configuration file, you can trigger the system task for almost immediate execution by using the system:trigger-system-task function from the XQuery system extension module (see system):

system:trigger-system-task(

"exist.book.example.scheduler.system.StatsSystemTask",

<parameters>

<param name="stats-collection" value="/db/stats"/>

</parameters>

)

TIP

While you cannot directly schedule a system task for execution from XQuery, a possible workaround is to create a stub XQuery in the database that simply calls system:trigger-system-task, and then schedule the execution of that XQuery using either scheduler:schedule-xquery-cron-job or scheduler:schedule-xquery-periodic-job.

Startup Triggers

Startup triggers are a very simple mechanism that enables you to implement a Java class that has exclusive access to the database during the database’s startup process. A startup trigger is executed as the final phase, after the database server has initialized but before it is made available for general use.

You may be wondering what you would use a startup trigger for. Typically, they are used for performing computed configuration or adjustments to the database when it is started. To illustrate their use, let’s briefly look at the three startup triggers that eXist provides for use during normal database startup:

Autodeployment

The trigger org.exist.repo.AutoDeploymentTrigger is used to install any new EXPath packages into the database that have been placed into $EXIST_HOME/autodeploy. This ensures that the latest packages are available when the database is started.

Message receiver

For the emerging database replication support in eXist, the trigger org.exist.replication.jms.subscribe.MessageReceiverStartupTrigger starts a JMS listener to listen to any incoming replication requests.

RESTXQ

The RESTXQ implementation in eXist uses the trigger org.exist.extensions.exquery.restxq.impl.RestXqStartupTrigger to load its resource function registry from the file $EXIST_HOME/webapp/WEB-INF/data/restxq.registry. This is done so that the RESTXQ implementation can reregister those resource functions that were previously registered before eXist was restarted. This ensures that resource functions remain available across database restarts. For further details, see “Configuring RESTXQ”.

Note that startup triggers are executed synchronously, and thus their execution will block the database startup. Depending on your use of eXist, you might want to avoid delaying the database startup any more than necessary. Startup triggers are effectively executed in protected mode, as there is only a single database broker available and it is provided to your startup trigger; hence, no other transactions will be taking place against the database while the triggers are executing. Just as with system tasks (see “Java system task job”), you need not worry about locking collections or resources.

When creating your own startup trigger, you must implement the execute method defined in org.exist.storage.StartupTrigger:

/**

* Synchronously execute a task at database startup before the database

* is made available to connections.

*

* Remember, your code within the execute function will block the database

* startup until it completes!

*

* Any RuntimeExceptions thrown will be ignored and database startup

* will continue. Database startup cannot be aborted by this trigger!

*

* Note: If you want an asynchronous trigger, you could use a future in your

* implementation to start a new thread; however, you cannot access the

* sysBroker from that thread as it may have been returned to the broker

* pool. Instead, if you need a broker, you may be able to do something

* clever by checking the database status and then acquiring a new broker

* from the broker pool. If you wish to work with the broker pool you must

* obtain this before starting your asynchronous execution by calling

* sysBroker.getBrokerPool().

*

* @param sysBroker The single broker available during database startup.

* @param params A parameter map of keys/values that provide any parameters

* given to the startup trigger configuration in $EXIST_HOME/conf.xml.

*/

public void execute(final DBBroker sysBroker,

final Map<String, List<? extends Object>> params);

Once you have your implementation, you need to place your compiled class onto eXist’s classpath; this is typically done by placing a JAR file of your code and any dependent JAR files into $EXIST_HOME/lib/user. Remember that you may have to restart eXist for the JVM class loader to see your new JAR files!

Configured Modules Example Startup Trigger

An example class implementing StartupTrigger called exist.book.example.startuptrigger.ConfiguredModulesStartupTrigger is supplied in the folder chapters/advanced-topics/startup-trigger of the book-code Git repository (see “Getting the Source Code”). The example examines all of the available XQuery extension modules written in Java that are configured in eXist (via $EXIST_HOME/conf.xml), extracts some details about each module, and stores all of the details into a document in the database at a configured location. The example is rather contrived and may be of little practical use; however, it clearly shows how to implement a startup trigger that both examines eXist’s configuration and writes a document into the database, which are useful and common tasks in their own right!

While the code inside a startup trigger executes in a manner akin to protected mode (i.e., there are no other transactions happening at the same time), the task performed by the ConfiguredModulesStartupTrigger need not necessarily be executed in this manner. This is because the available configured modules never change while eXist is running; they only change when we adjust eXist’s configuration file and restart eXist.

To compile the example, enter the startup-trigger folder and run mvn package.

To deploy the ConfiguredModulesStartupTrigger to eXist, you need to stop eXist and copy all of the files from startup-trigger/target/configured-modules-startup-trigger-1.0-assembly to $EXIST_HOME/lib/user. You can then configure the ConfiguredModulesStartupTrigger class in $EXIST_HOME/conf.xml by adding the following trigger definition to the startup triggers configuration before restarting eXist:

<trigger

class="exist.book.example.startuptrigger.ConfiguredModulesStartupTrigger">

<parameter name="target" value="/db/modules-summary.xml"/>

</trigger>

Database Triggers

Database triggers in eXist are considerably different from the startup triggers we looked at in the previous section, and in many respects are similar to triggers found in relational database systems. Database triggers (or just triggers, as we will refer to them going forward) in eXist allow actions to be carried out in response to events before, during, or after various document and collection operations. You can implement triggers that intercept an action performed on the database and then reject it, change it, or take additional steps.

For example, say you have two collections of documents, and that in one collection you wish to store (and update) documents that meet some criteria set out in documents in the other collection. If the documents do not meet these criteria, they should not be stored or updated. Such cross-document and even cross-collection validation can easily be achieved with a trigger. In this example, you would implement a trigger and configure it on the target collection. Your trigger would perform checks against the criteria collection before a new document was stored or an existing document updated. If those criteria were not met, it could return an error or throw an exception to abort the store or update operation.

Database triggers in eXist offer a huge amount of power to the developer, but remember that the trigger is called once for each operation that it is configured to listen for and is a blocking operation. Any trigger will have an impact on the time it takes to complete the requested database operation, so developers should try to avoid performing lengthy operations in triggers.

TIP

So what can you do if you have a lengthy operation (perhaps because it is computationally complex, or you need to talk to several external systems) that you wish to perform as a trigger, but you cannot afford the performance hit to that database operation?

Well, you could instead consider performing your database operations on a staging collection, using a scheduled job (see “Scheduled Jobs”) to carry out the task asynchronously in the background, and moving resources to a live collection periodically.

While you are most likely interested in creating your own triggers, eXist provides several triggers out of the box. It is useful to mention these, as you may wish to study them as examples:

XML CSV extraction

The trigger org.exist.collections.triggers.CSVExtractingTrigger offers the facility to split the text node of an element into multiple subelements during document storage. For example, you could transform the following element:

<value key="product_model">SomeName|SomeCode12345</value>

into:

    <value key="product_model">
        <product_name>SomeName</product_name>
        <product_code>SomeCode12345</product_code>
    </value>

The trigger takes two parameters:

separator

The character or string that separates the text values that you wish to split into multiple element text values.

path

An expression similar to a simple XPath expression, and a list of extractions to perform. You may provide as many path parameters as you wish.

To achieve the split shown above, the collection configuration for the CSV extracting trigger would look like:

    <collection xmlns="http://exist-db.org/collection-config/1.0">
        <triggers>
            <trigger class="org.exist.collections.triggers.CSVExtractingTrigger">
                <parameter name="separator" value="|"/>
                <parameter name="path">
                    <xpath>/content/properties/value[@key eq "product_model"]</xpath>
                    <extract index="0" element-name="product_name"/>
                    <extract index="1" element-name="product_code"/>
                </parameter>
            </trigger>
        </triggers>
    </collection>
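To make the relationship between the separator, the index attributes, and the generated element names concrete, here is a minimal Java sketch of the splitting step. It is only an illustration of the behavior described above, not eXist's implementation; the class name CsvExtractionSketch and its extract method are our own inventions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Illustrative only: mimics the split described in the text; not eXist code. */
class CsvExtractionSketch {

    /**
     * Splits text on a literal separator and maps each configured index to an
     * element name, mirroring the extract entries (index, element-name) of the
     * trigger configuration.
     */
    static Map<String, String> extract(String text, String separator,
                                       Map<Integer, String> extractions) {
        // Pattern.quote makes "|" a literal rather than regex alternation
        String[] parts = text.split(Pattern.quote(separator), -1);
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> e : extractions.entrySet()) {
            int index = e.getKey();
            if (index >= 0 && index < parts.length) {   // ignore indexes with no value
                result.put(e.getValue(), parts[index]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> extractions = new LinkedHashMap<>();
        extractions.put(0, "product_name");
        extractions.put(1, "product_code");
        // Each map entry corresponds to one subelement of <value>
        System.out.println(extract("SomeName|SomeCode12345", "|", extractions));
    }
}
```

Each entry of the returned map corresponds to one subelement that the trigger would create inside the matched element.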

History

The trigger org.exist.collections.triggers.HistoryTrigger can be used to create an archive copy of a resource before it is deleted or overwritten. For details, see “Versioning”.

Replication

For the emerging database replication support in eXist, the trigger org.exist.replication.jms.publish.ReplicationTrigger publishes details of all database operations to a JMS topic.

RESTXQ

The RESTXQ implementation in eXist uses the trigger org.exist.extensions.exquery.restxq.impl.RestXqTrigger to compile XQuery modules after they are stored into the database. It then examines the compiled XQuery for resource functions, and any it finds are registered with the RESTXQ resource registry so that they may respond to incoming HTTP requests. By default this is configured for the entire database by means of being present in the collection configuration for /db. For further details, see “Configuring RESTXQ”.

Streaming Transformations for XML (STX)

STX is an alternative mechanism to XSLT that was designed specifically for performing transformations on a stream of XML events.

The trigger org.exist.collections.triggers.STXTransformerTrigger allows you to transform documents with STX transformation sheets when they are stored or updated. For details of the STX transformation sheet language, see the STX specification.

The STX transformer trigger relies on the Joost implementation of STX, so to use STX you need to download Joost 0.9.1 and place the joost.jar file into $EXIST_HOME/lib/user before restarting eXist. The trigger takes a single parameter, src, which points to your STX transformation sheet. This may be either a path to a document in the database, or a URI from which the STX transformation sheet can be downloaded.

For example, the collection configuration for the STX transformer trigger may look like:

    <collection xmlns="http://exist-db.org/collection-config/1.0">
        <triggers>
            <trigger class="org.exist.collections.triggers.STXTransformerTrigger">
                <parameter name="src" value="/db/my-transformation.stx"/>
            </trigger>
        </triggers>
    </collection>

Versioning

The resource versioning facility in eXist uses the trigger org.exist.versioning.VersioningTrigger to support creating diffs of resources and tracking the revision history of documents. See “Versioning”.

XQuery trigger

The XQuery trigger, while itself written in Java, acts as a bridge to XQuery and is implemented in the org.exist.collections.triggers.XQueryTrigger class.

Upon receiving an event, it calls an appropriate function in a configured XQuery. Its purpose is to allow developers to implement their own triggers in XQuery as an easier alternative to Java. It is described in detail in “XQuery Triggers”.

Triggers in eXist may be implemented in either XQuery or Java. Implementing the triggers in XQuery is much simpler, but there are some operations that can currently only be implemented in Java. You can implement trigger actions before or after database operations performed for both documents and collections in either XQuery or Java, but you can only modify documents or examine the content of documents during store or update operations by using Java triggers.

Triggers that take action on database operations for documents are called document triggers; likewise, those that act on collection actions are called collection triggers. Triggers that modify documents or examine them as they are being stored are called document filtering triggers. Triggers are configured on a per-collection basis in the collection’s configuration document (collection.xconf), and like all aspects of collection configuration (see “Configuring Indexes” for related detail), triggers are inherited or overridden from their parent collection.

Caution

As triggers are configured in collection configuration documents, and these are inherited downward or overridden, if you define triggers in a collection configuration document—for example, on the collection /db/a/b—then any triggers defined in collection configuration documents on /db/a or /db will not be inherited by /db/a/b. If you define your triggers in a collection configuration document, you must also include any other triggers that you wish to use.

For example, the RESTXQ trigger is defined in the collection configuration for /db. Thus, if you define your own triggers in any other collection configuration documents and wish to use RESTXQ in those collections, you will also need to include a definition for the RESTXQ trigger in your lower-level collection configuration documents.
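The override rule is easy to misread, so the following Java sketch models it as described in this section: the nearest collection that declares triggers wins outright, and nothing is merged from its ancestors. This is a toy model under that assumption, not eXist code; the class and method names are hypothetical, and trigger class names are shortened to plain strings.

```java
import java.util.List;
import java.util.Map;

/** Toy model of "nearest trigger configuration wins"; not eXist code. */
class TriggerInheritanceSketch {

    /**
     * Walks from the given collection up toward /db and returns the trigger
     * list of the first collection that declares one. Lists are never merged.
     */
    static List<String> effectiveTriggers(Map<String, List<String>> configured,
                                          String collection) {
        String current = collection;
        while (current != null && !current.isEmpty()) {
            List<String> triggers = configured.get(current);
            if (triggers != null) {
                return triggers;                 // overrides everything declared above
            }
            int slash = current.lastIndexOf('/');
            current = slash > 0 ? current.substring(0, slash) : null;  // step to parent
        }
        return List.of();
    }

    public static void main(String[] args) {
        Map<String, List<String>> configured = Map.of(
            "/db", List.of("RestXqTrigger"),
            "/db/a/b", List.of("MyTrigger"));
        System.out.println(effectiveTriggers(configured, "/db/a"));      // inherited from /db
        System.out.println(effectiveTriggers(configured, "/db/a/b/c"));  // RESTXQ trigger is gone
    }
}
```

In this model, restoring RESTXQ for /db/a/b means listing both triggers in the configuration for /db/a/b.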

Whether implementing a trigger in XQuery or Java, you are required to implement a function for each event that you wish to act before or after. You may only reject changes to the database during before actions. Your function, which must follow a naming convention (see Table 16-3 and Table 16-4 for documents and collections, respectively), will be called each time that event occurs within the database.

| Event | XQuery function name | Java function name |
|---|---|---|
| Create document | before-create-document | beforeCreateDocument |
| Create document | after-create-document | afterCreateDocument |
| Update document | before-update-document | beforeUpdateDocument |
| Update document | after-update-document | afterUpdateDocument |
| Update document metadata | n/a | beforeUpdateDocumentMetadata |
| Update document metadata | n/a | afterUpdateDocumentMetadata |
| Copy document | before-copy-document | beforeCopyDocument |
| Copy document | after-copy-document | afterCopyDocument |
| Move document | before-move-document | beforeMoveDocument |
| Move document | after-move-document | afterMoveDocument |
| Delete document | before-delete-document | beforeDeleteDocument |
| Delete document | after-delete-document | afterDeleteDocument |

Table 16-3. Naming convention for document trigger events

| Event | XQuery function name | Java function name |
|---|---|---|
| Create collection | before-create-collection | beforeCreateCollection |
| Create collection | after-create-collection | afterCreateCollection |
| Copy collection | before-copy-collection | beforeCopyCollection |
| Copy collection | after-copy-collection | afterCopyCollection |
| Move collection | before-move-collection | beforeMoveCollection |
| Move collection | after-move-collection | afterMoveCollection |
| Delete collection | before-delete-collection | beforeDeleteCollection |
| Delete collection | after-delete-collection | afterDeleteCollection |

Table 16-4. Naming convention for collection trigger events
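The two naming conventions differ only in style: the XQuery names are hyphenated, and the Java names are the same words in camel case. The small helper below, which is purely our own illustration and not part of eXist, shows the mapping mechanically.

```java
/** Illustrative helper: converts hyphenated XQuery trigger names to Java style. */
class TriggerNamingSketch {

    /** Converts, for example, "before-create-document" to "beforeCreateDocument". */
    static String toJavaMethodName(String xqueryFunctionName) {
        StringBuilder name = new StringBuilder();
        boolean upperNext = false;
        for (char c : xqueryFunctionName.toCharArray()) {
            if (c == '-') {
                upperNext = true;                        // drop the hyphen, capitalize what follows
            } else if (upperNext) {
                name.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                name.append(c);
            }
        }
        return name.toString();
    }

    public static void main(String[] args) {
        System.out.println(toJavaMethodName("before-create-document"));
        System.out.println(toJavaMethodName("after-move-collection"));
    }
}
```

Remember that the metadata update events exist only on the Java side, so there is no XQuery name to convert for them.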

XQuery Triggers

When implementing an XQuery Trigger in eXist, you have two main options:

§ Store the XQuery into an XQuery library module in the database and reference it by URI in a parameter to the XQueryTrigger called url in the collection configuration.

§ Write the XQuery code into a parameter to the XQueryTrigger called query in the collection configuration.

While both options are available to the developer, we would advise taking the first approach and storing your XQuery code for your trigger into the database. This is the approach we will cover in this chapter. This enables you to store your code separately from your configuration, which means that you can reuse this trigger in multiple collection configurations. In addition, it means that you can test your trigger code independently by writing an XQuery that imports your trigger module and calls your functions.

WARNING

Storing an invalid XQuery trigger library module into the database may cause any collections for which it is configured to reject all database operations. This is because eXist will attempt to execute the query for each database operation, and an exception will be raised if the query is invalid, which will reject the operation.

Consequently, it is often better not to store your XQuery trigger into the same collection as that on which you are configuring the trigger. Otherwise, should you make a mistake in your trigger code, you may be unable to save your updated trigger. This problem occurs when you store an invalid XQuery trigger that is listening to the before-update-document event; it will attempt to execute the invalid query when you try to save your fixed trigger code, and as the existing query is invalid, it will reject your update. To resolve this you need to deconfigure the trigger in the appropriate collection configuration document, fix the trigger XQuery code, and then re-enable the trigger in the collection’s configuration.

To implement an XQuery trigger you simply need to implement one or more of the functions named in Tables 16-3 and 16-4. Your implementation of each of these functions must currently reside in the namespace http://exist-db.org/xquery/trigger.

If you are placing your XQuery trigger in a stored XQuery library module, it is recommended that your module have its own namespace, with the trigger functions just calling your own functions (see Example 16-10). The majority of these functions take a single argument, $uri (of type xs:anyURI), which provides you with the URI of the document or collection that has caused the trigger event to fire. The exceptions are the functions for copying and moving collections; these instead take two parameters, $src and $dst (both of type xs:anyURI), which describe the source and destination URIs, respectively, of the database operation on the collection. Table 16-5 lists the parameters for the XQuery trigger configuration.

| Parameter | Description | Mandatory/optional |
|---|---|---|
| query | You can directly place your XQuery code in this parameter. | Mandatory, unless using url |
| url | You can provide a URI to your XQuery library module in the database that implements one or more functions from the http://exist-db.org/xquery/trigger namespace. This approach is preferred over using the query parameter. | Mandatory, unless using query |
| bindingPrefix | If you wish to pass any parameters as external variables into your XQuery, you need to declare the prefix of the namespace (declared in your XQuery) that they should be bound to. | Mandatory if passing parameters to external variables |
| anything | You may pass any other parameter to your XQuery and it will be bound to the equivalently named external variable (in the namespace indicated by bindingPrefix), which is declared in your XQuery. | Optional |

Table 16-5. Parameters for the XQuery trigger configuration

You will find an example of a simple XQuery trigger implemented in an XQuery library module in the file chapters/advanced-topics/simple-example-trigger.xqm of the book-code Git repository (see “Getting the Source Code”), and in Example 16-10. The trigger simply writes an entry to eXist’s logfile to record the fact that the trigger function was called when a new document was stored into a collection in the database.

Example 16-10. Simple XQuery trigger implemented in an XQuery library module

    module namespace et = "http://example/trigger"; (: 1 :)

    declare namespace trigger = "http://exist-db.org/xquery/trigger"; (: 2 :)

    import module namespace util = "http://exist-db.org/xquery/util";

    declare variable $et:log-level external; (: 3 :)

    declare function trigger:after-create-document($uri as xs:anyURI) { (: 4 :)
        et:log(("XQuery Trigger called after document '", $uri, "' created.")) (: 5 :)
    };

    declare function et:log($msgs as xs:string+) { (: 6 :)
        util:log($et:log-level, $msgs)
    };

1

Namespace binding prefix of the XQuery library module.

2

Declaration of the XQuery trigger namespace, so that you may implement trigger functions in this namespace.

3

External variable will be bound to a parameter named log-level declared in the trigger configuration within the collection configuration document.

4

Your implementation of the after-create-document function. This must be in the trigger namespace!

5

Call to your module’s functions that provide the actual processing.

6

Implementation of your business logic.

Example 16-10 shows how to implement a simple XQuery library module that implements the after-create-document function that eXist will call when a document is stored into a collection on which the trigger has been configured. The collection configuration for that example might look like:

    <collection xmlns="http://exist-db.org/collection-config/1.0">
        <triggers>
            <trigger class="org.exist.collections.triggers.XQueryTrigger">
                <parameter name="url"
                    value="xmldb:exist:///db/simple-example-trigger.xqm"/> <!-- 1 -->
                <parameter name="bindingPrefix"
                    value="et"/> <!-- 2 -->
                <parameter name="log-level" value="INFO"/> <!-- 3 -->
            </trigger>
        </triggers>
    </collection>

1

The URI to the XQuery library module in the database that provides the XQuery trigger

2

The binding prefix, which must be the same as the XQuery library module’s namespace prefix

3

A parameter that is passed into the XQuery library module by binding to the external variable named $et:log-level

To install the example:

1. Store the XQuery trigger into the database in a document located at /db/simple-example-trigger.xqm.

2. Create the collection /db/test-trigger.

3. Create the collection /db/system/config/db/test-trigger.

4. Store the collection configuration document (see the preceding code block) into the database in a document located at /db/system/config/db/test-trigger/collection.xconf. This configures the trigger on the /db/test-trigger collection.

To test the example:

1. Store any document into /db/test-trigger.

2. Check the logfile $EXIST_HOME/webapp/WEB-INF/logs/exist.log. If everything succeeded you should see a message in the log that looks similar to:

    2014-01-14 11:10:44,561 [eXistThread-30] INFO (LogFunction.java [eval]:150) -
    (Line: 14 /db/simple-example-trigger.xqm) XQuery Trigger called after
    document '/db/test-trigger/blah3.xml' created.

A more complicated example that shows how to react to multiple trigger events and send email notifications is provided in the file chapters/advanced-topics/journal-notification-trigger.xqm of the book-code Git repository.

Java Triggers

When implementing database triggers in Java for eXist, you need to implement the appropriate Java interfaces for the events that you wish to handle (see Figure 16-5). The events are split between document events and collection events, as described in Tables 16-3 and 16-4. The interface for document events is org.exist.collections.triggers.DocumentTrigger, while the interface for collection events is org.exist.collections.triggers.CollectionTrigger. It is perfectly possible to implement both interfaces in a single class if you wish to respond to both types of events.

NOTE

Omitted from the interfaces DocumentTrigger and CollectionTrigger are two methods, prepare and finish. These functions are deprecated and can safely be ignored; in fact, they were removed after the eXist 2.1 release. They provide the old infrastructure for triggers and have been replaced by the before* and after* functions.

Once you have implemented the appropriate class you need to place your compiled class onto eXist’s classpath, which you typically do by placing a JAR file of your code and any dependent JAR files into $EXIST_HOME/lib/user. Remember that you may have to restart eXist for the JVM class loader to see your new JAR files!

Caution

When writing triggers in Java, you can assume that any DocumentImpl or Collection objects that you are given are already locked. However, if you open any other documents or collections you must be sure to lock them correctly and, most importantly, to release those locks when you are done with them. Failure to do so could lead to deadlocks in eXist. It is strongly recommended to use the Java try/catch/finally pattern to release any acquired locks in the finally block.
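The shape of that pattern is shown in the sketch below. Note the assumption: a ReentrantReadWriteLock from the JDK stands in for the lock of some other document, because eXist's own locking API is not covered here, and the names LockReleaseSketch and readOtherDocument are hypothetical. A catch block can be added between try and finally if you need to handle or translate the exception.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Sketch of acquire / try / finally-release; the JDK lock is a stand-in, not eXist's API. */
class LockReleaseSketch {

    // Stand-in for the lock guarding some other document that the trigger opens itself
    static final ReentrantReadWriteLock OTHER_DOCUMENT_LOCK = new ReentrantReadWriteLock();

    /** Runs work while holding a read lock and always releases the lock afterward. */
    static <T> T readOtherDocument(Supplier<T> work) {
        OTHER_DOCUMENT_LOCK.readLock().lock();        // acquire before touching the resource
        try {
            return work.get();
        } finally {
            OTHER_DOCUMENT_LOCK.readLock().unlock();  // runs even if work throws
        }
    }

    public static void main(String[] args) {
        System.out.println(readOtherDocument(() -> "read while locked"));
        System.out.println("Read locks still held: " + OTHER_DOCUMENT_LOCK.getReadLockCount());
    }
}
```

Acquiring the lock immediately before the try block, rather than inside it, ensures that the finally block never tries to release a lock that was not obtained.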

Figure 16-5. UML diagram showing Java trigger classes

An instance of your trigger class will be lazily instantiated for each collection it is configured for. When your trigger class is instantiated, your implementation of the configure method from org.exist.collections.triggers.Trigger will be called; you may use this function to read any parameters from the trigger’s configuration and set up any initial state in a thread-safe manner. Should you decide to store some state in member variables of your class, remember that the class instance is per collection, so the values of these variables will not be globally consistent.

Caution

Also be aware that the calls to the trigger methods (e.g., beforeCreateDocument) are not thread-safe. This means that more than one thread may be in any, or even all, of the event functions defined in your trigger class! If you wish to keep state, it is up to you to manage concurrent access to that state appropriately.
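One reasonable way to keep such state is to rely on the concurrent collections and atomic types of the JDK rather than on plain fields. The sketch below counts create events per user; the class ThreadSafeTriggerStateSketch and its methods are hypothetical, and in a real trigger recordCreate would be called from an event method with the username taken from the broker.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch of per-instance trigger state that tolerates concurrent event calls. */
class ThreadSafeTriggerStateSketch {

    // One counter per user; both the map and the counters are safe for concurrent use
    private final ConcurrentMap<String, AtomicLong> createdPerUser = new ConcurrentHashMap<>();

    /** Imagine this being called from an event method such as beforeCreateDocument. */
    void recordCreate(String userName) {
        createdPerUser.computeIfAbsent(userName, k -> new AtomicLong()).incrementAndGet();
    }

    /** Returns how many create events were recorded for the user. */
    long createdBy(String userName) {
        AtomicLong count = createdPerUser.get(userName);
        return count == null ? 0 : count.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadSafeTriggerStateSketch state = new ThreadSafeTriggerStateSketch();
        Runnable events = () -> {
            for (int i = 0; i < 1000; i++) {
                state.recordCreate("admin");
            }
        };
        Thread first = new Thread(events);
        Thread second = new Thread(events);
        first.start();
        second.start();
        first.join();
        second.join();
        System.out.println(state.createdBy("admin"));
    }
}
```

Because the instance is created per collection, these counts describe a single configured collection only.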

As they are simpler to implement than document triggers, we will look first at collection triggers, which will show you how to implement event functions. We will then look at document triggers, which have similar event functions and a whole lot more!

Java collection triggers

When you’re implementing Java collection triggers your class must provide implementations for all of the methods defined in org.exist.collections.triggers.Trigger and org.exist.collections.triggers.CollectionTrigger to compile. However, triggers were designed in such a way that you really need only fill out those methods that you wish to act upon.

The simplest collection trigger would only provide code for a single method, as shown in Example 16-11, where we log the username of a user creating a collection.

Example 16-11. Simplest collection trigger

    package example;

    import org.apache.log4j.Logger;
    import org.exist.collections.triggers.CollectionTrigger;
    import org.exist.collections.triggers.TriggerException;
    import org.exist.storage.DBBroker;
    import org.exist.storage.txn.Txn;
    import org.exist.xmldb.XmldbURI;

    import java.util.List;
    import java.util.Map;

    public class SimplestCollectionTrigger implements CollectionTrigger { // 1

        private final static Logger LOG =
            Logger.getLogger(SimplestCollectionTrigger.class);

        @Override
        public void beforeCreateCollection(DBBroker broker, Txn txn, XmldbURI uri) // 2
                throws TriggerException {
            LOG.info("User '" + broker.getSubject().getName() + // 3
                "' is creating the Collection '" + uri + "'...");
        }

        // Omitted: other empty function implementations here... // 4
    }

1

We must implement the CollectionTrigger interface.

2

We provide an implementation of the event method beforeCreateCollection.

3

Before the collection indicated by uri is created, we record the event in eXist’s logfile, attributing it to a specific user.

4

We must provide implementations of all the other functions from CollectionTrigger here for the code to compile; however, as we are not using them and they are easily generated by an IDE, we have omitted them for brevity.

Just like in previous examples, to have the trigger fired on the database events that you are interested in, you must add it to the collection configuration documents for those collections that you wish your trigger to act upon. An example configuration for the SimplestCollectionTrigger would look like:

    <collection xmlns="http://exist-db.org/collection-config/1.0">
        <triggers>
            <trigger class="example.SimplestCollectionTrigger"/>
        </triggers>
    </collection>

TIP

Remember that you must configure your collection triggers on the parent of the collection for those collection events that you wish them to be triggered upon. For example, if you want to be made aware when a new collection is created in /db/myapp/data, then you must add your trigger to the collection configuration document for the collection /db/myapp/data. In this way, if a user were to create, copy, move, or delete the collection /db/myapp/data/some-collection, your trigger would receive the event. Likewise, it would receive the events for any descendant collections—for example, /db/myapp/data/some-collection/subcollection—as collection configuration is inherited!

No delete example collection trigger

An example class implementing CollectionTrigger called exist.book.example.trigger.collection.NoDeleteCollectionTrigger is supplied in the folder chapters/advanced-topics/java-database-trigger/nodelete-collection-trigger-example of the book-code Git repository (see “Getting the Source Code”).

The example is designed to show how database operations can be aborted by Java triggers. The trigger takes a blacklist of collection URIs as a parameter and prevents those collections from being deleted and optionally from being moved (indicated by another parameter). The trigger is able to prevent these collections from being deleted or moved by throwing a TriggerException during the before phase of the delete or move events, which causes eXist to abort the operation and report the exception.
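The decision logic of such a trigger can be sketched without any eXist classes. In the sketch below, VetoException is a hypothetical stand-in for TriggerException, URIs are plain strings, and we assume that a blacklist entry also protects the collections beneath it; the real example in the book-code repository may differ in its details.

```java
import java.util.Set;

/** Sketch of a blacklist veto; VetoException stands in for eXist's TriggerException. */
class NoDeleteSketch {

    /** Hypothetical checked exception so that the sketch compiles on its own. */
    static class VetoException extends Exception {
        VetoException(String message) {
            super(message);
        }
    }

    private final Set<String> blacklist;
    private final boolean treatMoveAsDelete;

    NoDeleteSketch(Set<String> blacklist, boolean treatMoveAsDelete) {
        this.blacklist = blacklist;
        this.treatMoveAsDelete = treatMoveAsDelete;
    }

    /** True if uri is a blacklisted collection or lies beneath one (our assumption). */
    boolean isProtected(String uri) {
        for (String entry : blacklist) {
            if (uri.equals(entry) || uri.startsWith(entry + "/")) {
                return true;
            }
        }
        return false;
    }

    /** Mirrors a before-delete event: throwing here is what aborts the operation. */
    void beforeDelete(String uri) throws VetoException {
        if (isProtected(uri)) {
            throw new VetoException("Preventing deletion of blacklisted collection '" + uri + "'.");
        }
    }

    /** Mirrors a before-move event: only vetoes when configured to treat moves as deletes. */
    void beforeMove(String uri) throws VetoException {
        if (treatMoveAsDelete) {
            beforeDelete(uri);
        }
    }

    public static void main(String[] args) {
        NoDeleteSketch trigger = new NoDeleteSketch(Set.of("/db/data/private"), true);
        try {
            trigger.beforeDelete("/db/data/private/subcollection");
        } catch (VetoException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The check belongs in the before phase because, as noted earlier, changes can only be rejected before they happen.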

To compile the example, enter the java-database-trigger folder and run mvn package.

To deploy the NoDeleteCollectionTrigger to eXist, you need to:

1. Compile the code as described previously, and then copy all of the files from java-database-trigger/nodelete-collection-trigger-example/target/nodelete-collection-trigger-example-1.0-assembly to $EXIST_HOME/lib/user.

2. Restart eXist so that it picks up the new JAR files.

3. Create the collections /db/data, /db/data/private, and /db/data/private/subcollection.

4. Create the configuration collection /db/system/config/db/data.

5. Configure the trigger in a collection configuration document in the database, which you should locate at /db/system/config/db/data/collection.xconf:

    <collection xmlns="http://exist-db.org/collection-config/1.0">
        <triggers>
            <trigger
                class="exist.book.example.trigger.collection.NoDeleteCollectionTrigger">

                <!-- whether to also prevent collection moves
                     for your blacklisted collections -->
                <parameter name="treatMoveAsDelete" value="true"/>

                <!-- your blacklist: -->
                <parameter name="blacklist" value="/db/data/super-secret"/>
                <parameter name="blacklist" value="/db/data/private"/>
            </trigger>
        </triggers>
    </collection>

To test the example:

1. Attempt to delete or move the collection /db/data/private/subcollection, /db/data/private, or /db/data/super-secret; you should find that it is now impossible.

2. Check the logfile $EXIST_HOME/webapp/WEB-INF/logs/exist.log. You should see the NoDeleteCollectionTrigger log messages that look similar to:

    2014-01-16 14:18:52,918 [eXistThread-32] INFO
    (NoDeleteCollectionTrigger.java [beforeDeleteCollection]:97) -
    Preventing deletion of blacklisted collection '/db/data/private'.

Java document triggers

While our discussion in “Java collection triggers” focused on handling the before and after events for various database operations on collections, here we’ll look at handling events during the storage of XML documents. This allows us to modify a document dynamically as it is being stored. Of course, document triggers still have all of the before and after events that you would expect for database operations on documents, but implementing them is similar enough to implementing collection triggers that we need not discuss them further here. Document triggers also give you the ability to perform streaming validation and transformations on documents.

When you’re implementing Java document triggers, your class must provide implementations for all of the methods defined in org.exist.collections.triggers.Trigger and org.exist.collections.triggers.DocumentTrigger to compile.

The DocumentTrigger interface mainly varies from CollectionTrigger in that it also acts as a SAX (Simple API for XML) event handler by extending org.xml.sax.ContentHandler and org.xml.sax.ext.LexicalHandler. With SAX events, your trigger effectively sits in the middle of a pipeline, receiving SAX events from the parser (which is reading the document to store) and sending them on to the database for validation or storage (see Figure 16-6). If you choose to discard some of these events or generate new events, you are effectively modifying the incoming document for validation or storage.

Figure 16-6. Document trigger SAX pipeline

Having to implement ContentHandler and LexicalHandler brings some complexity to document triggers, so for convenience eXist offers the abstract class org.exist.collections.triggers.FilteringTrigger to reduce this. It is recommended that you always extend FilteringTrigger and never directly implement DocumentTrigger in your own triggers. FilteringTrigger provides default implementations of both ContentHandler and LexicalHandler by simply forwarding the SAX events to either validation or storage. If you are only interested in working with the before and after document events, by extending FilteringTrigger you need not ever worry about the SAX events. Conversely, if you are interested in the SAX events for the purpose of modifying the document, additional validation, or some other reason, by extending FilteringTrigger you can just override those SAX methods of interest, while the remainder will be handled correctly.

Caution

If you choose to override the SAX event methods in FilteringTrigger and you still want the event to be passed on to the database for validation or storage, then you must remember to call the equivalent method on the superclass. If you do not call the method on the superclass, your trigger is actually discarding those events and they will never reach the database!
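You can observe this forwarding behavior outside of eXist with nothing but the JDK. In the sketch below, org.xml.sax.helpers.XMLFilterImpl plays the role of FilteringTrigger (an analogy we are assuming for illustration, since both forward events by default), and a simple handler plays the role of the database. The class names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Sketch: XMLFilterImpl stands in for FilteringTrigger in a SAX pipeline. */
class SaxPipelineSketch {

    /** Sits between the parser and the downstream handler, as a trigger would. */
    static class Middle extends XMLFilterImpl {
        private final boolean callSuper;

        Middle(XMLReader parent, boolean callSuper) {
            super(parent);
            this.callSuper = callSuper;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (callSuper) {
                super.startElement(uri, localName, qName, atts);  // forward downstream
            }
            // without the call to super, the event silently disappears
        }
    }

    /** Returns the names of the elements whose start events reached the downstream handler. */
    static List<String> elementsSeenDownstream(String xml, boolean callSuper) {
        List<String> seen = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            Middle middle = new Middle(factory.newSAXParser().getXMLReader(), callSuper);
            middle.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName,
                                         Attributes atts) {
                    seen.add(localName);
                }
            });
            middle.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(elementsSeenDownstream("<a><b/></a>", true));
        System.out.println(elementsSeenDownstream("<a><b/></a>", false));
    }
}
```

With callSuper set to false the downstream handler never sees a start event, which is exactly the accidental data loss that this caution describes.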

When dealing with SAX events in a document trigger you must recognize that the trigger is called twice, once during each of eXist’s two phases of storing a document:

Validation

As the entire document is being parsed, the generated SAX events are sent to your trigger, which is responsible for swallowing or forwarding them (with or without modifications). When the events are forwarded, they are sent to the validator, which ensures the resultant document is well formed and valid according to any configured schemas. If you choose to throw a SAXException (or any RuntimeException) from one of your SAX event handler methods, then you are effectively indicating to eXist that you consider the document to be invalid, which stops the validation phase and aborts the store process.

Storage

If the validation phase completes, eXist then enters the storage phase. The entire document is parsed for a second time, and the generated SAX events are again sent to your trigger, which is responsible for forwarding them (with or without modifications). When the events are forwarded, they are sent to the database storage engine, which is responsible for writing the document into the database. If you choose to throw a SAXException (or any RuntimeException) from one of your SAX event handler methods, you are signaling to eXist that there was a problem with the store process, and eXist will abort the store and roll back the transaction.

You can determine which phase your SAX event handler methods are being called in by calling the function isValidating from the super FilteringTrigger class. An interesting effect of having different validation and storage phases is that you can modify the document stream in a different manner in each phase. This offers some interesting possibilities, such as allowing the validation phase to pass, and then rewriting the document into another form before it is stored.

Consider the startElement method from a fictional implementation of FilteringTrigger shown in Example 16-12. This example shows you how you can drop elements, rename elements, and create new elements as the document is being stored.

Example 16-12. Event handling in a filtering trigger

    @Override
    public void startElement(final String namespaceURI, final String localName,
            final String qname, final Attributes attributes) throws SAXException {
        if(localName.equals("author")) {
            //drop an element (1)
        } else if(localName.equals("color")) {
            //rename an element
            super.startElement(namespaceURI, "colour", // 2
                qname.replace("color", "colour"), attributes);
        } else if(localName.equals("isbn")) {
            //encapsulate an element; the wrapper gets an empty attribute list rather than null
            super.startElement(namespaceURI, "reference", "reference",
                new org.xml.sax.helpers.AttributesImpl()); // 3
            super.startElement(namespaceURI, localName, qname, attributes); // 4
        } else {
            //keep other elements
            super.startElement(namespaceURI, localName, qname, attributes); // 5
        }
    }

1

We drop any element that is named author. We achieve the drop simply by not calling super.startElement.

2

We rename any element that is named color to colour. We do so by calling super.startElement but replacing the element name.

3, 4

We encapsulate any element that is named isbn (together with its content) inside an element named reference. We do this by calling super.startElement to start a new element, then calling super.startElement for the current element.

5

We keep any other element by simply calling super.startElement for the current element.

For the trigger to actually work, though, we must also have a matching endElement method that balances the start and end of elements; otherwise, we will end up with a document that is not well formed. Such a matching endElement method implementation would look like:

    @Override
    public void endElement(final String namespaceURI, final String localName,
            final String qname) throws SAXException {
        if(localName.equals("author")) {
            //drop an element
        } else if(localName.equals("color")) {
            //rename an element
            super.endElement(namespaceURI, "colour",
                qname.replace("color", "colour"));
        } else if(localName.equals("isbn")) {
            //encapsulate an element
            super.endElement(namespaceURI, localName, qname); // 1
            super.endElement(namespaceURI, "reference", "reference"); // 2
        } else {
            //keep other elements
            super.endElement(namespaceURI, localName, qname);
        }
    }

1, 2

Note that when you encapsulate an element in another, the order of the generated events is reversed in the endElement method as compared to the startElement method. In other words, the isbn element has to be ended before the reference element, as the isbn element was started after the reference element.
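If you would like to experiment with balanced start and end events before deploying anything to eXist, the following standalone sketch applies the same rename and encapsulate logic using only the JDK. As an assumption for illustration, XMLFilterImpl stands in for FilteringTrigger and an identity Transformer stands in for the database by serializing whatever events it receives; the class name BalancedFilterSketch is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** Sketch: the rename and encapsulate logic on a JDK filter instead of FilteringTrigger. */
class BalancedFilterSketch extends XMLFilterImpl {

    BalancedFilterSketch(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String ns, String localName, String qName, Attributes atts)
            throws SAXException {
        if (localName.equals("color")) {
            super.startElement(ns, "colour", qName.replace("color", "colour"), atts);
        } else if (localName.equals("isbn")) {
            super.startElement(ns, "reference", "reference", new AttributesImpl()); // wrapper first
            super.startElement(ns, localName, qName, atts);
        } else {
            super.startElement(ns, localName, qName, atts);
        }
    }

    @Override
    public void endElement(String ns, String localName, String qName) throws SAXException {
        if (localName.equals("color")) {
            super.endElement(ns, "colour", qName.replace("color", "colour"));
        } else if (localName.equals("isbn")) {
            super.endElement(ns, localName, qName);           // inner element closes first
            super.endElement(ns, "reference", "reference");   // then the wrapper
        } else {
            super.endElement(ns, localName, qName);
        }
    }

    /** Parses xml through the filter and serializes the events that come out. */
    static String transform(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader filter = new BalancedFilterSketch(factory.newSAXParser().getXMLReader());
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                               new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<book><color>red</color><isbn>123</isbn></book>"));
    }
}
```

The sketch deliberately leaves out the drop case: suppressing only the start and end events of an element would still let its text through as characters events, so dropping an element together with its content needs the extra bookkeeping of a depth counter or path.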

Document triggers are incredibly powerful, and here we have barely scratched the surface of what is possible. However, to further assist you we have included a more complete example with this book.

Example filtering trigger

An example class extending FilteringTrigger called exist.book.example.trigger.document.ExampleFilteringTrigger is supplied in the folder chapters/advanced-topics/java-database-trigger/filtering-trigger-example of the book-code Git repository (see “Getting the Source Code”).

The example is designed to show how you can build a path to the current element even though you are processing a stream, and you can then use this path to make decisions about whether to remove an element. The example also shows how to pass in a significantly more complicated set of configuration parameters to the trigger from the collection configuration document. These parameters are then used for keeping a map of elements that should be renamed.
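The path-building idea can be sketched with a stack of open element names, as below. This is our own minimal illustration rather than the code of ExampleFilteringTrigger: XMLFilterImpl again stands in for FilteringTrigger, the class name PathTrackingSketch is hypothetical, and only the drop parameter is modeled, not the rename map.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Sketch: track the path to the current element and drop a configured path. */
class PathTrackingSketch extends XMLFilterImpl {

    private final String dropPath;                          // for example "/a/b/c"
    private final Deque<String> open = new ArrayDeque<>();  // names of the open elements
    private int dropDepth = 0;                              // > 0 while inside a dropped subtree

    PathTrackingSketch(XMLReader parent, String dropPath) {
        super(parent);
        this.dropPath = dropPath;
    }

    /** Rebuilds the path to the current element from the stack, root first. */
    private String currentPath() {
        StringBuilder path = new StringBuilder();
        for (Iterator<String> it = open.descendingIterator(); it.hasNext(); ) {
            path.append('/').append(it.next());
        }
        return path.toString();
    }

    @Override
    public void startElement(String ns, String localName, String qName, Attributes atts)
            throws SAXException {
        open.push(localName);
        if (dropDepth > 0 || currentPath().equals(dropPath)) {
            dropDepth++;                                    // swallow element and descendants
        } else {
            super.startElement(ns, localName, qName, atts);
        }
    }

    @Override
    public void endElement(String ns, String localName, String qName) throws SAXException {
        open.pop();
        if (dropDepth > 0) {
            dropDepth--;                                    // matching end of a swallowed element
        } else {
            super.endElement(ns, localName, qName);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (dropDepth == 0) {
            super.characters(ch, start, length);            // text in a dropped element is discarded
        }
    }

    /** Parses xml through the filter and returns what reached the downstream handler. */
    static String run(String xml, String dropPath) {
        StringBuilder out = new StringBuilder();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            PathTrackingSketch filter =
                new PathTrackingSketch(factory.newSAXParser().getXMLReader(), dropPath);
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String ns, String name, String qName, Attributes atts) {
                    out.append('<').append(name).append('>');
                }
                @Override
                public void endElement(String ns, String name, String qName) {
                    out.append("</").append(name).append('>');
                }
                @Override
                public void characters(char[] ch, int start, int length) {
                    out.append(ch, start, length);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

Calling PathTrackingSketch.run on a document with a c element under /a/b and another directly under /a removes only the former, because the decision is made on the full path rather than on the element name alone.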

To compile the example, enter the java-database-trigger folder and run mvn package.

To deploy the ExampleFilteringTrigger to eXist, you need to:

1. Compile the code as described previously, and then copy all of the files from java-database-trigger/filtering-trigger-example/target/filtering-trigger-example-1.0-assembly to $EXIST_HOME/lib/user.

2. Restart eXist so that it picks up the new JAR files.

3. Create the collection /db/test-data.

4. Create the configuration collection /db/system/config/db/test-data.

5. Configure the trigger in a collection configuration document in the database, which you should locate at /db/system/config/db/test-data/collection.xconf:

<collection xmlns="http://exist-db.org/collection-config/1.0">
    <triggers>
        <trigger
            class="exist.book.example.trigger.document.ExampleFilteringTrigger">

            <!-- paths to elements that should be dropped -->
            <parameter name="drop" value="/a/b/c"/>

            <!-- map of elements that should be renamed -->
            <parameter name="elements">
                <rename from="d" to="e"/>
                <rename from="e" to="d"/>
            </parameter>
        </trigger>
    </triggers>
</collection>

To test the example:

1. Store the following document into /db/test-data:

<a>
    <b>
        <c>should be removed</c>
        <c>should also be removed</c>
    </b>
    <c>should not be removed</c>
    <d>should be renamed to e</d>
    <e>should be renamed to d</e>
</a>

2. Open the actual document stored into /db/test-data. You should see that it instead contains something like the following:

<a>
    <b/>
    <c>should not be removed</c>
    <e>should be renamed to e</e>
    <d>should be renamed to d</d>
</a>

Internal XQuery Library Modules

As you know by now, XQuery modules come in two varieties, main modules and library modules. Main modules can be directly invoked, and execution begins at the query body. Main modules may import other library modules, but a complete XQuery may contain only a single main module. Library modules reside in a specific namespace and contain function and variable declarations grouped by that namespace. Library modules do not have a query body, and thus there is no way to directly execute a library module, but frameworks like RESTXQ (see “Building Applications with RESTXQ”) and eXist’s SOAP Server (see “SOAP Server”) are able to map HTTP requests onto specific library module function invocations.

eXist provides two types of library modules:

External

These are library modules written in XQuery. They follow the W3C XQuery specification for library modules and allow users to easily write modules in XQuery. For further information, see http://www.w3.org/TR/xquery/#dt-library-module.

Internal

These are library modules written in Java. In these modules, the internals of XQuery functions and variables are written in eXist’s host programming language (Java) but are callable from XQuery as though they were any other XQuery function or variable. This arguably follows the W3C XQuery specification for library modules, but deviates slightly as eXist does not require you to explicitly declare the functions as external functions in XQuery because it is able to perform the required static type analysis regardless. For further information, see http://www.w3.org/TR/xquery/#dt-external-function.

In this chapter we do not look at external modules, as they are not specific to eXist and there is already a great wealth of material on them, both in this book and other XQuery learning resources. Instead, we focus here on internal modules and how you can easily build your own using Java.

So perhaps first we should ask: why would we write a library module in Java as opposed to XQuery?

There is really only one valid reason to consider:

It cannot be done in XQuery!

You’ll need to turn to Java when it is impossible to solve your problem by putting other XQuery functions together. You most likely want to introduce a new and unique function. For example, W3C XQuery 1.0 has no functional capability for sending email, so you may wish to create a function that allows this (in fact, such an extension function is already included in eXist, as covered in mail).

Conversely, there are many reasons why you should not write a library module in Java as opposed to XQuery. Here are a few of the important ones:

Not understanding XQuery

It may be tempting to implement something in Java because you have not had as much experience with XQuery. Generally speaking, this is a bad plan, as you will be calling this Java from XQuery regardless—your time would most likely be better invested in learning more about XQuery. XQuery is really the thing that makes eXist so powerful, so you would be well advised to get to grips with it.

Performance

When you write an XQuery function to implement a specific piece of business logic or use case, it may call many other XQuery functions. Often people misunderstand how XQuery is executed in eXist and fear that this long function call chain is affecting performance, so they produce a single function in Java that can be called from XQuery, which does it all. In fact, eXist compiles all XQuery code down to Java function calls and caches the compiled form. It is much more likely that performance issues are caused by misconfigured indexes or collections. Even if you do find that some eXist XQuery functions are slow, it would be better to work on optimizing those so that all XQuery code benefits!

Deadlocks, memory leaks, and death

If you wish to interact with eXist from your own code, doing so from within Java is much harder than from within XQuery. From within Java, your internal module code is running directly inside eXist, so to talk to eXist you need to use its internal APIs. While this is absolutely possible, you must take great care to lock and unlock resources appropriately, and to free up any memory that you allocate. Failure to do so can quickly lock up the database and potentially crash eXist, and if you mismanage resources and transactions you may even corrupt your database!

If you still wish to implement an internal module, your module class needs to implement the interface org.exist.xquery.InternalModule. As a convenience, eXist provides the abstract class org.exist.xquery.AbstractInternalModule as a starting point; this greatly eases implementation.

NOTE

XQuery is a functional programming language, so its functions—including those that you implement in your internal modules—should really not cause side effects. However, sometimes with XQuery you have to allow side effects to be able to achieve the desired outcome. For example, the eXist xmldb module’s side effects allow you to change the state of the database. If you can avoid it, it is good practice to write your functions as transformations from their input to their output, without causing side effects!

Implementing a library module typically involves implementing at least two classes. The first, the module class, contains information about all the functions that the module provides. The others (one or more) are classes for each function that the module provides (although functions may also be grouped into classes if desired). Perhaps the easiest way to explain this is for us to dive straight in at the deep end and look at some code for our first internal module, a very simple Hello World module that provides a single function to XQuery (see Example 16-13).

Example 16-13. HelloWorld Java XQuery module

import java.util.List;
import java.util.Map;

import org.exist.xquery.AbstractInternalModule;
import org.exist.xquery.FunctionDef;

public class HelloModule extends AbstractInternalModule { // (1)

    protected final static String NS = "http://hello"; // (2)
    protected final static String NS_PREFIX = "h"; // (3)

    private final static FunctionDef functions[] = { // (4)
        new FunctionDef(HelloFunctions.FNS_HELLO_WORLD, HelloFunctions.class) // (5)
    };

    public HelloModule(Map<String, List<? extends Object>> parameters) {
        super(functions, parameters); // (6)
    }

    @Override
    public String getNamespaceURI() {
        return NS;
    }

    @Override
    public String getDefaultPrefix() {
        return NS_PREFIX;
    }

    @Override
    public String getDescription() {
        return "Simple Hello World module";
    }

    @Override
    public String getReleaseVersion() {
        return "2.1"; // (7)
    }
}

1

Our class implements InternalModule by extending AbstractInternalModule.

2, 3

We define a namespace and namespace prefix for our module.

4

We define the functions that will form a part of this module.

5

We reference a single function, which will be a function of this module. This shows how modules and their functions are linked together: basically, the module has a static array of references to the functions that it provides, and it declares these by calling the constructor of the super class.

6

We call the constructor of the super class, passing in the array of functions that make up this module.

7

We have to return a string that describes which version of eXist this module became available in. This is just for documentation purposes and is not further processed.

To implement the actual “Hello World” function, which will be named hello-world, we need to extend the abstract class org.exist.xquery.Function. To assist in this, eXist provides the abstract subclass org.exist.xquery.BasicFunction, which makes life much easier by dealing with the necessary mechanics for the XQuery profiler and extracting argument values that are passed to our function from the XQuery context. Almost all of the internal module functions already implemented in eXist extend BasicFunction, and we would recommend that you do the same unless you need more control over processing (which is unlikely in most use cases). So now that we have seen the preceding internal module implementation, which tells eXist about our hello-world function, let’s see how we actually implement the function in Example 16-14.

Example 16-14. HelloWorld Java XQuery function

import org.exist.dom.QName;
import org.exist.memtree.MemTreeBuilder;
import org.exist.xquery.BasicFunction;
import org.exist.xquery.Cardinality;
import org.exist.xquery.FunctionSignature;
import org.exist.xquery.XPathException;
import org.exist.xquery.XQueryContext;
import org.exist.xquery.value.FunctionReturnSequenceType;
import org.exist.xquery.value.Sequence;
import org.exist.xquery.value.Type;

public class HelloFunctions extends BasicFunction { // (1)

    private final static QName qnHelloWorld = // (2)
        new QName("hello-world", HelloModule.NS, HelloModule.NS_PREFIX);

    //signature of our XQuery h:hello-world() function
    public final static FunctionSignature FNS_HELLO_WORLD = // (3)
        new FunctionSignature(
            qnHelloWorld, // (4)
            "Say \"hello world\"!", // (5)
            null, // (6)
            new FunctionReturnSequenceType(
                Type.DOCUMENT, Cardinality.EXACTLY_ONE, "The hello!"
            ) // (7)
        );

    //standard constructor, which allows multiple functions to be
    //implemented in one class
    public HelloFunctions(final XQueryContext context,
            final FunctionSignature signature) {
        super(context, signature);
    }

    //called when the xquery function is executed
    @Override
    public Sequence eval(final Sequence[] args, // (8)
            final Sequence contextSequence) throws XPathException {

        final Sequence result;

        //act on the invoked function name
        if(isCalledAs(qnHelloWorld.getLocalName())) { // (9)
            result = sayHelloWorld(); // (10)
        } else {
            throw new XPathException("Unknown function call: " // (11)
                + this.getName().toString());
        }

        return result; // (12)
    }

    //Constructs the in-memory XML document:
    //  <h:hello xmlns:h="http://hello">world</h:hello>
    //and returns its document node
    private Sequence sayHelloWorld() { // (13)
        final MemTreeBuilder builder = new MemTreeBuilder(); // (14)
        builder.startDocument();
        builder.startElement(new QName("hello", HelloModule.NS, // (15)
            HelloModule.NS_PREFIX), null);
        builder.characters("world");
        builder.endElement();
        builder.endDocument();

        return builder.getDocument(); // (16)
    }
}

1

Our class implements org.exist.xquery.Function by extending BasicFunction.

2

We define the name of our XQuery function to be hello-world. You should always define this within the namespace of the internal module.

TIP

The standard way of naming functions and variables in XQuery is to use all lowercase letters and separate terms with a hyphen.

3

Here we define the function signature of our XQuery function, which includes the name, description (for documentation), any expected parameters, and the return type. Our function will have the signature h:hello-world() as document-node().

4

The function signature includes the name of our function. We define the name as a separate variable for the purposes of referencing it later as a constant, for example within the eval function.

5

The textual description of our function.

6

Any expected parameters for our function. Our hello-world function does not take any parameters, so we can use null here.

7

The return type and cardinality of our function. Our hello-world function will return a single XML document node.

8

Any parameters that our functions expect will be passed into the eval function as an array of Sequence objects.

9

As it is possible to encode more than one XQuery function in a single Java class that extends BasicFunction, we switch on the name of the function that was called from XQuery.

10

We call our business logic, which generates the “Hello World” XML document.

11

If we do not recognize which function was called, we throw an org.exist.xquery.XPathException. This is really just for completeness and should not ever be invoked, as eXist should not route unexpected XQuery function calls to us.

12

We return the results of our processing to the XQuery.

13

This is our isolated business logic, which will generate our “Hello World” XML.

14

We use eXist’s MemTreeBuilder to construct an XML document dynamically in memory.

15

We define the namespace of the XML document that we are producing.

TIP

If you’re constructing a custom XML document and there is otherwise no defined namespace to use, it is considered good practice to place the nodes of the document into the namespace of the module as opposed to the default namespace.

16

We return the XML document node of our constructed document.

Using the Hello World Module

As mentioned previously, the Java source code implementing the internal module called exist.book.example.module.internal.HelloModule is supplied in the folder chapters/advanced-topics/internal-module/hello-world-module-example of the book-code Git repository (see “Getting the Source Code”). The example covers both the simple hello-world function just discussed and a more complex say-hello function, which is described next. It is designed to show how relatively little bespoke code is required to implement a simple internal module.

To compile the example, enter the internal-module folder and run mvn package.

To deploy the HelloModule to eXist, you need to:

1. Compile the code as previously described, and then copy all of the files from internal-module/hello-world-module-example/target/hello-world-module-example-1.0-assembly to $EXIST_HOME/lib/user.

2. Add the following module definition to $EXIST_HOME/conf.xml, in the xquery/builtin-modules section:

3. <module uri="http://hello"

class="exist.book.example.module.internal.HelloModule"/>

4. Restart eXist so that it picks up the new JAR files.

To test the example:

1. Execute the following XQuery from either eXist’s Java Admin Client (see “Java Admin Client”) or eXide (see “eXide”):

2. xquery version "1.0";

3.

4. declare namespace h = "http://hello";

5.

h:hello-world()

6. You should see a result that looks similar to:

<h:hello xmlns:h="http://hello">world</h:hello>

While this example shows you how to write an extension function for XQuery in Java, it is very basic. To expand on this, we will look next at eXist’s Java model of the XDM types used in XQuery, and then study an example where we create a function that takes several parameters using these types and acts upon them.

Types and Cardinality

Where functions take arguments and return values as a result of their computation (and they should in the functional world), these arguments and return values have both a type and a cardinality. The type defines the variety of data that can be held (for example, a string or number), and the cardinality defines how many values of that type may be present. Types and cardinalities are defined in the W3C XQuery 1.0 and XPath 2.0 Data Model (XDM) specification. If you are an experienced XQuery developer, you’re most likely already familiar with this document; if not, as an implementer of an internal module you should have at least a basic understanding of these subjects. A very useful summary diagram of the type hierarchy in XQuery is available in the specification.

When implementing an internal module, you will be working with the XDM types in Java as opposed to XQuery. eXist has a Java class to model each of those XDM types. An understanding of how to map from an XDM type as used in XQuery to eXist’s Java type is essential to enable you to create functions that accept parameters and return values. All of eXist’s XDM Java types for atomic types are in the package org.exist.xquery.value. The type mappings are listed in Table 16-6.

| XDM atomic value type | eXist’s Java class | Notes |
|---|---|---|
| item | Item | An interface. |
| xs:anyAtomicType | AtomicValue | An abstract class. |
| xs:untypedAtomic | UntypedAtomicValue | Internally represented using java.lang.String. |
| xs:anyURI | AnyURIValue | Internally represented using java.lang.String. Provides utility methods for converting to/from org.exist.xmldb.XmldbURI. |
| xs:base64Binary, xs:hexBinary | BinaryValue | Internally represented using java.io.InputStream and java.io.OutputStream. Actual encoding/decoding is lazy, and uses either Base64BinaryValueType or HexBinaryValueType, respectively. |
| xs:boolean | BooleanValue | Internally represented using boolean. |
| xs:dateTime | DateTimeValue | Internally represented using javax.xml.datatype.XMLGregorianCalendar. |
| xs:date | DateValue | As for xs:dateTime. |
| xs:time | TimeValue | As for xs:dateTime. |
| xs:gDay | GDayValue | As for xs:dateTime. |
| xs:gMonth | GMonthValue | As for xs:dateTime. |
| xs:gMonthDay | GMonthDayValue | As for xs:dateTime. |
| xs:gYear | GYearValue | As for xs:dateTime. |
| xs:gYearMonth | GYearMonthValue | As for xs:dateTime. |
| xs:duration | DurationValue | Internally represented using javax.xml.datatype.Duration. |
| xs:dayTimeDuration | DayTimeDurationValue | As for xs:duration. |
| xs:yearMonthDuration | YearMonthDurationValue | As for xs:duration. |
| xs:string | StringValue | Internally represented using a composition of java.lang.String, int, and boolean. |
| xs:normalizedString, xs:language, xs:NMTOKEN, xs:Name, xs:NCName, xs:ID, xs:IDREF, xs:ENTITY | StringValue | As for xs:string. |
| xs:QName | QNameValue | Internally represented using org.exist.dom.QName. |
| xs:float | FloatValue | Internally represented using float. |
| xs:double | DoubleValue | Internally represented using double. |
| xs:decimal | DecimalValue | Internally represented using java.math.BigDecimal. |
| xs:integer | IntegerValue | Internally represented using a composition of java.math.BigInteger and int. |
| xs:nonPositiveInteger, xs:negativeInteger, xs:long, xs:int, xs:short, xs:byte, xs:nonNegativeInteger, xs:unsignedLong, xs:unsignedInt, xs:unsignedShort, xs:unsignedByte, xs:positiveInteger | IntegerValue | As for xs:integer. |

Table 16-6. XDM atomic value type mappings

eXist has two Java implementations of each XDM node type. One is an in-memory implementation, called Memtree, that solely retains the nodes in memory and is useful for computed node construction. The other is a persistent Document Object Model (DOM) implementation that represents nodes that are stored in the database. The classes of each have mostly the same names but are maintained in different packages. The Java classes for the in-memory implementation of XDM node types are in the package org.exist.memtree, while the persistent DOM implementations are in the package org.exist.dom. See Table 16-7.

| XDM node type | eXist’s Java class |
|---|---|
| node | NodeImpl |
| attribute | AttrImpl (DOM) / AttributeImpl (memtree) |
| comment | CommentImpl |
| document | DocumentImpl |
| element | ElementImpl |
| processing-instruction | ProcessingInstructionImpl |
| text | TextImpl |

Table 16-7. XDM node type mappings

While all function parameters in eXist are sequences, the cardinality of these parameters is constrained in the definition of the function signature. eXist provides cardinality constants that model the occurrence indicators used for function parameters in the XQuery specification. These cardinality constants are defined in the class org.exist.xquery.Cardinality (see Table 16-8).

| XQuery occurrence indicator | eXist’s cardinality constant |
|---|---|
| (none) | Cardinality.EXACTLY_ONE (when explicitly typed) / Cardinality.ZERO_OR_MORE (when not explicitly typed) |
| ? | Cardinality.ZERO_OR_ONE |
| * | Cardinality.ZERO_OR_MORE |
| + | Cardinality.ONE_OR_MORE |

Table 16-8. XQuery occurrence mappings
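To make the meaning of the occurrence indicators concrete, here is a small sketch in plain Java that models each one as a minimum and maximum item count. The Occurrence enum is a hypothetical model written for this illustration only; it is not eXist's org.exist.xquery.Cardinality class and makes no claim about how eXist represents cardinalities internally.

```java
// Hypothetical model of XQuery occurrence indicators as item-count bounds
enum Occurrence {
    EXACTLY_ONE(1, 1),
    ZERO_OR_ONE(0, 1),
    ZERO_OR_MORE(0, Integer.MAX_VALUE),
    ONE_OR_MORE(1, Integer.MAX_VALUE);

    private final int min;
    private final int max;

    Occurrence(final int min, final int max) {
        this.min = min;
        this.max = max;
    }

    // True if a sequence with this many items satisfies the constraint
    boolean accepts(final int itemCount) {
        return itemCount >= min && itemCount <= max;
    }

    // Maps the indicator written after an explicit type, e.g. the "?" in xs:string?
    static Occurrence fromIndicator(final String indicator) {
        switch (indicator) {
            case "":
                return EXACTLY_ONE;
            case "?":
                return ZERO_OR_ONE;
            case "*":
                return ZERO_OR_MORE;
            case "+":
                return ONE_OR_MORE;
            default:
                throw new IllegalArgumentException(
                    "Unknown occurrence indicator: " + indicator);
        }
    }
}
```

Under this model, an empty sequence is acceptable for a parameter declared as xs:string? but not for one declared as xs:string or xs:string+.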

Function Parameters and Return Types

Now that we have an understanding of how XDM types are implemented by eXist in Java, we can consider how we might use these to accept parameters to our functions or return certain result types. When you’re implementing a function for an internal module, it helps to think of the function as a transformation from an array of sequences to a sequence. For example, the eval function of your BasicFunction will be passed a Java array of org.exist.xquery.value.Sequence objects and must either throw an org.exist.xquery.XPathException or return a Sequence. Sequences are also described in the XDM specification, and can basically be thought of as collections of zero or more atomic values and/or nodes. Each item in the Java array of Sequence objects represents an individual parameter that was passed to your XQuery function; even though these are Sequence objects, you will have already declared the type and cardinality of the parameters for your function in its FunctionSignature for the internal module.

Now let’s look at defining a function signature for a function that allows one person to say hello to many other people. This function potentially needs to know:

§ The name of the person who is saying hello.

§ The names of the people she is saying hello to.

§ We could also optionally allow the greeting to be customized so that instead of saying “Hello,” she could say Bonjour or use any other desired form of greeting.

If we imagine the function signature for such an XQuery function, it might look something like:

h:say-hello($greeter as xs:string, $greeting as xs:string?,

$visitors as xs:string+) as xs:string+

Such a function could be called like so:

xquery version "1.0";

declare namespace h = "http://hello";

h:say-hello("Adam", "Hi", ("Erik", "Simon"))

and we might expect to see a result similar to:

("Adam says Hi to Erik", "Adam says Hi to Simon")

Now that we know what we want our XQuery function to do and we know what the signature should look like, we need to implement this in our internal module just as we did before (in Example 16-14) by defining another FunctionSignature:

private final static QName qnSayHello =
    new QName("say-hello", HelloModule.NS, HelloModule.NS_PREFIX);

public final static FunctionSignature FNS_SAY_HELLO = new FunctionSignature(
    qnSayHello,
    "Says a greeting from one person to each of a number of visitors.",
    new SequenceType[] { // (1)
        new FunctionParameterSequenceType("greeter",
            Type.STRING,
            Cardinality.EXACTLY_ONE,
            "The greeter, i.e. the name of the person that is saying 'hello'."
        ),
        new FunctionParameterSequenceType("greeting",
            Type.STRING,
            Cardinality.ZERO_OR_ONE,
            "An optional greeting, if omitted then 'hello' is used."
        ),
        new FunctionParameterSequenceType("visitors",
            Type.STRING,
            Cardinality.ONE_OR_MORE,
            "The visitors, i.e. the names of the people that the greeter is "
                + "saying 'hello' to."
        )
    },
    //matches the declared return type of xs:string+
    new FunctionReturnSequenceType(Type.STRING,
        Cardinality.ONE_OR_MORE,
        "The greetings, one for each visitor."
    )
);

1

As our new function takes parameters, we define an array of org.exist.xquery.value.FunctionParameterSequenceType objects in our FunctionSignature. You can see each of the parameters named, its type and cardinality defined, and a description provided for documentation purposes.

Now that we have written a signature for our function, we can implement the actual processing of the function. We can do this within the same class of our h:hello-world function from Example 16-14 by simply checking for a different calling signature within our eval function, handling the parameters we are interested in, and then executing our business logic. For example:

@Override
public Sequence eval(final Sequence[] args,
        final Sequence contextSequence) throws XPathException {

    final Sequence result;

    //act on the invoked function name
    if(isCalledAs(qnHelloWorld.getLocalName())) {
        result = sayHelloWorld();
    } else if(isCalledAs(qnSayHello.getLocalName())) { // (1)

        final String greeter = args[0].itemAt(0).getStringValue(); // (2)

        final String greeting;
        if(args[1].hasOne()) { // (3)
            greeting = args[1].itemAt(0).getStringValue(); // (4)
        } else {
            greeting = "hello"; // (5)
        }

        final List<String> visitors =
            new ArrayList<String>(args[2].getItemCount());
        final SequenceIterator itVisitors = args[2].iterate(); // (6)
        while(itVisitors.hasNext()) {
            final String visitor = itVisitors.nextItem().getStringValue(); // (7)
            visitors.add(visitor);
        }

        result = sayHello(greeter, greeting, visitors); // (8)
    } else {
        throw new XPathException("Unknown function call: " +
            this.getName().toString());
    }

    return result;
}

/**
 * Says a greeting to many people
 *
 * @param greeter The name of the person saying the greeting
 * @param greeting The greeting to use
 * @param visitors The visitors to say the greeting to
 *
 * @return A sequence of greetings, one for each visitor
 */
private Sequence sayHello(final String greeter, final String greeting, // (9)
        final List<String> visitors) throws XPathException {

    final Sequence results = new ValueSequence(); // (10)

    for(final String visitor : visitors) {
        final StringValue result =
            new StringValue(greeter + " says " + greeting + " to " + visitor); // (11)
        results.add(result); // (12)
    }

    return results; // (13)
}

1

We add a switch on the name of our new function.

2

From the first Sequence in the array, we extract the first item and get its string value. This is the value of our greeter parameter. Note that while indexes in sequences in XQuery start at 1, in Java they start at 0.

3

As our second parameter, greeting, is optional, we first check whether an xs:string value or an empty sequence was given.

4

If a value for the greeting parameter was given, we extract it.

5

If an empty sequence was used for the greeting parameter, we fall back to the default greeting of hello.

6

As our third parameter, visitors, is a sequence of one or more values, we obtain an iterator over the values.

7

We iterate over each visitor name from visitors and add it to our list of visitors.

8

We call our business logic, which generates a greeting for each visitor.

9

This is our isolated business logic, which will generate our greetings.

10

We create a new ValueSequence to hold each of the greetings that we wish to return to the XQuery.

11

We create a new StringValue, which represents an xs:string value to hold each of our greetings.

12

We add our greeting to the value sequence.

13

We return the value sequence, which now contains each of our greetings as strings. This is then returned to the XQuery by the eval method.

The source code for this example is included in the HelloModule code, as discussed earlier. To compile and deploy the module, see “Using the Hello World Module”. To test the example, execute the following XQuery from either eXist’s Java Admin Client (see “Java Admin Client”) or eXide (see “eXide”):

xquery version "1.0";

declare namespace h = "http://hello";

h:say-hello("Adam", (), ("Elisabeth", "David"))

The result of the query should look similar to:

Adam says hello to Elisabeth

Adam says hello to David

You can experiment with providing different values for the second argument to the function and observe how the results change.
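If you would like to reason about the argument handling without a running eXist instance, the same logic can be sketched in plain Java. The sketch below is only a model: it assumes each XQuery sequence is represented as a java.util.List of strings, so the Sequence[] passed to eval becomes a list of lists. The class name SayHelloSketch is hypothetical and none of eXist's classes are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical plain-Java model of the say-hello argument handling
class SayHelloSketch {

    // args.get(0) = greeter (exactly one), args.get(1) = greeting (zero or one),
    // args.get(2) = visitors (one or more)
    static List<String> sayHello(final List<List<String>> args) {
        final String greeter = args.get(0).get(0);

        // an empty "sequence" for the greeting falls back to the default
        final String greeting =
            args.get(1).size() == 1 ? args.get(1).get(0) : "hello";

        final List<String> results = new ArrayList<>();
        for (final String visitor : args.get(2)) {
            results.add(greeter + " says " + greeting + " to " + visitor);
        }
        return results;
    }

    public static void main(final String[] argv) {
        final List<List<String>> args = Arrays.asList(
            Arrays.asList("Adam"),
            Collections.<String>emptyList(),
            Arrays.asList("Elisabeth", "David"));
        for (final String line : sayHello(args)) {
            System.out.println(line);
        }
    }
}
```

Keeping the business logic in a method that takes and returns ordinary Java values, as sayHello does in the module itself, makes it straightforward to unit test separately from the XQuery plumbing.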

We have now built an XQuery extension function in Java for our internal module that can both accept multiple parameters of varying cardinality and return a sequence of results. Every internal module extension function that is written for XQuery in eXist follows this same pattern.

There are many, many internal modules of extension functions already provided with eXist, the vast majority of which are described at a high level in Appendix A. When you are developing your own modules, these are excellent examples from which to learn. You can find their source code in the folders $EXIST_HOME/src/org/exist/xquery/functions and $EXIST_HOME/extensions/modules/src/org/exist/xquery/modules.

If you do choose to write your own internal module for eXist, we strongly recommend reading both “Developing eXist” and “Debugging eXist”, which will assist you with developing and debugging the Java code of your module running inside eXist.

Variable Declarations

While we have so far focused on defining functions within an internal module, you can also declare variables that live within the namespace of the internal module. This is most useful when your module wishes to expose a number of variables to XQuery that either represent some static constants or confer some configuration information.

You can define variable declarations in the constructor of your module class that extends AbstractInternalModule by calling the declareVariable method of the super class. For example, consider a module that provides mathematical constants:

public class MathConstantsModule extends AbstractInternalModule {

    protected final static String NS = "https://math/constants";
    protected final static String NS_PREFIX = "mc";

    public MathConstantsModule(
            final Map<String, List<? extends Object>> parameters) {
        //functions is the module's FunctionDef array, as in Example 16-13
        super(functions, parameters);

        final Variable piApprox = // (1)
            new VariableImpl(new QName("pi", NS, NS_PREFIX)); // (2)
        piApprox.setValue(new FloatValue(22f / 7f)); // (3)
        declareVariable(piApprox); // (4)

        final Variable speedOfLightApprox = // (5)
            new VariableImpl(new QName("speed-of-light", NS, NS_PREFIX));
        //approximate speed of light in meters per second
        speedOfLightApprox.setValue(new FloatValue(299792458f));
        declareVariable(speedOfLightApprox);

        //...further variable declarations omitted for brevity
    }

    //...module body omitted for brevity
}

1

We create a new variable by instantiating an instance of org.exist.xquery.VariableImpl.

2

The variable must, of course, be named, but that name must reside within the namespace of the internal module.

3

We set the value of the variable.

4

We declare the variable within the internal module.

5

We again create a variable, set its value, and declare it.

Module Configuration

You have probably noticed by now the parameters argument that is given to your internal module class’s constructor. So far we have largely ignored this, and we simply pass it on to the super class’s constructor as required. When you configure eXist to use your internal module in $EXIST_HOME/conf.xml, you can also specify configuration parameters inside the module declaration, and these parameters will be parsed, extracted, and passed in the parameters argument of your module’s constructor.

This is a rather simple configuration facility, and it is the same one used elsewhere for scheduled tasks and triggers (see “Java Jobs”, “Startup Triggers”, and “Java Triggers”). As well as those other mechanisms, which share the same configuration semantics, the xslfo module (see xslfo) makes use of such parameters and serves well as an example of how to do this for your own internal modules.
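As a rough illustration of consuming such parameters, the sketch below reads a single value out of a map with the same shape as the constructor's parameters argument, falling back to a default when the parameter is absent. It uses only the JDK; the helper class ModuleParameters and the parameter name default-greeting are hypothetical, and how a given parameter in conf.xml ends up in the map is not shown here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for reading values from a module's parameters map
class ModuleParameters {

    // Returns the first value configured under the given name,
    // or the default if the parameter is missing or has no values
    static String firstValue(final Map<String, List<? extends Object>> parameters,
            final String name, final String defaultValue) {
        final List<? extends Object> values = parameters.get(name);
        if (values == null || values.isEmpty() || values.get(0) == null) {
            return defaultValue;
        }
        return values.get(0).toString();
    }

    public static void main(final String[] args) {
        final Map<String, List<? extends Object>> parameters = new HashMap<>();
        parameters.put("default-greeting", Arrays.asList("Bonjour"));

        System.out.println(firstValue(parameters, "default-greeting", "hello"));
        System.out.println(firstValue(parameters, "not-configured", "hello"));
    }
}
```

A module constructor could call such a helper after invoking the super class's constructor, and keep the result in a field for its functions to use.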

Developing eXist

The eXist development community is always open to new contributors, beginners or experts, from those who just want to fix a typo in the documentation to those who want to reengineer the core storage of the database. Whatever your level of expertise, all contributions are treated equally and follow the same process to reach acceptance. eXist makes use of the fork and pull GitHub model of collaborative development. Simply put, all contributors follow the same three steps:

1. Fork the eXist Git repository that you are interested in contributing to from https://github.com/eXist-db to your own GitHub user/organization.

2. Make your changes within your fork (preferably using git-flow).

If you are modifying Java code, you must run eXist’s test suite (see the test entry in Table 16-9) and check that there are no regressions.

3. When you are happy with your completed changes, you send a pull request via GitHub.

NOTE

When you are adding new source code folders or JAR files to eXist, it is important to make sure that each of the IDE project files is updated to support your changes before sending a pull request.

Now, to be clear, all pull requests to eXist are evaluated by at least one member of the eXist CDT (Core Development Team), each of whom has a responsibility to evaluate and merge pull requests in a timely fashion. Even the members of the CDT are not exempt from this process; they too must submit their changes by pull request and have them merged by a (different) member of the CDT. Nothing is ever merged into eXist without at least two people knowing about it and agreeing that it improves the status quo. For full details of the development process employed by eXist, see https://github.com/eXist-db/eXist#contributing-to-exist.

While simple bug fixes and updates are often obvious and easy to develop, more complicated bug fixes or features should be openly discussed on the eXist-development mailing list. There are two main advantages to doing this:

Avoiding duplication

With each contributor communicating their intentions clearly, we can hopefully avoid any duplication of work, as it is possible otherwise that two people may be attempting to solve the same problem simultaneously!

Continuity

This ensures that proposed new features complement the community vision for eXist. It is very unlikely that a new feature would be rejected outright; however, there are often many ways to approach the same problem, and an open discussion between peers can often bring new insights!

Should you have any pressing development concerns, or need to chase a pull request, at the time of writing, the eXist CDT comprises Tobi Krebs, Wolfgang Meier, Leif-Jöran Olsson, Adam Retter, Dmitriy Shabanov, Joern Turner, Dannes Wessels, and Lars Windauer, all of whom should be contactable through the eXist-open and eXist-development mailing lists. The members of the CDT are not considered special in any way; they are simply people who have made many contributions to eXist over time and have a feeling for what eXist means. Anyone is very welcome to join the CDT if they are willing to invest time to review and merge pull requests in the longer term.

On a more technical level, it is perhaps pertinent to mention here that eXist is written almost entirely in Java, its XQuery parser is written in ANTLR v2, and it uses the Apache Ant build system (although there is an embryonic effort underway to migrate to Apache Maven). In addition, many add-ons for eXist (such as the dashboard and demo applications) are written in HTML, JavaScript, and XQuery. The documentation for eXist is entirely authored in DocBook v5.

You can use any IDE or other text editor that you wish to when developing eXist, but for convenience IDE project files can be found for NetBeans, IntelliJ, and Eclipse inside $EXIST_HOME.

WARNING

Each of the IDE projects is configured to build eXist, but note that the IntelliJ build configuration does not compile in the AspectJ aspects that eXist uses for database and XQuery execution security enforcement! Therefore, when eXist is run from IntelliJ it will run with very few security constraints and will not be suitable for testing database operations.

Building eXist from Source

As eXist is an open source project, it is fundamentally important that anyone should be able to download the source code and compile their own version of it. The developers of eXist have gone to great lengths to ensure that the build process is simple for all to use. Anyone can download the source code, compile it, and compare it with a released version of eXist to make sure they are the same and that some nefarious person or organization has not interfered with the software, which enables transparency. Another nice outcome of having an easy-to-use build system is that any user can compile eXist, for the purpose of either having the latest and greatest version in advance of the next release, or contributing fixes or features back as a developer.

The eXist source code repository was recently moved from its previous home on SourceForge to GitHub. It is laid out using the git-flow scheme; thus, all of the development for the next release of eXist takes place in the develop branch, which is the default branch for eXist. The master branch represents the latest stable release of eXist; however, to be certain which version of eXist you will be working with, it is simpler to use the correct tag. At the time of writing, the latest tagged release of eXist was eXist-2.1. You will need to have Git installed if you wish to pull the latest source code directly from GitHub; see http://www.git-scm.com to get an installer for your platform. From the same website, there are also various GUI clients, such as SourceTree.app and GitHub Client, available if you prefer a graphical interface. If you are interested in reading further details on how eXist is developed and even potentially contributing, see “Developing eXist”.

NOTE

Not all aspects of the eXist project are on GitHub, only the source code and issue tracker. The mailing lists and downloads of compiled releases remain at SourceForge for the time being. Whatever the infrastructure of the project, links to the latest locations will always be available from the eXist website.

The eXist source code is built with the Apache Ant build tool. eXist includes a copy of the Ant runtime in its $EXIST_HOME/lib/tools/ant folder so that you do not need to separately install it. eXist’s Ant build scripts are just a series of XML files and can be found in $EXIST_HOME/build.xml and $EXIST_HOME/build/scripts. However, rather than having you use them directly, eXist provides two executable scripts that run Ant with the appropriate build scripts: $EXIST_HOME/build.sh (for Unix/Linux/Mac platforms) and $EXIST_HOME/build.bat (for Windows platforms).

TIP

The settings for the build are configurable: take a look at $EXIST_HOME/build.properties. This is particularly useful when you are working in a corporate environment behind a proxy server, as eXist may attempt to download some resources as part of the build. You can configure this using the proxy settings in build.properties.

When executing the build script, you can provide one or more targets that describe which build action(s) you wish to take. There are many build targets available, some of the most useful of which are described in Table 16-9. The table is followed by Example 16-15, which demonstrates the typical sequence of building eXist from source code.

Target

Description

clean

Removes all compiled code. Useful when you wish to do a clean recompile.

clean-all

Similar to clean, but also deletes the database.

Use with care!

jar

Compiles just the eXist source code into JAR files.

extension-modules

Builds any extension modules that are defined and enabled in $EXIST_HOME/extensions/build.properties.

Can be used by itself to compile in new extension modules to an existing installation.

wrapper

Builds the Java Service Wrapper for eXist. See “Windows Linux and Other Unix”.

sign

Signs the built JAR files so that they may be used in a security-restricted environment such as Java Web Start or a restricted application server.

all

The default build target. Builds the eXist source code, the Java Service Wrapper, the betterFORM XForms extension, the extension modules, and EXPath package support.

rebuild

A useful shortcut when making changes and rebuilding; calls the targets clean and then all.

dist

Builds a distribution of eXist. The result is a folder in $EXIST_HOME/dist that can be distributed to other machines and installed.

There are also dist-zip and dist-tgz targets, which will create a ZIP file or tarball, respectively, under $EXIST_HOME/dist for you to distribute to other machines and install.

dist-war

Creates a WAR file in $EXIST_HOME/dist that can be deployed to any Java application server, such as Apache Tomcat.

installer

You can create installers for eXist, just like the binary releases provided on SourceForge. On completion the installers can be found in $EXIST_HOME/installer.

However, this takes a little more effort, as you need to install the supporting tools IzPack 4.3.5 and Launch4j. The paths to the supporting tools then need to be configured in $EXIST_HOME/build.properties.

app

Similar to installer, but specific to Mac OS X. Creates a self-contained application and packages it in an Apple disk image (as a .dmg file). On completion, the disk image can be found in $EXIST_HOME/dist.

There is also an app-signed target, which does the same as app but also signs the application. You will, however, need a valid Apple developer certificate installed for this to work. For further details, see https://developer.apple.com/support/technical/certificates/.

test

If you are making modifications to the eXist source code, it is essential to execute the test suite to ensure that you have not introduced any regressions. The output of the test suite can be found in the web page report located at $EXIST_HOME/test/junit/html/index.html.

Table 16-9. Useful eXist Ant build targets

Example 16-15. Typical sequence of building eXist from source code

git clone https://github.com/eXist-db/exist.git # 1

cd exist

git checkout tags/eXist-2.1 # 2

./build.sh # 3

1

Clone the eXist source code from GitHub. If you are planning to contribute, you should first fork the repository and then clone your own fork.

2

Check out the eXist-2.1 release tag from the repository. You can view all available tags by running git tag -l. If you wish to track the latest version of eXist, you need not check out a tag; instead, running git branch -a should show that you are on the develop branch.

3

Build the eXist source code. By default, this builds the all target.
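Because the build script accepts one or more targets, you can also name them explicitly. The following is a small sketch of a wrapper for doing so on Unix-like platforms; the function name exist_build is a hypothetical convenience and not something shipped with eXist, and it assumes only that build.sh passes its arguments through to Ant as targets, as described above.

```shell
# Hypothetical wrapper: run eXist's build script with the given Ant targets.
#   $1 = the eXist source folder (what the text calls $EXIST_HOME)
#   remaining arguments = build targets, such as those in Table 16-9
exist_build() {
    exist_home=$1
    shift
    # Run in a subshell so the caller's working directory is left unchanged.
    # With no targets given, the build script's default target is used.
    (cd "$exist_home" && ./build.sh "$@")
}

# Examples (not run here):
#   exist_build "$EXIST_HOME" clean jar   # clean, then compile just the JAR files
#   exist_build "$EXIST_HOME" test        # run the test suite
```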

TIP

Remember that while it is, of course, possible to make distributions and installers for eXist from the source code, you can also work with eXist in place. You do so by checking out the source code, building it, and then running it directly by using $EXIST_HOME/bin/startup.sh (or $EXIST_HOME/bin/startup.bat on Windows), or even installing eXist as a service (see “Installing eXist as a Service”). A major advantage of this approach is that you can easily update to a newer version of eXist by using Git to pull changes if you are tracking the develop branch, or by checking out a newer release tag when it becomes available.

Make sure to back up your config and database before switching branches with Git!
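One simple way to take such a backup is to archive the relevant files before touching Git. The sketch below is only an illustration: the function name backup_paths and the archive naming are hypothetical, and the paths in the usage comment are assumptions about a default layout, so check where your own configuration file and data folder actually live. It is also safest to stop eXist before copying its database files.

```shell
# Hypothetical helper: archive the given files/folders into a timestamped tarball.
#   $1 = folder to write the archive into (created if missing)
#   remaining arguments = paths to back up
# Prints the path of the archive that was created.
backup_paths() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    archive="$dest/exist-backup-$(date +%Y%m%d-%H%M%S).tar.gz"
    tar -czf "$archive" "$@" && printf '%s\n' "$archive"
}

# Example (not run here; both source paths are assumptions about your layout):
#   backup_paths "$HOME/exist-backups" "$EXIST_HOME/conf.xml" "$EXIST_HOME/webapp/WEB-INF/data"
#   git -C "$EXIST_HOME" pull    # then update, if you are tracking the develop branch
```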

Debugging eXist

If you are using one of the IDEs for which eXist provides project files (NetBeans, IntelliJ, and Eclipse), then these projects are already set up to enable you to debug either the eXist Java Admin Client or the eXist server. It is also worth remembering that you can debug the Java Admin Client in embedded mode, which can sometimes provide a simple mechanism for debugging the database core without your needing to run the full server.

However, if you are not using one of the supported IDEs or wish to debug eXist code that is running on a remote server, then your only real option is to use the Java Debug Wire Protocol (JDWP). It is also worth mentioning that each of the supported IDEs functions as an excellent debugger when you’re debugging eXist remotely. JDWP supports using a TCP/IP socket to communicate between the application you are debugging and the debugger on all platforms; depending on how it’s configured, you can also run this across a network. Between the application that you wish to debug and the debugger, JDWP can work in either direction. That is, you can start up the JVM running your Java application that you wish to debug as either:

§ A JDWP server that will listen for connection requests from a debugger

§ A JDWP client that will connect to a remote debugger

To enable JDWP, where your Java application offers a JDWP server, pass the following options to the JVM when you start your Java application:

-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=127.0.0.1:4000

To enable JDWP, where your Java application connects using JDWP to a debugger, pass the following options to the JVM when you start your Java application:

-agentlib:jdwp=transport=dt_socket,server=n,suspend=y,address=127.0.0.1:4005

The suspend parameter may be set to either y or n. When set to y (yes), it will cause your Java application not to start running within the JVM immediately, but to wait until a debugger connects to the application in server mode, or the Java application connects to the debugger in client mode. This can be very useful with applications like eXist where you may wish to debug the database startup process.

The address parameter will cause the JVM to listen on a specific IP address and TCP port for JDWP requests in server mode, or to connect to a debugger listening on that specific address and port in client mode. If you use the localhost address of 127.0.0.1, then the debugger must be running on the same machine. If you wish to debug across the network, in server mode you need to specify the IP address of the server’s network interface (or you can omit the IP address entirely to listen on all server addresses), and in client mode you need to specify the IP address of the machine on which the debugger is listening. Note that on Java 9 and later, an address consisting of just a port listens on localhost only; to listen on all addresses you must write the address as *:4000.
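To make the relationship between these parameters concrete, here is a small sketch that assembles the option string from a mode, an address, and a suspend setting. The function name jdwp_option and its argument order are hypothetical conveniences; only the resulting -agentlib:jdwp string comes from the options shown above.

```shell
# Hypothetical helper: build a JDWP agent option string.
#   $1 = mode: "server" (the JVM listens) or "client" (the JVM connects to a debugger)
#   $2 = address as host:port
#   $3 = suspend: y or n (defaults to n)
jdwp_option() {
    mode=$1
    address=$2
    suspend=${3:-n}
    case $mode in
        server) server=y ;;
        client) server=n ;;
        *) echo "mode must be 'server' or 'client'" >&2; return 1 ;;
    esac
    case $suspend in
        y|n) ;;
        *) echo "suspend must be 'y' or 'n'" >&2; return 1 ;;
    esac
    printf '%s\n' "-agentlib:jdwp=transport=dt_socket,server=$server,suspend=$suspend,address=$address"
}

# Example (not run here; my-app.jar is a placeholder for your own application):
#   java "$(jdwp_option server 127.0.0.1:4000)" -jar my-app.jar
```

Calling jdwp_option server 127.0.0.1:4000 reproduces the server-mode option shown earlier, and jdwp_option client 127.0.0.1:4005 y reproduces the client-mode one.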

We have discussed the JDWP settings in the context of any Java application, as eXist can be set up to run in many different ways, and you may need to add these options to whichever mechanism you are using to start eXist on your local machine or server. For further details, see http://docs.oracle.com/javase/7/docs/technotes/guides/jpda/conninv.html#Invocation. However, if you are using the $EXIST_HOME/bin/client.sh and/or $EXIST_HOME/bin/startup.sh scripts to start eXist, then you can simply uncomment the line that starts with DEBUG_OPTS near the top of those files to have JDWP enabled in server mode. Remember that you will have to restart eXist for these changes to take effect.

Remote debugging with the NetBeans IDE

While any IDE or client that supports JDWP can be used as a debugger against the Java code that makes up eXist, here we show how you can use the NetBeans IDE to connect to eXist running as a JDWP server. Before continuing you must start eXist running as a JDWP server, as discussed earlier.

eXist ships with project files for NetBeans, so you can simply go to the File→Open Project menu item in NetBeans and select your $EXIST_HOME folder. Once the project has loaded, you need to attach the NetBeans debugger to the eXist JDWP server by choosing the Debug→Attach Debugger menu item (see Figure 16-7).

Figure 16-7. NetBeans: opening the Attach dialog

This will bring up the Attach dialog box. Assuming that eXist is running on the same machine as NetBeans and that you have used the default TCP port that eXist’s JDWP settings are configured for (4000), you should set the following options in the dialog (see Figure 16-8):

Debugger

Java Debugger (JPDA)

Connector

SocketAttach (Attaches by socket to other VMs)

Transport

dt_socket

Host

localhost

Port

4000

Figure 16-8. NetBeans: the Attach dialog
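If you do not have an IDE to hand, the same connector, host, and port settings can be given to jdb, the command-line debugger that ships with the JDK. The following is a minimal sketch under that assumption; the function name jdb_socket_attach is hypothetical, and the defaults simply mirror the dialog values listed above.

```shell
# Hypothetical helper: print the argument for jdb's -connect flag, using the
# same SocketAttach connector that is selected in the NetBeans dialog.
#   $1 = host (defaults to localhost)
#   $2 = port (defaults to 4000)
jdb_socket_attach() {
    host=${1:-localhost}
    port=${2:-4000}
    printf '%s\n' "com.sun.jdi.SocketAttach:hostname=$host,port=$port"
}

# Example (not run here; needs a JDK on the PATH and eXist already running
# as a JDWP server):
#   jdb -connect "$(jdb_socket_attach localhost 4000)"
```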

After you click the OK button, NetBeans will attempt to connect its debugger to eXist. All being well, you should initially see a confirmation that the debugger has connected to eXist and a list of the running threads that make up eXist, as shown in Figure 16-9.

Figure 16-9. NetBeans: confirmation of attachment and list of running threads

From here you can perform all the typical Java debugging steps, such as setting breakpoints in the eXist code, showing variable values, getting stack dumps, and stepping through the running code, stack frame by stack frame. Java debugging in itself is a huge and advanced topic, and we do not presume to teach it here; however, hopefully if you are an aspiring or advanced Java developer, we have provided you with the information that you need to get started with debugging eXist’s code base.