eXist: A NoSQL Document Database and Application Platform (2015)

Preface

Welcome

Welcome, dear reader, to our book on eXist. Whether you have purchased, begged, borrowed, or stolen this book, we hope that you find its contents of great use when applied to solving your information management problems.

eXist has been around for some years now (in fact, for longer than many of the now-popular NoSQL platforms), yet it has continued to innovate and evolve. Stable and widely used for many years, eXist has now reached a milestone in its history where it can be considered “battle-worn”: a veteran, if you like (or, as we like to say in software engineering, “mature”). We have considered writing a book on eXist for the past few years, and we now know that the time is right to share our knowledge with the world. Welcome to eXist 2.0.

Who Is This Book For?

Perhaps we should first answer this question with another question: Who is eXist for?

eXist aims to meet the requirements of a wide user base, and therefore is probably the most feature-rich product in its class. eXist has been engineered over the years to meet the needs of users ranging from humanities students and professors undertaking interesting linguistic projects, to large international publishers working with millions of documents, to developers wishing to rapidly create document- and data-driven web applications, and most cases in between.

This book aims to meet the needs of a wide audience: from tinkerers, students, professors, and information managers right up to software engineers. This book assumes that you wish to learn and use eXist; if not, you may have bought the wrong book! No familiarity with eXist is assumed; we start with the basics and progress to more complicated topics. This book does not set out to teach XML, XPath, XQuery, XSLT, XForms, or any of the other XML technologies. While you may of course gain an understanding of them from this book, other books and online resources focus on these topics as their raison d'être. We assume that you have a working knowledge of XML technologies, or access to resources for learning them.

As always, beginners should start at the beginning, while those who already have some experience with eXist may find new insights from Chapters 4 through 6 onward. We hope you will also find the book an excellent reference resource.

If you are looking for books on XML technologies, our own experience and the feedback of colleagues and beginners we have met suggest keeping a copy of XQuery by Priscilla Walmsley (O’Reilly) at hand, as XQuery is the predominant language used for working with eXist. For further useful resources, see “Additional Resources”.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, email addresses, file- and pathnames, database collections, and file extensions.

Constant width

Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, module names, data types, environment variables, statements, and keywords. Also used for commands and command-line output, database user and group names, and permission modes.

Constant width italic

Shows text that should be replaced with user-supplied values or by values determined by context.

$EXIST_HOME

Although $EXIST_HOME follows the Unix-style syntax for an environment variable, it is used throughout the book to refer to the location where you have installed eXist, whether on Windows, Linux, Mac, or any other type of system. On Windows platforms, the corresponding expression for referencing the equivalent environment variable would be %EXIST_HOME%.
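As a minimal sketch of this convention in a Unix-like shell, you might set $EXIST_HOME once and then build every path from it. The installation directory /usr/local/exist is only an assumed example; substitute the location where you actually installed eXist.

```shell
# Assumption: eXist is installed in /usr/local/exist (an example path only).
# Any EXIST_HOME value already present in the environment takes precedence.
EXIST_HOME="${EXIST_HOME:-/usr/local/exist}"
export EXIST_HOME

# Paths in the book are written relative to $EXIST_HOME, for example the
# autodeploy folder mentioned later in this preface:
echo "Autodeploy folder: $EXIST_HOME/autodeploy"
```

In the Windows command prompt, the equivalent would be set EXIST_HOME=C:\eXist, and you would then refer to it as %EXIST_HOME%. Again, C:\eXist is just an example path.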

TIP

This element signifies a tip or suggestion.

NOTE

This element signifies a general note.

CAUTION

This element indicates a warning or caution.

XQuery Filename Conventions

The XQuery specification as published by the W3C does not define a particular filename extension for XQuery files. The specification, however, does define two different types of XQuery module:

XQuery main module

A main module is defined as having a query body. Simply put, this means that an XQuery processor can directly evaluate the XQuery code in this file.

XQuery library module

A library module does not have a query body and must start with a module declaration. Again, simply put, this means that an XQuery processor cannot directly evaluate a library module; rather, the library module must be directly or indirectly imported into a main module.

As a result, there has been a proliferation of different filename extensions used for XQuery files, including .xq, .xql, .xqm, .xqy, .xqws, and .xquery. XQuery implementation vendors, and even individual XQuery developers, all seem to have their own ideas about XQuery file naming. Some projects differentiate between main and library modules by using two different file extensions, but which two varies from project to project. Other projects use a single file extension for both main and library modules. This proliferation of file extensions can be disorienting and confusing when you approach an existing code base.
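When approaching an existing code base, a quick survey of which XQuery extensions it actually uses can help you get your bearings. The helper below is a sketch rather than part of eXist or the book's code. Its name, xquery_extension_survey, is hypothetical, and it only knows about the extensions listed above.

```shell
# Hypothetical helper: count the XQuery files under a directory (default: the
# current one) by extension. The extension list mirrors the one in the text;
# adjust it to suit your project.
xquery_extension_survey() {
    # find the candidate files, keep only the extension, count each distinct
    # extension, and print the most common first
    find "${1:-.}" -type f \( -name '*.xq' -o -name '*.xql' -o -name '*.xqm' \
        -o -name '*.xqy' -o -name '*.xqws' -o -name '*.xquery' \) |
        sed 's/.*\.//' |
        sort | uniq -c |
        sort -rn
}
```

Running xquery_extension_survey path/to/project prints one line per extension, each prefixed with the number of files that use it.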

eXist recognizes and supports XQuery files with any of the aforementioned file extensions, and will load and store them correctly into its database as XQuery. However, we believe that such an accumulation of different file extensions for what is effectively one or two (main and library) types of file is ridiculous and raises the barrier to truly reusable and portable XQuery code within projects, between projects, and across XQuery implementations.

This book takes the strong opinion that the following XQuery file extension convention should be used by at least all users of eXist, if not all XQuery developers:

.xq

The .xq filename extension is to be used for all main modules.

.xqm

The .xqm extension is to be used for all library modules. The m suffix in the file extension indicates that the XQuery module starts with a module declaration and is therefore a library module.

This convention is justified by the following points:

§ The ability to differentiate between main modules and library modules at the file level proves very useful within a large project. Especially if you are new to the project, you can easily and quickly locate the main entry points of the application.

§ This is not yet another new convention (standard); this is already the convention in at least one other project outside of eXist.

§ It is backward compatible with various approaches that have been adopted by eXist community members in the past.
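Because the convention hinges on whether a file starts with a module declaration, a rough check can suggest which extension a file should have. The following is a heuristic sketch only, not a real XQuery parser. The function name xquery_module_kind is hypothetical, and the check ignores the possibility of "module namespace" appearing inside comments or strings.

```shell
# Hypothetical heuristic: a library module contains a module declaration, so it
# should use .xqm; anything else is treated as a main module (.xq). The pattern
# matches "module namespace" at the start of a line or right after a ';' (as in
# 'xquery version "3.0"; module namespace ...'). An "import module namespace"
# line in a main module does not match. Comments and strings are not parsed.
xquery_module_kind() {
    if grep -Eq '(^|;)[[:space:]]*module[[:space:]]+namespace' "$1"; then
        echo "library .xqm"
    else
        echo "main .xq"
    fi
}
```

For instance, a file containing xquery version "3.0"; followed by module namespace hello = "http://example.com/hello"; is reported as library .xqm. A main module that merely imports it with import module namespace hello = ... at "hello.xqm"; is reported as main .xq. The hello namespace URI is a made-up example.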

Accompanying Source Code

Many of the code examples in the book, as well as the example programs it discusses, are publicly available from GitHub at https://github.com/eXist-book, where we currently provide two repositories:

https://github.com/eXist-book/book-code

This encompasses all of the code that accompanies the book (i.e., XQuery, XSL-FO, XSLT, XForms, XML, Java, and Python), except for the examples discussed in Chapter 3.

For convenience, build scripts are included so that the majority of examples can be compiled into an EXPath package file (see “Packaging”) that can easily be deployed into eXist, and the Java projects can be compiled into JAR files for use with eXist or from the command line.

https://github.com/eXist-book/using-exist-101

This is provided as a reference for the tutorials set out in Chapter 3. It is deliberately kept separate from the other code examples, as we felt that you would benefit more from following the tutorials and entering the code manually while considering each line of code that you are writing.

This repository is structured as an eXist backup. To restore the backup, see “Backup and Restore”.

Getting the Source Code

To get a copy of the source code from either of our GitHub repositories, you should ideally have Git installed. If you do not wish to install Git, you can instead download a ZIP or compressed TAR file of the source code from the GitHub repositories. However, using Git is recommended, as it allows you to easily update the source code in the future, should we make any corrections or additions.

Assuming that you have Git installed (if you are on a Windows platform, we will assume that you are using Git Shell), from your Unix/Linux/Mac terminal (or your Windows Git Shell), you can run the following to clone (make a copy of) our repositories:

$ mkdir exist-book
$ cd exist-book
$ git clone https://github.com/eXist-book/book-code
$ git clone https://github.com/eXist-book/using-exist-101

You now have a clone of each repository. In the future, should you wish to pull in any updates we have made, you can simply run:

$ cd exist-book/book-code
$ git pull
$ cd ../using-exist-101
$ git pull

Building and Deploying

Now let’s look at how you build and deploy the code from the book-code repository.

The book-code repository contains the following top-level folders:

build-parent

This folder contains the build configuration that is inherited by each project.

build-parent-java

This folder contains the build configuration that is inherited by each of the Java projects.

chapters

This folder contains subfolders for each chapter of the book where example code is provided.

xml-examples-xar

This folder contains the build configuration for building an EXPath package.

Building everything

We use the Apache Maven build tool to compile all of the projects that accompany the book. To make the most of the example code, you will therefore also need to download and install Maven. Maven, like eXist, requires Java; if you do not already have Java installed, you can download either Java 6 or 7 from http://java.oracle.com. Each pom.xml file that you see in the code is a Maven project file that describes how to build the code and declares the dependencies it requires, which Maven then resolves.
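Before building, it can save time to confirm that the required tools are on your PATH. The sketch below uses a hypothetical function name, check_prerequisites, and relies only on the standard command -v shell builtin.

```shell
# Hypothetical helper: report whether each named tool is on the PATH and
# return non-zero if any of them is missing.
check_prerequisites() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

You might run check_prerequisites git java mvn from the book-code folder. If all three are found, java -version and mvn -v print the installed versions, so you can confirm which Java version Maven will use.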

If you wish to build all of the code projects that accompany the book in one step, you can simply run the following commands from your terminal (or Git Shell on Windows):

$ cd book-code
$ mvn package

Building the EXPath package

If you wish to build just the EXPath package of the example XQuery, XSLT, XForms, and XML code that accompanies the book, you can simply enter the xml-examples-xar subfolder and run mvn package. To achieve this, we have used the excellent EXPath package Maven plug-in written by Claudius Teodorescu, which allows us to easily create a XAR file from a manifest (see the file xml-examples-xar/expath-pkg.assembly.xml) that describes the EXPath package.

The result of the Maven build process is the file exist-book-1.0.xar in the target subfolder of xml-examples-xar. You can then deploy the package either by copying it to $EXIST_HOME/autodeploy or by using the dashboard app as follows:

1. Open up the eXist dashboard in your web browser, log in as admin, and click on the Package Manager tile.

2. Click on the upload application icon (in the top left of the screen; it looks like a stack of disks).

3. Browse to and select the exist-book-1.0.xar file and press the Submit button.

After installation, the sample code is available as another tile in the dashboard. It runs as a simple application that gives you quick access to running the examples.

See “The Dashboard” and “Packaging” for further information on working with the dashboard and EXPath packages.
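The first deployment option, copying the XAR into $EXIST_HOME/autodeploy, can be scripted. The following is a sketch with a hypothetical function name, deploy_xar. Its default path assumes you run it from the book-code folder after mvn package. We also assume that eXist installs packages from this folder when the database starts, so you may need to restart eXist before the tile appears; check the eXist documentation for your version.

```shell
# Hypothetical helper: copy a built XAR into $EXIST_HOME/autodeploy.
# Default path assumes the current directory is book-code and mvn package has run.
# Assumption: eXist picks up autodeploy packages at startup, so a restart may be
# needed before the package shows up in the dashboard.
deploy_xar() {
    xar="${1:-xml-examples-xar/target/exist-book-1.0.xar}"
    if [ -z "$EXIST_HOME" ]; then
        echo "EXIST_HOME is not set" >&2
        return 1
    fi
    dest="$EXIST_HOME/autodeploy"
    if [ ! -f "$xar" ]; then
        echo "No package found at $xar; run mvn package first" >&2
        return 1
    fi
    if [ ! -d "$dest" ]; then
        echo "No autodeploy folder at $dest; check EXIST_HOME" >&2
        return 1
    fi
    cp "$xar" "$dest/" && echo "Copied $xar to $dest"
}
```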

Compiling the Java examples

The Java examples that accompany the book will also be built if you build everything, and the resultant artifacts will be placed into the target subfolders of each project. Each Java project example is discussed in detail in the relevant chapter later in the book. You can also compile the Java projects individually by running mvn package in the folder of each Java project. For example, if you wanted to build just the REST Server client examples, you would run:

$ cd book-code/chapters/integration/restserver-client
$ mvn package

Each Java example is designed to both educate and potentially serve as a skeleton for your own Java projects. By simply changing the groupId and artifactId of the project’s pom.xml file and including any additional required dependencies, you have a very quick mechanism to start building your own projects.

It is also worth mentioning that a ZIP or fat JAR file assembly is created for many of the Java project examples; you can find it in the appropriate target subfolder. A fat JAR file assembly is simply a JAR file that also contains all of the project's dependencies, so that you have a single-file artifact. For example, when you compile the restserver-client examples, the following assemblies are created:

§ restserver-client-query/target/restserver-client-query-1.0-example.jar

§ restserver-client-remove/target/restserver-client-remove-1.0-example.jar

§ restserver-client-retrieve/target/restserver-client-retrieve-1.0-example.jar

§ restserver-client-store/target/restserver-client-store-1.0-example.jar
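After a build, you can locate these assemblies without remembering each path. The helper below is a hypothetical sketch, find_example_jars, that relies only on the -example.jar naming shown in the list above.

```shell
# Hypothetical helper: list the "-example.jar" fat JAR assemblies that
# mvn package leaves in each project's target subfolder, under a directory
# (default: the current one).
find_example_jars() {
    find "${1:-.}" -type f -path '*/target/*-example.jar' | sort
}
```

Running find_example_jars from book-code/chapters/integration/restserver-client after a build should list the four JARs above. Because a JAR is a ZIP archive, unzip -l restserver-client-query/target/restserver-client-query-1.0-example.jar (or jar tf, which ships with the JDK) will show the bundled dependency classes. How to run each example is covered in its chapter.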