Introducing N1QL - Couchbase Essentials (2015)

Couchbase Essentials (2015)

Chapter 5. Introducing N1QL

After two chapters of exploring views and MapReduce, you might be wondering just how you would go about finding data in your bucket more easily. If you've worked with relational systems, you're likely used to being able to query your database ad hoc without having to create a stored procedure first. Having to write a view for one-off queries of your data probably seems less than optimal.

Fortunately, there is another option with Couchbase and it is known as N1QL (pronounced nickel). N1QL is a query language, reminiscent of SQL. Not only does it support ad hoc querying of your data, it also provides a means to perform joins and aggregation.

At the time of writing this book, N1QL is still in developer preview, though some Couchbase SDKs are starting to see support for it. We'll explore N1QL in some detail throughout this chapter, but keep in mind that with any prerelease product some changes are likely. However, the core concepts and interface are unlikely to see any drastic changes.

Installing N1QL

Couchbase Server does not ship with N1QL. It needs separate downloading and installation. As of Developer Preview 3, there are binaries for Windows, Red Hat Linux, and Mac OS X. You can find these packages at http://www.couchbase.com/nosql-databases/downloads#PreRelease. The package for your operating system will be in the form of a zipped archive.

Tip

It's important to remember that N1QL is still in developer preview and is not necessarily being kept in sync with all current Couchbase releases. While N1QL Developer Preview 3 does appear to work fine on Windows with Couchbase Server Community Edition 3.0.1, it will not work with Couchbase Server Enterprise Edition 3.0.2. If you are unable to complete the following tasks using the Enterprise Edition, you should try the Community Edition instead.

Once you've obtained the binaries for your system, you'll need to extract the contents. Expand (unzip) the archive to any location where you'll easily be able to get to via the command line. After extracting the files, navigate to that path and run the following command:

./cbq-engine -couchbase http://localhost:8091

This line attaches N1QL to your Couchbase Server instance. It will start a process that will listen on port 8093. After it starts, you should see output similar to this:

22:19:08.343741 Info line disabled false

22:19:08.348746 tuqtng started...

22:19:08.348746 version: v0.7.2

22:19:08.348746 site: http://localhost:8091

Once the cbq-engine is up and running, you're ready to start running N1QL queries against your buckets. The easiest way to do so is to run the command-line query interface, which can be found in the same directory as cbq-engine. You can run this tool using the following command:

./cbq http://localhost:8093

This command will open the cbq prompt where you are able to enter N1QL commands. In this chapter, we'll work with the beer-sample bucket. To prepare that bucket for queries, run this command from the cbq prompt:

cbq> CREATE PRIMARY INDEX ON beer-sample

With this step, you've actually created a view on the beer-sample bucket. If you open the Couchbase Console and navigate to the Views tab, you'll find a view named #primary contained within a ddl_#primary design document. Note that this design document will only appear under Production Views:

function (doc, meta) {

var stringToUtf8Bytes = function (str) {

var utf8 = unescape(encodeURIComponent(str));

var bytes = [];

for (var i = 0; i<str.length; ++i) {

bytes.push(str.charCodeAt(i));

}

return bytes;

};

emit([160, stringToUtf8Bytes(meta.id)], null);

}

This relatively compact map function creates an index on document keys. You might be wondering why N1QL won't use a simpler map function, such as the following. After all, this map function also creates an index on document keys. The important distinction between N1QL queries and view queries is that view queries use the standard HTTP REST API and N1QL has its own query processor:

function (doc, meta) {

emit(meta.id, null);

}

At this time, N1QL is simply taking advantage of the fact that indexes may be created using views. This coupling may very well change at some point. So, though the underlying facility to create indexes is the same between N1QL and MapReduce views, the actual querying is quite different and requires a different index key structure.

Having a primary index on a database simply means that primary keys have been indexed for use with N1QL. Just as in a relational database, if you were to query regularly by certain fields, it would be important to create indexes to avoid unnecessary scanning through documents.

For this chapter, we'll worry only about the primary index. If you want to experiment with creating other field indexes, the syntax is as follows. In this example, we create an index named by_abv on the abv field of documents in the beer-sample bucket:

CREATE INDEX by_abv ON beer-sample(abv)

With the primary index created, you're now ready to write your first N1QL query.

Tip

If you're using Windows and its standard command line, you'll want to remove./ from the beginning of command-line examples. If you are using PowerShell on Windows or a Linux variant, you do not need to make any changes.

Simple queries

N1QL queries are likely to feel somewhat familiar to you as the language is very much like SQL and other such query languages. To illustrate just how similar N1QL and SQL are, consider the following query:

SELECT *

FROM beer-sample

This basic N1QL query looks and feels like the equivalent SQL query, and it does what you would expect it to do—it retrieves all documents from the beer-sample bucket. Recall that documents in a Couchbase bucket are not contained in a second-level namespace. As such, there is no equivalent of a SELECT * FROM Table statement.

Instead, if you want to find all brewery documents, you can write a N1QL query similar to the map function you would write in the case of a view. In both the cases, you have to check the convention-based type property to identify a document's taxonomy:

SELECT *

FROM beer-sample

WHERE type == "brewery"

Similarly, we can apply an additional WHERE clause to filter the results by another property before they are returned, as follows:

SELECT *

FROM beer-sample

WHERE type = "brewery"

AND country = "United States"

Again, there is virtually no difference between SQL and this N1QL query, save for the double quotes around strings. Even though you've seen only a couple of snippets of N1QL, it should be obvious that the Couchbase team designed N1QL to be immediately accessible to developers experienced with more traditional and relational systems.

As is the case with SQL, it's generally a good idea to project data from your queries rather than returning all properties in a document (or columns in the case of an RDBMS). Doing so with N1QL requires only that you specify the property names with your SELECTstatement:

SELECT name

FROM beer-sample

WHERE type = "brewery"

AND country = "United States"

Ordering is also possible using familiar, SQL-like operations:

SELECT name

FROM beer-sample

WHERE type = "brewery"

ORDER BY name

Another common SQL task is limiting the number of results or skipping certain numbers of results before returning rows. Often, you'll do so to page results displayed on a client application. It is also possible to skip and limit with N1QL, as demonstrated in the following code. In this snippet, only 10 documents are returned, starting with the sixth. In other words, documents numbered six to 15 are returned:

SELECT *

FROM beer-sample

LIMIT 10 OFFSET 5

The default ordering of documents is based on the document's meta.id value. Recall that this is also the default ordering for a view where no explicit key is set (that is, emit(null, null);). To verify this ordering behavior, you can run the following query. This query uses the meta() function, which provides access to a document's metadata:

SELECT city, name, meta().id

FROM beer-sample

WHERE type = "brewery"

Should you wish to sort by a document's property, you may add the ORDER BY clause to your SELECT statement. Again, the syntax for this clause should be familiar to SQL developers, as shown in the following code. In this example, documents are sorted first in descending order by name, and then in ascending order by style:

SELECT *

FROM beer-sample

WHERE type = "beer"

ORDER BY name DESC, style

N1QL also provides a means to search for documents by document keys. This search is performed by adding a KEY (KEYS for multiple keys) clause to a SELECT statement. To search for a single item by key, you provide a single key:

SELECT *

FROM beer-sample

KEY "thomas_hooker_brewing"

Alternatively, you can search for multiple documents with multiple keys:

SELECT *

FROM beer-sample

KEYS [

"thomas_hooker_brewing-hooker_oktoberfest",

"thomas_hooker_brewing-thomas-hooker_irish_red_ale",

"thomas_hooker_brewing"

]

To remove duplicate results from a query result set, simply apply the DISTINCT keyword to the projected properties. For example, to retrieve the distinct set of countries with breweries, you can execute this query:

SELECT DISTINCT country

FROM beer-sample

WHERE type = "brewery"

Null or missing properties

Throughout our exercises in writing map functions, it was common to test properties before attempting to emit them to a view index. N1QL also provides the capability to check for null or missing properties.

In JavaScript map functions, to check whether a property contains a null value, you simply compare the value to null:

if (doc.property == null) // do something

Missing properties are not null; rather, they don't exist. In order to check for a missing property with JavaScript, you can compare it to the undefined literal string or simply apply the bang (!) operator to your check. Using the latter test will allow for both a null check and a missing check:

if (! doc.property) // do something

There are two separate operators in N1QL used to test null and missing property values. The first query in the following snippet tests whether the value for style is Null:

SELECT *

FROM beer-sample

WHERE type = "beer"

AND style IS NULL

The second tests whether the style property was omitted entirely from the beer document:

SELECT *

FROM beer-sample

WHERE type = "beer"

AND style IS MISSING

These sorts of tests are important, given the schema-less nature of Couchbase and other document databases. Most developers are used to the safety of a relational schema, but in the world of NoSQL, it's important to expect the unexpected!

String utilities

One of the more obvious limitations of querying views is the lack of a proper LIKE operator. Though we saw in Chapter 4, Advanced Views, that it is possible to emulate a query like "starts with", it is not as robust as SQL's LIKE operator.

Fortunately, N1QL addresses this limitation with its own LIKE operator. Similar to SQL, you define a search pattern with a wildcard that is specified by a % character. In the following snippet, all breweries with Boston in their name will be returned in the results:

SELECT *

FROM brewery-sample

WHERE type = "brewery"

AND name LIKE "%Boston%"

Other string operators exist to perform standard string transformations such as SUBSTR, LOWER, UPPER, and LENGTH. You can use these functions as you do in a JavaScript map function or with string operations in most frameworks:

SELECT *

FROM beer-sample

WHERE type = "brewery"

AND LOWER(name) ="thomas hooker brewing"

It's also possible to perform string concatenation using the double pipe (||) operator. You can use this operator to combine properties into a single projected property. If you want to combine the city, state, and postal code into a single value, you can write this query:

SELECT city || ", " || state || " " || code AS Address

FROM beer-sample

WHERE type = "brewery"

AND city IS NOT NULL

AND state IS NOT NULL

AND code IS NOT NULL

More accurately, in the beer-sample bucket, breweries without city, state, or code values are stored as empty strings, so the preceding query won't actually filter the data as you might expect. Instead, you will have check whether those properties have a nonempty string. Which test you perform will of course depend on how your documents are structured:

SELECT city || ", " || state || " " || code AS Address

FROM beer-sample

WHERE type = "brewery"

AND city != ""

AND state != ""

AND code != ""

Aggregation and math

Performing aggregation is also a familiar operation. To write a query to count the number of breweries by state, you use the built-in count aggregate function:

SELECT state, COUNT(*) AS Count

FROM beer-sample

WHERE type = "brewery"

GROUP BY state

As you might expect, N1QL supports mathematical and aggregate operations such as AVG, ROUND, MIN, MAX, and SUM. You can use these operations to perform calculations on either aggregated data or on projected columns. As another example of aggregation with N1QL, this query computes the average abv (alcohol by volume) of a brewery's beer brands:

SELECT brewery_id, AVG(abv) AS Average

FROM beer-sample

WHERE type = "beer"

AND abv != 0

GROUP BY brewery_id

Similarly, if you want to find the beer with the highest or lowest alcohol content, you can use the MAX or MIN function respectively. In the following snippet, the HAVING clause is added to the GROUP BY clause to filter the results:

SELECT name, MAX(abv) AS Strength

FROM beer-sample

WHERE type = "beer"

AND abv != 0

GROUP BY name

HAVING MAX(abv) > 5

The BETWEEN operator may also be used to query for documents with a property value within a range; for example, if we want to find beers with abv between 5 and 10, we can use this query:

SELECT *

FROM beer-sample

WHERE abv BETWEEN 5 AND 10

At the time of writing this book, the BETWEEN operator doesn't work with the AND operator. In order to test for abv and type, you will need to use the "greater than" (>) and "less than" (<) operators:

SELECT *

FROM beer-sample

WHERE abv> 5 AND abv< 10

AND type = "beer"

Of course, N1QL also supports standard arithmetic operators for multiplication, division, addition, and subtraction. You are able to use these operators in your projections as you could with SQL. The following snippet calculates the proof (twice the abv value) of each beer. Also note the use of the standard "greater than" operator. Of course, the "less than" operator is also supported:

SELECT abv * 2 AS proof

FROM beer-sample

WHERE type = "beer"

AND abv> 0

Tip

Note that N1QL is smart enough to know when a hyphenated property or bucket name is used in a query, and it won't confuse the query engine into attempting subtraction. There is no need to avoid such properties.

Complex structures

N1QL queries are not limited to simple data types such as strings and numbers. With N1QL, you are able to operate on JSON objects and arrays as you could with map functions written in JavaScript.

As a simple example of a nested object, consider the brewery documents in the beer-sample bucket. These documents have geo data contained in a nested object with the geo property:

{

"type": "brewery",

"geo": {

"lng": -72.1234,

"lat": 34.1234

}

}

The geo object contains properties for longitude and latitude. If you want to write a query to find a brewery's geo information, you can use the standard dot (.) notation, which is common with most modern object-oriented programming languages:

SELECT geo.lon, geo.lat

FROM beer-sample

WHERE type = "brewery"

Arrays are another common data structure in JSON documents. N1QL supports working with arrays in a few ways. The beer-sample database doesn't have much when it comes to interesting array data, but the brewery documents do contain an address property, which is an array. Unfortunately for this example, there is no more than a single address in any brewery document:

SELECT address[0]

FROM beer-sample

WHERE type = "brewery"

In this case, the result of the query will be the first address (in our case the only address) in each brewery document. N1QL also includes a few functions to work with arrays. If a beer document didn't contain a valid address array, the preceding query would break. A safer query should include a check for the length of the array:

SELECT address[0]

FROM beer-sample

WHERE type = "brewery"

AND address[0] IS NOT NULL

AND ARRAY_LENGTH(address[0]) > 0

It's also possible to use array slicing to achieve a similar result. The following snippet demonstrates how to select the first two addresses from a brewery document and ensure that those addresses are not missing. Note that the beer-sample database doesn't contain address data to satisfy this query:

SELECT address[0:2]

FROM beer-sample

WHERE type = "brewery"

AND address[0:2] IS NOT MISSING

There are also methods for combining and adding items to arrays, such as array_prepend, array_append, and array_concat. As their names suggest, these methods add elements to the beginning or end of an array, or combine two arrays into one.

Working with collections

N1QL provides a means to succinctly query collections within a document. Recall that to examine nested collections in a map function, you used to run a for loop over the items in that collection. To achieve similar results in N1QL queries, you can filter a collection using the ANY operator.

For example, if we continue to use the address property of brewery documents, we can search for only those addresses that are not empty. In the following example, we're checking the length of each address string as our condition. Note that if a document contained two addresses, where one was valid and another was an empty string, the condition would still be satisfied:

SELECT address

FROM beer-sample

WHERE type = "brewery"

AND ANY addr IN address

SATISFIES LENGTH(addr) > 0 END

With a very slight change to the query, we can modify the behavior so that instead of returning breweries with a mix of empty and valid addresses in the address array, we return only those documents where all addresses are valid. In this case, we change the ANYoperator to EVERY:

SELECT address

FROM beer-sample

WHERE type = "brewery"

AND EVERY addr IN address

SATISFIES LENGTH(addr) > 0 END

With EVERY, only documents with address arrays with nonempty entries will be included. This means that if an array contained a valid address (nonempty) and an invalid address (empty), it would be excluded from the results. Note that in the beer-sample database, there aren't any records that do not satisfy the preceding previous query. Again, all address records contain either a single address or an empty array.

Joins

N1QL does contain support to perform joins on documents with a caveat—the joins must be made across different buckets. While this is a limitation for several use cases, it does provide a means of putting data together from disparate document sources.

Since this chapter focuses on the beer-sample database, the following join next imagines a setup where beer and brewery documents are stored in two separate buckets named beers and breweries, respectively:

SELECT *

FROM beers AS b

JOIN breweries AS b2

KEYS b.brewery_id

SDK support

At the time of writing this book, SDK support for N1QL is somewhat limited. .NET, Java, PHP, and Node.js have experimental support for N1QL. Ruby and Python should see N1QL support in the future. Until both the N1QL framework and the SDKs are more locked down, it's worth keeping an eye out for changes. At this stage, we'll look briefly at a Java snippet that demonstrates how to use N1QL with the 2.0 SDK:

Observable<QueryResult> result = bucket.query("select * from beer-sample");

Notice that the N1QL language is reminiscent of working with SQL-oriented frameworks such as JDBC or ADO.NET.

If you're familiar with prepared statements in SQL, where you provide parameters as positional arguments to a query statement with placeholders, there are work items for the Couchbase N1QL team to provide support for these types of queries. The advantage of prepared statements is that the query optimizer doesn't have to reparse and replan the execution with each run of query that differs only by arguments.

Summary

This chapter introduced N1QL, a powerful and experimental Couchbase query language. It's important to understand that this was not an exhaustive introduction to N1QL; some stones were left unturned. In particular, there are several additional functions for working with dates, strings, and numeric values. However, we have seen the most important bits.

As we've seen, N1QL is a somewhat radical departure from the MapReduce view model. This new feature is not meant to replace MapReduce, but rather to create greater flexibility in accessing your data.

Couchbase Server is truly unique because it provides developers with so many options to access data. With three distinct models for accessing documents, developers are able to build applications the way they wish to build them. Some developers will stick to the tried-and-true key/value model, while other developers who enjoy the power of MapReduce are likely to stick to views. Nevertheless, newer Couchbase users who prefer cutting-edge technology will likely find the familiarity of N1QL appealing.

As we'll see in the next chapter, designing a schema for Couchbase means considering both key/value and document design. Adding N1QL to the mix does mean that some of the design considerations made for MapReduce may have to be rethought. However, with N1QL still in developer preview status, we'll consider MapReduce when discussing document schemas.