Couchbase Essentials (2015)

Chapter 2. Using Couchbase CRUD Operations

Couchbase Server has a vast and powerful key/value API. There are basic operations to read and write values. There are facilities for easy and quick modification of simple data types. There are also methods used to manage concurrency with locks. You'll even find advanced key/value API methods that allow you to verify persistence and replication. In this chapter, we're going to explore the key/value interface in detail.

In order to examine this API, you'll need to install one of the Couchbase SDKs. While the Couchbase Console provides tools to insert and update documents, it doesn't expose the Couchbase CRUD API to the user in any way. To get a full feel for the Couchbase key/value API, we're going to jump right into using an SDK.

The Couchbase SDKs

The Couchbase team supports a number of SDKs, also known as Couchbase client libraries. At the time of writing this book, there are official libraries for Java, .NET, PHP, Ruby, Python, C, and Node.js. There are also community-supported libraries for Perl, Erlang, Go, and other platforms.

In this chapter, we'll explore a few of these clients. You should install the library for the platform with which you are most comfortable. Many of the clients are available through package managers such as .NET's NuGet or Python's pip. Visit http://www.couchbase.com/communities to find instructions about installation. Each community has a Getting Started guide that details how to obtain your chosen SDK, as shown next:

[Figure: The Couchbase SDKs]

Getting a client up and running in your environment of choice is beyond the scope of this chapter. If you wish to follow along with the examples, then you should run through the Getting Started tutorial for your platform. In the final chapter, we'll work through building a to-do list application, where we'll explore SDK usage in more detail. If you get stuck, be sure to check out the community forums.

Basic operations

Couchbase Server's key/value API includes standard CRUD operations, and each of the SDKs contains corresponding CRUD methods. We'll begin our API exploration by demonstrating how to insert and retrieve a record from our default bucket. If you're following along, make sure you read the Getting Started guide's description of how to configure your client for use.

Connecting to your cluster

Before reading from or writing to a Couchbase Server bucket, you must first configure your client. The basic setup is consistent across all SDKs. You first connect to the cluster and then open a connection to a bucket, as follows:

var cluster = new Cluster();

var bucket = cluster.OpenBucket();

In the preceding C# snippet, the client assumes that the cluster is located on localhost (127.0.0.1) and that the bucket you're connecting to is named default. You can also set these values explicitly, like this:

var cluster = new Cluster("127.0.0.1");

var bucket = cluster.OpenBucket("default");

If you have multiple nodes in your cluster, you can supply multiple nodes when creating the cluster. If your bucket has a password, you can also specify that when opening the bucket:

var cluster = new Cluster("192.168.0.1", "192.168.0.2");

var bucket = cluster.OpenBucket("beer-sample", "b33rs@mpl3");

It's also possible to manage your cluster using SDKs. For example, if you want to create a bucket programmatically in .NET, you can use the ClusterManager class and its management APIs:

var mgr = cluster.CreateManager("Administrator", "password");

mgr.CreateBucket("beer-sample");

Creating and updating a record

With any database system, create is the CRUD operation you'll generally use first (assuming you have no data yet). There are a couple of different methods for creating a record in a Couchbase bucket, the simplest of which is add. The add method takes a key and a corresponding value and inserts the pair into your bucket:

client.add("message", "Hello, Couchbase World!")

The preceding Python snippet adds a record with a message key and a value of Hello, Couchbase World!. If no record with the message key exists when you run this code, the record is created. If you run the same code again, you'll receive an error, because the add method fails when asked to write a value to an existing key.

If you want to update the message record, then you should use the replace method. This method performs an update to a document with an existing key. The following Python snippet demonstrates how to use this method:

client.replace("message", "Hello, Couchbase SDK World!")

In the preceding example, the Hello, Couchbase World! value is replaced with Hello, Couchbase SDK World!, leaving a document with a message key and a Hello, Couchbase SDK World! value. Similar to add, the replace method fails if you try to update a record using a key that does not exist.

You might be wondering how to work around these potential failures. Fortunately, Couchbase provides a third CRUD operation called set. The set operation behaves as a combination of both add and replace. If you try to set a record with a key that does not exist, set performs an add operation. If you try to set a value for a key that does exist, set performs a replace operation:

client.set("message", "Hello, Couchbase World!")

You'll find that the set method is generally the easiest option. However, there will be occasions where add or replace makes more sense. For example, using add instead of set lets you key records by a user's nickname without worrying that a collision will wipe out an existing record.
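To make the three write semantics concrete, here is a minimal sketch in Python. It is a hypothetical in-memory model, not the Couchbase SDK: InMemoryBucket, KeyExistsError, and KeyNotFoundError are names invented for illustration, and a real client reports these failures through its own exception or status types.

```python
class KeyExistsError(Exception):
    """Raised by add when the key is already present (hypothetical)."""


class KeyNotFoundError(Exception):
    """Raised by replace when the key is absent (hypothetical)."""


class InMemoryBucket:
    """Hypothetical in-memory stand-in for a bucket; not a Couchbase SDK class."""

    def __init__(self):
        self._data = {}

    def add(self, key, value):
        # add only creates: it refuses to overwrite an existing key
        if key in self._data:
            raise KeyExistsError(key)
        self._data[key] = value

    def replace(self, key, value):
        # replace only updates: it refuses to create a missing key
        if key not in self._data:
            raise KeyNotFoundError(key)
        self._data[key] = value

    def set(self, key, value):
        # set behaves as add for new keys and as replace for existing ones
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


bucket = InMemoryBucket()
bucket.add("message", "Hello, Couchbase World!")
bucket.replace("message", "Hello, Couchbase SDK World!")
bucket.set("greeting", "Hi")  # works whether or not "greeting" exists
```

The nickname scenario maps onto add: a second user registering the same nickname gets KeyExistsError instead of silently overwriting the first user's record.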

For bulk operations, some SDKs support a multi_set operation. When using this method, you supply a dictionary structure instead of a single key and value. The client SDK determines which node owns each key and sends the keys and values to those nodes in parallel, where they are processed concurrently. As a result, a single multi_set call will almost always be faster than the equivalent series of individual set operations:

messages = { "Alice" : "Hello!", "Bob" : "Cheers!" }

client.multi_set(messages)

The Python snippet we just saw demonstrates writing multiple keys to the server in a single call. At the time of writing this book, not all SDKs support multi_set, though support should be on the roadmaps of those that don't.
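The fan-out that multi_set performs can be sketched in Python as follows. This is a simplified model under stated assumptions: NODES, node_for_key, and the send_batch callback are hypothetical, and the CRC32-modulo mapping is only a stand-in for the real scheme, in which SDKs hash keys to vBuckets and look up the owning node in the cluster map.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members


def node_for_key(key):
    """Pick an owning node; a stand-in for the SDK's vBucket/cluster map."""
    return NODES[zlib.crc32(key.encode("utf-8")) % len(NODES)]


def group_by_node(items):
    """Split a {key: value} dict into one batch per owning node."""
    batches = defaultdict(dict)
    for key, value in items.items():
        batches[node_for_key(key)][key] = value
    return dict(batches)


def multi_set(items, send_batch):
    """Send every node its batch concurrently and collect each node's reply.

    send_batch(node, batch) is a hypothetical transport callback.
    """
    batches = group_by_node(items)
    with ThreadPoolExecutor(max_workers=max(len(batches), 1)) as pool:
        futures = {node: pool.submit(send_batch, node, batch)
                   for node, batch in batches.items()}
        return {node: future.result() for node, future in futures.items()}


def print_batch(node, batch):  # hypothetical transport used for the demo
    print(node, sorted(batch))
    return len(batch)


multi_set({"Alice": "Hello!", "Bob": "Cheers!"}, print_batch)
```

Grouping first means each node receives one request no matter how many of its keys are in the dictionary, and the per-node requests overlap in time instead of running back to back.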

Reading and deleting records

Reading a value from the server is performed by providing a key for the Get command. If the key exists on the server, Get will return the value. If the key doesn't exist, then the SDK will return either its language's version of null (for example, None in Python or nil in Ruby) or a wrapper around the result, which is the case with .NET and Java:

var result = bucket.Get<string>("message");

The preceding C# snippet demonstrates retrieving a record from the server and assigning it to a local variable. In this case, the result variable will be of the IOperationResult<T> type. It will contain properties that indicate whether the operation succeeded as well as the value itself:

if (result.Success)

{

Console.WriteLine(result.Value);

}

When using SDKs from one of the strongly-typed platforms (for example, .NET or Java), you'll likely want to cast the value to a specific type. The C# Get example we just saw sets the generic type parameter to a string and tells the client to treat the stored object as a .NET string.

It's important to know the type of data you've stored with a particular key. If you try to cast the result of a Get operation to the wrong data type, your SDK will likely raise a cast exception of some sort. In the .NET client, if you supply an incorrect generic type parameter, then InvalidCastException will be thrown:

var result = bucket.Get<int>("message");

The .NET client will catch the exception in this case. The caught exception is available in the Exception property of the result variable. The Success property will also be set to false, allowing you to react to the exception:

if (!result.Success && result.Exception != null)

{

Console.WriteLine(result.Exception.Message);

}

After the assignment in the previous example completes, the Value property of the result variable will be zero (the default value for integers in .NET). When a reference type, such as a class, is supplied as the generic type parameter, Value will be null (the default for reference types). As such, checking whether Value is null (or zero) is not sufficient to know whether the key was found; check the Success property instead.
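The pitfall is easy to reproduce outside .NET. The following Python sketch mimics the result-wrapper pattern with a hypothetical OperationResult class and a hypothetical typed_get helper; the explicit default argument plays the role of .NET's default(T). Neither name is part of any Couchbase SDK.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class OperationResult:
    """Hypothetical result wrapper modeled on the Success/Value/Exception trio."""
    success: bool
    value: Any
    exception: Optional[Exception] = None


def typed_get(store, key, expected_type, default):
    """Return a wrapped value, falling back to `default` on any failure."""
    if key not in store:
        return OperationResult(False, default)
    raw = store[key]
    if not isinstance(raw, expected_type):
        error = TypeError(f"{key!r} holds {type(raw).__name__}, "
                          f"not {expected_type.__name__}")
        return OperationResult(False, default, error)
    return OperationResult(True, raw)


store = {"message": "Hello, Couchbase World!", "zero": 0}
wrong_type = typed_get(store, "message", int, 0)  # like Get<int>("message")
real_zero = typed_get(store, "zero", int, 0)
# Both values are 0, so only the success flag tells the two cases apart
print(wrong_type.value, wrong_type.success)  # 0 False
print(real_zero.value, real_zero.success)    # 0 True
```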

Because Couchbase Server does not explicitly define data types for your records, your SDK will decide what type it should serialize and deserialize values to. Cast and use type methods carefully to avoid errors in your application.

Tip

You should be aware that some clients can raise a "not found" error instead of returning null. However, this is not the typical behavior: it generally must be enabled explicitly, and most SDKs don't expose it at all. With the Python and Ruby clients, you can enable or disable "not found" exceptions by passing a quiet parameter to the get method.

There is also a variant of the Get operation that allows you to retrieve multiple values at once by providing multiple keys. When you use Get in this way, the SDKs will return a sort of dictionary structure where each of the keys in the dictionary will be the keys for which you requested values. The values of the dictionary will be the values from those keys on the server, or null if no values are found:

bucket.Insert("artist", "Arcade Fire");

bucket.Insert("album", "Funeral");

bucket.Insert("track", "Neighborhood #1 (Tunnels)");

var keys = new List<string> { "artist", "album", "track"};

var results = bucket.Get<string>(keys);

foreach (var key in keys)

{

Console.WriteLine(results[key].Value);

}

The preceding C# snippet demonstrates how to read multiple keys at once and iterate over the resulting IDictionary object. The exact data structure returned by the SDK will, of course, vary according to the language you use, but it will be an iterable key/value structure.

The multi-get operation is implemented in the SDKs using parallel operations. More precisely, the client figures out which keys are on which servers, and then makes concurrent requests to each server. The client then returns the unified map object. This concurrency almost always means that it is more efficient to request many keys at once, as opposed to performing many individual Get operations serially.
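A client-side multi-get can be approximated with a thread pool. In this Python sketch, get_one is a hypothetical single-key fetch that returns None for a missing key, and the pool issues one request per key rather than one per server, which simplifies the per-node batching described above. The result has the same shape as the SDK's map: every requested key appears, with None where nothing was stored.

```python
from concurrent.futures import ThreadPoolExecutor


def multi_get(keys, get_one, max_workers=8):
    """Fetch all keys concurrently and merge them into one dictionary.

    get_one(key) is a hypothetical single-key fetch returning None when missing.
    """
    keys = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so values line up with keys
        values = list(pool.map(get_one, keys))
    return dict(zip(keys, values))


server = {"artist": "Arcade Fire", "album": "Funeral",
          "track": "Neighborhood #1 (Tunnels)"}
results = multi_get(["artist", "album", "track", "label"], server.get)
for key, value in results.items():
    print(key, value)  # "label" prints None because it was never stored
```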

To remove a key from the server, pass that key to your SDK's delete operation. In the .NET SDK, this operation is called Remove:

bucket.Remove("message");

Advanced CRUD operations

The basic CRUD operations we've just seen are fairly straightforward and mimic what you'd expect to see in a relational system. As a key/value store, however, Couchbase provides a handful of additional, unique CRUD operations.

Temporary keys

As a descendant of the in-memory-only Memcached, Couchbase supports a set of operations you might not expect to see in a persistent store. Specifically, each of the CRUD methods outlined allows an expiry date to be provided. When set, this "time to live" option will be used to trigger the removal of a key by the server.

It is common in relational systems to have tables with expiration date columns. In this case, the expiry date is likely a flag to be used by a scheduled task that cleans old records. Couchbase Server allows you to achieve this very functionality without the need for a scheduled task or additional properties in the stored value.

To create a key with an expiry date, you can use either the set or add operation. You'll use these methods just as you used them previously, but you'll provide the additional "time to live" argument. In the following Python snippet, the key is set to expire in 1 hour:

client.set("message", "Goodbye, Couchbase World!", ttl = 3600)

How the expiry flag is set will vary by client, but it is commonly an integer value. In the case of .NET, it is set using .NET date and time structures.
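For clients that take an integer, the value typically follows the convention Couchbase inherited from Memcached: 0 means no expiry, values up to 30 days (2,592,000 seconds) are treated as relative offsets, and larger values are treated as absolute Unix timestamps. The absolute_expiry helper here is a hypothetical sketch of that rule; check your server and SDK documentation before relying on it.

```python
THIRTY_DAYS = 30 * 24 * 60 * 60  # 2,592,000 seconds


def absolute_expiry(ttl, now):
    """Translate a Memcached-style expiry value into an absolute Unix time.

    Returns None when the key should never expire.
    """
    if ttl == 0:
        return None              # no expiry
    if ttl <= THIRTY_DAYS:
        return now + ttl         # relative: seconds from now
    return ttl                   # absolute: already a Unix timestamp


now = 1_420_070_400  # 2015-01-01T00:00:00Z
print(absolute_expiry(3600, now))         # 1420074000
print(absolute_expiry(now + 86400, now))  # 1420156800
```

A practical consequence of this rule: passing, say, 31 days as a relative number of seconds would be read as a timestamp in January 1970, so the key would expire immediately.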

You might wish to cause your keys to expire based on when they were last accessed. Using touch operations, you are able to achieve this sort of sliding expiry for your keys. The standard Get operation includes a time-to-live option. When you include a value for this parameter, you reset (or set) the time-to-live for the key:

client.get("message", ttl = 3600)

This Python snippet resets the expiry on the message key to 1 hour from when the Get operation is performed. If you wish to extend the life of a key without returning its value, use the touch operation, shown here in Python:

client.touch("message", ttl = 3600)
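The following Python sketch models sliding expiry in memory so the timing is easy to follow. SlidingExpiryStore is hypothetical, takes relative TTLs in seconds (0 meaning no expiry), and uses an injectable clock so the example runs instantly; it shows how get-with-ttl and touch both push the deadline forward.

```python
class SlidingExpiryStore:
    """Hypothetical in-memory model of TTLs with get-and-touch and touch."""

    def __init__(self, clock):
        self._clock = clock
        self._data = {}  # key -> [value, deadline or None]

    def _deadline(self, ttl):
        return self._clock() + ttl if ttl else None  # ttl 0 means never expire

    def _live_entry(self, key):
        entry = self._data.get(key)
        if entry and entry[1] is not None and self._clock() >= entry[1]:
            del self._data[key]  # expired keys behave as deleted
            return None
        return entry

    def set(self, key, value, ttl=0):
        self._data[key] = [value, self._deadline(ttl)]

    def get(self, key, ttl=None):
        entry = self._live_entry(key)
        if entry is None:
            return None
        if ttl is not None:
            entry[1] = self._deadline(ttl)  # get-and-touch resets the TTL
        return entry[0]

    def touch(self, key, ttl):
        entry = self._live_entry(key)
        if entry is None:
            return False
        entry[1] = self._deadline(ttl)  # reset the TTL without returning the value
        return True


now = [0]
store = SlidingExpiryStore(clock=lambda: now[0])
store.set("message", "Goodbye, Couchbase World!", ttl=3600)
now[0] = 3000
store.get("message", ttl=3600)  # read and push the deadline to 6600
now[0] = 6000
print(store.get("message"))     # Goodbye, Couchbase World!
now[0] = 6600
print(store.get("message"))     # None
```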

Appending and incrementing data

Couchbase Server also provides the ability to append or prepend additional data to (typically) string values. These operations are useful to store data structures such as delimited lists of values. Consider a key that stores tags for a blog post. One option would be to serialize and deserialize a list structure through your SDK:

tags = ["python", "couchbase", "nosql"]

client.set("tags", tags)

saved_tags = client.get("tags")

While this option would certainly work, it requires additional work to update data. You'd need to retrieve the record, update it in memory, and then write it back to the server. Moreover, you'd likely need a locking operation to ensure that the list hasn't changed since you retrieved it.

Another possibility is to use the append operation. With the append operation, you can push data to the end of a key's value. The concatenation takes place on the server, which means you don't have to manipulate the existing value first. The following Python snippet demonstrates the usage of append. In this example, we're maintaining the list of tags as a simple, comma-delimited string:

client.set("tags", "python,couchbase,")

client.append("tags", "nosql,")

saved_tags = client.get("tags")

#saved_tags == "python,couchbase,nosql,"

Similarly, Couchbase supports a prepend operation to save data to the beginning of a key's value, as seen next in the Python snippet:

client.set("tags", "python,couchbase,")

client.prepend("tags", "nosql,")

saved_tags = client.get("tags")

#saved_tags == "nosql,python,couchbase,"
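A quick way to check the expected results is to model the two operations in memory. In this Python sketch, ConcatStore and parse_tags are hypothetical helpers; the rule that both operations fail on a missing key mirrors Memcached's behavior and is stated here as an assumption. Note that the trailing-comma convention leaves an empty last field, which the reader has to discard.

```python
class ConcatStore:
    """Hypothetical model of server-side append/prepend (not an SDK class)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def append(self, key, fragment):
        if key not in self._data:
            raise KeyError(key)  # append needs an existing value
        self._data[key] += fragment

    def prepend(self, key, fragment):
        if key not in self._data:
            raise KeyError(key)  # so does prepend
        self._data[key] = fragment + self._data[key]


def parse_tags(raw):
    """Split a comma-terminated list, dropping the empty trailing field."""
    return [tag for tag in raw.split(",") if tag]


store = ConcatStore()
store.set("tags", "python,couchbase,")
store.append("tags", "nosql,")
print(parse_tags(store.get("tags")))  # ['python', 'couchbase', 'nosql']
```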

Another useful operation is increment. This command updates an integer value on the server. Similar to prepend and append, incr lets you modify a key's value without first retrieving and modifying it in your client application. Incrementing a counter is the most common use of this feature:

client.set("counter", 1)

client.incr("counter") # counter == 2

client.incr("counter", 4) # counter == 6

The preceding Python sample shows that the default increment behavior is to add 1 to the existing value of the key. If you provide a value for the offset parameter, the key's value will be incremented by the offset. If you want to decrement a counter, you can provide a negative offset value:

client.incr("counter", -1)

There is also a decrement operation, and it can be used instead of a negative offset with increment:

client.decr("counter", 1)
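Counters follow the Memcached tradition of unsigned 64-bit integers, so the edges behave differently from ordinary integers: decrementing below zero stops at zero, and incrementing past the maximum wraps around. The Python sketch below models those rules, which Couchbase Server inherits from Memcached (an assumption worth verifying against your server version); CounterStore is a hypothetical in-memory class, not an SDK API.

```python
MAX_U64 = 2 ** 64


class CounterStore:
    """Hypothetical model of server-side counters (unsigned 64-bit)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def incr(self, key, delta=1):
        if delta < 0:
            return self.decr(key, -delta)  # negative offset acts as a decrement
        value = (self._data[key] + delta) % MAX_U64  # wraps on overflow
        self._data[key] = value
        return value

    def decr(self, key, delta=1):
        value = max(self._data[key] - delta, 0)  # never drops below zero
        self._data[key] = value
        return value


counters = CounterStore()
counters.set("counter", 1)
counters.incr("counter")        # 2
counters.incr("counter", 4)     # 6
counters.incr("counter", -1)    # 5
counters.decr("counter", 10)    # clamps at 0
```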

Storing complex types

So far, we've limited our exploration primarily to simple data types such as strings and integers. In a real application, you're more likely to have business objects or other complex types that you will need to store. To Couchbase Server, the values that you store are nothing more than byte arrays. Therefore, the SDKs are able to use their respective language's binary serializer (often called a transcoder) to store any data structures.

Consider an application that stores information on a user profile. In .NET, you might have a data object that looks like this:

public class UserProfile

{

public string Username { get; set; }

public string Email { get; set; }

}

When you use the .NET client to save an instance of the UserProfile class in Couchbase Server, it will be serialized using .NET's default binary serializer. Couchbase Server, of course, knows nothing about a client platform's serialization format. It will simply store the byte array it received from the client:

var userProfile = new UserProfile { Username = "jsmith", Email = "js@asdf.com" };

bucket.Upsert(userProfile.Username, userProfile);

In the preceding snippet, an instance of the UserProfile class is saved with its key set to the user's username. To retrieve that instance, simply use the Get operation we've already seen. This time, the SDK's transcoder will return an instance of UserProfile as the Value property of the result variable:

var result = bucket.Get<UserProfile>("jsmith");

Recall that if the value stored under the jsmith key is not an instance of UserProfile, the operation will fail: Success will be false, and the result's Exception property will hold an InvalidCastException.

It is important to note that platform-specific serializers may not be compatible between SDKs. Imagine you have the following Python class (full class definition omitted for brevity):

class UserProfile:

    @property
    def username(self):
        pass

    @property
    def email(self):
        pass

If you tried to retrieve the .NET-serialized UserProfile object and deserialize it into an instance of the preceding Python class, you'd encounter an exception. Python and .NET have different binary serialization formats:

client.get("jsmith") #will likely break

There is a solution to the problem of hybrid systems where multiple clients need to access Couchbase Server data from multiple frameworks. We'll explore that solution when we start to work with Couchbase Server's document-oriented features. For now, we'll assume that we're using a single-client SDK environment.

It's also worth noting that Couchbase SDKs support custom transcoders. If you want to change the default serialization behavior for your SDK, implementing your own transcoder is the way to achieve this goal. For example, if you want to force all of the data to be stored as JSON, a custom transcoder can solve this problem. You can also use the data_passthrough parameter in certain SDKs, which will force all values to be returned as raw bytes.
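As a sketch of the idea, here is a minimal JSON transcoder in Python. The encode/decode shape and the FMT_JSON flag value are assumptions made for illustration; each SDK defines its own transcoder interface and flag constants. What is real is that Couchbase Server stores a small flags field alongside each value, and SDKs use it to record how the bytes were encoded.

```python
import json

FMT_JSON = 0x02  # hypothetical flag value chosen for this sketch


class JsonTranscoder:
    """Hypothetical transcoder that stores every value as UTF-8 JSON."""

    def encode(self, value):
        # Compact separators keep the stored payload small
        payload = json.dumps(value, separators=(",", ":")).encode("utf-8")
        return payload, FMT_JSON

    def decode(self, payload, flags):
        if flags != FMT_JSON:
            raise ValueError(f"unsupported format flags: {flags:#x}")
        return json.loads(payload.decode("utf-8"))


transcoder = JsonTranscoder()
payload, flags = transcoder.encode({"username": "jsmith", "email": "js@asdf.com"})
print(payload)  # b'{"username":"jsmith","email":"js@asdf.com"}'
profile = transcoder.decode(payload, flags)
```

Because the stored bytes are plain JSON rather than a platform-specific binary format, a client on another platform using an equivalent transcoder could read the same value, which is the interoperability property the default binary serializers lack.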

Concurrency and locking

While the Couchbase SDKs have been written to be thread-safe, your Couchbase applications must still consider concurrency. Whether two users or two threads are attempting to modify the same key, some form of locking is necessary to prevent stale writes. Couchbase Server supports both pessimistic and optimistic locking.

The CRUD operations we've seen so far do not make use of any locking. To see why this is a problem, consider the following C# code:

public class Story

{

public String Title { get; set; }

public String Body { get; set; }

public List<String> Comments { get; set; }

}

var story = bucket.Get<Story>("story_slug").Value;

story.Comments.Add("Nice Article!");

bucket.Replace<Story>("story_slug", story);

Now suppose that, in the moments between the Get and Replace calls in the preceding code, the following code ran on another thread (that is, another web request):

var story = bucket.Get<Story>("story_slug").Value;

story.Comments.Add("Great writing!");

bucket.Replace<Story>("story_slug", story);

In this scenario, both clients received the same initial Story value. The second client writes its new comment back to the bucket, but that write is overwritten as soon as the first client completes its Replace. The Great writing! comment is lost. Fortunately, the Couchbase API does provide a mechanism to prevent this situation from occurring.
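The race is easier to see as a fixed sequence of steps. This Python sketch replays the interleaving deterministically against a plain dictionary standing in for the bucket; copy.deepcopy models each request deserializing its own copy of the story. The function name simulate_lost_update is hypothetical.

```python
import copy


def simulate_lost_update():
    """Replay the interleaving above against a dict standing in for the bucket."""
    bucket = {"story_slug": {"title": "Story", "comments": []}}

    # Both requests read (and deserialize) the same version of the story
    first = copy.deepcopy(bucket["story_slug"])
    second = copy.deepcopy(bucket["story_slug"])

    # The second request finishes its read-modify-write first
    second["comments"].append("Great writing!")
    bucket["story_slug"] = second

    # The first request then writes its stale copy back, clobbering the update
    first["comments"].append("Nice Article!")
    bucket["story_slug"] = first

    return bucket["story_slug"]["comments"]


print(simulate_lost_update())  # ['Nice Article!']
```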

In traditional relational applications, a common pattern is to include a timestamp column on tables where stale records should not be updated without first retrieving the most recent write of a row. When this approach is used, the UPDATE statement includes the timestamp in the WHERE clause:

UPDATE Story

SET Title = 'New Title',

Timestamp = @NewTimeStamp

WHERE ID = 1 AND Timestamp = @CurrentTimestamp;

In the preceding SQL statement, the update will not occur unless the row's current timestamp value is provided for the @CurrentTimestamp parameter. With Couchbase Server, you are able to use CAS (short for compare and swap) operations to provide the same optimistic locking.

CAS operations take advantage of the fact that the server stores a unique 64-bit integer, known as a CAS value, with each key and changes it on every mutation of that key. CAS operations work by disallowing a key's mutation if the provided CAS value doesn't match the server's current value. You could think of CAS as acting like a version control system: if you try to commit a patch without first getting the latest revision, your commit fails. However, Couchbase does not keep a history of revisions for each CAS; it simply prevents stale writes:

var result = bucket.Get<Story>("story_slug");

var story = result.Value;

story.Comments.Add("Awesome!");

var resp = bucket.Replace<Story>("story_slug", story, result.Cas);

In the preceding C# example, the result variable is returned from the client by way of its Get method. This object contains both the stored object and the current CAS value from the server. That CAS value is passed to the Replace method. If another thread updates the story_slug key after the Get method is called, the Replace call will not mutate the value. The response from the attempt will include the status of the operation:

if (resp.Success)

{

//operation success

}

else if (resp.Status == ResponseStatus.KeyNotFound) {

//key does not exist, use add instead

}

else if (resp.Status == ResponseStatus.KeyExists)

{

//key exists, but CAS didn't match

//call Get again and retry

}

In this example, you can see that the C# client provides the three possible outcomes for a CAS operation. If the CAS is the same, the mutation occurs. If the key is not found, an insert operation should be performed. If the CAS is different, the mutation is stopped. The question that follows then is, how do you handle a CAS mismatch?

In the simplest case, you'd simply retry your Get and Replace operations, hoping that the CAS value you've obtained is now current. However, a more robust solution is to employ some sort of retry loop:

for (var i = 0; i < 5; i++) {

var result = bucket.Get<Story>("story_slug");

var story = result.Value;

story.Comments.Add("Awesome!");

var resp = bucket.Replace<Story>("story_slug", story, result.Cas);

if (resp.Success) break;

}

The advantage of this sort of locking is that it is optimistic, meaning that the server doesn't employ any locking of its own. One 64-bit integer is compared to another. If they match, the values for a key are swapped. This operation has virtually no impact on performance. However, it does make room for the possibility that a thread may never acquire a current CAS. If such a situation is unacceptable, Couchbase Server provides a pessimistic locking option.
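Stripped of SDK details, the optimistic scheme is small enough to model directly. The Python sketch below is hypothetical: CasStore, its exceptions, and add_comment are invented names, and a process-local counter stands in for the server-generated CAS. It shows the check that rejects a stale write and the bounded retry loop from the C# example.

```python
import itertools


class CasMismatchError(Exception):
    """The key exists, but its CAS changed since it was read."""


class KeyNotFoundError(Exception):
    """The key does not exist, so there is nothing to replace."""


class CasStore:
    """Hypothetical in-memory model of CAS checks (not an SDK class)."""

    def __init__(self):
        self._data = {}                    # key -> (value, cas)
        self._cas_source = itertools.count(1)

    def upsert(self, key, value):
        cas = next(self._cas_source)       # every mutation gets a fresh CAS
        self._data[key] = (value, cas)
        return cas

    def get(self, key):
        return self._data[key]             # (value, cas)

    def replace(self, key, value, cas):
        if key not in self._data:
            raise KeyNotFoundError(key)
        if self._data[key][1] != cas:
            raise CasMismatchError(key)    # stale read: refuse the write
        return self.upsert(key, value)


def add_comment(store, key, comment, attempts=5):
    """Optimistic read-modify-write with a bounded retry loop."""
    for _ in range(attempts):
        story, cas = store.get(key)
        updated = dict(story, comments=story["comments"] + [comment])
        try:
            store.replace(key, updated, cas)
            return True
        except CasMismatchError:
            continue                       # someone else won; re-read and retry
    return False


store = CasStore()
store.upsert("story_slug", {"comments": []})
add_comment(store, "story_slug", "Awesome!")
```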

The getl (or get and lock) operation allows you to obtain a read/write lock on a key for up to 30 seconds. While you hold the lock, no other clients or threads will be able to modify the key. You consume getl in a manner similar to the CAS operations. When you request a lock, you're provided a CAS with which only your client will be able to update the key:

var result = bucket.GetWithLock<Story>("story_slug", TimeSpan.FromSeconds(10));

var story = result.Value;

story.Comments.Add("Good stuff!");

bucket.Replace<Story>("story_slug", story, result.Cas);

The preceding C# code demonstrates how a client may acquire an exclusive lock on a key. In this case, the lock will expire in 10 seconds. Clients who attempt to read or write to this key will receive an error. In this example, the lock will be released once the CAS operation is performed.

Rather than waiting for an expiry or a CAS operation, it is also possible to explicitly unlock a key. Generally speaking, a CAS operation is likely to be your primary means of unlocking a key. However, there will be times when some condition in your code leads to a path where the locked document shouldn't be mutated. In those cases, it's more efficient to unlock the document rather than wait for the timeout:

var result = bucket.GetWithLock<Story>("story_slug", TimeSpan.FromSeconds(10));

if (result.Value.IsCommentingClosed)

{

bucket.Unlock("story_slug", result.Cas);

}

else

{

result.Value.Comments.Add("Couchbase is fast!");

bucket.Replace<Story>("story_slug", result.Value, result.Cas);

}

This C# code demonstrates retrieving a key, checking whether the value should be modified, and then deciding how to proceed. In this example, we're checking whether commenting is closed for a story. If it is, we won't accept a new comment, so we release the lock immediately rather than wait for its 10-second timeout to expire.
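The get-and-lock flow can also be modeled in a few lines. This Python sketch is hypothetical and simplified: LockingStore, LockedError, and CasMismatchError are invented names, the 30-second cap mirrors the limit described earlier, an injectable clock stands in for real time, and only mutations (not reads) are checked against the lock.

```python
class LockedError(Exception):
    """The key is locked, or unlock was called without the lock's CAS."""


class CasMismatchError(Exception):
    """The supplied CAS does not match the key's current CAS."""


class LockingStore:
    """Hypothetical model of get-and-lock with expiry, CAS release, and unlock."""

    MAX_LOCK_SECONDS = 30

    def __init__(self, clock):
        self._clock = clock
        self._data = {}    # key -> (value, cas)
        self._locks = {}   # key -> (lock_cas, expires_at)
        self._last_cas = 0

    def _new_cas(self):
        self._last_cas += 1
        return self._last_cas

    def _active_lock(self, key):
        lock = self._locks.get(key)
        if lock and self._clock() < lock[1]:
            return lock[0]
        self._locks.pop(key, None)  # absent or expired
        return None

    def upsert(self, key, value):
        if self._active_lock(key) is not None:
            raise LockedError(key)
        self._data[key] = (value, self._new_cas())

    def get_with_lock(self, key, seconds):
        if self._active_lock(key) is not None:
            raise LockedError(key)
        value, _ = self._data[key]
        cas = self._new_cas()  # only the lock holder learns this CAS
        self._data[key] = (value, cas)
        expires = self._clock() + min(seconds, self.MAX_LOCK_SECONDS)
        self._locks[key] = (cas, expires)
        return value, cas

    def replace(self, key, value, cas):
        if self._active_lock(key) not in (None, cas):
            raise LockedError(key)
        if self._data[key][1] != cas:
            raise CasMismatchError(key)
        self._data[key] = (value, self._new_cas())
        self._locks.pop(key, None)  # a successful CAS write releases the lock

    def unlock(self, key, cas):
        if self._active_lock(key) != cas:
            raise LockedError(key)
        del self._locks[key]


now = [0]
store = LockingStore(clock=lambda: now[0])
store.upsert("story_slug", {"comments": []})
story, cas = store.get_with_lock("story_slug", 10)
store.replace("story_slug", {"comments": story["comments"] + ["Good stuff!"]}, cas)
```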

When deciding between a CAS operation and a getl operation, consider whether you want other threads to be blocked from reading the locked key. If you do, GetWithLock is required. More often, a CAS operation is the safer choice in terms of performance and side effects.

Asynchronous operations

One of the primary reasons for the growth of Couchbase is its massive scalability. Few databases come close to the performance offered by Couchbase Server. Any system that is capable of handling millions of operations per second across a small cluster of nodes will have to deal with concurrency issues at some point.

Traditionally, servers dealt with concurrency by spinning up threads to handle multiple requests simultaneously. However, as load increases on such systems, the overhead of creating and maintaining threads becomes quite expensive in terms of CPU and memory.

Couchbase Server makes use of nonblocking I/O libraries to scale without spinning up a thread or process for every request. In a nutshell, nonblocking I/O makes heavy use of asynchronous callbacks to avoid blocking the receiving thread.

In other words, the thread that receives the request will only delegate the work to be done, and later receive a notification when that work is done. This pattern of handling concurrency is popular in modern servers and frameworks, including Node.js and the nginx web server.

All the operations covered so far are blocking. In other words, when your client sends a command to Couchbase Server, the calling thread is blocked until that operation completes. It is common to use Couchbase in a fire-and-forget fashion, and blocking calls slow this process down.

Some (but not all) clients support asynchronous operations. Clients such as Ruby and Node.js are built on top of the C library, which is fully asynchronous, so they are able to piggyback on its implementation. The fully managed Java library supports asynchronous operations using Java Futures.

We won't explore the asynchronous operations in detail, as they are effectively similar to the operations we've already seen. The following Ruby snippet gives you a taste of how you'd use such a method:

client.run do |c|

c.get("message") {|ret| puts ret.value}

end

In this example, the client runs the get operation asynchronously. When the operation completes, the callback (in curly braces) is executed. The thread that called client.run was not blocked while waiting on the get call. Similarly, in Java, you can use the asynchronous versions of operations to make nonblocking calls to Couchbase Server:

String message;

GetFuture<Object> future = client.asyncGet("message");

message = (String)future.get(10, TimeUnit.SECONDS);

In this Java example, the client asynchronously retrieves the message key. The value of that key is then assigned back to the message variable with a wait timeout of 10 seconds. A try/catch block should wrap the future.get call, but was omitted for brevity.
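Python's asyncio offers a convenient way to see the same nonblocking pattern without a cluster. In this sketch, fake_async_get is a hypothetical coroutine that simulates network latency with asyncio.sleep; it is not a Couchbase API. Several gets are in flight at once, and asyncio.wait_for plays the role of the Java example's 10-second timeout.

```python
import asyncio

FAKE_DATA = {"message": "Hello, Couchbase World!"}  # hypothetical server contents


async def fake_async_get(key, latency=0.01):
    """Hypothetical nonblocking get: awaits simulated network latency."""
    await asyncio.sleep(latency)
    return FAKE_DATA.get(key)


async def main():
    # Start several gets at once; the event loop is free while they wait
    keys = ["message", "missing"]
    values = await asyncio.wait_for(
        asyncio.gather(*(fake_async_get(k) for k in keys)), timeout=10)
    return dict(zip(keys, values))


results = asyncio.run(main())
print(results)
```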

Durability operations

In Chapter 1, Getting Comfortable with Couchbase, you learned that Couchbase Server handles reads and writes in memory first and then writes changes asynchronously to disk. The standard CRUD operations we've seen so far make no distinction between a key being written to the cluster's memory and a key being persisted to disk.

If you've set up replication, you've likely guarded your data against loss from a single server failing before flushing the key to disk. However, there will be times when your business process cannot tolerate the possibility that a record did not persist. If you have such a requirement, Couchbase Server supports inclusion of durability requirements with your store requests.

These durability requirements are tunable to your specific needs. For example, you might wish to know whether a key was written to disk on its master node and replicated to the memory of at least two nodes. To use a durability check with the .NET client, you use the standard store method with additional arguments, as follows:

bucket.Upsert<string>("key", "value", PersistTo.One, ReplicateTo.Two);

The PersistTo argument specifies that the operation must return a failure if the key hasn't been persisted to disk on the master node within a timeout (globally configurable). The ReplicateTo option adds the requirement that the key must be copied to the memory of at least two nodes.

If your durability concern is only that the key is replicated, you can use the previous operation without the PersistTo argument. Similarly, you can check for persistence only by omitting the replication argument. Importantly, if any persistence option is set, success will occur only if the master node wrote the key to the disk. If the replica wrote a key to the disk somehow but the master died before it could do so, the store operation will fail.
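SDKs typically implement these checks by polling the cluster after the write until the requested state is observed or a timeout expires. The Python sketch below models that loop under stated assumptions: observe is a hypothetical callback returning (master_persisted, persisted_replicas, replicas_in_memory), and the timeout and polling interval are arbitrary example values rather than SDK defaults. It also encodes the rule above that any persistence requirement fails unless the master copy reached disk.

```python
import time


def durability_met(status, persist_to, replicate_to):
    """Check one observed status against the requested durability."""
    master_persisted, persisted_replicas, replicas_in_memory = status
    if persist_to > 0:
        # Any persistence requirement needs the master's copy on disk first
        if not master_persisted or 1 + persisted_replicas < persist_to:
            return False
    return replicas_in_memory >= replicate_to


def wait_for_durability(observe, persist_to=0, replicate_to=0, timeout=2.5,
                        interval=0.01, clock=time.monotonic, sleep=time.sleep):
    """Poll observe() until the requirements hold or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if durability_met(observe(), persist_to, replicate_to):
            return True
        if clock() >= deadline:
            return False  # report failure; the write may still exist in memory
        sleep(interval)


# Simulated observations: replication finishes first, then the master persists
observations = iter([(False, 0, 1), (False, 0, 2), (True, 0, 2)])
ok = wait_for_durability(lambda: next(observations), persist_to=1,
                         replicate_to=2, sleep=lambda _: None)
print(ok)  # True
```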

It might seem counterintuitive, but it is also possible to use durability requirements with delete methods. Like writes, delete operations are applied in memory first. Therefore, if you want to be sure that a key was also removed from disk, you should include a persistence requirement:

bucket.Remove("key", PersistTo.One);

The SDKs generally reuse their persistence enumerations in both store and delete operations. In the case of delete, PersistTo is perhaps more accurately thought of as RemoveFrom.

Use durability requirements with care if your application needs peak scale. Because much of Couchbase Server's performance depends on its heavy use of caching, blocking on disk writes will obviously introduce latency. Generally speaking, it's best to use durability requirements only when absolutely necessary; enabling replication in your cluster is usually more important.

Summary

In this chapter, we explored the Couchbase Server key/value API in detail. You saw that Couchbase supports the basic CRUD operations you'd expect of a database system, whether relational or nonrelational. We also examined operations that are unique to Couchbase: append and prepend, which add data to the end or beginning of a stored value on the server, and increment and decrement, which modify a numeric key's value in place.

You learned how Couchbase supports both pessimistic and optimistic locking as well as basic strategies to use both. We explored the ability to use durability checks and asynchronous methods to tweak the performance of our application. Most importantly, we got a taste of a few of the client SDKs and how they perform the various operations.

At this point, we've explored about 98 percent of the Couchbase key/value API. There are a few other legacy methods that you might encounter, depending on your SDK; for example, the flush operation is used to remove all records from a bucket. The key/value version of this method has been deprecated in favor of the cluster API version, which is performed over HTTP. However, you might find this method still accessible, given the backward compatibility with Memcached.

Though we omitted 2 percent of the available key/value operations in this chapter, 98 percent of the methods we looked at should cover 100 percent of your key/value requirements. Moreover, the design of your application may reveal that the basic CRUD operations and CAS are sufficient to meet your requirements.

In the next chapter, we're going to start exploring the document capabilities of Couchbase Server. As we do, you'll learn how it complements the key/value API you just learned about.