Node.js in Action (2014)

Part 2. Web application development with Node

Chapter 5. Storing Node application data

This chapter covers

· In-memory and filesystem data storage

· Conventional relational database storage

· Nonrelational database storage

Almost every application, web-based or otherwise, requires data storage of some kind, and the applications you build with Node are no different. The choice of an appropriate storage mechanism depends on five factors:

· What data is being stored

· How quickly data needs to be read and written to maintain adequate performance

· How much data exists

· How data needs to be queried

· How long and reliably the data needs to be stored

Methods of storing data range from keeping data in server memory to interfacing with a full-blown database management system (DBMS), but all methods require trade-offs of one sort or another.

Mechanisms that support long-term persistence of complex structured data, along with powerful search facilities, incur significant performance costs, so using them is not always the best strategy. Similarly, storing data in server memory maximizes performance, but it’s less reliably persistent because data will be lost if the application restarts or the server loses power.

So how will you decide which storage mechanism to use in your applications? In the world of Node application development, it isn’t unusual to use different storage mechanisms for different use cases. In this chapter, we’ll talk about three different options:

· Storing data without installing and configuring a DBMS

· Storing data using a relational DBMS—specifically, MySQL and PostgreSQL

· Storing data using NoSQL databases—specifically, Redis, MongoDB, and Mongoose

You’ll use some of these storage mechanisms to build applications later in the book, and by the end of this chapter you’ll know how to use these storage mechanisms to address your own application needs.

To start, let’s look at the easiest and lowest level of storage possible: serverless data storage.

5.1. Serverless data storage

From the standpoint of system administration, the most convenient storage mechanisms are those that don’t require you to maintain a DBMS, such as in-memory storage and file-based storage. Removing the need to install and configure a DBMS makes the applications you build much easier to install.

The lack of a DBMS makes serverless data storage a perfect fit for Node applications that users will run on their own hardware, like web applications and other TCP/IP applications. It’s also great for command-line interface (CLI) tools: a Node-driven CLI tool might require storage, but it’s likely the user won’t want to go through the hassle of setting up a MySQL server in order to use the tool.

In this section, you’ll learn when and how to use in-memory storage and file-based storage, the two primary forms of serverless data storage. Let’s start with the simpler of the two: in-memory storage.

5.1.1. In-memory storage

In the example applications in chapters 2 and 4, in-memory storage was used to keep track of details about chat users and tasks. In-memory storage uses variables to store data. Reading and writing this data is fast, but as we mentioned earlier, you’ll lose the data during server and application restarts.

The ideal use of in-memory storage is for small bits of frequently accessed data. One such application would be a counter that keeps track of the number of page views since the last application restart. For example, the following code will start a web server on port 8888 that counts each request:

var http = require('http');
var counter = 0;  // Held in process memory only; resets on every restart

var server = http.createServer(function(req, res) {
  counter++;
  res.write('I have been accessed ' + counter + ' times.');
  res.end();
}).listen(8888);

For applications that need to store information that can persist beyond application and server restarts, file-based storage may be more suitable.

5.1.2. File-based storage

File-based storage uses a filesystem to store data. Developers often use this type of storage for application configuration information, but it also allows you to easily persist data that can survive application and server restarts.

Concurrency issues

File-based storage, although easy to use, isn’t suitable for all types of applications. If a multiuser application, for example, stored records in a file, there could be concurrency issues. Two users could load the same file at the same time and modify it; saving one version would overwrite the other, causing one user’s changes to be lost. For multiuser applications, database management systems are a more sensible choice because they’re designed to deal with concurrency issues.
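The lost-update problem described above can be reproduced in a few lines. The following is a minimal sketch, assuming a hypothetical scratch file in the system’s temporary directory and two made-up users; it runs the steps synchronously so the order of events is easy to follow:

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical scratch file used only for this demonstration.
var file = path.join(os.tmpdir(), 'lost-update-demo-' + process.pid + '.json');
fs.writeFileSync(file, JSON.stringify([]), 'utf8');

// Both "users" load the same (empty) list before either one saves.
var aliceCopy = JSON.parse(fs.readFileSync(file, 'utf8'));
var bobCopy = JSON.parse(fs.readFileSync(file, 'utf8'));

aliceCopy.push('Alice: buy milk');
bobCopy.push('Bob: walk the dog');

// Each save replaces the whole file, so the later write wins.
fs.writeFileSync(file, JSON.stringify(aliceCopy), 'utf8');
fs.writeFileSync(file, JSON.stringify(bobCopy), 'utf8');

var stored = JSON.parse(fs.readFileSync(file, 'utf8'));
console.log(stored); // Only Bob's task remains; Alice's change was lost
fs.unlinkSync(file);
```

A DBMS avoids this by coordinating concurrent writes for you, which is why it’s the safer choice once more than one user can modify the same data.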

To illustrate the use of file-based storage, let’s create a simple command-line variant of chapter 4’s web-based Node to-do list application. Figure 5.1 shows this variant in operation.

Figure 5.1. A command-line to-do list tool

The application will store tasks in a file named .tasks in whatever directory the script runs from. Tasks will be converted to JSON before being stored, and they’ll be converted from JSON when they’re read from the file.

To create the application, you’ll need to write the starting logic and then define helper functions to retrieve and store tasks.

Writing the starting logic

The logic begins by requiring the necessary modules, parsing the task command and description from the command-line arguments, and specifying the file in which tasks should be stored. This is shown in the following code.

Listing 5.1. Gather argument values and resolve file database path
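As a rough sketch of what this starting logic could look like (not necessarily the original listing), the following code reads the command and task description from process.argv and builds the path to the .tasks file. The variable names command, taskDescription, and file are assumptions chosen to match the prose:

```javascript
var fs = require('fs');     // Used by the helper functions defined later
var path = require('path');

// Drop "node" and the script path, leaving only user-supplied arguments.
var args = process.argv.slice(2);

// The first argument is the command; the rest form the task description.
var command = args.shift();
var taskDescription = args.join(' ');

// Resolve the task file relative to the directory the script runs from.
var file = path.join(process.cwd(), '.tasks');
```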

If you provide an action argument, the application either outputs a list of stored tasks or adds a task description to the task store, as shown in the following listing. If you don’t provide the argument, usage help will be displayed.

Listing 5.2. Determining what action the CLI script should take
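One way to sketch this dispatch step is shown next. It assumes the command names list and add, and it takes the task functions through a hypothetical actions parameter so that the example can run on its own; in the finished script you’d more likely call listTasks and addTask directly:

```javascript
// Decide what the CLI should do. `actions` is a hypothetical object holding
// the listTasks and addTask functions so this sketch is self-contained.
function runCommand(command, file, taskDescription, actions) {
  switch (command) {
    case 'list':
      actions.listTasks(file);
      return 'list';
    case 'add':
      actions.addTask(file, taskDescription);
      return 'add';
    default:
      // No recognized action argument: show usage help.
      console.log('Usage: ' + process.argv[0] + ' list|add [taskDescription]');
      return 'usage';
  }
}
```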

Defining a helper function to retrieve tasks

The next step is to define a helper function called loadOrInitializeTaskArray in the application logic to retrieve existing tasks. As listing 5.3 shows, loadOrInitializeTaskArray loads a text file in which JSON-encoded data is stored. Two asynchronous fs module functions are used in the code. These functions are non-blocking, allowing the event loop to continue instead of having it sit and wait for the filesystem to return results.

Listing 5.3. Loading JSON-encoded data from a text file
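A minimal sketch of such a helper follows. The choice of fs.access for the existence check is an assumption (the original listing may use a different call); fs.readFile then loads the file, and a missing or empty file yields an empty task array:

```javascript
var fs = require('fs');

function loadOrInitializeTaskArray(file, cb) {
  // First asynchronous call: check whether the task file is accessible.
  fs.access(file, function(err) {
    if (err) return cb([]); // No file yet, so start with an empty list

    // Second asynchronous call: read and decode the stored JSON.
    fs.readFile(file, 'utf8', function(err, data) {
      if (err) throw err;
      var tasks = JSON.parse(data || '[]');
      cb(tasks);
    });
  });
}
```

Because both calls are asynchronous, the tasks are handed to a callback rather than returned.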

Next, you use the loadOrInitializeTaskArray helper function to implement the listTasks functionality.

Listing 5.4. List tasks function

function listTasks(file) {
  loadOrInitializeTaskArray(file, function(tasks) {
    // Print each stored task on its own line
    for (var i = 0; i < tasks.length; i++) {
      console.log(tasks[i]);
    }
  });
}

Defining a helper function to store tasks

Now you need to define another helper function, storeTasks, to store JSON-serialized tasks into a file.

Listing 5.5. Storing a task to disk

function storeTasks(file, tasks) {
  // Serialize the whole task array and overwrite the file
  fs.writeFile(file, JSON.stringify(tasks), 'utf8', function(err) {
    if (err) throw err;
    console.log('Saved.');
  });
}

Then you can use the storeTasks helper function to implement the addTask functionality.

Listing 5.6. Adding a task

function addTask(file, taskDescription) {
  loadOrInitializeTaskArray(file, function(tasks) {
    tasks.push(taskDescription);
    storeTasks(file, tasks);
  });
}

Using the filesystem as a data store enables you to add persistence to an application relatively quickly and easily. It’s also a great way to handle application configuration. If application configuration data is stored in a text file and encoded in JSON, the logic defined earlier in loadOrInitializeTaskArray could be repurposed to read the file and parse the JSON.

In chapter 13, you’ll learn more about manipulating the filesystem with Node. Now let’s move on to look at the traditional data storage workhorses of applications: relational database management systems.

5.2. Relational database management systems

Relational database management systems (RDBMSs) allow complex information to be stored and easily queried. RDBMSs have traditionally been used for relatively high-end applications, such as content management, customer relationship management, and shopping carts. They can perform well when used correctly, but they require specialized administration knowledge and access to a database server. They also require knowledge of SQL, although there are object-relational mappers (ORMs) with APIs that can write SQL for you in the background. RDBMS administration, ORMs, and SQL are beyond the scope of this book, but you’ll find many online resources that cover these technologies.

Developers have many relational database options, but most choose open source databases, primarily because they’re well supported, they work well, and they don’t cost anything. In this section, we’ll look at MySQL and PostgreSQL, the two most popular full-featured relational databases. MySQL and PostgreSQL have similar capabilities, and both are solid choices. If you haven’t used either, MySQL is easier to set up and has a larger user base. If you happen to use the proprietary Oracle database, you’ll want to use the db-oracle module (https://github.com/mariano/node-db-oracle), which is also outside the scope of this book.

Let’s start with MySQL and then look at PostgreSQL.

5.2.1. MySQL

MySQL is the world’s most popular SQL database, and it’s well supported by the Node community. If you’re new to MySQL and interested in learning about it, you’ll find the official tutorial online (http://dev.mysql.com/doc/refman/5.0/en/tutorial.html). For those new to SQL, many online tutorials and books, including Chris Fehily’s SQL: Visual QuickStart Guide (Peachpit Press, 2008), are available to help you get up to speed.

Using MySQL to build a work-tracking app

To see how Node takes advantage of MySQL, let’s look at an application that requires an RDBMS. Let’s say you’re creating a simple web application to keep track of how you spend your workdays. You’ll need to record the date of the work, the time spent on the work, and a description of the work performed.

The application you’ll build will have a form in which details about the work performed can be entered, as shown in figure 5.2.

Figure 5.2. Recording details of work performed

Once the work information has been entered, it can be archived or deleted so it doesn’t show above the fields used to enter more work, as shown in figure 5.3. Clicking the Archived Work link will then display any work items that have been archived.

Figure 5.3. Archiving or deleting details of work performed

You could build this web application using the filesystem as a simple data store, but it would be tricky to build reports with the data. If you wanted to create a report on the work you did last week, for example, you’d have to read every work record stored and check the record’s date. Having application data in an RDBMS gives you the ability to generate reports easily using SQL queries.

To build a work-tracking application, you’ll need to do the following:

· Create the application logic

· Create helper functions needed to make the application work

· Write functions that let you add, delete, update, and retrieve data with MySQL

· Write code that renders the HTML records and forms

The application will leverage Node’s built-in http module for web server functionality and will use a third-party module to interact with a MySQL server. A custom module named timetrack will contain application-specific functions for storing, modifying, and retrieving data using MySQL. Figure 5.4 provides an overview of the application.

Figure 5.4. How the work-tracking application will be structured

The end result, as shown in figure 5.5, will be a simple web application that allows you to record work performed and review, archive, and delete the work records.

Figure 5.5. A simple web application that allows you to track work performed

To allow Node to talk to MySQL, we’ll use Felix Geisendörfer’s popular node-mysql module (https://github.com/felixge/node-mysql). To begin, install the MySQL Node module using the following command:

npm install mysql

Creating the application logic

Next, you need to create the application logic. The application will be composed of two files: timetrack_server.js, used to start the application, and timetrack.js, a module containing application-related functionality.

To start, create a file named timetrack_server.js and include the code in listing 5.7. This code includes Node’s HTTP API, application-specific logic, and a MySQL API. Fill in the host, user, and password settings with those that correspond to your MySQL configuration.

Listing 5.7. Application setup and database connection initialization

Next, add the logic in listing 5.8 to define the basic web application behavior. The application allows you to browse, add, and delete work performance records. In addition, the app will let you archive work records. Archiving a work record hides it on the main page, but archived records remain browsable on a separate web page.

Listing 5.8. HTTP request routing
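The routing described here boils down to mapping an HTTP method and URL to an action. The following sketch shows that mapping as a lookup table; the URL paths and action names are assumptions made for illustration, not necessarily those used by the original listing:

```javascript
// Hypothetical route table: "<METHOD> <url>" mapped to an action name.
var routes = {
  'POST /': 'add',
  'POST /archive': 'archive',
  'POST /delete': 'delete',
  'GET /': 'show',
  'GET /archived': 'showArchived'
};

// Return the action for a request, or null if nothing matches
// (in which case the server would respond with a 400 or 404).
function resolveAction(method, url) {
  var key = method + ' ' + url;
  return routes.hasOwnProperty(key) ? routes[key] : null;
}

console.log(resolveAction('POST', '/archive')); // archive
```

In the server itself, the request handler would call the matching timetrack function with the request and response objects.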

The code in listing 5.9 is the final addition to timetrack_server.js. This logic creates a database table if none exists and starts the HTTP server listening to IP address 127.0.0.1 on TCP/IP port 3000. All node-mysql queries are performed using the query function.

Listing 5.9. Database table creation
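A sketch of the table-creation step might look like the following. It assumes db is any object exposing query(sql, callback), as a node-mysql connection does, and the table and column names are illustrative guesses based on the data the application records:

```javascript
// `db` is assumed to expose query(sql, callback), like a node-mysql
// connection. Table and column names here are illustrative only.
function ensureWorkTable(db, cb) {
  db.query(
    'CREATE TABLE IF NOT EXISTS work (' +
    'id INT(10) NOT NULL AUTO_INCREMENT, ' +
    'hours DECIMAL(5,2) DEFAULT 0, ' +
    'date DATE, ' +
    'archived INT(1) DEFAULT 0, ' +
    'description LONGTEXT, ' +
    'PRIMARY KEY(id))',
    function(err) {
      if (err) return cb(err);
      cb(null); // Safe to start the HTTP server now
    }
  );
}
```

The callback is the natural place to start the HTTP server listening on 127.0.0.1, port 3000, because the table is then known to exist.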

Creating helper functions that send HTML, create forms, and receive form data

Now that you’ve fully defined the file you’ll use to start the application, it’s time to create the file that defines the rest of the application’s functionality. Create a directory named lib, and inside this directory create a file named timetrack.js. Inside this file, insert the logic from listing 5.10, which includes the Node querystring API and defines helper functions for sending web page HTML and receiving data submitted through forms.

Listing 5.10. Helper functions: sending HTML, creating forms, receiving form data
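Of the helpers described, receiving form data is the least obvious, so here is a minimal sketch of it. The function name parseReceivedData is an assumption; the technique is to accumulate the request body as it streams in and decode it with the querystring API once the request ends:

```javascript
var qs = require('querystring');

// Collect a form POST body from a request stream and decode it
// into an object of field names and values.
function parseReceivedData(req, cb) {
  var body = '';
  req.setEncoding('utf8');
  req.on('data', function(chunk) {
    body += chunk;
  });
  req.on('end', function() {
    cb(qs.parse(body));
  });
}
```

Note that decoded values are always strings, so a field such as hours needs converting before you do arithmetic with it.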

Adding data with MySQL

With the helper functions in place, it’s time to define the logic that will add a work record to the MySQL database. Add the code in the next listing to timetrack.js.

Listing 5.11. Adding a work record

Note that you use the question mark character (?) as a placeholder to indicate where a parameter should be placed. Each parameter is automatically escaped by the query method before being added to the query, preventing SQL injection attacks.

Note also that the second argument of the query method is now a list of values to substitute for the placeholders.
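To make the placeholder mechanism concrete, here is a sketch of an insert helper. The db parameter is assumed to expose node-mysql’s query(sql, values, callback) form, and the work table with its hours, date, and description columns is a hypothetical schema:

```javascript
// Insert one work record. Each ? is replaced, in order, by an escaped
// value from the array, so user input never lands in the SQL unescaped.
function addWork(db, work, cb) {
  db.query(
    'INSERT INTO work (hours, date, description) VALUES (?, ?, ?)',
    [work.hours, work.date, work.description],
    cb
  );
}
```

The number of values in the array should match the number of placeholders in the query string.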

Deleting MySQL data

Next, you need to add the following code to timetrack.js. This logic will delete a work record.

Listing 5.12. Deleting a work record

Updating MySQL data

To add logic that will update a work record, flagging it as archived, add the following code to timetrack.js.

Listing 5.13. Archiving a work record
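Deleting and archiving both operate on a single record identified by its ID, so they can be sketched together. As before, db is assumed to expose query(sql, values, callback), and the work table and its archived column are hypothetical names:

```javascript
// Remove a work record permanently.
function deleteWork(db, id, cb) {
  db.query('DELETE FROM work WHERE id = ?', [id], cb);
}

// Flag a work record as archived so the main page can hide it.
function archiveWork(db, id, cb) {
  db.query('UPDATE work SET archived = 1 WHERE id = ?', [id], cb);
}
```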

Retrieving MySQL data

Now that you’ve defined the logic that will add, delete, and update a work record, you can add the logic in listing 5.14 to retrieve work-record data—archived or unarchived—so it can be rendered as HTML. When issuing the query, a callback is provided that includes a rows argument for the returned records.

Listing 5.14. Retrieving work records
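Retrieval can be sketched along the same lines. This example assumes the same hypothetical work table and passes the rows argument from the query callback on to the caller:

```javascript
// Fetch archived or unarchived work records, newest first.
// `db` is assumed to expose query(sql, values, callback).
function listWork(db, archived, cb) {
  db.query(
    'SELECT * FROM work WHERE archived = ? ORDER BY date DESC',
    [archived ? 1 : 0],
    function(err, rows) {
      if (err) return cb(err);
      cb(null, rows); // rows is an array with one object per record
    }
  );
}
```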

Rendering MySQL records

Add the logic in the following listing to timetrack.js. It renders work records as an HTML table.

Listing 5.15. Rendering work records to an HTML table
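Rendering comes down to turning an array of row objects into markup. The following sketch assumes rows with date, hours, and description properties and a hypothetical function name; it also escapes each value, since descriptions come from user input:

```javascript
// Escape characters that have special meaning in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build an HTML table with one row per work record.
function workHitlistHtml(rows) {
  var html = '<table>';
  rows.forEach(function(row) {
    html += '<tr>' +
      '<td>' + escapeHtml(row.date) + '</td>' +
      '<td>' + escapeHtml(row.hours) + '</td>' +
      '<td>' + escapeHtml(row.description) + '</td>' +
      '</tr>';
  });
  return html + '</table>';
}
```

The application’s version would also need to include the archive and delete forms in each row.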

Rendering HTML forms

Finally, add the following code to timetrack.js to render the HTML forms needed by the application.

Listing 5.16. HTML forms for adding, archiving, and deleting work records

Trying it out

Now that you’ve fully defined the application, you can run it. Make sure that you’ve created a database named timetrack using your MySQL administration interface of choice. Then start the application by entering the following into your command line:

node timetrack_server.js

Finally, navigate to http://127.0.0.1:3000/ in a web browser to use the application.

MySQL may be the most popular relational database, but PostgreSQL is, for many, the more respected of the two. Let’s look at how you can use PostgreSQL in your application.

5.2.2. PostgreSQL

PostgreSQL is well regarded for its standards compliance and robustness, and many Node developers favor it over other RDBMSs. Unlike MySQL, PostgreSQL supports recursive queries and many specialized data types. PostgreSQL can also use a variety of standard authentication methods, such as Lightweight Directory Access Protocol (LDAP) and Generic Security Services Application Program Interface (GSSAPI). For those using replication for scalability or redundancy, PostgreSQL supports synchronous replication, a form of replication in which data loss is prevented by verifying replication after each data operation.

If you’re new to PostgreSQL and interested in learning it, you’ll find the official tutorial online (www.postgresql.org/docs/7.4/static/tutorial.html).

The most mature and actively developed PostgreSQL API module is Brian Carlson’s node-postgres (https://github.com/brianc/node-Postgres).

Untested for Windows

Although the node-postgres module is intended to work on Windows, the module’s creator primarily tests using Linux and OS X, so Windows users may encounter issues, such as a fatal error during installation. Because of this, Windows users may want to use MySQL instead of PostgreSQL.

Install node-postgres via npm using the following command:

npm install pg

Connecting to PostgreSQL

Once you’ve installed the node-postgres module, you can connect to PostgreSQL and select a database to query using the following code (omit the :mypassword portion of the connection string if no password is set):

var pg = require('pg');
var conString = "tcp://myuser:mypassword@localhost:5432/mydatabase";

var client = new pg.Client(conString);
client.connect();

Inserting a row into a database table

The query method performs queries. The following example code shows how to insert a row into a database table:

client.query(
  'INSERT INTO users ' +
  "(name) VALUES ('Mike')"
);

Placeholders ($1, $2, and so on) indicate where to place a parameter. Each parameter is escaped before being added to the query, preventing SQL injection attacks. The following example shows the insertion of a row using placeholders:

client.query(
  "INSERT INTO users " +
  "(name, age) VALUES ($1, $2)",
  ['Mike', 39]
);

To get the primary key value of a row after an insert, you can use a RETURNING clause to specify the name of the column whose value you’d like to return. You then add a callback as the last argument of the query call, as the following example shows:

client.query(
  "INSERT INTO users " +
  "(name, age) VALUES ($1, $2) " +
  "RETURNING id",
  ['Mike', 39],
  function(err, result) {
    if (err) throw err;
    console.log('Insert ID is ' + result.rows[0].id);
  }
);

Creating a query that returns results

If you’re creating a query that will return results, you’ll need to store the return value of the client’s query method in a variable. The query method returns an object that has inherited EventEmitter behavior to take advantage of Node’s built-in functionality. This object emits a row event for each retrieved database row. Listing 5.17 shows how you can output data from each row returned by a query. Note the use of EventEmitter listeners that define what to do with database table rows and what to do when data retrieval is complete.

Listing 5.17. Selecting rows from a PostgreSQL database
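The event-driven style can be sketched as follows. This example assumes a client whose query method returns an EventEmitter, as described above; newer releases of node-postgres may not emit these events by default, so treat it as an illustration of the pattern rather than a reference for the module’s current API. The function name collectRows is hypothetical:

```javascript
// Gather every row emitted by a query, then hand the full set back.
// `client` is assumed to return an EventEmitter from query().
function collectRows(client, sql, values, cb) {
  var rows = [];
  var query = client.query(sql, values);

  query.on('row', function(row) {
    rows.push(row); // Emitted once per retrieved database row
  });
  query.on('error', function(err) {
    cb(err);
  });
  query.on('end', function() {
    cb(null, rows); // Emitted after the last row is fetched
  });
}
```

A call such as collectRows(client, 'SELECT * FROM users WHERE age > $1', [40], callback) would then deliver the matching rows in one array.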

An end event is emitted after the last row is fetched, and it may be used to close the database or continue with further application logic.

Relational databases may be classic workhorses, but another breed of database manager that doesn’t require the use of SQL is becoming increasingly popular.

5.3. NoSQL databases

In the early days of the database world, nonrelational databases were the norm. But relational databases slowly gained in popularity and over time became the mainstream choice for applications both on and off the web. In recent years, a resurgent interest in nonrelational DBMSs has emerged as their proponents claimed advantages in scalability and simplicity, and these DBMSs target a variety of usage scenarios. They’re popularly referred to as “NoSQL” databases, interpreted as “No SQL” or “Not Only SQL.”

Although relational DBMSs sacrifice performance for reliability, many NoSQL databases put performance first. For this reason, NoSQL databases may be a better choice for real-time analytics or messaging. NoSQL databases also usually don’t require data schemas to be predefined, which is useful for applications in which stored data is hierarchical but whose hierarchy varies.

In this section, we’ll look at two popular NoSQL databases: Redis and MongoDB. We’ll also look at Mongoose, a popular API that abstracts access to MongoDB, adding a number of time-saving features. The setup and administration of Redis and MongoDB are out of the scope of this book, but you’ll find quick-start instructions on the web for Redis (http://redis.io/topics/quickstart) and MongoDB (http://docs.mongodb.org/manual/installation/#installation-guides) that should help you get up and running.

5.3.1. Redis

Redis is a data store well suited to handling simple data that doesn’t need to be stored for long-term access, such as instant messages and game-related data. Redis stores data in RAM and logs changes to disk. The downside to this is that storage space is limited, but the advantage is that Redis can perform data manipulation quickly. If a Redis server crashes and the contents of RAM are lost, the disk log can be used to restore the data.

Redis provides a vocabulary of primitive but useful commands (http://redis.io/commands) that work on a number of data structures. Most of the data structures supported by Redis will be familiar to developers, as they’re analogous to those frequently used in programming: hash tables, lists, and key/value pairs (which are used like simple variables). Hash table and key/value pair types are illustrated in figure 5.6. Redis also supports a less-familiar data structure called a set, which we’ll talk about later in this chapter.

Figure 5.6. Redis supports a number of simple data types, including hash tables and key/value pairs.

We won’t go into all of Redis’s commands in this chapter, but we’ll run through a number of examples that will be applicable for most applications. If you’re new to Redis and want to get an idea of its usefulness before trying these examples, a great place to start is the “Try Redis” tutorial (http://try.redis.io/). For an in-depth look at leveraging Redis for your applications, check out Josiah L. Carlson’s book, Redis in Action (Manning, 2013).

The most mature and actively developed Redis API module is Matt Ranney’s node_redis (https://github.com/mranney/node_redis) module. Install this module using the following npm command:

npm install redis

Connecting to a Redis server

The following code establishes a connection to a Redis server using the default TCP/IP port running on the same host. The Redis client you’ve created has inherited EventEmitter behavior that emits an error event when the client has problems communicating with the Redis server. As the following example shows, you can define your own error-handling logic by adding a listener for the error event type:

var redis = require('redis');
var client = redis.createClient(6379, '127.0.0.1');

// Report connection or communication problems instead of crashing
client.on('error', function (err) {
  console.log('Error ' + err);
});

Manipulating data in Redis

After you’ve connected to Redis, your application can start manipulating data immediately using the client object. The following example code shows the storage and retrieval of a key/value pair:
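As a minimal sketch, storing and then reading back a key/value pair might look like this. It assumes client offers the callback-style set and get methods of node_redis as it existed when this chapter was written, and the function name storeAndFetch is hypothetical:

```javascript
// Store a value under a key, then read it back once the write completes.
function storeAndFetch(client, key, value, cb) {
  client.set(key, value, function(err) {
    if (err) return cb(err);
    client.get(key, cb); // cb receives (err, storedValue)
  });
}
```

With a connected client, storeAndFetch(client, 'color', 'red', function(err, value) { console.log('Got: ' + value); }) would print the stored value.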

Storing and retrieving values using a hash table

Listing 5.18 shows the storage and retrieval of values in a slightly more complicated data structure: a hash table, also known as a hash map. A hash table is essentially a table of identifiers, called keys, that are associated with corresponding values.

The hmset Redis command sets hash table elements, identified by a key, to a value. The hkeys Redis command lists the keys of each element in a hash table.

Listing 5.18. Storing data in elements of a Redis hash table
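A sketch of the two commands working together follows. The hash name camping and its fields are made-up sample data, and client is assumed to offer node_redis’s callback-style hmset (called with an object of fields) and hkeys:

```javascript
// Set two elements of a hash table, then list the hash table's keys.
function storeCampingGear(client, cb) {
  client.hmset('camping', {
    shelter: '2-person tent',
    cooking: 'campstove'
  }, function(err) {
    if (err) return cb(err);
    client.hkeys('camping', cb); // cb receives (err, arrayOfKeys)
  });
}
```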

Storing and retrieving data using the list

Another data structure Redis supports is the list. A Redis list can theoretically hold over four billion elements, memory permitting.

The following code shows the storage and retrieval of values in a list. The lpush Redis command adds a value to a list. The lrange Redis command retrieves a range of list items using start and end arguments. The -1 end argument in the following code signifies the last item of the list, so this use of lrange will retrieve all list items:

client.lpush('tasks', 'Paint the bikeshed red.', redis.print);
client.lpush('tasks', 'Paint the bikeshed green.', redis.print);

// 0 and -1 select the first through the last item, that is, the whole list
client.lrange('tasks', 0, -1, function(err, items) {
  if (err) throw err;
  items.forEach(function(item) {
    console.log(' ' + item);
  });
});

A Redis list is an ordered list of strings. If you were creating a conference-planning application, for example, you might use a list to store the conference’s itinerary.

Redis lists are similar, conceptually, to arrays in many programming languages, and they provide a familiar way to manipulate data. One downside to lists, however, is their retrieval performance. As a Redis list grows in length, retrieval becomes slower (O(n) in big O notation).

Big O Notation

In computer science, big O notation is a way of categorizing algorithms by complexity. Seeing an algorithm’s description in big O notation gives you a quick idea of the performance ramifications of using the algorithm. If you’re new to big O, Rob Bell’s “A Beginner’s Guide to Big O Notation” provides a great overview (http://mng.bz/UJu7).

Storing and retrieving data using sets

A Redis set is an unordered group of strings. If you were creating a conference-planning application, for example, you might use a set to store attendee information. Sets have better retrieval performance than lists. The time it takes to retrieve a set member is independent of the size of the set (O(1) in big O notation).

Sets must contain unique elements—if you try to store two identical values in a set, the second attempt to store the value will be ignored.

The following code illustrates the storage and retrieval of IP addresses. The sadd Redis command attempts to add a value to the set, and the smembers command returns stored values. In this example, we’ve twice attempted to add the IP address 204.10.37.96, but as you can see, when we display the set members, the address has only been stored once:

client.sadd('ip_addresses', '204.10.37.96', redis.print);
client.sadd('ip_addresses', '204.10.37.96', redis.print);  // Duplicate; ignored
client.sadd('ip_addresses', '72.32.231.8', redis.print);

client.smembers('ip_addresses', function(err, members) {
  if (err) throw err;
  console.log(members);
});

Delivering data with channels

It’s worth noting that Redis goes beyond the traditional role of data store by providing channels. Channels are data-delivery mechanisms that provide publish/subscribe functionality, as shown conceptually in figure 5.7. They’re useful for chat and gaming applications.

Figure 5.7. Redis channels provide an easy solution to a common data-delivery scenario.

A Redis client can either subscribe or publish to any given channel. Subscribing to a channel means you get any message sent to the channel. Publishing a message to a channel sends the message to all clients subscribed to that channel.

Listing 5.19 shows an example of how Redis’s publish/subscribe functionality can be used to implement a TCP/IP chat server.

Listing 5.19. A simple chat server implemented with Redis pub/sub functionality
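The core of such a chat server is the wiring between each TCP connection and a channel. The following sketch shows that wiring only; the function name attachChatClient is hypothetical, redisLib stands for the required redis module, and socket is the connection object that net.createServer passes to its callback. Two Redis clients are created per connection because a client that has subscribed to a channel can’t also publish:

```javascript
// Wire one chat connection to a Redis channel.
function attachChatClient(socket, redisLib, channel) {
  var subscriber = redisLib.createClient();
  var publisher = redisLib.createClient();

  // Relay every message published on the channel to this user.
  subscriber.subscribe(channel);
  subscriber.on('message', function(ch, message) {
    socket.write('Channel ' + ch + ': ' + message);
  });

  // Publish whatever this user types to everyone on the channel.
  socket.on('data', function(data) {
    publisher.publish(channel, data.toString());
  });

  // Clean up both Redis connections when the user disconnects.
  socket.on('end', function() {
    subscriber.unsubscribe(channel);
    subscriber.quit();
    publisher.quit();
  });
}
```

A server would call this function for each new connection, passing a channel name such as 'main_chat_room', and then listen on a TCP port of your choosing.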

Maximizing node_redis performance

When you’re deploying a Node.js application that uses the node_redis API to production, you may want to consider using Pieter Noordhuis’s hiredis module (https://github.com/pietern/hiredis-node). This module will speed up Redis performance significantly because it takes advantage of the official hiredis C library. The node_redis API will automatically use hiredis, if it’s installed, instead of the JavaScript implementation.

You can install hiredis using the following npm command:

npm install hiredis

Note that because the hiredis library compiles from C code, and Node’s internal APIs change occasionally, you may have to recompile hiredis when upgrading Node.js. Use the following npm command to rebuild hiredis:

npm rebuild hiredis

Now that we’ve looked at Redis, which excels at high-performance handling of data primitives, let’s look at a more generally useful database: MongoDB.

5.3.2. MongoDB

MongoDB is a general-purpose nonrelational database. It’s used for the same sorts of applications that you’d use an RDBMS for.

A MongoDB database stores documents in collections. Documents in a collection, as shown in figure 5.8, need not share the same schema—each document could conceivably have a different schema. This makes MongoDB more flexible than conventional RDBMSs, as you don’t have to worry about predefining schemas.

Figure 5.8. Each item in a MongoDB collection can have a completely different schema.

The most mature, actively maintained MongoDB API module is Christian Amor Kvalheim’s node-mongodb-native (https://github.com/mongodb/node-mongodb-native). You can install this module using the following npm command. Windows users, note that the installation requires msbuild.exe, which is installed by Microsoft Visual Studio:

npm install mongodb

Connecting to MongoDB

After installing node-mongodb-native and running your MongoDB server, use the following code to establish a server connection:

var mongodb = require('mongodb');
var server = new mongodb.Server('127.0.0.1', 27017, {});

var client = new mongodb.Db('mydatabase', server, {w: 1});

Accessing a MongoDB collection

The following snippet shows how you can access a collection once the database connection is open. If at any time after completing your database operations you want to close your MongoDB connection, execute client.close():
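Accessing a collection can be sketched as two nested asynchronous steps. This example assumes client is the Db object created earlier and that it offers callback-style open and collection methods, as the driver did around the time of writing; newer driver versions use a different connection API. The function name withCollection is hypothetical:

```javascript
// Open the database connection, then look up a collection by name.
function withCollection(client, name, cb) {
  client.open(function(err) {
    if (err) return cb(err);
    client.collection(name, function(err, collection) {
      if (err) return cb(err);
      cb(null, collection); // Ready for inserts, queries, and so on
    });
  });
}
```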

Inserting a document into a collection

The following code inserts a document into a collection and prints its unique document ID:
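An insert might be sketched like this. It assumes collection offers the callback-style insert(document, options, callback) method of the driver version this chapter describes, with the inserted documents passed to the callback as an array; the function name insertDocument is hypothetical:

```javascript
// Insert a document and report the ID MongoDB assigned to it.
function insertDocument(collection, doc, cb) {
  collection.insert(doc, {safe: true}, function(err, documents) {
    if (err) return cb(err);
    cb(null, documents[0]._id);
  });
}
```

Calling insertDocument(collection, {title: 'I like cake', body: 'It is quite good.'}, callback) would hand the new document’s ID to the callback.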

Safe Mode

Specifying {safe: true} in a query indicates that you want the database operation to complete before executing the callback. If your callback logic is in any way dependent on the database operation being complete, you’ll want to use this option. If your callback logic isn’t dependent, you can get away with using {} instead.

Although you can use console.log to display documents[0]._id as a string, it’s not actually a string. Document identifiers from MongoDB are encoded in binary JSON (BSON). BSON is a data interchange format primarily used by MongoDB instead of JSON to move data to and from the MongoDB server. In most cases, it’s more space efficient than JSON and can be parsed more quickly. Taking less space and being easier to scan means database interactions end up being faster.

Updating data using document IDs

BSON document identifiers can be used to update data. The following listing shows how to update a document using its ID.

Listing 5.20. Updating a MongoDB document

var _id = new client.bson_serializer
  .ObjectID('4e650d344ac74b5a01000001');

collection.update(
  {_id: _id},
  {$set: {"title": "I ate too much cake"}},
  {safe: true},
  function(err) {
    if (err) throw err;
  }
);

Searching for documents

To search for documents in MongoDB, use the find method. The following example shows logic that will display all items in a collection with a title of “I like cake”:

collection.find({"title": "I like cake"}).toArray(
  function(err, results) {
    if (err) throw err;
    console.log(results);
  }
);

Deleting documents

Want to delete something? You can delete a record by referencing its internal ID (or any other criteria) using code similar to the following:

var _id = new client

.bson_serializer

.ObjectID('4e6513f0730d319501000001');

collection.remove({_id: _id}, {safe: true}, function(err) {

if (err) throw err;

});

MongoDB is a powerful database, and node-mongodb-native offers high-performance access to it, but you may want to use an API that abstracts database access, handling the details for you in the background. This allows you to develop faster, while maintaining fewer lines of code. The most popular of these APIs is called Mongoose.

5.3.3. Mongoose

LearnBoost’s Mongoose is a Node module that makes using MongoDB painless. Mongoose’s models (in model-view-controller parlance) provide an interface to MongoDB collections as well as additional useful functionality, such as schema hierarchies, middleware, and validation. A schema hierarchy allows the association of one model with another, enabling, for example, a blog post to contain associated comments. Middleware allows the transformation of data or the triggering of logic during model data operations, making possible tasks like the automatic pruning of child data when a parent is removed. Mongoose’s validation support lets you determine what data is acceptable at the schema level, rather than having to manually deal with it.

Although we’ll focus solely on the basic use of Mongoose as a data store, if you decide to use Mongoose in your application, you’ll definitely benefit from reading its online documentation and learning about all it has to offer (http://mongoosejs.com/).

In this section, we’ll walk you through the basics of Mongoose, including how to do the following:

· Open and close a MongoDB connection

· Register a schema

· Add a task

· Search for a document

· Update a document

· Remove a document

First, you can install Mongoose via npm using the following command:

npm install mongoose

Opening and closing a connection

Once you’ve installed Mongoose and have started your MongoDB server, the following example code will establish a MongoDB connection, in this case to a database called tasks:

var mongoose = require('mongoose');

var db = mongoose.connect('mongodb://localhost/tasks');

If at any time in your application you want to terminate your Mongoose-created connection, the following code will close it:

mongoose.disconnect();

Registering a schema

When managing data using Mongoose, you’ll need to register a schema. The following code shows the registration of a schema for tasks:

var Schema = mongoose.Schema;

var Tasks = new Schema({

project: String,

description: String

});

mongoose.model('Task', Tasks);

Mongoose schemas are powerful. In addition to defining data structures, they also allow you to set defaults, process input, and enforce validation. For more on Mongoose schema definition, see Mongoose’s online documentation (http://mongoosejs.com/docs/schematypes.html).

Adding a task

Once a schema is registered, you can access it and put Mongoose to work. The following code shows how to add a task using a model:

var Task = mongoose.model('Task');

var task = new Task();

task.project = 'Bikeshed';

task.description = 'Paint the bikeshed red.';

task.save(function(err) {

if (err) throw err;

console.log('Task saved.');

});

Searching for a document

Searching with Mongoose is similarly easy. The Task model’s find method allows you to find all documents, or to select specific documents using a JavaScript object to specify your filtering criteria. The following example code searches for tasks associated with a specific project and outputs each task’s unique ID and description:

var Task = mongoose.model('Task');

Task.find({'project': 'Bikeshed'}, function(err, tasks) {

if (err) throw err;

for (var i = 0; i < tasks.length; i++) {

console.log('ID:' + tasks[i]._id);

console.log(tasks[i].description);

}

});

Updating a document

Although it’s possible to use a model’s find method to zero in on a document that you can subsequently change and save, Mongoose models also have an update method expressly for this purpose. The following snippet shows how you can update a document using Mongoose:
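A minimal sketch of such an update follows. It assumes the callback-style `Model.update(conditions, changes, options, callback)` signature of Mongoose releases from this book's era (newer releases differ) and takes the registered `Task` model as a parameter. The helper name `updateTaskDescription` and the new description text are hypothetical.

```javascript
// Sketch only: `Task` is assumed to be the model registered earlier, and
// Model.update(conditions, changes, options, callback) is assumed to be the
// callback-style API of the Mongoose release in use.
// `updateTaskDescription` is a hypothetical helper name.
function updateTaskDescription(Task, id, description, callback) {
  Task.update(
    {_id: id},                  // which document to change
    {description: description}, // fields to set
    {multi: false},             // update at most one document
    function(err, rowsUpdated) {
      if (err) return callback(err);
      callback(null, rowsUpdated);
    }
  );
}

// Example use (the description text is made up for illustration):
// updateTaskDescription(
//   mongoose.model('Task'),
//   '4e65b3dce1592f7d08000001',
//   'Paint the bikeshed green.',
//   function(err) {
//     if (err) throw err;
//     console.log('Updated.');
//   }
// );
```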

Removing a document

It’s easy to remove a document in Mongoose once you’ve retrieved it. You can retrieve and remove a document using its internal ID (or any other criteria, if you use the find method instead of findById) using code similar to the following:

var Task = mongoose.model('Task');

Task.findById('4e65b3dce1592f7d08000001', function(err, task) {

if (err) throw err;

if (task) task.remove(); // task is null when no document has this ID

});

You’ll find much to explore in Mongoose. It’s an all-around great tool that enables you to pair the flexibility and performance of MongoDB with the ease of use traditionally associated with relational database management systems.

5.4. Summary

Now that you’ve gained a healthy understanding of data storage technologies, you have the basic knowledge you need to deal with common application data storage scenarios.

If you’re creating multiuser web applications, you’ll most likely use a DBMS of some sort. If you prefer the SQL-based way of doing things, MySQL and PostgreSQL are well-supported RDBMSs. If you find SQL limiting in terms of performance or flexibility, Redis and MongoDB are rock-solid options. MongoDB is a great general-purpose DBMS, whereas Redis excels in dealing with frequently changing, less complex data.

If you don’t need the bells and whistles of a full-blown DBMS and want to avoid the hassle of setting one up, you have several options. If speed is key and you don’t care about data persisting beyond application restarts, in-memory storage may be a good fit. If you aren’t concerned about performance and don’t need to run complex queries on your data, as with a typical command-line application, storing data in files may suit your needs.

Don’t be afraid to use more than one type of storage mechanism in an application. If you were building a content management system, for example, you might store web application configuration options using files, stories using MongoDB, and user-contributed story-ranking data using Redis. How you handle persistence is limited only by your imagination.

With the basics of web application development and data persistence under your belt, you’ve learned the fundamentals you need to create simple web applications. You’re now ready to move on to testing, an important skill you’ll need to ensure that what you code today works tomorrow.
