Neo4j in Action (2015)

Preface

Graph problems are among the most common in computer programming and have been since its early days. Back then, hierarchy trees, access control lists, and mapping tables were typically built in code. When it came time to store these graphs, programmers transformed them into tables and used a relational database as the underlying storage. We had to do a lot of plumbing to persist even the most basic graph data, but there was no other option until graph databases, with Neo4j leading the parade, entered the scene.

Neo4j started its journey more than a decade ago. The first official version, 1.0, came out in 2010, and the more recent 2.0 release followed in December 2013. Most of us have been actively using Neo4j on various client projects and watching it evolve over this period. Interest in graph databases, and in Neo4j in particular, has been steadily gaining traction, as more and more people and companies realize that Neo4j is uniquely placed in the graph database space to provide a robust, solid solution to complex, challenging, interconnected business problems.

It is with great pleasure that we have tried to distill much of this real-world experience and knowledge into this hands-on book. It lays solid foundations and then builds on them to help you get up and running with Neo4j as quickly as possible.

About this Book

As a graph database, Neo4j has evolved quite a bit over the last decade or so. It started as a database operating purely within the Java world and has since grown to cater to many languages and frameworks.

When we first began writing this book, we were targeting the then-latest 1.9 release. The Neo4j 2.0 release was a real game changer, introducing new features, including the much-desired built-in concept of node labels. Although some 1.x-related material remains, you will be pleased to know that the content of this book has been updated to cover 2.0 features, and all the associated sample code and examples have been specifically validated against the 2.0.1 release. No doubt there will be later releases by the time this book goes to press. However, the deliberate step-by-step approach taken by Neo4j in Action should give you the core foundational knowledge and skills you need to learn about, and get up and running with, any Neo4j 2.0+ release, barring any unforeseen breaking changes, of course.

Because Neo4j was originally written in Java, we decided to use Java as the primary language for demonstrating the various techniques and approaches in this inaugural Neo4j book. Besides being one of the only options available at the time, Java also allowed us to include chapters and sections detailing how you can take advantage of some of Neo4j's native core APIs to perform certain tasks. This certainly has major benefits for Java-based clients. However, if we were starting from scratch, we would probably give more time and attention to Cypher. Using Cypher wherever possible to interact with the graph makes integration easier regardless of the client, whether Java, the shell, or something else. In any case, we leave this for a potential second edition, as we still believe that many of the core fundamental concepts and approaches in this book need to be conveyed first.

The book assumes you are using the latest version of JDK 7. Additionally, the sample code that accompanies this book uses Maven as its build and dependency management tool. For those unfamiliar with Maven, appendix B provides a quick getting-started section to help you get up and running.

Note that this is not meant to be a reference book; it would be a lot longer if it were. Instead, it aims to give you enough knowledge and understanding of each area to become reasonably proficient before moving on. Links to relevant material are provided should you want to explore any specific area further.

Roadmap

This book is divided into three parts. Part 1 is an introduction and covers data modeling, starting development with Neo4j, and the power of traversals. Part 2 takes on application development and covers Cypher and Spring. Part 3 covers Neo4j in production.

Chapter 1 introduces graph database concepts in general, including some of Neo4j's key aspects and the typical use cases it is well suited to address. The chapter then answers some common questions about where Neo4j fits within the so-called NoSQL space, including how it compares with more traditional relational databases.

Chapter 2 examines how and why we model data in Neo4j, including common approaches to data modeling scenarios in a graph database. Examples from a variety of domains are also presented, giving you a sense of just how flexible data modeling in Neo4j can be.

Chapter 3 is where we really start getting our hands dirty. This chapter introduces the Neo4j Core Java API and walks you through creating a graph that represents a social network of users and the movies they like. It covers creating and connecting nodes and attaching additional information to them. It also looks at strategies for differentiating between types of nodes, including the use of labels.

Chapter 4 builds on this social network domain, exploring the core API in more depth and focusing specifically on traversals—in this case the Neo4j Traversal API—as a powerful way of querying graph data.

Chapter 5 introduces the indexing strategies available in Neo4j. Creating and traversing graph data is great, but you will need a strategy for finding the starting point, or points, in your graph from which to begin. This chapter covers these options. You will begin by looking at the manual (legacy) indexing options, before moving on to the built-in indexing options available from Neo4j 2.0 onward.

Chapter 6 introduces Cypher, Neo4j’s human-readable query language. The nature of Cypher is explained, its basic syntax for graph operations is demonstrated, and advanced features that can be useful in day-to-day development and maintenance of Neo4j databases are also covered.

Chapter 7 focuses on one of Neo4j's unique selling points in the NoSQL space: its full support for ACID transactions. The chapter provides examples of different uses and takes a more in-depth look at certain aspects.

While chapter 4 provides your initial foray into the world of traversals, writing efficient traversals is the key to successfully querying graph data. In chapter 8 we dig deeper into the inner workings of the Traversal API so you can learn how to solve the most complex graph problems in an efficient manner with the native API.

Chapter 9 looks at Spring Data Neo4j (SDN), an object-graph mapping library. Although SDN is not an official Neo4j offering, this chapter demonstrates how this Neo4j-specific open source framework can be used to provide a robust and seamless mapping between a rich object graph model and data backed by Neo4j. Once again our trusty social network of users and their favorite movies is used to demonstrate these points.

Chapter 10 explores the two main usage modes in Neo4j, namely embedded and server. Much of the book has focused on demonstrating core concepts using the embedded mode. This chapter additionally introduces the server mode, which can be used by just about any client, and explores each mode in a bit more depth, weighing the pros and cons of each, including how to get the most out of your server if you choose to use this option.

Chapter 11 concludes with an overview of the high-level Neo4j architecture. Armed with this knowledge, the chapter explores what you should consider when taking Neo4j to production, including scaling and the other requirements for making Neo4j highly available. It finishes with instructions for backing up and restoring your database should the need arise.

The four appendixes guide you through installing, setting up, and running Neo4j, Maven, and SDN, and offer guidance for seeking more help.