Preface - Neo4j Essentials (2015)

Preface

Understanding the data is the key to all systems, and building analytical models on that data, based on the paradigm of real-world entities and relationships, is what underlies the success stories of large-scale enterprise systems.

Architects, developers, and data scientists have long struggled to derive real-world models and relationships from discrete and disparate systems that hold both structured and unstructured data.

We would all agree that without any relationships, data is useless. If we cannot derive relationships between entities, then the data will be of little or no use. After all, it's all about the connections in the data.

Coming from a relational background, our first choice would be to model this data relationally and use an RDBMS.

No doubt we can use an RDBMS, but relational databases focus more on entities and less on the relationships between entities. There are certain disadvantages in using an RDBMS to model highly connected data:

· Extensive joins: While an RDBMS can be made to emulate a graph database, the extensive use of joins makes this approach quite slow.

· Complex queries: It is difficult to write and process queries over relationships where entities are linked deeply and at multiple levels.

· Evolving relationships: Relationships are dynamic and evolving, which makes it more difficult to create models.
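To make the point about deeply linked entities concrete, here is a minimal sketch in plain Java. It does not use Neo4j; the class name DepthTraversalSketch, its methods, and the sample names A to D are hypothetical and chosen only for illustration. It shows that when links are stored as adjacency lists, the depth of a query becomes a simple parameter of a breadth-first traversal, whereas a relational model would typically need one more self-join for each extra level.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical illustration class; it is not part of any Neo4j API.
class DepthTraversalSketch {
    // Each entity maps to the entities it is directly linked to.
    private final Map<String, List<String>> adjacency = new HashMap<>();

    // Records an undirected link between two entities.
    void link(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Breadth-first search: returns every entity reachable from 'start'
    // in at most 'maxDepth' hops, excluding 'start' itself.
    Set<String> within(String start, int maxDepth) {
        Set<String> visited = new LinkedHashSet<>();
        visited.add(start);
        Queue<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Queue<String> next = new ArrayDeque<>();
            for (String current : frontier) {
                for (String neighbour : adjacency.getOrDefault(current, Collections.emptyList())) {
                    // add() returns true only for entities not seen before.
                    if (visited.add(neighbour)) {
                        next.add(neighbour);
                    }
                }
            }
            frontier = next;
        }
        visited.remove(start);
        return visited;
    }

    public static void main(String[] args) {
        DepthTraversalSketch g = new DepthTraversalSketch();
        g.link("A", "B");
        g.link("B", "C");
        g.link("C", "D");
        System.out.println(g.within("A", 1)); // [B]
        System.out.println(g.within("A", 3)); // [B, C, D]
    }
}
```

Going one level deeper only changes the maxDepth argument; the traversal code itself stays the same.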

Let's consider the example of social networks:

Social networks are highly complex to define. They are agile and evolving. Suppose John is linked to Kevin because they work in the same company; that link changes as soon as either of them switches companies. Moreover, their IS_SUPERVISOR relationship depends on the company dynamics and is not static. The same is true of the IS_FRIEND relationship, which can also change at any time.
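The following is a minimal in-memory sketch in plain Java of how such changing relationships might be represented when each relationship is a first-class, typed record. It does not use or mirror the Neo4j API. The class SocialGraphSketch, the WORKS_AT relationship type, the company names Acme and Globex, and the direction of IS_SUPERVISOR (from John to Kevin) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory sketch; it does not use or mirror the Neo4j API.
class SocialGraphSketch {
    // A relationship is a typed, directed link between two named entities.
    static final class Relationship {
        final String from;
        final String type;
        final String to;

        Relationship(String from, String type, String to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    private final List<Relationship> relationships = new ArrayList<>();

    // Adds a relationship; no schema change is needed for a new type.
    void relate(String from, String type, String to) {
        relationships.add(new Relationship(from, type, to));
    }

    // Removes matching relationships; returns true if any were removed.
    boolean unrelate(String from, String type, String to) {
        return relationships.removeIf(
                r -> r.from.equals(from) && r.type.equals(type) && r.to.equals(to));
    }

    boolean isRelated(String from, String type, String to) {
        for (Relationship r : relationships) {
            if (r.from.equals(from) && r.type.equals(type) && r.to.equals(to)) {
                return true;
            }
        }
        return false;
    }

    // People who have a WORKS_AT relationship to the same company as 'person'.
    List<String> colleaguesOf(String person) {
        List<String> result = new ArrayList<>();
        for (Relationship mine : relationships) {
            if (!mine.from.equals(person) || !mine.type.equals("WORKS_AT")) {
                continue;
            }
            for (Relationship other : relationships) {
                if (other.type.equals("WORKS_AT") && other.to.equals(mine.to)
                        && !other.from.equals(person) && !result.contains(other.from)) {
                    result.add(other.from);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SocialGraphSketch g = new SocialGraphSketch();
        g.relate("John", "WORKS_AT", "Acme");
        g.relate("Kevin", "WORKS_AT", "Acme");
        g.relate("John", "IS_SUPERVISOR", "Kevin");
        g.relate("John", "IS_FRIEND", "Kevin");
        System.out.println(g.colleaguesOf("John")); // [Kevin]

        // Kevin switches companies: only the affected relationships change.
        g.unrelate("Kevin", "WORKS_AT", "Acme");
        g.relate("Kevin", "WORKS_AT", "Globex");
        g.unrelate("John", "IS_SUPERVISOR", "Kevin");
        System.out.println(g.colleaguesOf("John")); // []
        System.out.println(g.isRelated("John", "IS_FRIEND", "Kevin")); // true
    }
}
```

Because the colleague link is derived from the WORKS_AT relationships rather than stored separately, it disappears on its own when Kevin moves, while the unrelated IS_FRIEND relationship is left untouched.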

The following are some more use cases:

· Model and store 7 billion people objects and 3 billion non-people objects to provide an earth-view drill down from the planet to the sidewalk

· Network management

· Genealogy

· Public transport links and road maps

All the preceding examples call for a generic data structure that can elegantly represent any kind of data and, at the same time, is easy to query in a highly accessible manner.

Let's introduce a different form of database that focuses on relationships between the entities rather than the entities themselves: Neo4j.

Neo4j is a NoSQL database that leverages graph theory. Though there is no single definition of a graph, here is a simple one that helps in understanding the theory (http://en.wikipedia.org/wiki/Graph_(abstract_data_type)):

"A graph data structure consists of a finite (and possibly mutable) set of nodes or vertices, together with a set of ordered pairs of these nodes (or, in some cases, a set of unordered pairs). These pairs are known as edges or arcs. As in mathematics, an edge (x,y) is said to point or go from x to y. The nodes may be part of the graph structure, or may be external entities represented by integer indices or references."
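As a rough illustration of this definition, the following plain Java sketch stores a graph as a set of nodes together with a set of ordered pairs (edges). The class DirectedGraph and its methods are hypothetical names invented for this example; this is not how Neo4j stores data internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical illustration class; it is not part of any Neo4j API.
class DirectedGraph {
    // An edge is an ordered pair (from, to): it points from 'from' to 'to'.
    static final class Edge {
        final String from;
        final String to;

        Edge(String from, String to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Edge)) {
                return false;
            }
            Edge e = (Edge) other;
            return from.equals(e.from) && to.equals(e.to);
        }

        @Override
        public int hashCode() {
            return Objects.hash(from, to);
        }
    }

    // A finite, mutable set of nodes and a set of ordered pairs of those nodes.
    private final Set<String> nodes = new LinkedHashSet<>();
    private final Set<Edge> edges = new LinkedHashSet<>();

    void addNode(String node) {
        nodes.add(node);
    }

    // Adds the edge (from, to), registering both endpoints as nodes.
    void addEdge(String from, String to) {
        nodes.add(from);
        nodes.add(to);
        edges.add(new Edge(from, to));
    }

    boolean hasEdge(String from, String to) {
        return edges.contains(new Edge(from, to));
    }

    // All nodes y such that the edge (node, y) exists.
    List<String> successors(String node) {
        List<String> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.from.equals(node)) {
                result.add(e.to);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        DirectedGraph g = new DirectedGraph();
        g.addEdge("x", "y");
        System.out.println(g.hasEdge("x", "y")); // true
        System.out.println(g.hasEdge("y", "x")); // false: the pair is ordered
    }
}
```

An undirected graph (the "unordered pairs" case in the definition) could be sketched the same way by making Edge equality ignore the order of its two endpoints.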

Neo4j is an open source graph database implemented in Java.

Its first version (1.0) was released in February 2010, and it has been under active development ever since. It is amazing to see the pace at which Neo4j has evolved over the years. At the time of writing this book, the current stable version is 2.1.5, released in September 2014.

Let's move forward and jump into the nitty-gritty of Neo4j. In the subsequent chapters, we will cover various aspects of Neo4j, including installation, data modeling and design, querying, performance tuning, security, extensions, and more.

What this book covers

Chapter 1, Installation and the First Query, details the installation process of Neo4j. It briefly explains the function of every tool installed together with Neo4j (shell, server, and browser). More importantly, the chapter will introduce and help developers to get familiar with the Neo4j browser and run the first basic Cypher query using different methods exposed by Neo4j (shell, Java, browser, and REST).

Chapter 2, Ready for Take Off, details the possible options at hand for integrating Neo4j with Business Intelligence (BI) tools and also inserting bulk data in Neo4j from various data sources such as CSV and Excel. It also talks about the usage of Java and REST APIs exposed by Neo4j for performing bulk data import and indexing operations and ends with the various strategies/parameters available and recommended for optimizing Neo4j.

Chapter 3, Pattern Matching in Neo4j, talks briefly about the data modeling techniques for graph databases such as Neo4j and then describes the usefulness of patterns and pattern matching for querying data from the Neo4j database. It then dives into the syntactic details of read-only Cypher queries and Cypher's new indexing capabilities.

Chapter 4, Querying and Structuring Data, starts with writing data in Neo4j using patterns, then it talks about structuring the data by applying various constraints, schema, and indexes on the underlying data in Neo4j, and ends with the strategies and recommendations for optimizing the Cypher queries.

Chapter 5, Neo4j from Java, talks about the available strategies for integrating Neo4j and Java and the applicability of these integration strategies in various real-world scenarios. It briefly talks about the integration and usage with various other open source unit testing APIs, and then discusses the Java packages / APIs / graph algorithms exposed by Neo4j along with examples of graph traversals and searching.

Chapter 6, Spring Data and Neo4j, talks about the best of both worlds, that is, Spring and Neo4j. It starts with the philosophy of Spring Data, key concepts, and then explores the possibilities offered by Spring and Spring Data Neo4j with appropriate examples.

Chapter 7, Neo4j Deployment, dives into the principles and strategies for deploying Neo4j in a distributed environment and the recommended deployments for the varied needs of scaling reads, writes, or both. It also talks about the monitoring parameters and APIs available in Neo4j.

Chapter 8, Neo4j Security and Extension, explains various ways to extend and customize Neo4j, either to provide new functionality or to secure our Neo4j deployments, by implementing custom plugins or extensions.

What you need for this book

Readers should have some basic knowledge and understanding of a graph or NoSQL database. It would also be good to have some prior knowledge of Java.

Who this book is for

This book is for expert programmers (especially those experienced in a graph-based or NoSQL-based database) who now want to learn Neo4j at a fast pace.

If you are reading this book, then you probably already have sufficient knowledge of graph databases and will appreciate their contribution to the complex world of relationships. This book provides in-depth details in a fast-paced manner for learning and starting development with Neo4j. We will talk about various aspects of Neo4j: data structure, query, integrations, tools, performance parameters, deployments, and so on.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "The addData() method is annotated with @Transactional, which will inform the Spring framework to start a new transaction while executing this method."

A block of code is set as follows:

@Transactional
public void addData() {
    // Create Movie
    Movie rocky = new Movie();
    rocky.setTitle("Rocky");
    rocky.setYear(1976);
    rocky.setId(rocky.getTitle() + "-" + String.valueOf(rocky.getYear()));
}

Any command-line input or output is written as follows:

mvn install

mvn exec:java -Dexec.cleanupDaemonThreads=false -Dexec.mainClass="org.neo4j.spring.samples.MainClass"

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Select the Local Process option and then select org.neo4j.server.Bootstrapper and click on Connect."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.