# Introduction to Neo4j

Neo4j enables developers to create applications that are best architected as graph-powered systems that are built upon the rich connectedness of data.

At the end of this module, you will be able to:

• Describe graph components.

• Describe what a graph database is.

• Describe the Neo4j Graph Platform:

• Features of the Neo4j DBMS.

• Tools used by developers.

• Libraries, drivers, integrations used with Neo4j.

## Graph concepts

We will begin by defining the components that make up a graph in Neo4j.

Here we see two nodes. These two nodes are related to each other which is why we have a line or link connecting them.

### Nodes

First, let’s define a node in Neo4j. Nodes are the main objects of a graph.

We say this not because nodes contain more data than relationships—​although they usually do—​but because it is perfectly acceptable to have nodes without any relationships. Conversely, it is not possible to have a relationship without any nodes.

Here we see that each node has a different color, perhaps to differentiate its type from another node. For example, the blue node represents a person and that person node has a name of Jane. The green node represents a vehicle and that vehicle node has a type of car.

Mathematically, a node is defined as “a point where two or more edges (relationships) meet.” This is not at all intuitive! However, it provides important insight into how graphs are used. Since the entire point of modeling data as a graph is to traverse a chain of linked nodes, one useful way of thinking about nodes is that they are waypoints along the traversal route. They contain information you need to decide which links are good ones to follow, and which can be ignored. Relationships are those links.

### Relationships

Next, let’s define a relationship in Neo4j.

Relationships are the links between nodes. In their simplest form, relationships need not contain any information other than which two nodes they link. But they can (and in Neo4j always do) contain much more.

For example, relationships can include a direction. "Jane owns a car" is a very different relationship than "A car owns Jane", and the direction of the relationship between Jane and her car captures this. Relationship directions are required in Neo4j.

In addition, relationships can contain any amount of additional data, such as weight or relationship type. For example, the relationship between Jane and her car is of the type OWNS. Like direction, type is a required element of Neo4j relationships, although all other data is optional.

In Neo4j, the purpose of the data attached to a relationship is to reduce the need to "gather and inspect", where you follow all relationships from a given node, then scan the target node data to determine which hops were “good” ones and then discard the rest. This does not scale well. Instead, we can use the data attached to a relationship to preemptively filter which links to follow during traversal.

## The Neo4j property graph model

A Neo4j database (graph) is composed of two elements: nodes and relationships. Neo4j allows you to identify nodes and relationships and add properties to them to further define the data in the graph.

### Nodes

Each node represents an entity (a person, place, thing, category or other piece of data). With Neo4j, nodes can have labels that are used to define types for nodes. For example, a Location node is a node with the label Location. That same node can also have a label, Residence. Another Location node can also have a label, Business. A label can be used to group nodes of the same type. For example, you may want to retrieve all of the Business nodes.

### Relationships

Each relationship represents how two nodes are connected. For example, the two nodes Person and Location, might have the relationship LIVES_AT pointing from a Person node to Location node. A relationship represents the verb or action between two entities. The MARRIED relationship is defined from one Person node to another Person node. Although the relationship is created as directional, it can be queried in a non-directional manner. That is, you can query if two Person nodes have a MARRIED relationship, regardless of the direction of the relationship. For some data models, the direction of the relationship is significant. For example, in Facebook, using the IS_FRIEND relationship is used to indicate which Person invited the other Person to be a friend.

### Property graph model

The Neo4j database is a property graph. You can add properties to nodes and relationships to further enrich the graph model.

This enables you to closely align data and connections in the graph to your real-world application. For example, a Person node might have a property, name and a Location node might have a property, address. In addition, a relationship, MARRIED , might have a property, since.

## What is a graph database?

A graph database is an online database management system with Create, Read, Update, and Delete (CRUD) operations working on a graph data model.

Graph databases are often used as part of an Enterprise online transaction processing (OLTP) system. Accordingly, they are normally optimized for transactional performance, and engineered with transactional integrity and operational availability in mind.

Unlike other databases, relationships take first priority in graph databases. This means your application doesn’t have to infer data connections using foreign keys or out-of-band processing, such as MapReduce. By assembling the simple abstractions of nodes and relationships into connected structures, graph databases enable us to build sophisticated models that map closely to our problem domain. Data scientists can use the relationships in a data model to help enterprises make important business decisions.

### Neo4j is a native graph database

In this video, you will see the benefits of using the native graph database, Neo4j.

### Querying a graph is intuitive in Cypher

If you have ever tried to write a SQL statement with a large number of JOINs, you know that you quickly lose sight of what the query actually does, due to all tables that are required to connect the data.

For example, in an organizational domain, here is what a SQL statement that lists the employees in the IT Department would look like:

``````SELECT name FROM Person
LEFT JOIN Person_Department
ON Person.Id = Person_Department.PersonId
LEFT JOIN Department
ON Department.Id = Person_Department.DepartmentId
WHERE Department.name = "IT Department"``````

Cypher is the language for accessing data in the Neo4j database. In contrast to SQL, Cypher syntax stays clean and focused on domain concepts since queries are expressed visually:

``````MATCH (p:Person)<-[:EMPLOYEE]-(d:Department)
WHERE d.name = "IT Department"
RETURN p.name``````

## Neo4j Graph Platform

The Neo4j Graph Platform includes components that enable developers to create graph-enabled applications.

It is used by developers, administrators, data analysts, and data scientists to access application data. Developers create the data in the graph by either importing it into the graph or using the Cypher language to implement the data model. In addition, developers are responsible for integrating the graph with other systems and DBMS installations,. Admins manage the processes and files related to the Neo4j installation. Data scientists and data analysts typically use a combination Cypher queries, as well as use tools to analyze the data. End-users typically use applications written by developers to access the graph data.

### Components of the Neo4j Graph Platform

To better understand the Neo4j Graph Platform, you will learn about these components and the benefits they provide.

• Neo4j DBMS

• Neo4j Aura

• Neo4j Sandbox

• Neo4j Desktop

• Neo4j graph applications

• Neo4j Browser

• Neo4j Bloom

• cypher-shell

• Neo4j libraries

• Neo4j drivers

• Neo4j integration

• Neo4j administration tools

## Neo4j DBMS

The heart of the Neo4j Graph Platform is the Neo4j Database Management system (DBMS). The Neo4j DBMS includes processes and resources needed to manage a single Neo4j instance or a set of Neo4j instances that form a cluster. A Neo4j instance is a single process that runs the Neo4j server code. A Neo4j instance at a minimum contains two databases, the system database and the default database, neo4j.

The system database stores metadata about the databases for the installation, as well as security configuration. The default database (named neo4j by default) is the "user" database where you typically implement your graph data model.

## Neo4j 4.x DBMS

In Neo4j Enterprise Edition 4.0, you may have more than one "user" database.

Here we have three "user" databases that hold the application data. You specify one of the databases as the default database.

Next, you will learn about some features of Neo4j DBMS that make it different from traditional RDBMS.

One of key features that makes Neo4j graph databases different from an RDBMS is that Neo4j implements index-free adjacency.

To better understand the benefit of index-free adjacency, let’s look at how a query executes in an RDBMS. Suppose you have this table in the RDBMS:

You execute this SQL query to find the third-degree parents of the group with the ID of 3:

``````SELECT PARENT_ID
FROM GROUPS
WHERE ID = (SELECT PARENT_ID
FROM GROUPS
WHERE ID = (SELECT PARENT_ID
FROM GROUPS
WHERE ID = 3))``````

The result of this query is 1, but in order to determine this result, the query engine needed to:

1. Locate the innermost clause.

2. Build the query plan for the subclause.

3. Execute the query plan for the subclause.

4. Locate the next innermost clause.

5. Repeat Steps 2-4.

Resulting in:

• 3 planning cycles

• 3 index lookups

• 3 DB reads

#### Neo4j DBMS: Index-free adjacency (IFA)

With index-free adjacency, Neo4j stores nodes and relationships as objects that are linked to each other. Conceptually, the graph looks as follows:

These nodes and relationships are stored as follows:

Suppose we had this query in Cypher:

``````MATCH (n) <-- (:Group) <-- (:Group) <-- (:Group {id: 3})
RETURN n.id``````

Using IFA, the Neo4j query engine starts with the anchor of the query which is the Group node with the id of 3. Then it uses the links stored in the relationship and node objects to traverse the graph pattern.

To perform this query, the Neo4j query engine needed to:

• Plan the query based upon the anchor specified.

• Use an index to retrieve the anchor node.

• Follow pointers to retrieve the desired result node.

Using IFA, there are:

• Much fewer index lookups or table scans

• Reduced duplication of foreign keys

### Neo4j DBMS: ACID (Atomic, Consistent, Isolated, Durable)

Transactionality is very important for robust applications that require an ACID (atomicity, consistency, isolation, and durability) guarantees for their data. If a relationship between nodes is created, not only is the relationship created, but the nodes are updated as connected. All of these updates to the database must all succeed or fail.

### Neo4j DBMS: Clusters

Neo4j supports clusters that provide high availability, scalability for read access to the data, and failover which is important to many enterprises. Neo4j clusters are only available with Neo4j Enterprise Edition.

### Neo4j DBMS: Graph engine

The Neo4j graph engine is used to interpret Cypher statements and also executes kernel-level code to store and retrieve data, whether it is on disk, or cached in memory. The graph engine has been improved with every release of Neo4j to provide the most efficient access to an application’s graph data. There are many ways that you can tune the performance of the engine to suit your particular application needs.

## Neo4j Aura

Neo4j Aura is the simplest way to run the Neo4j DBMS in the cloud.

Completely automated and fully-managed, Neo4j Aura delivers the world’s most flexible, reliable and developer-friendly graph database as a service. With Neo4j Aura, you leave the day-to-day management of your database to the same engineers who built Neo4j, freeing you to focus on building rich graph-powered applications. Backups are done automatically for you and the database is available 24X7. In addition, the Neo4j support engineers will ensure that the database instance is always up-to-date with the latest version of Neo4j. To use Neo4j Aura, you must pay a monthly subscription fee which is based upon the size of your graph.

Once you create a Neo4j Database at the Neo4j Aura site, it will be managed by Neo4j support engineers.

## Neo4j Sandbox

The Neo4j Sandbox is way that developers can test data models with Neo4j without needing to install anything. It is a free, temporary, and cloud-based instance of a Neo4j Server with its associated graph that you can access from any Web browser. The database in a Sandbox may be blank or it may be pre-populated. It is started automatically for you when you create the Sandbox.

By default, the Neo4j Sandbox is available for three days, but you can extend it for up to 10 days. If you do not want to install Neo4j Desktop on your system, consider creating a Neo4j Sandbox. You must make sure that you extend your lease of the Sandbox, otherwise you will lose your graph and any saved Cypher scripts you have created in the Sandbox. However, you can use Neo4j Browser Sync to save Cypher scripts from your Sandbox. We recommend you use the Neo4j Desktop or Neo4j Aura for a real development project. The Sandbox is intended as a temporary environment or for learning about the features of Neo4j as well as specific graph use-cases.

## Neo4j Desktop

Neo4j Desktop is intended for developers who want to develop a Neo4j application and test it on their local machine. It is free to use. Neo4j is a UI that enables developers to create projects, each with their own Neo4j DBMS instances where they can easily add or remove graph applications and libraries for use with their own Neo4j DBMS. It includes an application called Neo4j Browser which is the UI used to access the started database using Cypher queries.

The Neo4j Desktop runs on OS X, Linux, and Windows. You can download it from our download page.

## Neo4j graph applications

Graph applications provide specific functionality to users that make their roles as developers, administrators, data scientists, or data analysts easier. Some of them are Web browser-based and some run in their own JVM. Graph applications are written by Neo4j engineers or Neo4j community members. Many of the graph applications supported by Neo4j are the work of Neo4j Labs. Some graph applications are supported by Neo4j and some are not, so you must be aware of the type of technical support you can receive for a particular graph application. You typically install graph applications from a Neo4j Desktop environment from https://install.graphapp.io/.

### Some Neo4j graph applications

Here are some of the graph applications that users can use:

• Neo4j Browser: UI for testing Cypher queries and visualizing the graph.

• Neo4j Bloom: A tool for exploring graphs and generating Cypher code.

• Neo4j ETL Tool: UI for connecting to a data source to import into the graph.

• Halin: Monitor your Neo4j DBMS.

• Query Log Analyzer: Analyze queries that executed on your system.

• Neo4j Cloud Tool: Tools for working with Neo4j Aura

• Graph Algos Playground: Run graph algorithms and generate code for them.

## Neo4j Browser

Neo4j Browser is a Neo4j-supported tool that enables you to access a Neo4j Database by executing Cypher statements to create or update data in the graph and to query the graph to return data. The data returned is typically visualized as nodes and relationships in a graph, but can also be displayed as tables. In addition to executing Cypher statements, you can execute a number of system calls that are related to the database being accessed by the Browser. For example, you can retrieve the list of queries that are currently running in the server.

### Using Neo4j Browser

There are two ways that you can use Neo4j Browser functionality:

1. UI started by Neo4j Desktop.

2. Web browser interface.

## Neo4j Bloom

Neo4j Bloom is a Neo4j-supported graph application where you can experience:

• Visual presentation of your graph data tangibly reveals non-obvious connections.

• Easy-to-understand visualizations explain data connectedness to every colleague.

• Codeless search tools let you quickly explore your data without technical expertise.

• Browsing tools make it easy for you to discover new insights from your data.

Visit the Bloom page to learn more about Neo4j Bloom.

You can use Bloom from Neo4j Desktop to access a local database. Another way that you can try Neo4j Bloom is to create a Neo4j Bloom Sandbox that you can use for up to 10 days.

## cypher-shell

`cypher-shell` is part of the Neo4j installation and is located in the bin directory. It is a command-line tool that you can use to connect to a Neo4j DBMS instance and run Cypher statements against the database.

It is useful if you want to create scripts that automatically run against the database(s). It is commonly used for advanced query tuning and for administration.

Even if you have not installed Neo4j, you can download and install cypher-shell as a stand-alone application if you want to connect to a running database and execute Cypher queries.

## Libraries

Just as there are graph applications written by Neo4j engineers and Neo4j community members, there are libraries that can be incorporated into your application. A library is also called a plug-in as it is used to extend what can be done in Cypher. Some libraries are available in Neo4j Desktop, while others you must download and install.

In early 2020, some functions and procedures from the Graph Algorithms Library will be officially supported by Neo4j as the Graph Data Science Library (GDSL). Before that, support for this library has come from and will continue to come from Neo4j Labs for the algorithms that are not officially supported by Neo4j.

One of the most popular libraries that is used by most developers is Awesome Procedures of Cypher (APOC). This library has close to 500 procedures and functions that extend Cypher is ways that make programming in Cypher much easier for complex tasks.

Another library that also comes with Neo4j Desktop is GraphQL. GraphQL is an open-source query language for querying parts of a graph. It is not as flexible or powerful as Cypher, but it is used by some applications.

A very popular library for graph visualization is neoviz.js, another project in Neo4j Labs.

## Drivers and language support

Because Neo4j is open source, one can delve into the details of how the Neo4j Database is accessed, but most developers simply use Neo4j without needing a deeper understanding of the underlying code. Neo4j provides a full stack that implements all levels of access to the database and clustering layer where developers can use our published APIs. The language used for querying the Neo4j database is Cypher, an open source language.

In addition, Neo4j supports Java, JavaScript, Python, C#, and Go drivers out of the box that use Neo4j’s bolt protocol for binary access to the database layer. Bolt is an efficient binary protocol that compresses data sent over the wire as well as encrypting the data. For example, one can write a Java application that uses the Bolt driver to access the Neo4j database, and the application may use other packages that allow data integration between Neo4j and other data stores or uses as common framework such as spring. You download drivers from the Neo4j driver download page.

It is also possible to develop server-side extensions in Java that access the data in the database directly without using Cypher. The Neo4j community has developed drivers for a number of languages including Ruby, PHP, and R.

Developers can also extend the functionality of Neo4j by creating user defined functions and procedures that are callable from Cypher.

## Neo4j integration

Neo4j has integrations with many systems in the IOT ecosystem. Neo4j can be part of a system that uses GRANDstack, Kettle, Docker, and many others. How your organization integrates Neo4j into a larger system will depend on how you intend to use Neo4j. Neo4j engineers and Community members have worked through some of the challenges of integration and their discussions and work can be found on the Neo4j User slack channel, the Neo4j online forum, stack overflow, and on Github.

One Neo4j-supported integration that developers can download enables data to be streamed to/from Kafka.

Developers and administrators use command-line tools for managing the Neo4j DBMS. The main tools used that are part of the Neo4j installation (located in the bin directory) include:

• neo4j used to start, stop and retrieve the status of the Neo4j DBMS instance.

• neo4j-admin used to create, copy, remove, backup, restore and perform other administrative tasks.

• cypher-shell used to create, start, stop, drop databases and indexes and to and retrieve data from a user database.

## Check your understanding

### Question 1

What are some of the benefits provided by the Neo4j DBMS?

Select the correct answers.

• Clustering

• ACID

• Optimized graph engine

### Question 2

As an administrator, what language do you use to retrieve data from a Neo4j database?

Select the correct answer.

• Cypher

• SQL

• APOC

• GraphQL

### Question 3

What are some of the language drivers that come with Neo4j out-of-the-box that developers might use?

Select the correct answers.

• Java

• Ruby

• Python

• JavaScript

## Summary

You can now:

• Describe graph components.

• Describe what a graph database is.

• Describe the Neo4j Graph Platform:

• Features of the Neo4j DBMS.

• Tools used by developers.

• Libraries, drivers, integrations used with Neo4j.