Online Course Introduction to Neo4j 4.0 Neo4j is a Graph Database The Neo4j Graph Platform Introduction to Cypher Using WHERE to Filter Queries Working with Patterns in Queries Working with Cypher Data Controlling the Query Chain Controlling Results Returned Creating… Read more →

Neo4j is a Graph Database

About this module

Neo4j enables developers to create applications that are best architected as graph-powered systems that are built upon the rich connectedness of data.

At the end of this module, you should be able to:

  • Describe what a graph is.
  • Describe what a graph database is.
  • Describe some common use cases for using a graph database.
  • Describe the Neo4j property graph model.
  • Whiteboard a graph data model.

What is a graph?

A graph is a collection of objects, each of which has links with the other objects.

Graphs are a useful way to model data for the purposes of mathematical analysis, and they have been around for quite a long time. The first use of graphs was by Leonhard Euler in 1735; he used graph theory to prove that the famous Seven Bridges of Konigsberg problem had no solution.

SevenBridges

Konigsberg, Prussia (now Kaliningrad, Russia) is a city spanning the Pregel River. It includes two large islands, which are connected to one another and to the river banks by seven bridges. A map of this arrangement is pictured here.

The Seven Bridges problem is this: Is there any possible route through the city that crosses every bridge exactly once? Some pen-and-paper experimentation will quickly suggest that there is no such route. However, PROVING that there is no solution is much more difficult; Euler invented Graph Theory as a tool to aid this proof. The graph representation of the problem is on the right: four land masses, each represented by a node, and seven bridges, each represented by a relationship.

Euler observed the following:

  • The route taken within each landmass is irrelevant. Only the sequence of bridges crossed matters.
  • Whenever you enter a land mass via a bridge, you must leave it by a different bridge.
    • …​except at the start (where you leave but do not enter) and at the end (where you enter but do not leave). *Thus, for the problem to be solvable, every land mass (except the start and end) must be touched by an even number of bridges.

Based on the above observations, the problem has no solution, because all four of the land masses are touched by an odd number of bridges.

Anything can be a graph

If graphs are an abstract mathematical tool, why do we care about them?

We care because any dataset can be represented as a graph. Any structure or concept, no matter how simple or complex, can be broken down into a set constituent parts that have some relationship to one another.

For example, the Internet can be modeled as a network of computers.

TheInternetAsAGraph

At the other extreme of complexity, a single molecule can be represented as a graph of its constituent atoms—​or an atom represented as a graph of subatomic particles.

AWaterMoleculeAsAGraph

And, as Euler showed, there are some clear benefits to representing data in this way. We will discuss many of these as we go. But mainly, graphs are uniquely useful when answering a question that involves following a path along a chain of related items.

You will frequently hear Neo4j people refer to such questions as graph problems—​and they are surprisingly common in the real world.

Graph concepts – nodes and relationships

We will begin by defining the components that make up a graph in Neo4j. Note that these are the Neo4j terms for these objects; the original mathematical terms are Vertex and Edge. We’ve chosen our nomenclature because it more intuitively describes what the objects represent.

Here we see two nodes. These two nodes are related to each other which is why we have a line or link connecting them.

GraphComponents

Nodes

First, let’s define a node in Neo4j. Nodes are the main objects of a graph.

We say this not because nodes contain more data than relationships—​although they usually do—​but because it is perfectly acceptable to have nodes without any relationships. Conversely, is is not possible to have a relationship without any nodes.

Here we see that each node has a different color, perhaps to differentiate its type from another node. For example, the blue node represents a person and that person node has a name of Jane. The green node represents a vehicle and that vehicle node has a type of car.

GraphConcepts-Nodes

Mathematically, a node is defined as “a point where two or more edges (relationships) meet.” This is not at all intuitive! However, it provides important insight into how graphs are used. Since the entire point of modeling data as a graph is to traverse a chain of linked nodes, one useful way of thinking about nodes is that they are waypoints along the traversal route. They contain information and you need to decide which links are good ones to follow, and which should be ignored. Relationships are those links.

Relationships

Next, let’s define a relationship in Neo4j.

Relationships are the links between nodes. In their simplest form, relationships need not contain any information other than which two nodes they link. But they can (and in Neo4j always do) contain much more.

GraphConcepts-Relationships

For example, relationships can include a direction. “Jane owns a car” is a very different relationship than “A car owns Jane”, and the direction of the relationship between Jane and her car captures this. Relationship directions are required in Neo4j.

In addition, relationships can contain any amount of additional data, such as weight or relationship type. For example, the relationship between Jane and her car is of the type OWNS. Like direction, type is a required element of Neo4j relationships, although all other data is optional.

In Neo4j, the purpose of the data attached to a relationship is to reduce the need to “gather and inspect”, where you follow all relationships from a given node, then scan the target node data to determine which hops were “good” ones and then discard the rest. This does not scale well. Instead, we can use the data attached to a relationship to preemptively filter which links to follow during traversal.

Traversing a graph

As mentioned before, the main point of graphs is to follow a chain of nodes and relationships; the process of finding such routes is the definition of traversal. There are many ways a graph can be traversed and there are graph algorithms that perform traversals of all types from random, to shortest path, to longest path.

In Neo4j traversal is optimized since that is how one performs queries in the graph.

During traversal of a Neo4j graph, nodes may be visited more than once, but a relationship is only traversed once. This is called a path in Neo4j.

GraphConcepts-Traversal

Relationships have direction, as well as types so you can identify which relationship you want to traverse.

In this example, although the “2” node is visited multiple times, no relationship between the nodes is traversed more than once.

What is a graph database?

GraphDatabaseInEnterprise

A graph database is an online database management system with Create, Read, Update, and Delete (CRUD) operations working on a graph data model.

Graph databases are often used as part of an Enterprise online transaction processing (OLTP) system. Accordingly, they are normally optimized for transactional performance, and engineered with transactional integrity and operational availability in mind.

Unlike other databases, relationships take first priority in graph databases. This means your application doesn’t have to infer data connections using foreign keys or out-of-band processing, such as MapReduce. By assembling the simple abstractions of nodes and relationships into connected structures, graph databases enable us to build sophisticated models that map closely to our problem domain. Data scientists can use the relationships in a data model to help enterprises make important business decisions.

Neo4j is a native graph database

In this video, you will see the benefits of using the native graph database, Neo4j.

Querying a graph is intuitive in Cypher

If you have ever tried to write a SQL statement with a large number of JOINs, you know that you quickly lose sight of what the query actually does, due to the number of tables that are required to connect the data.

For example, in an organizational domain, here is what a SQL statement that lists the employees in the IT Department would look like:

SELECT name FROM Person
LEFT JOIN Person_Department
  ON Person.Id = Person_Department.PersonId
LEFT JOIN Department
  ON Department.Id = Person_Department.DepartmentId
WHERE Department.name = "IT Department"

In contrast, Cypher syntax stays clean and focused on domain concepts since queries are expressed visually:

MATCH (p:Person)<-[:EMPLOYEE]-(d:Department)
WHERE d.name = "IT Department"
RETURN p.name

Use cases for graph databases

The biggest value that graphs bring to an application is their ability to store relationships and connections as first-class entities.

For instance, the early adopters of graph technology built their businesses around the value of data relationships. These companies have now become industry leaders: LinkedIn, Google, Facebook, and PayPal.

There are many uses for graph databases in an enterprise. Here are some of them:

UseCases

You can read more about how customers are using Neo4j here.

The Neo4j property graph model

A graph is composed of two elements: nodes and relationships. Neo4j allows you to identify nodes and relationships and add properties to them to further define the data in the graph.

Nodes

Each node represents an entity (a person, place, thing, category or other piece of data). With Neo4j, nodes can have labels that are used to define types for nodes. For example, a Location node is a node with the label Location. That same node can also have a label, Residence. Another Location node can also have a label, Business. A label can be used to group nodes of the same type. For example, you may want to retrieve all of the Business nodes.

Nodes

Relationships

Each relationship represents how two nodes are connected. For example, the two nodes Person and Location, might have the relationship LIVES_AT pointing from a Person node to Location node. A relationship represents the verb or action between two entities. The MARRIED relationship is defined from one Person node to another Person node. Although the relationship is created as directional, it can be queried in a non-directional manner. That is, you can query if two Person nodes have a MARRIED relationship, regardless of the direction of the relationship. For some data models, the direction of the relationship is significant. For example, in Facebook, using the IS_FRIEND relationship is used to indicate which Person invited the other Person to be a friend.

Relationships

Property graph model

The Neo4j database is a property graph. You can add properties to nodes and relationships to further enrich the graph model.

Properties

This enables you to closely align data and connections in the graph to your real-world application. For example, a Person node might have a property, name and a Location node might have a property, address. In addition, a relationship, MARRIED , might have a property, since.

Training data model

For this training, we will use a Movies graph to get you started with accessing the data in a Neo4j database.

The data model is very simple, it includes entities such as:

  • Movie
  • Person/Actor/Director

that are implemented as nodes in the graph.

People are related to Movies using these relationships:

  • DIRECTED
  • ACTED_IN
  • WROTE
  • REVIEWED

People, who have reviewed movies, are related to each other using the FOLLOWS relationships.

Initial whiteboard the data model

You can use a whiteboard to better understand the graph data model. Stakeholders for an application typically communicate their ideas about how the data will be modeled by drawing bubbles for the entities and directed arrows for the relationships between the entities.

Whiteboard1

Here we see that people such as Tom Hanks and Hugo Weaving have acted in the movie, Cloud Atlas.

Implementation of the graph data model

Whiteboard2

Once the graph data model has been agreed-upon, developers then create the nodes with the relationships. When nodes and relationships are added to the graph, additional properties are set for nodes and relationships using the property graph modeling capabilities of Neo4j. Some properties for nodes should be unique such as the name of a person or the title of a movie.

Here is the partial implementation of this graph data model. Notice that some nodes not only have a Person label, but also have a Actor or Director label on them. A Person node has a name property and some Person nodes have a born property. A Movie node has title and released properties. The ACTED_IN relationship has a roles property.

As you learn Cypher in this course, you will explore all of these node and relationship types in the graph.

Exploring graphgists

As a developer just learning Neo4j and Cypher, you will use the very simple Movies graph data model. If, however, you are curious about how other developers have created graph data models for Neo4j, you can explore the graphgists page.

Graphgists1

Graphgists are developed by a wide range of developers from Neo4j and from the Neo4j user community. You can explore this site to see how other applications have modeled their data and get an appreciation for what can be done in Neo4j.

Example graphgist

For example, here is the Exploring a Conference graphgist that has a description of the model, how to load the sample data, and how to query the data.

Graphgists2

As you gain experience with Cypher and think about modeling your application, you should check out some of these graphgists for ideas for how to model your data. We recommend that you take our training course, Graph Data Modeling for Neo4j, which introduces you to the best practices for modeling data and implementing graph data models in Neo4j.

Check your understanding

Question 1

What elements make up a graph?

Select the correct answers.

  • tuples
  • nodes
  • documents
  • relationships

Question 2

From a data scientist perspective, which element is key for analyzing the data in the graph?

Select the correct answer.

  • node
  • property
  • relationship
  • join

Question 3

In Neo4j, you can attach properties to the following elements in the graph?

Select the correct answers.

  • label
  • node
  • relationship
  • direction

Summary

You should now be able to:

  • Describe what a graph is.
  • Describe what a graph database is.
  • Describe some common use cases for using a graph database.
  • Describe the Neo4j property graph model.
  • Whiteboard a graph data model.

Stay Connected

Sign up to find out more about Neo4j's upcoming events & meetups.