Get Started What is a GraphDB? GraphDB vs RDBMS GraphDB vs NOSQL Language Drivers Cypher Cypher Basics Build a Recommendation Engine Cypher Refcard Data Modeling Graph Data Modeling Guidelines Working with Data Neo4j Browser Visualization Importing Data into Neo4j Graph… Learn More »

Goals
This guide is designed to walk you through the graph data modeling lifecycle of Neo4j. You will be introduced to the basic process of designing a graph data model that can answer a wide range of business questions across a variety of domains.
Prerequisites
You should have a firm understanding of the Neo4j’s property graph data model, and its query language, Cypher.
Intermediate

Introduction

Graph data modeling is the process in which a Neo4j user describes an arbitrary domain as a connected graph of nodes and relationships. From this description, a graph data model is designed to answer questions in the form of Cypher queries.

Describing a Domain

Consider the following description of the connection between two people, Sally and John.



Two people, John and Sally, are friends. Both John and Sally have read the book, Graph Databases.


We can take this statement, identify components such as labels, nodes, and relationships, and use these to build our model.

Nodes

The first entity that we’ll identify in our domain are the nodes.



The fundamental units that form a graph are nodes and relationships. In Neo4j, both nodes and relationships can contain properties. Nodes are often used to represent entities, but depending on the domain relationships may be used for that purpose as well. Apart from properties and relationships, nodes can also be labeled with zero or more labels.
docs.neo4j.org
— 3.1 Nodes


Nodes

We can identify nodes as entities with a unique conceptual identity. In this case, these entities are outlined below in bold.



Two people, John and Sally, are friends. Both John and Sally have read the book, Graph Databases.


Extracting the nodes:

  • John
  • Sally
  • Graph Databases

Extract Nodes

Labels

Now that we have an idea of what our nodes will be, we decide what labels (if any) to assign our nodes.



A label is a named graph construct that is used to group nodes into sets. All nodes labeled with the same label belongs to the same set. Many database queries can work with these sets instead of the whole graph, making queries easier to write and more efficient. A node may be labeled with any number of labels, including none, making labels an optional addition to the graph.
docs.neo4j.org
— 3.4 Labels


Labels

First we’ll start by identifying the roles of objects (John, Sally, Graph Databases) mentioned in the statement. We know that this statement is about two different types of objects:



Two people, John and Sally, are friends. Both John and Sally have read the book, Graph Databases.


Extracting the labels:

  • Person
  • Book

Extract labels

Now that we have identified both our nodes and labels, we can assign the labels to the nodes they describe. For John and Sally we apply the role Person. For Graph Databases we apply the role Book.

Apply labels

Relationships

Further, we can identify the interactions between these subjects:

  • John is a friend of Sally
  • Sally is a friend of John
  • John has read Graph Databases
  • Sally has read Graph Databases

Now that we have described the kinds of things in our domain as nodes with their labels, we can connect the nodes together to describe their interactions.

Pairs of nodes labeled Person can be connected to each other by the friend of relationship.

Pairs of nodes labeled Book and Person can be connected to each other by a has read relationship.

Draw the Data Model

Now that we have identified the kinds of relationships that can exist between labels of nodes, we can complete our graph data model.

Graph data model

Answering Questions

We have gone through the process of creating a basic graph data model for the interactions between people and books. We can take this data model further by defining attributes of these entities as key-value properties.

List your Questions

First, start by listing your questions that you want to answer about your data.

  • When did John and Sally become friends?
  • What is the average rating of the book Graph Databases?
  • Who is the author of the book Graph Databases?
  • How old is Sally?
  • How old is John?
  • Who is older, Sally or John?
  • Who read the book Graph Databases first, Sally or John?

From these list of questions, you can identify the attributes that must belong to entities within your data model.

Graph data model with properties

Create a Sample Dataset

Now that we have a complete graph data model for our domain that sufficiently answers our questions, we can go about creating a sample dataset using Cypher.

// Create Sally
CREATE (sally:Person { name: 'Sally', age: 29 })

// Create John
CREATE (john:Person { name: 'John', age: 27 })

// Create Graph Databases book
CREATE (gdb:Book { title: 'Graph Databases',
                   authors: ['Ian Robinson', 'Jim Webber'] })

// Connect Sally and John as friends
CREATE (sally)-[:FRIEND_OF { since: 1357718400 }]->(john)

// Connect Sally to Graph Databases book
CREATE (sally)-[:HAS_READ { rating: 4, on: 1360396800 }]->(gdb)

// Connect John to Graph Databases book
CREATE (john)-[:HAS_READ { rating: 5, on: 1359878400 }]->(gdb)

Translate Questions to Cypher Queries

Now that we have a sample dataset of our graph data model, we can translate our questions from earlier into queries.

When did John and Sally become Friends?
MATCH (sally:Person { name: 'Sally' })
MATCH (john:Person { name: 'John' })
MATCH (sally)-[r:FRIEND_OF]-(john)
RETURN r.since as friends_since
What is the average rating of Graph Databases?
MATCH (gdb:Book { title: 'Graph Databases' })
MATCH (gdb)<-[r:HAS_READ]-()
RETURN avg(r.rating) as average_rating
Who are the authors of Graph Databases?
MATCH (gdb:Book { title: 'Graph Databases' })
RETURN gdb.authors as authors
How old is Sally?
MATCH (sally:Person { name: 'Sally' })
RETURN sally.age as sally_age
How old is John?
MATCH (john:Person { name: 'John' })
RETURN john.age as john_age
Who is older, Sally or John?
MATCH (people:Person)
WHERE people.name = 'John' OR people.name = 'Sally'
RETURN people.name as oldest
ORDER BY people.age DESC
LIMIT 1
Who Read Graph Databases First, Sally or John?
MATCH (people:Person)
WHERE people.name = 'John' OR people.name = 'Sally'
MATCH (people)-[r:HAS_READ]->(gdb:Book { title: 'Graph Databases' })
RETURN people.name as first_reader
ORDER BY r.on
LIMIT 1