Get Started What is a GraphDB? GraphDB vs RDBMS GraphDB vs NOSQL Language Drivers Cypher Cypher Basics Build a Recommendation Engine Cypher Refcard Data Modeling Graph Data Modeling Guidelines Working with Data Neo4j Browser Visualization Importing Data into Neo4j Graph… Learn More »

Goals
Existing data comes in many shapes and sizes, and is commonly stored in a relational database. This guide explores how to export data from a PostgreSQL database into Neo4j.
Prerequisites
You should have a basic understanding of the property graph model and have completed the modeling guide. If you download and install Neo4j server you’ll be able to follow along with the examples.
Beginner

Northwind Introduction

In this guide we’ll be using the NorthWind dataset, a commonly used SQL dataset. Although the NorthWind dataset often used to demonstrate SQL and relational databases, it is graphy enough to be interesting for us.

The following is an entity relationship diagram of the NorthWind dataset:

Northwind diagram

Developing a Graph Model

When deriving a graph model from a relational model, we should keep the following guidelines in mind:

  • A row is a node
  • A table name is a label name

In this data set, the following graph model serves as a first iteration:

northwind graph simple

How does the Graph Model Differ from the Relational Model?
  • There are no nulls. In the relational version, to track the employee hierarchy we have a null entry in the ‘ReportsTo’ column if they don’t report to anybody. In the graph version we just don’t define a relationship.
  • It describes the relationships in more detail e.g. we know that an employee SOLD an order rather than having a foreign key relationship between the Orders and Employees tables. We could also choose to add more meta data about that relationship should we wish.
  • It will often be more normalised. e.g. addresses have been denormalised in several of the tables but in a future version of the graph model we might choose to make it a node in its own right.

Exporting the Data to CSV

Now that we know what we’d like our graph to look like, we need to extract the data from PostgreSQL so we can create it as a graph. The easiest way to do that is to export the appropriate tables in CSV format. The PostgreSQL ‘copy’ command lets us execute a SQL query and write the result to a CSV file:

COPY (SELECT * FROM customers) TO '/tmp/customers.csv' WITH CSV header;
COPY (SELECT * FROM suppliers) TO '/tmp/suppliers.csv' WITH CSV header;
COPY (SELECT * FROM products)  TO '/tmp/products.csv' WITH CSV header;
COPY (SELECT * FROM employees) TO '/tmp/employees.csv' WITH CSV header;
COPY (SELECT * FROM categories) TO '/tmp/categories.csv' WITH CSV header;

COPY (SELECT * FROM orders
      LEFT OUTER JOIN order_details ON order_details."OrderID" = orders."OrderID") TO '/tmp/orders.csv' WITH CSV header;

Importing the Data using Cypher

After we’ve exported our data from PostgreSQL, we’ll use Cypher’s LOAD CSV command to transform the contents of the CSV file into a graph structure.

First, create the nodes:

# Create customers
LOAD CSV WITH HEADERS FROM "file:/tmp/customers.csv" AS row
CREATE (:Customer {CompanyName: row.CompanyName, CustomerID: row.CustomerID, Fax: row.Fax, Phone: row.Phone});

# Create products
LOAD CSV WITH HEADERS FROM "file:/tmp/products.csv" AS row
CREATE (:Product {ProductName: row.ProductName, ProductID: row.ProductID, UnitPrice: row.UnitPrice});

# Create suppliers
LOAD CSV WITH HEADERS FROM "file:/tmp/suppliers.csv" AS row
CREATE (:Supplier {CompanyName: row.CompanyName, SupplierID: row.SupplierID});

# Create employees
LOAD CSV WITH HEADERS FROM "file:/tmp/employees.csv" AS row
CREATE (:Employee {EmployeeID:row.EmployeeID,  FirstName: row.FirstName, LastName: row.LastName, Title: row.Title});

# Create categories
LOAD CSV WITH HEADERS FROM "file:/tmp/categories.csv" AS row
CREATE (:Category {CategoryID: row.CategoryID, CategoryName: row.CategoryName, Description: row.Description});

Next, we’ll create indexes on the just-created nodes to ensure their quick lookup when creating relationships in the next step.

CREATE INDEX ON :Product(ProductID);
CREATE INDEX ON :Category(CategoryID);
CREATE INDEX ON :Employee(EmployeeID);
CREATE INDEX ON :Supplier(SupplierID);
CREATE INDEX ON :Customer(CustomerID);

Initial nodes and indices in place, we can now create relationships orders and their relationships to products and employees:

LOAD CSV WITH HEADERS FROM "file:/tmp/orders.csv" AS row
CREATE (order:Order {OrderID: row.OrderID, ShipName: row.ShipName})

WITH *
MATCH (product:Product {ProductID: row.ProductID})
MATCH (employee:Employee {EmployeeID: row.EmployeeID})

MERGE (employee)-[:SOLD]->(order)
MERGE (order)-[:PRODUCT]->(product);

Next, create relationships between products, suppliers, and categories:

LOAD CSV WITH HEADERS FROM "file:/tmp/products.csv" AS row
MATCH (product:Product {ProductID: row.ProductID})
MATCH (supplier:Supplier {SupplierID: row.SupplierID})
MATCH (category:Category {CategoryID: row.CategoryID})

MERGE (supplier)-[:SUPPLIES]->(product)
MERGE (product)-[:PART_OF]->(category);

Finally we’ll create the ‘REPORTS_TO’ relationship between employees to represent the reporting structure:

LOAD CSV WITH HEADERS FROM "file:/tmp/employees.csv" AS row
MATCH (employee:Employee {EmployeeID: row.EmployeeID})
MATCH (manager:Employee {EmployeeID: row.ReportsTo})
MERGE (employee)-[:REPORTS_TO]->(manager);

For completeness and optimal query speed, create an index on orders:

CREATE INDEX ON :Order(OrderId);

The resulting graph should look like this:

northwind graph sample

We can now query the graph.

Querying the Graph

One question we might be interested in is:

Which Employee had the Highest Cross-Selling Count of ‘Chocolade’ and Which Product?
MATCH (choc:Product {ProductName:'Chocolade'})<-[:PRODUCT]-(:Order)<-[:SOLD]-(employee),
      (employee)-[:SOLD]->()-[:PRODUCT]->(other:Product)
RETURN employee.EmployeeID, other.ProductName, count(*) as count
ORDER BY count DESC
LIMIT 5

Looks like employee number 1 was very busy!

employee.employeeId other.name count

1

Flotemysost

24

1

Gorgonzola Telino

22

1

Pavlova

22

1

Camembert Pierrot

22

1

Ikura

20

We might also like to answer the following question:

How are Employees Organized? Who Reports to Whom?
MATCH path = (e:Employee)<-[:REPORTS_TO]-(sub)
RETURN e.EmployeeID AS manager, sub.EmployeeID AS employee
manager employee

2

1

2

3

2

4

2

5

2

8

5

6

5

7

5

9

Notice that employee #5 has people reporting to them but also reports to employee #2.

Let’s investigate that a bit more:

Which Employees Report to Each Other Indirectly?
MATCH path = (e:Employee)<-[:REPORTS_TO*]-(sub)
WITH e, sub, [person in NODES(path) | person.EmployeeID][1..-1] AS path
RETURN e.EmployeeID AS manager, sub.EmployeeID AS employee, CASE WHEN LENGTH(path) = 0 THEN "Direct Report" ELSE path END AS via
ORDER BY LENGTH(path)
e.EmployeeID sub.EmployeeID via

2

1

Direct Report

2

3

Direct Report

2

4

Direct Report

2

5

Direct Report

2

8

Direct Report

5

6

Direct Report

5

7

Direct Report

5

9

Direct Report

2

6

["5"]

2

7

["5"]

2

9

["5"]

How Many Orders were Made by Each Part of the Hierachy?
MATCH (e:Employee)
OPTIONAL MATCH (e)<-[:REPORTS_TO*0..]-(sub)-[:SOLD]->()
RETURN e.EmployeeID, [x IN COLLECT(DISTINCT sub.EmployeeID) WHERE x <> e.EmployeeID] AS reports, COUNT(*) AS totalOrders
ORDER BY totalOrders DESC
e.EmployeeID reports totalOrders

2

["1"

"3"

"4"

"5"

"6"

"7"

"9"

"8"]

2155

5

["6"

"7"

"9"]

568

4

[]

420

1

[]

345

3

[]

321

8

[]

260

7

[]

176

6

[]

168

9

[]

107