Setting up and using a composite database

Composite databases allow queries that access multiple graphs at once. This is a function that enables:

Data Federation: the ability to access data available in distributed sources in the form of disjoint graphs.
Data Sharding: the ability to access data available in distributed sources in the form of a common graph partitioned on multiple databases.

In this tutorial, you will learn how to:

Model your data for Composite database use
Create databases for the composite
Import data to your databases
Configure a Composite database
Retrieve data with a single Cypher query

Model your data for Composite database use

The example data in this tutorial is based on the Northwind dataset, created by Microsoft. It contains the sales data of a fictitious small company called “Northwind Traders”. The data includes customers, products, customer orders, warehouse stock, shipping, suppliers, employees, and sales territories.

For more information on how Northwind (a relational dataset) is modeled into a graph, run :guide northwind-graph in Neo4j Browser to play the built-in guide Northwind Graph. See the Neo4j Browser documentation.

The Northwind graph model consists of the following data:

Node labels
- :Product
- :Category
- :Supplier
- :Order
- :Customer
Relationship types
- :SUPPLIES
- :PART_OF
- :ORDERS
- :PURCHASED

Figure 1. The Northwind data model

In this scenario, assume that data privacy constraints require customers’ data to be stored in their original region. For simplicity, there are two regions: the Americas (AME) and Europe (EU). The first step is to remodel the Northwind dataset, so that customer data can be separated from the Product catalog, which has no privacy constraints. You create two graphs: one for the Product catalog, which includes :Product, :Category, :Supplier, :PART_OF, :SUPPLIES, and one partitioned graph in two databases for the Customer orders in EU and AME, with :Product, :Order, :Customer, :PURCHASED, and :ORDERS.

Figure 2. The new data model

Data Federation

This way, the Product and Customer data are in two disjoint graphs, with different labels and relationship types. This is called Data Federation.

To query across them, you have to federate the graphs, because relationships cannot span across them. This is done by using a proxy node modeling pattern: nodes with the :Product label must be present in both federated domains.

In the Product catalog graph, nodes with the :Product label contain all the data related to a product, while in the Customer graphs, the same label is associated to a proxy node which only contains productID. The productID property allows you to link data across the graphs in this federation.

Figure 3. Data Federation

Data Sharding

Since the Customer data is for two regions (EU and AME), you have to partition it into two databases. The resulting two graphs have the same model (same labels, same relationship types), but different data. This is called Data Sharding.

Figure 4. Data Sharding

In general, there are a couple of main use cases that require sharding. The most common is scalability, i.e. different shards can be deployed on different servers, splitting the load on different resources. Another reason could be data regulations: different shards can be deployed on servers, residing in different locations, and managed independently.

Create databases for the composite

For this tutorial, you will create the following databases:

db0 for the Product catalog.
db1 for the EU customer data.
db2 for the AME customers.

Start the Neo4j DBMS.
```
bin/neo4j start
```

Check all available databases.

ls -al /data/databases/

total 0
drwxr-xr-x@  5 username  staff   160  9 Jun 12:53 .
drwxr-xr-x@  5 username  staff   160  9 Jun 12:53 ..
drwxr-xr-x  37 username  staff  1184  9 Jun 12:53 neo4j
-rw-r--r--   1 username  staff     0  9 Jun 12:53 store_lock
drwxr-xr-x  38 username  staff  1216  9 Jun 12:53 system

Connect to the Neo4j DBMS using cypher-shell with the default credentials and change the password when prompted:

bin/cypher-shell -u neo4j -p neo4j

Password change required
new password: ********
Connected to Neo4j 2025.01 at neo4j://localhost:7687 as user neo4j.
Type :help for a list of available commands or :exit to exit the shell.
Note that Cypher queries must end with a semicolon.

For more information about the Cypher Shell command-line interface (CLI) and how to use it, see Cypher Shell.

Run the command SHOW DATABASES to list all available databases:

SHOW DATABASES;

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| name     | type       | aliases | access       | address          | role      | writer | requestedStatus | currentStatus | statusMessage | default | home  | constituents |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "neo4j"  | "standard" | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | TRUE    | TRUE  | []           |
| "system" | "system"   | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []           |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows available after 102 ms, consumed after another 11 ms

Run the command CREATE DATABASE <database-name> to create the databases:

CREATE DATABASE db0;

0 rows available after 137 ms, consumed after another 0 ms

CREATE DATABASE db1;

0 rows available after 141 ms, consumed after another 0 ms

CREATE DATABASE db2;

0 rows available after 135 ms, consumed after another 0 ms

Run the command SHOW DATABASES again to verify that the new databases have been created and are online:

SHOW DATABASES;

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| name     | type       | aliases | access       | address          | role      | writer | requestedStatus | currentStatus | statusMessage | default | home  | constituents |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "db0"    | "standard" | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []           |
| "db1"    | "standard" | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []           |
| "db2"    | "standard" | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []           |
| "neo4j"  | "standard" | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | TRUE    | TRUE  | []           |
| "system" | "system"   | []      | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []           |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
5 rows available after 8 ms, consumed after another 7 ms

Import data to your databases

You can use the command LOAD CSV WITH HEADERS FROM to import data to the databases.

Load the Product catalog in db0

Run the following Cypher query to change the active database to db0, and add the Product data:

:use db0;

LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/products.csv" AS row
CREATE (n:Product)
SET n = row,
n.unitPrice = toFloat(row.unitPrice),
n.unitsInStock = toInteger(row.unitsInStock), n.unitsOnOrder = toInteger(row.unitsOnOrder),
n.reorderLevel = toInteger(row.reorderLevel), n.discontinued = (row.discontinued <> "0");

LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/categories.csv" AS row
CREATE (n:Category)
SET n = row;

LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/suppliers.csv" AS row
CREATE (n:Supplier)
SET n = row;

CREATE INDEX FOR (p:Product) ON (p.productID);
CREATE INDEX FOR (c:Category) ON (c.categoryID);
CREATE INDEX FOR (s:Supplier) ON (s.supplierID);

MATCH (p:Product),(c:Category)
WHERE p.categoryID = c.categoryID
CREATE (p)-[:PART_OF]->(c);

MATCH (p:Product),(s:Supplier)
WHERE p.supplierID = s.supplierID
CREATE (s)-[:SUPPLIES]->(p);

Press Enter.

Verify that the product data is loaded in db0:

MATCH (s:Supplier)-[:SUPPLIES]->(p:Product)-[:PART_OF]->(c:Category)
RETURN s.companyName AS Supplier, p.productName AS Product, c.categoryName AS Category
LIMIT 5;

+--------------------------------------------------------------------------+
| Supplier                     | Product                     | Category    |
+--------------------------------------------------------------------------+
| "Bigfoot Breweries"          | "Sasquatch Ale"             | "Beverages" |
| "Pavlova"                    | "Outback Lager"             | "Beverages" |
| "Bigfoot Breweries"          | "Laughing Lumberjack Lager" | "Beverages" |
| "Bigfoot Breweries"          | "Steeleye Stout"            | "Beverages" |
| "Aux joyeux ecclésiastiques" | "Côte de Blaye"             | "Beverages" |
+--------------------------------------------------------------------------+

5 rows available after 202 ms, consumed after another 5 ms

Load EU customers and related orders in db1

Run the following Cypher query to change the active database to db1, and add the EU customers and orders:

:use db1;

:param europe => ['Germany', 'UK', 'Sweden', 'France', 'Spain', 'Switzerland', 'Austria', 'Italy', 'Portugal', 'Ireland', 'Belgium', 'Norway', 'Denmark', 'Finland'];

LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/customers.csv" AS row
WITH row
WHERE row.country IN $europe
CREATE (n:Customer)
SET n = row;

CREATE INDEX FOR (c:Customer) ON (c.customerID);

LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/orders.csv" AS row
WITH row
MATCH (c:Customer)
WHERE row.customerID = c.customerID
CREATE (o:Order)
SET o = row;

CREATE INDEX FOR (o:Order) ON (o.orderID);

MATCH (c:Customer),(o:Order)
WHERE c.customerID = o.customerID
CREATE (c)-[:PURCHASED]->(o);

LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/products.csv" AS row
CREATE (n:Product)
SET n.productID = row.productID;

CREATE INDEX FOR (p:Product) ON (p.productID);

LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/order-details.csv" AS row
MATCH (p:Product), (o:Order)
WHERE p.productID = row.productID AND o.orderID = row.orderID
CREATE (o)-[details:ORDERS]->(p)
SET details = row, details.quantity = toInteger(row.quantity);

Press Enter.

Verify that the EU Customer orders data is loaded in db1:

MATCH (c:Customer)-[:PURCHASED]->(o:Order)-[:ORDERS]->(p:Product)
RETURN c.companyName AS Customer, c.country AS CustomerCountry, o.orderID AS Order, p.productID AS Product
LIMIT 5;

+-------------------------------------------------------------+
| Customer              | CustomerCountry | Order   | Product |
+-------------------------------------------------------------+
| "Alfreds Futterkiste" | "Germany"       | "10692" | "63"    |
| "Alfreds Futterkiste" | "Germany"       | "10835" | "77"    |
| "Alfreds Futterkiste" | "Germany"       | "10835" | "59"    |
| "Alfreds Futterkiste" | "Germany"       | "10702" | "76"    |
| "Alfreds Futterkiste" | "Germany"       | "10702" | "3"     |
+-------------------------------------------------------------+

5 rows available after 47 ms, consumed after another 2 ms

Load AME customers and related orders in db2

Run the following Cypher query to change the active database to db2 and add the AME customers and orders:

:use db2;

:param americas => ['Mexico', 'Canada', 'Argentina', 'Brazil', 'USA', 'Venezuela'];

LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/customers.csv" AS row
WITH row
WHERE row.country IN $americas
CREATE (n:Customer)
SET n = row;

CREATE INDEX FOR (c:Customer) ON (c.customerID);

LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/orders.csv" AS row
WITH row
MATCH (c:Customer)
WHERE row.customerID = c.customerID
CREATE (o:Order)
SET o = row;

CREATE INDEX FOR (o:Order) ON (o.orderID);

MATCH (c:Customer),(o:Order)
WHERE c.customerID = o.customerID
CREATE (c)-[:PURCHASED]->(o);

LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/products.csv" AS row
CREATE (n:Product)
SET n.productID = row.productID;

CREATE INDEX FOR (p:Product) ON (p.productID);

LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/order-details.csv" AS row
MATCH (p:Product), (o:Order)
WHERE p.productID = row.productID AND o.orderID = row.orderID
CREATE (o)-[details:ORDERS]->(p)
SET details = row,
details.quantity = toInteger(row.quantity);

Press Enter.

Verify that the AME Customer orders data is loaded in db2:

MATCH (c:Customer)-[:PURCHASED]->(o:Order)-[:ORDERS]->(p:Product)
RETURN c.companyName AS Customer, c.country AS CustomerCountry, o.orderID AS Order, p.productID AS Product
LIMIT 5;

+----------------------------------------------------------------------------+
| Customer                             | CustomerCountry | Order   | Product |
+----------------------------------------------------------------------------+
| "Ana Trujillo Emparedados y helados" | "Mexico"        | "10759" | "32"    |
| "Ana Trujillo Emparedados y helados" | "Mexico"        | "10926" | "72"    |
| "Ana Trujillo Emparedados y helados" | "Mexico"        | "10926" | "13"    |
| "Ana Trujillo Emparedados y helados" | "Mexico"        | "10926" | "19"    |
| "Ana Trujillo Emparedados y helados" | "Mexico"        | "10926" | "11"    |
+----------------------------------------------------------------------------+

5 rows available after 42 ms, consumed after another 1 ms

Configure a Composite database

Set up a Composite database with the CREATE COMPOSITE DATABASE Cypher command and add local database aliases as constituents to the Composite database. In this example, the Composite database is called compositenw.

Run the command CREATE COMPOSITE DATABASE <composite-database-name> to create the Composite database:
```
CREATE COMPOSITE DATABASE compositenw;
```
```
0 rows available after 137 ms, consumed after another 0 ms
```

Run the command CREATE ALIAS <composite-database-name>.<alias-name> FOR DATABASE <database-name> to create the constituent database aliases:

CREATE ALIAS compositenw.product FOR DATABASE db0;

0 rows available after 101 ms, consumed after another 0 ms

CREATE ALIAS compositenw.customerEU FOR DATABASE db1;

0 rows available after 107 ms, consumed after another 0 ms

CREATE ALIAS compositenw.customerAME FOR DATABASE db2;

0 rows available after 98 ms, consumed after another 0 ms

The constituent database aliases in this tutorial are local database aliases (targeting databases in the same Neo4j DBMS), but they can just as well be remote database aliases (targeting databases in another Neo4j DBMS).

Run the command SHOW DATABASES to verify that the Composite database has been configured and is online:

SHOW DATABASES;

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| name          | type        | aliases                     | access       | address          | role      | writer | requestedStatus | currentStatus | statusMessage | default | home  | constituents                                                                 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "db0"         | "standard"  | ["compositenw.product"]     | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []                                                                           |
| "db1"         | "standard"  | ["compositenw.customerEU"]  | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []                                                                           |
| "db2"         | "standard"  | ["compositenw.customerAME"] | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []                                                                           |
| "compositenw" | "composite" | []                          | "read-only"  | "localhost:7687" | "primary" | FALSE  | "online"        | "online"      | ""            | FALSE   | FALSE | ["compositenw.customerAME", "compositenw.customerEU", "compositenw.product"] |
| "neo4j"       | "standard"  | []                          | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | TRUE    | TRUE  | []                                                                           |
| "system"      | "system"    | []                          | "read-write" | "localhost:7687" | "primary" | TRUE   | "online"        | "online"      | ""            | FALSE   | FALSE | []                                                                           |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
6 rows available after 242 ms, consumed after another 18 ms

Run the command SHOW ALIASES FOR DATABASES to verify that the database aliases have been configured:

SHOW ALIASES FOR DATABASES;

+--------------------------------------------------------------------------------+
| name                      | composite     | database  | location | url  | user |
+--------------------------------------------------------------------------------+
| "compositenw.product"     | "compositenw" | "db0"     | "local"  | null | null |
| "compositenw.customerEU"  | "compositenw" | "db1"     | "local"  | null | null |
| "compositenw.customerAME" | "compositenw" | "db2"     | "local"  | null | null |
+--------------------------------------------------------------------------------+
3 rows available after 203 ms, consumed after another 16 ms

Retrieve data with a single Cypher query

Query a single database

When connected to a Composite database you can retrieve data from a single database by using the Cypher clause USE and the name of an alias:

:use compositenw

USE compositenw.product
MATCH (p:Product)
RETURN p.productName AS product
LIMIT 5;

+--------------------------------+
| product                        |
+--------------------------------+
| "Chai"                         |
| "Chang"                        |
| "Aniseed Syrup"                |
| "Chef Anton's Cajun Seasoning" |
| "Chef Anton's Gumbo Mix"       |
+--------------------------------+

5 rows available after 6 ms, consumed after another 21 ms

Query across multiple shards

Use the Composite database to query both shards and get customers whose name starts with A:

:use compositenw

USE compositenw.customerAME
MATCH (c:Customer)
WHERE c.customerID STARTS WITH 'A'
RETURN c.customerID AS name, c.country AS country
  UNION
USE compositenw.customerEU
MATCH (c:Customer)
WHERE c.customerID STARTS WITH 'A'
RETURN c.customerID AS name, c.country AS country
LIMIT 5;

+---------------------+
| name    | country   |
+---------------------+
| "ANATR" | "Mexico"  |
| "ANTON" | "Mexico"  |
| "ALFKI" | "Germany" |
| "AROUT" | "UK"      |
+---------------------+

4 rows available after 25 ms, consumed after another 56 ms

Or, using a more common Composite database idiom:

:use compositenw

UNWIND ['compositenw.customerAME', 'compositenw.customerEU'] AS g
CALL {
  USE graph.byName(g)
  MATCH (c:Customer)
  WHERE c.customerID STARTS WITH 'A'
  RETURN c.customerID AS name, c.country AS country
}
RETURN name, country
LIMIT 5;

+---------------------+
| name    | country   |
+---------------------+
| "ANATR" | "Mexico"  |
| "ANTON" | "Mexico"  |
| "ALFKI" | "Germany" |
| "AROUT" | "UK"      |
+---------------------+

4 rows available after 61 ms, consumed after another 8 ms

Query across federation and shards

Here is a more complex query that uses all 3 databases to find all customers who have bought discontinued products in the Meat/Poultry category:

:use compositenw

CALL {
  USE compositenw.product
  MATCH (p:Product)-[:PART_OF]->(c:Category)
  WHERE p.discontinued = true
    AND c.categoryName = 'Meat/Poultry'
  RETURN COLLECT(p.productID) AS pids
}
WITH *
UNWIND [g IN graph.names() WHERE g STARTS WITH 'compositenw.customer'] AS g
CALL {
  USE graph.byName(g)
  WITH pids
  UNWIND pids as pid
  MATCH (p:Product{productID:pid})<-[:ORDERS]-(:Order)<-[:PURCHASED]-(c:Customer)
  RETURN DISTINCT c.customerID AS customer, c.country AS country
}
RETURN customer, country
LIMIT 20;

+--------------------------+
| customer | country       |
+--------------------------+
| "RICSU"  | "Switzerland" |
| "PERIC"  | "Mexico"      |
| "WARTH"  | "Finland"     |
| "WELLI"  | "Brazil"      |
| "DRACD"  | "Germany"     |
| "RATTC"  | "USA"         |
| "HUNGO"  | "Ireland"     |
| "QUEDE"  | "Brazil"      |
| "SEVES"  | "UK"          |
| "ANTON"  | "Mexico"      |
| "BERGS"  | "Sweden"      |
| "SAVEA"  | "USA"         |
| "AROUT"  | "UK"          |
| "FAMIA"  | "Brazil"      |
| "WANDK"  | "Germany"     |
| "WHITC"  | "USA"         |
| "ISLAT"  | "UK"          |
| "LONEP"  | "USA"         |
| "QUICK"  | "Germany"     |
| "HILAA"  | "Venezuela"   |
+--------------------------+

20 rows available after 51 ms, consumed after another 2 ms

The way this query works is by compositenw calling database db0 to retrieve all discontinued products in the Meat/Poultry category. Then, using the returned product IDs, it queries both db1 and db2 in parallel and gets the customers who have purchased these products and their country.

You have just learned how to store and retrieve data from multiple databases using a single Cypher query.

For more details on Composite databases, see Concepts.