B.3. Set up and use Fabric

This tutorial walks through the basics of setting up and using Neo4j Fabric.

Neo4j Fabric is a tool for storing and retrieving data in multiple databases, located in one or many Neo4j DBMS(s), with a single Cypher query.

In this tutorial, you will learn how to:

For more information on how to manage multiple active databases in Neo4j, see Chapter 6, Manage databases.

For more details on Fabric, see Chapter 8, Fabric.

B.3.1. Model your data for Fabric

B.3.1.1. Northwind data

northwind logo The example data in this tutorial is based on the Northwind dataset, created by Microsoft.

It contains the sales data of a fictitious small company called “Northwind Traders”. The data includes customers, products, customer orders, warehouse stock, shipping, suppliers, employees, and sales territories.

B.3.1.2. The model

The Northwind graph model consists of the following data:

  • Node labels

    • :Product
    • :Category
    • :Supplier
    • :Order
    • :Customer
  • Relationship types

    • :SUPPLIES
    • :PART_OF
    • :ORDERS
    • :PURCHASED
Figure B.1. The Northwind data model
northwind datamodel

B.3.1.3. Remodeling the Northwind dataset

In this scenario, you imagine that data privacy constraints require customers’ data to be stored in their original region. For simplicity, there are two regions: the Americas (AME) and Europe (EU). The first step is to remodel the Northwind dataset, so that customer data can be separated from the Product catalog, which has no privacy constraints. You create two graphs: one for the Product catalog, which includes :Product, :Category, :Supplier, :PART_OF, :SUPPLIES, and one partitioned graph in two databases for the Customer orders in EU and AME, with :Product, :Order, :Customer, :PURCHASED, and :ORDERS.

Figure B.2. The new data model
northwind fabric datamodel

Data Federation

This way, the Product and Customer data are in two disjointed graphs, with different labels and relationship types. This is called Data Federation. To query across them, you have to federate the graphs, because relationships cannot span across them. This is done by using a proxy node modeling pattern: nodes with the :Product label must be present in both federated domains. In the Product catalog graph, nodes with the :Product label contain all the data related to a product, while in the Customer graphs, the same label is associated to a proxy node, which only contains productID. The productID property allows you to link data across the graphs in this federation.

Figure B.3. Data Federation
federation

Data Sharding

Since the Customer data is for two regions (EU and AME), you have to partition it into two databases. The resulting two graphs have the same model (same labels, same relationship types), but different data. This is called Data Sharding.

Figure B.4. Data Sharding
sharding2

In general, there are a couple of main use cases that require sharding. The most common is scalability, i.e., different shards can be deployed on different servers, splitting the load on different resources. Another reason could be data regulations: different shards can be deployed on servers, residing in different locations, and managed independently.

B.3.2. Configure Fabric with three databases

Now that you have a new multi-database model defined, you can start to configure the Fabric infrastructure.

B.3.2.1. Create three databases

You need three databases: db0 for the Product catalog, db1 for the EU customer data, and db2 for the AME customers.

  1. Start the Neo4j DBMS.

    ~/neo4j-enterprise-4.1.x/bin$ ./neo4j start
  2. Check all available databases.

    ~/neo4j-enterprise-4.1.x/bin$  ls -al ../data/databases/
    ls -al ../data/databases/
    total 0
    drwxr-xr-x@  5 username  staff   160  9 Jun 12:53 .
    drwxr-xr-x@  5 username  staff   160  9 Jun 12:53 ..
    drwxr-xr-x  37 username  staff  1184  9 Jun 12:53 neo4j
    -rw-r--r--   1 username  staff     0  9 Jun 12:53 store_lock
    drwxr-xr-x  38 username  staff  1216  9 Jun 12:53 system
  3. Connect to the Neo4j DBMS using ./cypher-shell with the default credentials and change the password when prompted.

    ~/neo4j-enterprise-4.1.x/bin$  ./cypher-shell -u neo4j -p neo4j
    Password change required
    new password: *****
    Connected to Neo4j 4.1.x at neo4j://localhost:7687 as user neo4j.
    Type :help for a list of available commands or :exit to exit the shell.
    Note that Cypher queries must end with a semicolon.
  4. Run the command show databases; to list all available databases.

    neo4j@neo4j> show databases;
    +------------------------------------------------------------------------------------------------+
    | name     | address          | role         | requestedStatus | currentStatus | error | default |
    +------------------------------------------------------------------------------------------------+
    | "neo4j"  | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | TRUE    |
    | "system" | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | FALSE   |
    +------------------------------------------------------------------------------------------------+
    
    2 rows available after 102 ms, consumed after another 11 ms
  5. Run the command create database <database-name> to create the databases.

    neo4j@neo4j> create database db0;
    0 rows available after 137 ms, consumed after another 0 ms
    neo4j@neo4j> create database db1;
    0 rows available after 14 ms, consumed after another 0 ms
    neo4j@neo4j> create database db2;
    0 rows available after 10 ms, consumed after another 0 ms
  6. Again run the command show databases; to verify that the new databases have been created.

    neo4j@neo4j> show databases;
    +------------------------------------------------------------------------------------------------+
    | name     | address          | role         | requestedStatus | currentStatus | error | default |
    +------------------------------------------------------------------------------------------------+
    | "db0"    | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | FALSE   |
    | "db1"    | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | FALSE   |
    | "db2"    | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | FALSE   |
    | "neo4j"  | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | TRUE    |
    | "system" | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | FALSE   |
    +------------------------------------------------------------------------------------------------+
    
    5 rows available after 8 ms, consumed after another 7 ms
  7. Exit the Cypher Shell command-line tool.

    neo4j@system> :exit;
    Bye!

B.3.2.2. Configure Fabric

You set up Fabric by configuring the fabric database and the graph names and IDs in the neo4j.conf file. In this example, the Fabric database is called fabricnw.

  1. Navigate to the <NEO4j_HOME>/conf/ folder and open the neo4j.conf file.
  2. Add the following lines and save it.

    #********************************************************************
    # Fabric tutorial
    #********************************************************************
    
    fabric.database.name=fabricnw
    
    fabric.graph.0.uri=neo4j://localhost:7687
    fabric.graph.0.name=product
    fabric.graph.0.database=db0
    
    fabric.graph.1.uri=neo4j://localhost:7687
    fabric.graph.1.name=customerEU
    fabric.graph.1.database=db1
    
    fabric.graph.2.uri=neo4j://localhost:7687
    fabric.graph.2.name=customerAME
    fabric.graph.2.database=db2
  3. Navigate back to the <NEO4j_HOME>/bin/ folder and restart the Neo4j DBMS.

    ~/neo4j-enterprise-4.1.x/bin$ ./neo4j restart
  4. Connect to the Neo4j DBMS using ./cypher-shell and your credentials.

    ~/neo4j-enterprise-4.1.x/bin$  ./cypher-shell -u neo4j -p your-password
  5. Run the command show databases; to verify that the Fabric database has been configured and is online.

    neo4j@neo4j> show databases;
    +--------------------------------------------------------------------------------------------------+
    | name       | address          | role         | requestedStatus | currentStatus | error | default |
    +--------------------------------------------------------------------------------------------------+
    | "db0"      | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | FALSE   |
    | "db1"      | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | FALSE   |
    | "db2"      | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | FALSE   |
    | "fabricnw" | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | FALSE   |
    | "neo4j"    | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | TRUE    |
    | "system"   | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | FALSE   |
    +--------------------------------------------------------------------------------------------------+
    
    6 rows available after 242 ms, consumed after another 18 ms

B.3.3. Import data in your databases

You can use the command LOAD CSV WITH HEADERS FROM to import data in the databases.

B.3.3.1. Load the Product catalog in db0

  1. Run the following Cypher query to change the active database to db0, and add the product data.

    :use db0;
    
    LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/products.csv" AS row
    CREATE (n:Product)
    SET n = row,
    n.unitPrice = toFloat(row.unitPrice),
    n.unitsInStock = toInteger(row.unitsInStock), n.unitsOnOrder = toInteger(row.unitsOnOrder),
    n.reorderLevel = toInteger(row.reorderLevel), n.discontinued = (row.discontinued <> "0");
    
    LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/categories.csv" AS row
    CREATE (n:Category)
    SET n = row;
    
    LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/suppliers.csv" AS row
    CREATE (n:Supplier)
    SET n = row;
    
    CREATE INDEX ON :Product(productID);
    CREATE INDEX ON :Category(categoryID);
    CREATE INDEX ON :Supplier(supplierID);
    
    MATCH (p:Product),(c:Category)
    WHERE p.categoryID = c.categoryID
    CREATE (p)-[:PART_OF]->(c);
    
    MATCH (p:Product),(s:Supplier)
    WHERE p.supplierID = s.supplierID
    CREATE (s)-[:SUPPLIES]->(p);
  2. Press Enter.
  3. Verify that the product data is loaded in db0.

    MATCH (s:Supplier)-[:SUPPLIES]->(p:Product)-[:PART_OF]->(c:Category)
    RETURN s.companyName AS Supplier, p.productName AS Product, c.categoryName AS Category
    LIMIT 5;
    +--------------------------------------------------------------------------+
    | Supplier                     | Product                     | Category    |
    +--------------------------------------------------------------------------+
    | "Bigfoot Breweries"          | "Sasquatch Ale"             | "Beverages" |
    | "Pavlova"                    | "Outback Lager"             | "Beverages" |
    | "Bigfoot Breweries"          | "Laughing Lumberjack Lager" | "Beverages" |
    | "Bigfoot Breweries"          | "Steeleye Stout"            | "Beverages" |
    | "Aux joyeux ecclésiastiques" | "Côte de Blaye"             | "Beverages" |
    +--------------------------------------------------------------------------+
    
    5 rows available after 202 ms, consumed after another 5 ms

B.3.3.2. Load EU customers and related orders in db1

  1. Run the following Cypher query to change the active database to db1, and add the EU customers and orders.

    :use db1;
    
    :param europe => ['Germany', 'UK', 'Sweden', 'France', 'Spain', 'Switzerland', 'Austria', 'Italy', 'Portugal', 'Ireland', 'Belgium', 'Norway', 'Denmark', 'Finland'];
    
    LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/customers.csv" AS row
    WITH row
    WHERE row.country IN $europe
    CREATE (n:Customer)
    SET n = row;
    
    CREATE INDEX ON :Customer(customerID);
    
    LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/orders.csv" AS row
    WITH row
    MATCH (c:Customer)
    WHERE row.customerID = c.customerID
    CREATE (o:Order)
    SET o = row;
    
    CREATE INDEX ON :Order(orderID);
    
    MATCH (c:Customer),(o:Order)
    WHERE c.customerID = o.customerID
    CREATE (c)-[:PURCHASED]->(o);
    
    LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/products.csv" AS row
    CREATE (n:Product)
    SET n.productID = row.productID;
    
    CREATE INDEX ON :Product(productID);
    
    LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/order-details.csv" AS row
    MATCH (p:Product), (o:Order)
    WHERE p.productID = row.productID AND o.orderID = row.orderID
    CREATE (o)-[details:ORDERS]->(p)
    SET details = row, details.quantity = toInteger(row.quantity);
  2. Press Enter.
  3. Verify that the EU Customer orders data is loaded in db1.

    MATCH (c:Customer)-[:PURCHASED]->(o:Order)-[:ORDERS]->(p:Product)
    RETURN c.companyName AS Customer, c.country AS CustomerCountry, o.orderID AS Order, p.productID AS Product
    LIMIT 5;
    +-------------------------------------------------------------+
    | Customer              | CustomerCountry | Order   | Product |
    +-------------------------------------------------------------+
    | "Alfreds Futterkiste" | "Germany"       | "10692" | "63"    |
    | "Alfreds Futterkiste" | "Germany"       | "10835" | "77"    |
    | "Alfreds Futterkiste" | "Germany"       | "10835" | "59"    |
    | "Alfreds Futterkiste" | "Germany"       | "10702" | "76"    |
    | "Alfreds Futterkiste" | "Germany"       | "10702" | "3"     |
    +-------------------------------------------------------------+
    
    5 rows available after 47 ms, consumed after another 2 ms

B.3.3.3. Load AME customers and related orders in db2

  1. Run the following Cypher query to change the active database to db2 and add the AME customers and orders.

    :use db2;
    
    :param americas => ['Mexico', 'Canada', 'Argentina', 'Brazil', 'USA', 'Venezuela'];
    
    LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/customers.csv" AS row
    WITH row
    WHERE row.country IN $americas
    CREATE (n:Customer)
    SET n = row;
    
    CREATE INDEX ON :Customer(customerID);
    
    LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/orders.csv" AS row
    WITH row
    MATCH (c:Customer)
    WHERE row.customerID = c.customerID
    CREATE (o:Order)
    SET o = row;
    
    CREATE INDEX ON :Order(orderID);
    
    MATCH (c:Customer),(o:Order)
    WHERE c.customerID = o.customerID
    CREATE (c)-[:PURCHASED]->(o);
    
    LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/products.csv" AS row
    CREATE (n:Product)
    SET n.productID = row.productID;
    
    CREATE INDEX ON :Product(productID);
    
    LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/order-details.csv" AS row
    MATCH (p:Product), (o:Order)
    WHERE p.productID = row.productID AND o.orderID = row.orderID
    CREATE (o)-[details:ORDERS]->(p)
    SET details = row,
    details.quantity = toInteger(row.quantity);
  2. Press Enter.
  3. Verify that the AME Customer orders data is loaded in db2.

    MATCH (c:Customer)-[:PURCHASED]->(o:Order)-[:ORDERS]->(p:Product)
    RETURN c.companyName AS Customer, c.country AS CustomerCountry, o.orderID AS Order, p.productID AS Product
    LIMIT 5;
    +----------------------------------------------------------------------------+
    | Customer                             | CustomerCountry | Order   | Product |
    +----------------------------------------------------------------------------+
    | "Ana Trujillo Emparedados y helados" | "Mexico"        | "10759" | "32"    |
    | "Ana Trujillo Emparedados y helados" | "Mexico"        | "10926" | "72"    |
    | "Ana Trujillo Emparedados y helados" | "Mexico"        | "10926" | "13"    |
    | "Ana Trujillo Emparedados y helados" | "Mexico"        | "10926" | "19"    |
    | "Ana Trujillo Emparedados y helados" | "Mexico"        | "10926" | "11"    |
    +----------------------------------------------------------------------------+
    
    5 rows available after 42 ms, consumed after another 1 ms

B.3.4. Retrieve data with a single Cypher query

Fabric allows you to retrieve data from all your databases with a single Cypher query.

As the databases db0, db1, db2 in this tutorial are part of the same Neo4j DBMS, you can also access them directly, using their database names. This is especially useful when you want to set up Fabric locally for development or testing purposes. In this case, you only have to add fabric.database.name=fabricnw to the neo4j.conf file, and use queries as the following one.

neo4j@db2>> :use fabricnw;
USE db1
MATCH (c:Customer)
WHERE c.customerID STARTS WITH 'A'
RETURN c.customerID AS name, c.country AS country
  UNION
USE db2
MATCH (c:Customer)
WHERE c.customerID STARTS WITH 'A'
RETURN c.customerID AS name, c.country AS country
LIMIT 5;
+---------------------+
| name    | country   |
+---------------------+
| "ALFKI" | "Germany" |
| "AROUT" | "UK"      |
| "ANATR" | "Mexico"  |
| "ANTON" | "Mexico"  |
+---------------------+

4 rows available after 404 ms, consumed after another 1 ms

However, if your databases db0, db1, db2 are located in other Neo4j DBMSs, on completely different servers for example, then you must update the URI settings to connect to them.

In this tutorial, you will try the Fabric capabilities as if the data is deployed on different servers.

B.3.4.1. Query a single database

You can retrieve data from a single database by using the cypher clause USE and the name of the Fabric graph. When querying a single database, you do not have to change the active database to Fabric.

USE fabricnw.product
MATCH (p:Product)
RETURN p.productName AS product
LIMIT 5;
+--------------------------------+
| product                        |
+--------------------------------+
| "Chai"                         |
| "Chang"                        |
| "Aniseed Syrup"                |
| "Chef Anton's Cajun Seasoning" |
| "Chef Anton's Gumbo Mix"       |
+--------------------------------+

5 rows available after 6 ms, consumed after another 21 ms

B.3.4.2. Query across multiple shards

Use Fabric to query both shards and get customers whose name starts with A.

When you want to retrieve data from multiple databases, you have to change the active database to fabricnw.

neo4j@db2>> :use fabricnw;
USE fabricnw.customerAME
MATCH (c:Customer)
WHERE c.customerID STARTS WITH 'A'
RETURN c.customerID AS name, c.country AS country
  UNION
USE fabricnw.customerEU
MATCH (c:Customer)
WHERE c.customerID STARTS WITH 'A'
RETURN c.customerID AS name, c.country AS country
LIMIT 5;
+---------------------+
| name    | country   |
+---------------------+
| "ANATR" | "Mexico"  |
| "ANTON" | "Mexico"  |
| "ALFKI" | "Germany" |
| "AROUT" | "UK"      |
+---------------------+

4 rows available after 25 ms, consumed after another 56 ms

Or, using a more common Fabric idiom:

UNWIND [1,2]AS gid
CALL {
	USE fabricnw.graph(gid)
	MATCH (c:Customer)
	WHERE c.customerID STARTS WITH 'A'
	RETURN c.customerID AS name, c.country AS country
}
RETURN name, country
LIMIT 5;
+---------------------+
| name    | country   |
+---------------------+
| "ANATR" | "Mexico"  |
| "ANTON" | "Mexico"  |
| "ALFKI" | "Germany" |
| "AROUT" | "UK"      |
+---------------------+

4 rows available after 61 ms, consumed after another 8 ms

B.3.4.3. Query across federation and shards

Finally, a more complex query that uses all 3 databases to find all customers who have bought discontinued products in the Meat/Poultry category.

CALL {
USE fabricnw.product
MATCH (p:Product{discontinued:true})-[:PART_OF]->(c:Category{categoryName:'Meat/Poultry'})
              RETURN COLLECT(p.productID) AS pids
}
WITH *, [g IN fabricnw.graphIds() WHERE g<>0] AS gids
UNWIND gids AS gid
CALL {
	USE fabricnw.graph(gid)
	WITH pids
	UNWIND pids as pid
	MATCH (p:Product{productID:pid})<-[:ORDERS]-(:Order)<-[:PURCHASED]-(c:Customer)
	RETURN DISTINCT c.customerID AS customer, c.country AS country
}
RETURN customer, country
LIMIT 20;
+--------------------------+
| customer | country       |
+--------------------------+
| "RICSU"  | "Switzerland" |
| "PERIC"  | "Mexico"      |
| "WARTH"  | "Finland"     |
| "WELLI"  | "Brazil"      |
| "DRACD"  | "Germany"     |
| "RATTC"  | "USA"         |
| "HUNGO"  | "Ireland"     |
| "QUEDE"  | "Brazil"      |
| "SEVES"  | "UK"          |
| "ANTON"  | "Mexico"      |
| "BERGS"  | "Sweden"      |
| "SAVEA"  | "USA"         |
| "AROUT"  | "UK"          |
| "FAMIA"  | "Brazil"      |
| "WANDK"  | "Germany"     |
| "WHITC"  | "USA"         |
| "ISLAT"  | "UK"          |
| "LONEP"  | "USA"         |
| "QUICK"  | "Germany"     |
| "HILAA"  | "Venezuela"   |
+--------------------------+

20 rows available after 51 ms, consumed after another 2 ms

First, fabricnw calls database db0 to retrieve all discontinued products in the Meat/Poultry category. Then, using the returned product IDs, it queries both db1 and db2 in parallel and gets the customers who have purchased these products and their country.

B.3.5. The end

You have just learned how to store and retrieve data from multiple databases using a single Cypher query. For more details on the Neo4j Fabric, see Chapter 8, Fabric.