Prerequisites

Neo4j instance

You need a running Neo4j instance which the data can flow into.

If you don’t have an instance yet, you have two options:

  • sign up for a free AuraDB instance

  • install and self-host Neo4j in a location that is publicly accessible (see Neo4j → Installation) with port 7687 open (Bolt protocol)
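
For a self-hosted instance, it can help to confirm that the Bolt port is reachable from outside before going further. The following is a minimal sketch using only Python's standard library; the function name and the host name in the usage comment are hypothetical, and a successful TCP connection only shows that the port is open, not that Neo4j is answering on it.

```python
import socket


def bolt_port_reachable(host: str, port: int = 7687, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Usage (hypothetical host name, replace with your server's address):
# print(bolt_port_reachable("neo4j.example.com"))
```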

Either way, you then need to create a file containing your database connection information in JSON format. We will refer to this file as neo4j-connection-info.json. The file can be uploaded either as a secret to Google Cloud Secret Manager or directly into a Google Cloud Storage bucket.

The basic authentication scheme relies on a traditional username and password. This scheme can also be used to authenticate against an LDAP server.

{
  "server_url": "neo4j+s://xxxx.databases.neo4j.io",
  "database": "neo4j",
  "username": "<username>",
  "pwd": "<password>"
}

If authentication is disabled on the server, credentials can be omitted.

{
  "server_url": "neo4j+s://xxxx.databases.neo4j.io",
  "database": "neo4j",
  "auth_type": "none"
}

The Kerberos authentication scheme requires a base64-encoded ticket. It can only be used if the server has the Kerberos Add-on installed.

{
  "server_url": "neo4j+s://xxxx.databases.neo4j.io",
  "database": "neo4j",
  "auth_type": "kerberos",
  "ticket": "<base64-encoded Kerberos ticket>"
}

The bearer authentication scheme requires a base64-encoded token provided by an Identity Provider through Neo4j’s Single Sign-On feature.

{
  "server_url": "neo4j+s://xxxx.databases.neo4j.io",
  "database": "neo4j",
  "auth_type": "bearer",
  "token": "<bearer token>"
}

To log in to a server that uses a custom authentication scheme, provide the scheme's details explicitly.

{
  "server_url": "neo4j+s://xxxx.databases.neo4j.io",
  "database": "neo4j",
  "auth_type": "custom",
  "principal": "<principal>",
  "credentials": "<credentials>",
  "realm": "<realm>",
  "scheme": "<scheme>",
  "parameters": {"<key>": "<value>"}
}
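
Before uploading the file, a quick local check can catch a missing field. The sketch below derives the expected fields from the sample files above; it is not the validation that Dataflow itself performs. It assumes that a file without "auth_type" uses the basic scheme and that every field shown in a sample is mandatory. The names REQUIRED_FIELDS, missing_fields and check_file are hypothetical.

```python
import json

# Fields each scheme needs besides "server_url" and "database", taken from
# the samples above. Treating all of them as mandatory is an assumption.
REQUIRED_FIELDS = {
    "basic": {"username", "pwd"},
    "none": set(),
    "kerberos": {"ticket"},
    "bearer": {"token"},
    "custom": {"principal", "credentials", "realm", "scheme", "parameters"},
}


def missing_fields(info: dict) -> set:
    """Return the names of expected fields that are absent from info."""
    # Assumption: no "auth_type" means the basic scheme.
    auth_type = info.get("auth_type", "basic")
    if auth_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown auth_type: {auth_type!r}")
    required = {"server_url", "database"} | REQUIRED_FIELDS[auth_type]
    return required - set(info)


def check_file(path: str) -> set:
    """Load a connection-info JSON file and report its missing fields."""
    with open(path, encoding="utf-8") as f:
        return missing_fields(json.load(f))
```

Calling check_file("neo4j-connection-info.json") returns an empty set when nothing is missing.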

Google Secret Manager

If you wish to store the credentials file as a Google secret, you need access to Google Secret Manager.

Create a new secret and upload the neo4j-connection-info.json file as its value.
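
The secret can be created in the Google Cloud console; as an alternative, here is a minimal sketch using the google-cloud-secret-manager Python client. It assumes the library is installed and that application default credentials are configured. The function names, project ID and secret ID are hypothetical, and automatic replication is an arbitrary choice for the example.

```python
def secret_parent(project_id: str) -> str:
    """Build the resource name under which secrets of a project live."""
    return f"projects/{project_id}"


def upload_connection_info(project_id: str, secret_id: str, path: str) -> str:
    """Create a secret and store the file's bytes as its first version."""
    # Imported here so the module loads even without the client library.
    from google.cloud import secretmanager

    client = secretmanager.SecretManagerServiceClient()
    with open(path, "rb") as f:
        payload = f.read()

    secret = client.create_secret(
        request={
            "parent": secret_parent(project_id),
            "secret_id": secret_id,
            "secret": {"replication": {"automatic": {}}},
        }
    )
    version = client.add_secret_version(
        request={"parent": secret.name, "payload": {"data": payload}}
    )
    # Resource name of the new secret version.
    return version.name


# Usage (hypothetical project and secret IDs):
# upload_connection_info("my-project", "neo4j-connection-info",
#                        "neo4j-connection-info.json")
```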

Google Cloud Storage bucket

You need a Google Cloud Storage bucket. This is the only location from which the Dataflow job can read files (both configuration files and source CSVs, if any).

Unless you stored the credentials file in Google Secret Manager, go ahead and upload the neo4j-connection-info.json file to your Cloud Storage bucket.
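
The upload can be done through the console; as an alternative, here is a minimal sketch using the google-cloud-storage Python client. It assumes the library is installed and that application default credentials are configured. The function names and the bucket name in the usage comment are hypothetical.

```python
def gcs_uri(bucket_name: str, object_name: str) -> str:
    """Build the gs:// URI of an object in a bucket."""
    return f"gs://{bucket_name}/{object_name}"


def upload_file(bucket_name: str, source_path: str, object_name: str) -> str:
    """Upload a local file to the bucket and return its gs:// URI."""
    # Imported here so the module loads even without the client library.
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    blob.upload_from_filename(source_path)
    return gcs_uri(bucket_name, object_name)


# Usage (hypothetical bucket name):
# upload_file("my-bucket", "neo4j-connection-info.json",
#             "neo4j-connection-info.json")
```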

Dataset to import

You need a Google BigQuery dataset that you want to import into Neo4j.

This tutorial uses a subset of the movies dataset. It contains Person and Movie entities, linked together by DIRECTED and ACTED_IN relationships. In other words, each Person may have DIRECTED and/or ACTED_IN a Movie. Both entities and relationships carry additional details. The data is sourced from the following files: persons.csv, movies.csv, acted_in.csv, directed.csv.

[Image: movies model]

Since you are moving data from a relational database into a graph database, the data model will have to change. Check out Graph data modeling guidelines to learn how to model for graph databases.

Google Dataflow job

The Google Dataflow job glues all the pieces together and performs the data import. All that remains is to craft a job specification file that gives Dataflow the information it needs to load the data into Neo4j.

[Image: Google Dataflow]

All Google-related resources (Cloud project, Cloud Storage buckets, Dataflow job) should belong either to the same account or to one that the Dataflow job has permission to access.