Importing data

There is a wide range of ways to import data from files into a Neo4j instance. This page describes the most common approaches for a Neo4j instance running on a Kubernetes cluster.

Importing data into Neo4j on Kubernetes

The Neo4j Helm chart configures a volume mount at /import as the Neo4j import directory, as described in Default file locations. You place all the files that you want to import in this volume.

To import data from CSV files into Neo4j, use the neo4j-admin database import command or the LOAD CSV Cypher statement.

  • The neo4j-admin database import command can be used for batch imports of large amounts of data into a previously unused database, and it can be run only once per database.

  • The LOAD CSV Cypher statement can be used to import small to medium-sized CSV files into an existing database. LOAD CSV can be run as many times as needed and does not require an empty database. For a simple example, see Getting Started Guide → Import data; a minimal cypher-shell sketch also follows this list.
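
As a minimal sketch, a CSV file that is already present in the import volume (the following sections describe how to get it there) can be loaded with cypher-shell from inside the Neo4j container. The file name, node label, and password are placeholders:

kubectl exec -it my-graph-db-0 -- cypher-shell -u neo4j -p <password> \
  "LOAD CSV WITH HEADERS FROM 'file:///files-1/people.csv' AS row
   CREATE (:Person {name: row.name});"

file:/// URLs resolve relative to the import directory, so this reads /import/files-1/people.csv.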

Depending on your Neo4j configuration, some methods can fetch the data to import from a remote location (for example, over HTTP or from cloud object storage), so it is not always necessary to place the source data files in the Neo4j import directory.
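
For example, where the configuration permits remote URLs, LOAD CSV can read a file directly over HTTPS without copying it into /import first. The URL below is a placeholder, and the query simply counts rows to confirm the file is reachable:

kubectl exec -it my-graph-db-0 -- cypher-shell -u neo4j -p <password> \
  "LOAD CSV WITH HEADERS FROM 'https://example.com/people.csv' AS row
   RETURN count(row);"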

Configure the import volume mount

The default configuration of the /import volume mount is to share the /data volume mount. Generally, this is sufficient, and it is unnecessary to explicitly configure an import volume in the Helm deployment’s values.yaml file. For the full details of configuring volume mounts for a Neo4j Helm deployment, see Volume mounts and persistent volumes.

This example shows how to configure /import to use a dynamically provisioned Persistent Volume of the default StorageClass:

volumes:
  import:
    mode: "defaultStorageClass"
    defaultStorageClass:
      requests:
        storage: 100Gi
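
The values take effect when the Helm release is installed or upgraded with this file. A sketch, assuming the release my-graph-db was deployed from the neo4j/neo4j chart:

# Apply the updated values to the existing release
helm upgrade my-graph-db neo4j/neo4j -f values.yaml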

Copy files to the import volume using kubectl cp

Files can be copied to the import volume using kubectl cp. This example shows how to copy a local directory my-files/ to /import/files-1 on a Neo4j instance with the release name my-graph-db in the namespace default.

kubectl cp my-files/ default/my-graph-db-0:/import/files-1

# Validate: list the contents of /import/files-1
kubectl exec my-graph-db-0 -- ls /import/files-1

Instead of using kubectl cp, data can also be loaded into the /import directory by:

  • using an additional container or initContainer to load data.

  • using kubectl exec to run commands to load data (see the sketch after this list).

  • mounting a volume that is already populated with data.

    Data must be placed in the volume’s /import directory.
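
As a sketch of the kubectl exec approach, a file can be downloaded straight into the import volume. This assumes a download tool such as wget is available in the container image; the URL and directory name are placeholders:

# Create a target directory and download into it
kubectl exec my-graph-db-0 -- mkdir -p /import/files-2
kubectl exec my-graph-db-0 -- wget -P /import/files-2 https://example.com/people.csv

# Validate: confirm the file arrived
kubectl exec my-graph-db-0 -- ls /import/files-2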

Use neo4j-admin database import

The simplest way to run neo4j-admin database import is to use kubectl exec to run it in the Neo4j container. However, running a large import in the same container as the Neo4j process may cause resource contention problems, including either or both processes being OOM-killed by the node operating system. To avoid this, run neo4j-admin database import in a separate container or initContainer, or place the Neo4j Helm deployment in offline maintenance mode first.

neo4j-admin database import cannot be used to replace an existing database while Neo4j is running. To replace an existing database, either DROP the database or put the Neo4j Helm deployment into offline maintenance mode before running neo4j-admin database import.
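
A minimal sketch of the kubectl exec approach for a modest import. The CSV file names and the database name mynewdb are placeholders; the files are assumed to already be under /import/files-1, and the target database must not have been used before:

kubectl exec my-graph-db-0 -- neo4j-admin database import full \
  --nodes=/import/files-1/nodes.csv \
  --relationships=/import/files-1/relationships.csv \
  mynewdb

# On Enterprise Edition, create the database so the DBMS brings the imported store online
kubectl exec my-graph-db-0 -- cypher-shell -u neo4j -p <password> -d system "CREATE DATABASE mynewdb;"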

Alternative approach

An alternative approach to importing data into Neo4j is to run a separate Neo4j standalone instance outside Kubernetes, perform the import on that Neo4j instance, and then copy the resulting database into the Kubernetes-based Neo4j instance using the backup and restore or dump and load procedures.
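
A sketch of the dump and load route, assuming the external instance produced a dump file named mynewdb.dump (the database and file names are placeholders, and the target database must not already exist on the Kubernetes-based instance):

# Copy the dump file into the pod and load it
kubectl exec my-graph-db-0 -- mkdir -p /tmp/dumps
kubectl cp mynewdb.dump default/my-graph-db-0:/tmp/dumps/mynewdb.dump
kubectl exec my-graph-db-0 -- neo4j-admin database load mynewdb --from-path=/tmp/dumps

On Enterprise Edition, follow up with CREATE DATABASE mynewdb against the system database so that the loaded store is brought online.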