Import Data

This section describes how to import data into a standalone Neo4j instance on a Kubernetes cluster.

1. Importing data into Neo4j on Kubernetes

The Neo4j Helm chart configures a volume mount at /import as the Neo4j import directory, as described in File locations. You place all the files that you want to import in this volume.

To import data from CSV files into Neo4j, use either the neo4j-admin import command or the Cypher clause LOAD CSV.

  • The neo4j-admin import command can be used to do batch imports of large amounts of data into a previously unused database and can only be performed once per database.

  • The LOAD CSV Cypher clause can be used to import small to medium-sized CSV files into an existing database. LOAD CSV can be run as many times as needed and does not require an empty database. For a simple example, see Getting Started Guide → Import data.
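As a rough sketch of the LOAD CSV route, the statement can be passed to cypher-shell inside the Neo4j pod. The helper name load_people_csv, the file people.csv, its id and name columns, the Person label, and the NEO4J_PASSWORD variable are all assumptions made for illustration. With the default configuration, a file:/// URL is resolved relative to the import directory.

```shell
# Hypothetical helper: run one LOAD CSV statement through cypher-shell in the pod.
# $1 = pod name, $2 = CSV path relative to the /import directory.
load_people_csv() {
  local pod="$1"
  local csv="$2"
  # MERGE makes the statement repeatable: existing Person nodes are matched, not duplicated.
  kubectl exec "$pod" -- cypher-shell -u neo4j -p "$NEO4J_PASSWORD" \
    "LOAD CSV WITH HEADERS FROM 'file:///$csv' AS row MERGE (p:Person {id: row.id}) SET p.name = row.name"
}

# Example call (assumes the file was copied to /import/files-1/people.csv):
# load_people_csv my-graph-db-0 files-1/people.csv
```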

Depending on your Neo4j configuration, some methods support fetching data to import from a remote location (e.g., using HTTP or fetching from cloud object storage). Therefore, it is not always necessary to place the source data files in the Neo4j import directory.

2. Configure the import volume mount

The default configuration of the /import volume mount is to share the /data volume mount. Generally, this is sufficient, and it is unnecessary to explicitly configure an import volume in the Helm deployment’s values.yaml file. For the full details of configuring volume mounts for a Neo4j Helm deployment, see Volume mounts and persistent volumes.

This example shows how to configure /import to use a dynamically provisioned Persistent Volume of the default StorageClass:

volumes:
  import:
    mode: "defaultStorageClass"
    defaultStorageClass:
      requests:
        storage: 100Gi
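
Once a deployment with this setting is running, a quick check along the following lines may help confirm that /import is backed by its own volume. This is only a sketch: the helper name check_import_volume is hypothetical, and it assumes that df is available in the container image.

```shell
# Hypothetical helper: inspect the storage behind /import.
# $1 = namespace, $2 = pod name.
check_import_volume() {
  local namespace="$1"
  local pod="$2"
  # List the PersistentVolumeClaims in the namespace, with their status and capacity.
  kubectl get pvc --namespace "$namespace" || return 1
  # Show the filesystem mounted at /import and its size.
  kubectl exec --namespace "$namespace" "$pod" -- df -h /import
}

# Example call:
# check_import_volume default my-graph-db-0
```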

3. Copy files to the import volume using kubectl cp

Files can be copied to the import volume using kubectl cp. This example shows how to copy a local directory my-files/ to /import/files-1 on a Neo4j instance with the release name my-graph-db in the namespace default.

kubectl cp my-files/ default/my-graph-db-0:/import/files-1

# Validate: list the contents of /import/files-1
kubectl exec my-graph-db-0 -- ls /import/files-1

Instead of using kubectl cp, data can also be loaded into the /import directory by:

  • using an additional container or initContainer to load data.

  • using kubectl exec to run commands to load data.

  • mounting a volume that is already populated with data.

    Data must be placed in the volume’s /import directory.
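
As one possible illustration of the kubectl exec option, a local directory can be streamed into the pod as a tar archive. The helper name stream_to_import is hypothetical, and the sketch assumes that tar and mkdir are available in the Neo4j container.

```shell
# Hypothetical helper: stream a local directory into the pod with tar over kubectl exec.
# $1 = local source directory, $2 = pod name, $3 = destination directory in the pod.
stream_to_import() {
  local src_dir="$1"
  local pod="$2"
  local dest="$3"
  # Make sure the destination directory exists before unpacking into it.
  kubectl exec "$pod" -- mkdir -p "$dest" || return 1
  # Pack the directory contents locally and unpack them inside the container.
  tar cf - -C "$src_dir" . | kubectl exec -i "$pod" -- tar xf - -C "$dest"
}

# Example call:
# stream_to_import my-files my-graph-db-0 /import/files-1
```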

4. Use neo4j-admin import

The simplest way to run neo4j-admin import is to use kubectl exec to run it in the Neo4j container. However, running a large import in the same container as the Neo4j process can cause resource contention, and the node operating system may OOM-kill either or both processes. To avoid this, either run neo4j-admin import in a separate container or initContainer, or place the Neo4j Helm deployment in offline maintenance mode first.
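
A minimal sketch of the kubectl exec approach follows. It uses the neo4j-admin import command form that this section refers to. The helper name run_admin_import and the file names nodes.csv and relationships.csv are hypothetical, and the target database is assumed not to exist yet.

```shell
# Hypothetical helper: run neo4j-admin import inside the Neo4j container.
# $1 = pod name, $2 = name of a not-yet-existing database, $3 = directory holding the CSV files.
run_admin_import() {
  local pod="$1"
  local database="$2"
  local dir="$3"
  # nodes.csv and relationships.csv are placeholder file names.
  kubectl exec "$pod" -- neo4j-admin import \
    --database="$database" \
    --nodes="$dir/nodes.csv" \
    --relationships="$dir/relationships.csv"
}

# Example call:
# run_admin_import my-graph-db-0 imported /import/files-1
```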

neo4j-admin import cannot be used to replace an existing database while Neo4j is running. To replace an existing database, either DROP the database or put the Neo4j Helm deployment into offline maintenance mode before running neo4j-admin import.

5. Alternative approach

An alternative approach to importing data into Neo4j is to run a separate Neo4j standalone instance outside Kubernetes, perform the import on that Neo4j instance, and then copy the resulting database into the Kubernetes-based Neo4j instance using the backup and restore or dump and load procedures.
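
The dump and load variant might look roughly like the sketch below. The helper name transfer_dump is hypothetical. The sketch assumes that neo4j-admin is on the PATH of the external instance, that the database is stopped there while it is dumped, and that the Kubernetes deployment is in a state where the database can be loaded (for example, offline maintenance mode). Note that --force overwrites any existing database with the same name.

```shell
# Hypothetical helper: dump a database on the external instance and load it into the pod.
# $1 = database name, $2 = namespace, $3 = pod name.
transfer_dump() {
  local database="$1"
  local namespace="$2"
  local pod="$3"
  local dump_file="$database.dump"
  # 1. Dump the database on the external Neo4j instance.
  neo4j-admin dump --database="$database" --to="./$dump_file" || return 1
  # 2. Copy the dump file into the import volume of the Kubernetes instance.
  kubectl cp "./$dump_file" "$namespace/$pod:/import/$dump_file" || return 1
  # 3. Load the dump, replacing any existing database of the same name.
  kubectl exec --namespace "$namespace" "$pod" -- neo4j-admin load \
    --from="/import/$dump_file" --database="$database" --force
}

# Example call:
# transfer_dump neo4j default my-graph-db-0
```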