apoc.import.csv

Procedure APOC Core

apoc.import.csv(nodes, relationships, config) - imports nodes and relationships from the provided CSV files with given labels and types

Signature

apoc.import.csv(nodes :: LIST? OF MAP?, relationships :: LIST? OF MAP?, config :: MAP?) :: (file :: STRING?, source :: STRING?, format :: STRING?, nodes :: INTEGER?, relationships :: INTEGER?, properties :: INTEGER?, time :: INTEGER?, rows :: INTEGER?, batchSize :: INTEGER?, batches :: INTEGER?, done :: BOOLEAN?, data :: STRING?)

Input parameters

Name          | Type          | Default
nodes         | LIST? OF MAP? | null
relationships | LIST? OF MAP? | null
config        | MAP?          | null

Config parameters

The procedure supports the following config parameters:

Table 1. Config parameters
name | type | default | description | import tool counterpart
delimiter | String | , | delimiter character between columns | --delimiter=,
arrayDelimiter | String | ; | delimiter character in arrays | --array-delimiter=;
ignoreDuplicateNodes | Boolean | false | for duplicate nodes, only load the first one and skip the rest (true) or fail the import (false) | --ignore-duplicate-nodes=false
quotationCharacter | String | " | quotation character | --quote='"'
stringIds | Boolean | true | treat ids as strings | --id-type=STRING
skipLines | Integer | 1 | lines to skip (incl. header) | N/A
ignoreBlankString | Boolean | false | if true, ignore properties with a blank string | N/A
ignoreEmptyCellArray | Boolean | false | if true, ignore array properties containing a single empty string, like the import tool | N/A
compression | Enum[NONE, BYTES, GZIP, BZIP2, DEFLATE, BLOCK_LZ4, FRAMED_SNAPPY] | null | allows taking binary data, either not compressed (value: NONE) or compressed (other values); see the Binary file example | N/A
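Several of these options can be combined in a single call. The following is a minimal sketch, assuming a semicolon-delimited persons.csv file has been placed in the import directory:

CALL apoc.import.csv(
  [{fileName: 'file:/persons.csv', labels: ['Person']}],
  [],
  {delimiter: ';', arrayDelimiter: '|', skipLines: 1, ignoreBlankString: true}
);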

Output parameters

Name          | Type
file          | STRING?
source        | STRING?
format        | STRING?
nodes         | INTEGER?
relationships | INTEGER?
properties    | INTEGER?
time          | INTEGER?
rows          | INTEGER?
batchSize     | INTEGER?
batches       | INTEGER?
done          | BOOLEAN?
data          | STRING?

Usage Examples

This procedure imports CSV files that comply with the Neo4j import tool’s header format.

Nodes

The following file contains two people:

persons.csv
id:ID,name:STRING
1,John
2,Jane

We’ll place this file into the import directory of our Neo4j instance.

We can create two Person nodes with their name properties set, by running the following query:

CALL apoc.import.csv([{fileName: 'file:/persons.csv', labels: ['Person']}], [], {});
Table 2. Results
file | source | format | nodes | relationships | properties | time | rows | batchSize | batches | done | data
"progress.csv" | "file" | "csv" | 2 | 0 | 4 | 7 | 0 | -1 | 0 | TRUE | NULL

We can check what’s been imported by running the following query:

MATCH (p:Person)
RETURN p;
Table 3. Results
p
(:Person {name: "John", id: "1"})
(:Person {name: "Jane", id: "2"})

Nodes and relationships

The following files contain nodes and relationships in CSV format:

people-nodes.csv
:ID|name:STRING|speaks:STRING[]
1|John|en,fr
2|Jane|en,de
knows-rels.csv
:START_ID|:END_ID|since:INT
1|2|2016

We will import two Person nodes and a KNOWS relationship between them (with the since property set). The field delimiter and the array delimiter are changed from their default values, and the CSVs use numeric ids.

CALL apoc.import.csv(
  [{fileName: 'file:/people-nodes.csv', labels: ['Person']}],
  [{fileName: 'file:/knows-rels.csv', type: 'KNOWS'}],
  {delimiter: '|', arrayDelimiter: ',', stringIds: false}
);
Table 4. Results
file | source | format | nodes | relationships | properties | time | rows | batchSize | batches | done | data
"progress.csv" | "file" | "csv" | 2 | 1 | 7 | 7 | 0 | -1 | 0 | TRUE | NULL

We can check what’s been imported by running the following query:

MATCH path = (p1:Person)-[:KNOWS]->(p2:Person)
RETURN path;
Table 5. Results
path
(:Person {name: "John", speaks: ["en", "fr"], csv_id: 1})-[:KNOWS {since: 2016}]->(:Person {name: "Jane", speaks: ["en", "de"], csv_id: 2})

Binary file

You can also import from a binary byte[], either not compressed (with the compression value NONE) or compressed (allowed compression algorithms: GZIP, BZIP2, DEFLATE, BLOCK_LZ4, FRAMED_SNAPPY). Note that in this case, for both the nodes and relationships parameter maps, the key for the file is data instead of fileName, that is:

CALL apoc.import.csv([{data: `binaryGzipByteArray`, labels: ['Person']}], [{data: `binaryGzipByteArray`, type: 'KNOWS'}], {compression: 'GZIP'})

or:

CALL apoc.import.csv([{data: `binaryFileNotCompressed`, labels: ['Person']}], [{data: `binaryFileNotCompressed`, type: 'KNOWS'}], {compression: 'NONE'})

For example, this works well with the apoc.util.compress function:

WITH apoc.util.compress(':ID|firstname:STRING|lastname:IGNORE|age:INT\n1|John|Doe|25\n2|Jane|Doe|26', {compression: 'DEFLATE'}) AS nodeCsv
WITH apoc.util.compress(':START_ID|:END_ID|prop1:IGNORE|prop2:INT\n1|2|a|3\n2|1|b|6', {compression: 'DEFLATE'}) AS relCsv, nodeCsv
CALL apoc.import.csv([{data: nodeCsv, labels: ['Person']}], [{data: relCsv, type: 'KNOWS'}], {delimiter: '|', compression: 'DEFLATE'})
YIELD source, format, nodes, relationships, properties
RETURN source, format, nodes, relationships, properties
Table 6. Results
source | format | nodes | relationships | properties
"binary" | "csv" | 2 | 2 | 8

Properties

For properties, the <name> part of the field designates the property key, while the <field_type> part (after :) assigns a data type (see below). You can have properties in both node data files and relationship data files.

Use one of int, long, float, double, boolean, byte, short, char, string, point, date, localtime, time, localdatetime, datetime, and duration to designate the data type for properties. If no data type is given, this defaults to string. To define an array type, append [] to the type. For example:

test.csv
:ID,name,joined:date,active:boolean,points:int
user01,Joe Soap,2017-05-05,true,10
user02,Jane Doe,2017-08-21,true,15
user03,Moe Know,2018-02-17,false,7
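A minimal sketch of importing this file, assuming test.csv has been placed in the import directory (the User label is chosen here purely for illustration):

CALL apoc.import.csv([{fileName: 'file:/test.csv', labels: ['User']}], [], {});

With this header, joined is stored as a date value, active as a boolean, and points as an integer.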

Point

A point is specified using the Cypher syntax for maps. The map allows the same keys as the input to the Point function. The point data type in the header can be amended with a map of default values used for all values of that column, e.g. point{crs: 'WGS-84'}. Specifying the header this way allows you to have an incomplete map in the value position in the data file. Optionally, a value in a data file may override default values from the header.

point.csv
:ID,name,location:point{crs:WGS-84}
city01,"Malmö","{latitude:55.6121514, longitude:12.9950357}"
city02,"London","{y:51.507222, x:-0.1275}"
city03,"San Mateo","{latitude:37.554167, longitude:-122.313056, height: 100, crs:'WGS-84-3D'}"

In the above csv:

  • the first city’s location is defined using latitude and longitude, as expected when using the coordinate system defined in the header.

  • the second city uses x and y instead. This would normally lead to a point using the coordinate reference system cartesian. Since the header defines crs:WGS-84, that coordinate reference system will be used.

  • the third city overrides the coordinate reference system defined in the header, and sets it explicitly to WGS-84-3D.
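A minimal sketch of importing this file, assuming point.csv has been placed in the import directory (the City label is chosen here purely for illustration):

CALL apoc.import.csv([{fileName: 'file:/point.csv', labels: ['City']}], [], {});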

Temporal

The format for all temporal data types must be defined as described in Temporal instants syntax and Durations syntax. Two of the temporal types, Time and DateTime, take a time zone parameter which might be common between all or many of the values in the data file. It is therefore possible to specify a default time zone for Time and DateTime values in the header, for example: time{timezone:+02:00} and datetime{timezone:Europe/Stockholm}.

datetime.csv
:ID,date1:datetime{timezone:Europe/Stockholm},date2:datetime
1,2018-05-10T10:30,2018-05-10T12:30
2,2018-05-10T10:30[Europe/Berlin],2018-05-10T12:30[Europe/Berlin]

In the above csv:

  • the first row has two values that do not specify an explicit timezone. The value for date1 will use the Europe/Stockholm time zone that was specified for that field in the header. The value for date2 will use the configured default time zone of the database.

  • in the second row, both date1 and date2 set the time zone explicitly to be Europe/Berlin. This overrides the header definition for date1, as well as the configured default time zone of the database.
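As before, a minimal sketch of importing this file, assuming it is saved as datetime.csv in the import directory (the Event label is chosen here purely for illustration):

CALL apoc.import.csv([{fileName: 'file:/datetime.csv', labels: ['Event']}], [], {});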