apoc.import.csv

Procedure APOC Core

apoc.import.csv(nodes, relationships, config) - imports nodes and relationships from the provided CSV files with given labels and types

Signature

apoc.import.csv(nodes :: LIST? OF MAP?, relationships :: LIST? OF MAP?, config :: MAP?) :: (file :: STRING?, source :: STRING?, format :: STRING?, nodes :: INTEGER?, relationships :: INTEGER?, properties :: INTEGER?, time :: INTEGER?, rows :: INTEGER?, batchSize :: INTEGER?, batches :: INTEGER?, done :: BOOLEAN?, data :: STRING?)

Input parameters

Name	Type	Default
nodes	LIST? OF MAP?	null
relationships	LIST? OF MAP?	null
config	MAP?	null

Name

Type

Default

nodes

LIST? OF MAP?

null

relationships

LIST? OF MAP?

null

config

MAP?

null

Config parameters

The procedure support the following config parameters:

Table 1. Config parameters
name	type	default	description	import tool counterpart
delimiter	String	,	delimiter character between columns	`--delimiter=,`
arrayDelimiter	String	;	delimiter character in arrays	`--array-delimiter=;`
ignoreDuplicateNodes	Boolean	false	for duplicate nodes, only load the first one and skip the rest (true) or fail the import (false)	`--ignore-duplicate-nodes=false`
quotationCharacter	String	"	quotation character	`--quote='"'`
stringIds	Boolean	true	treat ids as strings	`--id-type=STRING`
skipLines	Integer	1	lines to skip (incl. header)	N/A
ignoreBlankString	Boolean	false	if true ignore properties with a blank string	N/A
ignoreEmptyCellArray	Boolean	false	if true ignore array properties containing a single empty string, like the import tool	N/A
compression	`Enum[NONE, BYTES, GZIP, BZIP2, DEFLATE, BLOCK_LZ4, FRAMED_SNAPPY]`	`null`	Allow taking binary data, either not compressed (value: `NONE`) or compressed (other values) . See the Binary file example	N/A

Output parameters

Name	Type
file	STRING?
source	STRING?
format	STRING?
nodes	INTEGER?
relationships	INTEGER?
properties	INTEGER?
time	INTEGER?
rows	INTEGER?
batchSize	INTEGER?
batches	INTEGER?
done	BOOLEAN?
data	STRING?

Name

Type

file

STRING?

source

STRING?

format

STRING?

nodes

INTEGER?

relationships

INTEGER?

properties

INTEGER?

time

INTEGER?

rows

INTEGER?

batchSize

INTEGER?

batches

INTEGER?

done

BOOLEAN?

data

STRING?

Usage Examples

This procedure imports CSV files that comply with the Neo4j import tool’s header format.

Nodes

The following file contains two people:

persons.csv

id:ID,name:STRING
1,John
2,Jane

We’ll place this file into the import directory of our Neo4j instance.

We can create two Person nodes with their name properties set, by running the following query:

CALL apoc.import.csv([{fileName: 'file:/persons.csv', labels: ['Person']}], [], {});

Table 2. Results
file	source	format	nodes	relationships	properties	time	rows	batchSize	batches	done	data
"progress.csv"	"file"	"csv"	2	0	4	7	0	-1	0	TRUE	NULL

We can check what’s been imported by running the following query:

MATCH (p:Person)
RETURN p;

Table 3. Results
p
(:Person {name: "John", id: "1"})
(:Person {name: "Jane", id: "2"})

Nodes and relationships

The following files contain nodes and relationships in CSV format:

people-nodes.csv

:ID|name:STRING|speaks:STRING[]
1|John|en,fr
2|Jane|en,de

knows-rels.csv

:START_ID|:END_ID|since:INT
1|2|2016

We will import two Person nodes and a KNOWS relationship between them (with the value of the since property set). The field terminators and the array delimiters are changed from the default value, and the CSVs use numeric ids.

CALL apoc.import.csv(
  [{fileName: 'file:/people-nodes.csv', labels: ['Person']}],
  [{fileName: 'file:/knows-rels.csv', type: 'KNOWS'}],
  {delimiter: '|', arrayDelimiter: ',', stringIds: false}
);

Table 4. Results
file	source	format	nodes	relationships	properties	time	rows	batchSize	batches	done	data
"progress.csv"	"file"	"csv"	2	1	7	7	0	-1	0	TRUE	NULL

We can check what’s been imported by running the following query:

MATCH path = (p1:Person)-[:KNOWS]->(p2:Person)
RETURN path;

Table 5. Results
path
(:Person {name: "John", speaks: ["en", "fr"], csv_id: 1})-[:KNOWS {since: 2016}]→(:Person {name: "Jane", speaks: ["en", "de"], csv_id: 2})

Binary file

You can also import a file from a binary byte[] (not compressed, with default value NONE) or a compressed file (allowed compression algos are: GZIP, BZIP2, DEFLATE, BLOCK_LZ4, FRAMED_SNAPPY). Note that in this case, for both relations and nodes parameter maps, the key for the file is data instead of fileName, that is:

CALL apoc.import.csv([{data: `binaryGzipByteArray`, labels: ['Person']}], [{data: `binaryGzipByteArray`, type: 'KNOWS'}], {compression: 'GZIP'})

or:

CALL apoc.import.csv([{data: `binaryFileNotCompressed`, labels: ['Person']}], [{data: `binaryFileNotCompressed`, type: 'KNOWS'}], {compression: 'NONE'})

For example, this one works well with apoc.util.compress function:

WITH apoc.util.compress(':ID|firstname:STRING|lastname:IGNORE|age:INT\n1|John|Doe|25\n2|Jane|Doe|26', {compression: 'DEFLATE'}) AS nodeCsv
WITH apoc.util.compress(':START_ID|:END_ID|prop1:IGNORE|prop2:INT\n1|2|a|3\n2|1|b|6', {compression: 'DEFLATE'}) AS relCsv, nodeCsv
CALL apoc.import.csv([{data: nodeCsv, labels: ['Person']}], [{data: relCsv, type: 'KNOWS'}], {delimiter: '|', compression: 'DEFLATE'})
YIELD source, format, nodes, relationships, properties
RETURN source, format, nodes, relationships, properties

Table 6. Results
source	format	nodes	relationships	properties
"binary"	"csv"	2	2	8

Properties

For properties, the <name> part of the field designates the property key, while the <field_type> part (after :) assigns a data type (see below). You can have properties in both node data files and relationship data files.

Use one of int, long, float, double, boolean, byte, short, char, string, point, date, localtime, time, localdatetime, datetime, and duration to designate the data type for properties. If no data type is given, this defaults to string. To define an array type, append [] to the type. For example:

test.csv

:ID,name,joined:date,active:boolean,points:int
user01,Joe Soap,2017-05-05,true,10
user02,Jane Doe,2017-08-21,true,15
user03,Moe Know,2018-02-17,false,7

Point

A point is specified using the Cypher syntax for maps. The map allows the same keys as the input to the Point function. The point data type in the header can be amended with a map of default values used for all values of that column, e.g. point{crs: 'WGS-84'}. Specifying the header this way allows you to have an incomplete map in the value position in the data file. Optionally, a value in a data file may override default values from the header.

point.csv

:ID,name,location:point{crs:WGS-84}
city01,"Malmö","{latitude:55.6121514, longitude:12.9950357}"
city02,"London","{y:51.507222, x:-0.1275}"
city03,"San Mateo","{latitude:37.554167, longitude:-122.313056, height: 100, crs:'WGS-84-3D'}"

In the above csv:

the first city’s location is defined using latitude and longitude, as expected when using the coordinate system defined in the header.
the second city uses x and y instead. This would normally lead to a point using the coordinate reference system cartesian. Since the header defines crs:WGS-84, that coordinate reference system will be used.
the third city overrides the coordinate reference system defined in the header, and sets it explicitly to WGS-84-3D.

Temporal

The format for all temporal data types must be defined as described in Temporal instants syntax and Durations syntax. Two of the temporal types, Time and DateTime, take a time zone parameter which might be common between all or many of the values in the data file. It is therefore possible to specify a default time zone for Time and DateTime values in the header, for example: time{timezone:+02:00} and: datetime{timezone:Europe/Stockholm}.

point.csv

:ID,date1:datetime{timezone:Europe/Stockholm},date2:datetime
1,2018-05-10T10:30,2018-05-10T12:30
2,2018-05-10T10:30[Europe/Berlin],2018-05-10T12:30[Europe/Berlin]

In the above csv:

the first row has two values that do not specify an explicit timezone. The value for date1 will use the Europe/Stockholm time zone that was specified for that field in the header. The value for date2 will use the configured default time zone of the database.
in the second row, both date1 and date2 set the time zone explicitly to be Europe/Berlin. This overrides the header definition for date1, as well as the configured default time zone of the database.