apoc.import.csv

Importing from local files requires setting apoc.import.file.enabled=true in apoc.conf. This is not supported on Aura. Aura instances are therefore limited to importing publicly hosted files.

Details
Syntax	`apoc.import.csv(nodes, rels, config) :: (file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data)`
Description	Imports `NODE` and `RELATIONSHIP` values with the given labels and types from the provided CSV file.
Input arguments	Name	Type	Description
	`nodes`	`LIST<MAP>`	List of map values of where to import the node values from; `{ fileName :: STRING, data :: BYTEARRAY, labels :: LIST<STRING> }`.
	`rels`	`LIST<MAP>`	List of map values specifying where to import relationship values from: `{ fileName :: STRING, data :: BYTEARRAY, type :: STRING }.
	`config`	`MAP`	`{ delimiter = "," :: STRING, arrayDelimiter = ";" :: STRING, ignoreDuplicateNodes = false :: BOOLEAN, quotationCharacter = """ :: STRING, stringIds = true :: BOOLEAN, skipLines = 1 :: INTEGER, ignoreBlankString = false :: BOOLEAN, ignoreEmptyCellArray = false :: BOOLEAN, compression = "NONE" :: ["NONE", "BYTES", "GZIP", "BZIP2", "DEFLATE", "BLOCK_LZ4", "FRAMED_SNAPPY"], charset = "UTF-8" :: STRING, batchSize = 2000 :: INTEGER }`
Return arguments	Name	Type	Description
	`file`	`STRING`	The name of the file from which the data was imported.
	`source`	`STRING`	The source of the imported data: "file", "binary" or "file/binary".
	`format`	`STRING`	The format of the file: ["csv", "graphml", "json"].
	`nodes`	`INTEGER`	The number of imported nodes.
	`relationships`	`INTEGER`	The number of imported relationships.
	`properties`	`INTEGER`	The number of imported properties.
	`time`	`INTEGER`	The duration of the import.
	`rows`	`INTEGER`	The number of rows returned.
	`batchSize`	`INTEGER`	The size of the batches the import was run in.
	`batches`	`INTEGER`	The number of batches the import was run in.
	`done`	`BOOLEAN`	Whether the import ran successfully.
	`data`	`ANY`	The data returned by the import.

Config parameters

The procedure support the following config parameters:

Config parameters
Name	Type	Default	Description	Import tool counterpart
`delimiter`	`STRING`	,	Delimiter character between columns	`--delimiter=,`
`arrayDelimiter`	`STRING`	;	Delimiter character in arrays	`--array-delimiter=;`
`ignoreDuplicateNodes`	`BOOLEAN`	false	For duplicate nodes, only load the first one and skip the rest (true) or fail the import (false)	`--ignore-duplicate-nodes=false`
`quotationCharacter`	`STRING`	"	Quotation character	`--quote='"'`
`stringIds`	`BOOLEAN`	true	Treat ids as strings	`--id-type=STRING`
`skipLines`	`INTEGER`	1	Lines to skip (incl. header)	N/A
`ignoreBlankString`	`BOOLEAN`	false	If true ignore properties with a blank string	N/A
`ignoreEmptyCellArray`	`BOOLEAN`	false	If true ignore array properties containing a single empty string, like the import tool	N/A
`compression`	`Enum[NONE, BYTES, GZIP, BZIP2, DEFLATE, BLOCK_LZ4, FRAMED_SNAPPY]`	`null`	Allow taking binary data, either not compressed (value: `NONE`) or compressed (other values) .	N/A
`charset`	`STRING`	'UTF-8'	Name of the character extending java.nio.Charset in the currently used JDK. E.g.: `US-ASCII`, `ISO-8859-1`, `UTF-8`, `UTF-16`	`--input-encoding`
`batchSize`	`INTEGER`	2000	Commits and continues after the defined number of rows have been processed	N/A

Usage Examples

This procedure imports CSV files that comply with the Neo4j import tool’s header format.

Nodes

The following file contains two people:

persons.csv

id:ID,name:STRING
1,John
2,Jane

We’ll place this file into the import directory of our Neo4j instance.

We can create two Person nodes with their name properties set, by running the following query:

CALL apoc.import.csv([{fileName: 'file:/persons.csv', labels: ['Person']}], [], {});

Results
file	source	format	nodes	relationships	properties	time	rows	batchSize	batches	done	data
"progress.csv"	"file"	"csv"	2	0	4	7	0	-1	0	TRUE	NULL

We can check what’s been imported by running the following query:

MATCH (p:Person)
RETURN p;

Results
p
(:Person {name: "John", id: "1"})
(:Person {name: "Jane", id: "2"})

Nodes and relationships

The following files contain NODE and RELATIONSHIP values in CSV format:

people-nodes.csv

:ID|name:STRING|speaks:STRING[]
1|John|en,fr
2|Jane|en,de

knows-rels.csv

:START_ID|:END_ID|since:INT
1|2|2016

We will import two Person nodes and a KNOWS relationship between them (with the value of the since property set). The field terminators and the array delimiters are changed from the default value, and the CSVs use numeric ids.

CALL apoc.import.csv(
  [{fileName: 'file:/people-nodes.csv', labels: ['Person']}],
  [{fileName: 'file:/knows-rels.csv', type: 'KNOWS'}],
  {delimiter: '|', arrayDelimiter: ',', stringIds: false}
);

Results
file	source	format	nodes	relationships	properties	time	rows	batchSize	batches	done	data
"progress.csv"	"file"	"csv"	2	1	7	7	0	-1	0	TRUE	NULL

We can check what’s been imported by running the following query:

MATCH path = (p1:Person)-[:KNOWS]->(p2:Person)
RETURN path;

Results
path
(:Person {name: "John", speaks: ["en", "fr"], csv_id: 1})-[:KNOWS {since: 2016}]→(:Person {name: "Jane", speaks: ["en", "de"], csv_id: 2})

Binary file

You can also import a file from a binary byte[] (not compressed, with default value NONE) or a compressed file (allowed compression algos are: GZIP, BZIP2, DEFLATE, BLOCK_LZ4, FRAMED_SNAPPY). Note that in this case, for both relations and nodes parameter maps, the key for the file is data instead of fileName, that is:

CALL apoc.import.csv([{data: `binaryGzipByteArray`, labels: ['Person']}], [{data: `binaryGzipByteArray`, type: 'KNOWS'}], {compression: 'GZIP'})

or:

CALL apoc.import.csv([{data: `binaryFileNotCompressed`, labels: ['Person']}], [{data: `binaryFileNotCompressed`, type: 'KNOWS'}], {compression: 'NONE'})

For example, this one works well with apoc.util.compress function:

WITH apoc.util.compress(':ID|firstname:STRING|lastname:IGNORE|age:INT\n1|John|Doe|25\n2|Jane|Doe|26', {compression: 'DEFLATE'}) AS nodeCsv
WITH apoc.util.compress(':START_ID|:END_ID|prop1:IGNORE|prop2:INT\n1|2|a|3\n2|1|b|6', {compression: 'DEFLATE'}) AS relCsv, nodeCsv
CALL apoc.import.csv([{data: nodeCsv, labels: ['Person']}], [{data: relCsv, type: 'KNOWS'}], {delimiter: '|', compression: 'DEFLATE'})
YIELD source, format, nodes, relationships, properties
RETURN source, format, nodes, relationships, properties

Results
source	format	nodes	relationships	properties
"binary"	"csv"	2	2	8

Properties

For properties, the <name> part of the field designates the property key, while the <field_type> part (after :) assigns a data type (see below). You can have properties in both node data files and relationship data files.

Use one of int, long, float, double, boolean, byte, short, char, string, point, date, localtime, time, localdatetime, datetime, and duration to designate the data type for properties. If no data type is given, this defaults to string. To define an array type, append [] to the type. For example:

test.csv

:ID,name,joined:date,active:boolean,points:int
user01,Joe Soap,2017-05-05,true,10
user02,Jane Doe,2017-08-21,true,15
user03,Moe Know,2018-02-17,false,7

Point

A point is specified using the Cypher syntax for maps. The map allows the same keys as the input to the Point function. The point data type in the header can be amended with a MAP of default values used for all values of that column, e.g. point{crs: 'WGS-84'}. Specifying the header this way allows you to have an incomplete map in the value position in the data file. Optionally, a value in a data file may override default values from the header.

point.csv

:ID,name,location:point{crs:WGS-84}
city01,"Malmö","{latitude:55.6121514, longitude:12.9950357}"
city02,"London","{y:51.507222, x:-0.1275}"
city03,"San Mateo","{latitude:37.554167, longitude:-122.313056, height: 100, crs:'WGS-84-3D'}"

In the above csv:

the first city’s location is defined using latitude and longitude, as expected when using the coordinate system defined in the header.
the second city uses x and y instead. This would normally lead to a point using the coordinate reference system cartesian. Since the header defines crs:WGS-84, that coordinate reference system will be used.
the third city overrides the coordinate reference system defined in the header, and sets it explicitly to WGS-84-3D.

Temporal

The format for all temporal data types must be defined as described in Temporal instants syntax and Durations syntax. Two of the temporal types, Time and DateTime, take a time zone parameter which might be common between all or many of the values in the data file. It is therefore possible to specify a default time zone for Time and DateTime values in the header, for example: time{timezone:+02:00} and: datetime{timezone:Europe/Stockholm}.

point.csv

:ID,date1:datetime{timezone:Europe/Stockholm},date2:datetime
1,2018-05-10T10:30,2018-05-10T12:30
2,2018-05-10T10:30[Europe/Berlin],2018-05-10T12:30[Europe/Berlin]

In the above csv:

the first row has two values that do not specify an explicit timezone. The value for date1 will use the Europe/Stockholm time zone that was specified for that field in the header. The value for date2 will use the configured default time zone of the database.
in the second row, both date1 and date2 set the time zone explicitly to be Europe/Berlin. This overrides the header definition for date1, as well as the configured default time zone of the database.