apoc.import.csv
Procedure APOC Core
apoc.import.csv(nodes, relationships, config) - imports nodes and relationships from the provided CSV files with given labels and types
Signature
```
apoc.import.csv(nodes :: LIST? OF MAP?, relationships :: LIST? OF MAP?, config :: MAP?) :: (file :: STRING?, source :: STRING?, format :: STRING?, nodes :: INTEGER?, relationships :: INTEGER?, properties :: INTEGER?, time :: INTEGER?, rows :: INTEGER?, batchSize :: INTEGER?, batches :: INTEGER?, done :: BOOLEAN?, data :: STRING?)
```
Input parameters
Name | Type | Default |
---|---|---|
nodes | LIST? OF MAP? | null |
relationships | LIST? OF MAP? | null |
config | MAP? | null |
Config parameters
The procedure supports the following config parameters:
name | type | default | description | import tool counterpart |
---|---|---|---|---|
delimiter | String | `,` | delimiter character between columns | |
arrayDelimiter | String | `;` | delimiter character in arrays | |
ignoreDuplicateNodes | Boolean | false | for duplicate nodes, only load the first one and skip the rest (true) or fail the import (false) | |
quotationCharacter | String | `"` | quotation character | |
stringIds | Boolean | true | treat ids as strings | |
skipLines | Integer | 1 | lines to skip (incl. header) | N/A |
ignoreBlankString | Boolean | false | if true, ignore properties with a blank string | N/A |
ignoreEmptyCellArray | Boolean | false | if true, ignore array properties containing a single empty string, like the import tool | N/A |
compression | String | NONE | allow taking binary data, either not compressed (value: NONE) or compressed (allowed values: GZIP, BZIP2, DEFLATE, BLOCK_LZ4, FRAMED_SNAPPY) | N/A |
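For instance, a call that overrides several of these options might look like the following sketch (the file name is illustrative):

```cypher
CALL apoc.import.csv(
  [{fileName: 'file:/persons.csv', labels: ['Person']}],
  [],
  {delimiter: ';', stringIds: false, ignoreDuplicateNodes: true}
);
```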
Output parameters
Name | Type |
---|---|
file | STRING? |
source | STRING? |
format | STRING? |
nodes | INTEGER? |
relationships | INTEGER? |
properties | INTEGER? |
time | INTEGER? |
rows | INTEGER? |
batchSize | INTEGER? |
batches | INTEGER? |
done | BOOLEAN? |
data | STRING? |
Usage Examples
This procedure imports CSV files that comply with the Neo4j import tool’s header format.
Nodes
The following file contains two people:

```csv
id:ID,name:STRING
1,John
2,Jane
```

We'll place this file into the `import` directory of our Neo4j instance. We can create two `Person` nodes with their `name` properties set by running the following query:

```cypher
CALL apoc.import.csv([{fileName: 'file:/persons.csv', labels: ['Person']}], [], {});
```
file | source | format | nodes | relationships | properties | time | rows | batchSize | batches | done | data |
---|---|---|---|---|---|---|---|---|---|---|---|
"progress.csv" | "file" | "csv" | 2 | 0 | 4 | 7 | 0 | -1 | 0 | TRUE | NULL |
We can check what’s been imported by running the following query:
```cypher
MATCH (p:Person)
RETURN p;
```

p |
---|
(:Person {name: "John", id: "1"}) |
(:Person {name: "Jane", id: "2"}) |
Nodes and relationships
The following files contain nodes and relationships in CSV format:
people-nodes.csv:

```csv
:ID|name:STRING|speaks:STRING[]
1|John|en,fr
2|Jane|en,de
```

knows-rels.csv:

```csv
:START_ID|:END_ID|since:INT
1|2|2016
```

We will import two `Person` nodes and a `KNOWS` relationship between them (with the value of the `since` property set). The field delimiter and the array delimiter are changed from their default values, and the CSVs use numeric ids.

```cypher
CALL apoc.import.csv(
  [{fileName: 'file:/people-nodes.csv', labels: ['Person']}],
  [{fileName: 'file:/knows-rels.csv', type: 'KNOWS'}],
  {delimiter: '|', arrayDelimiter: ',', stringIds: false}
);
```
file | source | format | nodes | relationships | properties | time | rows | batchSize | batches | done | data |
---|---|---|---|---|---|---|---|---|---|---|---|
"progress.csv" | "file" | "csv" | 2 | 1 | 7 | 7 | 0 | -1 | 0 | TRUE | NULL |
We can check what’s been imported by running the following query:
```cypher
MATCH path = (p1:Person)-[:KNOWS]->(p2:Person)
RETURN path;
```

path |
---|
(:Person {name: "John", speaks: ["en", "fr"], csv_id: 1})-[:KNOWS {since: 2016}]->(:Person {name: "Jane", speaks: ["en", "de"], csv_id: 2}) |
Binary file
You can also import a file from a binary `byte[]`, either not compressed (with the default value `NONE`) or compressed (the allowed compression algorithms are: `GZIP`, `BZIP2`, `DEFLATE`, `BLOCK_LZ4`, `FRAMED_SNAPPY`).
Note that in this case, for both the `nodes` and `relationships` parameter maps, the key for the file is `data` instead of `fileName`, that is:

```cypher
CALL apoc.import.csv([{data: `binaryGzipByteArray`, labels: ['Person']}], [{data: `binaryGzipByteArray`, type: 'KNOWS'}], {compression: 'GZIP'})
```

or:

```cypher
CALL apoc.import.csv([{data: `binaryFileNotCompressed`, labels: ['Person']}], [{data: `binaryFileNotCompressed`, type: 'KNOWS'}], {compression: 'NONE'})
```
For example, this works well together with the `apoc.util.compress` function:

```cypher
WITH apoc.util.compress(':ID|firstname:STRING|lastname:IGNORE|age:INT\n1|John|Doe|25\n2|Jane|Doe|26', {compression: 'DEFLATE'}) AS nodeCsv
WITH apoc.util.compress(':START_ID|:END_ID|prop1:IGNORE|prop2:INT\n1|2|a|3\n2|1|b|6', {compression: 'DEFLATE'}) AS relCsv, nodeCsv
CALL apoc.import.csv([{data: nodeCsv, labels: ['Person']}], [{data: relCsv, type: 'KNOWS'}], {delimiter: '|', compression: 'DEFLATE'})
YIELD source, format, nodes, relationships, properties
RETURN source, format, nodes, relationships, properties
```
source | format | nodes | relationships | properties |
---|---|---|---|---|
"binary" | "csv" | 2 | 2 | 8 |
Properties
For properties, the `<name>` part of the field designates the property key, while the `<field_type>` part (after `:`) assigns a data type (see below). You can have properties in both node data files and relationship data files.

Use one of `int`, `long`, `float`, `double`, `boolean`, `byte`, `short`, `char`, `string`, `point`, `date`, `localtime`, `time`, `localdatetime`, `datetime`, and `duration` to designate the data type for properties. If no data type is given, this defaults to `string`. To define an array type, append `[]` to the type.
For example:

```csv
:ID,name,joined:date,active:boolean,points:int
user01,Joe Soap,2017-05-05,true,10
user02,Jane Doe,2017-08-21,true,15
user03,Moe Know,2018-02-17,false,7
```
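A minimal sketch of importing this file, assuming it is saved as `users.csv` in the import directory (the file name and the `User` label are illustrative, not part of the original example):

```cypher
CALL apoc.import.csv([{fileName: 'file:/users.csv', labels: ['User']}], [], {});

// The typed columns come back as native values: `joined` as a Date,
// `active` as a Boolean, and `points` as an Integer.
MATCH (u:User)
RETURN u.name, u.joined, u.active, u.points;
```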
Point
A point is specified using the Cypher syntax for maps. The map allows the same keys as the input to the Point function. The point data type in the header can be amended with a map of default values used for all values of that column, e.g. `point{crs: 'WGS-84'}`. Specifying the header this way allows you to have an incomplete map in the value position in the data file. Optionally, a value in a data file may override default values from the header.

```csv
:ID,name,location:point{crs:WGS-84}
city01,"Malmö","{latitude:55.6121514, longitude:12.9950357}"
city02,"London","{y:51.507222, x:-0.1275}"
city03,"San Mateo","{latitude:37.554167, longitude:-122.313056, height: 100, crs:'WGS-84-3D'}"
```
In the above CSV:

- The first city's location is defined using `latitude` and `longitude`, as expected when using the coordinate system defined in the header.
- The second city uses `x` and `y` instead. This would normally lead to a point using the cartesian coordinate reference system. Since the header defines `crs:WGS-84`, that coordinate reference system will be used.
- The third city overrides the coordinate reference system defined in the header, and sets it explicitly to `WGS-84-3D`.
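As a sketch, such a file could be imported as follows (the file name `cities.csv` and the `City` label are assumptions for illustration):

```cypher
CALL apoc.import.csv([{fileName: 'file:/cities.csv', labels: ['City']}], [], {});

// Each `location` property is stored as a spatial point value.
MATCH (c:City)
RETURN c.name, c.location;
```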
Temporal
The format for all temporal data types must be defined as described in Temporal instants syntax and Durations syntax.
Two of the temporal types, Time and DateTime, take a time zone parameter which might be common between all or many of the values in the data file.
It is therefore possible to specify a default time zone for Time and DateTime values in the header, for example: `time{timezone:+02:00}` and `datetime{timezone:Europe/Stockholm}`.

```csv
:ID,date1:datetime{timezone:Europe/Stockholm},date2:datetime
1,2018-05-10T10:30,2018-05-10T12:30
2,2018-05-10T10:30[Europe/Berlin],2018-05-10T12:30[Europe/Berlin]
```
In the above CSV:

- The first row has two values that do not specify an explicit time zone. The value for `date1` will use the `Europe/Stockholm` time zone that was specified for that field in the header. The value for `date2` will use the configured default time zone of the database.
- In the second row, both `date1` and `date2` set the time zone explicitly to `Europe/Berlin`. This overrides the header definition for `date1`, as well as the configured default time zone of the database.
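A minimal sketch of importing this file, assuming it is saved as `events.csv` (the file name and the `Event` label are illustrative):

```cypher
CALL apoc.import.csv([{fileName: 'file:/events.csv', labels: ['Event']}], [], {});

// `date1` and `date2` are stored as DateTime values, with their time zones
// resolved according to the rules described above.
MATCH (e:Event)
RETURN e.date1, e.date2;
```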