Load CSV
Many existing applications and data integrations use CSV as the minimal denominator format. CSV files contain text with delimiters (most often comma, but also tab (TSV) and colon (DSV)) separating columns and newlines for rows. Fields are possibly quoted to handle stray quotes, newlines, and the use of the delimiter within a field.
In Cypher it is supported by LOAD CSV
and with the neo4j-import
(neo4j-admin import
) tool for bulk imports.
The existing LOAD CSV
works ok for most uses, but has a few features missing, that apoc.load.csv
and apoc.load.xls
add.
-
provide a line number
-
provide both a map and a list representation of each line
-
automatic data conversion (including split into arrays)
-
option to keep the original string formatted values
-
ignoring fields (makes it easier to assign a full line as properties)
-
headerless files
-
replacing certain values with null
The APOC procedures also support reading compressed files.
The data conversion is useful for setting properties directly, but for computation within Cypher it’s problematic as Cypher doesn’t know the type of map values so they default to Any
.
To use them correctly, you’ll have to indicate their type to Cypher by using the built-in (e.g. toInteger
) or apoc (e.g. apoc.convert.toBoolean
) conversion functions on the value.
For reading from files you’ll have to enable the config option:
apoc.import.file.enabled=true
By default file paths are global, for paths relative to the import
directory set:
apoc.import.file.use_neo4j_config=true
Examples for apoc.load.csv
name,age,beverage Selma,9,Soda Rana,12,Tea;Milk Selina,19,Cola
CALL apoc.load.csv('test.csv')
YIELD lineNo, map, list
RETURN *;
lineNo | list | map |
---|---|---|
0 |
["Selma", "9", "Soda"] |
{name: "Selma", age: "9", beverage: "Soda"} |
1 |
["Rana", "12", "Tea;Milk"] |
{name: "Rana", age: "12", beverage: "Tea;Milk"} |
2 |
["Selina", "19", "Cola"] |
{name: "Selina", age: "19", beverage: "Cola"} |
Configuration Options
Besides the file you can pass in a config map:
name | default | description |
---|---|---|
|
|
skip result rows |
|
|
limit result rows |
|
|
indicates if file has a header |
|
|
separator character or 'TAB' |
|
|
the char to use for quoted elements |
|
|
array separator |
|
|
which columns to ignore |
|
|
which values to treat as null, e.g. |
|
|
per field mapping, entry key is field name, .e.g |
|
|
true |
name | default | description |
---|---|---|
|
|
'int', 'string' etc. |
|
|
indicates if field is an array |
|
|
separator for array |
|
|
rename field |
|
|
ignore/remove this field |
|
|
which values to treat as null, e.g. |
CALL apoc.load.csv('test.csv', {skip:1, limit:1, header:true, ignore:['name'],
mapping:{
age: {type:'int'},
beverage: {array:true, arraySep:';', name:'drinks'}
}
})
YIELD lineNo, map, list
RETURN *;
lineNo | list | map |
---|---|---|
1 |
[12,["Tea","Milk"]] |
{"age":12,"drinks":["Tea","Milk"]} |
Transaction Batching
To handle large files, USING PERIODIC COMMIT
can be prepended to LOAD CSV
, you’ll have to watch out though for Eager operations which might break that behavior.
In apoc you can combine any data source with apoc.periodic.iterate
to achieve the same.
CALL apoc.periodic.iterate('
CALL apoc.load.csv({url}) yield map as row return row
','
CREATE (p:Person) SET p = row
', {batchSize:10000, iterateList:true, parallel:true});
Please note that the parallel operation only works well for non-conflicting updates otherwise you might run into deadlocks. |
To make these data structures available to Cypher, you can use apoc.load.xml
.
It takes a file or http URL and parses the XML into a map data structure.
See the following usage-examples for the procedures.
Error handling
You can use failOnError
configuration to handle the result in case of incorrect url or csv.
For example, with the help of the apoc.when
procedure, you can return nothingToList
and nothingToMap
as list and map result, with incorrect url:
CALL apoc.load.csv("MY_CSV_URL", {failOnError:false})
YIELD list, map
WITH list, map
call apoc.do.when(list = [], "return 'nothingToList' as list, 'nothingToMap' as map", "return list, map", {list: list, map: map})
YIELD value
RETURN value["list"], value["map"]