This section explains the header format of CSV files when using the Neo4j import tool.
The header row of each data source specifies how the data fields should be interpreted. You must use the same delimiter for the header row and for the data rows.
The header contains information for each field, with the format <name>:<field_type>.
The <name> part is used for properties and node IDs.
In all other cases, the <name> part of the field is ignored.
For details, see the relevant sections below.
For properties, the <name> part of the field designates the property key, while the <field_type> part assigns a data type (see below).
You can have properties in both node data files and relationship data files.
Use a type such as int, long, float, double, boolean, byte, short, char, or string to designate the data type for properties.
If no data type is given, this defaults to string.
To define an array type, append [] to the type.
By default, array values are separated by ;. A different delimiter can be specified with --array-delimiter.
To ignore a field completely, use the IGNORE keyword in the header. IGNORE must be prepended with a :.
See the example in the relevant section below.
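The field format described above can be illustrated with a small parser. This is a minimal sketch in Python, not the import tool's own parser: the names HeaderField and parse_header_field are hypothetical, the tags:string[] column is an invented example of an array property, and the header is split naively on commas (which assumes no quoted delimiters in the header).

```python
from typing import NamedTuple, Optional


class HeaderField(NamedTuple):
    name: Optional[str]   # property key or ID property name; None if absent
    field_type: str       # e.g. "int", "string", "ID", "LABEL", "IGNORE"
    is_array: bool        # True when the type ends with "[]"


def parse_header_field(field: str) -> HeaderField:
    """Split one header field of the form <name>:<field_type>."""
    name, sep, field_type = field.partition(":")
    if not sep or not field_type:
        # No type given: treat the field as a property that defaults to string.
        field_type = "string"
    is_array = field_type.endswith("[]")
    if is_array:
        field_type = field_type[:-2]
    return HeaderField(name or None, field_type, is_array)


# "tags:string[]" is a hypothetical column added to show the array suffix.
header = "movieId:ID,title,year:int,tags:string[],:IGNORE,:LABEL"
for raw in header.split(","):
    print(parse_header_field(raw))
```

Fields such as :LABEL and :IGNORE come back with no name, which matches the rule that the <name> part only matters for properties and node IDs.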
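For files containing node data, there is one mandatory field, the ID, and one optional field, the LABEL.
ID: The unique ID of the node, used to find the correct node when creating relationships. The unique ID can be persisted in a property whose name is defined by the <name> part of the field definition <name>:ID. If no such property name is defined, the unique ID will be used for the purpose of the import but not be available for reference later.
LABEL: One or more labels for the node. Like array values, multiple labels are separated by ;, or by the character specified with --array-delimiter.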
We define three movies in the movies.csv file.
They have the properties movieId, title and year.
All the movies are given the label Movie.
Two of them are also given the label Sequel.
movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel
We also define three actors in the actors.csv file.
They all have the properties personId and name, and the label Actor.
personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
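To make the node header rules concrete, the following Python sketch reads the movies data and interprets each column according to its header field. It is an illustration only, under stated assumptions: read_nodes is a hypothetical helper, it handles just the int and string types that appear in this example, and it does not reflect how the import tool itself is implemented.

```python
import csv
import io

MOVIES_CSV = '''movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel
'''


def read_nodes(text, array_delimiter=";"):
    """Return a list of (node_id, labels, properties) tuples."""
    rows = csv.reader(io.StringIO(text))
    # Each header field becomes a (name, separator, field_type) triple.
    header = [field.partition(":") for field in next(rows)]
    nodes = []
    for row in rows:
        node_id, labels, props = None, [], {}
        for (name, _, field_type), value in zip(header, row):
            if field_type == "ID":
                node_id = value
                if name:  # keep the ID as a property only when it is named
                    props[name] = value
            elif field_type == "LABEL":
                labels = value.split(array_delimiter)
            elif field_type == "int":
                props[name] = int(value)
            else:  # untyped fields default to string
                props[name] = value
        nodes.append((node_id, labels, props))
    return nodes


for node in read_nodes(MOVIES_CSV):
    print(node)
```

The second and third movies come back with two labels each, because the LABEL value is split on the same delimiter used for array values.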
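For files containing relationship data, there are three mandatory fields: TYPE, START_ID and END_ID.
TYPE is the relationship type, START_ID is the ID of the start node, and END_ID is the ID of the end node.
START_ID and END_ID refer to the unique node ID defined in one of the node data sources, as explained in the previous section.
None of these takes a name, e.g. if <name>:START_ID or <name>:END_ID is defined, the <name> part will be ignored.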
In this example we assume that the two nodes files from the previous example are used together with the following relationships file.
We define relationships between actors and movies in the file roles.csv.
Each row connects a start node and an end node with a relationship of relationship type ACTED_IN.
Notice how we use the unique identifiers personId and movieId from the nodes files above.
The name of the character that the actor is playing in this movie is stored as a role property on the relationship.
:START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
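Because START_ID and END_ID must match IDs defined in the node files, it can be useful to check a relationships file before importing it. The sketch below is one possible pre-import sanity check in Python, not a feature of the import tool. The helper find_dangling is hypothetical, and the last data row uses a deliberately invalid, made-up ID (tt0999999) to show what gets reported.

```python
import csv
import io

ROLES_CSV = ''':START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0999999,ACTED_IN
'''

# IDs taken from the movies and actors node files in the previous example.
NODE_IDS = {"keanu", "laurence", "carrieanne",
            "tt0133093", "tt0234215", "tt0242653"}


def find_dangling(text, node_ids):
    """Return (line_number, field_type, value) for each ID not in node_ids."""
    reader = csv.reader(io.StringIO(text))
    # Only the type after the colon matters; any <name> before it is ignored.
    types = [field.partition(":")[2] for field in next(reader)]
    start, end = types.index("START_ID"), types.index("END_ID")
    problems = []
    # The header is line 1, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        for column in (start, end):
            if row[column] not in node_ids:
                problems.append((line_number, types[column], row[column]))
    return problems


print(find_dangling(ROLES_CSV, NODE_IDS))
```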
By default, the import tool assumes that node identifiers are unique across node files.
In many cases the ID is only unique across each entity file, for example when our CSV files contain data extracted from a relational database and the ID field is pulled from the primary key column in the corresponding table.
To handle this situation we define ID spaces.
ID spaces are defined in the ID field of node files using the syntax ID(<ID space identifier>).
To reference an ID of an ID space in a relationship file, we use the syntax START_ID(<ID space identifier>) and END_ID(<ID space identifier>).
Define a Movie-ID ID space in the header of the movies.csv file.
movieId:ID(Movie-ID),title,year:int,:LABEL
1,"The Matrix",1999,Movie
2,"The Matrix Reloaded",2003,Movie;Sequel
3,"The Matrix Revolutions",2003,Movie;Sequel
Define an Actor-ID ID space in the header of the actors.csv file.
personId:ID(Actor-ID),name,:LABEL
1,"Keanu Reeves",Actor
2,"Laurence Fishburne",Actor
3,"Carrie-Anne Moss",Actor
Now use the previously defined ID spaces when connecting the actors to movies.
:START_ID(Actor-ID),role,:END_ID(Movie-ID),:TYPE
1,"Neo",1,ACTED_IN
1,"Neo",2,ACTED_IN
1,"Neo",3,ACTED_IN
2,"Morpheus",1,ACTED_IN
2,"Morpheus",2,ACTED_IN
2,"Morpheus",3,ACTED_IN
3,"Trinity",1,ACTED_IN
3,"Trinity",2,ACTED_IN
3,"Trinity",3,ACTED_IN
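The effect of ID spaces can be pictured as keying every node by the pair (ID space, ID) instead of by the ID alone. The Python sketch below illustrates that idea. It is an assumption-laden model, not the import tool's internals: parse_id_field is a hypothetical helper, and the name "global" for fields without an explicit ID space is invented for this example.

```python
import re

# Matches <name>:ID, <name>:START_ID or <name>:END_ID, with an optional
# "(<ID space identifier>)" suffix.
ID_FIELD = re.compile(
    r"^(?P<name>[^:]*):(?P<kind>ID|START_ID|END_ID)(?:\((?P<space>[^)]*)\))?$"
)


def parse_id_field(field):
    """Return (name, kind, id_space) for an ID-like header field, else None."""
    match = ID_FIELD.match(field)
    if match is None:
        return None
    # Assumption for this sketch: fields without an explicit space share one
    # space, represented here by the hypothetical name "global".
    return match["name"] or None, match["kind"], match["space"] or "global"


# Nodes are keyed by (id_space, id), so the raw value "1" can occur in both.
nodes = {
    ("Movie-ID", "1"): "The Matrix",
    ("Actor-ID", "1"): "Keanu Reeves",
}

_, _, start_space = parse_id_field(":START_ID(Actor-ID)")
_, _, end_space = parse_id_field(":END_ID(Movie-ID)")
print(nodes[(start_space, "1")], "ACTED_IN", nodes[(end_space, "1")])
```

With this keying, the first relationship row (1,"Neo",1,ACTED_IN) resolves to two different nodes even though both columns hold the value 1.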
If you use the IGNORE keyword in a column position, that field will be ignored completely.
In this example, we are not interested in the data in the third column of the nodes file and wish to skip over it.
Note that the IGNORE keyword is prepended by a :.
personId:ID,name,:IGNORE,:LABEL
keanu,"Keanu Reeves","male",Actor
laurence,"Laurence Fishburne","male",Actor
carrieanne,"Carrie-Anne Moss","female",Actor
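The effect of an :IGNORE column can be previewed by filtering the file yourself. The following Python sketch drops every column whose header type is IGNORE. The helper drop_ignored is hypothetical and only mimics the observable result described here; it makes no claim about how the import tool handles the column internally.

```python
import csv
import io

ACTORS_CSV = '''personId:ID,name,:IGNORE,:LABEL
keanu,"Keanu Reeves","male",Actor
laurence,"Laurence Fishburne","male",Actor
carrieanne,"Carrie-Anne Moss","female",Actor
'''


def drop_ignored(text):
    """Return all rows (header included) without the IGNORE columns."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Keep the positions of every column whose type is not IGNORE.
    keep = [i for i, field in enumerate(header)
            if field.partition(":")[2] != "IGNORE"]
    return [[row[i] for i in keep] for row in [header, *reader]]


for row in drop_ignored(ACTORS_CSV):
    print(row)
```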
If all your superfluous data is placed in columns located to the right of all the columns that you wish to import, you can instead use the command line option --ignore-extra-columns.