Appendix B. Tutorial

This chapter contains tutorials for deploying and operating Neo4j.

B.1. Set up a Neo4j cluster

This guide will give step-by-step instructions for setting up a basic cluster of three separate machines. For a description of the clustering architecture and related design considerations, refer to Introduction.

B.1.1. Download and configure

  • Download Neo4j Enterprise Edition from the Neo4j download site, and unpack on three separate machines.
  • Configure the HA-related settings for each installation as outlined below. All three installations have the same configuration except for the ha.server_id property.
Neo4j instance #1 — neo4j-01.local

conf/neo4j.conf. 

# Unique server id for this Neo4j instance
# Must be a non-negative integer, unique within the cluster
ha.server_id = 1

# List of other known instances in this cluster
ha.initial_hosts = neo4j-01.local:5001,neo4j-02.local:5001,neo4j-03.local:5001
# Alternatively, use IP addresses:
#ha.initial_hosts = 192.168.0.20:5001,192.168.0.21:5001,192.168.0.22:5001

# HA - High Availability
# SINGLE - Single mode, default.
dbms.mode=HA

dbms.connector.http.type=HTTP
dbms.connector.http.enabled=true
dbms.connector.http.address=0.0.0.0:7474

Neo4j instance #2 — neo4j-02.local

conf/neo4j.conf. 

# Unique server id for this Neo4j instance
# Must be a non-negative integer, unique within the cluster
ha.server_id = 2

# List of other known instances in this cluster
ha.initial_hosts = neo4j-01.local:5001,neo4j-02.local:5001,neo4j-03.local:5001
# Alternatively, use IP addresses:
#ha.initial_hosts = 192.168.0.20:5001,192.168.0.21:5001,192.168.0.22:5001

# HA - High Availability
# SINGLE - Single mode, default.
dbms.mode=HA

dbms.connector.http.type=HTTP
dbms.connector.http.enabled=true
dbms.connector.http.address=0.0.0.0:7474

Neo4j instance #3 — neo4j-03.local

conf/neo4j.conf. 

# Unique server id for this Neo4j instance
# Must be a non-negative integer, unique within the cluster
ha.server_id = 3

# List of other known instances in this cluster
ha.initial_hosts = neo4j-01.local:5001,neo4j-02.local:5001,neo4j-03.local:5001
# Alternatively, use IP addresses:
#ha.initial_hosts = 192.168.0.20:5001,192.168.0.21:5001,192.168.0.22:5001

# HA - High Availability
# SINGLE - Single mode, default.
dbms.mode=HA

dbms.connector.http.type=HTTP
dbms.connector.http.enabled=true
dbms.connector.http.address=0.0.0.0:7474
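
Since the three files are meant to differ only in ha.server_id, a small script can catch copy-and-paste mistakes before the servers are started. The following is a minimal sketch in Python rather than a Neo4j tool; the function names parse_conf and check_cluster are made up for this illustration, and the parser only understands plain key=value lines.

```python
def parse_conf(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def check_cluster(conf_texts):
    """Return a list of problems found across several neo4j.conf contents."""
    confs = [parse_conf(text) for text in conf_texts]
    problems = []

    ids = [conf.get("ha.server_id") for conf in confs]
    if any(i is None or not i.isdigit() for i in ids):
        problems.append("every file needs a non-negative integer ha.server_id")
    elif len(set(ids)) != len(ids):
        problems.append("ha.server_id values are not unique: %s" % ids)

    # All remaining settings are expected to be identical on every instance.
    others = [{k: v for k, v in conf.items() if k != "ha.server_id"}
              for conf in confs]
    if any(other != others[0] for other in others[1:]):
        problems.append("settings other than ha.server_id differ")
    return problems


# Hypothetical usage with a shortened version of the configuration above.
TEMPLATE = """
ha.server_id = {server_id}
ha.initial_hosts = neo4j-01.local:5001,neo4j-02.local:5001,neo4j-03.local:5001
dbms.mode=HA
"""
good = [TEMPLATE.format(server_id=n) for n in (1, 2, 3)]
print(check_cluster(good))  # prints: []
```

An empty list means that no problem was detected by these two checks; it does not validate the individual settings themselves.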

B.1.2. Start the Neo4j Servers

Start the Neo4j servers as usual. Note that the startup order does not matter.

neo4j-01$ ./bin/neo4j start
neo4j-02$ ./bin/neo4j start
neo4j-03$ ./bin/neo4j start

When running in HA mode, the startup script returns immediately instead of waiting for the server to become available. The database will be unavailable until all members listed in ha.initial_hosts are online and communicating with each other. In the example above this happens when you have started all three instances. To keep track of the startup state you can follow the messages in neo4j.log — the path is printed before the startup script returns.

Now, you should be able to access the three servers and check their HA status. Open the locations below in a web browser and issue the following command in the editor after having set a password for the database: :play sysinfo

  • http://neo4j-01.local:7474/
  • http://neo4j-02.local:7474/
  • http://neo4j-03.local:7474/

You can replace instance #3 with an 'arbiter' instance; see Section 2.4.2, “Arbiter instances”.

That’s it! You now have a Neo4j cluster of three instances running. You can start by making a change on any instance, and the change will be propagated to the others. For more cluster-related configuration options, take a look at Section 2.4.1, “Setup and configuration”.

B.2. Set up a local cluster

If you want to start a cluster similar to the one described above, but for development and testing purposes, it is convenient to run all Neo4j instances on the same machine. This is easy to achieve, although it requires some additional configuration because the default ports and addresses of the instances would conflict with each other. Furthermore, the default dbms.memory.pagecache.size assumes that Neo4j has the machine to itself. If we assume in this example that the machine has 4 gigabytes of memory and that each JVM consumes 500 megabytes, then we can allocate 500 megabytes to the page cache of each server. The three instances then use about 3 gigabytes in total, which leaves roughly 1 gigabyte for the operating system.
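
The memory budget in this example can be checked with a few lines of arithmetic. The sketch below uses only the figures assumed in the paragraph above (4 gigabytes of RAM, three instances, 500 megabytes per JVM and 500 megabytes of page cache each); the variable names are chosen for this illustration.

```python
TOTAL_MB = 4 * 1024   # 4 gigabytes of RAM on the test machine
INSTANCES = 3         # three Neo4j instances on the same machine
JVM_MB = 500          # assumed memory consumed by each JVM
PAGECACHE_MB = 500    # dbms.memory.pagecache.size for each instance

used_mb = INSTANCES * (JVM_MB + PAGECACHE_MB)
left_for_os_mb = TOTAL_MB - used_mb
print(used_mb, left_for_os_mb)  # prints: 3000 1096
```

If your machine has less memory, the same calculation suggests how far the page cache size would have to be reduced.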

B.2.1. Download and configure

  1. Download Neo4j Enterprise Edition from the Neo4j download site, and unpack into three separate directories on your test machine.
  2. Configure the HA-related settings for each installation as outlined below.

    Neo4j instance #1 — ~/neo4j-01

    conf/neo4j.conf. 

    # Reduce the default page cache memory allocation
    dbms.memory.pagecache.size=500m
    
    # Port to listen to for incoming backup requests.
    dbms.backup.address = 127.0.0.1:6366
    
    # Unique server id for this Neo4j instance
    # Must be a non-negative integer, unique within the cluster
    ha.server_id = 1
    
    # List of other known instances in this cluster
    ha.initial_hosts = 127.0.0.1:5001,127.0.0.1:5002,127.0.0.1:5003
    
    # IP and port for this instance to bind to for communicating cluster information
    # with the other neo4j instances in the cluster.
    ha.host.coordination = 127.0.0.1:5001
    
    # IP and port for this instance to bind to for communicating data with the
    # other neo4j instances in the cluster.
    ha.host.data = 127.0.0.1:6363
    
    # HA - High Availability
    # SINGLE - Single mode, default.
    dbms.mode=HA
    
    dbms.connector.http.type=HTTP
    dbms.connector.http.enabled=true
    dbms.connector.http.address=0.0.0.0:7474
    
    # Bolt connector
    dbms.connector.bolt.type=BOLT
    dbms.connector.bolt.enabled=true
    dbms.connector.bolt.tls_level=OPTIONAL
    dbms.connector.bolt.address=0.0.0.0:7687

    Neo4j instance #2 — ~/neo4j-02

    conf/neo4j.conf. 

    # Reduce the default page cache memory allocation
    dbms.memory.pagecache.size=500m
    
    # Port to listen to for incoming backup requests.
    dbms.backup.address = 127.0.0.1:6367
    
    # Unique server id for this Neo4j instance
    # Must be a non-negative integer, unique within the cluster
    ha.server_id = 2
    
    # List of other known instances in this cluster
    ha.initial_hosts = 127.0.0.1:5001,127.0.0.1:5002,127.0.0.1:5003
    
    # IP and port for this instance to bind to for communicating cluster information
    # with the other neo4j instances in the cluster.
    ha.host.coordination = 127.0.0.1:5002
    
    # IP and port for this instance to bind to for communicating data with the
    # other neo4j instances in the cluster.
    ha.host.data = 127.0.0.1:6364
    
    # HA - High Availability
    # SINGLE - Single mode, default.
    dbms.mode=HA
    
    dbms.connector.http.type=HTTP
    dbms.connector.http.enabled=true
    dbms.connector.http.address=0.0.0.0:7475
    
    # Bolt connector
    dbms.connector.bolt.type=BOLT
    dbms.connector.bolt.enabled=true
    dbms.connector.bolt.tls_level=OPTIONAL
    dbms.connector.bolt.address=0.0.0.0:7688

    Neo4j instance #3 — ~/neo4j-03

    conf/neo4j.conf. 

    # Reduce the default page cache memory allocation
    dbms.memory.pagecache.size=500m
    
    # Port to listen to for incoming backup requests.
    dbms.backup.address = 127.0.0.1:6368
    
    # Unique server id for this Neo4j instance
    # Must be a non-negative integer, unique within the cluster
    ha.server_id = 3
    
    # List of other known instances in this cluster
    ha.initial_hosts = 127.0.0.1:5001,127.0.0.1:5002,127.0.0.1:5003
    
    # IP and port for this instance to bind to for communicating cluster information
    # with the other neo4j instances in the cluster.
    ha.host.coordination = 127.0.0.1:5003
    
    # IP and port for this instance to bind to for communicating data with the
    # other neo4j instances in the cluster.
    ha.host.data = 127.0.0.1:6365
    
    # HA - High Availability
    # SINGLE - Single mode, default.
    dbms.mode=HA
    
    dbms.connector.http.type=HTTP
    dbms.connector.http.enabled=true
    dbms.connector.http.address=0.0.0.0:7476
    
    # Bolt connector
    dbms.connector.bolt.type=BOLT
    dbms.connector.bolt.enabled=true
    dbms.connector.bolt.tls_level=OPTIONAL
    dbms.connector.bolt.address=0.0.0.0:7689
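
The three local configurations follow a simple pattern: every port of instance #n is the port of instance #1 plus n - 1. As an illustration of that pattern, the following Python sketch generates the settings shown above. It is not a Neo4j utility; BASE_PORTS and local_conf are names invented for this example, and the sketch is deliberately limited to three instances because the data and backup port ranges used here would overlap with a fourth.

```python
# Base ports used by instance #1; instance #n uses base + (n - 1).
BASE_PORTS = {
    "backup": 6366,
    "coordination": 5001,
    "data": 6363,
    "http": 7474,
    "bolt": 7687,
}
CLUSTER_SIZE = 3


def local_conf(server_id):
    """Return neo4j.conf text for one instance of the local cluster."""
    if not 1 <= server_id <= CLUSTER_SIZE:
        raise ValueError("server_id must be between 1 and %d" % CLUSTER_SIZE)
    offset = server_id - 1
    port = {name: base + offset for name, base in BASE_PORTS.items()}
    initial_hosts = ",".join(
        "127.0.0.1:%d" % (BASE_PORTS["coordination"] + n)
        for n in range(CLUSTER_SIZE)
    )
    lines = [
        "dbms.memory.pagecache.size=500m",
        "dbms.backup.address = 127.0.0.1:%d" % port["backup"],
        "ha.server_id = %d" % server_id,
        "ha.initial_hosts = %s" % initial_hosts,
        "ha.host.coordination = 127.0.0.1:%d" % port["coordination"],
        "ha.host.data = 127.0.0.1:%d" % port["data"],
        "dbms.mode=HA",
        "dbms.connector.http.type=HTTP",
        "dbms.connector.http.enabled=true",
        "dbms.connector.http.address=0.0.0.0:%d" % port["http"],
        "dbms.connector.bolt.type=BOLT",
        "dbms.connector.bolt.enabled=true",
        "dbms.connector.bolt.tls_level=OPTIONAL",
        "dbms.connector.bolt.address=0.0.0.0:%d" % port["bolt"],
    ]
    return "\n".join(lines) + "\n"


print(local_conf(2))
```

The output for each server id can be compared with, or appended to, conf/neo4j.conf in the corresponding directory.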

B.2.1.1. Start the Neo4j Servers

Start the Neo4j servers as usual. Note that the startup order does not matter.

localhost:~/neo4j-01$ ./bin/neo4j start
localhost:~/neo4j-02$ ./bin/neo4j start
localhost:~/neo4j-03$ ./bin/neo4j start

Now, you should be able to access the three servers and check their HA status. Open the locations below in a web browser and issue the following command in the editor after having set a password for the database: :play sysinfo

  • http://127.0.0.1:7474/
  • http://127.0.0.1:7475/
  • http://127.0.0.1:7476/

B.3. Use the Import tool

This tutorial walks us through a series of examples to illustrate the capabilities of the Import tool.

When using CSV files for loading a database, each node must have a unique identifier, a node identifier, in order to be able to create relationships between nodes in the same process. Relationships are created by connecting the node identifiers. In the examples below, the node identifiers are stored as properties on the nodes. Node identifiers may be of interest later for cross-reference to other systems, traceability etc., but they are not mandatory. If you do not want the identifiers to persist after a completed import, then do not specify a property name in the :ID field.

It is possible to import only nodes using the import tool. To do so, simply omit the relationships file when calling neo4j-import. Any relationships between the imported nodes will have to be created later by another method, since the import tool works for initial graph population only.

For this tutorial we will use a data set containing movies, actors and roles. If running the examples, replace path_to_target_directory with the path to the database file directory. In a default installation, this path is <neo4j-home>/data/databases/graph.db. Note that if you wish to run one example after another, you have to remove the database files in between.

B.3.1. Basic example

First we will look at the movies. Each movie has an id, which is used for referring to it from other data sources. Moreover, each movie has a title and a year. Along with these properties we also add the node labels Movie and Sequel.

movies.csv. 

movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel

Next up are the actors. They have an id - in this case a shorthand of their name - and a name. All the actors have the node label Actor.

actors.csv. 

personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor

Finally we have the roles that an actor plays in a movie, which will be represented by relationships in the database. In order to create a relationship between nodes we use the ids defined in actors.csv and movies.csv for the :START_ID and :END_ID fields. We also need to provide a relationship type (in this case ACTED_IN) for the :TYPE field.

roles.csv. 

:START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN

The call to neo4j-import would look like this:

neo4j_home$ ./bin/neo4j-import --into path_to_target_directory --nodes movies.csv --nodes actors.csv --relationships roles.csv

Now start up a database from the target directory:

neo4j_home$ ./bin/neo4j start
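
The header rows in this example follow a name:TYPE pattern, where either part may be left out. To make that pattern explicit, here is a small Python sketch that splits a header row into its parts. The function parse_header is hypothetical and only mirrors the header shapes used in this tutorial; treating a field without a type as a string is an assumption of this sketch.

```python
def parse_header(header_line, delimiter=","):
    """Split a header row into (name, type) pairs.

    A field without an explicit type is treated as a string property here.
    """
    fields = []
    for field in header_line.strip().split(delimiter):
        name, _, kind = field.partition(":")
        fields.append((name, kind or "string"))
    return fields


print(parse_header("movieId:ID,title,year:int,:LABEL"))
# prints: [('movieId', 'ID'), ('title', 'string'), ('year', 'int'), ('', 'LABEL')]
```

Fields such as :LABEL, :START_ID, :END_ID and :TYPE have an empty name because they describe the node or relationship itself rather than a property.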

B.3.2. Customizing configuration options

We can customize the configuration options that the import tool uses (see Section 2.7.2.3, “Options”) if our data does not fit the default format. The following CSV files are delimited by ;, use | as the array delimiter and use ' for quotes.

movies2.csv. 

movieId:ID;title;year:int;:LABEL
tt0133093;'The Matrix';1999;Movie
tt0234215;'The Matrix Reloaded';2003;Movie|Sequel
tt0242653;'The Matrix Revolutions';2003;Movie|Sequel

actors2.csv. 

personId:ID;name;:LABEL
keanu;'Keanu Reeves';Actor
laurence;'Laurence Fishburne';Actor
carrieanne;'Carrie-Anne Moss';Actor

roles2.csv. 

:START_ID;role;:END_ID;:TYPE
keanu;'Neo';tt0133093;ACTED_IN
keanu;'Neo';tt0234215;ACTED_IN
keanu;'Neo';tt0242653;ACTED_IN
laurence;'Morpheus';tt0133093;ACTED_IN
laurence;'Morpheus';tt0234215;ACTED_IN
laurence;'Morpheus';tt0242653;ACTED_IN
carrieanne;'Trinity';tt0133093;ACTED_IN
carrieanne;'Trinity';tt0234215;ACTED_IN
carrieanne;'Trinity';tt0242653;ACTED_IN

The call to neo4j-import would look like this:

neo4j_home$ ./bin/neo4j-import --into path_to_target_directory --nodes movies2.csv --nodes actors2.csv --relationships roles2.csv --delimiter ";" --array-delimiter "|" --quote "'"
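
To see what the three options mean for the data, it can help to read the same file with a general-purpose CSV parser. The sketch below uses Python's standard csv module with parameters that correspond to --delimiter, --quote and --array-delimiter. It only illustrates the file format and is not how neo4j-import itself is implemented; read_rows is a name invented for this example.

```python
import csv
import io

MOVIES2 = """movieId:ID;title;year:int;:LABEL
tt0133093;'The Matrix';1999;Movie
tt0234215;'The Matrix Reloaded';2003;Movie|Sequel
"""


def read_rows(text, delimiter=";", quote="'", array_delimiter="|"):
    """Read rows into dicts and split the :LABEL field into a list."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quote)
    header = next(reader)
    rows = []
    for record in reader:
        row = dict(zip(header, record))
        row[":LABEL"] = row[":LABEL"].split(array_delimiter)
        rows.append(row)
    return rows


for row in read_rows(MOVIES2):
    print(row["title"], row[":LABEL"])
```

The quotes around the titles are removed by the parser, and the second movie ends up with the two labels Movie and Sequel.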

B.3.3. Using separate header files

When dealing with very large CSV files it is more convenient to have the header in a separate file. This makes it easier to edit the header as you avoid having to open a huge data file just to change it.

The import tool can also process single file compressed archives, for example:

  • --nodes nodes.csv.gz
  • --relationships rels.zip

We will use the same data as in the previous example but put the headers in separate files.

movies3-header.csv. 

movieId:ID,title,year:int,:LABEL

movies3.csv. 

tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel

actors3-header.csv. 

personId:ID,name,:LABEL

actors3.csv. 

keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor

roles3-header.csv. 

:START_ID,role,:END_ID,:TYPE

roles3.csv. 

keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN

The call to neo4j-import would look as follows. Note how the file groups are enclosed in quotation marks in the command.

neo4j_home$ ./bin/neo4j-import --into path_to_target_directory --nodes "movies3-header.csv,movies3.csv" --nodes "actors3-header.csv,actors3.csv" --relationships "roles3-header.csv,roles3.csv"
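
Conceptually, the files in a group are read as if they were one file: the first line of the group is the header and everything after it is data. The following Python sketch illustrates that reading order under this assumption. The function read_group is hypothetical, and the in-memory files stand in for actors3-header.csv and actors3.csv.

```python
import csv
import io
import itertools


def read_group(files):
    """Read a header file followed by data files as one logical CSV source."""
    # Chain the lines of all files; the first line overall is the header.
    lines = itertools.chain.from_iterable(files)
    reader = csv.reader(lines)
    header = next(reader)
    return [dict(zip(header, record)) for record in reader]


header_file = io.StringIO("personId:ID,name,:LABEL\n")
data_file = io.StringIO('keanu,"Keanu Reeves",Actor\n'
                        'laurence,"Laurence Fishburne",Actor\n')
for row in read_group([header_file, data_file]):
    print(row)
```

With real files, the two io.StringIO objects would be replaced by open file handles, listed in the same order as in the --nodes argument.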

B.3.4. Multiple input files

In addition to using a separate header file you can also provide multiple nodes or relationships files. This may be useful, for example, for processing the output from a Hadoop pipeline. Files within such an input group can be specified with multiple match strings, delimited by commas, where each match string can be either the exact file name or a regular expression matching one or more files. Multiple matching files are sorted by their characters, with natural number ordering applied to file names containing numbers.

movies4-header.csv. 

movieId:ID,title,year:int,:LABEL

movies4-part1.csv. 

tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel

movies4-part2.csv. 

tt0242653,"The Matrix Revolutions",2003,Movie;Sequel

actors4-header.csv. 

personId:ID,name,:LABEL

actors4-part1.csv. 

keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor

actors4-part2.csv. 

carrieanne,"Carrie-Anne Moss",Actor

roles4-header.csv. 

:START_ID,role,:END_ID,:TYPE

roles4-part1.csv. 

keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN

roles4-part2.csv. 

laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN

The call to neo4j-import would look like this:

neo4j_home$ ./bin/neo4j-import --into path_to_target_directory --nodes "movies4-header.csv,movies4-part1.csv,movies4-part2.csv" --nodes "actors4-header.csv,actors4-part1.csv,actors4-part2.csv" --relationships "roles4-header.csv,roles4-part1.csv,roles4-part2.csv"
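
The natural number sort order mentioned at the start of this section matters once a group contains ten or more parts, because a plain character sort would place part10 before part2. The Python sketch below shows one common way to obtain a natural ordering. It illustrates the idea only and is not taken from the import tool; natural_key and the part10 file name are made up for this example.

```python
import re


def natural_key(name):
    """Split a file name into text and number parts for natural sorting."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


files = ["movies4-part10.csv", "movies4-part2.csv", "movies4-part1.csv"]
print(sorted(files))                   # plain character order
print(sorted(files, key=natural_key))  # natural number order
```

In the second list, movies4-part2.csv comes before movies4-part10.csv, which is the order in which a multi-part export is usually meant to be read.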

B.3.5. Types and labels

B.3.5.1. Using the same label for every node

If you want to use the same node label(s) for every node in your nodes file, you can specify the label(s) as an option to neo4j-import. In that case there is no need to include a :LABEL field in the node file. If the file does contain a :LABEL field, both the labels provided in the file and the ones provided on the command line are added to the node.

In this example we put the label Movie on every node specified in movies5a.csv, and we put the labels Movie and Sequel on the nodes specified in sequels5a.csv.

movies5a.csv. 

movieId:ID,title,year:int
tt0133093,"The Matrix",1999

sequels5a.csv. 

movieId:ID,title,year:int
tt0234215,"The Matrix Reloaded",2003
tt0242653,"The Matrix Revolutions",2003

actors5a.csv. 

personId:ID,name
keanu,"Keanu Reeves"
laurence,"Laurence Fishburne"
carrieanne,"Carrie-Anne Moss"

roles5a.csv. 

:START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN

The call to neo4j-import would look like this:

neo4j_home$ ./bin/neo4j-import --into path_to_target_directory --nodes:Movie movies5a.csv --nodes:Movie:Sequel sequels5a.csv --nodes:Actor actors5a.csv --relationships roles5a.csv
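
The rule for combining labels can be summarized as a union of the command line labels and the labels in the file. The following Python sketch expresses that rule; node_labels is a hypothetical helper, and the order of the resulting labels is an arbitrary choice of this sketch.

```python
def node_labels(cli_labels, file_label_field="", array_delimiter=";"):
    """Combine labels from --nodes:Label1:Label2 with a :LABEL field value."""
    labels = list(cli_labels)
    for label in file_label_field.split(array_delimiter):
        # Skip empty values and labels that are already present.
        if label and label not in labels:
            labels.append(label)
    return labels


# --nodes:Movie:Sequel sequels5a.csv, with no :LABEL field in the file
print(node_labels(["Movie", "Sequel"]))  # prints: ['Movie', 'Sequel']
# --nodes:Movie with a hypothetical file that also carries Sequel in :LABEL
print(node_labels(["Movie"], "Sequel"))  # prints: ['Movie', 'Sequel']
```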

B.3.5.2. Using the same relationship type for every relationship

If you want to use the same relationship type for every relationship in your relationships file this can be done by specifying the appropriate value as an option to neo4j-import. If you provide a relationship type both on the command line and in the relationships file, the one in the file will be applied. In this example we put the relationship type ACTED_IN on every relationship specified in roles5b.csv:

movies5b.csv. 

movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel

actors5b.csv. 

personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor

roles5b.csv. 

:START_ID,role,:END_ID
keanu,"Neo",tt0133093
keanu,"Neo",tt0234215
keanu,"Neo",tt0242653
laurence,"Morpheus",tt0133093
laurence,"Morpheus",tt0234215
laurence,"Morpheus",tt0242653
carrieanne,"Trinity",tt0133093
carrieanne,"Trinity",tt0234215
carrieanne,"Trinity",tt0242653

The call to neo4j-import would look like this:

neo4j_home$ ./bin/neo4j-import --into path_to_target_directory --nodes movies5b.csv --nodes actors5b.csv --relationships:ACTED_IN roles5b.csv

B.3.6. Property types

The type for properties specified in nodes and relationships files is defined in the header row (see Section 2.7.1, “CSV file header format”).

The following example creates a small graph containing one actor and one movie connected by an ACTED_IN relationship. There is a roles property on the relationship which contains an array of the characters played by the actor in a movie.

movies6.csv. 

movieId:ID,title,year:int,:LABEL
tt0099892,"Joe Versus the Volcano",1990,Movie

actors6.csv. 

personId:ID,name,:LABEL
meg,"Meg Ryan",Actor

roles6.csv. 

:START_ID,roles:string[],:END_ID,:TYPE
meg,"DeDe;Angelica Graynamore;Patricia Graynamore",tt0099892,ACTED_IN

The call to neo4j-import would look like this:

neo4j_home$ ./bin/neo4j-import --into path_to_target_directory --nodes movies6.csv --nodes actors6.csv --relationships roles6.csv
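
To illustrate what a typed header such as year:int or roles:string[] implies for the values, the Python sketch below converts raw CSV strings according to a type name. It is a simplified model that covers only three types and assumes the default array delimiter ; used in this example. CONVERTERS and parse_value are names invented for the illustration.

```python
# Only a few types are modelled in this sketch.
CONVERTERS = {"string": str, "int": int, "float": float}


def parse_value(raw, type_spec, array_delimiter=";"):
    """Convert a raw CSV value according to a header type such as int or string[]."""
    if type_spec.endswith("[]"):
        convert = CONVERTERS[type_spec[:-2]]
        return [convert(item) for item in raw.split(array_delimiter)]
    return CONVERTERS[type_spec](raw)


print(parse_value("1990", "int"))
print(parse_value("DeDe;Angelica Graynamore;Patricia Graynamore", "string[]"))
```

The second call yields a list of three character names, which corresponds to the array stored in the roles property of the relationship.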

B.3.7. ID handling

Each node processed by neo4j-import must provide a unique id. This id is used to find the correct nodes when creating relationships.

B.3.7.1. Working with sequential or auto incrementing identifiers

The import tool makes the assumption that identifiers are unique across node files. This may not be the case for data sets which use sequential, auto incremented or otherwise colliding identifiers. Those data sets can define id spaces where identifiers are unique within their respective id space.

For example if movies and people both use sequential identifiers then we would define Movie and Actor id spaces.

movies7.csv. 

movieId:ID(Movie-ID),title,year:int,:LABEL
1,"The Matrix",1999,Movie
2,"The Matrix Reloaded",2003,Movie;Sequel
3,"The Matrix Revolutions",2003,Movie;Sequel

actors7.csv. 

personId:ID(Actor-ID),name,:LABEL
1,"Keanu Reeves",Actor
2,"Laurence Fishburne",Actor
3,"Carrie-Anne Moss",Actor

We also need to reference the appropriate id space in our relationships file so it knows which nodes to connect together:

roles7.csv. 

:START_ID(Actor-ID),role,:END_ID(Movie-ID)
1,"Neo",1
1,"Neo",2
1,"Neo",3
2,"Morpheus",1
2,"Morpheus",2
2,"Morpheus",3
3,"Trinity",1
3,"Trinity",2
3,"Trinity",3

The call to neo4j-import would look like this:

neo4j_home$ ./bin/neo4j-import --into path_to_target_directory --nodes movies7.csv --nodes actors7.csv --relationships:ACTED_IN roles7.csv
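
One way to picture id spaces is as a lookup keyed by the pair (id space, id) instead of by the id alone. The Python sketch below models this with a dictionary. It is a conceptual illustration, not the data structure used by the import tool, and build_index is a hypothetical name.

```python
def build_index(nodes):
    """Index nodes by (id space, id); ids only need to be unique per space."""
    index = {}
    for id_space, node_id, name in nodes:
        key = (id_space, node_id)
        if key in index:
            raise ValueError("duplicate id %r in id space %r" % (node_id, id_space))
        index[key] = name
    return index


nodes = [
    ("Movie-ID", "1", "The Matrix"),
    ("Actor-ID", "1", "Keanu Reeves"),  # same id, different id space
]
index = build_index(nodes)

# First row of roles7.csv: 1,"Neo",1 with :START_ID(Actor-ID) and :END_ID(Movie-ID)
start = index[("Actor-ID", "1")]
end = index[("Movie-ID", "1")]
print(start, "ACTED_IN", end)  # prints: Keanu Reeves ACTED_IN The Matrix
```

Without the id space in the key, the two nodes with id 1 would collide and the relationship could not be resolved unambiguously.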

B.3.8. Bad input data

The import tool has a threshold for how many bad entities (nodes or relationships) it tolerates and skips before failing the import. By default, 1000 bad entities are tolerated. A bad tolerance of 0, for example, fails the import on the first bad entity. For more information, see the --bad-tolerance option.
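
The threshold behaviour can be modelled as a counter that raises an error once more bad entities have been seen than the tolerance allows. The Python sketch below is such a model, written for illustration under the assumption that exactly the configured number of bad entities is accepted; the class is not the tool's own implementation.

```python
class BadCollector:
    """Collect bad entities and fail once the tolerance is exceeded."""

    def __init__(self, tolerance=1000):
        self.tolerance = tolerance
        self.bad = []

    def collect(self, description):
        if len(self.bad) >= self.tolerance:
            raise RuntimeError(
                "too many bad entities, tolerance is %d" % self.tolerance)
        self.bad.append(description)


collector = BadCollector(tolerance=0)
try:
    collector.collect("relationship referring to missing node emil")
except RuntimeError as error:
    print("import failed:", error)
```

With the default tolerance of 1000, the same call would simply record the bad entity and let the import continue.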

There are different types of bad input, which we will look into.

B.3.8.1. Relationships referring to missing nodes

Relationships that refer to missing node ids, either for :START_ID or :END_ID, are considered bad relationships. Whether or not such relationships are skipped is controlled with the --skip-bad-relationships flag, which can be set to true or false, or given without a value, which means true. Specifying false means that any bad relationship is considered an error and will fail the import. For more information, see the --skip-bad-relationships option.

In the following example there is a missing emil node referenced in the roles file.

movies8a.csv. 

movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel

actors8a.csv. 

personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor

roles8a.csv. 

:START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
emil,"Emil",tt0133093,ACTED_IN

The call to neo4j-import would look like this:

neo4j_home$ ./bin/neo4j-import --into path_to_target_directory --nodes movies8a.csv --nodes actors8a.csv --relationships roles8a.csv

Since there is only one bad relationship, which is well below the default tolerance of 1000 bad entities, the import process completes successfully and a not-imported.bad file is created and populated with the bad relationship.

not-imported.bad. 

InputRelationship:
   source: roles8a.csv:11
   properties: [role, Emil]
   startNode: emil
   endNode: tt0133093
   type: ACTED_IN
 refering to missing node emil
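
If you prefer to find such relationships before running the import, the CSV files can be checked with a short script. The following Python sketch collects all node ids and reports relationship endpoints that match none of them. It assumes a single global id space, one name:ID field per node file, and no line breaks inside quoted fields; missing_endpoints is a name invented for this example.

```python
import csv
import io


def missing_endpoints(node_texts, relationships_text):
    """Return (line number, id) pairs for endpoints that match no node id."""
    known = set()
    for text in node_texts:
        reader = csv.reader(io.StringIO(text))
        header = next(reader)
        # Assumes exactly one header field of the form name:ID.
        id_column = next(i for i, f in enumerate(header) if f.endswith(":ID"))
        known.update(record[id_column] for record in reader)

    bad = []
    rows = csv.DictReader(io.StringIO(relationships_text))
    # The header is line 1, so the first data row is line 2.
    for line_no, row in enumerate(rows, start=2):
        for field in (":START_ID", ":END_ID"):
            if row[field] not in known:
                bad.append((line_no, row[field]))
    return bad


MOVIES = 'movieId:ID,title,year:int,:LABEL\ntt0133093,"The Matrix",1999,Movie\n'
ACTORS = 'personId:ID,name,:LABEL\nkeanu,"Keanu Reeves",Actor\n'
ROLES = (':START_ID,role,:END_ID,:TYPE\n'
         'keanu,"Neo",tt0133093,ACTED_IN\n'
         'emil,"Emil",tt0133093,ACTED_IN\n')
print(missing_endpoints([MOVIES, ACTORS], ROLES))  # prints: [(3, 'emil')]
```

The shortened sample data reports the emil row on line 3; with the full roles8a.csv the same row is on line 11, as in the not-imported.bad file above.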

B.3.8.2. Multiple nodes with same id within same id space

Nodes that specify an :ID which has already been used within the id space are considered bad nodes. Whether or not such nodes are skipped is controlled with the --skip-duplicate-nodes flag, which can be set to true or false, or given without a value, which means true. Specifying false means that any duplicate node is considered an error and will fail the import. For more information, see the --skip-duplicate-nodes option.

In the following example there is a node id that is specified twice within the same id space.

actors8b.csv. 

personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
laurence,"Laurence Harvey",Actor

The call to neo4j-import would look like this:

neo4j_home$ ./bin/neo4j-import --into path_to_target_directory --nodes actors8b.csv --skip-duplicate-nodes

Since there is only one bad node, the import process completes successfully and a not-imported.bad file is created and populated with the bad node.

not-imported.bad. 

Id 'laurence' is defined more than once in global id space, at least at actors8b.csv:3 and actors8b.csv:5
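
Duplicate ids can also be located before the import with a few lines of code. The Python sketch below scans a node file and reports each repeated id together with the line numbers of its first and repeated occurrence. It assumes that the id is in the first column and that no quoted field contains a line break; duplicate_ids is a hypothetical helper.

```python
import csv
import io


def duplicate_ids(text):
    """Return (id, first line, duplicate line) for every repeated node id."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header, which is line 1
    first_seen = {}
    duplicates = []
    for line_no, record in enumerate(reader, start=2):
        node_id = record[0]
        if node_id in first_seen:
            duplicates.append((node_id, first_seen[node_id], line_no))
        else:
            first_seen[node_id] = line_no
    return duplicates


ACTORS8B = '''personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
laurence,"Laurence Harvey",Actor
'''
print(duplicate_ids(ACTORS8B))  # prints: [('laurence', 3, 5)]
```

The reported line numbers 3 and 5 match the positions named in the not-imported.bad file above.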