10.2.3. Options

This section describes in detail the options available when using the Neo4j import tool to import data from CSV files.

--database=<name>
Name of database. Default: graph.db
--additional-config=<config-file-path>
Configuration file from which to read additional configuration. Default:
--mode=<database|csv>
Import a collection of CSV files or a pre-3.0 installation. Default: csv
--from=<source-directory>
The location of the pre-3.0 database (e.g. <neo4j-root>/data/graph.db). Default:
--report-file=<filename>
File in which to store the report of the CSV import. Default: import.report
--nodes[:Label1:Label2]=<"file1,file2,…">
Node CSV header and data. Multiple files are treated by the importer as one logical file, and the first line must contain the header. Several such data sources can be specified in one import, each with its own header. Note that file groups must be enclosed in quotation marks. Files can also be specified using regular expressions; for an example, see Using regular expressions for specifying input files. Default:
--relationships[:RELATIONSHIP_TYPE]=<"file1,file2,…">
Relationship CSV header and data. Multiple files are treated by the importer as one logical file, and the first line must contain the header. Several such data sources can be specified in one import, each with its own header. Note that file groups must be enclosed in quotation marks. Files can also be specified using regular expressions; for an example, see Using regular expressions for specifying input files. Default:
--id-type=<STRING|INTEGER|ACTUAL>
Each node must provide a unique id, which is used to find the correct nodes when creating relationships. Possible values are STRING (arbitrary strings for identifying nodes), INTEGER (arbitrary integer values for identifying nodes), and ACTUAL (advanced: actual node ids). Default: STRING
--input-encoding=<character-set>
Character set that input data is encoded in. Default: UTF-8
--ignore-extra-columns=<true/false>
Whether unspecified columns should be ignored during the import. Default: false
--ignore-duplicate-nodes=<true/false>
Whether duplicate nodes should be ignored during the import. Default: false
--ignore-missing-nodes=<true/false>
Whether relationships referring to missing nodes should be ignored during the import. Default: false
--multiline-fields=<true/false>
Whether fields from the input source can span multiple lines, i.e. contain newline characters. Default: false
--delimiter=<delimiter-character>
Delimiter character between values in CSV data. A character can also be given by its Unicode character code, prefixed with \. For example, \44 is equivalent to , (comma). Default: ,
--array-delimiter=<array-delimiter-character>
Delimiter character between array elements within a value in CSV data. A character can also be given by its Unicode character code, prefixed with \. For example, \59 is equivalent to ; (semicolon). Default: ;
--quote=<quotation-character>
Character to treat as quotation character for values in CSV data. Quotes can be escaped by doubling them, for example "" would be interpreted as a literal ". You cannot escape using \. Default: "
--max-memory=<max-memory-that-importer-can-use>
Maximum memory that neo4j-admin can use for various data structures and caching to improve performance. Values can be plain numbers, such as 10000000, or use a unit suffix, such as 20G for 20 gigabytes. The value can also be specified as a percentage of the available memory, e.g. 70%. Default: 90%
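
To make the --delimiter, --array-delimiter and --quote semantics described above more concrete, the following Python sketch parses one CSV line along those lines. It uses Python's standard csv module rather than the import tool itself, so it only approximates the importer's behaviour. The helper name decode_delimiter and the sample data are hypothetical, and the sketch assumes that the number after the backslash is a decimal character code, which is consistent with the documented examples.

```python
import csv
import io


def decode_delimiter(spec: str) -> str:
    # Turn an option value such as \44 into the character it denotes.
    # Assumption: the digits after the backslash are a decimal character
    # code, which matches the documented examples (\44 -> ',', \59 -> ';').
    if len(spec) > 1 and spec.startswith("\\"):
        return chr(int(spec[1:]))
    return spec


delimiter = decode_delimiter(r"\44")        # same as ","
array_delimiter = decode_delimiter(r"\59")  # same as ";"

# Hypothetical data line: an id, a name containing doubled quotes, an array.
line = '1,"Robert ""Bob"" Smith",red;green;blue\n'

# doublequote=True makes "" inside a quoted field stand for a literal ".
reader = csv.reader(io.StringIO(line), delimiter=delimiter,
                    quotechar='"', doublequote=True)
node_id, name, colours = next(reader)
colour_list = colours.split(array_delimiter)

print(name)         # Robert "Bob" Smith
print(colour_list)  # ['red', 'green', 'blue']
```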
Using regular expressions for specifying input files

To simplify the command line when there are many data source files, the file names can be specified using regular expressions. For each file name containing a regular expression, all matching files are included. The matching is aware of numbers inside the file names and sorts the files accordingly, without the need for padding with zeros.

Consider the file names:

  • Category1_Part_001.csv
  • Category1_Part_002.csv
  • Category2_Part_001.csv
  • Category12_Part_001.csv

Passing, for example, the regular expression Category.* to the import tool selects all of those files and keeps them in the order displayed above.
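
The number-aware ordering can be illustrated with a short Python sketch. This is not the import tool's implementation; it only shows one way to obtain the same ordering for the file names listed above. The function name natural_key and the extra non-matching file are hypothetical, and the sketch assumes that the regular expression has to match the whole file name.

```python
import re

# File names from the example above, deliberately shuffled.
file_names = [
    "Category12_Part_001.csv",
    "Category2_Part_001.csv",
    "Category1_Part_002.csv",
    "Category1_Part_001.csv",
    "Other_Part_001.csv",  # hypothetical file that should not match
]


def natural_key(name: str):
    # re.split with a capturing group alternates text and digit runs,
    # always starting with a (possibly empty) text run. Digit runs sit at
    # odd positions and are compared as integers, so 2 sorts before 12.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


pattern = r"Category.*"
selected = sorted(
    (n for n in file_names if re.fullmatch(pattern, n)),
    key=natural_key,
)

for name in selected:
    print(name)
# Category1_Part_001.csv
# Category1_Part_002.csv
# Category2_Part_001.csv
# Category12_Part_001.csv
```

A plain lexicographic sort would instead place Category12_Part_001.csv before Category2_Part_001.csv, which is why the numeric runs are compared as integers.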

Heap size for the import

Set the heap size to a value appropriate for the import by defining the HEAP_SIZE environment variable before starting the import. 2G is a good starting value.
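
If the import is launched from a script, the environment variable can be set for the child process only. The following Python sketch shows one way to do that. It assumes that the import tool is started as neo4j-admin import and that neo4j-admin is on the PATH; the function names import_environment and run_import are hypothetical.

```python
import os
import subprocess


def import_environment(heap_size: str = "2G") -> dict:
    # Copy the current environment and add HEAP_SIZE, leaving the
    # environment of the calling process untouched.
    env = dict(os.environ)
    env["HEAP_SIZE"] = heap_size
    return env


def run_import(args, heap_size: str = "2G"):
    # args is a list of option strings from this section.
    # Assumption: "neo4j-admin" is on the PATH.
    return subprocess.run(
        ["neo4j-admin", "import", *args],
        env=import_environment(heap_size),
        check=True,  # raise CalledProcessError if the import fails
    )
```

A call such as run_import(["--nodes=movies.csv"]), where movies.csv is a hypothetical input file, would then start the import with HEAP_SIZE set to 2G.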

Output

If you run large imports with "messy" data, the import log file can grow very large, which may cause problems. You control the location of the log file with the --report-file option. On UNIX-like systems, you can discard the output altogether by directing the report file to /dev/null.
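
As a small illustration, the following Python sketch builds the --report-file option either for a regular report file or for discarding the report. The function name report_file_option is hypothetical. os.devnull is /dev/null on UNIX-like systems, which is the only case the text above covers.

```python
import os


def report_file_option(discard: bool = False,
                       path: str = "import.report") -> str:
    # Point the report at the null device when it should be discarded,
    # otherwise at the given path (import.report is the documented default).
    target = os.devnull if discard else path
    return f"--report-file={target}"


print(report_file_option())              # --report-file=import.report
print(report_file_option(discard=True))  # --report-file=/dev/null on UNIX-like systems
```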

10.2.3.1. Debugging

If you need to debug the import, it may be useful to collect the stack trace. To do this, set the environment variable NEO4J_DEBUG=true and rerun the import.