10.2.3. Options

This section describes in detail the options available when using the Neo4j import tool to import data from CSV files.

--database=<name>
Name of database. Default: graph.db
--additional-config=<config-file-path>
Configuration file that supplies additional configuration. Default:
--mode=<database|csv>
Import a collection of CSV files or a pre-3.0 installation. Default: csv
--from=<source-directory>
The location of the pre-3.0 database (e.g. <neo4j-root>/data/graph.db). Default:
--report-file=<filename>
File in which to store the report of the csv-import. Default: import.report
--nodes[:Label1:Label2]=<"headerfile,file1,file2,…​">
Node CSV header and data. Multiple files will be logically seen as one big file from the perspective of the importer. The first line must contain the header. Multiple data sources like these can be specified in one import, where each data source has its own header. Note that file groups must be enclosed in quotation marks. Files can also be specified using regular expressions. For an example, see Section B.4.4.1, “Using regular expressions for specifying multiple input files”. Default:
--relationships[:RELATIONSHIP_TYPE]=<"headerfile,file1,file2,…​">
Relationship CSV header and data. Multiple files will be logically seen as one big file from the perspective of the importer. The first line must contain the header. Multiple data sources like these can be specified in one import, where each data source has its own header. Note that file groups must be enclosed in quotation marks. Files can also be specified using regular expressions. For an example, see Section B.4.4.1, “Using regular expressions for specifying multiple input files”. Default:
--id-type=<STRING|INTEGER|ACTUAL>
Each node must provide a unique id. This is used to find the correct nodes when creating relationships. Possible values are: STRING: arbitrary strings for identifying nodes, INTEGER: arbitrary integer values for identifying nodes, ACTUAL: (advanced) actual node ids. Default: STRING
--input-encoding=<character-set>
Character set that input data is encoded in. Default: UTF-8
--ignore-extra-columns=<true/false>
If unspecified columns should be ignored during the import. Default: false
--ignore-duplicate-nodes=<true/false>
If duplicate nodes should be ignored during the import. Default: false
--ignore-missing-nodes=<true/false>
If relationships referring to missing nodes should be ignored during the import. Default: false
--multiline-fields=<true/false>
Whether or not fields from the input source can span multiple lines, i.e. contain newline characters. Setting --multiline-fields=true can severely degrade the performance of the importer. Therefore, use it with care, especially with large imports. Default: false
--delimiter=<delimiter-character>
Delimiter character between values in CSV data. Unicode character encoding can be used if prepended by \. For example, \44 is equivalent to ,. Default: ,
--array-delimiter=<array-delimiter-character>
Delimiter character between array elements within a value in CSV data. Unicode character encoding can be used if prepended by \. For example, \59 is equivalent to ;. Default: ;
--quote=<quotation-character>
Character to treat as quotation character for values in CSV data. Quotes can be escaped by doubling them, for example "" would be interpreted as a literal ". You cannot escape using \. Default: "
--max-memory=<max-memory-that-importer-can-use>
Maximum memory that neo4j-admin can use for various data structures and caching to improve performance. Values can be plain numbers such as 10000000 or e.g. 20G for 20 gigabyte. It can also be specified as a percentage of the available memory, e.g. 70%. Default: 90%
--f=<arguments-file>
File containing all arguments, used as an alternative to supplying all arguments on the command line directly. Each argument can be on a separate line, or multiple arguments per line and separated by space. Arguments containing spaces must be quoted. If this argument is used, no additional arguments are supported.
--high-io=<true/false>
Ignore environment-based heuristics, and specify whether the target storage subsystem can support parallel IO with high throughput. Typically this is true for SSDs, large RAID arrays and network-attached storage.
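Putting the main options together, the following is a minimal sketch of an import invocation. The file names (movies.csv, actors.csv, roles.csv) and their contents are illustrative only, and the final step is guarded so the sketch can run even on a machine without a Neo4j installation:

```shell
# Minimal node and relationship CSV files; the first line of each is the header.
cat > movies.csv <<'EOF'
movieId:ID,title,:LABEL
tt0133093,The Matrix,Movie
EOF

cat > actors.csv <<'EOF'
personId:ID,name,:LABEL
keanu,Keanu Reeves,Actor
EOF

cat > roles.csv <<'EOF'
:START_ID,role,:END_ID,:TYPE
keanu,Neo,tt0133093,ACTED_IN
EOF

# Run the import if the tool is on the PATH (the guard keeps this
# sketch runnable on machines without a Neo4j installation).
if command -v neo4j-admin >/dev/null 2>&1; then
    neo4j-admin import --mode=csv --database=graph.db \
        --nodes=movies.csv --nodes=actors.csv \
        --relationships=roles.csv \
        --id-type=STRING --report-file=import.report
fi
```

Each --nodes and --relationships group carries its own header, as described above; here each file contains both the header line and the data.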
Heap size for the import

Set the maximum heap size to a value appropriate for the import. This is done by defining the HEAP_SIZE environment variable before starting the import. For example, 2G is an appropriate value for smaller imports. If doing imports on the order of 100 billion entities, 20G will be an appropriate value.
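For example, the heap size can be set in the shell that launches the import (the 2G figure is the small-import value mentioned above):

```shell
# Give the importer a 2 GB maximum heap; use e.g. 20G instead for
# imports on the order of 100 billion entities.
export HEAP_SIZE=2G
```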

Record format

If your import data will result in a graph that is larger than 34 billion nodes, 34 billion relationships, or 68 billion properties you will need to configure the importer to use the high limit record format. This is achieved by setting the parameter dbms.record_format=high_limit in a configuration file, and supplying that file to the importer with --additional-config. The high_limit format is available for Enterprise Edition only.
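As a sketch, the additional configuration file can be a one-line properties file (the file name high-limit.conf is arbitrary):

```shell
# Write the record format setting to an additional configuration file
# (high limit format requires Enterprise Edition).
echo 'dbms.record_format=high_limit' > high-limit.conf

# Hand the file to the importer, e.g.:
#   neo4j-admin import --mode=csv --additional-config=high-limit.conf --nodes=nodes.csv
```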

10.2.3.1. Output

The location of the import log file can be controlled using the --report-file option. If you run large imports of CSV files that have low data quality, the import log file can grow very large. For example, CSV files that contain duplicate node IDs, or that attempt to create relationships between non-existent nodes, could be classed as having low data quality. In these cases, you may wish to direct the output to a location that can handle the large log file. If you are running on a UNIX-like system and you are not interested in the output, you can get rid of it altogether by directing the report file to /dev/null.

If you need to debug the import, it might be useful to collect the stack trace. This is done by setting the environment variable NEO4J_DEBUG=true and rerunning the import.
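Combining the two points above, a debugging rerun might look like the following sketch. The nodes.csv file is a placeholder, and the invocation is guarded so the snippet can run on machines without a Neo4j installation:

```shell
# A placeholder node file for illustration.
printf 'id:ID,name\nn1,example\n' > nodes.csv

# Discard the report and collect stack traces while debugging.
if command -v neo4j-admin >/dev/null 2>&1; then
    NEO4J_DEBUG=true neo4j-admin import --mode=csv \
        --nodes=nodes.csv --report-file=/dev/null
fi
```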