Alireza Rezaei Mahdiraji
I am Alireza Rezaei Mahdiraji and I am a PhD student. My field of research is database systems.
I am experimenting with several databases to support large-scale scientific and simulation data. Some of the datasets have an inherent graph structure, which makes graph databases a good choice for modeling and querying such data.
I picked Neo4j for my modeling tasks because it is an important open source graph database that has drawn a lot of attention from the database research community and from companies.

Running a Cypher CREATE command for a large graph in the Neo4j Shell fails with the following error: “argument list too long”. So how do we execute such a large CREATE statement?

The solution is to use the Neo4j Shell transaction facility: break the original CREATE command down into several smaller CREATE commands, wrap each one in a transaction, and write the result to a file. An excerpt of the output looks as follows:

begin
CREATE
(m{n:'m', d:'3'}),
(f0{n:'f0', d:'2'}),
m-[:so]->f0
    …;
commit
exit
begin
CREATE
(v20825{n:'v20825', d:'0'}),
(e102800{n:'e102800', d:'1'}),
e102800-[:so]->v20624,
    …;
commit
exit
begin
CREATE
(e198203{n:'e198203', d:'1'}),
e198203-[:so]->v40000,
e198203-[:so]->v39800,
f39600-[:so]->e198203
    …
commit
exit

For a 64 MB file, I commit after every 500 node/relationship lines, and it works just fine. When I tried 1000 lines per transaction, I got the same error as above.
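The chunking step described above could be sketched in Python roughly as follows. This is only an illustration, not the script I actually used. It assumes the original CREATE statement is stored with one node or relationship pattern per line; the function names and the `CHUNK_SIZE` constant are made up for this example. It splits purely by line count and does not check whether the identifiers used by a relationship pattern are defined in the same chunk. It also writes a single `exit` at the very end of the file, which is a choice of this sketch.

```python
from typing import Iterable, Iterator, List

# Assumption: 500 patterns per transaction, the value reported to work above.
CHUNK_SIZE = 500


def chunk_patterns(lines: Iterable[str], size: int = CHUNK_SIZE) -> Iterator[List[str]]:
    """Group pattern lines into lists of at most `size` entries."""
    chunk: List[str] = []
    for line in lines:
        # Drop surrounding whitespace and any trailing separator.
        pattern = line.strip().rstrip(",;").strip()
        # Skip blank lines and the CREATE keyword of the original statement.
        if not pattern or pattern.upper() == "CREATE":
            continue
        chunk.append(pattern)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def script_lines(lines: Iterable[str], size: int = CHUNK_SIZE) -> Iterator[str]:
    """Yield neo4j-shell input: one begin/CREATE/commit block per chunk."""
    for chunk in chunk_patterns(lines, size):
        yield "begin"
        yield "CREATE"
        yield ",\n".join(chunk) + ";"
        yield "commit"
    yield "exit"  # assumption of this sketch: leave the shell once, at the end


def to_shell_script(lines: Iterable[str], size: int = CHUNK_SIZE) -> str:
    """Return the whole script as one string (handy for small inputs)."""
    return "\n".join(script_lines(lines, size)) + "\n"


def convert_file(src_path: str, dst_path: str, size: int = CHUNK_SIZE) -> None:
    """Stream a large CREATE file into a chunked shell script."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in script_lines(src, size):
            dst.write(line + "\n")
```

Calling `convert_file("big_create.cql", "create_statment.cql")` would then produce a file in the format shown in the excerpt; `big_create.cql` is a hypothetical input name.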

After creating the file of CREATE commands as in the example above, stop the Neo4j server and import the file using the following command:

rm -rf data/graph.db && cat path/to/create_statment.cql | ./path/to/neo4j-shell -path data/graph.db  

In the command above, create_statment.cql is the name of the file with the CREATE commands, and graph.db is the folder in which Neo4j stores the database files (usually located at /path/to/neo4j-community-XX/data/). The first command (rm -rf) just removes the existing graph database files.

Now start the Neo4j server and simply query Neo4j to see the data:
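If you prefer to drive the import from a script, the same two steps could be wrapped in Python as sketched below. The helper names are hypothetical, the paths are placeholders you need to adapt, and the only neo4j-shell option used is the `-path` option from the command above. Note that `run_import` deletes the database folder, exactly like `rm -rf` does.

```python
import shutil
import subprocess
from typing import List


def build_import_command(shell: str, db_dir: str) -> List[str]:
    """Build the neo4j-shell invocation used for the import."""
    return [shell, "-path", db_dir]


def run_import(script_path: str, shell: str, db_dir: str) -> None:
    """Remove the old database folder, then pipe the script into neo4j-shell."""
    # Same effect as `rm -rf data/graph.db`: the existing data is lost.
    shutil.rmtree(db_dir, ignore_errors=True)
    # Same effect as `cat script | neo4j-shell -path ...`.
    with open(script_path, "rb") as script:
        subprocess.run(build_import_command(shell, db_dir), stdin=script, check=True)
```

For example, `run_import("path/to/create_statment.cql", "./path/to/neo4j-shell", "data/graph.db")` corresponds to the shell command shown earlier.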

START r=node(*) RETURN count(r);
START r=rel(*) RETURN count(r);


To sum up: if you are running long Cypher CREATE commands, be sure to break the statements down into smaller chunks, each wrapped in a transaction, so that they can be executed from neo4j-shell.

/Alireza  
 



3 Comments

Is performance better doing inserts this way, with one large insert, or with a program that performs individual inserts and a commit at the end?


Alireza RM says:

Hi George,
Actually, this is not one large insert. The idea is to break it down into small pieces (say 500 lines of inserts) and commit each piece. I did not compare it against a program (for instance in Java) that inserts line by line. Usually the bulk insert should perform better. You may try it and report its performance.
