Here’s how you do it:
You can already export CSV from the Neo4j Browser by clicking the download icon on the table view of your Cypher query results. Alternatively, you can use my neo4j-shell-tools to export the results of a Cypher query to a CSV file.

Something I had wanted to do for a while was to use the awesome jq tool for this purpose, a very powerful and fast command-line JSON processor. With just a few selectors you can transform JSON, extract and convert values, and much more.

In this example, we'll use the Pokec social network dataset (from Stanford SNAP), which contains about 1.6M people and 30M connections.
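Before we dive in, here is a tiny standalone illustration of jq's filter syntax; the JSON is made up and has nothing to do with the dataset:

echo '{"name":"Ann","langs":["en","sk"]}' | jq -r '.name, (.langs | @csv)'

which prints:

Ann
"en","sk"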
We will execute a Cypher statement against the transactional HTTP API endpoint, which is what most Neo4j drivers use as well.
This is the query showing who is connected to whom:
MATCH (p1:PROFILES)-[:RELATION]-(p2) RETURN p1._key as id1, p1.AGE as age1, p1.gender as gender1, p2._key as id2, p2.AGE as age2, p2.gender as gender2 LIMIT 4
The result looks like this:
id1 | age1 | gender1 | id2 | age2 | gender2
1   | 26   | 1       | 16  | 23   | 1
1   | 26   | 1       | 10  | 22   | 0
1   | 26   | 1       | 9   | 0    | 0
1   | 26   | 1       | 12  | 26   | 1
We POST that query using curl against the transactional Cypher endpoint:

curl -H accept:application/json -H content-type:application/json \
     -d '{"statements":[{"statement":"MATCH ... RETURN ..."}]}' \
     http://localhost:7474/db/data/transaction/commit
Here we pass only a single statement in the payload, although the endpoint accepts multiple statements per request, and we use no query parameters.
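For reference, a request body with two statements, one of them parameterized, could look like the following sketch (both statements and the id parameter are made up for illustration):

{"statements":[
  {"statement":"MATCH (p:PROFILES) WHERE p._key = {id} RETURN p.AGE, p.gender",
   "parameters":{"id":1}},
  {"statement":"MATCH (p:PROFILES) RETURN count(p)"}
]}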
{"results": {"columns":["id1","age1","gender1","id2","age2","gender2"], "data":[{"row":[1,26,1,16,23,1]}, {"row":[1,26,1,10,22,0]}, {"row":[1,26,1,9,0,0]}, {"row":[1,26,1,12,26,1]}] ], "errors":[]}
Now we have to take this JSON and transform it into CSV using jq. Follow these simple steps:

- Select the results array and take its first entry with .results[0], as we only sent a single query.
- Take the columns array out of it with .results[0].columns and convert it to CSV by piping it to the @csv transformer: .results[0].columns | @csv
- Do the same for the data field with .results[0].data; while we're there, grab each entry and extract the content of its row field, .results[0].data[].row, and convert that to CSV too: .results[0].data[].row | @csv
- Combine both by selecting the first results object once and piping it into both selectors, separated by a comma, (.results[0]) | .columns,.data[].row, and finally convert everything to CSV: jq '(.results[0]) | .columns,.data[].row | @csv'
- Finally, pass the -r parameter so that jq outputs raw text and the CSV values are not quoted and escaped a second time.
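To experiment with the filter without hitting the database each time, you can save the response to a file once and iterate on the jq expression locally (response.json is a hypothetical file name):

curl -H accept:application/json -H content-type:application/json \
     -d '{"statements":[{"statement":"MATCH ... RETURN ... LIMIT 4"}]}' \
     http://localhost:7474/db/data/transaction/commit > response.json

jq -r '(.results[0]) | .columns,.data[].row | @csv' response.json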
If we take the curl command from before and pipe its result into jq, it looks like this:

curl -H accept:application/json -H content-type:application/json \
     -d '{"statements":[{"statement":"MATCH (p1:PROFILES)-[:RELATION]-(p2) RETURN ... LIMIT 4"}]}' \
     http://localhost:7474/db/data/transaction/commit \
| jq -r '(.results[0]) | .columns,.data[].row | @csv'

"id1","age1","gender1","id2","age2","gender2"
1,26,1,16,23,1
1,26,1,10,22,0
1,26,1,9,0,0
1,26,1,12,26,1
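To keep the CSV instead of just printing it to the terminal, redirect the end of the pipeline into a file (connections.csv is a made-up name):

curl ... | jq -r '(.results[0]) | .columns,.data[].row | @csv' > connections.csv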
Of course we also want to see how quickly we can extract data from Neo4j. Without the LIMIT clause, our query returns 60 million rows in total (twice the 30M connections, as we ignore the relationship direction). I don't really want to store all that data, which is why I redirect the output to /dev/null:
curl -H accept:application/json -H content-type:application/json \
     -d '{"statements":[{"statement":"MATCH (p1:PROFILES)-[:RELATION]-(p2) RETURN ..."}]}' \
     http://localhost:7474/db/data/transaction/commit \
| jq -r '(.results[0]) | .columns,.data[].row | @csv' > /dev/null

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  970M    0  970M  100   184  6118k      1  0:03:04  0:02:42  0:00:22 6035k
This transfers 60 million rows totaling roughly 1 gigabyte at an average of 6.2 MB per second, taking about 3 minutes to finish.
Not bad.
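If the transfer itself is the bottleneck, one experiment worth trying is to ask the server for a compressed response; curl's --compressed flag sends the matching Accept-Encoding header and transparently unpacks the result. Whether the response actually gets gzipped depends on your Neo4j setup, so treat this as a sketch:

curl --compressed -H accept:application/json -H content-type:application/json \
     -d '{"statements":[{"statement":"MATCH (p1:PROFILES)-[:RELATION]-(p2) RETURN ..."}]}' \
     http://localhost:7474/db/data/transaction/commit \
| jq -r '(.results[0]) | .columns,.data[].row | @csv' > /dev/null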