Load JSON
Web APIs allow you to access and integrate data from different sources into your graph. Most of them provide the data in JSON format.
The Load JSON procedures retrieve data from URLs or maps and turn it into MAP
values for Cypher to consume.
Cypher has support for deconstructing nested documents with dot syntax, slices, UNWIND
etc. so it is easy to turn nested data into graphs.
Sources with multiple JSON objects (JSONL,JSON Lines) in a stream, like the streaming Twitter format or the Yelp Kaggle dataset, are also supported,
Procedure Overview
The table below describes the available procedures:
Qualified Name | Type |
---|---|
apoc.load.json |
|
apoc.load.jsonParams |
|
apoc.load.jsonArray |
|
apoc.import.json |
|
apoc.load.json
This procedure takes a file or HTTP URL and parses the JSON into a map data structure.
signature |
---|
|
It supports the following config parameters:
name | type | default | description |
---|---|---|---|
failOnError |
BOOLEAN |
true |
Fail if an error is encountered while parsing JSON |
binary |
|
|
If not null, allow to take binary data instead of a file name/url as first parameter. Similar to Binary file example |
charset |
java.nio.charset.Charset |
|
The optional charset, with |
apoc.load.jsonParams
The procedure’s 'Params' name refers to the HTTP request’s headers and payload being parameterized when connecting to the given URL, not the Neo4j $ parameters.
|
This procedure takes a file or HTTP URL and parses the JSON into a map data structure. It is a more configurable version of apoc.load.json that enables processing of endpoints that require HTTP headers or JSON payloads.
signature |
---|
|
It supports the following config parameter:
name | type | default | description |
---|---|---|---|
failOnError |
BOOLEAN |
true |
Fail if an error is encountered while parsing JSON |
apoc.load.jsonArray
This procedure takes a file or HTTP URL containing a JSON array, and parses it into a stream of maps.
signature |
---|
|
apoc.import.json
This procedure can be used to import JSON files created by the Export JSON procedures, exported using the config parameter jsonFormat: 'JSON_LINES'
(default config).
signature |
---|
|
It supports the following config parameters:
name | type | default | description |
---|---|---|---|
unwindBatchSize |
INTEGER |
|
the batch size of the unwind |
txBatchSize |
INTEGER |
|
the batch size of the transacttion |
importIdName |
STRING |
|
the name of the property to be populated with the "id" field present into the json. For example a row |
nodePropertyMappings |
MAP |
|
The mapping label/property name/property type for Custom Neo4j types (point date). i.e. |
relPropertyMappings |
MAP |
|
The mapping rel type/property name/property type for Custom Neo4j types (point date). i.e. |
nodePropertyMappings
and relPropertyMappings
support the following Neo4j types:
Point
, Localdate
, Localtime
, Localdatetime
, Duration
, offsettime
, and Zoneddatetime
.
Importing from a file
By default importing from the file system is disabled.
We can enable it by setting the following property in apoc.conf
:
apoc.import.file.enabled=true
If we try to use any of the import procedures without having first set this property, we’ll get the following error message:
Failed to invoke procedure: Caused by: java.lang.RuntimeException: Import from files not enabled, please set apoc.import.file.enabled=true in your apoc.conf |
Import files are read from the import
directory, which is defined by the server.directories.import
property.
This means that any file path that we provide is relative to this directory.
If we try to read from an absolute path, such as /tmp/filename
, we’ll get an error message similar to the following one:
Failed to invoke procedure: Caused by: java.lang.RuntimeException: Can’t read url or key file:/path/to/neo4j/import/tmp/filename as json: /path/to/neo4j//import/tmp/filename (No such file or directory) |
We can enable reading files from anywhere on the file system by setting the following property in apoc.conf
:
apoc.import.file.use_neo4j_config=false
Neo4j will now be able to read from anywhere on the file system, so be sure that this is your intention before setting this property. |
JSON-Path
Using JSON paths gives you a condensed way to read and process sub-documents and sub-values from nested JSON structures. This is especially helpful if you need to skip over unwinding higher-level parent objects in order to access more nested data, or if you need to manipulate values in those substructures.
Rather than passing in a large JSON file and using Cypher to unwind each object and access what you need, you can pass in the file and provide the JSON path to the substructures you need, resulting in shorter statements for nested JSON. The JSON path format follows the Java implementation by Jayway of Stefan Gössner’s JSONPath, providing a consistent syntax for the paths.
Many of the apoc.convert.*Json*
procedures and functions, as well as the apoc.load.json
procedures, also accept a JSON path as their last argument.
Note that these functions are meant to stream arrays (of values or objects) and maps, not a single value.
If a single item containing a single value is specified as the path, the function must try to wrap it and will not return expected results.
There is also the apoc.json.path(json,path)
function that takes a JSON string (not map or list) and retrieves values from the json path provided as the second argument.
Note: if the JSON is not already in string format, you can use the apoc.convert.toJson
function to convert it.
More examples can be found at the links provided above, but let us look at an example of the syntax for JSON paths.
The syntax shown below pulls the items
array from the StackOverflow API of Neo4j questions and retrieves the array of tags
from the first object in the item list.
$.items[0].tags
All of the operators and options for specifying JSON paths are included in the next table.
Operator | Description | Example |
---|---|---|
|
The root element to query. This starts all path expressions. |
|
|
The current node being processed by a filter predicate. |
|
|
Wildcard. Available anywhere a name or numeric are required. |
|
|
Deep scan. Available anywhere a name is required. |
|
|
Dot-notated child |
|
|
Array index or indexes |
|
|
Array slice operator |
|
|
Filter expression. Expression must evaluate to a boolean value. |
|
Moreover, we can customize the Json path options, adding the config {pathOptions: LIST OF STRINGS
},
where the strings are based on Enum<Option>.
The default value is ["SUPPRESS_EXCEPTIONS", "DEFAULT_PATH_LEAF_TO_NULL"]
. Note that we can also insert []
, that is "without options".
So with the following json:
{ "columns": {
"col2": {
"_id": "772col2"
}
}
}
we can execute (with default pathOptions
):
CALL apoc.load.json($url, '$..columns');
value |
---|
[ {"col2": { "_id": "772col2" }}, null, null ] |
or, with custom path options:
CALL apoc.load.json($url, '$..columns', ['ALWAYS_RETURN_LIST']);
Output |
---|
[ {"col2": { "_id": "772col2" }} ] |
Examples
The following section contains examples showing how to import data from various JSON sources.
Import from local file
person.json
contains a JSON document representing a person and their children.
{
"name":"Michael",
"age": 41,
"children": ["Selina","Rana","Selma"]
}
We’ll place this file into the import
directory of our Neo4j instance.
Let’s now write a query using the apoc.load.json
procedure to explore this file.
person.json
and returns the content as Cypher data structuresCALL apoc.load.json("file:///person.json")
YIELD value
RETURN value;
value |
---|
{name: "Michael", children: ["Selina", "Rana", "Selma"], age: 41} |
We get back a map that looks almost the same as the JSON document.
We can now extend that query to create a graph based on this JSON file.
We’ll create a Person
node for Michael and each of his children, and a CHILD_OF
relationship from each child to the Michael node.
person.json
CALL apoc.load.json("file:///person.json")
YIELD value
MERGE (p:Person {name: value.name})
SET p.age = value.age
WITH p, value
UNWIND value.children AS child
MERGE (c:Person {name: child})
MERGE (c)-[:CHILD_OF]->(p);
The Neo4j Browser visualization below shows the imported graph:
You can use failOnError
configuration to handle the result in case of incorrect url or json.
For example, with the help of the apoc.when
procedure, you can return nothingToDo
as result with incorrect url:
CALL apoc.load.json("MY_JSON_URL", null, {failOnError:false})
YIELD value
WITH collect(value) as values
call apoc.do.when(values = [], "return 'nothingToDo' as result", "return values as result", {values: values})
YIELD value
UNWIND value["result"] as result
RETURN result
Import from StackOverflow API
apoc.load.json
enables loading JSON data from any file or URL.
If the result is a JSON object, it is returned as a singular map.
If the result is an array, it is turned into a stream of maps.
StackOverflow provides several APIs, including one for retrieving recent questions and answers. The URL for retrieving the last questions and answers for the neo4j tag is:
https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf
Since this is a rather long URL string, we can simplify the syntax by configuring aliases in apoc.conf
The third value in the Original call with full json url string
New call with aliased string with full string in apoc.conf
|
Let’s introspect the data that is returned from this end point.
neo4j
tag on StackOverflowWITH "https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf" AS url
CALL apoc.load.json(url) YIELD value
UNWIND value.items AS item
RETURN item.title, item.owner, item.creation_date, keys(item)
LIMIT 5;
item.title | item.owner | item.creation_date | keys(item) |
---|---|---|---|
"Cypher patten for getting self related nodes" |
{profile_image: "https://lh3.googleusercontent.com/-1FWbhuaEBiQ/AAAAAAAAAAI/AAAAAAAAAIA/tLM_mEb-8MY/photo.jpg?sz=128", user_type: "registered", user_id: 5730203, link: "https://stackoverflow.com/users/5730203/asif-ali", reputation: 1148, display_name: "Asif Ali", accept_rate: 90} |
1586944991 |
["owner", "comment_count", "link", "last_activity_date", "creation_date", "answer_count", "title", "question_id", "tags", "share_link", "score", "down_vote_count", "body_markdown", "favorite_count", "is_answered", "delete_vote_count", "close_vote_count", "view_count", "up_vote_count"] |
"Problem connecting .NET Client to Neo4j Desktop version 4" |
{profile_image: "https://www.gravatar.com/avatar/a3fac35d600d1d462d8fc12f3926074c?s=128&d=identicon&r=PG&f=1", user_type: "registered", user_id: 2853912, link: "https://stackoverflow.com/users/2853912/user2853912", reputation: 21, display_name: "user2853912"} |
1586938954 |
["owner", "comment_count", "link", "last_activity_date", "creation_date", "answer_count", "title", "question_id", "tags", "share_link", "score", "down_vote_count", "body_markdown", "favorite_count", "is_answered", "delete_vote_count", "close_vote_count", "view_count", "up_vote_count"] |
"What kind of graph algorithm does Neo4j use?" |
{profile_image: "https://www.gravatar.com/avatar/736024b862a229111d4b3119875753b0?s=128&d=identicon&r=PG&f=1", user_type: "registered", user_id: 4402081, link: "https://stackoverflow.com/users/4402081/mariappan", reputation: 7, display_name: "Mariappan"} |
1586901300 |
["owner", "comment_count", "answers", "link", "last_activity_date", "creation_date", "answer_count", "title", "question_id", "tags", "share_link", "score", "down_vote_count", "body_markdown", "favorite_count", "is_answered", "delete_vote_count", "close_vote_count", "view_count", "up_vote_count"] |
"Import json file to Neo4j" |
{profile_image: "https://lh3.googleusercontent.com/-PWDC85Kp2ig/AAAAAAAAAAI/AAAAAAAAAAA/AB6qoq3nhmVZl-_0VDKESOG5MsyHvXnw_A/mo/photo.jpg?sz=128", user_type: "registered", user_id: 9964138, link: "https://stackoverflow.com/users/9964138/jo%c3%a3o-costa", reputation: 23, display_name: "João Costa"} |
1586897574 |
["owner", "comment_count", "answers", "link", "last_activity_date", "creation_date", "answer_count", "title", "question_id", "tags", "share_link", "score", "down_vote_count", "body_markdown", "favorite_count", "is_answered", "delete_vote_count", "close_vote_count", "view_count", "up_vote_count"] |
"Difference between Neo4j Graph Algorithms and Graph Data Science" |
{profile_image: "https://i.stack.imgur.com/2rLPZ.jpg?s=128&g=1", user_type: "registered", user_id: 3297954, link: "https://stackoverflow.com/users/3297954/rotten", reputation: 1295, display_name: "rotten", accept_rate: 75} |
1586872077 |
["owner", "comment_count", "answers", "link", "last_activity_date", "creation_date", "answer_count", "title", "question_id", "tags", "share_link", "score", "down_vote_count", "body_markdown", "favorite_count", "is_answered", "delete_vote_count", "close_vote_count", "view_count", "up_vote_count"] |
Let’s now create a Neo4j graph based on those entities.
WITH "https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf" AS url
CALL apoc.load.json(url) YIELD value
UNWIND value.items AS q
MERGE (question:Question {id:q.question_id})
ON CREATE SET question.title = q.title,
question.share_link = q.share_link,
question.favorite_count = q.favorite_count
FOREACH (tagName IN q.tags | MERGE (tag:Tag {name:tagName}) MERGE (question)-[:TAGGED]->(tag))
FOREACH (a IN q.answers |
MERGE (question)<-[:ANSWERS]-(answer:Answer {id:a.answer_id})
MERGE (answerer:User {id:a.owner.user_id}) ON CREATE SET answerer.display_name = a.owner.display_name
MERGE (answer)<-[:PROVIDED]-(answerer)
)
WITH * WHERE NOT q.owner.user_id IS NULL
MERGE (owner:User {id:q.owner.user_id}) ON CREATE SET owner.display_name = q.owner.display_name
MERGE (owner)-[:ASKED]->(question)
The Neo4j Browser visualization below shows the imported graph:
Use JSON Path and Import from StackOverflow API
We can narrow down the data that we sift through and import using the JSON path syntax. This will allow us to specify substructures to import and ignore the rest of the data. For this example, we only want to import answers and the members posting those answers.
WITH "https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf" AS url
CALL apoc.load.json(url,'$.items[?(@.answer_count>0)].answers[*]') YIELD value
RETURN value LIMIT 5;
Notice that we are only looking at StackOverflow questions that have an answer count greater than 0. That means we are only passing along the question JSON objects that have answers, as the rest do not pertain to our use case. With this in mind, let us import those with this statement:
WITH "https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf" AS url
CALL apoc.load.json(url,'$.items[?(@.answer_count>0)].answers[*]') YIELD value
MERGE (a:Answer {id: value.answer_id})
ON CREATE SET a.accepted = value.is_accepted,
a.shareLink = value.share_link,
a.lastActivityDate = value.last_activity_date,
a.creationDate = value.creation_date,
a.title = value.title,
a.score = value.score
MERGE (q:Question {id: value.question_id})
MERGE (a)-[rel:POSTED_TO]->(q)
WITH a as answer, value.owner as value
MERGE (u:User {userId: value.user_id})
ON CREATE SET u.displayName = value.display_name,
u.userType = value.user_type,
u.reputation = value.reputation,
u.userLink = value.link
MERGE (u)-[rel2:SUBMITTED]->(answer)
RETURN count(answer)
This imports around 78 answers to our graph. We can then explore this graph to find out which users submitted the most answers, have the highest ratings, and more.
Import JSON file created by Export JSON procedures
The apoc.import.json
procedure can be used to import JSON files created by the apoc.export.json.*
procedures, exported using the config parameter jsonFormat: 'JSON_LINES'
(default config).
This procedure supports the following config parameters:
name | default | description |
---|---|---|
unwindBatchSize |
|
the batch size of the unwind |
txBatchSize |
|
the batch size of the transacttion |
importIdName |
|
the name of the property to be populated with the "id" field present into the json. For example a row |
nodePropertyMappings |
|
The mapping label/property name/property type for Custom Neo4j types (point date). I.e. |
relPropertyMappings |
|
The mapping rel type/property name/property type for Custom Neo4j types (point date). I.e. |
nodePropertyMappings
and relPropertyMappings
support the following Neo4j types:
-
Point
-
Localdate
-
Localtime
-
Localdatetime
-
Duration
-
offsettime
-
Zoneddatetime
all.json
contains a subset of Neo4j’s movies graph, and was generated by the Export JSON procedure.
{"type":"node","id":"0","labels":["User"],"properties":{"born":"2015-07-04T19:32:24","name":"Adam","place":{"crs":"wgs-84","latitude":13.1,"longitude":33.46789,"height":null},"age":42,"male":true,"kids":["Sam","Anna","Grace"]}}
{"type":"node","id":"1","labels":["User"],"properties":{"name":"Jim","age":42}}
{"type":"node","id":"2","labels":["User"],"properties":{"age":12}}
{"id":"0","type":"relationship","label":"KNOWS","properties":{"bffSince":"P5M1DT12H","since":1993},"start":{"id":"0","labels":["User"],"properties":{"born":"2015-07-04T19:32:24","name":"Adam","place":{"crs":"wgs-84","latitude":13.1,"longitude":33.46789,"height":null},"age":42,"male":true,"kids":["Sam","Anna","Grace"]}},"end":{"id":"1","labels":["User"],"properties":{"name":"Jim","age":42}}}
We can import this file using apoc.import.json
.
CALL apoc.import.json("file:///all.json")
file | source | format | nodes | relationships | properties | time | rows | batchSize | batches | done | data |
---|---|---|---|---|---|---|---|---|---|---|---|
"file:///all.json" |
"file" |
"json" |
3 |
1 |
15 |
105 |
4 |
-1 |
0 |
TRUE |
NULL |
Make a GET request to the Wikipedia Action API
The following makes a GET request to the Wikipedia Action API:
apoc.load.jsonParams
allows explicit specification of HTTP request parameters. To perform a GET request to a JSON endpoint, the method
parameter in the configuration can be set to GET
.
CALL apoc.load.jsonParams(
"https://en.wikipedia.org/w/api.php?action=query&titles=Neo4j&format=json&formatversion=2",
{method: "GET"},
"",
".query.pages[0]"
) YIELD value
RETURN value
value |
---|
{ "title": "Neo4j", "ns": 0, "pageid": 25505874 } |
Note that the URL needs to be properly encoded. This can be achieved using apoc.text.urlencode.
WITH apoc.text.urlencode("Auvergne-Rhône-Alpes") AS title
CALL apoc.load.jsonParams(
"https://en.wikipedia.org/w/api.php?action=query&titles="+title+"&format=json&formatversion=2",
{method: "GET"}, "", ".query.pages[0]"
) YIELD value
RETURN value.title AS title, value.pageid AS pageid
title | pageid |
---|---|
"Auvergne-Rhône-Alpes" |
45093325 |