Load JSON

Web APIs are a huge opportunity to access and integrate data from any sources with your graph. Most of them provide the data in JSON format.

The Load JSON procedures retrieve data from URLs or maps and turn it into map value(s) for Cypher to consume. Cypher has support for deconstructing nested documents with dot syntax, slices, UNWIND etc. so it is easy to turn nested data into graphs.

Sources with multiple JSON objects (JSONL,JSON Lines) in a stream, like the streaming Twitter format or the Yelp Kaggle dataset, are also supported,

Procedure Overview

The table below describes the available procedures:

Qualified Name Type Release

apoc.load.json

- import JSON as stream of values if the JSON was an array or a single value if it was a map

Procedure

APOC Core

apoc.load.jsonParams

- load from JSON URL (e.g. web-api) while sending headers / payload to import JSON as stream of values if the JSON was an array or a single value if it was a map

Procedure

APOC Core

apoc.load.jsonArray

- load array from JSON URL (e.g. web-api) to import JSON as stream of values

Procedure

APOC Core

apoc.import.json

- imports the json list to the provided file

Procedure

APOC Core

apoc.load.json

This procedure takes a file or HTTP URL and parses the JSON into a map data structure.

signature

apoc.load.json(urlOrKeyOrBinary :: ANY?, path = :: STRING?, config = {} :: MAP?) :: (value :: MAP?)

It supports the following config parameter:

Table 1. Config
name type default description

failOnError

boolean

true

fail if error encountered while parsing JSON

binary

Enum[NONE, BYTES, GZIP, BZIP2, DEFLATE, BLOCK_LZ4, FRAMED_SNAPPY]

null

If not null, allow to take binary data instead of a file name/url as first parameter. Similar to Binary file example

compression

Enum[NONE, BYTES, GZIP, BZIP2, DEFLATE, BLOCK_LZ4, FRAMED_SNAPPY]

null

Allow taking binary data, either not compressed (value: NONE) or compressed (other values). Similar to Binary file example

apoc.load.jsonParams

This procedure takes a file or HTTP URL and parses the JSON into a map data structure. It is a more configurable version of apoc.load.json that enables processing of endpoints that require HTTP headers or JSON payloads.

signature

apoc.load.jsonParams(urlOrKeyOrBinary :: ANY?, headers :: MAP?, payload :: STRING?, path = :: STRING?, config = {} :: MAP?) :: (value :: MAP?)

It supports the following config parameter:

Table 2. Config
name type default description

failOnError

boolean

true

fail if error encountered while parsing JSON

apoc.load.jsonArray

This procedure takes a file or HTTP URL containing a JSON array, and parses it into a stream of maps.

signature

apoc.load.jsonArray(url :: STRING?, path = :: STRING?, config = {} :: MAP?) :: (value :: ANY?)

apoc.import.json

This procedure can be used to import JSON files created by the Export JSON procedures, exported using the config parameter jsonFormat: 'JSON_LINES' (default config).

signature

apoc.import.json(urlOrBinaryFile :: ANY?, config = {} :: MAP?) :: (file :: STRING?, source :: STRING?, format :: STRING?, nodes :: INTEGER?, relationships :: INTEGER?, properties :: INTEGER?, time :: INTEGER?, rows :: INTEGER?, batchSize :: INTEGER?, batches :: INTEGER?, done :: BOOLEAN?, data :: STRING?)

It supports the following config parameters:

Table 3. Config parameters
name type default description

unwindBatchSize

Long

5000

the batch size of the unwind

txBatchSize

Long

5000

the batch size of the transacttion

importIdName

String

neo4jImportId

the name of the property to be populated with the "id" field present into the json. For example a row {"type":"node", "labels":["Language"], "id":"10"}, with importIdName:`foo`, will create a node (:User {foo: "10"})

nodePropertyMappings

Map

{}

The mapping label/property name/property type for Custom Neo4j types (point date).

i.e. { User: { born: 'Point', dateOfBirth: 'Datetime' } }

relPropertyMappings

Map

{}

The mapping rel type/property name/property type for Custom Neo4j types (point date).

i.e. { KNOWS: { since: 'Datetime' } }

nodePropertyMappings and relPropertyMappings support the following Neo4j types:

Point, Localdate, Localtime, Localdatetime, Duration, offsettime, and Zoneddatetime.

Importing from a file

By default importing from the file system is disabled. We can enable it by setting the following property in apoc.conf:

apoc.conf
apoc.import.file.enabled=true

If we try to use any of the import procedures without having first set this property, we’ll get the following error message:

Failed to invoke procedure: Caused by: java.lang.RuntimeException: Import from files not enabled, please set apoc.import.file.enabled=true in your apoc.conf

Import files are read from the import directory, which is defined by the dbms.directories.import property. This means that any file path that we provide is relative to this directory. If we try to read from an absolute path, such as /tmp/filename, we’ll get an error message similar to the following one:

Failed to invoke procedure: Caused by: java.lang.RuntimeException: Can’t read url or key file:/path/to/neo4j/import/tmp/filename as json: /path/to/neo4j//import/tmp/filename (No such file or directory)

We can enable reading files from anywhere on the file system by setting the following property in apoc.conf:

apoc.conf
apoc.import.file.use_neo4j_config=false

Neo4j will now be able to read from anywhere on the file system, so be sure that this is your intention before setting this property.

JSON-Path

Using JSON paths gives you a condensed way to read and process sub-documents and sub-values from nested JSON structures. This is especially helpful if you need to skip over unwinding higher-level parent objects in order to access more nested data, or if you need to manipulate values in those substructures.

Rather than passing in a large JSON file and using Cypher to unwind each object and access what you need, you can pass in the file and provide the JSON path to the substructures you need, resulting in shorter statements for nested JSON. The JSON path format follows the Java implementation by Jayway of Stefan Gössner’s JSONPath, providing a consistent syntax for the paths.

Many of the apoc.convert.Json procedures and functions, as well as the apoc.load.json procedure, now accept a json path as last argument. Note that these functions are meant to stream arrays (of values or objects) and maps, not a single value. If a single item containing a single value is specified as the path, the function must try to wrap it and will not return expected results.

There is also the apoc.json.path(json,path) function that takes a JSON string (not map or list) and retrieves values from the json path provided as the second argument. Note: if the JSON is not already in string format, you can use the apoc.convert.toJson function to convert it.

More examples can be found at the links provided above, but let us look at an example of the syntax for JSON paths. The syntax shown below pulls the items array from the StackOverflow API of Neo4j questions and retrieves the array of tags from the first object in the item list.

$.items[0].tags

All of the operators and options for specifying JSON paths are included in the next table.

Table 4. Operators
Operator Description Example

$

The root element to query. This starts all path expressions.

$ - retrieve all data in parent object

@

The current node being processed by a filter predicate.

$.items[?(@.answer_count > 0)] - retrieve the item if it has an answer_count greater than 0

*

Wildcard. Available anywhere a name or numeric are required.

$.items[\*] - retrieve all items in array

..

Deep scan. Available anywhere a name is required.

$..tags[\*] - find substructure named tags and pull all the values

.<name>

Dot-notated child

$.items[0:1].owner.user_id - retrieve user_id for the first item (in the owner object)

[<number> (,<number>)]

Array index or indexes

$.items[0,-1] - retrieve first and last item in array

[start:end]

Array slice operator

$.items[0:5] - retrieve the first through fifth items in the array

[?(<expression>)]

Filter expression. Expression must evaluate to a boolean value.

$.items[?(@.is_answered == true)] - retrieve items where the is_answered field is true

Moreover, we can customize the Json path options, adding the config {pathOptions: LIST OF STRINGS}, where the strings are based on Enum<Option>. The default value is ["SUPPRESS_EXCEPTIONS", "DEFAULT_PATH_LEAF_TO_NULL"]. Note that we can also insert [], that is "without options". So with the following json:

{ "columns": {
      "col2": {
        "_id": "772col2"
      }
    }
}

we can execute (with default pathOptions):

CALL apoc.load.json($url, '$..columns');
Table 5. Results
value

[ {"col2": { "_id": "772col2" }}, null, null ]

or, with custom path options:

CALL apoc.load.json($url, '$..columns', ['ALWAYS_RETURN_LIST']);
Table 6. Results
Output

[ {"col2": { "_id": "772col2" }} ]

Examples

The following section contains examples showing how to import data from various JSON sources.

Import from local file

person.json contains a JSON document representing a person and their children.

person.json
{
 "name":"Michael",
 "age": 41,
 "children": ["Selina","Rana","Selma"]
}

We’ll place this file into the import directory of our Neo4j instance. Let’s now write a query using the apoc.load.json procedure to explore this file.

The following query processes person.json and returns the content as Cypher data structures
CALL apoc.load.json("file:///person.json")
YIELD value
RETURN value;
Table 7. Results
value

{name: "Michael", children: ["Selina", "Rana", "Selma"], age: 41}

We get back a map that looks almost the same as the JSON document. We can now extend that query to create a graph based on this JSON file. We’ll create a Person node for Michael and each of his children, and a CHILD_OF relationship from each child to the Michael node.

The following creates a graph based on person.json
CALL apoc.load.json("file:///person.json")
YIELD value
MERGE (p:Person {name: value.name})
SET p.age = value.age
WITH p, value
UNWIND value.children AS child
MERGE (c:Person {name: child})
MERGE (c)-[:CHILD_OF]->(p);

The Neo4j Browser visualization below shows the imported graph:

apoc.load.json.local.file

You can use failOnError configuration to handle the result in case of incorrect url or json. For example, with the help of the apoc.when procedure, you can return nothingToDo as result with incorrect url:

CALL apoc.load.json("MY_JSON_URL", null, {failOnError:false})
YIELD value
WITH collect(value) as values
call apoc.do.when(values = [], "return 'nothingToDo' as result", "return values as result", {values: values})
YIELD value
UNWIND value["result"] as result
RETURN result

Import from StackOverflow API

apoc.load.json enables loading JSON data from any file or URL. If the result is a JSON object, it is returned as a singular map. If the result is an array, it is turned into a stream of maps.

StackOverflow provides several APIs, including one for retrieving recent questions and answers. The URL for retrieving the last questions and answers for the neo4j tag is:

https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf

Since this is a rather long URL string, we can simplify the syntax by configuring aliases in conf/apoc.conf:

apoc.conf
apoc.json.myJson.url=https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf

The third value in the apoc.json.<alias>.url= effectively defines the variable to be used in apoc.load.json('<alias>',…​.. With this, the massive JSON url string below can be aliased to a shorter string.

Original call with full json url string
CALL apoc.load.json('https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf')
New call with aliased string with full string in apoc.conf
CALL apoc.load.json('myJson')

Let’s introspect the data that is returned from this end point.

The following finds the 5 most recent questions with the neo4j tag on StackOverflow
WITH "https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf" AS url
CALL apoc.load.json(url) YIELD value
UNWIND value.items AS item
RETURN item.title, item.owner, item.creation_date, keys(item)
LIMIT 5;
Table 8. Results
item.title item.owner item.creation_date keys(item)

"Cypher patten for getting self related nodes"

{profile_image: "https://lh3.googleusercontent.com/-1FWbhuaEBiQ/AAAAAAAAAAI/AAAAAAAAAIA/tLM_mEb-8MY/photo.jpg?sz=128", user_type: "registered", user_id: 5730203, link: "https://stackoverflow.com/users/5730203/asif-ali", reputation: 1148, display_name: "Asif Ali", accept_rate: 90}

1586944991

["owner", "comment_count", "link", "last_activity_date", "creation_date", "answer_count", "title", "question_id", "tags", "share_link", "score", "down_vote_count", "body_markdown", "favorite_count", "is_answered", "delete_vote_count", "close_vote_count", "view_count", "up_vote_count"]

"Problem connecting .NET Client to Neo4j Desktop version 4"

{profile_image: "https://www.gravatar.com/avatar/a3fac35d600d1d462d8fc12f3926074c?s=128&d=identicon&r=PG&f=1", user_type: "registered", user_id: 2853912, link: "https://stackoverflow.com/users/2853912/user2853912", reputation: 21, display_name: "user2853912"}

1586938954

["owner", "comment_count", "link", "last_activity_date", "creation_date", "answer_count", "title", "question_id", "tags", "share_link", "score", "down_vote_count", "body_markdown", "favorite_count", "is_answered", "delete_vote_count", "close_vote_count", "view_count", "up_vote_count"]

"What kind of graph algorithm does Neo4j use?"

{profile_image: "https://www.gravatar.com/avatar/736024b862a229111d4b3119875753b0?s=128&d=identicon&r=PG&f=1", user_type: "registered", user_id: 4402081, link: "https://stackoverflow.com/users/4402081/mariappan", reputation: 7, display_name: "Mariappan"}

1586901300

["owner", "comment_count", "answers", "link", "last_activity_date", "creation_date", "answer_count", "title", "question_id", "tags", "share_link", "score", "down_vote_count", "body_markdown", "favorite_count", "is_answered", "delete_vote_count", "close_vote_count", "view_count", "up_vote_count"]

"Import json file to Neo4j"

{profile_image: "https://lh3.googleusercontent.com/-PWDC85Kp2ig/AAAAAAAAAAI/AAAAAAAAAAA/AB6qoq3nhmVZl-_0VDKESOG5MsyHvXnw_A/mo/photo.jpg?sz=128", user_type: "registered", user_id: 9964138, link: "https://stackoverflow.com/users/9964138/jo%c3%a3o-costa", reputation: 23, display_name: "João Costa"}

1586897574

["owner", "comment_count", "answers", "link", "last_activity_date", "creation_date", "answer_count", "title", "question_id", "tags", "share_link", "score", "down_vote_count", "body_markdown", "favorite_count", "is_answered", "delete_vote_count", "close_vote_count", "view_count", "up_vote_count"]

"Difference between Neo4j Graph Algorithms and Graph Data Science"

{profile_image: "https://i.stack.imgur.com/2rLPZ.jpg?s=128&g=1", user_type: "registered", user_id: 3297954, link: "https://stackoverflow.com/users/3297954/rotten", reputation: 1295, display_name: "rotten", accept_rate: 75}

1586872077

["owner", "comment_count", "answers", "link", "last_activity_date", "creation_date", "answer_count", "title", "question_id", "tags", "share_link", "score", "down_vote_count", "body_markdown", "favorite_count", "is_answered", "delete_vote_count", "close_vote_count", "view_count", "up_vote_count"]

Let’s now create a Neo4j graph based on those entities.

The following creates a graph based on data from the StackOverflow API
WITH "https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf" AS url
CALL apoc.load.json(url) YIELD value
UNWIND value.items AS q
MERGE (question:Question {id:q.question_id})
ON CREATE SET question.title = q.title,
              question.share_link = q.share_link,
              question.favorite_count = q.favorite_count

FOREACH (tagName IN q.tags | MERGE (tag:Tag {name:tagName}) MERGE (question)-[:TAGGED]->(tag))
FOREACH (a IN q.answers |
   MERGE (question)<-[:ANSWERS]-(answer:Answer {id:a.answer_id})
   MERGE (answerer:User {id:a.owner.user_id}) ON CREATE SET answerer.display_name = a.owner.display_name
   MERGE (answer)<-[:PROVIDED]-(answerer)
)

WITH * WHERE NOT q.owner.user_id IS NULL
MERGE (owner:User {id:q.owner.user_id}) ON CREATE SET owner.display_name = q.owner.display_name
MERGE (owner)-[:ASKED]->(question)

The Neo4j Browser visualization below shows the imported graph:

apoc load json so

Use JSON Path and Import from StackOverflow API

We can narrow down the data that we sift through and import using the JSON path syntax. This will allow us to specify substructures to import and ignore the rest of the data. For this example, we only want to import answers and the members posting those answers.

Find StackOverflow answers using JSON path (only retrieve sample of 5)
WITH "https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf" AS url
CALL apoc.load.json(url,'$.items[?(@.answer_count>0)].answers[*]') YIELD value
RETURN value LIMIT 5;

Notice that we are only looking at StackOverflow questions that have an answer count greater than 0. That means we are only passing along the question JSON objects that have answers, as the rest do not pertain to our use case. With this in mind, let us import those with this statement:

WITH "https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf" AS url
CALL apoc.load.json(url,'$.items[?(@.answer_count>0)].answers[*]') YIELD value
MERGE (a:Answer {id: value.answer_id})
  ON CREATE SET a.accepted = value.is_accepted,
                a.shareLink = value.share_link,
                a.lastActivityDate = value.last_activity_date,
                a.creationDate = value.creation_date,
                a.title = value.title,
                a.score = value.score
MERGE (q:Question {id: value.question_id})
MERGE (a)-[rel:POSTED_TO]->(q)
WITH a as answer, value.owner as value
MERGE (u:User {userId: value.user_id})
  ON CREATE SET u.displayName = value.display_name,
                u.userType = value.user_type,
                u.reputation = value.reputation,
                u.userLink = value.link
MERGE (u)-[rel2:SUBMITTED]->(answer)
RETURN count(answer)

This imports around 78 answers to our graph. We can then explore this graph to find out which users submitted the most answers, have the highest ratings, and more.

Import from Twitter API (with additional parameters)

With apoc.load.jsonParams we can send additional headers or payload with our JSON GET request, e.g. for the Twitter API:

Let’s first configure the Bearer and Twitter Search URL token in apoc.conf:

apoc.conf
apoc.static.twitter.bearer=XXXX
apoc.static.twitter.url=https://api.twitter.com/1.1/search/tweets.json?count=100&result_type=recent&lang=en&q=

These values can then be retrieved using the functions described in

static-values>>.

The following queries the Twitter API and returns the results
WITH apoc.static.getAll("twitter") AS twitter
CALL apoc.load.jsonParams(
  twitter.url + "oscon+OR+neo4j+OR+%23oscon+OR+%40neo4j",
  {Authorization:"Bearer "+twitter.bearer},
  null // payload
)
YIELD value
UNWIND value.statuses as status

WITH status, status.user as u, status.entities as e
RETURN status.id, status.text, u.screen_name,
       [t IN e.hashtags | t.text] as tags,
       e.symbols,
       [m IN e.user_mentions | m.screen_name] as mentions,
       [u IN e.urls | u.expanded_url] as urls;

Import JSON file created by Export JSON procedures

The apoc.import.json procedure can be used to import JSON files created by the apoc.export.json.* procedures, exported using the config parameter jsonFormat: 'JSON_LINES' (default config).

This procedure supports the following config parameters:

Table 9. Config parameters
name default description

unwindBatchSize

5000

the batch size of the unwind

txBatchSize

5000

the batch size of the transacttion

importIdName

String

neo4jImportId

the name of the property to be populated with the "id" field present into the json. For example a row {"type":"node", "labels":["Language"], "id":"10"}, with importIdName:`foo`, will create a node (:User {foo: "10"})

nodePropertyMappings

{}

The mapping label/property name/property type for Custom Neo4j types (point date). I.e. { User: { born: 'Point', dateOfBirth: 'Datetime' } }

relPropertyMappings

{}

nodePropertyMappings and relPropertyMappings support the following Neo4j types:

  • Point

  • Localdate

  • Localtime

  • Localdatetime

  • Duration

  • offsettime

  • Zoneddatetime

all.json contains a subset of Neo4j’s movies graph, and was generated by the Export JSON procedure.

all.json
{"type":"node","id":"0","labels":["User"],"properties":{"born":"2015-07-04T19:32:24","name":"Adam","place":{"crs":"wgs-84","latitude":13.1,"longitude":33.46789,"height":null},"age":42,"male":true,"kids":["Sam","Anna","Grace"]}}
{"type":"node","id":"1","labels":["User"],"properties":{"name":"Jim","age":42}}
{"type":"node","id":"2","labels":["User"],"properties":{"age":12}}
{"id":"0","type":"relationship","label":"KNOWS","properties":{"bffSince":"P5M1DT12H","since":1993},"start":{"id":"0","labels":["User"],"properties":{"born":"2015-07-04T19:32:24","name":"Adam","place":{"crs":"wgs-84","latitude":13.1,"longitude":33.46789,"height":null},"age":42,"male":true,"kids":["Sam","Anna","Grace"]}},"end":{"id":"1","labels":["User"],"properties":{"name":"Jim","age":42}}}

We can import this file using apoc.import.json.

CALL apoc.import.json("file:///all.json")
Table 10. Results
file source format nodes relationships properties time rows batchSize batches done data

"file:///all.json"

"file"

"json"

3

1

15

105

4

-1

0

TRUE

NULL

POST a query to the neo4j.com search API

We can perform a POST request to a JSON endpoint by setting the config parameter method to POST. We’ll also use the apoc.convert.toJson function to construct a JSON payload from a Cypher map.

The following makes a POST request to neo4j’s search API
CALL apoc.load.jsonParams(
  "https://neo4j.com/docs/search/",
  {method: "POST"},
  apoc.convert.toJson({query: "subquery", version: "4.0"})
)
Table 11. Results
value

{description: "The CALL {} clause evaluates a subquery that returns some values.", weight: 0.6460227966308594, title: "3.16. CALL {} (subquery) - Chapter 3. Clauses", uri: "https://neo4j.com/docs/cypher-manual/4.0/clauses/call-subquery/"}

{description: "This section provides examples of queries and Cypher commands that can be used with Neo4j Fabric.", weight: 0.05099273845553398, title: "7.3. Queries - Chapter 7. Fabric", uri: "https://neo4j.com/docs/operations-manual/4.0/fabric/queries/"}

{description: "WHERE adds constraints to the patterns in a MATCH or OPTIONAL MATCH clause or filters the results of a WITH clause.", weight: 0.03291567042469978, title: "3.6. WHERE - Chapter 3. Clauses", uri: "https://neo4j.com/docs/cypher-manual/4.0/clauses/where/"}

{description: "This appendix contains the recommended style when writing Cypher queries.", weight: 0.031550146639347076, title: "Appendix A. Cypher styleguide - The Neo4j Cypher Manual v4.0", uri: "https://neo4j.com/docs/cypher-manual/4.0/styleguide/"}

{description: "This section contains information on all the clauses in the Cypher query language.", weight: 0.02944066934287548, title: "Chapter 3. Clauses - The Neo4j Cypher Manual v4.0", uri: "https://neo4j.com/docs/cypher-manual/4.0/clauses/"}

{description: "", weight: 0.01821548491716385, title: "2.3. Expressions - Chapter 2. Syntax", uri: "https://neo4j.com/docs/cypher-manual/4.0/syntax/expressions/"}