Using JSON paths gives you a condensed way to read and process sub-documents and sub-values from nested JSON structures. This is especially helpful if you need to skip over unwinding higher-level parent objects in order to access more nested data, or if you need to manipulate values in those substructures.

Rather than passing in a large JSON file and using Cypher to unwind each object and access what you need, you can pass in the file and provide the JSON path to the substructures you need, resulting in shorter statements for nested JSON. The JSON path format follows the Java implementation by Jayway of Stefan Gössner’s JSONPath, providing a consistent syntax for the paths.

Many of the apoc.convert.Json procedures and functions, as well as the apoc.load.json procedure, now accept a json path as last argument. Note that these functions are meant to stream arrays (of values or objects) and maps, not a single value. If a single item containing a single value is specified as the path, the function must try to wrap it and will not return expected results.

There is also the apoc.json.path(json,path) function that takes a JSON string (not map or list) and retrieves values from the json path provided as the second argument. Note: if the JSON is not already in string format, you can use the apoc.convert.toJson function to convert it.

More examples can be found at the links provided above, but let us look at an example of the syntax for JSON paths. The syntax shown below pulls the items array from the StackOverflow API of Neo4j questions and retrieves the array of tags from the first object in the item list.

$.items[0].tags

All of the operators and options for specifying JSON paths are included in the next table.

Table 1. Operators
Operator Description Example

$

The root element to query. This starts all path expressions.

$ - retrieve all data in parent object

@

The current node being processed by a filter predicate.

$.items[?(@.answer_count > 0)] - retrieve the item if it has an answer_count greater than 0

*

Wildcard. Available anywhere a name or numeric are required.

$.items[\*] - retrieve all items in array

..

Deep scan. Available anywhere a name is required.

$..tags[\*] - find substructure named tags and pull all the values

.<name>

Dot-notated child

$.items[0:1].owner.user_id - retrieve user_id for the first item (in the owner object)

[<number> (,<number>)]

Array index or indexes

$.items[0,-1] - retrieve first and last item in array

[start:end]

Array slice operator

$.items[0:5] - retrieve the first through fifth items in the array

[?(<expression>)]

Filter expression. Expression must evaluate to a boolean value.

$.items[?(@.is_answered == true)] - retrieve items where the is_answered field is true

Moreover, we can customize the Json path options, adding the config {pathOptions: LIST OF STRINGS}, where the strings are based on Enum<Option>. The default value is ["SUPPRESS_EXCEPTIONS", "DEFAULT_PATH_LEAF_TO_NULL"]. Note that we can also insert [], that is "without options". So with the following json:

{ "columns": {
      "col2": {
        "_id": "772col2"
      }
    }
}

we can execute (with default pathOptions):

CALL apoc.load.json($url, '$..columns');
Table 2. Results
value

[ {"col2": { "_id": "772col2" }}, null, null ]

or, with custom path options:

CALL apoc.load.json($url, '$..columns', ['ALWAYS_RETURN_LIST']);
Table 3. Results
Output

[ {"col2": { "_id": "772col2" }} ]