apoc.import.xml

Procedure APOC Core

apoc.import.xml(file,config) - imports graph from provided file

Signature

apoc.import.xml(urlOrBinary :: ANY?, config = {} :: MAP?) :: (node :: NODE?)

Input parameters

Name	Type	Default
urlOrBinary	ANY?	null
config	MAP?	{}

Name

Type

Default

urlOrBinary

ANY?

null

config

MAP?

{}

Output parameters

Name	Type
node	NODE?

Name

Type

node

NODE?

Reading from a file

By default importing from the file system is disabled. We can enable it by setting the following property in apoc.conf:

apoc.conf

apoc.import.file.enabled=true

If we try to use any of the import procedures without having first set this property, we’ll get the following error message:

Failed to invoke procedure: Caused by: java.lang.RuntimeException: Import from files not enabled, please set apoc.import.file.enabled=true in your apoc.conf

Import files are read from the import directory, which is defined by the dbms.directories.import property. This means that any file path that we provide is relative to this directory. If we try to read from an absolute path, such as /tmp/filename, we’ll get an error message similar to the following one:

Failed to invoke procedure: Caused by: java.lang.RuntimeException: Can’t read url or key file:/path/to/neo4j/import/tmp/filename as json: /path/to/neo4j//import/tmp/filename (No such file or directory)

We can enable reading files from anywhere on the file system by setting the following property in apoc.conf:

apoc.conf

apoc.import.file.use_neo4j_config=false

Neo4j will now be able to read from anywhere on the file system, so be sure that this is your intention before setting this property.

Usage Examples

The examples in this section are based on the Microsoft book.xml file.

book.xml

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies,
...

This file can be downloaded from GitHub.

We can write the following query to create a graph structure of the Microsoft books XML file.

CALL apoc.import.xml(
  "https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.3/core/src/test/resources/xml/books.xml",
  {relType:'NEXT_WORD', label:'XmlWord'}
)
YIELD node
RETURN node;

node
(:XmlDocument {_xmlVersion: "1.0", _xmlEncoding: "UTF-8", url: "https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.3/core/src/test/resources/xml/books.xml"})

node

(:XmlDocument {_xmlVersion: "1.0", _xmlEncoding: "UTF-8", url: "https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.3/core/src/test/resources/xml/books.xml"})

The Neo4j Browser visualization below shows the imported graph:

Binary file

You can also import a file from a binary byte[] (not compressed) or a compressed file (allowed compression algos are: GZIP, BZIP2, DEFLATE, BLOCK_LZ4, FRAMED_SNAPPY).

CALL apoc.import.xml(`binaryGzipByteArray`,  {compression: 'GZIP'})

or:

CALL apoc.import.xml(`binaryFileNotCompressed`,  {compression: 'NONE'})

For example, this one works well with apoc.util.compress function:

WITH apoc.util.compress('<?xml version="1.0" encoding="UTF-8"?>
<parent name="databases">
    <child name="Neo4j">
        Neo4j is a graph database
    </child>
    <child name="relational">
        <grandchild name="MySQL"><![CDATA[
            MySQL is a database & relational
            ]]>
        </grandchild>
        <grandchild name="Postgres">
            Postgres is a relational database
        </grandchild>
    </child>
</parent>', {compression: 'DEFLATE'}) as xmlCompressed
CALL apoc.import.xml(xmlCompressed,  {compression: 'DEFLATE'})
YIELD node
RETURN node

Table 1. Results
node
[source,json] ---- { "identity": 11, "labels": [ "XmlDocument" ], "properties": { "_xmlEncoding": "UTF-8", "_xmlVersion": "1.0" } } ----