apoc.load.xml

Procedure APOC Core

apoc.load.xml('http://example.com/test.xml', 'xPath',config, false) YIELD value as doc CREATE (p:Person) SET p.name = doc.name - load from XML URL (e.g. web-api) to import XML as single nested map with attributes and _type, _text and _childrenx fields.

Signature

apoc.load.xml(urlOrBinary :: ANY?, path = / :: STRING?, config = {} :: MAP?, simple = false :: BOOLEAN?) :: (value :: MAP?)

Input parameters

Name	Type	Default
urlOrBinary	ANY?	null
path	STRING?	/
config	MAP?	{}
simple	BOOLEAN?	false

Name

Type

Default

urlOrBinary

ANY?

null

path

STRING?

config

MAP?

{}

simple

BOOLEAN?

false

Output parameters

Name	Type
value	MAP?

Name

Type

value

MAP?

Reading from a file

By default importing from the file system is disabled. We can enable it by setting the following property in apoc.conf:

apoc.conf

apoc.import.file.enabled=true

If we try to use any of the import procedures without having first set this property, we’ll get the following error message:

Failed to invoke procedure: Caused by: java.lang.RuntimeException: Import from files not enabled, please set apoc.import.file.enabled=true in your apoc.conf

Import files are read from the import directory, which is defined by the dbms.directories.import property. This means that any file path that we provide is relative to this directory. If we try to read from an absolute path, such as /tmp/filename, we’ll get an error message similar to the following one:

Failed to invoke procedure: Caused by: java.lang.RuntimeException: Can’t read url or key file:/path/to/neo4j/import/tmp/filename as json: /path/to/neo4j//import/tmp/filename (No such file or directory)

We can enable reading files from anywhere on the file system by setting the following property in apoc.conf:

apoc.conf

apoc.import.file.use_neo4j_config=false

Neo4j will now be able to read from anywhere on the file system, so be sure that this is your intention before setting this property.

Usage Examples

The examples in this section are based on the Microsoft book.xml file.

book.xml

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies,
...

This file can be downloaded from GitHub.

Import from local file

The books.xml file described below contains the first two books from the Microsoft Books XML file. We’ll use the smaller file in this section to simplify our examples.

books.xml

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <author>Arciniegas, Fabio</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies,
      an evil sorceress, and her own childhood to become queen
      of the world.</description>
   </book>
</catalog>

We’ll place this file into the import directory of our Neo4j instance. Let’s now write a query using the apoc.load.xml procedure to explore this file.

The following query processes books.xml and returns the content as Cypher data structures

CALL apoc.load.xml("file:///books.xml")
YIELD value
RETURN value

Table 1. Results
value
{_type: "catalog", _children: [{_type: "book", _children: [{_type: "author", _text: "Gambardella, Matthew"}, {_type: "author", _text: "Arciniegas, Fabio"}, {_type: "title", _text: "XML Developer’s Guide"}, {_type: "genre", _text: "Computer"}, {_type: "price", _text: "44.95"}, {_type: "publish_date", _text: "2000-10-01"}, {_type: "description", _text: "An in-depth look at creating applications with XML."}], id: "bk101"}, {_type: "book", _children: [{_type: "author", _text: "Ralls, Kim"}, {_type: "title", _text: "Midnight Rain"}, {_type: "genre", _text: "Fantasy"}, {_type: "price", _text: "5.95"}, {_type: "publish_date", _text: "2000-12-16"}, {_type: "description", _text: "A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world."}], id: "bk102"}]}

We get back a map representing the XML structure. Every time an XML element is nested inside another one, it is accessible via the .children property. We can write the following query to get a better understanding of what our file contains.

The following query processes book.xml and parses the results to pull out the title, description, genre, and authors

CALL apoc.load.xml("file:///books.xml")
YIELD value
UNWIND value._children AS book
RETURN book.id AS bookId,
       [item in book._children WHERE item._type = "title"][0] AS title,
       [item in book._children WHERE item._type = "description"][0] AS description,
       [item in book._children WHERE item._type = "author"] AS authors,
       [item in book._children WHERE item._type = "genre"][0] AS genre;

Table 2. Results
bookId	title	description	authors	genre
"bk101"	{_type: "title", _text: "XML Developer’s Guide"}	{_type: "description", _text: "An in-depth look at creating applications with XML."}	[{_type: "author", _text: "Gambardella, Matthew"}, {_type: "author", _text: "Arciniegas, Fabio"}]	{_type: "genre", _text: "Computer"}
"bk102"	{_type: "title", _text: "Midnight Rain"}	{_type: "description", _text: "A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world."}	[{_type: "author", _text: "Ralls, Kim"}]	{_type: "genre", _text: "Fantasy"}

Let’s now create a graph of books and their metadata, authors, and genres.

The following query processes book.xml and parses the results to pull out the title, description, genre, and authors

CALL apoc.load.xml("file:///books.xml")
YIELD value
UNWIND value._children AS book

WITH book.id AS bookId,
     [item in book._children WHERE item._type = "title"][0] AS title,
     [item in book._children WHERE item._type = "description"][0] AS description,
     [item in book._children WHERE item._type = "author"] AS authors,
     [item in book._children WHERE item._type = "genre"][0] AS genre

MERGE (b:Book {id: bookId})
SET b.title = title._text, b.description = description._text

MERGE (g:Genre {name: genre._text})
MERGE (b)-[:HAS_GENRE]->(g)

WITH b, authors
UNWIND authors AS author
MERGE (a:Author {name:author._text})
MERGE (a)-[:WROTE]->(b);

The Neo4j Browser visualization below shows the imported graph:

Import from GitHub

We can also process XML files from HTTP or HTTPS URIs. Let’s start by processing the books.xml file hosted on GitHub.

This time we’ll pass in true as the 4th argument of the procedure. This means that the XML will be parsed in simple mode.

The following query loads the books.xml file from GitHub using simple mode

WITH "https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.1/src/test/resources/xml/books.xml" AS uri
CALL apoc.load.xml(uri, '', {}, true)
YIELD value
RETURN value;

Table 3. Results
value
{_type: "catalog", _catalog: [{_type: "book", _book: [{_type: "author", _text: "Gambardella, Matthew"}, {_type: "author", _text: "Arciniegas, Fabio"}, {_type: "title", _text: "XML Developer’s Guide"}, {_type: "genre", _text: "Computer"}, {_type: "price", _text: "44.95"}, {_type: "publish_date", _text: "2000-10-01"}, {_type: "description", _text: "An in-depth look at creating applications with XML."}], id: "bk101"}, {_type: "book", _book: [{_type: "author", _text: "Ralls, Kim"}, {_type: "title", _text: "Midnight Rain"}, {_type: "genre", _text: "Fantasy"}, {_type: "price", _text: "5.95"}, {_type: "publish_date", _text: "2000-12-16"}, {_type: "description", _text: "A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world."}], id: "bk102"}, {_type: "book", _book: [{_type: "author", _text: "Corets, Eva"}, {_type: "title", _text: "Maeve Ascendant"}, {_type: "genre", _text: "Fantasy"}, {_type: "price", _text: "5.95"}, {_type: "publish_date", _text: "2000-11-17"}, {_type: "description", _text: "After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society."}], id: "bk103"}, {_type: "book", _book: [{_type: "author", _text: "Corets, Eva"}, {_type: "title", _text: "Oberon’s Legacy"}, {_type: "genre", _text: "Fantasy"}, {_type: "price", _text: "5.95"}, {_type: "publish_date", _text: "2001-03-10"}, {_type: "description", _text: "In post-apocalypse England, the mysterious agent known only as Oberon helps to create a new life for the inhabitants of London. Sequel to Maeve Ascendant."}], id: "bk104"}, {_type: "book", _book: [{_type: "author", _text: "Corets, Eva"}, {_type: "title", _text: "The Sundered Grail"}, {_type: "genre", _text: "Fantasy"}, {_type: "price", _text: "5.95"}, {_type: "publish_date", _text: "2001-09-10"}, {_type: "description", _text: "The two daughters of Maeve, half-sisters, battle one another for control of England. Sequel to Oberon’s Legacy."}], id: "bk105"}, {_type: "book", _book: [{_type: "author", _text: "Randall, Cynthia"}, {_type: "title", _text: "Lover Birds"}, {_type: "genre", _text: "Romance"}, {_type: "price", _text: "4.95"}, {_type: "publish_date", _text: "2000-09-02"}, {_type: "description", _text: "When Carla meets Paul at an ornithology conference, tempers fly as feathers get ruffled."}], id: "bk106"}, {_type: "book", _book: [{_type: "author", _text: "Thurman, Paula"}, {_type: "title", _text: "Splish Splash"}, {_type: "genre", _text: "Romance"}, {_type: "price", _text: "4.95"}, {_type: "publish_date", _text: "2000-11-02"}, {_type: "description", _text: "A deep sea diver finds true love twenty thousand leagues beneath the sea."}], id: "bk107"}, {_type: "book", _book: [{_type: "author", _text: "Knorr, Stefan"}, {_type: "title", _text: "Creepy Crawlies"}, {_type: "genre", _text: "Horror"}, {_type: "price", _text: "4.95"}, {_type: "publish_date", _text: "2000-12-06"}, {_type: "description", _text: "An anthology of horror stories about roaches, centipedes, scorpions and other insects."}], id: "bk108"}, {_type: "book", _book: [{_type: "author", _text: "Kress, Peter"}, {_type: "title", _text: "Paradox Lost"}, {_type: "genre", _text: "Science Fiction"}, {_type: "price", _text: "6.95"}, {_type: "publish_date", _text: "2000-11-02"}, {_type: "description", _text: "After an inadvertant trip through a Heisenberg Uncertainty Device, James Salway discovers the problems of being quantum."}], id: "bk109"}, {_type: "book", _book: [{_type: "author", _text: "O’Brien, Tim"}, {_type: "title", _text: "Microsoft .NET: The Programming Bible"}, {_type: "genre", _text: "Computer"}, {_type: "price", _text: "36.95"}, {_type: "publish_date", _text: "2000-12-09"}, {_type: "description", _text: "Microsoft’s .NET initiative is explored in detail in this deep programmer’s reference."}], id: "bk110"}, {_type: "book", _book: [{_type: "author", _text: "O’Brien, Tim"}, {_type: "title", _text: "MSXML3: A Comprehensive Guide"}, {_type: "genre", _text: "Computer"}, {_type: "price", _text: "36.95"}, {_type: "publish_date", _text: "2000-12-01"}, {_type: "description", _text: "The Microsoft MSXML3 parser is covered in detail, with attention to XML DOM interfaces, XSLT processing, SAX and more."}], id: "bk111"}, {_type: "book", _book: [{_type: "author", _text: "Galos, Mike"}, {_type: "title", _text: "Visual Studio 7: A Comprehensive Guide"}, {_type: "genre", _text: "Computer"}, {_type: "price", _text: "49.95"}, {_type: "publish_date", _text: "2001-04-16"}, {_type: "description", _text: "Microsoft Visual Studio 7 is explored in depth, looking at how Visual Basic, Visual C+, C#, and ASP are integrated into a comprehensive development environment."}], id: "bk112"}]}

We again get back back a map representing the XML structure, but the structure is different than when we don’t use simple mode. This time nested XML elements are accessible via a property of the element name prefixed with an _.

We can write the following query to get a better understanding of what our file contains.

The following query processes book.xml and parses the results to pull out the title, description, genre, and authors

WITH "https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.0/src/test/resources/xml/books.xml" AS uri
CALL apoc.load.xml(uri, '', {}, true)
YIELD value
UNWIND value._catalog AS catalog
RETURN catalog.id AS bookId,
       [item in catalog._book WHERE item._type = "title"][0] AS title,
       [item in catalog._book WHERE item._type = "description"][0] AS description,
       [item in catalog._book WHERE item._type = "author"] AS authors,
       [item in catalog._book WHERE item._type = "genre"][0] AS genre;

Table 4. Results
bookId	title	description	authors	genre
"bk101"	{_type: "title", _text: "XML Developer’s Guide"}	{_type: "description", _text: "An in-depth look at creating applications with XML."}	[{_type: "author", _text: "Gambardella, Matthew"}, {_type: "author", _text: "Arciniegas, Fabio"}]	{_type: "genre", _text: "Computer"}
"bk102"	{_type: "title", _text: "Midnight Rain"}	{_type: "description", _text: "A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world."}	[{_type: "author", _text: "Ralls, Kim"}]	{_type: "genre", _text: "Fantasy"}
"bk103"	{_type: "title", _text: "Maeve Ascendant"}	{_type: "description", _text: "After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society."}	[{_type: "author", _text: "Corets, Eva"}]	{_type: "genre", _text: "Fantasy"}
"bk104"	{_type: "title", _text: "Oberon’s Legacy"}	{_type: "description", _text: "In post-apocalypse England, the mysterious agent known only as Oberon helps to create a new life for the inhabitants of London. Sequel to Maeve Ascendant."}	[{_type: "author", _text: "Corets, Eva"}]	{_type: "genre", _text: "Fantasy"}
"bk105"	{_type: "title", _text: "The Sundered Grail"}	{_type: "description", _text: "The two daughters of Maeve, half-sisters, battle one another for control of England. Sequel to Oberon’s Legacy."}	[{_type: "author", _text: "Corets, Eva"}]	{_type: "genre", _text: "Fantasy"}
"bk106"	{_type: "title", _text: "Lover Birds"}	{_type: "description", _text: "When Carla meets Paul at an ornithology conference, tempers fly as feathers get ruffled."}	[{_type: "author", _text: "Randall, Cynthia"}]	{_type: "genre", _text: "Romance"}
"bk107"	{_type: "title", _text: "Splish Splash"}	{_type: "description", _text: "A deep sea diver finds true love twenty thousand leagues beneath the sea."}	[{_type: "author", _text: "Thurman, Paula"}]	{_type: "genre", _text: "Romance"}
"bk108"	{_type: "title", _text: "Creepy Crawlies"}	{_type: "description", _text: "An anthology of horror stories about roaches, centipedes, scorpions and other insects."}	[{_type: "author", _text: "Knorr, Stefan"}]	{_type: "genre", _text: "Horror"}
"bk109"	{_type: "title", _text: "Paradox Lost"}	{_type: "description", _text: "After an inadvertant trip through a Heisenberg Uncertainty Device, James Salway discovers the problems of being quantum."}	[{_type: "author", _text: "Kress, Peter"}]	{_type: "genre", _text: "Science Fiction"}
"bk110"	{_type: "title", _text: "Microsoft .NET: The Programming Bible"}	{_type: "description", _text: "Microsoft’s .NET initiative is explored in detail in this deep programmer’s reference."}	[{_type: "author", _text: "O’Brien, Tim"}]	{_type: "genre", _text: "Computer"}
"bk111"	{_type: "title", _text: "MSXML3: A Comprehensive Guide"}	{_type: "description", _text: "The Microsoft MSXML3 parser is covered in detail, with attention to XML DOM interfaces, XSLT processing, SAX and more."}	[{_type: "author", _text: "O’Brien, Tim"}]	{_type: "genre", _text: "Computer"}
"bk112"	{_type: "title", _text: "Visual Studio 7: A Comprehensive Guide"}	{_type: "description", _text: "Microsoft Visual Studio 7 is explored in depth, looking at how Visual Basic, Visual C+, C#, and ASP are integrated into a comprehensive development environment."}	[{_type: "author", _text: "Galos, Mike"}]	{_type: "genre", _text: "Computer"}

Rather than just returning that data, we can create a graph of books and their metadata, authors, and genres.

The following query processes book.xml and parses the results to pull out the title, description, genre, and authors

WITH "https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.0/src/test/resources/xml/books.xml" AS uri
CALL apoc.load.xml(uri, '', {}, true)
YIELD value
UNWIND value._catalog AS catalog
WITH catalog.id AS bookId,
       [item in catalog._book WHERE item._type = "title"][0] AS title,
       [item in catalog._book WHERE item._type = "description"][0] AS description,
       [item in catalog._book WHERE item._type = "author"] AS authors,
       [item in catalog._book WHERE item._type = "genre"][0] AS genre

MERGE (b:Book {id: bookId})
SET b.title = title._text, b.description = description._text

MERGE (g:Genre {name: genre._text})
MERGE (b)-[:HAS_GENRE]->(g)

WITH b, authors
UNWIND authors AS author
MERGE (a:Author {name:author._text})
MERGE (a)-[:WROTE]->(b);

The Neo4j Browser visualization below shows the imported graph:

xPath expressions

We can also provide an xPath expression to select nodes from an XML document. If we only want to return books that have the Computer genre, we could write the following query:

CALL apoc.load.xml(
  "https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.1/src/test/resources/xml/books.xml",
  '/catalog/book[genre=\"Computer\"]'
)
YIELD value as book
WITH book.id as id, [attr IN book._children WHERE attr._type IN ['title','price'] | attr._text] as pairs
RETURN id, pairs[0] as title, pairs[1] as price;

Table 5. Results
id	title	price
"bk101"	"XML Developer’s Guide"	"44.95"
"bk110"	"Microsoft .NET: The Programming Bible"	"36.95"
"bk111"	"MSXML3: A Comprehensive Guide"	"36.95"
"bk112"	"Visual Studio 7: A Comprehensive Guide"	"49.95"

In this case we return only id, title and prize but we can return any other elements

We can also return just a single specific element. For example, the following query returns the author of the book with id = bg102

CALL apoc.load.xml(
  'https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.1/src/test/resources/xml/books.xml',
  '/catalog/book[@id="bk102"]/author'
)
YIELD value as result
WITH result._text as author
RETURN author;

Table 6. Results
author
"Ralls, Kim"

Extracting data structures

We can turn values into a map using the apoc.map.fromPairs function.

call apoc.load.xml("https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.1/src/test/resources/xml/books.xml")
yield value as catalog
UNWIND catalog._children as book
WITH book.id as id, [attr IN book._children WHERE attr._type IN ['author','title'] | [attr._type, attr._text]] as pairs
WITH id, apoc.map.fromPairs(pairs) AS value
RETURN id, value

Table 7. Results
id	value
"bk101"	{title: "XML Developer’s Guide", author: "Arciniegas, Fabio"}
"bk102"	{title: "Midnight Rain", author: "Ralls, Kim"}
"bk103"	{title: "Maeve Ascendant", author: "Corets, Eva"}
"bk104"	{title: "Oberon’s Legacy", author: "Corets, Eva"}
"bk105"	{title: "The Sundered Grail", author: "Corets, Eva"}
"bk106"	{title: "Lover Birds", author: "Randall, Cynthia"}
"bk107"	{title: "Splish Splash", author: "Thurman, Paula"}
"bk108"	{title: "Creepy Crawlies", author: "Knorr, Stefan"}
"bk109"	{title: "Paradox Lost", author: "Kress, Peter"}
"bk110"	{title: "Microsoft .NET: The Programming Bible", author: "O’Brien, Tim"}
"bk111"	{title: "MSXML3: A Comprehensive Guide", author: "O’Brien, Tim"}
"bk112"	{title: "Visual Studio 7: A Comprehensive Guide", author: "Galos, Mike"}

And now we can cleanly access the attributes from the map.

call apoc.load.xml("https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.1/src/test/resources/xml/books.xml")
yield value as catalog
UNWIND catalog._children as book
WITH book.id as id, [attr IN book._children WHERE attr._type IN ['author','title'] | [attr._type, attr._text]] as pairs
WITH id, apoc.map.fromPairs(pairs) AS value
RETURN id, value.title, value.author

Table 8. Results
id	value.title	value.author
"bk101"	"XML Developer’s Guide"	"Arciniegas, Fabio"
"bk102"	"Midnight Rain"	"Ralls, Kim"
"bk103"	"Maeve Ascendant"	"Corets, Eva"
"bk104"	"Oberon’s Legacy"	"Corets, Eva"
"bk105"	"The Sundered Grail"	"Corets, Eva"
"bk106"	"Lover Birds"	"Randall, Cynthia"
"bk107"	"Splish Splash"	"Thurman, Paula"
"bk108"	"Creepy Crawlies"	"Knorr, Stefan"
"bk109"	"Paradox Lost"	"Kress, Peter"
"bk110"	"Microsoft .NET: The Programming Bible"	"O’Brien, Tim"
"bk111"	"MSXML3: A Comprehensive Guide"	"O’Brien, Tim"
"bk112"	"Visual Studio 7: A Comprehensive Guide"	"Galos, Mike"

Binary file

You can also import a file from a binary byte[] (not compressed) or a compressed file (allowed compression algos are: GZIP, BZIP2, DEFLATE, BLOCK_LZ4, FRAMED_SNAPPY).

CALL apoc.load.xml(`binaryGzipByteArray`, '/', {compression: 'GZIP'})

or:

CALL apoc.load.xml(`binaryFileNotCompressed`, '/', {compression: 'NONE'})

For example, this one works well with apoc.util.compress function:

WITH apoc.util.compress('<?xml version="1.0" encoding="UTF-8"?>
<parent name="databases">
    <child name="Neo4j">
        Neo4j is a graph database
    </child>
    <child name="relational">
        <grandchild name="MySQL"><![CDATA[
            MySQL is a database & relational
            ]]>
        </grandchild>
        <grandchild name="Postgres">
            Postgres is a relational database
        </grandchild>
    </child>
</parent>', {compression: 'DEFLATE'}) as xmlCompressed

CALL apoc.load.xml(xmlCompressed, '/', {compression: 'DEFLATE'})
YIELD value
RETURN value

Table 9. Results
value
[source,json] ---- { "_type": "parent", "name": "databases", "_children": [{ "_type": "child", "name": "Neo4j", "_text": "Neo4j is a graph database" }, { "_type": "child", "name": "relational", "_children": [{ "_type": "grandchild", "name": "MySQL", "_text": "MySQL is a database & relational" }, { "_type": "grandchild", "name": "Postgres", "_text": "Postgres is a relational database" } ] } ] } ----