apoc.load.xml
Procedure APOC Core
apoc.load.xml('http://example.com/test.xml', 'xPath', config, false) YIELD value AS doc CREATE (p:Person) SET p.name = doc.name - load from an XML URL (e.g. a web API) to import XML as a single nested map with attributes and _type, _text, and _children fields.
Signature
apoc.load.xml(urlOrBinary :: ANY?, path = / :: STRING?, config = {} :: MAP?, simple = false :: BOOLEAN?) :: (value :: MAP?)
Input parameters
Name | Type | Default |
---|---|---|
urlOrBinary | ANY? | null |
path | STRING? | / |
config | MAP? | {} |
simple | BOOLEAN? | false |
Reading from a file
By default, importing from the file system is disabled.
We can enable it by setting the following property in apoc.conf:
apoc.import.file.enabled=true
If we try to use any of the import procedures without having first set this property, we’ll get the following error message:
Failed to invoke procedure: Caused by: java.lang.RuntimeException: Import from files not enabled, please set apoc.import.file.enabled=true in your apoc.conf
Import files are read from the import directory, which is defined by the dbms.directories.import property.
This means that any file path that we provide is relative to this directory.
If we try to read from an absolute path, such as /tmp/filename, we’ll get an error message similar to the following one:
Failed to invoke procedure: Caused by: java.lang.RuntimeException: Can’t read url or key file:/path/to/neo4j/import/tmp/filename as json: /path/to/neo4j//import/tmp/filename (No such file or directory)
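To make the resolution rule concrete, here is a minimal Python sketch of how a path ends up under the import directory even when it looks absolute. The helper resolve_import_path and the directory /path/to/neo4j/import are hypothetical; the sketch only mimics the behaviour described above and is not APOC's implementation.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def resolve_import_path(import_dir: str, location: str) -> str:
    """Hypothetical sketch: resolve a bare path or file: URL against the import directory.

    Every path, even one starting with '/', is treated as relative to the
    directory configured by dbms.directories.import.
    """
    # Accept both "file:///books.xml" style URLs and bare paths
    path = urlparse(location).path if location.startswith("file:") else location
    # Drop leading slashes so that "/tmp/filename" becomes "tmp/filename"
    relative = path.lstrip("/")
    return str(PurePosixPath(import_dir) / relative)


print(resolve_import_path("/path/to/neo4j/import", "file:///books.xml"))
# /path/to/neo4j/import/books.xml
print(resolve_import_path("/path/to/neo4j/import", "/tmp/filename"))
# /path/to/neo4j/import/tmp/filename
```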
We can enable reading files from anywhere on the file system by setting the following property in apoc.conf:
apoc.import.file.use_neo4j_config=false
Neo4j will now be able to read from anywhere on the file system, so be sure that this is your intention before setting this property.
Usage Examples
The examples in this section are based on the Microsoft books.xml file.
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
...
This file can be downloaded from GitHub.
Import from local file
The books.xml file described below contains the first two books from the Microsoft Books XML file.
We’ll use the smaller file in this section to simplify our examples.
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<author>Arciniegas, Fabio</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
</catalog>
We’ll place this file into the import directory of our Neo4j instance.
Let’s now write a query using the apoc.load.xml procedure to explore this file.
The following query processes books.xml and returns the content as Cypher data structures:
CALL apoc.load.xml("file:///books.xml")
YIELD value
RETURN value
value |
---|
{_type: "catalog", _children: [{_type: "book", _children: [{_type: "author", _text: "Gambardella, Matthew"}, {_type: "author", _text: "Arciniegas, Fabio"}, {_type: "title", _text: "XML Developer’s Guide"}, {_type: "genre", _text: "Computer"}, {_type: "price", _text: "44.95"}, {_type: "publish_date", _text: "2000-10-01"}, {_type: "description", _text: "An in-depth look at creating applications with XML."}], id: "bk101"}, {_type: "book", _children: [{_type: "author", _text: "Ralls, Kim"}, {_type: "title", _text: "Midnight Rain"}, {_type: "genre", _text: "Fantasy"}, {_type: "price", _text: "5.95"}, {_type: "publish_date", _text: "2000-12-16"}, {_type: "description", _text: "A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world."}], id: "bk102"}]} |
We get back a map representing the XML structure.
Every time an XML element is nested inside another one, it is accessible via the _children property of the enclosing element.
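The shape of this map can be approximated with a short Python sketch built on the standard xml.etree.ElementTree module. It is an illustration, not APOC's implementation: the function name to_map is hypothetical, and whitespace inside text is collapsed here to match the output shown above, which the real procedure may handle differently.

```python
import xml.etree.ElementTree as ET


def to_map(elem: ET.Element) -> dict:
    """Approximate the default (non-simple) apoc.load.xml map for one element."""
    node = {"_type": elem.tag}
    # XML attributes become ordinary keys, e.g. id: "bk102"
    node.update(elem.attrib)
    # Collapse runs of whitespace; whitespace-only text is dropped
    text = " ".join((elem.text or "").split())
    if text:
        node["_text"] = text
    # Nested elements are listed, in document order, under _children
    children = [to_map(child) for child in elem]
    if children:
        node["_children"] = children
    return node


XML_TEXT = """<catalog>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
  </book>
</catalog>"""

print(to_map(ET.fromstring(XML_TEXT)))
```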
We can write the following query to get a better understanding of what our file contains.
The following query processes books.xml and parses the results to pull out the title, description, genre, and authors:
CALL apoc.load.xml("file:///books.xml")
YIELD value
UNWIND value._children AS book
RETURN book.id AS bookId,
[item in book._children WHERE item._type = "title"][0] AS title,
[item in book._children WHERE item._type = "description"][0] AS description,
[item in book._children WHERE item._type = "author"] AS authors,
[item in book._children WHERE item._type = "genre"][0] AS genre;
bookId | title | description | authors | genre |
---|---|---|---|---|
"bk101" | {_type: "title", _text: "XML Developer’s Guide"} | {_type: "description", _text: "An in-depth look at creating applications with XML."} | [{_type: "author", _text: "Gambardella, Matthew"}, {_type: "author", _text: "Arciniegas, Fabio"}] | {_type: "genre", _text: "Computer"} |
"bk102" | {_type: "title", _text: "Midnight Rain"} | {_type: "description", _text: "A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world."} | [{_type: "author", _text: "Ralls, Kim"}] | {_type: "genre", _text: "Fantasy"} |
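Each list comprehension in the query keeps the children whose _type matches and, where a single value is expected, takes the first element. A rough Python equivalent follows; the helper names are hypothetical, the book map is abridged from the bk102 row above, and returning None for a missing element stands in for the null that Cypher returns when indexing past the end of a list.

```python
from __future__ import annotations


def of_type(children: list[dict], type_: str) -> list[dict]:
    """Like [item IN children WHERE item._type = type_]."""
    return [item for item in children if item["_type"] == type_]


def first_of_type(children: list[dict], type_: str) -> dict | None:
    """Like [item IN children WHERE item._type = type_][0]; None stands in for null."""
    matches = of_type(children, type_)
    return matches[0] if matches else None


# Abridged bk102 book map, in the default (non-simple) layout
book = {
    "_type": "book",
    "id": "bk102",
    "_children": [
        {"_type": "author", "_text": "Ralls, Kim"},
        {"_type": "title", "_text": "Midnight Rain"},
        {"_type": "genre", "_text": "Fantasy"},
    ],
}

print(book["id"], first_of_type(book["_children"], "title"), of_type(book["_children"], "author"))
```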
Let’s now create a graph of books and their metadata, authors, and genres.
The following query processes books.xml, pulls out the title, description, genre, and authors, and stores them as a graph:
CALL apoc.load.xml("file:///books.xml")
YIELD value
UNWIND value._children AS book
WITH book.id AS bookId,
[item in book._children WHERE item._type = "title"][0] AS title,
[item in book._children WHERE item._type = "description"][0] AS description,
[item in book._children WHERE item._type = "author"] AS authors,
[item in book._children WHERE item._type = "genre"][0] AS genre
MERGE (b:Book {id: bookId})
SET b.title = title._text, b.description = description._text
MERGE (g:Genre {name: genre._text})
MERGE (b)-[:HAS_GENRE]->(g)
WITH b, authors
UNWIND authors AS author
MERGE (a:Author {name:author._text})
MERGE (a)-[:WROTE]->(b);
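Because the query uses MERGE rather than CREATE, running it a second time should not duplicate books, genres, authors, or relationships. The Python sketch below models that get-or-create behaviour with sets over the two books in books.xml (descriptions omitted for brevity). It is an in-memory illustration of the resulting graph's shape, not how Neo4j stores data, and the function name import_books is hypothetical.

```python
from __future__ import annotations

BOOKS = [
    {"id": "bk101", "title": "XML Developer's Guide", "genre": "Computer",
     "authors": ["Gambardella, Matthew", "Arciniegas, Fabio"]},
    {"id": "bk102", "title": "Midnight Rain", "genre": "Fantasy",
     "authors": ["Ralls, Kim"]},
]


def import_books(books: list[dict], graph: dict | None = None) -> dict:
    """Apply MERGE-like (get-or-create) semantics: sets and dict keys ignore duplicates."""
    if graph is None:
        graph = {"Book": {}, "Genre": set(), "Author": set(),
                 "HAS_GENRE": set(), "WROTE": set()}
    for book in books:
        # MERGE (b:Book {id: bookId}) SET b.title = ...
        graph["Book"].setdefault(book["id"], {})["title"] = book["title"]
        # MERGE (g:Genre {name: ...}) MERGE (b)-[:HAS_GENRE]->(g)
        graph["Genre"].add(book["genre"])
        graph["HAS_GENRE"].add((book["id"], book["genre"]))
        # UNWIND authors ... MERGE (a:Author {name: ...}) MERGE (a)-[:WROTE]->(b)
        for author in book["authors"]:
            graph["Author"].add(author)
            graph["WROTE"].add((author, book["id"]))
    return graph


graph = import_books(BOOKS)
import_books(BOOKS, graph)  # a second run adds nothing new
print({label: len(items) for label, items in graph.items()})
# {'Book': 2, 'Genre': 2, 'Author': 3, 'HAS_GENRE': 2, 'WROTE': 3}
```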
The Neo4j Browser visualization below shows the imported graph:
Import from GitHub
We can also process XML files from HTTP or HTTPS URIs.
Let’s start by processing the books.xml file hosted on GitHub.
This time we’ll pass in true as the 4th argument of the procedure.
This means that the XML will be parsed in simple mode.
WITH "https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.1/src/test/resources/xml/books.xml" AS uri
CALL apoc.load.xml(uri, '', {}, true)
YIELD value
RETURN value;
value |
---|
{_type: "catalog", _catalog: [{_type: "book", _book: [{_type: "author", _text: "Gambardella, Matthew"}, {_type: "author", _text: "Arciniegas, Fabio"}, {_type: "title", _text: "XML Developer’s Guide"}, {_type: "genre", _text: "Computer"}, {_type: "price", _text: "44.95"}, {_type: "publish_date", _text: "2000-10-01"}, {_type: "description", _text: "An in-depth look at creating applications with XML."}], id: "bk101"}, {_type: "book", _book: [{_type: "author", _text: "Ralls, Kim"}, {_type: "title", _text: "Midnight Rain"}, {_type: "genre", _text: "Fantasy"}, {_type: "price", _text: "5.95"}, {_type: "publish_date", _text: "2000-12-16"}, {_type: "description", _text: "A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world."}], id: "bk102"}, {_type: "book", _book: [{_type: "author", _text: "Corets, Eva"}, {_type: "title", _text: "Maeve Ascendant"}, {_type: "genre", _text: "Fantasy"}, {_type: "price", _text: "5.95"}, {_type: "publish_date", _text: "2000-11-17"}, {_type: "description", _text: "After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society."}], id: "bk103"}, {_type: "book", _book: [{_type: "author", _text: "Corets, Eva"}, {_type: "title", _text: "Oberon’s Legacy"}, {_type: "genre", _text: "Fantasy"}, {_type: "price", _text: "5.95"}, {_type: "publish_date", _text: "2001-03-10"}, {_type: "description", _text: "In post-apocalypse England, the mysterious agent known only as Oberon helps to create a new life for the inhabitants of London. Sequel to Maeve Ascendant."}], id: "bk104"}, {_type: "book", _book: [{_type: "author", _text: "Corets, Eva"}, {_type: "title", _text: "The Sundered Grail"}, {_type: "genre", _text: "Fantasy"}, {_type: "price", _text: "5.95"}, {_type: "publish_date", _text: "2001-09-10"}, {_type: "description", _text: "The two daughters of Maeve, half-sisters, battle one another for control of England. Sequel to Oberon’s Legacy."}], id: "bk105"}, {_type: "book", _book: [{_type: "author", _text: "Randall, Cynthia"}, {_type: "title", _text: "Lover Birds"}, {_type: "genre", _text: "Romance"}, {_type: "price", _text: "4.95"}, {_type: "publish_date", _text: "2000-09-02"}, {_type: "description", _text: "When Carla meets Paul at an ornithology conference, tempers fly as feathers get ruffled."}], id: "bk106"}, {_type: "book", _book: [{_type: "author", _text: "Thurman, Paula"}, {_type: "title", _text: "Splish Splash"}, {_type: "genre", _text: "Romance"}, {_type: "price", _text: "4.95"}, {_type: "publish_date", _text: "2000-11-02"}, {_type: "description", _text: "A deep sea diver finds true love twenty thousand leagues beneath the sea."}], id: "bk107"}, {_type: "book", _book: [{_type: "author", _text: "Knorr, Stefan"}, {_type: "title", _text: "Creepy Crawlies"}, {_type: "genre", _text: "Horror"}, {_type: "price", _text: "4.95"}, {_type: "publish_date", _text: "2000-12-06"}, {_type: "description", _text: "An anthology of horror stories about roaches, centipedes, scorpions and other insects."}], id: "bk108"}, {_type: "book", _book: [{_type: "author", _text: "Kress, Peter"}, {_type: "title", _text: "Paradox Lost"}, {_type: "genre", _text: "Science Fiction"}, {_type: "price", _text: "6.95"}, {_type: "publish_date", _text: "2000-11-02"}, {_type: "description", _text: "After an inadvertant trip through a Heisenberg Uncertainty Device, James Salway discovers the problems of being quantum."}], id: "bk109"}, {_type: "book", _book: [{_type: "author", _text: "O’Brien, Tim"}, {_type: "title", _text: "Microsoft .NET: The Programming Bible"}, {_type: "genre", _text: "Computer"}, {_type: "price", _text: "36.95"}, {_type: "publish_date", _text: "2000-12-09"}, {_type: "description", _text: "Microsoft’s .NET initiative is explored in detail in this deep programmer’s reference."}], id: "bk110"}, {_type: "book", _book: [{_type: "author", _text: "O’Brien, Tim"}, {_type: "title", _text: "MSXML3: A Comprehensive Guide"}, {_type: "genre", _text: "Computer"}, {_type: "price", _text: "36.95"}, {_type: "publish_date", _text: "2000-12-01"}, {_type: "description", _text: "The Microsoft MSXML3 parser is covered in detail, with attention to XML DOM interfaces, XSLT processing, SAX and more."}], id: "bk111"}, {_type: "book", _book: [{_type: "author", _text: "Galos, Mike"}, {_type: "title", _text: "Visual Studio 7: A Comprehensive Guide"}, {_type: "genre", _text: "Computer"}, {_type: "price", _text: "49.95"}, {_type: "publish_date", _text: "2001-04-16"}, {_type: "description", _text: "Microsoft Visual Studio 7 is explored in depth, looking at how Visual Basic, Visual C+, C#, and ASP are integrated into a comprehensive development environment."}], id: "bk112"}]} |
We again get back a map representing the XML structure, but the structure is different from the one we get without simple mode.
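This time, nested XML elements are accessible via a property named after the enclosing element and prefixed with an _: the books are listed under the _catalog property of the catalog map, and each book’s elements are listed under its _book property.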
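Compared with the default mode, only the key that holds nested elements changes. A minimal Python sketch of simple mode, assuming the same whitespace handling as the output above; the function name to_simple_map is hypothetical and this is not APOC's implementation.

```python
import xml.etree.ElementTree as ET


def to_simple_map(elem: ET.Element) -> dict:
    """Approximate apoc.load.xml's simple mode for one element (illustrative only)."""
    node = {"_type": elem.tag}
    node.update(elem.attrib)  # attributes become ordinary keys
    text = " ".join((elem.text or "").split())
    if text:
        node["_text"] = text
    children = [to_simple_map(child) for child in elem]
    if children:
        # Nested elements go under "_" + the enclosing element's tag,
        # e.g. _catalog for <catalog> and _book for <book>
        node["_" + elem.tag] = children
    return node


XML_TEXT = '<catalog><book id="bk102"><title>Midnight Rain</title></book></catalog>'
simple = to_simple_map(ET.fromstring(XML_TEXT))
print(simple["_catalog"][0]["_book"])
```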
We can write the following query to get a better understanding of what our file contains.
The following query processes books.xml in simple mode and parses the results to pull out the title, description, genre, and authors:
WITH "https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.0/src/test/resources/xml/books.xml" AS uri
CALL apoc.load.xml(uri, '', {}, true)
YIELD value
UNWIND value._catalog AS catalog
RETURN catalog.id AS bookId,
[item in catalog._book WHERE item._type = "title"][0] AS title,
[item in catalog._book WHERE item._type = "description"][0] AS description,
[item in catalog._book WHERE item._type = "author"] AS authors,
[item in catalog._book WHERE item._type = "genre"][0] AS genre;
bookId | title | description | authors | genre |
---|---|---|---|---|
"bk101" | {_type: "title", _text: "XML Developer’s Guide"} | {_type: "description", _text: "An in-depth look at creating applications with XML."} | [{_type: "author", _text: "Gambardella, Matthew"}, {_type: "author", _text: "Arciniegas, Fabio"}] | {_type: "genre", _text: "Computer"} |
"bk102" | {_type: "title", _text: "Midnight Rain"} | {_type: "description", _text: "A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world."} | [{_type: "author", _text: "Ralls, Kim"}] | {_type: "genre", _text: "Fantasy"} |
"bk103" | {_type: "title", _text: "Maeve Ascendant"} | {_type: "description", _text: "After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society."} | [{_type: "author", _text: "Corets, Eva"}] | {_type: "genre", _text: "Fantasy"} |
"bk104" | {_type: "title", _text: "Oberon’s Legacy"} | {_type: "description", _text: "In post-apocalypse England, the mysterious agent known only as Oberon helps to create a new life for the inhabitants of London. Sequel to Maeve Ascendant."} | [{_type: "author", _text: "Corets, Eva"}] | {_type: "genre", _text: "Fantasy"} |
"bk105" | {_type: "title", _text: "The Sundered Grail"} | {_type: "description", _text: "The two daughters of Maeve, half-sisters, battle one another for control of England. Sequel to Oberon’s Legacy."} | [{_type: "author", _text: "Corets, Eva"}] | {_type: "genre", _text: "Fantasy"} |
"bk106" | {_type: "title", _text: "Lover Birds"} | {_type: "description", _text: "When Carla meets Paul at an ornithology conference, tempers fly as feathers get ruffled."} | [{_type: "author", _text: "Randall, Cynthia"}] | {_type: "genre", _text: "Romance"} |
"bk107" | {_type: "title", _text: "Splish Splash"} | {_type: "description", _text: "A deep sea diver finds true love twenty thousand leagues beneath the sea."} | [{_type: "author", _text: "Thurman, Paula"}] | {_type: "genre", _text: "Romance"} |
"bk108" | {_type: "title", _text: "Creepy Crawlies"} | {_type: "description", _text: "An anthology of horror stories about roaches, centipedes, scorpions and other insects."} | [{_type: "author", _text: "Knorr, Stefan"}] | {_type: "genre", _text: "Horror"} |
"bk109" | {_type: "title", _text: "Paradox Lost"} | {_type: "description", _text: "After an inadvertant trip through a Heisenberg Uncertainty Device, James Salway discovers the problems of being quantum."} | [{_type: "author", _text: "Kress, Peter"}] | {_type: "genre", _text: "Science Fiction"} |
"bk110" | {_type: "title", _text: "Microsoft .NET: The Programming Bible"} | {_type: "description", _text: "Microsoft’s .NET initiative is explored in detail in this deep programmer’s reference."} | [{_type: "author", _text: "O’Brien, Tim"}] | {_type: "genre", _text: "Computer"} |
"bk111" | {_type: "title", _text: "MSXML3: A Comprehensive Guide"} | {_type: "description", _text: "The Microsoft MSXML3 parser is covered in detail, with attention to XML DOM interfaces, XSLT processing, SAX and more."} | [{_type: "author", _text: "O’Brien, Tim"}] | {_type: "genre", _text: "Computer"} |
"bk112" | {_type: "title", _text: "Visual Studio 7: A Comprehensive Guide"} | {_type: "description", _text: "Microsoft Visual Studio 7 is explored in depth, looking at how Visual Basic, Visual C+, C#, and ASP are integrated into a comprehensive development environment."} | [{_type: "author", _text: "Galos, Mike"}] | {_type: "genre", _text: "Computer"} |
Rather than just returning that data, we can create a graph of books and their metadata, authors, and genres.
The following query processes books.xml in simple mode, pulls out the title, description, genre, and authors, and stores them as a graph:
WITH "https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.0/src/test/resources/xml/books.xml" AS uri
CALL apoc.load.xml(uri, '', {}, true)
YIELD value
UNWIND value._catalog AS catalog
WITH catalog.id AS bookId,
[item in catalog._book WHERE item._type = "title"][0] AS title,
[item in catalog._book WHERE item._type = "description"][0] AS description,
[item in catalog._book WHERE item._type = "author"] AS authors,
[item in catalog._book WHERE item._type = "genre"][0] AS genre
MERGE (b:Book {id: bookId})
SET b.title = title._text, b.description = description._text
MERGE (g:Genre {name: genre._text})
MERGE (b)-[:HAS_GENRE]->(g)
WITH b, authors
UNWIND authors AS author
MERGE (a:Author {name:author._text})
MERGE (a)-[:WROTE]->(b);
The Neo4j Browser visualization below shows the imported graph:
xPath expressions
We can also provide an xPath expression to select nodes from an XML document.
If we only want to return books that have the Computer genre, we could write the following query:
CALL apoc.load.xml(
"https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.1/src/test/resources/xml/books.xml",
'/catalog/book[genre=\"Computer\"]'
)
YIELD value as book
WITH book.id as id, [attr IN book._children WHERE attr._type IN ['title','price'] | attr._text] as pairs
RETURN id, pairs[0] as title, pairs[1] as price;
id | title | price |
---|---|---|
"bk101" | "XML Developer’s Guide" | "44.95" |
"bk110" | "Microsoft .NET: The Programming Bible" | "36.95" |
"bk111" | "MSXML3: A Comprehensive Guide" | "36.95" |
"bk112" | "Visual Studio 7: A Comprehensive Guide" | "49.95" |
In this case we return only the id, title, and price, but we could return any other elements as well.
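For comparison, Python's standard xml.etree.ElementTree module supports a limited XPath subset that includes child-text predicates like the one above, so the query can be approximated outside Neo4j. ElementTree is only a stand-in here and is not what APOC uses; its paths are evaluated relative to the root element, so /catalog/book[...] becomes book[...]. The two-book catalog is abridged for brevity.

```python
import xml.etree.ElementTree as ET

# Abridged catalog with one Computer book and one Fantasy book
CATALOG = """<catalog>
  <book id="bk101">
    <title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price>
  </book>
  <book id="bk102">
    <title>Midnight Rain</title><genre>Fantasy</genre><price>5.95</price>
  </book>
</catalog>"""

root = ET.fromstring(CATALOG)  # the <catalog> element; paths below are relative to it

# Counterpart of '/catalog/book[genre="Computer"]': books whose <genre> child has that text
computer_books = [
    (book.get("id"), book.findtext("title"), book.findtext("price"))
    for book in root.findall("book[genre='Computer']")
]

# Attribute predicates are supported as well, e.g. selecting by the id attribute
bk102_titles = [title.text for title in root.findall("book[@id='bk102']/title")]

print(computer_books)
print(bk102_titles)
```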
We can also return just a single specific element.
For example, the following query returns the author of the book with id bk102:
CALL apoc.load.xml(
'https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.1/src/test/resources/xml/books.xml',
'/catalog/book[@id="bk102"]/author'
)
YIELD value as result
WITH result._text as author
RETURN author;
author |
---|
"Ralls, Kim" |
Extracting data structures
We can turn values into a map using the apoc.map.fromPairs function.
call apoc.load.xml("https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.1/src/test/resources/xml/books.xml")
yield value as catalog
UNWIND catalog._children as book
WITH book.id as id, [attr IN book._children WHERE attr._type IN ['author','title'] | [attr._type, attr._text]] as pairs
WITH id, apoc.map.fromPairs(pairs) AS value
RETURN id, value
id | value |
---|---|
"bk101" | {title: "XML Developer’s Guide", author: "Arciniegas, Fabio"} |
"bk102" | {title: "Midnight Rain", author: "Ralls, Kim"} |
"bk103" | {title: "Maeve Ascendant", author: "Corets, Eva"} |
"bk104" | {title: "Oberon’s Legacy", author: "Corets, Eva"} |
"bk105" | {title: "The Sundered Grail", author: "Corets, Eva"} |
"bk106" | {title: "Lover Birds", author: "Randall, Cynthia"} |
"bk107" | {title: "Splish Splash", author: "Thurman, Paula"} |
"bk108" | {title: "Creepy Crawlies", author: "Knorr, Stefan"} |
"bk109" | {title: "Paradox Lost", author: "Kress, Peter"} |
"bk110" | {title: "Microsoft .NET: The Programming Bible", author: "O’Brien, Tim"} |
"bk111" | {title: "MSXML3: A Comprehensive Guide", author: "O’Brien, Tim"} |
"bk112" | {title: "Visual Studio 7: A Comprehensive Guide", author: "Galos, Mike"} |
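Note the bk101 row: that book has two author elements, but the map keeps only "Arciniegas, Fabio". When a key occurs more than once in the pairs, the later value appears to overwrite the earlier one, which is also what Python's dict constructor does. If every value is needed, the pairs can be grouped into lists instead. The sketch below shows both behaviours in Python using the pairs produced for bk101; it illustrates the observed result and is not APOC code.

```python
# Pairs built for bk101, in document order (two authors, then the title)
pairs = [
    ["author", "Gambardella, Matthew"],
    ["author", "Arciniegas, Fabio"],
    ["title", "XML Developer's Guide"],
]

# dict() keeps the last value for a repeated key, matching the bk101 row above
as_map = dict(pairs)

# Grouping keeps every value for a repeated key
grouped = {}
for key, value in pairs:
    grouped.setdefault(key, []).append(value)

print(as_map)
print(grouped)
```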
And now we can cleanly access the attributes from the map.
call apoc.load.xml("https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.1/src/test/resources/xml/books.xml")
yield value as catalog
UNWIND catalog._children as book
WITH book.id as id, [attr IN book._children WHERE attr._type IN ['author','title'] | [attr._type, attr._text]] as pairs
WITH id, apoc.map.fromPairs(pairs) AS value
RETURN id, value.title, value.author
id | value.title | value.author |
---|---|---|
"bk101" | "XML Developer’s Guide" | "Arciniegas, Fabio" |
"bk102" | "Midnight Rain" | "Ralls, Kim" |
"bk103" | "Maeve Ascendant" | "Corets, Eva" |
"bk104" | "Oberon’s Legacy" | "Corets, Eva" |
"bk105" | "The Sundered Grail" | "Corets, Eva" |
"bk106" | "Lover Birds" | "Randall, Cynthia" |
"bk107" | "Splish Splash" | "Thurman, Paula" |
"bk108" | "Creepy Crawlies" | "Knorr, Stefan" |
"bk109" | "Paradox Lost" | "Kress, Peter" |
"bk110" | "Microsoft .NET: The Programming Bible" | "O’Brien, Tim" |
"bk111" | "MSXML3: A Comprehensive Guide" | "O’Brien, Tim" |
"bk112" | "Visual Studio 7: A Comprehensive Guide" | "Galos, Mike" |
Binary file
You can also import a file from a binary byte[], either uncompressed or compressed (the supported compression algorithms are GZIP, BZIP2, DEFLATE, BLOCK_LZ4, and FRAMED_SNAPPY):
CALL apoc.load.xml(`binaryGzipByteArray`, '/', {compression: 'GZIP'})
or:
CALL apoc.load.xml(`binaryFileNotCompressed`, '/', {compression: 'NONE'})
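Conceptually, the compression option tells the procedure how to turn the bytes back into XML text before parsing. The Python sketch below illustrates that step only for the algorithms with a standard-library counterpart (GZIP via gzip, BZIP2 via bz2) and assumes UTF-8 text. DEFLATE, BLOCK_LZ4, and FRAMED_SNAPPY are deliberately left out because their exact byte framing in APOC is not described here. The DECOMPRESSORS table and decode_xml_bytes function are hypothetical.

```python
import bz2
import gzip

# Hypothetical mapping from the `compression` config value to a decompressor
DECOMPRESSORS = {
    "NONE": lambda data: data,
    "GZIP": gzip.decompress,
    "BZIP2": bz2.decompress,
}


def decode_xml_bytes(data: bytes, compression: str = "NONE") -> str:
    """Decompress `data` according to `compression` and decode it as UTF-8 text."""
    try:
        decompress = DECOMPRESSORS[compression.upper()]
    except KeyError:
        raise ValueError(f"compression not covered by this sketch: {compression}") from None
    return decompress(data).decode("utf-8")


xml_bytes = b'<parent name="databases"/>'
print(decode_xml_bytes(gzip.compress(xml_bytes), "GZIP"))
print(decode_xml_bytes(xml_bytes))  # uncompressed input, like {compression: 'NONE'}
```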
For example, this works well with the apoc.util.compress function:
WITH apoc.util.compress('<?xml version="1.0" encoding="UTF-8"?>
<parent name="databases">
<child name="Neo4j">
Neo4j is a graph database
</child>
<child name="relational">
<grandchild name="MySQL"><![CDATA[
MySQL is a database & relational
]]>
</grandchild>
<grandchild name="Postgres">
Postgres is a relational database
</grandchild>
</child>
</parent>', {compression: 'DEFLATE'}) as xmlCompressed
CALL apoc.load.xml(xmlCompressed, '/', {compression: 'DEFLATE'})
YIELD value
RETURN value
value |
---|
{ "_type": "parent", "name": "databases", "_children": [{ "_type": "child", "name": "Neo4j", "_text": "Neo4j is a graph database" }, { "_type": "child", "name": "relational", "_children": [{ "_type": "grandchild", "name": "MySQL", "_text": "MySQL is a database & relational" }, { "_type": "grandchild", "name": "Postgres", "_text": "Postgres is a relational database" } ] } ] } |
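To check the structure above end to end, the following Python sketch compresses the same XML, decompresses it, and converts it into a nested map with _type, attribute, _text, and _children keys. Python's gzip module stands in for apoc.util.compress with DEFLATE, and no byte-level compatibility with APOC is implied; the function name to_map is hypothetical, and collapsing whitespace is an assumption made to match the output shown.

```python
import gzip
import xml.etree.ElementTree as ET

XML = """<?xml version="1.0" encoding="UTF-8"?>
<parent name="databases">
<child name="Neo4j">
Neo4j is a graph database
</child>
<child name="relational">
<grandchild name="MySQL"><![CDATA[
MySQL is a database & relational
]]>
</grandchild>
<grandchild name="Postgres">
Postgres is a relational database
</grandchild>
</child>
</parent>"""


def to_map(elem: ET.Element) -> dict:
    """Approximate apoc.load.xml's default map layout (illustrative only)."""
    node = {"_type": elem.tag, **elem.attrib}
    # CDATA content arrives as plain text; whitespace is collapsed
    text = " ".join((elem.text or "").split())
    if text:
        node["_text"] = text
    children = [to_map(child) for child in elem]
    if children:
        node["_children"] = children
    return node


compressed = gzip.compress(XML.encode("utf-8"))  # stand-in for apoc.util.compress
root = ET.fromstring(gzip.decompress(compressed))
print(to_map(root))
```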