Load XML

Many existing enterprise applications, endpoints, and files use XML as data exchange format. The Load XML procedures allow us to process these files.

Procedure and Function Overview

The table below describes the available procedures and functions:

type qualified name signature description

type	qualified name	signature	description
procedure	`apoc.load.xml`	`apoc.load.xml(url :: STRING?, path = / :: STRING?, config = {} :: MAP?, simple = false :: BOOLEAN?) :: (value :: MAP?)`	apoc.load.xml('http://example.com/test.xml', 'xPath',config, false) YIELD value as doc CREATE (p:Person) SET p.name = doc.name load from XML URL (e.g. web-api) to import XML as single nested map with attributes and _type, _text and _childrenx fields.
function	`apoc.xml.parse`	`apoc.xml.parse(data :: STRING?, path = / :: STRING?, config = {} :: MAP?, simple = false :: BOOLEAN?) :: (MAP?)`	RETURN apoc.xml.parse(<xml string>, <xPath string>, config, false) AS value
procedure	`apoc.import.xml`	`apoc.import.xml(url :: STRING?, config = {} :: MAP?) :: (node :: NODE?)`	apoc.import.xml(file,config) - imports graph from provided file

procedure

apoc.load.xml

apoc.load.xml(url :: STRING?, path = / :: STRING?, config = {} :: MAP?, simple = false :: BOOLEAN?) :: (value :: MAP?)

apoc.load.xml('http://example.com/test.xml', 'xPath',config, false) YIELD value as doc CREATE (p:Person) SET p.name = doc.name load from XML URL (e.g. web-api) to import XML as single nested map with attributes and _type, _text and _childrenx fields.

function

apoc.xml.parse

apoc.xml.parse(data :: STRING?, path = / :: STRING?, config = {} :: MAP?, simple = false :: BOOLEAN?) :: (MAP?)

RETURN apoc.xml.parse(<xml string>, <xPath string>, config, false) AS value

procedure

apoc.import.xml

apoc.import.xml(url :: STRING?, config = {} :: MAP?) :: (node :: NODE?)

apoc.import.xml(file,config) - imports graph from provided file

`apoc.load.xml`

This procedure takes a file or HTTP URL and parses the XML into a map data structure.

signature

signature
`apoc.load.xml(url :: STRING?, path = / :: STRING?, config = {} :: MAP?, simple = false :: BOOLEAN?) :: (value :: MAP?)`

apoc.load.xml(url :: STRING?, path = / :: STRING?, config = {} :: MAP?, simple = false :: BOOLEAN?) :: (value :: MAP?)

The map is created using the following rules:

in simple mode, each type of children has its own entry in the parent map.
the element-type as key is prefixed with _ to prevent collisions with attributes.
if there is a single element, the entry will just have that element as value, not a collection.
if there is more than one element, there will be a list of values.
each child will still have its _type field to discern them.

This procedure supports the following config parameters:

Table 1. Config
name	type	default	description
failOnError	boolean	true	fail if error encountered while parsing XML
headers	Map	{}	HTTP headers to be used when querying XML document

`apoc.xml.parse`

If our dataset contains nodes with XML as property values, they can be parsed into maps with the apoc.xml.parse function.

signature

signature
`apoc.xml.parse(data :: STRING?, path = / :: STRING?, config = {} :: MAP?, simple = false :: BOOLEAN?) :: (MAP?)`

apoc.xml.parse(data :: STRING?, path = / :: STRING?, config = {} :: MAP?, simple = false :: BOOLEAN?) :: (MAP?)

This function supports the following config parameter:

Table 2. Config
name	type	default	description
failOnError	boolean	true	fail if error encountered while parsing XML

The following parses an XML string into a Cypher map

WITH '<?xml version="1.0"?><table><tr><td><img src="pix/logo-tl.gif"></img></td></tr></table>' AS xmlString
RETURN apoc.xml.parse(xmlString) AS value

Table 3. Results
value
{_type: "table", _children: [{_type: "tr", _children: [{_type: "td", _children: [{_type: "img", src: "pix/logo-tl.gif"}]}]}]}

`apoc.import.xml`

If we don’t want to do any transformation of the XML before creating a graph structure, we can create a 1:1 mapping of XML into the graph using the apoc.import.xml procedure.

signature

signature
`apoc.import.xml(url :: STRING?, config = {} :: MAP?) :: (node :: NODE?)`

apoc.import.xml(url :: STRING?, config = {} :: MAP?) :: (node :: NODE?)

This procedure will return a node representing the XML document containing nodes and relationships underneath mapping to the XML structure.

The following mapping rules are applied:

xml	label	properties
document	XmlDocument	_xmlVersion, _xmlEncoding
processing instruction	XmlProcessingInstruction	_piData, _piTarget
Element/Tag	XmlTag	_name
Attribute	n/a	property in the XmlTag node
Text	XmlWord	for each word a separate node is created

xml

label

properties

document

XmlDocument

_xmlVersion, _xmlEncoding

processing instruction

XmlProcessingInstruction

_piData, _piTarget

Element/Tag

XmlTag

_name

Attribute

n/a

property in the XmlTag node

Text

XmlWord

for each word a separate node is created

The nodes for the XML document are connected:

relationship type description

relationship type	description
:IS_CHILD_OF	pointing to a nested xml element
:FIRST_CHILD_OF	pointing to the first child
:NEXT_SIBLING	pointing to the next xml element on the same nesting level
:NEXT	produces a linear chain through the full document
:NEXT_WORD	only produced if config map has `createNextWordRelationships:true`. Connects words in XML to a text flow.

:IS_CHILD_OF

pointing to a nested xml element

:FIRST_CHILD_OF

pointing to the first child

:NEXT_SIBLING

pointing to the next xml element on the same nesting level

:NEXT

produces a linear chain through the full document

:NEXT_WORD

only produced if config map has createNextWordRelationships:true. Connects words in XML to a text flow.

This procedure supports the following config parameters:

Table 4. Config
config option	default value	description
connectCharacters	false	if `true` the xml text elements are child nodes of their tags, interconnected by relationships of type `relType` (see below)
filterLeadingWhitespace	false	if `true` leading whitespace is skipped for each line
delimiter	`\s` (regex whitespace)	if given, split text elements with the delimiter into separate nodes
label	XmlCharacter	label to use for text element representation
relType	`NE`	relationship type to be used for connecting the text elements into one linked list
charactersForTag	{}	map of tagname → string. For the given tag names an additional text element is added containing the value as `text` property. Useful e.g. for `<lb/>` tags in TEI-XML to be represented as `<lb> </lb>`.

Importing from a file

By default importing from the file system is disabled. We can enable it by setting the following property in apoc.conf:

apoc.conf

apoc.import.file.enabled=true

If we try to use any of the import procedures without having first set this property, we’ll get the following error message:

Failed to invoke procedure: Caused by: java.lang.RuntimeException: Import from files not enabled, please set apoc.import.file.enabled=true in your apoc.conf

Import files are read from the import directory, which is defined by the dbms.directories.import property. This means that any file path that we provide is relative to this directory. If we try to read from an absolute path, such as /tmp/filename, we’ll get an error message similar to the following one:

Failed to invoke procedure: Caused by: java.lang.RuntimeException: Can’t read url or key file:/path/to/neo4j/import/tmp/filename as json: /path/to/neo4j//import/tmp/filename (No such file or directory)

We can enable reading files from anywhere on the file system by setting the following property in apoc.conf:

apoc.conf

apoc.import.file.use_neo4j_config=false

Neo4j will now be able to read from anywhere on the file system, so be sure that this is your intention before setting this property.

Examples

The examples in this section are based on the Microsoft book.xml file.

book.xml

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies,
...

This file can be downloaded from GitHub.

Import from local file

The books.xml file described below contains the first two books from the Microsoft Books XML file. We’ll use the smaller file in this section to simplify our examples.

books.xml

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <author>Arciniegas, Fabio</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies,
      an evil sorceress, and her own childhood to become queen
      of the world.</description>
   </book>
</catalog>

We’ll place this file into the import directory of our Neo4j instance. Let’s now write a query using the apoc.load.xml procedure to explore this file.

The following query processes books.xml and returns the content as Cypher data structures

CALL apoc.load.xml("file:///books.xml")
YIELD value
RETURN value

Table 5. Results
value
{_type: "catalog", _children: [{_type: "book", _children: [{_type: "author", _text: "Gambardella, Matthew"}, {_type: "author", _text: "Arciniegas, Fabio"}, {_type: "title", _text: "XML Developer’s Guide"}, {_type: "genre", _text: "Computer"}, {_type: "price", _text: "44.95"}, {_type: "publish_date", _text: "2000-10-01"}, {_type: "description", _text: "An in-depth look at creating applications with XML."}], id: "bk101"}, {_type: "book", _children: [{_type: "author", _text: "Ralls, Kim"}, {_type: "title", _text: "Midnight Rain"}, {_type: "genre", _text: "Fantasy"}, {_type: "price", _text: "5.95"}, {_type: "publish_date", _text: "2000-12-16"}, {_type: "description", _text: "A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world."}], id: "bk102"}]}

We get back a map representing the XML structure. Every time an XML element is nested inside another one, it is accessible via the .children property. We can write the following query to get a better understanding of what our file contains.

The following query processes book.xml and parses the results to pull out the title, description, genre, and authors

CALL apoc.load.xml("file:///books.xml")
YIELD value
UNWIND value._children AS book
RETURN book.id AS bookId,
       [item in book._children WHERE item._type = "title"][0] AS title,
       [item in book._children WHERE item._type = "description"][0] AS description,
       [item in book._children WHERE item._type = "author"] AS authors,
       [item in book._children WHERE item._type = "genre"][0] AS genre;

Table 6. Results
bookId	title	description	authors	genre
"bk101"	{_type: "title", _text: "XML Developer’s Guide"}	{_type: "description", _text: "An in-depth look at creating applications with XML."}	[{_type: "author", _text: "Gambardella, Matthew"}, {_type: "author", _text: "Arciniegas, Fabio"}]	{_type: "genre", _text: "Computer"}
"bk102"	{_type: "title", _text: "Midnight Rain"}	{_type: "description", _text: "A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world."}	[{_type: "author", _text: "Ralls, Kim"}]	{_type: "genre", _text: "Fantasy"}

Let’s now create a graph of books and their metadata, authors, and genres.

The following query processes book.xml and parses the results to pull out the title, description, genre, and authors

CALL apoc.load.xml("file:///books.xml")
YIELD value
UNWIND value._children AS book

WITH book.id AS bookId,
     [item in book._children WHERE item._type = "title"][0] AS title,
     [item in book._children WHERE item._type = "description"][0] AS description,
     [item in book._children WHERE item._type = "author"] AS authors,
     [item in book._children WHERE item._type = "genre"][0] AS genre

MERGE (b:Book {id: bookId})
SET b.title = title._text, b.description = description._text

MERGE (g:Genre {name: genre._text})
MERGE (b)-[:HAS_GENRE]->(g)

WITH b, authors
UNWIND authors AS author
MERGE (a:Author {name:author._text})
MERGE (a)-[:WROTE]->(b);

The Neo4j Browser visualization below shows the imported graph:

Import from GitHub

We can also process XML files from HTTP or HTTPS URIs. Let’s start by processing the books.xml file hosted on GitHub.

This time we’ll pass in true as the 4th argument of the procedure. This means that the XML will be parsed in simple mode.

The following query loads the books.xml file from GitHub using simple mode

WITH "https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.0/src/test/resources/xml/books.xml" AS uri
CALL apoc.load.xml(uri, '', {}, true)
YIELD value
RETURN value;

Table 7. Results
value
{_type: "catalog", _catalog: [{_type: "book", _book: [{_type: "author", _text: "Gambardella, Matthew"}, {_type: "author", _text: "Arciniegas, Fabio"}, {_type: "title", _text: "XML Developer’s Guide"}, {_type: "genre", _text: "Computer"}, {_type: "price", _text: "44.95"}, {_type: "publish_date", _text: "2000-10-01"}, {_type: "description", _text: "An in-depth look at creating applications with XML."}], id: "bk101"}, {_type: "book", _book: [{_type: "author", _text: "Ralls, Kim"}, {_type: "title", _text: "Midnight Rain"}, {_type: "genre", _text: "Fantasy"}, {_type: "price", _text: "5.95"}, {_type: "publish_date", _text: "2000-12-16"}, {_type: "description", _text: "A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world."}], id: "bk102"}, {_type: "book", _book: [{_type: "author", _text: "Corets, Eva"}, {_type: "title", _text: "Maeve Ascendant"}, {_type: "genre", _text: "Fantasy"}, {_type: "price", _text: "5.95"}, {_type: "publish_date", _text: "2000-11-17"}, {_type: "description", _text: "After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society."}], id: "bk103"}, {_type: "book", _book: [{_type: "author", _text: "Corets, Eva"}, {_type: "title", _text: "Oberon’s Legacy"}, {_type: "genre", _text: "Fantasy"}, {_type: "price", _text: "5.95"}, {_type: "publish_date", _text: "2001-03-10"}, {_type: "description", _text: "In post-apocalypse England, the mysterious agent known only as Oberon helps to create a new life for the inhabitants of London. Sequel to Maeve Ascendant."}], id: "bk104"}, {_type: "book", _book: [{_type: "author", _text: "Corets, Eva"}, {_type: "title", _text: "The Sundered Grail"}, {_type: "genre", _text: "Fantasy"}, {_type: "price", _text: "5.95"}, {_type: "publish_date", _text: "2001-09-10"}, {_type: "description", _text: "The two daughters of Maeve, half-sisters, battle one another for control of England. Sequel to Oberon’s Legacy."}], id: "bk105"}, {_type: "book", _book: [{_type: "author", _text: "Randall, Cynthia"}, {_type: "title", _text: "Lover Birds"}, {_type: "genre", _text: "Romance"}, {_type: "price", _text: "4.95"}, {_type: "publish_date", _text: "2000-09-02"}, {_type: "description", _text: "When Carla meets Paul at an ornithology conference, tempers fly as feathers get ruffled."}], id: "bk106"}, {_type: "book", _book: [{_type: "author", _text: "Thurman, Paula"}, {_type: "title", _text: "Splish Splash"}, {_type: "genre", _text: "Romance"}, {_type: "price", _text: "4.95"}, {_type: "publish_date", _text: "2000-11-02"}, {_type: "description", _text: "A deep sea diver finds true love twenty thousand leagues beneath the sea."}], id: "bk107"}, {_type: "book", _book: [{_type: "author", _text: "Knorr, Stefan"}, {_type: "title", _text: "Creepy Crawlies"}, {_type: "genre", _text: "Horror"}, {_type: "price", _text: "4.95"}, {_type: "publish_date", _text: "2000-12-06"}, {_type: "description", _text: "An anthology of horror stories about roaches, centipedes, scorpions and other insects."}], id: "bk108"}, {_type: "book", _book: [{_type: "author", _text: "Kress, Peter"}, {_type: "title", _text: "Paradox Lost"}, {_type: "genre", _text: "Science Fiction"}, {_type: "price", _text: "6.95"}, {_type: "publish_date", _text: "2000-11-02"}, {_type: "description", _text: "After an inadvertant trip through a Heisenberg Uncertainty Device, James Salway discovers the problems of being quantum."}], id: "bk109"}, {_type: "book", _book: [{_type: "author", _text: "O’Brien, Tim"}, {_type: "title", _text: "Microsoft .NET: The Programming Bible"}, {_type: "genre", _text: "Computer"}, {_type: "price", _text: "36.95"}, {_type: "publish_date", _text: "2000-12-09"}, {_type: "description", _text: "Microsoft’s .NET initiative is explored in detail in this deep programmer’s reference."}], id: "bk110"}, {_type: "book", _book: [{_type: "author", _text: "O’Brien, Tim"}, {_type: "title", _text: "MSXML3: A Comprehensive Guide"}, {_type: "genre", _text: "Computer"}, {_type: "price", _text: "36.95"}, {_type: "publish_date", _text: "2000-12-01"}, {_type: "description", _text: "The Microsoft MSXML3 parser is covered in detail, with attention to XML DOM interfaces, XSLT processing, SAX and more."}], id: "bk111"}, {_type: "book", _book: [{_type: "author", _text: "Galos, Mike"}, {_type: "title", _text: "Visual Studio 7: A Comprehensive Guide"}, {_type: "genre", _text: "Computer"}, {_type: "price", _text: "49.95"}, {_type: "publish_date", _text: "2001-04-16"}, {_type: "description", _text: "Microsoft Visual Studio 7 is explored in depth, looking at how Visual Basic, Visual C+, C#, and ASP are integrated into a comprehensive development environment."}], id: "bk112"}]}

We again get back back a map representing the XML structure, but the structure is different than when we don’t use simple mode. This time nested XML elements are accessible via a property of the element name prefixed with an _.

We can write the following query to get a better understanding of what our file contains.

The following query processes book.xml and parses the results to pull out the title, description, genre, and authors

WITH "https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.0/src/test/resources/xml/books.xml" AS uri
CALL apoc.load.xml(uri, '', {}, true)
YIELD value
UNWIND value._catalog AS catalog
RETURN catalog.id AS bookId,
       [item in catalog._book WHERE item._type = "title"][0] AS title,
       [item in catalog._book WHERE item._type = "description"][0] AS description,
       [item in catalog._book WHERE item._type = "author"] AS authors,
       [item in catalog._book WHERE item._type = "genre"][0] AS genre;

Table 8. Results
bookId	title	description	authors	genre
"bk101"	{_type: "title", _text: "XML Developer’s Guide"}	{_type: "description", _text: "An in-depth look at creating applications with XML."}	[{_type: "author", _text: "Gambardella, Matthew"}, {_type: "author", _text: "Arciniegas, Fabio"}]	{_type: "genre", _text: "Computer"}
"bk102"	{_type: "title", _text: "Midnight Rain"}	{_type: "description", _text: "A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world."}	[{_type: "author", _text: "Ralls, Kim"}]	{_type: "genre", _text: "Fantasy"}
"bk103"	{_type: "title", _text: "Maeve Ascendant"}	{_type: "description", _text: "After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society."}	[{_type: "author", _text: "Corets, Eva"}]	{_type: "genre", _text: "Fantasy"}
"bk104"	{_type: "title", _text: "Oberon’s Legacy"}	{_type: "description", _text: "In post-apocalypse England, the mysterious agent known only as Oberon helps to create a new life for the inhabitants of London. Sequel to Maeve Ascendant."}	[{_type: "author", _text: "Corets, Eva"}]	{_type: "genre", _text: "Fantasy"}
"bk105"	{_type: "title", _text: "The Sundered Grail"}	{_type: "description", _text: "The two daughters of Maeve, half-sisters, battle one another for control of England. Sequel to Oberon’s Legacy."}	[{_type: "author", _text: "Corets, Eva"}]	{_type: "genre", _text: "Fantasy"}
"bk106"	{_type: "title", _text: "Lover Birds"}	{_type: "description", _text: "When Carla meets Paul at an ornithology conference, tempers fly as feathers get ruffled."}	[{_type: "author", _text: "Randall, Cynthia"}]	{_type: "genre", _text: "Romance"}
"bk107"	{_type: "title", _text: "Splish Splash"}	{_type: "description", _text: "A deep sea diver finds true love twenty thousand leagues beneath the sea."}	[{_type: "author", _text: "Thurman, Paula"}]	{_type: "genre", _text: "Romance"}
"bk108"	{_type: "title", _text: "Creepy Crawlies"}	{_type: "description", _text: "An anthology of horror stories about roaches, centipedes, scorpions and other insects."}	[{_type: "author", _text: "Knorr, Stefan"}]	{_type: "genre", _text: "Horror"}
"bk109"	{_type: "title", _text: "Paradox Lost"}	{_type: "description", _text: "After an inadvertant trip through a Heisenberg Uncertainty Device, James Salway discovers the problems of being quantum."}	[{_type: "author", _text: "Kress, Peter"}]	{_type: "genre", _text: "Science Fiction"}
"bk110"	{_type: "title", _text: "Microsoft .NET: The Programming Bible"}	{_type: "description", _text: "Microsoft’s .NET initiative is explored in detail in this deep programmer’s reference."}	[{_type: "author", _text: "O’Brien, Tim"}]	{_type: "genre", _text: "Computer"}
"bk111"	{_type: "title", _text: "MSXML3: A Comprehensive Guide"}	{_type: "description", _text: "The Microsoft MSXML3 parser is covered in detail, with attention to XML DOM interfaces, XSLT processing, SAX and more."}	[{_type: "author", _text: "O’Brien, Tim"}]	{_type: "genre", _text: "Computer"}
"bk112"	{_type: "title", _text: "Visual Studio 7: A Comprehensive Guide"}	{_type: "description", _text: "Microsoft Visual Studio 7 is explored in depth, looking at how Visual Basic, Visual C+, C#, and ASP are integrated into a comprehensive development environment."}	[{_type: "author", _text: "Galos, Mike"}]	{_type: "genre", _text: "Computer"}

Rather than just returning that data, we can create a graph of books and their metadata, authors, and genres.

The following query processes book.xml and parses the results to pull out the title, description, genre, and authors

WITH "https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.0/src/test/resources/xml/books.xml" AS uri
CALL apoc.load.xml(uri, '', {}, true)
YIELD value
UNWIND value._catalog AS catalog
WITH catalog.id AS bookId,
       [item in catalog._book WHERE item._type = "title"][0] AS title,
       [item in catalog._book WHERE item._type = "description"][0] AS description,
       [item in catalog._book WHERE item._type = "author"] AS authors,
       [item in catalog._book WHERE item._type = "genre"][0] AS genre

MERGE (b:Book {id: bookId})
SET b.title = title._text, b.description = description._text

MERGE (g:Genre {name: genre._text})
MERGE (b)-[:HAS_GENRE]->(g)

WITH b, authors
UNWIND authors AS author
MERGE (a:Author {name:author._text})
MERGE (a)-[:WROTE]->(b);

The Neo4j Browser visualization below shows the imported graph:

xPath expressions

We can also provide an xPath expression to select nodes from an XML document. If we only want to return books that have the Computer genre, we could write the following query:

CALL apoc.load.xml(
  "https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.0/src/test/resources/xml/books.xml",
  '/catalog/book[genre=\"Computer\"]'
)
YIELD value as book
WITH book.id as id, [attr IN book._children WHERE attr._type IN ['title','price'] | attr._text] as pairs
RETURN id, pairs[0] as title, pairs[1] as price;

Table 9. Results
id	title	price
"bk101"	"XML Developer’s Guide"	"44.95"
"bk110"	"Microsoft .NET: The Programming Bible"	"36.95"
"bk111"	"MSXML3: A Comprehensive Guide"	"36.95"
"bk112"	"Visual Studio 7: A Comprehensive Guide"	"49.95"

In this case we return only id, title and prize but we can return any other elements

We can also return just a single specific element. For example, the following query returns the author of the book with id = bg102

CALL apoc.load.xml(
  'https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.0/src/test/resources/xml/books.xml',
  '/catalog/book[@id="bk102"]/author'
)
YIELD value as result
WITH result._text as author
RETURN author;

Table 10. Results
author
"Ralls, Kim"

Extracting data structures

We can turn values into a map using the apoc.map.fromPairs function.

call apoc.load.xml("https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.0/src/test/resources/xml/books.xml")
yield value as catalog
UNWIND catalog._children as book
WITH book.id as id, [attr IN book._children WHERE attr._type IN ['author','title'] | [attr._type, attr._text]] as pairs
WITH id, apoc.map.fromPairs(pairs) AS value
RETURN id, value

Table 11. Results
id	value
"bk101"	{title: "XML Developer’s Guide", author: "Arciniegas, Fabio"}
"bk102"	{title: "Midnight Rain", author: "Ralls, Kim"}
"bk103"	{title: "Maeve Ascendant", author: "Corets, Eva"}
"bk104"	{title: "Oberon’s Legacy", author: "Corets, Eva"}
"bk105"	{title: "The Sundered Grail", author: "Corets, Eva"}
"bk106"	{title: "Lover Birds", author: "Randall, Cynthia"}
"bk107"	{title: "Splish Splash", author: "Thurman, Paula"}
"bk108"	{title: "Creepy Crawlies", author: "Knorr, Stefan"}
"bk109"	{title: "Paradox Lost", author: "Kress, Peter"}
"bk110"	{title: "Microsoft .NET: The Programming Bible", author: "O’Brien, Tim"}
"bk111"	{title: "MSXML3: A Comprehensive Guide", author: "O’Brien, Tim"}
"bk112"	{title: "Visual Studio 7: A Comprehensive Guide", author: "Galos, Mike"}

And now we can cleanly access the attributes from the map.

call apoc.load.xml("https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.0/src/test/resources/xml/books.xml")
yield value as catalog
UNWIND catalog._children as book
WITH book.id as id, [attr IN book._children WHERE attr._type IN ['author','title'] | [attr._type, attr._text]] as pairs
WITH id, apoc.map.fromPairs(pairs) AS value
RETURN id, value.title, value.author

Table 12. Results
id	value.title	value.author
"bk101"	"XML Developer’s Guide"	"Arciniegas, Fabio"
"bk102"	"Midnight Rain"	"Ralls, Kim"
"bk103"	"Maeve Ascendant"	"Corets, Eva"
"bk104"	"Oberon’s Legacy"	"Corets, Eva"
"bk105"	"The Sundered Grail"	"Corets, Eva"
"bk106"	"Lover Birds"	"Randall, Cynthia"
"bk107"	"Splish Splash"	"Thurman, Paula"
"bk108"	"Creepy Crawlies"	"Knorr, Stefan"
"bk109"	"Paradox Lost"	"Kress, Peter"
"bk110"	"Microsoft .NET: The Programming Bible"	"O’Brien, Tim"
"bk111"	"MSXML3: A Comprehensive Guide"	"O’Brien, Tim"
"bk112"	"Visual Studio 7: A Comprehensive Guide"	"Galos, Mike"

Import XML directly

We can write the following query to create a graph structure of the Microsoft books XML file.

The following creates a graph structure based on the contents of books.xml

CALL apoc.import.xml(
  "https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.0/src/test/resources/xml/books.xml",
  {relType:'NEXT_WORD', label:'XmlWord'}
)
YIELD node
RETURN node;

node
(:XmlDocument {_xmlVersion: "1.0", _xmlEncoding: "UTF-8", url: "https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.0/src/test/resources/xml/books.xml"})

node

(:XmlDocument {_xmlVersion: "1.0", _xmlEncoding: "UTF-8", url: "https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/4.0/src/test/resources/xml/books.xml"})

The Neo4j Browser visualization below shows the imported graph: