Previewing RDF data
Before importing RDF data into Neo4j, we sometimes want to visualise it and confirm that the configuration parameters produce the desired graph. That is what the .preview procedures are for. In other situations, we may want to take control of the data ingestion process and customise it at the triple level. That is the purpose of the .stream procedures. Both are described below.
Previewing RDF data
The preview procedures n10s.rdf.preview and n10s.onto.preview generate a visual preview of the graph in the Neo4j browser. They are identical in structure to their import counterparts and also come in the usual .fetch and .inline modes. They take the same parameters as the import procedures:
Parameter | Type | Description |
---|---|---|
url or payload | String | URL to retrieve the dataset (fetch mode) or RDF snippet (inline mode) |
format | String | Serialization format. Valid formats are: Turtle, N-Triples, JSON-LD, RDF/XML, TriG and N-Quads (for named graphs) |
params | Map | Optional set of parameters (see description in table below) |
Here is an example of use in .inline mode with a fragment of RDF/XML:
call n10s.rdf.preview.inline('
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:music="http://neo4j.com/voc/music#">
<rdf:Description rdf:about="http://neo4j.com/indiv#The_Beatles">
<rdf:type rdf:resource="http://neo4j.com/voc/music#Band"/>
<music:name>The Beatles</music:name>
<music:member rdf:resource="http://neo4j.com/indiv#John_Lennon"/>
</rdf:Description>
<music:Artist rdf:about="http://neo4j.com/indiv#John_Lennon">
<music:name>John Lennon</music:name>
</music:Artist>
<music:Song rdf:about="http://neo4j.com/indiv#Helter_Skelter">
<music:name>Helter Skelter</music:name>
<music:length rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">269</music:length>
<music:writer rdf:resource="http://neo4j.com/indiv#John_Lennon"/>
</music:Song>
</rdf:RDF>
','RDF/XML')
which produces the following output in the Neo4j browser:
Preview also works in .fetch mode:
call n10s.rdf.preview.fetch("https://raw.githubusercontent.com/jbarrasa/datasets/master/rdf/music.ttl","Turtle")
which produces the following output in the Neo4j browser:
There are also n10s.onto.preview procedures that implement the same import logic described in the ontology import section, but produce a preview instead of persisting the results in the DB.
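As a minimal sketch of an ontology preview, the call below passes a small Turtle ontology as text. It assumes the inline variant is named n10s.onto.preview.inline, following the .fetch and .inline pattern described above. The two classes and the subclass relationship are hypothetical and only serve as illustration.

```cypher
// Hypothetical ontology fragment: two classes and one subclass relationship.
// Nothing is persisted; the result is only rendered as a preview.
CALL n10s.onto.preview.inline('
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix music: <http://neo4j.com/voc/music#> .

music:Artist a owl:Class ; rdfs:label "Artist" .
music:Band a owl:Class ; rdfs:label "Band" ; rdfs:subClassOf music:Artist .
','Turtle')
```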
Note that by default, preview (and stream) procedures parse only the first 1000 triples. To go beyond that, override the limit parameter, for example { limit: 9999 }.
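For instance, the .fetch preview shown earlier could be rerun with a higher limit by passing the params map as the third argument. This is a sketch that reuses the dataset URL and the limit value from this page.

```cypher
// Same preview as before, but parsing up to 9999 triples instead of the default 1000.
CALL n10s.rdf.preview.fetch(
  "https://raw.githubusercontent.com/jbarrasa/datasets/master/rdf/music.ttl",
  "Turtle",
  { limit: 9999 }
)
```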
Streaming triples
The streaming procedures only parse the input RDF and stream it as a result. Nothing is written to the DB. These procedures take the following parameters:
Parameter | Type | Description |
---|---|---|
url | String | URL of the dataset |
format | String | Serialization format. Valid formats are: Turtle, N-Triples, JSON-LD, RDF/XML, TriG and N-Quads (for named graphs) |
params | Map | Optional set of parameters (see description in table below) |
We invoke it with a URL and a serialisation format, just as we would invoke the n10s.rdf.import.fetch procedure:
CALL n10s.rdf.stream.fetch("https://github.com/neo4j-labs/neosemantics/raw/3.5/docs/rdf/nsmntx.ttl","Turtle");
It produces a stream of records, each representing one parsed triple. Each record has fields for the subject, predicate and object, plus three additional ones:
- isLiteral: a boolean indicating whether the object of the statement is a literal
- literalType: the datatype of the literal value, when available
- literalLang: the language of the literal, when available
In the previous example the output would look something like this:
The procedure is read-only and nothing is written to the graph. However, Cypher can be applied to the output of the procedure to analyse the returned triples, as in this first example:
CALL n10s.rdf.stream.fetch("https://github.com/neo4j-labs/neosemantics/raw/3.5/docs/rdf/nsmntx.ttl","Turtle") yield subject, predicate, object
WHERE predicate = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RETURN object as category, count(*) as itemsInCategory;
category | itemsInCategory |
---|---|
"http://neo4j.org/vocab/sw#Neo4jPlugin" | 3 |
"http://neo4j.org/vocab/sw#GraphPlatform" | 1 |
"http://neo4j.org/vocab/sw#AwesomePlatform" | 1 |
Or even to write to the graph and create your own custom structure, as in this second example:
CALL n10s.rdf.stream.fetch("https://github.com/neo4j-labs/neosemantics/raw/3.5/docs/rdf/nsmntx.ttl","Turtle")
YIELD subject, predicate, object, isLiteral
WHERE NOT ( isLiteral OR predicate = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" )
MERGE (from:Thing { id: subject})
MERGE (to:Thing { id: object })
MERGE (from)-[:CONNECTED_TO { id: predicate }]->(to);
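The literal-related fields can be used in the same way. The following sketch profiles the literals in the same dataset by datatype and language before deciding how to map them. It is read-only, and the shape of the result depends on the dataset.

```cypher
// Count literal-valued triples per datatype and language tag.
CALL n10s.rdf.stream.fetch("https://github.com/neo4j-labs/neosemantics/raw/3.5/docs/rdf/nsmntx.ttl","Turtle")
YIELD subject, predicate, object, isLiteral, literalType, literalLang
WHERE isLiteral
RETURN literalType, literalLang, count(*) AS literals
ORDER BY literals DESC;
```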
There is also an n10s.rdf.stream.inline procedure, identical to n10s.rdf.stream.fetch but taking the RDF input (first parameter) as a string. Here is an example of how it can be used:
WITH '<http://ind#123> <http://voc#property1> "Value"@en, "Valeur"@fr, "Valor"@es ; <http://voc#property2> 123 .' as rdfPayload
CALL n10s.rdf.stream.inline(rdfPayload,"Turtle")
YIELD subject, predicate, object, isLiteral, literalType, literalLang
WHERE predicate = 'http://voc#property1' AND literalLang = "es"
CREATE (:Thing { uri: subject, prop: object });
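The literalType field can drive type conversion in a similar way. The sketch below reuses the same payload and makes two assumptions that are not confirmed on this page: that object values are streamed as strings, and that literalType holds the full XSD datatype IRI.

```cypher
// Select the integer-typed literal and convert it from its string form.
// Assumes literalType is the full datatype IRI and object is a string.
WITH '<http://ind#123> <http://voc#property1> "Value"@en, "Valeur"@fr, "Valor"@es ; <http://voc#property2> 123 .' AS rdfPayload
CALL n10s.rdf.stream.inline(rdfPayload,"Turtle")
YIELD subject, predicate, object, isLiteral, literalType
WHERE isLiteral AND literalType = "http://www.w3.org/2001/XMLSchema#integer"
RETURN subject, predicate, toInteger(object) AS value;
```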
The n10s.rdf.stream.fetch and n10s.rdf.stream.inline procedures provide a convenient way to inspect some RDF data in the Neo4j browser before we go ahead with the actual import. Like all procedures in the Preview section, both are read-only and do not persist anything in the graph themselves. The difference between them is that n10s.rdf.stream.fetch takes a URL (and optionally additional configuration settings as described in Advanced Fetching), whereas n10s.rdf.stream.inline takes an RDF fragment as text instead.