[As community content, this post reflects the views and opinions of the particular author and does not necessarily reflect the official stance of Neo4j.]

Why Would You Want to Do This?

The Web Ontology Language (OWL) has been around for a while now and is used for a variety of semantic applications.

Ontologies are freely available and help developers to create models for real-world scenarios. They can be instantiated, combined and enriched using SWRL rules and a reasoner such as HermiT or Pellet.

The reasons for creating such a representation of data differ: Natural language processing, reusing data across domains or contextualisation are just some of many. The data obtained is then stored in a knowledge base, from which it can be retrieved using SPARQL queries.

But if it’s all already there, what’s the point of combining it with a graph database?

While SPARQL certainly has its strong points, such as the ability to use several ontologies at the same time and its similarity to the well-known SQL, it also has weaknesses.

Triple stores, which are the starting point for most SPARQL applications, tend to consume a lot of disk space compared to relational databases. They can also be slow for very large datasets.

On the other hand, Neo4j stores whole graphs as opposed to “just” triples. It has an easy-to-learn and easy-to-use query language and a web-based, graphical interface which allows users to easily browse and explore the graph.

Also, it is fast for querying and scales very well to handle larger datasets.

As is usually the case, there is no ideal solution; the right choice depends on your particular use case, considering, for example:
    • The amount, frequency and connectedness of incoming data
    • The importance of speed and size
    • The type of query executed on the database

The Playground

There is the PROV-O ontology, which models causation and influence between activities, agents and entities. This concept is fairly abstract but useful to answer a number of questions related to the origin of entities (and what may have influenced them throughout their lifetime).

The PROV-O ontology is applicable to a number of fields. For example, social networking (“Who was the author of the blog post that influenced Peter to write his mashup?”) or experimenting (“Who was the last person to access the experiment before it failed and when did he or she access it?”).

PROV-O is used in the BonFIRE project, which is a multi-site cloud experimentation and testing facility.

There are people (agents) conducting experiments using resources (entities). At an infrastructure level, to perform their experiments, they create, use and destroy (activities) compute nodes, storages and virtual networks (entities). After their experiment has finished, they download the results (entities) from the virtual machine for further analysis.

These results are influenced by a large number of activities and agents, and often it is difficult to determine how such a result came to be, who was involved in its formation or why it is different from other results. But using provenance, these questions can be answered.


In BonFIRE, the data arrives via RabbitMQ as a set of JSON messages. In the example used here, bert shut down compute node 123 located on server1.

Each message is read into Java classes, which are then used to transform it (“manually”) into triples.

Using the single message from the above example, we derive several triples that would look something like this:

:Action_state.shutdown_1375801302 rdf:type :Action
:Compute_/locations/server1/computes/123 rdf:type :Compute
:Compute_/locations/server1/computes/123 prov:wasInvalidatedBy :Action_state.shutdown_1375801302
:Experimenter_Bert rdf:type :Experimenter
:Action_state.shutdown_1375801302 prov:wasAssociatedWith :Experimenter_Bert

The prefixes used are defined in the ontology into which these triples are going to be imported.
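
The “manual” transformation could look roughly like the sketch below. It is only an illustration: the Message class and its field names (user, action, timestamp, resourcePath) are assumptions, since the actual BonFIRE message format and Java classes are not shown in this post. The sketch builds the identifiers and the rdf:type statements only; the statements that link the individuals would be produced in the same way.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only: the field names are assumptions, not the BonFIRE format. */
final class MessageToTriples {

    /** Hypothetical, already-parsed message. */
    static final class Message {
        final String user;         // e.g. "bert"
        final String action;       // e.g. "state.shutdown"
        final long timestamp;      // e.g. 1375801302
        final String resourcePath; // e.g. "/locations/server1/computes/123"

        Message(String user, String action, long timestamp, String resourcePath) {
            this.user = user;
            this.action = action;
            this.timestamp = timestamp;
            this.resourcePath = resourcePath;
        }
    }

    /** Formats one statement in the ":subject predicate :object" style used above. */
    static String triple(String subject, String predicate, String object) {
        return ":" + subject + " " + predicate + " :" + object;
    }

    /** Upper-cases the first character, so "bert" becomes "Bert". */
    static String capitalise(String s) {
        return s.isEmpty() ? s : Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    /** Derives the rdf:type statements for the action, the compute node and the experimenter. */
    static List<String> typeTriples(Message m) {
        String action = "Action_" + m.action + "_" + m.timestamp;
        String compute = "Compute_" + m.resourcePath;
        String experimenter = "Experimenter_" + capitalise(m.user);

        List<String> triples = new ArrayList<>();
        triples.add(triple(action, "rdf:type", "Action"));
        triples.add(triple(compute, "rdf:type", "Compute"));
        triples.add(triple(experimenter, "rdf:type", "Experimenter"));
        return triples;
    }

    public static void main(String[] args) {
        Message m = new Message("bert", "state.shutdown", 1375801302L,
                "/locations/server1/computes/123");
        typeTriples(m).forEach(System.out::println);
    }
}
```

Keeping the identifier scheme in one place like this means a change in the incoming messages only touches the code that fills the Message object.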

The above step is not necessary if the messages are supposed to go into the ontology directly – the OWL API could be used instead to create individuals, properties and so on.

Transforming them to triples, however, serves as an interface: data can be read from all kinds of sources as long as it’s formatted as triples. If the OWL API were used instead, the code would have to be changed every time the format of the incoming data changes.

These triples can then be added to an ontology using the OWLRDFConsumer class from the OWL API. This adds the triples to the ontology where the reasoner can be invoked to enrich the data.

So far, that’s not really special. The interesting bit follows after the reasoning has taken place.

Getting Graphy

Now there is this ontology object sitting in memory, which contains the ontology itself as well as the individuals that came from the triples. It could simply be stored in a knowledge base, but if it were, you wouldn’t be reading about it here 🙂

An ontology is a graph. It has a top node (owl:Thing) and classes extending it. There are individuals that belong to classes and object properties connecting the individuals. Individuals can have data properties and annotations that can be represented as node properties and relationship properties or as relationship types.

The import of an ontology is pretty straightforward:

Step 1

The only object you need is the ontology object created earlier.

It could also be loaded from a file, but that doesn’t make a difference.

private void importOntology(OWLOntology ontology) throws Exception {
     // Reasoner is presumably HermiT's org.semanticweb.HermiT.Reasoner
     OWLReasoner reasoner = new Reasoner(ontology);
     if (!reasoner.isConsistent()) {
          logger.error("Ontology is inconsistent");
          // throw your exception of choice here
          throw new Exception("Ontology is inconsistent");
     }
     // db is presumably the GraphDatabaseService to import into
     Transaction tx = db.beginTx();
     try {

Step 2

Create a starting node in Neo4j representing the owl:Thing node. This is the root node of the graph we’re going to create.

          Node thingNode = getOrCreateNodeWithUniqueFactory("owl:Thing");
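
The helper getOrCreateNodeWithUniqueFactory is not shown in this post. Its name suggests that it wraps Neo4j's UniqueFactory, so that asking for the same name twice returns the same node instead of creating a duplicate. The in-memory stand-in below only illustrates that get-or-create contract. The class and method names are hypothetical, and it deliberately does not use the Neo4j API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** In-memory stand-in for a "unique node per name" helper (hypothetical names). */
final class UniqueNodes {

    /** Minimal node: just a name, which also serves as its unique key. */
    static final class SketchNode {
        final String name;

        SketchNode(String name) {
            this.name = name;
        }
    }

    private final Map<String, SketchNode> byName = new LinkedHashMap<>();

    /** Returns the existing node for this name, or creates and remembers a new one. */
    SketchNode getOrCreate(String name) {
        return byName.computeIfAbsent(name, SketchNode::new);
    }

    /** Number of distinct nodes created so far. */
    int size() {
        return byName.size();
    }

    public static void main(String[] args) {
        UniqueNodes nodes = new UniqueNodes();
        SketchNode first = nodes.getOrCreate("owl:Thing");
        SketchNode second = nodes.getOrCreate("owl:Thing");
        System.out.println(first == second); // same node both times
        System.out.println(nodes.size());
    }
}
```

This property matters for the import: classes, individuals and property values are visited many times, and each name has to end up as exactly one node in the graph.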

Step 3

Get all the classes defined in the ontology and add them to the graph.

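          for (OWLClass c : ontology.getClassesInSignature(true)) {
               String classString = c.toString();
               if (classString.contains("#")) {
                    // keep the local name between '#' and the closing '>' (end index assumed)
                    classString = classString.substring(
                         classString.indexOf("#") + 1, classString.lastIndexOf(">"));
               }
               Node classNode = getOrCreateNodeWithUniqueFactory(classString);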

Step 4

Find out if they have any super classes. If they do, link them. If they don’t, link back to owl:Thing.

Make sure to link only to the direct super classes! The relationship type used here for the subclass link (and later for the rdf:type link between individuals and their classes) is a custom one named “isA”:

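               NodeSet<OWLClass> superclasses = reasoner.getSuperClasses(c, true);
               if (superclasses.isEmpty()) {
                    // no direct superclass: link back to owl:Thing (relationship calls are an assumed completion)
                    classNode.createRelationshipTo(thingNode,
                         DynamicRelationshipType.withName("isA"));
               } else {
                    for (org.semanticweb.owlapi.reasoner.Node<OWLClass>
                         parentOWLNode : superclasses) {
                         OWLClassExpression parent = parentOWLNode.getRepresentativeElement();
                         String parentString = parent.toString();
                         if (parentString.contains("#")) {
                              parentString = parentString.substring(
                                   parentString.indexOf("#") + 1, parentString.lastIndexOf(">"));
                         }
                         Node parentNode = getOrCreateNodeWithUniqueFactory(parentString);
                         classNode.createRelationshipTo(parentNode,
                              DynamicRelationshipType.withName("isA"));
                    }
               }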
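
Linking only the direct super classes loses nothing: indirect ancestors can still be reached by following isA relationships transitively, which is what a variable-length pattern such as [:isA*] does in Cypher. The sketch below shows the idea on a plain in-memory map. The small hierarchy (Experimenter under Agent under owl:Thing) and all names in the code are assumed for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Shows that direct isA links are enough to recover all ancestors. */
final class IsAClosure {

    /** Follows direct isA links transitively and returns every ancestor in discovery order. */
    static Set<String> ancestors(Map<String, List<String>> directIsA, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(directIsA.getOrDefault(start, List.of()));
        while (!todo.isEmpty()) {
            String current = todo.removeFirst();
            // only expand a class the first time it is reached
            if (seen.add(current)) {
                todo.addAll(directIsA.getOrDefault(current, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // assumed example hierarchy, direct links only
        Map<String, List<String>> directIsA = Map.of(
                "Experimenter", List.of("Agent"),
                "Agent", List.of("owl:Thing"));
        System.out.println(ancestors(directIsA, "Experimenter")); // prints [Agent, owl:Thing]
    }
}
```

Storing the indirect links as well would only add redundant relationships that every query then has to filter out again.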

Step 5

Now for each class, get all the individuals. Create nodes and link them back to their parent class.

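               for (org.semanticweb.owlapi.reasoner.Node<OWLNamedIndividual> in
                    : reasoner.getInstances(c, true)) {
                    OWLNamedIndividual i = in.getRepresentativeElement();
                    String indString = i.toString();
                    if (indString.contains("#")) {
                         indString = indString.substring(
                              indString.indexOf("#") + 1, indString.lastIndexOf(">"));
                    }
                    Node individualNode = getOrCreateNodeWithUniqueFactory(indString);
                    // link the individual back to its class (assumed completion)
                    individualNode.createRelationshipTo(classNode,
                         DynamicRelationshipType.withName("isA"));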

Step 6

For each individual, get all object properties and all data properties. Add them to the graph as relationships or node properties. Make sure to get the inferred values, not just the asserted ones, which is why the reasoner is queried here rather than the ontology itself.

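                    for (OWLObjectPropertyExpression objectProperty :
                         ontology.getObjectPropertiesInSignature()) {
                         for (org.semanticweb.owlapi.reasoner.Node<OWLNamedIndividual>
                              object : reasoner.getObjectPropertyValues(i, objectProperty)) {
                              String reltype = objectProperty.toString();
                              reltype = reltype.substring(reltype.indexOf("#") + 1,
                                   reltype.lastIndexOf(">"));
                              String s = object.getRepresentativeElement().toString();
                              s = s.substring(s.indexOf("#") + 1, s.lastIndexOf(">"));
                              Node objectNode = getOrCreateNodeWithUniqueFactory(s);
                              // object properties become relationships (assumed completion)
                              individualNode.createRelationshipTo(objectNode,
                                   DynamicRelationshipType.withName(reltype));
                         }
                    }
                    for (OWLDataPropertyExpression dataProperty :
                         ontology.getDataPropertiesInSignature()) {
                         for (OWLLiteral object : reasoner.getDataPropertyValues(
                              i, dataProperty.asOWLDataProperty())) {
                              String reltype = dataProperty.asOWLDataProperty().toString();
                              reltype = reltype.substring(reltype.indexOf("#") + 1,
                                   reltype.lastIndexOf(">"));
                              // data properties become node properties
                              String s = object.toString();
                              individualNode.setProperty(reltype, s);
                         }
                    }
               } // end of the individuals loop
          } // end of the classes loop
          tx.success();
     } finally {
          // Neo4j 1.x API; later versions use tx.close()
          tx.finish();
     }
}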
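
The import code repeats the same substring logic for classes, individuals and properties. It could be factored into one helper along the following lines. This sketch assumes that toString() on an OWL API entity yields the IRI wrapped in angle brackets (for example <http://example.org/onto#Agent>); the helper name localName and the example IRI are made up for illustration.

```java
/** Extracts the local name from a rendered IRI (hypothetical helper). */
final class IriNames {

    /**
     * Returns the part after the first '#', without a trailing '>'.
     * Input without a '#' is returned unchanged.
     */
    static String localName(String rendered) {
        int hash = rendered.indexOf('#');
        if (hash < 0) {
            return rendered;
        }
        int end = rendered.lastIndexOf('>');
        if (end <= hash) {
            // no closing bracket after the fragment: take everything up to the end
            end = rendered.length();
        }
        return rendered.substring(hash + 1, end);
    }

    public static void main(String[] args) {
        System.out.println(localName("<http://example.org/onto#Agent>")); // prints Agent
        System.out.println(localName("owl:Thing"));                       // prints owl:Thing
    }
}
```

Unlike the inline version, this also copes with property names that contain no '#', which would otherwise produce an unexpected substring.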

That’s it, you’re done! Now for the fun bit: querying the ontology!


This is the graph now sitting in the database:

It has the ontology as well as all the individuals and properties, represented in their “natural” form. Now the querying can begin.

Whether it is a simple query to find out what happened to a specific VM (entity) during its lifecycle…

START e=node:name(name="experiment123"), ag=node:name(name="Agent")
MATCH e-[r:hadActivity]->ac-->a-[:isA*]->ag
RETURN DISTINCT e.name AS experiment, type(r) AS relationship, a.name AS agent,
       ac.name AS activity, ac.startedAtTime AS starttime, ac.endedAtTime AS endtime
ORDER BY starttime

…or you want to do some more complicated pattern matching to find out how two experiments are different when they look the same at first glance – the only boundary is your imagination.


Protege comes with a simple visualisation and the possibility to execute SPARQL queries.

Neo4j has Cypher, which makes querying the imported ontology much more intuitive, since ontologies are graphs after all. The webadmin interface also makes it easier to explore the graph.

Time is not an issue in this case, because the ontology import is not time-critical. It’s done only once after the experiment has finished and imports the whole ontology.

For an ontology containing several hours of experiment data, the import takes only a few seconds. Once the graph has been imported, querying is fast, which makes it a great tool to analyse and visualise ontologies.


About the Author

Stefanie Wiegand, Research Engineer, IT Innovation Centre

Stefanie Wiegand is a Research Engineer at IT Innovation Centre in the UK, and she is a guest contributor to the Neo4j blog.


Jim Salmons says:

Stefanie,

Great article! Thank you. I am an exploratory learner/developer digging into Neo4j for a project where “self-descriptive” graph databases will be important. So I found your article to be particularly helpful. Lots of good relevant links, a cogent context description, and a “just right” (enough but not too much) example, all fitting into an article that is

David Sadler says:

Great article! Thanks for clarifying and offering an integration viewpoint between OWL and Neo4j.

I’m looking at using an Expert System (Inference Engine) Pyke, and wondered if you have experimented in using “A” Inference Engine with Neo4j? Specifically making inferences against a graph to discover patterns?

I should note that for the use case it would be a dynamic data set.

Thanks for the article, and any insights appreciated.

Julian Simpson says:

Hi David, thanks for getting in touch. The Neo4j Mailing list would be the best place to ask.

herli joaquim de menezes says:

Great article, it is inspiring! I am also digging into Neo4j, working with an OWL ontology to represent some experiments, and I think your article has very interesting clues.

Bo Ferri says:

Thanks a lot for getting to this topic.
After digging in this space (Semantic Web + Graph Databases) for quite a while, I would tend to say that it might be better not to directly connect schema (TBox) with instance (ABox) data. It would simply cause super nodes (dense nodes) very quickly with larger datasets (and super nodes should be avoided in general). I prefer to put the class as a label on the nodes. It is good to store the identifiers (URIs) in their prefixed form in the graph database (to save space). However, this would require a namespace/prefix resolving mechanism (which makes the processing a bit more difficult in general). Besides this, a mechanism to fulfil the unique statement (triple) constraint is required, i.e., each triple (node-edge-node) should only exist once (per Named Graph) in the graph database (this needs to be checked upfront, i.e. on insert).
In our graph extension in the d:swarm project (see https://github.com/dswarm/dswarm-graph-neo4j) we experimented with all this (and even more, e.g. versioning of statements (triples)). Unfortunately, so far we haven’t got it to scale as we would like (so feel free to help – it’s open source 😉 ).

just my 5 p

Charbel kaed says:

Thank you for this article,

Inserting and querying is really needed; the next level would be to check how you can plug a reasoner on top of Neo4j.

One of the main powerful features of ontologies are reasoners.

Did you investigate plugging an inference engine?


Xiao Wen says:

I’m a student and I’m trying to store OWL in Neo4j. Do you have any suggestions for me?
Thank you.
This article is also very useful for me to understand the relationship between OWL and Neo4j.
