Why would you want to do this?

OWL has been around for a while now and is used for a variety of semantic applications. Ontologies are freely available and help developers create models of real-world scenarios. They can be instantiated, combined and enriched using SWRL rules and a reasoner such as HermiT or Pellet. The reasons for creating such a representation of data differ: natural language processing, reusing data across domains and contextualisation are just some of many. The resulting data is then stored in a knowledge base, from which it can be retrieved using SPARQL queries. But if it’s all already there, what’s the point of combining it with a graph database?

While SPARQL certainly has its strong points, such as querying several ontologies at the same time and its similarity to the well-known SQL, it also has weaknesses. Triple stores, which are the starting point for most SPARQL applications, tend to consume a lot of disk space compared to relational databases. They can also be slow for very large datasets.
Neo4j stores whole graphs as opposed to “just” triples. It has a query language that is easy to learn and use, and a web-based graphical interface that lets users browse and explore the graph. It is also fast to query and scales well to larger datasets.

As is usually the case, there is no ideal solution and the right choice depends on the use case, considering, for example:
  • the amount, frequency and connectedness of incoming data
  • the importance of speed and size
  • the type of query executed on the database

The playground

The PROV-O ontology models causation and influence between activities, agents and entities. This concept is fairly abstract but useful for answering a number of questions related to the origin of entities (and what may have influenced them throughout their lifetime). It is applicable to a number of fields, for example social networking (“Who was the author of the blog post that influenced Peter to write his mashup?”) or experimenting (“Who was the last person to access the experiment before it failed, and when did they access it?”).

PROV-O is used in the BonFIRE project, which is a multi-site cloud experimentation and testing facility. There are people (agents) conducting experiments using resources (entities). At the infrastructure level, to perform their experiments, they create, use and destroy (activities) compute nodes, storage and virtual networks (entities). After their experiment has finished, they download the results (entities) from the virtual machine for further analysis. These results are influenced by a large number of activities and agents, and it is often difficult to determine how such a result came to be, who was involved in its formation or why it is different from other results. Using provenance, these questions can be answered.

Preparations

In BonFIRE, the data arrives via RabbitMQ as a set of JSON messages that look like this:

{"timestamp":1375801302,"eventType":"state.shutdown","objectType":"compute","objectId":"/locations/server1/computes/123","groupId":"group1","userId":"bert"}

In this case, bert shut down compute node 123 located on server1. Each message is read into Java classes, which are then used to transform it (“manually”) into triples. From the single message in the above example, we derive several triples that would look something like this:

:Action_state.shutdown_1375801302 rdf:type :Action
:Compute_/locations/server1/computes/123 rdf:type :Compute
:Compute_/locations/server1/computes/123 prov:wasInvalidatedBy :Action_state.shutdown_1375801302
:Experimenter_Bert rdf:type :Experimenter
:Action_state.shutdown_1375801302 prov:wasAssociatedWith :Experimenter_Bert
...

The prefixes used are defined in the ontology into which these triples are going to be imported.
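The “manual” transformation can be pictured roughly as follows. This is only a sketch, not the BonFIRE code: the class `EventTriples`, its method names and the capitalisation rule are assumptions made for illustration, the message is assumed to be parsed already, and the output follows the pattern of the example above using the PROV-O property names `prov:wasInvalidatedBy` and `prov:wasAssociatedWith`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns one already-parsed event message into triple strings. */
final class EventTriples {

    private EventTriples() {}

    /** Upper-cases the first letter, e.g. "compute" becomes "Compute". */
    static String capitalise(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    /**
     * Builds the triples for one event. Only "state.shutdown" gets an
     * invalidation link here; other event types would need their own mapping.
     */
    static List<String> toTriples(long timestamp, String eventType,
                                  String objectType, String objectId, String userId) {
        String action = ":Action_" + eventType + "_" + timestamp;
        String object = ":" + capitalise(objectType) + "_" + objectId;
        String agent = ":Experimenter_" + capitalise(userId);

        List<String> triples = new ArrayList<>();
        triples.add(action + " rdf:type :Action");
        triples.add(object + " rdf:type :" + capitalise(objectType));
        triples.add(agent + " rdf:type :Experimenter");
        triples.add(action + " prov:wasAssociatedWith " + agent);
        if ("state.shutdown".equals(eventType)) {
            triples.add(object + " prov:wasInvalidatedBy " + action);
        }
        return triples;
    }
}
```

Calling `EventTriples.toTriples(1375801302L, "state.shutdown", "compute", "/locations/server1/computes/123", "bert")` yields five triple strings for the message shown earlier. Keeping the output as plain triples is what allows the later import step to stay independent of the message format.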

The above step is not necessary if the messages are supposed to go into the ontology directly: OWLAPI could be used instead to create individuals, properties and so on. Transforming them to triples, however, serves as an interface that makes it possible to read data from all kinds of sources, as long as it’s formatted as triples. If OWLAPI were used directly, the code would have to be changed every time the incoming data changes.

These triples can then be added to an ontology using the OWLRDFConsumer class from the OWLAPI. This adds the triples to the ontology where the reasoner can be invoked to enrich the data. So far, that’s not really special. The interesting bit follows after the reasoning has taken place.

Getting graphy

Now there is an ontology object sitting in memory, which contains the ontology itself as well as the individuals that came from the triples. It could simply be stored in a knowledge base, but if it was, you wouldn’t be reading about it here :)

An ontology is a graph. It has a top node (owl:Thing) and classes extending it. There are individuals that belong to classes and object properties connecting the individuals. Individuals can have data properties and annotations that can be represented as node properties and relationship properties or as relationship types.
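To make the mapping concrete, here is a minimal in-memory sketch of how the shutdown example might look as a property graph. `SketchNode` is a hypothetical stand-in and not the Neo4j API, and the choice of `startedAtTime` as the data property holding the timestamp is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a property graph node; not the Neo4j API. */
final class SketchNode {
    final String name;
    final Map<String, Object> properties = new LinkedHashMap<>();
    final List<String> relationships = new ArrayList<>();

    SketchNode(String name) {
        this.name = name;
    }

    /** Records a typed, directed relationship to another node. */
    void relate(String type, SketchNode target) {
        relationships.add("(" + name + ")-[:" + type + "]->(" + target.name + ")");
    }

    /** Builds the example graph and returns the node for the shutdown action. */
    static SketchNode example() {
        // Classes become nodes that link up to owl:Thing
        SketchNode thing = new SketchNode("owl:Thing");
        SketchNode actionClass = new SketchNode("Action");
        SketchNode experimenterClass = new SketchNode("Experimenter");
        actionClass.relate("isA", thing);
        experimenterClass.relate("isA", thing);

        // Individuals become nodes that link to their class
        SketchNode bert = new SketchNode("Experimenter_Bert");
        bert.relate("isA", experimenterClass);
        SketchNode shutdown = new SketchNode("Action_state.shutdown_1375801302");
        shutdown.relate("isA", actionClass);

        // An object property becomes a relationship, a data property a node property
        shutdown.relate("wasAssociatedWith", bert);
        shutdown.properties.put("startedAtTime", 1375801302L);
        return shutdown;
    }
}
```

The point of the sketch is only the shape of the result: classes and individuals share one node type and are told apart by their position in the “isA” hierarchy.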

The import of an ontology is pretty straightforward:

Step 1

The only object you need is the ontology object created earlier. It could also be loaded from a file; that doesn’t make a difference.

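    // Imports this method relies on (assuming OWLAPI 3.x, HermiT and the Neo4j 1.x embedded API)
    import org.neo4j.graphdb.DynamicRelationshipType;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Transaction;
    import org.semanticweb.HermiT.Reasoner;
    import org.semanticweb.owlapi.model.*;
    import org.semanticweb.owlapi.reasoner.NodeSet;
    import org.semanticweb.owlapi.reasoner.OWLReasoner;

    private void importOntology(OWLOntology ontology) throws Exception {
        // Reasoner is assumed to be HermiT's; Pellet would be created differently
        OWLReasoner reasoner = new Reasoner(ontology);

        if (!reasoner.isConsistent()) {
            logger.error("Ontology is inconsistent");
            //throw your exception of choice here
            throw new Exception("Ontology is inconsistent");
        }
        Transaction tx = db.beginTx();
        try {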

Step 2

Create a starting node in Neo4j representing the owl:Thing node. This is the root node of the graph we’re going to create.

            Node thingNode = getOrCreateNodeWithUniqueFactory("owl:Thing");


Step 3

Get all the classes defined in the ontology and add them to the graph.

            for (OWLClass c : ontology.getClassesInSignature(true)) {
                // toString() yields "<http://...#Name>"; keep only the part after the '#'
                String classString = c.toString();
                if (classString.contains("#")) {
                    classString = classString.substring(
                     classString.indexOf("#")+1,classString.lastIndexOf(">"));
                }
                Node classNode = getOrCreateNodeWithUniqueFactory(classString);

Step 4

Find out if they have any super classes. If they do, link them. If they don’t, link back to owl:Thing. Make sure only to link to the direct super classes! A single custom relationship type named “isA” is used to express both rdfs:subClassOf (here) and rdf:type (for the individuals in the next step).

                // true = direct super classes only
                NodeSet<OWLClass> superclasses = reasoner.getSuperClasses(c, true);

                if (superclasses.isEmpty()) {
                    classNode.createRelationshipTo(thingNode,
                     DynamicRelationshipType.withName("isA"));
                } else {
                    for (org.semanticweb.owlapi.reasoner.Node<OWLClass>
                     parentOWLNode: superclasses) {

                        OWLClassExpression parent =
                         parentOWLNode.getRepresentativeElement();
                        String parentString = parent.toString();

                        if (parentString.contains("#")) {
                            parentString = parentString.substring(
                             parentString.indexOf("#")+1,
                             parentString.lastIndexOf(">"));
                        }
                        Node parentNode =
                         getOrCreateNodeWithUniqueFactory(parentString);
                        classNode.createRelationshipTo(parentNode,
                         DynamicRelationshipType.withName("isA"));
                    }
                }

Step 5

Now for each class, get all the individuals. Create nodes and link them back to their parent class.

                // true = direct instances of c only
                for (org.semanticweb.owlapi.reasoner.Node<OWLNamedIndividual> in
                 : reasoner.getInstances(c, true)) {
                    OWLNamedIndividual i = in.getRepresentativeElement();
                    String indString = i.toString();
                    if (indString.contains("#")) {
                        indString = indString.substring(
                         indString.indexOf("#")+1,indString.lastIndexOf(">"));
                    }
                    Node individualNode =
                     getOrCreateNodeWithUniqueFactory(indString);

                    individualNode.createRelationshipTo(classNode,
                     DynamicRelationshipType.withName("isA"));

Step 6

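For each individual, get all object properties and all data properties. Add them to the graph as relationships or node properties respectively. Make sure to get the inferred values as well as the asserted ones, which is why the values are requested from the reasoner rather than read from the ontology.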

                    // Object properties become relationships between individual nodes
                    for (OWLObjectPropertyExpression objectProperty:
                     ontology.getObjectPropertiesInSignature()) {

                        for (org.semanticweb.owlapi.reasoner.Node<OWLNamedIndividual>
                         object: reasoner.getObjectPropertyValues(i,
                         objectProperty)) {
                            String reltype = objectProperty.toString();
                            // guard against names without a '#', as in the earlier steps
                            if (reltype.contains("#")) {
                                reltype = reltype.substring(reltype.indexOf("#")+1,
                                 reltype.lastIndexOf(">"));
                            }

                            String s =
                             object.getRepresentativeElement().toString();
                            if (s.contains("#")) {
                                s = s.substring(s.indexOf("#")+1,
                                 s.lastIndexOf(">"));
                            }
                            Node objectNode =
                             getOrCreateNodeWithUniqueFactory(s);
                            individualNode.createRelationshipTo(objectNode,
                             DynamicRelationshipType.withName(reltype));
                        }
                    }

                    // Data properties become properties of the individual's node
                    for (OWLDataPropertyExpression dataProperty:
                     ontology.getDataPropertiesInSignature()) {

                        for (OWLLiteral object: reasoner.getDataPropertyValues(
                         i, dataProperty.asOWLDataProperty())) {
                            String reltype =
                             dataProperty.asOWLDataProperty().toString();
                            if (reltype.contains("#")) {
                                reltype = reltype.substring(reltype.indexOf("#")+1,
                                 reltype.lastIndexOf(">"));
                            }

                            // toString() may include the datatype; getLiteral() would give
                            // the bare value. A second value overwrites the first.
                            String s = object.toString();
                            individualNode.setProperty(reltype, s);
                        }
                    }
                }
            }
            tx.success();
        } finally {
            tx.finish();
        }
    }
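The substring logic for shortening names appears several times in the method above, so it could be pulled into one small helper. The following is a sketch under that assumption; `OwlNames` and `localName` are hypothetical names, and the helper only handles the `<...#Name>` form that the listing relies on.

```java
/** Hypothetical helper for shortening the strings returned by toString() on OWL objects. */
final class OwlNames {

    private OwlNames() {}

    /**
     * Returns the part between the first '#' and the last '>'.
     * Strings that do not have that shape (for example "owl:Thing")
     * are returned unchanged instead of causing an exception.
     */
    static String localName(String owlString) {
        int hash = owlString.indexOf('#');
        int close = owlString.lastIndexOf('>');
        if (hash >= 0 && close > hash) {
            return owlString.substring(hash + 1, close);
        }
        return owlString;
    }
}
```

With such a helper, a line like `getOrCreateNodeWithUniqueFactory(OwlNames.localName(c.toString()))` would replace each of the repeated blocks.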


That’s it, you’re done! Now for the fun bit: querying the ontology!

Graphwalking

This is the graph now sitting in the database:



It has the ontology as well as all the individuals and properties, represented in their “natural” form. Now the querying can begin. Whether it is a simple query to find out what happened to a specific VM (entity) during its lifecycle

START e=node:name(name="experiment123"), ag=node:name(name="Agent")
MATCH e-[r:hadActivity]->ac-->a-[:isA*]->ag
RETURN distinct e.name as experiment, type(r) as relationship, a.name as agent,
ac.name as activity, ac.startedAtTime as starttime, ac.endedAtTime as endtime
ORDER BY starttime

or some more complicated pattern matching to find out how two experiments that look the same at first glance actually differ: the only limit is imagination.
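The `-[:isA*]->` part of the query is what ties an individual to the Agent class no matter how many class levels lie in between. As a rough illustration of that matching rule only, here is a small in-memory sketch; `IsAWalk` is a hypothetical class, not part of Neo4j, and the example hierarchy (Experimenter below Agent) is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for the isA relationships in the graph. */
final class IsAWalk {

    // child name -> names of the nodes it points to via isA
    private final Map<String, List<String>> isA = new HashMap<>();

    void link(String child, String... parents) {
        isA.put(child, Arrays.asList(parents));
    }

    /** True if target can be reached from start over one or more isA hops. */
    boolean reaches(String start, String target) {
        Deque<String> todo = new ArrayDeque<>(parentsOf(start));
        Set<String> seen = new HashSet<>();
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (!seen.add(current)) {
                continue; // already visited, guards against cycles
            }
            if (current.equals(target)) {
                return true;
            }
            todo.addAll(parentsOf(current));
        }
        return false;
    }

    private List<String> parentsOf(String name) {
        List<String> parents = isA.get(name);
        return parents == null ? Collections.<String>emptyList() : parents;
    }

    /** Example data; the Experimenter-isA-Agent link is an assumption. */
    static IsAWalk example() {
        IsAWalk graph = new IsAWalk();
        graph.link("Experimenter_Bert", "Experimenter");
        graph.link("Experimenter", "Agent");
        graph.link("Agent", "owl:Thing");
        return graph;
    }
}
```

In this example `IsAWalk.example().reaches("Experimenter_Bert", "Agent")` is true after two hops, while the reverse direction is not, which mirrors the direction of the relationships created during the import.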

Conclusion

Protégé comes with a simple visualisation and the possibility to execute SPARQL queries. Neo4j has Cypher, which makes querying the imported ontology much more intuitive; ontologies are graphs, after all. The webadmin interface also allows better “exploring” of the graph. Time is not an issue in this case, because the ontology import is not time-critical: it is done only once, after the experiment has finished, and imports the whole ontology. For an ontology containing several hours of experiment data, the import takes only a few seconds. Once the graph has been imported, querying is fast, which makes Neo4j a great tool for analysing and visualising ontologies.

by Stefanie Wiegand


 

4 Comments

Jim Salmons says:

Stefanie,

Great article! Thank you. I am an exploratory learner/developer digging into Neo4j for a project where “self-descriptive” graph databases will be important. So I found your article to be particularly helpful. Lots of good relevant links, a cogent context description, and a “just right” (enough but not too much) example, all fitting into an article that is

David Sadler says:

Great article! Thanks for clarifying and offering an integration viewpoint between OWL and Neo4j.

I’m looking at using an Expert System (Inference Engine) Pyke, and wondered if you have experimented in using “A” Inference Engine with Neo4j? Specifically making inferences against a graph to discover patterns?

I should note that for the use case it would be a dynamic data set.

Thanks for the article, and any insights appreciated.

Julian Simpson says:

Hi David, thanks for getting in touch. The Neo4j Mailing list would be the best place to ask.

herli joaquim de menezes says:

Great article, it is inspiring! I am also digging into Neo4j, working with an OWL ontology to represent some experiments, and I think your article has very interesting clues.
