[As community content, this post reflects the views and opinions of the particular author and does not necessarily reflect the official stance of Neo4j.]
Why Would You Want to Do This?
The Web Ontology Language (OWL) has been around for a while now and is used for a variety of semantic applications.
Ontologies are freely available and help developers create models for real-world scenarios. They can be instantiated, combined and enriched using SWRL rules and a reasoner such as HermiT or Pellet.
The reasons for creating such a representation of data vary: natural language processing, reusing data across domains and contextualisation are just a few of many. The data obtained is then stored in a knowledge base, from which it can be retrieved using SPARQL queries.
But if it’s all already there, what’s the point of combining it with a graph database?
While SPARQL certainly has its strong points – such as working with several ontologies at the same time and its similarity to the well-known SQL – it also has weaknesses.
Triple stores, the starting point for most SPARQL applications, consume a lot of disk space compared to relational databases, and their query performance tends to degrade on very large datasets.
On the other hand, Neo4j stores whole graphs as opposed to “just” triples. It has an easy-to-learn and easy-to-use query language and a web-based, graphical interface which allows users to easily browse and explore the graph.
Also, it is fast for querying and scales very well to handle larger datasets.
As usual, there is no ideal solution; everything depends on your particular use case, considering, for example:
- The amount, frequency and connectedness of incoming data
- The importance of speed and size
- The type of query executed on the database
The Playground
Take the PROV-O ontology, which models causation and influence between activities, agents and entities. This concept is fairly abstract but useful for answering a number of questions about the origin of entities (and what may have influenced them throughout their lifetime).
The PROV-O ontology is applicable to a number of fields, for example social networking (“Who was the author of the blog post that influenced Peter to write his mashup?”) or experimentation (“Who was the last person to access the experiment before it failed, and when did he or she access it?”).
PROV-O is used in the BonFIRE project, which is a multi-site cloud experimentation and testing facility.
There are people (agents) conducting experiments using resources (entities). At an infrastructure level, to perform their experiments, they create, use and destroy (activities) compute nodes, storages and virtual networks (entities). After their experiment has finished, they download the results (entities) from the virtual machine for further analysis.
These results are influenced by a large number of activities and agents, and often it is difficult to determine how such a result came to be, who was involved in its formation or why it is different from other results. But using provenance, these questions can be answered.
Preparations
In BonFIRE, the data arrives on a RabbitMQ queue as a stream of JSON messages that look like this:
{"timestamp":1375801302,"eventType":"state.shutdown", "objectType":"compute","objectId":"/locations/server1/computes/123","groupId":"group1","userId":"bert"}
In this case, bert shut down compute node 123 located on server1. Each message is mapped onto Java classes, which are then used to transform it (“manually”) into triples.
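The post doesn't show these classes; a minimal sketch of what such a message bean might look like (field names taken from the JSON above; Jackson is used here purely as an example binding library):

// Hypothetical message bean mirroring the JSON fields above
public class ProvenanceMessage {
    public long timestamp;
    public String eventType;
    public String objectType;
    public String objectId;
    public String groupId;
    public String userId;
}

// e.g. deserialising with Jackson:
// ProvenanceMessage m = new ObjectMapper().readValue(json, ProvenanceMessage.class);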
Using the single message from the above example, we derive several triples that would look something like this:
:Action_state.shutdown_1375801302 rdf:type :Action
:Compute_/locations/server1/computes/123 rdf:type :Compute
:Compute_/locations/server1/computes/123 prov:invalidatedBy :Action_state.shutdown_1375801302
:Experimenter_Bert rdf:type :Experimenter
:Experimenter_Bert prov:wasAssociatedWith :Action_state.shutdown_1375801302
...
The prefixes used are defined in the ontology into which these triples are going to be imported.
The above step is not necessary if the messages are supposed to go into the ontology directly – the OWL API could be used instead to create individuals, properties and so on.
Going through triples, however, provides an interface that can accept data from all kinds of sources, as long as it is formatted as triples. If the OWL API were used directly, the code would have to change every time the data changes.
These triples can then be added to an ontology using the OWLRDFConsumer class from the OWL API. Once the triples are in the ontology, the reasoner can be invoked to enrich the data.
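The post doesn't show this step in code, but a minimal sketch could look like the following, assuming OWL API 3.x (where OWLRDFConsumer lives in org.coode.owlapi.rdfxml.parser; the exact AnonymousNodeChecker methods vary between versions):

// Sketch: feed subject/predicate/object IRI strings into the ontology
void importTriples(OWLOntology ontology, List<String[]> triples) throws SAXException {
    // our generated triples contain no blank nodes, so the checker can always say no
    AnonymousNodeChecker checker = new AnonymousNodeChecker() {
        public boolean isAnonymousNode(IRI iri) { return false; }
        public boolean isAnonymousNode(String iri) { return false; }
        public boolean isAnonymousSharedNode(String iri) { return false; }
    };
    OWLRDFConsumer consumer = new OWLRDFConsumer(ontology, checker,
            new OWLOntologyLoaderConfiguration());
    for (String[] t : triples) {
        // t[0] = subject, t[1] = predicate, t[2] = object
        consumer.statementWithResourceValue(t[0], t[1], t[2]);
    }
    consumer.endModel(); // flushes the buffered triples into the ontology
}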
So far, that’s not really special. The interesting bit follows after the reasoning has taken place.
Getting Graphy
Now there is this ontology object sitting in the memory, which contains the ontology itself as well as the individuals that came from the triples. It could simply be stored in a knowledge base, but if it was, you wouldn’t be reading about it here 🙂
An ontology is a graph. It has a top node (owl:Thing) and classes extending it. There are individuals that belong to classes and object properties connecting the individuals. Individuals can have data properties and annotations that can be represented as node properties and relationship properties or as relationship types.
The import of an ontology is pretty straightforward:
Step 1
The only object you need is the ontology object created earlier.
It could also be loaded from a file, but that doesn’t make a difference.
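If you do want to load it from a file, the OWL API makes that short (the file name here is made up):

OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
OWLOntology ontology =
    manager.loadOntologyFromOntologyDocument(new File("provenance.owl"));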
private void importOntology(OWLOntology ontology) throws Exception {
    OWLReasoner reasoner = new Reasoner(ontology);
    if (!reasoner.isConsistent()) {
        logger.error("Ontology is inconsistent");
        // throw your exception of choice here
        throw new Exception("Ontology is inconsistent");
    }
    Transaction tx = db.beginTx();
    try {
Step 2
Create a starting node in Neo4j representing the owl:Thing node. This is the root node of the graph we’re going to create.
Node thingNode = getOrCreateNodeWithUniqueFactory("owl:Thing");
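getOrCreateNodeWithUniqueFactory() isn't shown in the post; a plausible implementation for the Neo4j 1.x API used throughout, assuming a node index named "name" (the same index the Cypher query in the Graphwalking section relies on), would be:

// Sketch: idempotent node creation via Neo4j's UniqueFactory
private Node getOrCreateNodeWithUniqueFactory(String nodeName) {
    UniqueFactory<Node> factory = new UniqueFactory.UniqueNodeFactory(db, "name") {
        @Override
        protected void initialize(Node created, Map<String, Object> properties) {
            created.setProperty("name", properties.get("name"));
        }
    };
    return factory.getOrCreate("name", nodeName);
}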
Step 3
Get all the classes defined in the ontology and add them to the graph.
for (OWLClass c : ontology.getClassesInSignature(true)) {
    String classString = c.toString();
    if (classString.contains("#")) {
        classString = classString.substring(
            classString.indexOf("#") + 1, classString.lastIndexOf(">"));
    }
    Node classNode = getOrCreateNodeWithUniqueFactory(classString);
Step 4
Find out if they have any super classes. If they do, link them. If they don’t, link back to owl:Thing.
Make sure only to link to the direct superclasses! A single custom relationship type named “isA” is used both for the class hierarchy (rdfs:subClassOf) here and for class membership (rdf:type) in the next step:
NodeSet<OWLClass> superclasses = reasoner.getSuperClasses(c, true); if (superclasses.isEmpty()) { classNode.createRelationshipTo(thingNode, DynamicRelationshipType.withName("isA")); } else { for (org.semanticweb.owlapi.reasoner.Node<OWLClass> parentOWLNode: superclasses) { OWLClassExpression parent = parentOWLNode.getRepresentativeElement(); String parentString = parent.toString(); if (parentString.contains("#")) { parentString = parentString.substring( parentString.indexOf("#")+1, parentString.lastIndexOf(">")); } Node parentNode = getOrCreateNodeWithUniqueFactory(parentString); classNode.createRelationshipTo(parentNode, DynamicRelationshipType.withName("isA")); } }
Step 5
Now for each class, get all the individuals. Create nodes and link them back to their parent class.
for (org.semanticweb.owlapi.reasoner.Node<OWLNamedIndividual> in : reasoner.getInstances(c, true)) { OWLNamedIndividual i = in.getRepresentativeElement(); String indString = i.toString(); if (indString.contains("#")) { indString = indString.substring( indString.indexOf("#")+1,indString.lastIndexOf(">")); } Node individualNode = getOrCreateNodeWithUniqueFactory(indString); individualNode.createRelationshipTo(classNode, DynamicRelationshipType.withName("isA"));
Step 6
For each individual, get all object properties and all data properties. Add object properties to the graph as relationships and data properties as node properties. Make sure to get all axioms, not just the asserted ones.
for (OWLObjectPropertyExpression objectProperty :
         ontology.getObjectPropertiesInSignature()) {
    for (org.semanticweb.owlapi.reasoner.Node<OWLNamedIndividual> object :
             reasoner.getObjectPropertyValues(i, objectProperty)) {
        String reltype = objectProperty.toString();
        reltype = reltype.substring(reltype.indexOf("#") + 1,
            reltype.lastIndexOf(">"));
        String s = object.getRepresentativeElement().toString();
        s = s.substring(s.indexOf("#") + 1, s.lastIndexOf(">"));
        // object properties become relationships between individual nodes
        Node objectNode = getOrCreateNodeWithUniqueFactory(s);
        individualNode.createRelationshipTo(objectNode,
            DynamicRelationshipType.withName(reltype));
    }
}
for (OWLDataPropertyExpression dataProperty :
         ontology.getDataPropertiesInSignature()) {
    for (OWLLiteral object : reasoner.getDataPropertyValues(
             i, dataProperty.asOWLDataProperty())) {
        String reltype = dataProperty.asOWLDataProperty().toString();
        reltype = reltype.substring(reltype.indexOf("#") + 1,
            reltype.lastIndexOf(">"));
        // data properties become properties on the individual's node
        String s = object.toString();
        individualNode.setProperty(reltype, s);
    }
}
            } // closes the individuals loop from step 5
        } // closes the classes loop from step 3
        tx.success();
    } finally {
        tx.finish();
    }
}
That’s it, you’re done! Now for the fun bit: querying the ontology!
Graphwalking
The graph now sitting in the database contains the ontology as well as all the individuals and properties, represented in their “natural” form. Now the querying can begin.
Whether it is a simple query to find out what happened to a specific VM (entity) during its lifecycle…
START e=node:name(name="experiment123"),
      ag=node:name(name="Agent")
MATCH e-[r:hadActivity]->ac-->a-[:isA*]->ag
RETURN distinct e.name as experiment, type(r) as relationship,
       a.name as agent, ac.name as activity,
       ac.startedAtTime as starttime, ac.endedAtTime as endtime
ORDER BY starttime
…or you want to do some more complicated pattern matching to find out how two experiments are different when they look the same at first glance – the only boundary is your imagination.
Conclusion
Protégé comes with a simple visualisation and the ability to execute SPARQL queries.
Neo4j has Cypher, which makes querying the imported ontology much more intuitive – ontologies are graphs, after all. The webadmin interface also allows easier “exploring” of the graph.
Time is not an issue in this case, because the ontology import is not time-critical. It’s done only once after the experiment has finished and imports the whole ontology.
For an ontology containing several hours of experiment data, the import takes only a few seconds. Once the graph has been imported, querying is fast, which makes it a great tool to analyse and visualise ontologies.