Navigate All the Knowledge with Spring + Neo4j

Editor’s Note: This presentation was given by James Weaver at GraphConnect Europe in April 2016. Here’s a quick review of what he covered:

What we’re going to be talking about today is an app I developed to map the relationships between Wikipedia entries:

My name is James Weaver, and I work for Pivotal. The following post is the demonstration of an app I created to showcase some of Pivotal’s curated technologies, such as the open source technologies Java Spring and Pivotal Cloud Foundry.

The open source app, WikiBrowser Service, is live on GitHub. There’s also a Getting Started presentation available. Please refer to the open source app for the examples provided later in the post.

If I had to provide a thumbnail of what is to follow, I would say that it’s the fusion of Wikipedia and Wikidata, Wikipedia’s sister project. Wikimedia is the umbrella organization, which has several projects in addition to Wikipedia. These include Wikimedia Commons, which has all the media assets, and Wikidata, which has all the structural information and is particularly interesting to graph geeks.

For this project, we wanted to be able to fuse the thousands of relationships between the different Wikipedia entries, all of which are available right behind the API. Below is an example graph of one such entry, which shows the universe with a HAS_PART relationship to the observable universe, as well as other relationships between Wikipedia articles that we can exploit.

The big idea is to have the ability to navigate Wikipedia articles not only through article links, but through the Wikidata structural relationships as well. The next step is to pin those links as we navigate them into a concept map:

Navigate Wikipedia articles through Wikidata structural relationships

WikiBrowser: A Powerful Learning Tool

This tool is designed to facilitate learning, for yourself and others. As a teacher, the WikiBrowser Service tool may be useful to help your students understand and explore a particular domain. As a researcher, it will provide valuable data and insights into your topic of interest.

The following is a video demonstration of how to create a concept map using the WikiBrowser Service:

The Concept Map Architecture

Let’s review the architecture of our concept map:

Discover the architecture of the Wikipedia concept map
We have a single-page HTML 5 app along with some microservices running in the cloud, which are grouped into four major categories: Wikidata services, Wikipedia services, some graph DB services (which are proxied over to Neo4j) and some short URL services with Bitly.

The education services are in the same domain as the HTML 5 app, so there are no cross origin problems. We also have approximately 20 microservices on the front end that provide simplicity and location independence, and the different end points are actually the microservices themselves.

In the following example, each time I search for “earth,” the system will perform an article search by going to the Wikipedia API end point and looking up its Wikidata Q item:

See how the Wikipedia concept map architecture works
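
The lookup step can be sketched in Java as follows. This is a minimal illustration, not the WikiBrowser Service code: it assumes the public MediaWiki API, whose `pageprops` query returns a page's `wikibase_item` (its Wikidata Q item), and the class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the MediaWiki API URL that resolves an
// article title to its Wikidata Q item. It does not perform the request.
final class QItemLookupUrl {
    private QItemLookupUrl() {}

    static String forTitle(String language, String title) {
        // URLEncoder produces form encoding (spaces become '+').
        String encodedTitle = URLEncoder.encode(title, StandardCharsets.UTF_8);
        return "https://" + language + ".wikipedia.org/w/api.php"
                + "?action=query&prop=pageprops&ppprop=wikibase_item"
                + "&redirects=1&format=json&titles=" + encodedTitle;
    }

    public static void main(String[] args) {
        System.out.println(forTitle("en", "Earth"));
    }
}
```

A service would fetch this URL and read the `wikibase_item` value from the JSON response (Q2 for Earth) before calling the Wikidata endpoints.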

When I pin the items to the graph, we use the graph end point – giving it a list of Q items that are in Wikidata – send that to the Neo4j end point and then render the results:

Find out how to pin Wikipedia entries to the Neo4j concept graph

The Neo4j Cypher query is then very simple:

The Neo4j Cypher query that finds relationships between pinned items

It’s a match on items A and B, both carrying the item label, where the itemID of A is IN this list of pinned items and the itemID of B is in the same list. Next we do an optional match on any relationship REL between A and B, and then return A, B and collect REL, which returns all of the nodes and relationships for our concept map.
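
Since the slide itself is not reproduced here, the following is a reconstruction of that query from the description, held as a Java constant. The `Item` label, the `itemId` property name, the `$itemIds` parameter and the class name are assumptions; the actual query in the WikiBrowser Service repository may differ.

```java
import java.util.List;
import java.util.Map;

// Hypothetical holder for the "pinned items" query described in the text.
final class PinnedItemsQuery {
    private PinnedItemsQuery() {}

    // Match every pair of pinned items, optionally match the relationships
    // between them, and return each pair with its collected relationships.
    static final String CYPHER = String.join("\n",
            "MATCH (a:Item), (b:Item)",
            "WHERE a.itemId IN $itemIds AND b.itemId IN $itemIds",
            "OPTIONAL MATCH (a)-[rel]->(b)",
            "RETURN a, b, collect(rel) AS rels");

    // Query parameters: the list of Wikidata Q items pinned to the map.
    static Map<String, Object> parameters(List<String> itemIds) {
        return Map.of("itemIds", List.copyOf(itemIds));
    }

    public static void main(String[] args) {
        System.out.println(CYPHER);
        System.out.println(parameters(List.of("Q1", "Q2")));
    }
}
```

The `OPTIONAL MATCH` matters here: pinned items with no relationship to each other still come back as rows, so they can be drawn as unconnected nodes.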

The following slide shows how to annotate Java with Spring annotations and set up a REST service by specifying the name of the end point, its parameters and then returning it:

Find out how to annotate Java with Spring to develop a REST server
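
As a rough idea of what such an annotated class looks like, here is a minimal sketch. It assumes Spring Web is on the classpath; the endpoint name `/articlesearch`, its parameters and the placeholder return value are hypothetical and are not taken from the WikiBrowser Service source.

```java
import java.util.List;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

// @RestController marks the class as a REST endpoint whose return values
// are written straight to the HTTP response body (as JSON by default).
@RestController
class ArticleSearchController {

    // @RequestMapping names the endpoint; @RequestParam binds query parameters.
    @RequestMapping(value = "/articlesearch", method = RequestMethod.GET)
    public List<String> articleSearch(
            @RequestParam(value = "title") String title,
            @RequestParam(value = "lang", defaultValue = "en") String lang) {
        // Placeholder result: a real service would call Wikipedia and Wikidata here.
        return List.of(lang + ":" + title);
    }
}
```

With this in a Spring Boot application, a GET request to `/articlesearch?title=Earth` would be routed to `articleSearch` with `lang` defaulting to `en`.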

The way to represent the items that go over the wire is in Plain Old Java Objects, or POJOs:

How to represent items that go over the wire in a REST service
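
A POJO for one item might look like the sketch below. The class name and fields are assumptions chosen for illustration; the pattern itself (private fields, a no-argument constructor, getters and setters) is what lets a JSON mapper such as Jackson, Spring's default, turn the object into the payload sent over the wire.

```java
// Hypothetical wire representation of a single Wikidata item.
class ItemInfo {
    private String itemId;      // Wikidata Q item, e.g. "Q2"
    private String label;       // label in the currently selected language
    private String articleUrl;  // URL of the matching Wikipedia article

    // No-argument constructor so a JSON mapper can instantiate the class.
    public ItemInfo() {}

    public ItemInfo(String itemId, String label, String articleUrl) {
        this.itemId = itemId;
        this.label = label;
        this.articleUrl = articleUrl;
    }

    public String getItemId() { return itemId; }
    public void setItemId(String itemId) { this.itemId = itemId; }

    public String getLabel() { return label; }
    public void setLabel(String label) { this.label = label; }

    public String getArticleUrl() { return articleUrl; }
    public void setArticleUrl(String articleUrl) { this.articleUrl = articleUrl; }

    @Override
    public String toString() {
        return "ItemInfo{itemId=" + itemId + ", label=" + label
                + ", articleUrl=" + articleUrl + "}";
    }
}
```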

WikiBrowser Use Cases

The following are a series of WikiBrowser use cases. In this example, we’re going to change the language of our searches and results from English to Chinese. For each Q item – article name – Wikidata has several language translations. Not only does it provide semantic structure; it also provides the language for the different labels:

How to change languages using POJOs
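
One way to picture the per-language labels is a small lookup with a fallback, sketched below. The class, the fallback-to-English rule and the sample data are assumptions for illustration only; the source does not say how the app handles a missing translation.

```java
import java.util.Map;

// Hypothetical holder for the labels Wikidata stores for one Q item,
// keyed by language code.
final class LocalizedLabels {
    private final Map<String, String> labelsByLanguage;

    LocalizedLabels(Map<String, String> labelsByLanguage) {
        this.labelsByLanguage = Map.copyOf(labelsByLanguage);
    }

    // Returns the label in the requested language, falling back to English
    // (or an empty string) when no translation is present.
    String labelFor(String language) {
        String label = labelsByLanguage.get(language);
        return label != null ? label : labelsByLanguage.getOrDefault("en", "");
    }

    public static void main(String[] args) {
        // Labels for Q2 (Earth); the Chinese label is written with Unicode escapes.
        LocalizedLabels earth = new LocalizedLabels(
                Map.of("en", "Earth", "zh", "\u5730\u7403"));
        System.out.println(earth.labelFor("zh"));
        System.out.println(earth.labelFor("fr")); // no French label in this sample
    }
}
```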

Now let’s go through breadth-first search using bladder cancer research, which includes scientific information such as genes, chromosomes and other related data:

How to expand items related by a given property to a given depth

We’ll do a breadth-first search to one level on genetic association, which will pull up those results along with the associated genes. Next we’ll select the PSCA gene, find its chromosome and pin this in our browser tool. Now that we have that chromosome, we’ll do another breadth-first search to infinite levels on “follows,” which shows us which chromosomes this particular chromosome follows and is followed by.
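
The difference between searching "to one level" and "to infinite levels" can be illustrated with a small in-memory breadth-first expansion. This is only a sketch of the technique: the app presumably runs this against Neo4j, the class name and sample graph are hypothetical, and for brevity the code follows a single property in the outgoing direction only.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PropertyExpansion {
    private PropertyExpansion() {}

    // Breadth-first expansion from `start` along one property.
    // `edges` maps an item to the items it points to via that property.
    // Returns each reached item with its depth; pass Integer.MAX_VALUE
    // as maxDepth for an unbounded search.
    static Map<String, Integer> expand(Map<String, List<String>> edges,
                                       String start, int maxDepth) {
        Map<String, Integer> depthByItem = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depthByItem.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            int depth = depthByItem.get(current);
            if (depth == maxDepth) {
                continue; // depth limit reached: do not expand further
            }
            for (String next : edges.getOrDefault(current, List.of())) {
                if (!depthByItem.containsKey(next)) {
                    depthByItem.put(next, depth + 1);
                    queue.add(next);
                }
            }
        }
        return depthByItem;
    }

    public static void main(String[] args) {
        // Hypothetical "follows" chain between chromosomes.
        Map<String, List<String>> follows = Map.of(
                "chromosome 8", List.of("chromosome 7"),
                "chromosome 7", List.of("chromosome 6"),
                "chromosome 6", List.of("chromosome 5"));
        System.out.println(expand(follows, "chromosome 8", 1));
        System.out.println(expand(follows, "chromosome 8", Integer.MAX_VALUE));
    }
}
```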

Another use case is for “items in common:”

How to find all shortest paths

Let’s use soccer as an example. We’ll pin Lionel Messi and Cristiano Ronaldo. Then we’re going to ask, “What do Cristiano Ronaldo and Lionel Messi have in common?” We select Cristiano Ronaldo, say “in common” and then click Lionel Messi, which will give us the shortest path – with no more than two hops. It will also show us additional information about both footballers, such as that they were both in the 2010 FIFA World Cup.

Below is the Neo4j Cypher query for finding shortest paths:

Neo4j Cypher query to find all shortest paths
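
A query of this shape can be written with Cypher's `allShortestPaths` function and an upper bound of two hops. The version below is a reconstruction under assumptions (the `Item` label, `itemId` property, parameter names and class name are hypothetical), not a copy of the slide.

```java
import java.util.Map;

// Hypothetical holder for an "items in common" query.
final class ItemsInCommonQuery {
    private ItemsInCommonQuery() {}

    // Undirected pattern with at most two hops, so a shared neighbour
    // (for example a tournament both players took part in) sits mid-path.
    static final String CYPHER = String.join("\n",
            "MATCH (a:Item {itemId: $itemIdA}), (b:Item {itemId: $itemIdB})",
            "MATCH p = allShortestPaths((a)-[*..2]-(b))",
            "RETURN p");

    static Map<String, Object> parameters(String itemIdA, String itemIdB) {
        return Map.of("itemIdA", itemIdA, "itemIdB", itemIdB);
    }

    public static void main(String[] args) {
        System.out.println(CYPHER);
    }
}
```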

Our next use case is Navigate to Root. The Wikidata information is hierarchical in nature. We’re going to take Cristiano Ronaldo and do a root path, which will show us that Cristiano is human, a subclass of person, subclass of subject and subclass of entity.

How to find the shortest path to entity using subclass of, instance of, and part of

Below is the Cypher query for Navigate to Root. We’re doing “all shortest paths” from the item that we selected, up to entity. But we’re only going to get results that are subclass of, whole part, part of, and instance of:

The Neo4j Cypher query for navigating to a root
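
A reconstruction of such a query is sketched below, restricting the path to a fixed set of relationship types. The relationship type names (`SUBCLASS_OF`, `INSTANCE_OF`, `PART_OF`), the `Item` label and the parameter names are assumptions about how the Wikidata properties were loaded into Neo4j; Q35120 is Wikidata's identifier for "entity".

```java
import java.util.Map;

// Hypothetical holder for a "navigate to root" query.
final class NavigateToRootQuery {
    private NavigateToRootQuery() {}

    static final String ENTITY_ITEM_ID = "Q35120"; // Wikidata item for "entity"

    // Directed, variable-length pattern limited to hierarchy relationships.
    static final String CYPHER = String.join("\n",
            "MATCH (item:Item {itemId: $itemId}), (root:Item {itemId: $rootId})",
            "MATCH p = allShortestPaths(",
            "  (item)-[:SUBCLASS_OF|INSTANCE_OF|PART_OF*]->(root))",
            "RETURN p");

    static Map<String, Object> parameters(String itemId) {
        return Map.of("itemId", itemId, "rootId", ENTITY_ITEM_ID);
    }

    public static void main(String[] args) {
        System.out.println(CYPHER);
    }
}
```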

Next we have Degrees of Separation, which we can demonstrate by looking at the footballers Eric Cantona and Harry Kane:

How to find the degrees of separation using shortest path

Using our open-source app, we are going to find an undirected shortest path between Harry Kane and Eric Cantona. What we will immediately find is that the two are related through teams, kind of like Kevin Bacon is related to other actors through movies. We see that Eric Cantona played for FC Barcelona with Gary Lineker, who played on another team with Harry Kane.
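
To show what an undirected shortest path computes, here is a small in-memory sketch using breadth-first search with back-pointers. It is not the app's implementation (which queries Neo4j), and the edge list simply restates the example from the talk; "Another team" is a placeholder because the talk does not name that team.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

final class DegreesOfSeparation {
    private DegreesOfSeparation() {}

    // Returns one shortest path between `from` and `to`, treating every
    // edge as undirected, or an empty list when no path exists.
    static List<String> shortestPath(List<String[]> edges, String from, String to) {
        Map<String, List<String>> neighbors = new HashMap<>();
        for (String[] edge : edges) {
            neighbors.computeIfAbsent(edge[0], k -> new ArrayList<>()).add(edge[1]);
            neighbors.computeIfAbsent(edge[1], k -> new ArrayList<>()).add(edge[0]);
        }
        Map<String, String> cameFrom = new HashMap<>(); // node -> previous node
        Deque<String> queue = new ArrayDeque<>();
        cameFrom.put(from, null);
        queue.add(from);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(to)) {
                // Walk the back-pointers to rebuild the path from start to end.
                LinkedList<String> path = new LinkedList<>();
                for (String node = to; node != null; node = cameFrom.get(node)) {
                    path.addFirst(node);
                }
                return path;
            }
            for (String next : neighbors.getOrDefault(current, List.of())) {
                if (!cameFrom.containsKey(next)) {
                    cameFrom.put(next, current);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        List<String[]> memberOfTeam = List.of(
                new String[] {"Eric Cantona", "FC Barcelona"},
                new String[] {"Gary Lineker", "FC Barcelona"},
                new String[] {"Gary Lineker", "Another team"},
                new String[] {"Harry Kane", "Another team"});
        System.out.println(shortestPath(memberOfTeam, "Harry Kane", "Eric Cantona"));
    }
}
```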

Since this presentation is in London, let’s also find the shortest path between members of British royalty: Prince Harry and Queen Victoria. We’ll start with Queen Victoria and do a shortest path forward through “child” to Prince Harry. I took a detour in my search to see who followed George V, which includes sons Edward VIII and George VI. Edward VIII actually abdicated his throne to marry an American, Wallis Simpson, who had been divorced twice.

I’m out of time, but I hope you enjoyed the demo. Let me know what you think of it!
