What we’re going to be talking about today is an app I developed to map the relationships between Wikipedia entries:
My name is James Weaver, and I work for Pivotal. The following post is the demonstration of an app I created to showcase some of Pivotal’s curated technologies, such as the open source technologies Java Spring and Pivotal Cloud Foundry.
The open source app, WikiBrowser Service, is live on GitHub. There’s also a Getting Started presentation available. Please refer to the open source app for the examples provided later in the post.
If I had to provide a thumbnail of what is to follow, I would say that it’s the fusion of Wikipedia and Wikidata, Wikipedia’s sister project. Wikimedia is the umbrella organization, which has several projects in addition to Wikipedia. These include Wikimedia Commons, which has all the media assets, and Wikidata, which has all the structural information and is particularly interesting to graph geeks.
For this project, we wanted to be able to fuse the thousands of relationships between the different Wikipedia entries, all of which are available right behind the API. Below is an example graph of one such entry, which shows the universe with a HAS_PART relationship to the observable universe, as well as other relationships between Wikipedia articles that we can exploit.
The big idea is to have the ability to navigate Wikipedia articles not only through article links, but through the Wikidata structural relationships as well. The next step is to pin those links as we navigate them into a concept map:
WikiBrowser: A Powerful Learning Tool
This tool is designed to facilitate learning, for yourself and others. As a teacher, the WikiBrowser Service tool may be useful to help your students understand and explore a particular domain. As a researcher, it will provide valuable data and insights into your topic of interest.
The following is a video demonstration of how to create a concept map using the WikiBrowser Service:
The Concept Map Architecture
Let’s review the architecture of our concept map:
Neo4j – and some short URL services with Bitly.
The education services are in the same domain as the HTML5 app, so there are no cross-origin problems. We also have approximately 20 microservices on the front end that provide simplicity and location independence, and the different end points are actually the microservices themselves.
In the following example, each time I search for “earth,” the system will perform an article search by going to the Wikipedia API end point and looking up its Wikidata Q item:
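As a rough illustration of that lookup, the sketch below builds a request URL for the public MediaWiki Action API, which can return an article's Wikidata Q item through the `wikibase_item` page property. This is not code from the WikiBrowser Service: the class and method names are hypothetical, and the app's own end point may wrap the call differently.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not taken from the WikiBrowser Service source.
class ArticleSearchUrl {

    // Builds a MediaWiki Action API URL that asks for the Wikidata Q item
    // (the "wikibase_item" page property) of a Wikipedia article title.
    // "lang" is the Wikipedia language subdomain, for example "en".
    static String forTitle(String lang, String title) {
        try {
            String encoded = URLEncoder.encode(title, "UTF-8");
            return "https://" + lang + ".wikipedia.org/w/api.php"
                    + "?action=query&prop=pageprops&ppprop=wikibase_item"
                    + "&format=json&titles=" + encoded;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this cannot happen.
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `ArticleSearchUrl.forTitle("en", "Earth")` yields a URL whose JSON response carries the Q item under `pageprops`. Fetching and parsing the response is left out to keep the sketch short.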
When I pin the items to the graph, we use the graph end point – giving it a list of Q items that are in Wikidata – send that to the Neo4j end point and then render the results:
The Neo4j Cypher query then is very simple:
It’s a match on Item-labeled nodes A and B where the item ID of A is IN this list of items, and the item ID of B is in the same list. Next we do an optional match and then return A, B and the collected REL, which returns all of the nodes and relationships for our concept map.
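The query on the original slide is not reproduced here. A minimal sketch of a query with that shape, held as a Java constant together with its parameter map, might look like the following. The `Item` label, the `itemId` property, the `$itemIds` parameter and the class name are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical holder for the concept map query; names are assumptions.
class ConceptMapQuery {

    // Match every pair of pinned items, optionally pick up the relationships
    // between them, and return both nodes with the collected relationships.
    static final String CYPHER =
            "MATCH (a:Item), (b:Item) "
          + "WHERE a.itemId IN $itemIds AND b.itemId IN $itemIds "
          + "OPTIONAL MATCH (a)-[rel]->(b) "
          + "RETURN a, b, collect(rel)";

    // Wraps the list of Q items in the parameter map the query expects.
    static Map<String, Object> params(List<String> qItems) {
        return Collections.<String, Object>singletonMap("itemIds", qItems);
    }
}
```

Passing the Q items as a parameter rather than concatenating them into the query text lets Neo4j reuse the query plan and avoids injection problems.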
The following slide shows how to annotate Java with Spring annotations and set up a REST service by specifying the name of the end point, its parameters and then returning it:
The way to represent the items that go over the wire is in Plain Old Java Objects, or POJOs:
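The slide itself is not reproduced here, so the following is only a sketch of what such a POJO could look like. The class name and its two fields are hypothetical; the real objects in the WikiBrowser Service may carry different data.

```java
// Hypothetical wire object; the field names are assumptions for illustration.
class ItemInfo {
    private String itemId; // Wikidata Q item, for example "Q2"
    private String label;  // label in the requested language

    // No-argument constructor so that a JSON mapper can instantiate the class.
    public ItemInfo() { }

    public ItemInfo(String itemId, String label) {
        this.itemId = itemId;
        this.label = label;
    }

    public String getItemId() { return itemId; }
    public void setItemId(String itemId) { this.itemId = itemId; }

    public String getLabel() { return label; }
    public void setLabel(String label) { this.label = label; }
}
```

A class like this has no framework dependencies, which is the point of a POJO: JSON libraries commonly used with Spring can serialize it through its getters and setters.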
WikiBrowser Use Cases
The following are a series of WikiBrowser use cases. In this example, we’re going to change the language of our searches and results from English to Chinese. For each Q item – article name – Wikidata has several language translations. Not only does it provide semantic structure; it also provides the language for the different labels:
Now let’s go through breadth-first search using bladder cancer research, which includes scientific information such as genes, chromosomes and other related data:
We’ll do a breadth-first search to one level on genetic association, which will pull up those results along with the associated genes. Next we’ll select the PSCA gene, find its chromosome and pin this in our browser tool. Now that we have that chromosome, we’ll do another breadth-first search to infinite levels on “follows,” which shows us which chromosomes this particular chromosome follows and is followed by.
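To make the idea of a level-limited breadth-first search concrete, here is a small in-memory sketch. It is not how the WikiBrowser Service does it (the app queries Neo4j); the `Edge` type, the method name and the sample node names in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// In-memory illustration only; all names here are assumptions.
class BreadthFirstSearch {

    static final class Edge {
        final String from, type, to;
        Edge(String from, String type, String to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    // Returns the start node plus every node reachable within maxLevels hops,
    // following only edges of the given type, in either direction.
    // Pass Integer.MAX_VALUE for an unbounded ("infinite levels") search.
    static Set<String> search(List<Edge> edges, String start, String type, int maxLevels) {
        Set<String> visited = new LinkedHashSet<>();
        visited.add(start);
        List<String> frontier = Collections.singletonList(start);
        for (int level = 0; level < maxLevels && !frontier.isEmpty(); level++) {
            List<String> next = new ArrayList<>();
            for (String node : frontier) {
                for (Edge e : edges) {
                    if (!e.type.equals(type)) continue;
                    String neighbor = e.from.equals(node) ? e.to
                                    : e.to.equals(node) ? e.from : null;
                    // visited.add returns false for nodes already seen.
                    if (neighbor != null && visited.add(neighbor)) next.add(neighbor);
                }
            }
            frontier = next;
        }
        return visited;
    }
}
```

With edges `chr2 FOLLOWS chr1`, `chr3 FOLLOWS chr2` and `chr4 FOLLOWS chr3`, a one-level search from `chr2` reaches `chr1` and `chr3`, while an unbounded search also reaches `chr4`.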
Another use case is for “items in common”:
Let’s use soccer as an example. We’ll pin Lionel Messi and Cristiano Ronaldo. Then we’re going to ask, “What do Cristiano Ronaldo and Lionel Messi have in common?” We select Cristiano Ronaldo, say “in common” and then click Lionel Messi, which will give us the shortest path – with no more than two hops. It will also show us additional information about both footballers, such as that they were both in the 2010 FIFA World Cup.
Below is the Neo4j Cypher query for finding shortest paths:
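The query from the slide is not reproduced here. As a hedged sketch of what a bounded shortest-path query can look like, the helper below builds Cypher that uses the built-in `allShortestPaths` function with an upper bound on hops. The `Item` label, the `itemId` property, the parameter names and the class name are assumptions, not the app's actual query.

```java
// Hypothetical query builder; label, property and parameter names are assumptions.
class InCommonQuery {

    // Cypher does not accept a parameter for the hop bound of a variable-length
    // pattern, so the bound is validated and written into the query text.
    static String build(int maxHops) {
        if (maxHops < 1) {
            throw new IllegalArgumentException("maxHops must be at least 1");
        }
        return "MATCH (a:Item {itemId: $fromId}), (b:Item {itemId: $toId}) "
             + "MATCH p = allShortestPaths((a)-[*.." + maxHops + "]-(b)) "
             + "RETURN p";
    }
}
```

`InCommonQuery.build(2)` matches the "no more than two hops" limit described above: with two hops, every node in the middle of a returned path is something the two pinned items share.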
Our next use case is Navigate to Root. The Wikidata information is hierarchical in nature. We’re going to take Cristiano Ronaldo and do a root path, which will show us that Cristiano is human, a subclass of person, subclass of subject and subclass of entity.
Below is the Cypher query for Navigate to Root. We’re doing “all shortest paths” from the item that we selected, up to entity. But we’re only going to get results that are subclass of, whole part, part of, and instance of:
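The slide with the actual query is not reproduced here. The sketch below shows one way to restrict `allShortestPaths` to a set of relationship types. The relationship type names passed in, the `Item` label, the `itemId` property, the parameter names and the undirected pattern are all assumptions.

```java
import java.util.List;

// Hypothetical query builder; names and pattern direction are assumptions.
class RootPathQuery {

    // Builds a query that finds all shortest paths from the selected item to the
    // root item, traversing only the given relationship types.
    static String build(List<String> relationshipTypes) {
        if (relationshipTypes.isEmpty()) {
            throw new IllegalArgumentException("at least one relationship type is required");
        }
        for (String type : relationshipTypes) {
            // Types are written into the query text, so accept only plain identifiers.
            if (!type.matches("[A-Z_]+")) {
                throw new IllegalArgumentException("unexpected relationship type: " + type);
            }
        }
        String types = String.join("|", relationshipTypes);
        return "MATCH (item:Item {itemId: $itemId}), (root:Item {itemId: $rootId}) "
             + "MATCH p = allShortestPaths((item)-[:" + types + "*]-(root)) "
             + "RETURN p";
    }
}
```

The selected item and the root (the "entity" item) are passed as parameters, so the same query text works for any starting point.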
Next we have Degrees of Separation, which we can demonstrate by looking at the footballers Eric Cantona and Harry Kane:
Using our open-source app, we are going to find an undirected shortest path between Harry Kane and Eric Cantona. What we will immediately find is that the two are related through teams, kind of like Kevin Bacon is related to other actors through movies. We see that Eric Cantona played for FC Barcelona with Gary Lineker, who played on another team with Harry Kane.
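An undirected shortest path of this kind can be illustrated with a plain breadth-first search over an in-memory graph. This is a sketch of the general technique, not the app's implementation (which asks Neo4j for the path); the class, the method names and the placeholder node names in the usage note are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// In-memory illustration only; all names here are assumptions.
class DegreesOfSeparation {

    // Registers an edge in both directions, which makes the graph undirected.
    static void link(Map<String, List<String>> adjacency, String a, String b) {
        adjacency.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Breadth-first search that returns one shortest path from "from" to "to",
    // or an empty list when the two nodes are not connected.
    static List<String> shortestPath(Map<String, List<String>> adjacency, String from, String to) {
        Map<String, String> cameFrom = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        cameFrom.put(from, null);
        queue.add(from);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            if (node.equals(to)) {
                // Walk the predecessor links back to the start to rebuild the path.
                LinkedList<String> path = new LinkedList<>();
                for (String n = to; n != null; n = cameFrom.get(n)) {
                    path.addFirst(n);
                }
                return path;
            }
            for (String next : adjacency.getOrDefault(node, Collections.<String>emptyList())) {
                if (!cameFrom.containsKey(next)) {
                    cameFrom.put(next, node);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

If `playerA` and `playerB` are both linked to `team1`, and `playerB` and `playerC` are both linked to `team2`, the path from `playerA` to `playerC` runs through `team1`, `playerB` and `team2`, which mirrors the player-team-player chain described above.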
Since this presentation is in London, let’s also find the shortest path between members of British royalty: Prince Harry and Queen Victoria. We’ll start with Queen Victoria and do a shortest path forward through “child” relationships to Prince Harry. I took a detour in my search to see who followed George V, which includes sons Edward VIII and George VI. Edward VIII actually abdicated his throne to marry an American, Wallis Simpson, who had been divorced twice.
I’m out of time, but I hope you enjoyed the demo. Let me know what you think of it!
Inspired by Jim’s talk? Click below to register for GraphConnect Europe on 11 May 2017 at the QEII Centre in London and attend even more presentations, talks and workshops from the world’s leading graph technology experts.
Register for GraphConnect
About the Author
James Weaver, Consultant Technologist, Pivotal
James Weaver is a Consultant Technologist at Pivotal with a passion for Java. He is a Java developer and author of several books on the subject, and can frequently be found speaking internationally at software technology conferences.