GraphGists

Wikigame logo

The Wikipedia Game, or any of its variants like "Wikirace", "Wikispeedia", "Wiki-Link Game", or the allegedly original "Clicks to Hitler", is a game played on Wikipedia. It is mostly enjoyed by students in boring lectures all over the world and everyone who likes to have a little bit of fun with the magnificent chaotic beauty that is Wikipedia.

In this gist, I propose a concept to use a Neo4j server as an engine for this game. To actually build this, you’ll also need to provide a frontend that can provide some game screens, the wikipedia pages and execute cypher queries, described in the following parts.

Game rules

Players start on a randomly selected Wikipedia page and must navigate to another given page (home) only by clicking links in the current article. The goal is to arrive at the home with the fewest number of clicks (steps) and in the least amount of time.

Based on the number of steps and the time, points will be awarded to the player. A possible formula for that could look like this:

wikiPoints

Thus a possible result of a played round would look like this:

wikiplayer

OPTIMALBOT is given for comparison and shows the parameters for maximum points in this round. I also suggest to use some maximum time limit, just to keep the game fair.

Data model: A game powered by Neo4j

Links

As shown in this image, the data model needed for the game is most simplistic. To find the minimal path between two article pages, we just need to consider pages and links between them. We don’t need any more information about the pages, other than the title to identify the pages.

Luckily, Mirko Nasato provided the graphipedia project to generate a Neo4j database out of a Wikipedia dump file, exactly fitting the requested data model.

Import the graph

Generated from a Wikipedia dump with graphipedia, this is an example (sub)graph of the English Wikipedia. In order to keep this example at a manageable size (for a gist), I chose to limit the amount of nodes and relationships in the example.

Note that the Wikipedia dump needs to correspond to the version of Wikipedia you want to play on. Otherwise there could be missing or additional links/pages on the live version that your system doesn’t know about. If you choose to play on an older version of the dump, you might need to host that version yourself.

So, now that we have the graph, lets look at how we can use it:

Selecting Start and Home

First, we need two pages, selected at random: a start and a home. A simple query like the following will provide us with two pages.

MATCH (node:Page)
WITH node, rand() AS number
RETURN node.title as Pages
ORDER BY number
LIMIT 2

Calculating optimal paths

In order to check if a path between the both pages exists (if not, take two new random pages) and, more importantly, evaluate the performance of a player, we need to know the length of a shortest path from the random start page to home. With cypher, this is pretty easy. For example, lets find all shortest paths from "Angela Merkel" to "Apple Inc." on the given graph:

MATCH paths = allShortestPaths((a:Page { title:'Angela Merkel' })-[:Link*]->(b:Page { title:'Apple Inc.' }))
RETURN paths

As we see, there are two equally minimal paths with a lengh of 4 steps. Finding any of them would be awarded with the maximum number of points for steps (50). Thus, we don’t really need all paths. Any shortest path is enough to compare against player results.

So let us see a minimal path from "Doctor Who" to "Microsoft" and for convenience, print out the number of steps this path requires:

MATCH path = shortestPath((a:Page { title:'Doctor Who' })-[:Link*]->(b:Page { title:'Microsoft' }))
RETURN path, length(path) as Steps

Adding Rules

Adding additional rules is quite simple too. For example, if we want to add a rule like "Do NOT use countries" there are two ways. The first is to enrich the dataset, so that nodes that represent countries will get a label "Country". Those nodes can then either be removed from the databse or the queries need to be changed to ignore those country pages. The second is to remove them manually, which can be done fairly quickly for countries but might be a task you want to automate, if it’s things like cities you want to remove from the game.

Cheating detection

If we track the players steps, we can detect rule breaking attempts. Just match any path of the player against the graph and see if there truly is a path between the nodes. So, if a user tried to take an "illegal shortcut" like going directly from "ASP.NET" to ".NET", a query that checks his path like

OPTIONAL MATCH path = (a:Page { title:'ASP.NET' })-[:Link]->(b:Page { title:'.NET' })
RETURN
CASE
WHEN path is null THEN false
ELSE true
END AS ValidPath

will discover a cheating player, while valid steps like going from ".NET Framework" to "C++" can be verified:

OPTIONAL MATCH path = (a:Page { title:'.NET Framework' })-[:Link]->(b:Page { title:'C++' })
RETURN
CASE
WHEN path is null THEN false
ELSE true
END AS ValidPath

About

Created by Sascha Peukert

"Wikipedia-logo-v2", Author: version 1 by Nohat (concept by Paullusmagnus); Wikimedia. Licensed under CC BY-SA 3.0