Finding the shortest path through the park (with graphs!)

Regional VP & Neo4j Advocate

August 19, 2013

7 min read

Finding the shortest path through the park (i.e. orienteering) via a graph algorithm

As you may know by now, I like beer. A lot. Why else would I keep writing and talking about it?

But there’s more to life than sweet beverages, and one of the things that I have been doing for as long as I can remember is orienteering. I have been practicing the sport in Belgium since 1984 – I was 11 years old. My dad used to take me to races all across the continent – we truly had a blast. And we still do: I still orienteer almost every week, and so does my dad. Now I take my 8 and 10-year old kids with me to the races, and their granddad cheers them on every step of the way. It’s a fantastic family sport.

One of the reasons why it is so fantastic is that orienteering is a “thinking sport.” You have to concentrate to navigate. You have to run to have the best time (it’s a race), but if you run too fast, you are sure to make navigation mistakes. You have to find the balance between physical and mental fitness – which is difficult, but completely awesome when you succeed. And it’s outdoors in the woods and fields, so what’s not to like?

XrXYFfsENq_5EpcOtyGN1UU7Z3RGhYdqfnWAjEafGjmmh0nCYdu8-LfImLcm7c3bjRIX8wJUnYaIpKFoAoMO3igUiVre3HKoYkqG5fHVkdmBfudfT7qlF7QbeA

So what does that have to do with Neo4j? Well, orienteering is all about finding the shortest path: the *fastest* route from start to finish. Fast can be short. Fast can also mean that it is better to take a detour: if it is easier to run the longer route, than to walk the shorter route, you are better off choosing the longer route.

In essence, every orienteering race is … a graph problem waiting to be solved in the middle of nature.

Orienteering: A green graph problem

In case you don’t know, orienteering races are a bit like an obstacle race. Every participant gets assigned a course, out there in the green forests and fields, and along that course are sequences of beacons that one needs to get to in a particular order. Such a sequence is … a path on a graph. You have to choose how to navigate from obstacle to obstacle, from node to node.

Essentially, the orienteer has to navigate and choose the fastest route. Finding the fastest route effectively this boils down to a weighted shortest path calculation. You calculate the shortest path using two criteria as your parameters:

distance: shorter = better
runnability: higher = better. Runnability can be affected by the type of terrain (running on a road? Through a field? Through a forest? Through a forest with soil covered with plants? Over a hill? Through a valley? …)

For every leg of the race, you estimate the presumable “best route” based on the assumption that distance / runnability will be the best indicator for your likely speed.

A two control race in Antwerp, Belgium

This is the map of a training race that I did with my kids in a beautiful Antwerp park.

O5DDYqOsZzO5LviLCefHFQsTRTP6MdGxtJY1eZ8QMiJVwSVNCuv7a6cMdE7NJa4QqiKkQBF-d_JVrpHAv3q_DTTJI-GbDhiQ1hcrMA48GLlMxRloGXUQMA_9nA

As you can see, the race assignment itself is a graph.

BR6XWd4ZbbfPu9Bm2c-IFYWdhzACwwxLJOS3ZpAR7gmUSZE6ldzwwlcp4GnR9YR2cwdNT6AuXiUESf_B5YQOy4BEDYgpEKtBRCMCbkBOwc9Q9GpriAklzO9pqg

3HUrk5oyc0wp8y8ZLnE5CzBDnf5icOeoDdBAACEK36BL09oOydbh6h9cgzAHHre9A3MUBHo1dOJXRsDyjaXxBDdgEK1mLdr6C7uqmfIx19D4gPFQboIbHwuY-A

If we then look at every leg separately, you can see that for every leg, there are a number of route options.


The red route is the safe choice – running along the roads – but takes quite a detour. The blue route cuts straight across the field – but then requires me to go straight through the forest for a short distance.	The red route is the shortest – but requires me to run straight through the forest. The blue route just races along the forest road.	The red route just goes straight to the finish line. The blue route cuts through the forest and then follows the road. The green route safely hurries along the roads.

So there are three controls and different routes with different characteristics. As you can see in the schematic representations, every route has different “waypoints” – specific points of interest that I can identify on the map, and recognize in the “field.” These waypoints are extremely important for the navigation exercise that we are doing because they allow us to break the problem up in smaller pieces and evaluate our options.

Intuitively, all of us will have a “feeling” about what would be the best route choice, but now let’s use graph algorithms to solve this problem for us!

Graph database model to navigate

In order to apply a graph algorithm, we first need to create a graph. These are my nodes:

Control nodes: the race beacons that I have to pass by
The alternative route choices, decomposed in waypoints.

4c3nh8WwVw9Xrd1VWJJyB6i8OuiJYjTzSNyY3LJ2VIRV2S4r4a-wPSJYvfuw6JgAt3fjnTUZgtigsD8DC96N_UHRqjmbfwSkuJ9tEdV0n80Hs3KPFXrUJZOpIw

Then, let’s create the relationships between these nodes. We will have COURSE_TO relationships between controls, and NAVIGATE_TO relationships between waypoints. Effectively, these will become “paths” on my graph, hopping from node to node along the relationships.

From the start to control 1, I have 0->0.11->0.12->0.13->0.14->0.15 as one route and 0->0.21->0.22->1 as another route
From control 1 to control 2, I have 1->1.11->1.12->1.13->2 as one route option, and 1->0.15->1.21->2 as another option.
From control 2 to the finish, I have three options: 2->2.11->3 and 2->2.21->2.22->3 and 2->1.21->2.31->2.32->3.

WjMFfEPfivN5OvnGV1C_iqt3HwRi7rDY_e4QvSwNdeIu4T-TGBWMOY5rrXFcqmCpFNGJO_y81r30wgNJFvFwarUg4P1h2VLcnJ97Me4p0lBNG2PHxS-mFlv6lA

As you can see, I have immediately added distance (in meters) and runnability (in %) properties to my relationships.

When I then generate a Neo4j graph database using the spreadsheet method, I get a nice little database – ready to be queried and ready for my graph algorithms.

a8lTQ9m4XqyM0w4YyEi0mnzMynQbwDwC_iF33BwISvwbsLJYTm1PE5hCwZz1YT6tRlRzDi0Te1T0bdT3QwDQw5OuTIIwzaItWymsJtXMhb7QomeRgCTSPx0wRg

Graphs algorithms to win the race!

In order to calculate the best route to win the race, I need to calculate the shortest path across the graph – which is standard functionality of Neo4j. But because there’s more to it than running the course in straight lines between controls, I need to incorporate weights (distance, runnability) to get a realistic estimate of what would be the best route choice. To do so I am using a technique so well demonstrated by Ian Robinson on his blog last June.

Let’s go through two versions of this calculation:

Find the shortest path by distance only:

To do this, we will be using REDUCE to sum the distance properties.

START  startNode=node:node_auto_index(name="Start"),
             endNode=node:node_auto_index(name="Finish")
MATCH  p=(startNode)-[:NAVIGATE_TO*]->(endNode)
RETURN p AS shortestPath,
                reduce(distance=0, r in relationships(p) :  distance+r.distance) AS totalDistance
                ORDER BY totalDistance ASC
                LIMIT 1;

l8JRBbJ1bSoSBS0SncF6wOKZ3Vb1T6t5oRqRfe2QYVOnPas9XqgR00NTmaVDgcGNPRKuNFBG6qkIV_I_P8XPwoJLl32GMCqmaVxsr9zeoMqVpbbIg5GTa-kVPg

Find the best estimate of the fastest path, as a function of distance/runnability. In a real race this would probably be the route that I would choose – as it would give me the best chance of winning the race.

To do this, we will be using REDUCE to sum the distance divided by the runnability: longer distance with superior runnability, is possibly faster than shorter distance with lower runnability:

START   startNode=node:node_auto_index(name="Start"),
                 endNode=node:node_auto_index(name="Finish")
MATCH    p=(startNode)-[:NAVIGATE_TO*]->(endNode)
RETURN p AS shortestPath, 
                reduce(EstimatedTime=0, r in relationships(p) : EstimatedTime+( r.distance/r.runnability)) AS TotalEstimatedTime
               ORDER BY TotalEstimatedTime ASC
               LIMIT 1;

HB9VpUZcBq5dynaxQ22tZo1lXU3Pr01C4Ej0q-awYnaoFBg1i7O7guKEJhFaFoKXRpSAnhbnWERhfUEPPkg3DO4JOGVr5E2K_8-st5w6MW6l6_E4Xofwm9R6xQ

As you can see, the first and second queries take the same (shortest) path for start to control 1 and from control 2 to finish, but the algorithm recommends a clearly different path from control 1 to control 2 (following the forest road instead of cutting through the forest).