GraphGists

or 'How a sat-nav could use a graph database'

My talk on this gist is available to watch online at SkillsMatter https://bit.ly/neo4jrna

Inspiration Top

Navigation has nothing to do with my day job, but I regularly use satellite navigation. This gist has been inspired by Waze, a satellite navigation system, which allows users to edit the maps. It is easy to see a graph in the structure of the map, but how could it work?

I decided to create a gist that could model a map and possible ways in which the data stored could be queried to allow someone to navigate that map. The area to be mapped will be restricted for the purposes of simplicity and the capabilities of the gist processing.

To this end, this gist will create a representation of part of the M3 and M25 motorways (in the UK) and some of the connected towns.

Data Top

There are many sources of data on the internet for a range of gist applications. It is worth looking at various sources to see which fits your requirements the best, or combine different sources. It is possible to import data from sources such as csv files with various available tools. Rik Van Bruggen’s blog (https://blog.bruggen.com/) is a good source for tutorials on how to do this.

Sources Top

I originally started out using Wikipedia and Google Maps as my data sources, but then discovered that the editable Waze map had the distance information I needed, which I had previously been guestimating from Wikipedia and Google Maps combined.

With regard to speed data it seems that Waze do not store this, or do not make it available to users. I have used a combination of informed guesses (M roads are typically 3 lanes and 70mph) and Google Maps Streetview, to look at speed signs, and enter the data into my road relationships..

Waze may also calculate the speed by using users' covered distance and time, or they may not use it and just estimate time from past data collected from users.

Model Top

To begin with the model was fairly simple, with Locations and Junctions represented as nodes and the roads as relationships, but this evolved over time into Junctions being represented as a series of Exits and Entrances connected by roads.

It then evolved again to include locations also, with various properties including post code and house number. This is modelled in the diagram below, using only two entrances and exits (effectively only two roads).

But this was not good enough. It did not allow for slip roads or carriageways on motorways (where the driver effectively does not leave the motorway). Every entrance needed to be connected to every exit to allow for this. So the junction model again evolved into the following (properties and locations removed for clarity):

Graph Population Top

Click the + to see the full set-up Cypher script.

Graph Visualisation Top

This shows the results of the above set-up script. For more info on graph visualisation see (link to skillz matter)

Applications

This graph model allows queries to extract data, and also to calculate navigation.

Simple Navigation Top

I want to travel to TW16 5AD, what are the address options? Top

A user may enter 'tw16 5ad' or 'TW16 5AD' or 'tw165ad' or 'TW165AD' or anything like 'Tw16 5aD'. The app would need to take this into account. The easiest way would be to strip the input of spaces before submitting it in the query, and having the postcode stored without spaces in the database. Case could also be taken into account in the app (e.g. converting to lower if that is how it is stored in the database), but I have taken care of it in a regex here.

In an application the user could first be given a list of road names and then be prompted to choose the number.

MATCH (l:Location)
WHERE l.postcode =~ '(?i)tW16 5aD'
RETURN l.number AS number, l.street AS street, l.local AS local, l.postcode AS postcode
ORDER BY l.number

From which junctions can I get to Chertsey? Top

Note: If junction names below 10 were stored with a leading 0 (e.g. J09) I could order by j.name instead of number.

MATCH (j:Junction)-[r*2]-(l:Location)
WHERE l.local = 'Chertsey'
WITH j
order by j.number
RETURN collect(distinct j.name) as junctions

Complex Navigation Top

What is the shortest route from KT15 2QH to TW16 5AD? Top

This query calculates the shortest route between the two specified nodes. This is the shortest route based upon the number of traversals in the graph, and not upon the distance property on the relationships being traversed. This is the shortest route in the graph, not in reality. The next question deals with actual distance.

Because there is more than one possible start and end node I have used LENGTH(), ORDER BY and LIMIT to return just the shortest route (calculated by the number of traversals). I did this before the RETURN statement, so that I did not have to return the count, which I would have to have done if ordering by count after the RETURN statement.

MATCH p=shortestPath((a:Location)-[r*]->(b:Location))
WHERE a.postcode = 'KT15 2QH'
AND b.postcode = 'TW16 5AD'
WITH p,relationships(p) AS r, length(relationships(p)) AS count
ORDER BY count
LIMIT 1
RETURN p AS route

How far is it from No.49 KT15 2QH to No.125 TW16 5AD? Top

This query calculates the distance from the two specified nodes by adding the values in the distance properties on all the relationships in the shortest path. The shortest path is calculated by the number of traversals to get from one node to the other. This may not actually be the shortest path if it were calculated by adding the values of the distance properties on all the possible paths between these two nodes. This is possible to do, but not in the Neo4j Gist system. The following code would work in a normal Neo4j database.

MATCH (a:Location)-[roads*]->(b:Location)
WHERE a.postcode = 'KT15 2QH' AND a.number = 49
AND b.postcode = 'TW16 5AD' AND b.number = 125
WITH reduce(d=0, r in roads | d + r.km) as km
RETURN km, round(km) as rounded
ORDER BY km DESC
LIMIT 1
MATCH p=shortestPath((a:Location)-[r*]->(b:Location))
WHERE a.postcode = 'KT15 2QH' AND a.number = 49
AND b.postcode = 'TW16 5AD' AND b.number = 125
WITH p, length(relationships(p)) AS count
ORDER BY count
LIMIT 1
WITH relationships(p) AS roads
WITH reduce(d=0, r in roads | d + r.km) as km
RETURN km, round(km) as rounded

Note on decimals: Adding the floating point distance values gives us quite a few decimal places. This can be rounded in the application that implements this query. Cypher does provide round(), floor() and ceiling(), but these all round to an integer (no decimal places) and the number of decimal places cannot be specified.

What is the average speed limit between Chertsey and Sunbury-on-Thames? Top

This will calculate the average speed, in kph, based on the number of roads and their speeds. It does not take into account the length of those roads. Please see the following query for a weighted average speed.

MATCH p=shortestPath((a:Location)-[r*]-(b:Location))
WHERE a.local = 'Chertsey' AND b.local = 'Sunbury-on-Thames'
WITH relationships(p) AS roads
ORDER BY length(roads)
LIMIT 1
WITH roads, length(roads) AS l, reduce(d=0, r in roads | d + r.speed) as total
RETURN total / l AS average_kph

What is the weighted average speed? Top

All roads (relationships) are not the same length, we must therefore take that into account when calculating the average speed for this route. A weighted average speed takes into account the speed, distance and time.

MATCH p=shortestPath((a:Location)-[r*]-(b:Location))
WHERE a.local = 'Chertsey' AND b.local = 'Sunbury-on-Thames'
WITH relationships(p) AS roads
ORDER BY length(roads)
LIMIT 1
WITH reduce(d=0, r in roads | d + (r.speed*r.km)) as weighted, roads
WITH reduce(d=0, r in roads | d + (r.km)) as totalkm, weighted
WITH weighted / totalkm AS average_kph
RETURN average_kph, round(average_kph) AS rounded_average_kph, average_kph * 0.621371 as mph, round(average_kph) * 0.621371 as rounded_mph

How long would it take to drive from 48 Addlestone Moor London Road to 123 Staines Road? Top

This assumes that you drive at the maximum speed the whole time, but in an application this would need to be adjusted. This could possibly be done by assuming 60mph in a 70mph limit, or 25mph in a 30mph limit. Some sat-nav systems let the user define these numbers.

MATCH p=shortestPath((a:Location)-[r*]-(b:Location))
WHERE a.number = 48 AND a.street = 'Addlestone Moor'
AND b.number = 123 AND b.street = 'Staines Road'
WITH relationships(p) AS roads
WITH reduce(d=0.0, r in roads | d + ((r.km*0.62)/r.speed)) as t
WITH floor(t) AS hours, t
WITH (t-hours)*60 AS minutes, hours
RETURN round(hours) + ':' + (CASE WHEN minutes < 10 THEN '0' ELSE '' END) + round(minutes) as time

What about at peak time? Top

This takes into account the fact that most roads will be slower at peak times due to the number of cars on the road. The application would have peak times defined and so would know which query to run.

MATCH p=shortestPath((a:Location)-[r*]-(b:Location))
WHERE a.number = 48 AND a.street = 'Addlestone Moor'
AND b.number = 123 AND b.street = 'Staines Road'
WITH relationships(p) AS roads
WITH reduce(d=0.0, r in roads | d + ((r.km*0.62)/r.peak)) as t
WITH floor(t) AS hours, t
WITH (t-hours)*60 AS minutes, hours
RETURN round(hours) + ':' + (CASE WHEN minutes < 10 THEN '0' ELSE '' END) + round(minutes) as time

What if there is an accident or other delay? Top

This is off-peak, to change to peak time simply change r.speed to r.peak. The application would have peak times defined and so would know which query to run. Takes into account one-digit minutes. Explain what delay has been added and how this is taken into account. Link to junction delays in Future section if put there.

MATCH p=shortestPath((a:Location)-[r*]-(b:Location))
WHERE a.number = 48 AND a.street = 'Addlestone Moor'
AND b.number = 123 AND b.street = 'Staines Road'
WITH relationships(p) AS roads, nodes(p) AS junc
WITH junc, reduce(d=0.0, r in roads | d + (CASE WHEN has(r.delay)
											  THEN ((r.km*0.62)/r.delay)
											  ELSE ((r.km*0.62)/r.speed) END)) as t
WITH floor(t) AS hours, t
WITH (t-hours)*60 AS minutes, hours
RETURN round(hours) + ':' + (CASE WHEN minutes < 10
								  THEN '0'
								  ELSE '' END) + round(minutes) as revised_time

Console

Click the green play button on query one to set up the database for the console, and have a go with your own queries. Back to top