Spatial

The spatial procedures enable geographic capabilities on your data, and complement the spatial functions that come with Neo4j. More extensive Spatial functionality can be found in the Neo4j Spatial Library.

CALL apoc.spatial.geocode('address') YIELD location, latitude, longitude, description, osmData

look up geographic location of location from a geocoding service (the default one is OpenStreetMap)

CALL apoc.spatial.reverseGeocode(latitude,longitude) YIELD location, latitude, longitude, description

look up address from latitude and longitude from a geocoding service (the default one is OpenStreetMap)

CALL apoc.spatial.sortPathsByDistance(Collection<Path>) YIELD path, distance

sort a given collection of paths by geographic distance based on lat/long properties on the path nodes

Geocode

The geocode procedure converts a textual address into a location containing latitude, longitude and description. Despite being only a single function, together with the built-in functions point and distance we can achieve quite powerful results.

First, how can we use the procedure:

CALL apoc.spatial.geocodeOnce('21 rue Paul Bellamy 44000 NANTES FRANCE')
YIELD location
RETURN location.latitude, location.longitude
Table 1. Results
location.latitude location.longitude

47.2221667

-1.5566625

There are three forms of the procedure:

  • geocodeOnce(address) returns zero or one result.

  • geocode(address,maxResults) returns zero, one or more up to maxResults.

  • reverseGeocode(latitude,longitude) returns zero or one result.

This is because the backing geocoding service (OSM, Google, OpenCage or other) might return multiple results for the same query. GeocodeOnce() is designed to return the first, or highest ranking result.

The third procedure reverseGeocode will convert a location containing latitude and longitude into a textual address.

CALL apoc.spatial.reverseGeocode(47.2221667,-1.5566625) YIELD location
RETURN location.description;
Table 2. Results
location.description

"21, Rue Paul Bellamy, Talensac - Pont Morand, Hauts-Pavés - Saint-Félix, Nantes, Loire-Atlantique, Pays de la Loire, France métropolitaine, 44000, France"

Configuring Geocode

There are a few options that can be set in the apoc.conf file to control the service:

  • apoc.spatial.geocode.provider=osm (osm, google, opencage, etc.)

  • apoc.spatial.geocode.osm.throttle=5000 (ms to delay between queries to not overload OSM servers)

  • apoc.spatial.geocode.google.throttle=1 (ms to delay between queries to not overload Google servers)

  • apoc.spatial.geocode.google.key=xxxx (API key for google geocode access)

  • apoc.spatial.geocode.google.client=xxxx (client code for google geocode access)

  • apoc.spatial.geocode.google.signature=xxxx (client signature for google geocode access)

For google, you should use either a key or a combination of client and signature. Read more about this on the google page for geocode access at https://developers.google.com/maps/documentation/geocoding/get-api-key#key

Configuring Custom Geocode Provider

Geocode

For any provider that is not 'osm' or 'google' you get a configurable supplier that requires two additional settings, 'url' and 'key'. The 'url' must contain the two words 'PLACE' and 'KEY'. The 'KEY' will be replaced with the key you get from the provider when you register for the service. The 'PLACE' will be replaced with the address to geocode when the procedure is called.

Reverse Geocode

The 'url' must contain the three words 'LAT', 'LNG' and 'KEY'. The 'LAT' will be replaced with the latitude and 'LNG' will be replaced with the the longitude to reverse geocode when the procedure is called.

For example, to get the service working with OpenCage, perform the following steps:

apoc.spatial.geocode.provider=opencage
apoc.spatial.geocode.opencage.key=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
apoc.spatial.geocode.opencage.url=http://api.opencagedata.com/geocode/v1/json?q=PLACE&key=KEY
apoc.spatial.geocode.opencage.reverse.url=http://api.opencagedata.com/geocode/v1/json?q=LAT+LNG&key=KEY
  • make sure that the 'XXXXXXX' part above is replaced with your actual key

  • Restart the Neo4j server and then test the geocode procedures to see that they work

  • If you are unsure if the provider is correctly configured try verify with:

CALL apoc.spatial.showConfig()

Using Geocode within a bigger Cypher query

A more complex, or useful, example which geocodes addresses found in properties of nodes:

MATCH (a:Place)
WHERE exists(a.address)
CALL apoc.spatial.geocodeOnce(a.address) YIELD location
RETURN location.latitude AS latitude, location.longitude AS longitude, location.description AS description

Calculating distance between locations

If we wish to calculate the distance between addresses, we need to use the point() function to convert latitude and longitude to Cyper Point types, and then use the distance() function to calculate the distance:

WITH point({latitude: 48.8582532, longitude: 2.294287}) AS eiffel
MATCH (a:Place)
WHERE exists(a.address)
CALL apoc.spatial.geocodeOnce(a.address) YIELD location
WITH location, distance(point(location), eiffel) AS distance
WHERE distance < 5000
RETURN location.description AS description, distance
ORDER BY distance
LIMIT 100

sortByDistance

The second procedure enables you to sort a given collection of paths by the sum of their distance based on lat/long properties on the nodes.

Sample data :

CREATE (bruges:City {name:"bruges", latitude: 51.2605829, longitude: 3.0817189})
CREATE (brussels:City {name:"brussels", latitude: 50.854954, longitude: 4.3051786})
CREATE (paris:City {name:"paris", latitude: 48.8588376, longitude: 2.2773455})
CREATE (dresden:City {name:"dresden", latitude: 51.0767496, longitude: 13.6321595})
MERGE (bruges)-[:NEXT]->(brussels)
MERGE (brussels)-[:NEXT]->(dresden)
MERGE (brussels)-[:NEXT]->(paris)
MERGE (bruges)-[:NEXT]->(paris)
MERGE (paris)-[:NEXT]->(dresden)

Finding paths and sort them by distance

MATCH (a:City {name:'bruges'}), (b:City {name:'dresden'})
MATCH p=(a)-[*]->(b)
WITH collect(p) as paths
CALL apoc.spatial.sortByDistance(paths) YIELD path, distance
RETURN path, distance

Graph Refactoring

In order not to have to repeatedly geocode the same thing in multiple queries, especially if the database will be used by many people, it might be a good idea to persist the results in the database so that subsequent calls can use the saved results.

Geocode and persist the result

MATCH (a:Place)
WHERE exists(a.address) AND NOT exists(a.latitude)
WITH a LIMIT 1000
CALL apoc.spatial.geocodeOnce(a.address) YIELD location
SET a.latitude = location.latitude
SET a.longitude = location.longitude

Note that the above command only geocodes the first 1000 ‘Place’ nodes that have not already been geocoded. This query can be run multiple times until all places are geocoded. Why would we want to do this? Two good reasons:

  • The geocoding service is a public service that can throttle or blacklist sites that hit the service too heavily, so controlling how much we do is useful.

  • The transaction is updating the database, and it is wise not to update the database with too many things in the same transaction, to avoid using up too much memory. This trick will keep the memory usage very low.

Now make use of the results in distance queries

WITH point({latitude: 48.8582532, longitude: 2.294287}) AS eiffel
MATCH (a:Place)
WHERE exists(a.latitude) AND exists(a.longitude)
WITH a, distance(point(a), eiffel) AS distance
WHERE distance < 5000
RETURN a.name, distance
ORDER BY distance
LIMIT 100

Combining spatial and date-time functions can allow for more complex queries:

WITH point({latitude: 48.8582532, longitude: 2.294287}) AS eiffel
MATCH (e:Event)
WHERE exists(e.address) AND exists(e.datetime)
CALL apoc.spatial.geocodeOnce(e.address) YIELD location
WITH e, location,
distance(point(location), eiffel) AS distance,
            (apoc.date.parse('2016-06-01 00:00:00','h') - apoc.date.parse(e.datetime,'h'))/24.0 AS days_before_due
WHERE distance < 5000 AND days_before_due < 14 AND apoc.date.parse(e.datetime,'h') < apoc.date.parse('2016-06-01 00:00:00','h')
RETURN e.name AS event, e.datetime AS date,
location.description AS description, distance
ORDER BY distance