Spatial Procedures

The spatial procedures enable geographic capabilities on your data, and complement the spatial functions that come with Neo4j. More extensive spatial functionality can be found in the Neo4j Spatial Library.

Qualified Name Type

apoc.spatial.geocode
apoc.spatial.geocode(location STRING, maxResults INTEGER, quotaException BOOLEAN, config MAP<STRING, ANY>) - returns the geographic location (latitude, longitude, and description) of the given address using a geocoding service (default: OpenStreetMap).

Procedure

apoc.spatial.reverseGeocode
apoc.spatial.reverseGeocode(latitude FLOAT, longitude FLOAT, quotaException BOOLEAN, config MAP<STRING, ANY>) - returns a textual address from the given geographic location (latitude, longitude) using a geocoding service (default: OpenStreetMap). This procedure returns at most one result.

Procedure

apoc.spatial.sortByDistance
apoc.spatial.sortByDistance(paths LIST<PATH>) - sorts the given collection of PATH values by the sum of their distance based on the latitude/longitude values in the NODE values.

Procedure

Geocode

The geocode procedure converts a textual address into a location containing latitude, longitude and description. Although it is only a single procedure, combined with the built-in functions point() and point.distance() it can achieve quite powerful results.

First, here is how to use the procedure:

CALL apoc.spatial.geocodeOnce('21 rue Paul Bellamy 44000 NANTES FRANCE')
YIELD location
RETURN location.latitude, location.longitude

Table 1. Results
location.latitude    location.longitude
47.2221667           -1.5566625

There are three forms of the procedure:

  • geocodeOnce(address) returns zero or one result.

  • geocode(address,maxResults) returns zero or more results, up to maxResults.

  • reverseGeocode(latitude,longitude) returns zero or one result.

The different forms exist because the backing geocoding service (OSM, Google, OpenCage or other) might return multiple results for the same query. geocodeOnce() is designed to return the first, or highest ranking, result.

The third procedure reverseGeocode will convert a location containing latitude and longitude into a textual address.

CALL apoc.spatial.reverseGeocode(47.2221667,-1.5566625) YIELD location
RETURN location.description;
Table 2. Results
location.description

"21, Rue Paul Bellamy, Talensac - Pont Morand, Hauts-Pavés - Saint-Félix, Nantes, Loire-Atlantique, Pays de la Loire, France métropolitaine, 44000, France"

Configuring Geocode

There are a few options that control the service. They can be set in the apoc.conf file or passed via the config parameter (see the Configure via config parameter map section below).

In the apoc.conf we can pass:

  • apoc.spatial.geocode.provider=osm (osm, google, opencage, etc.)

  • apoc.spatial.geocode.osm.throttle=5000 (ms to delay between queries to not overload OSM servers)

  • apoc.spatial.geocode.google.throttle=1 (ms to delay between queries to not overload Google servers)

  • apoc.spatial.geocode.google.key=xxxx (API key for google geocode access)

  • apoc.spatial.geocode.google.client=xxxx (client code for google geocode access)

  • apoc.spatial.geocode.google.signature=xxxx (client signature for google geocode access)

For Google, you should use either a key or a combination of client and signature. Read more about this on the Google page for geocode access at https://developers.google.com/maps/documentation/geocoding/get-api-key#key
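
The throttle settings above describe a minimum delay, in milliseconds, between consecutive queries to a provider. The following Python sketch illustrates that idea only; it is not APOC's implementation, and the Throttle class and its clock and sleep parameters are hypothetical names introduced for this example.

```python
import time


class Throttle:
    """Enforce a minimum delay (in milliseconds) between consecutive calls."""

    def __init__(self, delay_ms, clock=time.monotonic, sleep=time.sleep):
        # clock and sleep are injectable so the behaviour can be checked
        # without actually waiting (hypothetical parameters for this sketch).
        self.delay_s = delay_ms / 1000.0
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time of the previous request, if any

    def wait(self):
        """Block until at least delay_ms has passed since the previous call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.delay_s - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Calling wait() before every request with Throttle(5000) would mirror a setting such as apoc.spatial.geocode.osm.throttle=5000: the first call returns immediately, and later calls pause only for whatever part of the five seconds has not yet elapsed.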

Configuring Custom Geocode Provider

Geocode

For any provider that is not 'osm' or 'google' you get a configurable supplier that requires two additional settings, 'url' and 'key'. The 'url' must contain the two words 'PLACE' and 'KEY'. The 'KEY' will be replaced with the key you get from the provider when you register for the service. The 'PLACE' will be replaced with the address to geocode when the procedure is called.

Reverse Geocode

The 'url' must contain the three words 'LAT', 'LNG' and 'KEY'. The 'LAT' will be replaced with the latitude and 'LNG' will be replaced with the longitude to reverse geocode when the procedure is called.

For example, to get the service working with OpenCage, add the following settings to apoc.conf and then perform the steps below:

apoc.spatial.geocode.provider=opencage
apoc.spatial.geocode.opencage.key=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
apoc.spatial.geocode.opencage.url=http://api.opencagedata.com/geocode/v1/json?q=PLACE&key=KEY
apoc.spatial.geocode.opencage.reverse.url=http://api.opencagedata.com/geocode/v1/json?q=LAT+LNG&key=KEY
  • make sure that the 'XXXXXXX' part above is replaced with your actual key

  • Restart the Neo4j server and then test the geocode procedures to see that they work
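
The placeholder substitution described for custom providers can be pictured as plain string replacement. The Python sketch below is an illustration under assumptions, not APOC's code: the function names are hypothetical, and URL-encoding the address with quote_plus is an assumption about how the place is made safe for a query string.

```python
from urllib.parse import quote_plus


def build_geocode_url(template, key, place):
    """Fill the KEY and PLACE placeholders of a geocode URL template."""
    # KEY is replaced first so that an address containing the text "KEY"
    # is not altered by the second replacement.
    return template.replace("KEY", key).replace("PLACE", quote_plus(place))


def build_reverse_url(template, key, latitude, longitude):
    """Fill the KEY, LAT and LNG placeholders of a reverse geocode URL template."""
    return (
        template.replace("KEY", key)
        .replace("LAT", str(latitude))
        .replace("LNG", str(longitude))
    )


geocode_template = "http://api.opencagedata.com/geocode/v1/json?q=PLACE&key=KEY"
reverse_template = "http://api.opencagedata.com/geocode/v1/json?q=LAT+LNG&key=KEY"

print(build_geocode_url(geocode_template, "abc123", "21 rue Paul Bellamy"))
print(build_reverse_url(reverse_template, "abc123", 47.2221667, -1.5566625))
```

Because the replacement is case-sensitive, the lowercase parameter name key= in the template is left untouched and only the uppercase placeholders change.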

Configure via config parameter map

Alternatively, we can pass a config map.
Note that these configs take precedence over the apoc.conf settings.
We can pass a provider key, which will be equivalent to apoc.spatial.geocode.provider setting key, and the other keys will be equivalent to apoc.spatial.geocode.<PROVIDER>.<KEY> settings.

For example:

CALL apoc.spatial.geocodeOnce('<MY_PLACE>', {
  provider: 'opencage',
  url: 'http://api.opencagedata.com/geocode/v1/json?q=PLACE&key=KEY',
  reverseUrl: 'http://api.opencagedata.com/geocode/v1/json?q=LAT+LNG&key=KEY',
  key: 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
})

is equivalent to these settings (note that camelCase keys are transformed to dot.case, e.g. reverseUrl becomes reverse.url):

apoc.spatial.geocode.provider=opencage
apoc.spatial.geocode.opencage.key=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
apoc.spatial.geocode.opencage.url=http://api.opencagedata.com/geocode/v1/json?q=PLACE&key=KEY
apoc.spatial.geocode.opencage.reverse.url=http://api.opencagedata.com/geocode/v1/json?q=LAT+LNG&key=KEY
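
The mapping from config map keys to setting names can be sketched in a few lines of Python. This is an illustration of the naming rule described above, not APOC's implementation; camel_to_dot and config_to_settings are hypothetical helper names.

```python
import re


def camel_to_dot(key):
    """Convert a camelCase key to dot.case, e.g. reverseUrl -> reverse.url."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", lambda m: "." + m.group(1).lower(), key)


def config_to_settings(config, default_provider="osm"):
    """Expand a config map into the equivalent apoc.spatial.geocode.* settings."""
    provider = config.get("provider", default_provider)
    settings = {"apoc.spatial.geocode.provider": provider}
    for name, value in config.items():
        if name == "provider":
            continue
        # Every other key becomes apoc.spatial.geocode.<PROVIDER>.<KEY>
        settings["apoc.spatial.geocode.%s.%s" % (provider, camel_to_dot(name))] = value
    return settings


example = config_to_settings({
    "provider": "opencage",
    "url": "http://api.opencagedata.com/geocode/v1/json?q=PLACE&key=KEY",
    "reverseUrl": "http://api.opencagedata.com/geocode/v1/json?q=LAT+LNG&key=KEY",
    "key": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
})
for name, value in example.items():
    print(name + "=" + value)
```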

If we don’t pass the provider via the config map, the value of the apoc.spatial.geocode.provider setting will be used; if that setting is not present either, the default 'osm' applies. For example:

/* apoc.conf
  ...
  apoc.spatial.geocode.provider=google
  ...
*/
CALL apoc.spatial.geocodeOnce('<MY_PLACE>', {key: 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'})

will pass a config like apoc.spatial.geocode.google.key=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.
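
The order in which the provider is chosen (config map first, then apoc.conf, then the default) can be summarised in a short Python sketch. The function resolve_provider and the dictionary standing in for apoc.conf are hypothetical and only illustrate the precedence described in this section.

```python
def resolve_provider(config, conf_settings):
    """Pick the geocode provider: config map, then apoc.conf, then 'osm'."""
    return (
        config.get("provider")
        or conf_settings.get("apoc.spatial.geocode.provider")
        or "osm"
    )


# A plain dict standing in for the contents of apoc.conf (assumption).
apoc_conf = {"apoc.spatial.geocode.provider": "google"}

print(resolve_provider({"key": "XXXX"}, apoc_conf))
print(resolve_provider({"provider": "opencage"}, apoc_conf))
print(resolve_provider({}, {}))
```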

Using Geocode within a bigger Cypher query

A more useful example geocodes addresses stored in node properties:

MATCH (a:Place)
WHERE a.address IS NOT NULL
CALL apoc.spatial.geocodeOnce(a.address) YIELD location
RETURN location.latitude AS latitude, location.longitude AS longitude, location.description AS description

Calculating distance between locations

If we wish to calculate the distance between addresses, we need to use the point() function to convert latitude and longitude to Cypher Point types, and then use the point.distance() function to calculate the distance:

WITH point({latitude: 48.8582532, longitude: 2.294287}) AS eiffel
MATCH (a:Place)
WHERE a.address IS NOT NULL
CALL apoc.spatial.geocodeOnce(a.address) YIELD location
WITH location, point.distance(point(location), eiffel) AS distance
WHERE distance < 5000
RETURN location.description AS description, distance
ORDER BY distance
LIMIT 100
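
To give a feel for the numbers such a distance query works with, here is a Python sketch of the haversine formula, which the Neo4j documentation names as the basis of point.distance() for two-dimensional WGS-84 points. The Earth radius constant is an assumption of this sketch, so results may differ slightly from what Neo4j returns.

```python
from math import asin, cos, radians, sin, sqrt

# Assumed mean Earth radius in meters; Neo4j may use a different constant.
EARTH_RADIUS_M = 6371000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude pairs."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


eiffel = (48.8582532, 2.294287)
nantes = (47.2221667, -1.5566625)  # the geocoded address from the first example
distance = haversine_m(eiffel[0], eiffel[1], nantes[0], nantes[1])

# The filter "distance < 5000" in the query keeps places within 5 km.
print(round(distance), distance < 5000)
```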

sortByDistance

The sortByDistance procedure sorts a given collection of PATH values by the sum of their distance, based on the latitude and longitude properties of the nodes.

Sample data:

CREATE (bruges:City {name:"bruges", latitude: 51.2605829, longitude: 3.0817189})
CREATE (brussels:City {name:"brussels", latitude: 50.854954, longitude: 4.3051786})
CREATE (paris:City {name:"paris", latitude: 48.8588376, longitude: 2.2773455})
CREATE (dresden:City {name:"dresden", latitude: 51.0767496, longitude: 13.6321595})
MERGE (bruges)-[:NEXT]->(brussels)
MERGE (brussels)-[:NEXT]->(dresden)
MERGE (brussels)-[:NEXT]->(paris)
MERGE (bruges)-[:NEXT]->(paris)
MERGE (paris)-[:NEXT]->(dresden)

Find paths and sort them by distance

MATCH (a:City {name:'bruges'}), (b:City {name:'dresden'})
MATCH p=(a)-[*]->(b)
WITH collect(p) as paths
CALL apoc.spatial.sortByDistance(paths) YIELD path, distance
RETURN path, distance
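
The idea of summing the distance of each leg of a path and then sorting can be sketched in Python using the sample cities. This is an illustration, not the procedure's implementation: the haversine formula, the Earth radius, the result in meters and the ascending sort order are all assumptions of the sketch, and the procedure's actual units and ordering should be checked against its output.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # assumed mean Earth radius in meters


def haversine_m(a, b):
    """Great-circle distance in meters between two city dictionaries."""
    phi1, phi2 = radians(a["latitude"]), radians(b["latitude"])
    dlambda = radians(b["longitude"] - a["longitude"])
    h = sin((phi2 - phi1) / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def path_distance(path):
    """Sum the distances between consecutive nodes of a path."""
    return sum(haversine_m(a, b) for a, b in zip(path, path[1:]))


def sort_by_distance(paths):
    """Return (path, distance) pairs ordered by total distance (ascending here)."""
    return sorted(((p, path_distance(p)) for p in paths), key=lambda pair: pair[1])


bruges = {"name": "bruges", "latitude": 51.2605829, "longitude": 3.0817189}
brussels = {"name": "brussels", "latitude": 50.854954, "longitude": 4.3051786}
paris = {"name": "paris", "latitude": 48.8588376, "longitude": 2.2773455}
dresden = {"name": "dresden", "latitude": 51.0767496, "longitude": 13.6321595}

# The three paths from bruges to dresden in the sample graph.
paths = [
    [bruges, brussels, paris, dresden],
    [bruges, paris, dresden],
    [bruges, brussels, dresden],
]
for path, distance in sort_by_distance(paths):
    print([city["name"] for city in path], round(distance))
```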

Graph Refactoring

In order not to have to repeatedly geocode the same thing in multiple queries, especially if the database will be used by many people, it might be a good idea to persist the results in the database so that subsequent calls can use the saved results.

Geocode and persist the result

MATCH (a:Place)
WHERE a.address IS NOT NULL AND a.latitude IS NULL
WITH a LIMIT 1000
CALL apoc.spatial.geocodeOnce(a.address) YIELD location
SET a.latitude = location.latitude
SET a.longitude = location.longitude

Note that the above command only geocodes the first 1000 ‘Place’ nodes that have not already been geocoded. This query can be run multiple times until all places are geocoded. Why would we want to do this? Two good reasons:

  • The geocoding service is a public service that can throttle or denylist sites that hit the service too heavily, so controlling how much we do is useful.

  • The transaction is updating the database, and it is wise not to update the database with too many things in the same transaction, to avoid using up too much memory. This trick will keep the memory usage very low.
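
The batching pattern above (process a limited number of ungeocoded places, then repeat until none remain) can be sketched in Python. Everything here is hypothetical scaffolding for illustration: places are plain dictionaries, and geocode is a stand-in callable rather than a real geocoding client.

```python
def geocode_in_batches(places, geocode, batch_size=1000):
    """Geocode places that lack a latitude, batch_size at a time.

    Returns the number of batches that updated at least one place.
    """
    batches = 0
    while True:
        # Mirrors: WHERE a.address IS NOT NULL AND a.latitude IS NULL ... LIMIT batch_size
        pending = [
            p for p in places
            if p.get("address") is not None and p.get("latitude") is None
        ][:batch_size]
        updated = 0
        for place in pending:
            location = geocode(place["address"])
            if location is not None:
                place["latitude"] = location["latitude"]
                place["longitude"] = location["longitude"]
                updated += 1
        # Stop when nothing is left, or when a batch made no progress
        # (for example because no remaining address can be resolved).
        if updated == 0:
            return batches
        batches += 1


def fake_geocode(address):
    """Stand-in for a geocoding call; always returns the same coordinates."""
    return {"latitude": 47.2221667, "longitude": -1.5566625}


places = [{"address": "address %d" % i} for i in range(2500)]
print(geocode_in_batches(places, fake_geocode, batch_size=1000))
```

One design point the sketch makes explicit: an address that never resolves stays ungeocoded, so a loop like this needs a stopping condition other than "no places left".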

Now make use of the results in distance queries

WITH point({latitude: 48.8582532, longitude: 2.294287}) AS eiffel
MATCH (a:Place)
WHERE a.latitude IS NOT NULL AND a.longitude IS NOT NULL
WITH a, point.distance(point(a), eiffel) AS distance
WHERE distance < 5000
RETURN a.name, distance
ORDER BY distance
LIMIT 100

Combining spatial and date-time functions can allow for more complex queries:

WITH point({latitude: 48.8582532, longitude: 2.294287}) AS eiffel
MATCH (e:Event)
WHERE e.address IS NOT NULL AND e.datetime IS NOT NULL
CALL apoc.spatial.geocodeOnce(e.address) YIELD location
WITH e, location,
     point.distance(point(location), eiffel) AS distance,
     (apoc.date.parse('2016-06-01 00:00:00','h') - apoc.date.parse(e.datetime,'h'))/24.0 AS days_before_due
WHERE distance < 5000 AND days_before_due < 14 AND apoc.date.parse(e.datetime,'h') < apoc.date.parse('2016-06-01 00:00:00','h')
RETURN e.name AS event, e.datetime AS date,
       location.description AS description, distance
ORDER BY distance
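
The days_before_due arithmetic in the query above converts both timestamps to whole hours and divides the difference by 24. The Python sketch below reproduces that calculation under assumptions: timestamps use the 'yyyy-MM-dd HH:mm:ss' layout shown in the query, they are interpreted as UTC, and the hour value is truncated to a whole number. The function names are hypothetical.

```python
from datetime import datetime, timezone


def hours_since_epoch(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a timestamp (assumed UTC) and return whole hours since the epoch."""
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() // 3600)


def days_before_due(event_text, due_text="2016-06-01 00:00:00"):
    """Days between the event and the due date; negative if the event is later."""
    return (hours_since_epoch(due_text) - hours_since_epoch(event_text)) / 24.0


for event in ("2016-05-25 00:00:00", "2016-05-25 12:00:00", "2016-06-02 00:00:00"):
    days = days_before_due(event)
    # Mirrors the WHERE clause: within 14 days of the due date and before it.
    print(event, days, 0 < days < 14)
```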