Spatial Procedures
The spatial procedures enable geographic capabilities on your data, and complement the spatial functions that come with Neo4j. More extensive Spatial functionality can be found in the Neo4j Spatial Library.
Qualified Name | Type |
---|---|
apoc.spatial.geocode |
|
apoc.spatial.reverseGeocode |
|
apoc.spatial.sortByDistance |
|
Geocode
The geocode procedure converts a textual address into a location containing latitude, longitude and description. Despite being only a single function, together with the built-in functions point and distance we can achieve quite powerful results.
First, how can we use the procedure:
CALL apoc.spatial.geocodeOnce('21 rue Paul Bellamy 44000 NANTES FRANCE')
YIELD location
RETURN location.latitude, location.longitude
location.latitude | location.longitude |
---|---|
47.2221667 |
-1.5566625 |
There are three forms of the procedure:
-
geocodeOnce(address) returns zero or one result.
-
geocode(address,maxResults) returns zero, one or more up to maxResults.
-
reverseGeocode(latitude,longitude) returns zero or one result.
This is because the backing geocoding service (OSM, Google, OpenCage or other) might return multiple results for the same query. GeocodeOnce() is designed to return the first, or highest ranking result.
The third procedure reverseGeocode will convert a location containing latitude and longitude into a textual address.
CALL apoc.spatial.reverseGeocode(47.2221667,-1.5566625) YIELD location
RETURN location.description;
location.description |
---|
"21, Rue Paul Bellamy, Talensac - Pont Morand, Hauts-Pavés - Saint-Félix, Nantes, Loire-Atlantique, Pays de la Loire, France métropolitaine, 44000, France" |
Configuring Geocode
There are a few options that can be set in the apoc.conf file or via $config
parameter (see below the Configure via config parameter map
section) to control the service.
In the apoc.conf
we can pass:
-
apoc.spatial.geocode.provider=osm (osm, google, opencage, etc.)
-
apoc.spatial.geocode.osm.throttle=5000 (ms to delay between queries to not overload OSM servers)
-
apoc.spatial.geocode.google.throttle=1 (ms to delay between queries to not overload Google servers)
-
apoc.spatial.geocode.google.key=xxxx (API key for google geocode access)
-
apoc.spatial.geocode.google.client=xxxx (client code for google geocode access)
-
apoc.spatial.geocode.google.signature=xxxx (client signature for google geocode access)
For Google, you should use either a key or a combination of client and signature. Read more about this on the google page for geocode access at https://developers.google.com/maps/documentation/geocoding/get-api-key#key
Configuring Custom Geocode Provider
Geocode
For any provider that is not 'osm' or 'google' you get a configurable supplier that requires two additional settings, 'url' and 'key'. The 'url' must contain the two words 'PLACE' and 'KEY'. The 'KEY' will be replaced with the key you get from the provider when you register for the service. The 'PLACE' will be replaced with the address to geocode when the procedure is called.
Reverse Geocode
The 'url' must contain the three words 'LAT', 'LNG' and 'KEY'. The 'LAT' will be replaced with the latitude and 'LNG' will be replaced with the the longitude to reverse geocode when the procedure is called.
For example, to get the service working with OpenCage, perform the following steps:
-
Register your own application key at https://geocoder.opencagedata.com/
-
Once you have a key, add the following three lines to apoc.conf
apoc.spatial.geocode.provider=opencage apoc.spatial.geocode.opencage.key=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX apoc.spatial.geocode.opencage.url=http://api.opencagedata.com/geocode/v1/json?q=PLACE&key=KEY apoc.spatial.geocode.opencage.reverse.url=http://api.opencagedata.com/geocode/v1/json?q=LAT+LNG&key=KEY
-
make sure that the 'XXXXXXX' part above is replaced with your actual key
-
Restart the Neo4j server and then test the geocode procedures to see that they work
Configure via config parameter map
Alternatively, we can pass a config map.
Note that these configs take precedence over the apoc.conf
settings.
We can pass a provider key, which will be equivalent to apoc.spatial.geocode.provider
setting key,
and the other keys will be equivalent to apoc.spatial.geocode.<PROVIDER>.<KEY>
settings.
For example:
CALL apoc.spatial.geocodeOnce('<MY_PLACE>', {
provider: 'opencage',
url: 'http://api.opencagedata.com/geocode/v1/json?q=PLACE&key=KEY',
reverseUrl: 'http://api.opencagedata.com/geocode/v1/json?q=LAT+LNG&key=KEY',
key: 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
})
is equivalent to these (note that we transform UpperCamelCase
keys in dot.case
, e.g from reverseUrl
to reverse.url
):
apoc.spatial.geocode.provider=opencage apoc.spatial.geocode.opencage.key=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX apoc.spatial.geocode.opencage.url=http://api.opencagedata.com/geocode/v1/json?q=PLACE&key=KEY apoc.spatial.geocode.opencage.reverse.url=http://api.opencagedata.com/geocode/v1/json?q=LAT+LNG&key=KEY
If we don’t pass the provider via config map, the setting apoc.spatial.geocode.provider
will be choose, otherwise the default 'osm'
. For example:
/* apoc.conf
...
apoc.spatial.geocode.provider=google
...
*/
CALL apoc.spatial.geocodeOnce('<MY_PLACE>', {key: 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'})
will pass a config like apoc.spatial.geocode.google.key=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
.
Using Geocode within a bigger Cypher query
A more complex, or useful, example which geocodes addresses found in properties of nodes:
MATCH (a:Place)
WHERE a.address IS NOT NULL
CALL apoc.spatial.geocodeOnce(a.address) YIELD location
RETURN location.latitude AS latitude, location.longitude AS longitude, location.description AS description
Calculating distance between locations
If we wish to calculate the distance between addresses, we need to use the point() function to convert latitude and longitude to Cyper Point types, and then use the point.distance() function to calculate the distance:
WITH point({latitude: 48.8582532, longitude: 2.294287}) AS eiffel
MATCH (a:Place)
WHERE a.address IS NOT NULL
CALL apoc.spatial.geocodeOnce(a.address) YIELD location
WITH location, point.distance(point(location), eiffel) AS distance
WHERE distance < 5000
RETURN location.description AS description, distance
ORDER BY distance
LIMIT 100
sortByDistance
The second procedure enables you to sort a given collection of PATH
values by the sum of their distance based on lat/long properties
on the nodes.
Sample data :
CREATE (bruges:City {name:"bruges", latitude: 51.2605829, longitude: 3.0817189})
CREATE (brussels:City {name:"brussels", latitude: 50.854954, longitude: 4.3051786})
CREATE (paris:City {name:"paris", latitude: 48.8588376, longitude: 2.2773455})
CREATE (dresden:City {name:"dresden", latitude: 51.0767496, longitude: 13.6321595})
MERGE (bruges)-[:NEXT]->(brussels)
MERGE (brussels)-[:NEXT]->(dresden)
MERGE (brussels)-[:NEXT]->(paris)
MERGE (bruges)-[:NEXT]->(paris)
MERGE (paris)-[:NEXT]->(dresden)
Finding paths and sort them by distance
MATCH (a:City {name:'bruges'}), (b:City {name:'dresden'})
MATCH p=(a)-[*]->(b)
WITH collect(p) as paths
CALL apoc.spatial.sortByDistance(paths) YIELD path, distance
RETURN path, distance
Graph Refactoring
In order not to have to repeatedly geocode the same thing in multiple queries, especially if the database will be used by many people, it might be a good idea to persist the results in the database so that subsequent calls can use the saved results.
Geocode and persist the result
MATCH (a:Place)
WHERE a.address IS NOT NULL AND a.latitude IS NULL
WITH a LIMIT 1000
CALL apoc.spatial.geocodeOnce(a.address) YIELD location
SET a.latitude = location.latitude
SET a.longitude = location.longitude
Note that the above command only geocodes the first 1000 ‘Place’ nodes that have not already been geocoded. This query can be run multiple times until all places are geocoded. Why would we want to do this? Two good reasons:
-
The geocoding service is a public service that can throttle or denylist sites that hit the service too heavily, so controlling how much we do is useful.
-
The transaction is updating the database, and it is wise not to update the database with too many things in the same transaction, to avoid using up too much memory. This trick will keep the memory usage very low.
Now make use of the results in distance queries
WITH point({latitude: 48.8582532, longitude: 2.294287}) AS eiffel
MATCH (a:Place)
WHERE a.latitude IS NOT NULL AND a.longitude IS NOT NULL
WITH a, point.distance(point(a), eiffel) AS distance
WHERE distance < 5000
RETURN a.name, distance
ORDER BY distance
LIMIT 100
Combined Space and Time search
Combining spatial and date-time functions can allow for more complex queries:
WITH point({latitude: 48.8582532, longitude: 2.294287}) AS eiffel
MATCH (e:Event)
WHERE e.address IS NOT NULL AND e.datetime IS NOT NULL
CALL apoc.spatial.geocodeOnce(e.address) YIELD location
WITH e, location,
distance(point(location), eiffel) AS distance,
(apoc.date.parse('2016-06-01 00:00:00','h') - apoc.date.parse(e.datetime,'h'))/24.0 AS days_before_due
WHERE distance < 5000 AND days_before_due < 14 AND apoc.date.parse(e.datetime,'h') < apoc.date.parse('2016-06-01 00:00:00','h')
RETURN e.name AS event, e.datetime AS date,
location.description AS description, distance
ORDER BY distance