Exploring Neo4j Spatial: Layer Management, Spatial Filtering, and Intersection

Product Manager, Neo4j

Introduction
The common division of responsibility between graph databases and GIS platforms makes it hard to exploit the full power of graph databases, and of graph query languages like Cypher, when exploring data that involves both geospatial and non-geospatial relationships.
This series of blogs covers:
- Installation, data loading, and simple querying
- Layer management, spatial filtering, and intersection
- Path intersections using Automatic Identification System (AIS) data
- Custom procedures for spatial analysis
In the first part of this series, we looked at Neo4j Spatial’s index layer data model, then loaded polygon data into a spatial layer and ran simple spatial queries. Here, we’ll look at layer management and begin performing basic geospatial operations in the database.
Working With Multiple Layers
If you’re familiar with GIS tools like ArcGIS or QGIS, you can think of the Neo4j Spatial plugin layer in much the same way as a GIS layer. You would create a new layer for each feature type or theme, and each layer may contain only one geometry type.
When using a GIS, a typical workflow would be to query one layer — for example, generating a filtered set of features — and save the results into a new layer for further processing. I was keen to see how easily I could replicate that workflow here. (As before, the sample data for the examples is available for download, and because this is a multi-part ZIP archive, you’ll need a tool to extract the contents.)
In this example, my aim is to create a new layer called cities, populating it with any admin areas whose name contains the substring City.
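// Create a new WKT layer to hold the filtered features
CALL spatial.addLayer("cities", "wkt", "", "") YIELD node
// Look up the root node of the existing admin_areas layer
CALL spatial.layer("admin_areas") YIELD node AS layerNode
// Walk the R-tree index from the layer node down to the spatial (feature) nodes
MATCH (layerNode)-[:RTREE_ROOT]->()-[*]-()-[:RTREE_REFERENCE]->(spatialNode)
WHERE spatialNode.Name CONTAINS "City"
// Add each matching feature to the new layer
CALL spatial.addNode("cities", spatialNode) YIELD node AS addedNode
RETURN addedNode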
This illustrates a useful technique when working with spatial layers. We can easily get the root layerNode for a layer by using CALL spatial.layer. To get to the actual feature data (the geometry and properties) stored in that layer, we can traverse the graph from that layerNode to the spatialNodes.
In the first part, we briefly looked at Neo4j Spatial’s R-tree index model. For a small spatial layer, the path to the leaf nodes will be short and will look like the following image.
However, for a larger spatial layer with more records, the path will grow as the index fills up, requiring more child nodes to manage the records. The R-tree will adaptively expand as more spatial data is added to the layer. We don’t know how many child nodes we have to traverse to finally reach the spatialNodes. So for larger layers, the path could look more like the following image.
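To get a feel for why the depth can't be known in advance, here is a rough sketch in Python (not part of the plugin) that estimates the fewest index levels needed to reference a given number of features. The function name and the fan-out of 100 references per index node are assumptions chosen for illustration; a real R-tree is rarely perfectly packed, so actual depths can be larger.

```python
def min_index_levels(num_features, max_refs_per_node):
    """Best-case number of index levels between the root and the features.

    Assumes every index node is completely full, which a real R-tree
    rarely achieves, so treat the result as a lower bound.
    """
    levels = 1
    capacity = max_refs_per_node  # features reachable with a single level
    while capacity < num_features:
        capacity *= max_refs_per_node  # each extra level multiplies capacity
        levels += 1
    return levels


# Assumed fan-out of 100 references per index node (illustrative only)
for count in (50, 5_000, 1_000_000):
    print(count, "features ->", min_index_levels(count, 100), "level(s)")
```

Under that assumption, 50 features fit under a single index node, 5,000 need at least two levels, and a million need at least three, which is why a variable-length pattern is the safe way to reach the spatial nodes.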
When we don’t know how deep our index is, we can match the path from the layerNode to the spatialNodes using a variable-length pattern like this:
MATCH (layerNode)-[:RTREE_ROOT]->()-[*]-()-[:RTREE_REFERENCE]->(spatialNode)
Next, I’ll update my query so that, again, I’m creating a new layer to hold my results, populating it with any administrative areas that contain the substring City, then filtering those City features to include only those that fall within a specified bounding rectangle. I’ll use the plugin’s spatial.bbox procedure to do this:
// Create a new layer to hold our results
CALL spatial.addLayer("cities_2", "wkt", "", "") YIELD node
// Find admin_areas that contain the substring 'City' in their name property
CALL spatial.layer("admin_areas") YIELD node AS layerNode
MATCH (layerNode)-[*]->(spatialNode)
WHERE spatialNode.Name CONTAINS "City"
WITH spatialNode
// Add the results to our new layer
CALL spatial.addNode("cities_2", spatialNode) YIELD node AS addedNode
// Find the city nodes in a simple bounding box
CALL spatial.bbox("cities_2", {lon: -2, lat: 51}, {lon: 1, lat: 53}) YIELD node AS resultNodes
RETURN resultNodes.Name
In this example, the WHERE clause filters on the spatialNode.Name property. Since I know that only my admin_areas data has a Name property, I’ve simplified the preceding MATCH statement further, using a simpler path pattern: (layerNode)-[*]->(spatialNode). Here are the results (some names repeat, most likely because spatial.bbox is called once for each row passed on from the preceding spatial.addNode call):
"City of Peterborough (B)"
"City of Westminster London Boro"
"City and County of the City of London"
"City of Derby (B)"
"City of Leicester (B)"
"City of Peterborough (B)"
"City of Westminster London Boro"
"City and County of the City of London"
Here’s a similar example, this time testing whether our results fall inside a simple polygon. Note the formatting of the POLYGON string: each coordinate pair is written as longitude, a space, then latitude, and the pairs are separated by commas.
CALL spatial.addLayer("cities_3", "wkt", "", "") YIELD node
CALL spatial.layer("admin_areas") YIELD node AS layerNode
MATCH (layerNode)-[*]->(spatialNode)
WHERE spatialNode.Name CONTAINS "City"
WITH spatialNode
CALL spatial.addNode('cities_3', spatialNode) YIELD node AS addedNode
WITH 'POLYGON((-2 50, 1 50, 1 52, -2 52, -2 50))' AS polygon
CALL spatial.intersects('cities_3', polygon) YIELD node
RETURN DISTINCT node.Name AS name
If you’re using the new Cypher 25, you could write the same query like this instead. It’s up to you to decide which syntax you prefer.
// get the cities
CALL spatial.addLayer("cities_3", "wkt", "", "") YIELD node
CALL spatial.layer("admin_areas") YIELD node AS layerNode
MATCH (layerNode)-[*]->(spatialNode)
WHERE spatialNode.Name CONTAINS "City"
RETURN spatialNode
NEXT
// filter cities inside a polygon
CALL spatial.addNode('cities_3', spatialNode) YIELD node AS addedNode
LET polygon = 'POLYGON((-2 50, 1 50, 1 52, -2 52, -2 50))'
CALL spatial.intersects('cities_3', polygon) YIELD node
RETURN DISTINCT node.Name AS name
The POLYGON intersects fewer cities than the previous bounding box query, and here’s our smaller result set:
"City of Portsmouth (B)"
"City of Southampton (B)"
"The City of Brighton and Hove (B)"
"City of Westminster London Boro"
"City and County of the City of London"
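If you build WKT strings like the one above in application code, the formatting rules are easy to get wrong. As a minimal sketch in Python (the helper name is hypothetical and not part of Neo4j Spatial), this turns a longitude/latitude bounding box into a POLYGON string, repeating the first corner at the end so the ring is closed:

```python
def bbox_to_wkt_polygon(min_lon, min_lat, max_lon, max_lat):
    """Build a WKT POLYGON string for a lon/lat box (hypothetical helper)."""
    if min_lon > max_lon or min_lat > max_lat:
        raise ValueError("min values must not exceed max values")
    # Visit the four corners, then repeat the first one to close the ring.
    ring = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),
    ]
    # Longitude and latitude are separated by a space, pairs by commas.
    coords = ", ".join(f"{lon:g} {lat:g}" for lon, lat in ring)
    return f"POLYGON(({coords}))"


print(bbox_to_wkt_polygon(-2, 50, 1, 52))
# POLYGON((-2 50, 1 50, 1 52, -2 52, -2 50))
```

The resulting string is the same one used in the queries above and could be passed to the query as a parameter instead of being typed inline.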
While it’s useful to know how to locate spatial nodes in the model by using path patterns like (layerNode)-[*]->(spatialNode), a wildcard variable-length pattern [*] with no bounds or relationship type restriction can result in huge traversals, especially if the spatial index structure is large (or if it becomes malformed for any reason).
For this reason, I recommend you always add labels to more easily and efficiently identify the spatial nodes in your layers:
CALL spatial.layer("admin_areas") YIELD node AS layerNode
MATCH (layerNode)-[:RTREE_ROOT]->()-[*1..4]-()-[:RTREE_REFERENCE]->(spatialNode)
SET spatialNode:admin_area // add the admin_area label to the spatial nodes
RETURN count(*) AS count
Having created labels, we can now retrieve our spatial nodes without needing to match on the spatial index path pattern:
MATCH (n:admin_area)
WHERE n.Name CONTAINS "City"
RETURN n
As we see here, in a few lines of code, we can perform a common GIS-type operation in Neo4j Spatial: passing the results of a spatial procedure into a new named layer.
Calculating Intersections
Next, I wanted to try another common GIS-type operation: calculating the spatial intersection of feature geometries stored on different layers — specifically a point-on-polygon intersection.
To try this, I added a new WKT (point geometry) layer, firestations, using the same method as before, but this time remembering to add a firestation label to our nodes to avoid having to traverse from the root node every time we want to identify them in a query:
:auto // this is required if you run CALL IN TRANSACTIONS queries using Browser
LOAD CSV WITH HEADERS FROM 'file:///firestations.csv' AS row
WITH row
WHERE row.WKT IS NOT NULL AND row.WKT <> '' // ✅ Skip empty WKT values
CALL {
  WITH row
  CALL spatial.addWKTs('firestations', [row.WKT]) YIELD node
  SET node:firestation, node.name = row.name, node.id = toInteger(row.id)
  RETURN node
} IN CONCURRENT TRANSACTIONS
RETURN node;
I want to find which admin area contains the Emsworth fire station.
I’ll filter the fire stations layer on that name, then carry out an intersection procedure to find out which admin area it falls inside:
MATCH (spatialNode:firestation) WHERE spatialNode.name CONTAINS "Emsworth"
WITH spatialNode
CALL spatial.intersects('admin_areas', spatialNode.geometry) YIELD node
MERGE (spatialNode)-[:IS_IN]->(node)
RETURN node
You’ll see in the query above that I used the query result (“Havant District B”, as it turns out) to update the graph with new information, creating an IS_IN relationship, so we can reuse this relationship without having to perform that spatial test again.
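To make the point-on-polygon idea concrete outside the database, here is a small sketch in Python using the classic ray-casting test. It is independent of the plugin and is not how Neo4j Spatial implements spatial.intersects; the area names and coordinates are invented for illustration and are not the real admin area or fire station geometries.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside the closed ring."""
    inside = False
    # The ring repeats its first vertex at the end, so pairing each vertex
    # with the next one visits every edge exactly once.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray cast from the point towards the east.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside


# Hypothetical admin areas as closed rings of (lon, lat) pairs
AREAS = {
    "Area A": [(-2, 50), (0, 50), (0, 52), (-2, 52), (-2, 50)],
    "Area B": [(0, 50), (1, 50), (1, 52), (0, 52), (0, 50)],
}


def containing_areas(lon, lat, areas):
    """Return the names of all areas whose ring contains the point."""
    return [name for name, ring in areas.items() if point_in_polygon(lon, lat, ring)]


# A made-up station location
print(containing_areas(-0.9, 50.8, AREAS))
# ['Area A']
```

Running a test like this once and storing the answer is what the IS_IN relationship achieves in the graph: later queries can follow the relationship rather than repeating the geometry check.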
Summary
As these examples show, the Neo4j Spatial plugin allows us to perform a number of common GIS functions in the database, using the Cypher language: loading data, managing feature layers, filtering and spatially querying layers, and persisting derived knowledge back into the database.
In the next part, we’ll look at AIS data, a text format used to record the location data and metadata transmitted by vessels, then use it to try some line-on-polygon spatial processing.
Exploring Neo4j Spatial: Layer Management, Spatial Filtering, and Intersection was originally published in Neo4j Developer Blog on Medium.