Identify Bottlenecks and Model Disruptions
3.6 million Americans ride the New York City subway each day. That’s 1.3 million more people than who takes a flight each day in the United States. The New York City Subway is unique in the number of riders, stations, and the level of service it provides.
Using Graph Analytics for Snowflake, we can easily model what would happen if a station was fully closed for repairs. Insights like this can apply not just to transport systems, but to supply chains, manufacturing processes and much more.
Prerequisites
-
The Native App Neo4j Graph Analytics for Snowflake
What You Will Need
-
A Snowflake account with appropriate access to databases and schemas.
-
Neo4j Graph Analytics application installed from the Snowflake marketplace. Access the marketplace via the menu bar on the left hand side of your screen, as seen below:

What You Will Learn
-
How to prepare and project your data for graph analytics
-
How to pathfinding to model subway disruptions
-
How to read and write directly from and to your snowflake tables
Loading The Data
Dataset overview : This dataset is a digital twin of the NYC Subway!
Let’s name our database MTA. Using the CSVs found
here,
We are going to add two new tables:
-
One called
STATIONSbased on stations.csv -
One called
LINESon lines.csv
Follow the steps found here to load in your data.
Setting Up
Import The Notebook
-
We’ve provided a Colab notebook to walk you through each SQL and Python step—no local setup required!
-
Download the .ipynb found here, and import the notebook into snowflake.
-
Don’t forget to install streamlit and python package before you run.
Permissions
Before we run our algorithms, we need to set the proper permissions. But
before we get started granting different roles, we need to ensure that
you are using accountadmin to grant and create roles. Lets do that
now:
-- Use a role with the required privileges USE ROLE ACCOUNTADMIN; -- Create a consumer role for users of the Graph Analytics application CREATE ROLE IF NOT EXISTS MY_CONSUMER_ROLE; GRANT APPLICATION ROLE Neo4j_Graph_Analytics.app_user TO ROLE MY_CONSUMER_ROLE; SET MY_USER = (SELECT CURRENT_USER()); GRANT ROLE MY_CONSUMER_ROLE TO USER IDENTIFIER($MY_USER); USE SCHEMA MTA.PUBLIC; CREATE TABLE NODES (nodeId Number); INSERT INTO NODES VALUES (1), (2), (3), (4), (5), (6); CREATE TABLE RELATIONSHIPS (sourceNodeId Number, targetNodeId Number); INSERT INTO RELATIONSHIPS VALUES (1, 2), (2, 3), (4, 5), (5, 6); -- Grants needed for the app to read consumer data stored in tables and views, using a database role USE DATABASE MTA; CREATE DATABASE ROLE IF NOT EXISTS MY_DB_ROLE; GRANT USAGE ON DATABASE MTA TO DATABASE ROLE MY_DB_ROLE; GRANT USAGE ON SCHEMA MTA.PUBLIC TO DATABASE ROLE MY_DB_ROLE; GRANT SELECT ON ALL TABLES IN SCHEMA MTA.PUBLIC TO DATABASE ROLE MY_DB_ROLE; GRANT SELECT ON ALL VIEWS IN SCHEMA MTA.PUBLIC TO DATABASE ROLE MY_DB_ROLE; -- Future tables also include tables that are created by the application itself. -- This is useful as many use-cases require running algorithms in a sequence and using the output of a prior algorithm as input. GRANT SELECT ON FUTURE TABLES IN SCHEMA MTA.PUBLIC TO DATABASE ROLE MY_DB_ROLE; GRANT SELECT ON FUTURE VIEWS IN SCHEMA MTA.PUBLIC TO DATABASE ROLE MY_DB_ROLE; GRANT CREATE TABLE ON SCHEMA MTA.PUBLIC TO DATABASE ROLE MY_DB_ROLE; GRANT DATABASE ROLE MY_DB_ROLE TO APPLICATION Neo4j_Graph_Analytics; -- Ensure the consumer role has access to tables created by the application GRANT USAGE ON DATABASE MTA TO ROLE MY_CONSUMER_ROLE; GRANT USAGE ON SCHEMA MTA.PUBLIC TO ROLE MY_CONSUMER_ROLE; GRANT SELECT ON FUTURE TABLES IN SCHEMA MTA.PUBLIC TO ROLE MY_CONSUMER_ROLE; -- Use the consumer role to run the algorithm and inspect the output USE ROLE MY_CONSUMER_ROLE;
Cleaning Our Data
We need our data to be in a particular format in order to work with Graph Analytics. In general it should be like so:
For The Table Representing Nodes:
The first column should be called nodeId, which represents the ids for
the each node in our graph
For The table Representing Relationships:
We need to have columns called sourceNodeId and targetNodeId. These
will tell Graph Analytics the direction of the relationships, which in
this case means:
-
The starting station (sourceNodeId) and
-
The ending station (targetNodeId)
CREATE OR REPLACE TABLE mta.public.nodes AS SELECT id AS nodeid FROM mta.public.stations;
CREATE OR REPLACE TABLE mta.public.relationships AS
SELECT
sourcenodeid,
targetnodeid
FROM mta.public.lines;
Running Your Algorithms
Now we are finally at the step where we create a projection, run our algorithms, and write back to snowflake. We will run dijksra to determine the shortest path between subway stations.
You can find more information about writing this function in our documentation.
CALL neo4j_graph_analytics.graph.dijkstra('CPU_X64_XS', {
'defaultTablePrefix': 'mta.public',
'project': {
'nodeTables': ['nodes'],
'relationshipTables': {
'relationships': {
'sourceTable': 'nodes',
'targetTable': 'nodes'
}
}
},
'compute': { 'sourceNode': 68,
'targetNode': 24,
'sourceNodeTable': 'nodes'
},
'write': [
{
'sourceLabel': 'nodes',
'targetLabel': 'nodes',
'outputTable': 'mta.public.paths'
}
]
});
select * from mta.public.paths
We can then use a simple SQL query to take our numeric nodeids and return the stations in order:
WITH flattened AS (
SELECT
VALUE::NUMBER AS nodeid,
INDEX AS ordering
FROM mta.public.paths,
LATERAL FLATTEN(input => nodeids)
),
filtered AS (
SELECT s.*, f.ordering
FROM mta.public.stations s
JOIN flattened f ON s.id = f.nodeid
)
SELECT *
FROM filtered
ORDER BY ordering;
| STATION_NAME | ID | ORDERING |
|---|---|---|
Bergen St - Bk |
68 |
0 |
Atlantic Av-Barclays Ctr - Bk |
67 |
1 |
Canal St - M |
32 |
2 |
14 St-Union Sq - M |
104 |
3 |
34 St-Herald Sq - M |
230 |
4 |
Times Sq-42 St - M |
24 |
5 |
Modeling a Disruption
But what would happen if one of those stations were closed? How can we model what would be the new path? Luckily it will be pretty simple.
First we need to filter out the station that is closed from our nodes and relationships tables. Let’s say that Herald Square is closed:
CREATE OR REPLACE TABLE mta.public.nodes_f AS SELECT * FROM mta.public.nodes WHERE nodeid != 230;
CREATE OR REPLACE TABLE mta.public.relationships_f AS SELECT * FROM mta.public.relationships WHERE sourcenodeid != 230 and targetnodeid != 230;
Then we use those new tables when we run our algorithm:
CALL neo4j_graph_analytics.graph.dijkstra('CPU_X64_XS', {
'defaultTablePrefix': 'mta.public',
'project': {
'nodeTables': ['nodes_f'],
'relationshipTables': {
'relationships_f': {
'sourceTable': 'nodes_f',
'targetTable': 'nodes_f'
}
}
},
'compute': { 'sourceNode': 68,
'targetNode': 24,
'sourceNodeTable': 'nodes_f'
},
'write': [
{
'sourceLabel': 'nodes_f',
'targetLabel': 'nodes_f',
'outputTable': 'mta.public.paths_f'
}
]
});
And then we can pull back our results like so:
CREATE OR REPLACE VIEW mta.public.filtered_station_captions AS
WITH flattened AS (
SELECT
VALUE::NUMBER AS nodeid,
INDEX AS ordering
FROM mta.public.paths_f,
LATERAL FLATTEN(input => nodeids)
),
filtered AS (
SELECT
s.id AS nodeid,
s.station_name AS caption,
f.ordering,
FROM mta.public.stations s
JOIN flattened f ON s.id = f.nodeid
)
SELECT
nodeid,
ordering,
caption
FROM filtered
ORDER BY ordering;
select * from mta.public.filtered_station_captions
| NODEID | ORDERING | CAPTION |
|---|---|---|
68 |
0 |
Bergen St - Bk |
67 |
1 |
Atlantic Av-Barclays Ctr - Bk |
32 |
2 |
Canal St - M |
104 |
3 |
14 St-Union Sq - M |
103 |
4 |
Grand Central-42 St - M |
186 |
5 |
5 Av - M |
24 |
6 |
Times Sq-42 St - M |
Visualize Your Graph
At this point, you may want to visualize your graph to get a better understanding of how everything fits together. It would be nice to have our new station path represented visually. We already have everything we need for the nodes from our last step, but we also need to create a relationship table, which we do below:
CREATE OR REPLACE VIEW mta.public.relationships_view AS
WITH ordered_nodes AS (
SELECT
nodeid AS sourcenodeid,
LEAD(nodeid) OVER (ORDER BY ordering) AS targetnodeid
FROM mta.public.filtered_station_captions
)
SELECT *
FROM ordered_nodes
WHERE targetnodeid IS NOT NULL;
select * from mta.public.relationships_view
We use our filtered_station_viz table as our nodes. From here,
our syntax will look very familiar to what we have had before when
running algorithms. We also need to specify what captions belong to each
node in a for loop.
from neo4j_viz.snowflake import from_snowflake
from snowflake.snowpark.context import get_active_session
session = get_active_session()
viz_graph = from_snowflake(
session,
{
'nodeTables': ['mta.public.filtered_station_viz'],
'relationshipTables': {
'mta.public.relationships_view': {
'sourceTable': 'mta.public.filtered_station_viz',
'targetTable': 'mta.public.filtered_station_viz'
}
}
}
)
##### specifying which column should be associated with captions
for node in viz_graph.nodes:
node.caption = str(node.properties["STATION_NAME"])
###### now we render
html_object = viz_graph.render()
import streamlit.components.v1 as components
components.html(html_object.data, height=600)
Conclusions And Resources
In this quickstart, you learned how to bring the power of graph insights into Snowflake using Neo4j Graph Analytics.
What You Learned
Using our NYC Subway dataset, you were able to:
-
Set up the Neo4j Graph Analytics application within Snowflake.
-
Prepare and project your data into a graph model (stations as nodes, lines as relationships).
-
Ran dijkstra to find the shortest path between two stations and to model alternative routes if a station was closed