Identify Bottlenecks and Model Disruptions

3.6 million Americans ride the New York City subway each day. That’s 1.3 million more people than who takes a flight each day in the United States. The New York City Subway is unique in the number of riders, stations, and the level of service it provides.

Using Graph Analytics for Snowflake, we can easily model what would happen if a station was fully closed for repairs. Insights like this can apply not just to transport systems, but to supply chains, manufacturing processes and much more.

Prerequisites

The Native App Neo4j Graph Analytics for Snowflake

What You Will Need

A Snowflake account with appropriate access to databases and schemas.
Neo4j Graph Analytics application installed from the Snowflake marketplace. Access the marketplace via the menu bar on the left hand side of your screen, as seen below:

What You Will Build

A notebook that models subway disruptions in the New York City subway

What You Will Learn

How to prepare and project your data for graph analytics
How to pathfinding to model subway disruptions
How to read and write directly from and to your snowflake tables

Loading The Data

Dataset overview : This dataset is a digital twin of the NYC Subway!

Let’s name our database MTA. Using the CSVs found here, We are going to add two new tables:

One called STATIONS based on stations.csv
One called LINES on lines.csv

Follow the steps found here to load in your data.

Setting Up

Import The Notebook

We’ve provided a Colab notebook to walk you through each SQL and Python step—no local setup required!
Download the .ipynb found here, and import the notebook into snowflake.
Don’t forget to install streamlit and python package before you run.

Permissions

Before we run our algorithms, we need to set the proper permissions. But before we get started granting different roles, we need to ensure that you are using accountadmin to grant and create roles. Lets do that now:

-- Use a role with the required privileges
USE ROLE ACCOUNTADMIN;

-- Create a consumer role for users of the Graph Analytics application
CREATE ROLE IF NOT EXISTS MY_CONSUMER_ROLE;
GRANT APPLICATION ROLE Neo4j_Graph_Analytics.app_user TO ROLE MY_CONSUMER_ROLE;
SET MY_USER = (SELECT CURRENT_USER());
GRANT ROLE MY_CONSUMER_ROLE TO USER IDENTIFIER($MY_USER);

USE SCHEMA MTA.PUBLIC;
CREATE TABLE NODES (nodeId Number);
INSERT INTO NODES VALUES (1), (2), (3), (4), (5), (6);
CREATE TABLE RELATIONSHIPS (sourceNodeId Number, targetNodeId Number);
INSERT INTO RELATIONSHIPS VALUES (1, 2), (2, 3), (4, 5), (5, 6);

-- Grants needed for the app to read consumer data stored in tables and views, using a database role
USE DATABASE MTA;
CREATE DATABASE ROLE IF NOT EXISTS MY_DB_ROLE;
GRANT USAGE ON DATABASE MTA TO DATABASE ROLE MY_DB_ROLE;
GRANT USAGE ON SCHEMA MTA.PUBLIC TO DATABASE ROLE MY_DB_ROLE;
GRANT SELECT ON ALL TABLES IN SCHEMA MTA.PUBLIC TO DATABASE ROLE MY_DB_ROLE;
GRANT SELECT ON ALL VIEWS IN SCHEMA MTA.PUBLIC TO DATABASE ROLE MY_DB_ROLE;
-- Future tables also include tables that are created by the application itself.
-- This is useful as many use-cases require running algorithms in a sequence and using the output of a prior algorithm as input.
GRANT SELECT ON FUTURE TABLES IN SCHEMA MTA.PUBLIC TO DATABASE ROLE MY_DB_ROLE;
GRANT SELECT ON FUTURE VIEWS IN SCHEMA MTA.PUBLIC TO DATABASE ROLE MY_DB_ROLE;
GRANT CREATE TABLE ON SCHEMA MTA.PUBLIC TO DATABASE ROLE MY_DB_ROLE;
GRANT DATABASE ROLE MY_DB_ROLE TO APPLICATION Neo4j_Graph_Analytics;

-- Ensure the consumer role has access to tables created by the application
GRANT USAGE ON DATABASE MTA TO ROLE MY_CONSUMER_ROLE;
GRANT USAGE ON SCHEMA MTA.PUBLIC TO ROLE MY_CONSUMER_ROLE;
GRANT SELECT ON FUTURE TABLES IN SCHEMA MTA.PUBLIC TO ROLE MY_CONSUMER_ROLE;

-- Use the consumer role to run the algorithm and inspect the output
USE ROLE MY_CONSUMER_ROLE;

Cleaning Our Data

We need our data to be in a particular format in order to work with Graph Analytics. In general it should be like so:

For The Table Representing Nodes:

The first column should be called nodeId, which represents the ids for the each node in our graph

For The table Representing Relationships:

We need to have columns called sourceNodeId and targetNodeId. These will tell Graph Analytics the direction of the relationships, which in this case means:

The starting station (sourceNodeId) and
The ending station (targetNodeId)

CREATE OR REPLACE TABLE mta.public.nodes AS
SELECT
  id AS nodeid
FROM mta.public.stations;

CREATE OR REPLACE TABLE mta.public.relationships AS
SELECT
    sourcenodeid,
    targetnodeid
FROM mta.public.lines;

Running Your Algorithms

Now we are finally at the step where we create a projection, run our algorithms, and write back to snowflake. We will run dijksra to determine the shortest path between subway stations.

You can find more information about writing this function in our documentation.

CALL neo4j_graph_analytics.graph.dijkstra('CPU_X64_XS', {
'defaultTablePrefix': 'mta.public',
  'project': {
    'nodeTables': ['nodes'],
    'relationshipTables': {
      'relationships': {
        'sourceTable': 'nodes',
        'targetTable': 'nodes'
      }
    }
  },
  'compute': { 'sourceNode': 68,
                'targetNode': 24,
                'sourceNodeTable': 'nodes'
            },
  'write': [
    {
    'sourceLabel': 'nodes',
    'targetLabel': 'nodes',
    'outputTable':  'mta.public.paths'
    }
  ]
});

select * from mta.public.paths

We can then use a simple SQL query to take our numeric nodeids and return the stations in order:

WITH flattened AS (
  SELECT
    VALUE::NUMBER AS nodeid,
    INDEX AS ordering
  FROM mta.public.paths,
       LATERAL FLATTEN(input => nodeids)
),
filtered AS (
  SELECT s.*, f.ordering
  FROM mta.public.stations s
  JOIN flattened f ON s.id = f.nodeid
)
SELECT *
FROM filtered
ORDER BY ordering;

STATION_NAME	ID	ORDERING
Bergen St - Bk	68	0
Atlantic Av-Barclays Ctr - Bk	67	1
Canal St - M	32	2
14 St-Union Sq - M	104	3
34 St-Herald Sq - M	230	4
Times Sq-42 St - M	24	5

Modeling a Disruption

But what would happen if one of those stations were closed? How can we model what would be the new path? Luckily it will be pretty simple.

First we need to filter out the station that is closed from our nodes and relationships tables. Let’s say that Herald Square is closed:

CREATE OR REPLACE TABLE mta.public.nodes_f AS
SELECT *
FROM mta.public.nodes
WHERE nodeid != 230;

CREATE OR REPLACE TABLE mta.public.relationships_f AS
SELECT *
FROM mta.public.relationships
WHERE sourcenodeid != 230 and targetnodeid != 230;

Then we use those new tables when we run our algorithm:

CALL neo4j_graph_analytics.graph.dijkstra('CPU_X64_XS', {
'defaultTablePrefix': 'mta.public',
  'project': {
    'nodeTables': ['nodes_f'],
    'relationshipTables': {
      'relationships_f': {
        'sourceTable': 'nodes_f',
        'targetTable': 'nodes_f'
      }
    }
  },
  'compute': { 'sourceNode': 68,
                'targetNode': 24,
                'sourceNodeTable': 'nodes_f'
            },
  'write': [
    {
    'sourceLabel': 'nodes_f',
    'targetLabel': 'nodes_f',
    'outputTable':  'mta.public.paths_f'
    }
  ]
});

And then we can pull back our results like so:

CREATE OR REPLACE VIEW mta.public.filtered_station_captions AS
WITH flattened AS (
  SELECT
    VALUE::NUMBER AS nodeid,
    INDEX AS ordering
  FROM mta.public.paths_f,
       LATERAL FLATTEN(input => nodeids)
),
filtered AS (
  SELECT
    s.id AS nodeid,
    s.station_name AS caption,
    f.ordering,
  FROM mta.public.stations s
  JOIN flattened f ON s.id = f.nodeid
)
SELECT
    nodeid,
    ordering,
    caption
FROM filtered
ORDER BY ordering;

select * from mta.public.filtered_station_captions

NODEID	ORDERING	CAPTION
68	0	Bergen St - Bk
67	1	Atlantic Av-Barclays Ctr - Bk
32	2	Canal St - M
104	3	14 St-Union Sq - M
103	4	Grand Central-42 St - M
186	5	5 Av - M
24	6	Times Sq-42 St - M

Visualize Your Graph

At this point, you may want to visualize your graph to get a better understanding of how everything fits together. It would be nice to have our new station path represented visually. We already have everything we need for the nodes from our last step, but we also need to create a relationship table, which we do below:

CREATE OR REPLACE VIEW mta.public.relationships_view AS
WITH ordered_nodes AS (
  SELECT
    nodeid AS sourcenodeid,
    LEAD(nodeid) OVER (ORDER BY ordering) AS targetnodeid
  FROM mta.public.filtered_station_captions
)
SELECT *
FROM ordered_nodes
WHERE targetnodeid IS NOT NULL;

select * from mta.public.relationships_view

We use our filtered_station_viz table as our nodes. From here, our syntax will look very familiar to what we have had before when running algorithms. We also need to specify what captions belong to each node in a for loop.

from neo4j_viz.snowflake import from_snowflake
from snowflake.snowpark.context import get_active_session

session = get_active_session()

viz_graph = from_snowflake(
    session,
{
    'nodeTables': ['mta.public.filtered_station_viz'],
    'relationshipTables': {
      'mta.public.relationships_view': {
        'sourceTable': 'mta.public.filtered_station_viz',
        'targetTable': 'mta.public.filtered_station_viz'
      }
    }
  }
)


##### specifying which column should be associated with captions
for node in viz_graph.nodes:
    node.caption = str(node.properties["STATION_NAME"])

###### now we render
html_object = viz_graph.render()

import streamlit.components.v1 as components

components.html(html_object.data, height=600)

Conclusions And Resources

In this quickstart, you learned how to bring the power of graph insights into Snowflake using Neo4j Graph Analytics.

What You Learned

Using our NYC Subway dataset, you were able to:

Set up the Neo4j Graph Analytics application within Snowflake.
Prepare and project your data into a graph model (stations as nodes, lines as relationships).
Ran dijkstra to find the shortest path between two stations and to model alternative routes if a station was closed