Getting started

Neo4j Graph Analytics for Snowflake is in Public Preview and is not intended for production use.

Neo4j Graph Analytics for Snowflake is delivered as a Native Application. The application can be installed from the Snowflake Marketplace.

Installation

The application requires the CREATE COMPUTE POOL and CREATE WAREHOUSE privileges.
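
If these privileges were not granted as part of the Marketplace installation flow, they can also be granted afterwards. A minimal sketch, assuming the default application name Neo4j_Graph_Analytics:

-- Use a role that can grant account-level privileges, like 'ACCOUNTADMIN'
GRANT CREATE COMPUTE POOL ON ACCOUNT TO APPLICATION Neo4j_Graph_Analytics;
GRANT CREATE WAREHOUSE ON ACCOUNT TO APPLICATION Neo4j_Graph_Analytics;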

The application creates multiple compute pools on which graph algorithms can be run. The compute pools are created in a suspended state and will auto-suspend when no algorithm is running. This aligns with typical run-on-demand usage patterns. If users run jobs in sequence and the specifications are compatible, a running or recently active compute pool may be reused for the subsequent run. Otherwise, they are suspended to reduce costs for the consumer.

Compute pools for the following instance families are created if they are supported in the consumer region:
  • 'CPU_X64_XS'

  • 'CPU_X64_M'

  • 'CPU_X64_L'

  • 'HIGHMEM_X64_S'

  • 'HIGHMEM_X64_M'

  • 'HIGHMEM_X64_L'

  • 'GPU_NV_S'

  • 'GPU_NV_XS'

These instance family names also serve as the selectors used to run a graph algorithm on a particular compute pool.

The application creates a warehouse for reading and writing data from and to consumer databases. Specifically, that warehouse is used to read Snowflake tables when projecting graphs, and it is used for writing algorithm results. It is also used for administrative queries and logging.

Similar to the compute pools, the warehouse is configured with a short auto-suspend timeout to reduce costs. All privileges on the warehouse are granted to consumers via application roles, giving the consumer full control of the warehouse. Application users are therefore expected to modify the warehouse as needed to suit their workloads.
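
As a minimal sketch of such a modification, assuming a placeholder name for the warehouse created by the application and a role that holds the warehouse privileges (see the application roles below), the warehouse could be resized for a heavier workload and scaled back down afterwards:

-- <application_warehouse> is a placeholder for the warehouse created by the application
ALTER WAREHOUSE <application_warehouse> SET WAREHOUSE_SIZE = 'MEDIUM';
-- Scale back down and suspend once the heavy workload is done
ALTER WAREHOUSE <application_warehouse> SET WAREHOUSE_SIZE = 'XSMALL';
ALTER WAREHOUSE <application_warehouse> SUSPEND;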

Consumer roles and privileges

The application comes with two application roles: app_user and app_admin.

The app_user role provides access to all algorithm procedures and utility functions. The app_admin role provides access to manage the query warehouse, and to monitor and operate the compute pools.
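
As an illustration of what the app_admin role enables, a role holding it can monitor and operate the application's compute pools with standard Snowflake commands (the pool name below is a placeholder):

-- List the compute pools visible to the current role, including their state
SHOW COMPUTE POOLS;
-- Suspend a pool explicitly, for example to stop billing immediately after a batch of runs
ALTER COMPUTE POOL <application_compute_pool> SUSPEND;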

The role used to create the application is automatically granted both of the application roles. Other consumer roles used to interact with the application will need to have one or both of the application roles granted. In the installation example below we create two new consumer roles, one for users and one for administrators, and grant them the corresponding application roles. The consumer roles can then be granted to users according to how they will interact with the application.

Copy the queries below and insert names of roles as necessary. If you are not using the default application name Neo4j_Graph_Analytics, replace it with the name you used during installation. Run the queries top to bottom to satisfy application requirements.

-- Use a role with the required privileges, like 'ACCOUNTADMIN'
USE ROLE <privileged_role>;

-- Create a consumer role for users of the Graph Analytics application
CREATE ROLE IF NOT EXISTS <consumer_user_role>;
GRANT APPLICATION ROLE Neo4j_Graph_Analytics.app_user TO ROLE <consumer_user_role>;
-- Create a consumer role for administrators of the Graph Analytics application
CREATE ROLE IF NOT EXISTS <consumer_admin_role>;
GRANT APPLICATION ROLE Neo4j_Graph_Analytics.app_admin TO ROLE <consumer_admin_role>;
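
The new consumer roles only become useful once they are granted to the users who will work with the application, for example (user names are placeholders):

-- Grant the consumer roles to individual users as appropriate
GRANT ROLE <consumer_user_role> TO USER <user_name>;
GRANT ROLE <consumer_admin_role> TO USER <admin_user_name>;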

Table access privileges

The application needs to be given access to read from tables from which graphs are to be projected. If this is a single database and schema, the required privileges can be granted once during setup. More likely, users will at different times want to project graphs from data in different schemas and databases. For this reason we show an example of granting read access in the setup query immediately below, and repeat the instructions in the Usage example.

When writing results back, the application creates a new table. To have immediate access to this table, users can grant privileges on future tables in the schema.

Copy the queries below and insert names of roles and database objects as necessary. If you are not using the default application name Neo4j_Graph_Analytics, replace it with the name you used during installation. Run the queries top to bottom to satisfy application requirements.

-- Use a role with the required privileges, like 'ACCOUNTADMIN'
USE ROLE <privileged_role>;

-- Grant access to consumer data
-- The application reads consumer data to build a graph object, and it also writes results into new tables.
-- We therefore need to grant the right permissions to give the application access.
GRANT USAGE ON DATABASE <database_name> TO APPLICATION Neo4j_Graph_Analytics;
GRANT USAGE ON SCHEMA <database_name>.<schema_name> TO APPLICATION Neo4j_Graph_Analytics;

-- Required to read tabular data into a graph
GRANT SELECT ON ALL TABLES IN SCHEMA <database_name>.<schema_name> TO APPLICATION Neo4j_Graph_Analytics;
-- Required to write computation results into a table
GRANT CREATE TABLE ON SCHEMA <database_name>.<schema_name> TO APPLICATION Neo4j_Graph_Analytics;
-- Optional, ensuring the consumer role has access to tables created by the application
GRANT ALL PRIVILEGES ON FUTURE TABLES IN SCHEMA <database_name>.<schema_name> TO ROLE <consumer_user_role>;

Privileges for future tables and views

The privileges granted to the application in the previous section allow it to read from existing tables in the given schema. It might also be necessary to allow the application to read from future tables in the schema. Unfortunately, SELECT ON FUTURE TABLES cannot be granted directly to the application. However, Snowflake provides database roles, which we can use to solve this problem.

First of all, we need to create a database role that has the required privileges.

CREATE DATABASE ROLE <database_role>;

-- Grants needed for reading existing consumer data stored in tables and views.
GRANT SELECT ON ALL TABLES IN SCHEMA <database_name>.<schema_name> TO DATABASE ROLE <database_role>;
GRANT SELECT ON ALL VIEWS IN SCHEMA <database_name>.<schema_name> TO DATABASE ROLE <database_role>;
-- Grants needed for reading future consumer data stored in tables and views.
GRANT SELECT ON FUTURE TABLES IN SCHEMA <database_name>.<schema_name> TO DATABASE ROLE <database_role>;
GRANT SELECT ON FUTURE VIEWS IN SCHEMA <database_name>.<schema_name> TO DATABASE ROLE <database_role>;
-- Grants needed for writing computation results into tables.
GRANT CREATE TABLE ON SCHEMA <database_name>.<schema_name> TO DATABASE ROLE <database_role>;

After assigning the privileges to the database role, we grant the database role to the application. Note that this is a preview feature and might not be available in all Snowflake regions or accounts.

-- Assuming the default name of 'Neo4j_Graph_Analytics' for the application
GRANT DATABASE ROLE <database_role> TO APPLICATION Neo4j_Graph_Analytics;

Any table or view created in the given schema will now be accessible to the application.

Introduction to the Algorithm API

Neo4j Graph Analytics offers a catalogue of algorithms, ranging from PageRank and Dijkstra to WCC.

The execution of algorithms is done in three steps: projection, computation, and writing results. The project-compute-write pattern is a common pattern in graph processing, where you first project a graph from your data, then compute some properties or metrics on that graph, and finally write the results back to your data store.

Projection

We start by having you specify which tables to read nodes from, and which tables to read relationships from. Nodes and relationships are terms from graph databases, but for now, think of them in these simple terms:

  • Nodes are entities, rows in a table

  • Relationships are connections between entities, two foreign key references in a row

Tables that contain nodes, i.e. node tables, must have a column with unique identifiers for the nodes. These identifiers are called node IDs and the corresponding column must be named nodeId. The application will use this column to identify the nodes in the graph. Any other column in a node table is treated as a node property.

Tables that contain relationships, i.e. relationship tables, must have two columns storing references to the node IDs. These columns must be named sourceNodeId and targetNodeId. The application will use these columns to identify the source and target nodes of the relationship. A relationship table can also have an additional column, which is treated as a property of the relationship. This can be useful when running a weighted graph algorithm.

Using that convention, we can read in your data and turn it into a graph data structure, upon which we can compute any of the algorithms in our catalogue.
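
As a small illustration of this convention (the table and column names here are hypothetical and not part of the later usage example), a node table with one property and a weighted relationship table could look like this:

-- 'age' is treated as a node property
CREATE TABLE EXAMPLE_DB.DATA_SCHEMA.PERSONS (nodeId NUMBER, age NUMBER);
-- 'weight' is treated as a relationship property, usable by weighted algorithms
CREATE TABLE EXAMPLE_DB.DATA_SCHEMA.KNOWS (sourceNodeId NUMBER, targetNodeId NUMBER, weight FLOAT);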

Algorithm computation

We have a large and growing catalogue of algorithms, from classics such as Dijkstra’s algorithm for finding shortest paths, to modern ones like PageRank for link analysis, to WCC (Weakly Connected Components), a popular algorithm for finding groupings in your data.

They differ in the inputs they accept. Dijkstra takes directed or undirected graphs and considers weights if you have them. PageRank is likewise agnostic to orientation, but offers many tuning options. Topological Sort, in contrast, only understands directed graphs and ignores weights. We document these specifics in the catalogue of algorithms.

Writing results

Immediately after an algorithm finishes, we write data back to your tables for inspection and further processing. Here we require you to specify things like an output table and node labels (a Neo4j concept; think of it as a table name).

Preparation

Before you run an algorithm, your environment needs to be set up correctly.

We need to make sure to use a role with the required privileges to use the application.

USE ROLE <consumer_user_role>;

Optionally, to run an algorithm without giving the fully qualified endpoint names, we can execute the following command.

USE DATABASE Neo4j_Graph_Analytics;

Here we are assuming that the application is installed under the default name, Neo4j_Graph_Analytics.

Usage example

We will give a more comprehensive example of how to run Neo4j Graph Analytics using the default warehouse configuration and the CPU_X64_XS compute pool. For a larger usage example, including preparation of data into the node and relationship table/view format, see Basket analysis example on TPC-H data. It uses the TPC-H sample data available in Snowflake.

In the following example we assume that:

  • the application is installed as Neo4j_Graph_Analytics (default)

  • the role executing the queries has access to the consumer database objects referred to

  • the role executing the queries has been granted the app_user application role

For our example, we create two tables, one for nodes and one for relationships. The graph we want to project consists of six nodes and four relationships. Note the column names nodeId and sourceNodeId, targetNodeId respectively. The application requires these columns to exist in order to extract the data. Optional additional columns are treated as node or relationship properties, respectively.

If node and relationship tables are already available, but do not have the required column names, you can create views on top of them.
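
For example, given hypothetical existing tables PERSONS(PERSON_ID) and FRIENDSHIPS(PERSON_A, PERSON_B), views can expose the required column names without copying any data:

-- Hypothetical source tables; adjust names to match your own schema
CREATE VIEW EXAMPLE_DB.DATA_SCHEMA.PERSON_NODES AS
SELECT PERSON_ID AS nodeId
FROM EXAMPLE_DB.DATA_SCHEMA.PERSONS;

CREATE VIEW EXAMPLE_DB.DATA_SCHEMA.FRIENDSHIP_RELATIONSHIPS AS
SELECT PERSON_A AS sourceNodeId, PERSON_B AS targetNodeId
FROM EXAMPLE_DB.DATA_SCHEMA.FRIENDSHIPS;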

-- Use a role with the required privileges
USE ROLE ACCOUNTADMIN;

-- Create a consumer role for users of the Graph Analytics application
CREATE ROLE IF NOT EXISTS MY_CONSUMER_ROLE;
GRANT APPLICATION ROLE Neo4j_Graph_Analytics.app_user TO ROLE MY_CONSUMER_ROLE;

USE SCHEMA EXAMPLE_DB.DATA_SCHEMA;
CREATE TABLE NODES (nodeId Number);
INSERT INTO NODES VALUES (1), (2), (3), (4), (5), (6);
CREATE TABLE RELATIONSHIPS (sourceNodeId Number, targetNodeId Number);
INSERT INTO RELATIONSHIPS VALUES (1, 2), (2, 3), (4, 5), (5, 6);

-- Grant read access on the newly created tables to the application
GRANT USAGE ON DATABASE EXAMPLE_DB TO APPLICATION Neo4j_Graph_Analytics;
GRANT USAGE ON SCHEMA EXAMPLE_DB.DATA_SCHEMA TO APPLICATION Neo4j_Graph_Analytics;
GRANT SELECT ON ALL TABLES IN SCHEMA EXAMPLE_DB.DATA_SCHEMA TO APPLICATION Neo4j_Graph_Analytics;
GRANT CREATE TABLE ON SCHEMA EXAMPLE_DB.DATA_SCHEMA TO APPLICATION Neo4j_Graph_Analytics;

-- Ensure the consumer role has access to tables created by the application
GRANT ALL PRIVILEGES ON FUTURE TABLES IN SCHEMA EXAMPLE_DB.DATA_SCHEMA TO ROLE MY_CONSUMER_ROLE;
-- Use the consumer role to run the algorithm and inspect the output
USE ROLE MY_CONSUMER_ROLE;

We capture this in the projection configuration like so:

'project': {
    'nodeTables': ['EXAMPLE_DB.DATA_SCHEMA.NODES'],
    'relationshipTables': {
      'EXAMPLE_DB.DATA_SCHEMA.RELATIONSHIPS': {
        'sourceTable': 'EXAMPLE_DB.DATA_SCHEMA.NODES',
        'targetTable': 'EXAMPLE_DB.DATA_SCHEMA.NODES',
        'orientation': 'NATURAL'
      }
    }
  }

Both nodes and relationships can be read from multiple tables. Using the nodeTables and relationshipTables configuration parameters, we can specify which tables to read from. The nodeTables configuration parameter specifies an array of tables that contain the nodes. The name of each node table is mapped to a node label in the graph. The relationship tables are specified in a map, where the key is the name of the table and the value is a map of configuration parameters. The name of each relationship table is mapped to a relationship type in the graph. The sourceTable and targetTable configuration parameters specify the node tables that the relationship table refers to.

If we want to project relationships using a different orientation, we can specify that in the configuration. Possible values are NATURAL (default), UNDIRECTED and REVERSE.
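
For instance, a projection that reads nodes from two hypothetical node tables and treats the connecting relationships as undirected could be configured like this (all table names are placeholders):

'project': {
    'nodeTables': [
      'EXAMPLE_DB.DATA_SCHEMA.PERSONS',
      'EXAMPLE_DB.DATA_SCHEMA.PRODUCTS'
    ],
    'relationshipTables': {
      'EXAMPLE_DB.DATA_SCHEMA.PURCHASES': {
        'sourceTable': 'EXAMPLE_DB.DATA_SCHEMA.PERSONS',
        'targetTable': 'EXAMPLE_DB.DATA_SCHEMA.PRODUCTS',
        'orientation': 'UNDIRECTED'
      }
    }
  }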

In our example, we will use the Weakly Connected Components algorithm (WCC) to find disconnected parts of the graph. We can put together the algorithm computation configuration for WCC by specifying that we want "nice numbers" on the output side:

'compute': { 'consecutiveIds': true }

Once we have computed WCC, we write the results back to a table for further analytics. That table will be created, or overwritten if it already exists.

'write': [{
    'nodeLabel': 'NODES',
    'outputTable': 'EXAMPLE_DB.DATA_SCHEMA.NODES_COMPONENTS'
  }]

Since this is the first time writing back to our example schema, we also need to consider privileges: the application needs the CREATE TABLE privilege on the schema, which we granted in the setup queries above.

Finally, with all that preamble, we are ready to go!

CALL Neo4j_Graph_Analytics.graph.wcc('CPU_X64_XS', {
    'project': {
        'nodeTables': ['EXAMPLE_DB.DATA_SCHEMA.NODES'],
        'relationshipTables': {
            'EXAMPLE_DB.DATA_SCHEMA.RELATIONSHIPS': {
                'sourceTable': 'EXAMPLE_DB.DATA_SCHEMA.NODES',
                'targetTable': 'EXAMPLE_DB.DATA_SCHEMA.NODES',
                'orientation': 'NATURAL'
            }
        }
    },
    'compute': { 'consecutiveIds': true },
    'write': [{
        'nodeLabel': 'NODES',
        'outputTable': 'EXAMPLE_DB.DATA_SCHEMA.NODES_COMPONENTS'
    }]
});

Please note that we could have called USE DATABASE Neo4j_Graph_Analytics followed by CALL graph.wcc(…) to avoid the fully qualified name.

Once this query has run, we can select the components from the table.

SELECT * FROM EXAMPLE_DB.DATA_SCHEMA.NODES_COMPONENTS;

This will list the component for each node. We can see that the graph consists of two separate components, each containing three nodes.

NODE	VALUE
1	0
2	0
3	0
4	1
5	1
6	1

Most algorithms produce node property results. Some algorithms, like KNN and Node Similarity, produce relationship results. These specific variants will be covered in the algorithms catalogue.

The results have now been written back to your database, and you can inspect and further process them, for example by feeding them into another Neo4j Graph Analytics algorithm.