Getting started

Neo4j Graph Analytics for Snowflake is in Public Preview and is not intended for production use.

Neo4j Graph Analytics for Snowflake is delivered as a Native Application. The application can be installed from the Snowflake Marketplace.

Installation

The application requires the CREATE COMPUTE POOL and CREATE WAREHOUSE privileges.
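
If these privileges were not granted as part of the Marketplace installation flow, they can also be granted afterwards. A minimal sketch, assuming the default application name Neo4j_Graph_Analytics:

-- Use a role that can grant account-level privileges, like 'ACCOUNTADMIN'
GRANT CREATE COMPUTE POOL ON ACCOUNT TO APPLICATION Neo4j_Graph_Analytics;
GRANT CREATE WAREHOUSE ON ACCOUNT TO APPLICATION Neo4j_Graph_Analytics;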

The application creates multiple compute pools on which graph algorithms can be run. The compute pools are created in a suspended state and will auto-suspend when no algorithm is running. This aligns with typical run-on-demand usage patterns. If users run jobs in sequence and the specifications are compatible, a running or recently active compute pool may be reused for the subsequent run. Otherwise, they are suspended to reduce costs for the consumer.

Compute pools for the following instance families are created if they are supported in the consumer region:
  • 'CPU_X64_XS'

  • 'CPU_X64_M'

  • 'CPU_X64_L'

  • 'HIGHMEM_X64_S'

  • 'HIGHMEM_X64_M'

  • 'HIGHMEM_X64_L'

  • 'GPU_NV_S'

  • 'GPU_NV_XS'

These instance family names also serve as the selectors used to run a graph algorithm on a particular compute pool.

The application creates a warehouse for reading and writing data from and to consumer databases. Specifically, that warehouse is used to read Snowflake tables when projecting graphs, and it is used for writing algorithm results. It is also used for administrative queries and logging.

Similar to the compute pools, the warehouse is configured with a short auto-suspend timeout to reduce costs. All privileges on the warehouse are granted to consumers via application roles, giving the consumer full control of the warehouse. Application users are therefore expected to modify the warehouse as needed to suit their workloads.
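
As a minimal sketch of such a modification, assuming a placeholder name for the warehouse created by the application and a role that holds the warehouse privileges (see the application roles below), the warehouse could be resized for a heavier workload and scaled back down afterwards:

-- <application_warehouse> is a placeholder for the warehouse created by the application
ALTER WAREHOUSE <application_warehouse> SET WAREHOUSE_SIZE = 'MEDIUM';
-- Scale back down and suspend once the heavy workload is done
ALTER WAREHOUSE <application_warehouse> SET WAREHOUSE_SIZE = 'XSMALL';
ALTER WAREHOUSE <application_warehouse> SUSPEND;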

Consumer roles and privileges

The application comes with two application roles: app_user and app_admin.

The app_user role provides access to all algorithm procedures and utility functions. The app_admin role provides access to manage the query warehouse, and to monitor and operate the compute pools.
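
As an illustration of what the app_admin role enables, a role holding it can monitor and operate the application's compute pools with standard Snowflake commands (the pool name below is a placeholder):

-- List the compute pools visible to the current role, including their state
SHOW COMPUTE POOLS;
-- Suspend a pool explicitly, for example to stop billing immediately after a batch of runs
ALTER COMPUTE POOL <application_compute_pool> SUSPEND;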

The role used to create the application is automatically granted both of the application roles. Other consumer roles used to interact with the application will need to have one or both of the application roles granted. In the installation example below we create two new consumer roles, one for users and one for administrators, and grant them the corresponding application roles. The consumer roles can then be granted to users according to how they will interact with the application.

Copy the queries below and insert names of roles as necessary. If you are not using the default application name Neo4j_Graph_Analytics, replace it with the name you used during installation. Run the queries top to bottom to satisfy application requirements.

-- Use a role with the required privileges, like 'ACCOUNTADMIN'
USE ROLE <privileged_role>;

-- Create a consumer role for users of the Graph Analytics application
CREATE ROLE IF NOT EXISTS <consumer_user_role>;
GRANT APPLICATION ROLE Neo4j_Graph_Analytics.app_user TO ROLE <consumer_user_role>;
-- Create a consumer role for administrators of the Graph Analytics application
CREATE ROLE IF NOT EXISTS <consumer_admin_role>;
GRANT APPLICATION ROLE Neo4j_Graph_Analytics.app_admin TO ROLE <consumer_admin_role>;
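
The new consumer roles only become useful once they are granted to the users who will work with the application, for example (user names are placeholders):

-- Grant the consumer roles to individual users as appropriate
GRANT ROLE <consumer_user_role> TO USER <user_name>;
GRANT ROLE <consumer_admin_role> TO USER <admin_user_name>;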

Table access privileges

The application needs to be given access to read from tables from which graphs are to be projected. If this is a single database and schema, the required privileges can be granted once during setup. More likely, users will at different times want to project graphs from data in different schemas and databases. For this reason we show an example of granting read access in the setup query immediately below, and repeat the instructions in the Usage example.

When writing results back, the application creates a new table. To have immediate access to this table, users can grant privileges on future tables in the schema.

Copy the queries below and insert names of roles and database objects as necessary. If you are not using the default application name Neo4j_Graph_Analytics, replace it with the name you used during installation. Run the queries top to bottom to satisfy application requirements.

-- Use a role with the required privileges, like 'ACCOUNTADMIN'
USE ROLE <privileged_role>;

-- Grant access to consumer data
-- The application reads consumer data to build a graph object, and it also writes results into new tables.
-- We therefore need to grant the right permissions to give the application access.
GRANT USAGE ON DATABASE <database_name> TO APPLICATION Neo4j_Graph_Analytics;
GRANT USAGE ON SCHEMA <database_name>.<schema_name> TO APPLICATION Neo4j_Graph_Analytics;

-- Required to read tabular data into a graph
GRANT SELECT ON ALL TABLES IN SCHEMA <database_name>.<schema_name> TO APPLICATION Neo4j_Graph_Analytics;
-- Required to write computation results into a table
GRANT CREATE TABLE ON SCHEMA <database_name>.<schema_name> TO APPLICATION Neo4j_Graph_Analytics;
-- Optional, ensuring the consumer role has access to tables created by the application
GRANT ALL PRIVILEGES ON FUTURE TABLES IN SCHEMA <database_name>.<schema_name> TO ROLE <consumer_user_role>;

Privileges for future tables and views

The privileges granted to the application in the previous section allow it to read from existing tables in the given schema. It might also be necessary to allow the application to read from future tables in the schema. Unfortunately, SELECT ON FUTURE TABLES cannot be granted directly to the application. However, Snowflake provides database roles, which we can use to solve this problem.

First of all, we need to create a database role that has the required privileges.

CREATE DATABASE ROLE <database_role>;

-- Grants needed for reading existing consumer data stored in tables and views.
GRANT SELECT ON ALL TABLES IN SCHEMA <database_name>.<schema_name> TO DATABASE ROLE <database_role>;
GRANT SELECT ON ALL VIEWS IN SCHEMA <database_name>.<schema_name> TO DATABASE ROLE <database_role>;
-- Grants needed for reading future consumer data stored in tables and views.
GRANT SELECT ON FUTURE TABLES IN SCHEMA <database_name>.<schema_name> TO DATABASE ROLE <database_role>;
GRANT SELECT ON FUTURE VIEWS IN SCHEMA <database_name>.<schema_name> TO DATABASE ROLE <database_role>;
-- Grants needed for writing computation results into tables.
GRANT CREATE TABLE ON SCHEMA <database_name>.<schema_name> TO DATABASE ROLE <database_role>;

After assigning the privileges to the database role, we grant the database role to the application. Note that this is a preview feature and might not be available in all Snowflake regions or accounts.

-- Assuming the default name of 'Neo4j_Graph_Analytics' for the application
GRANT DATABASE ROLE <database_role> TO APPLICATION Neo4j_Graph_Analytics;

Any table or view created in the given schema will now be accessible to the application.

Introduction to the Algorithm API

Neo4j Graph Analytics offers a catalogue of algorithms, ranging from PageRank and Dijkstra to WCC.

The execution of algorithms is done in three steps: projection, computation, and writing results. The project-compute-write pattern is a common pattern in graph processing, where you first project a graph from your data, then compute some properties or metrics on that graph, and finally write the results back to your data store.

Projection

We start by having you specify which tables to read nodes from, and which tables to read relationships from. Nodes and relationships are terms from graph databases, but for now, think of them in these simple terms:

  • Nodes are entities, rows in a table

  • Relationships are connections between entities, two foreign key references in a row

Tables that contain nodes, i.e. node tables, must have a column with unique identifiers for the nodes. These identifiers are called node IDs and the corresponding column must be named nodeId. The application will use this column to identify the nodes in the graph. Any other column in a node table is treated as a node property.

Tables that contain relationships, i.e. relationship tables, must have two columns storing references to the node IDs. These columns must be named sourceNodeId and targetNodeId. The application will use these columns to identify the source and target nodes of the relationship. A relationship table can also have an additional column, which is treated as a property of the relationship. This can be useful when running a weighted graph algorithm.

Using that convention, we can read in your data and turn it into a graph data structure, upon which we can compute any of the algorithms in our catalogue.
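
As a small illustration of this convention (the table and column names here are hypothetical and not part of the later usage example), a node table with one property and a weighted relationship table could look like this:

-- 'age' is treated as a node property
CREATE TABLE EXAMPLE_DB.DATA_SCHEMA.PERSONS (nodeId NUMBER, age NUMBER);
-- 'weight' is treated as a relationship property, usable by weighted algorithms
CREATE TABLE EXAMPLE_DB.DATA_SCHEMA.KNOWS (sourceNodeId NUMBER, targetNodeId NUMBER, weight FLOAT);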

Algorithm computation

We have a large and growing catalogue of algorithms, from classics such as Dijkstra’s algorithm for finding shortest paths, to modern ones like PageRank for link analysis, to WCC (Weakly Connected Components), a popular algorithm for finding groupings in your data.

They differ in the inputs they accept. Dijkstra takes directed or undirected graphs and considers weights if you have them. PageRank is likewise agnostic to orientation, but offers many tuning options. Topological Sort, in contrast, only understands directed graphs and ignores weights. We document these specifics in the catalogue of algorithms.

Writing results

Immediately after an algorithm finishes, we write data back to your tables for inspection and further processing. Here we require you to specify things like an output table and node labels (a Neo4j concept; think of it as a table name).

Preparation

Before you run an algorithm, your environment needs to be set up correctly.

We need to make sure to use a role with the required privileges to use the application.

USE ROLE <consumer_user_role>;

Optionally, to run an algorithm without giving the fully qualified endpoint names, we can execute the following command.

USE DATABASE Neo4j_Graph_Analytics;

Here we are assuming that the application is installed under the default name, Neo4j_Graph_Analytics.

Usage example

We will give a more comprehensive example of how to run Neo4j Graph Analytics using the default warehouse configuration and the CPU_X64_XS compute pool. For a larger usage example, including preparation of data into the node and relationship table/view format, see Basket analysis example on TPC-H data. It uses the TPC-H sample data available in Snowflake.

In the following example we assume that:

  • the application is installed as Neo4j_Graph_Analytics (default)

  • the role executing the queries has access to the consumer database objects referred to

  • the role executing the queries has been granted the app_user application role

For our example, we create two tables, one for nodes and one for relationships. The graph we want to project consists of six nodes and four relationships. Note the column names nodeId and sourceNodeId, targetNodeId respectively. The application requires these columns to exist in order to extract the data. Optional additional columns are treated as node or relationship properties, respectively.

If node and relationship tables are already available, but do not have the required column names, you can create views on top of them.
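
For example, given hypothetical existing tables PERSONS(PERSON_ID) and FRIENDSHIPS(PERSON_A, PERSON_B), views can expose the required column names without copying any data:

-- Hypothetical source tables; adjust names to match your own schema
CREATE VIEW EXAMPLE_DB.DATA_SCHEMA.PERSON_NODES AS
SELECT PERSON_ID AS nodeId
FROM EXAMPLE_DB.DATA_SCHEMA.PERSONS;

CREATE VIEW EXAMPLE_DB.DATA_SCHEMA.FRIENDSHIP_RELATIONSHIPS AS
SELECT PERSON_A AS sourceNodeId, PERSON_B AS targetNodeId
FROM EXAMPLE_DB.DATA_SCHEMA.FRIENDSHIPS;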

-- Use a role with the required privileges
USE ROLE ACCOUNTADMIN;

-- Create a consumer role for users of the Graph Analytics application
CREATE ROLE IF NOT EXISTS MY_CONSUMER_ROLE;
GRANT APPLICATION ROLE Neo4j_Graph_Analytics.app_user TO ROLE MY_CONSUMER_ROLE;

USE SCHEMA EXAMPLE_DB.DATA_SCHEMA;
CREATE TABLE NODES (nodeId Number);
INSERT INTO NODES VALUES (1), (2), (3), (4), (5), (6);
CREATE TABLE RELATIONSHIPS (sourceNodeId Number, targetNodeId Number);
INSERT INTO RELATIONSHIPS VALUES (1, 2), (2, 3), (4, 5), (5, 6);

-- Grant read access on the newly created tables to the application
GRANT USAGE ON DATABASE EXAMPLE_DB TO APPLICATION Neo4j_Graph_Analytics;
GRANT USAGE ON SCHEMA EXAMPLE_DB.DATA_SCHEMA TO APPLICATION Neo4j_Graph_Analytics;
GRANT SELECT ON ALL TABLES IN SCHEMA EXAMPLE_DB.DATA_SCHEMA TO APPLICATION Neo4j_Graph_Analytics;
GRANT CREATE TABLE ON SCHEMA EXAMPLE_DB.DATA_SCHEMA TO APPLICATION Neo4j_Graph_Analytics;

-- Ensure the consumer role has access to tables created by the application
GRANT ALL PRIVILEGES ON FUTURE TABLES IN SCHEMA EXAMPLE_DB.DATA_SCHEMA TO ROLE MY_CONSUMER_ROLE;
-- Use the consumer role to run the algorithm and inspect the output
USE ROLE MY_CONSUMER_ROLE;

We capture this in the projection configuration like so:

'project': {
    'nodeTables': ['EXAMPLE_DB.DATA_SCHEMA.NODES'],
    'relationshipTables': {
      'EXAMPLE_DB.DATA_SCHEMA.RELATIONSHIPS': {
        'sourceTable': 'EXAMPLE_DB.DATA_SCHEMA.NODES',
        'targetTable': 'EXAMPLE_DB.DATA_SCHEMA.NODES',
        'orientation': 'NATURAL'
      }
    }
  }

Both nodes and relationships can be read from multiple tables. Using the nodeTables and relationshipTables configuration parameters, we can specify which tables to read from. The nodeTables configuration parameter specifies an array of tables that contain the nodes. The name of each node table is mapped to a node label in the graph. The relationship tables are specified in a map, where the key is the name of the table and the value is a map of configuration parameters. The name of each relationship table is mapped to a relationship type in the graph. The sourceTable and targetTable configuration parameters specify the node tables that the relationship table refers to.

If we want to project relationships using a different orientation, we can specify that in the configuration. Possible values are NATURAL (default), UNDIRECTED and REVERSE.
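
For instance, a projection that reads nodes from two hypothetical node tables and treats the connecting relationships as undirected could be configured like this (all table names are placeholders):

'project': {
    'nodeTables': [
      'EXAMPLE_DB.DATA_SCHEMA.PERSONS',
      'EXAMPLE_DB.DATA_SCHEMA.PRODUCTS'
    ],
    'relationshipTables': {
      'EXAMPLE_DB.DATA_SCHEMA.PURCHASES': {
        'sourceTable': 'EXAMPLE_DB.DATA_SCHEMA.PERSONS',
        'targetTable': 'EXAMPLE_DB.DATA_SCHEMA.PRODUCTS',
        'orientation': 'UNDIRECTED'
      }
    }
  }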

In our example, we will use the Weakly Connected Components algorithm (WCC) to find disconnected parts of the graph. We can put together the algorithm computation configuration for WCC by specifying that we want "nice numbers" on the output side:

'compute': { 'consecutiveIds': true }

Once we have computed WCC, we write the results back to a table for further analytics. That table will be created, or overwritten if it already exists.

'write': [{
    'nodeLabel': 'NODES',
    'outputTable': 'EXAMPLE_DB.DATA_SCHEMA.NODES_COMPONENTS'
  }]

Since this is the first time writing back to our example schema, we also need to consider privileges: the application needs the CREATE TABLE privilege on the schema, which we granted in the setup queries above.

Finally, with all that preamble, we are ready to go!

CALL Neo4j_Graph_Analytics.graph.wcc('CPU_X64_XS', {
    'project': {
        'nodeTables': ['EXAMPLE_DB.DATA_SCHEMA.NODES'],
        'relationshipTables': {
            'EXAMPLE_DB.DATA_SCHEMA.RELATIONSHIPS': {
                'sourceTable': 'EXAMPLE_DB.DATA_SCHEMA.NODES',
                'targetTable': 'EXAMPLE_DB.DATA_SCHEMA.NODES',
                'orientation': 'NATURAL'
            }
        }
    },
    'compute': { 'consecutiveIds': true },
    'write': [{
        'nodeLabel': 'NODES',
        'outputTable': 'EXAMPLE_DB.DATA_SCHEMA.NODES_COMPONENTS'
    }]
});

Please note that we could have called USE DATABASE Neo4j_Graph_Analytics followed by CALL graph.wcc(…) to avoid the fully qualified name.

Once this query has run, we can select the components from the table.

SELECT * FROM EXAMPLE_DB.DATA_SCHEMA.NODES_COMPONENTS;

This will list the component for each node. We can see that the graph consists of two separate components, each containing three nodes.

NODE	VALUE
1	0
2	0
3	0
4	1
5	1
6	1

Most algorithms produce node property results. Some algorithms, like KNN and Node Similarity, produce relationship results. These specific variants will be covered in the algorithms catalogue.

The results have now been written back to your database, and you can inspect and further process them, for example by feeding them into another Neo4j Graph Analytics algorithm.