Running jobs

Neo4j Graph Analytics for Snowflake is in Public Preview and is not intended for production use.

Project

Graph projection is the process of creating a graph from tables/views in a Snowflake database. The tables represent nodes and relationships, rows represent entities, and columns represent identifiers and properties. Graph projection copies the content of the tables into a named in-memory graph. The in-memory graph is a data representation optimized for running large-scale graph algorithms.

Node tables

A node table is a table that represents nodes in a graph. Each row in the table represents a node entity. A node is uniquely identified within the table by a node identifier (nodeId).

Consider the following example of a node table:

CREATE TABLE EXAMPLE_DB.DATA_SCHEMA.NODE_DATA (nodeId Number, property1 Number, property2 Double);
INSERT INTO EXAMPLE_DB.DATA_SCHEMA.NODE_DATA VALUES (1, 42, 13.37), (2, 43, 13.38), (3, 44, 13.39);

The nodeId column uniquely identifies each node in the table. Supported data types for node identifiers are Number (BIGINT) and Varchar. Any additional column is considered a node property. In the example, we have two properties, property1 and property2. Supported data types for node properties are Number, Double, Float, and Array.

Relationship tables

A relationship table is a table that represents relationships between nodes in a graph. Each row in the table represents a relationship entity. A relationship is not required to have a unique identifier. Instead, it must have a source node identifier (sourceNodeId) and a target node identifier (targetNodeId). The values in those columns refer to node identifiers in node tables.

Consider the following example of a relationship table:

CREATE TABLE EXAMPLE_DB.DATA_SCHEMA.RELATIONSHIP_DATA (sourceNodeId Number, targetNodeId Number, property1 Double);
INSERT INTO EXAMPLE_DB.DATA_SCHEMA.RELATIONSHIP_DATA VALUES (1, 2, 42), (2, 3, 43);

The sourceNodeId and targetNodeId columns refer to the nodeId column in the node table. This also means that they must have the same data type as the nodeId column. Note, that the sourceNodeId and targetNodeId columns can have different types if they refer to different node tables. Any additional column is considered a relationship property. In the example, we have one property, property1. Relationship properties must be of type Double.

Granting access

Before we can create a graph, we need to grant read access to the tables to the application. This is done by granting the USAGE privilege on the database and schema and the SELECT privilege on the tables. The following example grants access to the Neo4j_Graph_Analytics application:

GRANT USAGE ON DATABASE EXAMPLE_DB TO APPLICATION Neo4j_Graph_Analytics;
GRANT USAGE ON SCHEMA EXAMPLE_DB.DATA_SCHEMA TO APPLICATION Neo4j_Graph_Analytics;
GRANT SELECT ON ALL TABLES IN SCHEMA EXAMPLE_DB.DATA_SCHEMA TO APPLICATION Neo4j_Graph_Analytics;

For more on this, please see the Getting started page.

Graph projection

When running an algorithm in Neo4j Graph Analytics for Snowflake, the first step is always loading data from your tables into an in-memory data structure. We rely on the project section of the call configuration to facilitate that. It is there that you specify the node and relationship tables to load from when creating the in-memory graph.

The following example creates a graph from the NODE_DATA and RELATIONSHIP_DATA tables:

'project': {
    'nodeTables': ['EXAMPLE_DB.DATA_SCHEMA.NODE_DATA'],
    'relationshipTables': {
        'EXAMPLE_DB.DATA_SCHEMA.RELATIONSHIP_DATA': {
            'sourceTable': 'EXAMPLE_DB.DATA_SCHEMA.NODE_DATA',
            'targetTable': 'EXAMPLE_DB.DATA_SCHEMA.NODE_DATA',
            'orientation': 'NATURAL'
        }
    }
}

In the project configuration section, the nodeTables parameter specifies the node table mappings. A node table mapping consists of a table name and a node label. There is a one-to-one mapping between node tables and node labels.

The relationshipTables parameter specifies the relationship table mappings. Each relationship table mapping consists of a table name and its own configuration object. The sourceTable and targetTable parameters specify the node tables that the relationship table refers to. The orientation parameter is optional and specifies the relationship orientation, here NATURAL.

Node labels and relationship types are used to group nodes and relationships in the in-memory graph. They can be used to filter the graph when calling an algorithm. That way we can run an algorithm on a subset of the graph.

Orientation

If we want to project relationships using a different orientation, we can specify that in the configuration. Possible values are NATURAL (default), UNDIRECTED and REVERSE.

If all necessary tables or views nodeTables, relationshipTables, sourceTable, and targetTable exist within the same schema, the defaultTablePrefix parameter can help reduce the amount of code required.

'project': {
    'defaultTablePrefix': 'EXAMPLE_DB.DATA_SCHEMA',
    'nodeTables': ['NODE_DATA'],
    'relationshipTables': {
        'RELATIONSHIP_DATA': {
            'sourceTable': 'NODE_DATA',
            'targetTable': 'NODE_DATA',
            'orientation': 'NATURAL'
        }
    }
}

This simplified configuration object can be used alongside defining all required projection tables as views within the same schema, allowing them to be referenced in the configuration without specifying the schema name.

It is highly recommended that the values in the nodeId column are unique (e.g. by using DISTINCT in a SELECT statement). Neo4j Graph Analytics for Snowflake only works with unique identifiers and will deduplicate them internally only projecting the first occurrence of a node, ignoring all others. By deduplicating before projecting, you can control how the properties are deduplicated as well as expect a faster projection, as less data needs to be read from the node query.

Multi-table example

Let’s consider a more complex example with multiple node and relationship tables.

The following tables represent a subset of a transactional shop system. Usually, the tables in a real-world scenario would contain more columns and rows. Here, creating a VIEW to transform the source tables into a format that can be used for graph projection.

CREATE TABLE EXAMPLE_DB.DATA_SCHEMA.PRODUCTS  (nodeId Number, price Double);
CREATE TABLE EXAMPLE_DB.DATA_SCHEMA.ORDERS    (nodeId Number, total Number);
CREATE TABLE EXAMPLE_DB.DATA_SCHEMA.CUSTOMERS (nodeId Number);
CREATE TABLE EXAMPLE_DB.DATA_SCHEMA.LINEITEM  (sourceNodeId Number, targetNodeId Number, quantity Double);
CREATE TABLE EXAMPLE_DB.DATA_SCHEMA.PURCHASE  (sourceNodeId Number, targetNodeId Number);

To create a graph from these tables, we need to define the node and relationship tables and their mappings. An example is the Node Similarity algorithm, which can be used to find similar products based on the CONTAINS relationship. We can use the labels to just run the algorithm on the Product and Order nodes. The algorithm will result in new relationships called SIMILAR_TO, that connects similar products. The relationships will contain a property similarity with a score for how string the connection is. Lastly we write the relationships back to Snowflake tables for further analysis.

For the required setup of grants your consumer role and environment, please see the Getting started page.

Please also note that we use the default application name Neo4j_Graph_Analytics in the following example. If you chose a different app name during installation, please replace it with that.

CALL Neo4j_Graph_Analytics.graph.node_similarity('CPU_X64_XS', {
    'project': {
        'defaultTablePrefix': 'EXAMPLE_DB.DATA_SCHEMA',
        'nodeTables': [ 'PRODUCTS', 'ORDERS', 'CUSTOMERS' ],
        'relationshipTables': {
            'LINEITEM': {
                'sourceTable': 'ORDERS',
                'targetTable': 'PRODUCTS'
            },
            'PURCHASE': {
                'sourceTable': 'CUSTOMERS',
                'targetTable': 'PRODUCTS'
            }
        }
    },
    'compute': { 'topK': 2 },
    'write': [
        { 'sourceLabel': 'CUSTOMERS', 'targetLabel': 'CUSTOMERS', 'outputTable': 'EXAMPLE_DB.DATA_SCHEMA.CUSTOMER_SIMILARITIES' },
        { 'sourceLabel': 'ORDERS', 'targetLabel': 'ORDERS', 'outputTable': 'EXAMPLE_DB.DATA_SCHEMA.ORDER_SIMILARITIES' }
    ]
});

Write

The different algorithms have different results types: sometimes node properties, sometimes relationships, etc. Below we describe how to deal with each one in turn.

Granting permissions

In order to write data into new tables, we need to grant the CREATE privilege on the schema to the application.

The following example grants the CREATE privilege on the DATA_SCHEMA schema to the Neo4j_Graph_Analytics application:

GRANT CREATE ON SCHEMA EXAMPLE_DB.DATA_SCHEMA TO APPLICATION Neo4j_Graph_Analytics;

For more on this, please see the Getting started page.

Node properties

Some algorithms output node properties as their results. To write these node properties back to Snowflake, you need to specify both the label and the target table. The following example illustrates how results from e.g. WCC are written back to a new table:

    'write': [
        { 'nodeLabel': 'CUSTOMERS', 'outputTable': 'EXAMPLE_DB.DATA_SCHEMA.CUSTOMERS_COMPONENTS'}
    ]

The nodeLabel defines for which node table mapping we want to write back the properties. The corresponding properties will be stored in the outputTable.

Relationships properties

For algorithms that output relationship properties, specify the desired node label and output table. The following example demonstrates how to write back the relationships computed by the Node Similarity algorithm, including the computed similarity property (score):

    'write': [
        { 'sourceLabel': 'ORDERS', 'targetLabel': 'ORDERS', 'outputTable': 'EXAMPLE_DB.DATA_SCHEMA.ORDER_SIMILARITIES' }
    ]

The table will be newly created and a potential existing table with the same name will be overridden.

Note that the write section of an algorithm call can contain multiple entries. It is a list, and all the write configurations are separated by commas, while the list itself is enclosed in square brackets.