Filtered Node Similarity

This section describes the Filtered Node Similarity algorithm in Neo4j Graph Analytics for Snowflake. The algorithm is an extension of Node Similarity with support for filtering on source nodes, target nodes, or both.

Introduction

The Filtered Node Similarity algorithm is an extension to the Node Similarity algorithm. It adds support for filtering on source nodes, target nodes, or both.

Node filtering

A node filter reduces the node space for which the algorithm will produce results. Consider two similarity results: A = (alice)-[:SIMILAR_TO]→(bob) and B = (bob)-[:SIMILAR_TO]→(alice). Result A will be produced if the (alice) node matches the source node filter and the (bob) node matches the target node filter. If the (alice) node does not match the target node filter, or the (bob) node does not match the source node filter, result B will not be produced.
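
The rule above can be modelled in a few lines of Python. This is only an illustrative sketch of the filtering semantics, not the library's implementation; the function name `filter_results` and the use of plain sets as filters are assumptions made for the example.

```python
# Illustrative model of source/target node filtering (hypothetical helper,
# not part of Neo4j Graph Analytics).
def filter_results(pairs, source_filter=None, target_filter=None):
    """Keep (source, target) pairs whose ends pass the respective filters.

    A filter of None stands for "no filter", so every node matches it.
    """
    return [
        (source, target)
        for source, target in pairs
        if (source_filter is None or source in source_filter)
        and (target_filter is None or target in target_filter)
    ]


# Result A is (alice -> bob), result B is (bob -> alice).
pairs = [("alice", "bob"), ("bob", "alice")]

# alice matches only the source filter, bob matches only the target filter.
kept = filter_results(pairs, source_filter={"alice"}, target_filter={"bob"})
print(kept)  # [('alice', 'bob')]
```

Only result A survives: result B would need (bob) to match the source filter and (alice) to match the target filter.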

Configuring node filters

For the standard configuration of node similarity, see Node Similarity syntax.

The source node filter is specified with the sourceNodeFilter configuration parameter. The target node filter is specified with the targetNodeFilter configuration parameter. Neither parameter is mandatory.

The node filter parameters accept one of the following:

Table 1. Syntax for `sourceNodeFilter` and `targetNodeFilter`
Accepted value	Example
a list of node IDs	`sourceNodeFilter: ['Alice', 'Bob', 'Carol']`, together with `sourceNodesTable: 'Person'`
a single node label	`sourceNodeFilter: 'Person'`

Syntax

This section covers the syntax used to execute the Filtered Node Similarity algorithm.

Run Filtered Node Similarity.

CALL Neo4j_Graph_Analytics.graph.filtered_node_similarity(
  'CPU_X64_XS',                    (1)
  {
    ['defaultTablePrefix': '...',] (2)
    'project': {...},              (3)
    'compute': {...},              (4)
    'write':   {...}               (5)
  }
);

1	Compute pool selector.
2	Optional prefix for table references.
3	Project config.
4	Compute config.
5	Write config.

Table 2. Parameters
Name	Type	Default	Optional	Description
computePoolSelector	String	`n/a`	no	The selector for the compute pool on which to run the Filtered Node Similarity job.
configuration	Map	`{}`	no	Configuration for graph project, algorithm compute and result write back.

The configuration map consists of the following three entries.

For more details on the Project configuration below, refer to the Project documentation.

Table 3. Project configuration
Name	Description
nodeTables	List of node tables.
relationshipTables	Map of relationship types to relationship tables.

Table 4. Compute configuration
Name	Type	Default	Optional	Description
mutateProperty	String	`'similarity'`	yes	The relationship property that will be written back to the Snowflake database.
mutateRelationshipType	String	`'SIMILAR_TO'`	yes	The relationship type used for the relationships written back to the Snowflake database.
similarityCutoff	Float	`1e-42`	yes	Lower limit for the similarity score to be present in the result. Values must be between 0 and 1.
degreeCutoff	Integer	`1`	yes	Inclusive lower bound on the node degree for a node to be considered in the comparisons. This value cannot be lower than 1.
upperDegreeCutoff	Integer	`2147483647`	yes	Inclusive upper bound on the node degree for a node to be considered in the comparisons. This value cannot be lower than 1.
topK	Integer	`10`	yes	Limit on the number of scores per node. The K largest results are returned. This value cannot be lower than 1.
bottomK	Integer	`10`	yes	Limit on the number of scores per node. The K smallest results are returned. This value cannot be lower than 1.
topN	Integer	`0`	yes	Global limit on the number of scores computed. The N largest total results are returned. This value cannot be negative; a value of 0 means no global limit.
bottomN	Integer	`0`	yes	Global limit on the number of scores computed. The N smallest total results are returned. This value cannot be negative; a value of 0 means no global limit.
relationshipWeightProperty	String	`null`	yes	Name of the relationship property to use as weights. If unspecified, the algorithm runs unweighted.
similarityMetric	String	`JACCARD`	yes	The metric used to compute similarity. Can be either `JACCARD`, `OVERLAP` or `COSINE`.
useComponents	Boolean or String	`false`	yes	If enabled, Filtered Node Similarity will use components to improve the performance of the computation, skipping comparisons of nodes in different components. Set to `false` (Default): the algorithm does not use components, but computes similarity across the entire graph. Set to `true`: the algorithm uses components, and will compute these components before computing similarity. Set to String: use pre-computed components stored in graph, String is the key for a node property representing components.
sourceNodeFilter	String or List	`null`	yes	Filter for source nodes. Can be a single node label, list of node labels, single node ID, or list of node IDs.
sourceNodesTable	String	`null`	yes	A table for mapping the source node identifier.
targetNodeFilter	String or List	`null`	yes	Filter for target nodes. Can be a single node label, list of node labels, single node ID, or list of node IDs.
targetNodesTable	String	`null`	yes	A table for mapping the target node identifier.
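
To make the `similarityMetric` options more concrete, here is a minimal sketch of the textbook set-based definitions of the Jaccard and Overlap coefficients, applied to the neighbour sets of two nodes. It assumes the unweighted case and standard definitions; the helper names are hypothetical, and the library's internal computation may differ in detail. `COSINE` is left out of the sketch.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    return len(a & b) / len(a | b)


def overlap(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the smaller set."""
    return len(a & b) / min(len(a), len(b))


# Neighbour sets of two hypothetical nodes.
neighbours_a = {1, 2, 3}
neighbours_b = {2, 3}

print(jaccard(neighbours_a, neighbours_b))  # 0.6666666666666666
print(overlap(neighbours_a, neighbours_b))  # 1.0
```

Because the smaller set is fully contained in the larger one, the Overlap coefficient is 1.0 while the Jaccard score stays below 1.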

For more details on the Write configuration below, refer to the Write documentation.

Table 5. Write configuration
Name	Type	Default	Optional	Description
sourceLabel	String	`n/a`	no	Node label in the in-memory graph for start nodes of relationships to be written back.
targetLabel	String	`n/a`	no	Node label in the in-memory graph for end nodes of relationships to be written back.
outputTable	String	`n/a`	no	Table in Snowflake database to which relationships are written.
relationshipType	String	`'SIMILAR_TO'`	yes	The relationship type that will be written back to the Snowflake database.
relationshipProperty	String	`'similarity'`	yes	The relationship property that will be written back to the Snowflake database.

Examples

In this section we show examples of running the Filtered Node Similarity algorithm on a concrete graph. The intention is to illustrate what the results look like and to provide a guide on how to use the algorithm in a real setting. We do this on a small knowledge graph of a handful of nodes, connected in a particular pattern.

The following SQL statements create the example graph tables in the Snowflake database:

CREATE OR REPLACE TABLE EXAMPLE_DB.DATA_SCHEMA.PERSONS (NODEID VARCHAR);
INSERT INTO EXAMPLE_DB.DATA_SCHEMA.PERSONS VALUES
  ('Alice'),
  ('Bob'),
  ('Carol'),
  ('Dave'),
  ('Eve');

CREATE OR REPLACE TABLE EXAMPLE_DB.DATA_SCHEMA.INSTRUMENTS (NODEID VARCHAR);
INSERT INTO EXAMPLE_DB.DATA_SCHEMA.INSTRUMENTS VALUES
  ('Guitar'),
  ('Synthesizer'),
  ('Bongos'),
  ('Trumpet');

CREATE OR REPLACE TABLE EXAMPLE_DB.DATA_SCHEMA.LIKES (SOURCENODEID VARCHAR, TARGETNODEID VARCHAR, WEIGHT FLOAT);
INSERT INTO EXAMPLE_DB.DATA_SCHEMA.LIKES VALUES
  ('Alice', 'Guitar',      1.0),
  ('Alice', 'Synthesizer', 1.0),
  ('Alice', 'Bongos',      0.5),
  ('Bob',   'Guitar',      1.0),
  ('Bob',   'Synthesizer', 1.0),
  ('Carol', 'Bongos',      1.0),
  ('Dave',  'Guitar',      1.0),
  ('Dave',  'Trumpet',     1.5),
  ('Dave',  'Bongos',      1.0);

This bipartite graph has two node sets, Person nodes and Instrument nodes. The two node sets are connected via LIKES relationships. Each relationship starts at a Person node and ends at an Instrument node.

In the example, we want to use the Filtered Node Similarity algorithm to compare people based on the instruments they like.

The Filtered Node Similarity algorithm will only compute similarity for nodes that have a degree of at least 1. In the example graph, the Eve node will not be compared to other Person nodes.

In the following examples, we will demonstrate using the Filtered Node Similarity algorithm on this graph with various filtering configurations.

Run job

Running a Filtered Node Similarity job involves three steps: Project, Compute, and Write.

Running the query requires grants to be set up for the application, your consumer role, and your environment. See the Getting started page for details.

We also assume that the application name is the default, Neo4j_Graph_Analytics. If you chose a different application name during installation, use that name instead.

The following will run a Filtered Node Similarity job with source and target filters:

CALL Neo4j_Graph_Analytics.graph.filtered_node_similarity('CPU_X64_XS', {
    'defaultTablePrefix': 'EXAMPLE_DB.DATA_SCHEMA',
    'project': {
        'nodeTables': ['PERSONS', 'INSTRUMENTS'],
        'relationshipTables': {
          'LIKES': {
            'sourceTable': 'PERSONS',
            'targetTable': 'INSTRUMENTS'
          }
        }
    },
    'compute': {
        'sourceNodeFilter': 'PERSONS',
        'targetNodeFilter': 'PERSONS'
    },
    'write': [{
        'outputTable': 'PERSONS_SIMILARITY',
        'sourceLabel': 'PERSONS',
        'targetLabel': 'PERSONS',
        'relationshipType': 'SIMILAR_TO',
        'relationshipProperty': 'similarity'
    }]
});

Table 6. Results
JOB_ID	JOB_START	JOB_END
job_36fa8f572b8b412fabb7d9343ed038f8	2025-06-25 13:30:03.460	2025-06-25 13:30:10.288

JOB_RESULT

{
  "node_similarity_filtered_1": {
    "computeMillis": 43,
    "configuration": {
      "bottomK": 10,
      "bottomN": 0,
      "concurrency": 2,
      "degreeCutoff": 1,
      "jobId": "d0ae1a6a-30ba-479e-bafc-31b8eadedfb6",
      "logProgress": true,
      "mutateProperty": "similarity",
      "mutateRelationshipType": "SIMILAR_TO",
      "nodeLabels": ["*"],
      "relationshipTypes": ["*"],
      "similarityCutoff": 1.000000000000000e-42,
      "similarityMetric": "JACCARD",
      "sourceNodeFilter": "NodeFilter[label=PERSONS]",
      "sudo": false,
      "targetNodeFilter": "NodeFilter[label=PERSONS]",
      "topK": 10,
      "topN": 0,
      "upperDegreeCutoff": 2147483647,
      "useComponents": true
    },
    "mutateMillis": 152,
    "nodesCompared": 4,
    "postProcessingMillis": 0,
    "preProcessingMillis": 7,
    "relationshipsWritten": 10,
    "similarityDistribution": {
      "max": 0.6666679382324218,
      "mean": 0.41666641235351565,
      "min": 0.25,
      "p1": 0.25,
      "p10": 0.25,
      "p100": 0.6666660308837891,
      "p25": 0.3333320617675781,
      "p5": 0.25,
      "p50": 0.3333320617675781,
      "p75": 0.5000019073486328,
      "p90": 0.6666660308837891,
      "p95": 0.6666660308837891,
      "p99": 0.6666660308837891,
      "stdDev": 0.14907148283512542
    }
  },
  "project_1": {
    "graphName": "snowgraph",
    "nodeCount": 9,
    "nodeMillis": 402,
    "relationshipCount": 9,
    "relationshipMillis": 531,
    "totalMillis": 933
  },
  "write_relationship_type_1": {
    "exportMillis": 2738,
    "outputTable": "EXAMPLE_DB.DATA_SCHEMA.PERSONS_SIMILARITY",
    "relationshipProperty": "similarity",
    "relationshipType": "SIMILAR_TO",
    "relationshipsExported": 10
  }
}

The returned result contains information about the job execution and result distribution. Additionally, each similarity score computed for the compared node pairs has been written back to the Snowflake database. We can query it like so:

SELECT * FROM EXAMPLE_DB.DATA_SCHEMA.PERSONS_SIMILARITY ORDER BY SIMILARITY DESC;

The query shows the computation results as stored in the database:

Table 7. Results
SOURCENODEID	TARGETNODEID	SIMILARITY
Alice	Bob	0.6666666666666666
Bob	Alice	0.6666666666666666
Alice	Dave	0.5
Dave	Alice	0.5
Alice	Carol	0.3333333333333333
Carol	Alice	0.3333333333333333
Carol	Dave	0.3333333333333333
Dave	Carol	0.3333333333333333
Bob	Dave	0.25
Dave	Bob	0.25

We use default values for the procedure configuration parameters: topK is set to 10 and topN is set to 0. Because of that, the result set contains up to the 10 highest similarity scores for each node, with no global limit on the number of results.

If we would like to instead compare the Instruments to each other, we would then project the LIKES relationship type using REVERSE orientation. This would return similarities for pairs of Instruments and not compute any similarities between Persons.

Source filter only

You can apply filtering only to source nodes, allowing all target nodes to be considered.

The following will run a Filtered Node Similarity job with only source filtering:

CALL Neo4j_Graph_Analytics.graph.filtered_node_similarity('CPU_X64_XS', {
    'defaultTablePrefix': 'EXAMPLE_DB.DATA_SCHEMA',
    'project': {
        'nodeTables': ['PERSONS', 'INSTRUMENTS'],
        'relationshipTables': {
          'LIKES': {
            'sourceTable': 'PERSONS',
            'targetTable': 'INSTRUMENTS'
          }
        }
    },
    'compute': {
        'sourceNodeFilter': ['Alice', 'Bob'],
        'sourceNodesTable': 'PERSONS'
    },
    'write': [{
        'outputTable': 'PERSONS_SIMILARITY_NAMES',
        'sourceLabel': 'PERSONS',
        'targetLabel': 'PERSONS',
        'relationshipType': 'SIMILAR_TO',
        'relationshipProperty': 'similarity'
    }]
});

Table 8. Results
JOB_ID	JOB_START	JOB_END
job_c87ccfc6c46742548940fff74eeeeea6	2025-06-25 13:47:42.029	2025-06-25 13:47:47.708

JOB_RESULT

{
  "node_similarity_filtered_1": {
    "computeMillis": 77,
    "configuration": {
      "bottomK": 10,
      "bottomN": 0,
      "concurrency": 2,
      "degreeCutoff": 1,
      "jobId": "a1df6380-8d32-4904-be97-789542955b10",
      "logProgress": true,
      "mutateProperty": "similarity",
      "mutateRelationshipType": "SIMILAR_TO",
      "nodeLabels": ["*"],
      "relationshipTypes": ["*"],
      "similarityCutoff": 1.000000000000000e-42,
      "similarityMetric": "JACCARD",
      "sourceNodeFilter": "NodeFilter[4, 5]",
      "sourceNodesTable": {},
      "sudo": false,
      "targetNodeFilter": "NodeFilter[NoOp]",
      "topK": 10,
      "topN": 0,
      "upperDegreeCutoff": 2147483647,
      "useComponents": true
    },
    "mutateMillis": 217,
    "nodesCompared": 2,
    "postProcessingMillis": 0,
    "preProcessingMillis": 9,
    "relationshipsWritten": 5,
    "similarityDistribution": {
      "max": 0.6666679382324218,
      "mean": 0.4833332061767578,
      "min": 0.25,
      "p1": 0.25,
      "p10": 0.25,
      "p100": 0.6666660308837891,
      "p25": 0.3333320617675781,
      "p5": 0.25,
      "p50": 0.5000019073486328,
      "p75": 0.6666660308837891,
      "p90": 0.6666660308837891,
      "p95": 0.6666660308837891,
      "p99": 0.6666660308837891,
      "stdDev": 0.16996730465455073
    }
  },
  "project_1": {
    "graphName": "snowgraph",
    "nodeCount": 9,
    "nodeMillis": 236,
    "relationshipCount": 9,
    "relationshipMillis": 342,
    "totalMillis": 578
  },
  "write_relationship_type_1": {
    "exportMillis": 3438,
    "outputTable": "EXAMPLE_DB.DATA_SCHEMA.PERSONS_SIMILARITY_NAMES",
    "relationshipProperty": "similarity",
    "relationshipType": "SIMILAR_TO",
    "relationshipsExported": 5
  }
}

SELECT * FROM EXAMPLE_DB.DATA_SCHEMA.PERSONS_SIMILARITY_NAMES ORDER BY SIMILARITY DESC;

Table 9. Results
SOURCENODEID	TARGETNODEID	SIMILARITY
Alice	Bob	0.6666666666666666
Bob	Alice	0.6666666666666666
Alice	Dave	0.5
Alice	Carol	0.3333333333333333
Bob	Dave	0.25

In this case, only Alice and Bob are used as source nodes, but they can be compared to all PERSONS target nodes including Dave and Carol.
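
One way to read this result is as the subset of the earlier PERSONS-to-PERSONS result whose source node is Alice or Bob. The sketch below illustrates that reading in Python. The rows are copied from Table 7, the names `table_7`, `table_9`, and `source_filter` are hypothetical, and the equivalence is assumed to hold here because the default topK of 10 does not truncate any node's results in this small graph.

```python
# Rows of Table 7: (source, target, similarity).
table_7 = [
    ("Alice", "Bob", 2 / 3),
    ("Bob", "Alice", 2 / 3),
    ("Alice", "Dave", 0.5),
    ("Dave", "Alice", 0.5),
    ("Alice", "Carol", 1 / 3),
    ("Carol", "Alice", 1 / 3),
    ("Carol", "Dave", 1 / 3),
    ("Dave", "Carol", 1 / 3),
    ("Bob", "Dave", 0.25),
    ("Dave", "Bob", 0.25),
]

# Equivalent of 'sourceNodeFilter': ['Alice', 'Bob'] with no target filter.
source_filter = {"Alice", "Bob"}

# Keep only rows whose source node passes the filter; targets are unrestricted.
table_9 = [row for row in table_7 if row[0] in source_filter]

for source, target, score in table_9:
    print(f"{source}\t{target}\t{score}")
```

Five rows remain, which agrees with the `relationshipsWritten` value of 5 reported by the job.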

When relationshipWeightProperty is set, the similarity computation takes the relationship weights into account while applying the node filters, providing more nuanced similarity scores based on the strength of relationships between nodes.
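
As a rough illustration of how weights can change a score, the sketch below applies a commonly used weighted Jaccard definition (sum of per-neighbour minimum weights divided by sum of per-neighbour maximum weights) to the WEIGHT column of the example LIKES table. The formula is an assumption for this sketch; this page does not state which weighted formula the algorithm uses, and the helper name `weighted_jaccard` is hypothetical.

```python
# WEIGHT values from the example LIKES table, keyed by person and instrument.
weights = {
    "Alice": {"Guitar": 1.0, "Synthesizer": 1.0, "Bongos": 0.5},
    "Bob": {"Guitar": 1.0, "Synthesizer": 1.0},
    "Dave": {"Guitar": 1.0, "Trumpet": 1.5, "Bongos": 1.0},
}


def weighted_jaccard(a: dict, b: dict) -> float:
    """Sum of minimum weights over sum of maximum weights.

    A missing relationship counts as weight 0. This is an assumed,
    commonly used definition, not one confirmed by this page.
    """
    keys = a.keys() | b.keys()
    numerator = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    denominator = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return numerator / denominator


print(weighted_jaccard(weights["Alice"], weights["Bob"]))  # 0.8
```

Under this definition Alice and Bob score 0.8 rather than the unweighted 0.67, because the one instrument they do not share (Bongos) carries a weight of only 0.5 for Alice.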