Breadth First Search

Introduction

The Breadth First Search (BFS) algorithm starts from a given node and visits nodes in order of increasing distance (number of hops) from the start (see https://en.wikipedia.org/wiki/Breadth-first_search). BFS is useful when the likelihood of finding relevant nodes decreases with distance.

The traversal supports multiple termination conditions: reaching one of several target nodes, reaching a maximum depth, or exhausting the whole graph.

The output of a BFS job is written to a Snowflake table. The BFS path consists of individual links source → target, where source and target are consecutive nodes on the path. The output table contains precisely these links, which together form the BFS path. Since the BFS path may traverse nodes of various node labels, the labels of both nodes of each link are included in the output table.

Table 1. Output table schema
Name | Type | Description
SOURCENODEID | String | The id of the 'first' node of a link in the BFS path. Numeric ids are converted to string.
TARGETNODEID | String | The id of the 'second' node of a link in the BFS path. Numeric ids are converted to string.
SOURCELABEL | String | The name of the node label of the 'first' node of the link.
TARGETLABEL | String | The name of the node label of the 'second' node of the link.

We denote such relationship tables as heterogeneous (over node labels).

The order of rows in the output table is not guaranteed to match the BFS visitation order. Also, the links of the BFS exploration are not necessarily relationships in the original graph, due to back-tracking 'jumps'.
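For example, once a BFS job has written its output, the label columns make it easy to summarize the path per pair of node tables. A minimal sketch, assuming a hypothetical output table named MY_BFS_OUTPUT:

```sql
-- MY_BFS_OUTPUT is a placeholder for your configured outputTable.
-- Count the BFS links per (source label, target label) pair.
SELECT SOURCELABEL, TARGETLABEL, COUNT(*) AS NUM_LINKS
FROM MY_BFS_OUTPUT
GROUP BY SOURCELABEL, TARGETLABEL;
```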

Syntax

This section covers the syntax used to execute the Breadth First Search algorithm.

Run Breadth First Search.
CALL Neo4j_Graph_Analytics.graph.bfs(
  'CPU_X64_XS',                    (1)
  {
    ['defaultTablePrefix': '...',] (2)
    'project': {...},              (3)
    'compute': {...},              (4)
    'write':   {...}               (5)
  }
);
1 Compute pool selector.
2 Optional prefix for table references.
3 Project config.
4 Compute config.
5 Write config.
Table 2. Parameters
Name | Type | Default | Optional | Description
computePoolSelector | String | n/a | no | The selector for the compute pool on which to run the BFS job.
configuration | Map | {} | no | Configuration for graph project, algorithm compute, and result write-back.

For more details on the project configuration below, refer to the Project documentation.
Table 3. Project configuration
Name | Description
nodeTables | List of node tables.
relationshipTables | Map of relationship types to relationship tables.

Table 4. Compute configuration
Name | Type | Default | Optional | Description
sourceNode | String or Integer | n/a | no | The id of the node from which to start the traversal.
sourceNodeTable | String | n/a | no | Fully-qualified node table containing the sourceNode in its NODEID column.
targetNodes | List of String or Integer | [] | yes | Set of target node identifiers; the traversal stops when any of them is reached.
targetNodesTable | String | n/a | depends on targetNodes | Fully-qualified node table containing the targetNodes in its NODEID column. Optional if targetNodes is empty.
maxDepth | Integer | -1 | yes | The maximum distance from the source node at which nodes are visited. A value of -1 means the traversal is unbounded.
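For example, to traverse the whole reachable graph from a single source (no target nodes, unbounded depth), the compute section could look like the following sketch; the table name and node id are illustrative:

```sql
-- Sketch: 'compute' section only; embed it in the full CALL shown in the Syntax section.
-- 'nodes_A' and 'a1' are illustrative values.
'compute': {
    'sourceNodeTable': 'nodes_A',
    'sourceNode': 'a1',
    'maxDepth': -1        -- -1 (the default) means unbounded traversal
}
```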

For more details on the write configuration below, refer to the Write documentation.
Table 5. Write configuration
Name | Type | Default | Optional | Description
outputTable | String | n/a | no | Table in the Snowflake database to which the relationships are written.

Examples

Now we will look at how to apply BFS to a small directed example graph, depicted below:

Visualization of the example graph

The queries below create the example graph.

-- CREATE DATABASE EXAMPLE_DB; -- to do once
USE DATABASE EXAMPLE_DB;
-- CREATE SCHEMA DATA_SCHEMA; -- to do once
USE SCHEMA DATA_SCHEMA;

CREATE OR REPLACE TABLE nodes_A (nodeId varchar);
INSERT INTO nodes_A (nodeId)
SELECT 'a1' UNION ALL
SELECT 'a2';

CREATE OR REPLACE TABLE nodes_B (nodeId Number);
INSERT INTO nodes_B (nodeId)
SELECT 1 UNION ALL
SELECT 7;

CREATE OR REPLACE TABLE nodes_C (nodeId Number);
INSERT INTO nodes_C (nodeId)
SELECT 1 UNION ALL
SELECT 2;

CREATE OR REPLACE TABLE relationships_R (sourceNodeId varchar, targetNodeId Number);
INSERT INTO relationships_R VALUES ('a1', 7), ('a2', 1);

CREATE OR REPLACE TABLE relationships_S (sourceNodeId varchar, targetNodeId Number);
INSERT INTO relationships_S VALUES ('a1', 1);

CREATE OR REPLACE TABLE relationships_T (sourceNodeId Number, targetNodeId varchar);
INSERT INTO relationships_T VALUES (7, 'a2');

CREATE OR REPLACE TABLE relationships_U (sourceNodeId Number, targetNodeId Number);
INSERT INTO relationships_U VALUES (1, 2);
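As a quick sanity check, the following query counts the rows just inserted; the totals should match the nodeCount (6) and relationshipCount (5) that the projection reports later on:

```sql
-- Expect NODE_COUNT = 6 and REL_COUNT = 5.
SELECT
  (SELECT COUNT(*) FROM nodes_A) +
  (SELECT COUNT(*) FROM nodes_B) +
  (SELECT COUNT(*) FROM nodes_C) AS NODE_COUNT,
  (SELECT COUNT(*) FROM relationships_R) +
  (SELECT COUNT(*) FROM relationships_S) +
  (SELECT COUNT(*) FROM relationships_T) +
  (SELECT COUNT(*) FROM relationships_U) AS REL_COUNT;
```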

Run job (full traversal from a source)

Running a Breadth First Search job involves three steps: Project, Compute, and Write.

Running the query requires a setup of grants for the application, your consumer role, and your environment. Please see the Getting started page for more on this.

We also assume that the application name is the default Neo4j_Graph_Analytics. If you chose a different app name during installation, replace it accordingly.

The following runs BFS from node a1 and writes the BFS path to an output relationship table:
CALL Neo4j_Graph_Analytics.graph.bfs('CPU_X64_XS', {
    'defaultTablePrefix': 'EXAMPLE_DB.DATA_SCHEMA',
    'project': {
        'nodeTables': ['nodes_A', 'nodes_B', 'nodes_C'],
        'relationshipTables': {
            'relationships_R': {
                'sourceTable': 'nodes_A',
                'targetTable': 'nodes_B'
            },
            'relationships_S': {
                'sourceTable': 'nodes_A',
                'targetTable': 'nodes_C'
            },
            'relationships_T': {
                'sourceTable': 'nodes_B',
                'targetTable': 'nodes_A'
            },
            'relationships_U': {
                'sourceTable': 'nodes_B',
                'targetTable': 'nodes_C'
            }
        }
    },
    'compute': {
      'sourceNodeTable': 'nodes_A',
      'sourceNode': 'a1',
      'targetNodesTable': 'nodes_C',
      'targetNodes': [2], -- will not be reached due to maxDepth
      'maxDepth': 2
    },
    'write': [{'outputTable': 'BFS_NEXT'}]
});
Table 6. Results
JOB_ID | JOB_STATUS | JOB_START | JOB_END
job_8a3f2e0b1c4248d3a0f3a9f0e3f1abcd | SUCCESS | 2025-06-30 11:05:12.210 | 2025-06-30 11:05:16.845

JOB_RESULT:

 {
    "bfs_1": {
      "computeMillis": 12,
      "configuration": {
        "concurrency": 6,
        "jobId": "a41ac8f8-acf0-4d8d-a7d4-f6b80786a6fe",
        "logProgress": true,
        "maxDepth": 2,
        "mutateRelationshipType": "NEXT",
        "nodeLabels": [
          "*"
        ],
        "relationshipTypes": [
          "*"
        ],
        "sourceNode": "a1",
        "sourceNodeTable": "EXAMPLE_DB.DATA_SCHEMA.nodes_A",
        "sudo": false,
        "targetNodes": [
          2
        ],
        "targetNodesTable": "EXAMPLE_DB.DATA_SCHEMA.nodes_C"
      },
      "mutateMillis": 2,
      "postProcessingMillis": 0,
      "preProcessingMillis": 8,
      "relationshipsWritten": 3
    },
    "project_1": {
      "graphName": "snowgraph",
      "nodeCount": 6,
      "nodeMillis": 308,
      "relationshipCount": 5,
      "relationshipMillis": 572,
      "totalMillis": 880
    },
    "write_relationship_type_1": {
      "exportMillis": 2162,
      "outputTable": "EXAMPLE_DB.DATA_SCHEMA.BFS_NEXT",
      "relationshipProperty": "<none>",
      "relationshipType": "NEXT",
      "relationshipsExported": 3
    }
  }

We can now query the written results as follows.

SELECT * FROM EXAMPLE_DB.DATA_SCHEMA.BFS_NEXT;
Table 7. Possible results
SOURCENODEID | TARGETNODEID | SOURCELABEL | TARGETLABEL
a1 | 1 | nodes_A | nodes_C
7 | a2 | nodes_B | nodes_A
1 | 7 | nodes_C | nodes_B
These rows together represent the BFS visitation order ('a1': nodes_A) → (1: nodes_C) → (7: nodes_B) → ('a2': nodes_A). The path contains all nodes at depth 2 or less, because the only target node (2: nodes_C) has greater depth and would only be reached later. The node (1: nodes_B) has depth 3, which is less than the depth of the target node but more than the configured maximum depth of 2; thus it is not explored either.
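To see which links in the output are actual relationships of the example graph and which are back-tracking 'jumps', you can join the output against the union of the relationship tables. The following is a sketch against the example data above; note that in general node ids alone may collide across node tables (here id 1 appears in both nodes_B and nodes_C), so in real data you should include the label columns in the join. With this example's links the ids happen to disambiguate:

```sql
USE SCHEMA EXAMPLE_DB.DATA_SCHEMA;

-- Flag each BFS link by whether it exists as a relationship in the graph;
-- FALSE marks a back-tracking jump.
SELECT b.SOURCENODEID, b.TARGETNODEID,
       (r.src IS NOT NULL) AS IS_GRAPH_EDGE
FROM BFS_NEXT b
LEFT JOIN (
    SELECT sourceNodeId::varchar AS src, targetNodeId::varchar AS tgt FROM relationships_R
    UNION ALL
    SELECT sourceNodeId::varchar, targetNodeId::varchar FROM relationships_S
    UNION ALL
    SELECT sourceNodeId::varchar, targetNodeId::varchar FROM relationships_T
    UNION ALL
    SELECT sourceNodeId::varchar, targetNodeId::varchar FROM relationships_U
) r
ON b.SOURCENODEID = r.src AND b.TARGETNODEID = r.tgt;
```

For the results above, only the link 1 → 7 has no matching relationship: it is the back-tracking jump from (1: nodes_C) back up to (7: nodes_B).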