Delta-Stepping Single-Source Shortest Path

Introduction

The Delta-Stepping Shortest Path algorithm computes all shortest paths between a source node and all reachable nodes in the graph. The algorithm supports weighted graphs with positive relationship weights. To compute the shortest path between a source and a single target node, Dijkstra Source-Target can be used.

In contrast to Dijkstra Single-Source, the Delta-Stepping algorithm is a distance correcting algorithm. This property allows it to traverse the graph in parallel. The algorithm is guaranteed to always find the shortest path between a source node and a target node. However, if multiple shortest paths exist between two nodes, the algorithm is not guaranteed to return the same path in each computation.

The Neo4j Graph Analytics implementation is based on [1] and incorporates the bucket fusion optimization discussed in [2]. The algorithm implementation is executed using multiple threads.

For more information on this algorithm, see:

Syntax

This section covers the syntax used to execute the Delta-Stepping algorithm.

Run Delta-Stepping.
CALL Neo4j_Graph_Analytics.graph.delta_stepping(
  'CPU_X64_XS',                    (1)
  {
    ['defaultTablePrefix': '...',] (2)
    'project': {...},              (3)
    'compute': {...},              (4)
    'write':   {...}               (5)
  }
);
1 Compute pool selector.
2 Optional prefix for table references.
3 Project config.
4 Compute config.
5 Write config.
Table 1. Parameters
Name Type Default Optional Description

computePoolSelector

String

n/a

no

The selector for the compute pool on which to run the Delta-Stepping Single-Source job.

configuration

Map

{}

no

Configuration for graph project, algorithm compute and result write back.

The configuration map consists of the following three entries.

For more details on below Project configuration, refer to the Project documentation.
Table 2. Project configuration
Name Type

nodeTables

List of node tables.

relationshipTables

Map of relationship types to relationship tables.

Table 3. Compute configuration
Name Type Default Optional Description

mutateProperty

String

'total_cost'

yes

The relationship property that will be written back to the Snowflake database.

mutateRelationshipType

String

'PATH'

yes

The relationship type used for the relationships written back to the Snowflake database.

sourceNode

Integer or String

n/a

no

The source node identifier.

sourceNodeTable

String

n/a

no

A table for mapping the source node identifier.

relationshipWeightProperty

String

null

yes

Name of the relationship property to use as weights. If unspecified, the algorithm runs unweighted.

delta

Float

2.0

yes

The bucket width for grouping nodes with the same tentative distance to the source node.

For more details on below Write configuration, refer to the Write documentation.
Table 4. Write configuration
Name Type Default Optional Description

sourceLabel

String

n/a

no

Node label in the in-memory graph for start nodes of relationships to be written back.

targetLabel

String

n/a

no

Node label in the in-memory graph for end nodes of relationships to be written back.

outputTable

String

n/a

no

Table in Snowflake database to which relationships are written.

relationshipType

String

'PATH'

yes

The relationship type that will be written back to the Snowflake database.

relationshipProperty

String

'total_cost'

yes

The relationship property that will be written back to the Snowflake database.

Delta

The delta parameter defines a range which is used to group nodes with the same tentative distance to the start node. The ranges are also called buckets. In each iteration of the algorithm, the non-empty bucket with the smallest tentative distance is processed in parallel. The delta parameter is the main tuning knob for the algorithm and controls the workload that can be processed in parallel. Generally, for power-law graphs, where many nodes can be reached within a few hops, a small delta (e.g. 2) is recommended. For high-diameter graphs, e.g. transport networks, a high delta value (e.g. 10000) is recommended. Note, that the value might vary depending on the graph topology and the value range of relationship properties.

Examples

Now we will look at how to apply Delta-Stepping to a road network.

CREATE OR REPLACE TABLE EXAMPLE_DB.DATA_SCHEMA.locations (NODEID VARCHAR);
INSERT INTO EXAMPLE_DB.DATA_SCHEMA.locations VALUES
  ('A'),
  ('B'),
  ('C'),
  ('D'),
  ('E'),
  ('F');

CREATE OR REPLACE TABLE EXAMPLE_DB.DATA_SCHEMA.roads (SOURCENODEID VARCHAR, TARGETNODEID VARCHAR, COST FLOAT);
INSERT INTO EXAMPLE_DB.DATA_SCHEMA.roads VALUES
  ('A', 'B',  50),
  ('A', 'C',  50),
  ('A', 'D', 100),
  ('B', 'D',  40),
  ('C', 'D',  40),
  ('C', 'E',  80),
  ('D', 'E',  30),
  ('D', 'F',  80),
  ('E', 'F',  40);

This graph builds a transportation network with roads between locations. Like in the real world, the roads in the graph have different lengths. These lengths are represented by the cost relationship property.

In the following example we will demonstrate the use of the Delta-Stepping Shortest Path algorithm using this graph.

Run job

Running a Delta-Stepping job involves the three steps: Project, Compute and Write.

The following will run the algorithm, and write results back to your tables:
CALL Neo4j_Graph_Analytics.graph.delta_stepping('CPU_X64_XS', {
    'defaultTablePrefix': 'EXAMPLE_DB.DATA_SCHEMA',
    'project': {
        'nodeTables': [ 'LOCATIONS' ],
        'relationshipTables': {
            'roads': {
                'sourceTable': 'LOCATIONS',
                'targetTable': 'LOCATIONS'
            }
        }
    },
    'compute': {
        'sourceNode': 'A',
        'sourceNodeTable': 'LOCATIONS',
        'relationshipWeightProperty': 'COST'
    },
    'write': [{
        'sourceLabel': 'LOCATIONS',
        'targetLabel': 'LOCATIONS',
        'outputTable': 'PATHS'
    }]
});
Table 5. Results
JOB_ID JOB_START JOB_END JOB_RESULT

job_a58bc1934def4d5ca9022528ea2e55a1

2025-07-17 13:22:37.239

2025-07-17 13:22:41.800

{
  "delta_stepping_1": {
    "computeMillis": 19,
    "configuration": {
      "concurrency": 6,
      "delta": 2,
      "jobId": "a7e0ce40-9c37-41c7-adb2-7a3cd59fe4b5",
      "logProgress": true,
      "mutateRelationshipType": "PATH",
      "nodeLabels": [
        "*"
      ],
      "relationshipTypes": [
        "*"
      ],
      "relationshipWeightProperty": "COST",
      "sourceNode": "A",
      "sourceNodeTable": "EXAMPLE_DB.DATA_SCHEMA.LOCATIONS",
      "sudo": false
    },
    "mutateMillis": 0,
    "postProcessingMillis": 0,
    "preProcessingMillis": 7
  },
  "project_1": {
    "graphName": "snowgraph",
    "nodeCount": 6,
    "nodeMillis": 111,
    "relationshipCount": 9,
    "relationshipMillis": 263,
    "totalMillis": 374
  },
  "write_relationship_type_1": {
    "exportMillis": 1798,
    "outputTable": "EXAMPLE_DB.DATA_SCHEMA.PATHS",
    "relationshipProperty": "[SOURCENODEID, TARGETNODEID, NODEIDS, NODELABELS, COSTS, TOTALCOST]",
    "relationshipType": "PATH",
    "relationshipsExported": 6
  }
}

The returned result contains information about the job execution. Additionally, the shortest path(s) have been written back to the Snowflake database. We can query it like so:

SELECT * FROM EXAMPLE_DB.DATA_SCHEMA.PATHS;

Which shows the computation results as stored in the database:

Table 6. Results
SOURCENODEID TARGETNODEID NODEIDS NODELABELS COSTS TOTALCOST

A

A

["A"]

["LOCATIONS"]

[0]

0

A

B

["A", "B"]

["LOCATIONS", "LOCATIONS"]

[0, 50]

50

A

C

["A", "C"]

["LOCATIONS", "LOCATIONS"]

[0, 50]

50

A

D

["A", "B", "D"]

["LOCATIONS", "LOCATIONS", "LOCATIONS"]

[0, 50, 90]

90

A

E

["A", "B", "D", "E"]

["LOCATIONS", "LOCATIONS", "LOCATIONS", "LOCATIONS"]

[0, 50, 90, 120]

120

A

F

["A", "B", "D", "E", "F"]

["LOCATIONS", "LOCATIONS", "LOCATIONS", "LOCATIONS", "LOCATIONS"]

[0, 50, 90, 120, 160]

160

The result shows the total cost of the shortest path between node A and all other reachable nodes in the graph. It also shows ordered lists of node ids (and their labels) that were traversed to find the shortest paths as well as the accumulated costs of the visited nodes. This can be verified in the example graph.

The relationships written are always directed, even if the input graph is undirected.