Graph Algorithms in Neo4j: Minimum Weight Spanning Tree
4 min read

About Minimum Weight Spanning Tree
The Minimum Weight Spanning Tree starts from a given node, and finds all its reachable nodes and the set of relationships that connect the nodes together with the minimum possible weight. Prim’s algorithm is one of the simplest and best-known Minimum Weight Spanning Tree algorithms. The K-Means variant of this algorithm can be used to detect clusters in the graph. The first known algorithm for finding a Minimum Weight Spanning Tree was developed by the Czech scientist Otakar Borůvka in 1926 while trying to design an efficient electricity network for Moravia. Prim’s algorithm was invented by Jarnik in 1930 and rediscovered by Prim in 1957. It is similar to Dijkstra’s Shortest Path algorithm, but rather than minimizing the total length of a path ending at each relationship, it minimizes the length of each relationship individually. Unlike Dijkstra’s, Prim’s tolerates negative-weight relationships.NOTE: The algorithm operates as follows:
- Start with a tree containing only one node (and no relationships).
- Select the minimal-weight relationship coming from that node and add it to our tree.
- Repeatedly choose a minimal-weight relationship that joins any node in the tree to one that is not in the tree, adding the new relationship and node to our tree.
- When there are no more nodes to add, the tree we have built is a minimum spanning tree.
When Should I Use Minimum Weight Spanning Tree?
-
- Minimum Weight Spanning Tree was applied to analyze airline and sea connections of Papua New Guinea and minimize the travel cost of exploring the country. It could be used to help design low-cost tours that visit many destinations across a country. The research mentioned is found here: “An Application of Minimum Spanning Trees to Travel Planning.”
- Minimum Weight Spanning Tree has been used to analyze and visualize correlations in a network of currencies based on the correlation between currency returns. This is described in https://www.nbs.sk/_img/Documents/_PUBLIK_NBS_FSR/Biatec/Rok2013/07-2013/05_biatec13-7_resovsky_EN.pdf” target=”_blank”>”Minimum Spanning Tree Application in the Currency Market.”
- Exhaustive clinical research has shown Minimum Weight Spanning Tree to be useful in tracing the history of infection transmission in an outbreak. For more information, see “Use of the Minimum Spanning Tree Model for Molecular Epidemiological Investigation of a Nosocomial Outbreak of Hepatitis C Virus Infection.”
TIP: The Minimum Weight Spanning Tree algorithm only gives meaningful results when run on a graph where the relationships have different weights. If the graph has no weights, or all relationships have the same weight, then any spanning tree is a minimum spanning tree.
Minimum Weight Spanning Tree Example
Let’s see the Minimum Weight Spanning Tree algorithm in action. The following Cypher statement creates a graph containing places and links between them.MERGE (a:Place {id:"A"}) MERGE (b:Place {id:"B"}) MERGE (c:Place {id:"C"}) MERGE (d:Place {id:"D"}) MERGE (e:Place {id:"E"}) MERGE (f:Place {id:"F"}) MERGE (g:Place {id:"G"}) MERGE (d)-[:LINK {cost:4}]->(b) MERGE (d)-[:LINK {cost:6}]->(e) MERGE (b)-[:LINK {cost:1}]->(a) MERGE (b)-[:LINK {cost:3}]->(c) MERGE (a)-[:LINK {cost:2}]->(c) MERGE (c)-[:LINK {cost:5}]->(e) MERGE (f)-[:LINK {cost:1}]->(g);We run the algorithm to find the Minimum Weight Spanning Tree starting from D by executing the following query.
MATCH (n:Place {id:"D"}) CALL algo.spanningTree.minimum("Place", "LINK", "cost", id(n), {write:true, writeProperty:"MINST"}) YIELD loadMillis, computeMillis, writeMillis, effectiveNodeCount RETURN loadMillis, computeMillis, writeMillis, effectiveNodeCount; This procedure creates MINST relationships representing the minimum spanning tree. We then run the following query to find all pairs of nodes and the associated cost of the relationships between them. MATCH path = (n:Place {id:"D"})-[:MINST*]-() WITH relationships(path) AS rels UNWIND rels AS rel WITH DISTINCT rel AS rel RETURN startNode(rel).id AS source, endNode(rel).id AS destination, rel.cost AS costThe Minimum Weight Spanning Tree excludes the relationship with cost
6
from D
to E
, and the one with cost 3
from B
to C
. Nodes F
and G
aren’t included because they’re unreachable from D
. There are also variations of the algorithm that find the Maximum Weight Spanning Tree or K-Spanning Tree.