Graph algorithms provide the means to understand, model and predict complicated dynamics such as the flow of resources or information, the pathways through which contagions or network failures spread, and the influences on and resiliency of groups.

This blog series is designed to help you better utilize graph analytics and graph algorithms so you can effectively innovate and develop intelligent solutions faster using a graph database like Neo4j.

Last week we started our look at Community Detection algorithms, with our look at the Strongly Connected Components algorithm.

Learn more about the Weakly Connected Component graph algorithm.


This week we continue our exploration of these algorithms, with a look at the Weakly Connected Components algorithm, which finds groups of nodes where each node is reachable from any other node in the same group, regardless of the direction of relationships.

About Weakly Connected Components (Union Find)


The Weakly Connected Components, or Union Find, algorithm finds sets of connected nodes in an undirected graph where each node is reachable from any other node in the same set.

It differs from the Strongly Connected Components algorithm (SCC) because it only needs a path to exist between pairs of nodes in one direction, whereas SCC needs a path to exist in both directions. As with SCC, Union Find is often used early in an analysis to understand a graph’s structure.

Bernard A. Galler and Michael J. Fischer first described this algorithm in 1964. The components in a graph are computed using either the breadth-first search or depth-first search algorithms.

When Should I Use Union Find?


    • Testing whether a graph is connected is an essential pre-processing step for every graph algorithm. Such tests are performed so quickly and easily that you should always verify that your input graph is connected, even when you know it has to be. Subtle, difficult-to-detect bugs often result when your algorithm is run only on one component of a disconnected graph.
    • Union Find is also used to keep track of clusters of database records, as part of the de-duplication process – an important task in master data management applications. Read more in An Efficient Domain-Independent Algorithm for Detecting Approximately Duplicate Database Records.”
    • Weakly Connected Components (WCC) is used to analyze citation networks as well. One study uses WCC to work out how well-connected the network is, and then to see whether the connectivity remains if “hub” or “authority” nodes are moved from the graph. Read more in “Characterizing and Mining Citation Graph of Computer Science Literature.”

Union Find Example


Let’s see the Union Find algorithm in action. The following Cypher statement creates a graph of people and their friends.

MERGE (nAlice:User {id:"Alice"})
MERGE (nBridget:User {id:"Bridget"})
MERGE (nCharles:User {id:"Charles"})
MERGE (nDoug:User {id:"Doug"})
MERGE (nMark:User {id:"Mark"})
MERGE (nMichael:User {id:"Michael"})
MERGE (nAlice)-[:FRIEND]->(nBridget)
MERGE (nAlice)-[:FRIEND]->(nCharles)
MERGE (nMark)-[:FRIEND]->(nDoug)
MERGE (nMark)-[:FRIEND]->(nMichael);



Now we run Union Find to find connected components. Execute the following query.

CALL algo.unionFind.stream("User", "FRIEND", {})
YIELD nodeId,setId<
MATCH (u:User) WHERE id(u) = nodeId
RETURN u.id AS user, setId





We have two distinct groups of users that have no link between them.

The first group contains Alice, Charles and Bridget, while the second group contains Michael, Doug and Mark.

Conclusion


As we've seen, the Weakly Connected Components algorithm is often used to enable running other algorithms independently on an identified cluster. As a preprocessing step for directed graphs, it helps quickly identify disconnected groups.

Next week we'll continue our focus on Community Detection algorithms, with a look at Label Propagation.


Find the patterns in your connected data
Learn about the power of graph algorithms in the O'Reilly book,
Graph Algorithms: Practical Examples in Apache Spark and Neo4j by the authors of this article. Click below to get your free ebook copy.


Get the O'Reilly Ebook


 

Explore:  


About the Author

Mark Needham & Amy E. Hodler , Neo4j

Mark Needham & Amy E. Hodler Image

Mark Needham is a Support Engineer for Neo4j. He also blogs about software development at markhneedham.com.

Amy is the Analytics and AI Program Manager at Neo4j. She believes a thriving graph ecosystem is essential to catalyze new types of insights. Accordingly, she helps ensure Neo4j partners are successful. In her career, Amy has consistently helped teams break into new markets at startups and large companies including EDS, Microsoft, and Hewlett-Packard (HP). She most recently comes from Cray Inc., where she was the analytics and artificial intelligence market manager.Amy has a love for science and art with an extreme fascination for complexity science and graph theory. When the weather is good, you’re likely to find her cycling the passes in beautiful Eastern Washington.


Leave a Reply

Your email address will not be published. Required fields are marked *