Online Course Introduction to Graph Algorithms with Neo4j 4.0 Overview of Graph Algorithms Introduction to Graph Data Science library Environment Setup Graph Algorithms Workflow Memory Requirements Estimation Community Detection Algorithms Centrality Algorithms Similarity Algorithms Practical Application of Algorithms Additional Information… Read more →
Community Detection Algorithms
Community Detection algorithms
Community Detection algorithms are designed to help you discover and understand the structure of complex networks.
Understanding the community structure has many realworld applications in sociology, biology, and computer science. You can think of a community as a denselyconnected group of nodes, similar to how a group of friends is highly interconnected.
These algorithms are supported:
 Weakly Connected Components (unionFind)
 Label Propagation
 Louvain Modularity
 Triangle Counting
 Local Clustering Coefficient
Labs implementations:
 Strongly Connected Components
 K1 Coloring
 Modularity optimization
Weakly Connected Components
The Weakly Connected Components (WCC) algorithm is used to find disconnected subgraphs or islands within our network.
In a single connected component, each node can reach all other nodes when disregarding relationships’ direction, effectively treating the connections as undirected. Understanding how many connected components exist in our network is crucial to understanding the graph structure. For that reason, the WCC algorithm is often used early in graph analysis. A best practice is to run WCC to test whether a graph is connected as a preparatory step for all other graph algorithms. Performing this quick test can avoid accidentally running algorithms on only one disconnected component of a graph and getting incorrect results. It can also help you to better comprehend results from other algorithms.
Why use Weakly Connected Components?
Here is why you use WCC:
 Early step in graph analysis to see how a network is connected
 Fraud detection
Here are some examples:
 Finding common identifiers in disambiguation process.
 Identifying disconnected subgraphs to run resource intensive algorithms (like Node Similarity) in parallel.
Label Propagation
The Label Propagation algorithm (LPA) is a fast algorithm for describing communities in a graph. It detects these groups using network topology alone, and does not require any prior information about the communities, or their structure.
In the LPA, the nodes select their community based on their direct neighbors using the node labels. In addition, weights on nodes and relationships can also be considered. The idea is that a single label can quickly become dominant in a denselyconnected group of nodes, but it will have trouble crossing a sparsely connected region.
Here is how the Label Propagation algorithm works. First, every node is initialized with a property. By default, the initial property is unique for every node. However, the LPA also lends itself well to semisupervised learning because you can seed the initial properties with preassigned node labels that you know are predictive.
In this example, we have started with 2 A nodes, but left all other nodes as unique. We are using the node default weights of 1. Nodes are then processed randomly, with each node acquiring its neighbor’s label with the maximum weight. So, in the first iteration, the left A acquires the label F, B acquires the label D, and C now becomes A. The maximum weight is calculated based on the weights of neighbor nodes and their relationships. In addition, ties are broken uniformly and randomly. There will be times when a label is not updated because the neighbor with the maximum weight has the same label. Iterations continue until each node has the majority label of its neighbors or reached the maximum iteration limit. A maximum iteration limit will prevent endless cycles where the algorithm cannot converge on a solution, essentially getting caught in a flipflop cycle for some labels. In contrast to other algorithms, LPA can return different community structures when run multiple times on the same graph. The order in which LPA evaluates nodes can influence the final communities it returns. Another factor is the random tiebreaking process.
Why use Label Propagation?
Here is why you use LPA:
 Community detection
 Semisupervised community detection
 Preprocessing data (classification)
Here are some examples:
 Assigning polarity of tweets as a part of semantic analysis. In this scenario, positive and negative seed labels from a classifier are used in combination with the Twitter follower graph. For more information, see Twitter polarity classification with label propagation over lexical links and the follower graph.
 Finding potentially dangerous combinations of possible coprescribed drugs, based on the chemical similarity and side effect profiles. This study is found here.
 Label Propagation prediction of drugdrug interactions based on clinical side effects.
Louvain Modularity
The Louvain Modularity algorithm is used to detect communities in large networks. You can think of the algorithm doing a “what if” analysis to try out various groupings with the goal of eventually reaching a global modularity optimum.
The Louvain Modularity algorithm consists of repeated application of two steps. The first step is a “greedy” assignment of nodes to communities, favoring local optimizations of modularity. The modularity score quantifies the quality of an assignment of nodes to communities. This process evaluates how much more densely connected the nodes within a community are, compared to how connected they would be in a random network. It starts by calculating the change in modularity if that node joins a community with each of its immediate neighbors. The node then joins the node with the highest modularity change. The process is repeated for each node until the optimal communities are formed. The second step is defining a new coarsegrained network, based on the communities found in the first step. These two steps are repeated until no further modularityincreasing reassignments of communities are possible.
In this example, we can see how the Louvain Modularity algorithm works. First, the algorithm assigns nodes to communities by favoring local optimization of modularity. In our case, the algorithm found four groups of nodes, which are indicated by node color. In the second step, the algorithm merges each group of nodes into a single node. The count of links between nodes within the same community and between various communities is now represented as a weighted relationship between the newlymerged nodes. Once the new network is created, the whole process is repeated until a modularity maximum is reached. The Louvain Modularity algorithm is interesting, because you can observe both the final as well as the intermediate communities that are calculated at the end of each level. It is regarded as a hierarchical clustering algorithm because a hierarchy of communities is produced as a result. As a rule of thumb, the communities on lower levels are smaller and more finegrained than the communities found on higher and final levels.
Why use Louvain Modularity?
Here is why you use Louvain:
 Community detection in large networks
 Uncover hierarchical structures in data
 Evaluate different grouping thresholds
Here are some examples:
 Extracting topics from online social platforms, like Twitter and YouTube, based on the cooccurence of terms in documents as part of the topic modeling process.
 Finding hierarchical community structures within the brain’s functional network.
 Evaluating criminal networks and holes in the structure.
Triangle Count
A triangle contains three nodes where each node has a connection to the other two. In graph theory terminology, a triangle is equivalent to a 3clique.
The Triangle Count algorithm counts the number of triangles for each node in the graph.
It has gained popularity in social network analysis, where it is used to measure the cohesiveness and stability of networks. It is also one of the indices used in the computation of the local clustering coefficients.
Note

The Triangle Count algorithm in the GDSL only finds triangles in undirected graphs. 
In this example, we count the number of triangles passing through node U. In the left example, two triangles pass through node U. The first triangle consists of node U and leftside nodes and the second triangle consists of node U and bottomside nodes. In the second example, we connect the top right nodes, which produces another triangle.
Why use Triangle Count?
Here is why you use Triangle Count:
 Estimating stability
 Part of the Local Clustering Coefficient calculation
 Scoring for machine learning
Here is an example:
Identifying features for classifying a given website as spam content. This is described in Efficient Semistreaming Algorithms for Local Triangle Counting in Massive.
Local Clustering Coefficient
The Local Clustering Coefficient is the probability that neighbors of a particular node are connected to each other.
The goal of the Local Clustering Coefficient algorithm is to measure how tightly a group is clustered compared to how tightly it could be clustered. The algorithm uses Triangle Count in its calculations, which provides a ratio of existing triangles to possible relationships. A maximum value of 1 indicates a clique, where every node is connected to every other node.
The Local Clustering Coefficient describes how many of the node’s neighbors are also connected. In the left example, the probability of node U neighbors being connected is 20%. Node U has five neighbors. If all the neighbors were connected to each other, that would be ten relationships between neighbors. Because there are only two relationships between neighbors, the Local Clustering Coefficient is 0.2.
Why use Local Clustering Coefficient?
Here is why you use Local Clustering Coefficient:
 Estimating network stability
 Finding structural holes
 Scoring for machine learning
Here are some examples:
 Investigating the community structure of Facebook’s social graph, where researchers found dense neighborhoods of users in an otherwise sparse global graph. Find this study in The Anatomy of the Facebook Social Graph.
 Exploring the thematic structure of the Web and detecting communities of pages with a common topics based on the reciprocal links between them. For more information, see Curvature of colinks uncovers hidden thematic layers in the World Wide Web.
Check your understanding
Question 1
What algorithm do you use to calculate the number of triangles a nodes belongs to?
Select the correct answer.
 Triangle Count
 Louvain Modularity
 Weakly Connected Components
 Label Propagation
Summary
In this module you gained experience running the Neo4j supported algorithms for Community Detection:
 Weakly Connected Components (unionFind)
 Label Propagation
 Louvain Modularity
 Triangle Count
 Local Clustering Coefficient
You can read more about these algorithms and also the alpha (labs) algorithms in the Graph Data Science documentation