Graph algorithms provide the means to understand, model and predict complicated dynamics such as the flow of resources or information, the pathways through which contagions or network failures spread, and the influences on and resiliency of groups.

This blog series is designed to help you better utilize graph analytics and graph algorithms so you can effectively innovate and develop intelligent solutions faster using a graph database like Neo4j.

Last week we continued our focus on Community Detection algorithms, with a look at the Label Propagation algorithm.

Learn more about the Louvain Modularity algorithm.


This week we continue our exploration of Community Detection algorithms, with a look at the Louvain Modularity algorithm, which measures the quality (i.e., presumed accuracy) of a community grouping by comparing its relationship density to a suitably defined random network.

About Louvain Modularity


The Louvain method of community detection is an algorithm for detecting communities in networks. It maximizes a modularity score for each community, where the modularity quantifies the quality of an assignment of nodes to communities by evaluating how much more densely connected the nodes within a community are, compared to how connected they would be in a random network.

The Louvain algorithm is one of the fastest modularity-based algorithms and works well with large graphs. It also reveals a hierarchy of communities at different scales, which is useful for understanding the global functioning of a network.

In order to understand the Louvain modularity algorithm, we must first look at modularity in general.

Learn more about modularity in a graph database.


Modularity is a measure of how well groups have been partitioned into clusters. It compares the relationships in a cluster compared to what would be expected for a random (or other baseline) number of connections.

Compare data relationships in a cluster.


The Louvain algorithm was proposed in 2008. The method consists of repeated application of two steps. The first step is a “greedy” assignment of nodes to communities, favoring local optimizations of modularity. The second step is the definition of a new coarse-grained network based on the communities found in the first step. These two steps are repeated until no further modularity-increasing reassignments of communities are possible.

When Should I Use Louvain?


TIP: Although the Louvain method, and modularity optimization algorithms more generally, have found wide applications across many domains, some problems with these algorithms have been identified:

1. The resolution limit
For larger networks, the Louvain method doesn’t stop with the “intuitive” communities. Instead, there’s a second pass through the community modification and coarse-graining stages, in which several of the intuitive communities are merged together. This is a general problem with modularity optimization algorithms; they have trouble detecting small communities in large networks. It’s a virtue of the Louvain method that something close to the intuitive community structure is available as an intermediate step in the process.

2. The degeneracy problem
There is typically an exponentially large (in network size) number of community assignments with modularities close to the maximum. This can be a severe problem because, in the presence of a large number of high modularity solutions, it’s hard to find the global maximum and difficult to determine if the global maximum is truly more scientifically important than local maxima that achieve similar modularity. Research shows that the different locally optimal community assignments have different structural properties. For more information, see The performance of modularity maximization in practical contexts.”

Louvain Example


Let’s see the Louvain algorithm in action. The following Cypher statement creates a graph of users and friends:

MERGE (nAlice:User {id:"Alice"})
MERGE (nBridget:User {id:"Bridget"})
MERGE (nCharles:User {id:"Charles"})
MERGE (nDoug:User {id:"Doug"})
MERGE (nMark:User {id:"Mark"})
MERGE (nMichael:User {id:"Michael"})
MERGE (nAlice)-[:FRIEND]->(nBridget)
MERGE (nAlice)-[:FRIEND]->(nCharles)
MERGE (nMark)-[:FRIEND]->(nDoug)
MERGE (nBridget)-[:FRIEND]->(nMichael)
MERGE (nCharles)-[:FRIEND]->(nMark)
MERGE (nAlice)-[:FRIEND]->(nMichael)
MERGE (nCharles)-[:FRIEND]->(nDoug);

Check out the Louvain Modularity graph model.

Now we run Louvain to find communities in the social network. Execute the following query:

CALL algo.louvain.stream("User", "FRIEND", {})
YIELD nodeId, community
MATCH (user:User) WHERE id(user) = nodeId
RETURN user.id AS user, community
ORDER BY community;

See the results of running the Louvain Modularity graph algorithm.


Our algorithm found two communities with three members each.

Mark, Doug and Charles are all friends with each other, as are Bridget, Alice and Michael. Charles is the only one who has friends in both communities, but he has more in community four so he fits better in that one.

Conclusion


As we’ve seen, the Louvain Modularity algorithm is used to evaluate social structures in Twitter, LinkedIn and YouTube. It’s also used in fraud analytics to evaluate whether a group has just a few bad behaviors or is acting as a fraud ring, which would be indicated by a higher relationship density than average.

Next week we’ll continue our focus on Community Detection algorithms, with a look at the Triangle Count and Average Clustering Coefficient algorithm.


Find the patterns in your connected data
Learn about the power of graph algorithms in the O’Reilly book,
Graph Algorithms: Practical Examples in Apache Spark and Neo4j by the authors of this article. Click below to get your free ebook copy.


Get the O’Reilly Ebook


 

Explore:  


About the Author

Mark Needham & Amy E. Hodler , Neo4j

Mark Needham & Amy E. Hodler Image

Mark Needham is a Support Engineer for Neo4j. He also blogs about software development at markhneedham.com.

Amy is the Analytics and AI Program Manager at Neo4j. She believes a thriving graph ecosystem is essential to catalyze new types of insights. Accordingly, she helps ensure Neo4j partners are successful. In her career, Amy has consistently helped teams break into new markets at startups and large companies including EDS, Microsoft, and Hewlett-Packard (HP). She most recently comes from Cray Inc., where she was the analytics and artificial intelligence market manager.Amy has a love for science and art with an extreme fascination for complexity science and graph theory. When the weather is good, you’re likely to find her cycling the passes in beautiful Eastern Washington.


4 Comments

Jan Zak says:

For better results, I recommend to check the new Leiden algorithm published last year: https://arxiv.org/abs/1810.08473

Amy Hodler says:

Hi Jan!
I’m really interested in the Leiden algorithm. It seems quite promising but I’m wondering how difficult it might be to implement. Have you implemented? Please reach out to me directly if you can share your experience with it! amy.hodler@neo4j.com

Maciej Muras says:

Hello, could you explain how to apply query returning modularity each iteration of algorithm?

Amy Hodler says:

Hi Maciej,
If you’re not seeing the intermediary results, check that the parameter is set:
includeIntermediateCommunities: true

Our documentation has more detail on this setting: https://neo4j.com/docs/graph-algorithms/current/algorithms/louvain/

Best!
Amy

Leave a Reply

Your email address will not be published. Required fields are marked *