5.1. The Louvain algorithm

This section describes the Louvain algorithm in the Neo4j Graph Algorithms library.

The Louvain method of community detection is an algorithm for detecting communities in networks. It maximizes a modularity score for each community, where the modularity quantifies the quality of an assignment of nodes to communities by evaluating how much more densely connected the nodes within a community are, compared to how connected they would be in a random network.

The Louvain algorithm is one of the fastest modularity-based algorithms, and works well with large graphs. It also reveals a hierarchy of communities at different scales, which can be useful for understanding the global functioning of a network.

5.1.1. History and explanation

The "Louvain algorithm" was proposed in 2008 by authors from the University of Louvain.

The method consists of repeated application of two steps. The first step is a "greedy" assignment of nodes to communities, favoring local optimizations of modularity. The second step is the definition of a new coarse-grained network, based on the communities found in the first step. These two steps are repeated until no further modularity-increasing reassignments of communities are possible.

The algorithm is initialized with each node in its own community.

In the first stage we iterate through each of the nodes in the network. We take each node, remove it from its current community and replace it in the community of ones of its neighbors. We compute the modularity change for each of the node’s neighbors. If none of these modularity changes are positive, the node stays in its current community. If some of the modularity changes are positive, the node moves into the community where the modularity change is most positive. Ties are resolved arbitrarily. We repeat this process for each node until one pass through all nodes yields no community assignment changes.

The second stage in the Louvain method uses the communities that were discovered in the community reassignment stage, to define a new coarse-grained network. In this network, the newly discovered communities are the nodes. The relationship weight between the nodes representing two communities is the sum of the relationship weights between the lower-level nodes of each community.

The rest of the Louvain method consists of the repeated application of stages 1 and 2. By applying stage 1 (the community reassignment phase) to the coarse-grained graph, we find a second tier of communities of communities of nodes. Then, in the next application of stage 2, we define a new coarse-grained graph at this higher-level of the hierarchy. We keep going like this until an application of stage 1 yields no reassignments. At that point, repeated application of stages 1 and 2 will not yield any more modularity-optimizing changes, so the process is complete.

5.1.2. Use-cases - when to use the Louvain algorithm

5.1.3. Constraints - when not to use the Louvain algorithm

Although the Louvain method, and modularity optimization algorithms more generally, have found wide application across many domains, some problems with these algorithms have been identified:

The resolution limit
For larger networks, the Louvain method doesn’t stop with the "intuitive" communities. Instead, there’s a second pass through the community modification and coarse-graining stages, in which several of the intuitive communities are merged together. This is a general problem with modularity optimization algorithms; they have trouble detecting small communities in large networks. It’s a virtue of the Louvain method that something close to the intuitive community structure is available as an intermediate step in the process.
The degeneracy problem
There is typically an exponentially large (in network size) number of community assignments with modularities close to the maximum. This can be a severe problem because, in the presence of a large number of high modularity solutions, it’s hard to find the global maximum, and difficult to determine if the global maximum is truly more scientifically important than local maxima that achieve similar modularity. Research undertaken at Universite Catholique de Louvain showed that the different locally optimal community assignments can have quite different structural properties. For more information, see "The performance of modularity maximization in practical contexts"

5.1.4. Louvain algorithm sample

This sample will explain the Louvain algorithm, using a simple graph:

louvain

The following will create a sample graph: 

MERGE (nAlice:User {id:'Alice'})
MERGE (nBridget:User {id:'Bridget'})
MERGE (nCharles:User {id:'Charles'})
MERGE (nDoug:User {id:'Doug'})
MERGE (nMark:User {id:'Mark'})
MERGE (nMichael:User {id:'Michael'})

MERGE (nAlice)-[:FRIEND]->(nBridget)
MERGE (nAlice)-[:FRIEND]->(nCharles)
MERGE (nMark)-[:FRIEND]->(nDoug)
MERGE (nBridget)-[:FRIEND]->(nMichael)
MERGE (nCharles)-[:FRIEND]->(nMark)
MERGE (nAlice)-[:FRIEND]->(nMichael)
MERGE (nCharles)-[:FRIEND]->(nDoug);

The following will run the algorithm and stream results: 

CALL algo.louvain.stream('User', 'FRIEND', {})
YIELD nodeId, community

RETURN algo.getNodeById(nodeId).id AS user, community
ORDER BY community;

The following will run the algorithm and write back results: 

CALL algo.louvain('User', 'FRIEND',
  {write:true, writeProperty:'community'})
YIELD nodes, communityCount, iterations, loadMillis, computeMillis, writeMillis;

Table 5.1. Results
Name Community

Alice

0

Bridget

0

Michael

0

Charles

1

Doug

1

Mark

1

Our algorithm found two communities with 3 members each.

Mark, Doug, and Charles are all friends with each other, as are Bridget, Alice, and Michael. Charles is the only one who has friends in both communities, but he has more in community 4 so he fits better in that one.

5.1.5. Hierarchical Louvain algorithm sample

This sample will explain the hierarchical communities option of the Louvain algorithm:

The following will create a sample graph: 

MERGE (nAlice:User {id:'Alice'})
MERGE (nBridget:User {id:'Bridget'})
MERGE (nCharles:User {id:'Charles'})
MERGE (nDoug:User {id:'Doug'})
MERGE (nMark:User {id:'Mark'})
MERGE (nMichael:User {id:'Michael'})
MERGE (nKarin:User {id:'Karin'})
MERGE (nAmy:User {id:'Amy'})

MERGE (nAlice)-[:FRIEND]->(nBridget)
MERGE (nAlice)-[:FRIEND]->(nCharles)
MERGE (nMark)-[:FRIEND]->(nDoug)
MERGE (nBridget)-[:FRIEND]->(nMichael)
MERGE (nCharles)-[:FRIEND]->(nMark)
MERGE (nAlice)-[:FRIEND]->(nMichael)
MERGE (nCharles)-[:FRIEND]->(nDoug)
MERGE (nMark)-[:FRIEND]->(nKarin)
MERGE (nKarin)-[:FRIEND]->(nAmy)
MERGE (nAmy)-[:FRIEND]->(nDoug);

The following will run the algorithm and stream results: 

CALL algo.louvain.stream('User', 'FRIEND', {includeIntermediateCommunities: true})
YIELD nodeId, communities

RETURN algo.getNodeById(nodeId).id AS user, communities
ORDER BY communities;

The following will run the algorithm and write back results: 

CALL algo.louvain('User', 'FRIEND', {
  write:true,
  includeIntermediateCommunities: true,
  intermediateCommunitiesWriteProperty: 'communities'
})
YIELD nodes, communityCount, iterations, loadMillis, computeMillis, writeMillis;

Table 5.2. Results
Name Communities

Alice

[0,0]

Bridget

[0,0]

Michael

[0,0]

Charles

[1,1]

Doug

[1,1]

Mark

[1,1]

Karin

[2,1]

Amy

[2,1]

Our algorithm found two hierarchical levels of communities.

On the first level it found three communities with Alice, Bridget and Michael forming the first community, Charles, Doug and Mark forming the second one and Karin and Amy forming the third one. On the second level it found two communities. Alice, Bridget and Michael stay in the same community, but the other two communities merge into a single one.

5.1.6. Pre-defined communities

This sample will explain the pre-defined communities option of the Louvain algorithm. Initial or pre-defined communities for nodes are specified using the communityProperty parameter.

The following will create a sample graph: 

MERGE (nAlice:User {id:'Alice'}) SET nAlice.community = 0
MERGE (nBridget:User {id:'Bridget'}) SET nBridget.community = 0
MERGE (nCharles:User {id:'Charles'}) SET nCharles.community = 1
MERGE (nDoug:User {id:'Doug'}) SET nDoug.community = 1
MERGE (nMark:User {id:'Mark'}) SET nMark.community = 1
MERGE (nMichael:User {id:'Michael'}) SET nMichael.community = 0
MERGE (nKarin:User {id:'Karin'}) SET nKarin.community = 1
MERGE (nAmy:User {id:'Amy'})

MERGE (nAlice)-[:FRIEND]->(nBridget)
MERGE (nAlice)-[:FRIEND]->(nCharles)
MERGE (nMark)-[:FRIEND]->(nDoug)
MERGE (nBridget)-[:FRIEND]->(nMichael)
MERGE (nCharles)-[:FRIEND]->(nMark)
MERGE (nAlice)-[:FRIEND]->(nMichael)
MERGE (nCharles)-[:FRIEND]->(nDoug)
MERGE (nMark)-[:FRIEND]->(nKarin)
MERGE (nKarin)-[:FRIEND]->(nAmy)
MERGE (nAmy)-[:FRIEND]->(nDoug);

The following will run the algorithm and stream results: 

CALL algo.louvain.stream('User', 'FRIEND', {communityProperty: 'community'})
YIELD nodeId, communities

RETURN algo.getNodeById(nodeId).id AS user, communities
ORDER BY communities;

The following will run the algorithm and write back results: 

CALL algo.louvain('User', 'FRIEND', {
  write:true,
  communityProperty: "community",
  writeProperty: "newCommunity"
})
YIELD nodes, communityCount, iterations, loadMillis, computeMillis, writeMillis;

In this example the pre-defined communities are read from the community property, and the communities computed by the algorithm are stored to the newCommunity property.

Table 5.3. Results
Name Community

Alice

0

Bridget

0

Michael

0

Charles

1

Doug

1

Mark

1

Karin

1

Amy

1

5.1.7. Syntax

The following will run the algorithm and write back results: 

CALL algo.louvain(label:String, relationship:String,
    {weightProperty:'weight', defaultValue:1.0, write: true, writeProperty:'community', concurrency:4})
YIELD nodes, communityCount, iterations, loadMillis, computeMillis, writeMillis

Table 5.4. Parameters
Name Type Default Optional Description

label

string

null

yes

The label to load from the graph. If null, load all nodes

relationship

string

null

yes

The relationship-type to load from the graph. If null, load all relationships

weightProperty

string

null

yes

The property name that contains weight. If null, treats the graph as unweighted. Must be numeric.

write

boolean

true

yes

Specifies if the result should be written back as a node property

writeProperty

string

'community'

yes

The property name written back to the ID of the community that particular node belongs to

communityProperty

string

null

yes

The property name that contains an initial or pre-defined community (must be a number)

defaultValue

float

null

yes

The default value of the weight in case it is missing or invalid

includeIntermediateCommunities

boolean

false

yes

Specifies whether an array of intermediate communities should be returned

intermediateCommunitiesWriteProperty

string

'communities'

yes

The property name written back to the ID of the intermediate communities that particular node belongs to

concurrency

int

available CPUs

yes

The number of concurrent threads

graph

string

'heavy'

yes

Use 'heavy' when describing the subset of the graph with label and relationship-type parameter. Use 'cypher' for describing the subset with cypher node-statement and relationship-statement

Table 5.5. Results
Name Type Description

loadMillis

int

Milliseconds for loading data

computeMillis

int

Milliseconds for running the algorithm

writeMillis

int

Milliseconds for writing result data back

postProcessingMillis

int

Milliseconds for computing percentiles and community count

nodes

int

The number of nodes considered

communityCount

int

The number of communities found

iterations

int

The number of iterations run

modularities

list of double

List of modularities at each iteration

modularity

double

Modularity after final iteration

p1

double

The 1 percentile of community size.

p5

double

The 5 percentile of community size.

p10

double

The 10 percentile of community size.

p25

double

The 25 percentile of community size.

p50

double

The 50 percentile of community size.

p75

double

The 75 percentile of community size.

p90

double

The 90 percentile of community size.

p95

double

The 95 percentile of community size.

p99

double

The 99 percentile of community size.

p100

double

The 100 percentile of community size.

write

boolean

Specifies if the result was written back as a node property

writeProperty

string

The property name written back to

includeIntermediateCommunities

boolean

Specifies if the intermediate communities were written back as a node property

intermediateCommunitiesWriteProperty

string

The property name written back to for intermediate communities

The following will run the algorithm and stream results: 

CALL algo.louvain.stream(label:String, relationship:String,
    {weightProperty:'propertyName', defaultValue:1.0, concurrency:4})
YIELD nodeId, community - yields a community to each node id

Table 5.6. Parameters
Name Type Default Optional Description

label

string

null

yes

The label to load from the graph. If null, load all nodes

relationship

string

null

yes

The relationship-type to load from the graph. If null, load all relationships

weightProperty

string

null

yes

The property name that contains weight. If null, treats the graph as unweighted. Must be numeric.

defaultValue

float

1.0

yes

The default value of the weight if it is missing or invalid

graph

string

'heavy'

yes

Use 'heavy' when describing the subset of the graph with label and relationship-type parameter. Use 'cypher' for describing the subset with cypher node-statement and relationship-statement

Table 5.7. Results
Name Type Description

nodeId

int

Node ID

community

int

Community ID after final iteration

communities

list of int

Community IDs for each iteration

5.1.8. Huge graph projection

If our projected graph contains more than 2 billion nodes or relationships, we need to use huge graph projection, as the default label and relationship-type projection has a limitation of 2 billion nodes and 2 billion relationships.

Set graph:'huge' in the config: 

CALL algo.louvain('User', 'FRIEND',{graph:'huge'})
YIELD nodes, communityCount, iterations, loadMillis, computeMillis, writeMillis;

5.1.9. Cypher projection

If label and relationship-type are not selective enough to describe your subgraph to run the algorithm on, you can use Cypher statements to load or project subsets of your graph. This can also be used to run algorithms on a virtual graph.

Set graph:'cypher' in the config: 

CALL algo.louvain(
  'MATCH (p:User) RETURN id(p) as id',
  'MATCH (p1:User)-[f:FRIEND]-(p2:User)
   RETURN id(p1) as source, id(p2) as target, f.weight as weight',
  {graph:'cypher',write:true});

5.1.10. Graph type support

The Louvain algorithm supports the following graph types:

  • ✓ undirected, unweighted

    • weightProperty: null
  • ✓ undirected, weighted

    • weightProperty : 'weight'