6.1. The Louvain algorithm

This section describes the Louvain algorithm in the Neo4j Graph Algorithms library.

This is documentation for the Graph Algorithms Library, which has been deprecated in favor of the Graph Data Science Library (GDS).

6.1.1. Introduction

The Louvain method for community detection is an algorithm for detecting communities in networks. It maximizes a modularity score for each community, where the modularity quantifies the quality of an assignment of nodes to communities. This means evaluating how much more densely connected the nodes within a community are, compared to how connected they would be in a random network.
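For reference, a commonly used definition of modularity for an undirected, optionally weighted graph is shown below. The symbols are introduced here for illustration and are not taken from the library: A_ij is the weight of the relationship between nodes i and j, k_i is the sum of the relationship weights at node i, m is the total relationship weight, c_i is the community of node i, and delta is 1 when both nodes share a community and 0 otherwise. The library's exact internal formulation is not stated in this section.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
```

The term k_i k_j / 2m is the expected weight between i and j in a random network with the same degrees, so Q is positive when communities are denser than chance.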

The Louvain algorithm is a hierarchical clustering algorithm that recursively merges each community into a single node and runs the modularity clustering again on the condensed graph.

For more information on this algorithm, see:

Running this algorithm requires sufficient memory availability. Before running this algorithm, we recommend that you read Section 2.4, “Memory Requirements”.

6.1.2. Syntax

The following describes the API for running the algorithm and writing results back to Neo4j: 

CALL algo.beta.louvain(label: STRING, relationship: STRING, {
 write: BOOLEAN,
 writeProperty: STRING
 // additional configuration
})
YIELD nodes, communityCount, modularity, loadMillis, computeMillis, writeMillis

Table 6.1. Parameters
Name | Type | Default | Optional | Description
node label | string | null | yes | The node label to load from the graph. If null, load all nodes.
relationship | string | null | yes | The relationship type to load from the graph. If null, load all relationships.
config | map | {} | yes | Additional configuration, see below.

Table 6.2. Configuration
Name | Type | Default | Optional | Description
concurrency | int | available CPUs | yes | The number of concurrent threads used for running the algorithm. Also provides the default value for 'readConcurrency' and 'writeConcurrency'. This is dependent on the Neo4j edition; for more information, see Section 1.4.2, “CPU”.
readConcurrency | int | value of 'concurrency' | yes | The number of concurrent threads used for reading the graph.
writeConcurrency | int | value of 'concurrency' | yes | The number of concurrent threads used for writing the result.
weightProperty | string | null | yes | The property name that contains weight. If null, treats the graph as unweighted. Must be numeric.
seedProperty | string | n/a | yes | Used to set the initial community for a node. The property value needs to be a number.
write | boolean | true | yes | Specifies if the result should be written back as a node property.
writeProperty | string | 'partition' | yes | The name of the node property to which the ID of the partition each node belongs to is written.
levels | int | 10 | yes | The maximum number of levels in which the graph is clustered and then condensed.
innerIterations | int | 10 | yes | The maximum number of iterations that the modularity optimization will run for each level.
tolerance | float | 0.0001 | yes | Minimum change in modularity between iterations. If the modularity changes less than the tolerance value, the result is considered stable and the algorithm returns.
includeIntermediateCommunities | boolean | false | yes | Indicates whether to write intermediate communities. If set to false, only the final community is persisted.
graph | string | 'huge' | yes | Use 'huge' when describing the subset of the graph with node label and relationship type parameters. Use 'cypher' for describing the subset using Cypher queries for nodes and relationships.

Table 6.3. Results
Name | Type | Description
loadMillis | int | Milliseconds for loading data.
computeMillis | int | Milliseconds for running the algorithm.
writeMillis | int | Milliseconds for writing result data back.
postProcessingMillis | int | Milliseconds for computing percentiles and community count.
nodes | int | The number of nodes considered.
communityCount | int | The number of communities found.
levels | int | The number of supersteps the algorithm actually ran.
modularity | float | The final modularity score.
modularities | list of float | The final modularity scores for each level.
includeIntermediateCommunities | boolean | Indicates whether all intermediate communities were written or only the final one.
p1 | int | The 1st percentile of community size.
p5 | int | The 5th percentile of community size.
p10 | int | The 10th percentile of community size.
p25 | int | The 25th percentile of community size.
p50 | int | The 50th percentile of community size.
p75 | int | The 75th percentile of community size.
p90 | int | The 90th percentile of community size.
p95 | int | The 95th percentile of community size.
p99 | int | The 99th percentile of community size.
p100 | int | The 100th percentile of community size.
write | boolean | Specifies if the result was written back as a node property.
writeProperty | string | The property name written back to.

The following describes the API for running the algorithm and streaming results:

CALL algo.beta.louvain.stream(label: STRING, relationship: STRING, {
 // configuration
})
YIELD nodeId, community, communities

Table 6.4. Parameters
Name | Type | Default | Optional | Description
node label | string | null | yes | The node label to load from the graph. If null, load all nodes.
relationship | string | null | yes | The relationship type to load from the graph. If null, load all relationships.
config | map | {} | yes | Additional configuration, see below.

Table 6.5. Configuration
Name | Type | Default | Optional | Description
concurrency | int | available CPUs | yes | The number of concurrent threads used for running the algorithm. Also provides the default value for 'readConcurrency' and 'writeConcurrency'. This is dependent on the Neo4j edition; for more information, see Section 1.4.2, “CPU”.
readConcurrency | int | value of 'concurrency' | yes | The number of concurrent threads used for reading the graph.
weightProperty | string | null | yes | The property name that contains weight. If null, treats the graph as unweighted. Must be numeric.
seedProperty | string | n/a | yes | Used to set the initial community for a node. The property value needs to be a number.
levels | int | 10 | yes | The maximum number of levels in which the graph is clustered and then condensed.
innerIterations | int | 10 | yes | The maximum number of iterations that the modularity optimization will run for each level.
tolerance | float | 0.0001 | yes | Minimum change in modularity between iterations. If the modularity changes less than the tolerance value, the result is considered stable and the algorithm returns.
includeIntermediateCommunities | boolean | false | yes | Indicates whether to return intermediate communities. If set to false, only the final community is returned.
graph | string | 'huge' | yes | Use 'huge' when describing the subset of the graph with node label and relationship type parameters. Use 'cypher' for describing the subset using Cypher queries for nodes and relationships.

Table 6.6. Results
Name | Type | Description
nodeId | int | Node ID.
community | int | The community ID of the final level.
communities | list of int | Community IDs for each level. Null if includeIntermediateCommunities is set to false.

6.1.3. Examples

Consider the graph created by the following Cypher statement:

CREATE (nAlice:User {name: 'Alice', seed: 42})
CREATE (nBridget:User {name: 'Bridget', seed: 42})
CREATE (nCharles:User {name: 'Charles', seed: 42})
CREATE (nDoug:User {name: 'Doug'})
CREATE (nMark:User {name: 'Mark'})
CREATE (nMichael:User {name: 'Michael'})

CREATE (nAlice)-[:LINK {weight: 1}]->(nBridget)
CREATE (nAlice)-[:LINK {weight: 1}]->(nCharles)
CREATE (nCharles)-[:LINK {weight: 1}]->(nBridget)

CREATE (nAlice)-[:LINK {weight: 5}]->(nDoug)

CREATE (nMark)-[:LINK {weight: 1}]->(nDoug)
CREATE (nMark)-[:LINK {weight: 1}]->(nMichael)
CREATE (nMichael)-[:LINK {weight: 1}]->(nMark);

This graph has two clusters of closely connected Users. A single relationship connects the two clusters. Every relationship has a weight property that determines its strength. In the following examples we will demonstrate using the Louvain algorithm on this graph.

6.1.3.1. Streaming results

The following will load the graph, run the algorithm, and stream results: 

CALL algo.beta.louvain.stream('User', 'LINK', {
 graph: 'huge',
 direction: 'BOTH'
}) YIELD nodeId, community, communities
RETURN algo.asNode(nodeId).name as name, community, communities
ORDER BY name ASC

Table 6.7. Results
name | community | communities
"Alice" | 2 | <null>
"Bridget" | 2 | <null>
"Charles" | 2 | <null>
"Doug" | 5 | <null>
"Mark" | 5 | <null>
"Michael" | 5 | <null>

We use the default values for the procedure configuration parameters: levels and innerIterations are set to 10, and tolerance is 0.0001. Because we did not set includeIntermediateCommunities to true, the communities column is always null.

6.1.3.2. Writing results

To instead write the community results back to the graph in Neo4j, use the following query. For each node a property is written that holds the assigned community.

The following will load the graph, run the algorithm, and write back results: 

CALL algo.beta.louvain('User', 'LINK', {
 graph: 'huge',
 direction: 'BOTH',
 writeProperty: 'community'
}) YIELD communityCount, modularity, modularities

Table 6.8. Results
communityCount | modularity | modularities
2 | 0.3571428571428571 | [0.3571428571428571]

When writing back the results, only a single row is returned by the procedure. The result contains meta information, like the number of identified communities and the modularity values.
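The reported modularity can be checked by hand. The following is a minimal Python sketch, not the library's implementation. It assumes that direction 'BOTH' treats every relationship as undirected and that the two relationships between Mark and Michael count as two separate edges. The function name and data layout are illustrative only.

```python
from fractions import Fraction

# Relationships of the example graph, treated as undirected (direction: 'BOTH').
# Assumption: the two Mark/Michael relationships count as two separate edges.
EDGES = [
    ("Alice", "Bridget"), ("Alice", "Charles"), ("Charles", "Bridget"),
    ("Alice", "Doug"),
    ("Mark", "Doug"), ("Mark", "Michael"), ("Michael", "Mark"),
]

# Community assignment taken from the streaming example (Table 6.7).
COMMUNITY = {
    "Alice": 2, "Bridget": 2, "Charles": 2,
    "Doug": 5, "Mark": 5, "Michael": 5,
}

def modularity(edges, community):
    """Sum over communities of (internal edges / m) - (degree sum / 2m)^2."""
    m = len(edges)
    internal = {}
    degree = {}
    for u, v in edges:
        # Each endpoint adds one to the degree sum of its community.
        degree[community[u]] = degree.get(community[u], 0) + 1
        degree[community[v]] = degree.get(community[v], 0) + 1
        if community[u] == community[v]:
            internal[community[u]] = internal.get(community[u], 0) + 1
    return sum(
        Fraction(internal.get(c, 0), m) - Fraction(degree[c], 2 * m) ** 2
        for c in degree
    )

score = modularity(EDGES, COMMUNITY)
print(score, float(score))
```

Under these assumptions the score is exactly 5/14, roughly 0.357, which agrees with the modularity in Table 6.8.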

6.1.3.3. Running on weighted graphs

The Louvain algorithm can also run on weighted graphs, taking the given relationship weights into account when calculating the modularity.

The following will load the graph, run the algorithm on a weighted graph and stream results: 

CALL algo.beta.louvain.stream('User', 'LINK', {
 graph: 'huge',
 direction: 'BOTH',
 weightProperty: 'weight'
}) YIELD nodeId, community, communities
RETURN algo.asNode(nodeId).name as name, community, communities
ORDER BY name ASC

Table 6.9. Results
name | community | communities
"Alice" | 3 | <null>
"Bridget" | 2 | <null>
"Charles" | 2 | <null>
"Doug" | 3 | <null>
"Mark" | 5 | <null>
"Michael" | 5 | <null>

Using the weighted relationships, we see that Alice and Doug have formed their own community, as their link is much stronger than all the others.
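To see why the weights change the outcome, we can compare the weighted modularity of both partitions. The sketch below is illustrative Python under the same assumptions as before (undirected relationships, parallel relationships counted separately). The helper name is hypothetical and this is not the library's code.

```python
from fractions import Fraction

# (source, target, weight) triples from the example graph, treated as undirected.
WEIGHTED_EDGES = [
    ("Alice", "Bridget", 1), ("Alice", "Charles", 1), ("Charles", "Bridget", 1),
    ("Alice", "Doug", 5),
    ("Mark", "Doug", 1), ("Mark", "Michael", 1), ("Michael", "Mark", 1),
]

def weighted_modularity(edges, community):
    """Modularity where every edge contributes its weight instead of 1."""
    total = sum(w for _, _, w in edges)
    internal, strength = {}, {}
    for u, v, w in edges:
        # Each endpoint adds the edge weight to its community's strength.
        for node in (u, v):
            c = community[node]
            strength[c] = strength.get(c, 0) + w
        if community[u] == community[v]:
            internal[community[u]] = internal.get(community[u], 0) + w
    return sum(
        Fraction(internal.get(c, 0), total) - Fraction(strength[c], 2 * total) ** 2
        for c in strength
    )

# Partition found without weights (Table 6.7) and with weights (Table 6.9).
unweighted_partition = {"Alice": 2, "Bridget": 2, "Charles": 2,
                        "Doug": 5, "Mark": 5, "Michael": 5}
weighted_partition = {"Alice": 3, "Doug": 3, "Bridget": 2, "Charles": 2,
                      "Mark": 5, "Michael": 5}

q_old = weighted_modularity(WEIGHTED_EDGES, unweighted_partition)
q_new = weighted_modularity(WEIGHTED_EDGES, weighted_partition)
print(float(q_old), float(q_new))
```

With the weights applied, the two-community partition scores 1/22 (about 0.045), while the three-community partition scores 71/242 (about 0.293), so grouping Alice with Doug is the better assignment.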

6.1.3.4. Running with seed communities

The Louvain algorithm can be run incrementally, by providing a seed property. With the seed property an initial community mapping can be supplied for a subset of the loaded nodes. The algorithm will try to keep the seeded community IDs.

The following will load the seeded graph, run the algorithm and stream results: 

CALL algo.beta.louvain.stream('User', 'LINK', {
 graph: 'huge',
 direction: 'BOTH',
 seedProperty: 'seed'
}) YIELD nodeId, community, communities
RETURN algo.asNode(nodeId).name as name, community, communities
ORDER BY name ASC

Table 6.10. Results
name | community | communities
"Alice" | 42 | <null>
"Bridget" | 42 | <null>
"Charles" | 42 | <null>
"Doug" | 47 | <null>
"Mark" | 47 | <null>
"Michael" | 47 | <null>

Using the seeded graph, we see that the community around Alice keeps its initial community ID of 42. The other community is assigned a new community ID, which is guaranteed to be larger than the largest seeded community ID.
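One way to picture the starting assignment is sketched below in Python. The helper initial_communities is hypothetical and only illustrates the rule described above: seeded nodes keep their IDs, and unseeded nodes start with fresh IDs above the largest seed. The concrete IDs the library chooses may differ, as the value 47 in Table 6.10 shows.

```python
def initial_communities(node_ids, seeds):
    """Keep seeded IDs; give every unseeded node a fresh ID above the largest seed.

    Hypothetical helper for illustration only, not the library's implementation.
    """
    next_id = max(seeds.values(), default=-1) + 1
    assignment = {}
    for node in node_ids:
        if node in seeds:
            assignment[node] = seeds[node]
        else:
            assignment[node] = next_id
            next_id += 1
    return assignment

nodes = ["Alice", "Bridget", "Charles", "Doug", "Mark", "Michael"]
seeds = {"Alice": 42, "Bridget": 42, "Charles": 42}
start = initial_communities(nodes, seeds)
print(start)
```

Because the fresh IDs never collide with seeded ones, a community that already exists in the graph can be recognized again after the run.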

6.1.3.5. Streaming intermediate communities

As described before, Louvain is a hierarchical clustering algorithm. That means that after every clustering step all nodes that belong to the same cluster are reduced to a single node. Relationships between nodes of the same cluster become self-relationships, and relationships to nodes of other clusters connect to the cluster's representative. This condensed graph is then used to run the next level of clustering. The process is repeated until the clusters are stable.
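The condensation step can be sketched in a few lines of Python. This is an illustrative sketch on hypothetical toy data (two triangles joined by one relationship), not the library's implementation. The function name condense and the tuple-based edge format are assumptions made for the example.

```python
from collections import defaultdict

def condense(edges, community):
    """Collapse each community into one node and sum the weights between them."""
    condensed = defaultdict(float)
    for u, v, w in edges:
        cu, cv = community[u], community[v]
        # Undirected graph: store each pair of communities in a canonical order.
        key = (min(cu, cv), max(cu, cv))
        condensed[key] += w
    return dict(condensed)

# Two triangles joined by a single relationship (hypothetical toy data).
edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 1.0),
]
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

print(condense(edges, community))
```

Each triangle collapses into one node carrying a self-relationship of weight 3.0, and the bridge becomes a single relationship of weight 1.0 between the two condensed nodes.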

In order to demonstrate this iterative behavior, we need to construct a more complex graph.

[Figure: Louvain multilevel example graph]

The following will load the example graph, run the algorithm and stream results including the intermediate communities: 

CALL algo.beta.louvain.stream('', '', {
 graph: 'huge',
 direction: 'BOTH',
 includeIntermediateCommunities: true
}) YIELD nodeId, community, communities
RETURN algo.asNode(nodeId).name as name, community, communities
ORDER BY name ASC

Table 6.11. Results
name | community | communities
a | 14 | [3,14]
b | 14 | [3,14]
c | 14 | [14,14]
d | 14 | [3,14]
e | 14 | [14,14]
f | 14 | [14,14]
g | 7 | [7,7]
h | 7 | [7,7]
i | 7 | [7,7]
j | 12 | [12,12]
k | 12 | [12,12]
l | 12 | [12,12]
m | 12 | [12,12]
n | 12 | [12,12]
x | 14 | [14,14]

In this example graph, we see four clusters after the first level, which are reduced to three at the second level.
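The cluster counts per level can be read directly from the communities column. The short Python sketch below copies the values from Table 6.11 and counts the distinct community IDs at each level. The variable and function names are illustrative.

```python
# `communities` column from Table 6.11: one entry per level for each node.
COMMUNITIES = {
    "a": [3, 14], "b": [3, 14], "c": [14, 14], "d": [3, 14],
    "e": [14, 14], "f": [14, 14], "g": [7, 7], "h": [7, 7],
    "i": [7, 7], "j": [12, 12], "k": [12, 12], "l": [12, 12],
    "m": [12, 12], "n": [12, 12], "x": [14, 14],
}

def clusters_per_level(communities):
    """Count the distinct community IDs at each level."""
    levels = zip(*communities.values())
    return [len(set(level)) for level in levels]

counts = clusters_per_level(COMMUNITIES)
print(counts)  # [4, 3]
```

Community 3 (nodes a, b and d) is the one that disappears: it is merged into community 14 at the second level.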