5.3.1. Louvain

This section describes the Louvain algorithm in the Neo4j Graph Data Science library.

5.3.1.1. Introduction

The Louvain method is an algorithm to detect communities in large networks. It maximizes the modularity score of the community assignment, where modularity quantifies the quality of an assignment of nodes to communities: it evaluates how much more densely connected the nodes within a community are, compared to how connected they would be in a random network.
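For reference, the standard (Newman) definition of modularity for a weighted, undirected graph is shown below. Here A_ij is the weight of the relationships between nodes i and j, k_i is the weighted degree of node i, m is the total relationship weight, c_i is the community of node i, and δ(c_i, c_j) is 1 if both nodes are in the same community and 0 otherwise. The term k_i k_j / 2m is the expected weight between i and j in a random network with the same degrees, which is the comparison described in the paragraph above.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
```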

The Louvain algorithm is a hierarchical clustering algorithm that recursively merges each community into a single node and runs the modularity clustering again on the condensed graph.
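The first phase of each level can be illustrated with a minimal Python sketch. It is not the GDS implementation: the names `modularity` and `local_moving` are hypothetical, edges are assumed to be undirected `(source, target, weight)` triples, and for readability modularity is recomputed from scratch for every candidate move instead of using an incremental gain formula. Every node starts in its own community and is moved to the neighbouring community that increases modularity the most, until no node moves. Building the condensed graph and repeating the procedure on it is left out.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of a partition of an undirected, weighted edge list."""
    m = sum(w for _, _, w in edges)    # total relationship weight
    internal = defaultdict(float)      # weight of relationships inside each community
    degree = defaultdict(float)        # summed degree of the nodes in each community
    for u, v, w in edges:
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def local_moving(nodes, edges, max_iterations=10, eps=1e-12):
    """One level: greedily move nodes between neighbouring communities."""
    community = {n: n for n in nodes}  # every node starts in its own community
    neighbours = defaultdict(set)
    for u, v, _ in edges:
        if u != v:
            neighbours[u].add(v)
            neighbours[v].add(u)
    for _ in range(max_iterations):
        moved = False
        for node in nodes:
            current = community[node]
            best_comm, best_q = current, modularity(edges, community)
            # sorted() makes tie-breaking deterministic in this sketch
            for candidate in sorted({community[n] for n in neighbours[node]}):
                if candidate == current:
                    continue
                community[node] = candidate  # tentatively move the node
                q = modularity(edges, community)
                if q > best_q + eps:  # require a strict gain; eps absorbs float noise
                    best_comm, best_q = candidate, q
            community[node] = best_comm  # keep the best move, or stay
            moved = moved or best_comm != current
        if not moved:
            break
    return community


# Unweighted version of the example graph used later (Mark and Michael are linked twice).
edges = [("Alice", "Bridget", 1), ("Alice", "Charles", 1), ("Charles", "Bridget", 1),
         ("Alice", "Doug", 1), ("Mark", "Doug", 1), ("Mark", "Michael", 1),
         ("Michael", "Mark", 1)]
nodes = ["Alice", "Bridget", "Charles", "Doug", "Mark", "Michael"]
result = local_moving(nodes, edges)
print(result, modularity(edges, result))
```

On the six-user example graph from the Examples section, this sketch ends with Alice, Bridget and Charles in one community and Doug, Mark and Michael in another, the same grouping as the stream example.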

Running this algorithm requires sufficient memory availability. Before running this algorithm, we recommend that you read Section 3.1, “Memory Estimation”.

5.3.1.2. Syntax

This section covers the syntax used to execute the Louvain algorithm in each of its execution modes. It describes the named graph variant of the syntax. To learn more about general syntax variants, see Section 5.1, “Syntax overview”.

Example 5.3. Louvain syntax per mode

Run Louvain in stream mode on a named graph. 

CALL gds.louvain.stream(
  graphName: String,
  configuration: Map
)
YIELD
  nodeId: Integer,
  communityId: Integer,
  intermediateCommunityIds: Integer[]

Table 5.86. Parameters
Name | Type | Default | Optional | Description
graphName | String | n/a | no | The name of a graph stored in the catalog.
configuration | Map | {} | yes | Configuration for algorithm-specifics and/or graph filtering.

Table 5.87. General configuration for algorithm execution on a named graph.
Name | Type | Default | Optional | Description
nodeLabels | String[] | ['*'] | yes | Filter the named graph using the given node labels.
relationshipTypes | String[] | ['*'] | yes | Filter the named graph using the given relationship types.
concurrency | Integer | 4 | yes | The number of concurrent threads used for running the algorithm.

Table 5.88. Algorithm specific configuration
Name | Type | Default | Optional | Description
relationshipWeightProperty | String | null | yes | The name of the relationship property that contains the weight. If null, the graph is treated as unweighted. The property must be numeric.
seedProperty | String | n/a | yes | Used to set the initial community for a node. The property value needs to be a number.
maxLevels | Integer | 10 | yes | The maximum number of levels in which the graph is clustered and then condensed.
maxIterations | Integer | 10 | yes | The maximum number of iterations that the modularity optimization will run for each level.
tolerance | Float | 0.0001 | yes | Minimum change in modularity between iterations. If the modularity changes less than the tolerance value, the result is considered stable and the algorithm returns.
includeIntermediateCommunities | Boolean | false | yes | Indicates whether to include the intermediate communities in the result. If set to false, only the final community is returned or persisted.
consecutiveIds | Boolean | false | yes | Flag to decide whether community identifiers are mapped into a consecutive ID space (requires additional memory). Cannot be used in combination with the includeIntermediateCommunities flag.
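The interplay of maxIterations and tolerance can be sketched as a generic loop. This is a hypothetical Python illustration, assuming that tolerance is compared with the modularity gain between two consecutive iterations; `optimize` and `step` are made-up names, and `step` stands for one pass of the modularity optimization that returns the new modularity. maxLevels caps the number of levels in an analogous way.

```python
from typing import Callable, List


def optimize(step: Callable[[], float], max_iterations: int = 10,
             tolerance: float = 0.0001) -> List[float]:
    """Call step() until max_iterations is reached or the modularity gain
    between two consecutive iterations drops below tolerance."""
    history = [step()]
    while len(history) < max_iterations:
        history.append(step())
        if history[-1] - history[-2] < tolerance:
            break  # change too small: the result is considered stable
    return history


# Hypothetical modularity values produced by successive iterations.
scores = iter([0.10, 0.30, 0.35, 0.35005, 0.50])
print(optimize(lambda: next(scores)))
```

With these made-up values the loop stops after 0.35005, because the gain over 0.35 is below the default tolerance of 0.0001.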

Table 5.89. Results
Name | Type | Description
nodeId | Integer | Node ID.
communityId | Integer | The community ID of the final level.
intermediateCommunityIds | Integer[] | Community IDs for each level. Null if includeIntermediateCommunities is set to false.

Run Louvain in stats mode on a named graph. 

CALL gds.louvain.stats(
  graphName: String,
  configuration: Map
)
YIELD
  createMillis: Integer,
  computeMillis: Integer,
  postProcessingMillis: Integer,
  communityCount: Integer,
  ranLevels: Integer,
  modularity: Float,
  modularities: Float[],
  communityDistribution: Map,
  configuration: Map

Table 5.90. Parameters
Name | Type | Default | Optional | Description
graphName | String | n/a | no | The name of a graph stored in the catalog.
configuration | Map | {} | yes | Configuration for algorithm-specifics and/or graph filtering.

Table 5.91. General configuration for algorithm execution on a named graph.
Name | Type | Default | Optional | Description
nodeLabels | String[] | ['*'] | yes | Filter the named graph using the given node labels.
relationshipTypes | String[] | ['*'] | yes | Filter the named graph using the given relationship types.
concurrency | Integer | 4 | yes | The number of concurrent threads used for running the algorithm.

Table 5.92. Algorithm specific configuration
Name | Type | Default | Optional | Description
relationshipWeightProperty | String | null | yes | The name of the relationship property that contains the weight. If null, the graph is treated as unweighted. The property must be numeric.
seedProperty | String | n/a | yes | Used to set the initial community for a node. The property value needs to be a number.
maxLevels | Integer | 10 | yes | The maximum number of levels in which the graph is clustered and then condensed.
maxIterations | Integer | 10 | yes | The maximum number of iterations that the modularity optimization will run for each level.
tolerance | Float | 0.0001 | yes | Minimum change in modularity between iterations. If the modularity changes less than the tolerance value, the result is considered stable and the algorithm returns.
includeIntermediateCommunities | Boolean | false | yes | Indicates whether to include the intermediate communities in the result. If set to false, only the final community is returned or persisted.
consecutiveIds | Boolean | false | yes | Flag to decide whether community identifiers are mapped into a consecutive ID space (requires additional memory). Cannot be used in combination with the includeIntermediateCommunities flag.

Table 5.93. Results
Name | Type | Description
createMillis | Integer | Milliseconds for loading data.
computeMillis | Integer | Milliseconds for running the algorithm.
postProcessingMillis | Integer | Milliseconds for computing percentiles and community count.
communityCount | Integer | The number of communities found.
ranLevels | Integer | The number of levels the algorithm actually ran.
modularity | Float | The final modularity score.
modularities | Float[] | The modularity scores for each level.
communityDistribution | Map | Map containing min, max, mean as well as p50, p75, p90, p95, p99 and p999 percentile values of community size for the last level.
configuration | Map | The configuration used for running the algorithm.

Run Louvain in mutate mode on a named graph. 

CALL gds.louvain.mutate(
  graphName: String,
  configuration: Map
)
YIELD
  createMillis: Integer,
  computeMillis: Integer,
  mutateMillis: Integer,
  postProcessingMillis: Integer,
  communityCount: Integer,
  ranLevels: Integer,
  modularity: Float,
  modularities: Float[],
  communityDistribution: Map,
  configuration: Map

Table 5.94. Parameters
Name | Type | Default | Optional | Description
graphName | String | n/a | no | The name of a graph stored in the catalog.
configuration | Map | {} | yes | Configuration for algorithm-specifics and/or graph filtering.

Table 5.95. General configuration for algorithm execution on a named graph.
Name | Type | Default | Optional | Description
nodeLabels | String[] | ['*'] | yes | Filter the named graph using the given node labels.
relationshipTypes | String[] | ['*'] | yes | Filter the named graph using the given relationship types.
concurrency | Integer | 4 | yes | The number of concurrent threads used for running the algorithm.
mutateProperty | String | n/a | no | The node property in the GDS graph to which the community ID is written.

Table 5.96. Algorithm specific configuration
Name | Type | Default | Optional | Description
relationshipWeightProperty | String | null | yes | The name of the relationship property that contains the weight. If null, the graph is treated as unweighted. The property must be numeric.
seedProperty | String | n/a | yes | Used to set the initial community for a node. The property value needs to be a number.
maxLevels | Integer | 10 | yes | The maximum number of levels in which the graph is clustered and then condensed.
maxIterations | Integer | 10 | yes | The maximum number of iterations that the modularity optimization will run for each level.
tolerance | Float | 0.0001 | yes | Minimum change in modularity between iterations. If the modularity changes less than the tolerance value, the result is considered stable and the algorithm returns.
includeIntermediateCommunities | Boolean | false | yes | Indicates whether to include the intermediate communities in the result. If set to false, only the final community is returned or persisted.
consecutiveIds | Boolean | false | yes | Flag to decide whether community identifiers are mapped into a consecutive ID space (requires additional memory). Cannot be used in combination with the includeIntermediateCommunities flag.

Table 5.97. Results
Name | Type | Description
createMillis | Integer | Milliseconds for loading data.
computeMillis | Integer | Milliseconds for running the algorithm.
mutateMillis | Integer | Milliseconds for adding properties to the in-memory graph.
postProcessingMillis | Integer | Milliseconds for computing percentiles and community count.
communityCount | Integer | The number of communities found.
ranLevels | Integer | The number of levels the algorithm actually ran.
modularity | Float | The final modularity score.
modularities | Float[] | The modularity scores for each level.
communityDistribution | Map | Map containing min, max, mean as well as p50, p75, p90, p95, p99 and p999 percentile values of community size for the last level.
configuration | Map | The configuration used for running the algorithm.

Run Louvain in write mode on a named graph. 

CALL gds.louvain.write(
  graphName: String,
  configuration: Map
)
YIELD
  createMillis: Integer,
  computeMillis: Integer,
  writeMillis: Integer,
  postProcessingMillis: Integer,
  nodePropertiesWritten: Integer,
  communityCount: Integer,
  ranLevels: Integer,
  modularity: Float,
  modularities: Float[],
  communityDistribution: Map,
  configuration: Map

Table 5.98. Parameters
Name | Type | Default | Optional | Description
graphName | String | n/a | no | The name of a graph stored in the catalog.
configuration | Map | {} | yes | Configuration for algorithm-specifics and/or graph filtering.

Table 5.99. General configuration for algorithm execution on a named graph.
Name | Type | Default | Optional | Description
nodeLabels | String[] | ['*'] | yes | Filter the named graph using the given node labels.
relationshipTypes | String[] | ['*'] | yes | Filter the named graph using the given relationship types.
concurrency | Integer | 4 | yes | The number of concurrent threads used for running the algorithm. Also provides the default value for 'writeConcurrency'.
writeConcurrency | Integer | value of 'concurrency' | yes | The number of concurrent threads used for writing the result to Neo4j.
writeProperty | String | n/a | no | The node property in the Neo4j database to which the community ID is written.

Table 5.100. Algorithm specific configuration
Name | Type | Default | Optional | Description
relationshipWeightProperty | String | null | yes | The name of the relationship property that contains the weight. If null, the graph is treated as unweighted. The property must be numeric.
seedProperty | String | n/a | yes | Used to set the initial community for a node. The property value needs to be a number.
maxLevels | Integer | 10 | yes | The maximum number of levels in which the graph is clustered and then condensed.
maxIterations | Integer | 10 | yes | The maximum number of iterations that the modularity optimization will run for each level.
tolerance | Float | 0.0001 | yes | Minimum change in modularity between iterations. If the modularity changes less than the tolerance value, the result is considered stable and the algorithm returns.
includeIntermediateCommunities | Boolean | false | yes | Indicates whether to include the intermediate communities in the result. If set to false, only the final community is returned or persisted.
consecutiveIds | Boolean | false | yes | Flag to decide whether community identifiers are mapped into a consecutive ID space (requires additional memory). Cannot be used in combination with the includeIntermediateCommunities flag.

Table 5.101. Results
Name | Type | Description
createMillis | Integer | Milliseconds for loading data.
computeMillis | Integer | Milliseconds for running the algorithm.
writeMillis | Integer | Milliseconds for writing result data back.
postProcessingMillis | Integer | Milliseconds for computing percentiles and community count.
nodePropertiesWritten | Integer | The number of node properties written.
communityCount | Integer | The number of communities found.
ranLevels | Integer | The number of levels the algorithm actually ran.
modularity | Float | The final modularity score.
modularities | Float[] | The modularity scores for each level.
communityDistribution | Map | Map containing min, max, mean as well as p50, p75, p90, p95, p99 and p999 percentile values of community size for the last level.
configuration | Map | The configuration used for running the algorithm.

Anonymous graphs

Run Louvain in write mode on an anonymous graph. 

CALL gds.louvain.write(configuration: Map)
YIELD
  createMillis: Integer,
  computeMillis: Integer,
  writeMillis: Integer,
  postProcessingMillis: Integer,
  nodePropertiesWritten: Integer,
  communityCount: Integer,
  ranLevels: Integer,
  modularity: Float,
  modularities: Float[],
  communityDistribution: Map,
  configuration: Map

Table 5.102. General configuration for algorithm execution on an anonymous graph.
Name | Type | Default | Optional | Description
nodeProjection | String, String[] or Map | null | yes | The node projection used for anonymous graph creation via a Native projection.
relationshipProjection | String, String[] or Map | null | yes | The relationship projection used for anonymous graph creation via a Native projection.
nodeQuery | String | null | yes | The Cypher query used to select the nodes for anonymous graph creation via a Cypher projection.
relationshipQuery | String | null | yes | The Cypher query used to select the relationships for anonymous graph creation via a Cypher projection.
nodeProperties | String, String[] or Map | null | yes | The node properties to project during anonymous graph creation.
relationshipProperties | String, String[] or Map | null | yes | The relationship properties to project during anonymous graph creation.
concurrency | Integer | 4 | yes | The number of concurrent threads used for running the algorithm. Also provides the default value for 'readConcurrency' and 'writeConcurrency'.
readConcurrency | Integer | value of 'concurrency' | yes | The number of concurrent threads used for creating the graph.
writeConcurrency | Integer | value of 'concurrency' | yes | The number of concurrent threads used for writing the result to Neo4j.
writeProperty | String | n/a | no | The node property in the Neo4j database to which the community ID is written.

Table 5.103. Algorithm specific configuration
Name | Type | Default | Optional | Description
relationshipWeightProperty | String | null | yes | The name of the relationship property that contains the weight. If null, the graph is treated as unweighted. The property must be numeric.
seedProperty | String | n/a | yes | Used to set the initial community for a node. The property value needs to be a number.
maxLevels | Integer | 10 | yes | The maximum number of levels in which the graph is clustered and then condensed.
maxIterations | Integer | 10 | yes | The maximum number of iterations that the modularity optimization will run for each level.
tolerance | Float | 0.0001 | yes | Minimum change in modularity between iterations. If the modularity changes less than the tolerance value, the result is considered stable and the algorithm returns.
includeIntermediateCommunities | Boolean | false | yes | Indicates whether to include the intermediate communities in the result. If set to false, only the final community is returned or persisted.
consecutiveIds | Boolean | false | yes | Flag to decide whether community identifiers are mapped into a consecutive ID space (requires additional memory). Cannot be used in combination with the includeIntermediateCommunities flag.

The results are the same as for running write mode on a named graph; see the write mode syntax above.

5.3.1.3. Examples

Consider the graph created by the following Cypher statement:

CREATE (nAlice:User {name: 'Alice', seed: 42})
CREATE (nBridget:User {name: 'Bridget', seed: 42})
CREATE (nCharles:User {name: 'Charles', seed: 42})
CREATE (nDoug:User {name: 'Doug'})
CREATE (nMark:User {name: 'Mark'})
CREATE (nMichael:User {name: 'Michael'})

CREATE (nAlice)-[:LINK {weight: 1}]->(nBridget)
CREATE (nAlice)-[:LINK {weight: 1}]->(nCharles)
CREATE (nCharles)-[:LINK {weight: 1}]->(nBridget)

CREATE (nAlice)-[:LINK {weight: 5}]->(nDoug)

CREATE (nMark)-[:LINK {weight: 1}]->(nDoug)
CREATE (nMark)-[:LINK {weight: 1}]->(nMichael)
CREATE (nMichael)-[:LINK {weight: 1}]->(nMark);

This graph has two clusters of closely connected Users, joined by a single relationship between Alice and Doug. Each relationship has a weight property that determines the strength of the relationship.

We can now create the graph and store it in the graph catalog. We load the LINK relationships with orientation set to UNDIRECTED as this works best with the Louvain algorithm.

In the examples below we will use named graphs and standard projections as the norm. However, Cypher projection and anonymous graphs could also be used.

The following statement will create the graph and store it in the graph catalog. 

CALL gds.graph.create(
    'myGraph',
    'User',
    {
        LINK: {
            orientation: 'UNDIRECTED'
        }
    },
    {
        nodeProperties: 'seed',
        relationshipProperties: 'weight'
    }
)

In the following examples we will demonstrate using the Louvain algorithm on this graph.

Memory Estimation

First, we estimate the cost of running the algorithm using the estimate procedure. This can be done with any execution mode; this example uses write mode. Estimating the algorithm is useful to understand the memory impact that running the algorithm on your graph will have. When you later run the algorithm in one of the execution modes, the system performs an estimation itself. If the estimation shows a very high probability of the execution exceeding its memory limits, the execution is prohibited. To read more about this, see Section 3.1.3, “Automatic estimation and execution blocking”.

For more details on estimate in general, see Section 3.1, “Memory Estimation”.

The following will estimate the memory requirements for running the algorithm: 

CALL gds.louvain.write.estimate('myGraph', { writeProperty: 'community' })
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory

Table 5.104. Results
nodeCount | relationshipCount | bytesMin | bytesMax | requiredMemory
6 | 14 | 5321 | 580088 | "[5321 Bytes ... 566 KiB]"
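The relationshipCount of 14 reflects the UNDIRECTED orientation used when creating myGraph: as we understand the projection, each of the seven LINK relationships is stored once in each direction, and since no aggregation is configured the two opposite relationships between Mark and Michael are both kept. The hypothetical Python helper below only mimics that counting; it is not a GDS API.

```python
def undirected_projection(relationships):
    """Emit each (source, target) relationship in both directions,
    without merging parallel relationships."""
    projected = []
    for source, target in relationships:
        projected.append((source, target))
        projected.append((target, source))
    return projected


links = [("Alice", "Bridget"), ("Alice", "Charles"), ("Charles", "Bridget"),
         ("Alice", "Doug"), ("Mark", "Doug"), ("Mark", "Michael"),
         ("Michael", "Mark")]
projected = undirected_projection(links)
print(len(projected))
```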

Stream

In the stream execution mode, the algorithm returns the community ID for each node. This allows us to inspect the results directly or post-process them in Cypher without any side effects. For example, we can group or order the results by community ID to see which nodes belong to the same community.

For more details on the stream mode in general, see Section 3.3.1, “Stream”.

The following will run the algorithm and stream results: 

CALL gds.louvain.stream('myGraph')
YIELD nodeId, communityId, intermediateCommunityIds
RETURN gds.util.asNode(nodeId).name AS name, communityId, intermediateCommunityIds
ORDER BY name ASC

Table 5.105. Results
name | communityId | intermediateCommunityIds
"Alice" | 2 | null
"Bridget" | 2 | null
"Charles" | 2 | null
"Doug" | 5 | null
"Mark" | 5 | null
"Michael" | 5 | null

We use default values for the procedure configuration parameters: maxLevels and maxIterations are set to 10 and the tolerance value is 0.0001. Because we did not set includeIntermediateCommunities to true, the intermediateCommunityIds column is always null.

Stats

In the stats execution mode, the algorithm returns a single row containing a summary of the algorithm result. For Louvain, this summary includes the number of communities, the modularity scores and the distribution of community sizes. This execution mode does not have any side effects. It can be useful for evaluating algorithm performance by inspecting the computeMillis return item. In the examples below we will omit returning the timings. The full signature of the procedure can be found in the syntax section.

For more details on the stats mode in general, see Section 3.3.2, “Stats”.

The following will run the algorithm and return the result in the form of statistical and measurement values:

CALL gds.louvain.stats('myGraph')
YIELD communityCount

Table 5.106. Results
communityCount

2
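The summary values of stats mode can be derived from a node-to-community mapping such as the stream output shown earlier. The Python sketch below is an illustration only: `community_stats` is a hypothetical helper, and the nearest-rank percentile rule is an assumption; GDS may compute percentiles differently.

```python
import math
from collections import Counter


def community_stats(community_of):
    """Return communityCount and a communityDistribution-like summary."""
    sizes = sorted(Counter(community_of.values()).values())  # size of each community

    def percentile(p):
        # nearest-rank percentile (an assumption, not necessarily the GDS rule)
        rank = max(1, math.ceil(p / 100 * len(sizes)))
        return sizes[rank - 1]

    distribution = {"min": sizes[0], "max": sizes[-1],
                    "mean": sum(sizes) / len(sizes)}
    for name, p in [("p50", 50), ("p75", 75), ("p90", 90),
                    ("p95", 95), ("p99", 99), ("p999", 99.9)]:
        distribution[name] = percentile(p)
    return len(sizes), distribution


# communityId per node, taken from the stream example
stream_result = {"Alice": 2, "Bridget": 2, "Charles": 2,
                 "Doug": 5, "Mark": 5, "Michael": 5}
count, distribution = community_stats(stream_result)
print(count, distribution)
```

The count of 2 agrees with the communityCount returned by the stats call.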

Mutate

The mutate execution mode extends the stats mode with an important side effect: updating the named graph with a new node property containing the community ID for that node. The name of the new property is specified using the mandatory configuration parameter mutateProperty. The result is a single summary row, similar to stats, but with some additional metrics. The mutate mode is especially useful when multiple algorithms are used in conjunction.

For more details on the mutate mode in general, see Section 3.3.3, “Mutate”.

The following will run the algorithm and store the results in myGraph:

CALL gds.louvain.mutate('myGraph', { mutateProperty: 'communityId' })
YIELD communityCount, modularity, modularities

Table 5.107. Results
communityCount | modularity | modularities
2 | 0.3571428571428571 | [0.3571428571428571]

In mutate mode, only a single row is returned by the procedure. The result contains meta information, like the number of identified communities and the modularity values. In contrast to the write mode, the result is written to the GDS in-memory graph instead of the Neo4j database.
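The reported modularity can be checked by hand with the per-community form of modularity, where m is the total relationship weight, L_c is the weight of the relationships inside community c, and d_c is the sum of the degrees of the nodes in c. In the unweighted example graph there are m = 7 relationships (Mark and Michael are connected twice), each of the two communities contains three relationships, and the degrees add up to 3 + 2 + 2 = 7 for Alice, Bridget and Charles and to 2 + 3 + 2 = 7 for Doug, Mark and Michael:

```latex
Q = \sum_{c} \left[ \frac{L_c}{m} - \left( \frac{d_c}{2m} \right)^2 \right]
  = 2 \left[ \frac{3}{7} - \left( \frac{7}{14} \right)^2 \right]
  = \frac{6}{7} - \frac{1}{2}
  = \frac{5}{14} \approx 0.3571
```

This matches the modularity of 0.3571428571428571 returned by the procedure.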

Write

The write execution mode extends the stats mode with an important side effect: writing the community ID for each node as a property to the Neo4j database. The name of the new property is specified using the mandatory configuration parameter writeProperty. The result is a single summary row, similar to stats, but with some additional metrics. The write mode enables directly persisting the results to the database.

For more details on the write mode in general, see Section 3.3.4, “Write”.

The following will run the algorithm and write back results:

CALL gds.louvain.write('myGraph', { writeProperty: 'community' })
YIELD communityCount, modularity, modularities

Table 5.108. Results
communityCount | modularity | modularities
2 | 0.3571428571428571 | [0.3571428571428571]

When writing back the results, only a single row is returned by the procedure. The result contains meta information, like the number of identified communities and the modularity values.

Weighted

The Louvain algorithm can also run on weighted graphs, taking the given relationship weights into account when calculating the modularity.

The following will run the algorithm on a weighted graph and stream results: 

CALL gds.louvain.stream('myGraph', { relationshipWeightProperty: 'weight' })
YIELD nodeId, communityId, intermediateCommunityIds
RETURN gds.util.asNode(nodeId).name AS name, communityId, intermediateCommunityIds
ORDER BY name ASC

Table 5.109. Results
name | communityId | intermediateCommunityIds
"Alice" | 3 | null
"Bridget" | 2 | null
"Charles" | 2 | null
"Doug" | 3 | null
"Mark" | 5 | null
"Michael" | 5 | null

Using the weighted relationships, we see that Alice and Doug have formed their own community, as their link is much stronger than all the others.
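Comparing the weighted modularity of the two groupings makes the effect of the weight 5 between Alice and Doug concrete. The Python sketch below uses a hypothetical `modularity` helper over undirected `(source, target, weight)` triples and scores the weighted result above against the grouping from the unweighted stream example, both evaluated with the weights. Under this definition the weighted grouping scores about 0.29, while the unweighted grouping scores only about 0.05.

```python
from collections import defaultdict


def modularity(edges, community):
    """Weighted modularity of a partition of an undirected edge list."""
    m = sum(w for _, _, w in edges)    # total relationship weight
    internal = defaultdict(float)      # weight inside each community
    degree = defaultdict(float)        # summed weighted degree of each community
    for u, v, w in edges:
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


weighted_edges = [("Alice", "Bridget", 1), ("Alice", "Charles", 1),
                  ("Charles", "Bridget", 1), ("Alice", "Doug", 5),
                  ("Mark", "Doug", 1), ("Mark", "Michael", 1), ("Michael", "Mark", 1)]

# Community IDs copied from the weighted and the unweighted stream results.
weighted_result = {"Alice": 3, "Bridget": 2, "Charles": 2,
                   "Doug": 3, "Mark": 5, "Michael": 5}
unweighted_result = {"Alice": 2, "Bridget": 2, "Charles": 2,
                     "Doug": 5, "Mark": 5, "Michael": 5}

q_weighted = modularity(weighted_edges, weighted_result)
q_unweighted = modularity(weighted_edges, unweighted_result)
print(round(q_weighted, 4), round(q_unweighted, 4))
```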

Seeded

The Louvain algorithm can be run incrementally, by providing a seed property. With the seed property an initial community mapping can be supplied for a subset of the loaded nodes. The algorithm will try to keep the seeded community IDs.

The following will run the algorithm and stream results: 

CALL gds.louvain.stream('myGraph', { seedProperty: 'seed' })
YIELD nodeId, communityId, intermediateCommunityIds
RETURN gds.util.asNode(nodeId).name AS name, communityId, intermediateCommunityIds
ORDER BY name ASC

Table 5.110. Results
name | communityId | intermediateCommunityIds
"Alice" | 42 | null
"Bridget" | 42 | null
"Charles" | 42 | null
"Doug" | 47 | null
"Mark" | 47 | null
"Michael" | 47 | null

Using the seeded graph, we see that the community around Alice keeps its initial community ID of 42. The other community is assigned a new community ID, which is guaranteed to be larger than the largest seeded community ID. Note that the consecutiveIds configuration option cannot be used in combination with seeding in order to retain the seeding values.
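The conflict between consecutiveIds and seeding is easy to see with a sketch of a consecutive remapping. The hypothetical `consecutive_ids` helper below numbers communities 0, 1, 2, ... in order of first appearance (the order GDS actually uses is an assumption here); applied to the seeded result above, it replaces the seed value 42 with 0.

```python
def consecutive_ids(community_of):
    """Map arbitrary community IDs to 0..k-1, in order of first appearance."""
    mapping = {}
    for community in community_of.values():
        if community not in mapping:
            mapping[community] = len(mapping)  # next unused consecutive ID
    return {node: mapping[c] for node, c in community_of.items()}


# communityId per node from the seeded stream example
seeded = {"Alice": 42, "Bridget": 42, "Charles": 42,
          "Doug": 47, "Mark": 47, "Michael": 47}
remapped = consecutive_ids(seeded)
print(remapped)
```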

Stream intermediate communities

As described before, Louvain is a hierarchical clustering algorithm. This means that after every clustering step, all nodes that belong to the same cluster are reduced to a single node. Relationships between nodes of the same cluster become self-relationships, and relationships to nodes of other clusters connect to those clusters' representatives. This condensed graph is then used to run the next level of clustering. The process is repeated until the clusters are stable.
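The condensation step can be sketched in Python. This is an illustration, not the GDS code: `aggregate` is a hypothetical helper over undirected `(source, target, weight)` triples. Relationships inside a community are summed into a self-relationship, and relationships between communities are summed per pair of communities. Applied to the six-user example graph with the two communities from the stream example, each community becomes a node with a self-relationship of weight 3, and the single relationship between Alice and Doug connects the two.

```python
from collections import defaultdict


def aggregate(edges, community):
    """Condense a graph: one node per community, summed relationship weights."""
    condensed = defaultdict(float)
    for u, v, w in edges:
        # order the endpoints so that (a, b) and (b, a) share one entry
        a, b = sorted((community[u], community[v]))
        condensed[(a, b)] += w  # a == b yields a self-relationship
    return sorted((a, b, w) for (a, b), w in condensed.items())


edges = [("Alice", "Bridget", 1), ("Alice", "Charles", 1), ("Charles", "Bridget", 1),
         ("Alice", "Doug", 1), ("Mark", "Doug", 1), ("Mark", "Michael", 1),
         ("Michael", "Mark", 1)]
community = {"Alice": 2, "Bridget": 2, "Charles": 2, "Doug": 5, "Mark": 5, "Michael": 5}
condensed = aggregate(edges, community)
print(condensed)
```

Because all weights are kept, the total relationship weight of the condensed graph equals that of the original graph, which keeps the modularity of the next level comparable.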

In order to demonstrate this iterative behavior, we need to construct a more complex graph.

Figure: Louvain multilevel graph

CREATE (a:Node {name: 'a'})
CREATE (b:Node {name: 'b'})
CREATE (c:Node {name: 'c'})
CREATE (d:Node {name: 'd'})
CREATE (e:Node {name: 'e'})
CREATE (f:Node {name: 'f'})
CREATE (g:Node {name: 'g'})
CREATE (h:Node {name: 'h'})
CREATE (i:Node {name: 'i'})
CREATE (j:Node {name: 'j'})
CREATE (k:Node {name: 'k'})
CREATE (l:Node {name: 'l'})
CREATE (m:Node {name: 'm'})
CREATE (n:Node {name: 'n'})
CREATE (x:Node {name: 'x'})

CREATE (a)-[:TYPE]->(b)
CREATE (a)-[:TYPE]->(d)
CREATE (a)-[:TYPE]->(f)
CREATE (b)-[:TYPE]->(d)
CREATE (b)-[:TYPE]->(x)
CREATE (b)-[:TYPE]->(g)
CREATE (b)-[:TYPE]->(e)
CREATE (c)-[:TYPE]->(x)
CREATE (c)-[:TYPE]->(f)
CREATE (d)-[:TYPE]->(k)
CREATE (e)-[:TYPE]->(x)
CREATE (e)-[:TYPE]->(f)
CREATE (e)-[:TYPE]->(h)
CREATE (f)-[:TYPE]->(g)
CREATE (g)-[:TYPE]->(h)
CREATE (h)-[:TYPE]->(i)
CREATE (h)-[:TYPE]->(j)
CREATE (i)-[:TYPE]->(k)
CREATE (j)-[:TYPE]->(k)
CREATE (j)-[:TYPE]->(m)
CREATE (j)-[:TYPE]->(n)
CREATE (k)-[:TYPE]->(m)
CREATE (k)-[:TYPE]->(l)
CREATE (l)-[:TYPE]->(n)
CREATE (m)-[:TYPE]->(n);

The following will load the example graph, run the algorithm and stream results including the intermediate communities: 

CALL gds.louvain.stream({
    nodeProjection: 'Node',
    relationshipProjection: {
        TYPE: {
            type: 'TYPE',
            orientation: 'undirected',
            aggregation: 'NONE'
        }
    },
    includeIntermediateCommunities: true
}) YIELD nodeId, communityId, intermediateCommunityIds
RETURN gds.util.asNode(nodeId).name AS name, communityId, intermediateCommunityIds
ORDER BY name ASC

Table 5.111. Results
name | communityId | intermediateCommunityIds
"a" | 14 | [3, 14]
"b" | 14 | [3, 14]
"c" | 14 | [14, 14]
"d" | 14 | [3, 14]
"e" | 14 | [14, 14]
"f" | 14 | [14, 14]
"g" | 7 | [7, 7]
"h" | 7 | [7, 7]
"i" | 7 | [7, 7]
"j" | 12 | [12, 12]
"k" | 12 | [12, 12]
"l" | 12 | [12, 12]
"m" | 12 | [12, 12]
"n" | 12 | [12, 12]
"x" | 14 | [14, 14]

In this example graph, the first level yields four clusters, which the second level reduces to three.
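The per-level cluster counts can be read off the intermediateCommunityIds column. The short Python sketch below copies the values from the results table above into a dictionary (the name `intermediate` is hypothetical) and counts the distinct community IDs at each level. The last level matches the final communityId of every node.

```python
# intermediateCommunityIds per node, copied from the results table above
intermediate = {
    "a": [3, 14], "b": [3, 14], "c": [14, 14], "d": [3, 14], "e": [14, 14],
    "f": [14, 14], "g": [7, 7], "h": [7, 7], "i": [7, 7], "j": [12, 12],
    "k": [12, 12], "l": [12, 12], "m": [12, 12], "n": [12, 12], "x": [14, 14],
}

levels = len(next(iter(intermediate.values())))
# number of distinct community IDs at each level
clusters_per_level = [len({ids[level] for ids in intermediate.values()})
                      for level in range(levels)]
# the last intermediate ID is the final communityId
final = {node: ids[-1] for node, ids in intermediate.items()}
print(clusters_per_level)
```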