4.2. The Betweenness Centrality algorithm

This section describes the Betweenness Centrality algorithm in the Neo4j Graph Algorithms library.

Betweenness centrality is a way of detecting the amount of influence a node has over the flow of information in a graph. It is often used to find nodes that serve as a bridge from one part of a graph to another.

In the following example, Alice is the main connection in the graph:

[Image: betweenness centrality example graph]

If Alice were removed, all connections in the graph would be cut off. This makes Alice important, because she ensures that no nodes are isolated.

4.2.1. History and explanation

The Betweenness Centrality algorithm calculates the shortest (weighted) path between every pair of nodes in a connected graph, using the breadth-first search algorithm. Each node receives a score, based on the number of these shortest paths that pass through the node. Nodes that most frequently lie on these shortest paths will have a higher betweenness centrality score.
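
For reference, the score described above corresponds to the standard betweenness centrality formula. The notation below is introduced here purely for illustration and does not appear elsewhere in this guide:

C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}

where \sigma_{st} is the number of shortest paths from node s to node t, and \sigma_{st}(v) is the number of those paths that pass through v.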

The algorithm was given its first formal definition by Linton Freeman, in his 1977 paper "A Set of Measures of Centrality Based on Betweenness". It was considered to be one of the "three distinct intuitive conceptions of centrality".

4.2.2. Use-cases - when to use the Betweenness Centrality algorithm

  • Betweenness centrality is used to research network flow in package delivery processes and telecommunications networks. These networks are characterized by traffic that has a known target and takes the shortest path possible. This and other scenarios are described by Stephen P. Borgatti in "Centrality and network flow".
  • Betweenness centrality is used to identify influencers in legitimate, or criminal, organizations. Studies show that influencers in organizations are not necessarily in management positions, but instead can be found in brokerage positions of the organizational network. Removal of such influencers could seriously destabilize the organization. More detail can be found in "Brokerage qualifications in ringing operations", by Carlo Morselli and Julie Roy.
  • Betweenness centrality can be used to help microbloggers spread their reach on Twitter, with a recommendation engine that targets influencers that they should interact with in the future. This approach is described in "Making Recommendations in a Microblog to Improve the Impact of a Focal User".

4.2.3. Constraints - when not to use the Betweenness Centrality algorithm

  • Betweenness centrality makes the assumption that all communication between nodes happens along the shortest path and with the same frequency, which isn’t the case in real life. Therefore, it doesn’t give us a perfect view of the most influential nodes in a graph, but rather a good representation. Newman explains this in more detail on page 186 of Networks: An Introduction.
  • For large graphs, exact centrality computation isn’t practical. The fastest known algorithm for exactly computing betweenness of all the nodes requires at least O(nm) time for unweighted graphs, where n is the number of nodes and m is the number of relationships. Instead, we can use an approximation algorithm that works with a subset of nodes.

4.2.4. Betweenness Centrality algorithm sample

People with high betweenness tend to be the innovators and brokers in social networks. They combine different perspectives, transfer ideas between groups, and get power from their ability to make introductions and pull strings.

The following will create a sample graph: 

MERGE (nAlice:User {id:'Alice'})
MERGE (nBridget:User {id:'Bridget'})
MERGE (nCharles:User {id:'Charles'})
MERGE (nDoug:User {id:'Doug'})
MERGE (nMark:User {id:'Mark'})
MERGE (nMichael:User {id:'Michael'})

MERGE (nAlice)-[:MANAGE]->(nBridget)
MERGE (nAlice)-[:MANAGE]->(nCharles)
MERGE (nAlice)-[:MANAGE]->(nDoug)
MERGE (nMark)-[:MANAGE]->(nAlice)
MERGE (nCharles)-[:MANAGE]->(nMichael);

The following will run the algorithm and stream results: 

CALL algo.betweenness.stream('User','MANAGE',{direction:'out'})
YIELD nodeId, centrality

MATCH (user:User) WHERE id(user) = nodeId

RETURN user.id AS user,centrality
ORDER BY centrality DESC;

The following will run the algorithm and write back results: 

CALL algo.betweenness('User','MANAGE', {direction:'out',write:true, writeProperty:'centrality'})
YIELD nodes, minCentrality, maxCentrality, sumCentrality, loadMillis, computeMillis, writeMillis;

Table 4.7. Results

Name    | Centrality weight
--------|------------------
Alice   | 4
Charles | 2
Bridget | 0
Michael | 0
Doug    | 0
Mark    | 0

We can see that Alice is the main broker in this network, and Charles is a minor broker. The others don’t have any influence, because all the shortest paths between pairs of people go via Alice or Charles.
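
If the write-back query above has been run, the stored scores can be inspected with an ordinary Cypher query. This is a minimal sketch, assuming the default writeProperty name of 'centrality' was used:

MATCH (user:User)
RETURN user.id AS user, user.centrality AS centrality
ORDER BY centrality DESC;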

4.2.5. Approximation of Betweenness Centrality

As mentioned above, calculating the exact betweenness centrality on large graphs can be very time consuming. Therefore, you might choose to use an approximation algorithm that will run much quicker, and still provide useful information.

4.2.5.1. RA-Brandes algorithm

The RA-Brandes algorithm is the best known algorithm for calculating an approximate score for betweenness centrality. Rather than calculating the shortest path between every pair of nodes, the RA-Brandes algorithm considers only a subset of nodes. Two common strategies for selecting the subset of nodes are:

random
Nodes are selected uniformly, at random, with defined probability of selection. The default probability is log10(N) / e^2. If the probability is 1, then the algorithm works the same way as the normal Betweenness Centrality algorithm, where all nodes are loaded.
degree
First, the mean degree of the nodes is calculated, and then only the nodes whose degree is higher than the mean are visited (i.e. only densely connected nodes are visited).

As a further optimisation, you can choose to limit the depth used by the shortest path algorithm. This can be controlled by the maxDepth parameter.
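
For example, the degree strategy can be combined with a depth limit in the same way. The following sketch uses the sample graph from above; the maxDepth value of 2 is purely illustrative:

CALL algo.betweenness.sampled.stream('User','MANAGE',
  {strategy:'degree', maxDepth:2, direction:'out'})
YIELD nodeId, centrality
MATCH (user:User) WHERE id(user) = nodeId
RETURN user.id AS user, centrality
ORDER BY centrality DESC;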

The following will run the algorithm and stream results: 

CALL algo.betweenness.sampled.stream('User','MANAGE',
  {strategy:'random', probability:1.0, maxDepth:1, direction: "out"})

YIELD nodeId, centrality

MATCH (user) WHERE id(user) = nodeId
RETURN user.id AS user,centrality
ORDER BY centrality DESC;

The following will run the algorithm and write back results: 

CALL algo.betweenness.sampled('User','MANAGE',
  {strategy:'random', probability:1.0, writeProperty:'centrality', maxDepth:1, direction: "out"})
YIELD nodes, minCentrality, maxCentrality

Table 4.8. Results

Name    | Centrality weight
--------|------------------
Alice   | 3
Charles | 1
Bridget | 0
Michael | 0
Doug    | 0
Mark    | 0

Alice is still the main broker in the network, and Charles is a minor broker, although their centrality scores are lower because the algorithm only considers paths up to a depth of 1. The others don’t have any influence, because all the shortest paths between pairs of people go via Alice or Charles.

4.2.6. Example usage

4.2.7. Syntax

The following will run the Brandes algorithm and write back results: 

CALL algo.betweenness(label:String, relationship:String,
  {direction:'out',write:true, stats:true, writeProperty:'centrality',concurrency:1})
YIELD nodes, minCentrality, maxCentrality, sumCentrality, loadMillis, computeMillis, writeMillis
- calculates betweenness centrality and potentially writes back

Table 4.9. Parameters

Name          | Type    | Default        | Optional | Description
--------------|---------|----------------|----------|------------
label         | string  | null           | yes      | The label to load from the graph. If null, load all nodes
relationship  | string  | null           | yes      | The relationship-type to load from the graph. If null, load all relationships
direction     | string  | outgoing       | yes      | The relationship direction to load from the graph. If 'both', treats the relationships as undirected
write         | boolean | true           | yes      | Specifies if the result should be written back as a node property
stats         | boolean | true           | yes      | Specifies if stats about centrality should be returned
writeProperty | string  | 'centrality'   | yes      | The property name written back to
graph         | string  | 'heavy'        | yes      | Use 'heavy' when describing the subset of the graph with label and relationship-type parameter. Use 'cypher' for describing the subset with cypher node-statement and relationship-statement
concurrency   | int     | available CPUs | yes      | The number of concurrent threads

Table 4.10. Results

Name          | Type | Description
--------------|------|------------
nodes         | int  | The number of nodes considered
minCentrality | int  | The minimum centrality value
maxCentrality | int  | The maximum centrality value
sumCentrality | int  | The sum of all centrality values
loadMillis    | int  | Milliseconds for loading data
computeMillis | int  | Milliseconds for running the algorithm
writeMillis   | int  | Milliseconds for writing result data back

The following will run the Brandes algorithm and stream results: 

CALL algo.betweenness.stream(label:String, relationship:String,
{direction:'out',concurrency:1})
YIELD nodeId, centrality - yields centrality for each node

Table 4.11. Parameters

Name         | Type   | Default        | Optional | Description
-------------|--------|----------------|----------|------------
label        | string | null           | yes      | The label to load from the graph. If null, load all nodes
relationship | string | null           | yes      | The relationship-type to load from the graph. If null, load all relationships
concurrency  | int    | available CPUs | yes      | The number of concurrent threads
direction    | string | outgoing       | yes      | The relationship direction to load from the graph. If 'both', treats the relationships as undirected

Table 4.12. Results

Name       | Type  | Description
-----------|-------|------------
nodeId     | long  | Node ID
centrality | float | Betweenness centrality weight

The following will run the RA-Brandes algorithm and write back results: 

CALL algo.betweenness.sampled(label:String, relationship:String,
  {direction:'out', strategy:'random', probability: 1, maxDepth: 4, stats:true,
 writeProperty:'centrality',concurrency:1})
YIELD nodes, minCentrality, maxCentrality, sumCentrality, loadMillis, computeMillis, writeMillis
- calculates betweenness centrality and potentially writes back

Table 4.13. Parameters

Name          | Type    | Default        | Optional | Description
--------------|---------|----------------|----------|------------
label         | string  | null           | yes      | The label to load from the graph. If null, load all nodes
relationship  | string  | null           | yes      | The relationship-type to load from the graph. If null, load all relationships
direction     | string  | outgoing       | yes      | The relationship direction to load from the graph. If 'both', treats the relationships as undirected
write         | boolean | true           | yes      | Specifies if the result should be written back as a node property
strategy      | string  | 'random'       | yes      | The node selection strategy
probability   | float   | log10(N) / e^2 | yes      | The probability a node is selected. Values between 0 and 1. If 1, selects all nodes and works like the original Brandes algorithm
maxDepth      | int     | Integer.MAX    | yes      | The depth of the shortest paths traversal
stats         | boolean | true           | yes      | Specifies if stats about centrality should be returned
writeProperty | string  | 'centrality'   | yes      | The property name written back to
graph         | string  | 'heavy'        | yes      | Use 'heavy' when describing the subset of the graph with label and relationship-type parameter. Use 'cypher' for describing the subset with cypher node-statement and relationship-statement
concurrency   | int     | available CPUs | yes      | The number of concurrent threads

Table 4.14. Results

Name          | Type | Description
--------------|------|------------
nodes         | int  | The number of nodes considered
minCentrality | int  | The minimum centrality value
maxCentrality | int  | The maximum centrality value
sumCentrality | int  | The sum of all centrality values
loadMillis    | int  | Milliseconds for loading data
computeMillis | int  | Milliseconds for running the algorithm
writeMillis   | int  | Milliseconds for writing result data back

The following will run the RA-Brandes algorithm and stream results: 

CALL algo.betweenness.sampled.stream(label:String, relationship:String,
  {direction:'out',concurrency:1, strategy:'random', probability: 1, maxDepth: 4})
YIELD nodeId, centrality - yields centrality for each node

Table 4.15. Parameters

Name         | Type   | Default        | Optional | Description
-------------|--------|----------------|----------|------------
label        | string | null           | yes      | The label to load from the graph. If null, load all nodes
relationship | string | null           | yes      | The relationship-type to load from the graph. If null, load all relationships
concurrency  | int    | available CPUs | yes      | The number of concurrent threads
direction    | string | outgoing       | yes      | The relationship direction to load from the graph. If 'both', treats the relationships as undirected
strategy     | string | 'random'       | yes      | The node selection strategy
probability  | float  | log10(N) / e^2 | yes      | The probability a node is selected. Values between 0 and 1. If 1, selects all nodes and works like the original Brandes algorithm
maxDepth     | int    | Integer.MAX    | yes      | The depth of the shortest paths traversal

Table 4.16. Results

Name       | Type  | Description
-----------|-------|------------
nodeId     | long  | Node ID
centrality | float | Betweenness centrality weight

4.2.8. Cypher projection

If label and relationship-type are not selective enough to describe your subgraph to run the algorithm on, you can use Cypher statements to load or project subsets of your graph. This can also be used to run algorithms on a virtual graph.

Set graph:'cypher' in the config: 

CALL algo.betweenness(
  'MATCH (p:User) RETURN id(p) as id',
  'MATCH (p1:User)-[:MANAGE]->(p2:User) RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher', write: true}
);
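
The same projection can be used with the streaming variant. This is a sketch, under the assumption that algo.betweenness.stream accepts the graph:'cypher' option in the same way as the write variant shown above:

CALL algo.betweenness.stream(
  'MATCH (p:User) RETURN id(p) as id',
  'MATCH (p1:User)-[:MANAGE]->(p2:User) RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher'}
)
YIELD nodeId, centrality
RETURN nodeId, centrality
ORDER BY centrality DESC;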

4.2.9. Graph type support

The Betweenness Centrality algorithm supports the following graph types:

  • ✓ directed, unweighted

    • loading incoming relationships: 'INCOMING','IN','I' or '<'
    • loading outgoing relationships: 'OUTGOING','OUT','O' or '>'
  • ❏ directed, weighted
  • ✓ undirected, unweighted

    • direction:'both' or '<>' (see the example after this list)
  • ❏ undirected, weighted
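
To illustrate the undirected, unweighted case, the sample graph from earlier can be loaded with direction:'both'. This is a sketch; the resulting scores will differ from the directed example shown above:

CALL algo.betweenness.stream('User','MANAGE',{direction:'both'})
YIELD nodeId, centrality
MATCH (user:User) WHERE id(user) = nodeId
RETURN user.id AS user, centrality
ORDER BY centrality DESC;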

4.2.10. Implementations

algo.betweenness()

  • Implementation of brandes-bc algorithm and nodePartitioning extension.
  • If the concurrency parameter is set (and greater than 1), ParallelBetweennessCentrality is used.
  • ParallelBC spawns N concurrent threads for the calculation (N is given by the concurrency parameter), where each thread calculates the BC for one node at a time; see the sketch after this list.
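
As an example, a parallel run over the sample graph might look like the following sketch; the concurrency value of 4 is illustrative and any positive integer can be used:

CALL algo.betweenness('User','MANAGE',
  {direction:'out', concurrency:4, write:true, writeProperty:'centrality'})
YIELD nodes, minCentrality, maxCentrality, sumCentrality, loadMillis, computeMillis, writeMillis;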

algo.betweenness.exp1()

  • Brandes-like algorithm, which uses successor sets instead of predecessor sets.
  • The algorithm is based on Brandes definition, but with some changes regarding the dependency-accumulation step.
  • Does not support undirected graphs.

algo.betweenness.sampled()

  • Calculates betweenness dependencies on a subset of pivot nodes (instead of all nodes).
  • Two randomization strategies are implemented, selected with the optional strategy argument:

    • strategy:'random' (default): random selection; takes an optional probability argument (a double between 0 and 1), with log10(N) / e^2 as the default.
    • strategy:'degree': degree-based randomization, which makes densely connected nodes more likely to be selected.
  • Optional argument: maxDepth:int