This section describes the Betweenness Centrality algorithm in the Neo4j Graph Algorithms library.
Betweenness centrality is a way of detecting the amount of influence a node has over the flow of information in a graph. It is often used to find nodes that serve as a bridge from one part of a graph to another.
In the following example, Alice is the main connection in the graph:
If Alice is removed, all connections in the graph would be cut off. This makes Alice important, because she ensures that no nodes are isolated.
This section includes:
The Betweenness Centrality algorithm calculates the shortest (weighted) path between every pair of nodes in a connected graph, using the breadth-first search algorithm. Each node receives a score, based on the number of these shortest paths that pass through the node. Nodes that most frequently lie on these shortest paths will have a higher betweenness centrality score.
The algorithm was given its first formal definition by Linton Freeman, in his 1971 paper "A Set of Measures of Centrality Based on Betweenness". It was considered to be one of the "three distinct intuitive conceptions of centrality".
O(nm)
time for unweighted graphs, where n
is the number of nodes and m
is the number of relationships.
Instead, we can use an approximation algorithm that works with a subset of nodes.
People with high betweenness tend to be the innovators and brokers in social networks. They combine different perspectives, transfer ideas between groups, and get power from their ability to make introductions and pull strings.
The following will create a sample graph:
MERGE (nAlice:User {id:'Alice'})
MERGE (nBridget:User {id:'Bridget'})
MERGE (nCharles:User {id:'Charles'})
MERGE (nDoug:User {id:'Doug'})
MERGE (nMark:User {id:'Mark'})
MERGE (nMichael:User {id:'Michael'})
MERGE (nAlice)-[:MANAGE]->(nBridget)
MERGE (nAlice)-[:MANAGE]->(nCharles)
MERGE (nAlice)-[:MANAGE]->(nDoug)
MERGE (nMark)-[:MANAGE]->(nAlice)
MERGE (nCharles)-[:MANAGE]->(nMichael);
The following will run the algorithm and stream results:
CALL algo.betweenness.stream('User','MANAGE',{direction:'out'})
YIELD nodeId, centrality
MATCH (user:User) WHERE id(user) = nodeId
RETURN user.id AS user,centrality
ORDER BY centrality DESC;
The following will run the algorithm and write back results:
CALL algo.betweenness('User','MANAGE', {direction:'out',write:true, writeProperty:'centrality'})
YIELD nodes, minCentrality, maxCentrality, sumCentrality, loadMillis, computeMillis, writeMillis;
Name | Centrality weight |
---|---|
Alice |
4 |
Charles |
2 |
Bridget |
0 |
Michael |
0 |
Doug |
0 |
Mark |
0 |
We can see that Alice is the main broker in this network, and Charles is a minor broker. The others don’t have any influence, because all the shortest paths between pairs of people go via Alice or Charles.
As mentioned above, calculating the exact betweenness centrality on large graphs can be very time consuming. Therefore, you might choose to use an approximation algorithm that will run much quicker, and still provide useful information.
The RA-Brandes algorithm is the best known algorithm for calculating an approximate score for betweenness centrality. Rather than calculating the shortest path between every pair of nodes, the RA-Brandes algorithm considers only a subset of nodes. Two common strategies for selecting the subset of nodes are:
log10(N) / e^2
.
If the probability is 1, then the algorithm works the same way as the normal Betweenness Centrality algorithm, where all nodes
are loaded.
As a further optimisation, you can choose to limit the depth used by the shortest path algorithm.
This can be controlled by the maxDepth
parameter.
The following will run the algorithm and stream results:
CALL algo.betweenness.sampled.stream('User','MANAGE',
{strategy:'random', probability:1.0, maxDepth:1, direction: "out"})
YIELD nodeId, centrality
MATCH (user) WHERE id(user) = nodeId
RETURN user.id AS user,centrality
ORDER BY centrality DESC;
The following will run the algorithm and write back results:
CALL algo.betweenness.sampled('User','MANAGE',
{strategy:'random', probability:1.0, writeProperty:'centrality', maxDepth:1, direction: "out"})
YIELD nodes, minCentrality, maxCentrality
Name | Centrality weight |
---|---|
Alice |
3 |
Charles |
1 |
Bridget |
0 |
Michael |
0 |
Doug |
0 |
Mark |
0 |
Alice is still the main broker in the network, and Charles is a minor broker, although their centrality score has reduced as the algorithm only considers relationships at a depth of 1. The others don’t have any influence, because all the shortest paths between pairs of people go via Alice or Charles.
The following will run the Brandes algorithm and write back results:
CALL algo.betweenness(label:String, relationship:String,
{direction:'out',write:true, stats:true, writeProperty:'centrality',concurrency:1})
YIELD nodes, minCentrality, maxCentrality, sumCentrality, loadMillis, computeMillis, writeMillis
- calculates betweenness centrality and potentially writes back
Name | Type | Default | Optional | Description |
---|---|---|---|---|
label |
string |
null |
yes |
The label to load from the graph. If null, load all nodes |
relationship |
string |
null |
yes |
The relationship-type to load from the graph. If null, load all relationships |
direction |
string |
outgoing |
yes |
The relationship direction to load from the graph. If 'both', treats the relationships as undirected |
write |
boolean |
true |
yes |
Specifies if the result should be written back as a node property |
stats |
boolean |
true |
yes |
Specifies if stats about centrality should be returned |
writeProperty |
string |
'centrality' |
yes |
The property name written back to |
graph |
string |
'heavy' |
yes |
Use 'heavy' when describing the subset of the graph with label and relationship-type parameter. Use 'cypher' for describing the subset with cypher node-statement and relationship-statement |
concurrency |
int |
available CPUs |
yes |
The number of concurrent threads |
Name | Type | Description |
---|---|---|
nodes |
int |
The number of nodes considered |
minCentrality |
int |
The minimum centrality value |
maxCentrality |
int |
The maximum centrality value |
sumCentrality |
int |
The sum of all centrality values |
loadMillis |
int |
Milliseconds for loading data |
evalMillis |
int |
Milliseconds for running the algorithm |
writeMillis |
int |
Milliseconds for writing result data back |
The following will run the Brandes algorithm and stream results:
CALL algo.betweenness.stream(label:String, relationship:String,
{direction:'out',concurrency:1})
YIELD nodeId, centrality - yields centrality for each node
Name | Type | Default | Optional | Description |
---|---|---|---|---|
label |
string |
null |
yes |
The label to load from the graph. If null, load all nodes |
relationship |
string |
null |
yes |
The relationship-type to load from the graph. If null, load all relationships |
concurrency |
int |
available CPUs |
yes |
The number of concurrent threads |
direction |
string |
outgoing |
yes |
The relationship direction to load from the graph. If 'both', treats the relationships as undirected |
Name |
Type |
Description |
node |
long |
Node ID |
centrality |
float |
Betweenness centrality weight |
The following will run the RA-Brandes algorithm and write back results:
CALL algo.betweenness.sampled(label:String, relationship:String,
{direction:'out', strategy:'random', probability: 1, maxDepth: 4, stats:true,
writeProperty:'centrality',concurrency:1})
YIELD nodes, minCentrality, maxCentrality, sumCentrality, loadMillis, computeMillis, writeMillis
- calculates betweenness centrality and potentially writes back
Name | Type | Default | Optional | Description |
---|---|---|---|---|
label |
string |
null |
yes |
The label to load from the graph. If null, load all nodes |
relationship |
string |
null |
yes |
The relationship-type to load from the graph. If null, load all nodes |
direction |
string |
outgoing |
yes |
The relationship direction to load from the graph. If 'both', treats the relationships as undirected |
write |
boolean |
true |
yes |
Specifies if the result should be written back as a node property |
strategy |
string |
'random' |
yes |
The node selection strategy |
probability |
float |
log10(N) / e^2 |
yes |
The probability a node is selected. Values between 0 and 1. If 1, selects all nodes and works like original Brandes algorithm |
maxDepth |
int |
Integer.MAX |
yes |
The depth of the shortest paths traversal |
stats |
boolean |
true |
yes |
Specifies if stats about centrality should be returned |
writeProperty |
string |
'centrality' |
yes |
The property name written back to |
graph |
string |
'heavy' |
yes |
Use 'heavy' when describing the subset of the graph with label and relationship-type parameter. Use 'cypher' for describing the subset with cypher node-statement and relationship-statement |
concurrency |
int |
available CPUs |
yes |
The number of concurrent threads |
Name | Type | Description |
---|---|---|
nodes |
int |
The number of nodes considered |
minCentrality |
int |
The minimum centrality value |
maxCentrality |
int |
The maximum centrality value |
sumCentrality |
int |
The sum of all centrality values |
loadMillis |
int |
Milliseconds for loading data |
evalMillis |
int |
Milliseconds for running the algorithm |
writeMillis |
int |
Milliseconds for writing result data back |
The following will run the RA-Brandes algorithm and stream results:
CALL algo.betweenness.sampled.stream(label:String, relationship:String,
{direction:'out',concurrency:1, strategy:'random', probability: 1, maxDepth: 4})
YIELD nodeId, centrality - yields centrality for each node
Name | Type | Default | Optional | Description |
---|---|---|---|---|
label |
string |
null |
yes |
The label to load from the graph. If null, load all nodes |
relationship |
string |
null |
yes |
The relationship-type to load from the graph. If null, load all relationships |
concurrency |
int |
available CPUs |
yes |
The number of concurrent threads |
direction |
string |
outgoing |
yes |
The relationship direction to load from the graph. If 'both', treats the relationships as undirected |
strategy |
string |
'random' |
yes |
The node selection strategy |
probability |
float |
log10(N) / e^2 |
yes |
The probability a node is selected. Values between 0 and 1. If 1, selects all nodes and works like original Brandes algorithm |
maxDepth |
int |
Integer.MAX |
yes |
The depth of the shortest paths traversal |
Name |
Type |
Description |
node |
long |
Node ID |
centrality |
float |
Betweenness centrality weight |
If label and relationship-type are not selective enough to describe your subgraph to run the algorithm on, you can use Cypher statements to load or project subsets of your graph. This can also be used to run algorithms on a virtual graph. You can learn more in the Section 1.3.2, “Cypher projection” section of the manual.
Set graph:'cypher'
in the config:
CALL algo.betweenness(
'MATCH (p:User) RETURN id(p) as id',
'MATCH (p1:User)-[:MANAGE]->(p2:User) RETURN id(p1) as source, id(p2) as target',
{graph:'cypher', write: true}
);
The Betweenness Centrality algorithm supports the following graph types:
✓ directed, unweighted
✓ undirected, unweighted
algo.betweenness()
algo.betweenness.exp1()
algo.betweenness.sampled()
random selection(default): strategy:'random':
(takes optional argument probability:double(0-1) or log10(N) / e^2 as default)
strategy:'degree':
(makes dense nodes more likely)
maxDepth:int