### 9.2.3. The Closeness Centrality algorithm

This section describes the Closeness Centrality algorithm in the Neo4j Labs Graph Algorithms library.

 This is documentation for the Graph Algorithms Library, which has been deprecated by the Graph Data Science Library (GDS).

Closeness centrality is a way of detecting nodes that are able to spread information very efficiently through a graph.

The closeness centrality of a node measures its average farness (inverse distance) to all other nodes. Nodes with a high closeness score have the shortest distances to all other nodes.

 The Closeness Centrality algorithm was developed by the Neo4j Labs team and is not officially supported.

This section includes:

#### 9.2.3.1. History and explanation

For each node, the Closeness Centrality algorithm calculates the sum of its distances to all other nodes, based on calculating the shortest paths between all pairs of nodes. The resulting sum is then inverted to determine the closeness centrality score for that node.

The raw closeness centrality of a node is calculated using the following formula:

`raw closeness centrality(node) = 1 / sum(distance from node to all other nodes)`

It is more common to normalize this score so that it represents the average length of the shortest paths rather than their sum. This adjustment allow comparisons of the closeness centrality of nodes of graphs of different sizes

The formula for normalized closeness centrality is as follows:

`normalized closeness centrality(node) = (number of nodes - 1) / sum(distance from node to all other nodes)`

#### 9.2.3.2. Use-cases - when to use the Closeness Centrality algorithm

• Closeness centrality is used to research organizational networks, where individuals with high closeness centrality are in a favourable position to control and acquire vital information and resources within the organization. One such study is "Mapping Networks of Terrorist Cells" by Valdis E. Krebs.
• Closeness centrality can be interpreted as an estimated time of arrival of information flowing through telecommunications or package delivery networks where information flows through shortest paths to a predefined target. It can also be used in networks where information spreads through all shortest paths simultaneously, such as infection spreading through a social network. Find more details in "Centrality and network flow" by Stephen P. Borgatti.
• Closeness centrality has been used to estimate the importance of words in a document, based on a graph-based keyphrase extraction process. This process is described by Florian Boudin in "A Comparison of Centrality Measures for Graph-Based Keyphrase Extraction".

#### 9.2.3.3. Constraints - when not to use the Closeness Centrality algorithm

• Academically, closeness centrality works best on connected graphs. If we use the original formula on an unconnected graph, we can end up with an infinite distance between two nodes in separate connected components. This means that we’ll end up with an infinite closeness centrality score when we sum up all the distances from that node.

In practice, a variation on the original formula is used so that we don’t run into these issues.

#### 9.2.3.4. Closeness Centrality algorithm sample The following will create a sample graph:

``````MERGE (a:Node{id:"A"})
MERGE (b:Node{id:"B"})
MERGE (c:Node{id:"C"})
MERGE (d:Node{id:"D"})
MERGE (e:Node{id:"E"})

The following will run the algorithm and stream results:

``````CALL algo.closeness.stream('Node', 'LINK')
YIELD nodeId, centrality

RETURN algo.asNode(nodeId).id AS node, centrality
ORDER BY centrality DESC
LIMIT 20;``````

The following will run the algorithm and write back results:

``````CALL algo.closeness('Node', 'LINK', {write:true, writeProperty:'centrality'})

Table 9.16. Results
Name Centrality weight

C

0.6666666666666666

B

0.5714285714285714

D

0.5714285714285714

A

0.4

E

0.4

C is the best connected node in this graph, although B and D aren’t far behind. A and E don’t have close ties to many other nodes, so their scores are lower. Any node that has a direct connection to all other nodes would score 1.

Calculation:

• count farness in each msbfs-callback
• divide by N-1

`N = 5` // number of nodes

`k = N-1 = 4` // used for normalization

```     A     B     C     D     E
--|-----------------------------
A | 0     1     2     3     4       // farness between each pair of nodes
B | 1     0     1     2     3
C | 2     1     0     1     2
D | 3     2     1     0     1
E | 4     3     2     1     0
--|-----------------------------
S | 10    7     6     7     10      // raw closeness centrality
==|=============================
k/S| 0.4  0.57  0.67  0.57   0.4     // normalized closeness centrality```

#### 9.2.3.5. Huge graph projection

The default label and relationship-type projection has a limitation of 2 billion nodes and 2 billion relationships. Therefore, if our projected graph contains more than 2 billion nodes or relationships, we will need to use huge graph projection.

Set `graph:'huge'` in the config:

``````CALL algo.closeness('Node', 'LINK', {graph:'huge'})

#### 9.2.3.6. Cypher projection

If node label and relationship type are not selective enough to describe your subgraph to run the algorithm on, you can use Cypher statements to load or project subsets of your graph. This can also be used to run algorithms on a virtual graph. You can learn more in the Section 2.2, “Cypher projection” section of the manual.

Set `graph:'cypher'` in the config:

``````CALL algo.closeness(
'MATCH (p:Node) RETURN id(p) as id',
'MATCH (p1:Node)-[:LINK]->(p2:Node) RETURN id(p1) as source, id(p2) as target',
{graph:'cypher', write: true}
);``````

#### 9.2.3.7. Syntax

The following will run the algorithm and write back results:

``````CALL algo.closeness(label:String, relationship:String,
{write:true, writeProperty:'centrality',graph:'huge', concurrency:4})

Table 9.17. Parameters
Name Type Default Optional Description

label

string

null

yes

The label to load from the graph. If null, load all nodes.

relationship

string

null

yes

The relationship type to load from the graph. If null, load all relationships.

write

boolean

true

yes

Specifies if the result should be written back as a node property.

concurrency

int

available CPUs

yes

The number of concurrent threads used for running the algorithm. Also provides the default value for 'readConcurrency' and 'writeConcurrency'.

int

value of 'concurrency'

yes

writeConcurrency

int

value of 'concurrency'

yes

The number of concurrent threads used for writing the result.

writeProperty

string

'centrality'

yes

The property name written back to.

graph

string

'huge'

yes

Use 'huge' when describing the subset of the graph with label and relationship-type parameter. Use 'cypher' for describing the subset with cypher node statement and relationship statement.

Table 9.18. Results
Name Type Description

nodes

int

The number of nodes considered.

int

evalMillis

int

Milliseconds for running the algorithm.

writeMillis

int

Milliseconds for writing result data back.

The following will run the algorithm and stream results:

``````CALL algo.closeness.stream(label:String, relationship:String, {concurrency:4})
YIELD nodeId, centrality``````

Table 9.19. Parameters
Name Type Default Optional Description

label

string

null

yes

The label to load from the graph. If null, load all nodes.

relationship

string

null

yes

The relationship type to load from the graph. If null, load all relationships.

concurrency

int

available CPUs

yes

The number of concurrent threads used for running the algorithm. Also provides the default value for 'readConcurrency'.

int

value of 'concurrency'

yes

graph

string

'huge'

yes

Use 'huge' when describing the subset of the graph with label and relationship-type parameter. Use 'cypher' for describing the subset with cypher node statement and relationship statement.

Table 9.20. Results
Name Type Description

node

long

Node ID

centrality

float

Closeness centrality weight

#### 9.2.3.8. Graph type support

The Closeness Centrality algorithm supports the following graph types:

• ✓ directed, unweighted
• ❏ directed, weighted
• ✓ undirected, unweighted

• Only with cypher projection
• ❏ undirected, weighted