6.5. The Triangle Counting / Clustering Coefficient algorithm

This section describes the Triangle Count or Clustering Coefficient algorithm in the Neo4j Graph Algorithms library.

Triangle counting is a community detection graph algorithm that is used to determine the number of triangles passing through each node in the graph. A triangle is a set of three nodes, where each node has a relationship to all other nodes.

This section includes:

6.5.1. History and explanation

Triangle counting gained popularity in social network analysis, where it is used to detect communities and measure the cohesiveness of those communities. It can also be used to determine the stability of a graph, and is often used as part of the computation of network indices, such as the clustering coefficient.

There are two types of clustering coefficient:

Local clustering coefficient
The local clustering coefficient of a node is the likelihood that its neighbours are also connected. The computation of this score involves triangle counting.
Global clustering coefficient
The global clustering coefficient is the normalized sum of those local clustering coefficients.

The transitivity coefficient of a graph is sometimes used, which is three times the number of triangles divided by the number of triples in the graph. For more information, see "Finding, Counting and Listing all Triangles in Large Graphs, An Experimental Study".

6.5.3. Triangle Counting / Clustering Coefficient algorithm sample The following will create a sample graph:

``````MERGE (alice:Person{id:"Alice"})
MERGE (michael:Person{id:"Michael"})
MERGE (karin:Person{id:"Karin"})
MERGE (chris:Person{id:"Chris"})
MERGE (will:Person{id:"Will"})
MERGE (mark:Person{id:"Mark"})

MERGE (michael)-[:KNOWS]->(karin)
MERGE (michael)-[:KNOWS]->(chris)
MERGE (will)-[:KNOWS]->(michael)
MERGE (mark)-[:KNOWS]->(michael)
MERGE (mark)-[:KNOWS]->(will)
MERGE (alice)-[:KNOWS]->(michael)
MERGE (will)-[:KNOWS]->(chris)
MERGE (chris)-[:KNOWS]->(karin);``````

The following will return a stream of triples, with `nodeId` for each triangle:

``````CALL algo.triangle.stream('Person','KNOWS')
YIELD nodeA,nodeB,nodeC

RETURN algo.asNode(nodeA).id AS nodeA, algo.asNode(nodeB).id AS nodeB, algo.asNode(nodeC).id AS nodeC``````

Table 6.24. Results
nodeA nodeB nodeC

Will

Michael

Chris

Will

Mark

Michael

Michael

Karin

Chris

We can see that there are `KNOWS` triangles containing "Will, Michael, and Chris", "Will, Mark, and Michael", and "Michael, Karin, and Chris". This means that everybody in the triangle knows each other.

The following will count the number of triangles that a node is member of, and write it back. It will return the total triangle count and average clustering coefficient of the given graph:

``````CALL algo.triangleCount('Person', 'KNOWS',
{concurrency:4, write:true, writeProperty:'triangles',clusteringCoefficientProperty:'coefficient'})
YIELD loadMillis, computeMillis, writeMillis, nodeCount, triangleCount, averageClusteringCoefficient;``````

The following will count the number of triangles that a node is member of, and return a stream with `nodeId` and `triangleCount`

``````CALL algo.triangleCount.stream('Person', 'KNOWS', {concurrency:4})
YIELD nodeId, triangles, coefficient

RETURN algo.asNode(nodeId).id AS name, triangles, coefficient
ORDER BY coefficient DESC``````

Table 6.25. Results
Name Triangles Coefficient

Karin

1

1

Mark

1

1

Chris

2

0.6666666666666666

Will

2

0.6666666666666666

Michael

3

0.3

Alice

0

0

We learn that Michael is part of the most triangles, but it’s Karin and Mark who are the best at introducing their friends - all of the people who know them, know each other!

6.5.4. Example usage

In graph theory, a clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. Evidence suggests that in most real-world networks, and in particular social networks, nodes tend to create tightly knit groups characterised by a relatively high density of ties; this likelihood tends to be greater than the average probability of a tie randomly established between two nodes.

We check if this holds true for Yelp’s social network of friends:

``````CALL algo.triangleCount('User', 'FRIEND',
{concurrency:4, write:true, writeProperty:'triangles',clusteringCoefficientProperty:'coefficient'})
YIELD loadMillis, computeMillis, writeMillis, nodeCount, triangleCount, averageClusteringCoefficient;``````

Average clustering coefficient is 0.0523, which is really low for a social network. This indicates that groups of friends are not tightly knit together, but rather sparse. We can assume that users are not on Yelp for finding and creating friends, like Facebook for example, but rather something else, like finding good restaurant recommendations.

Local triangle count and clustering coefficient of nodes can be used as features in finding influencers in social networks.

6.5.5. Cypher projection

If label and relationship-type are not selective enough to describe your subgraph to run the algorithm on, you can use Cypher statements to load or project subsets of your graph. This can also be used to run algorithms on a virtual graph. You can learn more in the Section 2.2, “Cypher projection” section of the manual.

Set `graph:'cypher'` in the config:

``````CALL algo.triangleCount(
'MATCH (p:Person) RETURN id(p) as id',
'MATCH (p1:Person)-[:KNOWS]->(p2:Person) RETURN id(p1) as source,id(p2) as target',
{concurrency:4, write:true, writeProperty:'triangle',graph:'cypher', clusteringCoefficientProperty:'coefficient'})
YIELD loadMillis, computeMillis, writeMillis, nodeCount, triangleCount, averageClusteringCoefficient``````

6.5.6. Syntax

The following will return a stream of triples with `nodeId` for each triangle:

``````CALL algo.triangle.stream(label:String, relationship:String, {concurrency:4})
YIELD nodeA, nodeB, nodeC``````

Table 6.26. Parameters
Name Type Default Optional Description

label

string

null

yes

The label to load from the graph. If null, load all nodes

relationship

string

null

yes

The relationship-type to load from the graph. If null, load all nodes

concurrency

int

available CPUs

yes

Table 6.27. Results
Name Type Description

nodeA

int

The ID of node in the given triangle

nodeB

int

The ID of node in the given triangle

nodeC

int

The ID of node in the given triangle

The following will count the number of triangles that a node is a member of, and return a stream with `nodeId` and `triangleCount`

``````CALL algo.triangleCount.stream(label:String, relationship:String, {concurrency:4})
YIELD nodeId, triangles``````

Table 6.28. Parameters
Name Type Default Optional Description

label

string

null

yes

The label to load from the graph. If null, load all nodes

relationship

string

null

yes

The relationship-type to load from the graph. If null, load all relationships

concurrency

int

available CPUs

yes

Table 6.29. Results
Name Type Description

nodeId

int

The ID of node

triangles

int

The number of triangles a node is member of

The following will count the number of triangles that a node is a member of, and write it back. It will return the total triangle count and average clustering coefficient of the given graph:

``````CALL algo.triangleCount(label:String, relationship:String,
{concurrency:4, write:true, writeProperty:'triangles', clusteringCoefficientProperty:'coefficient'})
YIELD loadMillis, computeMillis, writeMillis, nodeCount, triangleCount, averageClusteringCoefficient``````

Table 6.30. Parameters
Name Type Default Optional Description

label

string

null

yes

The label to load from the graph. If null, load all nodes

relationship

string

null

yes

The relationship-type to load from the graph. If null, load all relationships

concurrency

int

available CPUs

yes

write

boolean

true

yes

Specifies if the result should be written back as a node property

writeProperty

string

'triangles'

yes

The property name the number of triangles a node is member of is written to

clusteringCoefficientProperty

string

'coefficient'

yes

The property name clustering coefficient of the node is written to

Table 6.31. Results
Name Type Description

int

computeMillis

int

Milliseconds for running the algorithm

writeMillis

int

Milliseconds for writing result data back

postProcessingMillis

int

Milliseconds for computing percentiles and community count

nodeCount

int

The number of nodes considered

triangleCount

int

The number of triangles found

averageClusteringCoefficient

float

The average clustering coefficient of the given graph

p1

double

The 1 percentile of number of triangles.

p5

double

The 5 percentile of number of triangles.

p10

double

The 10 percentile of number of triangles.

p25

double

The 25 percentile of number of triangles.

p50

double

The 50 percentile of number of triangles.

p75

double

The 75 percentile of number of triangles.

p90

double

The 90 percentile of number of triangles.

p95

double

The 95 percentile of number of triangles.

p99

double

The 99 percentile of number of triangles.

p100

double

The 100 percentile of number of triangles.

write

boolean

Specifies if the result was written back as a node property

writeProperty

string

The property name the number of triangles a node is member of is written to

clusteringCoefficientProperty

string

The property name clustering coefficient of the node is written to

6.5.7. Graph type support

The Triangle Count algorithms support the following graph type:

• ✓ undirected, unweighted