5.5. The Triangle Counting / Clustering Coefficient algorithm

This section describes the Triangle Count or Clustering Coefficient algorithm in the Neo4j Graph Algorithms library.

Triangle counting is a community detection graph algorithm that is used to determine the number of triangles passing through each node in the graph. A triangle is a set of three nodes, where each node has a relationship to all other nodes.

5.5.1. History and explanation

Triangle counting gained popularity in social network analysis, where it is used to detect communities and measure the cohesiveness of those communities. It can also be used to determine the stability of a graph, and is often used as part of the computation of network indices, such as the clustering coefficient.

There are two types of clustering coefficient:

Local clustering coefficient
The local clustering coefficient of a node is the likelihood that its neighbours are also connected. The computation of this score involves triangle counting.
Global clustering coefficient
The global clustering coefficient is the normalized sum of those local clustering coefficients.

The transitivity coefficient of a graph is sometimes used, which is three times the number of triangles divided by the number of triples in the graph. For more information, see "Finding, Counting and Listing all Triangles in Large Graphs, An Experimental Study".

5.5.2. Use-cases - when to use the Triangle Counting / Clustering Coefficient algorithm

5.5.3. Triangle Counting / Clustering Coefficient algorithm sample

triangle count

The following will create a sample graph: 

MERGE (alice:Person{id:"Alice"})
MERGE (michael:Person{id:"Michael"})
MERGE (karin:Person{id:"Karin"})
MERGE (chris:Person{id:"Chris"})
MERGE (will:Person{id:"Will"})
MERGE (mark:Person{id:"Mark"})

MERGE (michael)-[:KNOWS]->(karin)
MERGE (michael)-[:KNOWS]->(chris)
MERGE (will)-[:KNOWS]->(michael)
MERGE (mark)-[:KNOWS]->(michael)
MERGE (mark)-[:KNOWS]->(will)
MERGE (alice)-[:KNOWS]->(michael)
MERGE (will)-[:KNOWS]->(chris)
MERGE (chris)-[:KNOWS]->(karin);

The following will return a stream of triples, with nodeId for each triangle: 

CALL algo.triangle.stream('Person','KNOWS')
YIELD nodeA,nodeB,nodeC

MATCH (a:Person) WHERE id(a) = nodeA
MATCH (b:Person) WHERE id(b) = nodeB
MATCH (c:Person) WHERE id(c) = nodeC

RETURN a.id AS nodeA, b.id AS nodeB, c.id AS nodeC

Table 5.22. Results
nodeA nodeB nodeC

Will

Michael

Chris

Will

Mark

Michael

Michael

Karin

Chris

We can see that there are KNOWS triangles containing "Will, Michael, and Chris", "Will, Mark, and Michael", and "Michael, Karin, and Chris". This means that everybody in the triangle knows each other.

The following will count the number of triangles that a node is member of, and write it back. It will return the total triangle count and average clustering coefficient of the given graph: 

CALL algo.triangleCount('Person', 'KNOWS',
  {concurrency:4, write:true, writeProperty:'triangles',clusteringCoefficientProperty:'coefficient'})
YIELD loadMillis, computeMillis, writeMillis, nodeCount, triangleCount, averageClusteringCoefficient;

The following will count the number of triangles that a node is member of, and return a stream with nodeId and triangleCount

CALL algo.triangleCount.stream('Person', 'KNOWS', {concurrency:4})
YIELD nodeId, triangles, coefficient

MATCH (p:Person) WHERE id(p) = nodeId

RETURN p.id AS name, triangles, coefficient
ORDER BY coefficient DESC

Table 5.23. Results
Name Triangles Coefficient

Karin

1

1

Mark

1

1

Chris

2

0.6666666666666666

Will

2

0.6666666666666666

Michael

3

0.3

Alice

0

0

We learn that Michael is part of the most triangles, but it’s Karin and Mark who are the best at introducing their friends - all of the people who know them, know each other!

5.5.4. Example usage

In graph theory, a clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. Evidence suggests that in most real-world networks, and in particular social networks, nodes tend to create tightly knit groups characterised by a relatively high density of ties; this likelihood tends to be greater than the average probability of a tie randomly established between two nodes.

We check if this holds true for Yelp’s social network of friends:

CALL algo.triangleCount('User', 'FRIEND',
  {concurrency:4, write:true, writeProperty:'triangles',clusteringCoefficientProperty:'coefficient'})
YIELD loadMillis, computeMillis, writeMillis, nodeCount, triangleCount, averageClusteringCoefficient;

Average clustering coefficient is 0.0523, which is really low for a social network. This indicates that groups of friends are not tightly knit together, but rather sparse. We can assume that users are not on Yelp for finding and creating friends, like Facebook for example, but rather something else, like finding good restaurant recommendations.

Local triangle count and clustering coefficient of nodes can be used as features in finding influencers in social networks.

5.5.5. Syntax

The following will return a stream of triples with nodeId for each triangle: 

CALL algo.triangle.stream(label:String, relationship:String, {concurrency:4})
YIELD nodeA, nodeB, nodeC

Table 5.24. Parameters
Name Type Default Optional Description

label

string

null

yes

The label to load from the graph. If null, load all nodes

relationship

string

null

yes

The relationship-type to load from the graph. If null, load all nodes

concurrency

int

available CPUs

yes

The number of concurrent threads

Table 5.25. Results
Name Type Description

nodeA

int

The ID of node in the given triangle

nodeB

int

The ID of node in the given triangle

nodeC

int

The ID of node in the given triangle

The following will count the number of triangles that a node is a member of, and return a stream with nodeId and triangleCount

CALL algo.triangleCount.stream(label:String, relationship:String, {concurrency:4})
YIELD nodeId, triangles

Table 5.26. Parameters
Name Type Default Optional Description

label

string

null

yes

The label to load from the graph. If null, load all nodes

relationship

string

null

yes

The relationship-type to load from the graph. If null, load all relationships

concurrency

int

available CPUs

yes

The number of concurrent threads

Table 5.27. Results
Name Type Description

nodeId

int

The ID of node

triangles

int

The number of triangles a node is member of

The following will count the number of triangles that a node is a member of, and write it back. It will return the total triangle count and average clustering coefficient of the given graph: 

CALL algo.triangleCount(label:String, relationship:String,
    {concurrency:4, write:true, writeProperty:'triangles', clusteringCoefficientProperty:'coefficient'})
YIELD loadMillis, computeMillis, writeMillis, nodeCount, triangleCount, averageClusteringCoefficient

Table 5.28. Parameters
Name Type Default Optional Description

label

string

null

yes

The label to load from the graph. If null, load all nodes

relationship

string

null

yes

The relationship-type to load from the graph. If null, load all relationships

concurrency

int

available CPUs

yes

The number of concurrent threads

write

boolean

true

yes

Specifies if the result should be written back as a node property

writeProperty

string

'triangles'

yes

The property name the number of triangles a node is member of is written to

clusteringCoefficientProperty

string

'coefficient'

yes

The property name clustering coefficient of the node is written to

Table 5.29. Results
Name Type Description

nodeCount

int

The number of nodes considered

loadMillis

int

Milliseconds for loading data

evalMillis

int

Milliseconds for running the algorithm

writeMillis

int

Milliseconds for writing result data back

triangleCount

int

The number of triangles in the given graph

averageClusteringCoefficient

float

The average clustering coefficient of the given graph

5.5.6. Cypher projection

If label and relationship-type are not selective enough to describe your subgraph to run the algorithm on, you can use Cypher statements to load or project subsets of your graph. This can also be used to run algorithms on a virtual graph.

Set graph:'cypher' in the config: 

CALL algo.triangleCount(
  'MATCH (p:Person) RETURN id(p) as id',
  'MATCH (p1:Person)-[:KNOWS]->(p2:Person) RETURN id(p1) as source,id(p2) as target',
  {concurrency:4, write:true, writeProperty:'triangle',graph:'cypher', clusteringCoefficientProperty:'coefficient'})
YIELD loadMillis, computeMillis, writeMillis, nodeCount, triangleCount, averageClusteringCoefficient

5.5.7. Graph type support

The Triangle Count algorithms support the following graph type:

  • ✓ undirected, unweighted