Approximate Maximum k-cut

This section describes the Approximate Maximum k-cut algorithm in the Neo4j Graph Data Science library.

1. Introduction

A k-cut of a graph is an assignment of its nodes into k disjoint communities. So for example a 2-cut of a graph with nodes a,b,c,d could be the communities {a,b,c} and {d}.

A Maximum k-cut is a k-cut such that the total weight of relationships between nodes from different communities in the k-cut is maximized. That is, a k-cut that maximizes the sum of weights of relationships whose source and target nodes are assigned to different communities in the k-cut. Suppose in the simple a,b,c,d node set example above we only had one relationship b → c, and it was of weight 1.0. The 2-cut we outlined above would then not be a maximum 2-cut (with a cut cost of 0.0), whereas for example the 2-cut with communities {a,b} and {c,d} would be one (with a cut cost of 1.0).

Maximum k-cut is the same as Maximum Cut when k = 2.

This algorithm is in the alpha tier. For more information on algorithm tiers, see Algorithms.

1.1. Applications

Finding the maximum k-cut for a graph has several known applications, for example it is used to:

  • analyze protein interaction

  • design circuit (VLSI) layouts

  • solve wireless communication problems

  • analyze cryptocurrency transaction patterns

  • design computer networks

1.2. Approximation

In practice, finding the best cut is not feasible for larger graphs and only an approximation can be computed in reasonable time.

The approximate heuristic algorithm implemented in GDS is a parallelized GRASP style algorithm optionally enhanced (via config) with variable neighborhood search (VNS).

For detailed information about a serial version of the algorithm, with a slightly different construction phase, when k = 2 see GRASP+VNR in the paper:

To see how the algorithm above performs in terms of solution quality compared to other algorithms when k = 2 see FES02GV in the paper:

By the stochastic nature of the algorithm, the results it yields will not be deterministic unless running single-threaded (concurrency = 1) and using the same random seed (randomSeed = SOME_FIXED_VALUE).

2. Tuning the algorithm parameters

There are two important algorithm specific parameters which lets you trade solution quality for shorter runtime.

2.1. Iterations

GRASP style algorithms are iterative by nature. Every iteration they run the same well-defined steps to derive a solution, but each time with a different random seed yielding solutions that (highly likely) are different too. In the end the highest scoring solution is picked as the winner.

2.2. VNS max neighborhood order

Variable neighborhood search (VNS) works by slightly perturbing a locally optimal solution derived from the previous steps in an iteration of the algorithm, followed by locally optimizing this perturbed solution. Perturb in this case means to randomly move some nodes from their current (locally optimal) community to another community.

VNS will in turn move 1,2,…​,vnsMaxNeighborhoodOrder random nodes and using each of the resulting solutions try to find a new locally optimal solution that’s better. This means that although potentially better solutions can be derived using VNS it will take more time, and additionally some more memory will be needed to temporarily store the perturbed solutions.

By default, VNS is not used (vnsMaxNeighborhoodOrder = 0). To use it, experimenting with a maximum order equal to 20 is a good place to start.

3. Syntax

This section covers the syntax used to execute the Approximate Maximum k-cut algorithm in each of its execution modes. We are describing the named graph variant of the syntax. To learn more about general syntax variants, see Syntax overview.

Example 1. Approximate Maximum k-cut syntax per mode
Run Approximate Maximum k-cut in stream mode on a named graph.
CALL gds.alpha.maxkcut.stream(
  graphName: String,
  configuration: Map
) YIELD
  nodeId: Integer,
  communityId: Integer
Table 1. Parameters
Name Type Default Optional Description

graphName

String

n/a

no

The name of a graph stored in the catalog.

configuration

Map

{}

yes

Configuration for algorithm-specifics and/or graph filtering.

Table 2. General configuration for algorithm execution on a named graph.
Name Type Default Optional Description

nodeLabels

List of String

['*']

yes

Filter the named graph using the given node labels.

relationshipTypes

List of String

['*']

yes

Filter the named graph using the given relationship types.

concurrency

Integer

4

yes

The number of concurrent threads used for running the algorithm.

Table 3. Algorithm specific configuration
Name Type Default Optional Description

k

Integer

2

yes

The number of disjoint communities the nodes will be divided into.

iterations

Integer

8

yes

The number of iterations the algorithm will run before returning the best solution among all the iterations.

vnsMaxNeighborhoodOrder

Integer

0 (VNS off)

yes

The maximum number of nodes VNS will swap when perturbing solutions.

randomSeed

Integer

n/a

yes

A random seed which is used for all randomness in the computation. Requires concurrency = 1.

relationshipWeightProperty

String

null

yes

If set, the values stored at the given property are used as relationship weights during the computation. If not set, the graph is considered unweighted.

Table 4. Results
Name Type Description

nodeId

Integer

Node ID.

communityId

Integer

Community ID.

Run Approximate Maximum k-cut in mutate mode on a named graph.
CALL gds.alpha.maxkcut.mutate(
  graphName: String,
  configuration: Map
) YIELD
  cutCost: Float,
  createMillis: Integer,
  computeMillis: Integer,
  postProcessingMillis: Integer,
  mutateMillis: Integer,
  nodePropertiesWritten: Integer,
  configuration: Map
Table 5. Parameters
Name Type Default Optional Description

graphName

String

n/a

no

The name of a graph stored in the catalog.

configuration

Map

{}

yes

Configuration for algorithm-specifics and/or graph filtering.

Table 6. General configuration for algorithm execution on a named graph.
Name Type Default Optional Description

nodeLabels

List of String

['*']

yes

Filter the named graph using the given node labels.

relationshipTypes

List of String

['*']

yes

Filter the named graph using the given relationship types.

concurrency

Integer

4

yes

The number of concurrent threads used for running the algorithm.

mutateProperty

String

n/a

no

The node property in the GDS graph to which the approximate maximum k-cut is written.

Table 7. Algorithm specific configuration
Name Type Default Optional Description

k

Integer

2

yes

The number of disjoint communities the nodes will be divided into.

iterations

Integer

8

yes

The number of iterations the algorithm will run before returning the best solution among all the iterations.

vnsMaxNeighborhoodOrder

Integer

0 (VNS off)

yes

The maximum number of nodes VNS will swap when perturbing solutions.

randomSeed

Integer

n/a

yes

A random seed which is used for all randomness in the computation. Requires concurrency = 1.

relationshipWeightProperty

String

null

yes

If set, the values stored at the given property are used as relationship weights during the computation. If not set, the graph is considered unweighted.

Table 8. Results
Name Type Description

cutCost

Float

Sum of weights of all relationships connecting nodes from different communities.

createMillis

Integer

Milliseconds for creating the graph.

computeMillis

Integer

Milliseconds for running the algorithm.

postProcessingMillis

Integer

Milliseconds for computing the statistics.

mutateMillis

Integer

Milliseconds for adding properties to the in-memory graph.

nodePropertiesWritten

Integer

Number of properties added to the in-memory graph.

configuration

Map

Configuration used for running the algorithm.

4. Examples

In this section we will show examples of running the Approximate Maximum k-cut algorithm on a concrete graph. The intention is to illustrate what the results look like and to provide a guide in how to make use of the algorithm in a real setting. We will do this on a small Bitcoin transactions graph of a handful nodes connected in a particular pattern. The example graph looks like this:

Visualization of the example graph
The following Cypher statement will create the example graph in the Neo4j database:
CREATE
  (alice:Person {name: 'Alice'}),
  (bridget:Person {name: 'Bridget'}),
  (charles:Person {name: 'Charles'}),
  (doug:Person {name: 'Doug'}),
  (eric:Person {name: 'Eric'}),
  (fiona:Person {name: 'Fiona'}),
  (george:Person {name: 'George'}),
  (alice)-[:TRANSACTION {value: 81.0}]->(bridget),
  (alice)-[:TRANSACTION {value: 7.0}]->(doug),
  (bridget)-[:TRANSACTION {value: 1.0}]->(doug),
  (bridget)-[:TRANSACTION {value: 1.0}]->(eric),
  (bridget)-[:TRANSACTION {value: 1.0}]->(fiona),
  (bridget)-[:TRANSACTION {value: 1.0}]->(george),
  (charles)-[:TRANSACTION {value: 45.0}]->(bridget),
  (charles)-[:TRANSACTION {value: 3.0}]->(eric),
  (doug)-[:TRANSACTION {value: 3.0}]->(charles),
  (doug)-[:TRANSACTION {value: 1.0}]->(bridget),
  (eric)-[:TRANSACTION {value: 1.0}]->(bridget),
  (fiona)-[:TRANSACTION {value: 3.0}]->(alice),
  (fiona)-[:TRANSACTION {value: 1.0}]->(bridget),
  (george)-[:TRANSACTION {value: 1.0}]->(bridget),
  (george)-[:TRANSACTION {value: 4.0}]->(charles)

With the graph in Neo4j we can now project it into the graph catalog to prepare it for algorithm execution. We do this using a native projection targeting the Person nodes and the TRANSACTION relationships.

The following statement will create a graph store it in the graph catalog under the name 'myGraph'.
CALL gds.graph.create(
  'myGraph',
  'Person',
  {
    TRANSACTION: {
      properties: ['value']
    }
  }
)

4.1. Memory Estimation

First off, we will estimate the cost of running the algorithm using the estimate procedure. This can be done with any execution mode. We will use the mutate mode in this example. Estimating the algorithm is useful to understand the memory impact that running the algorithm on your graph will have. When you later actually run the algorithm in one of the execution modes the system will perform an estimation. If the estimation shows that there is a very high probability of the execution going over its memory limitations, the execution is prohibited. To read more about this, see Automatic estimation and execution blocking.

For more details on estimate in general, see Memory Estimation.

The following will estimate the memory requirements for running the algorithm:
CALL gds.alpha.maxkcut.mutate.estimate('myGraph', {mutateProperty: 'community'})
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory
Table 9. Results
nodeCount relationshipCount bytesMin bytesMax requiredMemory

7

15

488

488

"488 Bytes"

4.2. Mutate

The mutate execution mode extends the stats mode with an important side effect: updating the named graph with a new node property containing the approximate maximum k-cut for that node. The name of the new property is specified using the mandatory configuration parameter mutateProperty. The result is a single summary row, similar to stats, but with some additional metrics. The mutate mode is especially useful when multiple algorithms are used in conjunction.

For more details on the mutate mode in general, see Mutate.

The following will run the algorithm in mutate mode:
CALL gds.alpha.maxkcut.mutate('myGraph', {mutateProperty: 'community'})
YIELD cutCost, nodePropertiesWritten
Table 10. Results
cutCost nodePropertiesWritten

13.0

7

We can see that when relationship weight is not taken into account we derive a cut into two (since we didn’t override the default k = 2) communities of cost 13.0. The total cost is represented by the cutCost column here. This is the value we want to be as high as possible. Additionally, the graph 'myGraph' now has a node property community which stores the community to which each node belongs.

To inspect which community each node belongs to we can stream node properties.

Stream node properties:
CALL gds.graph.streamNodeProperty('myGraph', 'community')
YIELD nodeId, propertyValue
RETURN gds.util.asNode(nodeId).name as name, propertyValue AS community
Table 11. Results
name community

"Alice"

0

"Bridget"

0

"Charles"

0

"Doug"

1

"Eric"

1

"Fiona"

1

"George"

1

Looking at our graph topology we can see that there are no relationships between the nodes of community 1, and two relationships between the nodes of community 0, namely Alice → Bridget and Charles → Bridget. However, since there are a total of eight relationships between Bridget and nodes of community 1, and our graph is unweighted assigning Bridget to community 1 would not yield a cut of a higher total weight. Thus, since the number of relationships connecting nodes of different communities greatly outnumber the number of relationships connecting nodes of the same community it seems like a good solution. In fact, this is the maximum 2-cut for this graph.

Because of the inherent randomness in the Approximate Maximum k-Cut algorithm (unless having concurrency = 1 and fixed randomSeed), running it another time might yield a different solution. For our case here it would be equally plausible to get the inverse solution, i.e. when our community 0 nodes are mapped to community 1 instead, and vice versa. Note however, that for that solution the cut cost would remain the same.

4.3. Mutate with relationship weights

In this example we will have a look at how adding relationship weight can affect our solution.

The following will run the algorithm in mutate mode, diving our nodes into two communities once again:
CALL gds.alpha.maxkcut.mutate(
   'myGraph',
   {
        relationshipWeightProperty: 'value',
        mutateProperty: 'weightedCommunity'
    }
)
YIELD cutCost, nodePropertiesWritten
Table 12. Results
cutCost nodePropertiesWritten

146.0

7

Since the value properties on our TRANSACTION relationships were all at least 1.0 and several of a larger value it’s not surprising that we obtain a cut with a larger cost in the weighted case.

Let us now stream node properties to once again inspect the node community distribution.

Stream node properties:
CALL gds.graph.streamNodeProperties('myGraph', 'weightedCommunity')
YIELD nodeId, propertyValue
RETURN gds.util.asNode(nodeId).name as name, propertyValue AS weightedCommunity
Table 13. Results
name weightedCommunity

"Alice"

0

"Bridget"

1

"Charles"

0

"Doug"

1

"Eric"

1

"Fiona"

1

"George"

1

Comparing this result with that of unweighted case we can see that Bridget has moved to another community but the output is otherwise the same. Indeed, this makes sense by looking at our graph. Bridget is connected to nodes of community 1 by eight relationships, but these relationships all have weight 1.0. And although Bridget is only connected to two community 0 nodes, these relationships are of weight 81.0 and 45.0. Moving Bridget back to community 0 would lower the total cut cost of 81.0 + 45.0 - 8 * 1.0 = 118.0. Hence, it does make sense that Bridget is now in community 1. In fact, this is the maximum 2-cut in the weighted case.

Because of the inherent randomness in the Approximate Maximum k-Cut algorithm (unless having concurrency = 1 and fixed randomSeed), running it another time might yield a different solution. For our case here it would be equally plausible to get the inverse solution, i.e. when our community 0 nodes are mapped to community 1 instead, and vice versa. Note however, that for that solution the cut cost would remain the same.

4.4. Stream

In the stream execution mode, the algorithm returns the approximate maximum k-cut for each node. This allows us to inspect the results directly or post-process them in Cypher without any side effects.

For more details on the stream mode in general, see Stream.

The following will run the algorithm in stream mode using default configuration parameters:
CALL gds.alpha.maxkcut.stream('myGraph')
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).name AS name, communityId
Table 14. Results
name communityId

"Alice"

0

"Bridget"

0

"Charles"

0

"Doug"

1

"Eric"

1

"Fiona"

1

"George"

1

We can see that the result is what we expect, namely the same as in the mutate unweighted example.

Because of the inherent randomness in the Approximate Maximum k-Cut algorithm (unless having concurrency = 1 and fixed randomSeed), running it another time might yield a different solution. For our case here it would be equally plausible to get the inverse solution, i.e. when our community 0 nodes are mapped to community 1 instead, and vice versa. Note however, that for that solution the cut cost would remain the same.