Greedy

This section describes the Greedy influence maximization algorithm in the Neo4j Graph Data Science library.

This algorithm is in the alpha tier. For more information on algorithm tiers, see Algorithms.

This topic includes:

1. Introduction

The Greedy algorithm for influence maximization aims to find k nodes that maximize the expected spread of influence in a network. It simulates the influence spread using the Independent Cascade model, which calculates the expected spread by taking the average spread over the mc Monte-Carlo simulations. In the propagation process, a node is influenced in case that a uniform random draw is less than the probability p.

Kempe et al. 2003 introduced the Greedy algorithm in their study Maximizing the Spread of Influence through a Social Network to deal with the NP-hard problem of influence maximization. The Greedy algorithm successively selecting the node within the maximum marginal gain approximation in polynomial time. For large networks CELF algorithm should be used.

2. Syntax

This algorithm is in the alpha tier. For more information on algorithm tiers, see Algorithms.

Example 1. Greedy syntax per mode
Run Greedy in stream mode on a named graph.
CALL gds.alpha.influenceMaximization.greedy.stream(
  graphName: String,
  configuration: Map
)
YIELD
  nodeId: Integer,
  spread : Float
Table 1. Parameters
Name Type Default Optional Description

graphName

String

n/a

no

The name of a graph stored in the catalog.

configuration

Map

{}

yes

Configuration for algorithm-specifics and/or graph filtering.

Table 2. General configuration for algorithm execution on a named graph.
Name Type Default Optional Description

nodeLabels

String[]

['*']

yes

Filter the named graph using the given node labels.

relationshipTypes

String[]

['*']

yes

Filter the named graph using the given relationship types.

concurrency

Integer

4

yes

The number of concurrent threads used for running the algorithm.

Table 3. Algorithm specific configuration
Name Type Default Optional Description

seedSetSize

Integer

n/a

no

The number of nodes that maximize the expected spread in the network.

monteCarloSimulations

Integer

1000

yes

The number of Monte-Carlo simulations.

propagationProbability

Float

0.1

yes

The probability of a node being activated by an active neighbour node.

Table 4. Results
Name Type Description

nodeId

Integer

Node ID.

spread

Float

The spread gained by selecting the node.

Run Greedy in stats mode on a named graph.
CALL gds.alpha.influenceMaximization.greedy.stats(
  graphName: String,
  configuration: Map
)
YIELD
  nodes: Integer,
  computeMillis: Integer,
Table 5. Parameters
Name Type Default Optional Description

graphName

String

n/a

no

The name of a graph stored in the catalog.

configuration

Map

{}

yes

Configuration for algorithm-specifics and/or graph filtering.

Table 6. General configuration for algorithm execution on a named graph.
Name Type Default Optional Description

nodeLabels

String[]

['*']

yes

Filter the named graph using the given node labels.

relationshipTypes

String[]

['*']

yes

Filter the named graph using the given relationship types.

concurrency

Integer

4

yes

The number of concurrent threads used for running the algorithm.

Table 7. Algorithm specific configuration
Name Type Default Optional Description

seedSetSize

Integer

n/a

no

The number of nodes that maximize the expected spread in the network.

monteCarloSimulations

Integer

1000

yes

The number of Monte-Carlo simulations.

propagationProbability

Float

0.1

yes

The probability of a node being activated by an active neighbour node.

Table 8. Results
Name Type Description

nodes

Integer

The number of nodes in the graph.

computeMillis

Integer

Milliseconds for running the algorithm.

3. Examples

In this section we will show examples of running the Greedy algorithm on a concrete graph. The intention is to illustrate what the results look like and to provide a guide in how to make use of the algorithm in a real setting. We will do this on a small social network graph of a handful nodes connected in a particular pattern. The example graph looks like this:

Visualization of the example graph
The following Cypher statement will create the example graph in the Neo4j database:
CREATE
  (a:Person {name: 'Jimmy'}),
  (b:Person {name: 'Jack'}),
  (c:Person {name: 'Alice'}),
  (d:Person {name: 'Ceri'}),
  (e:Person {name: 'Mohammed'}),
  (f:Person {name: 'Michael'}),
  (g:Person {name: 'Ethan'}),
  (h:Person {name: 'Lara'}),
  (i:Person {name: 'Amir'}),
  (j:Person {name: 'Willie'}),

  (b)-[:FRIEND_OF]->(c),
  (c)-[:FRIEND_OF]->(a),
  (c)-[:FRIEND_OF]->(g),
  (c)-[:FRIEND_OF]->(h),
  (c)-[:FRIEND_OF]->(i),
  (c)-[:FRIEND_OF]->(j),
  (d)-[:FRIEND_OF]->(g),
  (f)-[:FRIEND_OF]->(e),
  (f)-[:FRIEND_OF]->(g),
  (g)-[:FRIEND_OF]->(a),
  (g)-[:FRIEND_OF]->(b),
  (g)-[:FRIEND_OF]->(h),
  (g)-[:FRIEND_OF]->(e),
  (h)-[:FRIEND_OF]->(i);

In the example, we will use the Greedy algorithm to find k nodes subset.

The following statement will create the graph and store it in the graph catalog.
CALL gds.graph.create(
  'myGraph',
  'Person',
  'FRIEND_OF'
);

In the following examples we will demonstrate using the Greedy algorithm on this graph.

3.1. Stream

In the stream execution mode, the algorithm returns the spread for each node. This allows us to inspect the results directly or post-process them in Cypher without any side effects.

For more details on the stream mode in general, see Stream.

The following will run the algorithm, and stream results:
CALL gds.alpha.influenceMaximization.greedy.stream('myGraph', {seedSetSize: 3, concurrency: 4})
YIELD nodeId, spread
RETURN gds.util.asNode(nodeId).name AS Name, spread
ORDER BY spread ASC
Table 9. Results
Name spread

"Alice"

1.519

"Ethan"

2.701

"Michael"

3.8

3 rows