CELF

This section describes the Cost Effective Lazy Forward (CELF) influence maximization algorithm in the Neo4j Graph Data Science library.

This algorithm is in the alpha tier. For more information on algorithm tiers, see Algorithms.

This topic includes:

1. Introduction

The CELF algorithm for influence maximization aims to find k nodes that maximize the expected spread of influence in the network. It simulates the influence spread using the Independent Cascade model, which calculates the expected spread by taking the average spread over the mc Monte-Carlo simulations. In the propagation process, a node is influenced in case that a uniform random draw is less than the probability p.

Leskovec et al. 2007 introduced the CELF algorithm in their study Cost-effective Outbreak Detection in Networks to deal with the NP-hard problem of influence maximization. The CELF algorithm is based on a "lazy-forward" optimization. Τhe CELF algorithm dramatically improves the efficiency of the Greedy algorithm and should be preferred for large networks.

2. Syntax

This algorithm is in the alpha tier. For more information on algorithm tiers, see Algorithms.

CELF syntax per mode
Run CELF in stream mode on a named graph.
CALL gds.alpha.influenceMaximization.celf.stream(
  graphName: String,
  configuration: Map
)
YIELD
  nodeId: Integer,
  spread: Float
Table 1. Parameters
Name Type Default Optional Description

graphName

String

n/a

no

The name of a graph stored in the catalog.

configuration

Map

{}

yes

Configuration for algorithm-specifics and/or graph filtering.

Table 2. General configuration for algorithm execution on a named graph.
Name Type Default Optional Description

nodeLabels

List of String

['*']

yes

Filter the named graph using the given node labels.

relationshipTypes

List of String

['*']

yes

Filter the named graph using the given relationship types.

concurrency

Integer

4

yes

The number of concurrent threads used for running the algorithm.

Table 3. Algorithm specific configuration
Name Type Default Optional Description

seedSetSize

Integer

n/a

no

The number of nodes that maximize the expected spread in the network.

monteCarloSimulations

Integer

1000

yes

The number of Monte-Carlo simulations.

propagationProbability

Float

0.1

yes

The probability of a node being activated by an active neighbour node.

Table 4. Results
Name Type Description

nodeId

Integer

Node ID.

spread

Float

The spread gained by selecting the node.

Run CELF in stats mode on a named graph.
CALL gds.alpha.influenceMaximization.celf.stats(
  graphName: String,
  configuration: Map
)
YIELD
  nodes: Integer,
  computeMillis: Integer,
Table 5. Parameters
Name Type Default Optional Description

graphName

String

n/a

no

The name of a graph stored in the catalog.

configuration

Map

{}

yes

Configuration for algorithm-specifics and/or graph filtering.

Table 6. General configuration for algorithm execution on a named graph.
Name Type Default Optional Description

nodeLabels

List of String

['*']

yes

Filter the named graph using the given node labels.

relationshipTypes

List of String

['*']

yes

Filter the named graph using the given relationship types.

concurrency

Integer

4

yes

The number of concurrent threads used for running the algorithm.

Table 7. Algorithm specific configuration
Name Type Default Optional Description

seedSetSize

Integer

n/a

no

The number of nodes that maximize the expected spread in the network.

monteCarloSimulations

Integer

1000

yes

The number of Monte-Carlo simulations.

propagationProbability

Float

0.1

yes

The probability of a node being activated by an active neighbour node.

Table 8. Results
Name Type Description

nodes

Integer

The number of nodes in the graph.

computeMillis

Integer

Milliseconds for running the algorithm.

3. Examples

In this section we will show examples of running the CELF algorithm on a concrete graph. The intention is to illustrate what the results look like and to provide a guide in how to make use of the algorithm in a real setting. We will do this on a small social network graph of a handful nodes connected in a particular pattern. The example graph looks like this:

Visualization of the example graph
The following Cypher statement will create the example graph in the Neo4j database:
CREATE
  (a:Person {name: 'Jimmy'}),
  (b:Person {name: 'Jack'}),
  (c:Person {name: 'Alice'}),
  (d:Person {name: 'Ceri'}),
  (e:Person {name: 'Mohammed'}),
  (f:Person {name: 'Michael'}),
  (g:Person {name: 'Ethan'}),
  (h:Person {name: 'Lara'}),
  (i:Person {name: 'Amir'}),
  (j:Person {name: 'Willie'}),

  (b)-[:FRIEND_OF]->(c),
  (c)-[:FRIEND_OF]->(a),
  (c)-[:FRIEND_OF]->(g),
  (c)-[:FRIEND_OF]->(h),
  (c)-[:FRIEND_OF]->(i),
  (c)-[:FRIEND_OF]->(j),
  (d)-[:FRIEND_OF]->(g),
  (f)-[:FRIEND_OF]->(e),
  (f)-[:FRIEND_OF]->(g),
  (g)-[:FRIEND_OF]->(a),
  (g)-[:FRIEND_OF]->(b),
  (g)-[:FRIEND_OF]->(h),
  (g)-[:FRIEND_OF]->(e),
  (h)-[:FRIEND_OF]->(i);

In the example, we will use the CELF algorithm to find k nodes subset.

The following statement will create the graph and store it in the graph catalog.
CALL gds.graph.create(
  'myGraph',
  'Person',
  'FRIEND_OF'
);

In the following examples we will demonstrate using the CELF algorithm on this graph.

3.1. Stream

In the stream execution mode, the algorithm returns the spread for each node. This allows us to inspect the results directly or post-process them in Cypher without any side effects.

For more details on the stream mode in general, see Stream.

The following will run the algorithm, and stream results:
CALL gds.alpha.influenceMaximization.celf.stream('myGraph', {seedSetSize: 3, concurrency: 4})
YIELD nodeId, spread
RETURN gds.util.asNode(nodeId).name AS Name, spread
ORDER BY spread ASC
Table 9. Results
Name spread

"Alice"

1.519

"Ethan"

2.701

"Michael"

3.8