CELF
This algorithm is in the alpha tier. For more information on algorithm tiers, see Graph algorithms.
This topic includes:
1. Introduction
The CELF algorithm for influence maximization aims to find k
nodes that maximize the expected spread of influence in the network.
It simulates the influence spread using the Independent Cascade model, which calculates the expected spread by taking the average spread over the mc
Monte-Carlo simulations.
In the propagation process, a node is influenced in case that a uniform random draw is less than the probability p
.
Leskovec et al. 2007 introduced the CELF algorithm in their study Cost-effective Outbreak Detection in Networks to deal with the NP-hard problem of influence maximization. The CELF algorithm is based on a "lazy-forward" optimization. Τhe CELF algorithm dramatically improves the efficiency of the Greedy algorithm and should be preferred for large networks.
2. Syntax
This algorithm is in the alpha tier. For more information on algorithm tiers, see Graph algorithms.
CALL gds.alpha.influenceMaximization.celf.stream(
graphName: String,
configuration: Map
)
YIELD
nodeId: Integer,
spread: Float
Name | Type | Default | Optional | Description |
---|---|---|---|---|
graphName |
String |
|
no |
The name of a graph stored in the catalog. |
configuration |
Map |
|
yes |
Configuration for algorithm-specifics and/or graph filtering. |
Name | Type | Default | Optional | Description |
---|---|---|---|---|
List of String |
|
yes |
Filter the named graph using the given node labels. |
|
List of String |
|
yes |
Filter the named graph using the given relationship types. |
|
Integer |
|
yes |
The number of concurrent threads used for running the algorithm. |
|
String |
|
yes |
An ID that can be provided to more easily track the algorithm’s progress. |
|
seedSetSize |
Integer |
|
no |
The number of nodes that maximize the expected spread in the network. |
monteCarloSimulations |
Integer |
|
yes |
The number of Monte-Carlo simulations. |
propagationProbability |
Float |
|
yes |
The probability of a node being activated by an active neighbour node. |
Name | Type | Description |
---|---|---|
nodeId |
Integer |
Node ID. |
spread |
Float |
The spread gained by selecting the node. |
CALL gds.alpha.influenceMaximization.celf.stats(
graphName: String,
configuration: Map
)
YIELD
nodes: Integer,
computeMillis: Integer,
Name | Type | Default | Optional | Description |
---|---|---|---|---|
graphName |
String |
|
no |
The name of a graph stored in the catalog. |
configuration |
Map |
|
yes |
Configuration for algorithm-specifics and/or graph filtering. |
Name | Type | Default | Optional | Description |
---|---|---|---|---|
List of String |
|
yes |
Filter the named graph using the given node labels. |
|
List of String |
|
yes |
Filter the named graph using the given relationship types. |
|
Integer |
|
yes |
The number of concurrent threads used for running the algorithm. |
|
String |
|
yes |
An ID that can be provided to more easily track the algorithm’s progress. |
|
seedSetSize |
Integer |
|
no |
The number of nodes that maximize the expected spread in the network. |
monteCarloSimulations |
Integer |
|
yes |
The number of Monte-Carlo simulations. |
propagationProbability |
Float |
|
yes |
The probability of a node being activated by an active neighbour node. |
Name | Type | Description |
---|---|---|
nodes |
Integer |
The number of nodes in the graph. |
computeMillis |
Integer |
Milliseconds for running the algorithm. |
3. Examples
In this section we will show examples of running the CELF algorithm on a concrete graph. The intention is to illustrate what the results look like and to provide a guide in how to make use of the algorithm in a real setting. We will do this on a small social network graph of a handful nodes connected in a particular pattern. The example graph looks like this:
CREATE
(a:Person {name: 'Jimmy'}),
(b:Person {name: 'Jack'}),
(c:Person {name: 'Alice'}),
(d:Person {name: 'Ceri'}),
(e:Person {name: 'Mohammed'}),
(f:Person {name: 'Michael'}),
(g:Person {name: 'Ethan'}),
(h:Person {name: 'Lara'}),
(i:Person {name: 'Amir'}),
(j:Person {name: 'Willie'}),
(b)-[:FRIEND_OF]->(c),
(c)-[:FRIEND_OF]->(a),
(c)-[:FRIEND_OF]->(g),
(c)-[:FRIEND_OF]->(h),
(c)-[:FRIEND_OF]->(i),
(c)-[:FRIEND_OF]->(j),
(d)-[:FRIEND_OF]->(g),
(f)-[:FRIEND_OF]->(e),
(f)-[:FRIEND_OF]->(g),
(g)-[:FRIEND_OF]->(a),
(g)-[:FRIEND_OF]->(b),
(g)-[:FRIEND_OF]->(h),
(g)-[:FRIEND_OF]->(e),
(h)-[:FRIEND_OF]->(i);
In the example, we will use the CELF algorithm to find k
nodes subset.
CALL gds.graph.project(
'myGraph',
'Person',
'FRIEND_OF'
);
In the following examples we will demonstrate using the CELF algorithm on this graph.
3.1. Stream
In the stream
execution mode, the algorithm returns the spread for each node.
This allows us to inspect the results directly or post-process them in Cypher without any side effects.
For more details on the stream
mode in general, see Stream.
CALL gds.alpha.influenceMaximization.celf.stream('myGraph', {seedSetSize: 3, concurrency: 4})
YIELD nodeId, spread
RETURN gds.util.asNode(nodeId).name AS Name, spread
ORDER BY spread ASC
Name | spread |
---|---|
"Alice" |
1.519 |
"Ethan" |
2.701 |
"Michael" |
3.8 |
Was this page helpful?