# FastRP and kNN example

In this example we consider a graph of products and customers, and we want to find new products to recommend for each customer. We want to use the K-Nearest Neighbors algorithm (kNN) to identify similar customers and base our product recommendations on that. In order to be able to leverage topological information about the graph in kNN, we will first create node embeddings using FastRP. These embeddings will then be the input to the kNN algorithm.

For each pair of similar customers, we can then recommend products that one customer has purchased but the other has not, using a simple Cypher query.

## 1. Graph creation

We will start by creating our graph of products and customers in the database. The `amount` relationship property represents the average weekly amount of money spent by a customer on a given product.

Consider the graph created by the following Cypher statement:
``````
CREATE
  (dan:Person {name: 'Dan'}),
  (annie:Person {name: 'Annie'}),
  (matt:Person {name: 'Matt'}),
  (jeff:Person {name: 'Jeff'}),
  (brie:Person {name: 'Brie'}),
  (elsa:Person {name: 'Elsa'}),

  (tomatoes:Product {name: 'Tomatoes'}),
  (cucumber:Product {name: 'Cucumber'}),
  (celery:Product {name: 'Celery'}),
  (kale:Product {name: 'Kale'}),
  (milk:Product {name: 'Milk'}),
  (chocolate:Product {name: 'Chocolate'})

  // The full statement also creates the BUYS relationships between
  // Person and Product nodes, each carrying the 'amount' property.
  // They are omitted from this excerpt.
``````
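Since the relationship part of the statement is not shown above, here is a sketch of the `BUYS` pattern it uses. The amount value below is illustrative only, not the original data:

``````
MATCH (dan:Person {name: 'Dan'}), (milk:Product {name: 'Milk'})
CREATE (dan)-[:BUYS {amount: 1.2}]->(milk)
``````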

The graph can be visualized in the following way:

Now we can proceed to project a graph which we can run the algorithms on.

Project a graph called 'purchases' and store it in the graph catalog:
``````
CALL gds.graph.project(
  'purchases',
  ['Person', 'Product'],
  {
    BUYS: {
      orientation: 'UNDIRECTED',
      properties: 'amount'
    }
  }
)
``````
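Before running any algorithms, we can optionally verify that the projection landed in the graph catalog as expected. A sketch, assuming a recent GDS version (the exact set of available YIELD columns varies between versions):

``````
CALL gds.graph.list('purchases')
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount
``````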

## 2. FastRP embedding

Now we run the FastRP algorithm to generate node embeddings that capture topological information from the graph. We choose to work with `embeddingDimension` set to 4 which is sufficient since our example graph is very small. The `iterationWeights` are chosen empirically to yield sensible results. Please see the syntax section of the FastRP documentation for more information on these parameters. Since we want to use the embeddings as input when we run kNN later we use FastRP’s `mutate` mode.

Create node embeddings using FastRP:
``````
CALL gds.fastRP.mutate(
  'purchases',
  {
    embeddingDimension: 4,
    randomSeed: 42,
    mutateProperty: 'embedding',
    relationshipWeightProperty: 'amount',
    iterationWeights: [0.8, 1, 1, 1]
  }
)
YIELD nodePropertiesWritten
``````
Table 1. Results

| nodePropertiesWritten |
| --- |
| 13 |
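If we want to sanity-check the embeddings themselves, FastRP's `stream` mode returns them directly without modifying the projection. A sketch using the same configuration as above, minus `mutateProperty`:

``````
CALL gds.fastRP.stream(
  'purchases',
  {
    embeddingDimension: 4,
    randomSeed: 42,
    relationshipWeightProperty: 'amount',
    iterationWeights: [0.8, 1, 1, 1]
  }
)
YIELD nodeId, embedding
RETURN gds.util.asNode(nodeId).name AS name, embedding
``````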

## 3. Similarities with kNN

Now we can run kNN to identify similar nodes by using the node embeddings that we generated with FastRP as `nodeProperties`. Since we are working with a small graph, we can set `sampleRate` to 1 and `deltaThreshold` to 0 without having to worry about long computation times. The `concurrency` parameter is set to 1 (along with the fixed `randomSeed`) in order to get a deterministic result. Please see the syntax section of the kNN documentation for more information on these parameters. Note that we use the algorithm’s `write` mode to write the properties and relationships back to our database, so that we can analyze them later using Cypher.

Run kNN with FastRP node embeddings as input:
``````
CALL gds.knn.write('purchases', {
  topK: 2,
  nodeProperties: ['embedding'],
  randomSeed: 42,
  concurrency: 1,
  sampleRate: 1.0,
  deltaThreshold: 0.0,
  writeRelationshipType: 'SIMILAR',
  writeProperty: 'score'
})
YIELD nodesCompared, relationshipsWritten, similarityDistribution
RETURN nodesCompared, relationshipsWritten, similarityDistribution.mean AS meanSimilarity
``````
Table 2. Results

| nodesCompared | relationshipsWritten | meanSimilarity |
| --- | --- | --- |
| 13 | 26 | 0.917060998769907 |

As we can see, the mean similarity between nodes is quite high. This is because our example graph is small and contains no long paths between nodes, so many of the FastRP node embeddings end up similar to each other.

## 4. Results exploration

Let us now inspect the results of our kNN call using Cypher. We can use the `SIMILAR` relationship type to filter for the relationships we are interested in. And since we only care about similarities between people for our product recommendation engine, we make sure to match only nodes with the `Person` label.

List pairs of people that are similar:
``````
MATCH (n:Person)-[r:SIMILAR]->(m:Person)
RETURN n.name AS person1, m.name AS person2, r.score AS similarity
ORDER BY similarity DESCENDING, person1, person2
``````
Table 3. Results

| person1 | person2 | similarity |
| --- | --- | --- |
| "Annie" | "Matt" | 0.983087003231049 |
| "Matt" | "Annie" | 0.983087003231049 |
| "Dan" | "Elsa" | 0.980300545692444 |
| "Elsa" | "Dan" | 0.980300545692444 |
| "Jeff" | "Annie" | 0.815471172332764 |

Our kNN results indicate, among other things, that the `Person` nodes named "Annie" and "Matt" are very similar. Looking at the `BUYS` relationships for these two nodes, we can see that this conclusion makes sense: both buy three products, two of which ("Cucumber" and "Tomatoes") they have in common, and in similar amounts. This gives us confidence in our approach.
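This overlap can be checked directly in the database. A sketch of such a query, assuming the `BUYS` relationships carry the `amount` property described earlier:

``````
MATCH (a:Person {name: 'Annie'})-[r1:BUYS]->(p:Product)<-[r2:BUYS]-(m:Person {name: 'Matt'})
RETURN p.name AS product, r1.amount AS annieAmount, r2.amount AS mattAmount
``````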

## 5. Making recommendations

Using the information that the `Person` nodes named "Annie" and "Matt" are similar, we can make product recommendations for each of them. Since they are similar, we can assume that a product purchased by only one of them may also be of interest to the other. By this principle we can derive product recommendations for the `Person` named "Matt" using a simple Cypher query.

Product recommendations for `Person` node with name "Matt":
``````
MATCH (:Person {name: "Matt"})-->(p1:Product)
WITH collect(p1) AS products
MATCH (:Person {name: "Annie"})-->(p2:Product)
WHERE NOT p2 IN products
RETURN p2.name AS recommendation
``````
Table 4. Results
recommendation

"Kale"

Indeed, "Kale" is the one product that the `Person` named "Annie" buys that is not also purchased by the `Person` named "Matt".

## 6. Conclusion

Using two GDS algorithms and some basic Cypher we were easily able to derive some sensible product recommendations for a customer in our small example.

To ensure that every customer in our graph gets similarities to other customers from kNN, we could experiment with increasing the `topK` parameter.
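The per-pair recommendation query from section 5 can also be generalized to all customers at once, using the `SIMILAR` relationships that kNN wrote back to the database. A sketch, assuming the purchase relationships are typed `BUYS` as in the original graph:

``````
MATCH (n:Person)-[:SIMILAR]->(m:Person)-[:BUYS]->(p:Product)
WHERE NOT (n)-[:BUYS]->(p)
RETURN n.name AS customer, collect(DISTINCT p.name) AS recommendations
ORDER BY customer
``````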