# FastRP and kNN example

In this example we consider a graph of products and customers, and we want to find new products to recommend for each customer. We want to use the K-Nearest Neighbors algorithm (kNN) to identify similar customers and base our product recommendations on that. In order to be able to leverage topological information about the graph in kNN, we will first create node embeddings using FastRP. These embeddings will then be the input to the kNN algorithm.

For each pair of similar customers, we can then recommend products that one customer has purchased but the other has not, using a simple Cypher query.

## 1. Graph creation

We will start by creating our graph of products and customers in the database. The `amount` relationship property represents the average weekly amount of money spent by a customer on a given product.

Consider the graph created by the following Cypher statement:
``````
CREATE
  (dan:Person {name: 'Dan'}),
  (annie:Person {name: 'Annie'}),
  (matt:Person {name: 'Matt'}),
  (jeff:Person {name: 'Jeff'}),
  (brie:Person {name: 'Brie'}),
  (elsa:Person {name: 'Elsa'}),

  (tomatoes:Product {name: 'Tomatoes'}),
  (cucumber:Product {name: 'Cucumber'}),
  (celery:Product {name: 'Celery'}),
  (kale:Product {name: 'Kale'}),
  (milk:Product {name: 'Milk'}),
  (chocolate:Product {name: 'Chocolate'})

  // The full statement also creates the BUYS relationships between
  // Person and Product nodes, each carrying the 'amount' property.
  // They are omitted from this excerpt.
``````
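Since the relationship part of the statement is not shown above, here is a sketch of the `BUYS` pattern it uses. The amount value below is illustrative only, not the original data:

``````
MATCH (dan:Person {name: 'Dan'}), (milk:Product {name: 'Milk'})
CREATE (dan)-[:BUYS {amount: 1.2}]->(milk)
``````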

The graph can be visualized in the following way:

Now we can proceed to project a graph which we can run the algorithms on.

Project a graph called 'purchases' and store it in the graph catalog:
``````
CALL gds.graph.project(
  'purchases',
  ['Person', 'Product'],
  {
    BUYS: {
      orientation: 'UNDIRECTED',
      properties: 'amount'
    }
  }
)
``````
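Before running any algorithms, we can optionally verify that the projection landed in the graph catalog as expected. A sketch, assuming a recent GDS version (the exact set of available YIELD columns varies between versions):

``````
CALL gds.graph.list('purchases')
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount
``````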

## 2. FastRP embedding

Now we run the FastRP algorithm to generate node embeddings that capture topological information from the graph. We choose to work with `embeddingDimension` set to 4 which is sufficient since our example graph is very small. The `iterationWeights` are chosen empirically to yield sensible results. Please see the syntax section of the FastRP documentation for more information on these parameters. Since we want to use the embeddings as input when we run kNN later we use FastRP’s `mutate` mode.

Create node embeddings using FastRP:
``````
CALL gds.fastRP.mutate(
  'purchases',
  {
    embeddingDimension: 4,
    randomSeed: 42,
    mutateProperty: 'embedding',
    relationshipWeightProperty: 'amount',
    iterationWeights: [0.8, 1, 1, 1]
  }
)
YIELD nodePropertiesWritten
``````
Table 1. Results

| nodePropertiesWritten |
| --- |
| 13 |
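If we want to sanity-check the embeddings themselves, FastRP's `stream` mode returns them directly without modifying the projection. A sketch using the same configuration as above, minus `mutateProperty`:

``````
CALL gds.fastRP.stream(
  'purchases',
  {
    embeddingDimension: 4,
    randomSeed: 42,
    relationshipWeightProperty: 'amount',
    iterationWeights: [0.8, 1, 1, 1]
  }
)
YIELD nodeId, embedding
RETURN gds.util.asNode(nodeId).name AS name, embedding
``````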

## 3. Similarities with kNN

Now we can run kNN to identify similar nodes by using the node embeddings that we generated with FastRP as `nodeProperties`. Since we are working with a small graph, we can set `sampleRate` to 1 and `deltaThreshold` to 0 without having to worry about long computation times. The `concurrency` parameter is set to 1 (along with the fixed `randomSeed`) in order to get a deterministic result. Please see the syntax section of the kNN documentation for more information on these parameters. Note that we use the algorithm’s `write` mode to write the properties and relationships back to our database, so that we can analyze them later using Cypher.

Run kNN with FastRP node embeddings as input:
``````
CALL gds.knn.write('purchases', {
  topK: 2,
  nodeProperties: ['embedding'],
  randomSeed: 42,
  concurrency: 1,
  sampleRate: 1.0,
  deltaThreshold: 0.0,
  writeRelationshipType: 'SIMILAR',
  writeProperty: 'score'
})
YIELD nodesCompared, relationshipsWritten, similarityDistribution
RETURN nodesCompared, relationshipsWritten, similarityDistribution.mean AS meanSimilarity
``````
Table 2. Results

| nodesCompared | relationshipsWritten | meanSimilarity |
| --- | --- | --- |
| 13 | 26 | 0.917060998769907 |

As we can see, the mean similarity between nodes is quite high. This is because our example graph is small and contains no long paths between nodes, so many of the FastRP node embeddings end up similar to each other.

## 4. Results exploration

Let us now inspect the results of our kNN call using Cypher. We can use the `SIMILAR` relationship type to filter for the relationships we are interested in. And since we only care about similarities between people for our product recommendation engine, we make sure to match only nodes with the `Person` label.

List pairs of people that are similar:
``````
MATCH (n:Person)-[r:SIMILAR]->(m:Person)
RETURN n.name AS person1, m.name AS person2, r.score AS similarity
ORDER BY similarity DESCENDING, person1, person2
``````
Table 3. Results

| person1 | person2 | similarity |
| --- | --- | --- |
| "Annie" | "Matt" | 0.983087003231049 |
| "Matt" | "Annie" | 0.983087003231049 |
| "Dan" | "Elsa" | 0.980300545692444 |
| "Elsa" | "Dan" | 0.980300545692444 |
| "Jeff" | "Annie" | 0.815471172332764 |

Our kNN results indicate, among other things, that the `Person` nodes named "Annie" and "Matt" are very similar. Looking at the `BUYS` relationships for these two nodes, we can see that this conclusion makes sense: both buy three products, two of which ("Cucumber" and "Tomatoes") they have in common, and in similar amounts. This gives us confidence in our approach.
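This overlap can be checked directly in the database. A sketch of such a query, assuming the `BUYS` relationships carry the `amount` property described earlier:

``````
MATCH (a:Person {name: 'Annie'})-[r1:BUYS]->(p:Product)<-[r2:BUYS]-(m:Person {name: 'Matt'})
RETURN p.name AS product, r1.amount AS annieAmount, r2.amount AS mattAmount
``````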

## 5. Making recommendations

Using the information that the `Person` nodes named "Annie" and "Matt" are similar, we can make product recommendations for each of them. Since they are similar, we can assume that a product purchased by only one of them may also be of interest to the other. By this principle we can derive product recommendations for the `Person` named "Matt" using a simple Cypher query.

Product recommendations for `Person` node with name "Matt":
``````
MATCH (:Person {name: "Matt"})-->(p1:Product)
WITH collect(p1) AS products
MATCH (:Person {name: "Annie"})-->(p2:Product)
WHERE NOT p2 IN products
RETURN p2.name AS recommendation
``````
Table 4. Results
recommendation

"Kale"

Indeed, "Kale" is the one product that the `Person` named "Annie" buys that is not also purchased by the `Person` named "Matt".

## 6. Conclusion

Using two GDS algorithms and some basic Cypher we were easily able to derive some sensible product recommendations for a customer in our small example.

To ensure that every customer in our graph gets similarities to other customers from kNN, we could experiment with increasing the `topK` parameter.
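The per-pair recommendation query from section 5 can also be generalized to all customers at once, using the `SIMILAR` relationships that kNN wrote back to the database. A sketch, assuming the purchase relationships are typed `BUYS` as in the original graph:

``````
MATCH (n:Person)-[:SIMILAR]->(m:Person)-[:BUYS]->(p:Product)
WHERE NOT (n)-[:BUYS]->(p)
RETURN n.name AS customer, collect(DISTINCT p.name) AS recommendations
ORDER BY customer
``````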