This section describes the Approximate Nearest Neighbors algorithm in the Neo4j Labs Graph Algorithms library.

The Approximate Nearest Neighbors algorithm was developed by the Neo4j Labs team and is not officially supported. |

The Approximate Nearest Neighbors algorithm constructs a k-Nearest Neighbors Graph for a set of objects based on a provided similarity algorithm. The similarity of items is computed based on Jaccard Similarity, Cosine Similarity, Euclidean Distance, or Pearson Similarity.

The implementation in the library is based on Dong, Charikar, and Li’s paper Efficient K-Nearest Neighbor Graph Construction for Generic Similarity Measures.

This section includes:

We can use the Approximate Nearest Neighbors algorithm to work out the approximate k most similar items to each other. The corresponding k-Nearest Neighbors Graph can then be used as part of recommendation queries.

The following will create a sample graph:

```
MERGE (french:Cuisine {name:'French'})
MERGE (italian:Cuisine {name:'Italian'})
MERGE (indian:Cuisine {name:'Indian'})
MERGE (lebanese:Cuisine {name:'Lebanese'})
MERGE (portuguese:Cuisine {name:'Portuguese'})
MERGE (zhen:Person {name: "Zhen"})
MERGE (praveena:Person {name: "Praveena"})
MERGE (michael:Person {name: "Michael"})
MERGE (arya:Person {name: "Arya"})
MERGE (karin:Person {name: "Karin"})
MERGE (shrimp:Recipe {title: "Shrimp Bolognese"})
MERGE (saltimbocca:Recipe {title: "Saltimbocca alla roman"})
MERGE (periperi:Recipe {title: "Peri Peri Naan"})
MERGE (praveena)-[:LIKES]->(indian)
MERGE (praveena)-[:LIKES]->(portuguese)
MERGE (zhen)-[:LIKES]->(french)
MERGE (zhen)-[:LIKES]->(indian)
MERGE (michael)-[:LIKES]->(french)
MERGE (michael)-[:LIKES]->(italian)
MERGE (michael)-[:LIKES]->(indian)
MERGE (arya)-[:LIKES]->(lebanese)
MERGE (arya)-[:LIKES]->(italian)
MERGE (arya)-[:LIKES]->(portuguese)
MERGE (karin)-[:LIKES]->(lebanese)
MERGE (karin)-[:LIKES]->(italian)
MERGE (shrimp)-[:TYPE]->(italian)
MERGE (shrimp)-[:TYPE]->(indian)
MERGE (saltimbocca)-[:TYPE]->(italian)
MERGE (saltimbocca)-[:TYPE]->(french)
MERGE (periperi)-[:TYPE]->(portuguese)
MERGE (periperi)-[:TYPE]->(indian)
```

The following will return a stream of nodes, along with up to the 3 most similar nodes to them based on Jaccard Similarity:

```
MATCH (p:Person)-[:LIKES]->(cuisine)
WITH {item:id(p), categories: collect(id(cuisine))} as userData
WITH collect(userData) as data
CALL algo.labs.ml.ann.stream("jaccard", data, {similarityCutoff: 0.1})
YIELD item1, item2, similarity
return algo.asNode(item1).name AS from, algo.asNode(item2).name AS to, similarity
ORDER BY from
```

From | To | Similarity |
---|---|---|

"Arya" |
"Karin" |
0.6666666666666666 |

"Arya" |
"Praveena" |
0.25 |

"Arya" |
"Michael" |
0.2 |

"Karin" |
"Arya" |
0.6666666666666666 |

"Karin" |
"Michael" |
0.25 |

"Michael" |
"Karin" |
0.25 |

"Michael" |
"Praveena" |
0.25 |

"Michael" |
"Arya" |
0.2 |

"Praveena" |
"Zhen" |
0.3333333333333333 |

"Praveena" |
"Arya" |
0.25 |

"Praveena" |
"Michael" |
0.25 |

"Zhen" |
"Michael" |
0.6666666666666666 |

"Zhen" |
"Praveena" |
0.3333333333333333 |

Arya and Karin, and Zhen and Michael have the most similar food preferences, with two overlapping cuisines for a similarity
of 0.66.
We also have 3 pairs of users who are not similar at all.
We’d probably want to filter those out, which we can do by passing in the `similarityCutoff`

parameter.

The following will find up to 3 similar users for each user, and store a relationship between those users:

```
MATCH (p:Person)-[:LIKES]->(cuisine)
WITH {item:id(p), categories: collect(id(cuisine))} as userData
WITH collect(userData) as data
CALL algo.labs.ml.ann("jaccard", data, {similarityCutoff: 0.1, showComputations: true, write: true})
YIELD nodes, similarityPairs, write, writeRelationshipType, writeProperty, min, max, mean, p95
RETURN nodes, similarityPairs, write, writeRelationshipType, writeProperty, min, max, mean, p95
```

`nodes` |
`similarityPairs` |
`write` |
`writeRelationshipType` |
`writeProperty` |
`min` |
`max` |
`mean` |
`p95` |
---|---|---|---|---|---|---|---|---|

5 |
13 |
true |
SIMILAR |
score |
0.19999980926513672 |
0.6666669845581055 |
0.3512822664701022 |
0.6666669845581055 |

We then could write a query to find out what types of cuisine that other people similar to us might like.

The following will find the most similar user to Praveena, and return their favorite cuisines that Praveena doesn’t (yet!) like:

```
MATCH (p:Person {name: "Praveena"})-[:SIMILAR]->(other),
(other)-[:LIKES]->(cuisine)
WHERE not((p)-[:LIKES]->(cuisine))
RETURN cuisine.name AS cuisine, count(*) AS count
ORDER BY count DESC
```

`cuisine` |
count |
---|---|

Italian |
2 |

French |
2 |

Lebanese |
1 |

The following will run the algorithm and write back results:

```
CALL algo.labs.ml.ann(userData:List<Map>, {
topK: 1, similarityCutoff: 0.1, write:true, writeProperty: "score"
})
YIELD nodes, similarityPairs, write, writeRelationshipType, writeProperty, min, max, mean, stdDev, p25, p50, p75, p90, p95, p99, p999, p100
```

Name | Type | Default | Optional | Description |
---|---|---|---|---|

'algorithm' |
string |
null |
no |
The similarity algorithm to use. Valid values: jaccard', 'cosine', 'pearson', 'euclidean'. |

'data' |
list |
null |
no |
If algorithm is 'jaccard', a list of maps of the following structure: |

'top' |
int |
0 |
yes |
The number of similar pairs to return. If |

'topK' |
int |
3 |
yes |
The number of similar values to return per node. If |

'similarityCutoff' |
int |
-1 |
yes |
The threshold for similarity. Values below this will not be returned. |

'degreeCutoff' |
int |
0 |
yes |
The threshold for the number of items in the |

concurrency |
int |
available CPUs |
yes |
The number of concurrent threads used for running the algorithm. Also provides the default value for 'readConcurrency' and 'writeConcurrency'. |

'readConcurrency' |
int |
value of 'concurrency' |
yes |
The number of concurrent threads used for reading the graph. |

'writeConcurrency' |
int |
value of 'concurrency' |
yes |
The number of concurrent threads used for writing the result. |

'write' |
boolean |
false |
yes |
Indicates whether results should be stored. |

'writeBatchSize' |
int |
10000 |
yes |
The batch size to use when storing results. |

'writeRelationshipType' |
string |
SIMILAR |
yes |
The relationship type to use when storing results. |

'writeProperty' |
string |
score |
yes |
The property to use when storing results. |

Name | Type | Description |
---|---|---|

'nodes' |
int |
The number of nodes passed in. |

'similarityPairs' |
int |
The number of pairs of similar nodes computed. |

'write' |
boolean |
Indicates whether results were stored. |

'writeRelationshipType' |
string |
The relationship type used when storing results. |

'writeProperty' |
string |
The property used when storing results. |

'min' |
double |
The minimum similarity score computed. |

'max' |
double |
The maximum similarity score computed. |

'mean' |
double |
The mean of similarities scores computed. |

'stdDev' |
double |
The standard deviation of similarities scores computed. |

'p25' |
double |
The 25 percentile of similarities scores computed. |

'p50' |
double |
The 50 percentile of similarities scores computed. |

'p75' |
double |
The 75 percentile of similarities scores computed. |

'p90' |
double |
The 90 percentile of similarities scores computed. |

'p95' |
double |
The 95 percentile of similarities scores computed. |

'p99' |
double |
The 99 percentile of similarities scores computed. |

'p999' |
double |
The 99.9 percentile of similarities scores computed. |

'p100' |
double |
The 25 percentile of similarities scores computed. |

The following will run the algorithm and stream results:

```
CALL algo.labs.ml.ann.stream(userData:List<Map>, {
degreeCutoff: 10, similarityCutoff: 0.1, concurrency:4
})
YIELD item1, item2, count1, count2, intersection, similarity
```

Name | Type | Default | Optional | Description |
---|---|---|---|---|

'algorithm' |
string |
null |
no |
The similarity algorithm to use. Valid values: jaccard', 'cosine', 'pearson', 'euclidean' |

'data' |
list |
null |
no |
If algorithm is 'jaccard', a list of maps of the following structure: |

'top' |
int |
0 |
yes |
The number of similar pairs to return. If |

'topK' |
int |
3 |
yes |
The number of similar values to return per node. If |

'similarityCutoff' |
int |
-1 |
yes |
The threshold for similarity. Values below this will not be returned. |

'degreeCutoff' |
int |
0 |
yes |
The threshold for the number of items in the |

concurrency |
int |
available CPUs |
yes |
The number of concurrent threads used for running the algorithm. Also provides the default value for 'readConcurrency'. |

'readConcurrency' |
int |
value of 'concurrency' |
yes |
The number of concurrent threads used for reading the graph. |

Name | Type | Description |
---|---|---|

'item1' |
int |
The ID of one node in the similarity pair. |

'item2' |
int |
The ID of other node in the similarity pair. |

'count1' |
int |
The size of the |

'count2' |
int |
The size of the |

'intersection' |
int |
The number of intersecting values in the two nodes |

'similarity' |
int |
The similarity of the two nodes. |