Similarity functions
Definitions
The Neo4j GDS library provides a set of measures that can be used to calculate the similarity between two arrays of numbers, ps and pt.
The similarity functions can be classified into two groups.
The first group consists of categorical measures, which treat the arrays as sets and calculate similarity based on the intersection between the two sets.
The second group consists of numerical measures, which compute similarity based on how close the numbers at each position are to each other.
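Similarity function | Formula | Type | Value range |
---|---|---|---|
Jaccard | $\frac{\lvert p_s \cap p_t \rvert}{\lvert p_s \cup p_t \rvert}$ | Categorical | $[0, 1]$ |
Overlap | $\frac{\lvert p_s \cap p_t \rvert}{\min(\lvert p_s \rvert, \lvert p_t \rvert)}$ | Categorical | $[0, 1]$ |
Cosine | $\frac{\sum_i p_s(i)\, p_t(i)}{\sqrt{\sum_i p_s(i)^2}\, \sqrt{\sum_i p_t(i)^2}}$ | Numerical | $[-1, 1]$ |
Pearson | $\frac{\sum_i (p_s(i) - \bar{p}_s)(p_t(i) - \bar{p}_t)}{\sqrt{\sum_i (p_s(i) - \bar{p}_s)^2}\, \sqrt{\sum_i (p_t(i) - \bar{p}_t)^2}}$ | Numerical | $[-1, 1]$ |
Euclidean distance | $\sqrt{\sum_i (p_s(i) - p_t(i))^2}$ | Numerical | $[0, \infty)$ |
Euclidean similarity | $\frac{1}{1 + \sqrt{\sum_i (p_s(i) - p_t(i))^2}}$ | Numerical | $(0, 1]$ |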
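To make the difference between the categorical and numerical measures concrete, the six measures can be sketched in plain Python from their standard textbook definitions. This is an illustration only, not the GDS implementation. The function names are hypothetical, the arrays are assumed to be non-empty, of equal length, and free of nulls, and duplicate values are assumed to collapse when an array is treated as a set.

```python
import math


def jaccard(ps, pt):
    # Set-based: size of the intersection over size of the union.
    a, b = set(ps), set(pt)
    return len(a & b) / len(a | b)


def overlap(ps, pt):
    # Set-based: size of the intersection over the size of the smaller set.
    a, b = set(ps), set(pt)
    return len(a & b) / min(len(a), len(b))


def cosine(ps, pt):
    # Position-based: dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(ps, pt))
    norm_s = math.sqrt(sum(x * x for x in ps))
    norm_t = math.sqrt(sum(y * y for y in pt))
    return dot / (norm_s * norm_t)


def pearson(ps, pt):
    # Position-based: cosine of the mean-centred vectors.
    mean_s = sum(ps) / len(ps)
    mean_t = sum(pt) / len(pt)
    return cosine([x - mean_s for x in ps], [y - mean_t for y in pt])


def euclidean_distance(ps, pt):
    # Position-based: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ps, pt)))


def euclidean_similarity(ps, pt):
    # Maps a distance in [0, inf) to a similarity in (0, 1].
    return 1.0 / (1.0 + euclidean_distance(ps, pt))
```

Applied to the two arrays used in the Examples section, these functions reproduce the listed results up to floating-point rounding. Zero vectors and constant arrays would divide by zero in this sketch; how GDS treats those inputs is not covered here.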
Examples
An example of usage for each function is provided below:
RETURN gds.similarity.jaccard(
[1.0, 5.0, 3.0, 6.7],
[5.0, 2.5, 3.1, 9.0]
) AS jaccardSimilarity
jaccardSimilarity |
---|
0.142857142857143 |
RETURN gds.similarity.overlap(
[1.0, 5.0, 3.0, 6.7],
[5.0, 2.5, 3.1, 9.0]
) AS overlapSimilarity
overlapSimilarity |
---|
0.25 |
RETURN gds.similarity.cosine(
[1.0, 5.0, 3.0, 6.7],
[5.0, 2.5, 3.1, 9.0]
) AS cosineSimilarity
cosineSimilarity |
---|
0.882757381034594 |
RETURN gds.similarity.pearson(
[1.0, 5.0, 3.0, 6.7],
[5.0, 2.5, 3.1, 9.0]
) AS pearsonSimilarity
pearsonSimilarity |
---|
0.468277483648113 |
RETURN gds.similarity.euclidean(
[1.0, 5.0, 3.0, 6.7],
[5.0, 2.5, 3.1, 9.0]
) AS euclideanSimilarity
euclideanSimilarity |
---|
0.160030485454022 |
RETURN gds.similarity.euclideanDistance(
[1.0, 5.0, 3.0, 6.7],
[5.0, 2.5, 3.1, 9.0]
) AS euclideanDistance
euclideanDistance |
---|
5.248809388804284 |
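The last two results can be checked by hand. The worked arithmetic below assumes that the Euclidean similarity is derived from the distance as 1 / (1 + distance); that relationship is inferred from the two outputs above rather than taken from the function's documentation.

```latex
\begin{aligned}
d &= \sqrt{(1.0 - 5.0)^2 + (5.0 - 2.5)^2 + (3.0 - 3.1)^2 + (6.7 - 9.0)^2} \\
  &= \sqrt{16 + 6.25 + 0.01 + 5.29} = \sqrt{27.55} \approx 5.248809 \\
\frac{1}{1 + d} &\approx \frac{1}{6.248809} \approx 0.160030
\end{aligned}
```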
The functions can also compute results when one or more values in the provided vectors are null.
For functions based on intersection, such as Jaccard or Overlap, null values are excluded from the set and from the computation.
In the remaining functions, each null value is replaced with 0.0.
See the examples below.
RETURN gds.similarity.jaccard(
[1.0, null, 3.0],
[1.0, 2.0, 3.0]
) AS jaccardSimilarity
jaccardSimilarity |
---|
0.666666666666667 |
RETURN gds.similarity.cosine(
[1.0, null, 3.0],
[1.0, 2.0, 3.0]
) AS cosineSimilarity
cosineSimilarity |
---|
0.845154254728517 |
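The two null-handling rules can be sketched in Python, with None standing in for null. The helper names are hypothetical and the code only illustrates the rules as stated above; it is not how GDS implements them.

```python
import math


def drop_nulls(values):
    # Set-based measures: nulls are left out of the set entirely.
    return {v for v in values if v is not None}


def zero_nulls(values):
    # Position-based measures: each null is replaced with 0.0.
    return [0.0 if v is None else v for v in values]


def jaccard_with_nulls(ps, pt):
    a, b = drop_nulls(ps), drop_nulls(pt)
    return len(a & b) / len(a | b)


def cosine_with_nulls(ps, pt):
    xs, ys = zero_nulls(ps), zero_nulls(pt)
    dot = sum(x * y for x, y in zip(xs, ys))
    norm = math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys))
    return dot / norm
```

For the arrays above, the Jaccard sketch compares the sets {1.0, 3.0} and {1.0, 2.0, 3.0}, giving 2 / 3, while the cosine sketch compares [1.0, 0.0, 3.0] with [1.0, 2.0, 3.0], giving 10 / sqrt(140).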