# Similarity functions

## Definitions

The Neo4j GDS library provides a set of measures that can be used to calculate the similarity between two arrays $p_s$ and $p_t$ of numbers.

The similarity functions fall into two groups. Categorical measures treat the arrays as sets and calculate similarity based on the intersection of the two sets. Numerical measures compute similarity based on how close the numbers at each position are to each other.

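| Similarity function name | Formula | Type | Value range |
| --- | --- | --- | --- |
| `gds.similarity.jaccard` | $\frac{\lvert p_s \cap p_t \rvert}{\lvert p_s \cup p_t \rvert}$ | Categorical | `[0, 1]` |
| `gds.similarity.overlap` | $\frac{\lvert p_s \cap p_t \rvert}{\min(\lvert p_s \rvert, \lvert p_t \rvert)}$ | Categorical | `[0, 1]` |
| `gds.similarity.cosine` | $\frac{\sum_i p_{s,i}\, p_{t,i}}{\sqrt{\sum_i p_{s,i}^2}\,\sqrt{\sum_i p_{t,i}^2}}$ | Numerical | `[-1, 1]` |
| `gds.similarity.pearson` | $\frac{\sum_i (p_{s,i} - \bar{p}_s)(p_{t,i} - \bar{p}_t)}{\sqrt{\sum_i (p_{s,i} - \bar{p}_s)^2}\,\sqrt{\sum_i (p_{t,i} - \bar{p}_t)^2}}$ | Numerical | `[-1, 1]` |
| `gds.similarity.euclideanDistance` | $\sqrt{\sum_i (p_{s,i} - p_{t,i})^2}$ | Numerical | `[0, ∞)` |
| `gds.similarity.euclidean` | $\frac{1}{1 + \sqrt{\sum_i (p_{s,i} - p_{t,i})^2}}$ | Numerical | `(0, 1]` |

Here $p_{s,i}$ and $p_{t,i}$ are the $i$-th entries of $p_s$ and $p_t$, and $\bar{p}_s$, $\bar{p}_t$ are their means.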

## Examples

An example of usage for each function is provided below:

Jaccard similarity function
```cypher
RETURN gds.similarity.jaccard(
  [1.0, 5.0, 3.0, 6.7],
  [5.0, 2.5, 3.1, 9.0]
) AS jaccardSimilarity
```

Table 1. Results

| jaccardSimilarity |
| --- |
| 0.142857142857143 |
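The Jaccard result can be checked by hand: the only value the two arrays share is `5.0`, and together they contain seven distinct values, so the score is 1/7. The following plain-Python sketch reproduces that arithmetic. It illustrates the formula only and is not the GDS implementation; the function name `jaccard` and its handling of two empty inputs are assumptions of this sketch.

```python
def jaccard(ps: list[float], pt: list[float]) -> float:
    """Jaccard similarity of two arrays treated as sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(ps), set(pt)
    union = a | b
    if not union:
        # Assumption of this sketch: two empty inputs score 0.0.
        return 0.0
    return len(a & b) / len(union)


score = jaccard([1.0, 5.0, 3.0, 6.7], [5.0, 2.5, 3.1, 9.0])
print(score)  # approximately 0.142857, i.e. 1/7
```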

Overlap similarity function
```cypher
RETURN gds.similarity.overlap(
  [1.0, 5.0, 3.0, 6.7],
  [5.0, 2.5, 3.1, 9.0]
) AS overlapSimilarity
```

Table 2. Results

| overlapSimilarity |
| --- |
| 0.25 |
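Overlap divides the size of the intersection by the size of the smaller set instead of the union. With one shared value (`5.0`) and four distinct values in each array, the result is 1/4. A minimal Python sketch of that formula, under the same caveats as before: the function name `overlap` and the empty-input rule are hypothetical, and this is an illustration, not the GDS code.

```python
def overlap(ps: list[float], pt: list[float]) -> float:
    """Overlap similarity: |A ∩ B| / min(|A|, |B|), treating the arrays as sets."""
    a, b = set(ps), set(pt)
    smaller = min(len(a), len(b))
    if smaller == 0:
        # Assumption of this sketch: an empty input scores 0.0.
        return 0.0
    return len(a & b) / smaller


score = overlap([1.0, 5.0, 3.0, 6.7], [5.0, 2.5, 3.1, 9.0])
print(score)  # 0.25
```

Because the denominator is the smaller set, an array whose values are all contained in the other scores 1.0, which is the main practical difference from Jaccard.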

Cosine similarity function
```cypher
RETURN gds.similarity.cosine(
  [1.0, 5.0, 3.0, 6.7],
  [5.0, 2.5, 3.1, 9.0]
) AS cosineSimilarity
```

Table 3. Results

| cosineSimilarity |
| --- |
| 0.882757381034594 |
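Cosine similarity is the dot product of the two arrays divided by the product of their Euclidean norms. For the example, the dot product is 87.1 and the squared norms are 79.89 and 121.86, giving about 0.8828. The sketch below recomputes this in plain Python; the function name and the equal-length check are assumptions made for illustration, not documented GDS behavior.

```python
import math


def cosine(ps: list[float], pt: list[float]) -> float:
    """Cosine similarity: dot(ps, pt) / (||ps|| * ||pt||)."""
    if len(ps) != len(pt):
        # Assumption of this sketch: arrays must have equal length.
        raise ValueError("arrays must have the same length")
    dot = sum(s * t for s, t in zip(ps, pt))
    norm_s = math.sqrt(sum(s * s for s in ps))
    norm_t = math.sqrt(sum(t * t for t in pt))
    return dot / (norm_s * norm_t)


score = cosine([1.0, 5.0, 3.0, 6.7], [5.0, 2.5, 3.1, 9.0])
print(round(score, 6))  # 0.882757
```

In this sketch an all-zero array raises `ZeroDivisionError`, since its norm is 0.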

Pearson similarity function
```cypher
RETURN gds.similarity.pearson(
  [1.0, 5.0, 3.0, 6.7],
  [5.0, 2.5, 3.1, 9.0]
) AS pearsonSimilarity
```

Table 4. Results

| pearsonSimilarity |
| --- |
| 0.468277483648113 |
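Pearson similarity is cosine similarity applied after subtracting each array's mean, so it measures linear correlation rather than raw alignment. Here the means are 3.925 and 4.9. A plain-Python sketch of that formula, with a hypothetical function name and input checks that are assumptions of this sketch, offered for illustration only:

```python
import math


def pearson(ps: list[float], pt: list[float]) -> float:
    """Pearson correlation: cosine similarity of the mean-centered arrays."""
    if len(ps) != len(pt) or not ps:
        # Assumption of this sketch: non-empty arrays of equal length.
        raise ValueError("arrays must be non-empty and of the same length")
    mean_s = sum(ps) / len(ps)
    mean_t = sum(pt) / len(pt)
    # Center both arrays on their means.
    ds = [s - mean_s for s in ps]
    dt = [t - mean_t for t in pt]
    cov = sum(a * b for a, b in zip(ds, dt))
    return cov / (math.sqrt(sum(a * a for a in ds)) * math.sqrt(sum(b * b for b in dt)))


score = pearson([1.0, 5.0, 3.0, 6.7], [5.0, 2.5, 3.1, 9.0])
print(round(score, 6))  # 0.468277
```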

Euclidean similarity function
```cypher
RETURN gds.similarity.euclidean(
  [1.0, 5.0, 3.0, 6.7],
  [5.0, 2.5, 3.1, 9.0]
) AS euclideanSimilarity
```

Table 5. Results

| euclideanSimilarity |
| --- |
| 0.160030485454022 |

Euclidean distance function
```cypher
RETURN gds.similarity.euclideanDistance(
  [1.0, 5.0, 3.0, 6.7],
  [5.0, 2.5, 3.1, 9.0]
) AS euclideanDistance
```

Table 6. Results

| euclideanDistance |
| --- |
| 5.248809388804284 |
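The Euclidean distance is the square root of the summed squared differences: here 16 + 6.25 + 0.01 + 5.29 = 27.55, whose square root is about 5.2488. The Euclidean similarity in Table 5 is consistent with 1 / (1 + distance), which maps a distance in `[0, ∞)` into `(0, 1]`. The sketch below reproduces both values; `math.dist` is from the Python standard library (3.8+), while the two function names are hypothetical and the code is illustrative, not the GDS implementation.

```python
import math


def euclidean_distance(ps: list[float], pt: list[float]) -> float:
    """Square root of the sum of squared element-wise differences."""
    # math.dist raises ValueError if the arrays differ in length.
    return math.dist(ps, pt)


def euclidean_similarity(ps: list[float], pt: list[float]) -> float:
    """Map a distance in [0, inf) to a similarity in (0, 1]."""
    return 1.0 / (1.0 + euclidean_distance(ps, pt))


ps = [1.0, 5.0, 3.0, 6.7]
pt = [5.0, 2.5, 3.1, 9.0]
print(round(euclidean_distance(ps, pt), 6))    # 5.248809
print(round(euclidean_similarity(ps, pt), 6))  # 0.16003
```

Identical arrays have distance 0 and therefore similarity exactly 1.0, the upper end of the range.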

The functions can also compute results when one or more values in the provided arrays are `null`. For the intersection-based functions, Jaccard and Overlap, `null` values are excluded from the sets and from the computation. For the remaining functions, each `null` value is replaced with `0.0`. See the examples below.

Jaccard with null values
```cypher
RETURN gds.similarity.jaccard(
  [1.0, null, 3.0],
  [1.0, 2.0, 3.0]
) AS jaccardSimilarity
```

Table 7. Results

| jaccardSimilarity |
| --- |
| 0.666666666666667 |

Cosine with null values
```cypher
RETURN gds.similarity.cosine(
  [1.0, null, 3.0],
  [1.0, 2.0, 3.0]
) AS cosineSimilarity
```

Table 8. Results

| cosineSimilarity |
| --- |
| 0.845154254728517 |
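The two `null` rules can be mimicked in plain Python, with `None` standing in for Cypher `null`: the categorical sketch drops `None` before building the sets (leaving {1.0, 3.0} against {1.0, 2.0, 3.0}, so 2/3), while the numerical sketch replaces it with `0.0` in place (giving 10 / √140). Both function names are hypothetical, and the code only mirrors the behavior described above; it is not the GDS implementation.

```python
import math
from typing import Optional


def jaccard_ignoring_nulls(ps: list[Optional[float]], pt: list[Optional[float]]) -> float:
    """Categorical rule: None values are excluded from the sets."""
    a = {x for x in ps if x is not None}
    b = {x for x in pt if x is not None}
    union = a | b
    # Assumption of this sketch: two empty sets score 0.0.
    return len(a & b) / len(union) if union else 0.0


def cosine_nulls_as_zero(ps: list[Optional[float]], pt: list[Optional[float]]) -> float:
    """Numerical rule: each None is replaced with 0.0 at its position."""
    s = [0.0 if x is None else x for x in ps]
    t = [0.0 if x is None else x for x in pt]
    dot = sum(x * y for x, y in zip(s, t))
    return dot / (math.sqrt(sum(x * x for x in s)) * math.sqrt(sum(y * y for y in t)))


print(round(jaccard_ignoring_nulls([1.0, None, 3.0], [1.0, 2.0, 3.0]), 6))  # 0.666667
print(round(cosine_nulls_as_zero([1.0, None, 3.0], [1.0, 2.0, 3.0]), 6))    # 0.845154
```

Replacing `null` with `0.0` keeps the positions of the numerical arrays aligned, whereas dropping a value from a set has no positional consequence.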