Duplicate Check in Cypher

I answered a question on Stack Overflow that required a duplicate check for a collection.

This would be easy with an isUnique(coll) function in Cypher, or with a to_set(coll) / uniq(coll) function that would allow an expression like size(to_set(coll)) = size(coll).

But neither exists, so we need a tiny algorithm to solve it.

One solution is to iterate over the collection and check whether the current element is contained in the rest of the collection.

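In Cypher we can express this with reduce and a CASE expression.

The accumulator a starts as the whole collection and loses its head at every step, so x is always the head of a and tail(a) holds the elements that follow x. If x occurs in tail(a), we have found a duplicate and return NULL. reduce still visits the remaining elements, but the a IS NULL check keeps the accumulator NULL from then on. Otherwise, when the IN check does not succeed, we return tail(a) as the new accumulator. The collection is unique exactly when the final result is not NULL.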

// No duplicates: the accumulator ends as the empty list, so is_unique is true
WITH [1,2,3] AS coll
RETURN reduce(a=coll, x IN coll |
              CASE WHEN a IS NULL OR x IN tail(a) THEN NULL ELSE tail(a) END ) IS NOT NULL AS is_unique

// 1 occurs twice: the accumulator becomes NULL at the first step, so is_unique is false
WITH [1,2,3,1] AS coll
RETURN reduce(a=coll, x IN coll |
              CASE WHEN a IS NULL OR x IN tail(a) THEN NULL ELSE tail(a) END ) IS NOT NULL AS is_unique
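
The same idea, checking each element against the rest of the collection, can also be sketched without reduce. The variant below is an alternative formulation and not part of the original answer: it assumes the none() predicate function, range(), and list slicing are available in your Cypher version, and it walks over index positions instead of carrying an accumulator.

```cypher
WITH [1,2,3,1] AS coll
// For every index i except the last, look for coll[i] in the slice after it.
// none() is true only if no index has a later duplicate.
RETURN none(i IN range(0, size(coll) - 2)
            WHERE coll[i] IN coll[i+1..]) AS is_unique
```

For lists with fewer than two elements, range() yields an empty list, so none() reports the collection as unique.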

Chris Leishman posted a nice solution for simulating the missing uniq(coll) function, which collects each element only the first time it is seen:

WITH [1,2,3,1] AS coll
RETURN reduce(a=[], x IN coll | CASE WHEN x IN a THEN a ELSE a + x END) as unique
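
Combining this with the size comparison mentioned at the start gives another duplicate check. This is only a sketch: the alias uniq_coll is a name chosen here for illustration.

```cypher
WITH [1,2,3,1] AS coll
// Build the list of distinct elements with the reduce shown above
WITH coll,
     reduce(a=[], x IN coll | CASE WHEN x IN a THEN a ELSE a + x END) AS uniq_coll
// The collection has no duplicates when deduplication removed nothing
RETURN size(uniq_coll) = size(coll) AS is_unique
```

Unlike the tail-based version, this one always builds the full list of distinct elements before comparing sizes.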
