Preview

Skip Lists in Cypher

Skip lists are my new favorite probabilistic sorted data structure. O(log N) for most operations (insert/delete/find/range!). If you don’t know what they are, for the rest of the explanation I’ll assume that you’ve at least read the wikipedia page. You take a random number and calculate the number of levels a particular node should have. In this case, our max level is 3. To calculate the random level, we’ll take the \(\large \log _.5 \left( rand() \right) \) which gives us about a 1/2 chance for our node to have just 1 level, and a 1/4 chance for our node to have just 2 levels, and a 1/8 chance for our node to have 3 levels. In a normal skip list, you’d want several more levels, but we’ll keep it down to 3 for the purpose of this experiment.

From wikipedia (shows a skip list structure):

In Neo, we don’t have to maintain special lists of pointers, because relationships handle that pretty well. In this, I’ve used separate relationship types for each level:

Unfortunately, Cypher doesn’t have a longestPath() function that efficiently finds the longest path that meets certain criteria. So I’m having to do ORDER BY length(path) DESC, which is probably not as efficient as it should be. I think I’ll add longestPath() to my TODO list. Still, I bet this beats a linked list for some operations.

Starting structure

In order to make this work in Cypher, I ended up needing to have the head and tail of the list set up beforehand. It’s possible I overlooked something, but this made it easier, because there’s no such thing as a null pointer in Neo4j, so I needed a way to express the end of the list, and that was with the tail node. The head and tail have unique labels for a fast lookup.

CREATE
 (head:TimeIndexHead {time:0}), // this is always earlier than any time
 (tail:TimeIndexTail {time:9223372036854775807}), // this is always later than any time
 (head)-[:NEXT3]->(tail),
 (head)-[:NEXT2]->(tail),
 (head)-[:NEXT1]->(tail)
; // we need something to start at

Let’s add a few events in our timeline

Now that Cypher has LOG() and RAND(), we can do this sort of awesomeness. CEIL( LOG(RAND()) / LOG(.5) ) gives us a minimum of 1, and a maximum of infinity, but a very low chance of reaching infinity. RAND() would need to be 0. To avoid that very small possibility, I’ll add a small amount to the RAND(). Our first :Event will have a time of 1. Let’s add it!

WITH CEIL(LOG(RAND()+0.000000000001)/LOG(.5)) as level
CREATE (e:Event {time:1, val:"I just drank some Taster's Choice French Roast, the best coffee ever."})
WITH e, level
MATCH p3=(head:TimeIndexHead)-[:NEXT3*0..]->(max3)-[r3:NEXT3]->(next3)
WHERE max3.time < e.time
WITH e, level, max3, next3, r3
ORDER by length(p3) DESC
LIMIT 1
MATCH p2=(max3)-[:NEXT2*0..]->(max2)-[r2:NEXT2]->(next2)
WHERE max2.time < e.time
WITH e, level, max3, next3, r3, max2, next2, r2
ORDER by length(p2) DESC
LIMIT 1
MATCH p1=(max2)-[:NEXT1*0..]->(max1)-[r1:NEXT1]->(next1)
WHERE max1.time < e.time
WITH e, level, max3, next3, r3, max2, next2, r2, max1, next1, r1
ORDER by length(p1) DESC
LIMIT 1
WHERE level >= 1
CREATE (max1)-[:NEXT1]->(e)-[:NEXT1]->(next1)
DELETE r1
WITH e, level, max3, next3, max2, next2, r2, r3
WHERE level >= 2
CREATE (max2)-[:NEXT2]->(e)-[:NEXT2]->(next2)
DELETE r2
WITH e, level, max3, next3, r3
WHERE level >= 3
CREATE (max3)-[:NEXT3]->(e)-[:NEXT3]->(next3)
DELETE r3
;

I know what you’re thinking. That was a lot of Cypher. Seriously? That has to run for each insert? Yes, but it’s not that bad. The top level list is faster to traverse along because it has much fewer nodes. Let’s do several more.

{time:17, val:"I just ate some eggs with sriracha, I don't care if the sriracha factory is burning people's eyes."}
{time:34, val:"Mexican food is the best. Are tortillas paleo?"}
{time:78, val:"I think I read somewhere that you shouldn't tweet about what you eat. What a ridiculous idea."}
{time:150, val:"Did you know that Cypher has a haversin function?"}
{time:100, val:"Have you ever eaten an enmolada? Tia Tona makes the best mole for enmoladas."}
WITH CEIL(LOG(RAND()+0.000000000001)/LOG(.5)) as level
CREATE (e:Event {time:17, val:"I just ate some eggs with sriracha, I don't care if the sriracha factory is burning people's eyes."})
WITH e, level
MATCH p3=(head:TimeIndexHead)-[:NEXT3*0..]->(max3)-[r3:NEXT3]->(next3)
WHERE max3.time < e.time
WITH e, level, max3, next3, r3
ORDER by length(p3) DESC
LIMIT 1
MATCH p2=(max3)-[:NEXT2*0..]->(max2)-[r2:NEXT2]->(next2)
WHERE max2.time < e.time
WITH e, level, max3, next3, r3, max2, next2, r2
ORDER by length(p2) DESC
LIMIT 1
MATCH p1=(max2)-[:NEXT1*0..]->(max1)-[r1:NEXT1]->(next1)
WHERE max1.time < e.time
WITH e, level, max3, next3, r3, max2, next2, r2, max1, next1, r1
ORDER by length(p1) DESC
LIMIT 1
WHERE level >= 1
CREATE (max1)-[:NEXT1]->(e)-[:NEXT1]->(next1)
DELETE r1
WITH e, level, max3, next3, max2, next2, r2, r3
WHERE level >= 2
CREATE (max2)-[:NEXT2]->(e)-[:NEXT2]->(next2)
DELETE r2
WITH e, level, max3, next3, r3
WHERE level >= 3
CREATE (max3)-[:NEXT3]->(e)-[:NEXT3]->(next3)
DELETE r3
;
WITH CEIL(LOG(RAND()+0.000000000001)/LOG(.5)) as level
CREATE (e:Event {time:34, val:"Mexican food is the best. Are tortillas paleo?"})
WITH e, level
MATCH p3=(head:TimeIndexHead)-[:NEXT3*0..]->(max3)-[r3:NEXT3]->(next3)
WHERE max3.time < e.time
WITH e, level, max3, next3, r3
ORDER by length(p3) DESC
LIMIT 1
MATCH p2=(max3)-[:NEXT2*0..]->(max2)-[r2:NEXT2]->(next2)
WHERE max2.time < e.time
WITH e, level, max3, next3, r3, max2, next2, r2
ORDER by length(p2) DESC
LIMIT 1
MATCH p1=(max2)-[:NEXT1*0..]->(max1)-[r1:NEXT1]->(next1)
WHERE max1.time < e.time
WITH e, level, max3, next3, r3, max2, next2, r2, max1, next1, r1
ORDER by length(p1) DESC
LIMIT 1
WHERE level >= 1
CREATE (max1)-[:NEXT1]->(e)-[:NEXT1]->(next1)
DELETE r1
WITH e, level, max3, next3, max2, next2, r2, r3
WHERE level >= 2
CREATE (max2)-[:NEXT2]->(e)-[:NEXT2]->(next2)
DELETE r2
WITH e, level, max3, next3, r3
WHERE level >= 3
CREATE (max3)-[:NEXT3]->(e)-[:NEXT3]->(next3)
DELETE r3
;
WITH CEIL(LOG(RAND()+0.000000000001)/LOG(.5)) as level
CREATE (e:Event {time:78, val:"I think I read somewhere that you shouldn't tweet about what you eat. What a ridiculous idea."})
WITH e, level
MATCH p3=(head:TimeIndexHead)-[:NEXT3*0..]->(max3)-[r3:NEXT3]->(next3)
WHERE max3.time < e.time
WITH e, level, max3, next3, r3
ORDER by length(p3) DESC
LIMIT 1
MATCH p2=(max3)-[:NEXT2*0..]->(max2)-[r2:NEXT2]->(next2)
WHERE max2.time < e.time
WITH e, level, max3, next3, r3, max2, next2, r2
ORDER by length(p2) DESC
LIMIT 1
MATCH p1=(max2)-[:NEXT1*0..]->(max1)-[r1:NEXT1]->(next1)
WHERE max1.time < e.time
WITH e, level, max3, next3, r3, max2, next2, r2, max1, next1, r1
ORDER by length(p1) DESC
LIMIT 1
WHERE level >= 1
CREATE (max1)-[:NEXT1]->(e)-[:NEXT1]->(next1)
DELETE r1
WITH e, level, max3, next3, max2, next2, r2, r3
WHERE level >= 2
CREATE (max2)-[:NEXT2]->(e)-[:NEXT2]->(next2)
DELETE r2
WITH e, level, max3, next3, r3
WHERE level >= 3
CREATE (max3)-[:NEXT3]->(e)-[:NEXT3]->(next3)
DELETE r3
;
WITH CEIL(LOG(RAND()+0.000000000001)/LOG(.5)) as level
CREATE (e:Event {time:150, val:"Did you know that Cypher has a haversin function?"})
WITH e, level
MATCH p3=(head:TimeIndexHead)-[:NEXT3*0..]->(max3)-[r3:NEXT3]->(next3)
WHERE max3.time < e.time
WITH e, level, max3, next3, r3
ORDER by length(p3) DESC
LIMIT 1
MATCH p2=(max3)-[:NEXT2*0..]->(max2)-[r2:NEXT2]->(next2)
WHERE max2.time < e.time
WITH e, level, max3, next3, r3, max2, next2, r2
ORDER by length(p2) DESC
LIMIT 1
MATCH p1=(max2)-[:NEXT1*0..]->(max1)-[r1:NEXT1]->(next1)
WHERE max1.time < e.time
WITH e, level, max3, next3, r3, max2, next2, r2, max1, next1, r1
ORDER by length(p1) DESC
LIMIT 1
WHERE level >= 1
CREATE (max1)-[:NEXT1]->(e)-[:NEXT1]->(next1)
DELETE r1
WITH e, level, max3, next3, max2, next2, r2, r3
WHERE level >= 2
CREATE (max2)-[:NEXT2]->(e)-[:NEXT2]->(next2)
DELETE r2
WITH e, level, max3, next3, r3
WHERE level >= 3
CREATE (max3)-[:NEXT3]->(e)-[:NEXT3]->(next3)
DELETE r3
;
WITH CEIL(LOG(RAND()+0.000000000001)/LOG(.5)) as level
CREATE (e:Event {time:100, val:"Have you ever eaten an enmolada? Tia Tona makes the best mole for enmoladas."})
WITH e, level
MATCH p3=(head:TimeIndexHead)-[:NEXT3*0..]->(max3)-[r3:NEXT3]->(next3)
WHERE max3.time < e.time
WITH e, level, max3, next3, r3
ORDER by length(p3) DESC
LIMIT 1
MATCH p2=(max3)-[:NEXT2*0..]->(max2)-[r2:NEXT2]->(next2)
WHERE max2.time < e.time
WITH e, level, max3, next3, r3, max2, next2, r2
ORDER by length(p2) DESC
LIMIT 1
MATCH p1=(max2)-[:NEXT1*0..]->(max1)-[r1:NEXT1]->(next1)
WHERE max1.time < e.time
WITH e, level, max3, next3, r3, max2, next2, r2, max1, next1, r1
ORDER by length(p1) DESC
LIMIT 1
WHERE level >= 1
CREATE (max1)-[:NEXT1]->(e)-[:NEXT1]->(next1)
DELETE r1
WITH e, level, max3, next3, max2, next2, r2, r3
WHERE level >= 2
CREATE (max2)-[:NEXT2]->(e)-[:NEXT2]->(next2)
DELETE r2
WITH e, level, max3, next3, r3
WHERE level >= 3
CREATE (max3)-[:NEXT3]->(e)-[:NEXT3]->(next3)
DELETE r3
;
Loading graph...

What can we do with a skip list?

Well, a skip list is really just a fancy linked list. You might want to find what happened at time 100. Or what happened after time 100. Or what happened before time 17. Where linked lists can do this stuff, skip lists can do this stuff efficiently.

What happened at time 100?

WITH 100 AS time
MATCH p3=(head:TimeIndexHead)-[:NEXT3*0..]->(max3)
WHERE max3.time <= time
WITH max3, time
ORDER BY length(p3) DESC
LIMIT 1
MATCH p2=(max3)-[:NEXT2*0..]->(max2)
WHERE max2.time <= time
WITH max2, time
ORDER BY length(p2) DESC
LIMIT 1
MATCH p1=(max2)-[:NEXT1*0..]->(e)
WHERE e.time = time
WITH e, time
ORDER BY length(p1) DESC
LIMIT 1
RETURN e
;
Loading table...

What happened after time 100 (inclusive)?

WITH 100 AS time
MATCH p3=(head:TimeIndexHead)-[:NEXT3*0..]->(max3)
WHERE max3.time <= time
WITH max3, time
ORDER BY length(p3) DESC
LIMIT 1
MATCH p2=(max3)-[:NEXT2*0..]->(max2)
WHERE max2.time <= time
WITH max2, time
ORDER BY length(p2) DESC
LIMIT 1
MATCH p1=(max2)-[:NEXT1*0..]->(max1)
WHERE max1.time <= time
WITH max1, time
ORDER BY length(p1) DESC
LIMIT 1
MATCH (max1)-[:NEXT1*0..]->(event)
WHERE NOT event:TimeIndexTail
RETURN event
ORDER BY event.time
;
Loading table...

What happened before time 17 (inclusive)?

WITH 17 as time
MATCH (head:TimeIndexHead)-[:NEXT1*0..]->(event)
WHERE event.time <= time
  AND NOT event:TimeIndexTail
  AND NOT event:TimeIndexHead
RETURN event
ORDER BY event.time
;
Loading table...

What happened between time 17 and 100?

WITH 17 AS time, 100 as endTime
MATCH p3=(head:TimeIndexHead)-[:NEXT3*0..]->(max3)
WHERE max3.time <= time
WITH max3, time, endTime
ORDER BY length(p3) DESC
LIMIT 1
MATCH p2=(max3)-[:NEXT2*0..]->(max2)
WHERE max2.time <= time
WITH max2, time, endTime
ORDER BY length(p2) DESC
LIMIT 1
MATCH p1=(max2)-[:NEXT1*0..]->(max1)
WHERE max1.time <= time
WITH max1, time, endTime
ORDER BY length(p1) DESC
LIMIT 1
MATCH (max1)-[:NEXT1*0..]->(event)
WHERE event.time <= endTime
  AND NOT event:TimeIndexTail
RETURN event
ORDER BY event.time
;
Loading table...

Running queries, preparing the console!