Knowledge Base

How to diagnose locking issues

Since Neo4j 3.4 it’s possible to better understand locking issues caused by concurrent query. This KB article will not detail the basics of locking in Neo4j. We assume a situation where you concurrently run lots of queries - might be the same parameterized query or different queries. The typical execution times of these are much higher than you expect - you basically think that Neo4j is "slow". In fact the source for this kind of observation might be that your queries try to grab locks on a common node, therefore they need to wait until the lock holder releases the lock. The queries waiting for the common lock are effectively serialized.

In a previous version of neo4j the query logging feature allowed for the adding of information related to how long that query was in waiting state, whether waiting on locks or waiting on IO. This can be switched on through neo4j.conf parameter of dbms.logs.query.time_logging_enabled=true. However this gives us only information on the total waiting time and no insight in which contended node or relationship might have caused it.

Commencing with Neo4j 3.4, and along with new stored procedure dbms.listTransactions() one can better understand the cause of the delay. To demonstrate the procedures with 2 cypher-shell connections run the following cypher

session1:    merge (n:Lock {id:1}) set n.age=20 with n call apoc.util.sleep(200000) return n;
session2:    Match (n:Lock {id:1}) set n.age=85 return n;

The first session will set the age property to 20 on the :Lock labelled node with id=1 and then sleep for 200 seconds (thus keeping the lock on said node). The 2nd session will attempt to update the same node and set its age property to 85 but will be blocked for these 200 seconds.

During these 200 seconds and via the Neo4j Browser, running call dbms.listTransactions() yield transactionId, startTime, currentQueryId, currentQuery, status will return output similar to

diagnose locking issues qbloQCO

To which from this output we can see that transaction-1381/query-1378 reports its status of Blocked by: [transaction-1380] and to which transaction-1380 is also listed in the output and both transactions describe the currentQuery.

Also, running call dbms.listActiveLocks('<currentQueryId>'); for example call dbms.listActiveLocks('query-1377'); will return output similar to

From the output above we see that query-1377 has a EXCLUSIVE NODE lock on a node with id(n)=8432.

diagnose locking issues ElrX7jH

With these 2 procedures you have powerful tooling at hand to understand what nodes/relationships might be a source for lock contention.