Use Weakly Connected Components to Avoid Cypher Query Crashing

PhD & Consultant Engineer, Neo4j
4 min read

This article will present a solution to avoid Cypher query crashing, thanks to the Graph Data Science plugin from Neo4j and, in particular, the use of the Weakly Connected Components (WCC) algorithm.
This article requires the use of Graph Data Science from Neo4j.
The Problem
Imagine you have some data in your Neo4j instance and you want to perform some refactoring or operations that will create new nodes or relationships. When you use concurrent transactions, I’m sure you’ve already come across one of the following errors.
The first is a deadlock issue because one transaction is blocked by another transaction attempting to concurrently modify the same node or relationship locked by the other transaction.
The transaction will be rolled back and terminated. Error: ForsetiClient[transactionId=6698, clientId=1] can't acquire ExclusiveLock{owner=ForsetiClient[transactionId=6697, clientId=3]} on NODE(27), because holders of that lock are waiting for ForsetiClient[transactionId=6698, clientId=1].
Wait list:ExclusiveLock[
Client[6697] waits for [ForsetiClient[transactionId=6698, clientId=1]]]
Learn more about concurrent data access.
The second error occurs when inside a transaction, the Cypher query deletes a node or a relationship, but another part of the Cypher query needed it. Usually, this happens when you execute refactoring queries.
Failed to invoke procedure `apoc.refactor.mergeNodes`: Caused by: org.neo4j.graphdb.NotFoundException: Node 7 not found
In both cases, the transaction is rolled back.
And finally, you’ll rewrite the query differently or implement a retry mechanism.
The Solution
What I suggest is rewriting the Cypher query that doesn’t require a retry mechanism by using the WCC algorithm from Graph Data Science.
Learn more about Weakly Connected Components.
Here’s a reminder of how the algorithm works from the Neo4j documentation: “The Weakly Connected Components algorithm finds sets of connected nodes in directed and undirected graphs. Two nodes are connected, if there exists a path between them. The set of all nodes that are connected with each other form a component.”
Strategy
We first need to project the sub-graph composed of nodes and relationships needed by the Cypher query.
Then we execute the WCC algorithm on the sub-graph previously projected in write mode. For better performance, you can index the resulting property.
Finally, we rewrite the Cypher query by making sure that only one transaction will be executed by community. The simplest way is to iterate over all communities in the first part of the query and do the processing in the second part of the query.
You can easily apply this solution when you have queries with concurrency, such as CALL subqueries in transactions and CALL apoc.periodic.iterate.
Example
This example comes from an article written by Pierre Halftermeyer titled Mastering Fraud Detection With Temporal Graph Modeling.
To summarize, he created :SAME_CC_AS
relationships between User and Event nodes to link events chronologically to build a time-respecting view of user interactions and shared resources with the following query:
CYPHER 5
MATCH (e:Event&!ConnectedComponent)
WITH e ORDER BY e.timestamp
CALL (e) {
MATCH (e)(()-[:WITH]->(entity)<-[:WITH]-(:ConnectedComponent)
){0,1}()<-[:COMMITS]-(u)
WITH DISTINCT e, u
MATCH (u)-[:SAME_CC_AS]->*(cc WHERE NOT EXISTS {(cc)-[:SAME_CC_AS]->()})
MERGE (cc)-[:SAME_CC_AS]->(e)
SET e:ConnectedComponent
} IN TRANSACTIONS OF 100 ROWS
The previous query builds the structure by processing events sequentially. The problem is that this is inefficient, especially when the graph is gigantic, because there’s only one thread working.
Therefore, I suggested using WCC, and he transformed his first request into the following one, which can build the structure concurrently:
CYPHER 5
CALL gds.wcc.stream('wcc_graph')
YIELD nodeId, componentId
WITH gds.util.asNode(nodeId) AS event, componentId
WITH componentId, collect(event) AS events
ORDER BY rand() // shuffle components to balance batch size
CALL (events) {
UNWIND events AS e
WITH e
WHERE NOT e:ConnectedComponent
ORDER BY e.timestamp ASC
CALL (e) {
MATCH (e)(()-[:WITH]->(entity)<-[:WITH]-(:ConnectedComponent)){0,1}()<-[:COMMITS]-(p)
WITH DISTINCT e, p
MATCH (p)-[:SAME_CC_AS]->*(cc WHERE NOT EXISTS {(cc)-[:SAME_CC_AS]->()})
MERGE (cc)-[:SAME_CC_AS]->(e)
SET e:ConnectedComponent
}
} IN CONCURRENT TRANSACTIONS OF 100 ROWS
Execution Time Comparison
If the graph is small (a few thousand nodes/relationships), the execution time of the version with WCC will be close, but still better than the version using only Cypher. However, when the graph is large, the WCC version is much more efficient.
Here are the execution times from the example (with 27,911 nodes and 29,241 relationships):
- 712 ms for the Cypher only

- 356 ms for Cypher+WCC

These values were obtained on a Neo4j AuraDS Enterprise instance with 8 GB of memory and two CPUs.
Summary
I presented an original solution using the Weakly Connected Components algorithm from Graph Data Science to avoid Cypher query crashing due to concurrent transactions.
I see several advantages with this solution, including:
- Executing Cypher queries with concurrency without crashing due to deadlocks or missing nodes/relationships
- Optimizing complex refactoring/writing operations
- Keeping your Cypher queries simple
- Using Graph Data Science to optimize write operations
Note that this solution won’t work if you’re unable to split the write operation by communities or if there are no disjoint graphs. You can have communities of very different sizes, but transactions will take longer on larger communities compared to smaller ones.
Resources
You can have communities of very different sizes, but transactions will take longer on larger communities compared to smaller ones.
Use Weakly Connected Components to Avoid Cypher Query Crashing was originally published in Neo4j Developer Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.