When one has enabled query.log through Neo4j Enterprise parameter
the included bash shell script can be used to quickly parse the log and identify the top 10 most expensive queries based upon total execution time and if one has enabled
then the top 10 most expensive queries based upon planning, cpu and waiting time will also be reported.
The bash shell script is as follows:
Upon running the script and passing the query.log file to be analyzed, for example
./parse_querylog.sh query.log output will be similar to
(Note: the listing of 10 queries per section has been reduced to 2 lines for brevity. Additonally the Cypher of the query has been reduced)
From the output of the 2nd run we see that the log spans
First Query Reported at: 2018-01-24 07:58:13.360+0000 through
Last Query Reported at: 2018-01-24 15:19:40.897+0000.
The script then reports that we found
Total # of Completed Queries: 13655 and then provide a listing of the Top 10 most expensive
queries based upon Total/Avg Exection time, Total/Avg Time Planning, Total/Avg Time CPU and Total/Avg Time Waiting.
Additionally from the output we can see that ‘query planning’ is not a significant issue as
Number of Queries NOT Planned: 11448 83.8374% indicating 83.8374% of queries were satisfied from the query plan cache.
Finally, ‘locking’ is not a concern as
Number of Queries NOT Waiting: 13654 99.9927%
indicating 99.9927% of all queries spent no time in a ‘waiting’ state.
Note: There are certain ‘caveats’ to the script.
- The above script will report the line number of the query in the log file for those queries that meet any of the ‘Top 10’ results. However if a query is multi-line including carriage returns, only the first line of the query is displayed. One would need to read the query.log at the spcific line number to fully understand the query in question.
- Times reported can be inflated if run against a query.log which is immediately after a cold start of Neo4j. This is as a result of queries not being in the query plan cache or the pagecache.
One can configure query.log logging to only log queries longer than X duration through parameter
dbms.logs.query.threshold. If you set to 0 then everything is logged. If you set to 2s then only queries longer than 2 seconds will be logged and this will effect the results above.