Error handling

This section describes how to manage errors that you may encounter while managing databases.

Database management queries, such as CREATE DATABASE, can fail.

1. Observing errors

Because database management operations are performed asynchronously, these errors may not be returned immediately upon query execution. Instead, you must monitor the output of SHOW DATABASE, particularly the error and currentStatus columns.

Example 1. Failure to create a database
neo4j@system> CREATE DATABASE foo;
0 rows available after 108 ms, consumed after another 0 ms
neo4j@system> SHOW DATABASE foo;

In standalone mode:

+------------------------------------------------------------------------------------------------------------------+
| name   | address          | role         | requestedStatus | currentStatus | error                     | default |
+------------------------------------------------------------------------------------------------------------------+
| "foo"  | "localhost:7687" | "standalone" | "online"        | "dirty"       | "File system permissions" | FALSE   |
+------------------------------------------------------------------------------------------------------------------+

1 row available after 4 ms, consumed after another 1 ms

In a Causal Cluster:

+----------------------------------------------------------------------------------------------------------------+
| name   | address          | role       | requestedStatus | currentStatus | error                     | default |
+----------------------------------------------------------------------------------------------------------------+
| "foo"  | "localhost:7687" | "leader"   | "online"        | "online"      | ""                        | FALSE   |
| "foo"  | "localhost:7688" | "follower" | "online"        | "online"      | ""                        | FALSE   |
| "foo"  | "localhost:7689" | "follower" | "online"        | "dirty"       | "File system permissions" | FALSE   |
+----------------------------------------------------------------------------------------------------------------+

3 rows available after 100 ms, consumed after another 6 ms
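
The check described above can be automated once the SHOW DATABASE rows have been fetched. The following is a minimal sketch, not part of Neo4j: it assumes the rows are already available as Python dictionaries keyed by the column names shown in Example 1, and the helper name find_failed_members is hypothetical. How the rows are fetched (for example, through a driver) is left out.

```python
from typing import Dict, List

Row = Dict[str, object]


def find_failed_members(rows: List[Row]) -> List[Row]:
    """Return rows that report an error or have not reached the requested status.

    Hypothetical helper: the keys mirror the SHOW DATABASE columns in Example 1.
    """
    failed = []
    for row in rows:
        has_error = bool(row.get("error"))
        mismatched = row.get("currentStatus") != row.get("requestedStatus")
        if has_error or mismatched:
            failed.append(row)
    return failed


# Rows copied from the Causal Cluster output in Example 1.
cluster_rows = [
    {"name": "foo", "address": "localhost:7687", "role": "leader",
     "requestedStatus": "online", "currentStatus": "online", "error": ""},
    {"name": "foo", "address": "localhost:7688", "role": "follower",
     "requestedStatus": "online", "currentStatus": "online", "error": ""},
    {"name": "foo", "address": "localhost:7689", "role": "follower",
     "requestedStatus": "online", "currentStatus": "dirty",
     "error": "File system permissions"},
]

for row in find_failed_members(cluster_rows):
    print(row["address"], row["currentStatus"], row["error"])
# prints: localhost:7689 dirty File system permissions
```

Because operations are asynchronous, a mismatch between requestedStatus and currentStatus can be transient, so a row flagged only for that reason may simply need to be checked again later.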

2. Database states

A database management operation may fail for a number of reasons, for example, if the file system has incorrect permissions or if Neo4j itself is misconfigured. As a result, the contents of the error column in the SHOW DATABASE query results may vary significantly.

However, a database can only be in one of a fixed set of states:

Current state   Description
initial         The database has not yet been created.
online          The database is running.
offline         The database is not running.
store copying   The database is currently being updated from another instance of Neo4j.
dropped         The database has been deleted.
dirty           This state implies an error has occurred. The database’s underlying store files may be invalid. For more information, consult the server’s logs.
quarantined     The database is effectively stopped and its state may not be changed until no longer quarantined.
unknown         This instance of Neo4j doesn’t know the state of this database.

Most often, when a database management operation fails, Neo4j attempts to transition the database in question to the offline state. If the system is certain that no store files have yet been created, it transitions the database to initial instead. Similarly, if the system suspects that the store files underlying the database are invalid (incomplete, partially deleted, or corrupt), then it transitions the database to dirty.

Whilst dropped is a valid database state, it is only transiently observable, as database records are removed from SHOW DATABASE results once the DROP operation is complete.
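
Because the set of states is small, a monitoring script can branch on currentStatus rather than on the free-form error text. The sketch below is one illustrative reading of this section, not Neo4j behaviour: the function name suggest_action and the returned action strings are hypothetical, and the mapping should be adapted to your own procedures.

```python
def suggest_action(current_status: str, error: str = "") -> str:
    """Map a currentStatus value to a suggested next step (illustrative only)."""
    if current_status == "online" and not error:
        return "none"
    if current_status == "quarantined":
        # The state cannot be changed until the quarantine is lifted.
        return "lift quarantine"
    if current_status == "dirty":
        # Store files may be invalid: inspect the logs before anything else.
        return "check logs, then drop and recreate"
    if current_status in ("offline", "initial"):
        # The usual landing states after a failed operation.
        return "fix cause, then retry"
    if current_status == "store copying":
        return "wait"
    if current_status == "dropped":
        # Only transiently observable; the record disappears on its own.
        return "none"
    # Covers "unknown" and any combination not handled above.
    return "investigate"


print(suggest_action("dirty", "File system permissions"))
# prints: check logs, then drop and recreate
```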

3. Retrying failed operations

Database management operations may be safely retried in the event of failure. However, these retries are not guaranteed to succeed, and errors may persist through several attempts.

Example 2. Retry starting a database
neo4j@system> START DATABASE foo;
0 rows available after 108 ms, consumed after another 0 ms
neo4j@system> SHOW DATABASE foo;
+-------------------------------------------------------------------------------------------------------------+
| name   | address          | role         | requestedStatus | currentStatus | error                | default |
+-------------------------------------------------------------------------------------------------------------+
| "foo"  | "localhost:7687" | "standalone" | "online"        | "offline"     | "Some error message" | FALSE   |
+-------------------------------------------------------------------------------------------------------------+

1 row available after 4 ms, consumed after another 1 ms

After investigating and addressing the underlying issue, you can start the database again and verify that it is running properly:

neo4j@system> START DATABASE foo;
0 rows available after 108 ms, consumed after another 0 ms
neo4j@system> SHOW DATABASE foo;
+------------------------------------------------------------------------------------------------+
| name     | address          | role         | requestedStatus | currentStatus | error | default |
+------------------------------------------------------------------------------------------------+
| "foo"    | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | FALSE   |
+------------------------------------------------------------------------------------------------+

1 row available after 4 ms, consumed after another 1 ms

If repeated retries of a command have no effect, or if a database is in a dirty state, you may drop and recreate the database, as detailed in Cypher manual → Administration.

When running DROP DATABASE as part of an error handling operation, you can also append DUMP DATA to the command. This produces a database dump that can be examined further and potentially repaired.
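
The retry guidance in this section can be expressed as a bounded loop. The following is a minimal sketch under stated assumptions: start_database and fetch_row are hypothetical callables that you supply (for example, wrappers that run START DATABASE foo and SHOW DATABASE foo through a driver), and the attempt limit and wait time are arbitrary. The loop gives up early on a dirty state, since the text above recommends dropping and recreating the database in that case.

```python
import time
from typing import Callable, Dict


def retry_until_online(
    start_database: Callable[[], None],
    fetch_row: Callable[[], Dict[str, str]],
    max_attempts: int = 3,
    wait_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry a start command until the database reports online, or give up."""
    for _ in range(max_attempts):
        start_database()
        # The operation is asynchronous, so wait before checking the status.
        sleep(wait_seconds)
        row = fetch_row()
        if row["currentStatus"] == "online" and not row["error"]:
            return True
        if row["currentStatus"] == "dirty":
            # Retrying is unlikely to help; stop and investigate instead.
            return False
    return False


# Stand-ins that replay the two SHOW DATABASE results from Example 2.
responses = iter([
    {"currentStatus": "offline", "error": "Some error message"},
    {"currentStatus": "online", "error": ""},
])
calls = []
ok = retry_until_online(
    start_database=lambda: calls.append("START DATABASE foo"),
    fetch_row=lambda: next(responses),
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
print(ok, len(calls))
# prints: True 2
```

A single fixed wait is a simplification. In practice the underlying issue usually has to be addressed between attempts, as in Example 2.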

4. Using quarantine in a cluster for fixing errors

You can use the dbms.cluster.quarantineDatabase procedure to isolate a specific database. The procedure acts locally, that is, only on the cluster member where it is executed. Quarantine is useful, for example, when a database is unable to start on a given member due to a file system permissions issue with the volume where the database is located, or when a recently started database begins to log errors. The quarantine state renders the database inaccessible on that cluster member and prevents its state from being changed, for example, via the START DATABASE command. After the quarantine is lifted, the cluster member tries to bring the database to the desired state.

It is recommended to run the quarantine procedure over the bolt:// protocol rather than neo4j://, because neo4j:// may route requests to unexpected instances.

Syntax:

CALL dbms.cluster.quarantineDatabase(databaseName,setStatus,reason)

Arguments:

Name           Type      Description
databaseName   String    The name of the database that will be put into or removed from quarantine.
setStatus      Boolean   true for placing the database into quarantine; false for lifting the quarantine.
reason         String    (Optional) The reason for placing the database in quarantine.

Returns:

Name           Type     Description
databaseName   String   The name of the database.
quarantined    String   Actual state.
result         String   Result of the last operation. The result contains the user, the time, and the reason for the quarantine.
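
When the procedure is called from a script rather than from an interactive shell, the call text and its arguments can be assembled as follows. This is a minimal sketch: the helper name build_quarantine_call and the Cypher parameter names ($databaseName, $setStatus, $reason) are assumptions chosen to mirror the argument table, and actually sending the query (over a bolt:// connection, as recommended above) is left to whichever client you use.

```python
from typing import Dict, Optional, Tuple


def build_quarantine_call(
    database_name: str,
    set_status: bool,
    reason: Optional[str] = None,
) -> Tuple[str, Dict[str, object]]:
    """Build the procedure call and its parameters (hypothetical helper)."""
    params: Dict[str, object] = {
        "databaseName": database_name,
        "setStatus": set_status,
    }
    if reason is None:
        # The reason argument is optional, so it is omitted entirely.
        query = "CALL dbms.cluster.quarantineDatabase($databaseName, $setStatus)"
    else:
        params["reason"] = reason
        query = (
            "CALL dbms.cluster.quarantineDatabase"
            "($databaseName, $setStatus, $reason)"
        )
    return query, params


query, params = build_quarantine_call("foo", True)
print(query)
print(params)
```

Passing the values as parameters instead of concatenating them into the query string avoids quoting problems with database names and reasons.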

Quarantine a database
neo4j@system> CALL dbms.cluster.quarantineDatabase("foo",true);
+--------------------------------------------------------------------------------------+
| databaseName | quarantined | result                                                  |
+--------------------------------------------------------------------------------------+
| "foo"        | TRUE        | "By neo4j at 2020-10-15T15:10:41.348Z: No reason given" |
+--------------------------------------------------------------------------------------+

1 row available after 100 ms, consumed after another 6 ms

Check if a database is quarantined
neo4j@system> SHOW DATABASE foo;
+---------------------------------------------------------------------------------------------------------------------------------------------+
| name  | address          | role       | requestedStatus | currentStatus | error                                                   | default |
+---------------------------------------------------------------------------------------------------------------------------------------------+
| "foo" | "localhost:7688" | "unknown"  | "online"        | "quarantined" | "By neo4j at 2020-10-15T15:10:41.348Z: No reason given" | FALSE   |
| "foo" | "localhost:7689" | "follower" | "online"        | "online"      | ""                                                      | FALSE   |
| "foo" | "localhost:7687" | "leader"   | "online"        | "online"      | ""                                                      | FALSE   |
+---------------------------------------------------------------------------------------------------------------------------------------------+

3 rows available after 100 ms, consumed after another 6 ms
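
The quarantine message carries the user, the time, and the reason in a single string. If a script needs those parts separately, they can be extracted as sketched below. This assumes the message always has the shape seen in the output above ("By <user> at <time>: <reason>"), which is inferred from that one example rather than from a documented format; the function name parse_quarantine_message is hypothetical.

```python
import re
from typing import Dict, Optional

# Pattern inferred from the single example message in this section.
_MESSAGE = re.compile(r"^By (?P<user>\S+) at (?P<time>\S+): (?P<reason>.*)$")


def parse_quarantine_message(message: str) -> Optional[Dict[str, str]]:
    """Split a quarantine message into user, time, and reason.

    Returns None when the message does not match the assumed shape.
    """
    match = _MESSAGE.match(message)
    if match is None:
        return None
    return match.groupdict()


parsed = parse_quarantine_message(
    "By neo4j at 2020-10-15T15:10:41.348Z: No reason given"
)
print(parsed)
# prints: {'user': 'neo4j', 'time': '2020-10-15T15:10:41.348Z', 'reason': 'No reason given'}
```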