Error handling

This section describes how to manage errors that you may encounter while managing databases.

When running the database management queries, such as CREATE DATABASE, it is possible to encounter errors.

1. Observing errors

Because database management operations are performed asynchronously, these errors may not returned immediately upon query execution. Instead, you must monitor the output of SHOW DATABASE; particularly the error and currentStatus columns.

Example 1. Fail to create a database
neo4j@system> CREATE DATABASE foo;
0 rows available after 108 ms, consumed after another 0 ms
neo4j@system> SHOW DATABASE foo;

In standalone mode:

+------------------------------------------------------------------------------------------------------------------+
| name   | address          | role         | requestedStatus | currentStatus | error                     | default |
+------------------------------------------------------------------------------------------------------------------+
| "foo"  | "localhost:7687" | "standalone" | "online"        | "dirty"       | "File system permissions" | FALSE   |
+------------------------------------------------------------------------------------------------------------------+

1 rows available after 4 ms, consumed after another 1 ms

In a Causal Cluster:

+----------------------------------------------------------------------------------------------------------------+
| name   | address          | role       | requestedStatus | currentStatus | error                     | default |
+----------------------------------------------------------------------------------------------------------------+
| "foo"  | "localhost:7687" | "leader"   | "online"        | "online"      | ""                        | FALSE   |
| "foo"  | "localhost:7688" | "follower" | "online"        | "online"      | ""                        | FALSE   |
| "foo"  | "localhost:7689" | "follower" | "online"        | "dirty"       | "File system permissions" | FALSE   |
+----------------------------------------------------------------------------------------------------------------+

3 row available after 100 ms, consumed after another 6 ms

2. Database states

A database management operation may fail for a number of reasons. For example, if the file system instance has incorrect permissions, or Neo4j itself is misconfigured. As a result, the contents of the error column in the SHOW DATABASE query results may vary significantly.

However, databases may only be in one of a select number of states:

Current state Description

initial

The database has not yet been created.

online

The database is running.

offline

The database is not running.

store copying

The database is currently being updated from another instance of Neo4j.

dropped

The database has been deleted.

dirty

This state implies an error has occurred. The database’s underlying store files may be invalid. For more information, consult the server’s logs.

quarantined

The database is effectively stopped and its state may not be changed until no longer quarantined.

unknown

This instance of Neo4j doesn’t know the state of this database.

Most often, when a database management operation fails, Neo4j attempts to transition the database in question to the offline state. If the system is certain that no store files have yet been created, it transitions the database to initial instead. Similarly, if the system suspects that the store files underlying the database are invalid (incomplete, partially deleted, or corrupt), then it transitions the database to dirty.

While dropped is a valid database state, it is only transiently observable, as database records are removed from SHOW DATABASE results once the DROP operation is complete.

3. Retrying failed operations

Database management operations may be safely retried in the event of failure. However, these retries are not guaranteed to succeed, and errors may persist through several attempts.

If a database is in the quarantined state, retrying the last operation will not work.

Example 2. Retry to start a database
neo4j@system> START DATABASE foo;
0 rows available after 108 ms, consumed after another 0 ms
neo4j@system> SHOW DATABASE foo;
+-------------------------------------------------------------------------------------------------------------+
| name   | address          | role         | requestedStatus | currentStatus | error                | default |
+-------------------------------------------------------------------------------------------------------------+
| "foo"  | "localhost:7687" | "standalone" | "online"        | "offline"     | "Some error message" | FALSE   |
+-------------------------------------------------------------------------------------------------------------+

1 rows available after 4 ms, consumed after another 1 ms

After investigating and addressing the underlying issue, you can start the database again and verify that it is running properly:

neo4j@system> START DATABASE foo;
0 rows available after 108 ms, consumed after another 0 ms
neo4j@system> SHOW DATABASE foo;
+------------------------------------------------------------------------------------------------+
| name     | address          | role         | requestedStatus | currentStatus | error | default |
+------------------------------------------------------------------------------------------------+
| "foo"    | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | FALSE   |
+------------------------------------------------------------------------------------------------+

1 rows available after 4 ms, consumed after another 1 ms

If repeated retries of a command have no effect, or if a database is in a dirty state, you may drop and recreate the database, as detailed in <<cypher-manual#administration-databases, Cypher Manual → Database management>.

When running DROP DATABASE as part of an error handling operation, you can also append DUMP DATA to the command. It produces a database dump that can be further examined and potentially repaired.

4. Quarantined databases

There are two ways to get a database into a quarantined state:

  • By using the dbms.quarantineDatabase procedure locally to isolate a specific database. The procedure must be executed on the instance whose copy of the database you want to quarantine. A reason for that can be, for example, when a database is unable to start on a given instance due to a file system permissions issue with the volume where the database is located or when a recently started database begins to log errors. The quarantine state renders the database inaccessible on that instance and prevents its state from being changed, for example, with the START DATABASE command.

    If running in a cluster, database management commands such as START DATABASE foo will still take effect on the instances which have not quarantined foo.

  • When a database encounters a severe error during its normal run, which prevents it from a further operation, Neo4j stops that database and brings it into a quarantined state. Meaning, it is not possible to restart it with a simple START DATABASE command. You have to execute CALL dbms.quarantineDatabase(databaseName, false) on the instance with the failing database in order to lift the quarantine.

After lifting the quarantine, the instance will automatically try to bring the database to the desired state.

It is recommended to run the quarantine procedure over the bolt:// protocol rather than neo4j://, which may route requests to unexpected instances.

Syntax:

CALL dbms.cluster.quarantineDatabase(databaseName,setStatus,reason)

Arguments:

Name Type Description

databaseName

String

The name of the database that will be put into or removed from quarantine.

setStatus

Boolean

true for placing the database into quarantine; false for lifting the quarantine.

reason

String

(Optional) The reason for placing the database in quarantine.

Returns:

Name Type Description

databaseName

String

The name of the database.

quarantined

String

Actual state.

result

String

Result of the last operation. The result contains the user, the time, and the reason for the quarantine.

Quarantine a database
neo4j@system> CALL dbms.cluster.quarantineDatabase("foo",true);
+--------------------------------------------------------------------------------------+
| databaseName | quarantined | result                                                  |
+--------------------------------------------------------------------------------------+
| "foo"        | TRUE        | "By neo4j at 2020-10-15T15:10:41.348Z: No reason given" |
+--------------------------------------------------------------------------------------+

3 row available after 100 ms, consumed after another 6 ms
Check if a database is quarantined
neo4j@system> SHOW DATABASE foo;
+---------------------------------------------------------------------------------------------------------------------------------------------+
| name  | address          | role       | requestedStatus | currentStatus | error                                                   | default |
+---------------------------------------------------------------------------------------------------------------------------------------------+
| "foo" | "localhost:7688" | "unknown"  | "online"        | "quarantined" | "By neo4j at 2020-10-15T15:10:41.348Z: No reason given" | FALSE   |
| "foo" | "localhost:7689" | "follower" | "online"        | "online"      | ""                                                      | FALSE   |
| "foo" | "localhost:7687" | "leader"   | "online"        | "online"      | ""                                                      | FALSE   |
+---------------------------------------------------------------------------------------------------------------------------------------------+

3 row available after 100 ms, consumed after another 6 ms

A quarantined state is persisted for user databases. This means that if a database is quarantined, it will remain so even if that Neo4j instance is restarted. You can remove it only by running the dbms.quarantineDatabase procedure on the instance where the quarantined database is located, passing false for the setStatus parameter.

The one exception to this rule is for the built-in system database. Any quarantine for that database is removed automatically after instance restart.