CDC on Neo4j Aura

Neo4j extracts CDC information from the transaction log. By default, however, the transaction log does not contain information that CDC can use directly. For CDC to work, the transaction log needs to be enriched with further information. This is done through an extra configuration option that is set on each database. As soon as CDC is enabled, the database is ready to answer CDC queries from client applications.

CDC has three working modes:

  • OFF: CDC is disabled (default).

  • DIFF: Changes are captured as the difference between the before and after states of each changed entity (i.e. they only contain removals, updates and additions).

  • FULL: Changes are recorded as a complete copy of the before and after states of each changed entity (i.e. they contain the full node/relationship, regardless of the extent to which it was altered).

Enable/Toggle CDC mode

Admin users can change the CDC mode for a database through the setting Edit CDC Mode, accessible via the Aura instance options. Non-admin users can view the current CDC mode, but cannot edit it.

Modifying CDC mode from DIFF to FULL or vice-versa immediately changes the structure of captured changes. Your CDC application must be able to deal with the change of format.

Disable CDC

To disable CDC for a database, set the mode to OFF through the setting Edit CDC Mode, accessible via the Aura instance options. Only admin users can disable CDC for a database.

Disabling CDC immediately breaks the continuity of change events. Change identifiers generated before disabling can no longer be used and, even if CDC is re-enabled, the previously-generated change identifiers remain invalid. Disabling and then re-enabling CDC is equivalent to enabling it for the first time: there is no memory of previous changes.
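Because old change identifiers stay invalid, a client that resumes after CDC has been re-enabled needs a fresh starting point. The following is a minimal sketch of one way to do that, not a prescribed procedure: it reads a new identifier with db.cdc.current and then queries from it. The parameter name $from and the alias freshCursor are illustrative assumptions, and the empty selector list is assumed to mean "no filtering".

```cypher
// Step 1: after CDC is re-enabled, fetch a fresh change identifier
// to replace any identifier stored before CDC was disabled.
CALL db.cdc.current() YIELD id
RETURN id AS freshCursor;

// Step 2: pass the value from step 1 back as the $from parameter
// (hypothetical name) to read the changes committed after it.
CALL db.cdc.query($from, []) YIELD id, txId, seq, metadata, event
RETURN id, txId, seq, metadata, event;
```

Changes committed while CDC was disabled are not captured, so an application that needs them has to obtain that data by other means.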

CDC is automatically disabled for:

  • New instances

  • Cloned instances

  • Paused instances

  • Instances restored from a snapshot

Key considerations

Pause/Resume databases

Pausing a database that has CDC mode set to DIFF or FULL is not recommended, because the CDC mode is reset to OFF when the database is resumed.

When an instance is resumed, it behaves similarly to restoring a snapshot.

Security

CDC returns all changes in the database and is not limited to the entities that a given user is allowed to access. To prevent unauthorized access, the procedure db.cdc.query requires admin privileges, and any access to it should be granted following the principle of least privilege.

For a regular user to be able to run db.cdc.query, the user must have been granted execute privileges as well as boosted execute privileges.

GRANT EXECUTE PROCEDURE db.cdc.query ON DBMS TO $role;
GRANT EXECUTE BOOSTED PROCEDURE db.cdc.query ON DBMS TO $role;

Non-boosted execute privileges are usually part of the PUBLIC role, in which case they do not need to be granted a second time.

Furthermore, the user must have been granted access to the database they want to query.

GRANT ACCESS ON DATABASE $database TO $role;

Usually the PUBLIC role already has access to the default database.
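Put together, the grants above could be applied to a dedicated role as in the following sketch. The role name cdcReader, the user name cdcUser and the database name neo4j are hypothetical placeholders, and the sketch assumes that your Aura tier allows managing custom roles.

```cypher
// Hypothetical role name; replace with your own.
CREATE ROLE cdcReader IF NOT EXISTS;

// Allow the role to call db.cdc.query with boosted privileges.
GRANT EXECUTE PROCEDURE db.cdc.query ON DBMS TO cdcReader;
GRANT EXECUTE BOOSTED PROCEDURE db.cdc.query ON DBMS TO cdcReader;

// Allow the role to access the database (assumed to be named neo4j).
GRANT ACCESS ON DATABASE neo4j TO cdcReader;

// Assign the role to an existing user (assumed to be named cdcUser).
GRANT ROLE cdcReader TO cdcUser;

// Review the privileges the role now holds.
SHOW ROLE cdcReader PRIVILEGES AS COMMANDS;
```

Keeping the CDC privileges on a separate role, rather than adding them to PUBLIC, limits who can read the full change stream.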

The procedures db.cdc.current and db.cdc.earliest do not require admin privileges. In order to execute these, access to the database and regular execution privileges are sufficient.
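As a brief illustration, both procedures can be called without arguments and each yields a single change identifier. The aliases currentChangeId and earliestChangeId are only illustrative.

```cypher
// Identifier of the most recently committed change.
CALL db.cdc.current() YIELD id
RETURN id AS currentChangeId;

// Identifier of the earliest change still available.
CALL db.cdc.earliest() YIELD id
RETURN id AS earliestChangeId;
```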

For more details regarding procedure privileges in Neo4j, see Operations Manual → Manage procedure and user-defined function permissions.

Transaction log retention

Since CDC information is stored in transaction log entries, the time for which the logs are retained dictates how far back in time your application may query for CDC data.

The amount of disk space reserved for transaction logs is fixed, and logs are rotated and pruned regularly to ensure their size doesn’t grow beyond the threshold. When logs get pruned, your CDC application loses access to the transactions contained in the deleted files. Normally this is not an issue, as applications are meant to consume changes in (quasi) real time. However, if your CDC application is unavailable for too long, some of the oldest unconsumed changes may already have been pruned by the time it is available again.

Although retrieving the earliest change event may help you get a feeling for how far back in time your database’s transaction logs are retained, you should be careful in making assumptions based on that: the rate at which old transactions are pruned from logs depends on the number and size of incoming transactions, and is thus likely to oscillate over time.
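One way to take such a reading is sketched below: start from the earliest identifier and look at the commit time of the first change returned after it. This assumes that the metadata map of a change event exposes a txCommitTime field and that an empty selector list applies no filtering. The alias oldestRetainedCommit is illustrative.

```cypher
// Start from the earliest change identifier still available.
CALL db.cdc.earliest() YIELD id AS earliestId
// Read changes recorded after it; one row is enough for an estimate.
CALL db.cdc.query(earliestId, []) YIELD metadata
// txCommitTime is assumed to be present in the event metadata.
RETURN metadata.txCommitTime AS oldestRetainedCommit
LIMIT 1;
```

Treat the result as a point-in-time observation rather than a guaranteed retention window.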