Enable CDC on self-managed instances

This feature was released as a public beta in the AuraDB Enterprise October release and in Neo4j Enterprise Edition 5.13. Breaking changes are likely to be introduced before it becomes generally available (GA).

Neo4j extracts CDC information from the transaction log. By default, however, the transaction log does not contain information that CDC can use directly. For CDC to work, the transaction log needs to be enriched with further information. Enrichment is configured separately for each database through an extra option. As soon as CDC is enabled, the database is ready to answer CDC queries from client applications.

CDC has three enrichment modes:

  • OFF — CDC is disabled (default).

  • DIFF — Changes are captured as the difference between before and after states of each changed entity (i.e. they only contain removals, updates and additions).

  • FULL — Changes are recorded as a complete copy of the before and after states of each changed entity (i.e. they contain the full node/relationship, regardless of the extent to which it was altered).
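The difference between the two capturing modes can be illustrated with a small sketch. The dictionary shapes and key names below (`removed`, `updated`, `added`, `before`, `after`) are hypothetical and chosen only for illustration; they are not the event format that Neo4j emits.

```python
def diff_change(before: dict, after: dict) -> dict:
    """DIFF-like view: only removals, updates and additions are kept."""
    return {
        "removed": {k: before[k] for k in before.keys() - after.keys()},
        "updated": {
            k: {"old": before[k], "new": after[k]}
            for k in before.keys() & after.keys()
            if before[k] != after[k]
        },
        "added": {k: after[k] for k in after.keys() - before.keys()},
    }


def full_change(before: dict, after: dict) -> dict:
    """FULL-like view: complete copies of both states, changed or not."""
    return {"before": dict(before), "after": dict(after)}


# Hypothetical property maps of one node before and after a transaction.
before = {"name": "Alice", "age": 41, "city": "Oslo"}
after = {"name": "Alice", "age": 42, "email": "alice@example.com"}

print(diff_change(before, after))
print(full_change(before, after))
```

In this sketch the unchanged `name` property appears only in the FULL-like view, which is why FULL produces larger change records.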

Set the enrichment mode

Create a database with log enrichment enabled

To create a new database with CDC enabled, use the CREATE DATABASE Cypher command and set the option txLogEnrichment to either FULL or DIFF.

Query
CREATE DATABASE <dbName> IF NOT EXISTS OPTIONS {txLogEnrichment: "FULL"}

Modify a database’s log enrichment mode

To change the enrichment mode of an existing database, use the ALTER DATABASE Cypher command and set the option txLogEnrichment to either FULL or DIFF.

Query
ALTER DATABASE <dbName> SET OPTION txLogEnrichment "DIFF"
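When the statement is generated by a script, it helps to validate the mode before sending anything to the server. The following is a minimal sketch: the function name is hypothetical, and the database-name pattern is an assumption about the naming rules rather than something stated on this page.

```python
import re

VALID_MODES = {"OFF", "DIFF", "FULL"}

# Assumed naming rule: 3 to 63 characters, letters, digits, dots and dashes,
# starting with a letter or digit. Adjust to your deployment's actual rules.
_DB_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9.-]{2,62}")


def set_enrichment_statement(db_name: str, mode: str) -> str:
    """Build the ALTER DATABASE statement for the given enrichment mode."""
    mode = mode.upper()
    if mode not in VALID_MODES:
        raise ValueError(f"unknown enrichment mode: {mode!r}")
    if not _DB_NAME.fullmatch(db_name):
        raise ValueError(f"unexpected database name: {db_name!r}")
    # Backticks quote the name so that dots and dashes are taken literally.
    return f'ALTER DATABASE `{db_name}` SET OPTION txLogEnrichment "{mode}"'


print(set_enrichment_statement("neo4j", "diff"))
```

The sketch only builds a string; running it against the system database is left to whichever client or driver you already use.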

Modifying the enrichment mode from DIFF to FULL or vice versa immediately changes the structure of captured changes. Your CDC application must be able to handle both formats.

Get a database’s log enrichment mode

To check the current enrichment mode of a database, use the SHOW DATABASES Cypher command.

Query
SHOW DATABASES YIELD name, options
Table 1. Result
name       options
"neo4j"    {"txLogEnrichment": "DIFF"}
"system"   {}
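A client that needs the effective mode for every database can post-process these rows. The sketch below assumes the rows have already been fetched and converted to dictionaries with the keys `name` and `options`; a missing txLogEnrichment option is treated as OFF, since OFF is the default.

```python
def enrichment_modes(rows: list[dict]) -> dict[str, str]:
    """Map each database name to its enrichment mode (OFF when unset)."""
    return {
        row["name"]: (row.get("options") or {}).get("txLogEnrichment", "OFF")
        for row in rows
    }


# Rows mirroring the result shown above.
rows = [
    {"name": "neo4j", "options": {"txLogEnrichment": "DIFF"}},
    {"name": "system", "options": {}},
]

print(enrichment_modes(rows))  # {'neo4j': 'DIFF', 'system': 'OFF'}
```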

Disable log enrichment

To disable log enrichment on a database, either set txLogEnrichment explicitly to OFF or remove it altogether.

Using set option clause
ALTER DATABASE <dbName> SET OPTION txLogEnrichment "OFF"
Using remove option clause
ALTER DATABASE <dbName> REMOVE OPTION txLogEnrichment

Disabling enrichment immediately breaks the continuity of change events. Change identifiers generated before disabling can no longer be used and, even if enrichment is re-enabled, the previously-generated change identifiers remain invalid. Disabling and then re-enabling CDC is equivalent to enabling it for the first time: there is no memory of previous changes.

Points to consider

Security

CDC returns all changes in the database and is not limited to the entities that a particular user is allowed to access. To prevent unauthorized access, the procedure db.cdc.query requires admin privileges, which should be granted following the principle of least privilege.

For a regular user to be able to run db.cdc.query, the user must have been granted execute privileges as well as boosted execute privileges.

GRANT EXECUTE PROCEDURE db.cdc.query ON DBMS TO $role;
GRANT EXECUTE BOOSTED PROCEDURE db.cdc.query ON DBMS TO $role;

Non-boosted execute privileges are usually part of the PUBLIC role, in which case they do not need to be granted a second time.

Furthermore, the user can only query a database on which they have been granted access.

GRANT ACCESS ON DATABASE $database TO $role

Usually the PUBLIC role already has access to the default database.

The procedures db.cdc.current and db.cdc.earliest do not require admin privileges. In order to execute these, access to the database and regular execution privileges are sufficient.
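The grants above can be assembled per role in a provisioning script. This is a minimal sketch with a hypothetical helper name; it accepts only simple identifiers for the role and database so that no quoting is needed, and the `public_has_execute` flag is an illustrative parameter reflecting the note that PUBLIC usually already holds the non-boosted privilege.

```python
import re

# Only plain identifiers are accepted, so the names need no quoting.
_SIMPLE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def cdc_query_grants(role: str, database: str, public_has_execute: bool = True) -> list[str]:
    """Return the GRANT statements that let `role` run db.cdc.query."""
    for value in (role, database):
        if not _SIMPLE.fullmatch(value):
            raise ValueError(f"not a simple identifier: {value!r}")
    statements = []
    if not public_has_execute:
        # Needed only when PUBLIC does not already carry this privilege.
        statements.append(f"GRANT EXECUTE PROCEDURE db.cdc.query ON DBMS TO {role}")
    statements.append(f"GRANT EXECUTE BOOSTED PROCEDURE db.cdc.query ON DBMS TO {role}")
    statements.append(f"GRANT ACCESS ON DATABASE {database} TO {role}")
    return statements


for statement in cdc_query_grants("cdc_reader", "neo4j"):
    print(statement)
```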

For more details regarding procedure privileges in Neo4j, see Operations Manual → Manage procedure and user-defined function permissions.

Disk size

Enabling change data capture on a database causes more data to be written into transaction log files. This means that the log files are rotated more frequently and log pruning is activated sooner (based on your configuration). The disk may run out of space if the disk size for transaction log storage is limited, so ensure that you have plenty of available space.

In particular, plan for a 50% increase in data written to the transaction log with DIFF enrichment mode, and 75% for FULL enrichment mode. Actual disk usage depends on the application, data model and transaction characteristics.
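As a worked example of these planning figures, the sketch below scales a measured baseline of transaction log writes by the stated percentages. The baseline value is illustrative, and the result is a planning estimate rather than a prediction of actual usage.

```python
# Planning overhead per enrichment mode, in percent, as quoted above.
OVERHEAD_PERCENT = {"OFF": 0, "DIFF": 50, "FULL": 75}


def planned_tx_log_bytes(baseline_bytes: int, mode: str) -> int:
    """Scale the baseline transaction log volume by the planning overhead."""
    return baseline_bytes * (100 + OVERHEAD_PERCENT[mode]) // 100


GIB = 1024 ** 3
baseline = 10 * GIB  # illustrative: 10 GiB written per retention window

for mode in ("OFF", "DIFF", "FULL"):
    print(mode, planned_tx_log_bytes(baseline, mode) / GIB, "GiB")
```

With a 10 GiB baseline this gives 15 GiB for DIFF and 17.5 GiB for FULL.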

Transaction log retention

Since Neo4j stores change data capture information inside transaction log entries, you should configure the transaction log retention period based on your application requirements.

How many hours or days of transaction logs to keep depends on your CDC use case. As a rule of thumb, choose a retention period at least as long as the downtime your downstream application must be able to tolerate, so that changes it has not yet processed are not pruned.
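The rule of thumb can be turned into a small calculation. In this sketch the safety factor of 1.5 is an assumption, not a recommendation from this page, and the setting name `db.tx_log.rotation.retention_policy` is assumed to be the one used by your Neo4j 5 configuration; verify both against the Operations Manual.

```python
import math


def retention_days(max_downtime_hours: float, safety_factor: float = 1.5) -> int:
    """Days of transaction logs to keep for a tolerated consumer downtime."""
    if max_downtime_hours <= 0 or safety_factor < 1:
        raise ValueError("downtime must be positive and safety_factor >= 1")
    # Round up to whole days and never go below one day.
    return max(1, math.ceil(max_downtime_hours * safety_factor / 24))


def retention_setting(days: int) -> str:
    """Format a config line (setting name assumed, see the note above)."""
    return f"db.tx_log.rotation.retention_policy={days} days"


# A consumer that may be offline for up to 36 hours:
print(retention_setting(retention_days(36)))
```

Here 36 hours times 1.5 is 54 hours, which rounds up to 3 days.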

For more details on transaction log retention and how to configure it, see Operations Manual → Configuration → Transaction Log.

Unrecorded changes

CDC can only capture data changes that pass through the transaction layer. Any data creation that bypasses this layer cannot be captured. For example, importing data with the neo4j-admin database import tool (whether full or incremental) or loading data with the neo4j-admin database load tool writes directly to the store without sending anything to the transaction log, so such changes are not captured by CDC.