Enable CDC on self-managed instances
This feature has been released as a public beta in the AuraDB Enterprise October Release and Neo4j Enterprise Edition 5.13. Breaking changes are likely to be introduced before it is made generally available (GA).
Neo4j extracts CDC information from the transaction log. By default, however, the transaction log does not contain information directly usable by CDC. For CDC to work, the transaction log needs to be enriched with further information. Enrichment is enabled per database through a configuration option. As soon as CDC is enabled, the database is ready to answer CDC queries from client applications.
CDC has three enrichment modes:
- OFF: CDC is disabled (default).
- DIFF: Changes are captured as the difference between the before and after states of each changed entity (i.e. they only contain removals, updates and additions).
- FULL: Changes are recorded as a complete copy of the before and after states of each changed entity (i.e. they contain the full node/relationship, regardless of the extent to which it was altered).
Set the enrichment mode
Create a database with log enrichment enabled
To create a new database with CDC enabled, use the CREATE DATABASE Cypher command and set the option txLogEnrichment to either FULL or DIFF.
CREATE DATABASE <dbName> IF NOT EXISTS OPTIONS {txLogEnrichment: "FULL"}
Modify a database’s log enrichment mode
To change the enrichment mode on an existing database, use the ALTER DATABASE Cypher command and set the option txLogEnrichment to either FULL or DIFF.
ALTER DATABASE <dbName> SET OPTION txLogEnrichment "DIFF"
Modifying enrichment mode from |
Get a database’s log enrichment mode
To check the enrichment mode of a database, use the SHOW DATABASES Cypher command.
SHOW DATABASES YIELD name, options
name | options |
---|---|
|
|
|
|
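As a minimal sketch, the output can be narrowed to a single database and the enrichment option read directly from the options map. The database name neo4j is an assumption for illustration. A null result would mean the option is not set on that database.

```cypher
// Assumption: "neo4j" stands in for the database you want to inspect.
SHOW DATABASES
YIELD name, options
WHERE name = "neo4j"
RETURN name, options.txLogEnrichment AS txLogEnrichment
```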
Disable log enrichment
To disable log enrichment on a database, either set txLogEnrichment explicitly to OFF or remove it altogether.
ALTER DATABASE <dbName> SET OPTION txLogEnrichment "OFF"
ALTER DATABASE <dbName> REMOVE OPTION txLogEnrichment
Disabling enrichment immediately breaks the continuity of change events. Change identifiers generated before disabling can no longer be used and, even if enrichment is re-enabled, the previously generated change identifiers remain invalid. Disabling and then re-enabling CDC is equivalent to enabling it for the first time: there is no memory of previous changes.
Points to consider
Security
CDC returns all changes in the database and is not limited to the entities which a certain user is allowed to access.
To prevent unauthorized access, the procedure db.cdc.query requires admin privileges and should be configured for least-privilege access.
For a regular user to be able to run db.cdc.query, the user must have been granted execute privileges as well as boosted execute privileges.
GRANT EXECUTE PROCEDURE db.cdc.query ON DBMS TO $role;
GRANT EXECUTE BOOSTED PROCEDURE db.cdc.query ON DBMS TO $role;
Non-boosted execute privileges are usually part of the |
Furthermore, the user cannot query a database unless they have been explicitly granted access to it.
GRANT ACCESS ON DATABASE $database TO $role
Usually the |
The procedures db.cdc.current and db.cdc.earliest do not require admin privileges. To execute these, access to the database and regular execution privileges are sufficient.
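For illustration, a client with database access and regular execute privileges could fetch a starting point for its CDC queries as sketched below. Both calls are assumed to be run against the CDC-enabled database, and each returns a change identifier.

```cypher
// Change identifier for the most recently committed transaction.
CALL db.cdc.current();

// Earliest change identifier that is still available to query.
CALL db.cdc.earliest();
```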
For more details regarding procedure privileges in Neo4j, see Operations Manual → Manage procedure and user-defined function permissions.
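Putting the grants from this section together, a dedicated least-privilege role might be set up roughly as follows. This is a sketch under stated assumptions: the role name cdcReader, the user cdcUser (which must already exist) and the database name neo4j are hypothetical, and the commands are run against the system database by an administrator.

```cypher
// Hypothetical names: role cdcReader, user cdcUser, database neo4j.
CREATE ROLE cdcReader IF NOT EXISTS;

// Access to the database whose changes will be read.
GRANT ACCESS ON DATABASE neo4j TO cdcReader;

// Regular execute on all CDC procedures, boosted execute only for db.cdc.query.
GRANT EXECUTE PROCEDURE db.cdc.* ON DBMS TO cdcReader;
GRANT EXECUTE BOOSTED PROCEDURE db.cdc.query ON DBMS TO cdcReader;

// Assign the role and review what it can do.
GRANT ROLE cdcReader TO cdcUser;
SHOW ROLE cdcReader PRIVILEGES AS COMMANDS;
```

Keeping the CDC privileges on a separate role, rather than adding them to a broadly assigned role, limits who can read the full change stream.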
Disk size
Enabling change data capture on a database causes more data to be written into transaction log files. This means that the log files are rotated more frequently and log pruning is activated sooner (based on your configuration). The disk may run out of space if the disk size for transaction log storage is limited, so ensure that you have plenty of available space.
In particular, plan for a 50% increase in data written to the transaction log with DIFF enrichment mode, and 75% for FULL enrichment mode.
Actual disk usage depends on the application, data model and transaction characteristics.
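As a worked example of the planning figures above, the following query applies the 50% and 75% increases to a baseline write volume. The baseline of 20 GiB per day is a made-up number; substitute a measurement from your own deployment.

```cypher
// Assumption: 20 GiB/day is an illustrative baseline, not a recommendation.
WITH 20.0 AS baselineGiBPerDay
RETURN
  baselineGiBPerDay * 1.50 AS diffGiBPerDay,  // +50% with DIFF
  baselineGiBPerDay * 1.75 AS fullGiBPerDay   // +75% with FULL
```

With this baseline the query returns 30.0 for DIFF and 35.0 for FULL. These are planning estimates only.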
Transaction log retention
Since Neo4j stores change data capture information inside transaction log entries, you should configure the transaction log retention period based on your application requirements.
How many hours or days of transaction logs to keep depends on your CDC use case. As a general rule of thumb, base the retention period on the downtime tolerance of your downstream application, so that changes you have not yet processed are not pruned.
For more details on transaction log retention and how to configure it, see Operations Manual → Configuration → Transaction Log.
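As a hedged sketch, the retention policy could be adjusted at runtime and then verified as shown below. The 7-day window is an assumption chosen for illustration. The setting name db.tx_log.rotation.retention_policy and the dbms.setConfigValue procedure are taken from Neo4j 5 as generally documented; check them against the Operations Manual for your version.

```cypher
// Assumption: 7 days is illustrative; pick a window that covers your
// worst-case downstream consumer downtime.
CALL dbms.setConfigValue("db.tx_log.rotation.retention_policy", "7 days");

// Confirm the value currently in effect.
SHOW SETTINGS "db.tx_log.rotation.retention_policy"
YIELD name, value;
```

A value set this way typically applies only to the server it is run on and does not survive a restart, so the same value would normally also be placed in neo4j.conf.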
Unrecorded changes
CDC can only capture data changes that pass through the transaction layer. Any data written without passing through this layer is therefore not captured.
For example, when importing data with the neo4j-admin database import tool, whether full or incremental, or when loading data with the neo4j-admin database load tool, data is written directly to the store without sending anything to the transaction log, so such changes are not captured by CDC.