Migrate a Causal Cluster

This chapter describes the necessary steps to migrate a Causal Cluster from Neo4j version 3.5 directly to 4.x.

The migration of a Causal Cluster from Neo4j version 3.5 to 4.x requires downtime. Therefore, it is recommended to perform a test migration in a production-like environment to get information on the duration of the downtime.

The prerequisites and the migration steps must be completed for each cluster member.

1. Prerequisites

Ensure that you have completed all tasks on the Migration checklist.

2. Prepare for the migration

The strategy for migrating a cluster deployment is to complete an offline copy from one cluster instance, and then use the copied store to seed the new cluster.

Remember, a migration is a single event. Do not perform independent migrations on each of your instances! There should be a single migration event, and that migrated store will be your source of truth for all the other instances of the cluster. This is important because, when migrating, Neo4j generates random store IDs and, if the migration is done independently on each instance, your cluster will end up with as many store IDs as it has instances. Neo4j will fail to start if that is the case. Because of this, some of the cluster migration steps are performed on a single instance while others are performed on all instances. Each step tells you where to perform the necessary actions.

At this stage, you should elect one instance to work on. This will be the instance where the migration will actually happen. The next steps will tell you whether to perform the step on the elected instance, on the remaining instances or on all instances.

On each cluster member
  1. Verify that you have shut down all the cluster members (Cores and Read Replicas). You can confirm this by checking neo4j.log on each member.

  2. Install the Neo4j version that you want to migrate to on each instance. For more information on how to install the distribution that you are using, see Operations Manual v4.0 → Installation.

  3. Replace the neo4j.conf file with the one that you have prepared for each instance in section Prepare a new neo4j.conf file to be used by the new installation.

  4. Copy all the files used for encryption, such as the private key, the public certificate, and the contents of the trusted and revoked directories (located in <neo4j-home>/certificates/), to the new installation.

    If your old installation has modified configurations starting with dbms.directories.* or the setting dbms.active_database, verify that the new neo4j.conf file is configured properly to find these directories.
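As a rough aid for the check above, the following sketch lists the uncommented dbms.directories.* and dbms.active_database entries in a configuration file, so that they can be compared against the new neo4j.conf. The function name list_path_settings and the example path are hypothetical and not part of any Neo4j tooling.

```shell
# List uncommented dbms.directories.* and dbms.active_database settings
# from a neo4j.conf file. The function name is illustrative only.
list_path_settings() {
    conf_file="$1"
    # Keep lines that start (after optional whitespace) with one of the
    # setting names; commented-out lines beginning with '#' do not match.
    grep -E '^[[:space:]]*(dbms\.directories\.[A-Za-z_.]+|dbms\.active_database)[[:space:]]*=' "$conf_file" \
        | sed -e 's/^[[:space:]]*//'
}

# Example (the path is a placeholder):
#   list_path_settings /path/to/3.5.x/conf/neo4j.conf
```

Running it against both the old and the new file and comparing the output by eye is usually enough to spot a directory that the new installation would not find.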

3. Migrate the data

On the elected instance

Using the 4.x Neo4j Admin tool, migrate the data store of your Neo4j 3.5 installation. The neo4j-admin copy command also removes any inconsistent nodes, properties, and relationships and does not copy them to the newly created store.

  1. From the <neo4j-home> folder, run the following command to copy the data store. You need to specify the old store location and the name for the target updated database:

    bin/neo4j-admin copy --from-path=/path/to/3.5.x/graph.db --to-database=<db_name>
    Starting to copy store, output will be saved to:  $neo4j_home/logs/neo4j-admin-copy-2020-11-26.16.07.19.log
    2020-11-26 16:07:19.939+0000 INFO [StoreCopy] ### Copy Data ###
    2020-11-26 16:07:19.940+0000 INFO [StoreCopy] Source: /path/to/3.5.x/graph.db (page cache 8m)
    2020-11-26 16:07:19.940+0000 INFO [StoreCopy] Target:  $neo4j_home/data/databases/db_name (page cache 8m)
    2020-11-26 16:07:19.940+0000 INFO [StoreCopy] Empty database created, will start importing readable data from the source.
    2020-11-26 16:07:21.661+0000 INFO [o.n.i.b.ImportLogic] Import starting
    
    Import starting 2020-11-26 16:07:21.699+0000
      Estimated number of nodes: 50.00 k
      Estimated number of node properties: 50.00 k
      Estimated number of relationships: 0.00
      Estimated number of relationship properties: 50.00 k
      Estimated disk space usage: 2.680MiB
      Estimated required memory usage: 8.598MiB
    
    (1/4) Node import 2020-11-26 16:07:22.220+0000
      Estimated number of nodes: 50.00 k
      Estimated disk space usage: 1.698MiB
      Estimated required memory usage: 8.598MiB
    .......... .......... .......... .......... ..........   5% ∆239ms
    .......... .......... .......... .......... ..........  10% ∆1ms
    .......... .......... .......... .......... ..........  15% ∆1ms
    .......... .......... .......... .......... ..........  20% ∆0ms
    .......... .......... .......... .......... ..........  25% ∆1ms
    .......... .......... .......... .......... ..........  30% ∆0ms
    .......... .......... .......... .......... ..........  35% ∆0ms
    .......... .......... .......... .......... ..........  40% ∆1ms
    .......... .......... .......... .......... ..........  45% ∆0ms
    .......... .......... .......... .......... ..........  50% ∆1ms
    .......... .......... .......... .......... ..........  55% ∆0ms
    .......... .......... .......... .......... .........-  60% ∆51ms
    .......... .......... .......... .......... ..........  65% ∆0ms
    .......... .......... .......... .......... ..........  70% ∆0ms
    .......... .......... .......... .......... ..........  75% ∆1ms
    .......... .......... .......... .......... ..........  80% ∆0ms
    .......... .......... .......... .......... ..........  85% ∆0ms
    .......... .......... .......... .......... ..........  90% ∆1ms
    .......... .......... .......... .......... ..........  95% ∆0ms
    .......... .......... .......... .......... .......... 100% ∆0ms
    
    (2/4) Relationship import 2020-11-26 16:07:22.543+0000
      Estimated number of relationships: 0.00
      Estimated disk space usage: 1006KiB
      Estimated required memory usage: 15.60MiB
    (3/4) Relationship linking 2020-11-26 16:07:22.879+0000
      Estimated required memory usage: 7.969MiB
    (4/4) Post processing 2020-11-26 16:07:23.272+0000
      Estimated required memory usage: 7.969MiB
    -......... .......... .......... .......... ..........   5% ∆356ms
    .......... .......... .......... .......... ..........  10% ∆0ms
    .......... .......... .......... .......... ..........  15% ∆1ms
    .......... .......... .......... .......... ..........  20% ∆0ms
    .......... .......... .......... .......... ..........  25% ∆0ms
    .......... .......... .......... .......... ..........  30% ∆1ms
    .......... .......... .......... .......... ..........  35% ∆0ms
    .......... .......... .......... .......... ..........  40% ∆0ms
    .......... .......... .......... .......... ..........  45% ∆1ms
    .......... .......... .......... .......... ..........  50% ∆0ms
    .......... .......... .......... .......... ..........  55% ∆0ms
    .......... .......... .......... .......... ..........  60% ∆0ms
    .......... .......... .......... .......... ..........  65% ∆1ms
    .......... .......... .......... .......... ..........  70% ∆0ms
    .......... .......... .......... .......... ..........  75% ∆0ms
    .......... .......... .......... .......... ..........  80% ∆0ms
    .......... .......... .......... .......... ..........  85% ∆0ms
    .......... .......... .......... .......... ..........  90% ∆0ms
    .......... .......... .......... .......... ..........  95% ∆1ms
    .......... .......... .......... .......... .......... 100% ∆0ms
    
    
    IMPORT DONE in 2s 473ms.
    Imported:
      1 nodes
      0 relationships
      1 properties
    Peak memory usage: 15.60MiB
    2020-11-26 16:07:24.140+0000 INFO [o.n.i.b.ImportLogic] Import completed successfully, took 2s 473ms. Imported:
      1 nodes
      0 relationships
      1 properties
    2020-11-26 16:07:24.668+0000 INFO [StoreCopy] Import summary: Copying of 100704 records took 4 seconds (25176 rec/s). Unused Records 100703 (99%) Removed Records 0 (0%)
    2020-11-26 16:07:24.669+0000 INFO [StoreCopy] ### Extracting schema ###
    2020-11-26 16:07:24.669+0000 INFO [StoreCopy] Trying to extract schema...
    2020-11-26 16:07:24.920+0000 INFO [StoreCopy] ... found 1 schema definitions. The following can be used to recreate the schema:
    2020-11-26 16:07:24.922+0000 INFO [StoreCopy]
    
    CALL db.createIndex('index_5c0607ad', ['Person'], ['name'], 'native-btree-1.0', {`spatial.cartesian-3d.min`: [-1000000.0, -1000000.0, -1000000.0],`spatial.cartesian.min`: [-1000000.0, -1000000.0],`spatial.wgs-84.min`: [-180.0, -90.0],`spatial.cartesian-3d.max`: [1000000.0, 1000000.0, 1000000.0],`spatial.cartesian.max`: [1000000.0, 1000000.0],`spatial.wgs-84-3d.min`: [-180.0, -90.0, -1000000.0],`spatial.wgs-84-3d.max`: [180.0, 90.0, 1000000.0],`spatial.wgs-84.max`: [180.0, 90.0]})
    2020-11-26 16:07:24.923+0000 INFO [StoreCopy] You have to manually apply the above commands to the database when it is stared to recreate the indexes and constraints. The commands are saved to $neo4j_home/logs/neo4j-admin-copy-2020-11-26.16.07.19.log as well for reference.

    When using the direct path, indexes are not automatically migrated so you have to recreate them. After running the store migration, the neo4j-admin copy command extracts the schema and generates a list of commands you can later use to recreate your schema on the new 4.x store. The recreate schema commands are also saved in the migration log file, located in the /logs directory.
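To keep the schema commands at hand for the later index recreation step, they can be pulled out of the log into a separate file. The sketch below assumes that each command sits on its own line starting with CALL, as in the console output shown above; the function name extract_schema_commands and the output file name are hypothetical.

```shell
# Print the schema-recreation commands found in a neo4j-admin copy log.
# Assumes each command is on its own line starting with "CALL ", as in
# the console output shown above; the function name is illustrative.
extract_schema_commands() {
    log_file="$1"
    # Strip leading whitespace, then keep only lines that begin with CALL.
    sed -e 's/^[[:space:]]*//' "$log_file" | grep '^CALL '
}

# Example (log file name taken from the output above):
#   extract_schema_commands logs/neo4j-admin-copy-2020-11-26.16.07.19.log > recreate-schema.cypher
```

If the log in your version prefixes the commands with a timestamp, the pattern would need to be adjusted accordingly.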

4. Prepare for seeding the cluster

On the elected instance

Use neo4j-admin dump to make a dump of your newly migrated database and transactions.

bin/neo4j-admin dump --database=neo4j --to=$BACKUP_DESTINATION/neo4j.dump

Be aware that after you migrate, Neo4j Admin commands can differ slightly because Neo4j now supports multiple databases.

Do not start the server yet.

5. Seed the cluster

If you are migrating to a version of Neo4j prior to 4.3 and your migrated database is set as the default database in neo4j.conf, you should copy the migrated database directory from the elected instance to all other instances to seed the cluster. This step is required so that all instances have the same copy of the database when the database is started. If the migrated database is not the default database and the Neo4j version is 4.3+, this step is not required.

  1. Copy the dump to the remaining instances.

  2. Use neo4j-admin load --from=<archive-path> --database=<db_name> --force to replace each of your databases with the one migrated on the elected instance:

    bin/neo4j-admin load --from=$BACKUP_DESTINATION/neo4j.dump --database=neo4j --force
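With more than a couple of instances, it can help to generate the copy and load commands for review before running anything. The following sketch only prints the commands and executes nothing. It assumes SSH access to the remaining instances, the same Neo4j home and dump path on every host, and paths without spaces; the function name print_seed_commands and the host names are hypothetical.

```shell
# Print the copy and load commands for each remaining instance.
# Nothing is executed; the output is meant to be reviewed first.
# Arguments: <neo4j-home> <dump-path> <host>...
print_seed_commands() {
    neo4j_home="$1"
    dump_path="$2"
    shift 2
    for host in "$@"; do
        # Step 1: copy the dump to the remaining instance.
        printf 'scp %s %s:%s\n' "$dump_path" "$host" "$dump_path"
        # Step 2: load the dump there, replacing the existing database.
        printf "ssh %s 'cd %s && bin/neo4j-admin load --from=%s --database=neo4j --force'\n" \
            "$host" "$neo4j_home" "$dump_path"
    done
}

# Example (home directory, dump path, and host names are placeholders):
#   print_seed_commands /opt/neo4j /backups/neo4j.dump core2 core3 replica1
```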

6. Start the cluster

On each cluster member, including the elected instance

Before continuing, make sure the following activities happened and were completed successfully:

  • Content of neo4j.conf is correct and required changes were applied on all instances.

  • Single migration event occurred on elected instance.

  • Backup (via neo4j-admin dump) of migrated store performed on the elected instance.

  • Backup of the migrated store was transferred to the remaining instances.

  • Store was loaded on the remaining instances (via neo4j-admin load).

  1. If everything on the list was successful, you can go ahead and start all instances of the cluster.

    bin/neo4j start

    or

    systemctl start neo4j

  2. If the migrated database is the default database, it should have been started automatically on instance startup and this step is not required. If the migrated database is not the default database, it is still in the STOPPED state. You now need to start the database. Run the following command in Neo4j Browser or Cypher Shell:

    Neo4j 4.0/4.1/4.2
    CREATE DATABASE <db_name>;
    Neo4j 4.3+
    CREATE DATABASE <db_name> OPTIONS { existingData : 'use', existingDataSeedInstance: '<seedInstanceId>'};

    Where <seedInstanceId> is the ID of the elected instance, which can be found by running CALL dbms.cluster.overview().

7. Recreate indexes

The final step is to recreate any indexes or constraints that were output by the neo4j-admin copy command. Change the active database to the newly migrated one and run the commands output by neo4j-admin copy:

CALL db.createIndex('index_5c0607ad', ['Person'], ['name'], 'native-btree-1.0', {`spatial.cartesian-3d.min`: [-1000000.0, -1000000.0, -1000000.0],`spatial.cartesian.min`: [-1000000.0, -1000000.0],`spatial.wgs-84.min`: [-180.0, -90.0],`spatial.cartesian-3d.max`: [1000000.0, 1000000.0, 1000000.0],`spatial.cartesian.max`: [1000000.0, 1000000.0],`spatial.wgs-84-3d.min`: [-180.0, -90.0, -1000000.0],`spatial.wgs-84-3d.max`: [180.0, 90.0, 1000000.0],`spatial.wgs-84.max`: [180.0, 90.0]})

8. Post-migration

8.1. Recreate user data

Neo4j 3.5.x stores user and role information in flat files located in the $NEO4J_HOME/data/dbms directory. Starting with Neo4j 4.0, this information is stored in the system database instead. If you were using native users, you need to recreate them. Go to the backed-up content of your old $NEO4J_HOME/data/dbms directory. The authentication data is found in the auth file, which is a colon-separated text file that looks like this:

neo4j:SHA256,1066956C2D4E46C810CA39AE218AAD128854F2C08E9E831C379958CBFA6FF17D,899F9D67F296746766848D92B325B29EAFD9AC93940257713BA7CF4CF2B166FF:

The first column contains the username and the second column contains the password information. Users can be recreated using the CREATE USER statement against the system database, such as:

CREATE USER neo4j SET ENCRYPTED PASSWORD '0,1066956C2D4E46C810CA39AE218AAD128854F2C08E9E831C379958CBFA6FF17D,899F9D67F296746766848D92B325B29EAFD9AC93940257713BA7CF4CF2B166FF' CHANGE NOT REQUIRED

Where the algorithm name at the start of the password information (SHA256 in the example above) is replaced by the character 0 (zero).
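The mapping from one auth file line to one statement can be sketched in shell as follows. This is only an illustration of the rule described above, assuming the layout user:algorithm,hash,salt:flags and treating an empty flags field as "no password change required"; the function name auth_line_to_cypher is hypothetical.

```shell
# Turn one line of the 3.5 auth file into a CREATE USER statement.
# Assumed layout: <user>:<algorithm>,<hash>,<salt>:<flags>
auth_line_to_cypher() {
    printf '%s\n' "$1" | awk -F: '{
        user = $1
        creds = $2
        # Replace the leading algorithm name with 0, as described above.
        sub(/^SHA-?256,/, "0,", creds)
        # An empty flags field is treated as "no password change required".
        change = ($3 == "") ? "CHANGE NOT REQUIRED" : "CHANGE REQUIRED"
        # \047 is the octal escape for a single quote.
        printf "CREATE USER %s SET ENCRYPTED PASSWORD \047%s\047 %s\n", user, creds, change
    }'
}

# Example, using the line shown above:
#   auth_line_to_cypher "$(head -n 1 auth)"
```

The generated statement still has to be run against the system database; the sketch does not connect to Neo4j.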

The role data is found in the roles file, which looks like this:

admin:neo4j

This can be recreated by running the following, again against the system database:

GRANT ROLE admin TO neo4j
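When a role has several members, each member needs its own GRANT ROLE statement. A minimal shell sketch of that expansion, assuming the members are listed after the colon and separated by commas, could look like this; the function name roles_line_to_cypher and the user names in the example are hypothetical.

```shell
# Turn one line of the 3.5 roles file (<role>:<user1>,<user2>,...) into
# GRANT ROLE statements, one per user. The function name is illustrative.
roles_line_to_cypher() {
    printf '%s\n' "$1" | awk -F: '{
        n = split($2, users, ",")
        for (i = 1; i <= n; i++) {
            # Skip empty entries, for example a role without members.
            if (users[i] != "") printf "GRANT ROLE %s TO %s\n", $1, users[i]
        }
    }'
}

# Example (user names are placeholders):
#   roles_line_to_cypher 'reader:alice,bob'
```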

You can use Neo4j to parse the auth and roles files. This will process the files and generate all CREATE USER and GRANT ROLE commands required to recreate all users and roles. To do this, you simply need to move both your backed-up auth and roles files to Neo4j’s /import directory. After that you can use the following two queries, one for users and the other for roles:

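Recreate all users
LOAD CSV FROM 'file:///auth' AS line
// line[0] = "<user>:<algorithm>", line[1] = hash, line[2] = "<salt>:<flags>"
WITH split(line[0], ":")[0] AS user, line[1] AS hash, split(line[2], ":") AS tail
// The encrypted password needs both hash and salt; empty flags mean no password change is required
WITH user, hash + "," + tail[0] AS pwd, CASE tail[1] WHEN "" THEN "NOT" ELSE "" END AS pwdChange
WITH "CREATE OR REPLACE USER " + user + " SET ENCRYPTED PASSWORD '0," + pwd + "' CHANGE " + pwdChange + " REQUIRED" AS cypher
RETURN *

Recreate all roles
LOAD CSV FROM 'file:///roles' AS line FIELDTERMINATOR ':'
// line[0] = role name, line[1] = comma-separated list of users
WITH line[0] AS role, split(line[1], ",") AS users
UNWIND users AS user
WITH "GRANT ROLE " + role + " TO " + user AS cypher
RETURN *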

Each of these queries returns a list of Cypher commands which, when executed against the system database, recreates all users and roles previously used in the Neo4j 3.5.x deployment.

8.2. Review the logs and metrics

It is advisable to review the logs and metrics to make sure everything looks good. If all went well, you should see error-free logs and correctly reported metrics.

8.3. Restart the server/cluster

It is advisable to restart the server/cluster one last time to clear any transient state and apply the latest configuration changes.

8.4. Reactivate external applications connecting to Neo4j

After the restart and confirmation that everything was successfully migrated and is healthy, you can proceed to reactivate any applications that connect to Neo4j. At this point, the Neo4j store migration is complete, and you need to focus on the application side, making sure that all your requests are being served and your application is in a healthy state.

8.5. Clean up space

You can clean up the disk space taken by the extra backups required for the migration.

8.6. Back up

It is recommended to perform a full backup, using an empty target directory.