Concepts

A composite database is a special type of database that gives access to partitioned data, or to several graphs, with a single Cypher query.

Composite databases do not store data independently. They contain database aliases that target local or remote databases (the so-called constituents). Local database aliases target databases within the same DBMS, while remote database aliases target databases in another Neo4j DBMS. For more information, see Managing database aliases in composite databases.

Composite databases are managed using Cypher administrative commands. For a detailed example of how to create a composite database and add database aliases to it, see Set up and query composite databases.
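As a minimal sketch of these administrative commands, the following creates a composite database with one local and one remote constituent. All names are hypothetical and chosen only for illustration: the composite database social, the target databases userdb and forumdb, the remote URL, and the credentials. The commands are assumed to be run against the system database by a user with the required privileges.

```cypher
// Hypothetical names throughout: social, userdb, forumdb, the URL, and the credentials.

// Create the composite database. It holds aliases, not data.
CREATE COMPOSITE DATABASE social;

// Local alias: targets a database within the same DBMS.
CREATE ALIAS social.users FOR DATABASE userdb;

// Remote alias: targets a database in another Neo4j DBMS.
// The credentials given here are the ones used to access the remote constituent.
CREATE ALIAS social.forum FOR DATABASE forumdb
  AT "neo4j+s://forum.example.com:7687"
  USER remote_reader
  PASSWORD "example_secret";

// List the aliases to check the result.
SHOW ALIASES FOR DATABASE;
```

After these commands, the constituents are addressed in queries as social.users and social.forum.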

Composite databases cannot guarantee compatibility between constituents from different Neo4j versions. Constituents from versions without breaking changes should work, except for newly added features. A new feature is available only if it is supported both by the DBMS where the composite database is defined and by all of its constituents.

Composite databases have the following characteristics:

Provide access to graphs found on other databases (local or remote).
Do not store data independently.
Can be deployed in standalone and cluster deployments.
Are managed using Cypher commands, such as CREATE COMPOSITE DATABASE and CREATE ALIAS.
Can be used with shards of an existing database created with the help of the neo4j-admin database copy command. See Sharding data with the copy command for details.
Use the existing user for local constituents and the user credentials defined by the remote aliases for remote constituents.
Do not support privilege, index, or constraint management commands. These must be defined on the constituent target database in the respective DBMS.
Allow only transactions with queries that read from multiple graphs, or read from multiple graphs and write to a single graph.
Do not support Neo4j embedded in Java applications. Composite databases can be used only in client/server mode, where users connect to a Neo4j DBMS from their client application or tool via the Bolt or HTTP protocol.
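To illustrate a query that reads from multiple graphs, here is a small sketch that counts the nodes in every constituent. It assumes the session is connected to a composite database and uses the built-in graph.names() and graph.byName() functions; the result aliases graphName and nodeCount are illustrative.

```cypher
// Run while connected to the composite database.
// graph.names() returns the names of all constituent graphs.
UNWIND graph.names() AS graphName
CALL {
  // Each execution of the subquery runs against one constituent.
  USE graph.byName(graphName)
  MATCH (n)
  RETURN count(n) AS nodeCount
}
RETURN graphName, nodeCount
ORDER BY graphName
```

This query only reads, so it may span any number of constituents. A transaction that also writes must direct all of its writes to a single graph.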

The main concepts to understand when working with composite databases are:

Data Federation: Data federation is when your data is in two disjoint graphs with different labels and relationship types. For example, you have data about users with their location and data about the users' posts on different forums, and you want to query them together.
Data Sharding: Data sharding is when you have two graphs that share the same model (same labels and relationship types) but contain different data. For example, you can deploy shards on separate servers, splitting the load on resources and storage. Or, you can deploy shards in different locations, to be able to manage them independently or split the load on network traffic. An existing database can be sharded with the help of the neo4j-admin database copy command. For an example, see Sharding data with the copy command.
Connecting data across graphs: Because relationships cannot span graphs, querying related data across them requires federating the graphs with a proxy node modeling pattern, in which nodes with a specific label are present in both federated domains.

In one of the graphs, nodes with that specific label contain all the data related to that label, while in the other graph, the same label is associated with a proxy node that only contains the <node>ID property. The <node>ID property allows you to link data across the graphs in this federation.
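The proxy node pattern can be sketched with a correlated subquery. The example assumes a hypothetical composite database with two constituents: social.users, where User nodes carry all user data, and social.forum, where User nodes are proxies that hold only the userID property and are connected to Post nodes. The labels, the WROTE relationship type, the property names, and the location value are all assumptions made for illustration.

```cypher
// Run while connected to the hypothetical composite database social.

// Step 1: read the full User nodes from the graph that owns the user data.
CALL {
  USE social.users
  MATCH (u:User)
  WHERE u.location = "London"
  RETURN u.userID AS userID, u.name AS name
}
// Step 2: for each user, find the proxy node in the other graph by its ID
// and follow the relationships that exist only there.
CALL {
  USE social.forum
  WITH userID
  MATCH (proxy:User {userID: userID})-[:WROTE]->(p:Post)
  RETURN p.title AS title
}
RETURN name, title
ORDER BY name, title
```

The only value that crosses the graph boundary is userID, which is why the proxy node needs no other properties. The importing WITH inside the second subquery follows the Neo4j 5 style; newer releases may prefer declaring imported variables in a scope clause on CALL instead.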

Figure 1. Data federation and sharding

For a step-by-step tutorial on setting up and using a composite database with federated and sharded data, see Tutorials → Setting up and using a composite database.