Metadata

Catalog and schema

All the methods on connection level dealing with metadata and information about the available content provide information in terms that are defined in the SQL standard, including catalogs and schemas.

In most relational databases, a catalog is equivalent to a specific database on a server or cluster, and the schema refers to the collection of tables, views and procedures in that catalog.

From the SQL 1992 standard (find an archived copy here):

(4.12) Catalogs are named collections of schemas in an SQL-environment. An SQL-environment contains zero or more catalogs. A catalog contains one or more schemas, but always contains a schema named INFORMATION_SCHEMA that contains the views and domains of the Information Schema.

This driver only supports a single catalog, which is equal to the Neo4j database to which the driver is connected to. Filtering on a catalog pattern is not supported in metadata queries, all values except literal null, the empty or blank string or the name of the current database will lead to an exception.

The same standard defines schemas as follows:

(4.11) An SQL-schema is a persistent descriptor that includes:

[…] the descriptor of every component of the SQL-schema.

In this International Standard, the term "schema" is used only in the sense of SQL-schema. Each component descriptor is either a domain descriptor, a base table descriptor, a view descriptor, an assertion descriptor, a privilege descriptor, a character set descriptor, a collation descriptor, or a translation descriptor. The persistent objects described by the descriptors are said to be owned by or to have been created by the <authorization identifier> of the schema.

We report the literal name public as schema for any result component of a metadata result set. We support querying for objects in any schema, however only literal null or public will potentially produce a result.

Labels will be reported as table objects with the TABLE_TYPE being literal TABLE.

Summary

  • Catalog: Always equals to the current database; filtering on anything else except an empty string will error.

  • Schema: Always public; filtering on public and literal null will yield result, anything else won’t.

  • Table descriptors: Reported as TABLE in the TABLE_TYPE column.

Labels to tables

The CLG and Langstar groups speak about "Node type combinations" and gravitate towards "open node type semantics" in the GQL standard:

Node type combinations is a less permissive form of open node type semantics. The idea of node type combinations is that nodes also conform to a graph type if they are not of one of the node types in the node type set of the graph type, but of a node type that is an intersection of (a subset of) the node type in a node type set of the graph type.

An example for their proposal can be found here.

This driver therefore compute node types in a similar way:

  • Single label nodes will map naturally to a table name, the single label will become the table name

    • The label name will be taken as is and will be case-sensitive. The labels Movie, movie, MOVIE will result in three tables in the metadata

    • This is in line with the default SQL-to-Cypher translation

  • Node type combinations will map to table names composed as label1_label2, sorting the labels alphabetically to make them independent of the order Neo4j returns them

  • Property sets for these node type combinations will then be computed

Primary keys

The driver uses available constraint information and will figure out if there is a single, unique constraint on a label. If that’s the case, the constrained property will be assumed to be the primary key. This will also work for unique constraints over multiple properties, in SQL lingo composite primary keys. If there is no unique constraint or more than one, we assume the v$id virtual columns for the elementId value to be primary keys.