SQL to Cypher translation

Introduction

The translation of queries from SQL to Cypher® is an optional feature of this driver and it consists of two parts:

  • The translator SPI, located in the module org.neo4j:neo4j-jdbc-translator-spi. It consists of two interfaces: SqlTranslatorFactory and the actual SqlTranslator.

  • A concrete implementation of this SPI, published as org.neo4j:neo4j-jdbc-translator-impl.

The latter is covered in "Using the default translator" and available in the "full bundle", described in Available bundles. The former is provided for two reasons: it allows us to distribute the driver with and without the bundled, default translator and allows you to run your custom translator.

Translators can be chained, and there can be as many translators on the classpath as you want. Their precedence is configurable, with our default implementation having the lowest precedence. Thus, you can for example have a custom translator that takes care of a fixed set of queries and, if it receives a query it cannot translate, it will pass it down to our implementation.

Translating arbitrary SQL queries to Cypher is an opinionated task, as there is no right way to map table names to graph objects: a table name can be used as-is as a label, or may be transformed into a singular form, etc. Mapping relationships is even trickier: should relationship types derive from a join table, a join column (in that case, which one?), or a foreign key?

We believe our assumptions are appropriate for various use cases and instead of providing configuration to cater for all scenarios, we offer the possibility to write your own translation layer. The driver will use the standard Java service loader mechanism to find an implementation of the SPI on the module- or classpath.

Some tools (like Tableau) use a class-loader that won’t let the driver use the standard Java service loader mechanism. For these scenarios, we provide an additional configuration property named translatorFactory. Set this to DEFAULT to directly load our default implementation or to a fully-qualified classname for any other factory. Be aware that either our default implementation or your custom one must be on the classpath.

Translating SQL to Cypher

There’s only one requirement to enable the SQL-to-Cypher translation: you have to have one module implementing the SPI on the classpath. This is automatically the case if you use the full-bundle (org.neo4j:neo4j-jdbc-full-bundle). In that case, you don’t need to add any other dependency. If you use the individual distribution or the "small" bundle org.neo4j:neo4j-jdbc-bundle, you must add the artifact org.neo4j:neo4j-jdbc-translator-impl.

The implementation will be automatically loaded. If you use the translation on a case-by-case basis, it will be lazily loaded (i.e no additional classes are touched or loaded into memory). If you configure automatic translation for all statements, the implementation will be eagerly loaded. There are no further configuration options with regard to loading the implementation.

On a case-by-case basis

The translator can be used on a case-by-case basis through the official JDBC API nativeSQL, which you find on the java.sql.Connection class. With the following imports:

import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;
import java.util.logging.Logger;

You just pass your SQL statement to nativeSQL and you will get Cypher back:

try (var connection = DriverManager.getConnection(url, username, password)) {
    var sql = connection.nativeSQL("SELECT * FROM Movie n");
    assert """
            MATCH (n:Movie)
            RETURN *""".equals(sql);
}

For all queries

If you open the connection to your Neo4j instance using enableSQLTranslation=true either as URL parameter or configuration property, all statements will be translated from SQL to Cypher. If you configure the driver in this way, the translator will be eagerly loaded.

var url = "jdbc:neo4j://localhost:7687?enableSQLTranslation=true";
try (var connection = DriverManager.getConnection(url, username, password);
        var stmnt = connection.createStatement();
        var result = stmnt.executeQuery("SELECT n.title FROM Movie n")) {
    while (result.next()) {
        LOGGER.info(result.getString("n.title"));
    }
}

Sometimes you may need to fall back to Cypher for some statements, either to use constructs that you cannot express with SQL, or because our default translator cannot handle your query. We offer a special comment that you can use as a hint in your statement to stop automatic translation: /*+ NEO4J FORCE_CYPHER */.

var url = "jdbc:neo4j://localhost:7687?enableSQLTranslation=true";
var query = """
        /*+ NEO4J FORCE_CYPHER */
        MATCH (:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(d:Stop)
            ((:Stop)-[:NEXT]->(:Stop)){1,3}
            (a:Stop)-[:CALLS_AT]->(:Station { name: 'Clapham Junction' })
        RETURN localtime(d.departs) AS departureTime,
            localtime(a.arrives) AS arrivalTime
        """;
try (var connection = DriverManager.getConnection(url, username, password);
        var stmnt = connection.createStatement();
        var result = stmnt.executeQuery(query)) {
    while (result.next()) {
        LOGGER.info(result.getTime("departureTime").toString());
    }
}

Possible error scenarios

A NoSuchElementException with a message of No SQL translators available will be thrown when there is no implementation of the SQL to Cypher translator available, and you either used java.sql.Connection.nativeSQL or enabled automatic translation. The exception will be thrown when you access the method or eagerly on opening a connection in the latter case.

Using the default translator

Supported SQL dialects

Our default translator uses the OSS parser from jOOQ, which supports a broad spectrum of SQL dialects already. We picked the generic, default dialect of jOOQ as our default dialect, but you can overwrite this in the SQL to Cypher configuration using the parameter s2c.sqlDialect with one of the supported dialects listed in configuration below. POSTGRES can be a good choice for several integrations.

Bear in mind though that any shortcomings in the translation are probably not due to a lack in the parser, but due to the lack of an obvious, semantically equivalent Cypher construct. That means we might be able to parse a certain piece of SQL, but are unable to translate in into something meaningful that Neo4j can understand without additional, contextual information.

Configuration

The default implementation provides a number of configuration settings. They must be prefixed with s2c in the URL or config options:

Name Meaning Default

parseNameCase

Whether to parse table names as is or not.

true

tableToLabelMappings

A map from table names to labels.

An empty map

joinColumnsToTypeMappings

A map from column names to relationship types.

An empty map

prettyPrint

Whether to format the generated Cypher or not.

true

alwaysEscapeNames

Whether to always escape names.

Unless explicitly configured false when pretty printing is on, otherwise true.

sqlDialect

Which dialect to use when parsing. Supported values are POSTGRES, SQLITE, MYSQL, H2, HSQLDB, DERBY and DEFAULT.

DEFAULT

The next few examples use the properties config to avoid terrible long URLs in this documentation, but all the attributes can be specified via URL as well.

Listing 1. Disable pretty printing; only escape if necessary; configure dedicated table mappings
var properties = new Properties();
properties.put("username", "neo4j");
properties.put("password", "verysecret");
properties.put("enableSQLTranslation", "true");
properties.put("s2c.prettyPrint", "false");
properties.put("s2c.alwaysEscapeNames", "false");
properties.put("s2c.tableToLabelMappings", "people:Person;movies:Movie;movie_actors:ACTED_IN");

var url = "jdbc:neo4j://localhost:7687";
var query = """
        SELECT p.name, m.title
        FROM people p
        JOIN movie_actors r ON r.person_id = p.id
        JOIN movies m ON m.id = r.person_id""";
try (var connection = DriverManager.getConnection(url, properties)) {
    var sql = connection.nativeSQL(query);
    assert "MATCH (p:Person)-[r:ACTED_IN]->(m:Movie) RETURN p.name, m.title".equals(sql);
}
Listing 2. Parse table names into upper case
var properties = new Properties();
properties.put("username", "neo4j");
properties.put("password", "verysecret");
properties.put("enableSQLTranslation", "true");
properties.put("s2c.parseNameCase", "UPPER");

var url = "jdbc:neo4j://localhost:7687";
var query = "SELECT * FROM people";
try (var connection = DriverManager.getConnection(url, properties)) {
    var sql = connection.nativeSQL(query);
    assert """
            MATCH (people:PEOPLE)
            RETURN *""".equals(sql);
}

Named parameter syntax in the SQL parser defaults to :name (such as supported by Oracle, JPA, Spring, a colon followed by a name). The following example changes that prefix to $ (the same prefix that Cypher uses):

Listing 3. Change parameters prefix and add mappings for join columns
var properties = new Properties();
properties.put("username", "neo4j");
properties.put("password", "verysecret");
properties.put("enableSQLTranslation", "true");
properties.put("s2c.parseNamedParamPrefix", "$");
properties.put("s2c.joinColumnsToTypeMappings", "people.movie_id:DIRECTED");

var url = "jdbc:neo4j://localhost:7687";
var query = """
        SELECT *
        FROM people p
        JOIN movies m ON m.id = p.movie_id
        WHERE p.name = $1
        """;
try (var connection = DriverManager.getConnection(url, properties)) {
    var sql = connection.nativeSQL(query);
    assert """
            MATCH (p:people)-[:DIRECTED]->(m:movies)
            WHERE p.name = $1
            RETURN *""".equals(sql);
}

This is helpful when a tool generates names like that and does not allow customization.

Supported statements

The following statements are all under tests and describe what you can expect from the default translation layer:

Translation concepts

Table names to labels

The most simple SELECT statement to translate is one without FROM clause, such as:

SELECT 1

It is equivalent to a Cypher RETURN:

RETURN 1

SELECT statements without JOIN clauses are pretty straightforward to translate. The challenge here is how to map the table name to labels:

  • We parse the SQL statement case-sensitive by default

  • Table names are mapped to node labels

  • Table aliases are used as identifiable symbolic names

SELECT t.a, t.b
FROM My_Table (1)
  AS t (2)
WHERE t.a = 1
1 Will be used as the label to match, as-is, i.e. My_Table
2 The table alias will become the node alias

The whole construct will be translated to

MATCH (t:My_Table)
WHERE t.a = 1
RETURN t.a, t.b

Table aliases are optional, if you omit them, we derive aliases from the labels and types. If you inspect the translated queries, we recommend using aliases, as this makes the queries better readable.

Star-Selects

A star- or *- select comes in different forms:

Unqualified

SELECT * FROM table

Qualified

SELECT t.* FROM table t

And a variant, selecting the relation itself: SELECT t FROM table t.

We make use of this to let you decide whether you want to return Neo4j nodes and relationships as entities, maps, or flattened to individual columns. The latter requires our translator to have access to the schema of the underlying Neo4j database. The following sections describe the use-cases.

Projecting individual properties

Don’t use a star-select but enumerate the properties:

SELECT m.title FROM Movie m

The table alias will be used as a symbolic name:

MATCH (m:Movie)
RETURN m.title;

You can omit the table alias:

SELECT title FROM Movie

The lower-cased table name will be the symbolic name:

MATCH (movie:Movie)
RETURN movie.title;

Accessing JDBC columns by name leads to code that is hard to maintain, as column renamings impact code as well. To avoid this, alias the column:

SELECT title AS title FROM Movie

So that it has a stable, well-known name:

MATCH (movie:Movie)
RETURN movie.title AS title;
Projecting all properties

A SELECT * statement gets translated differently depending on whether the connection to the Neo4j database is available.

SELECT * FROM Person p

If you are offline, you will get the following Cypher statement:

MATCH (p:Person) RETURN *

The above query will return one column (p) containing a Neo4j node. This is usually not what you expect in the relational world. If you run the translation online and Neo4j metadata can be retrieved, you will get a statement that flattens the properties of each node and relationship, plus their element IDs:

In case the Person node has properties born and name,

SELECT * FROM Person p

you will get this Cypher statement

MATCH (p:Person)
RETURN elementId(p) AS `v$id`,
       p.born AS born, p.name AS name

This works well with multiple tables as well (Movie has properties title and released):

SELECT * FROM Person p JOIN Movie m ON m.id = p.acted_in
MATCH (p:Person)-[acted_in:ACTED_IN]->(m:Movie)
RETURN elementId(p) AS `v$id`, p.born AS born, p.name AS name,
       elementId(m) AS `v$id1`, m.title AS title, m.released AS released

We append increasing numbers to column names to clashing ones (ex. with name and remark properties both in Movie and Person):

SELECT * FROM Person p JOIN Movie m ON m.id = p.acted_in

Note the increasing numbers per duplicate name:

MATCH (p:Person)-[acted_in:ACTED_IN]->(m:Movie)
RETURN elementId(p) AS `v$id`,
       p.born AS born, p.name AS name, p.remark AS remark,
       elementId(m) AS `v$id1`,
       m.name AS name1, m.released AS released, m.remark AS remark1

The following example uses a join-table to access relationships (we explain this later in this manual when discussing joins), and the flattening of properties works here as well:

SELECT *
FROM people p
JOIN movie_actors r ON r.person_id = p.id
JOIN movies m ON m.id = r.person_id
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN elementId(p) AS `v$id`,
       p.born AS born, p.name AS name,
       elementId(p) AS `v$person_id`,
       elementId(r) AS `v$id1`, r.role AS role,
       elementId(m) AS `v$movie_id`,
       elementId(m) AS `v$id2`,
       m.title AS title, m.released AS released

Listing 4. Ordering without specifying a table alias
SELECT * FROM Person p ORDER BY name ASC
MATCH (p:Person)
RETURN elementId(p) AS `v$id`,
       p.born AS born, p.name AS name
ORDER BY p.name

A qualified alias can be used as well. If no Neo4j metadata is available, you will get a map of properties of the node/relationship:

SELECT m.*, p.*
FROM Person p
JOIN Movie m ON m.id = p.acted_in

The corresponding columns must be downcast to a map in JDBC:

MATCH (p:Person)-[acted_in:ACTED_IN]->(m:Movie)
RETURN m{.*} AS m, p{.*} AS p

If we add more data (ex. born and name to Person), the qualified star will project all of them (note how we also project one single, known column from the Movie table):

SELECT p.*, m.title AS title
FROM Person p
JOIN Movie m ON m.id = p.acted_in
MATCH (p:Person)-[acted_in:ACTED_IN]->(m:Movie)
RETURN elementId(p) AS `v$id`, p.born AS born, p.name AS name, m.title AS title
Returning nodes and relationships

A statement that projects a table alias such as

SELECT m FROM Movie m

will result in a Cypher statement returning the matched node as node.

MATCH (m:Movie)
RETURN m;

A node can be aliased as well:

SELECT m AS node FROM Movie m
MATCH (m:Movie)
RETURN m AS node;

Un-aliased tables can be used as well:

SELECT movie FROM Movie
MATCH (movie:Movie)
RETURN movie;

Multiple entities are supported, too:

SELECT p, r, m FROM Person p
JOIN ACTED_IN r ON r.person_id = p.id
JOIN Movie m ON m.id = r.movie_id
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie) RETURN p, r, m

Comparing SQL with Cypher examples

The source of the following examples is: Comparing SQL with Cypher.

Find all Products
Select and Return Records

Select everything from the products table.

SELECT p.*
FROM products as p

Similarly, in Cypher, you just MATCH a simple pattern: all nodes with the label Product and RETURN them.

MATCH (p:Product)
RETURN p{.*} AS p

The above query will project all properties of the matched node. If you want to return the node itself, select it without using the asterisk:

SELECT p
FROM products as p
MATCH (p:Product)
RETURN p
Field Access, Ordering and Paging

It is more efficient to return only a subset of attributes, like ProductName and UnitPrice. And while we are at it, let’s also order by price and only return the 10 most expensive items. (Remember that labels, relationship-types and property-names are case sensitive in Neo4j.)

SELECT p.`productName`, p.`unitPrice`
FROM products as p
ORDER BY p.`unitPrice` DESC
LIMIT 10
MATCH (p:Product)
RETURN p.productName, p.unitPrice ORDER BY p.unitPrice DESC LIMIT 10

The default order direction will be translated as is:

SELECT * FROM Movies m ORDER BY m.title
MATCH (m:Movies)
RETURN * ORDER BY m.title
DISTINCT projections

The DISTINCT keyword for projections is handled:

SELECT DISTINCT m.released FROM Movies m
MATCH (m:Movies)
RETURN DISTINCT m.released

It works with * projections as well:

SELECT DISTINCT m.* FROM Movies m
MATCH (m:Movies)
RETURN DISTINCT m {.*} AS m

However, as the qualified asterisks will use metadata if available, the translation with a database connection is different:

SELECT DISTINCT m.* FROM Movies m
MATCH (m:Movies)
RETURN DISTINCT elementId(m) AS `v$id`, m.title AS title, m.released AS released

Note that each row includes the Neo4j element ID, making each row unique. This being said, the DISCTINCT clause is of limited use with the asterisk.

Expressions

Most SQL expressions have corresponding Cypher expressions and can be translated straightforward.

Literal Values

Literal values are 1:1 translations.

SELECT
    1, TRUE, FALSE, NULL, 'a'
RETURN 1, TRUE, FALSE, NULL, 'a'

Arithmetic expressions

Arithmetic expressions are 1:1 translations.

SELECT
    1 + 2,
    1 - 2,
    1 * 2,
    1 / 2,
    square(2)

Note that the underlying tech of the default translator uses Cypher-DSL internally, which will wrap arithmetic (and logical) expressions with parentheses:

RETURN
    (1 + 2),
    (1 - 2),
    (1 * 2),
    (1 / 2),
    (2 * 2)

Functions

Numeric functions

We can translate all numeric functions that are supported by Neo4j’s Cypher implementation: Mathematical functions - Numeric:

SELECT
    abs(1),
    ceil(1),
    floor(1),
    round(1),
    round(1, 1),
    sign(1)

will be translated to

RETURN
    abs(1),
    ceil(1),
    floor(1),
    round(1),
    round(1, 1),
    sign(1)
Logarithmic functions

Neo4j supports a broad range of logarithmic functions.

SELECT
    exp(1),
    ln(1),
    log(2, 1),
    log10(1),
    sqrt(1)

will be translated to

RETURN
    exp(1),
    log(1),
    (log(1) / log(2)),
    log10(1),
    sqrt(1)
Trigonometric functions

Calls to trigonometric functions

SELECT
    acos(1),
    asin(1),
    atan(1),
    atan2(1, 2),
    cos(1),
    cot(1),
    degrees(1),
    pi(),
    radians(1),
    sin(1),
    tan(1)

will be translated to the corresponding Neo4j functions:

RETURN
    acos(1),
    asin(1),
    atan(1),
    atan2(1, 2),
    cos(1),
    cot(1),
    degrees(1),
    pi(),
    radians(1),
    sin(1),
    tan(1)
String functions

The following string manipulations are guaranteed to work:

SELECT
    lower('abc'),
    cast(3 as varchar),
    trim(' abc '),
    length('abc'),
    left('abc', 2),
    ltrim(' abc '),
    replace('abc', 'b'),
    replace('abc', 'b', 'x'),
    reverse('abc'),
    right('abc', 2),
    rtrim(' abc '),
    substring('abc', 2 - 1),
    substring('abc', 2 - 1, 2),
    upper('abc')

and will be translated to Neo4j’s versions:

RETURN
    toLower('abc'),
    toString(3),
    trim(' abc '),
    size('abc'),
    left('abc', 2),
    ltrim(' abc '),
    replace('abc', 'b', NULL),
    replace('abc', 'b', 'x'),
    reverse('abc'),
    right('abc', 2),
    rtrim(' abc '),
    substring('abc', (2 - 1)),
    substring('abc', (2 - 1), 2),
    toUpper('abc')
Scalar functions

The input

SELECT
    coalesce(1, 2),
    coalesce(1, 2, 3),
    nvl(1, 2),
    cast('1' as boolean),
    cast(1 as float),
    cast(1 as double precision),
    cast(1 as real),
    cast(1 as tinyint),
    cast(1 as smallint),
    cast(1 as int),
    cast(1 as bigint)

will be translated to (see Scalar functions):

RETURN
    coalesce(1, 2),
    coalesce(1, 2, 3),
    coalesce(1, 2),
    toBoolean('1'),
    toFloat(1),
    toFloat(1),
    toFloat(1),
    toInteger(1),
    toInteger(1),
    toInteger(1),
    toInteger(1)

Query expressions

Several advanced SQL expressions are supported as well.

CASE simple

The simple CASE expressions

SELECT
    CASE 1 WHEN 2 THEN 3 END,
    CASE 1 WHEN 2 THEN 3 ELSE 4 END,
    CASE 1 WHEN 2 THEN 3 WHEN 4 THEN 5 END,
    CASE 1 WHEN 2 THEN 3 WHEN 4 THEN 5 ELSE 6 END
RETURN CASE 1 WHEN 2 THEN 3 END, CASE 1 WHEN 2 THEN 3 ELSE 4 END, CASE 1 WHEN 2 THEN 3 WHEN 4 THEN 5 END, CASE 1 WHEN 2 THEN 3 WHEN 4 THEN 5 ELSE 6 END

See Cypher → Conditional expressions (CASE) for more information.

CASE advanced

And CASE statement using a search:

SELECT
    CASE WHEN 1 = 2 THEN 3 END,
    CASE WHEN 1 = 2 THEN 3 ELSE 4 END,
    CASE WHEN 1 = 2 THEN 3 WHEN 4 = 5 THEN 6 END,
    CASE WHEN 1 = 2 THEN 3 WHEN 4 = 5 THEN 6 ELSE 7 END

will be translated to

RETURN
    CASE WHEN 1 = 2 THEN 3 END,
    CASE WHEN 1 = 2 THEN 3 ELSE 4 END,
    CASE WHEN 1 = 2 THEN 3 WHEN 4 = 5 THEN 6 END,
    CASE WHEN 1 = 2 THEN 3 WHEN 4 = 5 THEN 6 ELSE 7 END

See Cypher → Conditional expressions (CASE) for more information.

CASE abbreviations (which are not COALESCE or NVL)

The input

SELECT
    nullif(1, 2),
    nvl2(1, 2, 3)

will be translated to

RETURN
    CASE WHEN 1 = 2 THEN NULL ELSE 1 END,
    CASE WHEN 1 IS NOT NULL THEN 2 ELSE 3 END

Predicates

As with expressions a lot of logical SQL expressions and conditions used as predicates can be translated straightforward into Cypher predicates.

Conjunctions and disjunctions

Logical conjunctions and disjunctions are all supported.

SELECT 1 FROM p WHERE 1 = 1 AND 2 = 2 OR 3 = 3
MATCH (p:p)
WHERE ((1 = 1
    AND 2 = 2)
  OR 3 = 3)
RETURN 1

The input

SELECT 1 FROM p WHERE NOT 1 = 1 XOR 2 = 2

will be translated to

MATCH (p:p)
WHERE (NOT (1 = 1)
  XOR 2 = 2)
RETURN 1

Operators

Arithmetic operators

The input

SELECT 1 FROM p WHERE 1 = 1 AND 2 > 1 AND 1 < 2 AND 1 <= 2 AND 2 >= 1 AND 1 != 2

will be translated to

MATCH (p:p)
WHERE (1 = 1
  AND 2 > 1
  AND 1 < 2
  AND 1 <= 2
  AND 2 >= 1
  AND 1 <> 2)
RETURN 1
Between

Between in SQL is inclusive

SELECT 1 FROM p WHERE 2 BETWEEN 1 AND 3

and will be translated to (we can’t generate the shorter form (1 ⇐ 2 ⇐ 3) due to restrictions in the underlying generator):

MATCH (p:p)
WHERE (1 <= 2) AND (2 <= 3)
RETURN 1

SQL has a SYMMETRIC keyword for the BETWEEN clause, to indicate that you do not care which bound of the range is larger:

SELECT 1 FROM p WHERE 2 BETWEEN SYMMETRIC 3 AND 1

We translate this into a disjunction:

MATCH (p:p)
WHERE (3 <= 2) AND (2 <= 1) OR (1 <= 2) AND (2 <= 3)
RETURN 1

Logical row value expressions

The above examples are based on scalar expressions. Row value expressions will be translated as well:

SELECT 1
FROM p
WHERE (1, 2) = (3, 4)
OR (1, 2) < (3, 4)
OR (1, 2) <= (3, 4)
OR (1, 2, 3) <> (4, 5, 6)
OR (1, 2, 3) > (4, 5, 6)
OR (1, 2, 3) >= (4, 5, 6)

Leads to a semantically equivalent cypher:

MATCH (p:p)
WHERE 1 = 3 AND 2 = 4
OR (1 < 3 OR 1 = 3 AND 2 < 4)
OR (1 < 3 OR 1 = 3 AND 2 <= 4)
OR (1 != 4 AND 2 != 5 AND 3 != 6)
OR (1 > 4 OR 1 = 4 AND (2 > 5 OR 2 = 5 AND 3 > 6))
OR (1 > 4 OR 1 = 4 AND (2 > 5 OR 2 = 5 AND 3 >= 6))
RETURN 1

Null Handling

For scalar expressions

The input

SELECT 1 FROM p WHERE 1 IS NULL AND 2 IS NOT NULL

will be translated to

MATCH (p:p)
WHERE (1 IS NULL
  AND 2 IS NOT NULL)
RETURN 1
For row value expressions

The input

SELECT 1 FROM p WHERE (1, 2) IS NULL OR (3, 4) IS NOT NULL

will be translated to

MATCH (p:p)
WHERE
  (1 IS NULL AND 2 IS NULL)
  OR (3 IS NOT NULL AND 4 IS NOT NULL)
RETURN 1

LIKE operator

The LIKE operator

SELECT * FROM movies m WHERE m.title LIKE '%Matrix%' OR m.title LIKE 'M_trix'

will be translated into a regular expressions, replacing the % with .*:

MATCH (m:`movies`) WHERE m.title =~ '.*Matrix.*' OR m.title =~ 'M.trix'
RETURN *

Using joins to map relationships

On the surface, joins are relationships materialized in SQL (foreign keys are not). Sadly, it’s not as straightforward to map to Cypher. There are several implementation options:

  • When joining two tables on a column, take the left hand table column, use its name as relationship type, and treat it as outgoing from left to right.

  • When joining two tables with an intersection table (which it’s usually modeled in SQL for m:n relationships with attributes), use the name of that intersection table as relationship type.

We implemented some variants thereof, however we don’t claim their absolute usefulness in all cases.

1:n joins

Natural joins

SQL NATURAL joins are the easiest way to denote relationship names without having to do any mapping. A one-hope NATURAL JOIN will translate to an anonymous, wildcard relationship.

SELECT p, m FROM Person p
NATURAL JOIN Movie m
MATCH (p:Person)-->(m:Movie) RETURN p, m

NATURAL joins can be chained, and the connecting join table does not need to exist. This will be turned into a Neo4j relationship:

SELECT p.name, r.roles, m.* FROM Person p
NATURAL JOIN ACTED_IN r
NATURAL JOIN Movie m
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN p.name, r.roles,
       elementId(m) AS `v$id`, m.title AS title, m.released AS released
Simple join

Assume we configured the translator to use the following table mapping:

  • people mapped to label People

  • movies mapped to label Movie

With that in place, we translate

SELECT p.name, m.title
FROM people p
JOIN movies m ON m.id = p.directed

to

MATCH (p:Person)-[directed:DIRECTED]->(m:Movie)
RETURN p.name, m.title

DIRECTED is the uppercase version of the join column in the left table (p.directed).

We can add a join column mapping in the form of people.movie_id:DIRECTED if we have different column names:

SELECT p.name, m.title
FROM people p
JOIN movies m ON m.id = p.movie_id

to

MATCH (p:Person)-[directed:DIRECTED]->(m:Movie)
RETURN p.name, m.title
Using the ON clause

We used backticks here for the table and column names and no mapping.

SELECT p.name, m.title
FROM `Person` as p
JOIN `Movie` as m ON (m.id = p.`DIRECTED`)

The translation is same as before:

MATCH (p:Person)-[directed:DIRECTED]->(m:Movie)
RETURN p.name, m.title

m:n joins

An intersection table is a table that contains references to two other tables in the form of at least two columns. This construct is usually required in the relational model to create an m:n relationship. Such an auxiliary construct is not necessary in Neo4j. We can model as many outgoing and incoming relationships from one label to another, and they can also have properties. We can thus use that construct for our translator.

The following example uses a configured mapping as follows:

  • people mapped to label People

  • movies mapped to label Movie

  • movie_actors mapped to ACTED_IN

SELECT p.name, m.title
FROM people p (1)
JOIN movie_actors r ON r.person_id = p.id (2)
JOIN movies m ON m.id = r.person_id (3)
1 The table from which to map outgoing relationships
2 An intersection table, that is used again in the next JOIN clause
3 The final JOIN clause

We do no semantic analysis: the order of the joins matter, and will lead to the following query:

MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN p.name, m.title

Multiple joins will result in a chain of relationships:

SELECT p.name AS actor, d.name AS director, m.title
FROM people p
 JOIN movie_actors r ON r.person_id = p.id
 JOIN movies m ON m.id = r.person_id
 JOIN movie_directors r2 ON r2.movie_id = m.id
 JOIN people d ON r2.person_id = d.id
MATCH (p:`Person`)-[r:`ACTED_IN`]->(m:`Movie`)<-[r2:`DIRECTED`]-(d:`Person`)
RETURN p.name AS actor, d.name AS director, m.title

Notice how the direction of the DIRECTED relationship is defined by the order of the join columns.

DML statements

This section lists the supported Data-Manipulation-Language (DML) statements. Although a SELECT statement is technically DML as well, it is covered in Translation concepts.

Deleting nodes

Nodes can be deleted via the SQL DELETE statement.

For example, to unconditionally delete all person nodes:

DELETE FROM person
MATCH (person:person)
DELETE person

A WHERE clause can be added to prevent this:

DELETE FROM person
WHERE person.id = 1
MATCH (person:person)
WHERE person.id = 1
DELETE person

If you want to delete everything, but your tooling complains, just add a conditions that is always true:

DELETE FROM person
WHERE true
MATCH (person:person)
WHERE true
DELETE person

Of, the condition can also be that always evaluates to false, never deleting anything:

DELETE FROM person
WHERE false
MATCH (person:person)
WHERE false
DELETE person

Tables can be aliased, and the alias will be used in Cypher, too:

DELETE FROM person p
MATCH (p:person)
DELETE p

Aliasing tables is also supported in combination with specifying the label to which the table name is mapped. Using the same query with table_mappings=person:Person configured,

DELETE FROM person p

will be translated to

MATCH (p:Person)
DELETE p

You can use SQL TRUNCATE to DETACH DELETE nodes.

TRUNCATE TABLE people

will be translated to

MATCH (people:Person)
DETACH DELETE people

Inserting data

A single list of values with explicit columns and constant values can be inserted with a simple INSERT statement:

INSERT INTO People (first_name, last_name, born) VALUES ('Helge', 'Schneider', 1955)
CREATE (people:`Person` {first_name: 'Helge', last_name: 'Schneider', born: 1955})

All expressions, including parameters, are supported. Parameters will be named from 1 on upwards in Cypher.

INSERT INTO People (first_name, last_name, born) VALUES (?, ?, ?)
CREATE (people:`Person` {first_name: $1, last_name: $2, born: $3})

If you omit the column names on the insertion target, we generate names:

INSERT INTO People VALUES ('Helge', 'Schneider', 1955)

Note the unknown field xxx property names:

CREATE (people:`Person` {`unknown field 0`: 'Helge', `unknown field 1`: 'Schneider', `unknown field 2`: 1955})

The SQL VALUES clause actually supports lists of values:

INSERT INTO People (first_name, last_name, born) VALUES
    ('Helge', 'Schneider', 1955),
    ('Bela', 'B', 1962)

Those values will be translated into a Cypher array to be unwound in the Cypher statement. This is a great solution for batching inserts:

UNWIND [
  {first_name: 'Helge', last_name: 'Schneider', born: 1955},
  {first_name: 'Bela', last_name: 'B', born: 1962}]
AS properties
CREATE (people:`Person`)
SET people = properties

A returning clause is supported as well:

INSERT INTO People p (name) VALUES (?) RETURNING elementId(p)
CREATE (p:Person {name: $1}) RETURN elementId(p)

Upserts

We support a restricted range of "upserts" via the non-standard but pretty common ON DUPLICATE and ON CONFLICT SQL clauses. Upserts are translated into MERGE statements. While they work without constraints, you should really have uniqueness-constraints on the node properties you merge on, or Neo4j may create duplicates (see Understanding how merge works).

Upserts on all columns can happen via either ON DUPLICATE KEY IGNORE or ON CONFLICT IGNORE. While ON DUPLICATE KEY does offer upgrade options, it assumes the primary (or unique) key being violated to be known. Although this is most certainly the case in a relational system, this translation layer that runs without a database connection doesn’t know.

Listing 5. Upsert with ON DUPLICATE KEY IGNORE
INSERT INTO Movie(title, released) VALUES(?, ?) ON DUPLICATE KEY IGNORE
MERGE (movie:`Movie` {title: $1, released: $2})
Listing 6. Upsert with ON CONFLICT IGNORE
INSERT INTO actors(name, firstname) VALUES(?, ?) ON CONFLICT DO NOTHING
MERGE (actors:`Actor` {name: $1, firstname: $2})

If you want to define an action, you must use ON CONFLICT and specify the key you want to merge on.

INSERT INTO tbl(i, j, k) VALUES (1, 40, 700)
ON CONFLICT (i) DO UPDATE SET j = 0, k = 2 * EXCLUDED.k

Note how the special reference EXCLUDED can be used to refer to the values of columns that have not been part of the key. They will be reused with their values in the ON MATCH SET clause.

MERGE (tbl:`tbl` {i: 1})
ON CREATE SET tbl.j = 40, tbl.k = 700
ON MATCH SET tbl.j = 0, tbl.k = (2 * 700)

This works with parameters, too:

INSERT INTO tbl(i, j, k) VALUES (1, 2, ?)
ON CONFLICT (i) DO UPDATE SET j = EXCLUDED.k
MERGE (tbl:`tbl` {i: 1})
ON CREATE SET tbl.j = 2, tbl.k = $1
ON MATCH SET tbl.j = $1

It’s possible to just specify a concrete merge column instead of merging on all columns as well. It will be translated with ON CREATE:

INSERT INTO tbl(i, j, k) VALUES (1, 40, 700)
ON CONFLICT (i) DO NOTHING
MERGE (tbl:`tbl` {i: 1})
ON CREATE SET tbl.j = 40, tbl.k = 700

Using ON CONFLICT and specifying a key is the only way to insert multiple rows with a MERGE statement:

INSERT INTO People (first_name, last_name, born) VALUES
    ('Helge', 'Schneider', 1955),
    ('Bela', 'B', 1962)
ON CONFLICT(last_name) DO UPDATE SET born = EXCLUDED.born
UNWIND [{first_name: 'Helge', last_name: 'Schneider', born: 1955}, {first_name: 'Bela', last_name: 'B', born: 1962}] AS properties
MERGE (people:`People` {last_name: properties['last_name']})
ON CREATE SET
  people.first_name = properties.first_name,
  people.born = properties.born
ON MATCH SET people.born = properties['born']