SQL to Cypher translation
Introduction
The translation of queries from SQL to Cypher® is an optional feature of this driver and it consists of two parts:
-
The translator SPI, located in the module
org.neo4j:neo4j-jdbc-translator-spi
. It consists of two interfaces:SqlTranslatorFactory
and the actualSqlTranslator
. -
A concrete implementation of this SPI, published as
org.neo4j:neo4j-jdbc-translator-impl
.
The latter is covered in "Using the default translator" and available in the "full bundle", described in Available bundles. The former is provided for two reasons: it allows us to distribute the driver with and without the bundled, default translator and allows you to run your custom translator.
Translators can be chained, and there can be as many translators on the classpath as you want. Their precedence is configurable, with our default implementation having the lowest precedence. Thus, you can for example have a custom translator that takes care of a fixed set of queries and, if it receives a query it cannot translate, it will pass it down to our implementation.
Translating arbitrary SQL queries to Cypher is an opinionated task, as there is no right way to map table names to graph objects: a table name can be used as-is as a label, or may be transformed into a singular form, etc. Mapping relationships is even trickier: should relationship types derive from a join table, a join column (in that case, which one?), or a foreign key?
We believe our assumptions are appropriate for various use cases and instead of providing configuration to cater for all scenarios, we offer the possibility to write your own translation layer. The driver will use the standard Java service loader mechanism to find an implementation of the SPI on the module- or classpath.
Some tools (like Tableau) use a class-loader that won’t let the driver use the standard Java service loader mechanism.
For these scenarios, we provide an additional configuration property named translatorFactory .
Set this to DEFAULT to directly load our default implementation or to a fully-qualified classname for any other factory.
Be aware that either our default implementation or your custom one must be on the classpath.
|
Translating SQL to Cypher
There’s only one requirement to enable the SQL-to-Cypher translation: you have to have one module implementing the SPI on the classpath.
This is automatically the case if you use the full-bundle (org.neo4j:neo4j-jdbc-full-bundle
).
In that case, you don’t need to add any other dependency.
If you use the individual distribution or the "small" bundle org.neo4j:neo4j-jdbc-bundle
, you must add the artifact org.neo4j:neo4j-jdbc-translator-impl
.
The implementation will be automatically loaded. If you use the translation on a case-by-case basis, it will be lazily loaded (i.e no additional classes are touched or loaded into memory). If you configure automatic translation for all statements, the implementation will be eagerly loaded. There are no further configuration options with regard to loading the implementation.
On a case-by-case basis
The translator can be used on a case-by-case basis through the official JDBC API nativeSQL
, which you find on the java.sql.Connection
class.
With the following imports:
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;
import java.util.logging.Logger;
You just pass your SQL statement to nativeSQL
and you will get Cypher back:
try (var connection = DriverManager.getConnection(url, username, password)) {
var sql = connection.nativeSQL("SELECT * FROM Movie n");
assert """
MATCH (n:Movie)
RETURN *""".equals(sql);
}
For all queries
If you open the connection to your Neo4j instance using enableSQLTranslation=true
either as URL parameter or configuration property, all statements will be translated from SQL to Cypher.
If you configure the driver in this way, the translator will be eagerly loaded.
var url = "jdbc:neo4j://localhost:7687?enableSQLTranslation=true";
try (var connection = DriverManager.getConnection(url, username, password);
var stmnt = connection.createStatement();
var result = stmnt.executeQuery("SELECT n.title FROM Movie n")) {
while (result.next()) {
LOGGER.info(result.getString("n.title"));
}
}
Sometimes you may need to fall back to Cypher for some statements, either to use constructs that you cannot express with SQL, or because our default translator cannot handle your query.
We offer a special comment that you can use as a hint in your statement to stop automatic translation: /*+ NEO4J FORCE_CYPHER */
.
var url = "jdbc:neo4j://localhost:7687?enableSQLTranslation=true";
var query = """
/*+ NEO4J FORCE_CYPHER */
MATCH (:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(d:Stop)
((:Stop)-[:NEXT]->(:Stop)){1,3}
(a:Stop)-[:CALLS_AT]->(:Station { name: 'Clapham Junction' })
RETURN localtime(d.departs) AS departureTime,
localtime(a.arrives) AS arrivalTime
""";
try (var connection = DriverManager.getConnection(url, username, password);
var stmnt = connection.createStatement();
var result = stmnt.executeQuery(query)) {
while (result.next()) {
LOGGER.info(result.getTime("departureTime").toString());
}
}
Possible error scenarios
A NoSuchElementException
with a message of No SQL translators available
will be thrown when there is no implementation of the SQL to Cypher translator available, and you either used java.sql.Connection.nativeSQL
or enabled automatic translation. The exception will be thrown when you access the method or eagerly on opening a connection in the latter case.
Using the default translator
Supported SQL dialects
Our default translator uses the OSS parser from jOOQ, which supports a broad spectrum of SQL dialects already.
We picked the generic, default dialect of jOOQ as our default dialect, but you can overwrite this in the SQL to Cypher configuration using the parameter s2c.sqlDialect
with one of the supported dialects listed in configuration below.
POSTGRES
can be a good choice for several integrations.
Bear in mind though that any shortcomings in the translation are probably not due to a lack in the parser, but due to the lack of an obvious, semantically equivalent Cypher construct. That means we might be able to parse a certain piece of SQL, but are unable to translate in into something meaningful that Neo4j can understand without additional, contextual information.
Configuration
The default implementation provides a number of configuration settings.
They must be prefixed with s2c
in the URL or config options:
Name | Meaning | Default |
---|---|---|
|
Whether to parse table names as is or not. |
|
|
A map from table names to labels. |
An empty map |
|
A map from column names to relationship types. |
An empty map |
|
Whether to format the generated Cypher or not. |
|
|
Whether to always escape names. |
Unless explicitly configured |
|
Which dialect to use when parsing. Supported values are |
|
The next few examples use the properties
config to avoid terrible long URLs in this documentation, but all the attributes can be specified via URL as well.
var properties = new Properties();
properties.put("username", "neo4j");
properties.put("password", "verysecret");
properties.put("enableSQLTranslation", "true");
properties.put("s2c.prettyPrint", "false");
properties.put("s2c.alwaysEscapeNames", "false");
properties.put("s2c.tableToLabelMappings", "people:Person;movies:Movie;movie_actors:ACTED_IN");
var url = "jdbc:neo4j://localhost:7687";
var query = """
SELECT p.name, m.title
FROM people p
JOIN movie_actors r ON r.person_id = p.id
JOIN movies m ON m.id = r.person_id""";
try (var connection = DriverManager.getConnection(url, properties)) {
var sql = connection.nativeSQL(query);
assert "MATCH (p:Person)-[r:ACTED_IN]->(m:Movie) RETURN p.name, m.title".equals(sql);
}
var properties = new Properties();
properties.put("username", "neo4j");
properties.put("password", "verysecret");
properties.put("enableSQLTranslation", "true");
properties.put("s2c.parseNameCase", "UPPER");
var url = "jdbc:neo4j://localhost:7687";
var query = "SELECT * FROM people";
try (var connection = DriverManager.getConnection(url, properties)) {
var sql = connection.nativeSQL(query);
assert """
MATCH (people:PEOPLE)
RETURN *""".equals(sql);
}
Named parameter syntax in the SQL parser defaults to :name
(such as supported by Oracle, JPA, Spring, a colon followed by a name).
The following example changes that prefix to $
(the same prefix that Cypher uses):
var properties = new Properties();
properties.put("username", "neo4j");
properties.put("password", "verysecret");
properties.put("enableSQLTranslation", "true");
properties.put("s2c.parseNamedParamPrefix", "$");
properties.put("s2c.joinColumnsToTypeMappings", "people.movie_id:DIRECTED");
var url = "jdbc:neo4j://localhost:7687";
var query = """
SELECT *
FROM people p
JOIN movies m ON m.id = p.movie_id
WHERE p.name = $1
""";
try (var connection = DriverManager.getConnection(url, properties)) {
var sql = connection.nativeSQL(query);
assert """
MATCH (p:people)-[:DIRECTED]->(m:movies)
WHERE p.name = $1
RETURN *""".equals(sql);
}
This is helpful when a tool generates names like that and does not allow customization.
Supported statements
The following statements are all under tests and describe what you can expect from the default translation layer:
Translation concepts
Table names to labels
The most simple SELECT
statement to translate is one without FROM
clause, such as:
SELECT 1
It is equivalent to a Cypher RETURN
:
RETURN 1
SELECT
statements without JOIN
clauses are pretty straightforward to translate.
The challenge here is how to map the table name to labels:
-
We parse the SQL statement case-sensitive by default
-
Table names are mapped to node labels
-
Table aliases are used as identifiable symbolic names
SELECT t.a, t.b
FROM My_Table (1)
AS t (2)
WHERE t.a = 1
1 | Will be used as the label to match, as-is, i.e. My_Table |
2 | The table alias will become the node alias |
The whole construct will be translated to
MATCH (t:My_Table)
WHERE t.a = 1
RETURN t.a, t.b
Table aliases are optional, if you omit them, we derive aliases from the labels and types. If you inspect the translated queries, we recommend using aliases, as this makes the queries better readable.
Star-Selects
A star- or *
- select comes in different forms:
- Unqualified
-
SELECT * FROM table
- Qualified
-
SELECT t.* FROM table t
And a variant, selecting the relation itself: SELECT t FROM table t
.
We make use of this to let you decide whether you want to return Neo4j nodes and relationships as entities, maps, or flattened to individual columns. The latter requires our translator to have access to the schema of the underlying Neo4j database. The following sections describe the use-cases.
Projecting individual properties
Don’t use a star-select but enumerate the properties:
SELECT m.title FROM Movie m
The table alias will be used as a symbolic name:
MATCH (m:Movie)
RETURN m.title;
You can omit the table alias:
SELECT title FROM Movie
The lower-cased table name will be the symbolic name:
MATCH (movie:Movie)
RETURN movie.title;
Accessing JDBC columns by name leads to code that is hard to maintain, as column renamings impact code as well. To avoid this, alias the column:
SELECT title AS title FROM Movie
So that it has a stable, well-known name:
MATCH (movie:Movie)
RETURN movie.title AS title;
Projecting all properties
A SELECT *
statement gets translated differently depending on whether the connection to the Neo4j database is available.
SELECT * FROM Person p
If you are offline, you will get the following Cypher statement:
MATCH (p:Person) RETURN *
The above query will return one column (p
) containing a Neo4j node.
This is usually not what you expect in the relational world.
If you run the translation online and Neo4j metadata can be retrieved,
you will get a statement that flattens the properties of each node and relationship, plus their element IDs:
In case the Person
node has properties born
and name
,
SELECT * FROM Person p
you will get this Cypher statement
MATCH (p:Person)
RETURN elementId(p) AS `v$id`,
p.born AS born, p.name AS name
This works well with multiple tables as well (Movie
has properties title
and released
):
SELECT * FROM Person p JOIN Movie m ON m.id = p.acted_in
MATCH (p:Person)-[acted_in:ACTED_IN]->(m:Movie)
RETURN elementId(p) AS `v$id`, p.born AS born, p.name AS name,
elementId(m) AS `v$id1`, m.title AS title, m.released AS released
We append increasing numbers to column names to clashing ones (ex. with name
and remark
properties both in Movie
and Person
):
SELECT * FROM Person p JOIN Movie m ON m.id = p.acted_in
Note the increasing numbers per duplicate name:
MATCH (p:Person)-[acted_in:ACTED_IN]->(m:Movie)
RETURN elementId(p) AS `v$id`,
p.born AS born, p.name AS name, p.remark AS remark,
elementId(m) AS `v$id1`,
m.name AS name1, m.released AS released, m.remark AS remark1
The following example uses a join-table to access relationships (we explain this later in this manual when discussing joins), and the flattening of properties works here as well:
SELECT *
FROM people p
JOIN movie_actors r ON r.person_id = p.id
JOIN movies m ON m.id = r.person_id
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN elementId(p) AS `v$id`,
p.born AS born, p.name AS name,
elementId(p) AS `v$person_id`,
elementId(r) AS `v$id1`, r.role AS role,
elementId(m) AS `v$movie_id`,
elementId(m) AS `v$id2`,
m.title AS title, m.released AS released
SELECT * FROM Person p ORDER BY name ASC
MATCH (p:Person)
RETURN elementId(p) AS `v$id`,
p.born AS born, p.name AS name
ORDER BY p.name
A qualified alias can be used as well. If no Neo4j metadata is available, you will get a map of properties of the node/relationship:
SELECT m.*, p.*
FROM Person p
JOIN Movie m ON m.id = p.acted_in
The corresponding columns must be downcast to a map in JDBC:
MATCH (p:Person)-[acted_in:ACTED_IN]->(m:Movie)
RETURN m{.*} AS m, p{.*} AS p
If we add more data (ex. born
and name
to Person
), the qualified star will project all of them (note how we also project one single, known column from the Movie
table):
SELECT p.*, m.title AS title
FROM Person p
JOIN Movie m ON m.id = p.acted_in
MATCH (p:Person)-[acted_in:ACTED_IN]->(m:Movie)
RETURN elementId(p) AS `v$id`, p.born AS born, p.name AS name, m.title AS title
Returning nodes and relationships
A statement that projects a table alias such as
SELECT m FROM Movie m
will result in a Cypher statement returning the matched node as node.
MATCH (m:Movie)
RETURN m;
A node can be aliased as well:
SELECT m AS node FROM Movie m
MATCH (m:Movie)
RETURN m AS node;
Un-aliased tables can be used as well:
SELECT movie FROM Movie
MATCH (movie:Movie)
RETURN movie;
Multiple entities are supported, too:
SELECT p, r, m FROM Person p
JOIN ACTED_IN r ON r.person_id = p.id
JOIN Movie m ON m.id = r.movie_id
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie) RETURN p, r, m
Comparing SQL with Cypher examples
The source of the following examples is: Comparing SQL with Cypher.
Select and Return Records
Select everything from the products
table.
SELECT p.*
FROM products as p
Similarly, in Cypher, you just MATCH
a simple pattern: all nodes with the label Product
and RETURN
them.
MATCH (p:Product)
RETURN p{.*} AS p
The above query will project all properties of the matched node. If you want to return the node itself, select it without using the asterisk:
SELECT p
FROM products as p
MATCH (p:Product)
RETURN p
Field Access, Ordering and Paging
It is more efficient to return only a subset of attributes, like ProductName
and UnitPrice
.
And while we are at it, let’s also order by price and only return the 10 most expensive items.
(Remember that labels, relationship-types and property-names are case sensitive in Neo4j.)
SELECT p.`productName`, p.`unitPrice`
FROM products as p
ORDER BY p.`unitPrice` DESC
LIMIT 10
MATCH (p:Product)
RETURN p.productName, p.unitPrice ORDER BY p.unitPrice DESC LIMIT 10
The default order direction will be translated as is:
SELECT * FROM Movies m ORDER BY m.title
MATCH (m:Movies)
RETURN * ORDER BY m.title
DISTINCT
projections
The DISTINCT
keyword for projections is handled:
SELECT DISTINCT m.released FROM Movies m
MATCH (m:Movies)
RETURN DISTINCT m.released
It works with *
projections as well:
SELECT DISTINCT m.* FROM Movies m
MATCH (m:Movies)
RETURN DISTINCT m {.*} AS m
However, as the qualified asterisks will use metadata if available, the translation with a database connection is different:
SELECT DISTINCT m.* FROM Movies m
MATCH (m:Movies)
RETURN DISTINCT elementId(m) AS `v$id`, m.title AS title, m.released AS released
Note that each row includes the Neo4j element ID, making each row unique.
This being said, the DISCTINCT
clause is of limited use with the asterisk.
Expressions
Most SQL expressions have corresponding Cypher expressions and can be translated straightforward.
Literal Values
Literal values are 1:1 translations.
SELECT
1, TRUE, FALSE, NULL, 'a'
RETURN 1, TRUE, FALSE, NULL, 'a'
Arithmetic expressions
Arithmetic expressions are 1:1 translations.
SELECT
1 + 2,
1 - 2,
1 * 2,
1 / 2,
square(2)
Note that the underlying tech of the default translator uses Cypher-DSL internally, which will wrap arithmetic (and logical) expressions with parentheses:
RETURN
(1 + 2),
(1 - 2),
(1 * 2),
(1 / 2),
(2 * 2)
Functions
Numeric functions
We can translate all numeric functions that are supported by Neo4j’s Cypher implementation: Mathematical functions - Numeric:
SELECT
abs(1),
ceil(1),
floor(1),
round(1),
round(1, 1),
sign(1)
will be translated to
RETURN
abs(1),
ceil(1),
floor(1),
round(1),
round(1, 1),
sign(1)
Logarithmic functions
Neo4j supports a broad range of logarithmic functions.
SELECT
exp(1),
ln(1),
log(2, 1),
log10(1),
sqrt(1)
will be translated to
RETURN
exp(1),
log(1),
(log(1) / log(2)),
log10(1),
sqrt(1)
Trigonometric functions
Calls to trigonometric functions
SELECT
acos(1),
asin(1),
atan(1),
atan2(1, 2),
cos(1),
cot(1),
degrees(1),
pi(),
radians(1),
sin(1),
tan(1)
will be translated to the corresponding Neo4j functions:
RETURN
acos(1),
asin(1),
atan(1),
atan2(1, 2),
cos(1),
cot(1),
degrees(1),
pi(),
radians(1),
sin(1),
tan(1)
String functions
The following string manipulations are guaranteed to work:
SELECT
lower('abc'),
cast(3 as varchar),
trim(' abc '),
length('abc'),
left('abc', 2),
ltrim(' abc '),
replace('abc', 'b'),
replace('abc', 'b', 'x'),
reverse('abc'),
right('abc', 2),
rtrim(' abc '),
substring('abc', 2 - 1),
substring('abc', 2 - 1, 2),
upper('abc')
and will be translated to Neo4j’s versions:
RETURN
toLower('abc'),
toString(3),
trim(' abc '),
size('abc'),
left('abc', 2),
ltrim(' abc '),
replace('abc', 'b', NULL),
replace('abc', 'b', 'x'),
reverse('abc'),
right('abc', 2),
rtrim(' abc '),
substring('abc', (2 - 1)),
substring('abc', (2 - 1), 2),
toUpper('abc')
Scalar functions
The input
SELECT
coalesce(1, 2),
coalesce(1, 2, 3),
nvl(1, 2),
cast('1' as boolean),
cast(1 as float),
cast(1 as double precision),
cast(1 as real),
cast(1 as tinyint),
cast(1 as smallint),
cast(1 as int),
cast(1 as bigint)
will be translated to (see Scalar functions):
RETURN
coalesce(1, 2),
coalesce(1, 2, 3),
coalesce(1, 2),
toBoolean('1'),
toFloat(1),
toFloat(1),
toFloat(1),
toInteger(1),
toInteger(1),
toInteger(1),
toInteger(1)
Query expressions
Several advanced SQL expressions are supported as well.
CASE
simple
The simple CASE
expressions
SELECT
CASE 1 WHEN 2 THEN 3 END,
CASE 1 WHEN 2 THEN 3 ELSE 4 END,
CASE 1 WHEN 2 THEN 3 WHEN 4 THEN 5 END,
CASE 1 WHEN 2 THEN 3 WHEN 4 THEN 5 ELSE 6 END
RETURN CASE 1 WHEN 2 THEN 3 END, CASE 1 WHEN 2 THEN 3 ELSE 4 END, CASE 1 WHEN 2 THEN 3 WHEN 4 THEN 5 END, CASE 1 WHEN 2 THEN 3 WHEN 4 THEN 5 ELSE 6 END
See Cypher → Conditional expressions (CASE) for more information.
CASE
advanced
And CASE
statement using a search:
SELECT
CASE WHEN 1 = 2 THEN 3 END,
CASE WHEN 1 = 2 THEN 3 ELSE 4 END,
CASE WHEN 1 = 2 THEN 3 WHEN 4 = 5 THEN 6 END,
CASE WHEN 1 = 2 THEN 3 WHEN 4 = 5 THEN 6 ELSE 7 END
will be translated to
RETURN
CASE WHEN 1 = 2 THEN 3 END,
CASE WHEN 1 = 2 THEN 3 ELSE 4 END,
CASE WHEN 1 = 2 THEN 3 WHEN 4 = 5 THEN 6 END,
CASE WHEN 1 = 2 THEN 3 WHEN 4 = 5 THEN 6 ELSE 7 END
See Cypher → Conditional expressions (CASE) for more information.
Predicates
As with expressions a lot of logical SQL expressions and conditions used as predicates can be translated straightforward into Cypher predicates.
Conjunctions and disjunctions
Logical conjunctions and disjunctions are all supported.
SELECT 1 FROM p WHERE 1 = 1 AND 2 = 2 OR 3 = 3
MATCH (p:p)
WHERE ((1 = 1
AND 2 = 2)
OR 3 = 3)
RETURN 1
The input
SELECT 1 FROM p WHERE NOT 1 = 1 XOR 2 = 2
will be translated to
MATCH (p:p)
WHERE (NOT (1 = 1)
XOR 2 = 2)
RETURN 1
Operators
Arithmetic operators
The input
SELECT 1 FROM p WHERE 1 = 1 AND 2 > 1 AND 1 < 2 AND 1 <= 2 AND 2 >= 1 AND 1 != 2
will be translated to
MATCH (p:p)
WHERE (1 = 1
AND 2 > 1
AND 1 < 2
AND 1 <= 2
AND 2 >= 1
AND 1 <> 2)
RETURN 1
Between
Between in SQL is inclusive
SELECT 1 FROM p WHERE 2 BETWEEN 1 AND 3
and will be translated to (we can’t generate the shorter form (1 ⇐ 2 ⇐ 3
) due to restrictions in the underlying generator):
MATCH (p:p)
WHERE (1 <= 2) AND (2 <= 3)
RETURN 1
SQL has a SYMMETRIC
keyword for the BETWEEN
clause, to indicate that you do not care which bound of the range is larger:
SELECT 1 FROM p WHERE 2 BETWEEN SYMMETRIC 3 AND 1
We translate this into a disjunction:
MATCH (p:p)
WHERE (3 <= 2) AND (2 <= 1) OR (1 <= 2) AND (2 <= 3)
RETURN 1
Logical row value expressions
The above examples are based on scalar expressions. Row value expressions will be translated as well:
SELECT 1
FROM p
WHERE (1, 2) = (3, 4)
OR (1, 2) < (3, 4)
OR (1, 2) <= (3, 4)
OR (1, 2, 3) <> (4, 5, 6)
OR (1, 2, 3) > (4, 5, 6)
OR (1, 2, 3) >= (4, 5, 6)
Leads to a semantically equivalent cypher:
MATCH (p:p)
WHERE 1 = 3 AND 2 = 4
OR (1 < 3 OR 1 = 3 AND 2 < 4)
OR (1 < 3 OR 1 = 3 AND 2 <= 4)
OR (1 != 4 AND 2 != 5 AND 3 != 6)
OR (1 > 4 OR 1 = 4 AND (2 > 5 OR 2 = 5 AND 3 > 6))
OR (1 > 4 OR 1 = 4 AND (2 > 5 OR 2 = 5 AND 3 >= 6))
RETURN 1
Using joins to map relationships
On the surface, joins are relationships materialized in SQL (foreign keys are not). Sadly, it’s not as straightforward to map to Cypher. There are several implementation options:
-
When joining two tables on a column, take the left hand table column, use its name as relationship type, and treat it as outgoing from left to right.
-
When joining two tables with an intersection table (which it’s usually modeled in SQL for
m:n
relationships with attributes), use the name of that intersection table as relationship type.
We implemented some variants thereof, however we don’t claim their absolute usefulness in all cases.
1:n joins
Natural joins
SQL NATURAL
joins are the easiest way to denote relationship names without having to do any mapping.
A one-hope NATURAL JOIN
will translate to an anonymous, wildcard relationship.
SELECT p, m FROM Person p
NATURAL JOIN Movie m
MATCH (p:Person)-->(m:Movie) RETURN p, m
NATURAL
joins can be chained, and the connecting join table does not need to exist.
This will be turned into a Neo4j relationship:
SELECT p.name, r.roles, m.* FROM Person p
NATURAL JOIN ACTED_IN r
NATURAL JOIN Movie m
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN p.name, r.roles,
elementId(m) AS `v$id`, m.title AS title, m.released AS released
Simple join
Assume we configured the translator to use the following table mapping:
-
people
mapped to labelPeople
-
movies
mapped to labelMovie
With that in place, we translate
SELECT p.name, m.title
FROM people p
JOIN movies m ON m.id = p.directed
to
MATCH (p:Person)-[directed:DIRECTED]->(m:Movie)
RETURN p.name, m.title
DIRECTED
is the uppercase version of the join column in the left table (p.directed
).
We can add a join column mapping in the form of people.movie_id:DIRECTED
if we have different column names:
SELECT p.name, m.title
FROM people p
JOIN movies m ON m.id = p.movie_id
to
MATCH (p:Person)-[directed:DIRECTED]->(m:Movie)
RETURN p.name, m.title
m:n joins
An intersection table is a table that contains references to two other tables in the form of at least two columns.
This construct is usually required in the relational model to create an m:n
relationship.
Such an auxiliary construct is not necessary in Neo4j.
We can model as many outgoing and incoming relationships from one label to another, and they can also have properties.
We can thus use that construct for our translator.
The following example uses a configured mapping as follows:
-
people
mapped to labelPeople
-
movies
mapped to labelMovie
-
movie_actors
mapped toACTED_IN
SELECT p.name, m.title
FROM people p (1)
JOIN movie_actors r ON r.person_id = p.id (2)
JOIN movies m ON m.id = r.person_id (3)
1 | The table from which to map outgoing relationships |
2 | An intersection table, that is used again in the next JOIN clause |
3 | The final JOIN clause |
We do no semantic analysis: the order of the joins matter, and will lead to the following query:
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN p.name, m.title
Multiple joins will result in a chain of relationships:
SELECT p.name AS actor, d.name AS director, m.title
FROM people p
JOIN movie_actors r ON r.person_id = p.id
JOIN movies m ON m.id = r.person_id
JOIN movie_directors r2 ON r2.movie_id = m.id
JOIN people d ON r2.person_id = d.id
MATCH (p:`Person`)-[r:`ACTED_IN`]->(m:`Movie`)<-[r2:`DIRECTED`]-(d:`Person`)
RETURN p.name AS actor, d.name AS director, m.title
Notice how the direction of the DIRECTED
relationship is defined by the order of the join columns.
DML statements
This section lists the supported Data-Manipulation-Language (DML) statements.
Although a SELECT
statement is technically DML as well, it is covered in Translation concepts.
Deleting nodes
Nodes can be deleted via the SQL DELETE
statement.
For example, to unconditionally delete all person
nodes:
DELETE FROM person
MATCH (person:person)
DELETE person
A WHERE
clause can be added to prevent this:
DELETE FROM person
WHERE person.id = 1
MATCH (person:person)
WHERE person.id = 1
DELETE person
If you want to delete everything, but your tooling complains, just add a conditions that is always true
:
DELETE FROM person
WHERE true
MATCH (person:person)
WHERE true
DELETE person
Of, the condition can also be that always evaluates to false
, never deleting anything:
DELETE FROM person
WHERE false
MATCH (person:person)
WHERE false
DELETE person
Tables can be aliased, and the alias will be used in Cypher, too:
DELETE FROM person p
MATCH (p:person)
DELETE p
Aliasing tables is also supported in combination with specifying the label to which the table name is mapped.
Using the same query with table_mappings=person:Person
configured,
DELETE FROM person p
will be translated to
MATCH (p:Person)
DELETE p
Deleting nodes and their related nodes
You can use SQL TRUNCATE
to DETACH DELETE
nodes.
TRUNCATE TABLE people
will be translated to
MATCH (people:Person)
DETACH DELETE people
Inserting data
A single list of values with explicit columns and constant values can be inserted with a simple INSERT
statement:
INSERT INTO People (first_name, last_name, born) VALUES ('Helge', 'Schneider', 1955)
CREATE (people:`Person` {first_name: 'Helge', last_name: 'Schneider', born: 1955})
All expressions, including parameters, are supported. Parameters will be named from 1 on upwards in Cypher.
INSERT INTO People (first_name, last_name, born) VALUES (?, ?, ?)
CREATE (people:`Person` {first_name: $1, last_name: $2, born: $3})
If you omit the column names on the insertion target, we generate names:
INSERT INTO People VALUES ('Helge', 'Schneider', 1955)
Note the unknown field xxx
property names:
CREATE (people:`Person` {`unknown field 0`: 'Helge', `unknown field 1`: 'Schneider', `unknown field 2`: 1955})
The SQL VALUES
clause actually supports lists of values:
INSERT INTO People (first_name, last_name, born) VALUES
('Helge', 'Schneider', 1955),
('Bela', 'B', 1962)
Those values will be translated into a Cypher array to be unwound in the Cypher statement. This is a great solution for batching inserts:
UNWIND [
{first_name: 'Helge', last_name: 'Schneider', born: 1955},
{first_name: 'Bela', last_name: 'B', born: 1962}]
AS properties
CREATE (people:`Person`)
SET people = properties
A returning clause is supported as well:
INSERT INTO People p (name) VALUES (?) RETURNING elementId(p)
CREATE (p:Person {name: $1}) RETURN elementId(p)
Upserts
We support a restricted range of "upserts" via the non-standard but pretty common ON DUPLICATE
and ON CONFLICT
SQL clauses.
Upserts are translated into MERGE
statements.
While they work without constraints, you should really have uniqueness-constraints on the node properties you merge on, or Neo4j may create duplicates (see Understanding how merge works).
Upserts on all columns can happen via either ON DUPLICATE KEY IGNORE
or ON CONFLICT IGNORE
.
While ON DUPLICATE KEY
does offer upgrade options, it assumes the primary (or unique) key being violated to be known.
Although this is most certainly the case in a relational system, this translation layer that runs without a database connection doesn’t know.
ON DUPLICATE KEY IGNORE
INSERT INTO Movie(title, released) VALUES(?, ?) ON DUPLICATE KEY IGNORE
MERGE (movie:`Movie` {title: $1, released: $2})
ON CONFLICT IGNORE
INSERT INTO actors(name, firstname) VALUES(?, ?) ON CONFLICT DO NOTHING
MERGE (actors:`Actor` {name: $1, firstname: $2})
If you want to define an action, you must use ON CONFLICT
and specify the key you want to merge on.
INSERT INTO tbl(i, j, k) VALUES (1, 40, 700)
ON CONFLICT (i) DO UPDATE SET j = 0, k = 2 * EXCLUDED.k
Note how the special reference EXCLUDED
can be used to refer to the values of columns that have not been part of the key.
They will be reused with their values in the ON MATCH SET
clause.
MERGE (tbl:`tbl` {i: 1})
ON CREATE SET tbl.j = 40, tbl.k = 700
ON MATCH SET tbl.j = 0, tbl.k = (2 * 700)
This works with parameters, too:
INSERT INTO tbl(i, j, k) VALUES (1, 2, ?)
ON CONFLICT (i) DO UPDATE SET j = EXCLUDED.k
MERGE (tbl:`tbl` {i: 1})
ON CREATE SET tbl.j = 2, tbl.k = $1
ON MATCH SET tbl.j = $1
It’s possible to just specify a concrete merge column instead of merging on all columns as well.
It will be translated with ON CREATE
:
INSERT INTO tbl(i, j, k) VALUES (1, 40, 700)
ON CONFLICT (i) DO NOTHING
MERGE (tbl:`tbl` {i: 1})
ON CREATE SET tbl.j = 40, tbl.k = 700
Using ON CONFLICT
and specifying a key is the only way to insert multiple rows with a MERGE
statement:
INSERT INTO People (first_name, last_name, born) VALUES
('Helge', 'Schneider', 1955),
('Bela', 'B', 1962)
ON CONFLICT(last_name) DO UPDATE SET born = EXCLUDED.born
UNWIND [{first_name: 'Helge', last_name: 'Schneider', born: 1955}, {first_name: 'Bela', last_name: 'B', born: 1962}] AS properties
MERGE (people:`People` {last_name: properties['last_name']})
ON CREATE SET
people.first_name = properties.first_name,
people.born = properties.born
ON MATCH SET people.born = properties['born']