text2cypher

This is an experimental translator inspired by the Neo4j Labs project text2cypher.

If you add this translator to the classpath or use the text2cypher bundle, all queries that start with

🤖,

will be treated as natural language queries written in plain english. The driver will strip the prefix, and use OpenAI to translate the input into a Cypher® statement. The driver will augment the generation of the query by passing the current graph schema along with the input question.

The following data will be sent to an external API:

  • Your database schema, including label names

  • Any natural language query

Don’t use this translator if you don’t want the above, or are not allowed to do so.

This module requires one additional configuration: the OpenAI API key. You can use either a URL parameter, a JDBC property entry, or an environment variable:

  • URL parameter/property name is openAIApiKey

  • Environment variable name is OPEN_AI_API_KEY

Listing 1. Example of a valid URL
jdbc:neo4j://localhost:7687?openAIApiKey=sk-xxx-your-key

Additional configuration properties are

property name default value

openAIBaseUrl

https://api.openai.com/v1 (defined by langchain4j)

openAIModelName

gpt-4-turbo

openAITemperature

0.0

With that in place, a query such as the following can be translated into Cypher:

🤖, How was The Da Vinci Code rated?

The outcome of the LLM is not deterministic and is likely to vary. While you can execute it directly, we strongly recommend to use Connection#nativeSQL to retrieve the Cypher statement, inspect it, and then run it separately. In our test runs, the above questions was most often correctly translated to

MATCH (m:`Movie` {
  title: 'The Da Vinci Code'
})<-[r:`REVIEWED`]-(p:`Person`)
RETURN r.rating AS Rating, p.name AS ReviewerName

Other times the result was a syntactically correct statement, but it would only return the reviewers and the movie itself. Also note that while a human likely recognizes that you are actually thinking about the average rating, the LLM does not infer this. Making the question more explicit gives better results:

🤖, How was The Da Vinci Code rated on average?

is translated more accurately to:

MATCH (m:`Movie` {
  title: 'The Da Vinci Code'
})<-[:`REVIEWED`]-(p:`Person`)
RETURN avg(p.rating) AS AverageRating
Once a natural language query gets translated into Cypher, the result will be cached and further invocations of that query will use the cached result.

All that statements that do not start with 🤖 will be used as-is and treated as Cypher.