Text2Cypher Beyond English: Initial Evaluation with Foundational LLMs

Session Track: AI Engineering

Session Time:

Session description

Large language models (LLMs) are changing how we interact with databases by letting users ask questions in natural language and receive database queries in return. One example is Text2Cypher, which translates a question such as “What are the movies of Tom Hanks?” into a Cypher query that a database like Neo4j can execute. Most current research on text-to-query tasks (such as Text2SQL, Text2SPARQL, or Text2Cypher) focuses on English, with limited evaluation in other languages. In this early study, we explore Text2Cypher performance across multiple languages using foundational LLMs. We built a test set by translating the English questions into Spanish and Turkish while keeping the original Cypher queries unchanged, so every language is scored against the same target queries and the comparison is fair. Our results show that models perform best in English, reasonably well in Spanish, and struggle more with Turkish. We believe this is due to differences in the amount of available training data and to the linguistic characteristics of each language. We also tested whether translating the instructions (prompts) into Spanish and Turkish would help, but found it had little effect. This talk will share these findings and discuss why including more languages is important when developing natural language database tools.
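The test-set design described above, where each translated question is paired with the same unchanged query, can be sketched roughly as follows. This is a minimal illustration and not the study's actual data or scoring code. The Cypher query assumes a movie graph with `Person` and `Movie` nodes joined by `ACTED_IN`. The Spanish and Turkish questions are illustrative translations. The whitespace-insensitive exact match is a stand-in metric chosen for simplicity, and `stub_model` is a hypothetical placeholder for a real LLM call.

```python
from collections import defaultdict

# Assumed gold query for an assumed movie-graph schema; shared by every language.
GOLD_CYPHER = (
    "MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) "
    "RETURN m.title"
)

# Illustrative records: only the question text changes between languages.
TEST_SET = [
    {"lang": "en", "question": "What are the movies of Tom Hanks?", "cypher": GOLD_CYPHER},
    {"lang": "es", "question": "¿Cuáles son las películas de Tom Hanks?", "cypher": GOLD_CYPHER},
    {"lang": "tr", "question": "Tom Hanks'in filmleri nelerdir?", "cypher": GOLD_CYPHER},
]


def normalize(query):
    """Collapse runs of whitespace so formatting differences are ignored."""
    return " ".join(query.split())


def exact_match_by_language(records, generate):
    """Return the fraction of exactly matching queries for each language.

    `generate` is any callable mapping a question string to a Cypher string.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for record in records:
        totals[record["lang"]] += 1
        predicted = generate(record["question"])
        if normalize(predicted) == normalize(record["cypher"]):
            hits[record["lang"]] += 1
    return {lang: hits[lang] / totals[lang] for lang in totals}


def stub_model(question):
    """Hypothetical placeholder for an LLM; always returns the gold query."""
    return GOLD_CYPHER.replace(" RETURN", "\nRETURN")


if __name__ == "__main__":
    print(exact_match_by_language(TEST_SET, stub_model))
```

Because the target query is identical in every record, any difference in per-language scores from a real model comes from the question language alone, which is the point of the design.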

Speakers

Makbule Gulcin Ozsoy

Software Developer, Neo4j

Makbule Gulcin Ozsoy is a software developer and machine learning engineer, mainly working on recommender systems, ranking, and information retrieval.

Will Tai

Senior Software Engineer, Neo4j

Will Tai is a senior software engineer at Neo4j, with a background in machine learning engineering and data science. He is one of the maintainers of the neo4j-genai-python package. He is currently interested in making knowledge graphs useful for machine learning and AI applications.