Session Track: AI Engineering
Session Time:
Session description
Large language models (LLMs) are changing how we interact with databases by letting users ask questions in natural language and receive database queries in return. One example is Text2Cypher, which translates questions like “What are the movies of Tom Hanks?” into Cypher queries that databases like Neo4j can understand. Most current research on text-to-query tasks (such as Text2SQL, Text2SPARQL, or Text2Cypher) focuses mainly on English, with limited evaluation in other languages. In this early study, we explore Text2Cypher performance across multiple languages using foundation LLMs. To do this, we created a test set by translating English questions into Spanish and Turkish while keeping the original database queries unchanged. This approach enables a fair comparison of how well models perform across these languages. Our results show that models perform best in English, reasonably well in Spanish, and face more challenges with Turkish. We believe this is due to differences in the amount of training data available and to the linguistic characteristics of each language. We also tested whether translating the instructions (prompts) into Spanish and Turkish would help, but found it had little effect. This talk will share these findings and discuss why including more languages is important when developing natural language database tools.
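For context, a question like “What are the movies of Tom Hanks?” might map to a Cypher query along these lines. This is a minimal sketch assuming the standard Neo4j movies schema (Person and Movie nodes connected by ACTED_IN relationships); the queries in the actual test set may differ.

  // Return the titles of movies in which Tom Hanks acted
  MATCH (p:Person {name: "Tom Hanks"})-[:ACTED_IN]->(m:Movie)
  RETURN m.title AS movie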
Software Developer, Neo4j
Makbule Gulcin Ozsoy is a software developer and machine learning engineer, mainly working on recommender systems, ranking, and information retrieval.
Senior Software Engineer, Neo4j
Will Tai is a senior software engineer at Neo4j, with a background in machine learning engineering and data science. He is one of the maintainers of the neo4j-genai-python package. He is currently interested in making knowledge graphs useful for machine learning and AI applications.