Verify Neo4j Cypher Queries With CyVer

Research Fellow

October 2, 2025


Never run a bad query again with this open-source Python library

Graph databases like Neo4j are powering more and more agentic GraphRAG systems, but writing safe Cypher queries isn’t always easy. When queries are produced by LLMs, especially open-source ones, small mistakes like schema mismatches or missing properties can quickly break a query or return the wrong results. That’s why query validation matters more than ever.

In this article, I’ll walk you through CyVer, an open-source Python library designed to validate Cypher queries for syntax, schema, and property correctness. Using the popular Neo4j Movies database as an example, you’ll learn how to catch errors before they hit your database and boost your confidence in LLM-generated queries. Whether you’re a data scientist, engineer, or graph enthusiast, CyVer can help you write safer, smarter Cypher — every time.

Before diving into CyVer details, I’ll share a quick backstory. While testing open-source LLMs on the generation of Cypher queries for SustainGraph, small but fatal Cypher errors kept popping up — and that frustration is what sparked CyVer. I started exploring the problem with my colleague Christina-Maria Androna, who has been an essential collaborator from the very beginning, and our journey began with a simple question: What if we could validate Cypher queries by checking the correctness of paths, nodes, and relationships against a Neo4j database schema before they even run? That question became the foundation of everything you’re reading here.

CyVer Release v2

With the v2 release, CyVer takes things further by offering three main validators that now work alongside metadata reporting, giving clear explanations for errors and even suggesting ways to fix them.

Recently, we introduced CyVer in the paper “Decoding the Mystery: How Can LLMs Turn Text Into Cypher in Complex Knowledge Graphs?”, and thanks to the support and enthusiasm of the Neo4j team, we had the opportunity to showcase a small demo of CyVer v2 on SustainGraph at Neo4j Live (video listed under Resources).

Regex Patterns

Before explaining the core classes of CyVer, let’s clarify the core building blocks of Cypher regex patterns:

🟢 Node patterns: A node pattern in Cypher represents an entity (n:Label{props}) and can include:

An optional variable (e.g., n)
One or more labels (e.g., n:Label)
Optional properties (e.g., n:Label {key: value})
Label negation (e.g., n:!Label)

➡️ Relationship patterns: A relationship pattern represents a connection between two nodes (r:REL_TYPE{props}) and can include:

An optional variable (e.g., r)
One or more relationship types (e.g., r:REL_TYPE1)
Optional properties (e.g., r:REL_TYPE1 {key: value})
Type negation (e.g., r:!REL_TYPE1)
Variable length (e.g., * or *1..3)

🟢➡️🟢 Path patterns: A path pattern combines node and relationship patterns to describe how nodes are connected. Each relationship in the path is either directed (-> or <-) or undirected (-).
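To make these building blocks concrete, here is a small sketch that assembles a path pattern from node and relationship patterns using plain Python strings. The helper names (node, rel) are hypothetical and not part of CyVer; the labels and relationship types come from the Movies dataset used later in this article.

```python
# Illustrative helpers only: these names are hypothetical, not CyVer API.

def node(var="", label="", props=""):
    """Build a node pattern such as (p:Person) or (p:Person {name: 'x'})."""
    inner = var
    if label:
        inner += f":{label}"
    if props:
        inner += f" {props}"
    return f"({inner})"


def rel(var="", rel_type="", length="", direction="->"):
    """Build a relationship pattern such as -[r:ACTED_IN]-> or -[:FOLLOWS*1..3]-."""
    inner = var
    if rel_type:
        inner += f":{rel_type}"
    inner += length
    left, right = {"->": ("-", "->"), "<-": ("<-", "-"), "-": ("-", "-")}[direction]
    return f"{left}[{inner}]{right}"


# Node pattern + relationship pattern + node pattern = path pattern
directed = node("p", "Person") + rel("r", "ACTED_IN") + node("m", "Movie")
undirected = (
    node("a", "Person")
    + rel(rel_type="FOLLOWS", length="*1..3", direction="-")
    + node("b", "Person")
)
negated = node("n", "!Movie")  # label negation

print(directed)    # (p:Person)-[r:ACTED_IN]->(m:Movie)
print(undirected)  # (a:Person)-[:FOLLOWS*1..3]-(b:Person)
print(negated)     # (n:!Movie)
```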

CyVer Classes

CyVer main classes, along with their methods

CyVer provides three core validation classes to ensure that your Cypher queries are robust, safe, and schema-aware:

🟠 Syntax Validator — Validates Cypher queries for correct syntax

🟠 Schema Validator — Ensures that Cypher queries conform to your knowledge graph schema

🟠 Properties Validator — Ensures that only valid properties are accessed in queries

You can dive deeper into each validator in the CyVer Documentation on GitLab.

Let’s Test CyVer

First, let’s import the CyVer validators and connect to the Movies graph database offered by Neo4j Labs. You can find installation instructions and all the details in the GitLab repository.

The Movies graph database is useful for learning Cypher queries and graph concepts. It models movies, people (actors and directors), and simple relationships like :ACTED_IN, :DIRECTED, and :PRODUCED.

from neo4j import GraphDatabase, basic_auth
from CyVer import SyntaxValidator, SchemaValidator, PropertiesValidator

# Connection settings for the public Neo4j Labs Movies demo database
database_url = 'neo4j+s://demo.neo4jlabs.com'
database_user = 'movies'
database_password = 'movies'
database_name = 'movies'

# Connect to the Neo4j database instance
driver = GraphDatabase.driver(database_url, auth=basic_auth(database_user, database_password))

# verify_connectivity() raises an exception if the server cannot be reached
driver.verify_connectivity()
print("Connection established.")

The code is available on GitLab.

In the following sections, we test a set of Cypher queries against each validator — one written correctly and others intentionally broken — to showcase how CyVer detects and explains errors.

Syntax Validator

Each option of the provided Cypher queries showcases a different error regarding the Syntax Validator:

valid_query: A correct Cypher query
closing_parenthesis_query: Missing a closing parenthesis (syntax error)
not_param_provided_query: Uses a parameter that isn’t provided (common mistake)
unsatisf_rel_type_query: Uses an invalid relationship type expression (not allowed in Cypher)
arithemtic_error_query: Contains an explicit division by zero
conflicting_label_query: Uses the same variable for nodes with different labels (can cause conflicts)

# Step 1: Define the candidate queries
valid_query = "MATCH (p:Person)-[r:ACTED_IN]->(m:Movie) RETURN p.name"
closing_parenthesis_query = "MATCH (p:Person-[r:ACTED_IN]->(m:Movie) RETURN p, m"
not_param_provided_query = "MATCH (p:Person{name:$prop_name}) RETURN p"
unsatisf_rel_type_query = "MATCH (p:Person)-[r:ACTED_IN&FOLLOWS]->(m:Movie) RETURN p, m"
arithemtic_error_query = "MATCH (p:Person) RETURN count(p)/0"
conflicting_label_query = "MATCH (p:Person) MATCH (p:Movie) RETURN p LIMIT 10"

# Step 2: Put them in a dictionary with descriptions
options = {
    1: ("Valid query", valid_query),
    2: ("Missing closing parenthesis", closing_parenthesis_query),
    3: ("Not provided parameter", not_param_provided_query),
    4: ("Unsatisfied relationship type", unsatisf_rel_type_query),
    5: ("Arithmetic error (division by zero)", arithemtic_error_query),
    6: ("Conflicting labels", conflicting_label_query)
}

# Step 3: Print a menu
print("Select a query you want to test with the Syntax validator:")
for key, (desc, _) in options.items():
    print(f"{key}. {desc}")

# Step 4: Get user choice
try:
    choice = int(input("Please enter a number (1-6) to choose which query to run in the Syntax Validator: ").strip())
    syntax_query = options[choice][1]   # map to the actual variable
except (ValueError, KeyError):
    print("Invalid choice! Defaulting to option 1.")
    choice = 1  # keep `choice` valid so the description lookup below works
    syntax_query = valid_query

# Print the selected query together with its description
def use_global():
    print("Selected: \n  Candidate Query: {} \n --> {} ".format(syntax_query, options[choice][0]))

# Call the function
use_global()

Next step:

Initialize the validator: Create a SyntaxValidator object, passing your Neo4j driver. You can also enable advanced checks, such as for multi-labeled nodes.
Validate your query: Call .validate(query, database_name=…) with your Cypher query. The method returns:

is_valid: True if the query is syntactically correct, False otherwise
syntax_metadata: A list of dictionaries, one per issue found, each holding an error code (key 'code') and a description of the issue (key 'description').

# Initialize the syntax validator
syntax_validator = SyntaxValidator(driver, check_multilabeled_nodes=True)

# Validate the query syntax using CyVer
is_valid, syntax_metadata = syntax_validator.validate(syntax_query, database_name=database_name)

# Output whether the syntax is valid
# (expected: True for the valid query, False for the intentionally broken ones)
print(f"Syntax Valid: {is_valid}")

# Print detailed information about the syntax errors detected
print("Syntax Error Metadata:")
for metadata in syntax_metadata:
    print(f"{metadata['code']}: \n {metadata['description']}")
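Because the metadata is a plain list of dictionaries, it is easy to post-process, for example to feed the errors back to an LLM for a retry. The sketch below groups descriptions by code and renders them as text. The sample entries and their code strings are made up for illustration and are not actual CyVer output; only the 'code' and 'description' keys are taken from the snippet above, and the function names are hypothetical.

```python
from collections import defaultdict


def summarize_metadata(metadata):
    """Group issue descriptions by their code."""
    grouped = defaultdict(list)
    for item in metadata:
        grouped[item["code"]].append(item["description"])
    return dict(grouped)


def to_feedback(metadata):
    """Render the issues as a short text block, e.g. for an LLM retry prompt."""
    if not metadata:
        return "No issues found."
    lines = [f"- [{item['code']}] {item['description']}" for item in metadata]
    return "\n".join(lines)


# Hypothetical sample, shaped like the list returned by validate()
sample_metadata = [
    {"code": "ExampleSyntaxError", "description": "Unbalanced parenthesis near '(p:Person'."},
    {"code": "ExampleWarning", "description": "Parameter $prop_name was not provided."},
    {"code": "ExampleWarning", "description": "Division by zero."},
]

print(summarize_metadata(sample_metadata))
print(to_feedback(sample_metadata))
```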

Schema Validator

This section includes Cypher queries with incorrect node labels, relationship types, or invalid paths to test the Schema Validator.

A powerful feature of CyVer is path-pattern inference:

CyVer infers missing node labels or relationship types by matching query structures to known schema path patterns.
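The idea behind path-pattern inference can be illustrated with a toy example. The sketch below is not CyVer's implementation: it hard-codes the (start label, relationship type, end label) triples of the Movies graph and treats a missing label or type as a wildcard. The function name infer_paths is hypothetical.

```python
# Toy illustration of path-pattern inference; NOT CyVer's implementation.
# Schema triples of the Neo4j Movies graph: (start label, relationship type, end label)
MOVIES_SCHEMA = [
    ("Person", "ACTED_IN", "Movie"),
    ("Person", "DIRECTED", "Movie"),
    ("Person", "PRODUCED", "Movie"),
    ("Person", "WROTE", "Movie"),
    ("Person", "REVIEWED", "Movie"),
    ("Person", "FOLLOWS", "Person"),
]


def infer_paths(start=None, rel_type=None, end=None, schema=MOVIES_SCHEMA):
    """Return every schema triple compatible with a directed pattern.

    None stands for a missing label or relationship type and matches anything.
    """
    return [
        (s, r, e)
        for (s, r, e) in schema
        if start in (None, s) and rel_type in (None, r) and end in (None, e)
    ]


# (t:Person)-[:REVIEWED]->(m): exactly one candidate, so m must be a Movie
print(infer_paths(start="Person", rel_type="REVIEWED"))

# (g:Person)-[]->(t:Movie): several relationship types are possible
print(infer_paths(start="Person", end="Movie"))

# (p:Person)-[:FOLLOWS]->(m:Movie): no candidate, the path is not in the schema
print(infer_paths(start="Person", rel_type="FOLLOWS", end="Movie"))
```

A single candidate means the missing piece can be filled in unambiguously, several candidates mean multiple inferences are possible, and an empty result signals a path that the schema does not allow.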

Each option of the provided Cypher queries showcases a different error regarding the Schema Validator:

valid_query: A correct Cypher query.
path_inference_query: If a node label or relationship type is missing in a path, CyVer will infer it from the schema. In this query, the label of variable m is missing. The extract method shows that the m label is inferred from the knowledge graph schema.
multiple_inference_path_available_query: If a node label or relationship type is missing in a path, CyVer will infer it from the schema. In this query, multiple path inferences are possible. Run the SchemaValidator execution cell to view the different extracted paths.
unknown_label_type_query: The provided query uses a node label or relationship type not present in the schema (e.g., typo).
unknown_path_query: This query attempts to connect two node labels with a disallowed relationship.
wrong_direction_path_query: This query uses a valid relationship between two nodes, but with the wrong direction.
variable_length_path_query: This query uses a variable-length relationship pattern that doesn’t exist.

# Step 1: Define the candidate queries
valid_query = "MATCH (p:Person)-[r:ACTED_IN]->(m:Movie) RETURN p.name"
path_inference_query = "MATCH (g:Person)-[:FOLLOWS]->(t:Person)-[:REVIEWED]->(m) RETURN *"
multiple_inference_path_available_query = "MATCH (g:Person)-[]->(t:Movie) RETURN *"
unknown_label_type_query = "MATCH (p:Persons)-[:WROTEs]->(m:Movie) RETURN p"
unknown_path_query = "MATCH (p:Person)-[:FOLLOWS]->(m:Movie) RETURN p LIMIT 4"
wrong_direction_path_query = "MATCH (p:Person)<-[:PRODUCED]-(M:Movie) RETURN p LIMIT 10"
variable_length_path_query = "MATCH (p:Person)-[:FOLLOWS*3..4]->(m:Person) RETURN * LIMIT 5"


# Step 2: Put them in a dictionary with descriptions
options = {
    1: ("Valid query", valid_query),
    2: ("Path inference (missing node label, one possible inference)", path_inference_query),
    3: ("Path inference (missing node label, multiple possible inferences)", multiple_inference_path_available_query),
    4: ("Unknown node label/relationship type", unknown_label_type_query),
    5: ("Unknown path", unknown_path_query),
    6: ("Wrong direction in path", wrong_direction_path_query),
    7: ("Variable length path", variable_length_path_query)
}

# Step 3: Print a menu
print("Select a query you want to test with the Schema validator:")
for key, (desc, _) in options.items():
    print(f"{key}. {desc}")

# Step 4: Get user choice
try:
    choice = int(input("Please enter a number (1-7) to choose which query to run in the Schema Validator: ").strip())
    schema_query = options[choice][1]   # map to the actual variable
except (ValueError, KeyError):
    print("Invalid choice! Defaulting to option 1.")
    choice = 1  # keep `choice` valid so the description lookup below works
    schema_query = valid_query

# Print the selected query together with its description
def use_global():
    print("Selected: \n  Candidate Query: {} \n --> {} ".format(schema_query, options[choice][0]))

# Call the function
use_global()

Then:

Initialize the validator: Create a SchemaValidator object, passing your Neo4j driver.
Extract patterns: Use .extract(query, database_name=…) to see which extracted node patterns, relationship patterns, and path patterns your query uses. This helps you understand how your query maps to the schema.
Validate your query: Call .validate(query, database_name=…) with your Cypher query. The method returns:

schema_score: A score computed as a weighted average of the valid extracted node, relationship, and path patterns. The weighting factors w1, w2, w3 depend on the extracted components. Read more about the schema formula in the CyVer GitLab repository.

schema_metadata: A list of dictionaries, one per issue found, each holding an error code (key 'code') and a description of the issue (key 'description').

schema_validator = SchemaValidator(driver)

# Extraction
print("--"*50)
extracted_node_labels, extracted_rel_labels, extracted_paths = schema_validator.extract(schema_query, database_name=database_name)
print(f"Extracted Node patterns : {extracted_node_labels}")
print(f"Extracted Relationships Patterns : {extracted_rel_labels}")
print(f"Extracted Path patterns : {extracted_paths}")

# Validation
print("--"*50)
schema_score, schema_metadata = schema_validator.validate(schema_query, database_name=database_name)
print(f"Schema Validation Score: {schema_score}")
print("Schema Validation Metadata:")
for metadata in schema_metadata:
    print(f"{metadata['code']}: \n {metadata['description']}")
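As a rough illustration of how a weighted average over pattern types behaves, the sketch below scores a query from counts of valid versus extracted patterns. The weighting used here is an assumption (equal weight for every pattern type that was actually extracted); CyVer's real weighting is defined in its repository and may differ. The function name, the input shape, and the example counts are all hypothetical.

```python
def weighted_schema_score(counts):
    """Weighted average of valid-pattern ratios.

    `counts` maps a pattern type to a (valid, total) tuple. Pattern types
    with total == 0 are ignored and the remaining types share the weight
    equally. This weighting is an assumption, not CyVer's formula.
    """
    ratios = [valid / total for valid, total in counts.values() if total > 0]
    if not ratios:
        return None  # nothing was extracted, so there is nothing to score
    weights = [1 / len(ratios)] * len(ratios)
    return sum(w * r for w, r in zip(weights, ratios))


# Hypothetical counts for a query with one unknown node label,
# an unknown relationship type, and therefore an unknown path
partly_invalid = {"nodes": (1, 2), "relationships": (0, 1), "paths": (0, 1)}
print(weighted_schema_score(partly_invalid))

# A query whose extracted patterns are all valid reaches the maximum score
all_valid = {"nodes": (2, 2), "relationships": (1, 1), "paths": (1, 1)}
print(weighted_schema_score(all_valid))
```

The point of the sketch is only that every invalid node, relationship, or path pattern pulls the score below 1, which is why a score of exactly 1 can serve as a pass criterion.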

Properties Validator

Each option of the provided Cypher queries showcases a different error regarding the Properties Validator. The path-pattern inference is also available in the Properties Validator, as shown below:

valid_query: A correct Cypher query.
unknown_prop_query: The Cypher query accesses an unknown property.
invalid_prop_by_label_query: The Cypher query accesses an invalid property for the node label or relationship type.
inference_prop_query: A property is accessed on a node with a missing label in the provided query; CyVer will infer this label and will validate if the property belongs to this node.
infer_disjuction_label_query: Accesses a property on a variable that could have multiple labels (disjunction); CyVer will infer the possible labels for this node.
infer_negation_label_path_query: Accesses a property on a relationship with a negated type; CyVer will infer the possible types for this relationship.
strict_query: An example to test the strict parameter in Properties Validator. Use True and False to see the difference.

# Step 1: Define the candidate queries
valid_query = "MATCH (p:Movie) RETURN p.title "
unknown_prop_query = "MATCH (g:Person) RETURN g.age"
invalid_prop_by_label_query = "MATCH (g:Movie) RETURN g.name"
inference_prop_query = "MATCH (p:Person)-[:WROTE]->(m) RETURN m.released"
infer_disjuction_label_query = "MATCH (p:Person|Movie) RETURN p.name LIMIT 10"
infer_negation_label_path_query = "MATCH (g:Movie{title:'As Good as It Gets'})-[r:!ACTED_IN|WROTE]-(p:Person) RETURN r.rating LIMIT 3"
strict_query = "MATCH (p) RETURN p.title,p.tagline,p.name LIMIT 4"


# Step 2: Put them in a dictionary with descriptions
options = {
    1: ("Valid query", valid_query),
    2: ("Unknown property", unknown_prop_query),
    3: ("Invalid property by label", invalid_prop_by_label_query),
    4: ("Inference of node label based on the accessed property", inference_prop_query),
    # Option 5: infer the possible labels in a disjunction based on the property and the paths they participate in
    5: ("Inference of node label based on the accessed property (Disjunction labels)", infer_disjuction_label_query),
    6: ("Inference of rel type based on the accessed property (Negation label)", infer_negation_label_path_query),
    7: ("Strict mode: multiple properties, some may not exist (Please use strict = True/False to see the differences)", strict_query),
}

# Step 3: Print a menu
print("Select a query you want to test with the Properties validator:")
for key, (desc, _) in options.items():
    print(f"{key}. {desc}")

# Step 4: Get user choice
try:
    choice = int(input("Please enter a number (1-7) to choose which query to run in the Properties Validator: ").strip())
    props_query = options[choice][1]   # map to the actual variable
except (ValueError, KeyError):
    print("Invalid choice! Defaulting to option 1.")
    choice = 1  # keep `choice` valid so the description lookup below works
    props_query = valid_query

# Print the selected query together with its description
def use_global():
    print("Selected: \n  Candidate Query: {} \n --> {} ".format(props_query, options[choice][0]))

# Call the function
use_global()

Then:

Initialize the validator: Create a PropertiesValidator object, passing your Neo4j driver.
Extract properties mappings: Use .extract(query, database_name=…) to see which properties are accessed by variables and labels in your query. It returns:

Variables-Properties Mapping, which shows which properties are accessed by each variable in the query ({Var_x → [property1, property2]})
Labels-Properties Mapping, which shows which properties are accessed for each node label or relationship type, including inferred labels/types ({ nodeLabel / relType → [property1, property2]}).

Validate your query: Call .validate(query, database_name=…) with your Cypher query. The method returns:

props_score: A score in [0, 1] indicating the fraction of accessed properties that are valid (correctly accessed properties divided by all accessed properties).

props_metadata: A list of dictionaries, one per issue found, each holding an error code (key 'code') and a description of the issue (key 'description').

props_validator = PropertiesValidator(driver)

# Set strict parameter to True or False to see the difference
strict = False

# Extraction
print("--"*50)
variables_properties, labels_properties = props_validator.extract(props_query, strict=strict, database_name=database_name)
print(f"Accessed properties by variables: {variables_properties}")
print(f"Accessed properties by labels (including inferred): {labels_properties}")

# Validation
print("--"*50)
props_score, props_metadata = props_validator.validate(props_query, database_name=database_name, strict=strict)
print(f"Properties Validation Score: {props_score}")
print("Properties Validation Metadata:")
for metadata in props_metadata:
    print(f"{metadata['code']}: \n {metadata['description']}")
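The properties score can be pictured as the share of accessed properties that exist for their label or relationship type. The sketch below recomputes such a ratio from a labels-to-properties mapping shaped like the one described above. It is not CyVer's implementation: the schema dictionary is a hand-written subset of the Movies graph, returning None when no property is accessed is an assumption, and the function name is hypothetical.

```python
# Hand-written subset of the Movies schema, for illustration only
SCHEMA_PROPERTIES = {
    "Person": {"name", "born"},
    "Movie": {"title", "released", "tagline"},
    "ACTED_IN": {"roles"},
    "REVIEWED": {"summary", "rating"},
}


def property_ratio(labels_properties, schema=SCHEMA_PROPERTIES):
    """Fraction of accessed (label, property) pairs that exist in the schema."""
    pairs = [
        (label, prop)
        for label, props in labels_properties.items()
        for prop in props
    ]
    if not pairs:
        return None  # assumption: no accessed properties means nothing to score
    valid = sum(1 for label, prop in pairs if prop in schema.get(label, set()))
    return valid / len(pairs)


print(property_ratio({"Movie": ["title"]}))          # 1.0
print(property_ratio({"Movie": ["title", "name"]}))  # 0.5
print(property_ratio({}))                            # None
```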

📊 Tip

Always start with the SyntaxValidator before using the Schema and Properties validators. Even a small syntax error can cause misleading results in downstream validation. Building on this principle, we define the KG Valid Query metric to systematically evaluate generated Cypher queries for knowledge graphs. A query is considered KG Valid only if it satisfies all three checks:

SyntaxValidator.is_valid = True → The query is syntactically correct.
SchemaValidator.score = 1 → The query respects the graph schema.
PropertiesValidator.score = 1 or None → The query accesses only valid properties.
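The three conditions translate directly into a small predicate. This is a minimal sketch: the function name is_kg_valid is hypothetical, and it only combines the values returned by the three validate() calls shown earlier.

```python
def is_kg_valid(is_valid, schema_score, props_score):
    """Return True only if all three KG Valid conditions hold."""
    syntax_ok = is_valid is True
    schema_ok = schema_score == 1
    props_ok = props_score is None or props_score == 1
    return syntax_ok and schema_ok and props_ok


print(is_kg_valid(True, 1.0, 1.0))    # True
print(is_kg_valid(True, 1.0, None))   # True: a props_score of None is accepted
print(is_kg_valid(True, 0.5, 1.0))    # False: schema score below 1
print(is_kg_valid(False, 1.0, 1.0))   # False: syntax check failed
```

Averaging this predicate over a batch of generated queries gives the share of KG Valid queries, which is one way to compare different LLMs or prompts.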

Summary

In this article, we explored CyVer, an open-source Python library designed to validate Cypher queries for syntax, schema, and property correctness. You can now catch errors before they hit your database and boost your confidence in LLM-generated queries.

The project, developed in the Network Management and Optimal Design Laboratory (NETMODE Lab), is open source on GitLab. Explore the code, try it out, or contribute. Your feedback and a star are always welcome!

Resources

Video: Neo4j Live: CyVer — Verifying Cypher Queries
Neo4j Cypher Graph Query Language
GraphAcademy Course: Cypher Fundamentals

Verifying Neo4j Cypher Queries With CyVer was originally published in Neo4j Developer Blog on Medium.