API Documentation ¶

async run(filepath, metadata=None, fs=None)[source]¶

Run the component and return its result.

Note: if run_with_context is implemented, this method will not be used.

Parameters:

filepath (str | Path)
metadata (Dict[str, str] | None)
fs (AbstractFileSystem | str | None)

Return type:

PdfDocument

TextSplitter¶

class neo4j_graphrag.experimental.components.text_splitters.base.TextSplitter[source]¶

Interface for a text splitter.

abstract async run(text)[source]¶

Splits a piece of text into chunks.

Parameters:: text (str) – The text to be split.
Returns:: A list of chunks.
Return type:: TextChunks

FixedSizeSplitter¶

class neo4j_graphrag.experimental.components.text_splitters.fixed_size_splitter.FixedSizeSplitter(chunk_size=4000, chunk_overlap=200, approximate=True)[source]¶

Text splitter which splits the input text into fixed or approximate fixed size: chunks with optional overlap.

Parameters:

chunk_size (int) – The number of characters in each chunk.
chunk_overlap (int) – The number of characters from the previous chunk to overlap with each chunk. Must be less than chunk_size.
approximate (bool) – If True, avoids splitting words in the middle at chunk boundaries. Defaults to True.

Example:

from neo4j_graphrag.experimental.components.text_splitters.fixed_size_splitter import FixedSizeSplitter
from neo4j_graphrag.experimental.pipeline import Pipeline

pipeline = Pipeline()
text_splitter = FixedSizeSplitter(chunk_size=4000, chunk_overlap=200, approximate=True)
pipeline.add_component(text_splitter, "text_splitter")

async run(text)[source]¶

Splits a piece of text into chunks.

Parameters:: text (str) – The text to be split.
Returns:: A list of chunks.
Return type:: TextChunks

LangChainTextSplitterAdapter¶

class neo4j_graphrag.experimental.components.text_splitters.langchain.LangChainTextSplitterAdapter(text_splitter)[source]¶

Adapter for LangChain TextSplitters. Allows instances of this class to be used in the knowledge graph builder pipeline.

Parameters:: text_splitter (LangChainTextSplitter) – An instance of LangChain’s TextSplitter class.

Example:

from langchain_text_splitters import RecursiveCharacterTextSplitter
from neo4j_graphrag.experimental.components.text_splitters.langchain import LangChainTextSplitterAdapter
from neo4j_graphrag.experimental.pipeline import Pipeline

pipeline = Pipeline()
text_splitter = LangChainTextSplitterAdapter(RecursiveCharacterTextSplitter())
pipeline.add_component(text_splitter, "text_splitter")

async run(text)[source]¶

Splits text into chunks.

Parameters:: text (str) – The text to split.
Returns:: The text split into chunks.
Return type:: TextChunks

LlamaIndexTextSplitterAdapter¶

class neo4j_graphrag.experimental.components.text_splitters.llamaindex.LlamaIndexTextSplitterAdapter(text_splitter)[source]¶

Adapter for LlamaIndex TextSplitters. Allows instances of this class to be used in the knowledge graph builder pipeline.

Parameters:: text_splitter (LlamaIndexTextSplitter) – An instance of LlamaIndex’s TextSplitter class.

Example:

from llama_index.core.node_parser.text.sentence import SentenceSplitter
from neo4j_graphrag.experimental.components.text_splitters.llamaindex import (
    LlamaIndexTextSplitterAdapter,
)
from neo4j_graphrag.experimental.pipeline import Pipeline

pipeline = Pipeline()
text_splitter = LlamaIndexTextSplitterAdapter(SentenceSplitter())
pipeline.add_component(text_splitter, "text_splitter")

async run(text)[source]¶

Splits text into chunks.

Parameters:: text (str) – The text to split.
Returns:: The text split into chunks.
Return type:: TextChunks

TextChunkEmbedder¶

class neo4j_graphrag.experimental.components.embedder.TextChunkEmbedder(embedder)[source]¶

Component for creating embeddings from text chunks.

Parameters:: embedder (Embedder) – The embedder to use to create the embeddings.

Example:

from neo4j_graphrag.experimental.components.embedder import TextChunkEmbedder
from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings
from neo4j_graphrag.experimental.pipeline import Pipeline

embedder = OpenAIEmbeddings(model="text-embedding-3-large")
chunk_embedder = TextChunkEmbedder(embedder)
pipeline = Pipeline()
pipeline.add_component(chunk_embedder, "chunk_embedder")

async run(text_chunks)[source]¶

Embed a list of text chunks.

Parameters:: text_chunks (TextChunks) – The text chunks to embed.
Returns:: The input text chunks with each one having an added embedding.
Return type:: TextChunks

LexicalGraphBuilder¶

class neo4j_graphrag.experimental.components.lexical_graph.LexicalGraphBuilder(config=LexicalGraphConfig(id_prefix='', document_node_label='Document', chunk_node_label='Chunk', chunk_to_document_relationship_type='FROM_DOCUMENT', next_chunk_relationship_type='NEXT_CHUNK', node_to_chunk_relationship_type='FROM_CHUNK', chunk_id_property='id', chunk_index_property='index', chunk_text_property='text', chunk_embedding_property='embedding'))[source]¶

Builds the lexical graph to be inserted into neo4j. The lexical graph contains: - A node for each document - A node for each chunk - A relationship between each chunk and the document it was created from - A relationship between a chunk and the next one in the document

Parameters:: config (LexicalGraphConfig)

async run(text_chunks, document_info=None)[source]¶

Run the component and return its result.

Note: if run_with_context is implemented, this method will not be used.

Parameters:

text_chunks (TextChunks)
document_info (DocumentInfo | None)

Return type:

GraphResult

async process_chunk(graph, chunk, next_chunk, document_info=None)[source]¶

Add chunks and relationships between them (NEXT_CHUNK)

Updates graph in place.

Parameters:

graph (Neo4jGraph)
chunk (TextChunk)
next_chunk (TextChunk | None)
document_info (DocumentInfo | None)

Return type:

None

create_document_node(document_info)[source]¶

Create a Document node with ‘path’ property. Any document metadata is also added as a node property.

Parameters:: document_info (DocumentInfo)
Return type:: Neo4jNode

create_chunk_node(chunk)[source]¶

Create chunk node with properties ‘text’, ‘index’ and any ‘metadata’ added during the process. Special case for the potential chunk embedding property that gets added as an embedding_property

Parameters:: chunk (TextChunk)
Return type:: Neo4jNode

create_chunk_to_document_rel(chunk, document_info)[source]¶

Create the relationship between a chunk and the document it belongs to.

Parameters:

chunk (TextChunk)
document_info (DocumentInfo)

Return type:

Neo4jRelationship

create_next_chunk_relationship(chunk, next_chunk)[source]¶

Create relationship between a chunk and the next one

Parameters:

chunk (TextChunk)
next_chunk (TextChunk)

Return type:

Neo4jRelationship

create_node_to_chunk_rel(node, chunk_id)[source]¶

Create relationship between a chunk and entities found in that chunk

Parameters:

node (Neo4jNode)
chunk_id (str)

Return type:

Neo4jRelationship

async process_chunk_extracted_entities(chunk_graph, chunk)[source]¶

Create relationship between Chunk and each entity extracted from it.

Updates chunk_graph in place.

Parameters:

chunk_graph (Neo4jGraph)
chunk (TextChunk)

Return type:

None

Neo4jChunkReader¶

class neo4j_graphrag.experimental.components.neo4j_reader.Neo4jChunkReader(driver, fetch_embeddings=False, neo4j_database=None)[source]¶

Reads text chunks from a Neo4j database.

Parameters:

driver (neo4j.driver) – The Neo4j driver to connect to the database.
fetch_embeddings (bool) – If True, the embedding property is also returned. Default to False.
neo4j_database (Optional[str]) – The name of the Neo4j database. If not provided, this defaults to the server’s default database (“neo4j” by default) (see reference to documentation).

Example:

from neo4j import GraphDatabase
from neo4j_graphrag.experimental.components.neo4j_reader import Neo4jChunkReader

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")
DATABASE = "neo4j"

driver = GraphDatabase.driver(URI, auth=AUTH)
reader = Neo4jChunkReader(driver=driver, neo4j_database=DATABASE)
await reader.run()

async run(lexical_graph_config=LexicalGraphConfig(id_prefix='', document_node_label='Document', chunk_node_label='Chunk', chunk_to_document_relationship_type='FROM_DOCUMENT', next_chunk_relationship_type='NEXT_CHUNK', node_to_chunk_relationship_type='FROM_CHUNK', chunk_id_property='id', chunk_index_property='index', chunk_text_property='text', chunk_embedding_property='embedding'))[source]¶

Reads text chunks from a Neo4j database.

Parameters:: lexical_graph_config (LexicalGraphConfig) – Node labels and relationship types for the lexical graph.
Return type:: TextChunks

SchemaBuilder¶

class neo4j_graphrag.experimental.components.schema.SchemaBuilder[source]¶

A builder class for constructing GraphSchema objects from given entities, relations, and their interrelationships defined in a potential schema.

Example:

from neo4j_graphrag.experimental.components.schema import (
    SchemaBuilder,
    NodeType,
    PropertyType,
    RelationshipType,
)
from neo4j_graphrag.experimental.pipeline import Pipeline

node_types = [
    NodeType(
        label="PERSON",
        description="An individual human being.",
        properties=[
            PropertyType(
                name="name", type="STRING", description="The name of the person"
            )
        ],
    ),
    NodeType(
        label="ORGANIZATION",
        description="A structured group of people with a common purpose.",
        properties=[
            PropertyType(
                name="name", type="STRING", description="The name of the organization"
            )
        ],
    ),
]
relationship_types = [
    RelationshipType(
        label="EMPLOYED_BY", description="Indicates employment relationship."
    ),
]
patterns = [
    ("PERSON", "EMPLOYED_BY", "ORGANIZATION"),
]
pipe = Pipeline()
schema_builder = SchemaBuilder()
pipe.add_component(schema_builder, "schema_builder")
pipe_inputs = {
    "schema": {
        "node_types": node_types,
        "relationship_types": relationship_types,
        "patterns": patterns,
    },
    ...
}
pipe.run(pipe_inputs)

async run(node_types, relationship_types=None, patterns=None, **kwargs)[source]¶

Asynchronously constructs and returns a GraphSchema object.

Parameters:

node_types (Sequence[NodeType]) – Sequence of NodeType objects.
relationship_types (Sequence[RelationshipType]) – Sequence of RelationshipType objects.
patterns (Optional[Sequence[Tuple[str, str, str]]]) – Sequence of triplets: (source_entity_label, relation_label, target_entity_label).
kwargs (Any)

Returns:

A configured schema object, constructed asynchronously.

Return type:

GraphSchema

SchemaFromTextExtractor¶

class neo4j_graphrag.experimental.components.schema.SchemaFromTextExtractor(llm, prompt_template=None, llm_params=None)[source]¶

A component for constructing GraphSchema objects from the output of an LLM after automatic schema extraction from text.

Parameters:

llm (LLMInterface)
prompt_template (Optional[PromptTemplate])
llm_params (Optional[Dict[str, Any]])

async run(text, examples='', **kwargs)[source]¶

Asynchronously extracts the schema from text and returns a GraphSchema object.

Parameters:

text (str) – the text from which the schema will be inferred.
examples (str) – examples to guide schema extraction.
kwargs (Any)

Returns:

A configured schema object, extracted automatically and constructed asynchronously.

Return type:

GraphSchema

EntityRelationExtractor¶

class neo4j_graphrag.experimental.components.entity_relation_extractor.EntityRelationExtractor(*args, on_error=OnError.IGNORE, create_lexical_graph=True, **kwargs)[source]¶

Abstract class for entity relation extraction components.

Parameters:

on_error (OnError) – What to do when an error occurs during extraction. Defaults to raising an error.
create_lexical_graph (bool) – Whether to include the text chunks in the graph in addition to the extracted entities and relations. Defaults to True.
args (Any)
kwargs (Any)

async run(chunks, document_info=None, lexical_graph_config=None, **kwargs)[source]¶

Run the component and return its result.

Note: if run_with_context is implemented, this method will not be used.

Parameters:

chunks (TextChunks)
document_info (DocumentInfo | None)
lexical_graph_config (LexicalGraphConfig | None)
kwargs (Any)

Return type:

Neo4jGraph

update_ids(graph, chunk)[source]¶

Make node IDs unique across chunks, document and pipeline runs by prefixing them with a unique prefix.

Parameters:

graph (Neo4jGraph)
chunk (TextChunk)

Return type:

Neo4jGraph

LLMEntityRelationExtractor¶

class neo4j_graphrag.experimental.components.entity_relation_extractor.LLMEntityRelationExtractor(llm, prompt_template=<neo4j_graphrag.generation.prompts.ERExtractionTemplate object>, create_lexical_graph=True, on_error=OnError.RAISE, max_concurrency=5)[source]¶

Extracts a knowledge graph from a series of text chunks using a large language model.

Parameters:

llm (LLMInterface) – The language model to use for extraction.
prompt_template (ERExtractionTemplate | str) – A custom prompt template to use for extraction.
create_lexical_graph (bool) – Whether to include the text chunks in the graph in addition to the extracted entities and relations. Defaults to True.
on_error (OnError) – What to do when an error occurs during extraction. Defaults to raising an error.
max_concurrency (int) – The maximum number of concurrent tasks which can be used to make requests to the LLM.

Example:

from neo4j_graphrag.experimental.components.entity_relation_extractor import LLMEntityRelationExtractor
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.experimental.pipeline import Pipeline

llm = OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0, "response_format": {"type": "object"}})

extractor = LLMEntityRelationExtractor(llm=llm)
pipe = Pipeline()
pipe.add_component(extractor, "extractor")

async run(chunks, document_info=None, lexical_graph_config=None, schema=None, examples='', **kwargs)[source]¶

Perform entity and relation extraction for all chunks in a list.

Optionally, creates the “lexical graph” by adding nodes and relationships to represent the document and its chunks in the returned graph (For more details, see the Lexical Graph Builder doc and the User Guide)

Parameters:

chunks (TextChunks) – List of text chunks to extract entities and relations from.
document_info (Optional[DocumentInfo], optional) – Document the chunks are coming from. Used in the lexical graph creation step.
lexical_graph_config (Optional[LexicalGraphConfig], optional) – Lexical graph configuration to customize node labels and relationship types in the lexical graph.
schema (GraphSchema | None) – Definition of the schema to guide the LLM in its extraction.
examples (str) – Examples for few-shot learning in the prompt.
kwargs (Any)

Return type:

Neo4jGraph

KGWriter¶

class neo4j_graphrag.experimental.components.kg_writer.KGWriter[source]¶

Abstract class used to write a knowledge graph to a data store.

abstract async run(graph, lexical_graph_config=LexicalGraphConfig(id_prefix='', document_node_label='Document', chunk_node_label='Chunk', chunk_to_document_relationship_type='FROM_DOCUMENT', next_chunk_relationship_type='NEXT_CHUNK', node_to_chunk_relationship_type='FROM_CHUNK', chunk_id_property='id', chunk_index_property='index', chunk_text_property='text', chunk_embedding_property='embedding'))[source]¶

Writes the graph to a data store.

Parameters:

graph (Neo4jGraph) – The knowledge graph to write to the data store.
lexical_graph_config (LexicalGraphConfig) – Node labels and relationship types in the lexical graph.

Return type:

KGWriterModel

Neo4jWriter¶

class neo4j_graphrag.experimental.components.kg_writer.Neo4jWriter(driver, neo4j_database=None, batch_size=1000, clean_db=True)[source]¶

Writes a knowledge graph to a Neo4j database.

Parameters:

driver (neo4j.driver) – The Neo4j driver to connect to the database.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the server’s default database (“neo4j” by default) (see reference to documentation).
batch_size (int) – The number of nodes or relationships to write to the database in a batch. Defaults to 1000.
clean_db (bool)

Example:

from neo4j import GraphDatabase
from neo4j_graphrag.experimental.components.kg_writer import Neo4jWriter
from neo4j_graphrag.experimental.pipeline import Pipeline

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")
DATABASE = "neo4j"

driver = GraphDatabase.driver(URI, auth=AUTH)
writer = Neo4jWriter(driver=driver, neo4j_database=DATABASE)

pipeline = Pipeline()
pipeline.add_component(writer, "writer")

async run(graph, lexical_graph_config=LexicalGraphConfig(id_prefix='', document_node_label='Document', chunk_node_label='Chunk', chunk_to_document_relationship_type='FROM_DOCUMENT', next_chunk_relationship_type='NEXT_CHUNK', node_to_chunk_relationship_type='FROM_CHUNK', chunk_id_property='id', chunk_index_property='index', chunk_text_property='text', chunk_embedding_property='embedding'))[source]¶

Upserts a knowledge graph into a Neo4j database.

Parameters:

graph (Neo4jGraph) – The knowledge graph to upsert into the database.
lexical_graph_config (LexicalGraphConfig) – Node labels and relationship types for the lexical graph.

Return type:

KGWriterModel

SinglePropertyExactMatchResolver¶

class neo4j_graphrag.experimental.components.resolver.SinglePropertyExactMatchResolver(driver, filter_query=None, resolve_property='name', neo4j_database=None)[source]¶

Resolve entities with same label and exact same property (default is “name”).

Parameters:

driver (neo4j.Driver) – The Neo4j driver to connect to the database.
filter_query (Optional[str]) – To reduce the resolution scope, add a Cypher WHERE clause.
resolve_property (str) – The property that will be compared (default: “name”). If values match exactly, entities are merged.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the server’s default database (“neo4j” by default) (see reference to documentation).

Example:

from neo4j import GraphDatabase
from neo4j_graphrag.experimental.components.resolver import SinglePropertyExactMatchResolver

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")
DATABASE = "neo4j"

driver = GraphDatabase.driver(URI, auth=AUTH)
resolver = SinglePropertyExactMatchResolver(driver=driver, neo4j_database=DATABASE)
await resolver.run()  # no expected parameters

async run()[source]¶

Resolve entities based on the following rule: For each entity label, entities with the same ‘resolve_property’ value (exact match) are grouped into a single node:

Properties: the property from the first node will remain if already set, otherwise the first property in list will be written.
Relationships: merge relationships with same type and target node.

See apoc.refactor.mergeNodes documentation for more details.

Return type:: ResolutionStats

SpaCySemanticMatchResolver¶

class neo4j_graphrag.experimental.components.resolver.SpaCySemanticMatchResolver(driver, filter_query=None, resolve_properties=None, similarity_threshold=0.8, spacy_model='en_core_web_lg', neo4j_database=None)[source]¶

Resolve entities with same label and similar set of textual properties (default is [“name”]) based on spaCy’s static embeddings and cosine similarities.

Parameters:

driver (neo4j.Driver) – The Neo4j driver to connect to the database.
filter_query (Optional[str]) – Optional Cypher WHERE clause to reduce the resolution scope.
resolve_properties (Optional[List[str]]) – The list of properties to consider for embeddings Defaults to [“name”].
similarity_threshold (float) – The similarity threshold above which nodes are merged. Defaults to 0.8.
spacy_model (str) – The name of the spaCy model to load. Defaults to “en_core_web_lg”.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the server’s default database (“neo4j” by default) (see reference to documentation).

Example:

from neo4j import GraphDatabase
from neo4j_graphrag.experimental.components.resolver import SpaCySemanticMatchResolver

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")
DATABASE = "neo4j"

driver = GraphDatabase.driver(URI, auth=AUTH)
resolver = SpaCySemanticMatchResolver(driver=driver, neo4j_database=DATABASE)
await resolver.run()  # no expected parameters

async run()[source]¶

Run the component and return its result.

Note: if run_with_context is implemented, this method will not be used.

Return type:: ResolutionStats

FuzzyMatchResolver¶

class neo4j_graphrag.experimental.components.resolver.FuzzyMatchResolver(driver, filter_query=None, resolve_properties=None, similarity_threshold=0.8, neo4j_database=None)[source]¶

Resolve entities with the same label and similar set of textual properties using RapidFuzz for fuzzy matching. Similarity scores are normalized to a value between 0 and 1.

Parameters:

driver (neo4j.Driver)
filter_query (Optional[str])
resolve_properties (Optional[List[str]])
similarity_threshold (float)
neo4j_database (Optional[str])

async run()[source]¶

Run the component and return its result.

Note: if run_with_context is implemented, this method will not be used.

Return type:: ResolutionStats

Pipelines¶

Pipeline¶

class neo4j_graphrag.experimental.pipeline.Pipeline(store=None, callback=None)[source]¶

This is the main pipeline, where components and their execution order are defined

Parameters:

store (Optional[ResultStore])
callback (Optional[EventCallbackProtocol])

draw(path, layout='dot', hide_unused_outputs=True)[source]¶

Render the pipeline graph to an HTML file at the specified path

Parameters:

path (str)
layout (str)
hide_unused_outputs (bool)

Return type:

Any

add_component(component, name)[source]¶

Add a new component. Components are uniquely identified by their name. If ‘name’ is already in the pipeline, a ValueError is raised.

Parameters:

component (Component)
name (str)

Return type:

None

connect(start_component_name, end_component_name, input_config=None)[source]¶

Connect one component to another.

Parameters:

start_component_name (str) – name of the component as defined in the add_component method
end_component_name (str) – name of the component as defined in the add_component method
input_config (Optional[dict[str, str]]) – end component input configuration: propagate previous components outputs.

Raises:

PipelineDefinitionError – if the provided component are not in the Pipeline or if the graph that would be created by this connection is cyclic.

Return type:

None

async run(data)[source]¶

Parameters:: data (dict[str, Any])
Return type:: PipelineResult

SimpleKGPipeline¶

class neo4j_graphrag.experimental.pipeline.kg_builder.SimpleKGPipeline(llm, driver, embedder, entities=None, relations=None, potential_schema=None, schema=None, from_pdf=True, text_splitter=None, pdf_loader=None, kg_writer=None, on_error='IGNORE', prompt_template=<neo4j_graphrag.generation.prompts.ERExtractionTemplate object>, perform_entity_resolution=True, lexical_graph_config=None, neo4j_database=None)[source]¶

A class to simplify the process of building a knowledge graph from text documents. It abstracts away the complexity of setting up the pipeline and its components.

Parameters:

llm (LLMInterface) – An instance of an LLM to use for entity and relation extraction.
driver (neo4j.Driver) – A Neo4j driver instance for database connection.
embedder (Embedder) – An instance of an embedder used to generate chunk embeddings from text chunks.
schema (Optional[Union[GraphSchema, dict[str, list]]]) – A schema configuration defining node types, relationship types, and graph patterns.
entities (Optional[List[Union[str, dict[str, str], NodeType]]]) –
DEPRECATED. A list of either:
- str: entity labels
- dict: following the NodeType schema, ie with label, description and properties keys
Deprecated since version 1.7.1: Use schema instead
relations (Optional[List[Union[str, dict[str, str], RelationshipType]]]) –
DEPRECATED. A list of either:
- str: relation label
- dict: following the RelationshipType schema, ie with label, description and properties keys
Deprecated since version 1.7.1: Use schema instead
potential_schema (Optional[List[tuple]]) –
DEPRECATED. A list of potential schema relationships.

Deprecated since version 1.7.1: Use schema instead
from_pdf (bool) – Determines whether to include the PdfLoader in the pipeline. If True, expects file_path input in run methods. If False, expects text input in run methods.
text_splitter (Optional[TextSplitter]) – A text splitter component. Defaults to FixedSizeSplitter().
pdf_loader (Optional[DataLoader]) – A PDF loader component. Defaults to PdfLoader().
kg_writer (Optional[KGWriter]) – A knowledge graph writer component. Defaults to Neo4jWriter().
on_error (str) – Error handling strategy for the Entity and relation extractor. Defaults to “IGNORE”, where chunk will be ignored if extraction fails. Possible values: “RAISE” or “IGNORE”.
perform_entity_resolution (bool) – Merge entities with same label and name. Default: True
prompt_template (str) – A custom prompt template to use for extraction.
lexical_graph_config (Optional[LexicalGraphConfig], optional) – Lexical graph configuration to customize node labels and relationship types in the lexical graph.
neo4j_database (Optional[str])

async run_async(file_path=None, text=None)[source]¶

Asynchronously runs the knowledge graph building process.

Parameters:

file_path (Optional[str]) – The path to the PDF file to process. Required if from_pdf is True.
text (Optional[str]) – The text content to process. Required if from_pdf is False.

Returns:

The result of the pipeline execution.

Return type:

PipelineResult

Config files¶

SimpleKGPipelineConfig¶

class neo4j_graphrag.experimental.pipeline.config.template_pipeline.simple_kg_builder.SimpleKGPipelineConfig(*, neo4j_config={}, llm_config={}, embedder_config={}, extras={}, template_=PipelineType.SIMPLE_KG_PIPELINE, from_pdf=False, entities=[], relations=[], potential_schema=None, schema=None, on_error=OnError.IGNORE, prompt_template=<neo4j_graphrag.generation.prompts.ERExtractionTemplate object>, perform_entity_resolution=True, lexical_graph_config=None, neo4j_database=None, pdf_loader=None, kg_writer=None, text_splitter=None)[source]¶

Parameters:

neo4j_config (dict[str, Neo4jDriverType])
llm_config (dict[str, LLMType])
embedder_config (dict[str, EmbedderType])
extras (dict[str, float | str | ParamFromEnvConfig | ParamFromKeyConfig | dict[str, Any]])
template_ (Literal[PipelineType.SIMPLE_KG_PIPELINE])
from_pdf (bool)
entities (Sequence[str | dict[str, str | list[dict[str, str]]]])
relations (Sequence[str | dict[str, str | list[dict[str, str]]]])
potential_schema (list[tuple[str, str, str]] | None)
schema (GraphSchema | None)
on_error (OnError)
prompt_template (ERExtractionTemplate | str)
perform_entity_resolution (bool)
lexical_graph_config (LexicalGraphConfig | None)
neo4j_database (str | None)
pdf_loader (ComponentType | None)
kg_writer (ComponentType | None)
text_splitter (ComponentType | None)

PipelineRunner¶

class neo4j_graphrag.experimental.pipeline.config.runner.PipelineRunner(pipeline_definition, config=None, do_cleaning=False)[source]¶

Pipeline runner builds a pipeline from different objects and exposes a run method to run pipeline

Pipeline can be built from: - A PipelineDefinition (__init__ method) - A PipelineConfig (from_config method) - A config file (from_config_file method)

Parameters:

pipeline_definition (PipelineDefinition)
config (Optional[AbstractPipelineConfig])
do_cleaning (bool)

Retrievers¶

RetrieverInterface¶

class neo4j_graphrag.retrievers.base.Retriever(driver, neo4j_database=None)[source]¶

Abstract class for Neo4j retrievers

Parameters:

driver (neo4j.Driver)
neo4j_database (Optional[str])

index_name: str¶

VERIFY_NEO4J_VERSION = True¶

search(*args, **kwargs)[source]¶

Search method. Call the get_search_results method that returns a list of neo4j.Record, and format them using the function returned by get_result_formatter to return RetrieverResult.

Parameters:

args (Any)
kwargs (Any)

Return type:

RetrieverResult

abstract get_search_results(*args, **kwargs)[source]¶

This method must be implemented in each child class. It will receive the same parameters provided to the public interface via the search method, after validation. It returns a RawSearchResult object which comprises a list of neo4j.Record objects and an optional metadata dictionary that can contain retriever-level information.

Note that, even though this method is not intended to be called from outside the class, we make it public to make it clearer for the developers that it should be implemented in child classes.

Returns:

List of Neo4j Records and optional metadata dict

Return type:

Parameters:

args (Any)
kwargs (Any)

get_result_formatter()[source]¶

Returns the function to use to transform a neo4j.Record to a RetrieverResultItem.

Return type:: Callable[[Record], RetrieverResultItem]

default_record_formatter(record)[source]¶

Best effort to guess the node-to-text method. Inherited classes can override this method to implement custom text formatting.

Parameters:: record (Record)
Return type:: RetrieverResultItem

VectorRetriever¶

class neo4j_graphrag.retrievers.VectorRetriever(driver, index_name, embedder=None, return_properties=None, result_formatter=None, neo4j_database=None)[source]¶

Provides retrieval method using vector search over embeddings. If an embedder is provided, it needs to have the required Embedder type.

Example:

import neo4j
from neo4j_graphrag.retrievers import VectorRetriever

driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)

retriever = VectorRetriever(driver, "vector-index-name", custom_embedder)
retriever.search(query_text="Find me a book about Fremen", top_k=5)

or if the vector embedding of the query text is available:

retriever.search(query_vector=..., top_k=5)

Parameters:

driver (neo4j.Driver) – The Neo4j Python driver.
index_name (str) – Vector index name.
embedder (Optional[Embedder]) – Embedder object to embed query text.
return_properties (Optional[list[str]]) – List of node properties to return.
result_formatter (Optional[Callable[[neo4j.Record], RetrieverResultItem]]) –
Provided custom function to transform a neo4j.Record to a RetrieverResultItem.

Two variables are provided in the neo4j.Record:
- node: Represents the node retrieved from the vector index search.
- score: Denotes the similarity score.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the server’s default database (“neo4j” by default) (see reference to documentation).

Raises:

RetrieverInitializationError – If validation of the input arguments fail.

search(query_vector=None, query_text=None, top_k=5, effective_search_ratio=1, filters=None)¶

Get the top_k nearest neighbor embeddings for either provided query_vector or query_text. See the following documentation for more details:

To query by text, an embedder must be provided when the class is instantiated. The embedder is not required if query_vector is passed.

Parameters:

query_vector (Optional[list[float]]) – The vector embeddings to get the closest neighbors of. Defaults to None.
query_text (Optional[str]) – The text to get the closest neighbors of. Defaults to None.
top_k (int) – The number of neighbors to return. Defaults to 5.
effective_search_ratio (int) – Controls the candidate pool size by multiplying top_k to balance query accuracy and performance. Defaults to 1.
filters (Optional[dict[str, Any]]) – Filters for metadata pre-filtering. Defaults to None.

Raises:

SearchValidationError – If validation of the input arguments fail.
EmbeddingRequiredError – If no embedder is provided.

Returns:

The results of the search query as a list of neo4j.Record and an optional metadata dict

Return type:

VectorCypherRetriever¶

class neo4j_graphrag.retrievers.VectorCypherRetriever(driver, index_name, retrieval_query, embedder=None, result_formatter=None, neo4j_database=None)[source]¶

Provides retrieval method using vector similarity augmented by a Cypher query. This retriever builds on VectorRetriever. If an embedder is provided, it needs to have the required Embedder type.

Note: node is a variable from the base query that can be used in retrieval_query as seen in the example below.

The retrieval_query is additional Cypher that can allow for graph traversal after retrieving node.

Example:

import neo4j
from neo4j_graphrag.retrievers import VectorCypherRetriever

driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)

retrieval_query = "MATCH (node)-[:AUTHORED_BY]->(author:Author)" "RETURN author.name"
retriever = VectorCypherRetriever(
  driver, "vector-index-name", retrieval_query, custom_embedder
)
retriever.search(query_text="Find me a book about Fremen", top_k=5)

Parameters:

driver (neo4j.Driver) – The Neo4j Python driver.
index_name (str) – Vector index name.
retrieval_query (str) – Cypher query that gets appended.
embedder (Optional[Embedder]) – Embedder object to embed query text.
result_formatter (Optional[Callable[[neo4j.Record], RetrieverResultItem]]) – Provided custom function to transform a neo4j.Record to a RetrieverResultItem.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the server’s default database (“neo4j” by default) (see reference to documentation).

HybridRetriever¶

class neo4j_graphrag.retrievers.HybridRetriever(driver, vector_index_name, fulltext_index_name, embedder=None, return_properties=None, result_formatter=None, neo4j_database=None)[source]¶

Provides retrieval method using combination of vector search over embeddings and fulltext search. If an embedder is provided, it needs to have the required Embedder type.

Example:

import neo4j
from neo4j_graphrag.retrievers import HybridRetriever

driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)

retriever = HybridRetriever(
    driver, "vector-index-name", "fulltext-index-name", custom_embedder
)
retriever.search(query_text="Find me a book about Fremen", top_k=5)

Parameters:

driver (neo4j.Driver) – The Neo4j Python driver.
vector_index_name (str) – Vector index name.
fulltext_index_name (str) – Fulltext index name.
embedder (Optional[Embedder]) – Embedder object to embed query text.
return_properties (Optional[list[str]]) – List of node properties to return.
result_formatter (Optional[Callable[[neo4j.Record], RetrieverResultItem]]) – Provided custom function to transform a neo4j.Record to a RetrieverResultItem.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the server’s default database (“neo4j” by default) (see reference to documentation).

Two variables are provided in the neo4j.Record:
- node: Represents the node retrieved from the vector index search.
- score: Denotes the similarity score.

search(query_text, query_vector=None, top_k=5, effective_search_ratio=1, ranker=HybridSearchRanker.NAIVE, alpha=None)¶

Get the top_k nearest neighbor embeddings for either provided query_vector or query_text. Both query_vector and query_text can be provided. If query_vector is provided, then it will be preferred over the embedded query_text for the vector search.

See the following documentation for more details:

To query by text, an embedder must be provided when the class is instantiated.

Parameters:

query_text (str) – The text to get the closest neighbors of.
query_vector (Optional[list[float]], optional) – The vector embeddings to get the closest neighbors of. Defaults to None.
top_k (int, optional) – The number of neighbors to return. Defaults to 5.
effective_search_ratio (int) – Controls the candidate pool size for the vector index by multiplying top_k to balance query accuracy and performance. Defaults to 1.
ranker (str, HybridSearchRanker) – Type of ranker to order the results from retrieval.
alpha (Optional[float]) – Weight for the vector score when using the linear ranker. The fulltext index score is multiplied by (1 - alpha). Required when using the linear ranker; must be between 0 and 1.

Raises:

SearchValidationError – If validation of the input arguments fail.
EmbeddingRequiredError – If no embedder is provided.

Returns:

The results of the search query as a list of neo4j.Record and an optional metadata dict

Return type:

HybridCypherRetriever¶

class neo4j_graphrag.retrievers.HybridCypherRetriever(driver, vector_index_name, fulltext_index_name, retrieval_query, embedder=None, result_formatter=None, neo4j_database=None)[source]¶

Provides retrieval method using combination of vector search over embeddings and fulltext search, augmented by a Cypher query. This retriever builds on HybridRetriever. If an embedder is provided, it needs to have the required Embedder type.

Note: node is a variable from the base query that can be used in retrieval_query as seen in the example below.

Example:

import neo4j
from neo4j_graphrag.retrievers import HybridCypherRetriever

driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)

retrieval_query = "MATCH (node)-[:AUTHORED_BY]->(author:Author)" "RETURN author.name"
retriever = HybridCypherRetriever(
    driver, "vector-index-name", "fulltext-index-name", retrieval_query, custom_embedder
)
retriever.search(query_text="Find me a book about Fremen", top_k=5)

To query by text, an embedder must be provided when the class is instantiated.

Parameters:

driver (neo4j.Driver) – The Neo4j Python driver.
vector_index_name (str) – Vector index name.
fulltext_index_name (str) – Fulltext index name.
retrieval_query (str) – Cypher query that gets appended.
embedder (Optional[Embedder]) – Embedder object to embed query text.
result_formatter (Optional[Callable[[neo4j.Record], RetrieverResultItem]]) – Provided custom function to transform a neo4j.Record to a RetrieverResultItem.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the server’s default database (“neo4j” by default) (see reference to documentation).

Raises:

RetrieverInitializationError – If validation of the input arguments fail.

search(query_text, query_vector=None, top_k=5, effective_search_ratio=1, query_params=None, ranker=HybridSearchRanker.NAIVE, alpha=None)¶

Get the top_k nearest neighbor embeddings for either provided query_vector or query_text. Both query_vector and query_text can be provided. If query_vector is provided, then it will be preferred over the embedded query_text for the vector search.

See the following documentation for more details:

Parameters:

query_text (str) – The text to get the closest neighbors of.
query_vector (Optional[list[float]]) – The vector embeddings to get the closest neighbors of. Defaults to None.
top_k (int) – The number of neighbors to return. Defaults to 5.
effective_search_ratio (int) – Controls the candidate pool size for the vector index by multiplying top_k to balance query accuracy and performance. Defaults to 1.
query_params (Optional[dict[str, Any]]) – Parameters for the Cypher query. Defaults to None.
ranker (str, HybridSearchRanker) – Type of ranker to order the results from retrieval.
alpha (Optional[float]) – Weight for the vector score when using the linear ranker. The fulltext index score is multiplied by (1 - alpha). Required when using the linear ranker; must be between 0 and 1.

Raises:

SearchValidationError – If validation of the input arguments fail.
EmbeddingRequiredError – If no embedder is provided.

Returns:

The results of the search query as a list of neo4j.Record and an optional metadata dict

Return type:

Text2CypherRetriever¶

class neo4j_graphrag.retrievers.Text2CypherRetriever(driver, llm, neo4j_schema=None, examples=None, result_formatter=None, custom_prompt=None, neo4j_database=None)[source]¶

Allows for the retrieval of records from a Neo4j database using natural language. Converts a user’s natural language query to a Cypher query using an LLM, then retrieves records from a Neo4j database using the generated Cypher query.

Parameters:

driver (neo4j.Driver) – The Neo4j Python driver.
llm (neo4j_graphrag.generation.llm.LLMInterface) – LLM object to generate the Cypher query.
neo4j_schema (Optional[str]) – Neo4j schema used to generate the Cypher query.
examples (Optional[list[str], optional) – Optional user input/query pairs for the LLM to use as examples.
custom_prompt (Optional[str]) – Optional custom prompt to use instead of auto generated prompt. Will include the neo4j_schema for schema and examples for examples prompt parameters, if they are provided.
result_formatter (Optional[Callable[[neo4j.Record], RetrieverResultItem]])
neo4j_database (Optional[str])

Raises:

RetrieverInitializationError – If validation of the input arguments fail.

search(query_text, prompt_params=None)¶

Converts query_text to a Cypher query using an LLM.: Retrieve records from a Neo4j database using the generated Cypher query.

Parameters:

query_text (str) – The natural language query used to search the Neo4j database.
prompt_params (Dict[str, Any]) – additional values to inject into the custom prompt, if it is provided. If the schema or examples parameter is specified, it will overwrite the corresponding value passed during initialization. Example: {‘schema’: ‘this is the graph schema’}

Raises:

SearchValidationError – If validation of the input arguments fail.
Text2CypherRetrievalError – If the LLM fails to generate a correct Cypher query.

Returns:

The results of the search query as a list of neo4j.Record and an optional metadata dict

Return type:

External Retrievers¶

This section includes retrievers that integrate with databases external to Neo4j.

WeaviateNeo4jRetriever¶

class neo4j_graphrag.retrievers.external.weaviate.weaviate.WeaviateNeo4jRetriever(driver, client, collection, id_property_external, id_property_neo4j, embedder=None, return_properties=None, retrieval_query=None, result_formatter=None, neo4j_database=None)[source]¶

Provides retrieval method using vector search over embeddings with a Weaviate database. If an embedder is provided, it needs to have the required Embedder type.

Example:

from neo4j import GraphDatabase
from neo4j_graphrag.retrievers import WeaviateNeo4jRetriever
from weaviate.connect.helpers import connect_to_local

with GraphDatabase.driver(NEO4J_URL, auth=NEO4J_AUTH) as neo4j_driver:
    with connect_to_local() as w_client:
        retriever = WeaviateNeo4jRetriever(
            driver=neo4j_driver,
            client=w_client,
            collection="Jeopardy",
            id_property_external="neo4j_id",
            id_property_neo4j="id"
        )

        result = retriever.search(query_text="biology", top_k=2)

Parameters:

driver (neo4j.Driver) – The Neo4j Python driver.
client (WeaviateClient) – The Weaviate client object.
collection (str) – Name of a set of Weaviate objects that share the same data structure.
id_property_external (str) – The name of the Weaviate property that has the identifier that refers to a corresponding Neo4j node id property.
id_property_neo4j (str) – The name of the Neo4j node property that’s used as the identifier for relating matches from Weaviate to Neo4j nodes.
embedder (Optional[Embedder]) – Embedder object to embed query text.
return_properties (Optional[list[str]]) – List of node properties to return.
result_formatter (Optional[Callable[[neo4j.Record], RetrieverResultItem]]) – Function to transform a neo4j.Record to a RetrieverResultItem.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the server’s default database (“neo4j” by default) (see reference to documentation).
retrieval_query (Optional[str])

Raises:

RetrieverInitializationError – If validation of the input arguments fail.

search(query_vector=None, query_text=None, top_k=5, **kwargs)¶

Get the top_k nearest neighbor embeddings using Weaviate for either provided query_vector or query_text. Both query_vector and query_text can be provided. If query_vector is provided, then it will be preferred over the embedded query_text for the vector search. If query_text is provided, then it will check if an embedder is provided and use it to generate the query_vector. If no embedder is provided, then it will assume that the vectorizer is used in Weaviate.

Example:

import neo4j
from neo4j_graphrag.retrievers import WeaviateNeo4jRetriever

driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)

retriever = WeaviateNeo4jRetriever(
    driver=driver,
    client=weaviate_client,
    collection="Jeopardy",
    id_property_external="neo4j_id",
    id_property_neo4j="id",
)

biology_embedding = ...
retriever.search(query_vector=biology_embedding, top_k=2)

Parameters:

query_text (Optional[str]) – The text to get the closest neighbors of.
query_vector (Optional[list[float]]) – The vector embeddings to get the closest neighbors of. Defaults to None.
top_k (int) – The number of neighbors to return. Defaults to 5.
kwargs (Any)

Raises:

SearchValidationError – If validation of the input arguments fail.

Returns:

The results of the search query as a list of neo4j.Record and an optional metadata dict

Return type:

PineconeNeo4jRetriever¶

class neo4j_graphrag.retrievers.external.pinecone.pinecone.PineconeNeo4jRetriever(driver, client, index_name, id_property_neo4j, embedder=None, return_properties=None, retrieval_query=None, result_formatter=None, neo4j_database=None)[source]¶

Provides retrieval method using vector search over embeddings with a Pinecone database. If an embedder is provided, it needs to have the required Embedder type.

Example:

from neo4j import GraphDatabase
from neo4j_graphrag.retrievers import PineconeNeo4jRetriever
from pinecone import Pinecone

with GraphDatabase.driver(NEO4J_URL, auth=NEO4J_AUTH) as neo4j_driver:
    pc_client = Pinecone(PC_API_KEY)
    embedder = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

    retriever = PineconeNeo4jRetriever(
        driver=neo4j_driver,
        client=pc_client,
        index_name="jeopardy",
        id_property_neo4j="id",
        embedder=embedder,
    )

    result = retriever.search(query_text="biology", top_k=2)

Parameters:

driver (neo4j.Driver) – The Neo4j Python driver.
client (Pinecone) – The Pinecone client object.
index_name (str) – The name of the Pinecone index.
id_property_neo4j (str) – The name of the Neo4j node property that’s used as the identifier for relating matches from Pinecone to Neo4j nodes.
embedder (Optional[Embedder]) – Embedder object to embed query text.
return_properties (Optional[list[str]]) – List of node properties to return.
retrieval_query (str) – Cypher query that gets appended.
result_formatter (Optional[Callable[[neo4j.Record], RetrieverResultItem]]) – Function to transform a neo4j.Record to a RetrieverResultItem.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the server’s default database (“neo4j” by default) (see reference to documentation).

Raises:

RetrieverInitializationError – If validation of the input arguments fail.

search(query_vector=None, query_text=None, top_k=5, **kwargs)¶

Get the top_k nearest neighbor embeddings using Pinecone for either provided query_vector or query_text. Both query_vector and query_text can be provided. If query_vector is provided, then it will be preferred over the embedded query_text for the vector search. If query_text is provided, then it will check if an embedder is provided and use it to generate the query_vector.

See the following documentation for more details: - Query a vector index - db.index.vector.queryNodes() - db.index.fulltext.queryNodes()

Example:

from neo4j import GraphDatabase
from neo4j_graphrag.retrievers import PineconeNeo4jRetriever
from pinecone import Pinecone

with GraphDatabase.driver(NEO4J_URL, auth=NEO4J_AUTH) as neo4j_driver:
    pc_client = Pinecone(PC_API_KEY)
    retriever = PineconeNeo4jRetriever(
        driver=neo4j_driver,
        client=pc_client,
        index_name="jeopardy",
        id_property_neo4j="id"
    )
    biology_embedding = ...
    retriever.search(query_vector=biology_embedding, top_k=2)

Parameters:

query_text (str) – The text to get the closest neighbors of.
query_vector (Optional[list[float]], optional) – The vector embeddings to get the closest neighbors of. Defaults to None.
top_k (Optional[int]) – The number of neighbors to return. Defaults to 5.
kwargs (Any)

Raises:

SearchValidationError – If validation of the input arguments fail.
EmbeddingRequiredError – If no embedder is provided when using text as an input.

Returns:

The results of the search query as a list of neo4j.Record and an optional metadata dict

Return type:

QdrantNeo4jRetriever¶

class neo4j_graphrag.retrievers.external.qdrant.qdrant.QdrantNeo4jRetriever(driver, client, collection_name, id_property_neo4j, id_property_external='id', using=None, embedder=None, return_properties=None, retrieval_query=None, result_formatter=None, neo4j_database=None)[source]¶

Provides retrieval method using vector search over embeddings with a Qdrant database.

Example:

from neo4j import GraphDatabase
from neo4j_graphrag.retrievers import QdrantNeo4jRetriever
from qdrant_client import QdrantClient

with GraphDatabase.driver(NEO4J_URL, auth=NEO4J_AUTH) as neo4j_driver:
    client = QdrantClient()
    retriever = QdrantNeo4jRetriever(
        driver=neo4j_driver,
        client=client,
        collection_name="my_collection",
        using="my_vector",
        id_property_external="neo4j_id"
    )
    embedding = ...
    retriever.search(query_vector=embedding, top_k=2)

Parameters:

driver (neo4j.Driver) – The Neo4j Python driver.
client (QdrantClient) – The Qdrant client object.
collection_name (str) – The name of the Qdrant collection to use.
using (str) – The name of the Qdrant vector contained in your collection in case of multi-vector collection
id_property_neo4j (str) – The name of the Neo4j node property that’s used as the identifier for relating matches from Qdrant to Neo4j nodes.
id_property_external (str) – The name of the Qdrant payload property with identifier that refers to a corresponding Neo4j node id property.
embedder (Optional[Embedder]) – Embedder object to embed query text.
return_properties (Optional[list[str]]) – List of node properties to return.
result_formatter (Optional[Callable[[neo4j.Record], RetrieverResultItem]]) – Function to transform a neo4j.Record to a RetrieverResultItem.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the server’s default database (“neo4j” by default) (see reference to documentation).
retrieval_query (Optional[str])

Raises:

RetrieverInitializationError – If validation of the input arguments fail.

search(query_vector=None, query_text=None, top_k=5, **kwargs)¶

Get the top_k nearest neighbour embeddings using Qdrant for either provided query_vector or query_text. If query_text is provided, then the provided embedder is used to generate the query_vector.

See the following documentation for more details: - Query a vector index - db.index.vector.queryNodes() - db.index.fulltext.queryNodes()

Example:

from neo4j import GraphDatabase
from neo4j_graphrag.retrievers import QdrantNeo4jRetriever
from qdrant_client import QdrantClient

with GraphDatabase.driver(NEO4J_URL, auth=NEO4J_AUTH) as neo4j_driver:
    client = QdrantClient()
    retriever = QdrantNeo4jRetriever(
        driver=neo4j_driver,
        client=client,
        collection_name="my_collection",
        id_property_external="neo4j_id"
    )
    embedding = ...
    retriever.search(query_vector=embedding, top_k=2)

Parameters:

query_text (str) – The text to get the closest neighbours of.
query_vector (Optional[list[float]], optional) – The vector embeddings to get the closest neighbours of. Defaults to None.
top_k (Optional[int]) – The number of neighbours to return. Defaults to 5.
kwargs (Any) – Additional keyword arguments to pass to QdrantClient#query().

Raises:

SearchValidationError – If validation of the input arguments fail.
EmbeddingRequiredError – If no embedder is provided when using text as an input.

Returns:

The results of the search query as a list of neo4j.Record and an optional metadata dict

Return type:

Embedder¶

class neo4j_graphrag.embeddings.base.Embedder[source]¶

Interface for embedding models. An embedder passed into a retriever must implement this interface.

abstract embed_query(text)[source]¶

Embed query text.

Parameters:: text (str) – Text to convert to vector embedding
Returns:: A vector embedding.
Return type:: list[float]

SentenceTransformerEmbeddings¶

class neo4j_graphrag.embeddings.sentence_transformers.SentenceTransformerEmbeddings(model='all-MiniLM-L6-v2', *args, **kwargs)[source]¶

Parameters:

model (str)
args (Any)
kwargs (Any)

embed_query(text)[source]¶

Embed query text.

Parameters:: text (str) – Text to convert to vector embedding
Returns:: A vector embedding.
Return type:: list[float]

OpenAIEmbeddings¶

class neo4j_graphrag.embeddings.openai.OpenAIEmbeddings(model='text-embedding-ada-002', **kwargs)[source]¶

OpenAI embeddings class. This class uses the OpenAI python client to generate embeddings for text data.

Parameters:

model (str) – The name of the OpenAI embedding model to use. Defaults to “text-embedding-ada-002”.
kwargs (Any) – All other parameters will be passed to the openai.OpenAI init.

AzureOpenAIEmbeddings¶

class neo4j_graphrag.embeddings.openai.AzureOpenAIEmbeddings(model='text-embedding-ada-002', **kwargs)[source]¶

Azure OpenAI embeddings class. This class uses the Azure OpenAI python client to generate embeddings for text data.

Parameters:

model (str) – The name of the Azure OpenAI embedding model to use. Defaults to “text-embedding-ada-002”.
kwargs (Any) – All other parameters will be passed to the openai.AzureOpenAI init.

OllamaEmbeddings¶

class neo4j_graphrag.embeddings.ollama.OllamaEmbeddings(model, **kwargs)[source]¶

Ollama embeddings class. This class uses the ollama Python client to generate vector embeddings for text data.

Parameters:

model (str) – The name of the Mistral AI text embedding model to use. Defaults to “mistral-embed”.
kwargs (Any)

embed_query(text, **kwargs)[source]¶

Generate embeddings for a given query using an Ollama text embedding model.

Parameters:

text (str) – The text to generate an embedding for.
**kwargs (Any) – Additional keyword arguments to pass to the Ollama client.

Return type:

VertexAIEmbeddings¶

class neo4j_graphrag.embeddings.vertexai.VertexAIEmbeddings(model='text-embedding-004')[source]¶

Vertex AI embeddings class. This class uses the Vertex AI Python client to generate vector embeddings for text data.

Parameters:: model (str) – The name of the Vertex AI text embedding model to use. Defaults to “text-embedding-004”.

embed_query(text, task_type='RETRIEVAL_QUERY', **kwargs)[source]¶

Generate embeddings for a given query using a Vertex AI text embedding model.

Parameters:

text (str) – The text to generate an embedding for.
task_type (str) – The type of the text embedding task. Defaults to “RETRIEVAL_QUERY”. See https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api#tasktype for a full list.
**kwargs (Any) – Additional keyword arguments to pass to the Vertex AI client’s get_embeddings method.

Return type:

MistralAIEmbeddings¶

class neo4j_graphrag.embeddings.mistral.MistralAIEmbeddings(model='mistral-embed', **kwargs)[source]¶

Mistral AI embeddings class. This class uses the Mistral AI Python client to generate vector embeddings for text data.

Parameters:

model (str) – The name of the Mistral AI text embedding model to use. Defaults to “mistral-embed”.
kwargs (Any)

embed_query(text, **kwargs)[source]¶

Generate embeddings for a given query using a Mistral AI text embedding model.

Parameters:

text (str) – The text to generate an embedding for.
**kwargs (Any) – Additional keyword arguments to pass to the Mistral AI client.

Return type:

CohereEmbeddings¶

class neo4j_graphrag.embeddings.cohere.CohereEmbeddings(model='', **kwargs)[source]¶

Parameters:

model (str)
kwargs (Any)

embed_query(text, **kwargs)[source]¶

Embed query text.

Parameters:

text (str) – Text to convert to vector embedding
kwargs (Any)

Returns:

A vector embedding.

Return type:

Generation¶

LLM¶

LLMInterface¶

class neo4j_graphrag.llm.LLMInterface(model_name, model_params=None, rate_limit_handler=None, **kwargs)[source]¶

Interface for large language models.

Parameters:

model_name (str) – The name of the language model.
model_params (Optional[dict]) – Additional parameters passed to the model when text is sent to it. Defaults to None.
rate_limit_handler (Optional[RateLimitHandler]) – Handler for rate limiting. Defaults to retry with exponential backoff.
**kwargs (Any) – Arguments passed to the model when for the class is initialised. Defaults to None.

abstract invoke(input, message_history=None, system_instruction=None)[source]¶

Sends a text input to the LLM and retrieves a response.

Parameters:

input (str) – Text sent to the LLM.
message_history (Optional[Union[List[LLMMessage], MessageHistory]]) – A collection previous messages, with each message having a specific role assigned.
system_instruction (Optional[str]) – An option to override the llm system message for this invocation.

Returns:

The response from the LLM.

Return type:

Raises:

LLMGenerationError – If anything goes wrong.

abstract async ainvoke(input, message_history=None, system_instruction=None)[source]¶

Asynchronously sends a text input to the LLM and retrieves a response.

Parameters:

input (str) – Text sent to the LLM.
message_history (Optional[Union[List[LLMMessage], MessageHistory]]) – A collection previous messages, with each message having a specific role assigned.
system_instruction (Optional[str]) – An option to override the llm system message for this invocation.

Returns:

The response from the LLM.

Return type:

Raises:

LLMGenerationError – If anything goes wrong.

invoke_with_tools(input, tools, message_history=None, system_instruction=None)[source]¶

Sends a text input to the LLM with tool definitions and retrieves a tool call response.

This is a default implementation that should be overridden by LLM providers that support tool/function calling.

Parameters:

input (str) – Text sent to the LLM.
tools (Sequence[Tool]) – Sequence of Tools for the LLM to choose from. Each LLM implementation should handle the conversion to its specific format.
message_history (Optional[Union[List[LLMMessage], MessageHistory]]) – A collection previous messages, with each message having a specific role assigned.
system_instruction (Optional[str]) – An option to override the llm system message for this invocation.

Returns:

The response from the LLM containing a tool call.

Return type:

ToolCallResponse

Raises:

LLMGenerationError – If anything goes wrong.
NotImplementedError – If the LLM provider does not support tool calling.

async ainvoke_with_tools(input, tools, message_history=None, system_instruction=None)[source]¶

Asynchronously sends a text input to the LLM with tool definitions and retrieves a tool call response.

This is a default implementation that should be overridden by LLM providers that support tool/function calling.

Parameters:

input (str) – Text sent to the LLM.
tools (Sequence[Tool]) – Sequence of Tools for the LLM to choose from. Each LLM implementation should handle the conversion to its specific format.
message_history (Optional[Union[List[LLMMessage], MessageHistory]]) – A collection previous messages, with each message having a specific role assigned.
system_instruction (Optional[str]) – An option to override the llm system message for this invocation.

Returns:

The response from the LLM containing a tool call.

Return type:

ToolCallResponse

Raises:

LLMGenerationError – If anything goes wrong.
NotImplementedError – If the LLM provider does not support tool calling.

OpenAILLM¶

class neo4j_graphrag.llm.openai_llm.OpenAILLM(model_name, model_params=None, rate_limit_handler=None, **kwargs)[source]¶

Parameters:

model_name (str)
model_params (Optional[dict[str, Any]])
rate_limit_handler (Optional[RateLimitHandler])
kwargs (Any)

AzureOpenAILLM¶

class neo4j_graphrag.llm.openai_llm.AzureOpenAILLM(model_name, model_params=None, system_instruction=None, rate_limit_handler=None, **kwargs)[source]¶

Parameters:

model_name (str)
model_params (Optional[dict[str, Any]])
system_instruction (Optional[str])
rate_limit_handler (Optional[RateLimitHandler])
kwargs (Any)

OllamaLLM¶

class neo4j_graphrag.llm.ollama_llm.OllamaLLM(model_name, model_params=None, rate_limit_handler=None, **kwargs)[source]¶

Parameters:

model_name (str)
model_params (Optional[dict[str, Any]])
rate_limit_handler (Optional[RateLimitHandler])
kwargs (Any)

get_messages(input, message_history=None, system_instruction=None)[source]¶

Parameters:

input (str)
message_history (Optional[Union[List[LLMMessage], MessageHistory]])
system_instruction (Optional[str])

Return type:

Sequence[Message]

invoke(input, message_history=None, system_instruction=None)[source]¶

Sends text to the LLM and returns a response.

Parameters:

input (str) – The text to send to the LLM.
message_history (Optional[Union[List[LLMMessage], MessageHistory]]) – A collection previous messages, with each message having a specific role assigned.
system_instruction (Optional[str]) – An option to override the llm system message for this invocation.

Returns:

The response from the LLM.

Return type:

async ainvoke(input, message_history=None, system_instruction=None)[source]¶

Asynchronously sends a text input to the OpenAI chat completion model and returns the response’s content.

Parameters:

input (str) – Text sent to the LLM.
message_history (Optional[Union[List[LLMMessage], MessageHistory]]) – A collection previous messages, with each message having a specific role assigned.
system_instruction (Optional[str]) – An option to override the llm system message for this invocation.

Returns:

The response from OpenAI.

Return type:

Raises:

LLMGenerationError – If anything goes wrong.

VertexAILLM¶

class neo4j_graphrag.llm.vertexai_llm.VertexAILLM(model_name='gemini-1.5-flash-001', model_params=None, system_instruction=None, rate_limit_handler=None, **kwargs)[source]¶

Interface for large language models on Vertex AI

Parameters:

model_name (str, optional) – Name of the LLM to use. Defaults to “gemini-1.5-flash-001”.
model_params (Optional[dict], optional) – Additional parameters passed to the model when text is sent to it. Defaults to None.
system_instruction (Optional[str]) – Optional[str], optional): Additional instructions for setting the behavior and context for the model in a conversation. Defaults to None.
**kwargs (Any) – Arguments passed to the model when for the class is initialised. Defaults to None.
rate_limit_handler (Optional[RateLimitHandler])
**kwargs

Raises:

LLMGenerationError – If there’s an error generating the response from the model.

Example:

from neo4j_graphrag.llm import VertexAILLM
from vertexai.generative_models import GenerationConfig

generation_config = GenerationConfig(temperature=0.0)
llm = VertexAILLM(
    model_name="gemini-1.5-flash-001", generation_config=generation_config
)
llm.invoke("Who is the mother of Paul Atreides?")

get_messages(input, message_history=None)[source]¶

Parameters:

input (str)
message_history (List[LLMMessage] | MessageHistory | None)

Return type:

list[Content]

invoke(input, message_history=None, system_instruction=None)[source]¶

Sends text to the LLM and returns a response.

Parameters:

input (str) – The text to send to the LLM.
message_history (Optional[Union[List[LLMMessage], MessageHistory]]) – A collection previous messages, with each message having a specific role assigned.
system_instruction (Optional[str]) – An option to override the llm system message for this invocation.

Returns:

The response from the LLM.

Return type:

async ainvoke(input, message_history=None, system_instruction=None)[source]¶

Asynchronously sends text to the LLM and returns a response.

Parameters:

input (str) – The text to send to the LLM.
message_history (Optional[Union[List[LLMMessage], MessageHistory]]) – A collection previous messages, with each message having a specific role assigned.
system_instruction (Optional[str]) – An option to override the llm system message for this invocation.

Returns:

The response from the LLM.

Return type:

async ainvoke_with_tools(input, tools, message_history=None, system_instruction=None)[source]¶

Asynchronously sends a text input to the LLM with tool definitions and retrieves a tool call response.

This is a default implementation that should be overridden by LLM providers that support tool/function calling.

Parameters:

input (str) – Text sent to the LLM.
tools (Sequence[Tool]) – Sequence of Tools for the LLM to choose from. Each LLM implementation should handle the conversion to its specific format.
message_history (Optional[Union[List[LLMMessage], MessageHistory]]) – A collection previous messages, with each message having a specific role assigned.
system_instruction (Optional[str]) – An option to override the llm system message for this invocation.

Returns:

The response from the LLM containing a tool call.

Return type:

ToolCallResponse

Raises:

LLMGenerationError – If anything goes wrong.
NotImplementedError – If the LLM provider does not support tool calling.

invoke_with_tools(input, tools, message_history=None, system_instruction=None)[source]¶

Sends a text input to the LLM with tool definitions and retrieves a tool call response.

This is a default implementation that should be overridden by LLM providers that support tool/function calling.

Parameters:

input (str) – Text sent to the LLM.
tools (Sequence[Tool]) – Sequence of Tools for the LLM to choose from. Each LLM implementation should handle the conversion to its specific format.
message_history (Optional[Union[List[LLMMessage], MessageHistory]]) – A collection previous messages, with each message having a specific role assigned.
system_instruction (Optional[str]) – An option to override the llm system message for this invocation.

Returns:

The response from the LLM containing a tool call.

Return type:

ToolCallResponse

Raises:

LLMGenerationError – If anything goes wrong.
NotImplementedError – If the LLM provider does not support tool calling.

AnthropicLLM¶

class neo4j_graphrag.llm.anthropic_llm.AnthropicLLM(model_name, model_params=None, rate_limit_handler=None, **kwargs)[source]¶

Interface for large language models on Anthropic

Parameters:

model_name (str, optional) – Name of the LLM to use. Defaults to “gemini-1.5-flash-001”.
model_params (Optional[dict], optional) – Additional parameters passed to the model when text is sent to it. Defaults to None.
system_instruction – Optional[str], optional): Additional instructions for setting the behavior and context for the model in a conversation. Defaults to None.
**kwargs (Any) – Arguments passed to the model when for the class is initialised. Defaults to None.
rate_limit_handler (Optional[RateLimitHandler])
**kwargs

Raises:

LLMGenerationError – If there’s an error generating the response from the model.

Example:

from neo4j_graphrag.llm import AnthropicLLM

llm = AnthropicLLM(
    model_name="claude-3-opus-20240229",
    model_params={"max_tokens": 1000},
    api_key="sk...",   # can also be read from env vars
)
llm.invoke("Who is the mother of Paul Atreides?")

get_messages(input, message_history=None)[source]¶

Parameters:

input (str)
message_history (Optional[Union[List[LLMMessage], MessageHistory]])

Return type:

Iterable[MessageParam]

invoke(input, message_history=None, system_instruction=None)[source]¶

Sends text to the LLM and returns a response.

Parameters:

input (str) – The text to send to the LLM.
message_history (Optional[Union[List[LLMMessage], MessageHistory]]) – A collection previous messages, with each message having a specific role assigned.
system_instruction (Optional[str]) – An option to override the llm system message for this invocation.

Returns:

The response from the LLM.

Return type:

async ainvoke(input, message_history=None, system_instruction=None)[source]¶

Asynchronously sends text to the LLM and returns a response.

Parameters:

input (str) – The text to send to the LLM.
message_history (Optional[Union[List[LLMMessage], MessageHistory]]) – A collection previous messages, with each message having a specific role assigned.
system_instruction (Optional[str]) – An option to override the llm system message for this invocation.

Returns:

The response from the LLM.

Return type:

CohereLLM¶

class neo4j_graphrag.llm.cohere_llm.CohereLLM(model_name='', model_params=None, rate_limit_handler=None, **kwargs)[source]¶

Interface for large language models on the Cohere platform

Parameters:

model_name (str, optional) – Name of the LLM to use. Defaults to “gemini-1.5-flash-001”.
model_params (Optional[dict], optional) – Additional parameters passed to the model when text is sent to it. Defaults to None.
system_instruction – Optional[str], optional): Additional instructions for setting the behavior and context for the model in a conversation. Defaults to None.
**kwargs (Any) – Arguments passed to the model when for the class is initialised. Defaults to None.
rate_limit_handler (Optional[RateLimitHandler])
**kwargs

Raises:

LLMGenerationError – If there’s an error generating the response from the model.

Example:

from neo4j_graphrag.llm import CohereLLM

llm = CohereLLM(api_key="...")
llm.invoke("Say something")

get_messages(input, message_history=None, system_instruction=None)[source]¶

Parameters:

input (str)
message_history (Optional[Union[List[LLMMessage], MessageHistory]])
system_instruction (Optional[str])

Return type:

ChatMessages

invoke(input, message_history=None, system_instruction=None)[source]¶

Sends text to the LLM and returns a response.

Parameters:

input (str) – The text to send to the LLM.
message_history (Optional[Union[List[LLMMessage], MessageHistory]]) – A collection previous messages, with each message having a specific role assigned.
system_instruction (Optional[str]) – An option to override the llm system message for this invocation.

Returns:

The response from the LLM.

Return type:

async ainvoke(input, message_history=None, system_instruction=None)[source]¶

Asynchronously sends text to the LLM and returns a response.

Parameters:

input (str) – The text to send to the LLM.
message_history (Optional[Union[List[LLMMessage], MessageHistory]]) – A collection previous messages, with each message having a specific role assigned.
system_instruction (Optional[str]) – An option to override the llm system message for this invocation.

Returns:

The response from the LLM.

Return type:

MistralAILLM¶

class neo4j_graphrag.llm.mistralai_llm.MistralAILLM(model_name, model_params=None, rate_limit_handler=None, **kwargs)[source]¶

Parameters:

model_name (str)
model_params (Optional[dict[str, Any]])
rate_limit_handler (Optional[RateLimitHandler])
kwargs (Any)

get_messages(input, message_history=None, system_instruction=None)[source]¶

Parameters:

input (str)
message_history (List[LLMMessage] | MessageHistory | None)
system_instruction (str | None)

Return type:

list[Annotated[Annotated[AssistantMessage, Tag(tag=assistant)] | Annotated[SystemMessage, Tag(tag=system)] | Annotated[ToolMessage, Tag(tag=tool)] | Annotated[UserMessage, Tag(tag=user)], Discriminator(discriminator=~mistralai.models.chatcompletionrequest.<lambda>, custom_error_type=None, custom_error_message=None, custom_error_context=None)]]

invoke(input, message_history=None, system_instruction=None)[source]¶

Sends a text input to the Mistral chat completion model and returns the response’s content.

Parameters:

input (str) – Text sent to the LLM.
message_history (Optional[Union[List[LLMMessage], MessageHistory]]) – A collection previous messages, with each message having a specific role assigned.
system_instruction (Optional[str]) – An option to override the llm system message for this invocation.

Returns:

The response from MistralAI.

Return type:

Raises:

LLMGenerationError – If anything goes wrong.

async ainvoke(input, message_history=None, system_instruction=None)[source]¶

Asynchronously sends a text input to the MistralAI chat completion model and returns the response’s content.

Parameters:

input (str) – Text sent to the LLM.
message_history (Optional[Union[List[LLMMessage], MessageHistory]]) – A collection previous messages, with each message having a specific role assigned.
system_instruction (Optional[str]) – An option to override the llm system message for this invocation.

Returns:

The response from MistralAI.

Return type:

Raises:

LLMGenerationError – If anything goes wrong.

Rate Limiting¶

RateLimitHandler¶

class neo4j_graphrag.llm.rate_limit.RateLimitHandler[source]¶

Abstract base class for rate limit handling strategies.

abstract handle_sync(func)[source]¶

Apply rate limit handling to a synchronous function.

Parameters:: func (F) – The function to wrap with rate limit handling.
Returns:: The wrapped function.
Return type:: F

abstract handle_async(func)[source]¶

Apply rate limit handling to an asynchronous function.

Parameters:: func (AF) – The async function to wrap with rate limit handling.
Returns:: The wrapped async function.
Return type:: AF

RetryRateLimitHandler¶

class neo4j_graphrag.llm.rate_limit.RetryRateLimitHandler(max_attempts=3, min_wait=1.0, max_wait=60.0, multiplier=2.0, jitter=True)[source]¶

Rate limit handler using exponential backoff retry strategy.

This handler uses tenacity for retry logic with exponential backoff.

Parameters:

max_attempts (int) – Maximum number of retry attempts. Defaults to 3.
min_wait (float) – Minimum wait time between retries in seconds. Defaults to 1.
max_wait (float) – Maximum wait time between retries in seconds. Defaults to 60.
multiplier (float) – Exponential backoff multiplier. Defaults to 2.
jitter (bool) – Whether to add random jitter to retry delays to prevent thundering herd. Defaults to True.

handle_sync(func)[source]¶

Apply retry logic to a synchronous function.

Parameters:: func (F)
Return type:: F

handle_async(func)[source]¶

Apply retry logic to an asynchronous function.

Parameters:: func (AF)
Return type:: AF

NoOpRateLimitHandler¶

class neo4j_graphrag.llm.rate_limit.NoOpRateLimitHandler[source]¶

A no-op rate limit handler that does not apply any rate limiting.

handle_sync(func)[source]¶

Return the function unchanged.

Parameters:: func (F)
Return type:: F

handle_async(func)[source]¶

Return the async function unchanged.

Parameters:: func (AF)
Return type:: AF

PromptTemplate¶

class neo4j_graphrag.generation.prompts.PromptTemplate(template=None, expected_inputs=None, system_instructions=None)[source]¶

This class is used to generate a parameterized prompt. It is defined from a string (the template) using the Python format syntax (parameters between curly braces {}) and a list of required inputs. Before sending the instructions to an LLM, call the format method that will replace parameters with the provided values. If any of the expected inputs is missing, a PromptMissingInputError is raised.

Parameters:

template (Optional[str])
expected_inputs (Optional[list[str]])
system_instructions (Optional[str])

DEFAULT_SYSTEM_INSTRUCTIONS: str = ''¶

DEFAULT_TEMPLATE: str = ''¶

EXPECTED_INPUTS: list[str] = []¶

format(*args, **kwargs)[source]¶

This method is used to replace parameters with the provided values. Parameters must be provided: - as kwargs - as args if using the same order as in the expected inputs

Example:

prompt_template = PromptTemplate(
    template='''Explain the following concept to {target_audience}:
    Concept: {concept}
    Answer:
    ''',
    expected_inputs=['target_audience', 'concept']
)
prompt = prompt_template.format('12 yo children', concept='graph database')
print(prompt)

# Result:
# '''Explain the following concept to 12 yo children:
# Concept: graph database
# Answer:
# '''

Parameters:

args (Any)
kwargs (Any)

Return type:

RagTemplate¶

class neo4j_graphrag.generation.prompts.RagTemplate(template=None, expected_inputs=None, system_instructions=None)[source]¶

Parameters:

template (Optional[str])
expected_inputs (Optional[list[str]])
system_instructions (Optional[str])

DEFAULT_SYSTEM_INSTRUCTIONS: str = 'Answer the user question using the provided context.'¶

DEFAULT_TEMPLATE: str = 'Context:\n{context}\n\nExamples:\n{examples}\n\nQuestion:\n{query_text}\n\nAnswer:\n'¶

EXPECTED_INPUTS: list[str] = ['context', 'query_text', 'examples']¶

ERExtractionTemplate¶

class neo4j_graphrag.generation.prompts.ERExtractionTemplate(template=None, expected_inputs=None, system_instructions=None)[source]¶

Parameters:

template (Optional[str])
expected_inputs (Optional[list[str]])
system_instructions (Optional[str])

DEFAULT_TEMPLATE: str = '\nYou are a top-tier algorithm designed for extracting\ninformation in structured formats to build a knowledge graph.\n\nExtract the entities (nodes) and specify their type from the following text.\nAlso extract the relationships between these nodes.\n\nReturn result as JSON using the following format:\n{{"nodes": [ {{"id": "0", "label": "Person", "properties": {{"name": "John"}} }}],\n"relationships": [{{"type": "KNOWS", "start_node_id": "0", "end_node_id": "1", "properties": {{"since": "2024-08-01"}} }}] }}\n\nUse only the following node and relationship types (if provided):\n{schema}\n\nAssign a unique ID (string) to each node, and reuse it to define relationships.\nDo respect the source and target node types for relationship and\nthe relationship direction.\n\nMake sure you adhere to the following rules to produce valid JSON objects:\n- Do not return any additional information other than the JSON in it.\n- Omit any backticks around the JSON - simply output the JSON on its own.\n- The JSON object must not wrapped into a list - it is its own JSON object.\n- Property names must be enclosed in double quotes\n\nExamples:\n{examples}\n\nInput text:\n\n{text}\n'¶

EXPECTED_INPUTS: list[str] = ['text']¶

SchemaExtractionTemplate¶

class neo4j_graphrag.generation.prompts.SchemaExtractionTemplate(template=None, expected_inputs=None, system_instructions=None)[source]¶

Parameters:

template (Optional[str])
expected_inputs (Optional[list[str]])
system_instructions (Optional[str])

DEFAULT_TEMPLATE: str = '\nYou are a top-tier algorithm designed for extracting a labeled property graph schema in\nstructured formats.\n\nGenerate a generalized graph schema based on the input text. Identify key node types,\ntheir relationship types, and property types.\n\nIMPORTANT RULES:\n1. Return only abstract schema information, not concrete instances.\n2. Use singular PascalCase labels for node types (e.g., Person, Company, Product).\n3. Use UPPER_SNAKE_CASE labels for relationship types (e.g., WORKS_FOR, MANAGES).\n4. Include property definitions only when the type can be confidently inferred, otherwise omit them.\n5. When defining patterns, ensure that every node label and relationship label mentioned exists in your lists of node types and relationship types.\n6. Do not create node types that aren\'t clearly mentioned in the text.\n7. Keep your schema minimal and focused on clearly identifiable patterns in the text.\n\nAccepted property types are: BOOLEAN, DATE, DURATION, FLOAT, INTEGER, LIST,\nLOCAL_DATETIME, LOCAL_TIME, POINT, STRING, ZONED_DATETIME, ZONED_TIME.\n\nReturn a valid JSON object that follows this precise structure:\n{{\n "node_types": [\n {{\n "label": "Person",\n "properties": [\n {{\n "name": "name",\n "type": "STRING"\n }}\n ]\n }},\n ...\n ],\n "relationship_types": [\n {{\n "label": "WORKS_FOR"\n }},\n ...\n ],\n "patterns": [\n ["Person", "WORKS_FOR", "Company"],\n ...\n ]\n}}\n\nExamples:\n{examples}\n\nInput text:\n{text}\n'¶

EXPECTED_INPUTS: list[str] = ['text']¶

Text2CypherTemplate¶

class neo4j_graphrag.generation.prompts.Text2CypherTemplate(template=None, expected_inputs=None, system_instructions=None)[source]¶

Parameters:

template (Optional[str])
expected_inputs (Optional[list[str]])
system_instructions (Optional[str])

DEFAULT_TEMPLATE: str = '\nTask: Generate a Cypher statement for querying a Neo4j graph database from a user input.\n\nSchema:\n{schema}\n\nExamples (optional):\n{examples}\n\nInput:\n{query_text}\n\nDo not use any properties or relationships not included in the schema.\nDo not include triple backticks ``` or any additional text except the generated Cypher statement in your response.\n\nCypher query:\n'¶

EXPECTED_INPUTS: list[str] = ['query_text']¶

RAG¶

GraphRAG¶

class neo4j_graphrag.generation.graphrag.GraphRAG(retriever, llm, prompt_template=<neo4j_graphrag.generation.prompts.RagTemplate object>)[source]¶

Performs a GraphRAG search using a specific retriever and LLM.

Example:

import neo4j
from neo4j_graphrag.retrievers import VectorRetriever
from neo4j_graphrag.llm.openai_llm import OpenAILLM
from neo4j_graphrag.generation import GraphRAG

driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)

retriever = VectorRetriever(driver, "vector-index-name", custom_embedder)
llm = OpenAILLM()
graph_rag = GraphRAG(retriever, llm)
graph_rag.search(query_text="Find me a book about Fremen")

Parameters:

retriever (Retriever) – The retriever used to find relevant context to pass to the LLM.
llm (LLMInterface) – The LLM used to generate the answer.
prompt_template (RagTemplate) – The prompt template that will be formatted with context and user question and passed to the LLM.

Raises:

RagInitializationError – If validation of the input arguments fail.

search(query_text='', message_history=None, examples='', retriever_config=None, return_context=None, response_fallback=None)[source]¶

Warning

The default value of ‘return_context’ will change from ‘False’ to ‘True’ in a future version.

This method performs a full RAG search:

Retrieval: context retrieval
Augmentation: prompt formatting
Generation: answer generation with LLM

Parameters:

query_text (str) – The user question.
message_history (Optional[Union[List[LLMMessage], MessageHistory]]) – A collection previous messages, with each message having a specific role assigned.
examples (str) – Examples added to the LLM prompt.
retriever_config (Optional[dict]) – Parameters passed to the retriever. search method; e.g.: top_k
return_context (bool) – Whether to append the retriever result to the final result (default: False).
response_fallback (Optional[str]) – If not null, will return this message instead of calling the LLM if context comes back empty.

Returns:

The LLM-generated answer.

Return type:

RagResultModel

conversation_prompt(summary, current_query)[source]¶

Parameters:

summary (str)
current_query (str)

Return type:

Database Interaction¶

neo4j_graphrag.indexes.create_vector_index(driver, name, label, embedding_property, dimensions, similarity_fn, fail_if_exists=False, neo4j_database=None)[source]¶

This method constructs a Cypher query and executes it to create a new vector index in Neo4j.

See Cypher manual on creating vector indexes.

Ensure that the index name provided is unique within the database context.

Example:

from neo4j import GraphDatabase
from neo4j_graphrag.indexes import create_vector_index

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

INDEX_NAME = "vector-index-name"

# Connect to Neo4j database
driver = GraphDatabase.driver(URI, auth=AUTH)

# Creating the index
create_vector_index(
    driver,
    INDEX_NAME,
    label="Document",
    embedding_property="vectorProperty",
    dimensions=1536,
    similarity_fn="euclidean",
    fail_if_exists=False,
)

Parameters:

driver (neo4j.Driver) – Neo4j Python driver instance.
name (str) – The unique name of the index.
label (str) – The node label to be indexed.
embedding_property (str) – The property key of a node which contains embedding values.
dimensions (int) – Vector embedding dimension
similarity_fn (str) – case-insensitive values for the vector similarity function: euclidean or cosine.
fail_if_exists (bool) – If True raise an error if the index already exists. Defaults to False.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the server’s default database (“neo4j” by default) (see reference to documentation).

Raises:

ValueError – If validation of the input arguments fail.
neo4j.exceptions.ClientError – If creation of vector index fails.

Return type:

None

neo4j_graphrag.indexes.create_fulltext_index(driver, name, label, node_properties, fail_if_exists=False, neo4j_database=None)[source]¶

This method constructs a Cypher query and executes it to create a new fulltext index in Neo4j.

See Cypher manual on creating fulltext indexes.

Ensure that the index name provided is unique within the database context.

Example:

from neo4j import GraphDatabase
from neo4j_graphrag.indexes import create_fulltext_index

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

INDEX_NAME = "fulltext-index-name"

# Connect to Neo4j database
driver = GraphDatabase.driver(URI, auth=AUTH)

# Creating the index
create_fulltext_index(
    driver,
    INDEX_NAME,
    label="Document",
    node_properties=["vectorProperty"],
    fail_if_exists=False,
)

Parameters:

driver (neo4j.Driver) – Neo4j Python driver instance.
name (str) – The unique name of the index.
label (str) – The node label to be indexed.
node_properties (list[str]) – The node properties to create the fulltext index on.
fail_if_exists (bool) – If True raise an error if the index already exists. Defaults to False.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the server’s default database (“neo4j” by default) (see reference to documentation).

Raises:

ValueError – If validation of the input arguments fail.
neo4j.exceptions.ClientError – If creation of fulltext index fails.

Return type:

None

neo4j_graphrag.indexes.drop_index_if_exists(driver, name, neo4j_database=None)[source]¶

This method constructs a Cypher query and executes it to drop an index in Neo4j, if the index exists. See Cypher manual on dropping vector indexes.

Example:

from neo4j import GraphDatabase
from neo4j_graphrag.indexes import drop_index_if_exists

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

INDEX_NAME = "fulltext-index-name"

# Connect to Neo4j database
driver = GraphDatabase.driver(URI, auth=AUTH)

# Dropping the index if it exists
drop_index_if_exists(
    driver,
    INDEX_NAME,
)

Parameters:

driver (neo4j.Driver) – Neo4j Python driver instance.
name (str) – The name of the index to delete.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the server’s default database (“neo4j” by default) (see reference to documentation).

Raises:

neo4j.exceptions.ClientError – If dropping of index fails.

Return type:

None

neo4j_graphrag.indexes.upsert_vectors(driver, ids, embedding_property, embeddings, neo4j_database=None, entity_type=EntityType.NODE)[source]¶

This method constructs a Cypher query and executes it to upsert (insert or update) embeddings on a set of nodes or relationships.

Example:

from neo4j import GraphDatabase
from neo4j_graphrag.indexes import upsert_vectors

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

# Connect to Neo4j database
driver = GraphDatabase.driver(URI, auth=AUTH)

# Upsert embeddings data for several nodes
upsert_vectors(
    driver,
    ids=['123', '456', '789'],
    embedding_property="vectorProperty",
    embeddings=[
        [0.12, 0.34, 0.56],
        [0.78, 0.90, 0.12],
        [0.34, 0.56, 0.78],
    ],
    neo4j_database="neo4j",
    entity_type='NODE',
)

Parameters:

driver (neo4j.Driver) – Neo4j Python driver instance.
ids (List[int]) – The element IDs of the nodes or relationships.
embedding_property (str) – The name of the property to store the vectors in.
embeddings (List[List[float]]) – The list of vectors to store, one per ID.
neo4j_database (Optional[str]) – The name of the Neo4j database. If not provided, defaults to the server’s default database. ‘neo4j’ by default.
entity_type (EntityType) – Specifies whether to upsert to nodes (‘NODE’) or relationships (‘RELATIONSHIP’). Defaults to ‘NODE’.

Raises:

ValueError – If the lengths of IDs and embeddings do not match, or if embeddings are not of uniform dimension.
Neo4jInsertionError – If an error occurs while attempting to upsert the vectors in Neo4j.

Return type:

None

neo4j_graphrag.indexes.upsert_vector(driver, node_id, embedding_property, vector, neo4j_database=None)[source]¶

Warning

‘upsert_vector’ is deprecated and will be removed in a future version, please use ‘upsert_vectors’ instead.

This method constructs a Cypher query and executes it to upsert (insert or update) a vector property on a specific node.

Example:

from neo4j import GraphDatabase
from neo4j_graphrag.indexes import upsert_vector

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

# Connect to Neo4j database
driver = GraphDatabase.driver(URI, auth=AUTH)

# Upsert the vector data
upsert_vector(
    driver,
    node_id="nodeId",
    embedding_property="vectorProperty",
    vector=...,
)

Parameters:

driver (neo4j.Driver) – Neo4j Python driver instance.
node_id (int) – The element id of the node.
embedding_property (str) – The name of the property to store the vector in.
vector (list[float]) – The vector to store.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the server’s default database (“neo4j” by default) (see reference to documentation).

Raises:

Neo4jInsertionError – If upserting of the vector fails.

Return type:

None

neo4j_graphrag.indexes.upsert_vector_on_relationship(driver, rel_id, embedding_property, vector, neo4j_database=None)[source]¶

Warning

‘upsert_vector_on_relationship’ is deprecated and will be removed in a future version, please use ‘upsert_vectors’ instead.

This method constructs a Cypher query and executes it to upsert (insert or update) a vector property on a specific relationship.

Example:

from neo4j import GraphDatabase
from neo4j_graphrag.indexes import upsert_vector_on_relationship

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

# Connect to Neo4j database
driver = GraphDatabase.driver(URI, auth=AUTH)

# Upsert the vector data
upsert_vector_on_relationship(
    driver,
    node_id="nodeId",
    embedding_property="vectorProperty",
    vector=...,
)

Parameters:

driver (neo4j.Driver) – Neo4j Python driver instance.
rel_id (int) – The element id of the relationship.
embedding_property (str) – The name of the property to store the vector in.
vector (list[float]) – The vector to store.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the server’s default database (“neo4j” by default) (see reference to documentation).

Raises:

Neo4jInsertionError – If upserting of the vector fails.

Return type:

None

async neo4j_graphrag.indexes.async_upsert_vector(driver, node_id, embedding_property, vector, neo4j_database=None)[source]¶

Warning

‘async_upsert_vector’ is deprecated and will be removed in a future version.

This method constructs a Cypher query and asynchronously executes it to upsert (insert or update) a vector property on a specific node.

Example:

from neo4j import AsyncGraphDatabase
from neo4j_graphrag.indexes import upsert_vector

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

# Connect to Neo4j database
driver = AsyncGraphDatabase.driver(URI, auth=AUTH)

# Upsert the vector data
async_upsert_vector(
    driver,
    node_id="nodeId",
    embedding_property="vectorProperty",
    vector=...,
)

Parameters:

driver (neo4j.AsyncDriver) – Neo4j Python asynchronous driver instance.
node_id (int) – The element id of the node.
embedding_property (str) – The name of the property to store the vector in.
vector (list[float]) – The vector to store.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the server’s default database (“neo4j” by default) (see reference to documentation).

Raises:

Neo4jInsertionError – If upserting of the vector fails.

Return type:

None

async neo4j_graphrag.indexes.async_upsert_vector_on_relationship(driver, rel_id, embedding_property, vector, neo4j_database=None)[source]¶

Warning

‘async_upsert_vector_on_relationship’ is deprecated and will be removed in a future version.

This method constructs a Cypher query and asynchronously executes it to upsert (insert or update) a vector property on a specific relationship.

Example:

from neo4j import AsyncGraphDatabase
from neo4j_graphrag.indexes import upsert_vector_on_relationship

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

# Connect to Neo4j database
driver = AsyncGraphDatabase.driver(URI, auth=AUTH)

# Upsert the vector data
async_upsert_vector_on_relationship(
    driver,
    node_id="nodeId",
    embedding_property="vectorProperty",
    vector=...,
)

Parameters:

driver (neo4j.AsyncDriver) – Neo4j Python asynchronous driver instance.
rel_id (int) – The element id of the relationship.
embedding_property (str) – The name of the property to store the vector in.
vector (list[float]) – The vector to store.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the server’s default database (“neo4j” by default) (see reference to documentation).

Raises:

Neo4jInsertionError – If upserting of the vector fails.

Return type:

None

neo4j_graphrag.indexes.retrieve_vector_index_info(driver, index_name, label_or_type, embedding_property, neo4j_database=None)[source]¶

Check if a vector index exists in a Neo4j database and return its information. If no matching index is found, returns None.

Parameters:

driver (neo4j.Driver) – Neo4j Python driver instance.
index_name (str) – The name of the index to look up.
label_or_type (str) – The label (for nodes) or type (for relationships) of the index.
embedding_property (str) – The name of the property containing the embeddings.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the server’s default database (“neo4j” by default) (see reference to documentation).

Returns:

A dictionary containing the first matching index’s information if found, or None otherwise.

Return type:

Optional[Dict[str, Any]]

neo4j_graphrag.indexes.retrieve_fulltext_index_info(driver, index_name, label_or_type, text_properties=[], neo4j_database=None)[source]¶

Check if a full text index exists in a Neo4j database and return its information. If no matching index is found, returns None.

Parameters:

driver (neo4j.Driver) – Neo4j Python driver instance.
index_name (str) – The name of the index to look up.
label_or_type (str) – The label (for nodes) or type (for relationships) of the index.
text_properties (List[str]) – The names of the text properties indexed.
neo4j_database (Optional[str]) –
The name of the Neo4j database. If not provided, this defaults to the server’s default database (“neo4j” by default) (see reference to documentation).

Returns:

A dictionary containing the first matching index’s information if found, or None otherwise.

Return type:

Optional[Dict[str, Any]]

neo4j_graphrag.schema.get_structured_schema(driver, is_enhanced=False, database=None, timeout=None, sanitize=False)[source]¶

Returns the structured schema of the graph.

Returns a dict with following format:

{
    'node_props': {
        'Person': [{'property': 'id', 'type': 'INTEGER'}, {'property': 'name', 'type': 'STRING'}]
    },
    'rel_props': {
        'KNOWS': [{'property': 'fromDate', 'type': 'DATE'}]
    },
    'relationships': [
        {'start': 'Person', 'type': 'KNOWS', 'end': 'Person'}
    ],
    'metadata': {
        'constraint': [
            {'id': 7, 'name': 'person_id', 'type': 'UNIQUENESS', 'entityType': 'NODE', 'labelsOrTypes': ['Person'], 'properties': ['id'], 'ownedIndex': 'person_id', 'propertyType': None},
        ],
        'index': [
            {'label': 'Person', 'properties': ['name'], 'size': 2, 'type': 'RANGE', 'valuesSelectivity': 1.0, 'distinctValues': 2.0},
        ]
    }
}

Note

The internal structure of the returned dict depends on the apoc.meta.data and apoc.schema.nodes procedures.

Warning

Some labels are excluded from the output schema:

The __Entity__ and __KGBuilder__ node labels which are created by the KG Builder pipeline within this package
Some labels related to Bloom internals.

Parameters:

driver (neo4j.Driver) – Neo4j Python driver instance.
is_enhanced (bool) – Flag indicating whether to format the schema with detailed statistics (True) or in a simpler overview format (False).
database (Optional[str]) – The name of the database to connect to. Default is ‘neo4j’.
timeout (Optional[float]) – The timeout for transactions in seconds. Useful for terminating long-running queries. By default, there is no timeout set.
sanitize (bool) – A flag to indicate whether to remove lists with more than 128 elements from results. Useful for removing embedding-like properties from database responses. Default is False.

Returns:

the graph schema information in a structured format.

Return type:

dict[str, Any]

neo4j_graphrag.schema.get_schema(driver, is_enhanced=False, database=None, timeout=None, sanitize=False)[source]¶

Returns the schema of the graph as a string with following format:

Node properties:
Person {id: INTEGER, name: STRING}
Relationship properties:
KNOWS {fromDate: DATE}
The relationships:
(:Person)-[:KNOWS]->(:Person)

Parameters:

driver (neo4j.Driver) – Neo4j Python driver instance.
is_enhanced (bool) – Flag indicating whether to format the schema with detailed statistics (True) or in a simpler overview format (False).
database (Optional[str]) – The name of the database to connect to. Default is ‘neo4j’.
timeout (Optional[float]) – The timeout for transactions in seconds. Useful for terminating long-running queries. By default, there is no timeout set.
sanitize (bool) – A flag to indicate whether to remove lists with more than 128 elements from results. Useful for removing embedding-like properties from database responses. Default is False.

Returns:

the graph schema information in a serialized format.

Return type:

neo4j_graphrag.schema.format_schema(schema, is_enhanced)[source]¶

Format the structured schema into a human-readable string.

Depending on the is_enhanced flag, this function either creates a concise listing of node labels and relationship types alongside their properties or generates an enhanced, more verbose representation with additional details like example or available values and min/max statistics. It also includes a formatted list of existing relationships.

Parameters:

schema (Dict[str, Any]) – The structured schema dictionary, containing properties for nodes and relationships as well as relationship definitions.
is_enhanced (bool) – Flag indicating whether to format the schema with detailed statistics (True) or in a simpler overview format (False).

Returns:

A formatted string representation of the graph schema, including node properties, relationship properties, and relationship patterns.

Return type: