LangChain Neo4j Integration

LangChain is a vast library for GenAI orchestration, it supports numerous LLMs, vector stores, document loaders and agents. It manages templates, composes components into chains and supports monitoring and observability.

The broad and deep Neo4j integration allows for vector search, cypher generation and database querying and knowledge graph construction.

Here is an overview of the Graph Integrations.

Integrating Neo4j into the LangChain Ecosystem

When upgrading to LangChain 0.1.0+ make sure to read this article: Updating GraphAcademy Neo4j & LLM Courses to Langchain v0.1.

Installation

pip install langchain langchain-community langchain-neo4j
# pip install langchain-openai tiktoken
# pip install neo4j

Functionality Includes

Neo4jVector

The Neo4j Vector integration supports a number of operations

create vector from langchain documents
query vector
query vector with additional graph retrieval Cypher query
construct vector instance from existing graph data
hybrid search
metadata filtering

from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_neo4j import Neo4jVector
from langchain_openai import OpenAIEmbeddings

loader = TextLoader("../../modules/state_of_the_union.txt")

documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()

# The Neo4jVector Module will connect to Neo4j and create a vector index if needed.

db = Neo4jVector.from_documents(
    docs, embeddings, url=url, username=username, password=password
)

query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = db.similarity_search_with_score(query, k=2)

Hybrid Search

Hybrid search combines vector search with fulltext search with re-ranking and de-duplication of the results.

from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_neo4j import Neo4jVector
from langchain_openai import OpenAIEmbeddings

loader = TextLoader("../../modules/state_of_the_union.txt")

documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()

# The Neo4jVector Module will connect to Neo4j and create a vector index if needed.

db = Neo4jVector.from_documents(
    docs, embeddings, url=url, username=username, password=password,
    search_type="hybrid"
)

query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = db.similarity_search_with_score(query, k=2)

Metadata filtering

Metadata filtering enhances vector search by allowing searches to be refined based on specific node properties. This integrated approach ensures more precise and relevant search results by leveraging both the vector similarities and the contextual attributes of the nodes.

db = Neo4jVector.from_documents(
    docs,
    OpenAIEmbeddings(),
    url=url, username=username, password=password
)

query = "What did the president say about Ketanji Brown Jackson"
filter = {"name": {"$eq": "adam"}}

docs = db.similarity_search(query, filter=filter)

Neo4j Graph

The Neo4j Graph integration is a wrapper for the Neo4j Python driver. It allows querying and updating the Neo4j database in a simplified manner from LangChain. Many integrations allow you to use the Neo4j Graph as a source of data for LangChain.

from langchain_neo4j import Neo4jGraph

graph = Neo4jGraph(url=NEO4J_URI, username=NEO4J_USERNAME, password=NEO4J_PASSWORD)

QUERY = """
MATCH (m:Movie)-[:IN_GENRE]->(:Genre {name: $genre})
RETURN m.title, m.plot
ORDER BY m.imdbRating DESC LIMIT 5
"""

graph.query(QUERY, params={"genre": "action"})

CypherQAChain

The CypherQAChain is a LangChain component that allows you to interact with a Neo4j graph database in natural language. Using an LLM and the graph schema it translates the user question into a Cypher query, executes it against the graph and uses the returned context information and the original question with a second LLM to generate a natural language response.

# pip install --upgrade --quiet  langchain
# pip install --upgrade --quiet  langchain-openai

from langchain_neo4j import Neo4jGraph, GraphCypherQAChain
from langchain_openai import ChatOpenAI

graph = Neo4jGraph(url=NEO4J_URI, username=NEO4J_USERNAME, password=NEO4J_PASSWORD)

# Insert some movie data
graph.query(
"""
MERGE (m:Movie {title:'Top Gun'})
WITH m
UNWIND ['Tom Cruise', 'Val Kilmer', 'Anthony Edwards', 'Meg Ryan'] AS actor
MERGE (a:Actor {name:actor})
MERGE (a)-[:ACTED_IN]->(m)
"""
)

chain = GraphCypherQAChain.from_llm(
    ChatOpenAI(temperature=0), graph=graph, verbose=True,
    allow_dangerous_requests=True
)

chain.run("Who acted in Top Gun?")

Advanced RAG Strategies

Besides the basic RAG strategy, the Neo4j Integration in LangChain supports advanced RAG strategies that allow for more complex retrieval strategies. These are also available as LangChain Templates.

regular rag - direct vector search
parent - child retriever that links embedded chunks representing specific concepts to parent documents
hypothetical questions - generate questions from the document chunks and vector index those to have better matching candidates for user questions
summary - index summaries of the documents not the whole document
Implementing Advanced Retrieval RAG Strategies with Neo4j

pip install -U "langchain-cli[serve]"

langchain app new my-app --package neo4j-advanced-rag

# update server.py to add the neo4j-advanced-rag template as an endpoint
cat <<EOF > server.py
from fastapi import FastAPI
from langserve import add_routes

from neo4j_advanced_rag import chain as neo4j_advanced_chain

app = FastAPI()

# Add this
add_routes(app, neo4j_advanced_chain, path="/neo4j-advanced-rag")


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)
EOF

langchain serve

jfDNiPa5ccefX6h0HiVzJbqnlgAZgfPda90truHSfbwSs3JkfxZ xbA9mZE8y2fNf 3n5cgVhbdhN0ryuMoK2JNbMgTe1OLJMA6CQRhWBxzdKRLVurUFDndT7ki4vMh cdv3SAn040HTpab9XkzGj5Q

Semantic Layer

A semantic layer on top of a (graph) database doesn’t rely on automatic query generation but offers a number of APIs and tools to give the LLM access to the database and it’s structures.

Unlike automatically generated queries, these tools are safe to use as they are implemented using correct queries and interactions and only take parameters from the LLM.

Many cloud (llm) providers offer similar integrations either via function calling (OpenAI, Anthropic) or extensions (Google Vertex AI, AWS Bedrock).

Examples for such tools or functions include:

retrieve entities with certain names
retrieve the neighbors of a node
retrieve a shortest path between two nodes
Enhancing Interaction Between Language Models and Graph Databases via a Semantic Layer

Conversational Memory

Storing the conversation, i.e. the flow of questions and answers of user sessions in a graph allows you to analyze the conversation history and use it to improve the user experience.

You can index embeddings for and link questions and answers back to the retrieved chunks and entities in the graph and use user feedback to re-rank those inputs for future similar questions.

Knowledge Graph Construction

Creating a Knowledge Graph from unstructured data like PDF documents used to be a complex and time-consuming task that required training and using dedicated, large NLP models.

The Graph Transformers are tools that allows you to extract structured data from unstructured documents and transform it into a Knowledge Graph.

You can see a practical application, code and demo for extracting knowledge graphs from PDFs, YouTube transcripts, wikipedia articles and more with the LLM Graph Builder.

image

Documentation

Starter Kit

This starter-kit demonstrates how to run a FastAPI server using LangChain to answer queries on data stored in a Neo4j instance. The single endpoint can be used to retrieve answers using either a Vector index chain, GraphCypherQA Chain, or a composite answer of both. The langserve branch contains an example of the same service, using LangServe

See this Developer Blog Article for additional details and instructions on working with the Starter Kit.