LangChain Neo4j Integration

1AH05dvGA 7db EMySc9AAw

LangChain is a vast library for GenAI orchestration, it supports numerous LLMs, vector stores, document loaders and agents. It manages templates, composes components into chains and supports monitoring and observability.

The broad and deep Neo4j integration allows for vector search, cypher generation and database querying and knowledge graph construction.

Here is an overview of the Graph Integrations.

When upgrading to LangChain 0.1.0+ make sure to read this article: Updating GraphAcademy Neo4j & LLM Courses to Langchain v0.1.
llm chatbot python

Installation

pip install langchain langchain-community langchain-neo4j
# pip install langchain-openai tiktoken
# pip install neo4j

Functionality Includes

Neo4jVector

The Neo4j Vector integration supports a number of operations

  • create vector from langchain documents

  • query vector

  • query vector with additional graph retrieval Cypher query

  • construct vector instance from existing graph data

  • hybrid search

  • metadata filtering

from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_neo4j import Neo4jVector
from langchain_openai import OpenAIEmbeddings

loader = TextLoader("../../modules/state_of_the_union.txt")

documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()

# The Neo4jVector Module will connect to Neo4j and create a vector index if needed.

db = Neo4jVector.from_documents(
    docs, embeddings, url=url, username=username, password=password
)

query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = db.similarity_search_with_score(query, k=2)

Hybrid search combines vector search with fulltext search with re-ranking and de-duplication of the results.

from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_neo4j import Neo4jVector
from langchain_openai import OpenAIEmbeddings

loader = TextLoader("../../modules/state_of_the_union.txt")

documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()

# The Neo4jVector Module will connect to Neo4j and create a vector index if needed.

db = Neo4jVector.from_documents(
    docs, embeddings, url=url, username=username, password=password,
    search_type="hybrid"
)

query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = db.similarity_search_with_score(query, k=2)

Metadata filtering

Metadata filtering enhances vector search by allowing searches to be refined based on specific node properties. This integrated approach ensures more precise and relevant search results by leveraging both the vector similarities and the contextual attributes of the nodes.

db = Neo4jVector.from_documents(
    docs,
    OpenAIEmbeddings(),
    url=url, username=username, password=password
)

query = "What did the president say about Ketanji Brown Jackson"
filter = {"name": {"$eq": "adam"}}

docs = db.similarity_search(query, filter=filter)

Neo4j Graph

The Neo4j Graph integration is a wrapper for the Neo4j Python driver. It allows querying and updating the Neo4j database in a simplified manner from LangChain. Many integrations allow you to use the Neo4j Graph as a source of data for LangChain.

from langchain_neo4j import Neo4jGraph

graph = Neo4jGraph(url=NEO4J_URI, username=NEO4J_USERNAME, password=NEO4J_PASSWORD)

QUERY = """
MATCH (m:Movie)-[:IN_GENRE]->(:Genre {name: $genre})
RETURN m.title, m.plot
ORDER BY m.imdbRating DESC LIMIT 5
"""

graph.query(QUERY, params={"genre": "action"})

CypherQAChain

The CypherQAChain is a LangChain component that allows you to interact with a Neo4j graph database in natural language. Using an LLM and the graph schema it translates the user question into a Cypher query, executes it against the graph and uses the returned context information and the original question with a second LLM to generate a natural language response.

# pip install --upgrade --quiet  langchain
# pip install --upgrade --quiet  langchain-openai

from langchain_neo4j import Neo4jGraph, GraphCypherQAChain
from langchain_openai import ChatOpenAI

graph = Neo4jGraph(url=NEO4J_URI, username=NEO4J_USERNAME, password=NEO4J_PASSWORD)

# Insert some movie data
graph.query(
"""
MERGE (m:Movie {title:'Top Gun'})
WITH m
UNWIND ['Tom Cruise', 'Val Kilmer', 'Anthony Edwards', 'Meg Ryan'] AS actor
MERGE (a:Actor {name:actor})
MERGE (a)-[:ACTED_IN]->(m)
"""
)

chain = GraphCypherQAChain.from_llm(
    ChatOpenAI(temperature=0), graph=graph, verbose=True,
    allow_dangerous_requests=True
)

chain.run("Who acted in Top Gun?")

Advanced RAG Strategies

Besides the basic RAG strategy, the Neo4j Integration in LangChain supports advanced RAG strategies that allow for more complex retrieval strategies. These are also available as LangChain Templates.

  • regular rag - direct vector search

  • parent - child retriever that links embedded chunks representing specific concepts to parent documents

  • hypothetical questions - generate questions from the document chunks and vector index those to have better matching candidates for user questions

  • summary - index summaries of the documents not the whole document

  • Implementing Advanced Retrieval RAG Strategies with Neo4j

pip install -U "langchain-cli[serve]"

langchain app new my-app --package neo4j-advanced-rag

# update server.py to add the neo4j-advanced-rag template as an endpoint
cat <<EOF > server.py
from fastapi import FastAPI
from langserve import add_routes

from neo4j_advanced_rag import chain as neo4j_advanced_chain

app = FastAPI()

# Add this
add_routes(app, neo4j_advanced_chain, path="/neo4j-advanced-rag")


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)
EOF

langchain serve
jfDNiPa5ccefX6h0HiVzJbqnlgAZgfPda90truHSfbwSs3JkfxZ xbA9mZE8y2fNf 3n5cgVhbdhN0ryuMoK2JNbMgTe1OLJMA6CQRhWBxzdKRLVurUFDndT7ki4vMh cdv3SAn040HTpab9XkzGj5Q

Semantic Layer

A semantic layer on top of a (graph) database doesn’t rely on automatic query generation but offers a number of APIs and tools to give the LLM access to the database and it’s structures.

Unlike automatically generated queries, these tools are safe to use as they are implemented using correct queries and interactions and only take parameters from the LLM.

Many cloud (llm) providers offer similar integrations either via function calling (OpenAI, Anthropic) or extensions (Google Vertex AI, AWS Bedrock).

Examples for such tools or functions include:

Conversational Memory

Storing the conversation, i.e. the flow of questions and answers of user sessions in a graph allows you to analyze the conversation history and use it to improve the user experience.

You can index embeddings for and link questions and answers back to the retrieved chunks and entities in the graph and use user feedback to re-rank those inputs for future similar questions.

Knowledge Graph Construction

Creating a Knowledge Graph from unstructured data like PDF documents used to be a complex and time-consuming task that required training and using dedicated, large NLP models.

The Graph Transformers are tools that allows you to extract structured data from unstructured documents and transform it into a Knowledge Graph.

You can see a practical application, code and demo for extracting knowledge graphs from PDFs, YouTube transcripts, wikipedia articles and more with the LLM Graph Builder.

Starter Kit

This starter-kit demonstrates how to run a FastAPI server using LangChain to answer queries on data stored in a Neo4j instance. The single endpoint can be used to retrieve answers using either a Vector index chain, GraphCypherQA Chain, or a composite answer of both. The langserve branch contains an example of the same service, using LangServe

See this Developer Blog Article for additional details and instructions on working with the Starter Kit.

Authors

Tomaz Bratanic

Community Support

Neo4j Online Community

LangChain Repository

GitHub

LangChain Issues

Issues

LangChain Neo4j Repository

GitHub

LangChain Neo4j API Docs

API Docs

LangChain Neo4j Issues

Issues

Documentation

Neo4j Integrations

Documentation

Graph Overview Docs

Starter Kit

LangChain Starter Kit

Juypter

Jupyter Notebooks

Videos & Tutorials