LangChain Neo4j Starter Kit



Have you heard of Generative AI (GenAI) and Graph Retrieval-Augmented Generation (GraphRAG), but aren’t sure where to begin?

Say hello to the Neo4j LangChain Starter Kit for Python developers. This code creates a server with a single REST endpoint that generates GenAI answers backed by data stored in a Neo4j Graph Database. This server can be run locally or hosted, and is usable by any service that can make a POST call.

The kit showcases how to combine the following technologies:

  • Neo4j
  • FastAPI
  • OpenAI
  • LangChain

Neo4j

Neo4j is a graph database that lets you store, query, and analyze complex interconnected data. It can be used as a general purpose database but excels at powering applications like social networks, supply-chain management, fraud detection, and recommendation services.

To run the kit, you’ll need credentials to an actively running and populated Neo4j database.

If you’ve never used Neo4j before, credentials are included in the repo with read-only access to a hosted database. This database contains recent public EDGAR SEC filing information.

This is the data model of the dataset:

Note the Node Labels and available Relationship types. The prompts and chains used in the kit explicitly instruct an LLM to use the data stored in a specified Neo4j database and nothing else.

When the kit is pointed at the SEC data above, it will only answer questions about publicly listed companies, who submitted those filings, and any details contained in those filings.

The kit code is generic though, so referencing a Neo4j database with another collection of data will allow you to ask an entirely different domain of questions.

More than 20 graph datasets are available for free with Neo4j Sandbox instances. Credentials from one of these databases can also be used with the starter kit.

Subset of available Sandbox datasets

FastAPI

FastAPI is a popular Python framework for creating APIs. It requires less boilerplate code than many alternative frameworks and auto-generates interactive documentation for endpoints, which allows APIs to be tested directly from a browser.

The starter kit code builds and runs a FastAPI server.
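To give a feel for how little code a single-endpoint FastAPI server needs, here is a minimal sketch. It is not the kit's actual server code: the `/api/chat` route path, the `message` and `response` field names, and the `answer_question` and `create_app` helpers are all assumptions made for this illustration.

```python
from typing import Callable, Dict


def answer_question(message: str, chain: Callable[[str], str]) -> Dict[str, str]:
    """Run the question through a chain-like callable and wrap the answer."""
    return {"response": chain(message)}  # "response" is a hypothetical field name


def create_app(chain: Callable[[str], str]):
    """Build a FastAPI app exposing one POST endpoint backed by `chain`."""
    # Imported here so answer_question() stays usable without FastAPI installed.
    from fastapi import FastAPI
    from pydantic import BaseModel

    class ChatRequest(BaseModel):
        message: str  # hypothetical request field

    app = FastAPI()

    @app.post("/api/chat")  # assumed route path
    def chat(request: ChatRequest) -> Dict[str, str]:
        return answer_question(request.message, chain)

    return app
```

Keeping the handler logic in a plain function means it can be exercised without starting a server, while FastAPI still generates the interactive docs page for the route.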

OpenAI

OpenAI has several Large Language Models (LLMs) developers can make use of. These models are essentially complex functions created from a Machine Learning (ML) process. They take natural language questions as input and output human-like answers. OpenAI hosts several models and provides SDKs and a cloud API for using these models without needing to download them.

To run the starter kit, you need an active OpenAI API key.

LangChain

LangChain is an LLM orchestration framework that makes LLM applications more modular. To swap out OpenAI for Anthropic, for example, the following code:

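# Import path may vary by LangChain version
from langchain.chains import GraphCypherQAChain
from langchain_openai import ChatOpenAI

# Replace the placeholder with your own key
LLM = ChatOpenAI(temperature=0, openai_api_key="<OPENAI_API_KEY>")

# `graph` is assumed to be an existing Neo4jGraph connection
graph_chain = GraphCypherQAChain.from_llm(
    cypher_llm=LLM,
    qa_llm=LLM,
    graph=graph,
)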

could be replaced with:

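# Import path may vary by LangChain version
from langchain.chains import GraphCypherQAChain
from langchain_anthropic import ChatAnthropic

# Replace the placeholder with your own key
LLM = ChatAnthropic(
    temperature=0,
    api_key="<ANTHROPIC_API_KEY>",
    model_name="claude-3-opus-20240229",
)

# `graph` is assumed to be an existing Neo4jGraph connection
graph_chain = GraphCypherQAChain.from_llm(
    cypher_llm=LLM,
    qa_llm=LLM,
    graph=graph,
)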

LangChain has a construct known as chains. These are a series of calls or processes that can be strung together to execute sequences of GenAI-related tasks. These chains can be wrapped in interfaces called tools, which in turn can be used by agents for creating complex and semi-autonomous systems.

The starter kit is designed to be as simple as possible, so tools and agents are not used. Three chains in three separate files are included: graph_chain.py, vector_chain.py, and simple_agent.py.

The graph_chain file shows how to set up and run a GraphCypherQAChain. This chain converts a natural language query into Cypher, queries a specified Neo4j database, then returns an answer.
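A rough sketch of what setting up and calling such a chain can look like is shown below. It is an illustration, not the contents of graph_chain.py: the `build_graph_chain` and `ask_graph` helper names are hypothetical, the fallback database name "neo4j" is an assumption, and the import paths reflect LangChain releases from around the time of writing (newer releases may ship these classes in a separate Neo4j integration package and may require an explicit opt-in flag such as `allow_dangerous_requests=True`).

```python
import os


def build_graph_chain():
    """Create a GraphCypherQAChain from environment variables (sketch only)."""
    # Imported lazily: these third-party packages are only needed when a real
    # chain is built, and their import paths may differ between versions.
    from langchain.chains import GraphCypherQAChain
    from langchain_community.graphs import Neo4jGraph
    from langchain_openai import ChatOpenAI

    graph = Neo4jGraph(
        url=os.environ["NEO4J_URI"],
        username=os.environ["NEO4J_USERNAME"],
        password=os.environ["NEO4J_PASSWORD"],
        database=os.environ.get("NEO4J_DATABASE", "neo4j"),  # assumed default
    )
    llm = ChatOpenAI(temperature=0)  # picks up OPENAI_API_KEY from the environment
    return GraphCypherQAChain.from_llm(cypher_llm=llm, qa_llm=llm, graph=graph)


def ask_graph(chain, question: str) -> str:
    """Send a natural language question to the chain and return its answer text."""
    # GraphCypherQAChain takes its input under "query" and answers under "result".
    return chain.invoke({"query": question})["result"]
```

With valid credentials in place, `ask_graph(build_graph_chain(), "How many forms are there?")` would generate Cypher, run it against the database, and return the LLM's phrasing of the result.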

An example of using vector similarity searches with Neo4jVector can be found in the vector_chain file. The code in both the graph_chain and vector_chain files is meant to be independently portable, so either can be copied and pasted into other projects without any other files in the kit.
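For readers new to vector similarity search, the idea can be sketched in a few lines of plain Python. This is a conceptual illustration only: Neo4jVector delegates the search to a vector index inside the database rather than scanning vectors in Python, and the `cosine_similarity` and `top_k` helpers, along with the tiny two-dimensional vectors, are made up for this example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    documents: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k document ids whose embeddings are most similar to the query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy embeddings; real ones come from an embedding model and have many dimensions.
embeddings = {"filing_a": [1.0, 0.0], "filing_b": [0.0, 1.0], "filing_c": [1.0, 1.0]}
closest = top_k([1.0, 0.0], embeddings, k=2)
```

In the vector chain, the question is embedded the same way as the stored text, and the closest matches are handed to the LLM as context for the answer.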

The simple_agent file contains an example of using a prompt plus the answers from both chains to generate a composite answer.
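The pattern of feeding both answers into one final prompt can be sketched as follows. The prompt wording and the `build_composite_prompt` and `composite_answer` helpers are assumptions for illustration and are not taken from simple_agent.py; the two chains and the LLM are represented as plain callables so the flow is easy to follow.

```python
from typing import Callable

# Hypothetical prompt template; the kit's actual wording may differ.
COMPOSITE_TEMPLATE = (
    "Answer the question using only the two partial answers below.\n"
    "Question: {question}\n"
    "Graph answer: {graph_answer}\n"
    "Vector answer: {vector_answer}\n"
)


def build_composite_prompt(question: str, graph_answer: str, vector_answer: str) -> str:
    """Fill the template with the question and the answers from both chains."""
    return COMPOSITE_TEMPLATE.format(
        question=question,
        graph_answer=graph_answer,
        vector_answer=vector_answer,
    )


def composite_answer(
    question: str,
    graph_chain: Callable[[str], str],
    vector_chain: Callable[[str], str],
    llm: Callable[[str], str],
) -> str:
    """Ask both chains, then let the LLM merge their answers into one reply."""
    prompt = build_composite_prompt(question, graph_chain(question), vector_chain(question))
    return llm(prompt)
```

Because each stage is just a callable, either chain can be swapped for a stub when testing the prompt on its own.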

Running the Starter Kit

Running the kit requires cloning the repository and having Poetry installed on the host system. Poetry is a dependency and virtual environment management tool, similar in purpose to pipenv or to pip combined with venv.

Once the above requirements are met, run poetry install to load all the dependencies listed in the included pyproject.toml file.

The Neo4j database and OpenAI credentials can be stored in a .env file, which needs to be created in the root folder of the cloned kit code. For security reasons and as good practice, .env files aren’t (usually) included in Git repos.

#Sample .env file contents

NEO4J_URI=<database_uri>
NEO4J_USERNAME=<database_username>
NEO4J_PASSWORD=<database_password>
NEO4J_DATABASE=<database_name>
OPENAI_API_KEY=<openai_key>

Alternatively, environment variables can be passed directly when running with Poetry:

NEO4J_URI=<database_uri> \
NEO4J_USERNAME=<database_username> \
NEO4J_PASSWORD=<database_password> \
NEO4J_DATABASE=<database_name> \
OPENAI_API_KEY=<openai_key> \
poetry run uvicorn app.server:app --reload --port=8000
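Whichever way the variables are supplied, a small startup check can catch a missing value before the first request fails. The sketch below is not part of the kit; the `missing_settings` helper is hypothetical, and treating all five variables as required is an assumption based on the sample above.

```python
import os
from typing import List, Mapping

# Variable names taken from the sample .env file above.
REQUIRED_SETTINGS = (
    "NEO4J_URI",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
    "NEO4J_DATABASE",
    "OPENAI_API_KEY",
)


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are absent or blank in `env`."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]


def describe_settings(env: Mapping[str, str]) -> str:
    """Summarize whether the environment looks complete, without printing secrets."""
    missing = missing_settings(env)
    if missing:
        return "Missing settings: " + ", ".join(missing)
    return "All required settings are present."
```

Calling `describe_settings(os.environ)` at startup reports only variable names, so secret values never end up in logs.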

Now open a browser to localhost:8000/docs and you should see:

Click on the green api/chat row to expand its details and options. This is where you can try out the endpoint from the browser.

Click the Try it out button, replace “string” with your own question, like “How many forms are there?”, then click the Execute button. After a few moments, an answer should appear in the Responses section.

Note that an Internal Error message will appear if either the Neo4j or OpenAI credentials are invalid.
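The same endpoint can also be called from code. The sketch below uses only the Python standard library; the `/api/chat` path, the `message` field name, and the `build_chat_request` and `ask` helpers are assumptions for illustration, so check the request schema shown on the docs page before relying on them.

```python
import json
import urllib.error
import urllib.request


def build_chat_request(
    question: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Prepare a JSON POST request for the chat endpoint (assumed path and field)."""
    body = json.dumps({"message": question}).encode("utf-8")  # hypothetical field name
    return urllib.request.Request(
        url=base_url + "/api/chat",  # assumed route path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = "http://localhost:8000") -> str:
    """Send the question to a running server and return the raw response body."""
    request = build_chat_request(question, base_url)
    try:
        with urllib.request.urlopen(request, timeout=60) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # Invalid Neo4j or OpenAI credentials surface here as a server error.
        return f"Server returned HTTP {error.code}"
```

With the server running, `ask("How many forms are there?")` would return the response body as text.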

What Next

The Neo4j LangChain Starter Kit is a basic entry point into the LangChain ecosystem and world of GenAI with graphs.

The LangChain components worth looking into next are LLMGraphTransformer, DiffbotGraphTransformer, and LangGraph. The graph transformers convert text into graph data that can be loaded directly into Neo4j. LangGraph is for creating multi-agent workflows that can run in a loop and conditionally use tools based on the decisions the agents make.

If you’d like to learn more about building Neo4j and LLM applications, check out GraphAcademy.


LangChain Neo4j Starter Kit was originally published in Neo4j Developer Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.