GenAI Stack Walkthrough: Behind the Scenes With Neo4j, LangChain, and Ollama in Docker


Interest in GenAI remains high, with new innovations emerging daily. To accelerate GenAI experimentation and learning, Neo4j has partnered with Docker, LangChain, and Ollama to announce the GenAI Stack – a pre-built development environment for creating GenAI applications. In this blog, you will learn how to implement a support agent that relies on information from Stack Overflow by following best practices and using trusted components.

Retrieval Augmented Generation (RAG)

Simply developing a wrapper around an LLM API doesn’t guarantee success with generated responses, because well-known challenges with accuracy and knowledge cut-off go unaddressed. In this blog, we walk you through using the GenAI Stack to explore how retrieval augmented generation (RAG) improves accuracy, relevance, and provenance compared to relying on the internal knowledge of an LLM. Follow along to experiment with two approaches to information retrieval:
    • Using a plain LLM and relying on its internal knowledge
    • Augmenting LLMs with additional information by combining vector search and context from the knowledge graph
The idea behind RAG applications is to provide LLMs with additional context at query time for answering the user’s question.
    1. When a user asks the support agent a question, the question first goes through an embedding model to calculate its vector representation.
    2. The next step is to find the most relevant nodes in the database by comparing the cosine similarity of the embedding values of the user’s question and the documents in the database.
    3. Once the relevant nodes are identified using vector search, the application is designed to retrieve additional information from the nodes themselves and also by traversing the relationships in the graph.
    4. Finally, the context information from the database is combined with the user question and additional instructions into a prompt that is passed to an LLM to generate the final answer, which is then sent to the user.
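To make these four steps concrete, here is a minimal, self-contained sketch of the flow in Python. The embed() placeholder stands in for a real embedding model, and the final LLM call is only indicated as a comment; none of this is the GenAI Stack’s actual code:

# Minimal sketch of the RAG flow described above (illustrative only).
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: returns a deterministic pseudo-random vector.
    # A real application would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(384)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

documents = {
    "doc-1": "How do I create a vector index in Neo4j?",
    "doc-2": "How do I configure Docker Compose watch mode?",
}

question = "How can I add a vector index to my Neo4j database?"
q_vec = embed(question)                                    # step 1: embed the user's question
scores = {doc_id: cosine_similarity(q_vec, embed(text))    # step 2: compare with stored documents
          for doc_id, text in documents.items()}
best_id = max(scores, key=scores.get)
context = documents[best_id]                               # step 3: retrieve context (plus graph traversal in the real app)
prompt = (f"Answer using only this context:\n{context}\n\n"  # step 4: combine context, question, and instructions
          f"Question: {question}")
# answer = llm(prompt)  # finally, the prompt is passed to an LLM to generate the answer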

Open Source Local LLMs

Open-source LLM research has advanced significantly in recent times. Models like Llama2 and Mistral are showing impressive levels of accuracy and performance, making them a viable alternative to their commercial counterparts. A significant benefit of using open source LLMs is removing the dependency on an external LLM provider while retaining complete control over the data flows and how the data is shared and stored. The maintainers of the Ollama project have recognized the opportunity of open source LLMs by providing a seamless way to set up and run local LLMs on your own infrastructure or even a laptop.

What Is the GenAI Stack?

The GenAI Stack is a set of Docker containers orchestrated by Docker Compose. It includes a management tool for local LLMs (Ollama), a database for grounding (Neo4j), and GenAI apps based on LangChain. The containers provide a development environment with a pre-built support agent app covering data import and response generation use cases. You can experiment with importing different information into the knowledge graph and examine how the variety of underlying grounding information affects the responses the LLM generates in the user interface. The GenAI Stack consists of:
    • Application containers (the application logic in Python built with LangChain for the orchestration and Streamlit for the UI).
    • Database container with vector index and graph search (Neo4j).
    • LLM container Ollama (if you’re on Linux). If you’re on macOS, install Ollama outside of Docker.
These containers are tied together with Docker Compose, which has a watch mode set up that rebuilds the relevant containers any time you make a change to the application code, allowing for fast feedback loops and a good developer experience. You can use the GenAI Stack to quickly experiment with building and running GenAI apps in a trusted environment with ready-to-use, code-first examples. The setup is effortless, and all the components are guaranteed to run and work together. The GenAI Stack gives you a quick way to try out and evaluate different approaches to knowledge retrieval and summarization so you can find the most accurate, explainable, and relevant responses for your users. Lastly, you can easily build on top of the sample code for your own needs.

How Do I Get It Running on My Machine?

In the Learning Center of Docker Desktop, there is now a new entry for “GenAI Stack” that you can follow.

GitHub Repository

  You can download or clone the repository from the Docker GitHub organization.

Spinning It Up Using Defaults

NOTE: The default LLM is Llama2 via Ollama, so if you’re on macOS, make sure you have Ollama installed first. For a quick start using the default configuration, git clone the code repository and invoke the following in your terminal (it uses the defaults in docker-compose.yml):

docker compose up

This will download (on the first run) and start all containers in dependency order. The data import application will be running on http://localhost:8502 and the chat interface on http://localhost:8501. First, pick a Stack Overflow tag that you’re interested in and load the last few hundred questions into the database. Then, open the chat interface and test different questions that may or may not be covered by the public training data or the knowledge base.

Example Application: An Internal Support Agent Chat Interface

We use a fictional use case of a technology company running a support organization for its products, where human support agents answer questions from end users. To do that, they use an internal knowledge base of existing questions and answers. So far, the system has relied on keyword search. To utilize the new capabilities of GenAI for natural language search and summarization, our developer team has been asked to build a prototype of a new natural language chat interface that either uses LLMs on their own or combines them with data from the existing knowledge base. As internal knowledge bases are not accessible for public demos, we are using subsets of Stack Overflow data to simulate that database. The demo applications that ship with the GenAI Stack showcase three things:
    1. Import and embed recent question-answer data from Stack Overflow via tags.
    2. Query the imported data via a chat interface using vector + graph search.
    3. Generate new questions in the style of highly ranked existing ones.

Import and Embed Data From Stack Overflow via Tags

The application served on http://localhost:8502 is a data import application that lets the user quickly import Stack Overflow question-answer data into Neo4j.
The data importer application fetches data from Stack Overflow via API requests, embeds the content using LangChain Embeddings, and stores the question-answer data in Neo4j. Additionally, it creates a vector search index to make sure relevant information can be easily and quickly retrieved by the chat or other applications. The data importer application allows users to specify a tag and the number of recent questions (in batches of 100) to import from the Stack Overflow API.
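For orientation, the import step could be approximated like this. This is a simplified sketch, not the loader.py that ships with the stack; the Stack Exchange API parameters, the OllamaEmbeddings model, and the Cypher statement are assumptions for illustration:

# Sketch: fetch recent questions for a tag from the Stack Exchange API,
# embed them, and store them (with their embeddings) in Neo4j.
import requests
from langchain.embeddings import OllamaEmbeddings
from neo4j import GraphDatabase

embeddings = OllamaEmbeddings(base_url="http://localhost:11434", model="llama2")
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

resp = requests.get(
    "https://api.stackexchange.com/2.3/questions",
    params={"tagged": "neo4j", "site": "stackoverflow", "pagesize": 100,
            "order": "desc", "sort": "creation", "filter": "withbody"},
)
with driver.session() as session:
    for item in resp.json()["items"]:
        text = item["title"] + " " + item["body"]
        vector = embeddings.embed_query(text)  # one embedding per question
        session.run(
            "MERGE (q:Question {id: $id}) "
            "SET q.title = $title, q.body = $body, q.link = $link, q.embedding = $embedding",
            id=item["question_id"], title=item["title"], body=item["body"],
            link=item["link"], embedding=vector,
        )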
It will take a minute or two to run the import; most of the time is spent generating the embeddings. After or during the import, you can click the link to http://localhost:7474 and log in with the username “neo4j” and password “password” as configured in docker-compose.yml. There, you can see an overview in the left sidebar and show some connected data by clicking on the “pill” with the counts. The data loader imports the graph using the following schema.
The graph schema for Stack Overflow consists of nodes representing Questions, Answers, Users, and Tags. Users are linked to the Questions they’ve asked via the “ASKED” relationship and to the Answers they’ve provided; each Answer is in turn connected to the Question it answers through the “ANSWERS” relationship. Furthermore, Questions are categorized by their relevant topics or technologies via the “TAGGED” relationship connecting them to Tags. You can see a subset of data imported as a graph below.
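To make the schema concrete, the nodes and relationships described above can be expressed in Cypher roughly like this, here executed through the Neo4j Python driver. This is an illustrative sketch, not the exact statements the loader runs (the user-to-answer link is omitted for brevity):

# Sketch: create one example of each node and relationship type in the schema.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
schema_example = """
MERGE (u:User {id: $user_id})
MERGE (q:Question {id: $question_id})
MERGE (a:Answer {id: $answer_id})
MERGE (t:Tag {name: $tag})
MERGE (u)-[:ASKED]->(q)      // users ask questions
MERGE (a)-[:ANSWERS]->(q)    // each answer answers one question
MERGE (q)-[:TAGGED]->(t)     // questions are tagged with topics
"""
with driver.session() as session:
    session.run(schema_example, user_id=1, question_id=10, answer_id=100, tag="neo4j")
driver.close()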

Support Agent App: Query the Imported Data via a Chat Interface Using Vector + Graph Search

This application, served on http://localhost:8501, has the classic LLM chat UI and lets the user ask questions and get answers. There’s a switch called RAG mode: the user can either rely completely on the LLM’s trained knowledge (RAG: Disabled) or use the more capable mode (RAG: Enabled), where the application uses similarity search over text embeddings together with graph queries to find the most relevant questions and answers in the database.
By traversing the data in the graph, we can give the LLM more context-rich and accurate information to answer the question than a pure vector lookup would provide. This is a very powerful capability and provides a better user experience. In our case, we are finding the most relevant (accepted and highest-scored) answers for the questions returned from the similarity search, but this could go much further, e.g., by taking relevant tags into account. Here is the Python code that uses LangChain to achieve the described functionality:
qa_chain = load_qa_with_sources_chain(
    llm, chain_type="stuff", prompt=qa_prompt)

# Vector + Knowledge Graph response
kg = Neo4jVector.from_existing_index(
    embedding=embeddings,
    url=url,
    # … (remaining connection parameters elided)
    index_name="stackoverflow",
    retrieval_query="""
    CALL { WITH question
        MATCH (question)<-[:ANSWERS]-(answer)
        RETURN answer
        ORDER BY answer.is_accepted DESC, answer.score DESC LIMIT 2
    }
    RETURN question.title + ' ' + question.body + ' '
        + collect(answer.body) AS text, similarity,
        {source: question.link} AS metadata
    ORDER BY similarity
    """,
)
kg_qa = RetrievalQAWithSourcesChain(
    combine_documents_chain=qa_chain,
    retriever=kg.as_retriever(search_kwargs={"k": 2}))

Pure vector search is also supported in LangChain and Neo4j. Because RAG applications can provide the sources that were used to generate the answer, they allow the user to trust and verify, unlike pure LLM answers.
When the LLM generates the answer from our context, it is also instructed in the prompt to provide the source of information used to create the response. The provided sources are links to Stack Overflow questions, as that is the data we used to ground the LLMs.
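Invoking the chain and surfacing those sources might look like this (a small sketch that assumes the kg_qa chain from the snippet above and the question/answer/sources keys used by RetrievalQAWithSourcesChain):

# Sketch: ask the RAG chain a question and show the answer with its sources.
result = kg_qa({"question": "How do I create a vector index in Neo4j?"},
               return_only_outputs=True)
print(result["answer"])   # the generated answer, grounded in the retrieved context
print(result["sources"])  # links to the Stack Overflow questions that were used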

Generate New Questions in the Style of Highly Ranked Existing Ones

The last feature of this demo application is to let the LLM generate a new question in the style of highly ranked questions already in the database. The imaginary situation here is that the support agent cannot find an answer to the end-user question in the existing knowledge base and therefore wants to post a new question to the internal engineering support team.
When the user clicks the “Generate ticket” button, the LLM is fed highly ranked questions from the database, together with the user’s question, and is asked to create a new ticket based on the original user question with the same tone, style, and quality as the highly ranked ones.
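A stripped-down version of that ticket-generation step might look like the following sketch. The prompt text and the Cypher query are illustrative only, and it assumes the llm and driver objects from the earlier sketches are available:

# Sketch: feed a few highly ranked questions plus the user's question to the LLM
# and ask it to draft a new support ticket in the same style.
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

ticket_prompt = PromptTemplate.from_template(
    "Here are examples of well-written, highly ranked support questions:\n"
    "{examples}\n\n"
    "Write a new support ticket about the following problem, matching the tone, "
    "style, and quality of the examples.\nProblem: {question}"
)
ticket_chain = LLMChain(llm=llm, prompt=ticket_prompt)  # llm is the configured model

with driver.session() as session:
    records = session.run(
        "MATCH (q:Question) RETURN q.title AS title, q.body AS body "
        "ORDER BY q.score DESC LIMIT 3"
    )
    examples = "\n---\n".join(r["title"] + "\n" + r["body"] for r in records)

new_ticket = ticket_chain.run(examples=examples,
                              question="Neo4j container fails to start after upgrade")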
This part was the trickiest to get working, as the local LLMs offer lower generation quality than large models and tend not to follow instructions as well.

Custom Setup

For a more custom setup, follow the steps below to configure the stack.

Step 1: The local LLM

  If you want to use a local LLM and are on macOS, you first need to install Ollama on your Mac. This is due to the lack of GPU support when running inside a container. Once that’s done, pull the model you want to use by opening the terminal and executing “ollama pull llama2” for the llama2 model. The complete list of available models can be found here.

Step 2: Environment Variables

  Copy example.env to a new file named .env. Edit the new file to decide what LLM you want to use.

LLM

  If you want to use any of the OpenAI LLMs, you need to insert an OpenAI API key and set either gpt-3.5 or gpt-4 as the value for the key LLM. To use other local LLMs via Ollama, specify the model you want to use by its tag (found here), for example, “llama2:7b” or “mistral”.

Database / Neo4j

  If you want to use a local containerized instance of Neo4j, there’s no need to specify any of the Neo4j-related keys in the .env file; the default password “password” is specified in the docker-compose.yml file. To use a remote Neo4j instance (for example, in Neo4j Aura), uncomment the Neo4j-related variables and add values to them. (You get those credentials as a text file download when you spin up your cloud instance.)

Application Debugging With LangSmith

  If you want to observe and debug this LangChain application using LangSmith, log in to your account, create a project and API key, and add them as environment variables.
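Put together, a minimal .env might look something like the block below. The variable names are illustrative of the categories above, so check example.env for the exact keys your version of the stack expects:

# LLM: "llama2" (the default via Ollama), another Ollama tag such as "mistral",
# or "gpt-3.5"/"gpt-4" together with an OpenAI API key
LLM=llama2
# OPENAI_API_KEY=sk-...               # only needed for the OpenAI models

# Neo4j: leave these commented out to use the local container (password "password"),
# or point them at a remote instance such as Neo4j Aura
# NEO4J_URI=neo4j+s://<your-instance>.databases.neo4j.io
# NEO4J_USERNAME=neo4j
# NEO4J_PASSWORD=<your-password>

# LangSmith (optional, for observing and debugging the LangChain application)
# LANGCHAIN_TRACING_V2=true
# LANGCHAIN_API_KEY=<your-langsmith-api-key>
# LANGCHAIN_PROJECT=genai-stack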

Step 3: Start

  Once the previous one-time steps are completed, you can start the applications by invoking docker compose up in a terminal.

How Do I Adjust the Code and See My Changes?

Python

  If you want to make changes to the Python code (loader.py or bot.py) and have the affected containers automatically rebuild when your changes are saved, open a new terminal window and invoke “docker compose alpha watch”. Any changes you make to the Python files will now trigger a rebuild of the container they’re included in, giving you a good developer experience.

Database

  For any data changes, you can go to http://localhost:7474 to load Neo4j Browser (the password is “password”, as configured in the docker-compose.yml file) to explore, edit, add, and delete data in the database. The configuration uses a local “data” folder in your current working directory to keep the database files across container rebuilds and restarts. To reset from scratch, delete that folder.
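For example, if you only want to wipe the imported data without deleting the data folder, you can run a single Cypher statement from Neo4j Browser or via the Python driver, as in this sketch (note that it irreversibly deletes all nodes and relationships):

# Sketch: remove all imported nodes and relationships, keeping the database itself.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    session.run("MATCH (n) DETACH DELETE n")
driver.close()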

How Do I Continue From Here?

From here, you can make any UI changes you want using the Streamlit framework. Maybe you want to serve the capabilities as an API instead? Install FastAPI or Flask, expose chat endpoints, and build your UI using any front-end technology. If you have private internal data like Obsidian markdown notes, Slack conversations, or a real knowledge base, embed it and start asking questions about it. LangChain has a lot of integrations if you want to add and combine multiple data sources or other LLM providers in the GenAI application. You can also check out our “Chat with your PDF” example application, which is also included with the stack. It allows you to upload PDF files, which are chunked and turned into embeddings, and then lets you ask questions about their content.
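If you go the API route, a minimal FastAPI wrapper around the existing chain could look like this sketch. It assumes the kg_qa chain from earlier is importable; the module and endpoint names are made up:

# Sketch: expose the RAG chain as a small HTTP API with FastAPI.
from fastapi import FastAPI
from pydantic import BaseModel

# from bot import kg_qa  # hypothetical import of the existing RetrievalQAWithSourcesChain

app = FastAPI()

class ChatRequest(BaseModel):
    question: str

@app.post("/chat")
def chat(request: ChatRequest):
    # Run the RAG chain and return the grounded answer together with its sources.
    result = kg_qa({"question": request.question}, return_only_outputs=True)
    return {"answer": result["answer"], "sources": result["sources"]}

# Run with: uvicorn api:app --reload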

What’s Next?

We hope that the GenAI Stack helps you get started with GenAI apps and provides all the necessary building blocks out of the box. Please try it out, provide us feedback via GitHub issues or pull requests, and spread the word to your friends and colleagues who have felt overwhelmed when trying to start with GenAI applications. Get started with the GenAI Stack at the GitHub repository or the Docker Desktop Learning Center. You can make use of the GenAI Stack at the Docker AI/ML Hackathon that is starting at DockerCon this week and running for 5 weeks. Discover more about Neo4j’s GenAI capabilities here.  
Learn to build GenAI apps with graph technology at NODES 2023, our online developer conference on Oct 26. All three of us will be presenting there on GenAI and the GenAI Stack.