Documentation for local deployments

Prerequisites

You will need a Neo4j Database v5.15 or later with APOC installed to use this Knowledge Graph Builder. You can use any Neo4j Aura database (including the free tier). Neo4j Aura automatically includes APOC and runs on the latest Neo4j version, making it a great choice to get started quickly. You can also use the free trial in Neo4j Sandbox, which also includes Graph Data Science.
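
If you want to verify that APOC is available, one quick check (a sketch assuming you have cypher-shell installed, with your own connection URI and credentials substituted for the placeholders) is:

echo "RETURN apoc.version();" | cypher-shell -a neo4j+s://xxxx.databases.neo4j.io -u neo4j -p "your-password"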

If you want to use Neo4j Desktop instead, you will not be able to use the docker-compose deployment method. You will have to follow the separate deployment of backend and frontend described in the Development section below.

Docker-compose

By default, only OpenAI and Diffbot are enabled, since Gemini requires extra GCP configuration.

In your root folder, create a .env file with your OpenAI and Diffbot keys (if you want to use both):

OPENAI_API_KEY="your-openai-key"
DIFFBOT_API_KEY="your-diffbot-key"

If you only want OpenAI:

LLM_MODELS="OpenAI GPT 3.5,OpenAI GPT 4o"
OPENAI_API_KEY="your-openai-key"

If you only want Diffbot:

LLM_MODELS="Diffbot"
DIFFBOT_API_KEY="your-diffbot-key"
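
Similarly, if you also want Gemini (disabled by default because it needs extra GCP configuration, which is not covered here), a minimal sketch would be to enable the flag and add the Gemini models to LLM_MODELS in the same .env file:

GEMINI_ENABLED=True
LLM_MODELS="Diffbot,OpenAI GPT 3.5,OpenAI GPT 4o,Gemini 1.0 Pro,Gemini 1.5 Pro"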

You can then run Docker Compose to build and start all components:

docker-compose up --build
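
If you prefer it to run in the background, the usual Docker Compose options apply, for example:

docker-compose up --build -d
docker-compose down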

Additional configs

By default, the input sources will be: local files, YouTube, Wikipedia and AWS S3. This is the default config applied if you do not override it in your .env file:

REACT_APP_SOURCES="local,youtube,wiki,s3"

If, however, you want the Google GCS integration, add gcs and your Google client ID:

REACT_APP_SOURCES="local,youtube,wiki,s3,gcs"
GOOGLE_CLIENT_ID="xxxx"

REACT_APP_SOURCES should be a comma-separated list of the sources you want to enable. You can of course combine all of them (local, youtube, wiki, s3 and gcs) or remove any you don't want or need.
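
For example, to keep only local file upload and Wikipedia:

REACT_APP_SOURCES="local,wiki"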

Development (Separate Frontend and Backend)

Alternatively, you can run the backend and frontend separately:

  • For the frontend:

    1. Create the frontend/.env file by copying frontend/example.env.

    2. Change values as needed.

    3. Run:

cd frontend
yarn
yarn run dev

  • For the backend:

    1. Create the backend/.env file by copying backend/example.env.

    2. Change values as needed (a minimal .env sketch follows after this list).

    3. Run:

cd backend
python -m venv envName
source envName/bin/activate
pip install -r requirements.txt
uvicorn score:app --reload

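As a rough reference, and assuming a locally running Neo4j instance with placeholder credentials, a minimal backend/.env might look like the sketch below; the authoritative list of variables is backend/example.env and the ENV table that follows:

OPENAI_API_KEY="your-openai-key"
DIFFBOT_API_KEY="your-diffbot-key"
NEO4J_URI="neo4j://localhost:7687"
NEO4J_USERNAME="neo4j"
NEO4J_PASSWORD="password"
EMBEDDING_MODEL="all-MiniLM-L6-v2"
IS_EMBEDDING="true"

For the frontend, the table below lists BACKEND_API_URL (default http://localhost:8000) and REACT_APP_SOURCES as the values you are most likely to adjust; check frontend/example.env for the exact names it expects.
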
ENV

Env Variable Name | Mandatory/Optional | Default Value | Description
----------------- | ------------------ | ------------- | -----------
OPENAI_API_KEY | Optional | sk-… | API key for OpenAI (if enabled)
DIFFBOT_API_KEY | Optional | | API key for Diffbot (if enabled)
EMBEDDING_MODEL | Optional | all-MiniLM-L6-v2 | Model for generating the text embedding (all-MiniLM-L6-v2, openai, vertexai)
IS_EMBEDDING | Optional | true | Flag to enable text embedding
KNN_MIN_SCORE | Optional | 0.94 | Minimum score for the KNN algorithm for connecting similar chunks
GEMINI_ENABLED | Optional | False | Flag to enable Gemini
GCP_LOG_METRICS_ENABLED | Optional | False | Flag to enable Google Cloud logs
NUMBER_OF_CHUNKS_TO_COMBINE | Optional | 6 | Number of chunks to combine when extracting entities
UPDATE_GRAPH_CHUNKS_PROCESSED | Optional | 20 | Number of chunks processed before writing to the database and updating progress
NEO4J_URI | Optional | neo4j://database:7687 | URI for the Neo4j database
NEO4J_USERNAME | Optional | neo4j | Username for the Neo4j database
NEO4J_PASSWORD | Optional | password | Password for the Neo4j database
LANGCHAIN_API_KEY | Optional | | API key for LangSmith
LANGCHAIN_PROJECT | Optional | | Project for LangSmith
LANGCHAIN_TRACING_V2 | Optional | true | Flag to enable LangSmith tracing
LANGCHAIN_ENDPOINT | Optional | https://api.smith.langchain.com | Endpoint for the LangSmith API
BACKEND_API_URL | Optional | http://localhost:8000 | URL for the backend API
BLOOM_URL | Optional | https://workspace-preview.neo4j.io/workspace/explore?connectURL={CONNECT_URL}&search=Show+me+a+graph | URL for Bloom visualization
REACT_APP_SOURCES | Optional | local,youtube,wiki,s3 | List of input sources that will be available
LLM_MODELS | Optional | Diffbot,OpenAI GPT 3.5,OpenAI GPT 4o | Models available for selection on the frontend, used for entity extraction and the Q&A chatbot (other models: Gemini 1.0 Pro, Gemini 1.5 Pro, OpenAI GPT 4)
ENV | Optional | DEV | Environment variable for the app
TIME_PER_CHUNK | Optional | 4 | Time per chunk for processing
CHUNK_SIZE | Optional | 5242880 | Size of each chunk for processing
GOOGLE_CLIENT_ID | Optional | | Client ID for Google authentication for GCS upload
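
For example, to turn on LangSmith tracing you would set the four LANGCHAIN_* variables from the table, roughly like this (the API key and project name are placeholders):

LANGCHAIN_TRACING_V2="true"
LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
LANGCHAIN_API_KEY="your-langsmith-api-key"
LANGCHAIN_PROJECT="your-project-name"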