Constructing Knowledge Graphs With Neo4j GraphRAG for Python

Martin O'Hanlon

Technical Curriculum Developer, Neo4j

A knowledge graph is an organized representation of real-world entities and the relationships between them. By storing entities, their attributes, and their relationships in a structured form, it gives you a connected view of your information.

Creating knowledge graphs from unstructured data can be complex, involving multiple steps of querying, cleansing, and transforming data. You can use the text analysis capabilities of LLMs to help automate knowledge graph creation.

The Neo4j GraphRAG for Python (neo4j_graphrag) package includes a Knowledge Graph Builder to help you turn your unstructured and structured data into a knowledge graph.

Knowledge Graph Builder

The SimpleKGPipeline class provides a pipeline that implements a series of steps to create a knowledge graph from unstructured data:

  1. Load the text
  2. Split the text into chunks
  3. Create embeddings for each chunk
  4. Extract entities and relationships from the chunks using an LLM
  5. Write the data to a Neo4j database
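
To make step 2 concrete, here is a minimal sketch of fixed-size chunking with a small overlap between neighbouring chunks. It is an illustration only, not the package's own splitter: the function name split_into_chunks and its parameters are hypothetical, and SimpleKGPipeline handles splitting for you.

```python
def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character chunks that overlap slightly.

    Hypothetical helper for illustration; not part of neo4j_graphrag.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "Neo4j is a graph database. " * 10
chunks = split_into_chunks(sample, chunk_size=100, overlap=10)
for index, chunk in enumerate(chunks):
    print(index, len(chunk))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which may help the LLM extract entities that would otherwise be cut in half.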

For example, you could turn the Neo4j Wikipedia page into a graph representing Neo4j the organization and the database.

The SimpleKGPipeline only requires a Neo4j connection, an embedding model, and an LLM to turn your documents into a knowledge graph.

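import asyncio
import os

from dotenv import load_dotenv
from neo4j import GraphDatabase
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
from neo4j_graphrag.llm import OpenAILLM

# Load connection settings and API keys from a .env file
load_dotenv()

# Connect to Neo4j and fail early if the database is unreachable
neo4j_driver = GraphDatabase.driver(
    os.getenv("NEO4J_URI"),
    auth=(os.getenv("NEO4J_USERNAME"), os.getenv("NEO4J_PASSWORD")),
)
neo4j_driver.verify_connectivity()

# LLM used to extract entities; JSON output with temperature 0
llm = OpenAILLM(
    model_name="gpt-4o",
    model_params={
        "temperature": 0,
        "response_format": {"type": "json_object"},
    },
)

# Embedding model used to create a vector for each chunk
embedder = OpenAIEmbeddings(
    model="text-embedding-ada-002"
)

# Build the pipeline; from_pdf=True tells it to load text from a PDF file
kg_builder = SimpleKGPipeline(
    llm=llm,
    driver=neo4j_driver,
    neo4j_database=os.getenv("NEO4J_DATABASE"),
    embedder=embedder,
    from_pdf=True,
)

# Run the pipeline against a local PDF and print the result
pdf_file = "./my_document.pdf"
result = asyncio.run(kg_builder.run_async(file_path=pdf_file))
print(result.result)

# Release the database connection when finished
neo4j_driver.close()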
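
The example above reads its connection settings with os.getenv, which silently returns None when a variable is missing. A small check before creating the driver can surface that earlier. This is a sketch under assumptions: the helper missing_settings is hypothetical, the URI is a placeholder, and OPENAI_API_KEY is included on the assumption that the OpenAI client reads its key from that variable. NEO4J_DATABASE is left out because it can stay unset.

```python
import os

# Variables the example relies on. OPENAI_API_KEY is an assumption:
# it is not named in the example but is read by the OpenAI client.
REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "OPENAI_API_KEY")


def missing_settings(environ=None, required=REQUIRED_VARS):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# Check a sample mapping instead of the real environment (placeholder URI).
missing = missing_settings({"NEO4J_URI": "neo4j://localhost:7687"})
print(missing)
```

In a script you would call missing_settings() with no arguments after load_dotenv() and stop with a clear message if the returned list is not empty.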

You can learn how to use and customize the SimpleKGPipeline in a new GraphAcademy course: Constructing Knowledge Graphs with Neo4j GraphRAG for Python.

You’ll also learn how to:

  • Create text splitters and define chunks
  • Implement custom data loaders
  • Define a schema for your lexical (unstructured) graph to ensure that you’re extracting the data you need
  • Add structured data alongside your unstructured data
  • Create GraphRAG pipelines and retrievers to access your knowledge graph
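
To give a feel for what defining a schema involves, the sketch below describes one as plain Python data: the node labels you want, the relationship types you want, and the allowed (source, relationship, target) patterns. All labels and the invalid_patterns helper are hypothetical examples. This is not the package's schema API, so check the neo4j_graphrag documentation for how to pass a schema to SimpleKGPipeline in your version.

```python
# Hypothetical schema for a document about a company and its products.
NODE_TYPES = {"Company", "Product", "Person"}
RELATIONSHIP_TYPES = {"DEVELOPS", "FOUNDED"}
PATTERNS = [
    ("Company", "DEVELOPS", "Product"),
    ("Person", "FOUNDED", "Company"),
]


def invalid_patterns(patterns, node_types, relationship_types):
    """Return patterns that use a label or relationship type not declared above."""
    return [
        (source, rel, target)
        for source, rel, target in patterns
        if source not in node_types
        or rel not in relationship_types
        or target not in node_types
    ]


# WORKS_AT is not declared, so this extra pattern is reported as invalid.
problems = invalid_patterns(
    PATTERNS + [("Person", "WORKS_AT", "Company")],
    NODE_TYPES,
    RELATIONSHIP_TYPES,
)
print(problems)
```

Checking the patterns against the declared types before running an extraction is a cheap way to catch typos in a schema.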

Summary

Knowledge graphs help you organize and make sense of your data. Learn how to create them in the GraphAcademy Constructing Knowledge Graphs with Neo4j GraphRAG for Python course.

At Neo4j GraphAcademy, we offer a wide range of free courses, teaching everything from Neo4j Fundamentals to how to Build a Neo4j-backed Chatbot using Python. Visit GraphAcademy to see available courses.


Constructing Knowledge Graphs With Neo4j GraphRAG for Python was originally published in Neo4j Developer Blog on Medium.