GraphRAG Agent for Customer & Retail Analytics
This is an end-to-end worked example for building a GraphRAG agent to accelerate customer and retail analytics. It covers the entire process:
- Quickly constructing a graph from mixed unstructured and structured data sources
- Resolving and linking entities in the graph along the way
- Creating diverse graph retrieval tools, including query templates, vector search, dynamic text2Cypher, and graph community detection, to answer a broader range of questions
- Building an agent with Semantic Kernel for conducting analytics and responding to complex user questions
All of this uses a central source-of-truth graph schema to govern the process and AI interactions, ensuring higher data and retrieval quality.
This workflow can be adapted for analytics, reporting, and Q&A across various other business domains—especially when:
- Data sources include a mix of structured and unstructured data.
- AI needs to navigate a non-trivial business domain model for accurate responses.
- The use case requires flexibility and the ability to scale to complex, evolving AI-driven tasks.
Follow the instructions below to try it yourself! 🚀

Setup
Clone the GitHub repository.
git clone https://github.com/neo4j-product-examples/graphrag-examples.git
Create and activate a new Python virtual environment.
python -m venv graphrag_venv
source graphrag_venv/bin/activate
Switch to the project directory and install the requirements.
cd graphrag-examples
pip install -r requirements.txt
Create a Neo4j DB instance per the directions here.
Switch to the customer-graph directory and create a .env file by copying .env.template:
cd customer-graph
cp .env.template .env
Replace the Neo4j credentials and OpenAI key with your own.
Create the Graph from Source Data
Creating the graph requires ingesting both unstructured and structured data. You will use the schemas in the ontos folder to power these ingests. For more information on how these schemas were generated from a central source, see the Schema Generation section.
The source data is a sample of the H&M Personalized Fashion Recommendations Dataset, real customer purchase data that includes rich product information such as names, types, and descriptions. We used ChatGPT to further augment this data, simulating suppliers for the different articles and CreditNotes for returns/refunds. The data folder contains the resulting structured data (in csvs) and unstructured data in the form of credit-notes.pdf, which contains the return/refund data.
NOTE: Please follow the steps below in order; going out of order may result in conflicting deduplication and indexing issues.
1) Load Unstructured Data
Run the unstructured ingest. This will take a few minutes.
python unstructured_ingest.py
This script performs entity extraction on the credit-notes.pdf file and writes entities and relationships to the graph according to the customer schema.
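If you're curious what this step looks like in code, below is a minimal, hypothetical sketch of the general pattern: ask an LLM for structured JSON from each text chunk, then MERGE the results into the graph. The prompt, model name, node labels, and relationship types are illustrative assumptions, not the script's actual implementation.

```python
# Illustrative sketch of schema-guided extraction; not the exact code in
# unstructured_ingest.py. Labels, relationship types, and the prompt are assumptions.
import json
from openai import OpenAI
from neo4j import GraphDatabase

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_credit_notes(chunk_text: str) -> list[dict]:
    """Ask the LLM to pull CreditNote entities out of a text chunk as JSON."""
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system",
             "content": "Extract credit notes from the text as JSON: "
                        '{"creditNotes": [{"creditNoteId", "orderId", "articleId"}]}'},
            {"role": "user", "content": chunk_text},
        ],
    )
    return json.loads(response.choices[0].message.content)["creditNotes"]

def write_to_graph(driver, credit_notes: list[dict]) -> None:
    """MERGE the extracted entities and relationships into Neo4j."""
    driver.execute_query(
        """
        UNWIND $rows AS row
        MERGE (cn:CreditNote {creditNoteId: row.creditNoteId})
        MERGE (o:Order {orderId: row.orderId})
        MERGE (a:Article {articleId: row.articleId})
        MERGE (cn)-[:REFUNDS]->(o)
        MERGE (cn)-[:FOR_ARTICLE]->(a)
        """,
        rows=credit_notes,
    )

# Usage (credentials from your .env file):
# driver = GraphDatabase.driver(uri, auth=(user, password))
# write_to_graph(driver, extract_credit_notes(chunk_text))
```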
Once complete, you can check the database to see the generated graph. Go to the Aura Console and navigate to the Query tab.

Select the "Connect instance" button

You will be prompted to select your Aura instance. Select the one you made for this project and enter your credentials:

Once in the Query tab you should see nodes and relationships in the sidebar, including CreditNote, Order, Article, Document & Chunk.

Run the following simple Cypher Query to see a sample of the data and explore.
MATCH p=()--() RETURN p LIMIT 1000
You should see clear relationships between Orders, CreditNotes, and Articles as well as their connection back to source Chunks (a.k.a. document text chunks) and the single Document node with metadata about where they came from.

2) Merge Structured Data
We will use Aura Importer for this, which allows you to map structured data from csvs or relational databases onto the graph.
Go to the Aura Console and navigate to the Import tab

Select the ellipsis in the top left corner and then select "Open Model" in the dropdown

Choose customer-struct-import.json in the ontos folder. The resulting data model should look like the below:

Now you need to select data sources. Aura Import allows you to import from several types of databases, but for today we will use local csvs. Select browse at the top of the Data source panel.

Select all the csv files in the data directory. Once complete, you should see green check marks on each node and relationship. When selecting a node, you should also see the mapping between node properties and columns in the csvs.

We are now ready to run the import. Select the blue "Run import" button on the top left of the screen. You will be prompted to select your Aura instance - select the one you made for this project and enter your credentials:

The import should only take a few seconds. Once complete, you should get an Import results pop-up with a "completed successfully" message and some statistics.

3) Post-Processing Script
The post-processing script is responsible for creating text properties, embeddings, and a vector index to power search on Product nodes. It takes a few minutes to run, as we need to call the OpenAI embedding endpoint in batches to retrieve text embeddings.
python ingest_post_processing.py
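As a rough illustration of what this step does, here is a hedged sketch that batch-embeds product text and creates a vector index. The property names, index name, embedding model, and batch size are assumptions rather than the script's exact values.

```python
# Hypothetical sketch of the post-processing step: embed product text in batches,
# write the vectors back, and create a vector index for similarity search.
from neo4j import GraphDatabase
from openai import OpenAI

client = OpenAI()

def embed_products(driver, batch_size: int = 100) -> None:
    # Fetch product text (property name assumed to be p.text)
    records, _, _ = driver.execute_query(
        "MATCH (p:Product) WHERE p.text IS NOT NULL "
        "RETURN elementId(p) AS id, p.text AS text"
    )
    for i in range(0, len(records), batch_size):
        batch = records[i:i + batch_size]
        vectors = client.embeddings.create(
            model="text-embedding-3-small",
            input=[r["text"] for r in batch],
        ).data
        driver.execute_query(
            """
            UNWIND $rows AS row
            MATCH (p:Product) WHERE elementId(p) = row.id
            CALL db.create.setNodeVectorProperty(p, 'textEmbedding', row.vector)
            """,
            rows=[{"id": r["id"], "vector": v.embedding}
                  for r, v in zip(batch, vectors)],
        )
    # Vector index so retrieval tools can do similarity lookups on products
    driver.execute_query(
        "CREATE VECTOR INDEX product_text_embeddings IF NOT EXISTS "
        "FOR (p:Product) ON (p.textEmbedding) "
        "OPTIONS {indexConfig: {`vector.dimensions`: 1536, "
        "`vector.similarity_function`: 'cosine'}}"
    )
```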
Once complete, go back to the Query tab in the Aura console and run a simple query to sample the graph, like the one below:
MATCH p=()--() RETURN p LIMIT 1000

You should now see the unstructured data, the structured data, and the product text/vector properties merged together in one graph!
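If you want to double-check the merge programmatically, a quick (illustrative) query like the one below counts Article nodes connected to both a CreditNote from the PDF ingest and a Supplier from the csv import. The Supplier label and the untyped relationship patterns are assumptions based on the data model described above.

```python
# Illustrative sanity check that unstructured and structured data landed on the
# same Article nodes. Connection details come from your .env file.
from neo4j import GraphDatabase

uri, user, password = "neo4j+s://<host>", "neo4j", "<password>"
with GraphDatabase.driver(uri, auth=(user, password)) as driver:
    records, _, _ = driver.execute_query(
        """
        MATCH (a:Article)
        WHERE (a)--(:CreditNote) AND (a)--(:Supplier)
        RETURN count(a) AS articlesWithBothSources
        """
    )
    print(records[0]["articlesWithBothSources"])
```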
Running the Agent
Currently, the best way to run the agent is through the command line tool cli_agent.py. The Streamlit app app.py is a WIP and still has some issues with hanging during multi-turn Q&A conversations.
To run, navigate to the graphrag folder and run the file:
cd graphrag
python cli_agent.py
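Under the hood, the agent exposes graph retrieval tools to Semantic Kernel as plugin functions. The sketch below shows the general shape of such a plugin; the class, function, query, property names, and connection details are illustrative assumptions, not the exact code in cli_agent.py.

```python
# Hypothetical sketch of a graph retrieval tool registered as a Semantic Kernel plugin.
from neo4j import GraphDatabase
from semantic_kernel import Kernel
from semantic_kernel.functions import kernel_function

class RetailGraphPlugin:
    def __init__(self, uri: str, user: str, password: str):
        self._driver = GraphDatabase.driver(uri, auth=(user, password))

    @kernel_function(description="Count returns (credit notes) per supplier.")
    def returns_by_supplier(self) -> str:
        # Relationship types omitted for illustration; property names are assumptions
        records, _, _ = self._driver.execute_query(
            """
            MATCH (s:Supplier)--(:Article)--(cn:CreditNote)
            RETURN s.supplierId AS supplier, count(DISTINCT cn) AS returns
            ORDER BY returns DESC LIMIT 10
            """
        )
        return "\n".join(f"{r['supplier']}: {r['returns']}" for r in records)

kernel = Kernel()
kernel.add_plugin(
    RetailGraphPlugin("neo4j+s://<host>", "neo4j", "<password>"),  # from your .env
    plugin_name="retail_graph",
)
```

The real agent additionally wires a chat completion service and automatic function calling around plugins like this; see the Semantic Kernel docs for that part of the pattern.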
Some sample questions to try:
- What are some good sweaters for spring? Nothing too warm please!
- Which suppliers have the highest number of returns (i.e., credit notes)?
- What are the top 3 most returned products for supplier 1616? Get those product codes and find other suppliers who have fewer returns for each product I can use instead.
- Can you run a customer segmentation analysis?
- What are the most common product types purchased for each segment?
- Can you run a customer segmentation analysis? For the largest group make a creative spring promotional campaign for them highlighting recommended products. Draft it as an email.
⚠️ Note: Agentic AI is still an evolving technology and may not always behave as expected out-of-the-box. For example, agents might choose different tools than intended, resulting in errors or bad responses. This project provides a minimal agentic example, focusing on GraphRAG enhancement and integration, not on building a fully robust agentic system. To add more stability and formalization to agent behavior using Semantic Kernel, see their docs. For a GraphRAG example with deterministic tools & retrieval queries (instead of agents) on a similar dataset, see GraphRAG for Customer Experience.
Schema Generation
The single-source graph schema is customer.ttl in the ontos directory. It was built in WebProtégé and exported in Turtle (ttl) format. The other schemas in the ontos directory are derivatives of this one and are described in more detail below.
Per the process described in the GoingMeta series S2 episode 5, this schema was transformed into a json format to be uploaded into Aura Import. The source code for that is here. The schema was adjusted in Aura Import to produce customer-struct-import.json for the structured ingest. The adjustments include adding the csv property mapping and excluding some nodes that aren’t needed in the structured ingest.
The unstructured ingest (unstructured_ingest.py) uses the source ttl schema directly to inform the entity extraction and graph writing process. If you look in the customer.ttl file you will see "comment" annotations for some classes and properties. These are passed to the LLM to better describe the data schema and improve the entity extraction data quality.
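As an illustration of how those annotations can be surfaced to the LLM, the sketch below pulls every rdfs:comment out of the ttl file with rdflib and formats it as schema context. The exact prompt assembly in unstructured_ingest.py may differ.

```python
# Illustrative: collect rdfs:comment annotations from customer.ttl as schema hints.
from rdflib import Graph, RDFS

onto = Graph()
onto.parse("ontos/customer.ttl", format="turtle")

schema_hints = []
for subject, _, comment in onto.triples((None, RDFS.comment, None)):
    # e.g. "CreditNote: <description of the class or property from the ontology>"
    schema_hints.append(f"{subject.split('#')[-1]}: {comment}")

schema_context = "\n".join(schema_hints)  # prepended to the extraction prompt
```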
The final schema in the ontos directory is text-to-cypher.json, and it is used by the graphrag application for text2Cypher query generation, specifically in graphrag/retail_service.py. It was generated by running the following query against the database:
CALL apoc.meta.stats() YIELD relTypes
WITH KEYS(relTypes) as relTypes
UNWIND relTypes as rel
WITH rel, split(split(rel, '[:')[1],']') as relationship
WITH rel, relationship[0] as relationship
CALL apoc.cypher.run("MATCH p = " + rel + " RETURN p LIMIT 10", {})
YIELD value
WITH relationship as relationshipType, nodes(value.p) as nodes, apoc.any.properties(nodes(value.p)[0]) as startNodeProps, apoc.any.properties(nodes(value.p)[-1]) as endNodeProps
WITH DISTINCT labels(nodes[0]) as startNodeLabels, relationshipType, labels(nodes[-1]) as endNodeLabels, KEYS(startNodeProps) as startNodeProps, KEYS(endNodeProps) as endNodeProps
WITH startNodeLabels, relationshipType, endNodeLabels, COLLECT(endNodeProps) as endNodeProps, COLLECT(startNodeProps) as startNodeProps
WITH startNodeLabels, relationshipType, endNodeLabels, apoc.coll.toSet(apoc.coll.flatten(endNodeProps)) as endNodeProps, apoc.coll.toSet(apoc.coll.flatten(startNodeProps)) as startNodeProps
RETURN COLLECT({source:{label:startNodeLabels , properties:startNodeProps}, relationship:relationshipType, target:{label:endNodeLabels , properties:endNodeProps}}) as schema
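To show how that schema output is typically consumed, here is a hedged sketch of the text2Cypher pattern: the JSON schema is embedded in the prompt so the LLM only generates Cypher against labels, relationship types, and properties that actually exist. The prompt wording and model are assumptions; the project's real implementation lives in graphrag/retail_service.py.

```python
# Illustrative text2Cypher pattern: ground Cypher generation in the schema json.
import json
from openai import OpenAI

client = OpenAI()

with open("ontos/text-to-cypher.json") as f:
    graph_schema = json.load(f)

def generate_cypher(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Translate the user question into a single Cypher query. "
                        "Only use the labels, relationship types, and properties "
                        "in this schema:\n" + json.dumps(graph_schema)},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(generate_cypher("Which suppliers have the most credit notes?"))
```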