Neo4j Live: Entity Architecture for Efficient RAG on Graphs

At the intersection of graph technology and AI, there’s a new frontier for Retrieval-Augmented Generation (RAG) systems. In a recent Neo4j Live session, Irina Adamchic from Accenture’s Center of Advanced AI joined us to share her pioneering work on a Fixed Entity Architecture, a new method for building efficient, scalable RAG systems using knowledge graphs.


In this post, we'll discuss the key ideas Irina presented: why traditional RAG approaches struggle at scale, how the Fixed Entity Architecture works, and why it's a game-changer for building knowledge-intensive AI applications.

The Problem: RAG Systems Struggle with Scale and Noise

Irina’s journey started in early 2023 while building a system to generate change management plans from unstructured internal documents. Traditional vector-based RAG approaches – indexing document chunks and retrieving the nearest ones – quickly hit limitations:

  • Poor retrieval quality: Embeddings alone weren’t enough to capture nuanced structure.
  • High costs: Relying heavily on LLMs for entity extraction and summarisation (à la Microsoft’s GraphRAG) resulted in ballooning API call costs.
  • Duplication and noise: Extracted knowledge graphs became cluttered with redundant, messy data.

In short, the classic pipeline of “unstructured chunks in, vector search out” wasn’t robust enough for structured, domain-specific outputs. Irina needed a better way.

Fixed Entity Architecture: Rethinking RAG with Knowledge Graphs

Instead of letting an LLM loose on a pile of text, the Fixed Entity Architecture flips the script: first define your domain ontology, then build up your retrieval graph.

At the heart of Irina’s method are three distinct layers:

1. Fixed Ontology Layer

The foundation is a carefully curated set of domain-specific entities, defined manually and not extracted automatically. Each entity (e.g., “Program,” “Budget,” “Module”) includes a textual description and embedding. This provides structure and eliminates duplication by design.
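A minimal sketch of what such an ontology node might look like in code. The class name, field names, and the toy three-dimensional embeddings are illustrative assumptions, not Irina's actual schema; in practice each description would be embedded with a sentence-embedding model.

```python
from dataclasses import dataclass, field

@dataclass
class OntologyEntity:
    # Hypothetical structure for one fixed-ontology node: a curated name,
    # a textual description, and an embedding of that description.
    name: str
    description: str
    embedding: list = field(default_factory=list)  # placeholder vectors below

# Defined up front by domain experts, not extracted by an LLM.
ontology = [
    OntologyEntity("Program", "A change programme with goals and a timeline.", [0.9, 0.1, 0.0]),
    OntologyEntity("Budget", "Financial resources allocated to a programme.", [0.1, 0.9, 0.0]),
    OntologyEntity("Module", "A deliverable unit of work within a programme.", [0.0, 0.2, 0.9]),
]

# Duplication is eliminated by design: entity names act as unique keys.
assert len({e.name for e in ontology}) == len(ontology)
```

Because the entity set is fixed and hand-curated, there is nothing to deduplicate later: every downstream layer links back to these canonical nodes.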

2. Document Layer

Standard document chunks (e.g., sections from PDFs, web pages) form the second layer. Each chunk is embedded and connected to the ontology layer via cosine similarity, with no LLM calls required.
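The linking step can be sketched in a few lines of plain Python. The embeddings and the 0.8 threshold below are illustrative assumptions; in a real deployment the similarity computation typically happens inside the database.

```python
import math

def cosine(a, b):
    # Plain cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings standing in for real sentence embeddings.
ontology = {"Program": [0.9, 0.1, 0.0], "Budget": [0.1, 0.9, 0.0]}
chunks = {
    "chunk-1": [0.8, 0.2, 0.1],
    "chunk-2": [0.05, 0.95, 0.0],
}

THRESHOLD = 0.8  # assumed cutoff; tune per domain

# Connect each chunk to every ontology entity it is semantically close to.
links = []
for chunk_id, c_emb in chunks.items():
    for entity, e_emb in ontology.items():
        if cosine(c_emb, e_emb) >= THRESHOLD:
            links.append((chunk_id, entity))
# links → [("chunk-1", "Program"), ("chunk-2", "Budget")]
```

Every pairing is a cheap vector operation, which is why this layer can be built and rebuilt without touching an LLM.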

3. Extracted Entity Layer (Optional)

Using open-source NLP tools like spaCy, Irina optionally extracts named entities from chunks and links them back to the fixed ontology. This third layer enriches the graph even further without incurring additional LLM costs.
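A sketch of how extracted mentions might be linked back to the fixed ontology. The hard-coded mentions stand in for real spaCy NER output, and the case-insensitive containment linker is a deliberately simple assumption; an embedding-based linker would follow the same shape.

```python
# Stand-ins for entity mentions that an NER tool such as spaCy would
# extract from each chunk (e.g. via doc.ents).
extracted = {
    "chunk-1": ["Project Phoenix", "budget"],
    "chunk-2": ["Module A"],
}

ontology_names = {"Program", "Budget", "Module"}

def link_mention(mention):
    # Simplest possible linker: case-insensitive containment against the
    # fixed ontology names. Returns None when no ontology node matches.
    for name in ontology_names:
        if name.lower() in mention.lower():
            return name
    return None

entity_links = {
    (chunk, mention): link_mention(mention)
    for chunk, mentions in extracted.items()
    for mention in mentions
}
```

Mentions that match nothing (like "Project Phoenix" here) simply stay unlinked, so the fixed ontology remains the single source of truth.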

Why This Matters

Irina’s architecture solves real-world challenges developers face when building RAG systems:

  • No uncontrolled sprawl: By starting from a fixed ontology, you avoid noisy, redundant graphs.
  • Low compute cost: Minimal LLM involvement keeps the approach lightweight and affordable.
  • Scalability: New documents can be added seamlessly; their embeddings are matched against existing ontology nodes via a simple Cypher query.
  • Search flexibility: Hybrid search (vector similarity + full-text + smart traversal) becomes trivial.
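To make the "simple Cypher query" concrete, here is an illustrative query held as a Python string. The node labels, relationship type, vector index name, and cutoff are all hypothetical; adapt them to your own schema. It assumes a Neo4j 5 vector index queried via `db.index.vector.queryNodes`.

```python
# Hypothetical Cypher: match a newly added chunk's embedding against the
# top 5 nearest ontology nodes and link the sufficiently similar ones.
LINK_CHUNK = """
MATCH (c:Chunk {id: $chunk_id})
CALL db.index.vector.queryNodes('ontology_embeddings', 5, c.embedding)
YIELD node AS entity, score
WHERE score >= $cutoff
MERGE (c)-[:RELATES_TO {score: score}]->(entity)
"""

params = {"chunk_id": "chunk-42", "cutoff": 0.8}
# With an open neo4j driver session, this would run as:
# session.run(LINK_CHUNK, params)
```

Because `MERGE` is idempotent, re-running the query for the same chunk does not create duplicate relationships.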

In short: better retrieval, faster iteration, lower cost.

Graph-Based Retrieval: Smarter Search with Cypher

One significant advantage of this approach is the ability to craft smart retrieval queries. Instead of naive nearest-neighbour lookups, developers can:

  • Traverse multiple layers (ontology → documents → extracted entities)
  • Use cosine similarity filters to prioritise semantically close matches
  • Combine vector search and keyword search for hybrid retrieval
  • Build dynamic subgraphs on the fly based on the user’s query context
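The hybrid-retrieval idea from the list above can be sketched as a weighted blend of a vector score and a keyword score. The blend weight `alpha` and the toy documents are assumptions for illustration, not something prescribed in the talk.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def keyword_overlap(query, text):
    # Fraction of query terms that appear in the document text.
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q)

def hybrid_score(q_emb, d_emb, query, text, alpha=0.7):
    # Weighted blend of the vector and keyword signals; alpha is an
    # assumed tuning parameter.
    return alpha * cosine(q_emb, d_emb) + (1 - alpha) * keyword_overlap(query, text)

docs = [
    ("budget plan for the migration", [0.1, 0.9]),  # toy embeddings
    ("team onboarding schedule", [0.6, 0.4]),
]
query_text, query_emb = "migration budget", [0.2, 0.8]

ranked = sorted(
    docs,
    key=lambda d: hybrid_score(query_emb, d[1], query_text, d[0]),
    reverse=True,
)
```

In the graph setting, the same blended score can then seed a traversal across the ontology and document layers rather than being the final answer on its own.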

This level of control is difficult, if not impossible, to achieve with pure vector databases. As Irina put it: “Your fantasy is the only limit once you have this graph in place.”

Real-World Applications

In the session, Irina shared examples of how she’s deployed this method:

  • Internal change management assistants: Automatically generating structured plans from company policies.
  • Semantic data layers: Building graph-based semantic layers over relational databases for smarter SQL generation.
  • Hybrid RAG pipelines: Combining traditional RAG retrieval with semantic graph reasoning for better accuracy.

And these use cases are just the beginning. As Irina showed, the architecture is flexible enough to support reinforcement learning-style feedback loops, semantic expansion across domains, and much more.

Takeaways for Developers

If you’re working on GenAI or knowledge graph projects, here’s what you should remember:

  • Start with your domain knowledge. Build a fixed ontology before adding documents.
  • Use cosine similarity smartly. You can connect layers automatically with simple Cypher queries.
  • Think beyond retrieval. Once you have a semantic graph, you can do reasoning, ranking, and feedback-driven optimisation.
  • Save on compute costs. You don’t need to call an LLM 10,000 times to build a useful knowledge graph.

Additional Resources