GenAI Graph Gathering 2.0: The Evolution of GraphRAG


Scaling laws may plateau, yet the promise of GenAI is undiminished. Engineers everywhere have been busy figuring out what works and how to make it useful. GraphRAG, a combination of knowledge graphs and retrieval-augmented generation (RAG), has evolved into a range of techniques, with a growing body of research papers and software integrations.

With this in mind, we organized a second GenAI Graph Gathering and invited a group of the brightest people we know working at the intersection of graphs and LLMs. The goal was to catch up and compare notes about the many things happening around GraphRAG.

Retrospective: From May to Today

An incomplete timeline of GraphRAG since the first GenAI Graph Gathering in May.

At the first GenAI Graph Gathering, we explored how to use knowledge graphs for retrieval, observing that GenAI applications used source data from three main buckets:

  1. Unstructured data from text files or PDFs
  2. Structured data from existing databases
  3. Mixed data that has a combination of both

Over the summer, we saw a massive increase in interest from developers and businesses. Projects tended to start with either unstructured or structured data: either “chat with a PDF” or “chat with a CSV.” Many stalled in the pilot phase, however, even though advanced techniques suggested ways forward.

What did the projects that got beyond a proof of concept have in common? In our experience, they tended to make stronger connections between unstructured and structured data, landing in that sweet spot of mixed data.

Among GenAI proof-of-concept (PoC) efforts, 71% stall when using only unstructured data. The roughly one-third that get past the PoC stage either started with or incorporated more structured business data.

It’s hard to turn academic theory into real-world practice. So we decided to do a few things:

  • Curate a pattern catalog that distills the information from research papers
  • Implement proven approaches in tools and libraries
  • Meet with our peers to figure out what else can help

At the Gathering, we broke out into group discussions, focusing on knowledge graph construction, GraphRAG techniques, and real-world experience.

Getting Started With GraphRAG

Assuming an audience of AI engineers who already know what vector embeddings are and can explain the RAG acronym, what’s the best way to get started with GraphRAG?

From our shared notes:

  • Side-by-side comparisons across techniques
  • Reveal complete context used to generate answers, including connected data and the resulting prompt
  • Graph visualization is important for that aha moment, for data and schema
  • Help with knowledge graph construction, balancing auto-magical and hand-curated
  • Three directions: 1) generic tool-based workflow with any source; 2) specific “northwind” knowledge graph as a well-known, in-depth example; 3) self-contained notebooks for a wide variety of examples

Developer Experience

Extending the conversation about getting started, another session considered the broader topic of developer experience.

From our shared notes:

  • Most people start with unstructured data, while GraphRAG shines with structured or mixed data
  • Plain RAG is domain-agnostic, while GraphRAG becomes domain-specific
  • “Advanced RAG” could be understood as the first steps into GraphRAG, remaining relatively agnostic to the domain
  • There is a “cold start” or “blank canvas” problem that could be addressed with templates
  • The right GraphRAG approach is domain-specific; guidance and examples are needed
  • “Seven graphs” to cover a broad range of general business concerns (more to come on this)
  • GraphRAG pattern catalog as a common reference, like the classic OO design patterns

Knowledge Graph Engineering

Knowledge graphs are an entire information architecture that can be as simple as chunked text with summarization or as comprehensive as a federated view of a whole enterprise. Data preparation, transformation, modeling, and evaluation are all needed for knowledge graphs, just as they are for any data engineering.

The group considered a mix of knowledge graphs:

  • Domain graph, mapped from structured data like CSVs or JSON
  • Domain graph with long-form text, mapped from structured data
  • Lexical graph with a known structure, derived from well-known document collections like a product catalog or manuals
  • Lexical graph with discovered structure, using named entity recognition (NER) guided by a known terminology
  • Lexical graph with both known and discovered structures, combining NER with structured data
  • Lexical graph with an entirely discovered structure, using open-ended NER

Ontologies: What’s the Plan?

An ontology is simply a set of concepts and categories in a subject area that shows their properties and the relations between them. Call it a graph schema, if you’d like.

From our shared notes:

  • Schemas help with interoperability, explainability, grounding
  • Useful for entity extraction, and alignment between unstructured and structured data
  • A full, formal ontology can be intimidating. Is there a simpler format for GraphRAG?
  • Existing schema from a catalog could be auto-selected to match unstructured data
  • Balance easy-to-use with rigorously correct
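
One way to picture a "simpler format" is a small dictionary of node labels and allowed relationship patterns. The sketch below is only an assumption about what such a format could look like, not a format proposed at the Gathering; the labels, relationship types, and helper names are hypothetical. It shows how a schema could ground entity extraction by rejecting triples that do not fit.

```python
# A hypothetical lightweight schema: node labels with their properties,
# plus the relationship patterns that are allowed between labels.
SCHEMA = {
    "nodes": {
        "Person": ["name"],
        "Company": ["name"],
        "Document": ["title"],
    },
    "relationships": {
        ("Person", "WORKS_AT", "Company"),
        ("Document", "MENTIONS", "Person"),
        ("Document", "MENTIONS", "Company"),
    },
}

def conforms(triple, schema=SCHEMA):
    """Return True if a (source label, type, target label) triple is allowed."""
    return tuple(triple) in schema["relationships"]

def filter_extracted(triples, schema=SCHEMA):
    """Split extracted triples into those that match the schema and those that do not."""
    kept = [t for t in triples if conforms(t, schema)]
    rejected = [t for t in triples if not conforms(t, schema)]
    return kept, rejected

# Example: triples as they might come back from an extraction step.
extracted = [
    ("Person", "WORKS_AT", "Company"),
    ("Company", "WORKS_AT", "Person"),  # wrong direction, not in the schema
    ("Document", "MENTIONS", "Company"),
]
kept, rejected = filter_extracted(extracted)
print("kept:", kept)
print("rejected:", rejected)
```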

Advanced Graph Retrieval

An incredible amount of research continues to explore the many techniques of using a graph for information retrieval.

From our shared notes:

  • Research includes: contextual retrieval, query-focused summarization, text-to-cypher, layered memory, graph-based re-ranking, hybrid indexes, GNNs
  • GNNs and graph data science can enrich, refine, and improve precision
  • Graphs can represent the source information, memory, security constraints, and guided paths for information retrieval
  • The main challenge is that the “right” thing to do is use-case-dependent
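
To make the idea of a graph guiding retrieval concrete, here is a minimal sketch of one possible pattern: pick seed chunks by relevance, then follow graph connections to pull in related chunks. The toy data is hypothetical, and a simple keyword-overlap score stands in for vector similarity so the example stays self-contained.

```python
import re

# Hypothetical toy data: chunk texts, and the entities each chunk mentions.
CHUNKS = {
    "c1": "Acme acquired Bolt in 2021.",
    "c2": "Bolt makes industrial fasteners.",
    "c3": "The weather was mild in spring.",
}
MENTIONS = {
    "c1": {"Acme", "Bolt"},
    "c2": {"Bolt"},
    "c3": set(),
}

def keyword_score(query, text):
    """Count shared lowercase words; a stand-in for vector similarity."""
    q = set(re.findall(r"\w+", query.lower()))
    t = set(re.findall(r"\w+", text.lower()))
    return len(q & t)

def retrieve(query, top_k=1):
    """Pick the top-k seed chunks, then add chunks that share an entity with a seed."""
    ranked = sorted(CHUNKS, key=lambda c: keyword_score(query, CHUNKS[c]), reverse=True)
    seeds = ranked[:top_k]
    expanded = list(seeds)
    for seed in seeds:
        for other in CHUNKS:
            if other not in expanded and MENTIONS[seed] & MENTIONS[other]:
                expanded.append(other)
    return expanded

result = retrieve("Who acquired Bolt?")
print(result)
```

Here the second chunk is not the best match for the question, but it is reached through the shared entity, which is the kind of connected context a plain similarity search could miss.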

Final Thoughts

The GenAI Graph Gathering was a unique opportunity for cross-org collaboration. While the immediate goal was peer-to-peer connections, the long view is for each guest to be successful on their path, and ultimately for everyone to benefit from GraphRAG.

GraphRAG, using a graph for the ‘R’ in RAG, continues to evolve as a broad spectrum of approaches and technologies. The good news is that you don’t need to know everything or do everything at once. Graphs are pleasantly composable. The mental model, the way of thinking in graphs, has a few concepts that extend as far as you’re ready to go. It could be graphs all the way down, from the ML model to the application workflow to the data storage on disk. Or it can be as simple as connecting some text chunks to each other and to the containing document.
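
As a minimal sketch of that simplest case, the following splits a document into chunks, links each chunk to the next, and links every chunk to its document. The fixed-size chunking, the identifiers, and the `NEXT` and `PART_OF` relationship names are assumptions chosen for illustration.

```python
def build_minimal_graph(doc_id, text, chunk_size=40):
    """Split text into fixed-size chunks and link them to each other and to the document."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    nodes = {doc_id: {"label": "Document"}}
    edges = []  # (source id, relationship type, target id)
    for i, chunk in enumerate(chunks):
        chunk_id = f"{doc_id}-chunk-{i}"
        nodes[chunk_id] = {"label": "Chunk", "text": chunk}
        edges.append((chunk_id, "PART_OF", doc_id))
        if i > 0:
            edges.append((f"{doc_id}-chunk-{i - 1}", "NEXT", chunk_id))
    return nodes, edges

def context_window(nodes, edges, chunk_id):
    """Return the text of a chunk together with its immediate neighbors via NEXT links."""
    prev_ids = [s for s, rel, t in edges if rel == "NEXT" and t == chunk_id]
    next_ids = [t for s, rel, t in edges if rel == "NEXT" and s == chunk_id]
    return "".join(nodes[c]["text"] for c in prev_ids + [chunk_id] + next_ids)

text = "Graphs are composable. " * 5
nodes, edges = build_minimal_graph("doc1", text, chunk_size=40)
print(context_window(nodes, edges, "doc1-chunk-1"))
```

Even this small structure lets a retriever widen a matching chunk to its surrounding context, and more nodes and relationships can be layered on later.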

Start with a minimum viable graph. Add more data. Enrich, connect, repeat.