Knowledge Graph vs. Vector Database for Grounding Your LLM



Enterprises want to infuse Large Language Models (LLMs) into their mission-critical applications. However, the unpredictable nature of LLMs can lead to hallucinations – inaccurate inferences or outright errors – posing serious challenges for enterprises looking for accuracy, explainability, and reliability.

Retrieval-augmented generation (RAG) is the leading approach to overcoming these challenges: it grounds your LLM in facts retrieved at query time. Knowledge graphs and vector databases are the two primary contenders for implementing retrieval-augmented generation. But which one offers a more accurate, reliable, and explainable foundation for your LLM?

Let’s take a look at some of the key factors to consider when choosing between knowledge graphs and vector databases to ground your LLM.

Answering Complex Questions

The more complex the question, the harder it is for a vector database to return results quickly and efficiently. Each additional subject in a query dilutes the similarity signal, making it harder for the database to find the information you want.

For example: Both a knowledge graph and a vector database can easily return an answer to “Who is the CEO of my company?” but a knowledge graph will outpace a vector database on a question like “Which board meetings in the last twelve months had at least two members abstain from a vote?”

A vector database is likely to return content that sits somewhere in the middle of the query's subjects in the embedding space: semantically close to every subject, but not the specific answer. A knowledge graph looks for and returns precise information by traversing a graph whose entities are connected by explicit relationships.
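As a rough sketch of the difference (every name, score, and three-dimensional "embedding" below is invented for illustration; a real system would use learned embeddings and a graph query language such as Cypher), similarity ranking cannot enforce a condition like "at least two members abstained," while a graph traversal can apply it exactly:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# --- Vector-style retrieval: rank text chunks by similarity alone. ---
chunks = {
    "minutes: 2024-03 board meeting, two abstentions": [0.9, 0.8, 0.1],
    "minutes: 2024-06 board meeting, one abstention":  [0.8, 0.9, 0.1],
    "memo: abstention policy overview":                [0.7, 0.2, 0.9],
}
query = [0.9, 0.7, 0.4]  # "meetings with at least two abstentions"
ranked = sorted(chunks, key=lambda c: cosine(query, chunks[c]), reverse=True)
# Ranking by similarity cannot express the ">= 2 abstentions" constraint;
# topically similar chunks compete with the chunk that actually qualifies.

# --- Graph-style retrieval: traverse relationships, filter exactly. ---
abstained = [  # (meeting, member) edges
    ("2024-03", "alice"), ("2024-03", "bob"),
    ("2024-06", "carol"),
]
per_meeting = Counter(m for m, _ in abstained)
answer = sorted(m for m, n in per_meeting.items() if n >= 2)
print(answer)  # only the meeting with two or more abstentions
```

The traversal side counts actual relationship edges, so the ">= 2" condition is applied to facts rather than approximated by semantic closeness.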

Getting Complete Responses

Vector databases are more likely to return incomplete or irrelevant results because they rely on similarity scoring and a predefined result limit (commonly called top-k).

For example: If you ask: “List all the books written by John Smith,” a vector database will return:

    • An incomplete list of titles (predefined limit too low), or
    • All titles by John Smith and some by other authors (predefined limit too high), or
    • The exact answer (predefined limit just right).

Because no single predefined limit suits every possible query, it is nearly impossible to guarantee an exact answer from a vector database.

A knowledge graph, by contrast, stores explicit relationships, so the size of the result set is determined by the data itself rather than by a fixed limit: a query returns every entity connected by the relevant relationship and nothing more. In this case, a knowledge graph query will return all books written by John Smith and nothing else.
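The top-k problem can be sketched in a few lines (the titles, authors, and similarity scores are all made up; a real vector database would compute scores from embeddings rather than store them directly):

```python
# Toy corpus: book records with a precomputed "similarity to the query" score.
records = [
    ("Graph Basics",    "John Smith", 0.95),
    ("Vector Math",     "John Smith", 0.90),
    ("Graph Patterns",  "John Smith", 0.88),
    ("Graphs at Scale", "Jane Doe",   0.87),  # similar topic, wrong author
]

def vector_top_k(k):
    """Similarity search returns the k nearest records: no more, no less."""
    return [(t, a) for t, a, _ in sorted(records, key=lambda r: -r[2])[:k]]

print(vector_top_k(2))  # misses one John Smith title (limit too low)
print(vector_top_k(4))  # includes Jane Doe's book (limit too high)

# Graph-style lookup: follow WROTE relationships from the author node,
# so the data, not a fixed k, determines the result size.
wrote = {
    "John Smith": ["Graph Basics", "Vector Math", "Graph Patterns"],
    "Jane Doe":   ["Graphs at Scale"],
}
print(wrote["John Smith"])  # exactly the three titles, nothing else
```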

Getting Credible Responses

Vector databases can connect two factual pieces of information together and infer something inaccurate.

For example: If you asked: “Who is on the product management team?”, a vector database might incorrectly infer that someone was on the product team because they have frequent commenting access to documents (fact) produced by the product team (fact) and return their name in the results. Because a knowledge graph uses nodes and relationships to identify how people in an organization are related, it would return only those on the product team.

Knowledge graph queries follow a flow of connected information, making responses consistently accurate and explainable.
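A minimal sketch of why relationship types matter (the people, team, and relationship names below are hypothetical): a graph query filters on the *kind* of relationship, so co-occurrence with the team's documents never counts as membership.

```python
# Toy edges: (person, relationship type, team).
edges = [
    ("alice", "MEMBER_OF", "product"),
    ("bob",   "MEMBER_OF", "product"),
    # Frequent commenter on the team's documents, but not a member:
    ("carol", "COMMENTED_ON_DOCS_OF", "product"),
]

# Graph-style query: only MEMBER_OF relationships qualify.
team = sorted(p for p, rel, t in edges if rel == "MEMBER_OF" and t == "product")
print(team)  # carol is excluded despite her document activity
```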

Correcting LLM Hallucinations

Knowledge graphs have a human-readable representation of data, whereas vector databases offer only a black box.

For example: When a member of the product team is misidentified, a vector database cannot show which facts it used to infer the misinformation, so there is no way to correct the error or even trace its source. With a knowledge graph, by contrast, it's easy to find and correct the misinformation should the LLM infer something incorrectly.

That’s because knowledge graphs are fully transparent. They let you identify misinformation in the data, trace the pathway of the query, and make targeted corrections, which improves LLM accuracy. Vector databases, on the other hand, provide little to no transparency and no ability to make specific corrections.

Knowledge Graphs for Your LLM

Knowledge graphs are the best choice for backing your LLM with accuracy, explainability, and context. Neo4j’s reliable and verifiable knowledge graph boosts LLM accuracy and explainability while offering robust enterprise capabilities such as data protection, governance, high availability, scalability, and flexible deployment, making it a dependable foundation for LLMs in mission-critical applications.

Learn more about Neo4j’s knowledge graphs for LLM-powered applications or read up on building knowledge graphs in the newly published O’Reilly book.