RAG - Retrieval Augmented Generation Procedures
Query with Retrieval-augmented generation (RAG) technique
The apoc.ml.rag procedure takes a list of paths or a vector index name, the relevant attributes, and a natural language question, and creates a prompt implementing the Retrieval-augmented generation (RAG) technique.
See here for more info about the RAG process.
It uses the chat/completions API, which is documented here.
name | description | mandatory |
---|---|---|
paths | the list of paths to retrieve and augment the prompt; it can also be a matching query or a vector index name | yes |
attributes | the relevant attributes useful to retrieve and augment the prompt | yes |
question | the user question | yes |
conf | an optional configuration map, please check the next section | no |
name | description | mandatory |
---|---|---|
getLabelTypes | add the label / rel-type names to the info to augment the prompt | no, default … |
embeddings | to search similar embeddings stored into a node vector index (in case of embeddings: "NODE") or a relationship vector index (in case of embeddings: "REL") | no, default … |
topK | number of neighbors to find for each node (in case of embeddings: "NODE") or relationship (in case of embeddings: "REL") | no, default … |
apiKey | OpenAI API key | in case … |
prompt | the base prompt to be augmented with the context | no, default is: "You are a customer service agent that helps a customer with answering questions about a service. Use the following context to answer the …" |
Using the apoc.ml.rag procedure we can reduce AI hallucinations (i.e. false or misleading responses) by providing relevant and up-to-date information to the procedure via the 1st parameter.
For example, executing the following procedure (with the gpt-3.5-turbo model, last updated in January 2022) produces a hallucination:
CALL apoc.ml.openai.chat([
{role:"user", content: "Which athletes won the gold medal in mixed doubles's curling at the 2022 Winter Olympics?"}
], $apiKey)
value |
---|
The gold medal in curling at the 2022 Winter Olympics was won by the Swedish men’s team and the Russian women’s team. |
So, we can use the RAG technique to provide real results. For example, with the following dataset (with data taken from this Wikipedia page):
CREATE (mixed2022:Discipline {title:"Mixed doubles's curling", year: 2022})
WITH mixed2022
CREATE (:Athlete {name: 'Stefania Constantini', country: 'Italy', irrelevant: 'asdasd'})-[:HAS_MEDAL {medal: 'Gold', irrelevant2: 'asdasd'}]->(mixed2022)
CREATE (:Athlete {name: 'Amos Mosaner', country: 'Italy', irrelevant: 'qweqwe'})-[:HAS_MEDAL {medal: 'Gold', irrelevant2: 'rwerew'}]->(mixed2022)
CREATE (:Athlete {name: 'Kristin Skaslien', country: 'Norway', irrelevant: 'dfgdfg'})-[:HAS_MEDAL {medal: 'Silver', irrelevant2: 'gdfg'}]->(mixed2022)
CREATE (:Athlete {name: 'Magnus Nedregotten', country: 'Norway', irrelevant: 'xcvxcv'})-[:HAS_MEDAL {medal: 'Silver', irrelevant2: 'asdasd'}]->(mixed2022)
CREATE (:Athlete {name: 'Almida de Val', country: 'Sweden', irrelevant: 'rtyrty'})-[:HAS_MEDAL {medal: 'Bronze', irrelevant2: 'bfbfb'}]->(mixed2022)
CREATE (:Athlete {name: 'Oskar Eriksson', country: 'Sweden', irrelevant: 'qwresdc'})-[:HAS_MEDAL {medal: 'Bronze', irrelevant2: 'juju'}]->(mixed2022)
we can execute:
MATCH path=(:Athlete)-[:HAS_MEDAL]->(:Discipline)
WITH collect(path) AS paths
CALL apoc.ml.rag(paths,
["name", "country", "medal", "title", "year"],
"Which athletes won the gold medal in mixed doubles's curling at the 2022 Winter Olympics?",
{apiKey: $apiKey}
) YIELD value
RETURN value
value |
---|
The gold medal in curling at the 2022 Winter Olympics was won by Stefania Constantini and Amos Mosaner from Italy. |
or:
MATCH path=(:Athlete)-[:HAS_MEDAL]->(:Discipline)
WITH collect(path) AS paths
CALL apoc.ml.rag(paths,
["name", "country", "medal", "title", "year"],
"Which athletes won the silver medal in mixed doubles's curling at the 2022 Winter Olympics?",
{apiKey: $apiKey}
) YIELD value
RETURN value
value |
---|
The silver medal in curling at the 2022 Winter Olympics was won by Kristin Skaslien and Magnus Nedregotten from Norway. |
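To make the retrieve, augment, and generate steps more concrete, here is a hand-rolled sketch that does roughly the same thing with apoc.ml.openai.chat. It is only an illustration and not necessarily how apoc.ml.rag is implemented internally: the sentence layout of the context and the system prompt wording are assumptions, and apoc.text.join is used just to concatenate the facts.

```cypher
// Retrieve: read only the relevant attributes from the graph
MATCH (a:Athlete)-[r:HAS_MEDAL]->(d:Discipline)
WITH collect(
  a.name + " (" + a.country + ") won the " + r.medal + " medal in " + d.title + ", " + toString(d.year)
) AS facts
// Augment: join the facts into one context block (hypothetical layout)
WITH apoc.text.join(facts, "\n") AS context
// Generate: send the context and the question to the chat/completions API
CALL apoc.ml.openai.chat([
  {role: "system", content: "Use the following context to answer the user question. Context:\n" + context},
  {role: "user", content: "Which athletes won the gold medal in mixed doubles's curling at the 2022 Winter Olympics?"}
], $apiKey) YIELD value
RETURN value
```

The irrelevant and irrelevant2 properties never reach the prompt here, which is the role the attributes parameter plays in apoc.ml.rag.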
We can also pass a string query returning paths/relationships/nodes, for example:
CALL apoc.ml.rag("MATCH path=(:Athlete)-[:HAS_MEDAL]->(:Discipline) WITH collect(path) AS paths",
["name", "country", "medal", "title", "year"],
"Which athletes won the gold medal in mixed doubles's curling at the 2022 Winter Olympics?",
{apiKey: $apiKey}
) YIELD value
RETURN value
value |
---|
The gold medal in curling at the 2022 Winter Olympics was won by Stefania Constantini and Amos Mosaner from Italy. |
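The conf map accepts more than apiKey. The following sketch combines the getLabelTypes and prompt entries from the configuration table with the same dataset. It assumes getLabelTypes takes a boolean, and the prompt text is a hypothetical replacement for the default base prompt; the exact answer depends on the model.

```cypher
MATCH path=(:Athlete)-[:HAS_MEDAL]->(:Discipline)
WITH collect(path) AS paths
CALL apoc.ml.rag(paths,
  ["name", "country", "medal", "title", "year"],
  "Which athletes won the bronze medal in mixed doubles's curling at the 2022 Winter Olympics?",
  {
    apiKey: $apiKey,
    // assumption: boolean flag that adds label / rel-type names to the context
    getLabelTypes: true,
    // hypothetical base prompt, augmented by the procedure with the retrieved context
    prompt: "You are a sports statistician. Answer only from the following context and say you don't know when the context is insufficient."
  }
) YIELD value
RETURN value
```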
We can also pass a vector index name as the 1st parameter, in case we stored useful info in nodes with embeddings. For example, given this node vector index:
CREATE VECTOR INDEX `rag-embeddings`
FOR (n:RagEmbedding) ON (n.embedding)
OPTIONS {indexConfig: {
`vector.dimensions`: 1536,
`vector.similarity_function`: 'cosine'
}}
and some (:RagEmbedding) nodes with a text property, we can execute:
CALL apoc.ml.rag("rag-embeddings",
["text"],
"Which athletes won the gold medal in mixed doubles's curling at the 2022 Winter Olympics?",
{apiKey: $apiKey, embeddings: "NODE", topK: 20}
) YIELD value
RETURN value
or, with a relationship vector index:
CREATE VECTOR INDEX `rag-rel-embeddings`
FOR ()-[r:RAG_EMBEDDING]-() ON (r.embedding)
OPTIONS {indexConfig: {
`vector.dimensions`: 1536,
`vector.similarity_function`: 'cosine'
}}
and some [:RAG_EMBEDDING] relationships with a text property, we can execute:
CALL apoc.ml.rag("rag-rel-embeddings",
["text"],
"Which athletes won the gold medal in mixed doubles's curling at the 2022 Winter Olympics?",
{apiKey: $apiKey, embeddings: "REL", topK: 20}
) YIELD value
RETURN value
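Both vector index examples assume that the embeddings already exist. As a minimal sketch, the node variant could be populated with apoc.ml.openai.embedding as shown below. The two texts are hypothetical documents built from the dataset above, and the sketch assumes the default embedding model returns 1536-dimensional vectors, matching the vector.dimensions setting of the index.

```cypher
// Hypothetical source texts; replace with your own documents
WITH [
  "Stefania Constantini and Amos Mosaner (Italy) won gold in mixed doubles curling at the 2022 Winter Olympics.",
  "Kristin Skaslien and Magnus Nedregotten (Norway) won silver in mixed doubles curling at the 2022 Winter Olympics."
] AS texts
// One row is returned per input text
CALL apoc.ml.openai.embedding(texts, $apiKey, {}) YIELD index, text, embedding
// Store the text and its vector in the property covered by the rag-embeddings index
CREATE (n:RagEmbedding {text: text, embedding: embedding})
RETURN count(n) AS created
```

To check what a similarity search over the index would retrieve for a question, the index can be queried directly. This assumes a Neo4j version that provides db.index.vector.queryNodes.

```cypher
// Embed the question with the same model used for the stored texts
CALL apoc.ml.openai.embedding(
  ["Which athletes won the gold medal in mixed doubles's curling at the 2022 Winter Olympics?"],
  $apiKey, {}
) YIELD embedding
// Fetch the 20 nearest nodes, mirroring topK: 20 in the example above
CALL db.index.vector.queryNodes("rag-embeddings", 20, embedding) YIELD node, score
RETURN node.text AS text, score
ORDER BY score DESC
```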