RAG - Retrieval Augmented Generation Procedures
Query with Retrieval-augmented generation (RAG) technique
The apoc.ml.rag procedure takes a list of paths (or, alternatively, a matching query or a vector index name), the relevant attributes and a natural language question,
and creates a prompt that implements the Retrieval-augmented generation (RAG) technique.
See here for more info about the RAG process.
It uses the chat/completions API which is documented here.
| name | description | mandatory |
|---|---|---|
| paths | the list of paths to retrieve and augment the prompt, it can also be a matching query or a vector index name | yes |
| attributes | the relevant attributes useful to retrieve and augment the prompt | yes |
| question | the user question | yes |
| conf | An optional configuration map, please check the next section | no |
| name | description | mandatory |
|---|---|---|
| getLabelTypes | add the label / rel-type names to the info to augment the prompt | no, default |
| embeddings | to search similar embeddings stored into a node vector index (in case of | no, default |
| topK | number of neighbors to find for each node (in case of | no, default |
| apiKey | OpenAI API key | in case |
| prompt | the base prompt to be augmented with the context | no, default is: "You are a customer service agent that helps a customer with answering questions about a service. Use the following context to answer the |
Using the apoc.ml.rag procedure we can reduce AI hallucinations (i.e. false or misleading responses) by providing relevant and up-to-date information to the procedure via the first parameter.
For example, executing the following procedure (with the gpt-3.5-turbo model, last updated in January 2022)
produces a hallucination:
CALL apoc.ml.openai.chat([
    {role:"user", content: "Which athletes won the gold medal in mixed doubles's curling  at the 2022 Winter Olympics?"}
], $apiKey)
| value | 
|---|
The gold medal in curling at the 2022 Winter Olympics was won by the Swedish men’s team and the Russian women’s team.  | 
We can use the RAG technique to obtain correct results instead. For example, with the following dataset (with data taken from this Wikipedia page):
CREATE (mixed2022:Discipline {title:"Mixed doubles's curling", year: 2022})
WITH mixed2022
CREATE (:Athlete {name: 'Stefania Constantini', country: 'Italy', irrelevant: 'asdasd'})-[:HAS_MEDAL {medal: 'Gold', irrelevant2: 'asdasd'}]->(mixed2022)
CREATE (:Athlete {name: 'Amos Mosaner', country: 'Italy', irrelevant: 'qweqwe'})-[:HAS_MEDAL {medal: 'Gold', irrelevant2: 'rwerew'}]->(mixed2022)
CREATE (:Athlete {name: 'Kristin Skaslien', country: 'Norway', irrelevant: 'dfgdfg'})-[:HAS_MEDAL {medal: 'Silver', irrelevant2: 'gdfg'}]->(mixed2022)
CREATE (:Athlete {name: 'Magnus Nedregotten', country: 'Norway', irrelevant: 'xcvxcv'})-[:HAS_MEDAL {medal: 'Silver', irrelevant2: 'asdasd'}]->(mixed2022)
CREATE (:Athlete {name: 'Almida de Val', country: 'Sweden', irrelevant: 'rtyrty'})-[:HAS_MEDAL {medal: 'Bronze', irrelevant2: 'bfbfb'}]->(mixed2022)
CREATE (:Athlete {name: 'Oskar Eriksson', country: 'Sweden', irrelevant: 'qwresdc'})-[:HAS_MEDAL {medal: 'Bronze', irrelevant2: 'juju'}]->(mixed2022)
we can execute:
MATCH path=(:Athlete)-[:HAS_MEDAL]->(:Discipline)
WITH collect(path) AS paths
CALL apoc.ml.rag(paths,
  ["name", "country", "medal", "title", "year"],
  "Which athletes won the gold medal in mixed doubles's curling  at the 2022 Winter Olympics?",
  {apiKey: $apiKey}
) YIELD value
RETURN value
| value | 
|---|
The gold medal in curling at the 2022 Winter Olympics was won by Stefania Constantini and Amos Mosaner from Italy.  | 
or:
MATCH path=(:Athlete)-[:HAS_MEDAL]->(:Discipline)
WITH collect(path) AS paths
CALL apoc.ml.rag(paths,
  ["name", "country", "medal", "title", "year"],
  "Which athletes won the silver medal in mixed doubles's curling  at the 2022 Winter Olympics?",
  {apiKey: $apiKey}
) YIELD value
RETURN value
| value | 
|---|
The silver medal in curling at the 2022 Winter Olympics was won by Kristin Skaslien and Magnus Nedregotten from Norway.  | 
or, setting the model explicitly in the configuration map:
MATCH path=(:Athlete)-[:HAS_MEDAL]->(:Discipline)
WITH collect(path) AS paths
CALL apoc.ml.rag(paths,
  ["name", "country", "medal", "title", "year"],
  "Which athletes won the gold medal in mixed doubles's curling at the 2022 Winter Olympics?",
  {apiKey: $apiKey, model: "gpt-3.5-turbo"}
) YIELD value
RETURN value
| value | 
|---|
The athletes who won the gold medal in mixed doubles curling at the 2022 Winter Olympics were Stefania Constantini and Amos Mosaner from Italy.  | 
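To see which facts the attribute list exposes, the same pattern can be projected with plain Cypher. This is only an inspection aid, assuming the dataset created above. It does not show how apoc.ml.rag formats the context internally.

```cypher
// Project only the properties named in the attributes list;
// the irrelevant / irrelevant2 properties are left out.
MATCH (a:Athlete)-[r:HAS_MEDAL]->(d:Discipline)
RETURN a.name AS name, a.country AS country, r.medal AS medal,
       d.title AS title, d.year AS year
ORDER BY medal, name
```

Running such a query before calling the procedure is a quick way to confirm that the chosen attributes actually contain the answer to the question.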
We can also pass a string query returning paths/relationships/nodes, for example:
CALL apoc.ml.rag("MATCH path=(:Athlete)-[:HAS_MEDAL]->(:Discipline) WITH collect(path) AS paths",
  ["name", "country", "medal", "title", "year"],
  "Which athletes won the gold medal in mixed doubles's curling  at the 2022 Winter Olympics?",
  {apiKey: $apiKey}
) YIELD value
RETURN value
| value | 
|---|
The gold medal in curling at the 2022 Winter Olympics was won by Stefania Constantini and Amos Mosaner from Italy.  | 
Alternatively, we can pass a vector index name as the first parameter, if we have stored useful info in embedding nodes. For example, given this node vector index:
CREATE VECTOR INDEX `rag-embeddings`
FOR (n:RagEmbedding) ON (n.embedding)
OPTIONS {indexConfig: {
 `vector.dimensions`: 1536,
 `vector.similarity_function`: 'cosine'
}}
and some (:RagEmbedding) nodes with a text property, we can execute:
CALL apoc.ml.rag("rag-embeddings",
  ["text"],
  "Which athletes won the gold medal in mixed doubles's curling  at the 2022 Winter Olympics?",
  {apiKey: $apiKey, embeddings: "NODE", topK: 20}
) YIELD value
RETURN value
or, with a relationship vector index:
CREATE VECTOR INDEX `rag-rel-embeddings`
FOR ()-[r:RAG_EMBEDDING]-() ON (r.embedding)
OPTIONS {indexConfig: {
 `vector.dimensions`: 1536,
 `vector.similarity_function`: 'cosine'
}}
and some [:RAG_EMBEDDING] relationships with a text property, we can execute:
CALL apoc.ml.rag("rag-rel-embeddings",
  ["text"],
  "Which athletes won the gold medal in mixed doubles's curling  at the 2022 Winter Olympics?",
  {apiKey: $apiKey, embeddings: "REL", topK: 20}
) YIELD value
RETURN value
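The vector index examples assume that the embedding nodes already exist. A minimal sketch of how the (:RagEmbedding) nodes might be populated follows. It assumes that the apoc.ml.openai.embedding procedure is available and that its default model returns 1536-dimensional vectors matching the index definition. The sample sentence is illustrative text built from the dataset above, not part of the original example.

```cypher
// Illustrative text; in practice it would come from your own documents.
WITH ["Stefania Constantini and Amos Mosaner from Italy won the gold medal in mixed doubles curling at the 2022 Winter Olympics."] AS texts
// Compute one embedding per text (assumption: the default model yields 1536 dimensions).
CALL apoc.ml.openai.embedding(texts, $apiKey, {}) YIELD index, text, embedding
// Store the text and its vector on a node covered by the rag-embeddings index.
CREATE (n:RagEmbedding {text: text, embedding: embedding})
RETURN count(n) AS created
```

If the embedding model produces a different vector size, the vector.dimensions value of the index would need to be adjusted accordingly.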