Vector Databases
APOC provides the following set of procedures, which leverage REST APIs, to interact with vector databases:
- apoc.vectordb.qdrant.* (to interact with Qdrant)
- apoc.vectordb.chroma.* (to interact with Chroma)
- apoc.vectordb.weaviate.* (to interact with Weaviate)
- apoc.vectordb.custom.* (to interact with other vector databases)
- apoc.vectordb.configure (to store host, credentials and mapping into the system database)
All the procedures, except apoc.vectordb.configure, accept as a final parameter a configuration map with these optional parameters:
key | description
headers | additional HTTP headers
method | HTTP method
endpoint | endpoint key, can be used to override the default endpoint created via the 1st parameter of the procedures, to handle potential endpoint changes.
body | body HTTP request
jsonPath | To customize JSONPath parsing of the response. The default is
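As a rough illustration of how the configuration map is passed, the sketch below adds it as the final argument of apoc.vectordb.qdrant.query, following the argument order used in the example at the end of this page. The $host parameter, the collection name and the API key are placeholders, and the 'POST' value for method is an assumption made for illustration.

```cypher
// Sketch only: $host, 'test_collection' and <apiKey> are placeholders
CALL apoc.vectordb.qdrant.query($host, 'test_collection', [0.2, 0.1, 0.9, 0.7], {}, 5,
  {
    headers: {Authorization: 'Bearer <apiKey>'},  // additional HTTP headers
    method: 'POST'                                // HTTP method (assumed value)
  })
```

Any key left out of the map is simply not overridden, since all of the parameters above are optional.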
Besides the above config, the apoc.vectordb.<type>.get and apoc.vectordb.<type>.query procedures accept these additional parameters:
key | description
mapping | to fetch the associated entities and optionally create them. See examples below.
allResults | if true, returns the vector, metadata and text (if present), otherwise returns null values for those columns.
vectorKey, metadataKey, scoreKey, textKey | used with the
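A minimal sketch of these additional parameters, assuming the same apoc.vectordb.qdrant.query argument order used elsewhere on this page. The mapping keys are copied from the configuration example further down; $host and the collection name are placeholders.

```cypher
// Sketch only: $host and 'test_collection' are placeholders
CALL apoc.vectordb.qdrant.query($host, 'test_collection', [0.2, 0.1, 0.9, 0.7], {}, 5,
  {
    allResults: true,  // also return vector, metadata and text instead of nulls
    mapping: { embeddingKey: "vect", nodeLabel: "Test", entityKey: "myId", metadataKey: "foo" }
  })
```

With allResults left out, the vector, metadata and text columns are returned as null, as described in the table above.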
Ad-hoc procedures
See the following pages for more details on specific vector db procedures
Store Vector db info (i.e. apoc.vectordb.configure)
We can save some information in the system database for later reuse, namely the host, login credentials, and mapping.
This information can then be used in the *.get and *.query procedures, except for apoc.vectordb.custom.get.
Therefore, to store the vector info, we can execute CALL apoc.vectordb.configure(vectorName, keyConfig, databaseName, $configMap),
where vectorName can be "QDRANT", "CHROMA", "PINECONE", "MILVUS" or "WEAVIATE".
It indicates which procedures will reuse the stored info, for example apoc.vectordb.qdrant.*, apoc.vectordb.chroma.* and apoc.vectordb.weaviate.* respectively.
Then keyConfig is the configuration name, databaseName is the database where the config will be set,
and finally configMap is a map that can have:
- host is the host base name
- credentialsValue is the API key
- mapping is a map that can be used by the apoc.vectordb.*.getAndUpdate and apoc.vectordb.*.queryAndUpdate procedures

NOTE: this procedure is only executable by a user with admin permissions and against the system database.
For example:
// -- within the system database or using the Cypher clause `USE SYSTEM ..` as a prefix
CALL apoc.vectordb.configure('QDRANT', 'qdrant-config-test', 'neo4j',
  {
    mapping: { embeddingKey: "vect", nodeLabel: "Test", entityKey: "myId", metadataKey: "foo" },
    host: 'custom-host-name',
    credentials: '<apiKey>'
  }
)
and then we can execute, for example, the following procedure (within the neo4j database):
CALL apoc.vectordb.qdrant.query('qdrant-config-test', 'test_collection', [0.2, 0.1, 0.9, 0.7], {}, 5)
instead of:
CALL apoc.vectordb.qdrant.query($host, 'test_collection', [0.2, 0.1, 0.9, 0.7], {}, 5,
  {
    mapping: {
      embeddingKey: "vect",
      nodeLabel: "Test",
      entityKey: "myId",
      metadataKey: "foo"
    },
    headers: {Authorization: 'Bearer <apiKey>'},
    endpoint: 'custom-host-name'
  })