Weaviate
Here is a list of all available Weaviate procedures. Note that the list and the procedure signatures are consistent with those of the other vector databases, such as Qdrant:
name | description |
---|---|
apoc.vectordb.weaviate.createCollection(hostOrKey, collection, similarity, size, $config) | Creates a collection with the name specified in the 2nd parameter and with the specified similarity and size … |
apoc.vectordb.weaviate.deleteCollection(hostOrKey, collection, $config) | Deletes a collection with the name specified in the 2nd parameter. The default endpoint is … |
apoc.vectordb.weaviate.upsert(hostOrKey, collection, vectors, $config) | Upserts, in the collection with the name specified in the 2nd parameter, the vectors [{id: 'id', vector: '<vectorDb>', metadata: '<metadata>'}]. The default endpoint is … |
apoc.vectordb.weaviate.delete(hostOrKey, collection, ids, $config) | Deletes the vectors with the specified ids … |
apoc.vectordb.weaviate.get(hostOrKey, collection, ids, $config) | Gets the vectors with the specified ids … |
apoc.vectordb.weaviate.query(hostOrKey, collection, vector, filter, limit, $config) | Retrieves the closest vectors to the defined vector … |
apoc.vectordb.weaviate.getAndUpdate(hostOrKey, collection, ids, $config) | Gets the vectors with the specified ids … |
apoc.vectordb.weaviate.queryAndUpdate(hostOrKey, collection, vector, filter, limit, $config) | Retrieves the closest vectors to the defined vector … |
where the 1st parameter can be a key defined by the APOC config apoc.weaviate.<key>.host=myHost. With hostOrKey=null, the default is 'http://localhost:8080/v1'.
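The hostOrKey lookup described above can be sketched as follows. This is a hypothetical illustration, not APOC's implementation: the function name resolve_host, the precedence of a configured key over a literal host, and the example URLs are all assumptions.

```python
from typing import Mapping, Optional

# Documented default used when hostOrKey is null
DEFAULT_HOST = "http://localhost:8080/v1"


def resolve_host(host_or_key: Optional[str], apoc_config: Mapping[str, str]) -> str:
    """Hypothetical resolution of the hostOrKey argument (assumed precedence)."""
    if host_or_key is None:
        return DEFAULT_HOST
    # Assumption: a key configured as apoc.weaviate.<key>.host wins over a literal host
    configured = apoc_config.get(f"apoc.weaviate.{host_or_key}.host")
    if configured is not None:
        return configured
    # Otherwise the argument is used as the host URL itself
    return host_or_key


if __name__ == "__main__":
    conf = {"apoc.weaviate.myKey.host": "http://my-weaviate:8080/v1"}  # hypothetical entry
    print(resolve_host(None, conf))  # falls back to the default host
    print(resolve_host("myKey", conf))  # uses the configured host
    print(resolve_host("https://example.weaviate.network", conf))  # literal host
```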
Examples
CALL apoc.vectordb.weaviate.createCollection($host, 'test_collection', 'Cosine', 4, {<optional config>})
vectorizer | invertedIndexConfig | vectorIndexConfig | multiTenancyConfig | vectorIndexType | replicationConfig | shardingConfig | class | properties |
---|---|---|---|---|---|---|---|---|
none | {"bm25": { "b": 0.75, "k1": 1.2 }, "stopwords": { "additions": null, "removals": null, "preset": "en" }, "cleanupIntervalSeconds": 60} | { "ef": -1, "dynamicEfMin": 100, "pq": { "centroids": 256, "trainingLimit": 100000, "encoder": { "type": "kmeans", "distribution": "log-normal" }, "enabled": false, "bitCompression": false, "segments": 0 }, "distance": "cosine", "skip": false, "dynamicEfFactor": 8, "bq": { "enabled": false }, "vectorCacheMaxObjects": 1000000000000, "cleanupIntervalSeconds": 300, "dynamicEfMax": 500, "efConstruction": 128, "flatSearchCutoff": 40000, "maxConnections": 64 } | { "enabled": false } | hnsw | { "factor": 1 } | { "desiredVirtualCount": 128, "desiredCount": 1, "actualCount": 1, "function": "murmur3", "virtualPerPhysical": 128, "strategy": "hash", "actualVirtualCount": 128, "key": "_id" } | TestCollection | null |
Or, with a remote instance that requires an API key:
CALL apoc.vectordb.weaviate.createCollection("https://<weaviateInstanceId>.weaviate.network",
'TestCollection',
'cosine',
4,
{headers: {Authorization: 'Bearer <apiKey>'}})
vectorizer | invertedIndexConfig | vectorIndexConfig | multiTenancyConfig | vectorIndexType | replicationConfig | shardingConfig | class | properties |
---|---|---|---|---|---|---|---|---|
none | {"bm25": { "b": 0.75, "k1": 1.2 }, "stopwords": { "additions": null, "removals": null, "preset": "en" }, "cleanupIntervalSeconds": 60} | { "ef": -1, "dynamicEfMin": 100, "pq": { "centroids": 256, "trainingLimit": 100000, "encoder": { "type": "kmeans", "distribution": "log-normal" }, "enabled": false, "bitCompression": false, "segments": 0 }, "distance": "cosine", "skip": false, "dynamicEfFactor": 8, "bq": { "enabled": false }, "vectorCacheMaxObjects": 1000000000000, "cleanupIntervalSeconds": 300, "dynamicEfMax": 500, "efConstruction": 128, "flatSearchCutoff": 40000, "maxConnections": 64 } | { "enabled": false } | hnsw | { "factor": 1 } | { "desiredVirtualCount": 128, "desiredCount": 1, "actualCount": 1, "function": "murmur3", "virtualPerPhysical": 128, "strategy": "hash", "actualVirtualCount": 128, "key": "_id" } | TestCollection | null |
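The createCollection results above store the similarity argument as "distance": "cosine" in vectorIndexConfig. Weaviate defines cosine distance as 1 minus the cosine similarity, so it ranges from 0 (same direction) to 2 (opposite directions). A minimal sketch of that metric; the function names are illustrative and not part of APOC:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance in [0, 2]: 0 for the same direction, 2 for opposite directions."""
    return 1.0 - cosine_similarity(a, b)


if __name__ == "__main__":
    # The two 4-dimensional vectors used in the upsert example
    berlin = [0.05, 0.61, 0.76, 0.74]
    london = [0.19, 0.81, 0.75, 0.11]
    print(round(cosine_distance(berlin, london), 3))
```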
CALL apoc.vectordb.weaviate.deleteCollection($host, 'test_collection', {<optional config>})
which returns an empty result.
CALL apoc.vectordb.weaviate.upsert($host, 'test_collection',
[
{id: "8ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308", vector: [0.05, 0.61, 0.76, 0.74], metadata: {city: "Berlin", foo: "one"}},
{id: "9ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308", vector: [0.19, 0.81, 0.75, 0.11], metadata: {city: "London", foo: "two"}}
],
{<optional config>})
lastUpdateTimeUnix | vector | id | creationTimeUnix | class | properties |
---|---|---|---|---|---|
1721293838439 | [0.05, 0.61, 0.76, 0.74] | 8ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308 | 1721293838439 | TestCollection | {city: "Berlin", foo: "one"} |
1721293838439 | [0.19, 0.81, 0.75, 0.11] | 9ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308 | 1721293838439 | TestCollection | {city: "London", foo: "two"} |
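Judging from the result columns above, each upserted entry comes back as a Weaviate object whose metadata appears under properties and whose class is the collection name. A minimal sketch of that reshaping over plain dicts; the helper to_weaviate_objects and its dimension check against the collection size are hypothetical and not part of APOC:

```python
from typing import Any, Dict, List


def to_weaviate_objects(class_name: str, vectors: List[Dict[str, Any]], size: int) -> List[Dict[str, Any]]:
    """Reshape upsert entries into objects shaped like the result rows above (hypothetical)."""
    objects = []
    for entry in vectors:
        vector = entry["vector"]
        # Assumption: every vector must match the size passed to createCollection
        if len(vector) != size:
            raise ValueError(f"vector {entry['id']} has {len(vector)} dimensions, expected {size}")
        objects.append({
            "class": class_name,
            "id": entry["id"],
            "vector": list(vector),
            # The metadata map becomes the object's properties
            "properties": dict(entry.get("metadata", {})),
        })
    return objects


if __name__ == "__main__":
    rows = to_weaviate_objects("TestCollection", [
        {"id": "8ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308", "vector": [0.05, 0.61, 0.76, 0.74],
         "metadata": {"city": "Berlin", "foo": "one"}},
        {"id": "9ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308", "vector": [0.19, 0.81, 0.75, 0.11],
         "metadata": {"city": "London", "foo": "two"}},
    ], size=4)
    for row in rows:
        print(row["id"], row["properties"])
```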
CALL apoc.vectordb.weaviate.get($host, 'test_collection', [1,2], {<optional config>})
score | metadata | id | vector | text | entity |
---|---|---|---|---|---|
null | {city: "Berlin", foo: "one"} | null | null | null | null |
null | {city: "Berlin", foo: "two"} | null | null | null | null |
With {allResults: true}, the id and vector columns are returned as well:
CALL apoc.vectordb.weaviate.get($host, 'test_collection', [1,2], {allResults: true, <optional config>})
score | metadata | id | vector | text | entity |
---|---|---|---|---|---|
null | {city: "Berlin", foo: "one"} | 1 | […] | null | null |
null | {city: "Berlin", foo: "two"} | 2 | […] | null | null |
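The two get results differ only in the id and vector columns. A hypothetical sketch of that projection, assuming a stored object is a dict with id, vector and properties; project_row is an illustrative name, and score, text and entity are simply left null, as in the tables above:

```python
from typing import Any, Dict, Optional


def project_row(obj: Dict[str, Any], all_results: bool = False) -> Dict[str, Optional[Any]]:
    """Build a get result row; id and vector are filled only when all_results is True."""
    return {
        "score": None,  # null in both get tables above
        "metadata": obj.get("properties"),
        "id": obj.get("id") if all_results else None,
        "vector": obj.get("vector") if all_results else None,
        "text": None,
        "entity": None,  # mappings are not modeled in this sketch
    }


if __name__ == "__main__":
    stored = {"id": 1, "vector": [0.05, 0.61, 0.76, 0.74], "properties": {"city": "Berlin", "foo": "one"}}
    print(project_row(stored))
    print(project_row(stored, all_results=True))
```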
CALL apoc.vectordb.weaviate.query($host,
'test_collection',
[0.2, 0.1, 0.9, 0.7],
'{operator: Equal, valueString: "London", path: ["city"]}',
5,
{fields: ["city", "foo"], allResults: true, <other optional config>})
score | metadata | id | vector | text |
---|---|---|---|---|
1 | {city: "Berlin", foo: "one"} | 1 | […] | null |
0.1 | {city: "Berlin", foo: "two"} | 2 | […] | null |
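The query call combines a metadata filter, a similarity ranking and a limit. A brute-force sketch of those three steps, assuming a single-key equality filter (the analogue of operator: Equal with path: ["city"]) and cosine similarity as the score. Weaviate itself searches an HNSW index and reports distances, so this only illustrates the semantics; query_vectors is a hypothetical name:

```python
import math
from typing import Any, Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def query_vectors(objects: List[Dict[str, Any]], vector: Sequence[float], limit: int,
                  filter_key: Optional[str] = None, filter_value: Any = None) -> List[Dict[str, Any]]:
    """Brute-force filtered nearest-neighbour search (illustrative only)."""
    # 1. Filter: keep objects whose property equals the requested value
    if filter_key is not None:
        objects = [o for o in objects if o["properties"].get(filter_key) == filter_value]
    # 2. Rank: a higher cosine similarity means a closer vector
    scored = sorted(((cosine_similarity(vector, o["vector"]), o) for o in objects),
                    key=lambda pair: pair[0], reverse=True)
    # 3. Limit: keep at most `limit` rows
    return [{"score": s, "id": o["id"], "metadata": o["properties"]} for s, o in scored[:limit]]


if __name__ == "__main__":
    data = [
        {"id": "berlin", "vector": [0.05, 0.61, 0.76, 0.74], "properties": {"city": "Berlin", "foo": "one"}},
        {"id": "london", "vector": [0.19, 0.81, 0.75, 0.11], "properties": {"city": "London", "foo": "two"}},
    ]
    for row in query_vectors(data, [0.2, 0.1, 0.9, 0.7], 5, "city", "London"):
        print(row["id"], round(row["score"], 3))
```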
We can define a mapping to fetch the associated nodes and relationships, and optionally create them, by leveraging the vector metadata.
For example, if we have created 2 vectors with the above upsert procedure, we can populate some existing nodes (i.e. (:Test {myId: 'one'}) and (:Test {myId: 'two'})):
CALL apoc.vectordb.weaviate.query($host, 'test_collection',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ fields: ["city", "foo"],
mapping: {
embeddingKey: "vect",
nodeLabel: "Test",
entityKey: "myId",
metadataKey: "foo"
}
})
which populates the two nodes as (:Test {myId: 'one', city: 'Berlin', vect: [vector1]}) and (:Test {myId: 'two', city: 'London', vect: [vector2]}), which are returned in the entity column of the result.
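The node mapping can be read as: for each result, find the node with label nodeLabel whose entityKey property equals the metadata value at metadataKey, then copy the remaining metadata and store the vector under embeddingKey. A sketch over in-memory dicts; the function name, the node representation, and skipping the metadataKey field itself (which matches the example nodes above, where foo is not copied) are assumptions:

```python
from typing import Any, Dict, List, Optional

Node = Dict[str, Any]  # hypothetical layout: {"labels": [...], "properties": {...}}


def apply_node_mapping(results: List[Dict[str, Any]], nodes: List[Node],
                       mapping: Dict[str, str]) -> List[Optional[Node]]:
    label = mapping["nodeLabel"]
    entity_key = mapping["entityKey"]
    metadata_key = mapping["metadataKey"]
    embedding_key = mapping["embeddingKey"]
    entities: List[Optional[Node]] = []
    for row in results:
        key_value = row["metadata"].get(metadata_key)
        # Find an existing node with the label whose entityKey matches the metadata value
        node = next((n for n in nodes
                     if label in n["labels"] and n["properties"].get(entity_key) == key_value), None)
        if node is not None:
            # Assumption: the metadataKey field itself is not copied onto the node
            for k, v in row["metadata"].items():
                if k != metadata_key:
                    node["properties"][k] = v
            node["properties"][embedding_key] = row["vector"]
        entities.append(node)  # the entity column; None when nothing matched
    return entities
```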
Alternatively, we can create the nodes if they do not exist, via create: true:
CALL apoc.vectordb.weaviate.query($host, 'test_collection',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ fields: ["city", "foo"],
mapping: {
create: true,
embeddingKey: "vect",
nodeLabel: "Test",
entityKey: "myId",
metadataKey: "foo"
}
})
which creates 2 new nodes as above.
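With create: true, a missing node is created instead of skipped, which can be thought of as a MERGE on nodeLabel and entityKey. A hypothetical sketch of that get-or-create step, reusing an in-memory node layout for illustration (function names, data layout, and skipping the metadataKey field are assumptions):

```python
from typing import Any, Dict, List

Node = Dict[str, Any]  # hypothetical layout: {"labels": [...], "properties": {...}}


def get_or_create_node(nodes: List[Node], label: str, entity_key: str, key_value: Any) -> Node:
    """Return the matching node, creating it first if it does not exist."""
    for node in nodes:
        if label in node["labels"] and node["properties"].get(entity_key) == key_value:
            return node
    node = {"labels": [label], "properties": {entity_key: key_value}}
    nodes.append(node)
    return node


def apply_mapping_with_create(results: List[Dict[str, Any]], nodes: List[Node],
                              mapping: Dict[str, str]) -> List[Node]:
    entities = []
    for row in results:
        node = get_or_create_node(nodes, mapping["nodeLabel"], mapping["entityKey"],
                                  row["metadata"][mapping["metadataKey"]])
        # Copy the other metadata fields and the vector onto the (possibly new) node
        for k, v in row["metadata"].items():
            if k != mapping["metadataKey"]:
                node["properties"][k] = v
        node["properties"][mapping["embeddingKey"]] = row["vector"]
        entities.append(node)
    return entities
```

Running the mapping twice leaves the node count unchanged, since the second pass finds the nodes created by the first.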
Or, we can populate existing relationships (i.e. (:Start)-[:TEST {myId: 'one'}]->(:End) and (:Start)-[:TEST {myId: 'two'}]->(:End)):
CALL apoc.vectordb.weaviate.query($host, 'test_collection',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ fields: ["city", "foo"],
mapping: {
embeddingKey: "vect",
relType: "TEST",
entityKey: "myId",
metadataKey: "foo"
}
})
which populates the two relationships as ()-[:TEST {myId: 'one', city: 'Berlin', vect: [vector1]}]-() and ()-[:TEST {myId: 'two', city: 'London', vect: [vector2]}]-(), which are returned in the entity column of the result.
To optimize performance, we can choose what to … For example, by executing a …
It is possible to execute the vector db procedures together with apoc.ml.rag as follows:
CALL apoc.vectordb.weaviate.getAndUpdate($host, $collection, [<id1>, <id2>], $conf) YIELD score, entity, metadata, id, vector
WITH collect(entity) AS paths
CALL apoc.ml.rag(paths, $attributes, $question, $confPrompt) YIELD value
RETURN value
which returns a string that answers the $question by leveraging the embeddings stored in the vector database.
CALL apoc.vectordb.weaviate.delete($host, 'test_collection', [1,2], {<optional config>})
value |
---|
["1", "2"] |