Weaviate

The following Weaviate procedures are available. Their names and signatures are consistent with those of the other vector databases, such as Qdrant:

name description

apoc.vectordb.weaviate.createCollection(hostOrKey, collection, similarity, size, $config)

Creates a collection, with the name specified in the 2nd parameter, and with the specified similarity and size. The default endpoint is <hostOrKey param>/schema.

apoc.vectordb.weaviate.deleteCollection(hostOrKey, collection, $config)

Deletes a collection with the name specified in the 2nd parameter. The default endpoint is <hostOrKey param>/schema/<collection param>.

apoc.vectordb.weaviate.upsert(hostOrKey, collection, vectors, $config)

Upserts, in the collection with the name specified in the 2nd parameter, the vectors [{id: 'id', vector: '<vectorDb>', metadata: '<metadata>'}]. The default endpoint is <hostOrKey param>/objects.

apoc.vectordb.weaviate.delete(hostOrKey, collection, ids, $config)

Deletes the vectors with the specified ids. The default endpoint is <hostOrKey param>/schema.

apoc.vectordb.weaviate.get(hostOrKey, collection, ids, $config)

Gets the vectors with the specified ids. The default endpoint is <hostOrKey param>/schema.

apoc.vectordb.weaviate.query(hostOrKey, collection, vector, filter, limit, $config)

Retrieves the vectors closest to the given vector, up to the given limit of results, from the collection with the name specified in the 2nd parameter. Note that, besides the common config parameters, this procedure requires a fields: [listOfProperty] config, which defines the properties to retrieve through the GraphQL query that runs under the hood. The default endpoint is <hostOrKey param>/graphql.

apoc.vectordb.weaviate.getAndUpdate(hostOrKey, collection, ids, $config)

Gets the vectors with the specified ids, and optionally creates/updates Neo4j entities. The default endpoint is <hostOrKey param>/schema.

apoc.vectordb.weaviate.queryAndUpdate(hostOrKey, collection, vector, filter, limit, $config)

Retrieves the vectors closest to the given vector, up to the given limit of results, from the collection with the name specified in the 2nd parameter, and optionally creates/updates Neo4j entities. Note that, besides the common config parameters, this procedure requires a fields: [listOfProperty] config, which defines the properties to retrieve through the GraphQL query that runs under the hood. The default endpoint is <hostOrKey param>/graphql.

In all of these procedures, the 1st parameter can be either a host URL or a key defined in the APOC config as apoc.weaviate.<key>.host=myHost. With hostOrKey=null, the default host is 'http://localhost:8080/v1'.
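As a minimal sketch of the two ways to avoid passing a full URL, the calls below assume a hypothetical key named myInstance registered in apoc.conf; the key name and the host value in the comment are illustrative assumptions, not values from this page.

```cypher
// Assumed apoc.conf entry (the key name "myInstance" is hypothetical):
//   apoc.weaviate.myInstance.host=http://localhost:8080/v1

// Pass the key, instead of a full URL, as the 1st parameter.
CALL apoc.vectordb.weaviate.get('myInstance', 'test_collection', [1, 2], {});

// Pass null to fall back to the default 'http://localhost:8080/v1'.
CALL apoc.vectordb.weaviate.get(null, 'test_collection', [1, 2], {});
```

Keeping the host behind a key means queries do not need to change when the Weaviate instance moves.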

Examples

Create a collection (it leverages this API)
CALL apoc.vectordb.weaviate.createCollection($host, 'test_collection', 'Cosine', 4, {<optional config>})
Table 1. Example results
vectorizer: none
invertedIndexConfig: {"bm25": { "b": 0.75, "k1": 1.2 }, "stopwords": { "additions": null, "removals": null, "preset": "en" }, "cleanupIntervalSeconds": 60}
vectorIndexConfig: { "ef": -1, "dynamicEfMin": 100, "pq": { "centroids": 256, "trainingLimit": 100000, "encoder": { "type": "kmeans", "distribution": "log-normal" }, "enabled": false, "bitCompression": false, "segments": 0 }, "distance": "cosine", "skip": false, "dynamicEfFactor": 8, "bq": { "enabled": false }, "vectorCacheMaxObjects": 1000000000000, "cleanupIntervalSeconds": 300, "dynamicEfMax": 500, "efConstruction": 128, "flatSearchCutoff": 40000, "maxConnections": 64 }
multiTenancyConfig: { "enabled": false }
vectorIndexType: hnsw
replicationConfig: { "factor": 1 }
shardingConfig: { "desiredVirtualCount": 128, "desiredCount": 1, "actualCount": 1, "function": "murmur3", "virtualPerPhysical": 128, "strategy": "hash", "actualVirtualCount": 128, "key": "_id" }
class: TestCollection
properties: null

Create a collection against a remote connection using an API key (see here)
CALL apoc.vectordb.weaviate.createCollection("https://<weaviateInstanceId>.weaviate.network",
    'TestCollection',
    'cosine',
    4,
    {headers: {Authorization: 'Bearer <apiKey>'}})
Table 2. Example results
vectorizer: none
invertedIndexConfig: {"bm25": { "b": 0.75, "k1": 1.2 }, "stopwords": { "additions": null, "removals": null, "preset": "en" }, "cleanupIntervalSeconds": 60}
vectorIndexConfig: { "ef": -1, "dynamicEfMin": 100, "pq": { "centroids": 256, "trainingLimit": 100000, "encoder": { "type": "kmeans", "distribution": "log-normal" }, "enabled": false, "bitCompression": false, "segments": 0 }, "distance": "cosine", "skip": false, "dynamicEfFactor": 8, "bq": { "enabled": false }, "vectorCacheMaxObjects": 1000000000000, "cleanupIntervalSeconds": 300, "dynamicEfMax": 500, "efConstruction": 128, "flatSearchCutoff": 40000, "maxConnections": 64 }
multiTenancyConfig: { "enabled": false }
vectorIndexType: hnsw
replicationConfig: { "factor": 1 }
shardingConfig: { "desiredVirtualCount": 128, "desiredCount": 1, "actualCount": 1, "function": "murmur3", "virtualPerPhysical": 128, "strategy": "hash", "actualVirtualCount": 128, "key": "_id" }
class: TestCollection
properties: null

Delete a collection (it leverages this API)
CALL apoc.vectordb.weaviate.deleteCollection($host, 'test_collection', {<optional config>})

which returns an empty result.

Upsert vectors (it leverages this API)
CALL apoc.vectordb.weaviate.upsert($host, 'test_collection',
    [
        {id: "8ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308", vector: [0.05, 0.61, 0.76, 0.74], metadata: {city: "Berlin", foo: "one"}},
        {id: "9ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308", vector: [0.19, 0.81, 0.75, 0.11], metadata: {city: "London", foo: "two"}}
    ],
    {<optional config>})
Table 3. Example results
lastUpdateTimeUnix | vector | id | creationTimeUnix | class | properties
1721293838439 | [0.05, 0.61, 0.76, 0.74] | 8ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308 | 1721293838439 | TestCollection | {city: "Berlin", foo: "one"}
1721293838439 | [0.19, 0.81, 0.75, 0.11] | 9ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308 | 1721293838439 | TestCollection | {city: "London", foo: "two"}

Get vectors (it leverages this API)
CALL apoc.vectordb.weaviate.get($host, 'test_collection', [1,2], {<optional config>})
Table 4. Example results
score | metadata | id | vector | text | entity
null | {city: "Berlin", foo: "one"} | null | null | null | null
null | {city: "London", foo: "two"} | null | null | null | null

Get vectors with {allResults: true}
CALL apoc.vectordb.weaviate.get($host, 'test_collection', [1,2], {allResults: true, <optional config>})
Table 5. Example results
score | metadata | id | vector | text | entity
null | {city: "Berlin", foo: "one"} | 1 | […] | null | null
null | {city: "London", foo: "two"} | 2 | […] | null | null
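The upsert example above stores its vectors under UUID ids, so a call that fetches those same two objects might look like the sketch below. It assumes the collection and ids from the upsert example, and it yields only columns listed in the results table.

```cypher
// Fetch the two objects created by the upsert example, using their UUID ids.
CALL apoc.vectordb.weaviate.get($host, 'test_collection',
    ["8ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308", "9ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308"],
    {allResults: true})
YIELD id, metadata, vector
// metadata is a map, so individual properties can be read with dot notation.
RETURN id, metadata.city AS city, vector
```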

Query vectors (it leverages this API)
CALL apoc.vectordb.weaviate.query($host,
    'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    '{operator: Equal, valueString: "London", path: ["city"]}',
    5,
    {fields: ["city", "foo"], allResults: true, <other optional config>})
Table 6. Example results
score | metadata | id | vector | text
1 | {city: "Berlin", foo: "one"} | 1 | […] | null
0.1 | {city: "London", foo: "two"} | 2 | […] | null

We can define a mapping that uses the vector metadata to fetch the associated nodes and relationships, and optionally to create them.

For example, if we have created 2 vectors with the above upsert procedure, we can populate some existing nodes (i.e. (:Test {myId: 'one'}) and (:Test {myId: 'two'})):

CALL apoc.vectordb.weaviate.queryAndUpdate($host, 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { fields: ["city", "foo"],
      mapping: {
        embeddingKey: "vect",
        nodeLabel: "Test",
        entityKey: "myId",
        metadataKey: "foo"
      }
    })

which populates the two nodes as: (:Test {myId: 'one', city: 'Berlin', vect: [vector1]}) and (:Test {myId: 'two', city: 'London', vect: [vector2]}), which will be returned in the entity column result.
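To check the outcome of the mapping, a query along the following lines could be run afterwards. This is only a sketch: it assumes the Test label and the myId, city and vect property names used in the mapping example above.

```cypher
// Inspect the nodes that the mapping is expected to have populated.
MATCH (n:Test)
WHERE n.myId IN ['one', 'two']
RETURN n.myId AS myId, n.city AS city, n.vect AS vect
ORDER BY myId
```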

Alternatively, we can create the nodes if they do not already exist, via create: true:

CALL apoc.vectordb.weaviate.queryAndUpdate($host, 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { fields: ["city", "foo"],
      mapping: {
        create: true,
        embeddingKey: "vect",
        nodeLabel: "Test",
        entityKey: "myId",
        metadataKey: "foo"
      }
    })

which creates 2 new nodes as above.

Or, we can populate existing relationships (i.e. (:Start)-[:TEST {myId: 'one'}]->(:End) and (:Start)-[:TEST {myId: 'two'}]->(:End)):

CALL apoc.vectordb.weaviate.queryAndUpdate($host, 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { fields: ["city", "foo"],
      mapping: {
        embeddingKey: "vect",
        relType: "TEST",
        entityKey: "myId",
        metadataKey: "foo"
      }
    })

which populates the two relationships as: ()-[:TEST {myId: 'one', city: 'Berlin', vect: [vector1]}]-() and ()-[:TEST {myId: 'two', city: 'London', vect: [vector2]}]-(), which will be returned in the entity column result.

To optimize performance, we can choose what to YIELD with the apoc.vectordb.weaviate.query and the apoc.vectordb.weaviate.get procedures.

For example, when executing CALL apoc.vectordb.weaviate.query(…​) YIELD metadata, score, id, the REST API request will include {"with_payload": false, "with_vectors": false}, so that values we do not need are not returned.
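A complete call that yields only the needed columns could be sketched as follows, reusing the collection, query vector and fields from the query examples above. The RETURN clause is an illustrative assumption about how the yielded values might be consumed.

```cypher
// Yield only metadata, score and id; vector and text are not requested.
CALL apoc.vectordb.weaviate.query($host, 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    {fields: ["city", "foo"]})
YIELD metadata, score, id
RETURN id, score, metadata.city AS city
ORDER BY score DESC
```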

It is possible to execute vector db procedures together with apoc.ml.rag as follows:

CALL apoc.vectordb.weaviate.getAndUpdate($host, $collection, [<id1>, <id2>], $conf) YIELD score, node, metadata, id, vector
WITH collect(node) as paths
CALL apoc.ml.rag(paths, $attributes, $question, $confPrompt) YIELD value
RETURN value

which returns a string that answers the $question by leveraging the embeddings stored in the vector db.

Delete vectors (it leverages this API)
CALL apoc.vectordb.weaviate.delete($host, 'test_collection', [1,2], {<optional config>})
Table 7. Example results
value

["1", "2"]