Weaviate

Here is a list of all available Weaviate procedures. Note that the list and the procedure signatures are consistent with those of the other vector databases, such as Qdrant:

| name | description |
| --- | --- |
| apoc.vectordb.weaviate.info($host, $collectionName, $config) | Gets information about the specified collection, or throws a FileNotFoundException if it does not exist. |
| apoc.vectordb.weaviate.createCollection(hostOrKey, collection, similarity, size, $config) | Creates a collection, with the name specified in the 2nd parameter, and with the specified similarity and size. The default endpoint is <hostOrKey param>/schema. |
| apoc.vectordb.weaviate.deleteCollection(hostOrKey, collection, $config) | Deletes a collection with the name specified in the 2nd parameter. The default endpoint is <hostOrKey param>/schema/<collection param>. |
| apoc.vectordb.weaviate.upsert(hostOrKey, collection, vectors, $config) | Upserts, in the collection with the name specified in the 2nd parameter, the vectors [{id: 'id', vector: '<vectorDb>', metadata: '<metadata>'}]. The default endpoint is <hostOrKey param>/objects. |
| apoc.vectordb.weaviate.delete(hostOrKey, collection, ids, $config) | Deletes the vectors with the specified ids. The default endpoint is <hostOrKey param>/schema. |
| apoc.vectordb.weaviate.get(hostOrKey, collection, ids, $config) | Gets the vectors with the specified ids. The default endpoint is <hostOrKey param>/schema. |
| apoc.vectordb.weaviate.query(hostOrKey, collection, vector, filter, limit, $config) | Retrieves the vectors closest to the given vector, up to limit results, in the collection with the name specified in the 2nd parameter. Note that, besides the common config parameters, this procedure requires a fields: [listOfProperty] config entry, which defines the properties to be retrieved by the GraphQL query running under the hood. The default endpoint is <hostOrKey param>/graphql. |
| apoc.vectordb.weaviate.getAndUpdate(hostOrKey, collection, ids, $config) | Gets the vectors with the specified ids, and optionally creates/updates Neo4j entities. The default endpoint is <hostOrKey param>/schema. |
| apoc.vectordb.weaviate.queryAndUpdate(hostOrKey, collection, vector, filter, limit, $config) | Retrieves the vectors closest to the given vector, up to limit results, in the collection with the name specified in the 2nd parameter, and optionally creates/updates Neo4j entities. Note that, besides the common config parameters, this procedure requires a fields: [listOfProperty] config entry, which defines the properties to be retrieved by the GraphQL query running under the hood. The default endpoint is <hostOrKey param>/graphql. |

In all of the procedures above, the 1st parameter can be either a host or a key defined in the APOC config as apoc.weaviate.<key>.host=myHost. With hostOrKey=null, the default host is 'http://localhost:8080/v1'.
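As a rough illustration of this resolution rule, the following Python sketch models how a hostOrKey value could be turned into a base URL and then into a default endpoint. The helper names, the dictionary standing in for the APOC config, the example.com host, and the fallback of treating an unmatched value as a literal host are all assumptions made for this example; this is not APOC's actual implementation.

```python
DEFAULT_HOST = "http://localhost:8080/v1"


def resolve_host(host_or_key, apoc_config):
    """Hypothetical helper: map a hostOrKey value to a base URL."""
    if host_or_key is None:
        return DEFAULT_HOST
    # Look for a config entry of the form apoc.weaviate.<key>.host
    configured = apoc_config.get(f"apoc.weaviate.{host_or_key}.host")
    # Assumption: a value with no matching key is used as the host itself
    return configured if configured is not None else host_or_key


def default_endpoint(base_url, suffix):
    """Hypothetical helper: append an endpoint suffix such as 'schema'."""
    return f"{base_url.rstrip('/')}/{suffix}"


# Stand-in for the APOC config file; the host below is a placeholder
apoc_config = {"apoc.weaviate.prod.host": "https://weaviate.example.com/v1"}

print(default_endpoint(resolve_host(None, apoc_config), "schema"))
print(default_endpoint(resolve_host("prod", apoc_config), "graphql"))
```

In this model, a null hostOrKey falls back to the local default, while "prod" is looked up through its config key.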

Examples

Get collection info (it leverages this API)
CALL apoc.vectordb.weaviate.info($host, 'test_collection', {<optional config>})
Table 1. Example results
value

{"vectorizer": "none", "invertedIndexConfig": {"bm25": {"b": 0.75, "k1": 1.2}, "stopwords": {"additions": null, "removals": null, "preset": "en"}, "cleanupIntervalSeconds": 60}, "vectorIndexConfig": {"ef": -1, "dynamicEfMin": 100, "pq": {"centroids": 256, "trainingLimit": 100000, "encoder": {"type": "kmeans", "distribution": "log-normal"}, "enabled": false, "bitCompression": false, "segments": 0}, "distance": "cosine", "skip": false, "dynamicEfFactor": 8, "bq": {"enabled": false}, "vectorCacheMaxObjects": 1000000000000, "cleanupIntervalSeconds": 300, "dynamicEfMax": 500, "efConstruction": 128, "flatSearchCutoff": 40000, "maxConnections": 64}, "multiTenancyConfig": {"enabled": false}, "vectorIndexType": "hnsw", "replicationConfig": {"factor": 1}, "shardingConfig": {"desiredVirtualCount": 128, "desiredCount": 1, "actualCount": 1, "function": "murmur3", "virtualPerPhysical": 128, "strategy": "hash", "actualVirtualCount": 128, "key": "_id"}, "class": "TestCollection", "properties": [{"name": "city", "description": "This property was generated by Weaviate’s auto-schema feature on Wed Jul 10 12:50:18 2024", "indexFilterable": true, "tokenization": "word", "indexSearchable": true, "dataType": ["text"]}, {"name": "foo", "description": "This property was generated by Weaviate’s auto-schema feature on Wed Jul 10 12:50:18 2024", "indexFilterable": true, "tokenization": "word", "indexSearchable": true, "dataType": ["text"]}]}

Create a collection (it leverages this API)
CALL apoc.vectordb.weaviate.createCollection($host, 'test_collection', 'Cosine', 4, {<optional config>})
Table 2. Example results
| vectorizer | invertedIndexConfig | vectorIndexConfig | multiTenancyConfig | vectorIndexType | replicationConfig | shardingConfig | class | properties |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| none | {"bm25": { "b": 0.75, "k1": 1.2 }, "stopwords": { "additions": null, "removals": null, "preset": "en" }, "cleanupIntervalSeconds": 60} | { "ef": -1, "dynamicEfMin": 100, "pq": { "centroids": 256, "trainingLimit": 100000, "encoder": { "type": "kmeans", "distribution": "log-normal" }, "enabled": false, "bitCompression": false, "segments": 0 }, "distance": "cosine", "skip": false, "dynamicEfFactor": 8, "bq": { "enabled": false }, "vectorCacheMaxObjects": 1000000000000, "cleanupIntervalSeconds": 300, "dynamicEfMax": 500, "efConstruction": 128, "flatSearchCutoff": 40000, "maxConnections": 64 } | { "enabled": false } | hnsw | { "factor": 1 } | { "desiredVirtualCount": 128, "desiredCount": 1, "actualCount": 1, "function": "murmur3", "virtualPerPhysical": 128, "strategy": "hash", "actualVirtualCount": 128, "key": "_id" } | TestCollection | null |

Create a collection against a remote connection using an API key (see here)
CALL apoc.vectordb.weaviate.createCollection("https://<weaviateInstanceId>.weaviate.network",
    'TestCollection',
    'cosine',
    4,
    {headers: {Authorization: 'Bearer <apiKey>'}})
Table 3. Example results
| vectorizer | invertedIndexConfig | vectorIndexConfig | multiTenancyConfig | vectorIndexType | replicationConfig | shardingConfig | class | properties |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| none | {"bm25": { "b": 0.75, "k1": 1.2 }, "stopwords": { "additions": null, "removals": null, "preset": "en" }, "cleanupIntervalSeconds": 60} | { "ef": -1, "dynamicEfMin": 100, "pq": { "centroids": 256, "trainingLimit": 100000, "encoder": { "type": "kmeans", "distribution": "log-normal" }, "enabled": false, "bitCompression": false, "segments": 0 }, "distance": "cosine", "skip": false, "dynamicEfFactor": 8, "bq": { "enabled": false }, "vectorCacheMaxObjects": 1000000000000, "cleanupIntervalSeconds": 300, "dynamicEfMax": 500, "efConstruction": 128, "flatSearchCutoff": 40000, "maxConnections": 64 } | { "enabled": false } | hnsw | { "factor": 1 } | { "desiredVirtualCount": 128, "desiredCount": 1, "actualCount": 1, "function": "murmur3", "virtualPerPhysical": 128, "strategy": "hash", "actualVirtualCount": 128, "key": "_id" } | TestCollection | null |

Delete a collection (it leverages this API)
CALL apoc.vectordb.weaviate.deleteCollection($host, 'test_collection', {<optional config>})

which returns an empty result.

Upsert vectors (it leverages this API)
CALL apoc.vectordb.weaviate.upsert($host, 'test_collection',
    [
        {id: "8ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308", vector: [0.05, 0.61, 0.76, 0.74], metadata: {city: "Berlin", foo: "one"}},
        {id: "9ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308", vector: [0.19, 0.81, 0.75, 0.11], metadata: {city: "London", foo: "two"}}
    ],
    {<optional config>})
Table 4. Example results
| lastUpdateTimeUnix | vector | id | creationTimeUnix | class | properties |
| --- | --- | --- | --- | --- | --- |
| 1721293838439 | [0.05, 0.61, 0.76, 0.74] | 8ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308 | 1721293838439 | TestCollection | {city: "Berlin", foo: "one"} |
| 1721293838439 | [0.19, 0.81, 0.75, 0.11] | 9ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308 | 1721293838439 | TestCollection | {city: "London", foo: "two"} |
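To make the relationship between the upsert input and the result columns more concrete, here is a minimal Python sketch of how each {id, vector, metadata} entry could be turned into a Weaviate object body with class, id, vector and properties fields. The helper name to_weaviate_object is hypothetical, and the assumption that one object body is built per entry is ours; the procedure's real request construction may differ.

```python
import json


def to_weaviate_object(class_name, item):
    """Hypothetical helper: build one Weaviate object body from an upsert entry."""
    return {
        "class": class_name,
        "id": item["id"],
        "vector": item["vector"],
        # The entry's metadata becomes the object's properties
        "properties": item.get("metadata", {}),
    }


vectors = [
    {"id": "8ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308",
     "vector": [0.05, 0.61, 0.76, 0.74],
     "metadata": {"city": "Berlin", "foo": "one"}},
    {"id": "9ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308",
     "vector": [0.19, 0.81, 0.75, 0.11],
     "metadata": {"city": "London", "foo": "two"}},
]

for entry in vectors:
    print(json.dumps(to_weaviate_object("TestCollection", entry)))
```

The timestamps shown in the result table (lastUpdateTimeUnix, creationTimeUnix) are not part of the input, so the sketch leaves them out.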

Get vectors (it leverages this API)
CALL apoc.vectordb.weaviate.get($host, 'test_collection', [1,2], {<optional config>})
Table 5. Example results
| score | metadata | id | vector | text | entity |
| --- | --- | --- | --- | --- | --- |
| null | {city: "Berlin", foo: "one"} | null | null | null | null |
| null | {city: "Berlin", foo: "two"} | null | null | null | null |

Get vectors with {allResults: true}
CALL apoc.vectordb.weaviate.get($host, 'test_collection', [1,2], {allResults: true, <optional config>})
Table 6. Example results
| score | metadata | id | vector | text | entity |
| --- | --- | --- | --- | --- | --- |
| null | {city: "Berlin", foo: "one"} | 1 | […] | null | null |
| null | {city: "Berlin", foo: "two"} | 2 | […] | null | null |

Query vectors (it leverages this API)
CALL apoc.vectordb.weaviate.query($host,
    'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    '{operator: Equal, valueString: "London", path: ["city"]}',
    5,
    {fields: ["city", "foo"], allResults: true, <other optional config>})
Table 7. Example results
| score | metadata | id | vector | text |
| --- | --- | --- | --- | --- |
| 1 | {city: "Berlin", foo: "one"} | 1 | […] | null |
| 0.1 | {city: "Berlin", foo: "two"} | 2 | […] | null |
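Because the query procedure runs a GraphQL query under the hood, it can help to see what such a query might look like. The Python sketch below assembles a Weaviate-style Get query with nearVector, where and limit arguments from the same inputs as the query example. The helper name and the exact layout of the generated query are assumptions made for illustration; the query APOC actually sends may be shaped differently.

```python
import json


def build_near_vector_query(class_name, vector, where_filter, limit, fields,
                            with_vector=True):
    """Hypothetical helper: build a GraphQL request body for a nearVector search."""
    args = ["limit: %d" % limit,
            "nearVector: {vector: %s}" % json.dumps(vector)]
    if where_filter:
        # The filter is passed through as a GraphQL object literal string
        args.append("where: %s" % where_filter)
    additional = ["id", "distance"]
    if with_vector:
        additional.append("vector")
    query = "{ Get { %s(%s) { %s _additional { %s } } } }" % (
        class_name, ", ".join(args), " ".join(fields), " ".join(additional))
    return {"query": query}


body = build_near_vector_query(
    "TestCollection",
    [0.2, 0.1, 0.9, 0.7],
    '{operator: Equal, valueString: "London", path: ["city"]}',
    5,
    ["city", "foo"],
)
print(body["query"])
```

The fields list controls which properties appear in the selection set, which is why the procedure needs it in addition to the common config parameters.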

We can define a mapping, to fetch the associated nodes and relationships and optionally create them, by leveraging the vector metadata.

For example, if we have created 2 vectors with the above upsert procedure, we can populate some existing nodes (i.e. (:Test {myId: 'one'}) and (:Test {myId: 'two'})):

CALL apoc.vectordb.weaviate.queryAndUpdate($host, 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { fields: ["city", "foo"],
      mapping: {
        embeddingKey: "vect",
        nodeLabel: "Test",
        entityKey: "myId",
        metadataKey: "foo"
      }
    })

which populates the two nodes as: (:Test {myId: 'one', city: 'Berlin', vect: [vector1]}) and (:Test {myId: 'two', city: 'London', vect: [vector2]}), which will be returned in the entity column result.

We can also set the mapping configuration mode to CREATE_IF_MISSING (which creates the nodes if they do not exist), READ_ONLY (to search for nodes/rels without making updates) or UPDATE_EXISTING (the default behavior):

CALL apoc.vectordb.weaviate.queryAndUpdate($host, 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { fields: ["city", "foo"],
      mapping: {
        mode: "CREATE_IF_MISSING",
        embeddingKey: "vect",
        nodeLabel: "Test",
        entityKey: "myId",
        metadataKey: "foo"
      }
    })

which creates 2 new nodes as above.
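The three mapping modes can be summarised with a small in-memory model. The Python sketch below treats nodes as plain dictionaries and applies a mapping to a list of vector results: the value of metadataKey is matched against the entityKey property, the remaining metadata is copied onto the entity, and the vector is stored under embeddingKey. The function apply_mapping and the dictionary-based node representation are hypothetical and only mirror the behavior described above; they do not reproduce APOC's internals.

```python
def apply_mapping(nodes, results, mapping):
    """Hypothetical model of the mapping config, with nodes as plain dicts."""
    mode = mapping.get("mode", "UPDATE_EXISTING")
    entity_key = mapping["entityKey"]
    metadata_key = mapping["metadataKey"]
    embedding_key = mapping.get("embeddingKey")
    entities = []
    for result in results:
        metadata = dict(result["metadata"])
        # The metadataKey value identifies the entity via its entityKey property
        key_value = metadata.pop(metadata_key)
        node = next((n for n in nodes if n.get(entity_key) == key_value), None)
        if node is None and mode == "CREATE_IF_MISSING":
            node = {entity_key: key_value}
            nodes.append(node)
        if node is not None and mode != "READ_ONLY":
            node.update(metadata)
            if embedding_key:
                node[embedding_key] = result["vector"]
        entities.append(node)
    return entities


results = [
    {"metadata": {"city": "Berlin", "foo": "one"}, "vector": [0.05, 0.61, 0.76, 0.74]},
    {"metadata": {"city": "London", "foo": "two"}, "vector": [0.19, 0.81, 0.75, 0.11]},
]
mapping = {"mode": "CREATE_IF_MISSING", "embeddingKey": "vect",
           "nodeLabel": "Test", "entityKey": "myId", "metadataKey": "foo"}

graph = [{"myId": "one"}]  # only one of the two nodes exists beforehand
for entity in apply_mapping(graph, results, mapping):
    print(entity)
```

With the default UPDATE_EXISTING mode, the missing node would not be created and its entity would be empty in this model; with READ_ONLY, existing nodes would be returned unchanged.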

Or, we can populate existing relationships (i.e. (:Start)-[:TEST {myId: 'one'}]->(:End) and (:Start)-[:TEST {myId: 'two'}]->(:End)):

CALL apoc.vectordb.weaviate.queryAndUpdate($host, 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { fields: ["city", "foo"],
      mapping: {
        embeddingKey: "vect",
        relType: "TEST",
        entityKey: "myId",
        metadataKey: "foo"
      }
    })

which populates the two relationships as: ()-[:TEST {myId: 'one', city: 'Berlin', vect: [vector1]}]-() and ()-[:TEST {myId: 'two', city: 'London', vect: [vector2]}]-(), which will be returned in the entity column result.

We can also use mapping with the apoc.vectordb.weaviate.query procedure, to search for nodes/rels matching the label/type and metadataKey, without making updates (i.e. equivalent to the *.queryAndUpdate procedure with a mapping config having mode: "READ_ONLY").

For example, with the previous relationships, we can execute the following procedure, which just returns the relationships in the column rel:

CALL apoc.vectordb.weaviate.query($host, 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { fields: ["city", "foo"],
      mapping: {
        relType: "TEST",
        entityKey: "myId",
        metadataKey: "foo"
      }
    })

We can use mapping with the apoc.vectordb.weaviate.get* procedures as well.

To optimize performance, we can choose what to YIELD with the apoc.vectordb.weaviate.query and the apoc.vectordb.weaviate.get procedures.

For example, by executing CALL apoc.vectordb.weaviate.query(…) YIELD metadata, score, id, the REST API request will include {"with_payload": false, "with_vectors": false}, so that the values we do not need are not returned.

It is possible to execute vector db procedures together with apoc.ml.rag as follows:

CALL apoc.vectordb.weaviate.getAndUpdate($host, $collection, [<id1>, <id2>], $conf) YIELD score, node, metadata, id, vector
WITH collect(node) as paths
CALL apoc.ml.rag(paths, $attributes, $question, $confPrompt) YIELD value
RETURN value

which returns a string that answers the $question by leveraging the embeddings stored in the vector db.

Delete vectors (it leverages this API)
CALL apoc.vectordb.weaviate.delete($host, 'test_collection', [1,2], {<optional config>})
Table 8. Example results
value

["1", "2"]