Pinecone
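In Pinecone, a collection is a static, non-queryable copy of an index. Therefore, unlike with other vector databases, the Pinecone procedures work on indexes instead of collections. However, the vectordb procedures that handle CRUD operations on collections are usually named createCollection and deleteCollection, so the Pinecone procedures keep those names even though they create and delete indexes.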
Here is a list of all available Pinecone procedures:
name | description |
---|---|
apoc.vectordb.pinecone.info(hostOrKey, index, $config) |
Gets information about the specified existing index, or throws a 404 error if it does not exist |
apoc.vectordb.pinecone.createCollection(hostOrKey, index, similarity, size, $config) |
Creates an index with the name specified in the 2nd parameter and with the specified similarity and size |
apoc.vectordb.pinecone.deleteCollection(hostOrKey, index, $config) |
Deletes an index with the name specified in the 2nd parameter.
The default endpoint is |
apoc.vectordb.pinecone.upsert(hostOrKey, index, vectors, $config) |
Upserts, in the index with the name specified in the 2nd parameter, the vectors [{id: 'id', vector: '<vectorDb>', metadata: '<metadata>'}].
The default endpoint is |
apoc.vectordb.pinecone.delete(hostOrKey, index, ids, $config) |
Deletes the vectors with the specified ids |
apoc.vectordb.pinecone.get(hostOrKey, index, ids, $config) |
Gets the vectors with the specified ids |
apoc.vectordb.pinecone.getAndUpdate(hostOrKey, index, ids, $config) |
Gets the vectors with the specified ids |
apoc.vectordb.pinecone.query(hostOrKey, index, vector, filter, limit, $config) |
Retrieves the closest vectors to the defined vector |
apoc.vectordb.pinecone.queryAndUpdate(hostOrKey, index, vector, filter, limit, $config) |
Retrieves the closest vectors to the defined vector |
where the 1st parameter can be a key defined by the apoc config apoc.pinecone.<key>.host=myHost. The default hostOrKey is "https://api.pinecone.io", so in general it can be null with the createCollection and deleteCollection procedures. With the other procedures, it must be equal to the host name of the index, that is, the one indicated in the Pinecone dashboard.
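As a minimal sketch of the key-based form, assume a key called myKey has been defined in apoc.conf. The key name is hypothetical, and the host below is only a placeholder for the one shown in the Pinecone dashboard. The config map is left empty for brevity.

```cypher
// Assumed line in apoc.conf (key name and host are placeholders):
//   apoc.pinecone.myKey.host=https://test-index-ilx67g5.svc.aped-4627-b74a.pinecone.io

// The key is passed as the 1st parameter in place of the full host URL.
CALL apoc.vectordb.pinecone.get('myKey', 'test-index', ['1', '2'], {})
```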
Examples
The following examples assume we want to create and manage an index called test-index.
CALL apoc.vectordb.pinecone.info(hostOrKey, 'test-index', {<optional config>})
value |
---|
{ "dimension": 3, "environment": "us-east1-gcp", "name": "tiny-index", "size": 3126700, "status": "Ready", "vector_count": 99 } |
CALL apoc.vectordb.pinecone.createCollection(null, 'test-index', 'cosine', 4, {<optional config>})
CALL apoc.vectordb.pinecone.deleteCollection(null, 'test-index', {<optional config>})
CALL apoc.vectordb.pinecone.upsert('https://test-index-ilx67g5.svc.aped-4627-b74a.pinecone.io',
'test-index',
[
{id: '1', vector: [0.05, 0.61, 0.76, 0.74], metadata: {city: "Berlin", foo: "one"}},
{id: '2', vector: [0.19, 0.81, 0.75, 0.11], metadata: {city: "London", foo: "two"}}
],
{<optional config>})
CALL apoc.vectordb.pinecone.get($host, 'test-index', ['1','2'], {<optional config>})
score | metadata | id | vector | text | entity |
---|---|---|---|---|---|
null | {city: "Berlin", foo: "one"} | null | null | null | null |
null | {city: "London", foo: "two"} | null | null | null | null |
With {allResults: true}, the id and vector columns are populated as well:
CALL apoc.vectordb.pinecone.get($host, 'test-index', ['1','2'], {allResults: true, <optional config>})
score | metadata | id | vector | text | entity |
---|---|---|---|---|---|
null | {city: "Berlin", foo: "one"} | 1 | […] | null | null |
null | {city: "London", foo: "two"} | 2 | […] | null | null |
CALL apoc.vectordb.pinecone.query($host,
'test-index',
[0.2, 0.1, 0.9, 0.7],
{ city: { `$eq`: "London" } },
5,
{allResults: true, <optional config>})
score | metadata | id | vector | text | entity |
---|---|---|---|---|---|
1 | {city: "Berlin", foo: "one"} | 1 | […] | null | null |
0.1 | {city: "London", foo: "two"} | 2 | […] | null | null |
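The filter in the query above is a plain map, so other Pinecone metadata operators can presumably be written the same way as `$eq`. The sketch below assumes Pinecone's `$in` operator and reuses the vectors upserted earlier. The YIELD columns are taken from the result table, and allResults is set so that the id column is populated.

```cypher
CALL apoc.vectordb.pinecone.query($host,
    'test-index',
    [0.2, 0.1, 0.9, 0.7],
    { city: { `$in`: ["London", "Berlin"] } },  // match vectors from either city
    5,
    {allResults: true})
YIELD score, id, metadata
RETURN id, metadata.city AS city, score
ORDER BY score DESC
```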
We can define a mapping to auto-create one or more nodes and relationships by leveraging the vector metadata.
For example, if we have created 2 vectors with the upsert procedure above, we can populate some existing nodes (i.e. (:Test {myId: 'one'}) and (:Test {myId: 'two'})):
CALL apoc.vectordb.pinecone.queryAndUpdate($host, 'test-index',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
embeddingKey: "vect",
nodeLabel: "Test",
entityKey: "myId",
metadataKey: "foo"
}
})
This populates the two nodes as (:Test {myId: 'one', city: 'Berlin', vect: [vector1]}) and (:Test {myId: 'two', city: 'London', vect: [vector2]}), which are returned in the entity column of the result.
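To reproduce this on an empty database, the two existing nodes would have to be created before the call, and the outcome can be inspected afterwards. A minimal sketch in plain Cypher, assuming the property names used in the mapping above:

```cypher
// Before the queryAndUpdate call: create the nodes that the mapping matches on myId.
CREATE (:Test {myId: 'one'}), (:Test {myId: 'two'});

// After the call: check that the metadata and the embedding were written to the nodes.
MATCH (n:Test)
WHERE n.myId IN ['one', 'two']
RETURN n.myId AS myId, n.city AS city, n.vect AS vect;
```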
We can also set the mapping configuration mode to CREATE_IF_MISSING (which creates the nodes if they do not exist), READ_ONLY (to search for nodes/rels without making updates) or UPDATE_EXISTING (the default behavior):
CALL apoc.vectordb.pinecone.queryAndUpdate($host, 'test-index',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
mode: "CREATE_IF_MISSING",
embeddingKey: "vect",
nodeLabel: "Test",
entityKey: "myId",
metadataKey: "foo"
}
})
This creates 2 new nodes as above.
Or, we can populate existing relationships (i.e. (:Start)-[:TEST {myId: 'one'}]->(:End) and (:Start)-[:TEST {myId: 'two'}]->(:End)):
CALL apoc.vectordb.pinecone.queryAndUpdate($host, 'test-index',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
embeddingKey: "vect",
relType: "TEST",
entityKey: "myId",
metadataKey: "foo"
}
})
This populates the two relationships as ()-[:TEST {myId: 'one', city: 'Berlin', vect: [vector1]}]-() and ()-[:TEST {myId: 'two', city: 'London', vect: [vector2]}]-(), which are returned in the entity column of the result.
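As with the node example, the relationships have to exist before the call when the default mode is used. A minimal sketch in plain Cypher for setting them up and inspecting the result, assuming the same property names as in the mapping above:

```cypher
// Before the queryAndUpdate call: create the relationships that the mapping matches on myId.
CREATE (:Start)-[:TEST {myId: 'one'}]->(:End),
       (:Start)-[:TEST {myId: 'two'}]->(:End);

// After the call: inspect the properties written to the relationships.
MATCH (:Start)-[r:TEST]->(:End)
RETURN r.myId AS myId, r.city AS city, r.vect AS vect;
```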
We can also use mapping with the apoc.vectordb.pinecone.query procedure, to search for nodes/rels matching the label/type and metadataKey without making updates (i.e. equivalent to the *.queryAndUpdate procedure with a mapping config having mode: "READ_ONLY").
For example, with the previous relationships, we can execute the following procedure, which just returns the relationships in the rel column:
CALL apoc.vectordb.pinecone.query($host, 'test-index',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
embeddingKey: "vect",
relType: "TEST",
entityKey: "myId",
metadataKey: "foo"
}
})
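For comparison, the READ_ONLY form mentioned above might look like the sketch below. It assumes that the score and rel columns can be selected with YIELD, and it should not modify any relationship.

```cypher
CALL apoc.vectordb.pinecone.queryAndUpdate($host, 'test-index',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { mapping: {
        mode: "READ_ONLY",      // look up matching relationships, write nothing
        embeddingKey: "vect",
        relType: "TEST",
        entityKey: "myId",
        metadataKey: "foo"
    }
})
YIELD score, rel
RETURN rel, score
```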
We can use mapping with |
To optimize performances, we can choose what to For example, by executing a |
It is possible to execute vector db procedures together with apoc.ml.rag as follows:
CALL apoc.vectordb.pinecone.getAndUpdate($host, $index, [<id1>, <id2>], $conf) YIELD node, metadata, id, vector
WITH collect(node) as paths
CALL apoc.ml.rag(paths, $attributes, $question, $confPrompt) YIELD value
RETURN value
CALL apoc.vectordb.pinecone.delete($host, 'test-index', ['1','2'], {<optional config>})