Create and store embeddings
Neo4j’s vector indexes and vector functions allow you to calculate the similarity between node and relationship properties in a graph. A prerequisite for using these features is that vector embeddings have been set as properties of these entities. This page shows how these embeddings can be created and stored as properties on nodes and relationships in a Neo4j database using the GenAI plugin.
| For a hands-on guide on how to use the GenAI plugin on a Neo4j database, see Embeddings & Vector Indexes Tutorial → Create embeddings with cloud AI providers. |
Generate and store a single embedding
Use the ai.text.embed() function to generate a vector embedding for a single value.
Syntax

Description: Encode a resource as a vector using the named provider.

Inputs

| Name | Type | Description |
|---|---|---|
|  | STRING | The string to transform into an embedding. |
|  | STRING | Case-insensitive identifier of the AI provider to use. See Providers for supported options. |
|  | MAP | Provider-specific options. See Providers for details of each supported provider. Note that because this argument may contain sensitive data, it is obfuscated in the query.log. However, if the function call is misspelled or the query is otherwise malformed, it will be logged without obfuscation. |

Returns

| Type | Description |
|---|---|
| VECTOR | The generated vector embedding for the resource. |
| This function sends one API request every time it is called, which may result in a lot of overhead in terms of both network traffic and latency. If you want to generate many embeddings at once, use Generate and store a batch of embeddings. |
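For example, a minimal query that only computes an embedding (without storing it) can call the function directly. This is a sketch only; it assumes an OpenAI API key is supplied as the $openaiToken parameter, as in the storage examples below.

Minimal embedding call
RETURN ai.text.embed('Hello World!', 'OpenAI', { token: $openaiToken, model: 'text-embedding-3-small' }) AS embedding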
The examples below use the recommendations dataset described in Environment setup. Choose the tab that matches the setup you selected there.
ai.text.embed() returns a VECTOR.
Storing VECTOR values on self-managed instances requires Enterprise Edition and block format.
VECTOR embedding property for the Godfather
MATCH (m:Movie {title:'Godfather, The'})
WHERE m.plot IS NOT NULL AND m.title IS NOT NULL
WITH m, m.title + ' ' + m.plot AS titleAndPlot (1)
WITH m, ai.text.embed(titleAndPlot, 'OpenAI', { token: $openaiToken, model: 'text-embedding-3-small' }) AS vector (2)
SET m.embedding = vector (3)
RETURN m.embedding AS embedding
| 1 | Concatenate the title and plot of the Movie into a single STRING. |
| 2 | Create an embedding from titleAndPlot. |
| 3 | Store vector in a property named embedding (type VECTOR) on The Godfather node. |
+----------------------------------------------------------------------------------------------------+
| embedding |
+----------------------------------------------------------------------------------------------------+
| [0.005239539314061403, -0.039358530193567276, -0.0005175105179660022, -0.038706034421920776, ... ] |
+----------------------------------------------------------------------------------------------------+
ai.text.embed() returns a VECTOR, which can be converted into a list with toFloatList().
Use this compatibility workflow on Community Edition or on databases that use aligned format.
LIST<FLOAT> embedding property for the Godfather
MATCH (m:Movie {title:'Godfather, The'})
WHERE m.plot IS NOT NULL AND m.title IS NOT NULL
WITH m, m.title + ' ' + m.plot AS titleAndPlot (1)
WITH m, ai.text.embed(titleAndPlot, 'OpenAI', { token: $openaiToken, model: 'text-embedding-3-small' }) AS vector (2)
CALL db.create.setNodeVectorProperty(m, 'embedding', toFloatList(vector)) (3)
RETURN m.embedding AS embedding
| 1 | Concatenate the title and plot of the Movie into a single STRING. |
| 2 | Create an embedding from titleAndPlot. |
| 3 | Convert vector to a LIST<FLOAT> and store it in a property named embedding (type LIST<FLOAT>) on The Godfather node.
The procedures db.create.setNodeVectorProperty and db.create.setRelationshipVectorProperty store the list with a more space-efficient representation; a relationship variant is sketched after the result table below. |
+----------------------------------------------------------------------------------------------------+
| embedding |
+----------------------------------------------------------------------------------------------------+
| [0.005239539314061403, -0.039358530193567276, -0.0005175105179660022, -0.038706034421920776, ... ] |
+----------------------------------------------------------------------------------------------------+
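The same pattern works for relationship properties. The following sketch is illustrative only: it assumes the recommendations dataset contains (:Actor)-[:ACTED_IN]->(:Movie) relationships with a role property, and it uses db.create.setRelationshipVectorProperty to store the embedding on the relationship. Note that it issues one API request per matching relationship.

LIST<FLOAT> embedding property on an ACTED_IN relationship
MATCH (a:Actor)-[r:ACTED_IN]->(m:Movie {title:'Godfather, The'})
WHERE r.role IS NOT NULL
WITH r, r.role + ' in ' + m.title AS roleText
WITH r, ai.text.embed(roleText, 'OpenAI', { token: $openaiToken, model: 'text-embedding-3-small' }) AS vector
CALL db.create.setRelationshipVectorProperty(r, 'embedding', toFloatList(vector))
RETURN r.embedding AS embedding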
Generate and store a batch of embeddings
Use the ai.text.embedBatch procedure to generate many vector embeddings with a single API request.
This procedure takes a list of resources as an input, and returns the same number of result rows.
Syntax

Description: Encode a given batch of resources as vectors using the named provider.

Inputs

| Name | Type | Description |
|---|---|---|
|  | LIST<STRING> | The strings to transform into embeddings. |
|  | STRING | Case-insensitive identifier of the AI provider to use. See Providers for supported options. |
|  | MAP | Provider-specific options. See Providers for details of each supported provider. Note that because this argument may contain sensitive data, it is obfuscated in the query.log. However, if the procedure call is misspelled or the query is otherwise malformed, it will be logged without obfuscation. |

Returns

| Name | Type | Description |
|---|---|---|
| index | INTEGER | The index of the corresponding element in the input list, to correlate results back to inputs. |
|  | STRING | The given input resource. |
| vector | VECTOR | The generated vector embedding for this resource. |
| This procedure attempts to generate embeddings for all supplied resources in a single API request. Providing too many resources may cause the AI provider to time out or to reject the request. For OpenAI, Azure OpenAI, and Vertex AI, the […] |
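As a quick sketch of the call shape before the dataset examples below, the procedure can be invoked with a literal list; the token and model are the same assumptions as elsewhere on this page, and the two input strings are placeholders.

Minimal batch call
CALL ai.text.embedBatch(['First text to embed', 'Second text to embed'], 'OpenAI', { token: $openaiToken, model: 'text-embedding-3-small' })
YIELD index, vector
RETURN index, vector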
ai.text.embedBatch() returns a VECTOR for each input resource.
Storing VECTOR values on self-managed instances requires Enterprise Edition and block format.
VECTOR properties
MATCH (m:Movie WHERE m.plot IS NOT NULL) LIMIT 20
WITH collect(m) AS moviesList (1)
WITH moviesList, [movie IN moviesList | movie.title + ': ' + movie.plot] AS batch (2)
CALL ai.text.embedBatch(batch, 'OpenAI', { token: $openaiToken, model: 'text-embedding-3-small' }) YIELD index, vector
WITH moviesList, index, vector
MATCH (toUpdate:Movie {title: moviesList[index]['title']})
SET toUpdate.embedding = vector (3)
| 1 | Collect all 20 Movie nodes into a LIST<NODE>. |
| 2 | A list comprehension ([]) extracts the title and plot properties of the movies in moviesList into a new LIST<STRING>. |
| 3 | SET is run for each vector returned by ai.text.embedBatch(), and stores that vector as a property named embedding on the corresponding node. |
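Because index correlates each returned vector with its position in the input list, the second MATCH is only one way to find the node to update. Under the same assumptions as above, a shorter variant can update the collected node directly:

VECTOR properties, updating via the collected list
MATCH (m:Movie WHERE m.plot IS NOT NULL) LIMIT 20
WITH collect(m) AS moviesList
WITH moviesList, [movie IN moviesList | movie.title + ': ' + movie.plot] AS batch
CALL ai.text.embedBatch(batch, 'OpenAI', { token: $openaiToken, model: 'text-embedding-3-small' }) YIELD index, vector
WITH moviesList[index] AS toUpdate, vector
SET toUpdate.embedding = vector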
VECTOR properties
MATCH (m:Movie WHERE m.plot IS NOT NULL) LIMIT 500
WITH collect(m) AS moviesList, (1)
count(*) AS total,
100 AS batchSize (2)
UNWIND range(0, total-1, batchSize) AS batchStart (3)
CALL (moviesList, batchStart, batchSize) { (4)
WITH [movie IN moviesList[batchStart .. batchStart + batchSize] | movie.title + ': ' + movie.plot] AS batch (5)
CALL ai.text.embedBatch(batch, 'OpenAI', { token: $openaiToken, model: 'text-embedding-3-small' }) YIELD index, vector
MATCH (toUpdate:Movie {title: moviesList[batchStart + index]['title']})
SET toUpdate.embedding = vector (6)
} IN CONCURRENT TRANSACTIONS OF 1 ROW (7)
| 1 | Collect all returned Movie nodes into a LIST<NODE>. |
| 2 | batchSize defines the number of nodes in moviesList to be processed at once.
Because vector embeddings can be very large, a larger batch size may require significantly more memory on the Neo4j server.
A batch size that is too large may also exceed the provider’s request size limits. |
| 3 | Process Movie nodes in increments of batchSize.
The end range total-1 is due to range being inclusive on both ends. |
| 4 | A CALL subquery executes a separate transaction for each batch.
Note that this CALL subquery uses a variable scope clause. |
| 5 | batch is a list of strings, each being the concatenation of title and plot of one movie. |
| 6 | SET stores vector as a property named embedding on the node at position batchStart + index in moviesList. |
| 7 | Each inner transaction processes one row, that is, one batch.
For more information on concurrency in transactions, see CALL subqueries → Concurrent transactions. |
| This example may not scale to larger datasets, as collect(m) requires the whole result set to be loaded in memory. For an alternative method better suited to processing large amounts of data, see Embeddings & Vector Indexes Tutorial → Create embeddings with cloud AI providers. |
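After a run like the one above, a simple count can confirm how many Movie nodes received an embedding. This is only a sanity-check sketch; it assumes the embeddings were stored under the embedding property, as in the examples on this page.

Count movies with an embedding
MATCH (m:Movie)
WHERE m.embedding IS NOT NULL
RETURN count(m) AS moviesWithEmbedding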
ai.text.embedBatch() returns a VECTOR for each input resource, which can be converted into a list with toFloatList().
These compatibility workflows store embeddings as LIST<FLOAT> properties.
Use them on Community Edition, on databases that use aligned format, or whenever you prefer list-based storage over native VECTOR properties.
LIST<FLOAT> properties
MATCH (m:Movie WHERE m.plot IS NOT NULL) LIMIT 20
WITH collect(m) AS moviesList (1)
WITH moviesList, [movie IN moviesList | movie.title + ': ' + movie.plot] AS batch (2)
CALL ai.text.embedBatch(batch, 'OpenAI', { token: $openaiToken, model: 'text-embedding-3-small' }) YIELD index, vector
WITH moviesList, index, vector
CALL db.create.setNodeVectorProperty(moviesList[index], 'embedding', toFloatList(vector)) (3)
| 1 | Collect all 20 Movie nodes into a LIST<NODE>. |
| 2 | A list comprehension ([]) extracts the title and plot properties of the movies in moviesList into a new LIST<STRING>. |
| 3 | Each vector is converted into a list of floats and stored as a property named embedding (type LIST<FLOAT>) on the corresponding node.
The procedures db.create.setNodeVectorProperty and db.create.setRelationshipVectorProperty store the list with a more space-efficient representation. |
LIST<FLOAT> values
MATCH (m:Movie WHERE m.plot IS NOT NULL) LIMIT 500
WITH collect(m) AS moviesList, (1)
count(*) AS total,
100 AS batchSize (2)
UNWIND range(0, total-1, batchSize) AS batchStart (3)
CALL (moviesList, batchStart, batchSize) { (4)
WITH [movie IN moviesList[batchStart .. batchStart + batchSize] | movie.title + ': ' + movie.plot] AS batch (5)
CALL ai.text.embedBatch(batch, 'OpenAI', { token: $openaiToken, model: 'text-embedding-3-small' }) YIELD index, vector
CALL db.create.setNodeVectorProperty(moviesList[batchStart + index], 'embedding', toFloatList(vector)) (6)
} IN CONCURRENT TRANSACTIONS OF 1 ROW (7)
| 1 | Collect all returned Movie nodes into a LIST<NODE>. |
| 2 | batchSize defines the number of nodes in moviesList to be processed at once.
Because vector embeddings can be very large, a larger batch size may require significantly more memory on the Neo4j server.
A batch size that is too large may also exceed the provider’s request size limits. |
| 3 | Process Movie nodes in increments of batchSize.
The end range total-1 is due to range being inclusive on both ends. |
| 4 | A CALL subquery executes a separate transaction for each batch.
Note that this CALL subquery uses a variable scope clause. |
| 5 | batch is a list of strings, each being the concatenation of title and plot of one movie. |
| 6 | The procedure converts vector to a LIST<FLOAT> and stores it as a property named embedding on the node at position batchStart + index in moviesList. |
| 7 | Each inner transaction processes one row, that is, one batch.
For more information on concurrency in transactions, see CALL subqueries → Concurrent transactions. |
| This example may not scale to larger datasets, as collect(m) requires the whole result set to be loaded in memory. For an alternative method better suited to processing large amounts of data, see GenAI documentation → Embeddings & Vector Indexes Tutorial → Create embeddings with cloud AI providers. |
Providers
You can create vector embeddings via the following providers:
- OpenAI (openai)
- Azure OpenAI (azure-openai)
- Google Vertex AI (vertexai)
- Amazon Bedrock Titan Models (bedrock-titan)
The query CALL ai.text.embed.providers() (see reference) lists the providers supported by the installed version of the plugin.
OpenAI
| Name | Type | Default | Description |
|---|---|---|---|
| model | STRING | - | Model ID (see OpenAI → Models). |
| token | STRING | - | OpenAI API key (see OpenAI → API Keys). |
|  |  |  | The maximum amount of data (measured in tokens) sent at a time. Introduced in 2026.04 |
| vendorOptions | MAP |  | Optional vendor options that will be passed on as-is in the request to OpenAI (see OpenAI → Create embeddings). |
Hello World!
WITH
{
token: $openaiToken,
model: 'text-embedding-3-small',
vendorOptions: {
dimensions: 1024
}
} AS conf
RETURN ai.text.embed('Hello World!', 'openai', conf) AS result
| You can change OpenAI’s base URL (default: https://api.openai.com) via the genai.openai.baseurl setting. The change applies to all ai.text.* calls that use OpenAI, including ai.text.embed, ai.text.embedBatch and ai.text.completion. See Configuration Options → genai.openai.baseurl. |
Azure OpenAI
| Name | Type | Default | Description |
|---|---|---|---|
| model | STRING | - | Model ID (see Azure → Azure OpenAI in Foundry Models). |
| resource | STRING | - | Azure resource name. |
| token | STRING | - | Azure OAuth2 bearer token. |
|  |  |  | The maximum amount of data (measured in tokens) sent at a time. Introduced in 2026.04 |
| vendorOptions | MAP |  | Optional vendor options that will be passed on as-is in the request to Azure. |
Hello World!
WITH
{
token: $azureToken,
resource: 'my-azure-openai-resource',
model: 'text-embedding-3-small',
vendorOptions: {
dimensions: 1024
}
} AS conf
RETURN ai.text.embed('Hello World!', 'azure-openai', conf) AS result
| Since 2026.04, you can change Azure OpenAI’s base URL via the genai.azure.openai.baseurl setting. The change applies to all ai.text.* calls that use Azure OpenAI. See Configuration Options → genai.azure.openai.baseurl. |
Google Vertex AI
| Name | Type | Default | Description |
|---|---|---|---|
| model | STRING | - | Model resource name (see Vertex AI → Model Garden). |
| project | STRING | - | Google Cloud project ID. |
| region | STRING | - | Google Cloud region (see Vertex AI → Locations). |
| publisher | STRING |  | Model publisher. |
| apiKey | STRING | - | Vertex AI API key. |
| token | STRING | - | Vertex AI API access token. |
|  |  | - | The maximum amount of data (measured in tokens) sent at a time. Introduced in 2026.04 |
| vendorOptions | MAP |  | Optional vendor options that will be passed on as-is in the request to Vertex (see Vertex AI → Method: models.predict). |
| Exactly one of apiKey or token must be provided. |
Hello World!
WITH
{
token: $vertexaiApiAccessKey,
model: 'gemini-embedding-001',
publisher: 'google',
project: 'my-google-cloud-project',
region: 'asia-northeast1',
vendorOptions: {
outputDimensionality: 1024
}
} AS conf
RETURN ai.text.embed('Hello World!', 'vertexai', conf) AS result
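If you authenticate with an API key rather than an access token (the configuration accepts exactly one of the two), the call might instead look like the following sketch; the $vertexaiApiKey parameter name is an assumption.

Hello World! with an API key
WITH
{
  apiKey: $vertexaiApiKey,
  model: 'gemini-embedding-001',
  publisher: 'google',
  project: 'my-google-cloud-project',
  region: 'asia-northeast1'
} AS conf
RETURN ai.text.embed('Hello World!', 'vertexai', conf) AS result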
Amazon Bedrock Titan Models
This provider supports all models that use the same request parameters and response fields as the Titan text models.
| Name | Type | Default | Description |
|---|---|---|---|
| model | STRING | - | Model ID or its ARN. |
| region | STRING | - | Amazon region (see Amazon Bedrock → Model Support). |
| accessKeyId | STRING | - | Amazon access key ID. |
| secretAccessKey | STRING | - | Amazon secret access key. |
| vendorOptions | MAP |  | Optional vendor options that will be passed on as-is in the request to Bedrock (see Amazon Bedrock → Inference request parameters and response fields). |
Hello World!
WITH
{
accessKeyId: $awsAccessKeyId,
secretAccessKey: $secretAccessKey,
model: 'amazon.titan-embed-text-v1',
region: 'eu-west-2',
vendorOptions: {
dimensions: 1024
}
} AS conf
RETURN ai.text.embed('Hello World!', 'bedrock-titan', conf) AS result
(Legacy) Providers
Deprecated in 2025.11
The following provider configurations are for the genai.vector.encode function and the genai.vector.encodeBatch procedure.
Both callables have been deprecated in 2025.11 and superseded by ai.text.embed and ai.text.embedBatch.
For more information on the old callables, see documentation for the previous version.
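For reference, a legacy call had the following shape. This is a sketch only; it assumes the legacy OpenAI provider accepts token and model configuration keys analogous to its replacement.

Legacy embedding call, deprecated
RETURN genai.vector.encode('Hello World!', 'OpenAI', { token: $openaiToken, model: 'text-embedding-3-small' }) AS embedding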
OpenAI
- Identifier (provider argument): "OpenAI"
| Key | Type | Description | Default |
|---|---|---|---|
|  |  | API access token. | Required |
|  |  | The name of the model to invoke. |  |
|  |  | The number of dimensions to reduce the vector to. Only supported for certain models. | Model-dependent. |
Vertex AI
- Identifier (provider argument): "VertexAI"
| Key | Type | Description | Default |
|---|---|---|---|
|  |  | API access token. | Required |
|  |  | GCP project ID. | Required |
|  |  | The name of the model to invoke. |  |
|  |  | GCP region where to send the API requests. Supported values […] |  |
|  |  | The intended downstream application (see provider documentation). The specified […] |  |
|  |  | The title of the document that is being encoded (see provider documentation). The specified […] |  |
Azure OpenAI
- Identifier (provider argument): "AzureOpenAI"
| Unlike the other providers, the model is configured when creating the deployment on Azure, and is thus not part of the configuration map. |
| Key | Type | Description | Default |
|---|---|---|---|
|  |  | API access token. | Required |
|  |  | The name of the resource to which the model has been deployed. | Required |
|  |  | The name of the model deployment. | Required |
|  |  | The number of dimensions to reduce the vector to. Only supported for certain models. | Model-dependent. |
Amazon Bedrock
- Identifier (provider argument): "Bedrock"
| Key | Type | Description | Default |
|---|---|---|---|
|  |  | AWS access key ID. | Required |
|  |  | AWS secret key. | Required |
|  |  | The name of the model to invoke. |  |
|  |  | AWS region where to send the API requests. |  |