Vector search index

Vector search indexes were released as a public beta in Neo4j 5.11.

This chapter describes how to use vector indexes to perform an approximate nearest neighbor search.

Vector indexes allow users to query vector embeddings from large datasets. An embedding is a numerical representation of a data object, such as text, image, audio, or document.

For example, each word or token in a text is typically represented as a high-dimensional vector, where each dimension captures some aspect of the word’s meaning. Words that are semantically similar or related are often represented by vectors that are closer to each other in this vector space. This allows mathematical operations like addition and subtraction to carry semantic meaning: for example, the vector representation of "king" minus "man" plus "woman" might be close to the vector representation of "queen". In other words, a vector embedding is a numerical representation of a particular data object that captures its semantic meaning.
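The "king"/"queen" arithmetic can be made concrete with tiny made-up vectors. The sketch below is purely illustrative: real embeddings have hundreds or thousands of dimensions, and the two-dimensional values here are invented for the example.

```python
import math

def euclidean_distance(v, u):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, u)))

# Made-up toy embeddings; dimension 0 ~ "royalty", dimension 1 ~ "gender".
king = [0.9, 0.1]
queen = [0.9, 0.9]
man = [0.1, 0.1]
woman = [0.1, 0.9]

# king - man + woman lands near queen in this toy space.
shifted = [k - m + w for k, m, w in zip(king, man, woman)]
print(euclidean_distance(shifted, queen) < euclidean_distance(shifted, king))  # True
```

In this contrived space, subtracting the "man" direction and adding the "woman" direction moves "king" next to "queen", which is the intuition behind semantic vector arithmetic.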

The embedding for a particular data object can be generated by, for example, the Vertex AI or OpenAI embedding generators, which produce vector embeddings with dimensions such as 768, 1024, and 1536. These vector embeddings are stored as LIST<FLOAT> properties on a node, where each dimensional component of the vector is an element in the LIST. A Neo4j vector index can then index nodes by such a LIST<FLOAT> property, provided the value is valid for the index.

In Neo4j, a vector index allows you to write queries that match a neighborhood of nodes based on the similarity between the properties of those nodes and the ones specified in the query.

Neo4j vector indexes are powered by the Apache Lucene indexing and search library. Lucene implements a Hierarchical Navigable Small World[1] (HNSW) Graph to perform a k approximate nearest neighbors (k-ANN) query over the vector fields.

Vector index commands and procedures

Vector indexes are managed through Cypher® commands and built-in procedures, see Operations Manual → Procedures for a complete reference.

The procedures and commands for vector indexes are listed in the following table:

Table 1. Commands and procedures for vector indexes
Usage | Procedure/Command | Description
Create vector index. | db.index.vector.createNodeIndex | Create a named vector index for the specified label and property, with the given vector dimensionality, using the given similarity function.
Use vector index. | db.index.vector.queryNodes | Query the given vector index. Returns the requested number of approximate nearest neighbor nodes and their similarity score, ordered by score.
Drop vector index. | DROP INDEX index_name | Drop the specified index.
List vector indexes. | SHOW INDEXES | Lists all indexes, including vector indexes. See the SHOW INDEXES command for details; there is no vector index filter built into SHOW INDEXES.
Set vector property. | db.create.setVectorProperty | Update a given node property with the given vector in a more space-efficient way than directly using SET.

Create and configure vector indexes

You can create vector indexes using the procedure db.index.vector.createNodeIndex. The name of the index must be unique so that you can reference it when querying or if you want to drop the index.

The index name must be unique among both indexes and constraints.

A vector index is a single-label, single-property index for nodes. In addition to the label and property key (both given as STRING), a vector index needs to be configured with both the dimensionality of the vector (INTEGER between 1 and 2048 inclusive), and the measure of similarity between two vectors (case-insensitive STRING). For details, see Supported similarity functions.

Signature for db.index.vector.createNodeIndex to create a vector node index
db.index.vector.createNodeIndex(indexName :: STRING?, label :: STRING?, propertyKey :: STRING?, vectorDimension :: INTEGER?, vectorSimilarityFunction :: STRING?) :: VOID

The new index is not immediately available but is created in the background.

All vectors within the index must have the same dimensionality. The measure of similarity is determined by the given vector similarity function. This defines how similar two vectors are to one another by a similarity score, how vectors are interpreted, and what vectors are valid for the index.

A node is indexed if all the following are true:

  • The node contains the configured label.

  • The node contains the configured property key.

  • The respective property value is of type LIST<FLOAT>.

  • The size() of the respective value is the same as the configured dimensionality.

  • The value is a valid vector for the configured similarity function.

Otherwise, a node is not indexed.
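The per-node criteria above can be sketched as a validity check. This is a hypothetical Python helper, not part of Neo4j; `struct` is used to test whether each component survives a round trip through IEEE 754 single precision.

```python
import math
import struct

def representable_as_float32(x: float) -> bool:
    """True if x stays finite when narrowed to IEEE 754 single precision."""
    try:
        return math.isfinite(struct.unpack("f", struct.pack("f", x))[0])
    except OverflowError:
        return False

def is_indexable(value, dimensions: int, similarity: str) -> bool:
    """Hypothetical sketch of the indexing criteria described above."""
    # The property value must be a list of numbers (LIST<FLOAT>).
    if not isinstance(value, list) or not all(
        isinstance(c, (int, float)) for c in value
    ):
        return False
    # size() of the value must match the configured dimensionality.
    if len(value) != dimensions:
        return False
    # Every component must be finitely representable in single precision.
    if not all(representable_as_float32(float(c)) for c in value):
        return False
    # A cosine index additionally requires a non-zero, finite l2-norm.
    if similarity.lower() == "cosine":
        norm = math.sqrt(sum(float(c) ** 2 for c in value))
        if norm == 0.0 or not representable_as_float32(norm):
            return False
    return True

print(is_indexable([0.1, 0.2, 0.3], 3, "cosine"))  # True: valid vector
print(is_indexable([0.0, 0.0, 0.0], 3, "cosine"))  # False: zero l2-norm
```

The label and property-key checks are omitted here since they concern the node, not the vector value itself.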

Example 1. Create a vector index

For instance, assume you have a graph of research papers, and each paper has an abstract. You want to find papers in the neighborhood of a paper you know.

Data model

Assume that for each abstract, you have generated a 1536-dimensional vector embedding of the abstract’s text using OpenAI’s default model, text-embedding-ada-002. This model suggests cosine similarity. For more information, see OpenAI’s official documentation.

You can create a cosine vector index over the embedding property.

CALL db.index.vector.createNodeIndex('abstract-embeddings', 'Abstract', 'embedding', 1536, 'cosine')

You can see that the vector index has been created using SHOW INDEXES:

SHOW INDEXES YIELD name, type, labelsOrTypes, properties, options
Table 2. Result
name | type | labelsOrTypes | properties | options
"abstract-embeddings" | "VECTOR" | ["Abstract"] | ["embedding"] | {indexProvider: "vector-1.0", indexConfig: {vector.dimensions: 1536, vector.similarity_function: "cosine"}}

Rows: 1

Query a vector index

You can query a vector index using the procedure db.index.vector.queryNodes.

Signature for db.index.vector.queryNodes to query a vector index
db.index.vector.queryNodes(indexName :: STRING?, numberOfNearestNeighbours :: INTEGER?, query :: LIST? OF FLOAT?) :: (node :: NODE?, score :: FLOAT?)
  • The indexName (a STRING) refers to the unique name of the vector index to query.

  • The numberOfNearestNeighbours (an INTEGER) refers to the number of nearest neighbors to return as the neighborhood.

  • The query (a LIST<FLOAT>) refers to the query vector whose neighborhood is searched for.

The procedure returns the neighborhood of nodes with their respective similarity scores, ordered by those scores. The scores are bounded between 0 and 1, where the closer to 1 the score is, the more similar the indexed vector is to the query vector.

Example 2. Query a vector index

This example takes the paper that describes the HNSW[1] graph structure that the vector index implements and tries to find similar papers. First you MATCH to find the paper, and then you query the abstract-embeddings index for a neighborhood of 10 similar abstracts to your query. Finally, you MATCH for the neighborhood’s respective titles.

MATCH (title:Title)<--(:Paper)-->(abstract:Abstract)
WHERE toLower(title.text) = 'efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs'
CALL db.index.vector.queryNodes('abstract-embeddings', 10, abstract.embedding)
YIELD node AS similarAbstract, score
MATCH (similarAbstract)<--(:Paper)-->(similarTitle:Title)
RETURN similarTitle.text AS title, score
Table 3. Result
title | score
"Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs" |
"Accelerating Large-Scale Graph-based Nearest Neighbor Search on a Computational Storage Platform" |
"Nearest Neighbor Search Under Uncertainty" |
"Neighbor selection and hitting probability in small-world graphs" |
"Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph" |
"Towards Similarity Graphs Constructed by Deep Reinforcement Learning" |
"A novel approach to study realistic navigations on networks" |
"Intentional Walks on Scale Free Small Worlds" |
"FINGER: Fast Inference for Graph-based Approximate Nearest Neighbor Search" |
"Learning to Route in Similarity Graphs" |

Rows: 10

As expected, the results are papers that discuss graph-based nearest-neighbor search.

The most similar result is the query vector itself, which is to be expected since the index was queried with an already-indexed property. If the query vector itself is not wanted, you can use WHERE score < 1 to remove vectors equivalent to the query vector.
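Applied to the query from Example 2, the filter slots in directly after the YIELD clause (a sketch; it assumes abstract is bound as in that example):

```cypher
// Sketch: exclude the indexed query vector itself from the neighborhood.
CALL db.index.vector.queryNodes('abstract-embeddings', 10, abstract.embedding)
YIELD node AS similarAbstract, score
WHERE score < 1
RETURN similarAbstract, score
```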

Drop vector indexes

A vector index is dropped by using the same command as for other indexes, DROP INDEX.

Example 3. DROP INDEX

In the following example, you drop the abstract-embeddings index that you created previously:

DROP INDEX `abstract-embeddings`
Removed 1 index.

Set a vector property on a node

Valid vectors for use in the index must have components finitely representable in IEEE 754[2] single precision. They are represented as properties on nodes with type LIST<FLOAT>. The recommended way of setting a vector property is using the db.create.setVectorProperty procedure. This procedure validates the input and sets the property as an array of IEEE 754[2] single precision values.

Signature for db.create.setVectorProperty
db.create.setVectorProperty(node :: NODE?, key :: STRING?, vector :: LIST? OF FLOAT?) :: (node :: NODE?)

Usually you want to define your embeddings as Cypher parameters and call db.create.setVectorProperty as in the following example:

Set a vector via db.create.setVectorProperty
MATCH (n:Node {id: $id})
CALL db.create.setVectorProperty(n, 'propertyKey', $vector)
YIELD node RETURN node;

The example above matches a single node and updates its embedding. It is also possible to update multiple nodes by passing a list parameter containing the match criteria and embeddings for several nodes and iterating over it with an UNWIND clause. This is ideal for creating and setting new vector properties in the graph.
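A batch update along those lines might look as follows. The shape of the $rows parameter (maps with id and embedding fields) is an assumption for illustration, not a fixed API:

```cypher
// Sketch: batch-set embeddings from a list parameter.
// The $rows shape ({id, embedding}) is a hypothetical convention.
UNWIND $rows AS row
MATCH (n:Node {id: row.id})
CALL db.create.setVectorProperty(n, 'propertyKey', row.embedding)
YIELD node
RETURN count(node) AS updated;
```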

You can also set a vector property on a node using the SET command:

Set a vector property via SET
MATCH (node:Node {id: $id})
SET node.propertyKey = $vector
RETURN node;

However, Cypher stores the provided LIST<FLOAT> as a primitive array of IEEE 754[2] double precision values. As a result, the property takes up approximately twice as much space, so SET is not recommended for properties that are used in a vector index. Existing properties can be re-set using db.create.setVectorProperty to minimize store size, but there is a cost in transaction log size until the logs are rotated away.
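The space difference can be illustrated with Python’s array module. This only demonstrates the general single- versus double-precision ratio, not Neo4j’s actual on-disk format:

```python
from array import array

embedding = [0.1] * 1536  # e.g., a text-embedding-ada-002-sized vector

single = array("f", embedding)  # IEEE 754 single precision (4 bytes/component)
double = array("d", embedding)  # IEEE 754 double precision (8 bytes/component)

print(single.itemsize * len(single))  # 6144 bytes
print(double.itemsize * len(double))  # 12288 bytes
```

Per 1536-dimensional vector, double precision costs roughly 6 KiB of extra storage, which adds up quickly over large datasets.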

Supported similarity functions

The choice of similarity function affects which indexed vectors are considered similar, and which are valid. The semantic meaning of the vector may itself dictate which similarity function to choose. Refer to the documentation for the particular vector embedding model you are using, as it may suggest a preference for certain similarity functions. Otherwise, being able to differentiate between the various similarity functions can assist in making a more informed decision.

Table 4. Similarity functions
Name | Case insensitive argument | Key similarity feature
Euclidean | "euclidean" | The distance between the vectors
Cosine | "cosine" | The angle between the vectors

For l2-normalized vectors (unit vectors, whose l2-norm equals 1), the Euclidean and cosine similarity functions produce the same similarity ordering.

Euclidean similarity

Euclidean similarity is useful when the distance between the vectors is what determines how similar two vectors are.

A vector is valid for a Euclidean vector index when all of its components can be represented finitely in IEEE 754[2] single precision.

Euclidean interprets the vectors in Cartesian coordinates. The measure is related to the Euclidean distance, i.e., how far two points are from one another. However, that distance is unbounded and less useful as a similarity score. Euclidean similarity bounds the square of the Euclidean distance.

The Euclidean similarity of vectors v and u is defined as

  euclidean(v, u) = 1 / (1 + ||v − u||₂²)

which lies in the bounded range (0, 1], that is, between 0 exclusive and 1 inclusive.
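A direct Python transcription of this formula, for illustration only:

```python
def euclidean_similarity(v, u):
    """1 / (1 + squared Euclidean distance), bounded in (0, 1]."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(v, u))
    return 1.0 / (1.0 + squared_distance)

print(euclidean_similarity([1.0, 2.0], [1.0, 2.0]))  # identical vectors: 1.0
print(euclidean_similarity([0.0, 0.0], [3.0, 4.0]))  # distance 5, so 1/26
```

Identical vectors score exactly 1, and the score decays towards (but never reaches) 0 as the distance grows, matching the bounded range above.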

Cosine similarity

Cosine similarity is used when the angle between the vectors is what determines how similar two vectors are.

A vector is valid for a cosine vector index when:

  • All vector components can be represented finitely in IEEE 754[2] single precision.

  • Its l2-norm is non-zero and can be represented finitely in IEEE 754[2] single precision.

Cosine similarity interprets the vectors in Cartesian coordinates. The measure is related to the angle between the two vectors. However, an angle can be described in many units, sign conventions, and periods. The trigonometric cosine of this angle is agnostic to those conventions and is bounded. Cosine similarity rescales the trigonometric cosine from [−1, 1] into [0, 1].

The cosine similarity of vectors v and u is defined as

  cosine(v, u) = (1 + v̂ · û) / 2 = (1 + (v · u) / (||v||₂ ||u||₂)) / 2

which lies in the bounded range [0, 1], that is, between 0 inclusive and 1 inclusive.

In the above equation the trigonometric cosine is given by the scalar product of the two unit vectors.
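A Python transcription of the rescaled cosine, for illustration only. It assumes non-zero vectors, as required for a valid cosine index:

```python
import math

def cosine_similarity(v, u):
    """(1 + cos(angle between v and u)) / 2, bounded in [0, 1]."""
    dot = sum(a * b for a, b in zip(v, u))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_u = math.sqrt(sum(a * a for a in u))
    return (1.0 + dot / (norm_v * norm_u)) / 2.0

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # same direction: 1.0
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite direction: 0.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # orthogonal: 0.5
```

Parallel vectors score 1, anti-parallel vectors score 0, and orthogonal vectors land exactly in the middle at 0.5.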

Limitations and idiosyncrasies

  • The query is an approximate nearest neighbor search. The requested k nearest neighbors may not be the exact k nearest, but close within the same wider neighborhood, such as finding a local extremum vs the true extremum.

  • When the requested number of nearest neighbors, k, is close to the total number of indexed vectors, the search may retrieve fewer than k results.

  • The index must have a unique name. There is no provided method for an autogenerated name.

  • Only one vector index can exist over a given schema, i.e., a label-property key combination. For example, you cannot have both a Euclidean and a cosine vector index on the same label-property key pair.

  • Only node vector indexes are supported.

  • There are no provided settings or options for tuning the index.

  • Changes made within the same transaction are not visible to the index.

  • There is no Cypher syntax for creating a vector index, nor for the standard index type filtering with the SHOW INDEXES command.

Known issues

The vector search index is a beta feature. The table below lists the known issues in the current implementation.

Known issues Fixed in

SHOW PROCEDURES does not show the vector index procedures.

The procedures are still usable, just not visible.

Neo4j 5.12

Passing null as an argument to some of the procedure parameters can generate a confusing exception.

Neo4j 5.12

The creation of the vector index skipped the check to limit the dimensionality to 2048.

Vector indexes configured with dimensionality greater than 2048 in Neo4j 5.11 should continue to work after the limitation is applied.

Neo4j 5.12

The validation for cosine similarity verifies that the vector’s l2-norm can be represented finitely in IEEE 754[2] double precision, rather than in single precision. This can lead to certain vectors with large components being incorrectly indexed and returning a similarity score of ±0.0.

Neo4j 5.12

The db.index.vector.queryNodes query vector validation is incorrect for a cosine vector index. The l2-norm validation only considers the last component of the vector. If that component is ±0.0, an otherwise valid query vector is rejected as invalid. This can also allow some invalid vectors to be used as queries, returning a similarity score of ±0.0.

For l2-normalized vectors (unit vectors, whose l2-norm equals 1), the Euclidean and cosine similarity functions produce the same similarity ordering. It is therefore recommended to normalize your vectors (if needed) and use a Euclidean vector index.

Neo4j 5.12

The vector index createStatement field from SHOW INDEXES does not correctly escape single quotes in index names, labels, and property keys.

Neo4j 5.12

Copying a database store with a vector index does not log the recreation command. Instead, it logs an error:

ERROR: [StoreCopy] Unable to format statement for index 'index-name'

caused by:

java.lang.IllegalArgumentException: Did not recognize index type VECTOR

If a store copy is required, make a note of the information in the createStatement column returned from the SHOW INDEXES command. For example:

SHOW INDEXES YIELD type, createStatement
RETURN createStatement

Neo4j 5.12

Some of the protections that prevent the use of new features during a rolling upgrade of a database are missing. This can result in a transaction that creates a vector index being committed on a cluster member running Neo4j 5.11 and distributed to other cluster members running an older Neo4j version. The older versions fail to understand the transaction.

Ensure that all cluster members have been updated to use Neo4j 5.11 (or a newer version) before calling dbms.upgrade() on the system database. Once committed, vector indexes can be safely created on the cluster.

Neo4j 5.12


Vector indexes can take advantage of the incubated Java 20 Vector API for noticeable speed improvements. If you are using a compatible version of Java, you can add the following setting to your configuration settings:

Configuration settings
server.jvm.additional=--add-modules jdk.incubator.vector