Full-text indexes
A full-text index is used to index nodes and relationships by STRING
properties.
Unlike range and text indexes, which can only perform limited STRING matching (exact, prefix, substring, or suffix matches), full-text indexes store the individual words in any given STRING property.
This means that full-text indexes can be used to match within the content of a STRING
property.
Full-text indexes also return a score of proximity between a given query string and the STRING
values stored in the database, thus enabling them to semantically interpret data.
Full-text indexes are powered by the Apache Lucene indexing and search library.
Example graph
The following graph is used for the examples below:
To recreate it, run the following query against an empty Neo4j database:
CREATE (nilsE:Employee {name: "Nils-Erik Karlsson", position: "Engineer", team: "Kernel", peerReviews: ['Nils-Erik is difficult to work with.', 'Nils-Erik is often late for work.']}),
(lisa:Manager {name: "Lisa Danielsson", position: "Engineering manager"}),
(nils:Employee {name: "Nils Johansson", position: "Engineer", team: "Operations"}),
(maya:Employee {name: "Maya Tanaka", position: "Senior Engineer", team:"Operations"}),
(lisa)-[:REVIEWED {message: "Nils-Erik is reportedly difficult to work with."}]->(nilsE),
(maya)-[:EMAILED {message: "I have booked a team meeting tomorrow."}]->(nils)
Create full-text indexes
Full-text indexes are created with the CREATE FULLTEXT INDEX
command.
It is recommended to give the index a name when it is created.
If no name is given when created, a random name will be assigned to the full-text index.
The CREATE FULLTEXT INDEX command is optionally idempotent.
This means that its default behavior is to throw an error if an attempt is made to create the same index twice.
If IF NOT EXISTS
is appended to the command, no error is thrown and nothing happens should an index with the same name or a full-text index on the same schema already exist.
As of Neo4j 5.17, an informational notification is instead returned showing the existing index which blocks the creation.
As of Neo4j 5.16, the index name can also be given as a parameter, CREATE FULLTEXT INDEX $name FOR …
.
Creating a full-text index requires the CREATE INDEX privilege.
When creating a full-text index, you need to specify the labels/relationship types and property names it should apply to.
This statement creates a full-text index named namesAndTeams
on each name
and team
property for nodes with the label Employee
or Manager
:
CREATE FULLTEXT INDEX namesAndTeams FOR (n:Employee|Manager) ON EACH [n.name, n.team]
This query highlights two key differences between full-text and search-performance indexes:
- Full-text indexes can be applied to more than one node label.
- Full-text indexes can be applied to more than one property, but unlike composite search-performance indexes, a full-text index stores entities that have at least one of the indexed labels or relationship types, and at least one of the indexed properties.
Similarly, though a relationship can have only one type, a full-text index can store multiple relationship types. In that case, all relationships that have one of the indexed relationship types and at least one of the indexed properties will be included.
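The "at least one of the indexed properties" rule can be checked against the example graph. The following is a minimal sketch, assuming the namesAndTeams index created above has finished populating. Lisa Danielsson has a name but no team property, so she would be expected to appear in the index anyway; the column aliases are only illustrative.

```cypher
// Lisa Danielsson is a Manager with a name but no team property.
// One indexed property is enough for the node to be stored in the index.
CALL db.index.fulltext.queryNodes("namesAndTeams", "lisa") YIELD node, score
RETURN labels(node) AS labels, node.name AS name, node.team AS team, score
```

In this sketch the team column would be null for Lisa, since the property does not exist on her node.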
This statement creates a full-text index named communications
on the message
property for the relationship types REVIEWED
and EMAILED
:
CREATE FULLTEXT INDEX communications FOR ()-[r:REVIEWED|EMAILED]-() ON EACH [r.message]
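As a sketch of the IF NOT EXISTS behavior described earlier in this section, the same statement can be repeated with the flag appended after the index name. This assumes the communications index already exists from the statement above.

```cypher
// With IF NOT EXISTS, repeating the creation does not raise an error
// when an index with this name or schema is already present.
CREATE FULLTEXT INDEX communications IF NOT EXISTS
FOR ()-[r:REVIEWED|EMAILED]-() ON EACH [r.message]
```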
Tokenization and analyzers
Full-text indexes store individual words in a STRING
property.
This is achieved by a tokenizer, which breaks up a stream of characters into individual tokens (usually individual words).
How a STRING
is tokenized is determined by what analyzer the full-text index is configured with.
The default analyzer (standard-no-stop-words
) analyzes both the indexed values and the query string.
Stop words are common words in a language that can be filtered out during information retrieval tasks since they are considered to be of little use when determining the meaning of a string. These words are typically short and frequently used across various contexts. For example, the following stop words are included in Lucene’s
Removing stop words can help reduce the size of stored data and thereby improve the efficiency of data retrieval.
In some cases, using different analyzers for the indexed values and query string is more appropriate.
For example, if handling STRING
values written in Swedish, it may be beneficial to select the swedish
analyzer, which knows how to tokenize Swedish words, and will avoid indexing Swedish stop words.
The db.index.fulltext.listAvailableAnalyzers()
procedure shows all available analyzers.
Neo4j also supports the use of custom analyzers. For more information, see the Java Reference Manual → Full-text index analyzer providers.
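A short sketch of how the available analyzers might be inspected before choosing one. The analyzer and description output columns are used here; the three analyzer names in the filter are the ones mentioned on this page, and the filter itself is only an illustration.

```cypher
// List a few of the analyzers mentioned above, with their descriptions.
CALL db.index.fulltext.listAvailableAnalyzers() YIELD analyzer, description
WHERE analyzer IN ['standard-no-stop-words', 'english', 'swedish']
RETURN analyzer, description
ORDER BY analyzer
```

Dropping the WHERE clause returns the complete list.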
Configuration settings
The CREATE FULLTEXT INDEX
command takes an optional OPTIONS
clause, where the indexConfig
can be specified.
The following statement creates a full-text index using a parameter for nodes with the label Employee
or Manager
.
Creating and dropping indexes using parameters was introduced in Neo4j 5.16.
Parameters:
{
"name": "peerReviews"
}
Creating a full-text index using OPTIONS:
CREATE FULLTEXT INDEX $name FOR (n:Employee|Manager) ON EACH [n.peerReviews]
OPTIONS {
  indexConfig: {
    `fulltext.analyzer`: 'english', // (1)
    `fulltext.eventually_consistent`: true // (2)
  }
}
(1) The fulltext.analyzer setting can be used to configure an index-specific analyzer.
In this case, it is set to the english analyzer.
The possible values for the fulltext.analyzer setting can be listed with the db.index.fulltext.listAvailableAnalyzers procedure.
(2) The fulltext.eventually_consistent setting, if set to true, will put the index in an eventually consistent update mode.
This means that updates will be applied in a background thread "as soon as possible", instead of during a transaction commit, as is the case for other indexes.
For more information on how to configure full-text indexes, refer to the Operations Manual → Indexes to support full-text search.
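Because an eventually consistent index applies updates in the background, a query issued right after a commit may not see the latest changes. The following is a sketch of one way to handle this, assuming the peerReviews index created above and a client such as Cypher Shell that accepts statements separated by semicolons.

```cypher
// Block until pending updates have been applied to all
// eventually consistent full-text indexes.
CALL db.index.fulltext.awaitEventuallyConsistentIndexRefresh();

// The index can now be expected to reflect recently committed changes.
CALL db.index.fulltext.queryNodes("peerReviews", "late") YIELD node, score
RETURN node.name, score
```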
Query full-text indexes
Unlike search-performance indexes, full-text indexes are not automatically used by the Cypher® query planner.
To query a full-text index, use either the db.index.fulltext.queryNodes
or the db.index.fulltext.queryRelationships
procedure.
An index cannot be used while its state is POPULATING, which occurs immediately after it is created.
To check the state of a full-text index, run the SHOW FULLTEXT INDEXES command. The state is either ONLINE (usable) or POPULATING (still being built; the populationPercent column shows the progress of the index creation).
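A sketch of how the population state might be checked before querying. The first statement lists only the full-text indexes that are not yet online. The second uses the db.awaitIndex procedure to wait for one index; the 60-second timeout is an arbitrary value chosen for illustration.

```cypher
// Show full-text indexes that are still being built (or have failed).
SHOW FULLTEXT INDEXES YIELD name, state, populationPercent
WHERE state <> 'ONLINE'
RETURN name, state, populationPercent;

// Wait up to 60 seconds for the namesAndTeams index to come online.
CALL db.awaitIndex("namesAndTeams", 60)
```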
This query uses the db.index.fulltext.queryNodes procedure to look for nils in the previously created full-text index namesAndTeams:
CALL db.index.fulltext.queryNodes("namesAndTeams", "nils") YIELD node, score
RETURN node.name, score
node.name | score |
---|---|
Rows: 2 |
Many full-text index analyzers (including Neo4j’s default analyzer) normalize tokens to lower case. Full-text indexes are therefore case-insensitive by default when used on Neo4j.
The score
column represents how well the index thinks that the entry matches the given query string.
Thus, in addition to any exact matches, full-text indexes return approximate matches to a given query string.
This is possible because both the property values that are indexed, and the queries to the index, are processed through the analyzer such that the index can find data entities which do not exactly match the provided STRING
.
The results are always returned in descending score order, where the best matching result entry is put first.
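Since results arrive in descending score order, the best match can be taken by limiting the output. A minimal sketch, reusing the namesAndTeams index and the nils query string from above:

```cypher
// Results are already sorted by score, so LIMIT 1 keeps the best match.
CALL db.index.fulltext.queryNodes("namesAndTeams", "nils") YIELD node, score
RETURN node.name, score
LIMIT 1
```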
This query uses the db.index.fulltext.queryRelationships procedure to query the previously created communications full-text index for any message containing "meeting":
CALL db.index.fulltext.queryRelationships("communications", "meeting") YIELD relationship, score
RETURN type(relationship), relationship.message, score
type(relationship) | relationship.message | score |
---|---|---|
Rows: 1 |
To only obtain exact matches, quote the STRING
you are searching for:
CALL db.index.fulltext.queryNodes("namesAndTeams", '"Nils-Erik"') YIELD node, score
RETURN node.name, score
node.name | score |
---|---|
Rows: 1 |
Query strings also support the use of the Lucene boolean operators (AND
, OR
, NOT
, +
, -
):
CALL db.index.fulltext.queryNodes("namesAndTeams", 'nils AND kernel') YIELD node, score
RETURN node.name, node.team, score
node.name | node.team | score |
---|---|---|
Rows: 1 |
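The + and - operators can be sketched in the same way. In Lucene query syntax, + marks a term as required and - excludes entries containing the term. Against the example graph, the query below would be expected to match Nils Johansson (team Operations) but not Nils-Erik Karlsson (team Kernel).

```cypher
// Require the token "nils" and exclude any node that contains "kernel".
CALL db.index.fulltext.queryNodes("namesAndTeams", '+nils -kernel') YIELD node, score
RETURN node.name, node.team, score
```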
It is possible to limit the search to specific properties by prefixing the search term with <propertyName>: in the query string.
CALL db.index.fulltext.queryNodes("namesAndTeams", 'team:"Operations"') YIELD node, score
RETURN node.name, node.team, score
node.name | node.team | score |
---|---|---|
Rows: 2 |
A complete description of the Lucene query syntax can be found in the Lucene documentation.
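Property prefixes and boolean operators can be combined in one query string. The following sketch restricts each term to its own property; against the example graph it would be expected to match only Nils Johansson.

```cypher
// "nils" must occur in the name property and "operations" in the team property.
CALL db.index.fulltext.queryNodes("namesAndTeams", 'name:nils AND team:operations') YIELD node, score
RETURN node.name, node.team, score
```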
Lists of STRING
values
If the indexed property contains a list of STRING values, each entry is analyzed independently and all produced tokens are associated with the same property name.
This means that when querying such an indexed node or relationship, there is a match if any of the list elements matches the query string.
For scoring purposes, the full-text index treats it as a single-property value, and the score will represent how close the query is to matching the entire list.
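This query searches the previously created peerReviews full-text index, which covers a list of STRING values, for the term late:
CALL db.index.fulltext.queryNodes('peerReviews', 'late') YIELD node, score
RETURN node.name, node.peerReviews, score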
node.name | node.peerReviews | score |
---|---|---|
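Because the tokens from every list element are stored under the same property name, a query may combine terms that occur in different elements. In the sketch below, "late" and "difficult" appear in two separate peer reviews of Nils-Erik Karlsson, and the node would still be expected to match. This assumes the peerReviews index created above.

```cypher
// Both terms are required, but they may come from different list elements.
CALL db.index.fulltext.queryNodes('peerReviews', 'late AND difficult') YIELD node, score
RETURN node.name, node.peerReviews, score
```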
Show full-text indexes
To list all full-text indexes in a database, use the SHOW FULLTEXT INDEXES
command:
SHOW FULLTEXT INDEXES
id | name | state | populationPercent | type | entityType | labelsOrTypes | properties | indexProvider | owningConstraint | lastRead | readCount |
---|---|---|---|---|---|---|---|---|---|---|---|
4 | "communications" | "ONLINE" | 100.0 | "FULLTEXT" | "RELATIONSHIP" | ["REVIEWED", "EMAILED"] | ["message"] | "fulltext-1.0" | NULL | 2023-10-31T15:06:10.270Z | 2 |
3 | "namesAndTeams" | "ONLINE" | 100.0 | "FULLTEXT" | "NODE" | ["Employee", "Manager"] | ["name", "team"] | "fulltext-1.0" | NULL | 2023-10-31T15:07:48.874Z | 5 |
6 | "peerReviews" | "ONLINE" | 100.0 | "FULLTEXT" | "NODE" | ["Employee", "Manager"] | ["peerReviews"] | "fulltext-1.0" | NULL | 2023-10-31T15:09:05.391Z | 3 |
Similar to search-performance indexes, the SHOW
command can be filtered for particular columns:
SHOW FULLTEXT INDEXES WHERE name CONTAINS "Team"
id | name | state | populationPercent | type | entityType | labelsOrTypes | properties | indexProvider | owningConstraint | lastRead | readCount |
---|---|---|---|---|---|---|---|---|---|---|---|
5 | "namesAndTeams" | "ONLINE" | 100.0 | "FULLTEXT" | "NODE" | ["Employee", "Manager"] | ["name", "team"] | "fulltext-1.0" | NULL | NULL | 0 |
To return full index details, use the YIELD
clause.
For example:
SHOW FULLTEXT INDEXES YIELD *
id | name | state | populationPercent | type | entityType | labelsOrTypes | properties | indexProvider | owningConstraint | lastRead | readCount | trackedSince | options | failureMessage | createStatement |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4 | "communications" | "ONLINE" | 100.0 | "FULLTEXT" | "RELATIONSHIP" | ["REVIEWED", "EMAILED"] | ["message"] | "fulltext-1.0" | NULL | NULL | 0 | 2023-11-01T09:27:57.024Z | {indexConfig: {`fulltext.analyzer`: "standard-no-stop-words", `fulltext.eventually_consistent`: FALSE}, indexProvider: "fulltext-1.0"} | "" | "CREATE FULLTEXT INDEX `communications` FOR ()-[r:`REVIEWED`|`EMAILED`]-() ON EACH [r.`message`] OPTIONS {indexConfig: {`fulltext.analyzer`: 'standard-no-stop-words',`fulltext.eventually_consistent`: false}, indexProvider: 'fulltext-1.0'}" |
5 | "namesAndTeams" | "ONLINE" | 100.0 | "FULLTEXT" | "NODE" | ["Employee", "Manager"] | ["name", "team"] | "fulltext-1.0" | NULL | NULL | 0 | 2023-11-01T12:24:48.002Z | {indexConfig: {`fulltext.analyzer`: "standard-no-stop-words", `fulltext.eventually_consistent`: FALSE}, indexProvider: "fulltext-1.0"} | "" | "CREATE FULLTEXT INDEX `namesAndTeams` FOR (n:`Employee`|`Manager`) ON EACH [n.`name`, n.`team`] OPTIONS {indexConfig: {`fulltext.analyzer`: 'standard-no-stop-words',`fulltext.eventually_consistent`: false}, indexProvider: 'fulltext-1.0'}" |
6 | "peerReviews" | "ONLINE" | 100.0 | "FULLTEXT" | "NODE" | ["Employee", "Manager"] | ["peerReviews"] | "fulltext-1.0" | NULL | NULL | 0 | 2023-11-01T12:25:41.495Z | {indexConfig: {`fulltext.analyzer`: "english", `fulltext.eventually_consistent`: TRUE}, indexProvider: "fulltext-1.0"} | "" | "CREATE FULLTEXT INDEX `peerReviews` FOR (n:`Employee`|`Manager`) ON EACH [n.`peerReviews`] OPTIONS {indexConfig: {`fulltext.analyzer`: 'english',`fulltext.eventually_consistent`: true}, indexProvider: 'fulltext-1.0'}" |
For a full description of all return columns, see Search-performance indexes → Result columns for listing indexes.
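When only a few columns are of interest, YIELD can name them explicitly instead of using *. The following sketch pulls the configured analyzer out of the options map shown in the output above; the analyzer alias is only illustrative.

```cypher
// Return selected columns plus the analyzer stored in each index's configuration.
SHOW FULLTEXT INDEXES
YIELD name, entityType, labelsOrTypes, properties, options
RETURN name, entityType, labelsOrTypes, properties,
       options.indexConfig['fulltext.analyzer'] AS analyzer
ORDER BY name
```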
Drop full-text indexes
A full-text index is dropped by using the same command as for other indexes, DROP INDEX.
In the following example, the previously created communications
full-text index is deleted from the database:
DROP INDEX communications
As of Neo4j 5.16, the index name can also be given as a parameter when dropping an index: DROP INDEX $name
.
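DROP INDEX also accepts an IF EXISTS flag, which can be useful in scripts that may run more than once. A minimal sketch using the communications index from the example above:

```cypher
// No error is raised if the index has already been dropped.
DROP INDEX communications IF EXISTS
```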
List of full-text index procedures
The procedures for full-text indexes are listed in the table below:
Usage | Procedure/Command | Description |
---|---|---|
Eventually consistent indexes. | db.index.fulltext.awaitEventuallyConsistentIndexRefresh | Wait for the updates from recently committed transactions to be applied to any eventually-consistent full-text indexes. |
List available analyzers. | db.index.fulltext.listAvailableAnalyzers | List the available analyzers that the full-text indexes can be configured with. |
Use full-text node index. | db.index.fulltext.queryNodes | Query the given full-text index. Returns the matching nodes and their Lucene query score, ordered by score. |
Use full-text relationship index. | db.index.fulltext.queryRelationships | Query the given full-text index. Returns the matching relationships and their Lucene query score, ordered by score. |
Summary
- Full-text indexes support the indexing of both nodes and relationships.
- Full-text indexes include only property values of types STRING or LIST<STRING>.
- Full-text indexes are accessed via Cypher procedures.
- Full-text indexes return the score for each result from a query.
- Full-text indexes support configuring custom analyzers, including analyzers that are not included with Lucene itself.
- Full-text indexes can be queried using the Lucene query language.
- Full-text indexes are kept up to date automatically, as nodes and relationships are added, removed, and modified.
- Full-text indexes can be checked by the consistency checker, and they can be rebuilt if there is a problem with them.
- Newly created full-text indexes are automatically populated with the existing data in the database.
- Full-text indexes can support any number of properties in a single index.
- Full-text indexes are created, dropped, and updated transactionally, and are automatically replicated throughout a cluster.
- Full-text indexes can be configured to be eventually consistent, in which case index updating is moved from the commit path to a background thread. This makes it possible to keep the slow Lucene writes out of the performance-critical commit process, thus removing the main bottlenecks for Neo4j write performance.