Full-text search index

This chapter describes how to use full-text indexes to enable full-text search.

Full-text indexes are powered by the Apache Lucene indexing and search library, and can be used to index nodes and relationships by string properties. A full-text index allows you to write queries that match within the contents of indexed string properties. By contrast, the b-tree indexes described in previous sections can only do exact or prefix matching on strings. A full-text index instead tokenizes the indexed string values, so it can match terms anywhere within the strings. How the indexed strings are tokenized and broken into terms is determined by the analyzer that the full-text index is configured with. For instance, the swedish analyzer knows how to tokenize and stem Swedish words, and avoids indexing Swedish stop words. The complete list of stop words for each analyzer is included in the result of the db.index.fulltext.listAvailableAnalyzers procedure.
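To make the tokenization idea concrete, here is a toy analyzer written in Python. It is a simplified sketch, not Lucene's implementation: it only lowercases, splits on non-alphanumeric characters, and drops words from a small hypothetical stop-word list (STOP_WORDS is an assumption; the real lists are reported by db.index.fulltext.listAvailableAnalyzers). It does no stemming. Because the same analysis is applied to indexed values and to queries, a term can match anywhere in the string rather than only as a prefix.

```python
import re

# Hypothetical stop-word list for illustration only; real analyzers
# report their own lists through db.index.fulltext.listAvailableAnalyzers.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "with"}


def toy_analyze(text):
    """Lowercase, split on non-alphanumeric characters, drop stop words.

    A simplified stand-in for a Lucene analyzer: no stemming and no
    language-specific rules.
    """
    tokens = re.findall(r"[0-9a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def toy_matches(indexed_value, query):
    # Indexed values and queries go through the same analysis, so a match
    # only requires the two to share at least one term.
    return bool(set(toy_analyze(indexed_value)) & set(toy_analyze(query)))


print(toy_analyze("The Matrix"))            # ['matrix']
print(toy_matches("The Matrix", "matrix"))  # True
```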

Full-text indexes:

  • support the indexing of both nodes and relationships.

  • support configuring custom analyzers, including analyzers that are not included with Lucene itself.

  • can be queried using the Lucene query language.

  • can return the score for each result from a query.

  • are kept up to date automatically, as nodes and relationships are added, removed, and modified.

  • are automatically populated with the existing data in the store when they are created.

  • can be checked by the consistency checker, and they can be rebuilt if there is a problem with them.

  • are a projection of the store, and can only index nodes and relationships by the contents of their properties.

  • can support any number of documents in a single index.

  • are created, dropped, and updated transactionally, and are automatically replicated throughout a cluster.

  • can be accessed via Cypher procedures.

  • can be configured to be eventually consistent, in which case index updating is moved from the commit path to a background thread. This keeps the slow Lucene writes out of the performance-critical commit process, removing one of the main bottlenecks for Neo4j write performance.
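The effect of the eventually consistent mode can be illustrated with a toy model in Python. ToyEventuallyConsistentIndex and its methods commit, await_refresh, and lookup are hypothetical names, and this is not how Neo4j implements the feature. The sketch only mirrors the idea described above: a commit enqueues the index update and returns immediately, a background thread applies updates later, and await_refresh plays a role similar to db.index.fulltext.awaitEventuallyConsistentIndexRefresh by blocking until all queued updates are applied.

```python
import queue
import threading


class ToyEventuallyConsistentIndex:
    """Toy model of an eventually consistent index (not Neo4j's implementation)."""

    def __init__(self):
        self._pending = queue.Queue()  # updates handed off by committing transactions
        self._entries = {}             # entity id -> indexed text
        self._lock = threading.Lock()
        # A single daemon thread applies updates in the order they were committed.
        threading.Thread(target=self._apply_updates, daemon=True).start()

    def commit(self, entity_id, text):
        # The commit path only enqueues the update; it does not wait for the index write.
        self._pending.put((entity_id, text))

    def _apply_updates(self):
        while True:
            entity_id, text = self._pending.get()
            with self._lock:
                self._entries[entity_id] = text  # stand-in for the slow Lucene write
            self._pending.task_done()

    def await_refresh(self):
        # Block until every update enqueued so far has been applied.
        self._pending.join()

    def lookup(self, entity_id):
        # May return stale data (or None) if called before await_refresh.
        with self._lock:
            return self._entries.get(entity_id)


idx = ToyEventuallyConsistentIndex()
idx.commit(1, "The Matrix")
idx.await_refresh()
print(idx.lookup(1))  # The Matrix
```

The trade-off is visible in lookup: between commit and refresh, a reader may not yet see the update, which is exactly what "eventually consistent" permits.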

At first sight, the construction of full-text indexes can seem similar to that of regular indexes. However, there are some things worth noting. In contrast to b-tree indexes, a full-text index can be:

  • applied to more than one label.

  • applied to more than one relationship type.

  • applied to more than one property at a time (similar to a composite index), but with an important difference: while a composite index applies only to entities that have the indexed label and all of the indexed properties, a full-text index indexes entities that have at least one of the indexed labels or relationship types, and at least one of the indexed properties.
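The two inclusion rules can be written as a pair of small predicates. This is a hypothetical Python sketch of the rules described above, assuming an entity is represented by its set of labels (or its relationship type) and a dict of properties. The string check reflects that full-text indexes index string properties; the sketch ignores any other value types.

```python
def composite_index_covers(labels, properties, index_label, index_properties):
    # Composite (b-tree) index: the indexed label AND all of the indexed properties.
    return index_label in labels and all(p in properties for p in index_properties)


def fulltext_index_covers(labels_or_types, properties, index_tokens, index_properties):
    # Full-text index: at least one indexed label/type AND at least one
    # indexed property holding a string value.
    has_token = bool(set(labels_or_types) & set(index_tokens))
    has_string_property = any(isinstance(properties.get(p), str) for p in index_properties)
    return has_token and has_string_property


# A Movie node with one label and only one of the two indexed properties.
matrix = ({"Movie"}, {"title": "The Matrix"})
print(fulltext_index_covers(*matrix, ["Movie", "Book"], ["title", "description"]))  # True
print(composite_index_covers(*matrix, "Movie", ["title", "description"]))          # False
```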

For information on how to configure full-text indexes, refer to Operations Manual → Indexes to support full-text search.

1. Full-text search procedures

Full-text indexes are managed through commands and used through built-in procedures; see Operations Manual → Procedures for a complete reference.

The commands and procedures for full-text indexes are listed in the table below:

Usage Procedure/Command Description

Create full-text node index

CREATE FULLTEXT INDEX …

Create a node full-text index for the given labels and properties. The optional OPTIONS map can be used to supply a provider and settings for the index. Two settings are supported: 'fulltext.analyzer', which specifies the analyzer to use when indexing and querying (use the db.index.fulltext.listAvailableAnalyzers procedure to see the available options), and 'fulltext.eventually_consistent', which can be set to true to make the index eventually consistent, so that updates from committing transactions are applied in a background thread.

Create full-text relationship index

CREATE FULLTEXT INDEX …

Create a relationship full-text index for the given relationship types and properties. The optional OPTIONS map can be used to supply a provider and settings for the index. Two settings are supported: 'fulltext.analyzer', which specifies the analyzer to use when indexing and querying (use the db.index.fulltext.listAvailableAnalyzers procedure to see the available options), and 'fulltext.eventually_consistent', which can be set to true to make the index eventually consistent, so that updates from committing transactions are applied in a background thread.

List available analyzers

db.index.fulltext.listAvailableAnalyzers

List the available analyzers that the full-text indexes can be configured with.

Use full-text node index

db.index.fulltext.queryNodes

Query the given full-text index. Returns the matching nodes and their Lucene query score, ordered by score.

Use full-text relationship index

db.index.fulltext.queryRelationships

Query the given full-text index. Returns the matching relationships and their Lucene query score, ordered by score.

Drop full-text index

DROP INDEX …

Drop the specified index.

Eventually consistent indexes

db.index.fulltext.awaitEventuallyConsistentIndexRefresh

Wait for the updates from recently committed transactions to be applied to any eventually-consistent full-text indexes.

List all full-text indexes

SHOW FULLTEXT INDEXES

List all full-text indexes; see the SHOW INDEXES command for details.

2. Create and configure full-text indexes

Full-text indexes are created with the CREATE FULLTEXT INDEX command. An index can be given a unique name when it is created (otherwise a name is generated), and this name is used to reference the specific index when querying or dropping it. A full-text index applies to a list of labels or a list of relationship types, for node and relationship indexes respectively, and to a list of property names.

Table 1. Syntax for creating fulltext indexes
Command
CREATE FULLTEXT INDEX [index_name] [IF NOT EXISTS]
FOR (n:LabelName[|...])
ON EACH "[" n.propertyName[, ...] "]"
[OPTIONS "{" option: value[, ...] "}"]

Description
Create a fulltext index on nodes.

Command
CREATE FULLTEXT INDEX [index_name] [IF NOT EXISTS]
FOR ()-"["r:TYPE_NAME[|...]"]"-()
ON EACH "[" r.propertyName[, ...] "]"
[OPTIONS "{" option: value[, ...] "}"]

Description
Create a fulltext index on relationships.

Comment (applies to both commands)

Best practice is to give the index a name when it is created. This name is needed for both dropping and querying the index. If the index is not explicitly named, it will get an auto-generated name.

The index name must be unique among all indexes and constraints.

Index provider and configuration can be specified using the OPTIONS clause.

The command is optionally idempotent, with the default behavior to throw an error if you attempt to create the same index twice. With IF NOT EXISTS, no error is thrown and nothing happens should an index with the same name, schema or both already exist. It may still throw an error should a constraint with the same name exist.
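When index definitions are generated from application code, a small helper can assemble the statement following the syntax in Table 1. The function below is a hypothetical Python sketch, not part of any Neo4j driver: it inserts names verbatim, without backtick quoting or validation, so it assumes trusted, well-formed label, relationship type, and property names.

```python
def create_fulltext_index(name, tokens, properties, relationship=False, if_not_exists=False):
    """Build a CREATE FULLTEXT INDEX statement (hypothetical helper).

    tokens are labels for a node index, or relationship types when
    relationship=True. Names are inserted verbatim: no quoting or validation.
    """
    var = "r" if relationship else "n"
    token_list = "|".join(tokens)
    pattern = f"()-[{var}:{token_list}]-()" if relationship else f"({var}:{token_list})"
    props = ", ".join(f"{var}.{p}" for p in properties)
    exists_clause = " IF NOT EXISTS" if if_not_exists else ""
    return f"CREATE FULLTEXT INDEX {name}{exists_clause} FOR {pattern} ON EACH [{props}]"


print(create_fulltext_index("titlesAndDescriptions", ["Movie", "Book"], ["title", "description"]))
# CREATE FULLTEXT INDEX titlesAndDescriptions FOR (n:Movie|Book) ON EACH [n.title, n.description]
```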

For instance, suppose we have a movie with a title:

Query
CREATE (m:Movie {title: "The Matrix"}) RETURN m.title
Table 2. Result
m.title

"The Matrix"

Rows: 1
Nodes created: 1
Properties set: 1
Labels added: 1

Suppose we also have a full-text index on the title and description properties of movies and books:

Query
CREATE FULLTEXT INDEX titlesAndDescriptions FOR (n:Movie|Book) ON EACH [n.title, n.description]

Then our movie node from above is included in the index, even though it has only one of the indexed labels and only one of the indexed properties:

Query
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", "matrix") YIELD node, score
RETURN node.title, node.description, score
Table 3. Result
node.title node.description score

"The Matrix"

<null>

0.7799721956253052

Rows: 1

The same is true for full-text indexes on relationships. Although a relationship can only have one type, a relationship full-text index can index multiple types, and it includes every relationship that has one of the indexed relationship types and at least one of the indexed properties.

The CREATE FULLTEXT INDEX command takes an optional clause called OPTIONS. It has two parts: indexProvider and indexConfig. The provider can only have the default value, 'fulltext-1.0'. The indexConfig is a map from setting names to string or boolean values, and can be used to set index-specific configuration settings. The fulltext.analyzer setting configures an index-specific analyzer; its possible values can be listed with the db.index.fulltext.listAvailableAnalyzers procedure. The fulltext.eventually_consistent setting, if set to true, puts the index in an eventually consistent update mode. This means that updates are applied in a background thread "as soon as possible", instead of during transaction commit like other indexes.

Query
CREATE FULLTEXT INDEX taggedByRelationshipIndex FOR ()-[r:TAGGED_AS]-() ON EACH [r.taggedByUser] OPTIONS {indexConfig: {`fulltext.analyzer`: 'url_or_email', `fulltext.eventually_consistent`: true}}

In this example, an eventually consistent relationship full-text index is created for the TAGGED_AS relationship type and the taggedByUser property, using the url_or_email analyzer. This could, for instance, be a system where people assign tags to documents, and where the index on the taggedByUser property lets them quickly find all of the documents they have tagged. Without a relationship index, one would have to add artificial connective nodes between the tags and the documents in the data model, just so these nodes could be indexed.

Table 4. Result

(empty result)

Rows: 0
Indexes added: 1
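The OPTIONS clause used above can likewise be rendered from Python values. render_options is a hypothetical helper that covers only the two indexConfig settings described in this section; it omits indexProvider, whose only permitted value is the default, and it assumes analyzer names are plain identifiers that need no quote escaping.

```python
def render_options(analyzer=None, eventually_consistent=None):
    """Render an OPTIONS clause for the two indexConfig settings (hypothetical helper)."""
    settings = []
    if analyzer is not None:
        # Analyzer names are assumed to be plain identifiers; no quote escaping is done.
        settings.append(f"`fulltext.analyzer`: '{analyzer}'")
    if eventually_consistent is not None:
        # Render the Python bool as the Cypher literal true/false.
        settings.append(f"`fulltext.eventually_consistent`: {str(eventually_consistent).lower()}")
    if not settings:
        return ""  # no OPTIONS clause: the index defaults apply
    return "OPTIONS {indexConfig: {" + ", ".join(settings) + "}}"


print(render_options("url_or_email", True))
# OPTIONS {indexConfig: {`fulltext.analyzer`: 'url_or_email', `fulltext.eventually_consistent`: true}}
```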

3. Query full-text indexes

Full-text indexes return approximate matches to a given query in addition to any exact matches. Both the indexed property values and the queries to the index are processed through the analyzer, so the index can find values that do not match exactly. The score returned alongside each result entry represents how well the index thinks that entry matches the given query. Results are always returned in descending score order, with the best match first. To illustrate, in the example below we search our movie database for "Full Metal Jacket", and even though there is an exact match as the first result, we also get three other, less relevant results:

Query
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", "Full Metal Jacket") YIELD node, score
RETURN node.title, score
Table 5. Result
node.title score

"Full Metal Jacket"

1.411118507385254

"Full Moon High"

0.44524085521698

"Yellow Jacket"

0.3509605824947357

"The Jacket"

0.3509605824947357

Rows: 4
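Lucene's actual scoring formula is more involved, but the general shape of the ranking can be illustrated with a toy scheme in which each matched query term contributes an inverse-document-frequency weight, so rarer shared terms count for more and partial matches are still returned. The sketch assumes a hypothetical corpus of just five titles (the four results above plus "The Matrix"); with that corpus it happens to produce the same ordering as Table 5, but its scores are illustrative and will not match the values Lucene reports.

```python
import math
import re


def terms(text):
    # Toy analysis shared by documents and queries: lowercase, split on non-alphanumerics.
    return set(re.findall(r"[0-9a-z]+", text.lower()))


def rank(corpus, query):
    """Toy ranking: sum of IDF weights of the query terms each document contains.

    Not Lucene's scoring formula; it only shows why partial matches are
    returned and why rarer shared terms push a result higher.
    """
    n = len(corpus)
    doc_terms = [terms(doc) for doc in corpus]
    doc_freq = {}
    for ts in doc_terms:
        for t in ts:
            doc_freq[t] = doc_freq.get(t, 0) + 1
    query_terms = terms(query)
    results = []
    for doc, ts in zip(corpus, doc_terms):
        matched = query_terms & ts
        if matched:  # sharing any term is enough to be returned
            score = sum(math.log(1 + n / doc_freq[t]) for t in matched)
            results.append((doc, score))
    # Descending score order, best match first (ties keep corpus order).
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical corpus: the four titles from Table 5 plus "The Matrix".
titles = ["The Matrix", "Full Metal Jacket", "Full Moon High", "Yellow Jacket", "The Jacket"]
for title, score in rank(titles, "Full Metal Jacket"):
    print(f"{title}: {score:.3f}")
```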

Full-text indexes are powered by the Apache Lucene indexing and search library. This means that we can use Lucene’s full-text query language to express what we wish to search for. For instance, if we are only interested in exact matches, then we can quote the string we are searching for.

Query
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", '"Full Metal Jacket"') YIELD node, score
RETURN node.title, score

When we put "Full Metal Jacket" in quotes, Lucene only returns entries that contain that exact phrase.

Table 6. Result
node.title score

"Full Metal Jacket"

1.411118507385254

Rows: 1
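Because the query string is interpreted with Lucene's query syntax, user input containing characters such as quotes, colons, or parentheses can change the meaning of a query or fail to parse. Below is a hypothetical Python sketch of two helpers, assuming Lucene's classic query syntax: escape_lucene backslash-escapes the characters that syntax treats as special, and phrase_query wraps text in double quotes for a phrase search. The resulting string is best passed to the procedure as a Cypher parameter, for example CALL db.index.fulltext.queryNodes("titlesAndDescriptions", $query), rather than concatenated into the Cypher text.

```python
# Characters with special meaning in Lucene's classic query syntax.
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape_lucene(text):
    """Backslash-escape Lucene special characters so they are matched literally.

    Whitespace is kept, so words remain separate terms; the bare words
    AND, OR and NOT are not neutralized by this sketch.
    """
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)


def phrase_query(text):
    """Quote text as one phrase; inside quotes only backslash and double quote must be escaped."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


print(escape_lucene("AC/DC: live!"))      # AC\/DC\: live\!
print(phrase_query("Full Metal Jacket"))  # "Full Metal Jacket"
```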

Lucene also allows us to use logical operators, such as AND and OR, to search for terms:

Query
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", 'full AND metal') YIELD node, score
RETURN node.title, score

Only the Full Metal Jacket movie in our database has both the words full and metal.

Table 7. Result
node.title score

"Full Metal Jacket"

1.1113792657852173

Rows: 1
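Boolean clauses can also be assembled programmatically. combine is a hypothetical Python helper that joins clauses, assumed to be already escaped, with AND or OR and wraps the result in parentheses so that it can be nested. It insists on uppercase operators, since Lucene's classic query parser reads a lowercase "and" as an ordinary search term.

```python
def combine(clauses, operator="AND"):
    """Join already-escaped Lucene clauses with AND/OR, grouped in parentheses."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not clauses:
        raise ValueError("at least one clause is required")
    return "(" + f" {operator} ".join(clauses) + ")"


print(combine(["full", "metal"]))                             # (full AND metal)
print(combine([combine(["full", "metal"]), "jacket"], "OR"))  # ((full AND metal) OR jacket)
```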

It is also possible to search within a specific property only, by putting the property name and a colon in front of the text being searched for.

Query
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", 'description:"surreal adventure"') YIELD node, score
RETURN node.title, node.description, score
Table 8. Result
node.title node.description score

"Metallica Through The Never"

"The movie follows the young roadie Trip through his surreal adventure with the band."

0.2615291476249695

Rows: 1
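A field-qualified phrase query like the one above can be built with a small hypothetical Python helper. It assumes the field is a plain property name that needs no escaping, and escapes only backslashes and double quotes inside the phrase, which are the only characters that must be escaped between quotes in Lucene's classic query syntax.

```python
def field_phrase_query(field, text):
    """Restrict a phrase search to one indexed property, as in description:"...".

    field is assumed to be a plain property name that needs no escaping.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


print(field_phrase_query("description", "surreal adventure"))  # description:"surreal adventure"
```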

A complete description of the Lucene query syntax can be found in the Lucene documentation.

4. Drop full-text indexes

A full-text index is dropped using the same command as other indexes, DROP INDEX.

In the following example, we will drop the taggedByRelationshipIndex that we created previously:

Query
DROP INDEX taggedByRelationshipIndex
Table 9. Result

(empty result)

Rows: 0
Indexes removed: 1