Indexes for full-text search

1. Introduction

Full-text indexes are powered by the Apache Lucene indexing and search library, and can be used to index nodes and relationships by string properties. A full-text index allows you to write queries that match within the contents of indexed string properties. By contrast, the btree indexes described in previous sections can only do exact matching or prefix matching on strings. A full-text index instead tokenizes the indexed string values, so it can match terms anywhere within the strings. How the indexed strings are tokenized and broken into terms is determined by the analyzer that the full-text index is configured with. For instance, the swedish analyzer knows how to tokenize and stem Swedish words, and will avoid indexing Swedish stop words. The complete list of stop words for each analyzer is included in the result of the db.index.fulltext.listAvailableAnalyzers procedure.
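As a minimal sketch, the analyzers can be listed by calling the procedure on its own. No YIELD clause is used here, because the exact set of result columns may differ between Neo4j versions; a standalone call returns all of them.

```cypher
// List every analyzer that a full-text index can be configured with.
// Called standalone, so all result columns of the procedure are returned.
CALL db.index.fulltext.listAvailableAnalyzers()
```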

Full-text indexes:

  • support the indexing of both nodes and relationships.

  • support configuring custom analyzers, including analyzers that are not included with Lucene itself.

  • can be queried using the Lucene query language.

  • can return the score for each result from a query.

  • are kept up to date automatically, as nodes and relationships are added, removed, and modified.

  • will automatically populate newly created indexes with the existing data in a store.

  • can be checked by the consistency checker, and they can be rebuilt if there is a problem with them.

  • are a projection of the store, and can only index nodes and relationships by the contents of their properties.

  • can support any number of documents in a single index.

  • are created, dropped, and updated transactionally, and are automatically replicated throughout a cluster.

  • can be accessed via Cypher procedures.

  • can be configured to be eventually consistent, in which case index updating is moved from the commit path to a background thread. This keeps the slow Lucene writes out of the performance-critical commit process, removing one of the main bottlenecks for Neo4j write performance.

At first sight, the construction of full-text indexes can seem similar to regular indexes. However, there are some differences worth noting. In contrast to btree indexes, a full-text index

  • can be applied to more than one label.

  • can be applied to relationship types (one or more).

  • can be applied to more than one property at a time (similar to a composite index), but with an important difference: while a composite index applies only to entities that match the indexed label and all of the indexed properties, a full-text index will index entities that have at least one of the indexed labels or relationship types, and at least one of the indexed properties.

For information on how to configure full-text indexes, refer to Operations Manual → Indexes to support full-text search.

2. Procedures to manage full-text indexes

Full-text indexes are managed through built-in procedures. The most common procedures are listed in the table below:

Usage | Procedure | Description

Create full-text node index | db.index.fulltext.createNodeIndex | Create a node full-text index for the given labels and properties. The optional 'config' map parameter can be used to supply settings to the index. Supported settings are 'analyzer', which specifies the analyzer to use when indexing and querying (use the db.index.fulltext.listAvailableAnalyzers procedure to see what options are available), and 'eventually_consistent', which can be set to 'true' to make the index eventually consistent, such that updates from committing transactions are applied in a background thread.

Create full-text relationship index | db.index.fulltext.createRelationshipIndex | Create a relationship full-text index for the given relationship types and properties. The optional 'config' map parameter can be used to supply settings to the index. Supported settings are 'analyzer', which specifies the analyzer to use when indexing and querying (use the db.index.fulltext.listAvailableAnalyzers procedure to see what options are available), and 'eventually_consistent', which can be set to 'true' to make the index eventually consistent, such that updates from committing transactions are applied in a background thread.

List available analyzers | db.index.fulltext.listAvailableAnalyzers | List the available analyzers that the full-text indexes can be configured with.

Use full-text node index | db.index.fulltext.queryNodes | Query the given full-text index. Returns the matching nodes and their Lucene query score, ordered by score.

Use full-text relationship index | db.index.fulltext.queryRelationships | Query the given full-text index. Returns the matching relationships and their Lucene query score, ordered by score.

Drop full-text index | db.index.fulltext.drop | Drop the specified index.

Eventually consistent indexes | db.index.fulltext.awaitEventuallyConsistentIndexRefresh | Wait for the updates from recently committed transactions to be applied to any eventually-consistent full-text indexes.

3. Create and configure full-text indexes

Full-text indexes are created with the db.index.fulltext.createNodeIndex and db.index.fulltext.createRelationshipIndex procedures. An index must be given a unique name when created, which is used to reference the specific index when querying or dropping it. A full-text index applies to a list of labels or a list of relationship types, for node and relationship indexes respectively, and then a list of property names.

For instance, suppose we have a movie with a title:

Query
CREATE (m:Movie { title: "The Matrix" })
RETURN m.title
Table 1. Result
m.title

"The Matrix"

1 row, Nodes created: 1
Properties set: 1
Labels added: 1

Suppose we then create a full-text index on the title and description properties of movies and books:

Query
CALL db.index.fulltext.createNodeIndex("titlesAndDescriptions",["Movie", "Book"],["title", "description"])

Then our movie node from above will be included in the index, even though it only has one of the indexed labels, and only one of the indexed properties:

Query
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", "matrix") YIELD node, score
RETURN node.title, node.description, score
Table 2. Result
node.title | node.description | score
"The Matrix" | <null> | 0.7799721956253052
1 row

The same is true for full-text indexes on relationships. Although a relationship can only have one type, a relationship full-text index can cover multiple types, and it will include every relationship that has one of the indexed relationship types and at least one of the indexed properties.
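A relationship index over several types could be sketched as follows. The index name, the relationship types REVIEWED and COMMENTED_ON, and the properties summary and text are all hypothetical and chosen only for illustration.

```cypher
// Hypothetical index name, relationship types and properties.
// A relationship is indexed if it has one of the two types
// and at least one of the two properties.
CALL db.index.fulltext.createRelationshipIndex("reviewTexts", ["REVIEWED", "COMMENTED_ON"], ["summary", "text"]);

// Run as a separate query, after the index has been created and populated.
// The search term "brilliant" is a made-up example.
CALL db.index.fulltext.queryRelationships("reviewTexts", "brilliant") YIELD relationship, score
RETURN type(relationship), relationship.summary, relationship.text, score
```

Returning type(relationship) makes it visible which of the indexed types each match belongs to.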

The db.index.fulltext.createNodeIndex and db.index.fulltext.createRelationshipIndex procedures take an optional fourth argument, called config. The config parameter is a map from string to string, and can be used to set index-specific configuration settings. The analyzer setting can be used to configure an index-specific analyzer. The possible values for the analyzer setting can be listed with the db.index.fulltext.listAvailableAnalyzers procedure. The eventually_consistent setting, if set to "true", will put the index in an eventually consistent update mode. This means that updates will be applied in a background thread "as soon as possible", instead of during transaction commit as with other indexes.

Query
CALL db.index.fulltext.createRelationshipIndex("taggedByRelationshipIndex",["TAGGED_AS"],["taggedByUser"], { analyzer: "url_or_email", eventually_consistent: "true" })

In this example, an eventually consistent relationship full-text index is created for the TAGGED_AS relationship type, and the taggedByUser property, and the index uses the url_or_email analyzer. This could, for instance, be a system where people are assigning tags to documents, and where the index on the taggedByUser property will allow them to quickly find all of the documents they have tagged. Had it not been for the relationship index, one would have had to add artificial connective nodes between the tags and the documents in the data model, just so these nodes could be indexed.

Table 3. Result

(empty result)

0 rows
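Because an eventually consistent index is updated in the background, a query issued right after a commit may not yet see the latest changes. When that matters, the awaitEventuallyConsistentIndexRefresh procedure from the table in section 2 can be called first. The following is a sketch only; the search string is a made-up value.

```cypher
// Wait until pending updates have been applied to the
// eventually consistent full-text indexes.
CALL db.index.fulltext.awaitEventuallyConsistentIndexRefresh();

// Run afterwards as a separate query.
// "alice@example.com" is a hypothetical taggedByUser value.
CALL db.index.fulltext.queryRelationships("taggedByRelationshipIndex", "alice@example.com") YIELD relationship, score
RETURN relationship.taggedByUser, score
```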

4. Query full-text indexes

Full-text indexes will, in addition to any exact matches, also return approximate matches to a given query. Both the property values that are indexed and the queries to the index are processed through the analyzer, so that the index can find results that do not match exactly. The score that is returned alongside each result entry represents how well the index thinks that entry matches the given query. The results are always returned in descending score order, with the best matching result entry first. To illustrate, in the example below we search our movie database for "Full Metal Jacket". Even though the first result is an exact match, we also get three other, less relevant results:

Query
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", "Full Metal Jacket") YIELD node, score
RETURN node.title, score
Table 4. Result
node.title | score
"Full Metal Jacket" | 1.411118507385254
"Full Moon High" | 0.44524085521698
"Yellow Jacket" | 0.3509605824947357
"The Jacket" | 0.3509605824947357
4 rows
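Since the score is yielded as an ordinary column, weaker matches can be filtered out in Cypher itself. The sketch below uses an arbitrary threshold of 1.0 and an arbitrary limit of 10. Scores are not normalized, so a suitable threshold depends on the index contents and the query.

```cypher
// The 1.0 threshold and the limit of 10 are arbitrary values chosen for illustration.
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", "Full Metal Jacket") YIELD node, score
WHERE score > 1.0
RETURN node.title, score
LIMIT 10
```

With the scores shown in the table above, only "Full Metal Jacket" exceeds the threshold.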

Full-text indexes are powered by the Apache Lucene indexing and search library. This means that we can use Lucene’s full-text query language to express what we wish to search for. For instance, if we are only interested in exact matches, then we can quote the string we are searching for.

Query
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", "\"Full Metal Jacket\"") YIELD node, score
RETURN node.title, score

When we put "Full Metal Jacket" in quotes, Lucene only gives us exact matches.

Table 5. Result
node.title | score
"Full Metal Jacket" | 1.411118507385254
1 row

Lucene also allows us to use logical operators, such as AND and OR, to search for terms:

Query
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", 'full AND metal') YIELD node, score
RETURN node.title, score

Only the "Full Metal Jacket" movie in our database has both the words "full" and "metal".

Table 6. Result
node.title | score
"Full Metal Jacket" | 1.1113792657852173
1 row

It is also possible to restrict the search to a specific property, by putting the property name and a colon in front of the text being searched for.

Query
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", 'description:"surreal adventure"') YIELD node, score
RETURN node.title, node.description, score
Table 7. Result
node.title | node.description | score
"Metallica Through The Never" | "The movie follows the young roadie Trip through his surreal adventure with the band." | 0.2615291476249695
1 row

A complete description of the Lucene query syntax can be found in the Lucene documentation.
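As a further sketch of that syntax, Lucene's query language also has wildcard and fuzzy operators. The two queries below assume the index parses queries with Lucene's classic query syntax. The search terms are made up, and the actual matches and scores depend on the data and on the configured analyzer.

```cypher
// Wildcard (prefix) search: terms in the title property that start with "metal".
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", 'title:metal*') YIELD node, score
RETURN node.title, score;

// Fuzzy search: terms that are similar to the misspelled "jaket".
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", 'jaket~') YIELD node, score
RETURN node.title, score
```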

5. Drop full-text indexes

A full-text index, whether on nodes or on relationships, is dropped by using the procedure db.index.fulltext.drop.

In the following example, we will drop the taggedByRelationshipIndex that we created previously:

Query
CALL db.index.fulltext.drop("taggedByRelationshipIndex")
Table 8. Result

(empty result)

0 rows