Indexes for full-text search
Introduction
Full-text indexes are powered by the Apache Lucene indexing and search library, and can be used to index nodes and relationships by string properties.
A full-text index allows you to write queries that match within the contents of indexed string properties.
For instance, the btree indexes described in previous sections can only do exact matching or prefix matches on strings.
A full-text index will instead tokenize the indexed string values, so it can match terms anywhere within the strings.
How the indexed strings are tokenized and broken into terms is determined by the analyzer that the full-text index is configured with.
For instance, the swedish analyzer knows how to tokenize and stem Swedish words, and will avoid indexing Swedish stop words.
The complete list of stop words for each analyzer is included in the result of the db.index.fulltext.listAvailableAnalyzers procedure.
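As a minimal sketch, the analyzers and their stop words could be inspected as shown below. The yielded column names (analyzer, description, stopwords) are assumed from the procedure signature in Neo4j 4.x and may differ in other versions.

```cypher
// List each available analyzer with its description and the number of stop words it uses.
// The column names are an assumption; verify them against the procedure signature in your version.
CALL db.index.fulltext.listAvailableAnalyzers() YIELD analyzer, description, stopwords
RETURN analyzer, description, size(stopwords) AS stopWordCount
ORDER BY analyzer
```

Returning stopwords itself instead of its size shows the full stop word list for each analyzer.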
Full-text indexes:
- support the indexing of both nodes and relationships.
- support configuring custom analyzers, including analyzers that are not included with Lucene itself.
- can be queried using the Lucene query language.
- can return the score for each result from a query.
- are kept up to date automatically, as nodes and relationships are added, removed, and modified.
- will automatically populate newly created indexes with the existing data in a store.
- can be checked by the consistency checker, and they can be rebuilt if there is a problem with them.
- are a projection of the store, and can only index nodes and relationships by the contents of their properties.
- can support any number of documents in a single index.
- are created, dropped, and updated transactionally, and are automatically replicated throughout a cluster.
- can be accessed via Cypher® procedures.
- can be configured to be eventually consistent, in which case index updating is moved from the commit path to a background thread. This keeps slow Lucene writes out of the performance-critical commit process, removing one of the main bottlenecks for Neo4j write performance.
At first sight, the construction of full-text indexes can seem similar to regular indexes. However, there are some notable differences. In contrast to btree indexes, a full-text index:
- can be applied to more than one label.
- can be applied to one or more relationship types.
- can be applied to more than one property at a time (similar to a composite index), but with an important difference: while a composite index applies only to entities that match the indexed label and all of the indexed properties, a full-text index will index entities that have at least one of the indexed labels or relationship types, and at least one of the indexed properties.
For information on how to configure full-text indexes, refer to Operations Manual → Indexes to support full-text search.
Procedures to manage full-text indexes
Full-text indexes are managed through built-in procedures; see Operations Manual → Procedures for a complete reference.
The procedures for managing full-text indexes are listed in the table below:
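Usage | Procedure | Description |
---|---|---|
Create full-text node index | db.index.fulltext.createNodeIndex | Create a node full-text index for the given labels and properties. The optional 'config' map parameter can be used to supply settings to the index. Supported settings are 'analyzer', for specifying what analyzer to use when indexing and querying (use the db.index.fulltext.listAvailableAnalyzers procedure to see the available options), and 'eventually_consistent', which can be set to 'true' to apply index updates in a background thread. |
Create full-text relationship index | db.index.fulltext.createRelationshipIndex | Create a relationship full-text index for the given relationship types and properties. The optional 'config' map parameter can be used to supply settings to the index. Supported settings are 'analyzer', for specifying what analyzer to use when indexing and querying (use the db.index.fulltext.listAvailableAnalyzers procedure to see the available options), and 'eventually_consistent', which can be set to 'true' to apply index updates in a background thread. |
List available analyzers | db.index.fulltext.listAvailableAnalyzers | List the available analyzers that the full-text indexes can be configured with. |
Use full-text node index | db.index.fulltext.queryNodes | Query the given full-text index. Returns the matching nodes and their Lucene query score, ordered by score. |
Use full-text relationship index | db.index.fulltext.queryRelationships | Query the given full-text index. Returns the matching relationships and their Lucene query score, ordered by score. |
Drop full-text index | db.index.fulltext.drop | Drop the specified index. |
Eventually consistent indexes | db.index.fulltext.awaitEventuallyConsistentIndexRefresh | Wait for the updates from recently committed transactions to be applied to any eventually-consistent full-text indexes. |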
Create and configure full-text indexes
Full-text indexes are created with the db.index.fulltext.createNodeIndex and db.index.fulltext.createRelationshipIndex procedures.
An index must be given a unique name when created, which is used to reference the specific index when querying or dropping it.
A full-text index applies to a list of labels or a list of relationship types, for node and relationship indexes respectively, and then a list of property names.
For instance, suppose we have a movie with a title:
CREATE (m:Movie {title: "The Matrix"}) RETURN m.title
m.title |
---|
"The Matrix" |
Rows: 1 |
And we have a full-text index on the title and description properties of movies and books:
CALL db.index.fulltext.createNodeIndex("titlesAndDescriptions", ["Movie", "Book"], ["title", "description"])
Then our movie node from above will be included in the index, even though it only has one of the indexed labels, and only one of the indexed properties:
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", "matrix") YIELD node, score
RETURN node.title, node.description, score
node.title | node.description | score |
---|---|---|
|
|
|
Rows: 1 |
The same is true for full-text indexes on relationships. Although a relationship can only have one type, a relationship full-text index can cover multiple types. It includes every relationship that has one of the indexed relationship types and at least one of the indexed properties.
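The following sketch illustrates this for relationships. The index name, relationship types, property names, and data are hypothetical and chosen only for illustration. The statements are assumed to run as separate transactions, since index creation and data writes cannot share one.

```cypher
// Hypothetical relationship full-text index covering two types and two properties.
CALL db.index.fulltext.createRelationshipIndex("reviewText", ["REVIEWED", "COMMENTED_ON"], ["summary", "text"]);

// This relationship has one of the indexed types and only one of the indexed properties,
// so it is still included in the index.
CREATE (:Person {name: "Alice"})-[:REVIEWED {summary: "A surreal adventure"}]->(:Movie {title: "The Matrix"});

// Query the relationship index; the procedure yields each matching relationship and its score.
CALL db.index.fulltext.queryRelationships("reviewText", "surreal") YIELD relationship, score
RETURN type(relationship) AS type, relationship.summary AS summary, score;
```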
The db.index.fulltext.createNodeIndex and db.index.fulltext.createRelationshipIndex procedures take an optional fourth argument, called config.
The config parameter is a map from string to string, and can be used to set index-specific configuration settings.
The analyzer setting can be used to configure an index-specific analyzer.
The possible values for the analyzer setting can be listed with the db.index.fulltext.listAvailableAnalyzers procedure.
The eventually_consistent setting, if set to "true", will put the index in an eventually consistent update mode.
This means that updates will be applied in a background thread "as soon as possible", instead of during transaction commit like other indexes.
CALL db.index.fulltext.createRelationshipIndex("taggedByRelationshipIndex", ["TAGGED_AS"], ["taggedByUser"], {analyzer: "url_or_email", eventually_consistent: "true"})
In this example, an eventually consistent relationship full-text index is created for the TAGGED_AS relationship type and the taggedByUser property, and the index uses the url_or_email analyzer.
This could, for instance, be a system where people are assigning tags to documents, and where the index on the taggedByUser property will allow them to quickly find all of the documents they have tagged.
Had it not been for the relationship index, one would have had to add artificial connective nodes between the tags and the documents in the data model, just so these nodes could be indexed.
|
Rows: 0 |
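Because this index is eventually consistent, a query issued right after a write may not see the change yet. The sketch below waits for pending updates before querying. The procedure name db.index.fulltext.awaitEventuallyConsistentIndexRefresh is the one found in Neo4j 4.x and should be verified against your version. The nodes and property values are hypothetical.

```cypher
// Hypothetical data: a tag relationship carrying the indexed taggedByUser property.
CREATE (:Document {name: "report.pdf"})-[:TAGGED_AS {taggedByUser: "alice@example.com"}]->(:Tag {name: "finance"});

// Block until updates from committed transactions have been applied
// to the eventually consistent full-text indexes.
CALL db.index.fulltext.awaitEventuallyConsistentIndexRefresh();

// The new relationship should now be visible to queries against the index.
CALL db.index.fulltext.queryRelationships("taggedByRelationshipIndex", "alice@example.com") YIELD relationship, score
RETURN relationship.taggedByUser, score;
```

Waiting like this reintroduces latency on the reading side, so it is mostly useful in tests or in workflows that must read their own writes.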
Query full-text indexes
Full-text indexes will, in addition to any exact matches, also return approximate matches to a given query.
Both the property values that are indexed, and the queries to the index, are processed through the analyzer so that the index can find matches that are not exact.
The score that is returned alongside each result entry represents how well the index thinks that entry matches the given query.
The results are always returned in descending score order, where the best matching result entry is put first.
To illustrate, in the example below, we search our movie database for "Full Metal Jacket", and even though there is an exact match as the first result, we also get three other less interesting results:
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", "Full Metal Jacket") YIELD node, score
RETURN node.title, score
node.title | score |
---|---|
|
|
|
|
|
|
|
|
Rows: 4 |
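When only the best match is of interest, the score ordering can be combined with ordinary Cypher clauses. The following is a sketch against the same titlesAndDescriptions index. The explicit ORDER BY is redundant with the procedure's own ordering and is included only to make the intent clear.

```cypher
// Keep only the single best-scoring match for the search string.
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", "Full Metal Jacket") YIELD node, score
RETURN node.title, score
ORDER BY score DESC
LIMIT 1
```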
Full-text indexes are powered by the Apache Lucene indexing and search library. This means that we can use Lucene’s full-text query language to express what we wish to search for. For instance, if we are only interested in exact matches, then we can quote the string we are searching for.
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", '"Full Metal Jacket"') YIELD node, score
RETURN node.title, score
When we put "Full Metal Jacket" in quotes, Lucene only gives us exact matches.
node.title | score |
---|---|
|
|
Rows: 1 |
Lucene also allows us to use logical operators, such as AND and OR, to search for terms:
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", 'full AND metal') YIELD node, score
RETURN node.title, score
Only the Full Metal Jacket movie in our database has both the words full and metal.
node.title | score |
---|---|
|
|
Rows: 1 |
It is also possible to restrict a search to specific properties, by putting the property name and a colon in front of the text being searched for.
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", 'description:"surreal adventure"') YIELD node, score
RETURN node.title, node.description, score
node.title | node.description | score |
---|---|---|
|
|
|
Rows: 1 |
A complete description of the Lucene query syntax can be found in the Lucene documentation.
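Two further operators from the Lucene query syntax are sketched below: a trailing wildcard and a fuzzy match. Both examples assume the titlesAndDescriptions index from above, the default analyzer, and that the index is queried through Lucene's classic query parser. Results depend on the data in the database.

```cypher
// Trailing wildcard: matches terms in the title property that start with "matr".
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", 'title:matr*') YIELD node, score
RETURN node.title, score;

// Fuzzy match: the trailing ~ tolerates small spelling differences,
// such as "jaket" in place of "jacket".
CALL db.index.fulltext.queryNodes("titlesAndDescriptions", 'jaket~') YIELD node, score
RETURN node.title, score;
```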
Drop full-text indexes
A full-text index, whether on nodes or relationships, is dropped by using the db.index.fulltext.drop procedure.
In the following example, we will drop the taggedByRelationshipIndex that we created previously:
CALL db.index.fulltext.drop("taggedByRelationshipIndex")
|
Rows: 0 |