11.2.3. Fulltext schema indexes

This section describes how to enable fulltext search by using fulltext schema indexes.

This section describes the following:

  • Introduction
  • Configuration
  • Deprecation of explicit indexes

11.2.3.1. Introduction

Fulltext schema indexes are powered by the Apache Lucene indexing and search library. A fulltext schema index enables you to write queries that match within the contents of indexed string properties. A full description of how to create and use fulltext schema indexes is provided in the Cypher Manual → Fulltext schema index.

11.2.3.2. Configuration

The following options are available for configuring fulltext schema indexes:

dbms.index.fulltext.default_analyzer
The name of the analyzer that fulltext schema indexes use by default. This setting only takes effect when a fulltext schema index is created, and the value is then remembered as an index-specific setting. The list of available analyzers can be retrieved with the db.index.fulltext.listAvailableAnalyzers() Cypher procedure. Unless otherwise specified, the default analyzer is standard, which is the same as the StandardAnalyzer from Lucene.
dbms.index.fulltext.eventually_consistent
Whether fulltext schema indexes should be eventually consistent. This setting only takes effect when a fulltext schema index is created, and the value is then remembered as an index-specific setting. Schema indexes are normally fully consistent: committing a transaction does not return until both the store and the indexes have been updated. Eventually consistent fulltext schema indexes, on the other hand, are not updated as part of the commit. Instead, their updates are queued and applied in a background thread. This means that there can be a short delay between committing a change and that change becoming visible through any eventually consistent fulltext schema index. This delay is only an artifact of the queueing, and is usually quite small, since eventually consistent indexes are updated "as soon as possible". By default, this setting is turned off, and fulltext schema indexes are fully consistent.
dbms.index.fulltext.eventually_consistent_index_update_queue_max_length
Eventually consistent fulltext schema indexes have their updates queued and applied in a background thread, and this setting determines the maximum size of that update queue. If the maximum queue size is reached, committing transactions block until there is room in the queue before adding more updates to it. This setting applies to all eventually consistent fulltext schema indexes, which all share the same queue. Due to heap space usage considerations, the maximum queue length must be at least 1 and at most 50 million index updates. The default maximum queue length is 10,000 index updates.
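The queueing behavior described by the two eventually consistent settings can be illustrated with a small Python model. This is a sketch of the general technique (a bounded queue drained by a background thread), not Neo4j's implementation. The class `EventuallyConsistentIndex`, its methods, and the in-memory dict standing in for the Lucene index are all hypothetical; only the queue length bounds (at least 1, at most 50 million, default 10,000) come from the setting descriptions above.

```python
import queue
import threading

# Bounds from the setting description; everything else here is hypothetical.
MIN_QUEUE_LENGTH = 1
MAX_QUEUE_LENGTH = 50_000_000
DEFAULT_QUEUE_LENGTH = 10_000


class EventuallyConsistentIndex:
    """Toy model: commits enqueue updates, a background thread applies them."""

    def __init__(self, max_length=DEFAULT_QUEUE_LENGTH):
        if not MIN_QUEUE_LENGTH <= max_length <= MAX_QUEUE_LENGTH:
            raise ValueError(
                f"queue length must be between {MIN_QUEUE_LENGTH} and {MAX_QUEUE_LENGTH}"
            )
        # put() blocks while the queue holds max_length updates, mirroring
        # committing transactions that wait for room in the queue.
        self._updates = queue.Queue(maxsize=max_length)
        self._documents = {}  # node id -> text; stands in for the Lucene index
        self._lock = threading.Lock()
        threading.Thread(target=self._apply_updates, daemon=True).start()

    def commit(self, node_id, text):
        # Returns as soon as the update is queued, not when the index is updated.
        self._updates.put((node_id, text))

    def _apply_updates(self):
        # Background thread: apply queued updates "as soon as possible".
        while True:
            node_id, text = self._updates.get()
            with self._lock:
                self._documents[node_id] = text
            self._updates.task_done()

    def query(self, word):
        # May miss very recent commits whose updates are still queued.
        with self._lock:
            return sorted(n for n, t in self._documents.items() if word in t.split())

    def wait_until_applied(self):
        # Blocks until every queued update has been applied.
        self._updates.join()


if __name__ == "__main__":
    index = EventuallyConsistentIndex(max_length=2)
    index.commit(1, "the matrix")
    index.commit(2, "the matrix reloaded")
    index.wait_until_applied()
    print(index.query("matrix"))  # [1, 2]
```

A query issued right after `commit` may not yet see the change, which is the short visibility delay described above; a small `max_length` makes `commit` block sooner, which is the back-pressure the queue length setting controls.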

When Neo4j is deployed in a Causal Cluster, it is recommended that all cluster members have identical dbms.index.fulltext.* settings in their neo4j.conf files. This ensures that the indexes always behave predictably when the cluster switches leader or when members perform store copies.
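One way to follow this recommendation is to compare the fulltext settings across the members' configuration files before rolling out a change. The following Python sketch assumes the usual neo4j.conf layout of `key=value` lines with `#` comments; the function names and member names are hypothetical, and how the file contents are collected from each member is left out.

```python
def fulltext_settings(conf_text):
    """Return the dbms.index.fulltext.* settings found in neo4j.conf text."""
    settings = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines that are not key=value pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("dbms.index.fulltext."):
            settings[key] = value.strip()
    return settings


def mismatched_fulltext_settings(confs_by_member):
    """Return the fulltext setting names whose values differ between members.

    A setting that is set on some members but missing on others is reported
    too, even if the explicit value equals the default; this keeps the check
    conservative.
    """
    per_member = {m: fulltext_settings(t) for m, t in confs_by_member.items()}
    all_keys = set()
    for settings in per_member.values():
        all_keys.update(settings)
    mismatched = set()
    for key in all_keys:
        values = {settings.get(key) for settings in per_member.values()}
        if len(values) > 1:
            mismatched.add(key)
    return sorted(mismatched)


if __name__ == "__main__":
    confs = {
        "core1": "dbms.index.fulltext.eventually_consistent=true\n",
        "core2": "dbms.index.fulltext.eventually_consistent=false\n",
        "core3": "dbms.index.fulltext.eventually_consistent=true\n",
    }
    print(mismatched_fulltext_settings(confs))
    # ['dbms.index.fulltext.eventually_consistent']
```

An empty result means every member would create new fulltext schema indexes with the same defaults, regardless of which member is leader at the time.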

11.2.3.3. Deprecation of explicit indexes

Fulltext indexing was previously supported in Neo4j through the now deprecated explicit indexes, but with some limitations that fulltext schema indexes solve. This section outlines some of the similarities and differences between the two fulltext indexing implementations:

  • Both schema and explicit fulltext indexes support indexing of both nodes and relationships.
  • Both schema and explicit fulltext indexes support configuring custom analyzers, including analyzers that are not included with Lucene itself.
  • Both schema and explicit fulltext indexes can be queried using the Lucene query language.
  • Both schema and explicit fulltext indexes can return the score for each result from a query.
  • The fulltext schema indexes are kept up to date automatically as nodes and relationships are added, removed, and modified. The explicit auto indexes can do this as well, except that they can get confused by ID and space reuse, and consequently produce wrong query results. This is not a problem for the new fulltext schema indexes.
  • Newly created fulltext schema indexes are automatically populated with the existing data in the store. The explicit auto indexes do no such thing when they are enabled, and they miss any updates that occur while they are temporarily disabled or misconfigured.
  • The fulltext schema indexes can be checked by the consistency checker, and they can be rebuilt if there is a problem with them. The explicit indexes are ignored by the consistency checker, and they cannot be automatically rebuilt if they develop any issues.
  • The explicit indexes can be used to index by keys and values that are not actually in the store. For instance, you can index a node by the contents of a book without assigning that text to the node as a property value. The fulltext schema indexes are a projection of the store, and can only index nodes and relationships by the contents of their properties.
  • The explicit indexes suffer from the Lucene limitation of supporting at most 2 billion documents in a single index. The fulltext schema indexes have no such limitation.
  • The explicit indexes interact poorly with a Causal Cluster. For instance, the creation of a new explicit index can only be communicated from the leader to the rest of the cluster via a store copy. The fulltext schema indexes are created, dropped, and updated transactionally, and are replicated throughout a cluster automatically.
  • The explicit indexes can be accessed via dedicated REST end-points and Java APIs, as well as Cypher procedures. The fulltext schema indexes can only be accessed via Cypher procedures.
  • The fulltext schema indexes can be configured to be eventually consistent, in which case index updating is moved from the commit path to a background thread. This removes the slow Lucene writes from the performance-critical commit process, which has historically been among the main bottlenecks for Neo4j write performance. This is not possible with explicit indexes.
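The point about explicit auto indexes being confused by ID reuse can be made concrete with a deliberately simplified Python model. It assumes a store that hands out the IDs of deleted nodes again, and an index kept separately from the store that misses a deletion; `ToyStore`, `detached_index_lookup`, and `projection_lookup` are hypothetical names, and this illustrates one way a stale entry leads to a wrong result rather than how explicit indexes are actually implemented. The projection lookup simply scans the store, standing in for an index that is derived from, and updated together with, the store's own properties.

```python
class ToyStore:
    """Hypothetical node store that hands out the IDs of deleted nodes again."""

    def __init__(self):
        self.nodes = {}       # node id -> title property
        self._free_ids = []   # IDs released by deletions, available for reuse
        self._next_id = 0

    def create(self, title):
        if self._free_ids:
            node_id = self._free_ids.pop()  # reuse a freed ID
        else:
            node_id = self._next_id
            self._next_id += 1
        self.nodes[node_id] = title
        return node_id

    def delete(self, node_id):
        del self.nodes[node_id]
        self._free_ids.append(node_id)


def detached_index_lookup():
    """An index maintained outside the store misses a deletion."""
    store = ToyStore()
    index = {}                          # word -> node id, kept outside the store
    book = store.create("war and peace")
    index["peace"] = book
    store.delete(book)                  # this deletion never reaches the index
    store.create("sports car")          # the freed ID is handed out again
    return store.nodes[index["peace"]]  # stale entry now resolves to the wrong node


def projection_lookup(store, word):
    """A lookup derived from the store's own properties cannot go stale this way."""
    return sorted(n for n, title in store.nodes.items() if word in title.split())
```

Here `detached_index_lookup()` returns `"sports car"` for a search on "peace", while `projection_lookup` on the same store finds nothing, because it only ever reflects properties that are actually present.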