Chapter 6. Manual indexing

This chapter focuses on how to use the Explicit Indexes. As of Neo4j 2.0, this is not the favored method of indexing data in Neo4j, instead we recommend defining indexes in the database schema.

However, support for explicit indexes remains, because certain features, such as full-text search, are not yet handled by the new indexes.

6.1. Introduction

Explicit indexing operations are part of the Neo4j index API.

Each index is tied to a unique, user-specified name (for example "first_name" or "books") and can index either nodes or relationships.

The default index implementation is provided by the neo4j-lucene-index component, which is included in the standard Neo4j download. It can also be downloaded separately from . For Maven users, the neo4j-lucene-index component has the coordinates org.neo4j:neo4j-lucene-index and should be used with the same version of org.neo4j:neo4j-kernel. Different versions of the index and kernel components are not compatible in the general case. Both components are included transitively by the org.neo4j:neo4j:pom artifact which makes it simple to keep the versions in sync.


All modifying index operations must be performed inside a transaction, as with any modifying operation in Neo4j.

6.2. Create

An index is created if it doesn’t exist when you ask for it. Unless you give it a custom configuration, it will be created with default configuration and backend.

To set the stage for our examples, let’s create some indexes to begin with:

IndexManager index = graphDb.index();
Index<Node> actors = index.forNodes( "actors" );
Index<Node> movies = index.forNodes( "movies" );
RelationshipIndex roles = index.forRelationships( "roles" );

This will create two node indexes and one relationship index with default configuration. See Section 6.8, “Relationship indexes” for more information specific to relationship indexes.

See Section 6.10, “Configuration and fulltext indexes” for how to create fulltext indexes.

You can also check if an index exists like this:

IndexManager index = graphDb.index();
boolean indexExists = index.existsForNodes( "actors" );

6.3. Delete

Indexes can be deleted. When deleting, the entire contents of the index will be removed as well as its associated configuration. An index can be created with the same name at a later point in time.

IndexManager index = graphDb.index();
Index<Node> actors = index.forNodes( "actors" );

Note that the actual deletion of the index is made during the commit of the surrounding transaction. Calls made to such an index instance after delete() has been called are invalid inside that transaction as well as outside (if the transaction is successful), but will become valid again if the transaction is rolled back.

6.4. Add

Each index supports associating any number of key-value pairs with any number of entities (nodes or relationships), where each association between entity and key-value pair is performed individually. To begin with, let’s add a few nodes to the indexes:

// Actors
Node reeves = graphDb.createNode();
reeves.setProperty( "name", "Keanu Reeves" );
actors.add( reeves, "name", reeves.getProperty( "name" ) );
Node bellucci = graphDb.createNode();
bellucci.setProperty( "name", "Monica Bellucci" );
actors.add( bellucci, "name", bellucci.getProperty( "name" ) );
// multiple values for a field, in this case for search only
// and not stored as a property.
actors.add( bellucci, "name", "La Bellucci" );
// Movies
Node theMatrix = graphDb.createNode();
theMatrix.setProperty( "title", "The Matrix" );
theMatrix.setProperty( "year", 1999 );
movies.add( theMatrix, "title", theMatrix.getProperty( "title" ) );
movies.add( theMatrix, "year", theMatrix.getProperty( "year" ) );
Node theMatrixReloaded = graphDb.createNode();
theMatrixReloaded.setProperty( "title", "The Matrix Reloaded" );
theMatrixReloaded.setProperty( "year", 2003 );
movies.add( theMatrixReloaded, "title", theMatrixReloaded.getProperty( "title" ) );
movies.add( theMatrixReloaded, "year", 2003 );
Node malena = graphDb.createNode();
malena.setProperty( "title", "Malèna" );
malena.setProperty( "year", 2000 );
movies.add( malena, "title", malena.getProperty( "title" ) );
movies.add( malena, "year", malena.getProperty( "year" ) );

Note that there can be multiple values associated with the same entity and key.

Next up, we’ll create relationships and index them as well:

// we need a relationship type
RelationshipType ACTS_IN = RelationshipType.withName( "ACTS_IN" );
// create relationships
Relationship role1 = reeves.createRelationshipTo( theMatrix, ACTS_IN );
role1.setProperty( "name", "Neo" );
roles.add( role1, "name", role1.getProperty( "name" ) );
Relationship role2 = reeves.createRelationshipTo( theMatrixReloaded, ACTS_IN );
role2.setProperty( "name", "Neo" );
roles.add( role2, "name", role2.getProperty( "name" ) );
Relationship role3 = bellucci.createRelationshipTo( theMatrixReloaded, ACTS_IN );
role3.setProperty( "name", "Persephone" );
roles.add( role3, "name", role3.getProperty( "name" ) );
Relationship role4 = bellucci.createRelationshipTo( malena, ACTS_IN );
role4.setProperty( "name", "Malèna Scordia" );
roles.add( role4, "name", role4.getProperty( "name" ) );

After these operations, our example graph looks like this:

Figure 6.1. Movie and Actor Graph

6.5. Remove

Removing from an index is similar to adding, but can be done by supplying one of the following combinations of arguments:

  • entity
  • entity, key
  • entity, key, value
// completely remove bellucci from the actors index
actors.remove( bellucci );
// remove any "name" entry of bellucci from the actors index
actors.remove( bellucci, "name" );
// remove the "name" -> "La Bellucci" entry of bellucci
actors.remove( bellucci, "name", "La Bellucci" );

6.6. Update

To update an index entry, the old one must be removed and a new one added. For details on removing index entries, see Section 6.5, “Remove”.

Remember that a node or relationship can be associated with any number of key-value pairs in an index. This means that you can index a node or relationship with many key-value pairs that have the same key. In the case where a property value changes and you’d like to update the index, it’s not enough to just index the new value — you’ll have to remove the old value as well.

Here’s a code example that demonstrates how it’s done:

// create a node with a property
// so we have something to update later on
Node fishburn = graphDb.createNode();
fishburn.setProperty( "name", "Fishburn" );
// index it
actors.add( fishburn, "name", fishburn.getProperty( "name" ) );
// update the index entry
// when the property value changes
actors.remove( fishburn, "name", fishburn.getProperty( "name" ) );
fishburn.setProperty( "name", "Laurence Fishburn" );
actors.add( fishburn, "name", fishburn.getProperty( "name" ) );

6.8. Relationship indexes

An index for relationships is just like an index for nodes, extended by providing support to constrain a search to relationships with a specific start and/or end node. These extra methods reside in the RelationshipIndex interface which extends Index<Relationship>.

Example of querying a relationship index:

// find relationships filtering on start node
// using exact matches
IndexHits<Relationship> reevesAsNeoHits;
reevesAsNeoHits = roles.get( "name", "Neo", reeves, null );
Relationship reevesAsNeo = reevesAsNeoHits.iterator().next();
// find relationships filtering on end node
// using a query
IndexHits<Relationship> matrixNeoHits;
matrixNeoHits = roles.query( "name", "*eo", null, theMatrix );
Relationship matrixNeo = matrixNeoHits.iterator().next();

And here’s an example for the special case of searching for a specific relationship type:

// find relationships filtering on end node
// using a relationship type.
// this is how to add it to the index:
roles.add( reevesAsNeo, "type", reevesAsNeo.getType().name() );
// Note that to use a compound query, we can't combine committed
// and uncommitted index entries, so we'll commit before querying:

// and now we can search for it:
try ( Transaction tx = graphDb.beginTx() )
    IndexHits<Relationship> typeHits = roles.query( "type:ACTS_IN AND name:Neo", null, theMatrix );
    Relationship typeNeo = typeHits.iterator().next();

Such an index can be useful if your domain has nodes with a very large number of relationships between them, since it reduces the search time for a relationship between two nodes. A good example where this approach pays dividends is in time series data, where we have readings represented as a relationship per occurrence.

6.9. Scores

The IndexHits interface exposes scoring so that the index can communicate scores for the hits. Note that the result is not sorted by the score unless you explicitly specify that. See Section 6.11.2, “Sorting” for how to sort by score.

IndexHits<Node> hits = movies.query( "title", "The*" );
for ( Node movie : hits )
    System.out.println( movie.getProperty( "title" ) + " " + hits.currentScore() );

6.10. Configuration and fulltext indexes

At the time of creation extra configuration can be specified to control the behavior of the index and which backend to use. For example to create a Lucene fulltext index:

IndexManager index = graphDb.index();
Index<Node> fulltextMovies = index.forNodes( "movies-fulltext",
        MapUtil.stringMap( IndexManager.PROVIDER, "lucene", "type", "fulltext" ) );
fulltextMovies.add( theMatrix, "title", "The Matrix" );
fulltextMovies.add( theMatrixReloaded, "title", "The Matrix Reloaded" );
// search in the fulltext index
Node found = fulltextMovies.query( "title", "reloAdEd" ).getSingle();

Here’s an example of how to create an exact index which is case-insensitive:

Index<Node> index = graphDb.index().forNodes( "exact-case-insensitive",
        MapUtil.stringMap( "type", "exact", "to_lower_case", "true" ) );
Node node = graphDb.createNode();
index.add( node, "name", "Thomas Anderson" );
assertContains( index.query( "name", "\"Thomas Anderson\"" ), node );
assertContains( index.query( "name", "\"thoMas ANDerson\"" ), node );

In order to search for tokenized words, the query method has to be used. The get method will only match the full string value, not the tokens.

The configuration of the index is persisted once the index has been created. The provider configuration key is interpreted by Neo4j, but any other configuration is passed onto the backend index (e.g. Lucene) to interpret.

Table 6.1. Lucene indexing configuration parameters
Parameter Possible values Effect


exact, fulltext

exact is the default and uses a Lucene keyword analyzer. fulltext uses a white-space tokenizer in its analyzer.


true, false

This parameter goes together with type: fulltext and converts values to lower case during both additions and querying, making the index case insensitive. Defaults to true.


the full class name of an Analyzer

Overrides the type so that a custom analyzer can be used. Note: to_lower_case still affects lowercasing of string queries. If the custom analyzer uppercases the indexed tokens, string queries will not match as expected.

6.11. Extra features for Lucene indexes

6.11.1. Numeric ranges

Lucene supports smart indexing of numbers, querying for ranges and sorting such results, and so does its backend for Neo4j. To mark a value so that it is indexed as a numeric value, we can make use of the ValueContext class, like this:

movies.add( theMatrix, "year-numeric", new ValueContext( 1999 ).indexNumeric() );
movies.add( theMatrixReloaded, "year-numeric", new ValueContext( 2003 ).indexNumeric() );
movies.add( malena, "year-numeric", new ValueContext( 2000 ).indexNumeric() );

int from = 1997;
int to = 1999;
hits = movies.query( QueryContext.numericRange( "year-numeric", from, to ) );

The same type must be used for indexing and querying. That is, you can’t index a value as a Long and then query the index using an Integer.

By giving null as from/to argument, an open ended query is created. In the following example we are doing that, and have added sorting to the query as well:

hits = movies.query(
        QueryContext.numericRange( "year-numeric", from, null )
                .sortNumeric( "year-numeric", false ) );

From/to in the ranges defaults to be inclusive, but you can change this behavior by using two extra parameters:

movies.add( theMatrix, "score", new ValueContext( 8.7 ).indexNumeric() );
movies.add( theMatrixReloaded, "score", new ValueContext( 7.1 ).indexNumeric() );
movies.add( malena, "score", new ValueContext( 7.4 ).indexNumeric() );

// include 8.0, exclude 9.0
hits = movies.query( QueryContext.numericRange( "score", 8.0, 9.0, true, false ) );

6.11.2. Sorting

Lucene performs sorting very well, and that is also exposed in the index backend, through the QueryContext class:

hits = movies.query( "title", new QueryContext( "*" ).sort( "title" ) );
for ( Node hit : hits )
    // all movies with a title in the index, ordered by title
// or
hits = movies.query( new QueryContext( "title:*" ).sort( "year", "title" ) );
for ( Node hit : hits )
    // all movies with a title in the index, ordered by year, then title

We sort the results by relevance (score) like this:

hits = movies.query( "title", new QueryContext( "The*" ).sortByScore() );
for ( Node movie : hits )
    // hits sorted by relevance (score)

6.11.3. Querying with Lucene query objects

Instead of passing in Lucene query syntax queries, you can instantiate such queries programmatically and pass in as argument, for example:

// a TermQuery will give exact matches
Node actor = actors.query( new TermQuery( new Term( "name", "Keanu Reeves" ) ) ).getSingle();

Note that the TermQuery is basically the same thing as using the get method on the index.

This is how to perform wildcard searches using Lucene query objects:

hits = movies.query( new WildcardQuery( new Term( "title", "The Matrix*" ) ) );
for ( Node movie : hits )
    System.out.println( movie.getProperty( "title" ) );

Note that this allows for whitespace in the search string.

6.11.4. Compound queries

Lucene supports querying for multiple terms in the same query, like so:

hits = movies.query( "title:*Matrix* AND year:1999" );

Compound queries can’t search across committed index entries and those who haven’t got committed yet at the same time.

6.11.5. Default operator

The default operator (that is whether AND or OR is used in between different terms) in a query is OR. Changing that behavior is also done via the QueryContext class:

QueryContext query = new QueryContext( "title:*Matrix* year:1999" )
        .defaultOperator( Operator.AND );
hits = movies.query( query );