36.11. Extra features for Lucene indexes

Numeric ranges

Lucene supports smart indexing of numbers, querying for ranges and sorting such results, and so does its backend for Neo4j. To mark a value so that it is indexed as a numeric value, we can make use of the ValueContext class, like this:

movies.add( theMatrix, "year-numeric", new ValueContext( 1999 ).indexNumeric() );
movies.add( theMatrixReloaded, "year-numeric", new ValueContext( 2003 ).indexNumeric() );
movies.add( malena, "year-numeric", new ValueContext( 2000 ).indexNumeric() );

int from = 1997;
int to = 1999;
hits = movies.query( QueryContext.numericRange( "year-numeric", from, to ) );
[Note]Note

The same type must be used for indexing and querying. That is, you can’t index a value as a Long and then query the index using an Integer.

By giving null as from/to argument, an open ended query is created. In the following example we are doing that, and have added sorting to the query as well:

hits = movies.query(
        QueryContext.numericRange( "year-numeric", from, null )
                .sortNumeric( "year-numeric", false ) );

From/to in the ranges defaults to be inclusive, but you can change this behavior by using two extra parameters:

movies.add( theMatrix, "score", new ValueContext( 8.7 ).indexNumeric() );
movies.add( theMatrixReloaded, "score", new ValueContext( 7.1 ).indexNumeric() );
movies.add( malena, "score", new ValueContext( 7.4 ).indexNumeric() );

// include 8.0, exclude 9.0
hits = movies.query( QueryContext.numericRange( "score", 8.0, 9.0, true, false ) );

Sorting

Lucene performs sorting very well, and that is also exposed in the index backend, through the QueryContext class:

hits = movies.query( "title", new QueryContext( "*" ).sort( "title" ) );
for ( Node hit : hits )
{
    // all movies with a title in the index, ordered by title
}
// or
hits = movies.query( new QueryContext( "title:*" ).sort( "year", "title" ) );
for ( Node hit : hits )
{
    // all movies with a title in the index, ordered by year, then title
}

We sort the results by relevance (score) like this:

hits = movies.query( "title", new QueryContext( "The*" ).sortByScore() );
for ( Node movie : hits )
{
    // hits sorted by relevance (score)
}

Querying with Lucene Query objects

Instead of passing in Lucene query syntax queries, you can instantiate such queries programmatically and pass in as argument, for example:

// a TermQuery will give exact matches
Node actor = actors.query( new TermQuery( new Term( "name", "Keanu Reeves" ) ) ).getSingle();

Note that the TermQuery is basically the same thing as using the get method on the index.

This is how to perform wildcard searches using Lucene Query Objects:

hits = movies.query( new WildcardQuery( new Term( "title", "The Matrix*" ) ) );
for ( Node movie : hits )
{
    System.out.println( movie.getProperty( "title" ) );
}

Note that this allows for whitespace in the search string.

Compound queries

Lucene supports querying for multiple terms in the same query, like so:

hits = movies.query( "title:*Matrix* AND year:1999" );
[Caution]Caution

Compound queries can’t search across committed index entries and those who haven’t got committed yet at the same time.

Default operator

The default operator (that is whether AND or OR is used in between different terms) in a query is OR. Changing that behavior is also done via the QueryContext class:

QueryContext query = new QueryContext( "title:*Matrix* year:1999" )
        .defaultOperator( Operator.AND );
hits = movies.query( query );

Caching

If your index lookups becomes a performance bottle neck, caching can be enabled for certain keys in certain indexes (key locations) to speed up get requests. The caching is implemented with an LRU cache so that only the most recently accessed results are cached (with "results" meaning a query result of a get request, not a single entity). You can control the size of the cache (the maximum number of results) per index key.

//        Index<Node> index = graphDb.index().forNodes( "actors" );
//        ((LuceneIndex<Node>) index).setCacheCapacity( "name", 300000 );
[Caution]Caution

This setting is not persisted after shutting down the database. This means: set this value after each startup of the database if you want to keep it.