39.2. Index Batch Insertion
For general notes on batch insertion, see Chapter 39, Batch Insertion.
    BatchInserter inserter = BatchInserters.inserter( "target/neo4jdb-batchinsert" );
    BatchInserterIndexProvider indexProvider = new LuceneBatchInserterIndexProvider( inserter );
    BatchInserterIndex actors = indexProvider.nodeIndex( "actors", MapUtil.stringMap( "type", "exact" ) );
    actors.setCacheCapacity( "name", 100000 );

    Map<String, Object> properties = MapUtil.map( "name", "Keanu Reeves" );
    long node = inserter.createNode( properties );
    actors.add( node, properties );

    // Make the changes visible for reading. Use this sparingly, it requires IO!
    actors.flush();

    // Make sure to shut down the index provider as well
    indexProvider.shutdown();
    inserter.shutdown();
The configuration parameters are the same as mentioned in Section 38.10, “Configuration and fulltext indexes”.
Here are some pointers to get the most performance out of BatchInserterIndex:
- Try to avoid flushing too often: each flush makes all additions since the last flush visible to the querying methods, and publishing those changes carries a performance penalty.
- Organize the work into phases that are as large as possible, where each phase consists of only writes or only reads, and don’t forget to flush after a write phase so that those changes become visible to the querying methods.
- Enable caching (see setCacheCapacity in the example above) for keys you know you will look up later. This speeds up lookups significantly, though insertion performance may degrade slightly.
Changes to the index become available for reading only after they have been flushed to disk. Thus, for optimal performance, read and lookup operations should be kept to a minimum during batch insertion, since they involve IO and slow the insertion down.
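The visibility rule described above can be illustrated with a small in-memory model. This is only a sketch: ToyBatchIndex is a hypothetical class written for this illustration, not part of Neo4j, and it models just the "additions are invisible until flushed" behavior, not Lucene, caching or disk IO.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FlushVisibilityDemo {

    /** Hypothetical toy stand-in for a batch index; NOT the Neo4j implementation. */
    static class ToyBatchIndex {
        // Entries added since the last flush; lookups cannot see these yet.
        private final List<Object[]> pending = new ArrayList<>();
        // Entries published by flush(); the only ones lookups can see.
        private final Map<String, List<Long>> visible = new HashMap<>();
        // Counts flushes, standing in for the IO cost a real flush would incur.
        private int flushCount = 0;

        void add(long entityId, String key, Object value) {
            pending.add(new Object[] { entityId, key, value });
        }

        void flush() {
            // Publish every pending entry, then start a new write phase.
            for (Object[] entry : pending) {
                String lookupKey = entry[1] + "=" + entry[2];
                visible.computeIfAbsent(lookupKey, k -> new ArrayList<>()).add((Long) entry[0]);
            }
            pending.clear();
            flushCount++;
        }

        List<Long> get(String key, Object value) {
            List<Long> hits = visible.get(key + "=" + value);
            return hits == null ? Collections.<Long>emptyList() : hits;
        }

        int flushCount() {
            return flushCount;
        }
    }

    public static void main(String[] args) {
        ToyBatchIndex actors = new ToyBatchIndex();

        // Write phase: many additions, no flush in between.
        actors.add(1L, "name", "Keanu Reeves");
        actors.add(2L, "name", "Carrie-Anne Moss");
        System.out.println("before flush: " + actors.get("name", "Keanu Reeves"));

        // A single flush at the end of the write phase publishes everything.
        actors.flush();

        // Read phase: lookups now see the additions.
        System.out.println("after flush: " + actors.get("name", "Keanu Reeves"));
        System.out.println("flushes performed: " + actors.flushCount());
    }
}
```

In this model a whole write phase costs one flush, which mirrors the advice to batch writes and flush once before switching to reads.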