Lifespan-based partitioning of index structures for time-travel text search
Abstract
Time-travel text search over a temporally evolving document collection is useful in various applications. Supporting the wide range of query classes that these applications demand requires different index layouts, each optimized for its query access pattern. The problem we tackle is how to handle these query classes efficiently using a single index layout. Our approach is to use list intersections on single-attribute indexes of keywords and temporal attributes. Although joint predicate evaluation on single-attribute indexes is inefficient in general, we show that partitioning the index based on version lifespans, coupled with exploiting the transaction-time ordering of record identifiers, can significantly reduce the cost of list intersections. We empirically evaluate different index partitioning alternatives on top of open-source Lucene and show that our approach is the only technique among those evaluated that simultaneously supports a wide range of query classes efficiently, sustains high indexing throughput in a real-time ingestion setting, and incurs negligible extra storage cost.
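To make the idea of joint predicate evaluation by list intersection concrete, the following is a minimal Python sketch and not the implementation evaluated on Lucene. It assumes record identifiers are assigned in transaction-time order, so that versions created at or before a query time form a prefix of the identifier space. The toy data and the names `begin`, `end`, `postings`, `intersect`, and `time_point_query` are hypothetical, and the lifespan-based partitioning itself is not shown.

```python
from bisect import bisect_right

# Hypothetical toy data. Record ids are assumed to be assigned in
# transaction-time order, so begin[i] is non-decreasing in i.
begin = [1, 2, 4, 6, 9]
end = [3, 8, 5, 10, 12]  # exclusive end of each version's lifespan

# Single-attribute keyword index: term -> sorted list of record ids.
postings = {
    "index": [0, 1, 3, 4],
    "search": [1, 2, 3],
}


def intersect(a, b):
    """Merge-style intersection of two sorted id lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def time_point_query(terms, t):
    """Return ids of versions that contain all terms and are alive at time t."""
    # Versions that began at or before t occupy ids 0 .. cutoff - 1.
    cutoff = bisect_right(begin, t)
    result = None
    for term in terms:
        plist = postings.get(term, [])
        # Truncate the posting list to ids below the cutoff.
        plist = plist[:bisect_right(plist, cutoff - 1)]
        result = plist if result is None else intersect(result, plist)
    # Keep only versions whose lifespan has not ended by time t.
    return [rid for rid in (result or []) if end[rid] > t]
```

For example, `time_point_query(["index", "search"], 7)` scans only the identifiers of versions created by time 7 before applying the end-time check. In this sketch the ordering assumption is what lets the temporal predicate on the begin time become a binary search rather than a separate list to intersect.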