Zoie is a realtime indexing and search system, and as such needs relatively close coupling between the logically distinct Indexing and Searching subsystems: as soon as a document is made available to be indexed, it must be immediately searchable.
The Zoie System is the primary component of Zoie; it incorporates both Indexing (by implementing DataConsumer<V>) and Search (by implementing IndexReaderFactory<ZoieIndexReader<R extends IndexReader>>).
Zoie can be configured via Spring:
Documents get into the Zoie System for addition to Lucene indices by way of a decoupled DataProvider abstraction, which indexes via push: Zoie implements the DataConsumer interface, the natural partner to DataProvider. What follows is a brief call-stack walk-through of indexing:
- DataProvider runs on its own thread/pool/remote machine/etc., and controls the flow of calls to DataConsumer.consume(Collection<DataEvent<V>>). The consumer in this case is the ZoieSystem, which delegates to several internal components, including:
- ZoieIndexableInterpreter, whose job is to iterate over DataEvent<V> and produce ZoieIndexable objects via convertAndInterpret(V data); these resulting objects provide Document objects via buildIndexingReqs().
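The push flow above can be sketched in plain Java. This is a minimal illustration only: the nested types below (DataEvent, DataConsumer, Interpreter, Indexable, Doc) are simplified stand-ins written for this example, not the real Zoie or Lucene signatures, and CollectingConsumer and indexAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class PushIndexingSketch {

    /** Stand-in event: a payload plus a version assigned by the provider. */
    static final class DataEvent<V> {
        final long version;
        final V data;
        DataEvent(long version, V data) { this.version = version; this.data = data; }
    }

    /** Push-side contract: the provider decides when consume() is called. */
    interface DataConsumer<V> {
        void consume(Collection<DataEvent<V>> events);
    }

    /** Stand-in for a Lucene Document: just a list of "field=value" strings. */
    static final class Doc {
        final List<String> fields;
        Doc(List<String> fields) { this.fields = fields; }
    }

    /** Stand-in for an indexable object that knows how to build documents. */
    interface Indexable {
        List<Doc> buildIndexingReqs();
    }

    /** Stand-in interpreter: converts a raw payload into an indexable object. */
    interface Interpreter<V> {
        Indexable convertAndInterpret(V data);
    }

    /** Hypothetical consumer: interprets each event and collects the documents. */
    static final class CollectingConsumer<V> implements DataConsumer<V> {
        private final Interpreter<V> interpreter;
        final List<Doc> indexed = new ArrayList<>();

        CollectingConsumer(Interpreter<V> interpreter) { this.interpreter = interpreter; }

        @Override
        public void consume(Collection<DataEvent<V>> events) {
            for (DataEvent<V> event : events) {
                Indexable indexable = interpreter.convertAndInterpret(event.data);
                indexed.addAll(indexable.buildIndexingReqs());
            }
        }
    }

    /** Plays the provider role: wraps payloads in events and pushes one batch. */
    static List<Doc> indexAll(List<String> payloads) {
        Interpreter<String> interpreter = text -> () ->
            Collections.singletonList(new Doc(Collections.singletonList("contents=" + text)));
        CollectingConsumer<String> consumer = new CollectingConsumer<>(interpreter);
        List<DataEvent<String>> batch = new ArrayList<>();
        long version = 0;
        for (String payload : payloads) {
            batch.add(new DataEvent<>(++version, payload));
        }
        consumer.consume(batch);
        return consumer.indexed;
    }

    public static void main(String[] args) {
        for (Doc doc : indexAll(Arrays.asList("hello", "world"))) {
            System.out.println(doc.fields);
        }
    }
}
```

The point of the split is that the provider owns the threading and pacing, while the consumer only reacts to whatever batch it is handed.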
RAM-to-Disk Index Segment Copy:
Prior to 1.4.0, the RAM Index and the Disk Index each tokenized document data and built inverted indexes separately. In 1.4.0, we eliminated this duplicate work: the Disk Index now copies index segments from the RAM Index instead of going through tokenization and inversion again. This greatly reduces CPU load and disk I/O when documents are flushed to the Disk Index.
Overview: Zoie achieves real-time searchability because ZoieSystem contains three IndexDataLoader objects:
- a RAMLuceneIndexDataLoader, which is a simple wrapper around a RAMDirectory,
- a DiskLuceneIndexDataLoader, which can index directly to the FSDirectory in batches via an intermediary,
- a BatchedIndexDataLoader, whose primary job is to queue up and batch DataEvents that need to be flushed to disk.
All write requests that come in through the DataProvider are teed off into a "currentWritable" RAMDirectory and into an in-memory index of DataEvents, which the BatchedIndexDataLoader collects until it hits a threshold batch size (and a minimum delay time has passed), after which the batch is added to disk.
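The tee-and-batch behavior can be illustrated with a small sketch. It assumes plain lists as stand-ins for the RAMDirectory and FSDirectory, a hypothetical TeeingLoader class, and timestamps passed in explicitly so the flush rule is easy to follow; it also assumes the RAM side is simply cleared after a flush, which is a simplification rather than a description of Zoie's actual index swapping.

```java
import java.util.ArrayList;
import java.util.List;

public class TeeBatchingSketch {

    /** Hypothetical loader: events are searchable in RAM at once, flushed to disk in batches. */
    static final class TeeingLoader {
        final List<String> ramIndex = new ArrayList<>();   // stand-in for the writable RAMDirectory
        final List<String> diskIndex = new ArrayList<>();  // stand-in for the FSDirectory
        private final List<String> pending = new ArrayList<>(); // events awaiting a disk flush
        private final int batchSize;
        private final long minDelayMillis;
        private long lastFlushMillis;

        TeeingLoader(int batchSize, long minDelayMillis, long startMillis) {
            this.batchSize = batchSize;
            this.minDelayMillis = minDelayMillis;
            this.lastFlushMillis = startMillis;
        }

        /** Tee each event to RAM (immediately searchable) and to the pending batch. */
        void consume(String event, long nowMillis) {
            ramIndex.add(event);
            pending.add(event);
            boolean batchFull = pending.size() >= batchSize;
            boolean delayPassed = nowMillis - lastFlushMillis >= minDelayMillis;
            if (batchFull && delayPassed) {
                flush(nowMillis);
            }
        }

        /** Move the pending batch to disk; assumed here to empty the RAM side as well. */
        void flush(long nowMillis) {
            diskIndex.addAll(pending);
            pending.clear();
            ramIndex.clear();
            lastFlushMillis = nowMillis;
        }

        /** Everything a search would see: flushed documents plus those still only in RAM. */
        List<String> searchable() {
            List<String> all = new ArrayList<>(diskIndex);
            all.addAll(ramIndex);
            return all;
        }
    }

    public static void main(String[] args) {
        TeeingLoader loader = new TeeingLoader(2, 100L, 0L);
        loader.consume("a", 10L);
        loader.consume("b", 20L);   // batch is full, but the minimum delay has not passed
        loader.consume("c", 150L);  // both conditions hold, so the batch is flushed
        System.out.println("disk=" + loader.diskIndex + " ram=" + loader.ramIndex);
    }
}
```

Requiring both conditions keeps disk writes infrequent under light load, while documents remain searchable through the RAM side in the meantime.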
Acting as an IndexReaderFactory, ZoieSystem provides an "expert" search API for clients who need access to the IndexReader internals (for faceting, caching, etc.). Note that these IndexReader instances are always read-only, and thus usable only for searching the index, not for modifying it. For clients who do not need or want such an expert API, an upcoming Zoie release will offer a simpler "Searcher Factory" interface, which compartmentalizes the IndexReader internals a bit more by wrapping the IndexReaders in a MultiSearcher.
ZoieSystem delegates the getIndexReaders() call and returns a list of ZoieIndexReader instances.
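As a rough illustration of how a client might use a list of readers, the sketch below searches every reader returned by a factory and concatenates the hits. Reader, ReaderFactory, readerOver, and searchAll are hypothetical stand-ins defined only for this example; they do not reproduce the Zoie or Lucene interfaces, and substring matching stands in for a real query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReaderFactorySketch {

    /** Stand-in for a read-only reader over one index. */
    interface Reader {
        List<String> search(String term);
    }

    /** Stand-in for the factory contract: hands out the current set of readers. */
    interface ReaderFactory<R> {
        List<R> getIndexReaders();
    }

    /** Builds a reader over a private snapshot, so the reader cannot modify the index. */
    static Reader readerOver(List<String> docs) {
        final List<String> snapshot = new ArrayList<>(docs);
        return term -> {
            List<String> hits = new ArrayList<>();
            for (String doc : snapshot) {
                if (doc.contains(term)) {
                    hits.add(doc);
                }
            }
            return hits;
        };
    }

    /** Searches every reader from the factory and concatenates the hits in reader order. */
    static List<String> searchAll(ReaderFactory<Reader> factory, String term) {
        List<String> hits = new ArrayList<>();
        for (Reader reader : factory.getIndexReaders()) {
            hits.addAll(reader.search(term));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> disk = Arrays.asList("old zoie doc", "old lucene doc");
        List<String> ram = Arrays.asList("fresh zoie doc");
        ReaderFactory<Reader> factory = () -> Arrays.asList(readerOver(disk), readerOver(ram));
        System.out.println(searchAll(factory, "zoie"));
    }
}
```

Handing out one reader per underlying index is what lets a single search cover both the flushed documents and those that have only just arrived.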
See the Code Samples wiki for examples.