Hit counting: Add support for scenarios where you want to count something other than '1' per doc


There are scenarios where you want to count each hit with a value other than 1. For example, the weight could come from specific fields within the document, and/or from more advanced calculations such as frequency analysis. See the original email thread below.


Hi Pablo:

Coincidentally, we just had a discussion on that topic today.
Specifically, instead of simply counting the hits for each facet, we
want to normalize the count by frequency, e.g. penalizing popular
terms, as in the tf-idf model.
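One possible reading of "normalize by frequency, like tf-idf" is to scale each facet's hit count by an idf-style factor, log(N / df), where N is the number of documents and df is the number of documents containing the term. The thread does not fix a formula, so the Java sketch below simply assumes this one; the `IdfNormalizer` class and its `normalize` method are hypothetical and not part of bobo's API.

```java
import java.util.Locale;

public final class IdfNormalizer {
    /**
     * Hypothetical sketch: scale each term's hit count by log(numDocs / docFreq),
     * so terms that appear in many documents are penalized relative to rare ones.
     * hitCounts and docFreqs are indexed by the same term ordinal.
     */
    public static double[] normalize(int[] hitCounts, int[] docFreqs, int numDocs) {
        double[] scores = new double[hitCounts.length];
        for (int t = 0; t < hitCounts.length; t++) {
            if (docFreqs[t] == 0) {
                continue; // term absent from the index: leave its score at 0
            }
            double idf = Math.log((double) numDocs / docFreqs[t]);
            scores[t] = hitCounts[t] * idf;
        }
        return scores;
    }

    public static void main(String[] args) {
        int[] hits = {50, 10};   // term 0 is popular, term 1 is rare
        int[] docFreqs = {900, 20};
        double[] s = normalize(hits, docFreqs, 1000);
        System.out.println(String.format(Locale.ROOT, "%.3f %.3f", s[0], s[1])); // prints "5.268 39.120"
    }
}
```

With these made-up numbers the rare term ends up ranked above the popular one despite having five times fewer raw hits, which is the "punishing popular terms" effect described above.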

The thing to be careful of is that, as of bobo 2.5.0, we are on
Lucene 2.9+, so the calculations are done at the segment level rather
than the index level. This means the FacetDataCache, where the counting
is done, only has a per-segment view of the data.

We are considering different ways of doing this to make sure we
don't do anything half-baked. Would you mind creating a JIRA ticket at
snaprojects.jira.com to track and discuss this issue?

On Aug 6, 2:45 pm, Pablo <pablo.osin...@gmail.com> wrote:
> Hello,
> Quick question: we are trying to modify the behavior of
> FacetCountCollector (below) so that, instead of counting "1" for each
> match, it uses the value of a specific field within the document as the
> amount added to the count. That way each document contributes its own
> weight to the overall hit count rather than just '1'.
> "FacetCountCollector:
> A count array, int[] of size t, is created to store the hit count for
> each term; given a matching docid, count[order[docid]] is incremented.
> Facets are created by grouping all elements in the term array with
> count >= minHitCount specified by the FacetSpec, into desired range
> facets of the format [x TO y]."
> What would be the most effective way to accomplish this?
> Many thanks-
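The change Pablo describes amounts to replacing the increment in count[order[docid]] with an addition of a per-document weight. A minimal Java sketch, assuming the weights have already been loaded into a float array indexed by docid (the `WeightedFacetCounter` class is hypothetical and is not bobo's FacetCountCollector):

```java
import java.util.Arrays;

public class WeightedFacetCounter {
    private final int[] order;     // docid -> term ordinal, as in the original count[order[docid]]
    private final float[] weights; // docid -> weight, assumed pre-loaded from a document field
    private final float[] counts;  // term ordinal -> accumulated weight

    public WeightedFacetCounter(int[] order, float[] weights, int numTerms) {
        if (order.length != weights.length) {
            throw new IllegalArgumentException("order and weights must cover the same docs");
        }
        this.order = order;
        this.weights = weights;
        this.counts = new float[numTerms];
    }

    /** Called once per matching doc; adds the doc's weight instead of incrementing by 1. */
    public void collect(int docid) {
        counts[order[docid]] += weights[docid];
    }

    /** Returns a copy of the per-term weighted counts. */
    public float[] getCounts() {
        return Arrays.copyOf(counts, counts.length);
    }

    public static void main(String[] args) {
        int[] order = {1, 2, 1, 0};                  // 4 docs mapped onto 3 terms
        float[] weights = {2.5f, 1.0f, 0.5f, 4.0f};
        WeightedFacetCounter c = new WeightedFacetCounter(order, weights, 3);
        for (int doc : new int[] {0, 2, 3}) {        // doc 1 does not match the query
            c.collect(doc);
        }
        System.out.println(Arrays.toString(c.getCounts())); // prints [4.0, 3.0, 0.0]
    }
}
```

Because the counts become floats, a threshold such as the FacetSpec's minHitCount would then be compared against a sum of weights rather than a number of matching documents.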




John Wang


Pablo Osinaga