Top-K: A Probabilistic Addition to RedisBloom

Background

You may find yourself wondering whether you should use more probabilistic data structures in your code. The answer is, as always, it depends. If your data set is relatively small and you have the required memory available, keeping another copy of the data in an index might make sense. You’ll maintain 100% accuracy and get a fast execution time.

However, when the data set becomes sizeable, your current solution will soon turn into a slow memory hog as it stores an additional copy of the items or items’ identifiers. Therefore, when you deal with a large number of items, you might want to relax the requirement of 100% accuracy in order to gain speed and save memory space. In these instances, using the appropriate probabilistic data structure could work magic — supporting high accuracy, a predefined memory allowance and a time complexity that’s independent of your data set size.