Bloom Filter

Learn about the Bloom filter, a type of probabilistic data structure.

Probabilistic data structure

Probabilistic data structures are data structures that provide a reasonably approximate answer without giving a definitive, accurate result. They also offer a mechanism to approximate this estimation. The probabilistic data structure is used in big data and streaming applications, where approximate answers are sufficient to proceed with the workflow.

Some well-known probabilistic data structures include Bloom filter, Count Min Sketch, and HyperLogLog. Probabilistic data structures include these characteristics:

  • They are space-efficient. The data structures fit into memory, providing a lower memory footprint.

  • Probabilistic data structures are parallelizable.

  • The access patterns on the data structure are predictable and in constant time.

Probabilistic data structures can be used to:

  • Determine whether an element belongs to a set.

  • Determine the total distinct elements in a given array. ...