Bloom Filter
Learn about the Bloom filter, a type of probabilistic data structure.
Probabilistic data structure
Probabilistic data structures are data structures that provide a reasonably approximate answer without giving a definitive, accurate result. They also offer a mechanism to approximate this estimation. The probabilistic data structure is used in big data and streaming applications, where approximate answers are sufficient to proceed with the workflow.
Some well-known probabilistic data structures include Bloom filter, Count Min Sketch, and HyperLogLog. Probabilistic data structures include these characteristics:
They are space-efficient. The data structures fit into memory, providing a lower memory footprint.
Probabilistic data structures are parallelizable.
The access patterns on the data structure are predictable and in constant time.
Probabilistic data structures can be used to:
Determine whether an element belongs to a set.
Determine the total distinct elements in a given array.
Count the approximate frequency of a particular element from a large data set.
Introduction
A Bloom filter is a space-efficient probabilistic data structure devised by Burton Howard Bloom in 1970. A Bloom filter is used to approximate the set membership problem statement and determine whether an element belongs to a set or not.
The data structure can produce a false positive, but it doesn't produce a ...