Bloom Filter

Learn about the Bloom filter, a type of probabilistic data structure.

We'll cover the following...

Probabilistic data structure

Probabilistic data structures are data structures that provide a reasonably approximate answer without giving a definitive, accurate result. They also offer a mechanism to approximate this estimation. The probabilistic data structure is used in big data and streaming applications, where approximate answers are sufficient to proceed with the workflow.

Some well-known probabilistic data structures include Bloom filter, Count Min Sketch, and HyperLogLog. Probabilistic data structures include these characteristics:

  • They are space-efficient. The data structures fit into memory, providing a lower memory footprint.

  • Probabilistic data structures are parallelizable.

  • The access patterns on the data structure are predictable and in constant time.

Probabilistic data structures can be used to:

  • Determine whether an element belongs to a set.

  • Determine the total distinct elements in a given array.

  • Count the approximate frequency of a particular element from a large data set.

Introduction

A Bloom filter is a space-efficient probabilistic data structure devised by Burton Howard Bloom in 1970. A Bloom filter is used to approximate the set membership problem statement and determine whether an element belongs to a set or not.

The data structure can produce a false positive, but it doesn't produce a ...