An empty Bloom filter is a Bit Vector with all bits set to zero. In the image below, each cell represents a bit. The number below the bit is its index for a 10-bit vector.
In order to add an element, it must be hashed using multiple hash functions. Bits are set at the index of the hashes in the bit vector.
For example, let’s assume we need to add the james@gmail.com
element using three efficient hash functions:
H1(james@gmail.com
) = 12021
H2(james@gmail.com
) = 23324
H3(james@gmail.com
) = 23237
We can take the mod of 10 for all these values to get an index within the bounds of the bit vector: 12021 % 10 = 1 23324 % 10= 4 23237 % 10 = 7
For an item whose membership needs to be tested, it is also hashed via the same hash functions. If all the bits are already set for this, the element may exist in the set.
If any bit is not set, the element is definitely not in the set.
Let’s assume we have added the two members below to the bloom filter.
Now, if we want to check whether or not Tiger exists in the set, we can hash it via the same hash functions.
H(“Tiger”) = {2,7,3}
We have not added “Tiger” to the bloom filter, but all the bits at index {2,7,3} have already been set by the previous two elements; thus, Bloom Filter claims that “Tiger” is present in the set. This is a false positive result.
A Bloom filters is a space-efficient data structure, but it does not store the actual items since it is just a bit vector. It has many applications such as: