Bloom Filter

Imagine a nightclub bouncer who checks guests against a list — but the list is written in invisible ink, and occasionally the ink bleeds and marks a name that was never there. The bouncer might turn away someone innocent (false positive), but he will never let in someone on the banned list (no false negatives).

That's a Bloom Filter: a lightning-fast, memory-tiny way to ask "is this thing in my set?" It might say "probably yes" when the answer is actually no — but if it says "definitely not," you can take that to the bank.

How it works

A Bloom filter is just a bit array of m bits (all zeros at the start) paired with k independent hash functions. Each hash function maps any input to a position in that array.

To insert an element, run it through all k hash functions and flip those bit positions to 1. That's it.

To query, run the same k hashes. If every bit at those positions is 1 → "probably in the set." If any bit is 0 → "definitely NOT in the set." A 0 can only exist if that position was never flipped — meaning this element was never inserted.

Why false positives happen — but false negatives can't

Over time, other elements flip bits all over the array. By coincidence, the exact positions your query needs may all be 1 already — set by different elements. That's a false positive. But once a bit is set to 1, it stays 1 forever. So if you inserted something, its bits are guaranteed to stay on — false negatives are structurally impossible.

Tuning the false positive rate

Given n expected insertions and a target false positive rate p, the optimal parameters are:
• Bit array size: m = -n·ln(p) / (ln 2)²
• Number of hash functions: k = (m/n)·ln 2 ≈ 0.693·m/n

For a 1% false positive rate, you need only about 9.6 bits per element. A regular hash set storing 64-bit keys uses 64 bits per element minimum — about 7× more space.

Why no deletions?

If you clear a bit to "delete" an element, you might wipe out evidence of other elements that shared that bit position. You'd create false negatives — exactly what makes Bloom filters useful breaks. Extensions like Counting Bloom Filters replace bits with small counters to allow deletions, at the cost of more memory.

Real-world uses

Google Chrome's Safe Browsing keeps a local Bloom filter of malicious URLs. A false positive just triggers a follow-up server check — a small cost. A false negative (missing a real threat) would be catastrophic, and the filter makes that impossible.

Databases (Cassandra, RocksDB, HBase) use Bloom filters before touching disk: "is this key in this SSTable?" A false positive wastes one disk read. A false negative would return wrong data. The filter eliminates false negatives entirely while cutting most unnecessary disk I/O.

Bloom Filter

How it works

Why false positives happen — but false negatives can't

Tuning the false positive rate

Why no deletions?

Real-world uses

Bloom Filter — core insert & query

Complexity

Quick check