Searching & Selection8 min read

Median of Medians

A guaranteed-good pivot — every single time
time:\(O(n)\) guaranteedspace:\(O(n)\)pivot quality:≥ 30% elements on each side

Imagine your class needs to pick a team captain fairly. Instead of everyone voting at once (which might be gamed), you try something clever: split into small groups of 5, have each group pick their own captain, then have all the group captains vote among themselves. The winner is a good representative of everyone.

That's exactly what Median of Medians does — and for the same reason. It picks a pivot element that is guaranteed to split the array decently, so no adversary can trick it into \(O(n^2)\) behavior.

The problem with QuickSelect

Randomized QuickSelect is excellent in practice — expected \(O(n)\) with high probability. But if someone hands you carefully crafted input, they can force your random pivot to always be terrible, pushing you to \(O(n^2)\). In security-critical code, or when you need a provable worst-case guarantee, you need something deterministic.

The algorithm step by step

Step 1 — Group into 5s: Split the n elements into groups of exactly 5 (the last group may be smaller). Groups of 5 are the sweet spot — small enough to sort instantly, large enough to give a quality guarantee.

Step 2 — Sort each group and grab the median: Sort each group of 5 with insertion sort (just 7 comparisons max). Take the middle element. You now have ⌈n/5⌉ group medians.

Step 3 — Recursively find the median of those medians: Apply the whole algorithm recursively on just the ⌈n/5⌉ medians. The result is the median of medians — call it M.

Step 4 — Use M as your pivot: Partition the full array around M. Because M is a good pivot (guaranteed 30-70 split), recurse only on the side that contains your target element.

Why is M a good pivot?

M is the median of ⌈n/5⌉ values. At least half of those values are ≥ M. Each such value was the median of a group of 5, so at least 2 elements in that group are also ≥ M. That means at least (n/5)/2 × 2 = n/5 elements in the full array are ≥ M from those groups alone — plus M itself. After careful counting, at least 3n/10 elements are guaranteed to be on each side of M. The partition is always at most 70%-30%, never degenerate.

The recurrence

T(n) = T(n/5) + T(7n/10) + \(O(n)\). The T(n/5) covers finding the median of medians recursively. The T(7n/10) covers the partition recursion on the larger side. The \(O(n)\) covers grouping, sorting groups, and partitioning. This solves to T(n) = \(O(n)\).

Note: Why groups of 5 specifically? Groups of 3 give too weak a guarantee — the recurrence becomes T(n)=T(n/3)+T(2n/3)+\(O(n)\) which is \(O(n \log n)\). Groups of 7 work but add overhead. Five is the smallest group size that provably gives \(O(n)\).

Median of Medians

def median_of_five(arr):
arr.sort()
return arr[len(arr) // 2]
def partition_around(arr, lo, hi, pivot):
# Move pivot to end
for i in range(lo, hi + 1):
if arr[i] == pivot:
arr[i], arr[hi] = arr[hi], arr[i]
break
p = arr[hi]
i = lo
for j in range(lo, hi):
if arr[j] <= p:
arr[i], arr[j] = arr[j], arr[i]
i += 1
arr[i], arr[hi] = arr[hi], arr[i]
return i
def mom_select(arr, lo, hi, k):
if hi - lo < 5:
arr[lo:hi+1] = sorted(arr[lo:hi+1])
return arr[lo + k - lo]
# Step 1-2: find medians of groups of 5
medians = []
for i in range(lo, hi + 1, 5):
group = arr[i:min(i + 5, hi + 1)]
medians.append(median_of_five(group))
# Step 3: recursively find median of medians
pivot = mom_select(medians, 0, len(medians) - 1, len(medians) // 2)
# Step 4: partition and recurse
p = partition_around(arr, lo, hi, pivot)
if p == k:
return arr[p]
elif p > k:
return mom_select(arr, lo, p - 1, k)
else:
return mom_select(arr, p + 1, hi, k)
arr = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
print(mom_select(arr[:], 0, len(arr) - 1, 4)) # 5th smallest = 3
Output
3

Complexity

🏆 Overall selectionDeterministic; worst case = average case; adversarial inputs can't break it
Linear — guaranteed\(O(n)\)
🎯 Pivot quality guaranteeAt least 3n/10 elements on each side of the pivot — no degenerate splits
30%-70% split always\(\Theta(n)\)
💾 SpaceMedians array at each level; recursion stack \(O(\log n)\); total \(O(n)\)
Linear auxiliary\(O(n)\)
🐢 Practical speedLarge constants from grouping + sorting overhead; rarely used in practice
5-10× slower than QuickSelect\(O(n)\)

Quick check

Why do we use groups of exactly 5, rather than groups of 3?

Continue reading

QuickSelect
Find the kth smallest without fully sorting
D&C Pattern
Split the problem, solve the pieces, combine the results