Sorting8 min read

Linear-Time Sorting

Sort by mailbox number, not by comparing — break the \(O(n \log n)\) wall
counting sort:\(O(n + k)\)radix sort:\(O(d · (n + k)\))bucket sort:\(O(n)\) averagerequirement:Integer keys in known range

Imagine sorting a stack of letters by mailbox number. You don't need to compare two letters against each other — you just drop each one directly into its mailbox slot. That's the key idea behind linear-time sorting: instead of comparing elements, we exploit extra information about what the values can be.

Normal comparison sorts are bottlenecked at \(O(n \log n)\) because every comparison only gives one bit of information. Linear sorts bypass this by knowing the range of possible values upfront — letting them place elements directly rather than discovering order through comparisons.

Counting Sort — The Simplest Idea

If your values are integers in the range [0, k), counting sort works like this: (1) Count how many times each value appears. (2) Convert those counts to starting positions (prefix sums). (3) Place each input element into the output at its precomputed position.

Time: \(O(n + k)\). Space: \(O(n + k)\). Stable if you scan the input right-to-left in step 3 (important for radix sort to work correctly). Perfect when k is small — if k = \(O(n)\), total time is \(O(n)\). Becomes impractical when k is huge (e.g., 64-bit integers).

Radix Sort — Sorting Digit by Digit

Radix sort handles large integers by sorting one digit at a time, from least significant to most significant (LSD), using a stable sort (counting sort) at each digit. After processing all digits, the array is fully sorted.

Why LSD order? Because stability preserves the work of previous passes. After sorting by the ones digit, the tens-digit pass keeps ones-digit order for numbers with the same tens digit — correctness builds up digit by digit.

With base 256 and 32-bit integers: 4 passes, k=256 buckets per pass. Total: \(O(4 × (n + 256)\)) = \(O(n)\). Faster than \(O(n \log n)\) for large n.

Bucket Sort — Distribute and Conquer

Bucket sort divides the value range into n equal-width buckets, tosses each element into its bucket, sorts each bucket with insertion sort, then concatenates. If input is uniformly distributed over the range, each bucket gets ~1 element on average, and each insertion sort runs in \(O(1)\). Total: \(O(n)\).

The catch: non-uniform distributions can pile many elements into one bucket (worst case: all into one → \(O(n^2)\)). Bucket sort shines for uniformly distributed floating-point numbers or when you control the distribution.

Note: Linear sorts don't break the laws of computer science — they just use information comparison sorts throw away. Every comparison sort only asks 'is A < B?' Linear sorts ask 'what exact value is A?' — and that extra information is what buys speed.

Counting Sort and Radix Sort

def counting_sort(arr, k=None):
"""Sort non-negative integers. k = max value + 1."""
if not arr: return arr
k = k or (max(arr) + 1)
count = [0] * k
for x in arr:
count[x] += 1
# Prefix sums -> starting positions
for i in range(1, k):
count[i] += count[i - 1]
out = [0] * len(arr)
for x in reversed(arr): # right-to-left for stability
count[x] -= 1
out[count[x]] = x
return out
def radix_sort(arr):
"""LSD radix sort for non-negative integers."""
if not arr: return arr
max_val = max(arr)
exp = 1
while max_val // exp > 0:
arr = counting_sort_by_digit(arr, exp)
exp *= 10
return arr
def counting_sort_by_digit(arr, exp):
count = [0] * 10
for x in arr: count[(x // exp) % 10] += 1
for i in range(1, 10): count[i] += count[i-1]
out = [0] * len(arr)
for x in reversed(arr):
d = (x // exp) % 10
count[d] -= 1
out[count[d]] = x
return out
print(counting_sort([4, 2, 2, 8, 3, 3, 1]))
print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
Output
[1, 2, 2, 3, 3, 4, 8]
[2, 24, 45, 66, 75, 90, 170, 802]

Complexity

🗂️ Counting sortk = key range; ideal when k = \(O(n)\). Bad if k >> n (wasted memory)
Linear in n plus key range\(O(n + k)\)
🔢 Radix sortd = digit count, k = base. \(O(n)\) for 32-bit ints with base 256 (d=4, k=256)
Linear for fixed-width integers\(O(d·(n + k)\))
🪣 Bucket sort — averageUniform distribution → \(O(1)\) elements per bucket on average
Linear with uniform input\(O(n)\)
🪣 Bucket sort — worst caseSkewed distribution can funnel all elements into one bucket
Quadratic if all in one bucket\(O(n^2)\)

Quick check

Why can counting sort bypass the \(\Omega(n \log n)\) comparison sort lower bound?

Continue reading

Comparison Lower Bound
Why no comparison sort can ever beat \(O(n \log n)\) — a mathematical proof
Basic Sorts
Sorting a hand of cards — three ways, one speed