Linear-Time Sorting
Imagine sorting a stack of letters by mailbox number. You don't need to compare two letters against each other — you just drop each one directly into its mailbox slot. That's the key idea behind linear-time sorting: instead of comparing elements, we exploit extra information about what the values can be.
Normal comparison sorts are bottlenecked at \(O(n \log n)\) because every comparison only gives one bit of information. Linear sorts bypass this by knowing the range of possible values upfront — letting them place elements directly rather than discovering order through comparisons.
Counting Sort — The Simplest Idea
If your values are integers in the range [0, k), counting sort works like this: (1) Count how many times each value appears. (2) Convert those counts to starting positions (prefix sums). (3) Place each input element into the output at its precomputed position.
Time: \(O(n + k)\). Space: \(O(n + k)\). Stable if you scan the input right-to-left in step 3 (important for radix sort to work correctly). Perfect when k is small — if k = \(O(n)\), total time is \(O(n)\). Becomes impractical when k is huge (e.g., 64-bit integers).
Radix Sort — Sorting Digit by Digit
Radix sort handles large integers by sorting one digit at a time, from least significant to most significant (LSD), using a stable sort (counting sort) at each digit. After processing all digits, the array is fully sorted.
Why LSD order? Because stability preserves the work of previous passes. After sorting by the ones digit, the tens-digit pass keeps ones-digit order for numbers with the same tens digit — correctness builds up digit by digit.
With base 256 and 32-bit integers: 4 passes, k=256 buckets per pass. Total: \(O(4 × (n + 256)\)) = \(O(n)\). Faster than \(O(n \log n)\) for large n.
Bucket Sort — Distribute and Conquer
Bucket sort divides the value range into n equal-width buckets, tosses each element into its bucket, sorts each bucket with insertion sort, then concatenates. If input is uniformly distributed over the range, each bucket gets ~1 element on average, and each insertion sort runs in \(O(1)\). Total: \(O(n)\).
The catch: non-uniform distributions can pile many elements into one bucket (worst case: all into one → \(O(n^2)\)). Bucket sort shines for uniformly distributed floating-point numbers or when you control the distribution.