Heap Sort
Imagine a tournament bracket where, at every round, all matches are played simultaneously. The rule: each parent in the bracket must be stronger than both of its children. At the end of any round, the very top of the bracket holds the strongest player. Pull them out, collapse the bracket, and run it again — the next strongest rises. That's heap sort.
More precisely: it uses a max-heap — a binary tree where every parent is larger than its children. The array representation stores it compactly: children of index i live at 2i+1 and 2i+2; parent of i is at ⌊(i−1)/2⌋. No pointers needed.
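The index arithmetic can be sketched in a few lines (the helper names `left`, `right`, and `parent` are my own, not standard library functions):

```python
def left(i: int) -> int:
    # Left child of node i in the implicit array-backed tree.
    return 2 * i + 1

def right(i: int) -> int:
    # Right child of node i.
    return 2 * i + 2

def parent(i: int) -> int:
    # Parent of node i; floor division gives ⌊(i−1)/2⌋.
    return (i - 1) // 2
```

Note the round trip: for any node `i > 0`, `parent(left(i)) == i` and `parent(right(i)) == i`, which is why the pointer-free layout works.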
Phase 1: Build the Heap in \(O(n)\)
You might think inserting n elements one by one into a heap costs \(O(n \log n)\). But there's a smarter way: Floyd's heapify. Start from the last non-leaf node (index ⌊n/2⌋−1) and work backwards, sifting each node down to its correct position. Leaves do zero work; nodes near the bottom do very little. The total work is \(\sum_h h \cdot \lceil n/2^{h+1} \rceil\), summed over all heights \(h\) — there are at most \(\lceil n/2^{h+1} \rceil\) nodes at height \(h\), each costing \(O(h)\) — and this series is bounded by a convergent geometric-style sum, giving \(O(n)\).
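Floyd's construction can be sketched as follows (a minimal version; the function names are my own):

```python
def sift_down(a: list, i: int, n: int) -> None:
    """Sink a[i] within the heap a[0:n] until neither child exceeds it."""
    while True:
        largest = i
        l, r = 2 * i + 1, 2 * i + 2
        if l < n and a[l] > a[largest]:
            largest = l
        if r < n and a[r] > a[largest]:
            largest = r
        if largest == i:      # heap property restored at this subtree
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest           # continue sinking from the child's slot

def build_max_heap(a: list) -> None:
    """Floyd's O(n) heapify: sift down from the last non-leaf, backwards."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
```

After `build_max_heap(a)`, the maximum sits at `a[0]` and every parent dominates its children.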
Phase 2: Sort by Repeated Extract-Max
Now the largest element sits at the root (index 0). Swap it with the last element in the heap, shrink the heap by one, and sift the new root down to restore the heap property. The extracted element is now in its correct sorted position at the end of the array. Repeat n−1 times. Each sift-down costs \(O(\log n)\), and we do n of them: \(O(n \log n)\) total.
The sort happens entirely in-place. The heap lives in the front of the array; the sorted portion grows from the back. No extra buffer needed.
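Putting both phases together, the whole algorithm fits in a short self-contained sketch (again with my own naming):

```python
def sift_down(a: list, i: int, n: int) -> None:
    """Sink a[i] within the heap a[0:n] until neither child exceeds it."""
    while True:
        largest = i
        l, r = 2 * i + 1, 2 * i + 2
        if l < n and a[l] > a[largest]:
            largest = l
        if r < n and a[r] > a[largest]:
            largest = r
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def heap_sort(a: list) -> None:
    """In-place heap sort: O(n log n) time, O(1) extra space."""
    n = len(a)
    # Phase 1: Floyd's O(n) build.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
    # Phase 2: repeatedly move the max to the back, shrink, restore.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]   # extracted max lands in sorted suffix
        sift_down(a, 0, end)          # heap now occupies a[0:end]
```

The loop bound `end` is exactly the boundary between the shrinking heap at the front and the growing sorted suffix at the back.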
Why Heap Sort Loses in Practice
Despite its beautiful properties — \(O(n \log n)\) guaranteed, \(O(1)\) space, no bad inputs — heap sort is slower than quicksort in practice. The culprit: cache unfriendliness. Sift-down jumps between parent (index i) and child (index 2i+1), which are far apart in large arrays. This pattern causes many CPU cache misses, making heap sort 2–5× slower than quicksort empirically.
Merge sort accesses memory sequentially (cache-friendly). Quicksort partitions contiguous ranges (cache-friendly). Heap sort bounces around — not cache-friendly. Its niche is as the safety net in Introsort: when quicksort detects it's degenerating (recursion depth exceeds \(2 \log_2 n\)), it falls back to heap sort for the \(O(n \log n)\) guarantee without extra memory.
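The fallback logic can be illustrated with a toy sketch — this is not the actual libstdc++ or libc++ implementation, just the shape of the idea, with my own names and a simple Lomuto partition:

```python
import math

def _sift_down(a, lo, i, n):
    # Sift within the sub-heap occupying a[lo:lo+n]; i and n are relative.
    while True:
        l, r, big = 2 * i + 1, 2 * i + 2, i
        if l < n and a[lo + l] > a[lo + big]:
            big = l
        if r < n and a[lo + r] > a[lo + big]:
            big = r
        if big == i:
            return
        a[lo + i], a[lo + big] = a[lo + big], a[lo + i]
        i = big

def _heap_sort_range(a, lo, hi):
    # Standard heap sort, confined to the slice a[lo:hi].
    n = hi - lo
    for i in range(n // 2 - 1, -1, -1):
        _sift_down(a, lo, i, n)
    for end in range(n - 1, 0, -1):
        a[lo], a[lo + end] = a[lo + end], a[lo]
        _sift_down(a, lo, 0, end)

def introsort(a):
    # Depth budget of 2·⌊log2 n⌋: past this, quicksort is degenerating.
    limit = 2 * max(1, math.floor(math.log2(max(2, len(a)))))
    _intro(a, 0, len(a), limit)

def _intro(a, lo, hi, depth):
    if hi - lo <= 1:
        return
    if depth == 0:
        _heap_sort_range(a, lo, hi)   # safety net: guaranteed O(n log n)
        return
    # Lomuto partition around pivot a[hi-1].
    p, i = a[hi - 1], lo
    for j in range(lo, hi - 1):
        if a[j] < p:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi - 1] = a[hi - 1], a[i]
    _intro(a, lo, i, depth - 1)
    _intro(a, i + 1, hi, depth - 1)
```

Real introsort implementations add a third layer — insertion sort for tiny ranges — but the heap-sort fallback is what caps the worst case.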