Merge Sort
Picture a huge messy pile of papers on your desk. Splitting it into two smaller piles is easy. If each pile is already sorted, merging them back into one sorted pile is also easy — just compare the top sheet of each pile and take the smaller one. That's exactly how merge sort works.
Of course, the smaller piles aren't sorted yet either — so you split them again, and again, until each pile has just one sheet. One sheet is always sorted. Then merge pairs of piles back together, all the way up. The whole thing runs in \(O(n \log n)\) time — guaranteed, no matter what the input looks like.
The Three Phases
1. Divide: Split the array in half. Recurse on each half. Keep splitting until you reach arrays of size 1 (base case).
2. Conquer: Arrays of size 1 are trivially sorted. The real work happens next.
3. Merge: Combine two sorted halves into one sorted array. Use two pointers — one for each half. At each step, pick the smaller front element and advance that pointer. When one half is exhausted, append the rest of the other half. This costs \(O(n)\) per merge, and each element participates in exactly log n merges (once per level of the recursion tree).
Why It's Always \(O(n \log n)\)
The recursion tree has log n levels (we halve the problem each time). At every level, the total merge work across all nodes is \(O(n)\) — many smaller merges that together touch every element once. Multiply: \(O(n)\) × \(O(\log n)\) = \(O(n \log n)\), always — no bad inputs, no unlucky pivot choices.
Stability and the Space Tradeoff
Merge sort is stable: when two elements are equal during a merge, we always take from the left (earlier) half first. Equal elements never swap past each other.
The downside is \(O(n)\) extra space for the merge buffer. This is the main reason merge sort loses to quicksort in memory-constrained settings. One exception: linked lists. Merging two sorted linked lists requires no extra array — just relink pointers. So merge sort on linked lists uses only \(O(\log n)\) stack space.
Merge sort is the backbone of Timsort (used by Python and Java for object sorting) and is the standard algorithm for external sorting (data too large for RAM, stored on disk).