Comparison Lower Bound
Think about playing 20 questions to figure out which of many possible arrangements is correct. Each yes/no answer rules out about half the remaining possibilities. The more possible arrangements there are, the more questions you need to ask before you're sure.
Sorting n items has n! possible orderings. Each comparison ("is A before B?") is one yes/no question that rules out some orderings. The lower bound proof shows that no matter how clever your comparison strategy is, you must ask at least \(\Omega(n \log n)\) questions to handle all n! possibilities. This means no comparison sort can ever be faster than \(O(n \log n)\) in the worst case — not with any trick, not ever.
The Decision Tree Model
Every comparison-based sort can be modeled as a binary decision tree. Each internal node is a comparison between two elements: "is arr[i] ≤ arr[j]?". Depending on the answer (yes or no), the algorithm follows a branch. At some point it reaches a leaf — a node with no more comparisons — which specifies the final sorted output.
Every possible input traces a unique root-to-leaf path through this tree. The number of comparisons for a given input equals the depth of its leaf. The worst-case comparison count is the height of the tree (depth of the deepest leaf).
The Tree Needs at Least n! Leaves
For an algorithm to be correct, it must produce the right output for every possible input. There are exactly n! possible orderings of n distinct elements. If two different orderings led to the same leaf, the algorithm would return the same output for both — but the correct sorted order differs between them, so at least one answer would be wrong.
Therefore, a correct comparison sort's decision tree must have at least n! leaves.
Height ≥ \(\log_2(n!)\) ≈ n log n
A binary tree with L leaves has height at least ⌈\(\log_2(L)\)⌉. With L ≥ n!, the height is at least \(\log_2(n!)\). By Stirling's approximation: \(\log_2(n!)\) = Σᵢ \(\log_2(i)\) ≥ (n/2)·\(\log_2(n/2)\) = \(\Omega(n \log n)\). More precisely, \(\log_2(n!)\) = \(\Theta(n \log n)\).
So every comparison sort must make at least \(\Omega(n \log n)\) comparisons in the worst case. Since merge sort and heap sort achieve \(O(n \log n)\), the bound is tight — they are asymptotically optimal among all comparison sorts.
Why Linear Sorts Bypass This
Linear sorts (counting sort, radix sort, bucket sort) don't fit the decision tree model at all — they never ask "is A ≤ B?" Instead, they read exact integer values and place elements directly. They're using a completely different source of information, so the \(\Omega(n \log n)\) lower bound simply doesn't apply to them.