Sqrt Decomposition
Suppose you're a librarian managing 10,000 books. Checking every book for a query would take ages. But if you group them into shelves of 100 books each, you can quickly scan full shelves and only check individual books at the two ends of a range. You trade the pain of "check everything" for a manageable \(O(\sqrt{n})\) sweet spot.
That's Sqrt Decomposition: divide your array into blocks of size B ≈ √n. Precompute an aggregate (sum, min, XOR, …) for each block. Range queries scan at most 2 partial blocks element-by-element, and all the full blocks in between in \(O(1)\) each.
Building the structure
Divide an array of n elements into ⌈n/B⌉ blocks, where B = ⌊√n⌋. For each block b, precompute block[b] = aggregate of all elements in that block. This takes one pass: \(O(n)\). The element at index i belongs to block ⌊i/B⌋.
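A minimal sketch of the build step in Python, assuming a sum aggregate (the function name `build` is illustrative):

```python
import math

def build(a):
    """Precompute per-block sums for array a."""
    n = len(a)
    B = max(1, math.isqrt(n))           # block size B = ⌊√n⌋
    block = [0] * ((n + B - 1) // B)    # ⌈n/B⌉ block aggregates
    for i, x in enumerate(a):
        block[i // B] += x              # element i lives in block ⌊i/B⌋
    return block, B
```

One pass over the array, so the build is \(O(n)\); swapping `+=` for `min`/`max`/XOR gives the other aggregates.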
Answering a range query [l, r]
The range spans three parts:
Left partial block: l is somewhere in the middle of its block. Scan element by element from l to the end of that block (at most B−1 elements).
Full middle blocks: for each block completely inside [l, r], use its precomputed aggregate — \(O(1)\) per block.
Right partial block: scan element by element from the start of r's block to r (at most B−1 elements).
Total work: \(O(B)\) for the two partial ends + \(O(n/B)\) for full blocks = \(O(B + n/B)\). Setting B = √n minimizes this to \(O(\sqrt{n})\).
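The three-part scan above can be sketched as follows, again assuming a sum aggregate and an inclusive range (names are illustrative):

```python
def range_sum(a, block, B, l, r):
    """Sum of a[l..r] inclusive: two partial scans + full-block lookups."""
    total = 0
    i = l
    # left partial block: element by element until a block boundary (or past r)
    while i <= r and i % B != 0:
        total += a[i]
        i += 1
    # full middle blocks: one precomputed aggregate each
    while i + B - 1 <= r:
        total += block[i // B]
        i += B
    # right partial block: remaining elements up to r
    while i <= r:
        total += a[i]
        i += 1
    return total
```

Note the edge case where l and r fall in the same block: the left-partial loop alone handles it, and the other two loops simply never run.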
Updating a single element
Update the element at index i, then recompute the block aggregate:
• Sum: block[i/B] += (new_val − old_val). \(O(1)\).
• Min/Max: if the new value beats the block's current aggregate, you can update it in \(O(1)\); but if you replaced the old minimum with a larger value, you can't know the new minimum without re-scanning the block. Worst case \(O(B)\) = \(O(\sqrt{n})\).
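Both cases in a short Python sketch (function names are illustrative; `blk_min` holds per-block minimums):

```python
def point_update_sum(a, block, B, i, new_val):
    """Sum aggregate: adjust the block by the delta. O(1)."""
    block[i // B] += new_val - a[i]
    a[i] = new_val

def point_update_min(a, blk_min, B, i, new_val):
    """Min aggregate: re-scan the whole block, since raising the old
    minimum invalidates the cached value. O(B) = O(√n)."""
    a[i] = new_val
    b = i // B
    blk_min[b] = min(a[b * B:(b + 1) * B])
```

The slicing in `point_update_min` relies on Python clamping `(b + 1) * B` to the array length, so the last (possibly short) block needs no special case.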
Range update with lazy blocks
To add δ to all elements in [l, r]: for full blocks, store lazy[b] += δ in \(O(1)\) — don't touch individual elements. For the two partial boundary blocks, update elements one by one and recompute those blocks' aggregates. Total: \(O(\sqrt{n})\). Queries incorporate the lazy value when they hit a block or scan its individual elements.
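A sketch of lazy blocks for the sum aggregate (an assumption; other aggregates need their own bookkeeping). The full-block sum is kept current by adding `delta * B`, while individual elements are only touched at the two ends:

```python
def range_add(a, block, lazy, B, l, r, delta):
    """Add delta to a[l..r]: O(1) per full block via lazy, O(B) at the ends."""
    i = l
    while i <= r and i % B != 0:        # left partial: touch elements directly
        a[i] += delta
        block[i // B] += delta
        i += 1
    while i + B - 1 <= r:               # full blocks: just mark them lazy
        lazy[i // B] += delta
        block[i // B] += delta * B      # keep the sum aggregate current
        i += B
    while i <= r:                       # right partial: touch elements directly
        a[i] += delta
        block[i // B] += delta
        i += 1

def point_value(a, lazy, B, i):
    """True value of a[i] = stored value + its block's pending lazy add."""
    return a[i] + lazy[i // B]
```

Any per-element scan (partial blocks in a query, a point read) must add the block's pending `lazy` value, as `point_value` shows; full-block lookups read `block[b]` directly.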
When to reach for sqrt decomposition
A segment tree answers queries and updates in \(O(\log n)\) — faster than \(O(\sqrt{n})\). But sqrt decomposition wins on implementation simplicity: no tree structure, no recursive lazy propagation, just arrays and arithmetic. For problems with complex aggregates that are hard to compose (like certain "last occurrence" queries), sqrt decomposition adapts naturally. In competitive programming, when a segment tree would take 30 minutes to code correctly, sqrt decomposition takes 10.