
Sqrt Decomposition

Split the array into √n chunks — too big to check every element, too small to need a fancy tree
build: \(O(n)\)
range query: \(O(√n)\)
point update: \(O(1)\) for sum, \(O(√n)\) for min/max
block size: √n

Suppose you're a librarian managing 10,000 books. Checking every book for a query would take ages. But if you group them into shelves of 100 books each, you can quickly scan full shelves and only check individual books at the two ends of a range. You trade the pain of "check everything" for a manageable \(O(√n)\) sweet spot.

That's Sqrt Decomposition: divide your array into blocks of size B ≈ √n. Precompute an aggregate (sum, min, XOR, …) for each block. Range queries scan at most 2 partial blocks element-by-element, and all the full blocks in between in \(O(1)\) each.

Building the structure

Divide an array of n elements into ⌈n/B⌉ blocks, where B = ⌊√n⌋. For each block b, precompute block[b] = the aggregate of all elements in that block. This takes one pass: \(O(n)\). The element at index i belongs to block ⌊i/B⌋.
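The build step can be sketched in a few lines of plain Python (the helper name is illustrative, not from the implementation below):

```python
import math

def build_blocks(arr):
    """Precompute per-block sums for arr in one O(n) pass."""
    n = len(arr)
    B = max(1, math.isqrt(n))          # block size ≈ √n
    blocks = [0] * ((n + B - 1) // B)  # ⌈n/B⌉ blocks
    for i, v in enumerate(arr):
        blocks[i // B] += v            # index i lives in block ⌊i/B⌋
    return B, blocks

B, blocks = build_blocks([1, 3, 5, 7, 9, 11, 13, 15, 17, 19])
print(B, blocks)  # 3 [9, 27, 45, 19]
```

Note the last block may be shorter than B when B doesn't divide n — here block 3 holds only the single element 19.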

Answering a range query [l, r]

The range spans three parts:

Left partial block: l is somewhere in the middle of its block. Scan element by element from l to the end of that block (at most B−1 elements).

Full middle blocks: for each block completely inside [l, r], use its precomputed aggregate — \(O(1)\) per block.

Right partial block: scan element by element from the start of r's block to r (at most B−1 elements).

Total work: \(O(B)\) for the two partial ends + \(O(n/B)\) for full blocks = \(O(B + n/B)\). Setting B = √n minimizes this to \(O(√n)\).
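You can check this balance numerically. For n = 10,000, the cost B + n/B bottoms out at B = 100 = √n (a quick sketch):

```python
n = 10_000
# Cost of one range query for a few candidate block sizes B
costs = {B: B + n // B for B in (10, 50, 100, 200, 1000)}
print(costs)  # {10: 1010, 50: 250, 100: 200, 200: 250, 1000: 1010}
```

Too-small blocks make the middle loop long; too-big blocks make the partial scans long. B = √n is the crossover.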

Updating a single element

Update the element at index i, then recompute the block aggregate:

Sum: block[i/B] += (new_val − old_val). \(O(1)\).
Min/Max: if the new value is at least as good as the block's current aggregate, you can update it in \(O(1)\). But if you raised the old minimum (or lowered the old maximum), the block's aggregate may now be stale, and the only way to find the new one is to re-scan the block: \(O(B)\) = \(O(√n)\).
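The sum version appears in the full implementation below; the min case can be sketched like this (a hypothetical standalone helper, not part of the article's class):

```python
def update_min(arr, block_min, B, i, val):
    """Point-update for a min aggregate: O(1) if val improves the block's
    minimum, O(B) rescan otherwise (the old value may have been the min)."""
    b = i // B
    arr[i] = val
    if val <= block_min[b]:
        block_min[b] = val               # new value is the smallest: O(1)
    else:
        lo, hi = b * B, min((b + 1) * B, len(arr))
        block_min[b] = min(arr[lo:hi])   # rescan the whole block: O(B)

arr = [4, 2, 6, 1, 8, 3]
B = 2
block_min = [2, 1, 3]                    # mins of blocks [4,2], [6,1], [8,3]
update_min(arr, block_min, B, 3, 9)      # raise the old minimum of block 1
print(block_min)  # [2, 6, 3]
```

The rescan is unavoidable: once the old minimum is gone, the rest of the block is the only place the new minimum can hide.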

Range update with lazy blocks

To add δ to all elements in [l, r]: for full blocks, store lazy[b] += δ in \(O(1)\) — don't touch individual elements. For the two partial boundary blocks, update elements one by one and recompute those blocks' aggregates. Total: \(O(√n)\). Queries incorporate the lazy value when they hit a block or scan its individual elements.
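A sketch of the lazy scheme for range add + range sum, using the same block layout as above (class and method names are mine, chosen for illustration):

```python
import math

class LazyBlocks:
    """Range add and range sum, both O(√n) per operation."""
    def __init__(self, arr):
        self.arr = arr[:]
        self.n = len(arr)
        self.B = max(1, math.isqrt(self.n))
        nb = (self.n + self.B - 1) // self.B
        self.block_sum = [0] * nb
        self.lazy = [0] * nb              # pending add for every element of block b
        for i, v in enumerate(arr):
            self.block_sum[i // self.B] += v

    def _block_len(self, b):
        return min((b + 1) * self.B, self.n) - b * self.B

    def add(self, l, r, delta):           # add delta to arr[l..r]
        bl, br = l // self.B, r // self.B
        if bl == br:                      # range inside one block: direct
            for i in range(l, r + 1):
                self.arr[i] += delta
            self.block_sum[bl] += delta * (r - l + 1)
            return
        for i in range(l, (bl + 1) * self.B):        # left partial block
            self.arr[i] += delta
        self.block_sum[bl] += delta * ((bl + 1) * self.B - l)
        for b in range(bl + 1, br):                  # full blocks: O(1) each
            self.lazy[b] += delta
        for i in range(br * self.B, r + 1):          # right partial block
            self.arr[i] += delta
        self.block_sum[br] += delta * (r + 1 - br * self.B)

    def query(self, l, r):                # sum of arr[l..r], lazy-aware
        bl, br = l // self.B, r // self.B
        if bl == br:
            return sum(self.arr[l:r + 1]) + self.lazy[bl] * (r - l + 1)
        total = sum(self.arr[l:(bl + 1) * self.B]) + self.lazy[bl] * ((bl + 1) * self.B - l)
        for b in range(bl + 1, br):
            total += self.block_sum[b] + self.lazy[b] * self._block_len(b)
        total += sum(self.arr[br * self.B:r + 1]) + self.lazy[br] * (r + 1 - br * self.B)
        return total

lb = LazyBlocks([1, 2, 3, 4, 5, 6, 7, 8, 9])
lb.add(2, 7, 10)       # add 10 to six elements
print(lb.query(0, 8))  # 45 + 6*10 = 105
```

The key invariant: an element's true value is arr[i] + lazy[i // B], so every read, whether per-element or per-block, must fold the lazy value back in.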

When to reach for sqrt decomposition

A segment tree answers queries and updates in \(O(\log n)\) — faster than \(O(√n)\). But sqrt decomposition wins on implementation simplicity: no tree structure, no recursive lazy propagation, just arrays and arithmetic. For problems with complex aggregates that are hard to compose (like certain "last occurrence" queries), sqrt decomposition adapts naturally. In competitive programming, when a segment tree would take 30 minutes to code correctly, sqrt decomposition takes 10.

Note: The √n trick appears in many algorithms: Mo's Algorithm sorts queries in blocks of √n, baby-step giant-step splits a discrete-log search into √m giant steps of √m baby steps each, and sqrt decomposition uses blocks of √n directly. Whenever you see \(O(√n)\), someone is balancing two \(O(n)\)-sized things against each other.

Sqrt Decomposition — range sum with point updates

```python
import math

class SqrtDecomp:
    def __init__(self, arr):
        self.arr = arr[:]
        self.n = len(arr)
        self.B = max(1, math.isqrt(self.n))      # block size ≈ √n
        num_blocks = (self.n + self.B - 1) // self.B
        self.blocks = [0] * num_blocks           # blocks[b] = sum of block b
        for i, v in enumerate(arr):
            self.blocks[i // self.B] += v

    def update(self, i, val):
        # Adjust the block sum by the delta, then store the new value. O(1).
        self.blocks[i // self.B] += val - self.arr[i]
        self.arr[i] = val

    def query(self, l, r):  # inclusive [l, r]
        total = 0
        bl, br = l // self.B, r // self.B
        if bl == br:        # whole range inside one block: scan directly
            return sum(self.arr[l:r + 1])
        # Left partial block
        total += sum(self.arr[l:(bl + 1) * self.B])
        # Full middle blocks
        for b in range(bl + 1, br):
            total += self.blocks[b]
        # Right partial block
        total += sum(self.arr[br * self.B:r + 1])
        return total

arr = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
sq = SqrtDecomp(arr)
print(sq.query(2, 7))  # 5+7+9+11+13+15 = 60
sq.update(4, 0)        # arr[4] = 0 (was 9)
print(sq.query(2, 7))  # 5+7+0+11+13+15 = 51
```
Output
60
51

Complexity

🏗️ Build — compute block aggregates in a single pass; one linear scan: \(O(n)\)
🔍 Range query — at most 2B−2 individual elements plus \(O(n/B)\) full blocks, minimized at B = √n: \(O(√n)\)
✏️ Point update (sum) — block[i/B] += new_val − old_val, just two arithmetic ops: \(O(1)\)
✏️ Point update (min/max) — if the old minimum was raised, rescan all B elements to find the new one: \(O(√n)\)
📝 Range update + lazy — lazy[b] += δ per full block (\(O(1)\) each), direct element updates at the two boundaries: \(O(√n)\)

Quick check

Why is B = √n the optimal block size?
