Images as Numbers
Zoom In Far Enough
Open any photo on your phone and zoom in as far as you can. Keep going. Eventually, you'll see tiny colored squares: pixels. Every digital image, from selfies to satellite photos, is just a grid of pixels.
And each pixel? Just three numbers: one for Red, one for Green, one for Blue, each usually ranging from 0 to 255. Mix them together like beams of light, and you can make about 16.7 million different colors (256 x 256 x 256).
- [255, 0, 0] = pure red
- [0, 255, 0] = pure green
- [0, 0, 255] = pure blue
- [255, 255, 0] = yellow (red + green)
- [255, 255, 255] = white (all colors)
- [0, 0, 0] = black (no light)
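To make the color table concrete, here is a minimal Python sketch that stores colors as (R, G, B) tuples and mixes them the way light mixes: by adding the channel values. The function name `mix_light` is hypothetical, and clipping sums at 255 is an assumption that matches 8-bit channels.

```python
# Each color is an (R, G, B) tuple; each channel is an integer from 0 to 255.
RED = (255, 0, 0)
GREEN = (0, 255, 0)
BLUE = (0, 0, 255)

def mix_light(*colors):
    """Add colors channel by channel, like overlapping beams of light.

    Sums are clipped at 255, the largest value an 8-bit channel can hold.
    """
    return tuple(min(255, sum(channel)) for channel in zip(*colors))

yellow = mix_light(RED, GREEN)
white = mix_light(RED, GREEN, BLUE)
print(yellow)  # (255, 255, 0)
print(white)   # (255, 255, 255)
```

Mixing paint works the other way (each pigment absorbs light), which is why red and green paint make brown rather than yellow.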
A typical smartphone photo is about 4000 x 3000 pixels. That's 12 million pixels, each with 3 color values: 36 million numbers to describe a single picture.
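The arithmetic above can be checked with a few lines of Python. `count_values` is a hypothetical helper, and the storage estimate assumes 8-bit channels (1 byte per value) with no compression, so an actual JPEG file on your phone will be much smaller.

```python
def count_values(width, height, channels=3):
    """Total number of values needed to store an image of this size."""
    return width * height * channels

photo_pixels = 4000 * 3000
photo_values = count_values(4000, 3000)
print(photo_pixels)  # 12000000
print(photo_values)  # 36000000

# Assuming 8-bit channels (1 byte per value) and no compression:
print(photo_values / 1_000_000, "MB")  # 36.0 MB
```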
Why This Matters for AI
Once an image is just a grid of numbers, computers can do math on it. And math is what neural networks do best. Brightness? Average the numbers. Edges? Subtract neighboring pixels. Blur? Average a neighborhood. Every image operation is just arithmetic on a grid.
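Here is a minimal sketch of those three operations on a tiny grayscale grid stored as nested Python lists. The helper names are hypothetical, and the blur is a simple 3x3 box blur that averages whichever neighbors exist at the borders; real image libraries offer other ways to handle edges.

```python
# A tiny 4x5 grayscale image: dark on the left, bright on the right.
image = [
    [10, 10, 10, 200, 200],
    [10, 10, 10, 200, 200],
    [10, 10, 10, 200, 200],
    [10, 10, 10, 200, 200],
]

def brightness(img):
    """Overall brightness: the mean of every pixel value."""
    values = [v for row in img for v in row]
    return sum(values) / len(values)

def horizontal_edges(img):
    """Difference between each pixel and its right-hand neighbor.

    Large absolute values mark sharp brightness changes (vertical edges).
    Each output row is one element shorter than the input row.
    """
    return [[row[x + 1] - row[x] for x in range(len(row) - 1)] for row in img]

def box_blur(img):
    """Replace each pixel with the mean of its 3x3 neighborhood.

    Pixels on the border average only the neighbors that exist.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        new_row = []
        for x in range(w):
            neighbors = [
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ]
            new_row.append(sum(neighbors) / len(neighbors))
        out.append(new_row)
    return out

print(brightness(image))           # 86.0
print(horizontal_edges(image)[0])  # [0, 0, 190, 0]
print(box_blur(image)[1])          # the sharp 10 -> 200 step becomes a gradual ramp
```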
Grayscale: The Simple Version
Color images have 3 channels (R, G, B). Grayscale images have just 1: a brightness value from 0 (black) to 255 (white). They're easier to work with, and many classic computer vision algorithms start with grayscale.
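Converting color to grayscale is itself just arithmetic: a weighted sum of the three channels. This sketch assumes the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), one common convention among several; `to_grayscale` is a hypothetical helper, not a library function.

```python
def to_grayscale(rgb_image):
    """Convert an RGB image (rows of (R, G, B) tuples) to one brightness channel.

    Uses the BT.601 weights: green counts most because human eyes are most
    sensitive to it. Results are rounded to integers in 0..255.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]

rgb = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
print(to_grayscale(rgb))  # [[76, 150], [29, 255]]
```

Notice that pure green comes out much brighter than pure blue, even though both channels are at full strength.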
Image as a Matrix (or Tensor)
To a computer, an image is a 3D array (also called a tensor), most commonly laid out as:
- Height x Width x Channels
- A 28x28 grayscale image has shape (28, 28, 1)
- A 1920x1080 color image has shape (1080, 1920, 3)
This is the fundamental data structure for all computer vision. When you feed an image to a neural network, you're really feeding it a tensor of numbers.
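To see the Height x Width x Channels layout, here is a sketch using plain nested lists; `zeros` and `shape` are hypothetical helpers. In practice you would use a compact array library such as NumPy, where `numpy.zeros((1080, 1920, 3), dtype=numpy.uint8)` creates a blank 1920x1080 color image.

```python
def zeros(height, width, channels):
    """Build a height x width x channels image filled with zeros (black)."""
    return [[[0] * channels for _ in range(width)] for _ in range(height)]

def shape(img):
    """Return (height, width, channels) for a nested-list image."""
    return (len(img), len(img[0]), len(img[0][0]))

digit = zeros(28, 28, 1)  # same shape as an MNIST digit
small = zeros(4, 6, 3)    # a tiny color image: 6 pixels wide, 4 tall

print(shape(digit))  # (28, 28, 1)
print(shape(small))  # (4, 6, 3)

# Indexing follows the same order: row (y), then column (x), then channel.
small[0][0] = [255, 0, 0]  # make the top-left pixel pure red
```

Height comes first because the array is stored row by row, which is why a 1920-wide, 1080-tall image has shape (1080, 1920, 3).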
Resolution and Information
More pixels = more detail = more numbers. A 4K image (3840x2160) has 8.3 million pixels, compared to a tiny 28x28 thumbnail (784 pixels) often used in AI experiments. The MNIST handwritten digit dataset uses 28x28 grayscale images. Simple, but enough for a neural network to recognize digits with 99%+ accuracy.
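A simple fully connected network for MNIST typically takes each 28x28 image as one flat list of 784 numbers. This sketch assumes row-by-row flattening plus the common (but optional) step of scaling values into the range 0.0 to 1.0; `flatten` and `normalize` are hypothetical helpers.

```python
def flatten(img):
    """Unroll a 2D grayscale image row by row into one long list of numbers."""
    return [v for row in img for v in row]

def normalize(values):
    """Scale 0..255 pixel values into the range 0.0..1.0."""
    return [v / 255 for v in values]

digit = [[0] * 28 for _ in range(28)]  # a blank 28x28 grayscale image
digit[14][14] = 255                    # one white pixel near the center

inputs = normalize(flatten(digit))
print(len(inputs))          # 784
print(inputs.index(1.0))    # 406 (row 14 * 28 + column 14)
```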
Working with Images as Numbers
From Pixels to Understanding
A single pixel tells you almost nothing. It's just a color. But patterns of pixels form edges. Groups of edges form shapes. Collections of shapes form objects. This hierarchy, from numbers to pixels to edges to shapes to objects to scenes, is roughly what deep neural networks (especially convolutional ones) learn to build, layer by layer.
The journey from "grid of numbers" to "that's a photo of a golden retriever at the beach" is the story of computer vision. And it all starts with the humble pixel.