
AI Image Generation

Type words, get pictures: the magic of diffusion models
Scope: Applied AI | Difficulty: Beginner

Draw Me a Cat on a Skateboard

"Draw me a cat riding a skateboard through a neon-lit Tokyo street at sunset."

You type those words into a box. You press Enter. Ten seconds later, a stunning image appears: a fluffy orange cat cruising down a rain-slicked Tokyo alley, neon signs reflecting in puddles, the sky painted in oranges and purples.

You didn't draw it. You didn't hire an artist. You didn't even open Photoshop. You just described what you wanted, and AI created it from nothing.

Welcome to the age of AI image generation, where words become pictures and anyone with an imagination can be an artist.

How It Actually Works: From Noise to Picture

The technology behind most AI image generators is called a diffusion model. The name sounds complicated, but the idea is beautiful and simple.

Think of it like this:

  • Training (learning): Take millions of images from the internet. For each image, slowly add random noise, like static on an old TV, until the image is completely destroyed and looks like pure fuzz. Then train a neural network to reverse this process: given a noisy image, predict how to make it slightly less noisy.
  • Generating (creating): Start with pure random noise, total static. Then apply the denoising network over and over, step by step. Each step removes a little noise and adds a little structure. Shapes emerge from chaos. Colors appear. Details sharpen. After dozens of steps, a clear image appears.
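
The noising half of training can be sketched in a few lines. This is a minimal illustration rather than any particular model's code: the blend formula is the variance-preserving form commonly used in diffusion models, and the names `noisy_version` and `signal_kept`, along with the eight-value "image", are assumptions made for the demo.

```python
import math
import random

def noisy_version(pixels, signal_kept, rng):
    """Blend an image with Gaussian noise.

    signal_kept is the share of the original signal that survives:
    1.0 returns the untouched image, 0.0 returns pure noise.
    """
    keep = math.sqrt(signal_kept)
    noise_scale = math.sqrt(1.0 - signal_kept)
    return [keep * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]

rng = random.Random(0)
image = [1.0, 0.8, -0.6, -1.0, 0.9, -0.7, 0.5, -0.9]  # hypothetical tiny "image"

# Walk from a clean image to pure static.
for signal_kept in (1.0, 0.75, 0.5, 0.25, 0.0):
    noisy = noisy_version(image, signal_kept, rng)
    print(f"signal kept {signal_kept:4.2f}: " + " ".join(f"{v:5.2f}" for v in noisy))
```

During training, the network is shown pairs like these at many noise levels and learns to undo the damage. Generation runs the same ladder in the opposite direction.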

Here's the magic part: during generation, you provide a text prompt that guides the denoising. The model doesn't just remove noise randomly. It removes noise in a way that steers toward your description. "Cat on skateboard" pushes the image toward cat shapes and skateboard shapes. "Neon Tokyo" pushes toward city lights and Japanese signage.
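
One common way text steers the denoising is classifier-free guidance: the network makes one noise prediction with the prompt and one without, and the difference between the two is amplified. The sketch below shows only that combining step. The two prediction lists are made-up numbers standing in for a real network's output, and `guided_prediction` and `guidance_scale` are illustrative names.

```python
def guided_prediction(unconditional, conditional, guidance_scale):
    """Push the prediction away from 'no prompt' and toward 'with prompt'."""
    return [
        u + guidance_scale * (c - u)
        for u, c in zip(unconditional, conditional)
    ]

# Made-up noise predictions for four pixels (a real network would produce these).
without_prompt = [0.20, -0.10, 0.05, 0.40]
with_prompt = [0.30, -0.30, 0.05, 0.10]

for scale in (0.0, 1.0, 7.5):
    combined = guided_prediction(without_prompt, with_prompt, scale)
    print(f"scale {scale:3.1f}: " + " ".join(f"{v:6.2f}" for v in combined))
```

A scale of 0 ignores the prompt, 1 uses the prompted prediction as-is, and larger values exaggerate whatever the prompt changed. Pixels where the prompt made no difference stay where they were.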

It's like sculpting a statue from a block of marble, except the sculptor is guided by your words, and the marble is random noise.

The Big Players

DALL-E (OpenAI)

DALL-E (a clever mashup of Salvador Dalí and WALL-E) was one of the first AI image generators to go viral. Made by OpenAI, it has been integrated into ChatGPT and is also available through OpenAI's API.

  • Strengths: Excellent at following complex prompts, good at text in images, strong safety filters
  • Best for: Quick image generation, ChatGPT integration, business and marketing use
  • How to use: Ask ChatGPT to "draw" or "create an image of..." and DALL-E generates it right in the chat

Midjourney

Midjourney became famous for producing stunningly artistic images. It has a distinctive aesthetic: often dreamy, cinematic, and painterly.

  • Strengths: Beautiful artistic style, incredible detail, great at aesthetic compositions
  • Best for: Concept art, illustrations, creative projects, social media visuals
  • How to use: Originally Discord-only (you type commands in a Discord chat), now has a web interface

Stable Diffusion

Stable Diffusion is the open option: its model weights are publicly released, so anyone can download it, run it on their own computer, and modify it.

  • Strengths: Free to download, customizable, runs locally on a capable GPU (no internet needed once installed), huge community of fine-tuned models
  • Best for: Developers, researchers, anyone who wants full control and privacy
  • How to use: Download and run locally, or use through web interfaces like DreamStudio

Other Notable Tools

  • Adobe Firefly: Integrated into Photoshop and other Adobe Creative Cloud apps. Adobe says it is trained on licensed and public-domain content and positions it as safe for commercial use.
  • Google Imagen: Google's image model, available through Gemini. Strong at photorealistic images.
  • Flux: A newer model family with openly released versions, known for high quality and fast generation. Gaining popularity rapidly.

Understanding Image Generation Concepts

import random

# ===== Simplified Diffusion: Noise to Signal =====
# Real diffusion works on millions of pixels.
# This demo shows the core IDEA on a tiny "image."

def create_image():
    """Our 'image' is a simple 4x4 grid of values."""
    return [
        [9, 8, 2, 1],
        [8, 7, 3, 2],
        [2, 3, 7, 8],
        [1, 2, 8, 9],
    ]

def add_noise(img, noise_level):
    """Add random noise to an image, keeping values in the 0-9 range."""
    noisy = []
    for row in img:
        noisy_row = []
        for val in row:
            noise = random.uniform(-noise_level, noise_level)
            noisy_row.append(round(max(0, min(9, val + noise)), 1))
        noisy.append(noisy_row)
    return noisy

def denoise_step(noisy, target, strength):
    """One denoising step: move slightly toward the target.

    Simplification: a real model never sees the target. A trained
    network, steered by the prompt, estimates which noise to remove.
    """
    result = []
    for i in range(len(noisy)):
        row = []
        for j in range(len(noisy[i])):
            moved = noisy[i][j] + (target[i][j] - noisy[i][j]) * strength
            row.append(round(moved, 1))
        result.append(row)
    return result

def display(img, label):
    """Print a grid with a label above it."""
    print(f"  {label}:")
    for row in img:
        print("    [" + " ".join(f"{v:4.1f}" for v in row) + " ]")

random.seed(42)
original = create_image()
print("=== Diffusion Model: From Noise to Image ===\n")
display(original, "Original image (what we want to generate)")

# Forward process: destroy the image with noise.
# (The demo noises the original; a real generator starts from pure random noise.)
noisy = add_noise(original, 8)
print()
display(noisy, "Step 0: Pure noise (random starting point)")

# Reverse process: gradually denoise (guided by prompt)
print("\n--- Denoising steps (guided by text prompt) ---")
strength = 0.3  # Each step closes 30% of the remaining gap
current = noisy
for step in range(1, 6):
    current = denoise_step(current, original, strength)
    display(current, f"Step {step}: Denoising...")

print("\nThe image emerges from noise, guided by the prompt!")
print("Real models do this with millions of pixels over 20-50 steps.")

Output (illustrative; your exact numbers may differ)
=== Diffusion Model: From Noise to Image ===

  Original image (what we want to generate):
    [ 9.0  8.0  2.0  1.0 ]
    [ 8.0  7.0  3.0  2.0 ]
    [ 2.0  3.0  7.0  8.0 ]
    [ 1.0  2.0  8.0  9.0 ]

  Step 0: Pure noise (random starting point):
    [ 4.0  0.0  8.1  4.5 ]
    [ 1.3  9.0  0.0  5.0 ]
    [ 6.8  0.0  3.2  2.6 ]
    [ 7.5  8.4  3.5  9.0 ]

--- Denoising steps (guided by text prompt) ---
  Step 1: Denoising...
    [ 5.5  2.4  6.3  3.4 ]
    [ 3.3  8.4  0.9  4.1 ]
    [ 5.4  0.9  4.3  4.2 ]
    [ 5.6  6.5  4.8  9.0 ]
  Step 2: Denoising...
    [ 6.5  4.1  5.0  2.7 ]
    [ 4.7  8.0  1.5  3.5 ]
    [ 4.4  1.5  5.1  5.3 ]
    [ 4.2  5.1  5.8  9.0 ]
  Step 3: Denoising...
    [ 7.3  5.3  4.1  2.2 ]
    [ 5.7  7.7  2.0  3.1 ]
    [ 3.7  1.9  5.7  6.1 ]
    [ 3.3  4.2  6.5  9.0 ]
  Step 4: Denoising...
    [ 7.8  6.2  3.4  1.8 ]
    [ 6.3  7.5  2.3  2.8 ]
    [ 3.2  2.2  6.1  6.7 ]
    [ 2.6  3.5  6.9  9.0 ]
  Step 5: Denoising...
    [ 8.2  6.8  3.0  1.6 ]
    [ 6.8  7.3  2.5  2.6 ]
    [ 2.8  2.5  6.4  7.1 ]
    [ 2.1  3.0  7.3  9.0 ]

The image emerges from noise, guided by the prompt!
Real models do this with millions of pixels over 20-50 steps.
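
Why does the demo get close to the original but not all the way there? Each step closes 30% of the remaining gap, so 70% of it survives every step. A quick check of that arithmetic, with helper names that are illustrative and not part of the demo above:

```python
import math

def remaining_fraction(strength, steps):
    """Fraction of the starting error left after repeated denoising steps."""
    return (1.0 - strength) ** steps

def steps_needed(strength, tolerance):
    """Smallest number of steps that shrinks the error to the tolerance or below."""
    return math.ceil(math.log(tolerance) / math.log(1.0 - strength))

for steps in (1, 5, 10, 20):
    left = remaining_fraction(0.3, steps)
    print(f"after {steps:2d} steps: {left:.1%} of the error remains")

print("steps to get under 1%:", steps_needed(0.3, 0.01))
```

After five steps about 17% of the starting error is left, which matches the Step 5 grid: the top-left cell started 5.0 away from its target and ends 0.8 away.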

Note: The Art Debate. AI image generation has sparked one of the biggest debates in the creative world. Artists argue that AI models were trained on their work without permission or compensation. Others say AI is just a new tool, like the camera was when it was invented (painters protested that too). There's no easy answer. What's clear is that AI-generated images are here to stay, and society is still figuring out the rules: copyright, attribution, consent, and what counts as "art."

Prompt Tips for Better Images

Getting great images from AI is a skill. Here are battle-tested tips:

  • Be specific about style: "Oil painting," "anime style," "35mm film photography," "pixel art." The style keyword changes everything
  • Describe lighting: "Golden hour," "dramatic side lighting," "soft diffused light," "neon glow." Lighting sets the mood
  • Use artist references: "In the style of Studio Ghibli" or "Wes Anderson color palette" gives AI a concrete aesthetic target
  • Specify what you DON'T want: Some tools support negative prompts, such as Stable Diffusion's negative prompt field or Midjourney's --no parameter. Listing "text, watermark, blurry" there helps avoid common issues
  • Iterate: Your first image is rarely your best. Adjust the prompt, regenerate, adjust again. It's a conversation.
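
The tips above amount to assembling a prompt from separate parts. Here is a small sketch of that idea, where `build_prompt` and its parameters are hypothetical helpers and not any tool's API:

```python
def build_prompt(subject, style=None, lighting=None, reference=None, avoid=()):
    """Join the parts of an image prompt; returns (prompt, negative_prompt)."""
    parts = [subject]
    for extra in (style, lighting, reference):
        if extra:  # skip any part that was left out
            parts.append(extra)
    prompt = ", ".join(parts)
    negative_prompt = ", ".join(avoid)
    return prompt, negative_prompt

prompt, negative = build_prompt(
    "a cat riding a skateboard through a Tokyo street at sunset",
    style="35mm film photography",
    lighting="neon glow",
    reference="Wes Anderson color palette",
    avoid=("text", "watermark", "blurry"),
)
print("Prompt:  ", prompt)
print("Negative:", negative)
```

Keeping the parts separate makes iterating easier: change one ingredient, regenerate, and compare. The negative list only matters for tools that accept one.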

Ethical Concerns

With great power comes great responsibility:

  • Copyright: Who owns an AI-generated image? Laws are still catching up. In some jurisdictions, including the United States, output created by AI without human authorship can't be copyrighted.
  • Deepfakes: AI can generate realistic images of real people in fake situations. This raises serious concerns about misinformation.
  • Artist consent: Many models were trained on artists' work without permission. Some companies, such as Stability AI, have since offered opt-out programs.
  • Job impact: Stock photography, illustration, and concept art industries are being disrupted. Some artists are adapting by using AI as a tool.
Challenge

Quick check

How do diffusion models generate images?
