Meet the AI Platforms11 min read

Open Weight Models

Download a brain and run AI on your own computer
scope:Core Conceptdifficulty:Intermediate

What If You Could Download a Brain?

When you use ChatGPT, Claude, or Gemini, your words travel to a massive data center, get processed by a model you can't see, and an answer comes back. You're renting someone else's brain. You don't know exactly how it works. You can't look inside. And if the company changes its rules, raises prices, or shuts down β€” you lose access.

But what if you could download the brain itself? Run it on your own computer, in your own home, with no internet connection? What if you could look inside it, modify it, fine-tune it for your specific needs?

That's the promise of open weight models. These are AI models where the trained weights β€” the billions of numbers that define the model's knowledge β€” are publicly available for anyone to download and use.

Open Weight vs Open Source vs Closed

These terms get confused a lot. Let's be precise:

  • Closed models β€” you only interact through an API. You never see the weights, architecture details, or training data. Examples: GPT-4, Claude, Gemini Ultra. The company controls everything.
  • Open weight models β€” the trained model weights are publicly downloadable. You can run them, fine-tune them, and deploy them. But the training code and data might not be shared. Examples: Llama, Mistral.
  • Fully open source β€” weights, training code, data, AND the process are all open. This is rare because training data is often proprietary or legally gray. Some models like OLMo from AI2 aim for this.

Most people say "open source" when they mean "open weight." The weights are the critical piece β€” they're what you need to actually run the model.

The Llama Revolution

The open weight movement has a hero story. In February 2023, Meta (Facebook's parent company) released Llama β€” a powerful language model with weights freely available. It changed everything.

Before Llama, running a capable AI required a massive budget. After Llama, a researcher with a good GPU could run a model rivaling commercial offerings. The community went wild:

  • Llama 2 (July 2023) β€” openly licensed for commercial use, available in 7B, 13B, and 70B parameter sizes
  • Llama 3 (April 2024) β€” a massive leap in quality, with 8B and 70B sizes that competed with GPT-3.5
  • Llama 3.1 405B β€” a 405 billion parameter model that rivaled GPT-4 on many benchmarks β€” and was open weight

Meta's strategy was clever: by making powerful AI free, they prevented any single competitor from monopolizing the market. If everyone has access to strong AI, nobody can charge monopoly prices.

Note: Why does Meta give it away for free? Meta spends billions training these models but releases them openly. Why? Because Meta makes money from ads, not AI APIs. If open models prevent Google and Microsoft from dominating AI, that helps Meta. Plus, thousands of developers improve Llama for free through community fine-tuning. It's strategy, not charity.

The Open Weight Ecosystem

Llama inspired a whole ecosystem of open weight models:

  • Mistral β€” a French startup that punches way above its weight. Their Mistral 7B outperformed much larger models. Mixtral (a mixture-of-experts model) was a breakthrough in efficiency.
  • Microsoft Phi β€” small but mighty. Phi-2 (2.7B parameters) and Phi-3 showed that smaller models trained on high-quality data could beat much larger ones on many tasks.
  • Qwen β€” from Alibaba. Among the best open weight models, especially for multilingual tasks and coding.
  • Gemma β€” Google's open weight models, built from Gemini research. Available in small sizes for on-device and research use.
  • DeepSeek β€” from China. DeepSeek-V2 introduced innovative architecture choices that improved efficiency dramatically.

How to Run AI on Your Own Computer

The magic of open weight models is that you can run them. Here's how:

  • Ollama β€” the simplest way. Install it, run ollama run llama3, and you're chatting with AI locally. No internet needed. Works on Mac, Linux, and Windows.
  • LM Studio β€” a beautiful desktop app for running local models. Point and click to download and chat. Has a built-in server mode so your apps can use local AI.
  • llama.cpp β€” the engine under the hood. A C++ library that runs models efficiently on CPUs (not just GPUs). This is what made local AI practical for regular computers.
  • Hugging Face β€” the GitHub of AI models. Thousands of open weight models hosted and downloadable. The community hub where people share fine-tuned versions.

Running Open Weight Models Locally

# === Running a local model with Ollama ===
# First install Ollama: https://ollama.ai
# Then in terminal: ollama pull llama3
import requests
def ask_local_ai(prompt, model="llama3"):
"""Talk to a local AI model via Ollama."""
response = requests.post(
"http://localhost:11434/api/generate",
json={"model": model, "prompt": prompt, "stream": False}
)
return response.json()["response"]
# Chat with AI running on YOUR computer
answer = ask_local_ai("What is photosynthesis? Explain simply.")
print("Local AI says:", answer[:200])
# === Compare model sizes ===
models_info = {
"Phi-3 Mini (3.8B)": {"ram": "~3 GB", "speed": "Fast", "quality": "Good"},
"Llama 3 8B": {"ram": "~5 GB", "speed": "Medium", "quality": "Great"},
"Llama 3 70B": {"ram": "~40 GB", "speed": "Slow", "quality": "Excellent"},
"Mixtral 8x7B": {"ram": "~26 GB", "speed": "Medium", "quality": "Excellent"},
}
print("\n--- Open Weight Model Guide ---")
print(f"{'Model':<22} {'RAM Needed':<12} {'Speed':<10} {'Quality'}")
print("-" * 56)
for name, info in models_info.items():
print(f"{name:<22} {info['ram']:<12} {info['speed']:<10} {info['quality']}")
print("\nTip: Start with smaller models and work up!")
print("8GB RAM laptop? Try Phi-3 Mini or Llama 3 8B.")
print("32GB+ workstation? Try Llama 3 70B or Mixtral.")
Output
Local AI says: Photosynthesis is how plants make their
own food using sunlight! Think of it like cooking β€” plants
take water from the ground, carbon dioxide from the air,
and use sunlight as energy to cook them together into
sugar (their food) and release oxygen as a bonus...

--- Open Weight Model Guide ---
Model                  RAM Needed   Speed      Quality
--------------------------------------------------------
Phi-3 Mini (3.8B)      ~3 GB        Fast       Good
Llama 3 8B             ~5 GB        Medium     Great
Llama 3 70B            ~40 GB       Slow       Excellent
Mixtral 8x7B           ~26 GB       Medium     Excellent

Tip: Start with smaller models and work up!
8GB RAM laptop? Try Phi-3 Mini or Llama 3 8B.
32GB+ workstation? Try Llama 3 70B or Mixtral.

Pros and Cons: Open vs Closed

Advantages of open weight models:

  • Privacy β€” your data never leaves your computer. No company reads your prompts.
  • Cost β€” no per-token API fees. Once you have the model, usage is free (just electricity).
  • Customization β€” fine-tune the model on your specific data for your specific task.
  • No censorship β€” you control the safety filters. Research applications may need uncensored outputs.
  • Offline use β€” works without internet. Useful on planes, in secure facilities, or in areas with poor connectivity.
  • Transparency β€” you can inspect the weights, study the model, and verify behavior.

Disadvantages of open weight models:

  • Hardware required β€” good models need significant RAM and ideally a GPU. Not everyone has this.
  • Lower quality ceiling β€” the very best closed models (GPT-4, Claude) still outperform the best open models on many complex tasks.
  • No guardrails by default β€” without careful setup, open models can generate harmful content.
  • Setup complexity β€” even with tools like Ollama, it's harder than just opening a website.
  • No built-in tools β€” closed models come with web browsing, code execution, and image generation built in. You'd need to build these yourself.
Challenge

Quick check

What is the key difference between 'open weight' and 'fully open source' AI models?

Continue reading