Meta · 2025 · 04 · 05 · Model · ~1 min read
Meta released Llama 4 (Scout + Maverick)
What's actually new
- Mixture-of-experts in the open. The first Llama generation built on the architecture, shipped with downloadable weights.
- 10-million-token context on Scout — paste a whole codebase or a stack of long PDFs.
- Strong multilingual performance — meaningfully better than Llama 3 on non-English languages.
Worth knowing
- Launch-day scores were cherry-picked. Independent testing showed weaker real-world performance than the headlines suggested.
- The 10M-token context degrades quickly. Past ~1M tokens, accuracy drops; past 5M, it's barely usable.
- Behemoth (the giant 2-trillion-parameter sibling) was promised but not shipped.
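One way to check the long-context claim on your own setup is a needle-in-a-haystack probe: hide one fact at different depths of a long prompt and see whether the model can still retrieve it as the prompt grows. The sketch below is a minimal version under stated assumptions. `ask` is a hypothetical wrapper around whatever serving stack you use, word counts stand in for tokens, and a substring match stands in for real answer grading.

```python
from typing import Callable, Dict, Sequence


def build_probe(filler: str, needle: str, total_words: int, depth: float) -> str:
    """Build a long prompt with `needle` inserted at a relative depth (0.0 = start, 1.0 = end)."""
    words = filler.split()
    if not words:
        raise ValueError("filler must contain at least one word")
    # Repeat the filler until there are enough words, then trim to the target length.
    repeated = (words * (total_words // len(words) + 1))[:total_words]
    cut = int(len(repeated) * depth)
    return " ".join(repeated[:cut] + [needle] + repeated[cut:])


def recall_by_length(
    ask: Callable[[str], str],   # hypothetical: sends a prompt to your model, returns its reply
    lengths: Sequence[int],      # prompt sizes to test, in words (a rough proxy for tokens)
    depths: Sequence[float],     # relative positions at which to hide the needle
    needle: str,
    answer: str,
    filler: str,
    question: str,
) -> Dict[int, float]:
    """Return, for each prompt length, the fraction of depths where the answer was retrieved."""
    results = {}
    for n in lengths:
        hits = 0
        for d in depths:
            prompt = build_probe(filler, needle, n, d) + "\n\n" + question
            # Crude grading: count a hit if the expected answer appears in the reply.
            if answer.lower() in ask(prompt).lower():
                hits += 1
        results[n] = hits / len(depths)
    return results
```

Run it with lengths that bracket the sizes you actually plan to use. A recall curve that falls off as the length grows tells you where the usable context ends for your model and hardware.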
Who should care
Companies running their own AI on their own hardware who care about cost-per-token. Teams needing long-context analysis on a budget. Researchers studying mixture-of-experts at open-weights scale.
What to do about it
If you're running Llama 3.x in production, plan to test Llama 4, since cost per token typically drops. Validate on your own workload, not the launch-day numbers. For new deployments, evaluate Llama 4 Scout specifically for long-context work.
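A rough sketch of what validating on your own workload can look like: run the same prompts through each candidate and compare accuracy and estimated cost side by side. Everything here is illustrative. `Candidate`, `generate`, and the price field are hypothetical names, the price is whatever you measure or are quoted, and tokens are approximated by whitespace-separated words.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass
class Candidate:
    name: str
    generate: Callable[[str], str]   # hypothetical wrapper around your own serving stack
    usd_per_million_tokens: float    # your own measured or quoted price, not a published figure


def evaluate(candidate: Candidate, cases: Sequence[Tuple[str, str]]) -> Dict[str, object]:
    """Score one candidate on (prompt, expected substring) pairs and estimate its cost."""
    correct = 0
    tokens = 0
    for prompt, expected in cases:
        output = candidate.generate(prompt)
        # Crude grading: a case passes if the expected text appears in the output.
        correct += int(expected.lower() in output.lower())
        # Crude token estimate: whitespace-separated words in prompt plus output.
        tokens += len(prompt.split()) + len(output.split())
    return {
        "name": candidate.name,
        "accuracy": correct / len(cases),
        "est_cost_usd": tokens / 1_000_000 * candidate.usd_per_million_tokens,
    }
```

Call `evaluate` once per model on the same `cases` list and compare the resulting rows. Swap in a real tokenizer and a stricter grader before trusting the numbers for a production decision.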
Honest take
Llama 4 was the moment open-weights AI moved into mixture-of-experts territory at the leading edge. The launch was overhyped and real-world performance landed slightly behind expectations, but the architectural shift mattered: a mixture-of-experts model activates only a fraction of its parameters for each token, so it is much cheaper to run at scale than a dense model of similar size, and those economics were a bigger story than any single test result. Open-source AI got a structurally cheaper future.
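The cost argument can be put into back-of-the-envelope numbers. The sketch below assumes the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter per token (ignoring attention over the context), and uses Meta's published parameter counts, which this card does not state: Scout at 109B total and 17B active, Maverick at 400B total and 17B active, with the dense Llama 3.1 405B for comparison.

```python
# (total parameters, active parameters per token); figures assumed from Meta's published sizes
MODELS = {
    "Llama 4 Scout (MoE)": (109_000_000_000, 17_000_000_000),
    "Llama 4 Maverick (MoE)": (400_000_000_000, 17_000_000_000),
    "Llama 3.1 405B (dense)": (405_000_000_000, 405_000_000_000),
}


def flops_per_token(active_params: int) -> int:
    """Rule-of-thumb forward-pass compute: about 2 FLOPs per active parameter."""
    return 2 * active_params


def weight_memory_gb(total_params: int, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the weights (2 bytes per parameter = 16-bit)."""
    return total_params * bytes_per_param / 1e9


def compute_ratio(dense_name: str, moe_name: str, models=MODELS) -> float:
    """How many times more per-token compute the dense model needs than the MoE model."""
    return flops_per_token(models[dense_name][1]) / flops_per_token(models[moe_name][1])


for name, (total, active) in MODELS.items():
    print(
        f"{name}: {flops_per_token(active) / 1e9:.0f} GFLOPs/token, "
        f"{weight_memory_gb(total):.0f} GB of weights at 16-bit"
    )
```

Under these assumptions, Maverick needs about as much weight memory as the dense 405B model but roughly 24 times less compute per token. That is the sense in which mixture-of-experts is cheaper to run: you still pay for all the parameters in memory, but only for the active ones in compute.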
Sources
- Meta: Llama 4 announcement (vendor)
- Hugging Face: Llama 4 model collection (third party)
- Artificial Analysis: Llama 4 testing (benchmark)
Last verified · 2026 · 05 · 05 · Found a fact wrong? corrections@aguidetocloud.com