Move over, DeepSeek and OpenAI—there’s a new AI contender shaking things up. Seattle-based nonprofit AI lab Ai2 just dropped Tulu3-405B, a model they say beats both DeepSeek V3 and OpenAI’s GPT-4o on several benchmarks. Oh, and it’s completely open source.
What does that mean? Well, unlike some of the big players in AI, Tulu3-405B is open for anyone to inspect, tweak, and build upon: Ai2 publishes not just the model weights but also the training data, code, and recipes behind it. Ai2 believes this launch highlights the U.S.'s ability to lead in developing top-tier AI models, even without relying on tech giants.
Tulu3-405B is a heavyweight in the AI world, packing a whopping 405 billion parameters, the learned weights that determine how the model processes and generates text. Training it wasn't a small feat either; it took 256 GPUs running in sync. Ai2 tested the model on benchmarks covering math problem-solving and general knowledge, and it didn't disappoint. It outperformed DeepSeek V3, GPT-4o, and even Meta's Llama 3.1 on specialized tests like PopQA (factual question answering) and GSM8K (grade-school math word problems).
One of the secrets behind Tulu3-405B's success is a training technique called reinforcement learning with verifiable rewards (RLVR). Instead of relying on a learned reward model to judge outputs, RLVR rewards the model only when its answer can be automatically checked for correctness, which works well for tasks with clear outcomes like math problems or precise instruction-following.
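To make the idea concrete, here is a minimal sketch of what a "verifiable reward" looks like in practice. This is illustrative only, not Ai2's actual training code: the function names and the toy RL loop are assumptions, but the core idea matches RLVR as described above, where a binary reward comes from programmatically checking the model's final answer against a known ground truth (as in a GSM8K-style math problem) rather than from a learned reward model.

```python
# Illustrative sketch of a verifiable reward for RLVR-style training.
# NOTE: names like `verifiable_reward` are hypothetical, not Ai2's API.

def verifiable_reward(model_answer: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the model's final answer can be verified
    as correct, 0.0 otherwise. No learned reward model is involved."""
    def normalize(s: str) -> str:
        # Strip whitespace, trailing periods, and thousands separators
        # so "1,234." and "1234" compare equal.
        return s.strip().rstrip(".").replace(",", "").lower()
    return 1.0 if normalize(model_answer) == normalize(ground_truth) else 0.0

# In an RL loop, sampled completions earning reward 1.0 get reinforced:
completions = ["The answer is 42.", "The answer is 41."]
rewards = [verifiable_reward(c.split("is")[-1], "42") for c in completions]
print(rewards)  # [1.0, 0.0]
```

Because the reward is a deterministic check rather than a neural judge, it can't be "gamed" the way learned reward models sometimes are, which is part of why RLVR shines on domains like math where correctness is unambiguous.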
If you’re curious to see Tulu3-405B in action, you can test it out on Ai2’s chatbot web app. The code is also up on GitHub, so developers can dive in and start experimenting.
This launch is a big deal for the open-source AI community, proving that cutting-edge AI doesn’t have to come from closed systems or corporate giants. Stay tuned—the AI race just got a lot more interesting.