OpenAI just wrapped up its 12-day event known as “Shipmas,” where it made some big announcements. As a proper send-off, OpenAI introduced us to o3, its upcoming reasoning model, and it looks like it will be extremely capable.
During Shipmas, OpenAI announced several other AI goodies. For starters, it launched its $200/month ChatGPT Pro plan, which gives users access to the most powerful version of o1 along with other premium features. The company also released Sora, its AI video generator that practically broke the internet when it was first shown off. You can use it if you’re a ChatGPT Plus member.
OpenAI gives us a sneak peek at o3, its latest reasoning model
What happened to o2? Well, it’s on a farm upstate along with Windows 9, the OnePlus 4, and the iPhone 9. OpenAI decided to skip to o3 because there’s a British telecommunications company named O2, so this was a way to avoid any legal issues down the road.
o3 will be a reasoning model, which is similar to a regular model. The key difference is that, instead of giving you the answer all at once, a reasoning model breaks the problem down and shows you the steps it took to reach its conclusion. Google’s Gemini 2.0 Flash Thinking is a good example of a reasoning model. So, if you want a closer look at how a model arrived at its answer, you’ll want to use a reasoning model.
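To make that distinction concrete, here’s a minimal sketch of querying a reasoning model through OpenAI’s Python SDK. Since o3 isn’t publicly available yet, the sketch uses o1-mini as a stand-in; the model name and prompt are just illustrative, and which models you can call depends on your account.

```python
# Minimal sketch: asking a reasoning model to work through a problem.
# "o1-mini" is a stand-in for o3, which isn't publicly available yet.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="o1-mini",
    messages=[
        {
            "role": "user",
            "content": "A train leaves at 3:15 PM and arrives at 6:40 PM. "
                       "How long was the trip? Walk through your steps.",
        }
    ],
)

# The model reasons internally before responding; the API returns the final
# answer, while ChatGPT itself surfaces a summary of the intermediate steps.
print(completion.choices[0].message.content)
```

Note that the raw chain of thought isn’t exposed over the API; what you see in ChatGPT is a summarized version of the model’s reasoning.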
Since this will be OpenAI’s magnum opus, you know it will come with some serious AI smarts. The company released benchmark results showing that it’s well past the point of building AI that’s smarter than a human (well, mostly).
For example, the company put the model through the SWE-Bench Verified coding tests, where it beat o1 by 22.8%. Next, OpenAI put o3 through the GPQA (Graduate-Level Google-Proof Q&A) Diamond science benchmark, where it scored 87.7%. OpenAI also put o3 through the AIME (American Invitational Mathematics Examination), and it missed only one of the 15 questions. The AIME is an extremely hard math competition.
It looks like OpenAI really outdid itself this time around. We don’t know when the company will release this model to the public, so don’t count on it arriving anytime soon, as o1 is still relatively new.