On Friday, the final day of the “12 Days of OpenAI” announcement series, OpenAI CEO Sam Altman presented the new AI models o3 and o3-mini, which build on the o1 line launched earlier this year. The models are not yet available to the general public, but they have already been opened to researchers for safety testing.
The main feature of these models is a technique called a “private chain of thought”. It lets the models pause, examine their own internal dialogue, and plan their answers before responding, an approach described as “simulated reasoning” (SR). This is presented as the next stage in the development of artificial intelligence, one that goes beyond the capabilities of traditional large language models (LLMs).
The models were named “o3”, skipping “o2”, to avoid a conflict with the brand of the British telecom provider O2. Commenting on the unusual choice of name, Altman noted: “In the best traditions of OpenAI, we are again terrible at naming our products.”
The o3 model has already broken records on several key benchmarks. On the ARC-AGI visual reasoning benchmark, it scored 75.7% in low-compute mode and 87.5% in high-compute mode, reaching the level of human performance. It also scored 96.7% on the 2024 American Invitational Mathematics Examination (AIME), missing only one question, and 87.7% on GPQA Diamond, a test of graduate-level questions in biology, physics, and chemistry. On the FrontierMath benchmark from Epoch AI, the model solved 25.2% of the problems, while no other AI model had exceeded 2%.
The smaller version, o3-mini, introduces an adjustable thinking-time setting with three levels: low, medium, and high. The high setting delivers the best results. The o3-mini model has already surpassed the earlier o1 model on Codeforces tests.
The announcement coincided with similar developments from competitors. The day before, Google announced Gemini 2.0 Flash Thinking Experimental. DeepSeek has also launched DeepSeek-R1, and Alibaba has presented QwQ, calling it the first “open” alternative to o1.
OpenAI says the new models will first be made available to safety researchers. The public launch of o3-mini is expected at the end of January, with the full o3 release to follow at a later date.