OpenAI Launches o1 Model to Advance AI Reasoning Capabilities

Gábor Bíró, September 13, 2024
3 min read

OpenAI's latest artificial intelligence model, o1 (internally codenamed "Strawberry"), is now available. The o1 model is specifically designed to enhance the reasoning capabilities of artificial intelligence. Multiple sources report that this new model family aims to solve complex problems in science, programming, and mathematics by spending more time "thinking" before providing an answer.

Advanced Reasoning and Performance

The o1 model has demonstrated strong capabilities in complex problem-solving, particularly in STEM (Science, Technology, Engineering, and Mathematics) fields. In tests, o1 placed in the 89th percentile in competitive programming contests (Codeforces) and ranked among the top 500 students in the United States on the AIME, a qualifier for the USA Mathematical Olympiad. On GPQA, a benchmark of physics, biology, and chemistry questions, it surpassed PhD-level human accuracy. Its advanced reasoning allows o1 to tackle intricate questions, generate sophisticated algorithms, and handle comparative analysis tasks, such as examining contracts or legal documents.

Performance Benchmarks

The o1 model performed strongly across a range of reasoning benchmarks. The table below summarizes key results for the o1 model:

Benchmark                                      Performance
Codeforces (competitive programming)           89th percentile
AIME (Math Olympiad qualifier)                 Top 500 students in the USA
GPQA (physics, biology, chemistry)             Surpasses PhD-level accuracy
International Olympiad in Informatics (IOI)    49th percentile globally
Codeforces Elo rating                          1807 (93rd percentile)
MMLU subcategories                             Outperforms GPT-4o in 54 out of 57

The results are strongest in STEM fields, where the model works through difficult, multi-step problems logically. They represent a significant advance for AI applications in science, mathematics, and programming.

o1 Model Variants

The o1 model has been released in two variants: o1-preview and o1-mini. The o1-mini is smaller, faster, and more cost-effective, specifically designed for coding tasks. o1-mini is reported to be 80% cheaper than o1-preview while delivering competitively strong performance on coding benchmarks. Both models are accessible within ChatGPT and via the OpenAI API.

Limitations and Challenges

Despite its advanced capabilities, the o1 model faces several challenges. It is significantly more expensive to use via the API: input tokens cost three times as much as with GPT-4o, and output tokens four times as much. The o1 model can also be slower to respond, especially on complex problems that might require over ten seconds of computation time. Another limitation is that o1 currently does not support features like web browsing and file analysis, which are available in other AI models.
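To make the cost difference concrete, the following is a minimal Python sketch of how the input and output multipliers combine for a single request. Only the 3x and 4x multipliers come from the figures above; the baseline prices, the token counts, and the function names `request_cost` and `o1_cost` are hypothetical placeholders chosen for illustration, not actual OpenAI pricing.

```python
# Relative API cost of an o1 request versus a GPT-4o baseline.
# The multipliers are taken from the article; the baseline prices
# used in the example run are hypothetical placeholders.

INPUT_MULTIPLIER = 3.0   # o1 input tokens: 3x the GPT-4o rate
OUTPUT_MULTIPLIER = 4.0  # o1 output tokens: 4x the GPT-4o rate


def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost of one request, with prices given per one million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def o1_cost(input_tokens, output_tokens, base_input_price, base_output_price):
    """The same request priced with the 3x / 4x multipliers applied."""
    return request_cost(
        input_tokens,
        output_tokens,
        base_input_price * INPUT_MULTIPLIER,
        base_output_price * OUTPUT_MULTIPLIER,
    )


if __name__ == "__main__":
    # Hypothetical baseline: 1.00 per million input tokens,
    # 3.00 per million output tokens; 2,000 tokens in, 1,000 tokens out.
    base = request_cost(2_000, 1_000, 1.00, 3.00)
    o1 = o1_cost(2_000, 1_000, 1.00, 3.00)
    print(f"baseline: {base:.4f}  o1: {o1:.4f}  ratio: {o1 / base:.2f}x")
```

Whatever the baseline prices, the overall ratio for a request falls between 3x and 4x, moving toward 4x as output tokens account for a larger share of the bill.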

Availability and Future Plans

The o1 model is currently available to ChatGPT Plus and Team users, subject to weekly message caps: 30 messages for o1-preview and 50 messages for o1-mini. The o1-mini model is expected to become available to all free ChatGPT users soon, although a specific release date has not yet been announced. OpenAI plans to further enhance the model's capabilities, address its limitations, and integrate additional features like browsing and file uploads to increase its utility across various applications.
