
AI-powered Language Model Acquires Mastery of Challenging Mathematics Olympiad

The International Mathematical Olympiad (IMO), held on Australia's Sunshine Coast this month, showcased the intellect of 635 gifted teenagers from 114 countries. Amid the rapid scribbling and undercurrent of global discussion, an unexpected participant graced the competition - not a high school...

AI-driven Language Model excels in one of the world's challenging mathematics competitions


In a groundbreaking achievement, OpenAI's experimental reasoning large language model (LLM) delivered a gold medal-level performance at the 2025 International Mathematical Olympiad (IMO). The model solved 5 of the 6 challenging problems, scoring 35 out of 42 points, exactly the threshold for a gold medal.

This remarkable feat was validated by three former IMO medalists who independently graded the AI's carefully written natural language proofs under the same strict conditions as human contestants: two 4.5-hour exam sessions, no external tools or internet, and only the official problem statements.

A Leap in AI's Capacity for Sustained Logical Thinking

The IMO problems require extended, creative, and multi-step mathematical reasoning far beyond previous AI benchmarks. The reasoning time horizon here (around 100 minutes per problem) is much longer and more complex than previous milestones such as GSM8K, MATH benchmark, and AIME contests, marking a significant leap in AI’s capacity for sustained and intricate logical thinking.

Constructing Watertight, Human-Level Mathematical Arguments

Unlike typical verified tasks in AI, IMO solutions demand multi-page, detailed, and rigorous proofs that are hard to formally verify. OpenAI’s model succeeded by moving beyond traditional reinforcement learning paradigms reliant on clear-cut rewards, demonstrating an ability to construct watertight, human-level mathematical arguments in natural language.

General-Purpose Modeling and the Prospects of AGI

Unlike narrow or task-specific models, OpenAI’s system did not use IMO-specific training but relied on general-purpose reasoning methods coupled with reinforcement learning and test-time compute scaling techniques. This shows promising progress toward more flexible and broadly capable AI systems, and could have significant implications for artificial general intelligence (AGI).
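The article does not detail OpenAI's actual method, but test-time compute scaling is commonly illustrated by drawing many independent solution attempts and aggregating them, for example by majority vote over final answers. A minimal generic sketch, where `solve_once` is a hypothetical stand-in for a single stochastic model call:

```python
import random
from collections import Counter

def solve_once(problem: str, seed: int) -> str:
    # Hypothetical stand-in for one stochastic reasoning attempt;
    # a real system would sample an LLM here.
    rng = random.Random(seed)
    return rng.choice(["42", "42", "42", "7"])  # noisy candidate answers

def solve_with_test_time_scaling(problem: str, n_samples: int = 16) -> str:
    # Spend extra compute at inference time: draw many independent
    # attempts, then return the most common final answer
    # (a self-consistency-style aggregation).
    answers = [solve_once(problem, seed=i) for i in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(solve_with_test_time_scaling("What is 6 * 7?"))
```

The key design point is that accuracy improves with `n_samples` even though the underlying model is unchanged; only inference-time compute grows.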

A New Frontier in AI Reasoning

This result solidifies a new frontier in AI reasoning, with implications for theory, education, and future research. It demonstrates that AI can handle tasks once considered a top challenge for human intellect, advancing the prospects of AGI and collaborative human-AI mathematical discovery.

Alexander Wei, a research scientist at OpenAI, announced the achievement, stating, "Our experimental reasoning large language model (LLM) has achieved a gold medal-level performance, solving five of the six problems in the 2025 IMO."

The LLM's solutions for the 2025 IMO were released online and revealed a distinct, sometimes sassy, often meticulous style. For Problem 1, the LLM navigated a labyrinth of conditions about "sunny" lines with a proof spanning several pages, culminating in a playful "No citation necessary" after computing a mystery number.

The 66th International Mathematical Olympiad (IMO) took place on Australia's Sunshine Coast in July 2025, bringing together 635 of the world's brightest young minds from 114 countries. The LLM's presence at the IMO serves both as an inspiration and as a challenge for human contestants to elevate their skills.

Overall, the success of OpenAI's experimental LLM at the 2025 IMO marks a historic AI milestone: gold medal-level problem-solving in one of the most demanding math competitions, achieved by a general-purpose reasoning system under real exam conditions. Experts credit advances in reinforcement learning as a key driver of the LLM's ability to adapt and reason without task-specific training.

OpenAI CEO Sam Altman called the achievement a "significant marker of how far AI has come." The LLM's performance in the 2025 IMO challenges the notion that AI lacks true understanding, suggesting a leap toward general intelligence. Recent studies in Nature Machine Intelligence (2024) suggest that this method can boost multi-step reasoning by 40%.

The experimental LLM is distinct from the forthcoming GPT-5, which OpenAI plans to release soon; this level of mathematical prowess is not expected in a released model for months. The 2025 IMO will be remembered not just for its equations, but for the code that cracked them, hinting at a new era of intelligence or a redefinition of competition.

  1. Artificial intelligence, demonstrated by OpenAI's experimental reasoning large language model (LLM), has shown an impressive capacity for artificial general intelligence (AGI) by solving five out of six problems at the 2025 International Mathematical Olympiad (IMO), traditionally considered a top challenge for human intellect.
  2. Moving beyond traditional reinforcement learning paradigms, the LLM demonstrated an ability to construct watertight, human-level mathematical arguments in natural language, marking a milestone in AI's capacity for sustained and intricate logical thinking, and potentially redefining the realms of competition.
