AI developed by Sam Altman's OpenAI outperforms Elon Musk's AI, Grok, in AI Chess Tournament

Top players watched supposedly advanced AI opponents play at a bafflingly novice level.

General-Purpose AI Chatbots Fall Short in High-Profile Chess Tournament

Google's Kaggle Game Arena AI Chess Exhibition recently concluded, featuring a lineup of general-purpose AI chatbots. The results were far from impressive, with the chatbots demonstrating decidedly sub-human chess ability.

The tournament, which ran from August 5-7, saw OpenAI's model o3 emerge as the winner, sweeping Elon Musk's AI model Grok 4 with a clean 4-0 in the finals. Despite o3's victory, both AI models showed fundamental weaknesses compared with even amateur human players.

Magnus Carlsen, the former world chess champion and event commentator, likened these chatbots to "a gifted child who doesn't know how the pieces move," estimating their playing strength at about 800 Elo—far below the level needed to compete with strong humans or standard chess engines.
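For context on what an 800 Elo rating means, the Elo system converts a rating gap into an expected score with a standard logistic formula. The sketch below is purely illustrative (the opponent ratings are hypothetical, not tournament data):

```python
def expected_score(rating_a: int, rating_b: int) -> float:
    """Expected score (win probability plus half the draw probability)
    for player A against player B under the Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

# A hypothetical 800-rated chatbot facing a 2000-rated club player:
print(round(expected_score(800, 2000), 4))  # about 0.001 -- near-certain loss
```

A 1,200-point gap puts the expected score near one in a thousand, which is why commentators treat 800-level play as non-competitive against strong humans or engines.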

Common failures included poor piece management, catastrophic blunders, incoherent positional play in the middlegame, and an inability to convert an advantage. Grok 4's errors were particularly severe, including blunders in opening tactics and the mishandling of otherwise solid positions.

In the first game, Grok simply gave away an important piece for free. In game two, Grok attempted the "Poisoned Pawn" idea but grabbed the wrong pawn. In game three, Grok built a solid position, then fumbled and shed piece after piece in rapid succession.

The tournament highlighted the gap between general language model reasoning and the procedural, concrete demands of a domain like chess, where specialized algorithms outperform these generalist AIs decisively. Despite the low chess skill level, OpenAI’s o3 demonstrated comparatively better strategic reasoning among the general-purpose models, winning the final without a single loss.

Interestingly, in an earlier tournament hosted by International Master Levy Rozman, less advanced models descended into chaos with illegal moves, pieces conjured out of thin air, and botched calculations. This further underscores the challenges general-purpose AI chatbots face in a domain like chess.

Despite the disappointing performance of the general-purpose AI chatbots, it's important to note that specialized AIs, such as Stockfish, designed specifically for chess, have proven to be formidable opponents. For instance, Stockfish recently won a tournament against ChatGPT.

In conclusion, while general-purpose AI chatbots have made significant strides in many areas, Google's Kaggle Game Arena AI Chess Exhibition underscored that they still have a long way to go in domains requiring precise rule-based reasoning and tactical calculation like chess.

