
OpenAI: new GPT model outperforms most programmers on the planet in tests

OpenAI has introduced o3, a new family of language models that, for now, will be available only to researchers for testing.

OpenAI has developed updated versions of its large reasoning language models. The new model, called o3, succeeds o1, which the company introduced in September. Like o1, it spends time thinking through a problem in order to give better answers to questions that require step-by-step logical reasoning.

OpenAI says the o3 model scored higher than its predecessor on several metrics and benchmarks used to evaluate AI. These tests include those that measure complex programming skills, as well as complex math and scientific problem-solving.

The o3 model is trained using reinforcement learning to “think” before responding. When asked a question, o3 pauses, “analyzes” the information, and “explains” its reasoning as it works. After a while, the model summarizes what it believes to be the most accurate answer. o1 used the same reasoning principles, but users can now “regulate” the reasoning time: the longer the model reasons, the more accurate the answer tends to be.

In ARC-AGI – a test designed to assess whether an AI system can effectively learn new skills beyond the data on which it was trained – o3 scored 87.5% at its high-compute setting. Even in the worst case, at its low-compute setting, the model tripled the performance of o1.

The developers also report that the model outperforms o1 by 22.8 percentage points on SWE-Bench Verified, a benchmark focused on programming tasks. In addition, on competitive programming tasks it reached a Codeforces rating of 2727. Participants rated above 2600 hold the title of International Grandmaster on the platform – there are just over 300 such people on Earth.

So far, all claims about the new model's high-profile achievements come from OpenAI alone. The model will not yet be released publicly or to subscribers. Until January 10th, the company is accepting applications from safety testers and researchers, who will be the first to assess the capabilities and risks of the new model.

Natasha Kumar

Natasha Kumar has been a reporter on the news desk since 2018. Before that she wrote about young adolescence and family dynamics for Styles and was the legal affairs correspondent for the Metro desk. Before joining The Times Hub, she worked as a staff writer at the Village Voice and as a freelancer for Newsday, The Wall Street Journal, GQ and Mirabella. To get in touch, contact her at natasha@thetimeshub.in or 1-800-268-7116.

