
OpenAI: new GPT model outperforms most programmers on the planet in tests

OpenAI has introduced o3 – a new family of language models, but for now they will only be available to researchers for testing.

OpenAI has developed updated versions of its large reasoning language models. The new model, called o3, succeeds o1, which the company introduced in September. Like o1, it spends time thinking about a problem in order to give better answers to questions that require step-by-step logical reasoning.

OpenAI says o3 scored higher than its predecessor on several benchmarks used to evaluate AI, including tests of advanced programming skills and of complex mathematical and scientific problem-solving.

The o3 model is trained with reinforcement learning to “think” before responding. When asked a question, o3 pauses, “analyzes” the information, and “explains” its reasoning as it works. After a while, the model summarizes what it believes to be the most accurate answer. o1 used the same reasoning principles, but users can now “regulate” the reasoning time: the longer the model is allowed to think, the more accurate its answer tends to be.

In ARC-AGI – a test designed to assess whether an AI system can effectively learn new skills beyond the data on which it was trained – o3 scored 87.5% at the high-compute setting. Even in the worst case (the low-compute setting), the model tripled o1's performance.

The developers also report that the model outperforms o1 by 22.8 percentage points on SWE-Bench Verified, a benchmark focused on programming tasks. In addition, on competitive programming problems it reached a Codeforces rating of 2727. Competitors rated above 2600 earn the title of International Grandmaster on the platform – there are just over 300 such people on Earth.

So far, the new model's headline results have been reported only by OpenAI itself. The model will not yet be released publicly or by subscription. The company is accepting applications until January 10 from safety testers and researchers, who will be the first to assess the capabilities and risks of the new model.

Natasha Kumar

