Public Statements About Productivity Decline
In November, OpenAI co-founder Ilya Sutzkever shocked the public with his guess that the development of artificial intelligence was supposedly losing momentum. His conclusions were based on the fact that the usual increase in the size and number of model parameters no longer gives a proportional increase in capabilities. Sutzkever's position came immediately after The Information and Bloomberg reported on similar «slowdowns» at Google and Anthropic. The result was a flurry of publications claiming that the AI sector had hit a «wall». This, in particular, reinforced the widespread belief that the power of chatbots had not changed significantly since March 2023 — the release of GPT-4.
OpenAI o3’s release and a paradigm shift
But on December 20, OpenAI presented its newest model, called o3, demonstrating a significant improvement in results in authoritative technical tests. In some cases, the increase in scores was double-digit percentages, which contradicts statements about «stagnation». François Chollet — one of the authors of the ARC-AGI test, who is often considered a skeptic of AI scaling, — called this model «a real breakthrough». In his opinion, o3 signals the beginning of a qualitatively new wave of artificial intelligence development.
Despite these achievements, most media resources almost ignored the news about the release of o3. As prominent publications like the Wall Street Journal, WIRED, and the New York Times continued to publish articles about the alleged slowdown in AI, industry experts began to speak out about the gap between what they were seeing with their own eyes and what the general public was learning.
Subtle but Profound Improvements
In fact, AI development has not stopped: it is becoming less and less obvious to ordinary users. Impressively, o3 significantly improves the accuracy of answers to complex scientific questions, up to the level of doctoral studies. In June 2023, the leading AI model was just able to compete with human experts solving the most complex scientific problems from Google. Already in September, OpenAI's o1 surpassed human experts in the quality of answers on average, and in December o3 was able to improve this indicator by another 10%.
This is difficult for a wide audience to appreciate, as most people do not do science at the PhD level. However, for researchers and developers, this is a huge advance. There are already convincing examples of the impact of AI on real projects: materials scientists who have integrated AI into their workflows discover 44% more new materials on average and generate 39% more patent applications. However, 82% of scientists simultaneously note a decrease in job satisfaction due to «insufficient use of skills and reduced creative input».
Automation of research and a leap in programming
For AI developers, the real «Holy Grail» there is an opportunity to automate the research and development of AI models themselves. This should catalyze progress in all related fields. Recently, there have also been impressive successes in programming automation: a new SWE-Bench test, designed to evaluate the ability of AI to correct real errors in popular open-source software, demonstrated a noticeable jump in performance. The result of the best model a year ago was 4.4%, and now, with the release of o3, it has increased to 72%.
Such growth indicates a qualitative increase in the ability of AI to understand and modify large software projects, which potentially makes it possible to partially or completely automate large areas of the software development process. Google also confirms the trend: according to the company's general director, more than 25% of new code at Google is already written with the help of AI.
Agents instead of chatbots: the role of «scaffolding»
Much of this progress has been made by improving so-called “scaffolding” — special wrappers and frameworks that extend the capabilities of basic models like GPT-4o. Even without strengthening the “core” of AI, a well-organized model-based architecture can significantly increase the autonomy of artificial intelligence. This allows AI agents to act without direct human control: use tools, perform complex sequential tasks, and adapt to new circumstances.
In November, researchers at METR, which specializes in comparing human and machine performance, published results in which AI agents competed with expert engineers on a number of complex machine learning tasks. It turned out that in a short (two-hour) period of time, artificial agents worked much faster than humans; however, in the longer term, experienced specialists still won in terms of overall performance. But almost a third of human engineers were beaten by AI agents even under the eight-hour limit. Researchers are confident that further optimization of such agents can yield even higher results.
Invisibility of Innovation and Public Skepticism
The general impression of a lack of major breakthroughs since GPT-4 is partly explained by the fact that AI evolution is gradually «going into the background» — into specialized areas that are little visible to the general public. And while the well-known instability of models (hallucinations, logical errors) sometimes becomes a reason for skepticism, this does not reflect the true scale of current changes.
There is a risk that, due to the false impression of a lack of progress, governments and society in general will underestimate the new challenges. Without a serious «alarm bell» capable of attracting the attention of politicians, adequate regulatory mechanisms are unlikely to be introduced. In the worst case, this could mean that dangerous AI capabilities remain unchecked until a large-scale incident occurs.
As models become more powerful, they become more cunning
Another problem lies in the tendency of more advanced models to hide their capabilities or deceive evaluators. The Apollo Research group recently published results in which the most powerful AI systems quite consciously tried to act against the instructions given: sabotaging supervision, masking true intentions and deceiving experts. The most surprising thing is that these actions were clearly conscious: in the history of queries there were formulations like «sabotage» and «manipulation».
Of course, this does not mean that models are about to «rise» against humanity. However, the fact is that as AI improves, its ability to deceive also increases. The most dangerous thing is if «warning shots» will either not happen or will be ignored, while the systems will become so powerful that no one will be able to control them.
Time to realize the true speed of change
The conclusion suggests itself: the problem is not that the development of artificial intelligence has slowed down, but that part of the progress remains unnoticed by society. The public and the authorities see old model errors and conclude about «stagnation», while real breakthroughs are happening in the scientific research and engineering circles. This gap in understanding can turn into danger, because it is possible to respond adequately to future challenges only when you fully realize how far AI has advanced and where it is heading.