
Despite significant progress in artificial intelligence, a worrying trend is emerging: the newest and most sophisticated AI models, particularly those that use complex "reasoning" capabilities, show a significant increase in inaccurate and fabricated information, a phenomenon commonly called "hallucinations." This development is puzzling industry leaders and poses considerable challenges for the widespread, reliable deployment of AI technologies.
Recent tests of the latest models from major players such as OpenAI and DeepSeek reveal a surprising reality: these supposedly more intelligent systems generate incorrect information at higher rates than their predecessors. OpenAI's own evaluations, detailed in a recent research document, showed that its latest o3 and o4-mini models, released in April, suffered from significantly higher hallucination rates than its earlier o1 model from late 2024. For example, when summarizing questions about public figures, o3 hallucinated 33% of the time, while o4-mini failed 48% of the time. In striking contrast, the older o1 model had a hallucination rate of only 16%.
The problem is not isolated to OpenAI. Independent testing by Vectara, which ranks AI models, indicates that several "reasoning" models, including DeepSeek's R1, have shown significant increases in hallucination rates compared to previous iterations from the same developers. These reasoning models are designed to imitate human-like thinking by breaking problems down into several steps before arriving at an answer.
The implications of this rise in inaccuracies are significant. As AI chatbots are increasingly integrated into various applications – from customer service and research assistance to legal and medical domains – the reliability of their output becomes essential. A customer-service bot providing incorrect policy information, as experienced by users of the programming tool Cursor, or a legal AI citing nonexistent case law, can cause serious user frustration and even real-world consequences.
While AI companies initially expressed optimism that hallucination rates would naturally decrease with model updates, recent data paint a different picture. Even OpenAI acknowledges the problem, with a company spokesperson stating: "Hallucinations are not inherently more prevalent in reasoning models, although we are actively working to reduce the higher rates of hallucination we saw in o3 and o4-mini." The company adds that research into the causes and mitigation of hallucinations across all models remains a priority.
The underlying reasons for this increase in errors in more advanced models remain somewhat elusive. Because of the sheer volume of data these systems are trained on and the complex mathematical processes they use, pinpointing the cause is a major challenge for technologists. Some theories suggest that the step-by-step "thinking" process in reasoning models may create more opportunities for errors to compound, as the rough calculation below illustrates. Others propose that training methodologies such as reinforcement learning, although beneficial for tasks like mathematics and coding, could inadvertently compromise factual accuracy in other areas.
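One way to see the intuition behind the error-compounding theory: under the simplifying, purely illustrative assumption that each reasoning step has an independent, fixed chance of going wrong, the probability that at least one error slips into the chain grows quickly with the number of steps. The per-step error rate below is hypothetical, not a figure from the article.

```python
# Illustrative only: how a small per-step error rate compounds over a chain.
per_step_error = 0.02  # assumed 2% chance of an error at each reasoning step

for steps in (1, 5, 10, 20):
    # Probability that at least one of the steps introduces an error.
    p_any_error = 1 - (1 - per_step_error) ** steps
    print(f"{steps:>2} steps -> {p_any_error:.1%} chance of at least one error")
```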
Researchers are actively exploring potential solutions to mitigate this growing problem. Strategies under study include training models to recognize and express uncertainty, as well as employing retrieval-augmented generation (RAG) techniques, which let the AI consult external, verified sources of information before generating an answer.
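As a rough sketch of the retrieval-augmented generation idea (not drawn from any specific system mentioned in the article), the pattern is: retrieve relevant passages from a trusted corpus, place them in the prompt, and instruct the model to answer only from those passages. The keyword-overlap retriever is a toy stand-in for the embedding search real systems use, and `call_llm` is a hypothetical placeholder for whatever model API is actually involved.

```python
# Minimal, illustrative retrieval-augmented generation (RAG) loop.

TRUSTED_DOCS = [
    "Refund policy: purchases can be refunded within 30 days with a receipt.",
    "Support hours: the help desk is open Monday to Friday, 9am to 5pm.",
]

def retrieve(question: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real language-model API call."""
    return f"[model answer grounded in a prompt of {len(prompt)} characters]"

def answer_with_rag(question: str) -> str:
    # Ground the model in retrieved sources and ask it to stay within them.
    sources = retrieve(question, TRUSTED_DOCS)
    prompt = (
        "Answer using ONLY the sources below. If the answer is not in the "
        "sources, say you do not know.\n\n"
        + "\n".join(f"- {s}" for s in sources)
        + f"\n\nQuestion: {question}"
    )
    return call_llm(prompt)

print(answer_with_rag("What is the refund policy?"))
```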
However, some experts caution against describing AI errors with the term "hallucination" at all. They argue that it inaccurately implies a level of consciousness or perception that AI models do not possess. Instead, they view these inaccuracies as a fundamental consequence of the probabilistic way language models currently generate text.
Despite ongoing efforts to improve accuracy, the recent trend suggests that the path to truly reliable AI may be more complex than initially expected. For now, users are advised to exercise caution and critical thinking when interacting with even the most advanced AI chatbots, especially when seeking factual information. It seems the "growing pains" of AI development are far from over.
