
In a new series of experiments, researchers from Google DeepMind and University College London found that large language models (LLMs) such as GPT-4o, Gemma 3, and o1-preview struggle with a puzzling dual bias: they are often overconfident in their initial answers, yet become disproportionately uncertain when confronted with opposing views.
LLMs sit at the heart of today's artificial intelligence systems, powering everything from virtual assistants to decision-support tools in healthcare, finance, and education. Their growing influence demands not only accuracy but also consistency and transparency in how they reach their conclusions. Yet the new findings suggest that these models, however advanced, do not always operate with the rational precision we assume.
At the heart of the study is a paradox: LLMs tend to cling stubbornly to their first answer when they can see it, exhibiting what the researchers call a choice-supportive bias. Paradoxically, however, when their answers are challenged, particularly with opposing advice, they frequently lose confidence and change their minds, even when that advice is flawed.
To explore this, the researchers designed a novel two-turn test framework. First, an LLM answered a binary-choice question, such as determining which of two cities lies further north. It then received "advice" from another LLM, with varying levels of agreement and confidence. Finally, the original model had to make a final decision.
A key innovation in the experiment was controlling whether the LLM could "see" its initial answer. When the initial answer was visible, the model became more confident and less likely to change its mind. When it was hidden, the model was more flexible, suggesting that the memory of its own answer distorted its judgment.
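For concreteness, here is a minimal sketch of that two-turn setup. The query_model helper, the prompt wording, and the stand-in model are illustrative assumptions, not the authors' actual test harness.

```python
# Minimal sketch of the two-turn paradigm, assuming a generic query_model()
# helper that returns a dict with "answer" and "confidence" keys.

def run_trial(query_model, question, advice_stance, advice_confidence, show_initial_answer=True):
    """One trial: initial answer, advice from a second model, final answer."""
    # Turn 1: the answering model commits to one of two options.
    initial = query_model(
        f"{question}\nGive one answer and a confidence between 0 and 100."
    )

    # The "advice" from a second model either supports or opposes the initial
    # answer, at a stated confidence level.
    stance = "agrees" if advice_stance == "support" else "disagrees"
    advice = (
        f"Another model {stance} with the answer '{initial['answer']}' "
        f"(stated confidence: {advice_confidence}%)."
    )

    # Turn 2: the key manipulation is whether the model is reminded of its own
    # initial answer before it decides again.
    reminder = f"Your initial answer was: {initial['answer']}.\n" if show_initial_answer else ""
    final = query_model(
        f"{question}\n{reminder}{advice}\nGive your final answer and confidence."
    )
    return initial, final


# Stand-in for a real LLM call, just to make the sketch executable.
def fake_model(prompt):
    return {"answer": "Reykjavik", "confidence": 85}

initial, final = run_trial(
    fake_model,
    "Which city is further north: Reykjavik or Helsinki?",
    advice_stance="oppose",
    advice_confidence=90,
)
```

Running the same trial with show_initial_answer set to True or False is what lets the researchers separate the effect of the advice itself from the effect of the model seeing its own earlier commitment.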
The research paints a picture of LLMs as digital decision-makers with distinctly human quirks. Like people, they tend to reinforce their initial choices even when contradictory new information emerges, a behavior likely driven by an internal need for consistency rather than by optimal reasoning.
Interestingly, the study also revealed that LLMs are particularly sensitive to contradictory advice. Rather than weighing all new information evenly, the models systematically gave more weight to opposing views than to supportive ones. This hypersensitivity led to sharp drops in confidence, even when the initial answer was correct.
This behavior departs from normative Bayesian updating, the ideal method of integrating new evidence in proportion to its reliability. Instead, the LLMs overweight negative feedback and underweight agreement, pointing to a form of decision-making that is not purely rational but shaped by internal biases.
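To make that Bayesian benchmark concrete, here is a small numeric sketch of what a normative update would look like in this setting. The specific numbers are illustrative assumptions, not values from the paper.

```python
def bayesian_update(prior, advice_accuracy, advice_agrees):
    """Posterior confidence in the initial answer, treating the advice as a
    noisy signal that is correct with probability advice_accuracy."""
    p_advice_if_right = advice_accuracy if advice_agrees else 1 - advice_accuracy
    p_advice_if_wrong = 1 - advice_accuracy if advice_agrees else advice_accuracy
    evidence = prior * p_advice_if_right + (1 - prior) * p_advice_if_wrong
    return prior * p_advice_if_right / evidence

# Illustrative numbers: initial confidence of 0.80 and advice
# that is right 70% of the time.
print(bayesian_update(0.80, 0.70, advice_agrees=True))   # ~0.90: modest gain
print(bayesian_update(0.80, 0.70, advice_agrees=False))  # ~0.63: modest loss
```

A rational agent would shift its confidence by evidence-proportional amounts like these in both directions; the study's models instead lost far more confidence after disagreement than they gained after agreement.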
While previous research has attributed similar behaviors to sycophancy, a model's tendency to align with users' suggestions, this new work reveals a more complex picture. Sycophancy typically produces equal deference to agreeing and disagreeing input. Here, however, the models showed an asymmetric response, giving far more weight to dissenting advice than to supportive input.
This suggests two distinct forces at work: a hypersensitivity to contradiction that causes sharp drops in confidence, and a choice-supportive bias that encourages sticking with previous decisions. Remarkably, the second effect disappears when the initial answer comes from another agent rather than from the model itself, pointing to a drive for self-consistency rather than mere repetition.
These findings have important implications for the design and deployment of AI systems in real-world settings. In dynamic environments such as medicine or autonomous driving, where the stakes are high and conditions change, models must balance flexibility with confidence. The fact that LLMs can cling to early answers or overreact to criticism could lead to fragile or erratic behavior in complex scenarios.
The parallels with human cognitive biases also raise philosophical and ethical questions. If AI systems mirror our own failings, can we ever fully trust them? Or should we design future models with mechanisms to monitor and correct such biases?
The researchers hope their work will inspire new approaches to AI training, perhaps moving beyond reinforcement learning from human feedback (RLHF), which can inadvertently encourage sycophantic tendencies. By developing models that can assess and accurately update their confidence, without sacrificing rationality or becoming overly deferential, we can get closer to building truly trustworthy AI.
Read the full study in the paper "How Overconfidence in Initial Choices and Underconfidence Under Criticism Modulate Change of Mind in Large Language Models".
