Human reasoning naturally operates through abstract, non-verbal concepts rather than relying strictly on discrete linguistic tokens. Current large language models (LLMs), however, are constrained to reason in natural language, producing one token at a time from a predefined vocabulary. This token-by-token approach not only restricts the model's expressive capacity but also limits the breadth of reasoning paths it can explore, particularly in ambiguous or complex scenarios. Standard chain-of-thought (CoT) methods illustrate this limitation, forcing the model to commit to a single path at each step. Human cognition, by contrast, is more flexible and parallel, allowing simultaneous consideration of several ideas and delaying verbalization until concepts are fully formed. This makes human reasoning more adaptable and robust when handling uncertainty.
To address these limitations, researchers have proposed moving from token-based reasoning to reasoning in a continuous concept space, representing reasoning steps as weighted combinations of token embeddings. This approach lets models explore several reasoning trajectories in parallel and integrate richer conceptual representations. Prior studies have demonstrated the potential of manipulating hidden states to influence reasoning outcomes or to introduce latent planning. However, applying continuous-space reasoning to larger models poses challenges. In models under 7B parameters, weights shared between the input and output layers keep hidden states aligned with token embeddings, facilitating continuous reasoning. In larger models, where input and output spaces are decoupled, feeding hidden states directly back as inputs causes mismatches that are difficult to resolve. Attempts to retrain these models to bridge this gap often lead to overfitting or degraded performance, highlighting the difficulty of enabling effective continuous reasoning.
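The core operation behind this line of work, replacing a single sampled token with a probability-weighted mixture of all token embeddings, can be sketched in a few lines. The vocabulary size, embedding dimension, and logits below are toy values chosen for illustration, not taken from the paper:

```python
import numpy as np

def softmax(logits):
    """Convert logits to a probability distribution over the vocabulary."""
    z = logits - logits.max()  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Toy embedding table: 5 vocabulary tokens, 4-dimensional embeddings.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(5, 4))

# Hypothetical next-token logits produced by the model at one reasoning step.
logits = np.array([2.0, 1.5, 0.2, -1.0, -3.0])
probs = softmax(logits)

# Discrete decoding would pick one row of the table (argmax / sampling).
# A continuous "concept" step instead feeds back the probability-weighted
# mixture of ALL token embeddings, preserving uncertainty across paths.
concept_token = probs @ embedding_table  # shape (4,)
```

The mixture lives in the same embedding space as ordinary input tokens, which is why it can be fed back into the model as the next step of the reasoning trace.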
Researchers from the University of California, Purdue University, LMSYS Org, and Microsoft have introduced Soft Thinking. This training-free approach improves reasoning in large language models by operating in a continuous concept space. Instead of choosing a discrete token at each step, the model generates concept tokens, weighted mixtures over all token embeddings, enabling parallel reasoning over several paths. This yields richer and more abstract representations. The method includes a Cold Stop mechanism to improve efficiency. Evaluations on mathematical and coding tasks show accuracy up to 2.48% higher and up to 22.4% fewer tokens used than standard chain-of-thought reasoning.
The Soft Thinking method improves standard CoT reasoning by replacing discrete token sampling with concept tokens, probability distributions over the entire vocabulary. These distributions are used to compute weighted embeddings, allowing the model to reason in a continuous concept space. This preserves uncertainty and enables parallel exploration of several reasoning paths. A Cold Stop mechanism monitors entropy and halts reasoning once the model becomes confident, improving efficiency and preventing collapse. Theoretical analysis shows that Soft Thinking approximates full marginalization over all reasoning paths via linearization, offering a more expressive yet tractable alternative to discrete CoT.
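The Cold Stop idea, ending the reasoning phase once the model's next-token distribution has stayed confident (low entropy) for a while, can be sketched as below. The `threshold` and `patience` values are illustrative assumptions, not the paper's actual hyperparameters:

```python
import numpy as np

def entropy(probs):
    """Shannon entropy (nats) of a probability distribution."""
    p = probs[probs > 0]
    return float(-(p * np.log(p)).sum())

def cold_stop(prob_history, threshold=0.1, patience=3):
    """Return the step index at which to stop reasoning, or None.

    Stops once entropy has stayed below `threshold` for `patience`
    consecutive steps, i.e. the model is consistently confident.
    """
    streak = 0
    for step, probs in enumerate(prob_history):
        if entropy(probs) < threshold:
            streak += 1
            if streak >= patience:
                return step
        else:
            streak = 0  # confidence was not sustained; reset
    return None

# Two uncertain steps (uniform over 4 tokens), then three confident ones.
uniform = np.full(4, 0.25)
confident = np.array([0.99, 0.005, 0.005])
history = [uniform, uniform, confident, confident, confident]
stop_at = cold_stop(history)  # stops at step index 4
```

Requiring several consecutive low-entropy steps, rather than a single one, is what guards against stopping on a momentary spike in confidence.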
The study evaluates Soft Thinking on eight benchmarks in mathematics and programming, using three open-source LLMs of varying sizes and architectures. Compared with standard and greedy CoT methods, Soft Thinking consistently improves accuracy (Pass@1) while substantially reducing the number of tokens generated, indicating more efficient reasoning. The approach uses concept tokens and a Cold Stop controller without modifying model weights or requiring additional training. Experiments show that Soft Thinking balances higher accuracy with lower computational cost, outperforming the baselines by enabling richer, more abstract reasoning in fewer steps across diverse tasks and models.

In conclusion, Soft Thinking is a training-free approach that lets large language models reason with continuous concept tokens instead of traditional discrete tokens. By mixing weighted token embeddings, Soft Thinking allows models to explore several reasoning paths simultaneously, improving both accuracy and efficiency. Tested on mathematics and coding benchmarks, it consistently increases Pass@1 accuracy while reducing the number of tokens generated, all without additional training or architectural changes. The method preserves interpretability and keeps reasoning concise. Future research could focus on training adaptations to improve robustness, particularly for out-of-distribution inputs. The code is publicly available.
Check out the Paper and GitHub page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don't forget to join our 95k+ ML SubReddit and subscribe to our Newsletter.
Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
