Researchers from the National University of Singapore present Thinkless, an adaptive framework that reduces unnecessary reasoning by up to 90% using DeGRPO

by Brenden Burgess


The effectiveness of language models rests on their ability to simulate human-like step-by-step deduction. However, these reasoning sequences are resource-intensive and can be unnecessary for simple questions that require no elaborate computation. This lack of awareness of task complexity is one of the main challenges for these models: they often default to detailed reasoning, even for queries that could be answered directly. Such an approach inflates token usage, prolongs response time, and increases system latency and memory consumption. Consequently, there is a pressing need to equip language models with a mechanism that lets them decide autonomously whether to think deeply or answer succinctly.

Current tools that attempt to solve this problem rely on manual heuristics or prompt engineering to switch between short and long responses. Some methods use separate models and route questions according to complexity estimates. However, these external routing systems often lack insight into the target model's strengths and do not make optimal decisions. Other techniques fine-tune models with prompt cues such as "reasoning on/off", but these rest on static rules rather than dynamic understanding. Despite some improvements, these approaches fall short of fully autonomous, context-sensitive control within a single model.
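To make the baseline concrete, the external-router approach described above can be sketched as a standalone heuristic that decides, before generation, whether a query gets a short answer or a full chain of thought. The function name and complexity cues here are hypothetical illustrations, not any specific system's implementation.

```python
def heuristic_router(question: str) -> str:
    """Route by crude surface cues, with no insight into the target model's strengths."""
    complexity_cues = ("prove", "integral", "optimize", "derive")
    # Static rule: request long reasoning only if the question looks hard.
    if len(question.split()) > 30 or any(c in question.lower() for c in complexity_cues):
        return "long_reasoning"
    return "short_answer"

print(heuristic_router("What is 2 + 2?"))                   # short_answer
print(heuristic_router("Derive the gradient of the loss"))  # long_reasoning
```

A rule like this is exactly the kind of static decision the article criticizes: it cannot adapt to what the answering model actually finds easy or hard.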

Researchers from the National University of Singapore have introduced a new framework called Thinkless, which equips a language model with the ability to dynamically decide between short and long-form reasoning. The framework is built on reinforcement learning and introduces two special control tokens, one for concise responses and one for detailed answers. By incorporating a new algorithm called Decoupled Group Relative Policy Optimization (DeGRPO), Thinkless separates training into selecting the reasoning mode and improving the accuracy of the generated response. This design prevents the model from collapsing into one-dimensional behavior and enables adaptive reasoning tailored to each query.
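A minimal sketch of this control-token inference pattern, assuming the model emits its mode decision as the first output token: the token strings, the `TinyPolicy` stub, and its routing rule are hypothetical stand-ins, not the paper's actual tokens or implementation.

```python
SHORT, THINK = "<short>", "<think>"  # hypothetical control-token names

class TinyPolicy:
    """Stand-in for the trained model: the first generated token picks the mode."""
    def choose_mode(self, question: str) -> str:
        # A real model learns this decision; this stub routes arithmetic-looking
        # questions to the short mode purely for illustration.
        return SHORT if any(ch.isdigit() for ch in question) else THINK

def generate(model, question: str) -> dict:
    mode = model.choose_mode(question)       # the model itself decides, no external router
    budget = 64 if mode == SHORT else 4096   # token budget depends on the chosen mode
    return {"mode": mode, "max_new_tokens": budget}

print(generate(TinyPolicy(), "What is 12 * 7?"))
```

The key contrast with the external-router baseline is that the decision lives inside the same model that produces the answer, so it can reflect that model's own strengths.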

The methodology involves two stages: warm-up distillation and reinforcement learning. In the distillation phase, Thinkless is trained on outputs from two expert models, one specializing in short responses and the other in detailed reasoning. This step helps the model establish a firm link between the control token and the desired reasoning format. The reinforcement learning stage then refines the model's ability to decide which reasoning mode to use. DeGRPO decomposes learning into two distinct objectives: one for training the control token and another for refining the response tokens. This avoids the gradient imbalance of earlier methods, where longer responses would dominate the learning signal, leading to a collapse in reasoning diversity. Thinkless ensures that both control tokens receive balanced updates, promoting stable learning across response types.
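The decoupling idea can be sketched as follows: the policy-gradient term for the single control token is normalized separately from the term over the many response tokens, so a long answer cannot drown out the mode decision. The function signature, the averaging, and the `alpha` weight are illustrative assumptions, not the paper's exact formulation.

```python
def degrpo_loss(ctrl_logprob, resp_logprobs, advantage, alpha=1.0):
    """Toy decoupled objective.
    ctrl_logprob: log-prob of the chosen control token (one scalar).
    resp_logprobs: log-probs of the answer tokens (variable length).
    advantage: group-relative advantage of this rollout."""
    mode_loss = -advantage * ctrl_logprob  # the single mode token gets its own term
    # Length-normalized response term: long answers no longer dominate the gradient.
    resp_loss = -advantage * sum(resp_logprobs) / max(len(resp_logprobs), 1)
    return alpha * mode_loss + resp_loss

# A short and a long rollout now contribute comparably scaled updates.
print(degrpo_loss(-0.5, [-0.1, -0.2, -0.3], advantage=1.0))
```

Without the separate normalization, summing all token losses together would let a 4,000-token chain of thought contribute thousands of gradient terms against the mode token's one, which is the imbalance the article describes.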

When evaluated, Thinkless considerably reduced long-form reasoning while preserving high accuracy. On the Minerva Algebra benchmark, the model used the long-reasoning control token in only 25.88% of cases while reaching 94.59% accuracy. By contrast, conventional reasoning models had to invoke extended chains of thought far more frequently. On the AIME 2025 dataset, Thinkless reached an accuracy of 27.33% with 100% use of the reasoning mode, showing that it maintains performance when full reasoning is necessary. On the GSM8K dataset, it used the long-reasoning mode only 13.31% of the time yet still reached 84.18% accuracy. These results reflect the model's ability to handle both simple and complex queries with appropriate reasoning depth, cutting unnecessary token generation by up to 90% on some tasks.
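A back-of-envelope check of what these usage rates imply for token cost: the 13.31% reasoning rate comes from the article's GSM8K figure, while the average lengths of short and long responses are hypothetical assumptions chosen only to illustrate the arithmetic.

```python
def expected_tokens(think_rate, long_len, short_len):
    """Expected tokens per query, mixing long and short responses by usage rate."""
    return think_rate * long_len + (1 - think_rate) * short_len

# Assumed average lengths: 1000 tokens for a full chain of thought, 100 for a
# direct answer (illustrative numbers, not from the paper).
baseline = expected_tokens(1.0, 1000, 100)     # always reason at length
thinkless = expected_tokens(0.1331, 1000, 100) # Thinkless on GSM8K
print(1 - thinkless / baseline)                # fraction of tokens saved
```

Under these assumed lengths the savings come out near 78%; with a larger gap between short and long responses the same usage rate approaches the up-to-90% reduction the article reports.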

Overall, this study from the National University of Singapore presents a convincing solution to the inefficiency of uniform reasoning in large language models. By introducing a mechanism that lets models judge task complexity and adjust their inference strategy accordingly, Thinkless optimizes both accuracy and efficiency. The method balances reasoning depth against response accuracy without relying on fixed rules, offering a data-driven path toward smarter language model inference.


Check out the Paper and GitHub page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter, and don't forget to join our 95K+ ML SubReddit and subscribe to our newsletter.


Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in materials science, he explores new advancements and creates opportunities to contribute.
