OThink-R1: A Dual-Mode Reasoning Framework to Cut Redundant Computation in LLMs

by Brenden Burgess


The Inefficiency of Static Chain-of-Thought Reasoning in LRMs

Recent large reasoning models (LRMs) achieve stronger performance by using detailed chain-of-thought (CoT) reasoning to solve complex tasks. However, many of the simple tasks they handle could be solved by smaller models with far fewer tokens, making the elaborate reasoning unnecessary. This echoes human cognition, where we rely on fast, intuitive answers for easy problems and slower, analytical thinking for complex ones. While LRMs imitate slow, logical reasoning, they generate much longer outputs, which raises computational cost. Current methods for reducing reasoning steps lack flexibility, limiting models to a single fixed reasoning style. There is a growing need for adaptive reasoning that adjusts effort to task difficulty.

Limits of Existing Training-Based and Training-Free Approaches

Recent research on improving reasoning efficiency in LRMs can be grouped into two main areas: training-based and training-free methods. Training strategies often use reinforcement learning or fine-tuning to limit token usage or adjust reasoning depth, but they tend to follow fixed patterns without flexibility. Training-free approaches use prompt engineering or pattern detection to shorten outputs at inference time; however, they also lack adaptability. More recent work focuses on variable-length reasoning, where models adjust reasoning depth according to task complexity. Other studies examine "overthinking", where models reason far beyond what a problem requires. However, few methods enable dynamic switching between fast and in-depth reasoning, which this paper addresses directly.

Introducing OThink-R1: A Dynamic Fast/Slow Reasoning Framework

Researchers from Zhejiang University and OPPO have developed OThink-R1, a new approach that allows LRMs to switch intelligently between fast and slow thinking, much as humans do. By analyzing reasoning trajectories, they identified which steps are essential and which are redundant. With the help of another model acting as a judge, they trained LRMs to adapt their reasoning style to the complexity of each task. Their method reduces unnecessary reasoning by more than 23% without losing accuracy. Using a dedicated loss function and curated fine-tuning datasets, OThink-R1 surpasses previous models in both efficiency and accuracy across a range of mathematical and question-answering tasks.
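The judge-based pruning idea can be sketched in miniature. In the paper an LLM serves as the judge; here a toy keyword heuristic (`judge_is_redundant`, a hypothetical stand-in, not the authors' implementation) plays that role to show the shape of the pipeline:

```python
def judge_is_redundant(step: str) -> bool:
    """Stand-in for the LLM judge: flags self-verification phrases
    as redundant. (Toy heuristic, not the paper's actual judge.)"""
    redundant_markers = ("let me double-check", "wait,", "re-verify")
    return any(marker in step.lower() for marker in redundant_markers)

def prune_trajectory(steps):
    """Keep only the steps the judge deems essential."""
    return [s for s in steps if not judge_is_redundant(s)]

# A reasoning trace with one redundant re-verification step.
trace = [
    "Compute 12 * 8 = 96.",
    "Wait, let me double-check: 12 * 8 is indeed 96.",
    "Therefore the answer is 96.",
]
print(prune_trajectory(trace))
```

Traces pruned this way would then form the training data for fine-tuning, so the model learns to emit the shorter trajectory directly.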

System Architecture: Reasoning Pruning and Dual-Reference Optimization

The OThink-R1 framework helps LRMs switch dynamically between fast and slow thinking. First, it identifies when an LRM includes unnecessary reasoning, such as over-exploration or double-checking, versus when detailed steps are truly essential. Based on this, it builds a curated training dataset by pruning redundant reasoning while retaining the valuable logic. Then, during fine-tuning, a special loss function balances the two reasoning styles: this dual-reference loss compares the model's outputs against both fast-thinking and slow-thinking variants, encouraging flexibility. As a result, OThink-R1 can choose the most efficient reasoning path for each problem while preserving accuracy and logical depth.
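A minimal sketch of what a dual-reference objective looks like, assuming token-level probability distributions; the weights `alpha` and `beta` and the exact combination are illustrative assumptions, not the paper's published formula:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def dual_reference_loss(model_probs, fast_ref_probs, slow_ref_probs,
                        nll_loss, alpha=0.5, beta=0.1):
    """Sketch of a dual-reference objective: the standard fine-tuning
    loss plus KL terms pulling the model toward both a fast-thinking
    and a slow-thinking reference distribution. Hyperparameters are
    hypothetical."""
    kl_fast = kl_divergence(model_probs, fast_ref_probs)
    kl_slow = kl_divergence(model_probs, slow_ref_probs)
    return nll_loss + beta * (alpha * kl_fast + (1 - alpha) * kl_slow)
```

When the model already matches both references, the KL terms vanish and the loss reduces to the plain fine-tuning loss; otherwise the two KL penalties keep it anchored to both reasoning styles rather than collapsing onto one.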

Empirical Evaluation and Comparative Performance

The OThink-R1 model was tested on simpler QA and math tasks to assess its ability to switch between fast and slow reasoning. On datasets such as OpenBookQA, CommonsenseQA, ASDiv, and GSM8K, the model showed strong performance, generating fewer tokens while maintaining or improving accuracy. Compared to baselines such as NoThinking and DualFormer, OThink-R1 demonstrated a better balance between efficiency and effectiveness. Ablation studies confirmed the importance of the pruning step, the KL constraints, and the LLM judge for achieving optimal results. A case study showed that unnecessary reasoning can harm accuracy, underscoring OThink-R1's strength in adaptive reasoning.

Conclusion: Toward Scalable and Efficient Hybrid Reasoning Systems

In conclusion, OThink-R1 is a large reasoning model that adaptively switches between fast and slow thinking modes to improve both efficiency and performance. It addresses the problem of unnecessarily complex reasoning in large models by analyzing and classifying reasoning steps as essential or redundant. By pruning the redundant steps while maintaining logical accuracy, OThink-R1 reduces unnecessary computation. It also introduces a dual-reference KL-divergence loss to reinforce hybrid reasoning. Tested on math and QA tasks, it cuts reasoning redundancy by 23% without sacrificing accuracy, a promising step toward more adaptive, scalable, and efficient AI reasoning systems.


Check out the Paper and GitHub page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter, join our 100k+ ML SubReddit, and subscribe to our newsletter.


Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
