Existing long chain-of-thought (CoT) reasoning models have achieved advanced performance in mathematical reasoning by generating reasoning trajectories with iterative self-verification and refinement. However, long-CoT models rely solely on natural-language reasoning traces, which makes them computationally expensive and error-prone without verification mechanisms. Tool-aided reasoning, delivered through frameworks such as OpenHands that integrate code interpreters, offers greater efficiency and reliability for large-scale numerical computations. However, these agentic approaches struggle with abstract or conceptually complex reasoning problems.
DualDistill framework and Agentic-R1 model
Researchers from Carnegie Mellon University have proposed DualDistill, a distillation framework that combines trajectories from two complementary teachers to create a unified student model. The framework uses a reasoning-oriented teacher and a tool-augmented teacher to develop Agentic-R1, a model that learns to dynamically select the most appropriate strategy for each type of problem. Agentic-R1 executes code for arithmetic and algorithmic tasks and uses natural-language reasoning for abstract problems. DualDistill uses trajectory composition to distill knowledge from both complementary teachers, followed by self-distillation. The researchers used OpenHands as the agentic reasoning teacher and DeepSeek-R1 as the text-based reasoning teacher.
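The idea of composing one training trajectory from two teachers can be pictured with a small sketch. The rules below are assumptions chosen for illustration, not DualDistill's published procedure. When both teachers solve a problem, the sketch keeps the shorter trajectory as a crude efficiency proxy. When only one teacher succeeds, it appends the successful trajectory to the failed attempt after a hypothetical `SWITCH_MARKER`. This gives the student an example of abandoning one strategy for the other. When both teachers fail, the problem is dropped. The names `Trajectory`, `compose`, and `SWITCH_MARKER` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical transition text; the real wording used by DualDistill is not specified here.
SWITCH_MARKER = "\n[Switching strategy]\n"


@dataclass
class Trajectory:
    text: str      # full teacher output for one problem
    correct: bool  # whether the teacher's final answer matched the reference


def compose(agentic: Trajectory, reasoning: Trajectory) -> Optional[str]:
    """Build one training trajectory from two teacher attempts on the same problem.

    Illustrative rules (assumptions, not the paper's exact method):
    - both correct: keep the shorter trajectory (ties favor the agentic one);
    - exactly one correct: failed attempt, then switch marker, then successful attempt;
    - neither correct: return None so the problem is excluded from training data.
    """
    if agentic.correct and reasoning.correct:
        return min(agentic.text, reasoning.text, key=len)
    if agentic.correct != reasoning.correct:
        if agentic.correct:
            failed, succeeded = reasoning, agentic
        else:
            failed, succeeded = agentic, reasoning
        return failed.text + SWITCH_MARKER + succeeded.text
    return None
```

Composing a failed attempt with a successful one, rather than keeping only the successful trajectory, is one way to expose a student to explicit strategy switches. Whether DualDistill uses exactly this ordering should be checked against the paper.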


Evaluation and benchmarks
The proposed method is evaluated on several benchmarks, such as DeepMath-L and Combinatorics300, to test various aspects of mathematical reasoning. It is compared against the baselines DeepSeek-R1-Distill and Qwen-2.5-Instruct. The student model, Agentic-R1, shows substantial performance improvements by benefiting from both agentic and reasoning strategies. It outperforms two similarly sized models, each specializing in either tool-assisted reasoning (Qwen2.5-7B-Instruct) or pure reasoning (DeepSeek-R1-Distill-7B). Agentic-R1 outperforms tool-based models by using reasoning strategies when necessary, while remaining more efficient than pure reasoning models on standard mathematical tasks.
Qualitative analysis and tool usage patterns
Qualitative examples show that Agentic-R1 exhibits intelligent tool-usage patterns. It activates code execution tools on 79.2% of the computationally intensive Combinatorics300 problems, while reducing activation to 52.0% on the simpler AMC dataset problems. Agentic-R1 learns to invoke tools appropriately through supervised fine-tuning alone, without explicit instruction, effectively balancing computational efficiency and reasoning accuracy.
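Tool-activation rates such as the 79.2% and 52.0% figures reported for Agentic-R1 are per-dataset fractions of solutions that invoked code execution. The following is a minimal sketch of that bookkeeping. It assumes each solved problem has already been tagged with a dataset name and a boolean indicating whether the model executed code. The function name `tool_activation_rates` and the record format are hypothetical, and the example data is a toy stand-in, not the paper's logs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tool_activation_rates(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return, per dataset, the percentage of problems on which a tool was invoked.

    Each record is (dataset_name, used_tool), one record per solved problem.
    """
    totals: Dict[str, int] = defaultdict(int)
    with_tool: Dict[str, int] = defaultdict(int)
    for dataset, used_tool in records:
        totals[dataset] += 1
        if used_tool:
            with_tool[dataset] += 1
    return {name: 100.0 * with_tool[name] / totals[name] for name in totals}


# Toy records: 3 of 4 "toy_hard" problems and 1 of 2 "toy_easy" problems used code.
toy = [
    ("toy_hard", True), ("toy_hard", True), ("toy_hard", True), ("toy_hard", False),
    ("toy_easy", True), ("toy_easy", False),
]
rates = tool_activation_rates(toy)
```

With these toy records, `rates` maps `toy_hard` to 75.0 and `toy_easy` to 50.0. This mirrors the pattern in the article, where harder, computation-heavy problems trigger tool use more often.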
Robustness to imperfect teachers
The framework remains effective even when guided by imperfect teachers. For example, the agentic teacher achieves only 48.4% accuracy on Combinatorics300, yet the student model improves from 44.7% to 50.9%, ultimately outperforming the teacher.
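Using only the three accuracies reported for Combinatorics300, the student's gain and its margin over the agentic teacher work out as follows (in percentage points, pp):

```latex
\Delta_{\text{student}} = 50.9\% - 44.7\% = 6.2\ \text{pp}
  \quad \left(\tfrac{6.2}{44.7} \approx 13.9\%\ \text{relative}\right),
\qquad
\Delta_{\text{vs. teacher}} = 50.9\% - 48.4\% = 2.5\ \text{pp}
```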
Conclusion
In summary, the DualDistill framework effectively combines the strengths of natural-language reasoning and tool-assisted problem solving by distilling the complementary knowledge of two specialized teacher models into a single versatile student model, Agentic-R1. Through trajectory composition and self-distillation, Agentic-R1 learns to dynamically select the most appropriate strategy for each problem, balancing accuracy and computational efficiency. Evaluations across diverse mathematical reasoning benchmarks show that Agentic-R1 outperforms both pure reasoning models and tool-based models, even when learning from imperfect teachers. This work highlights a promising approach to building adaptable AI agents that integrate heterogeneous problem-solving strategies for more robust and efficient reasoning.
Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
Sajjad Ansari is a final-year undergraduate at IIT Kharagpur. As a tech enthusiast, he delves into the practical applications of AI, with a focus on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.