OpenThoughts: A Scalable SFT Data Curation Pipeline for Reasoning Models

by Brenden Burgess

The growing complexity of curating reasoning data

Recent reasoning models such as DeepSeek-R1 and o3 have shown exceptional performance on mathematical, coding, and scientific tasks, using post-training techniques such as supervised fine-tuning (SFT) and reinforcement learning (RL). However, the complete methodologies behind these frontier reasoning models are not public, which makes research on building reasoning models difficult. Although SFT data curation has become a powerful approach for developing strong reasoning capabilities, most existing efforts explore only a limited set of design choices, such as relying solely on human-written questions or a single teacher model. Moreover, exploring the broader design space of techniques for generating question-answer pairs incurs high costs for teacher inference and model training.

Reasoning traces produced by models such as Gemini, QwQ, and DeepSeek-R1 have enabled knowledge distillation techniques to train smaller reasoning models. Projects like OpenR1, OpenMathReasoning, and OpenCodeReasoning collect questions from public forums and competition sites, while NaturalReasoning uses pre-training corpora as seed data. Some efforts, such as s1 and LIMO, focus on manually curating small, high-quality datasets of difficult prompts. Other methods, such as DeepMath-103K and NVIDIA Nemotron, introduce innovations across the sourcing, filtering, and scaling stages. RL methods, including AceReason and Skywork-OR1, improve reasoning capabilities beyond what traditional SFT methods achieve.

OpenThoughts: a scalable framework for SFT dataset development

Researchers from Stanford University, the University of Washington, BespokeLabs.ai, the Toyota Research Institute, UC Berkeley, and 12 additional organizations have proposed OpenThoughts, a new recipe for open, state-of-the-art reasoning datasets. OpenThoughts takes a progressive approach across three iterations: OpenThoughts-114K scales the Sky-T1 pipeline with automated verification; OpenThoughts2-1M improves data scale through greater question diversity and synthetic generation strategies; and OpenThoughts3-1.2M incorporates the results of more than 1,000 experiments into a simple, scalable, high-performing data pipeline. In addition, the OpenThinker3-7B model achieves state-of-the-art performance among open-data models at the 7B scale.
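To make the staged design concrete, the sketch below shows, in Python, how such a curation pipeline could be wired together: question sourcing, source mixing and filtering, teacher answer generation, and answer verification. All function names, sources, and signatures here are hypothetical placeholders, not the project's actual code.

```python
# Hypothetical sketch of a staged SFT data-curation pipeline in the spirit of
# OpenThoughts. Every helper and source name is an illustrative placeholder.
from dataclasses import dataclass, field

@dataclass
class Example:
    question: str
    answer: str = ""
    source: str = ""
    meta: dict = field(default_factory=dict)

def source_questions(sources):
    """Collect raw questions from forums, competition sites, or an LLM generator."""
    return [Example(question=q, source=name)
            for name, loader in sources.items() for q in loader()]

def mix_and_filter_questions(examples, keep_sources, max_per_source):
    """Keep only the chosen sources and cap the number of questions per source."""
    kept, counts = [], {}
    for ex in examples:
        if ex.source in keep_sources and counts.get(ex.source, 0) < max_per_source:
            counts[ex.source] = counts.get(ex.source, 0) + 1
            kept.append(ex)
    return kept

def generate_answers(examples, teacher):
    """Ask a teacher model to produce reasoning traces (placeholder inference call)."""
    for ex in examples:
        ex.answer = teacher(ex.question)
    return examples

def filter_answers(examples, verifier):
    """Drop answers that fail automated verification (e.g., answer checks, unit tests)."""
    return [ex for ex in examples if verifier(ex)]

def build_sft_dataset(sources, keep_sources, max_per_source, teacher, verifier):
    qs = source_questions(sources)
    qs = mix_and_filter_questions(qs, keep_sources, max_per_source)
    qs = generate_answers(qs, teacher)
    return filter_answers(qs, verifier)
```

Each stage corresponds to a design axis the project studies: where questions come from, how sources are mixed and filtered, which teacher generates the traces, and how answers are verified.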

OpenThoughts3-1.2M is built by ablating each pipeline component independently while holding the other stages constant, generating 31,600 data points per strategy and fine-tuning Qwen2.5-7B-Instruct on each resulting dataset. The goal throughout is to identify the question-answer dataset that yields the best SFT reasoning performance. Evaluation covers eight reasoning benchmarks spanning mathematics (AIME24, AMC23, MATH500), coding (CodeElo, CodeForces, LiveCodeBench), and science (GPQA Diamond, JEEBench). The experimental design includes a rigorous decontamination process to remove samples highly similar to the benchmarks and maintains a held-out benchmark set for generalization testing, with a unified evaluation framework ensuring consistent evaluation protocols.
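A minimal sketch of this ablation protocol, assuming hypothetical build_dataset, finetune, and evaluate helpers (the project's real tooling is not shown here), could look like the following: each candidate strategy yields a fixed 31,600-example dataset, Qwen2.5-7B-Instruct is fine-tuned on it, and the resulting checkpoint is scored on the eight benchmarks.

```python
# Hypothetical sketch of the per-strategy ablation loop. build_dataset,
# finetune, and evaluate stand in for real data-generation, training, and
# benchmark-harness code.
BENCHMARKS = [
    "AIME24", "AMC23", "MATH500",              # math
    "CodeElo", "CodeForces", "LiveCodeBench",  # code
    "GPQA-Diamond", "JEEBench",                # science
]
SAMPLES_PER_STRATEGY = 31_600
BASE_MODEL = "Qwen/Qwen2.5-7B-Instruct"

def ablate(strategies, build_dataset, finetune, evaluate):
    """Vary one pipeline component at a time and compare downstream accuracy."""
    results = {}
    for name, config in strategies.items():
        dataset = build_dataset(config, n_samples=SAMPLES_PER_STRATEGY)
        model = finetune(BASE_MODEL, dataset)  # SFT on this variant only
        scores = {bench: evaluate(model, bench) for bench in BENCHMARKS}
        scores["average"] = sum(scores.values()) / len(BENCHMARKS)
        results[name] = scores
    # The best-scoring option for each component feeds the final pipeline.
    return results
```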

Evaluation insights and benchmark performance

Evaluation of the OpenThoughts pipeline reveals key insights about question sourcing, question mixing, question and answer filtering, and the choice of teacher model. Question-sourcing experiments show that code golf and competition coding questions achieve the highest performance on code tasks (25.3-27.5 average scores), LLM-generated and human-written questions excel in mathematics (58.5-58.8 scores), and StackExchange physics questions lead in science (43.2-45.3 scores). Question-mixing experiments show that combining many question sources degrades performance, with the optimal selection yielding roughly 5% accuracy gains over broader mixing strategies. For the teacher model, QwQ-32B outperforms DeepSeek-R1 in knowledge distillation, improving accuracy by 1.9-2.6%.
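As a purely illustrative sketch of how such per-domain results could drive source selection, the snippet below ranks question sources by their reported benchmark averages; the source labels and score assignments only approximate the figures quoted above, and are not the project's actual tables.

```python
# Illustrative only: the numbers are the score ranges quoted in this article,
# and the source labels are approximate descriptions, not dataset identifiers.
per_domain_scores = {
    "code":    {"code_golf": 25.3, "competition_coding": 27.5},
    "math":    {"llm_generated": 58.8, "human_written": 58.5},
    "science": {"stackexchange_physics": 45.3},
}

def top_sources(scores_by_domain, k=2):
    """Pick the k highest-scoring question sources for each domain."""
    return {
        domain: sorted(scores, key=scores.get, reverse=True)[:k]
        for domain, scores in scores_by_domain.items()
    }

print(top_sources(per_domain_scores))
# {'code': ['competition_coding', 'code_golf'],
#  'math': ['llm_generated', 'human_written'],
#  'science': ['stackexchange_physics']}
```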

In conclusion, the researchers present the OpenThoughts project, showing that systematic experimentation can significantly advance SFT data curation for reasoning models. They developed OpenThoughts3-1.2M, a state-of-the-art open dataset spanning science, mathematics, and coding. The resulting OpenThinker3-7B model achieves top performance among reasoning models trained on open data at its scale. However, several directions remain unexplored, including RL approaches, alternative fine-tuning regimes, and curriculum learning strategies. Future research directions include studying cross-domain transfer effects when optimizing individual domains versus overall performance, and understanding scaling dynamics as student models approach their teachers' capabilities.


Check out the Paper, Project Page, and GitHub Page. All credit for this research goes to the researchers of this project.


Sajjad Ansari is a final-year undergraduate at IIT Kharagpur. As a technology enthusiast, he explores practical applications of AI with a focus on understanding AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible way.
