Despite notable progress in large language models (LLMs), effective performance on reasoning-intensive tasks, such as mathematical problem solving, algorithmic planning, or coding, remains limited by model size, training methodology, and inference-time capacity. Models that perform well on general NLP benchmarks often lack the ability to construct multi-step reasoning chains or reflect on intermediate problem-solving states. Moreover, while scaling model size can improve reasoning capacity, it introduces prohibitive compute and deployment costs, particularly for applied use in education, engineering, and decision-support systems.
Microsoft releases the Phi-4 reasoning model family
Microsoft recently introduced the Phi-4 reasoning family, consisting of three models: Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning. These models are derived from the Phi-4 base (14B parameters) and are specifically trained to handle complex reasoning tasks in mathematics, scientific domains, and software problem solving. Each variant addresses a different trade-off between computational efficiency and output accuracy. Phi-4-reasoning is optimized via supervised fine-tuning, while Phi-4-reasoning-plus extends this with outcome-based reinforcement learning, specifically targeting improved performance in high-variance tasks such as competition-level mathematics.
The open-weight models were released with transparent training details and evaluation logs, including benchmark design, and are hosted on Hugging Face for reproducibility and public access.
Technical composition and methodological advances
The Phi-4 reasoning models build on the Phi-4 architecture with targeted improvements to model behavior and the training regime. Key methodological decisions include:
- Structured supervised fine-tuning (SFT): Over 1.4M prompts were curated with an emphasis on "boundary" cases, i.e., problems at the edge of Phi-4's baseline capabilities. Prompts were sourced and filtered to emphasize multi-step reasoning rather than factual recall, and responses were synthetically generated using o3-mini in high reasoning-effort mode.
- Chain-of-thought format: To facilitate structured reasoning, the models were trained to generate output using <think> tags, encouraging separation between reasoning traces and final answers.
- Extended context handling: The RoPE base frequency was modified to support a 32K-token context window, allowing deeper solution traces, particularly relevant in multi-turn or long-format questions.
- Reinforcement learning (Phi-4-reasoning-plus): Using Group Relative Policy Optimization (GRPO), Phi-4-reasoning-plus was further refined on a small curated set of ~6,400 math-focused problems. A reward function was crafted to favor correct, concise, and well-structured outputs, while penalizing verbosity, repetition, and format violations.
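The GRPO step described above can be illustrated with a minimal sketch: a group of sampled completions for one prompt is scored by a reward function, and each completion's advantage is computed relative to its own group's statistics rather than a learned value model. The `toy_reward` shaping below (correctness minus a verbosity penalty) is a hypothetical stand-in for Microsoft's actual reward function, which is not fully specified here.

```python
from statistics import mean, pstdev

def grpo_advantages(rewards):
    """Group-relative advantages: each sampled completion is scored
    against its group's mean and std, so no separate value network
    is needed (the core idea behind GRPO)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

def toy_reward(answer, reference, length, max_length=200):
    """Hypothetical reward shaping in the spirit described above:
    reward correctness, penalize overly long (verbose) outputs."""
    correct = 1.0 if answer == reference else -1.0
    length_penalty = 0.5 * max(0.0, (length - max_length) / max_length)
    return correct - length_penalty

# Example: a group of 4 sampled completions for one math prompt
rewards = [
    toy_reward("42", "42", 120),   # correct, concise
    toy_reward("42", "42", 400),   # correct but verbose -> lower reward
    toy_reward("41", "42", 90),    # wrong answer
    toy_reward("42", "42", 150),
]
advs = grpo_advantages(rewards)
print(advs)
```

Standardizing within the group means the advantages always sum to zero: the policy is pushed toward the better completions of each group and away from the worse ones, regardless of the absolute reward scale.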
This data-centric, format-aware training regime supports better inference-time usage and generalization across domains, including previously unseen symbolic reasoning problems.
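The chain-of-thought output format mentioned above can be consumed programmatically by splitting the reasoning trace from the final answer. A minimal sketch, assuming the trace is delimited by `<think>...</think>` tags as described:

```python
import re

def split_reasoning(output: str):
    """Separate a <think>...</think> reasoning trace from the final
    answer, mirroring the output format the models are trained to emit."""
    match = re.search(r"<think>(.*?)</think>\s*(.*)", output, re.DOTALL)
    if match is None:
        # No trace found; treat the whole output as the answer.
        return "", output.strip()
    return match.group(1).strip(), match.group(2).strip()

sample = "<think>17 is prime, so 17 * 3 = 51.</think> The answer is 51."
trace, answer = split_reasoning(sample)
print(answer)  # The answer is 51.
```

Keeping the trace and the answer in separate fields makes it straightforward to log or hide intermediate reasoning while surfacing only the final response to end users.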

Benchmark evaluation and performance
Across a wide range of reasoning benchmarks, Phi-4-reasoning and Phi-4-reasoning-plus deliver competitive results compared to much larger open-weight models:
Phi-4-reasoning-plus shows strong performance not only on domain-specific evaluations but also generalizes well to planning and combinatorial problems such as TSP and 3SAT, despite no explicit training in these areas. Performance gains were also observed in instruction following (IFEval) and long-context QA (FlenQA), suggesting that the chain-of-thought formulation improves broader model utility.
Importantly, Microsoft reports full variance distributions across 50+ generation runs on sensitive datasets such as AIME 2025, revealing that Phi-4-reasoning and Phi-4-reasoning-plus exceed the performance consistency of models like o3-mini, while remaining disjoint from the distributions of smaller baselines like DeepSeek-R1-Distill.
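The variance-reporting practice above can be sketched in a few lines: instead of quoting a single accuracy number, one summarizes the full distribution of scores over repeated generation runs. The per-run accuracies below are made-up illustrative values, not Microsoft's reported results.

```python
import random
from statistics import mean, stdev

def run_statistics(run_scores):
    """Summarize accuracy across repeated generation runs. Reporting the
    full spread (not just one number) is what makes results on small,
    high-variance benchmarks like AIME 2025 meaningfully comparable."""
    return {
        "mean": mean(run_scores),
        "std": stdev(run_scores),
        "min": min(run_scores),
        "max": max(run_scores),
    }

# Illustrative (synthetic) per-run accuracies over 50 runs for two models
random.seed(0)
model_a = [random.gauss(0.78, 0.04) for _ in range(50)]
model_b = [random.gauss(0.55, 0.06) for _ in range(50)]
stats_a, stats_b = run_statistics(model_a), run_statistics(model_b)
print(stats_a["mean"] > stats_b["mean"])  # True: distributions clearly separated
```

If the worst run of one model still beats the best run of another, their distributions are disjoint, which is the strong form of separation the report highlights.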

Conclusion and implications
The Phi-4 reasoning models represent a methodologically rigorous effort to advance small-model capabilities in structured reasoning. By combining data-centric training, architectural tuning, and minimal but well-targeted reinforcement learning, Microsoft demonstrates that 14B-scale models can match or surpass much larger systems in tasks requiring multi-step inference and generalization.
The models' open-weight availability and transparent benchmarking set a precedent for future development of small LLMs, particularly in applied domains where interpretability, cost, and reliability are paramount. Future work is expected to extend reasoning capabilities into additional STEM fields, improve decoding strategies, and explore scalable reinforcement learning over longer horizons.
Check out the Paper, Hugging Face page, and Microsoft blog.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent venture is the launch of an artificial intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a broad audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.
