Formal mathematical reasoning has become a specialized subfield of artificial intelligence that demands strict logical consistency. Unlike informal problem solving, which gives intuition and heuristic leaps free rein, formal theorem proving requires every step to be fully described, precise, and verifiable by computational systems. Proof assistants such as Lean, Coq, and Isabelle serve as the structural frameworks within which these formal proofs are constructed. Their operation demands logical soundness, with no room for omissions, approximations, or unstated assumptions. This makes the task particularly demanding for AI systems, especially large language models, which excel at producing coherent natural-language responses but generally lack the rigor to produce verifiable formal proofs. Yet the desire to combine these strengths, AI's fluency in informal reasoning with the structure of formal verification, has led to new innovations at the interface of language modeling and formal logic automation.
A major problem stems from the inability of current language models to bridge the conceptual gap between informal and formal reasoning. Large language models are generally proficient at generating human-readable explanations and solving mathematical problems posed in natural language. However, this reasoning is inherently informal and often lacks the structural precision required by formal logical systems. While humans can intuitively jump from one deductive step to the next, proof assistants require a fully specified, unambiguous sequence of steps. The challenge, then, is to guide AI models to produce logically sound formal outputs from their otherwise informal and intuitive internal reasoning. The problem becomes even harder when handling advanced theorems from fields such as number theory or geometry, where precision is crucial.
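The gap can be made concrete with a small Lean 4 example (illustrative only, not drawn from the paper). Informally, "the sum of two even numbers is even" is a single intuitive step; formally, every move must be spelled out before the checker accepts it:

```lean
-- Informal claim: "the sum of two even numbers is even."
-- Formally, each deductive step must be made explicit:
theorem even_add_even (a b : Nat)
    (ha : ∃ k, a = 2 * k) (hb : ∃ m, b = 2 * m) :
    ∃ n, a + b = 2 * n :=
  match ha, hb with
  | ⟨k, hk⟩, ⟨m, hm⟩ =>
    -- The witness n = k + m and the algebra must both be supplied.
    ⟨k + m, by rw [hk, hm, Nat.mul_add]⟩
```

Nothing here can be left to intuition: the existential witnesses, the rewrites, and the final equality all have to be stated explicitly, which is exactly the precision language models tend to omit.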
Recent efforts have attempted to solve this problem by first guiding models to generate natural-language proof sketches, which are then translated, manually or semi-automatically, into formal proofs. One well-known strategy decomposes a complex theorem into smaller subgoals. Each subgoal represents a lemma that can be tackled independently and later combined to form a complete proof. Frameworks such as "Draft, Sketch, and Prove" have applied this idea, using language models to generate proof outlines that are then translated into formal language. Another method uses hierarchical reinforcement learning, breaking complex mathematical problems down into simpler layers. However, these models often struggle to produce fully verifiable outputs in Lean or Coq environments. In addition, the training data for such models is generally limited, and failed proof attempts provide no useful learning signal.
A team of DeepSeek-AI researchers has introduced a new model, DeepSeek-Prover-V2, designed to generate formal mathematical proofs by leveraging subgoal decomposition and reinforcement learning. The core of their approach uses DeepSeek-V3 to decompose a complex theorem into manageable subgoals, each translated into a `have` statement in Lean 4 with a `sorry` placeholder indicating that the proof is incomplete. These subgoals are then passed to a 7B-parameter prover model that completes each proof step. Once all steps are resolved, they are synthesized into a complete Lean proof and paired with the original natural-language reasoning generated by DeepSeek-V3. This yields rich cold-start data for reinforcement learning. Importantly, the model's training is entirely bootstrapped from synthetic data, with no human-annotated proof steps.
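To illustrate the decomposition format described above (the theorem itself is a made-up example; the `have`/`sorry` pattern follows the article), a planner might emit stubs like these for the 7B prover to fill in:

```lean
-- Hypothetical example of the subgoal format: the planner sketches the
-- argument as `have` statements, each stubbed with `sorry`; the prover
-- model later replaces each `sorry` with a complete tactic proof.
theorem six_dvd_imp_two_dvd (n : Nat) (h : 6 ∣ n) : 2 ∣ n := by
  -- Subgoal emitted by the planner, left incomplete:
  have h1 : 2 ∣ 6 := sorry
  -- Final assembly once the subgoal is proved: divisibility is transitive.
  exact Nat.dvd_trans h1 h
```

Each filled-in subgoal becomes a self-contained lemma, so the prover never has to reason about the whole theorem at once.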
The cold-start pipeline begins by prompting DeepSeek-V3 to create natural-language proof sketches. These sketches are transformed into formal theorem statements with unresolved parts. A key innovation lies in recursively resolving each subgoal with the 7B prover, reducing computational cost while maintaining formal rigor. The researchers built a curriculum-learning framework that increases the complexity of training tasks over time. They also implemented two types of subgoal theorems, one incorporating the preceding subgoals as premises and one treating them independently. This dual structure was integrated into the model's expert-iteration phase to train it on progressively harder problems. The model's capability was then reinforced by a consistency-based reward during training, ensuring that all decomposed lemmas were correctly incorporated into the final formal proof.
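The pipeline's plan-solve-assemble loop can be sketched in a few lines of Python. This is a structural sketch only: the function names (`sketch_subgoals`, `prove_subgoal`, `assemble_proof`) and the placeholder strings are illustrative stand-ins, not DeepSeek's actual interfaces.

```python
# Hypothetical sketch of the recursive subgoal pipeline described above.
# All names and placeholder outputs are illustrative, not DeepSeek's API.

def sketch_subgoals(theorem: str) -> list[str]:
    """Planner stage (DeepSeek-V3 in the article): split a theorem into
    `have`-style subgoal statements, each stubbed with `sorry`."""
    return [f"have step{i} : <subgoal {i} of {theorem}> := sorry" for i in (1, 2)]

def prove_subgoal(subgoal: str) -> str:
    """Prover stage (the 7B model in the article): replace each `sorry`
    stub with a completed tactic proof."""
    return subgoal.replace("sorry", "<completed tactic proof>")

def assemble_proof(theorem: str, solved: list[str]) -> str:
    """Stitch the solved subgoals back into one Lean proof script."""
    body = "\n  ".join(solved)
    return f"theorem {theorem} := by\n  {body}\n  <final assembly step>"

proof = assemble_proof("t1", [prove_subgoal(s) for s in sketch_subgoals("t1")])
```

The design point is that the expensive model only plans, while the small prover does the many per-subgoal calls, which is what keeps the computational cost down.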
On the MiniF2F-test benchmark, the model reached a success rate of 88.9% with high sampling (pass@8192), against 82.0% for Kimina-Prover and 64.7% for Goedel-Prover. It also solved 49 of the 658 PutnamBench problems, a benchmark of difficult mathematical tasks. On the newly introduced ProverBench dataset, comprising 325 formalized problems, the model solved 6 of 15 problems drawn from recent AIME (American Invitational Mathematics Examination) competitions from 2024 and 2025. These benchmarks highlight the model's ability to generalize across multiple formal tasks. Even compared to DeepSeek-V3, which uses natural-language reasoning, the new model demonstrates competitive performance, solving a comparable number of problems while guaranteeing formal verifiability.
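For readers unfamiliar with the metric: pass@8192 means a problem counts as solved if at least one of 8192 sampled proofs verifies. The standard unbiased pass@k estimator from the code-generation literature (Chen et al., 2021) is a common way to report such numbers; the article does not specify DeepSeek's exact computation, so treat this as background:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of
    k samples, drawn without replacement from n attempts of which c are
    correct, passes the verifier."""
    if n - c < k:
        return 1.0  # too few failures remain for all k draws to miss
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 1 correct attempt out of 4 gives pass@1 = 0.25, pass@4 = 1.0
```

Higher k trades compute for coverage, which is why pass@8192 scores are much higher than single-sample accuracy.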
Several key takeaways from the research on DeepSeek-Prover-V2:
- DeepSeek-Prover-V2 achieved an 88.9% success rate on the MiniF2F-test benchmark (pass@8192), the highest reported among formal reasoning models to date.
- The model solved 49 of the 658 problems in PutnamBench, a dataset of advanced mathematical challenges.
- It solved 6 of 15 problems from recent AIME 2024 and 2025 competitions, demonstrating real-world applicability.
- A new benchmark, ProverBench, comprising 325 formalized problems, was introduced to evaluate formal reasoning models.
- The pipeline unifies natural-language proof sketching and formal proof construction by combining DeepSeek-V3 with a 7B prover model.
- Two types of subgoal decompositions, one with and one without dependent premises, were used to train the model in a structured, curriculum-guided manner.
- Reinforcement learning with a consistency-based reward considerably improved proof accuracy by enforcing structural alignment between the sketch and the solution.
- The entire training strategy rests on cold-start data, eliminating dependence on manually labeled proofs.
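The consistency-based reward mentioned above can be pictured as a verifier signal plus a structural bonus. The sketch below is a hypothetical shaping, assuming a binary verifier outcome and a bonus proportional to how many planned `have` subgoals survive into the final proof; the paper's exact formulation may differ, and all names here are illustrative.

```python
# Hypothetical consistency-based reward: a proof earns nothing unless it
# verifies, and earns a bonus for retaining the planned subgoal structure.

def consistency_reward(proof: str, planned_subgoals: list[str], verified: bool) -> float:
    if not verified:
        return 0.0                       # unverifiable proofs get no reward
    base = 1.0                           # the verifier accepted the proof
    kept = sum(1 for s in planned_subgoals if s in proof)
    bonus = 0.5 * kept / max(len(planned_subgoals), 1)
    return base + bonus                  # reward sketch/solution alignment
```

Shaping of this kind discourages the prover from finding a verifying proof that ignores the planner's decomposition, which is the alignment property the article attributes to the reward.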
Check out the Paper and the GitHub page.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent venture is the launch of an artificial intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a broad audience. The platform boasts more than 2 million monthly views, illustrating its popularity among readers.
