ServiceNow AI Releases Apriel-Nemotron-15b-Thinker: A Compact Yet Powerful Reasoning Model Optimized for Enterprise-Scale Deployment and Efficiency

by Brenden Burgess


AI models today are expected to handle complex tasks such as mathematical problem solving, interpreting logical statements, and supporting business decision-making. Building such models requires integrating mathematical reasoning, scientific understanding, and advanced pattern recognition. As demand for intelligent agents in real-time applications, such as coding assistants and enterprise automation tools, continues to grow, there is a pressing need for models that combine strong performance with efficient memory and token usage, making them viable for deployment on practical hardware.

A central challenge in AI development is the resource intensity of large-scale reasoning models. Despite their strong capabilities, these models often require significant memory and compute, limiting their real-world applicability. This creates a gap between what advanced models can achieve and what users can realistically deploy. Even well-resourced enterprises may find models that demand dozens of gigabytes of memory or incur high inference costs unsustainable. The problem is not just building smarter models, but ensuring they are efficient and deployable on real-world platforms. High-performing models such as QwQ-32B, o1-mini, and EXAONE-Deep-32B excel at tasks involving mathematical reasoning and academic benchmarks. However, their dependence on high-end GPUs and their high token consumption limit their use in production settings. These models highlight the ongoing trade-off in AI deployment: achieving high accuracy at the cost of scalability and efficiency.

Addressing this gap, researchers at ServiceNow have introduced Apriel-Nemotron-15b-Thinker. The model has 15 billion parameters, a relatively modest size compared to its high-performing counterparts, yet it delivers performance on par with models nearly twice its size. Its main advantage lies in its memory footprint and token efficiency: while producing competitive results, it requires roughly half the memory of QwQ-32B and EXAONE-Deep-32B. This directly improves operational efficiency in enterprise environments, making it possible to integrate high-performance reasoning models into real applications without large-scale infrastructure upgrades.
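To get an intuition for why parameter count drives that memory gap, the rough sketch below estimates weight memory assuming bf16 weights (2 bytes per parameter). These figures are back-of-the-envelope approximations only; they ignore the KV cache, activations, and runtime overhead, and are not the exact numbers reported by ServiceNow.

```python
# Rough weight-memory estimate: bytes-per-parameter x parameter count.
# Approximation only: ignores KV cache, activations, and framework
# overhead, and assumes bf16 (2 bytes per parameter) weights.

BYTES_PER_PARAM_BF16 = 2

models = {
    "Apriel-Nemotron-15b-Thinker": 15e9,
    "QwQ-32B": 32e9,
    "EXAONE-Deep-32B": 32e9,
}

for name, params in models.items():
    gib = params * BYTES_PER_PARAM_BF16 / 1024**3
    print(f"{name}: ~{gib:.0f} GiB of weights")

# A 15B model needs roughly half the weight memory of a 32B model,
# consistent with the ~50% memory reduction reported for this model.
```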

The development of Apriel-Nemotron-15b-Thinker followed a structured three-stage training approach, with each stage designed to strengthen a specific aspect of the model's reasoning capabilities. In the initial phase, Continual Pre-training (CPT), the model was exposed to more than 100 billion tokens. These tokens were not generic text but carefully selected examples from domains requiring deep reasoning: mathematical logic, programming challenges, scientific literature, and logical deduction tasks. This exposure provided the foundational reasoning capabilities that distinguish the model from others. The second stage involved Supervised Fine-Tuning (SFT) using 200,000 high-quality demonstrations. These examples further calibrated the model's responses to reasoning challenges, improving performance on tasks that require precision and attention to detail. The final tuning stage, GRPO (Guided Reinforcement Preference Optimization), refined the model's outputs by optimizing alignment with the expected results on key tasks. This pipeline is designed to make the model intelligent, precise, well-structured, and scalable.
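For a concrete sense of what an SFT-then-GRPO pipeline can look like in code, here is a minimal sketch using Hugging Face TRL. Note that TRL's GRPOTrainer implements Group Relative Policy Optimization and is used here only as a stand-in for the GRPO stage described above; the checkpoint name, dataset IDs, reward function, and hyperparameters are illustrative placeholders, not ServiceNow's actual training recipe.

```python
# Minimal SFT -> GRPO sketch with Hugging Face TRL.
# All IDs, the reward function, and hyperparameters are placeholders;
# this is not ServiceNow's actual training configuration.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer, GRPOConfig, GRPOTrainer

base_model = "your-org/your-cpt-checkpoint"  # hypothetical CPT checkpoint

# Stage 2: supervised fine-tuning on curated reasoning demonstrations.
sft_data = load_dataset("your-org/reasoning-demos", split="train")  # placeholder
sft_trainer = SFTTrainer(
    model=base_model,
    train_dataset=sft_data,
    args=SFTConfig(output_dir="sft-checkpoint", max_steps=1000),
)
sft_trainer.train()

# Stage 3: preference optimization with a task-specific reward.
def reward_exact_match(completions, **kwargs):
    # Toy reward: 1.0 if the completion contains the reference answer.
    answers = kwargs.get("answer", [""] * len(completions))
    return [1.0 if a and a in c else 0.0 for c, a in zip(completions, answers)]

grpo_data = load_dataset("your-org/reasoning-prompts", split="train")  # placeholder
grpo_trainer = GRPOTrainer(
    model="sft-checkpoint",
    reward_funcs=reward_exact_match,
    train_dataset=grpo_data,
    args=GRPOConfig(output_dir="grpo-checkpoint", max_steps=500),
)
grpo_trainer.train()
```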

On enterprise-specific tasks such as MBPP, BFCL, Enterprise RAG, MT-Bench, MixEval, IFEval, and Multi-Challenge, the model delivered competitive or superior performance compared to larger models. In terms of production efficiency, it consumed 40% fewer tokens than QwQ-32B, considerably reducing inference costs. On the memory side, it achieves all of this with roughly 50% of the memory required by QwQ-32B and EXAONE-Deep-32B, indicating a substantial improvement in deployment feasibility. Even on academic benchmarks such as AIME-24, AIME-25, AMC-23, MATH-500, and GPQA, the model held its own, often matching or exceeding the performance of other, larger models while being considerably lighter in compute demand.

Several key takeaways from the research on Apriel-Nemotron-15b-Thinker:

  • Apriel-Nemotron-15b-Thinker has 15 billion parameters, significantly fewer than QwQ-32B or EXAONE-Deep-32B, yet performs competitively.
  • Uses a three-phase training pipeline: 100B+ tokens in CPT, 200K fine-tuning demonstrations in SFT, and a final GRPO refinement.
  • Consumes about 50% less memory than QwQ-32B, enabling easier deployment on enterprise hardware.
  • Uses 40% fewer tokens in production tasks than QwQ-32B, reducing inference cost and increasing speed.
  • Outperforms or matches larger models on MBPP, BFCL, Enterprise RAG, and academic tasks such as GPQA and MATH-500.
  • Optimized for agentic and enterprise tasks, suggesting utility in business automation, coding agents, and logical assistants.
  • Designed specifically for real-world use, avoiding over-reliance on lab-scale compute environments.

Check out the model on Hugging Face. Also, don't forget to follow us on Twitter.
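If you want to try the model locally, a minimal inference sketch with Hugging Face Transformers is shown below. The repo ID ServiceNow-AI/Apriel-Nemotron-15b-Thinker is assumed from the release; check the model card for the recommended prompt format, sampling settings, and hardware requirements.

```python
# Minimal local-inference sketch with Hugging Face Transformers.
# The repo ID is assumed from the release announcement; consult the
# model card on Hugging Face for recommended settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ServiceNow-AI/Apriel-Nemotron-15b-Thinker"  # assumed repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~2 bytes/param, roughly 30 GB of weights
    device_map="auto",
)

messages = [
    {"role": "user",
     "content": "A train travels 120 km in 90 minutes. What is its average speed in km/h?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```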

Asjad is an intern at Marktechpost. He is pursuing a B.Tech in mechanical engineering at the Indian Institute of Technology Kharagpur. Asjad is a machine learning and deep learning enthusiast who is always exploring applications of machine learning in healthcare.
