Together AI has released DeepSWE, a fully open-source software engineering agent trained entirely with reinforcement learning (RL). Built on top of the Qwen3-32B language model, DeepSWE achieves 59% accuracy on the SWE-Bench Verified benchmark and 42.2% Pass@1, placing it at the top of the leaderboard among open-weight models. The release marks a significant shift for Together AI, away from traditional pre-training pipelines and toward building autonomous language agents that continuously learn and improve through real-world feedback.
Reinforcement learning meets code generation
DeepSWE is the result of post-training the Qwen3-32B foundation model with rLLM, Agentica's modular reinforcement learning framework tailored to language agents. Unlike conventional supervised fine-tuning approaches, rLLM enables agents to adapt to real-world workflows through experience. DeepSWE was trained specifically to solve complex software engineering tasks via a feedback-driven loop rather than static datasets.
The training pipeline incorporates Agentica's R2E-Gym dataset, a software engineering benchmark designed for developing RL-style agents. The framework focuses on training language models with action-oriented objectives, such as fixing bugs, implementing functions, and editing code, rather than merely predicting next-token distributions. This aligns more closely with how human engineers iterate on their work and learn from outcomes.
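To make this contrast concrete, the following is a minimal, generic sketch of what a test-driven RL feedback loop for a coding agent can look like. It is purely illustrative and not the actual rLLM or DeepSWE training code; `DummyAgent`, `rollout`, and the pytest-based reward are placeholder assumptions.

```python
# Illustrative sketch of a test-driven RL feedback loop for a coding agent.
# Placeholders only: not the rLLM / DeepSWE implementation.
import subprocess

def run_tests(repo_dir: str) -> float:
    """Reward signal: 1.0 if the repository's test suite passes, else 0.0."""
    try:
        result = subprocess.run(["pytest", "-q"], cwd=repo_dir, capture_output=True)
        return 1.0 if result.returncode == 0 else 0.0
    except FileNotFoundError:
        return 0.0  # pytest not available in this environment

class DummyAgent:
    """Placeholder for a language-model policy that proposes code edits."""
    def act(self, observation: str) -> str:
        return f"# proposed patch for: {observation[:40]}"

    def update(self, trajectories):
        # In a real system this would be a policy-gradient (e.g. PPO/GRPO-style)
        # update on trajectories weighted by their test-based rewards.
        pass

def rollout(agent, issue: str, repo_dir: str, max_steps: int = 5):
    """One episode: the agent iteratively edits code, then is scored by tests."""
    steps = []
    observation = issue
    for _ in range(max_steps):
        action = agent.act(observation)   # e.g. a file edit or shell command
        steps.append((observation, action))
        observation = action              # next observation would include tool output
    reward = run_tests(repo_dir)          # feedback comes from executing real tests
    return steps, reward

if __name__ == "__main__":
    agent = DummyAgent()
    trajectories = [rollout(agent, "Fix off-by-one bug in parser", ".")]
    agent.update(trajectories)
```

The key point of the sketch is that the learning signal comes from executing the agent's edits against real tests, rather than from matching a fixed target sequence.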

Benchmark performance and capabilities
On SWE-Bench Verified, the most rigorous benchmark for software engineering agents, DeepSWE scores 59% with test-time scaling, considerably surpassing previous open-weight models. On Pass@1 evaluations, which measure the probability that the agent solves a problem correctly on the first attempt, DeepSWE reaches an impressive 42.2%.
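For readers unfamiliar with the metric, here is a minimal sketch of the standard unbiased pass@k estimator commonly used for such evaluations (following the Codex/HumanEval formulation); it is a general reference, not DeepSWE's own evaluation code.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.
    n: samples generated per problem, c: correct samples, k: attempt budget."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 16 samples per task, 7 of which pass the tests -> estimated pass@1
print(pass_at_k(n=16, c=7, k=1))  # 0.4375
```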
These results underline the power of RL-based training for improving agent behavior, particularly in domains that demand iterative reasoning and precise outputs, such as code synthesis. The model's architecture, inherited from Qwen3-32B, allows it to scale efficiently while remaining suited to real-world applications.

Open source and reproducibility at its core
One of the standout features of this release is its full transparency. Together AI and Agentica have open-sourced not only the DeepSWE model but also the entire training recipe, including the rLLM framework, the R2E-Gym dataset, and the training configuration scripts. This promotes reproducibility and invites the broader research and developer communities to extend and build on DeepSWE without restrictions.
Developers can access DeepSWE and rLLM through the project's publicly released model weights and code.
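As a purely illustrative example of how an open checkpoint like this is typically loaded, here is a minimal sketch using Hugging Face Transformers. The repository id below is an assumption for illustration and should be replaced with the identifier from the official release.

```python
# Minimal sketch: loading an open checkpoint with Hugging Face Transformers.
# The repo id is an assumption; substitute the id from the official release.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentica-org/DeepSWE-Preview"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",  # requires the `accelerate` package
)

prompt = "Fix the off-by-one error in the following function:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```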
From language reasoners to language agents
DeepSWE marks a philosophical and practical shift: from building models that reason about language to building agents that learn from interaction. Traditional LLMs have shown strong reasoning capabilities but often lack the ability to adapt to feedback or improve with use. Reinforcement learning enables these models not only to perform well at launch but also to keep improving over time, adapting to new problem distributions and domains.
This approach also opens the door to local deployment. Because DeepSWE is fully open-source and modular, it can be extended and retrained for specific use cases. Developers and researchers can build their own agents on top of DeepSWE using rLLM, serving diverse domains such as web navigation, robotics, or autonomous research assistance.
Conclusion
DeepSWE is an important step in the evolution of generative AI for software engineering. By applying reinforcement learning to large language models such as Qwen3-32B and releasing the entire training infrastructure, Together AI enables a future where agents are not just pre-trained and deployed, but continuously trained and improved. This leap from language understanding to action-oriented agency has significant implications for programming, automation, and intelligent system design.
All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter, and don't forget to join our 100k+ ML SubReddit and subscribe to our newsletter.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an artificial intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable to a broad audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.
