As large language models (LLMs) evolve from simple text generators into agentic systems capable of planning, reasoning, and acting independently, their capabilities and their associated risks grow in tandem. Enterprises are rapidly adopting agentic AI for automation, but this trend exposes organizations to new challenges: goal misalignment, prompt injection, unintended behaviors, data leakage, and reduced human oversight. In response to these concerns, NVIDIA has released a suite of open-source software and a post-training safety recipe designed to safeguard agentic AI systems across their lifecycle.
The need for safety in agentic AI
Agentic LLMs leverage advanced reasoning and tool use, allowing them to operate with a high degree of autonomy. However, this autonomy can lead to:
- Content moderation failures (for example, generating harmful, toxic, or biased outputs)
- Security vulnerabilities (prompt injection, jailbreak attempts)
- Compliance and trust risks (failure to align with corporate policies or regulatory standards)
Traditional guardrails and content filters often fall short as models and attacker techniques evolve rapidly. Enterprises need systematic, lifecycle-wide strategies to align open models with internal policies and external regulations.
NVIDIA's safety recipe: overview and architecture
NVIDIA's agentic AI safety recipe provides a comprehensive end-to-end framework for evaluating, aligning, and safeguarding LLMs before, during, and after deployment:
- Evaluation: Before deployment, the recipe enables testing against enterprise policies, security requirements, and trust thresholds using open datasets and benchmarks.
- Post-training alignment: Using reinforcement learning (RL), supervised fine-tuning (SFT), and on-policy safety data generation, models are further aligned with safety standards.
- Continuous protection: After deployment, NVIDIA NeMo Guardrails and real-time monitoring microservices provide ongoing, programmable guardrails, actively blocking unsafe outputs and defending against prompt injection and jailbreak attempts (a minimal integration sketch follows below).
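To make the continuous-protection stage concrete, here is a minimal sketch of placing NeMo Guardrails in front of a chat model via its Python API. The configuration is illustrative, not the recipe's shipped setup: the engine, model name, and self-check prompt text are assumptions you would replace with your own policy.

```python
# Minimal NeMo Guardrails sketch: a "self check input" rail screens each user
# message before it reaches the main model. All config values are illustrative.
from nemoguardrails import LLMRails, RailsConfig

YAML_CONFIG = """
models:
  - type: main
    engine: openai          # assumes OPENAI_API_KEY is set in the environment
    model: gpt-4o-mini      # illustrative; any supported engine/model works
rails:
  input:
    flows:
      - self check input
prompts:
  - task: self_check_input
    content: |
      Your task is to determine whether to block a user request.
      Block the request if it asks for harmful, toxic, or policy-violating content.
      User message: "{{ user_input }}"
      Should the message be blocked (Yes or No)?
"""

config = RailsConfig.from_content(yaml_content=YAML_CONFIG)
rails = LLMRails(config)

# Unsafe requests are intercepted by the input rail; safe ones pass through.
response = rails.generate(messages=[{"role": "user", "content": "Hello!"}])
print(response["content"])
```

In a production deployment, the same rails configuration can also enforce topic control and jailbreak detection alongside the input check shown here.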
Core components
| Stage | Technology / tools | Purpose |
|---|---|---|
| Pre-deployment evaluation | Nemotron Content Safety Dataset, WildGuardMix, garak scanner | Safety/security testing |
| Post-training alignment | RL, SFT, open-licensed data | Fine-tuned safety alignment |
| Deployment and inference | NeMo Guardrails, NIM microservices (content safety, topic control, jailbreak detection) | Block unsafe behavior |
| Monitoring and feedback | garak, real-time analytics | Detect and resist new attacks |
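The deployment-and-inference row relies on guard models served as NIM microservices behind OpenAI-compatible endpoints. The sketch below queries a hosted content-safety model; the base URL follows NVIDIA's hosted-catalog convention and the model ID is the published NemoGuard content-safety model, but verify both (and the exact response schema) against the NIM documentation.

```python
# Hedged sketch of calling a content-safety NIM via its OpenAI-compatible API.
# Endpoint and model ID are assumptions based on NVIDIA's hosted catalog.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # or your self-hosted NIM URL
    api_key=os.environ["NVIDIA_API_KEY"],
)

completion = client.chat.completions.create(
    model="nvidia/llama-3.1-nemoguard-8b-content-safety",  # published guard model
    messages=[{"role": "user", "content": "How do I hotwire a car?"}],
)
# The guard model returns a moderation verdict rather than a normal completion.
print(completion.choices[0].message.content)
```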
Open datasets and benchmarks
- Nemotron Content Safety Dataset v2: Used for pre- and post-training evaluation, this dataset screens for a wide range of harmful behaviors.
- WildGuardMix dataset: Targets content moderation across ambiguous and adversarial prompts.
- Aegis Content Safety Dataset: Over 35,000 annotated samples, enabling the development of fine-grained filters and classifiers for LLM safety tasks (a loading sketch follows below).
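Assuming these datasets are pulled from their Hugging Face Hub mirrors (the IDs below are assumptions to check against the recipe's documentation; WildGuardMix is gated and requires accepting its terms), loading them is a one-liner each:

```python
# Minimal loading sketch for the open safety datasets, assuming the Hugging Face
# mirrors below. Aegis 2.0 underpins the Nemotron Content Safety Dataset v2.
from datasets import load_dataset

aegis = load_dataset("nvidia/Aegis-AI-Content-Safety-Dataset-2.0", split="train")
# WildGuardMix is gated: accept the terms on its Hub page before downloading.
wildguard = load_dataset("allenai/wildguardmix", "wildguardtrain", split="train")

print(aegis[0])        # one annotated prompt/response sample
print(len(wildguard))  # size of the moderation training mix
```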
Post-training process
NVIDIA's post-training safety recipe is distributed as an open-source Jupyter notebook or as a cloud-launchable module, ensuring transparency and broad accessibility. The workflow typically includes:
- Initial model evaluation: Baseline safety and security testing with open benchmarks.
- On-policy safety training: Response generation by the target/aligned model, supervised fine-tuning, and reinforcement learning with open datasets.
- Re-evaluation: Re-running the safety and security benchmarks after training to confirm improvements.
- Deployment: Trusted models are deployed with live monitoring and guardrail microservices (content moderation, topic/domain control, jailbreak detection). The skeleton below sketches this loop.
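The control flow of that workflow fits in a few lines. Everything below is a placeholder skeleton: the function bodies, checkpoint names, and scores are illustrative stand-ins for the recipe's notebooks, showing only the evaluate, align, re-evaluate, and gate-on-improvement logic.

```python
# Illustrative skeleton of the recipe's evaluate -> align -> re-evaluate loop.
# Both helpers are placeholders for the real notebook steps.

def safety_score(checkpoint: str) -> float:
    """Placeholder benchmark harness: the real recipe grades generations with
    safety classifiers and garak probes. Dummy values mirror the reported gains."""
    return 0.94 if "aligned" in checkpoint else 0.88

def post_train(checkpoint: str) -> str:
    """Placeholder for on-policy SFT/RL with the open safety datasets."""
    return checkpoint + "-safety-aligned"

base = "my-open-model"            # hypothetical starting checkpoint
baseline = safety_score(base)     # 1. baseline safety/security evaluation
aligned = post_train(base)        # 2. on-policy safety post-training
post = safety_score(aligned)      # 3. re-run the same benchmarks

# 4. Gate deployment on a measurable, non-regressing improvement.
if post >= baseline:
    print(f"deploy {aligned}: content safety {baseline:.0%} -> {post:.0%}")
else:
    print("safety regressed; iterate before deployment")
```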
Quantitative impact
- Content safety: Improved from 88% to 94% after applying NVIDIA's safety post-training recipe, a gain of 6 percentage points, with no measurable loss of accuracy.
- Product security: Resilience against adversarial prompts (jailbreaks, etc.) improved from 56% to 63%, a gain of 7 percentage points.
Collaborative and ecosystem integration
NVIDIA's approach goes beyond internal tooling: partnerships with leading cybersecurity providers (Cisco AI Defense, CrowdStrike, Trend Micro, ActiveFence) enable the integration of continuous safety signals and incident-driven improvements across the AI lifecycle.
How to start
- Open-source access: The full safety evaluation and post-training recipe (tools, datasets, guides) is publicly available for download and as a cloud-deployable solution.
- Custom policy alignment: Enterprises can define custom business policies, risk thresholds, and regulatory requirements, using the recipe to align models accordingly.
- Iterative hardening: Evaluate, post-train, re-evaluate, and redeploy as new risks emerge, ensuring continued model reliability.
Conclusion
NVIDIA's safety recipe for agentic LLMs represents a systematic, industry-first, openly available approach to hardening LLMs against modern AI risks. By operationalizing robust, transparent, and extensible safety protocols, enterprises can adopt agentic AI with confidence, balancing innovation with safety and compliance.
Check out the NVIDIA AI safety recipe and the technical details. All credit for this research goes to the researchers of this project.
