---
title: "nvidia telco reasoning models nemo"
source_url: https://developer.nvidia.com/blog/building-telco-reasoning-models-for-autonomous-networks-with-nvidia-nemo/
tags: [nvidia, inference]
source: rss
source_feed: NVIDIA Developer Blog
source_published: 
ingested: 2026-05-08
review_value: 8
review_confidence: 7
review_recommendation: strong
review_stars: 5
sha256: d2be86386c161161
type: raw
created: 2026-05-10
updated: 2026-05-10
---
# Building Telco Reasoning Models for Autonomous Networks with NVIDIA NeMo | NVIDIA Technical Blog
Building Telco Reasoning Models for Autonomous Networks with NVIDIA NeMo | NVIDIA Technical Blog DEVELOPER Home Blog Forums Docs Downloads Training Join Technical Blog Subscribe Related Resources Agentic AI / Generative AI English Building Telco Reasoning Models for Autonomous Networks with NVIDIA NeMo Feb 28, 2026 By Aiden Chang , Amparo Canaveras , Ari Uskudar and Amol Phadke Like Discuss (0) L T F R E AI-Generated Summary Like Dislike Tech Mahindra and NVIDIA developed a reproducible pipeline using synthetic incident data, expert procedures, and structured reasoning traces to fine-tune Qwen3-32B models for telco NOC workflows, leveraging the NVIDIA NeMo toolkit for safe, closed-loop, multiturn, tool-calling automation. The solution operationalizes curriculum learning with multiturn tokenization, prioritizing high-impact incident classes, automating expert guideline translation into structured traces, and orchestrating data preparation, fine-tuning, and evaluation with NeMo Skills and tensor model parallelism. Evaluation demonstrates significant accuracy gains (from ~20% to ~60%) for incident summary prediction and root-cause resolution, with ongoing robustness improvements via tool-calling benchmarks, LLM-as-a-judge safety checks, controlled error injection, and RAG for long-tail incident scenarios. AI-generated content may summarize information incompletely. Verify important information. Learn more Autonomous networks are quickly becoming one of the top priorities in telecommunications. According to the latest NVIDIA State of AI in Telecommunications report , 65% of operators said AI is driving network automation, and 50% named autonomous networks as the top AI use case for ROI.&nbsp; Yet many telcos still report gaps in AI and data science expertise. This makes it difficult to scale safe, closed-loop automation across complex, multidomain networks.&nbsp;&nbsp; Most telecom network operations centers (NOCs) today operate using reactive, alarm-driven workflows. Engineers manually triage thousands of incidents across multiple tools, sift through a high volume of alarm and performance data, and stitch together fragmented dashboards and logs before applying a fix or dispatching a field team. NOCs are a natural starting point for autonomous networks, because they concentrate high-volume, repeatable tasks where AI can directly cut MTTR and OPEX. Tech Mahindra, a leading global provider of technology consulting and digital solutions to enterprises across industries, and NVIDIA are collaborating to close this AI skills gap. They re doing so by making autonomous network building blocks open models, tools, and implementation guides into assets telecom developers can readily adopt and adapt in their own environments.&nbsp; This post outlines how to fine tune reasoning models with NVIDIA NeMo so they behave like NOC engineers, safely driving closed loop, self healing workflows. It shows how to:&nbsp; Generate synthetic, telecom realistic incident data Translate expert procedures into structured reasoning traces using the production-grade reference workflows. This teaches the model to coordinate tools, reason over network state, and execute fault management tasks end to end The result is a repeatable method that telco teams can use to build their own specialized AI agents for network operations. These agents can perform triage, root cause analysis, and resolution for high volume incident classes, helping operators progress toward TM Forum Level 4 highly autonomous networks and beyond. Why do network operations centers need reasoning models? Traditional NOC automation is mostly rule based and open loop: scripts trigger on fixed conditions but struggle with noisy signals, cross domain dependencies, and constantly changing network behavior. As a result, many Level 1 and Level 2 tasks triage, root cause analysis, validation after a change still depend on manual effort, keeping MTTR high and limiting how far operators can move toward truly autonomous operations. Figure 1. Shifting from manual NOC alarm handling to a reasoning agent embedded in the NOC workflow A telco reasoning model becomes the engine for an AI agent that can take on this work pattern in a controlled, auditable way. Instead of hard coded runbooks and point scripts, the agent uses the model to interpret incidents, decide which tools to call, and adapt its actions based on live responses. Key features include: AI reasoning plus tool-calling : Replaces manual alarm triage by invoking NOC tools for validation, root cause analysis, and remediation across existing systems End-to-end automation : Handles alarm validation, RCA, and healing for various incident types such as outages, flaps, congestion, and configuration issues Noise reduction : Filters self clearing or low value alarms using historical patterns so engineers can focus on higher priorities Resolution in seconds, not hours : Shrinks resolution time for high volume, well understood incidents from hours to seconds, significantly reducing MTTR The outcome is a closed loop, self healing network. Specialized NOC agents handle routine triage and resolution, and engineers shift from reactive alarm handling to proactive optimization and complex problem-solving. Designing a telco reasoning pipeline The technical approach to this solution combines the following components into one reproducible pipeline:&nbsp; Synthetic incident data Expert NOC procedures Structured reasoning traces Supervised fine tuning&nbsp; Evaluation&nbsp; Instead of trying to learn from raw logs and alarms directly, the model is trained on curated examples that show how an experienced engineer would analyze an incident, call tools, and decide when a fix is complete. Figure 2. Agent training pipeline, from synthetic incident generation to reasoning model, fine-tuning, and evaluation across tool-calling, reasoning, and conclusions In this case, Qwen3-32B is the base reasoning modeling that is fine-tuned for telco NOC workflows using the following design principles: Focusing on a small number of high impact faults, which account for the majority of incidents and require deliberate action. This enables the model to learn deeply on the fault classes that matter most. Defining step-by-step operational guidelines for each problem type including RCA and remediation steps and NOC tools that agents must use. Generate synthetic reasoning traces that capture multistep tool calls and the rationale behind each decision, using the NeMo Skills reference workflow to automate trace and incident generation. NeMo Skills orchestrates this pipeline end to end, using its CLI, vLLM or TensorRT LLM servers, and training utilities to move from raw incidents to a fine-tuned telco reasoning model. Synthetic incidents and NOC tool-calling The input to the pipeline is a fully synthetic incident dataset that is modeled on real NOC behavior. Each record includes fields such as region, domain, priority, problem type, possible cause, and time stamps. Engineer notes are also included, describing intermediate steps and close notes summarizing the final resolution and close code.&nbsp; An incident summary captures why the network was degraded or down and is the backbone of what the model is trained to solve. The pipeline concentrates on the most frequent, high-impact faults that account for the bulk of incident volume and require explicit action. The reasoning model learns deeply on the cases that drive MTTR and OPEX. To model realistic NOC workflows, a set of custom tools are defined for agents to call in multistep procedures, such as: Acknowledging and tracking the initial alert Checking site and equipment status Performing remote actions (reset, unlock, enable) Monitoring for automatic recovery or alarm clearance Checking topology, power, and fiber, plus public outage information Applying configuration fixes Rechecking alarm status when it remains active Investigating persistent or recurring alarms Documenting actions and status updates Coordinating onsite dispatch or hardware replacement Confirming final site health and closing the incident For each problem type, domain experts translate existing workflows into step by step guidelines that map onto these tools. Examples include which triage toolkit to consult first; which alarms to query; when to reboot a device; and how to verify a fiber cut, power outage, or network element faults.&nbsp; These guidelines become blueprints for the synthetic reasoning traces the model will learn from. They later define the action space that NOC agents use when executing closed loop workflows in production.&nbsp; Turn expert procedures into reasoning traces To turn expert NOC procedures into training data for a telco specialized reasoning model, follow the three-step NeMo Skills workflow outlined below. It converts runbooks into structured, multiturn reasoning traces ready for autonomous NOC agents. Step 1: Generate structured action sequences Using a reference workflow from NeMo Skills, a teacher model generates standardized action sequences for each incident based on prompts that include incident fields and guideline templates. The steps map directly to NOC tools. Traces are formatted so each step records the action, its parameters, the tool call, and the immediate result, forming a structured view of the NOC workflow. Step 2: Attach per step reasoning A second pass enriches every action with reasoning text that explains why the step is taken, what signals it uses, and how it influences the next decision. This creates a chain of reasoning that reflects how an experienced NOC engineer reasons over topologies, alarms, and historical behavior.&nbsp; Because raw traces can be verbose or repetitive, a squashing phase merges related steps while preserving key decision points, making sequences more efficient for training. Step 3: Formatting for multiturn, tool calling models Using another workflow from NeMo Skills, the formatted traces are converted into a Qwen-compatible format that encodes both the dialogue-style interaction and tool-calling actions over multiple turns. Multiturn tokenization simulates realistic interactions where the agent alternates between reasoning, calling tools, and interpreting tool responses, which is essential for deploying a ReAct-style NOC agent. The result is a curriculum-structured dataset where easier cases and shorter traces appear earlier, while more complex multi-step incidents appear later, supporting curriculum learning during model training. Fine-tuning the telco reasoning model&nbsp; The fine-tuning phase uses a standard train/test split on the compiled reasoning dataset, with NeMo Skills orchestrating data preparation and Qwen3 32B serving as the base reasoning model. NeMo Skills prepare_data utilities apply a telco specific prompt template ( noc_reasoning_sft ) and the Qwen tokenizer. This makes each trace in the training split into a supervised fine tuning (SFT) example that includes: Incident context and NOC signals Multistep tool calls and intermediate results Reasoning traces explaining each decision Final resolution and incident summary This produces a single JSONL file of SFT-ready examples for the telco reasoning model. To improve learning efficiency, curriculum learning is applied by ordering samples from simple, single problem incidents to more complex multistep, multitool cases. This allows the model to master core NOC behaviors before tackling long, multiturn troubleshooting patterns.&nbsp; Multiturn tokenization ensures that each example preserves realistic sequences of queries, tool calls, responses, and follow up actions, rather than isolated single turn prompts. These capabilities are critical for downstream ReAct style agents that must coordinate multiple tools over long contexts. Ultimately, Qwen3 32B is fine tuned on this telco reasoning curriculum with long sequence lengths and tensor model parallelism across GPUs. Checkpointing and experiment tracking allow teams to iterate on data quality, curriculum design, and hyperparameters.&nbsp; The result is a telco specialized reasoning model that understands incident fields, close codes, and NOC procedures, and can reliably drive multitool, multiturn tool calling workflows in production. Evaluating incident summary accuracy and safety Initial evaluation focuses on incident summary accuracy: how well the model, embedded in a ReAct style agent with tools, predicts and executes the correct resolution path for a given incident.&nbsp; Experiments compare the fine tuned telco reasoning model against a baseline Qwen3 32B on held out incidents, measuring accuracy, precision, and recall across problem and close code categories. Incident summary accuracy can also be analyzed within a single problem type to highlight where reasoning traces and curriculum learning deliver the largest gains, informing future iterations of synthetic data generation and guideline design. Evaluations across multiple iterations show that the fine-tuned model improves accuracy from roughly 20% to 60%. Beyond incident summary metrics, additional evaluation methods can be introduced over time to further harden the system, including: LLM as a judge setups to evaluate reasoning traces for correctness, completeness, and safety LLM as a judge to assess final conclusions and remediation plans Tool calling benchmarks such as BFCLv3 to measure how reliably the agent sequences and interprets tool calls Rollout and rejection sampling to stress test behavior across many simulated incidents Controlled errors injected into traces to teach the model to detect and recover from its own mistakes Incorporation of retrieval augmented generation (RAG) with historical few shot examples to improve robustness on long tail scenarios Get started building telco reasoning models for autonomous networks Telco specific reasoning models powered by synthetic data, structured traces, and safe tool calling can move NOCs toward zero touch, self healing operations. By focusing on high impact close codes, encoding expert guidelines as multiturn reasoning traces, and fine tuning large models with the NVIDIA NeMo software toolkit, operators can build agents that reliably take on real NOC engineer tasks.&nbsp; The pipeline is reusable and adaptable, so this approach can be tailored to each operator s tools, data, and policies. This accelerates the industry s transition from manual alarm handling to intelligent, autonomous network operations. To get started fine-tuning a reasoning model to build AI agents for network operations, see Teaching a Model to Reason over Telecom Network Incidents . Discuss (0) Like Tags Agentic AI / Generative AI | Networking / Communications | Telecommunications | NeMo | TensorRT-LLM | Intermediate Technical | Tutorial | AI Agent | featured | Retrieval Augmented Generation (RAG) | Training AI Models About the Authors About Aiden Chang Aiden Chang is a solution architect at NVIDIA, focusing on enterprise applications of generative AI, robotics, and reasoning systems. He earned his master s in computer science from the University of Southern California. Outside of work, he enjoys skiing, aviation, and building robots. View all posts by Aiden Chang About Amparo Canaveras Amparo Canaveras is a senior solutions architect at NVIDIA, specializing in generative AI applications within the telecommunications sector. She brings over 20 years of experience from her time in network operations and analytics at Nokia and Verizon. Amparo holds a B.Sc. in electrical engineering from the Polytechnic University of Valencia and an M.Sc. in systems design and management from MIT. View all posts by Amparo Canaveras About Ari Uskudar Ari Uskudar has 20-plus years of experience in AI-driven network automation, RAN intelligence, and large-scale telecom architecture across NVIDIA, VMware, Ericsson, Verizon, Turkcell, Vodafone, and Motorola. Her expertise spans agentic AI systems, autonomous network design, LLM-based telco reasoning, ML-powered observability, and end-to-end optimization. Ari has authored multiple patents in autonomous networks, 6G core architecture, and telco blueprints, etc. Known for bridging deep engineering with strategic product thinking, she designs advanced architectures, leads complex technical collaborations, and develops industry-adopted innovations that shape the future of AI-native telecom systems. View all posts by Ari Uskudar About Amol Phadke Amol Phadke is the chief transformation officer at Tech Mahindra, working closely with the CEO on enterprise-wide strategic initiatives, including the global elevation of the Communications industry vertical. He brings deep technology and business leadership across AI, cloud, software networks, big tech, and telecommunications, specializing in strategy definition, driving execution of large-scale engineering, and leading global multidiscipline teams. With over 25 years of global industry experience, he has previously held senior leadership posts as Group CTIO Telenor Group and GM at Google Cloud, among others. Amol holds a double degree executive MBA from UCLA, California - NUS, Singapore, a master s degree in Telecommunications Engineering from USC, California, and a bachelor s degree in Electronics Engineering from the University of Mumbai. View all posts by Amol Phadke Comments Related posts Build an AI Agent to Analyze IT Tickets with NVIDIA Nemotron Build an AI Agent to Analyze IT Tickets with NVIDIA Nemotron Build Enterprise AI Agents with Advanced Open NVIDIA Llama Nemotron Reasoning Models Build Enterprise AI Agents with Advanced Open NVIDIA Llama Nemotron Reasoning Models Transforming Telco Network Operations Centers with NVIDIA NeMo Retriever and NVIDIA NIM Transforming Telco Network Operations Centers with NVIDIA NeMo Retriever and NVIDIA NIM Navigating Generative AI for Network Admins Navigating Generative AI for Network Admins Diagnosing Network Issues Faster with NVIDIA WJH Diagnosing Network Issues Faster with NVIDIA WJH Related posts Federated Learning Without the Refactoring Overhead Using NVIDIA FLARE Federated Learning Without the Refactoring Overhead Using NVIDIA FLARE Mitigating Indirect AGENTS.md Injection Attacks in Agentic Environments Mitigating Indirect AGENTS.md Injection Attacks in Agentic Environments Build a More Secure, Always-On Local AI Agent with OpenClaw and NVIDIA NemoClaw Build a More Secure, Always-On Local AI Agent with OpenClaw and NVIDIA NemoClaw Bringing AI Closer to the Edge and On-Device with Gemma 4 Bringing AI Closer to the Edge and On-Device with Gemma 4 Building the AI Grid with NVIDIA: Orchestrating Intelligence Everywhere Building the AI Grid with NVIDIA: Orchestrating Intelligence Everywhere L T F R E