This Catalyst is developing a plug-and-play multi-agent system based on generative AI to automate incident diagnosis, resolution, optimization, and continuous learning processes. This will reduce mean-time-to-repair, enhance customer satisfaction, and reduce operational costs.
Enabling multi-vendor, multi-agent autonomous networks
Commercial context
Traditional incident management systems struggle to manage today's increasingly complex network environments. CSPs could achieve major benefits by fully automating their networks, yet telecom operations remain largely manual, with only fragmented automation across radio access, transport and core networks. Most telcos lack the shared agent layer that is required to enable scalable, multi-vendor automation. Operations teams typically use domain-specific tools or custom scripts, resulting in inconsistent workflows, delays in incident resolution and inefficiency.
The resulting service disruptions reduce customer trust and increase churn. CSPs’ reliance on manual processes also limits their ability to adapt quickly to changes in customer needs and market dynamics. CSPs that offer superior service reliability and rapid issue resolution through wholesale network automation will achieve significant competitive advantage - particularly if they can avoid the cost and risk of full-stack replacement.
The solution
This advantage is what the Agent Fabric - Phase II Catalyst seeks to provide. Phase I introduced the Incident Co-Pilot - a generative AI-powered assistant that helped NOC engineers triage, understand, and respond to incidents more quickly. The Co-Pilot didn’t replace human operators. Instead, it focused on trust and explainability. It offered transparent insights and recommended actions that engineers could validate and execute.
Phase II builds on that foundation. It develops a plug-and-play multi-agent system with dynamic, vendor-neutral agent discovery and coordination. This telecom-grade runtime allows autonomous agents to register, discover, and collaborate across network domains.
Called Agent Fabric, the runtime is designed as a plug-and-play model. Agents declare roles, expose callable functions and operate using standard APIs, including TM Forum’s proposed Agent (TMF939) and Assistant (TMF785) APIs.
The new multi-agent system is designed to automate diagnosis, resolution, optimization and learning processes. A specialized incident agent will diagnose issues using real-time and historical data, while a healing agent will manage tickets and corrective actions. An optimization agent will maintain network performance during incidents and a learning agent will use past incidents to enhance future responses. Finally, a risk agent will be able to identify and mitigate potential issues proactively. The overarching goal is to enable CSPs to run Level 4 autonomous networks, in which most operations are automated.
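The division of labor described above can be pictured as a simple hand-off chain. The following is a minimal sketch only, not the Catalyst's implementation: the `Incident` record, the agent functions and `run_workflow` are all hypothetical names, and each agent body is a placeholder for the real logic. The proactive risk agent is left out because it would run ahead of incidents rather than inside this chain.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Incident:
    """Hypothetical incident record handed from one agent to the next."""
    incident_id: str
    symptom: str
    diagnosis: str = ""
    actions: List[str] = field(default_factory=list)


def incident_agent(incident: Incident) -> Incident:
    # Stand-in for diagnosis from real-time and historical data.
    incident.diagnosis = f"suspected cause of {incident.symptom}"
    return incident


def healing_agent(incident: Incident) -> Incident:
    # Stand-in for ticket management and corrective action.
    incident.actions.append(f"ticket opened for {incident.incident_id}")
    incident.actions.append("corrective action applied")
    return incident


def optimization_agent(incident: Incident) -> Incident:
    # Stand-in for keeping performance stable while the incident is open.
    incident.actions.append("traffic rebalanced during incident")
    return incident


class LearningAgent:
    """Keeps handled incidents so later diagnoses could draw on them."""

    def __init__(self) -> None:
        self.history: List[Incident] = []

    def __call__(self, incident: Incident) -> Incident:
        self.history.append(incident)
        return incident


def run_workflow(
    incident: Incident, agents: List[Callable[[Incident], Incident]]
) -> Incident:
    # Pass the incident through each agent in order.
    for agent in agents:
        incident = agent(incident)
    return incident


learning_agent = LearningAgent()
resolved = run_workflow(
    Incident("INC-001", "packet loss on transport link"),
    [incident_agent, healing_agent, optimization_agent, learning_agent],
)
print(resolved.diagnosis)
print(resolved.actions)
```

A fixed, ordered list keeps the sketch easy to follow. In a runtime with dynamic discovery, the list of agents would presumably be assembled at run time rather than hard-coded.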
The Agent Fabric connects multi-agent systems using a modular, standards-based architecture aligned with TM Forum’s Level 4 blueprint. Each agent registers a declarative profile card using the proposed TMF939 Agent Management API. This enables dynamic discovery, trust enforcement, and role-based coordination at runtime.
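To make the idea of a declarative profile card and runtime discovery more concrete, here is a minimal in-memory sketch. It assumes a card carries an identifier, a role, a network domain and a list of callable function names. These field names, and the `AgentProfile` and `AgentRegistry` classes, are illustrative assumptions and do not reproduce the schema of the proposed TMF939 API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical profile card; fields are assumptions, not the TMF939 schema."""
    agent_id: str
    role: str
    domain: str
    functions: Tuple[str, ...]


class AgentRegistry:
    """In-memory stand-in for a metadata-driven agent registry."""

    def __init__(self) -> None:
        self._profiles: Dict[str, AgentProfile] = {}

    def register(self, profile: AgentProfile) -> None:
        # Reject duplicate identifiers so each card stays unambiguous.
        if profile.agent_id in self._profiles:
            raise ValueError(f"agent already registered: {profile.agent_id}")
        self._profiles[profile.agent_id] = profile

    def discover(
        self, role: Optional[str] = None, domain: Optional[str] = None
    ) -> List[AgentProfile]:
        # Filter on whichever criteria the caller supplies.
        return [
            p
            for p in self._profiles.values()
            if (role is None or p.role == role)
            and (domain is None or p.domain == domain)
        ]


registry = AgentRegistry()
registry.register(
    AgentProfile("vendor-a-incident", "incident", "RAN", ("diagnose",))
)
registry.register(
    AgentProfile("vendor-b-healing", "healing", "transport", ("open_ticket", "remediate"))
)

healers = registry.discover(role="healing")
print([p.agent_id for p in healers])
```

Because callers look agents up by role and domain rather than by vendor, a second vendor's healing agent could be registered without changing the calling code, which is the plug-and-play property the article describes.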
Agents operate autonomously and collaborate securely through metadata-driven agent and tool registries. The Trusted Advisor Component enforces privacy, policy, and trust at runtime. This allows CSPs to scale intelligent automation safely across OSS, BSS, RAN, transport, and customer domains.
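One way to picture runtime policy enforcement is a gate that every agent action must pass before it executes. The sketch below assumes a simple allow-list of (role, domain, action) triples and an audit trail of decisions. The article does not describe how the Trusted Advisor Component works internally, so the `TrustedAdvisor` class, its `authorize` method and the example actions are purely illustrative.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class ActionRequest:
    """Hypothetical request an agent submits before acting."""
    agent_role: str
    domain: str
    action: str


class TrustedAdvisor:
    """Illustrative policy gate; the allow-list mechanism is an assumption."""

    def __init__(self, allowed: Set[Tuple[str, str, str]]) -> None:
        self._allowed = allowed
        # Every decision is recorded so it can be reviewed later.
        self.audit_log: List[Tuple[ActionRequest, bool]] = []

    def authorize(self, request: ActionRequest) -> bool:
        key = (request.agent_role, request.domain, request.action)
        decision = key in self._allowed
        self.audit_log.append((request, decision))
        return decision


advisor = TrustedAdvisor(
    {
        ("healing", "RAN", "restart_cell"),
        ("optimization", "transport", "reroute_traffic"),
    }
)

ok = advisor.authorize(ActionRequest("healing", "RAN", "restart_cell"))
denied = advisor.authorize(ActionRequest("healing", "BSS", "issue_refund"))
print(ok, denied)
```

Denying by default and logging each decision reflects the article's emphasis on trust and explainability: anything not explicitly permitted is blocked, and operators can see why.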
Agents and tools coordinate using emerging protocols such as Agent2Agent (A2A) for agent interaction and the Model Context Protocol (MCP) for tool invocation. TM Forum’s proposed Agent and Copilot APIs also support this coordination.
Applications and wider value
The new runtime will deliver a reduction of up to 30% in manual incident handling through multi-agent workflows. CSPs will save on operating costs, as the need for truck rolls and human intervention falls. Cross-domain collaboration could also lead to 25% faster root-cause detection.
The multi-agent system will “allow us to reduce response times and improve our network availability, thereby reducing operating costs and, more importantly, increasing customer satisfaction, which is one of our goals as a company: putting the customer at the center,” explains Pedro Garcia Parra, Director of Autonomous Network & Operations Transformation at Telefónica, one of the champions of the Catalyst.
Telefónica is keen to see firsthand the benefits of an agent architecture from a practical, as well as theoretical, perspective. “Foreseeing the use of autonomous agents, we will be able to implement new workflows, thus making our networks more flexible and future-ready,” adds Pedro Garcia Parra. “At Telefónica, we believe that participating in this catalyst will generate a series of very interesting results for us to fine-tune the most relevant aspects to achieve Level 4 in the near future.”
The Catalyst is focusing on a scalable adoption model that lets CSPs avoid costly full-stack replacements. The goal is to reuse legacy infrastructure and enable sustainable usage of LLMs. The project is also promoting vendor-neutral interoperability and standardization, while ensuring the AI-based system is fully explainable. In this way, the team intends to enhance the trust, accountability and efficiency of network operations centers.
The CSPs participating in the Catalyst span much of the world. They include AIS, China Unicom, Deutsche Telekom, Du, Orange, IOH, MTN, Telenor, TIM, Telefónica, Vodafone and ZAIN. The Catalyst’s work on advancing multi-agent systems may also be applicable well beyond the telecoms industry. Pedro Garcia Parra believes the project learnings will drive innovation. “This Catalyst allows us to look ahead from an innovative perspective, not just at Level 4,” he says. “The results will generate very interesting learnings that industry in general, not just telecommunications, can use and apply to other areas as well.”