Background: Why AI Networking Needed a New Protocol
On May 5, 2026, OpenAI published a landmark engineering announcement: the release of Multipath Reliable Connection (MRC), a new open networking protocol co-developed with AMD, Broadcom, Intel, Microsoft, and NVIDIA. The release marks a pivotal moment in AI infrastructure engineering.
Training frontier AI models requires clusters of hundreds of thousands of GPUs working in tight synchronization. A single training step can involve many millions of data transfers, and one late transfer can stall an entire job, leaving thousands of expensive GPUs idle. Traditional Ethernet transports, specifically RoCEv2 (RDMA over Converged Ethernet v2), pin all traffic between two endpoints to a single fixed path. As clusters scale up, one congested link or failed switch along that path can bring an entire training run to a halt or force a costly restart from a saved checkpoint.
What Is MRC?
MRC stands for Multipath Reliable Connection. It is a new network transport protocol built into the latest 800 Gb/s network interfaces. MRC extends RoCEv2 and draws on techniques developed by the Ultra Ethernet Consortium (UEC), combining them with SRv6-based source routing to support large-scale AI networking fabrics. The result is a protocol that can spread a single transfer across hundreds of paths, route around failures in microseconds, and let the fabric run a much simpler network control plane.
MRC directly addresses two critical failure modes in large AI clusters: traffic congestion and link/switch failures. It is already deployed in production and has been used to train multiple OpenAI frontier models.
How MRC Works
MRC replaces single-path data transfer with intelligent multipath packet distribution. Key mechanisms include:
- Adaptive Packet Spraying: Instead of sending all packets along one path, MRC distributes them across multiple paths simultaneously, steered by live congestion feedback. This virtually eliminates core congestion and reduces GPU idle time during synchronized training steps (see the first sketch after this list).
- Multiplanar Network Design: Rather than treating one 800 Gb/s interface as a single link, MRC splits it into multiple smaller links—for example, eight parallel 100 Gb/s networks (planes). Each plane provides a complete east-west path between all GPUs, delivering redundancy and boosting switch radix efficiency.
- Microsecond Path Failover: When MRC detects packet loss on a path, it immediately stops using that path and reroutes traffic. Training jobs can survive link flaps and even live switch reboots without measurable disruption—previously, a single failure would crash an entire job.
- Packet Trimming: When a switch would drop a packet due to buffer pressure, MRC trims the payload and forwards only the header to the destination. This triggers an explicit retransmission request and avoids false-positive path failure assumptions.
- Static Source Routing (SRv6): OpenAI eliminated dynamic routing protocols such as BGP in favor of IPv6 Segment Routing. The sender encodes the full route, including the identifiers of the switches to traverse, directly into the packet's IPv6 destination address, eliminating entire classes of routing failures (see the second sketch after this list).
- High-Frequency Telemetry: MRC includes continuous reporting of network conditions such as congestion signals, packet loss, and path utilization, enabling real-time microsecond-level routing decisions.
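Taken together, the spraying, failover, trimming, and telemetry mechanisms amount to a sender-side control loop. The following is a minimal, illustrative Python sketch of such a loop; it is not OpenAI's implementation (which lives in NIC hardware and firmware), and every class name, method name, and threshold here is hypothetical.

```python
import random

class Path:
    """One candidate route (for example, one plane) between two NICs."""
    def __init__(self, path_id):
        self.path_id = path_id
        self.congestion = 0.0   # latest telemetry signal: 0 = idle, 1 = saturated
        self.healthy = True     # cleared when silent loss is detected on this path

class MultipathSender:
    """Toy sender: spray packets over healthy paths, weighted by telemetry."""
    def __init__(self, paths):
        self.paths = paths
        self.inflight = {}      # sequence number -> (payload, path) awaiting ack

    def pick_path(self):
        candidates = [p for p in self.paths if p.healthy]
        weights = [max(1e-3, 1.0 - p.congestion) for p in candidates]
        return random.choices(candidates, weights)[0]

    def send(self, seq, payload):
        path = self.pick_path()
        self.inflight[seq] = (payload, path)
        # transmit(payload, via=path)  # placeholder for the real NIC transmit

    def on_trimmed_header(self, seq):
        # A switch dropped the payload but forwarded the trimmed header:
        # the path is congested, not dead, so retransmit on another path
        # without marking anything as failed.
        payload, _ = self.inflight[seq]
        self.send(seq, payload)

    def on_loss_detected(self, seq):
        # Silent loss with no trimmed header: assume the path has failed,
        # stop using it immediately, and reroute the packet elsewhere.
        payload, path = self.inflight[seq]
        path.healthy = False
        self.send(seq, payload)

    def on_telemetry(self, path_id, congestion):
        # High-frequency congestion reports steer future spraying decisions.
        self.paths[path_id].congestion = congestion  # path_id doubles as list index here
```

In real hardware this loop runs per packet at line rate, which is why the spraying and failover decisions must be made in microseconds rather than by a software control plane.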
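For the static source routing item, the public materials do not specify MRC's exact encoding. The sketch below shows one way a sender could pack a route into a single IPv6 destination address in the style of SRv6 micro-segments (uSIDs); the block prefix and switch identifiers are invented for illustration.

```python
import ipaddress

# Hypothetical 32-bit uSID block prefix; a real deployment chooses its own.
USID_BLOCK = 0xFC000001

def encode_source_route(switch_ids):
    """Pack the block prefix plus up to six 16-bit switch IDs into one IPv6
    address, in the spirit of SRv6 micro-segments. Unused slots stay zero."""
    if len(switch_ids) > 6:
        raise ValueError("at most six 16-bit segments fit after a 32-bit block")
    addr = USID_BLOCK << 96
    shift = 80
    for sid in switch_ids:
        addr |= (sid & 0xFFFF) << shift
        shift -= 16
    return ipaddress.IPv6Address(addr)

# Example: traverse leaf 0x0012, spine 0x0301, leaf 0x0045, then NIC 0x0a07.
print(encode_source_route([0x0012, 0x0301, 0x0045, 0x0A07]))
# -> fc00:1:12:301:45:a07::
```

Because the sender fixes the whole route up front, each switch only has to match its own identifier and advance to the next one, which is what lets the fabric drop dynamic routing protocols from the forwarding path.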
A key architectural advantage: MRC's multiplanar, multipath design lets a two-tier Ethernet switch topology connect more than 100,000 GPUs, a scale that conventional 800 Gb/s networks need three or four switch tiers to reach. Fewer tiers mean lower power consumption, fewer components, and lower network cost at scale.
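To see why the tier count drops, here is the rough radix arithmetic, assuming (purely for illustration) 51.2 Tb/s switch ASICs; the announcement does not publish OpenAI's exact switch configuration.

```python
# A non-blocking two-tier (leaf-spine) fabric supports about radix^2 / 2 endpoints:
# each leaf uses half its ports for hosts and half for uplinks to the spines.
def two_tier_endpoints(radix):
    return radix * radix // 2

PORTS_800G = 64    # one 51.2 Tb/s ASIC exposed as 64 x 800 Gb/s ports
PORTS_100G = 512   # the same ASIC split into 512 x 100 Gb/s ports (one plane)

print(two_tier_endpoints(PORTS_800G))   # 2048   -> far short of 100,000 GPUs
print(two_tier_endpoints(PORTS_100G))   # 131072 -> endpoints per 100 Gb/s plane
```

Because every GPU attaches one 100 Gb/s lane to each of the eight parallel planes, the planes scale together: roughly 131,000 GPUs fit in two tiers, versus about 2,000 with monolithic 800 Gb/s links, which is why conventional designs add a third or fourth tier to reach the same scale.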
Production Deployments
MRC is not theoretical. It is deployed across all of OpenAI's largest NVIDIA GB200 supercomputers used to train frontier models.
