OpenAI & Partners Launch MRC: The Open AI Networking Protocol Redefining GPU-Scale Training
SPOTO AI 2026-05-13 10:56:23

Background: Why AI Networking Needed a New Protocol

On May 5, 2026, OpenAI published a landmark engineering announcement: the release of Multipath Reliable Connection (MRC), a new open networking protocol co-developed with AMD, Broadcom, Intel, Microsoft, and NVIDIA. The release marks a pivotal moment in AI infrastructure engineering.

Training frontier AI models requires clusters of hundreds of thousands of GPUs working in tight synchronization. A single training step can involve many millions of data transfers, and one late transfer can stall the entire job, leaving thousands of expensive GPUs idle. With conventional RoCEv2 (RDMA over Converged Ethernet), all traffic on a given connection between two endpoints follows a single fixed path through the network. As clusters scale up, a single congested link or failed switch can halt an entire training run or force a costly restart from a saved checkpoint.
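
The single-path behavior comes from how Ethernet fabrics typically spread RoCEv2 traffic: ECMP (equal-cost multipath) hashing. A switch hashes header fields that stay constant for the life of a connection, so every packet of that connection lands on the same uplink. The sketch below is illustrative only; the function name ecmp_path and the use of SHA-256 are assumptions, since real switch ASICs use their own vendor-specific hash functions. UDP destination port 4791 is the standard RoCEv2 port.

```python
import hashlib

ROCEV2_UDP_PORT = 4791  # standard UDP destination port for RoCEv2
UDP_PROTO = 17          # IP protocol number for UDP


def ecmp_path(src_ip: str, dst_ip: str, src_port: int, num_paths: int,
              dst_port: int = ROCEV2_UDP_PORT) -> int:
    """Pick an uplink index from the flow's 5-tuple (toy stand-in for a switch hash)."""
    key = f"{src_ip}|{dst_ip}|{UDP_PROTO}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths


# Every packet of one connection carries the same 5-tuple, so all of them take one path.
packets = [ecmp_path("10.0.0.1", "10.0.1.1", 49152, num_paths=64) for _ in range(1000)]
print(len(set(packets)))  # 1
```

Different connections (for example, different source ports) may hash to different uplinks, but nothing spreads one connection's packets across paths, and two heavy connections can collide on the same uplink.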

What Is MRC?

MRC stands for Multipath Reliable Connection. It is a new network transport protocol built into the latest 800 Gb/s network interfaces. MRC extends RoCEv2 and draws on techniques developed by the Ultra Ethernet Consortium (UEC), combining them with SRv6-based source routing to support large-scale AI networking fabrics. The result is a protocol that can spread a single transfer across hundreds of paths, route around failures in microseconds, and operate with a simpler network control plane.

MRC directly addresses two critical failure modes in large AI clusters: traffic congestion and link/switch failures. It is already deployed in production and has been used to train multiple OpenAI frontier models.

How MRC Works

MRC replaces single-path data transfer with intelligent multipath packet distribution. Key mechanisms include:

  • Adaptive Packet Spraying: Instead of sending all packets along one path, MRC distributes them across multiple paths simultaneously. This virtually eliminates core congestion and reduces GPU idle time during synchronized training sessions.
  • Multiplanar Network Design: Rather than treating one 800 Gb/s interface as a single link, MRC splits it into multiple smaller links, for example eight parallel 100 Gb/s networks (planes). Each plane provides a complete east-west path between all GPUs, which adds redundancy and raises the effective switch radix: the same switch bandwidth divided into 100 Gb/s ports yields eight times as many ports as it does at 800 Gb/s.
  • Microsecond Path Failover: When MRC detects packet loss on a path, it immediately stops using that path and reroutes traffic over the remaining ones. Training jobs can survive link flaps and even live switch reboots without measurable disruption, whereas previously a single failure could crash an entire job.
  • Packet Trimming: When a switch would otherwise drop a packet because of buffer pressure, it instead trims off the payload and forwards only the header to the destination. This triggers an explicit retransmission request, and because the loss is identified as congestion rather than a broken path, the sender avoids falsely marking a healthy path as failed.
  • Static Source Routing (SRv6): OpenAI replaced dynamic routing protocols such as BGP with IPv6 Segment Routing (SRv6). The sender encodes the full route, including switch identifiers, directly into the destination address, eliminating entire classes of routing failures.
  • High-Frequency Telemetry: MRC includes continuous reporting of network conditions such as congestion signals, packet loss, and path utilization, enabling real-time microsecond-level routing decisions.
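
As a rough illustration of how spraying, loss-driven failover, and trimmed-header handling fit together on the sender side, here is a toy sketch. It is not MRC's actual algorithm: it assumes simple round-robin spraying in place of MRC's adaptive, telemetry-driven path selection, treats a timeout with no trimmed header as a path failure, and uses hypothetical names (SprayingSender, pick_path, on_timeout, on_trimmed_header).

```python
from dataclasses import dataclass, field


@dataclass
class SprayingSender:
    """Toy multipath sender (hypothetical; not MRC's real state machine)."""
    num_paths: int
    healthy: set[int] = field(init=False)
    retransmit_queue: list[int] = field(default_factory=list)
    _next: int = 0

    def __post_init__(self) -> None:
        # Start with every path marked usable.
        self.healthy = set(range(self.num_paths))

    def pick_path(self) -> int:
        """Spray packets round-robin over the paths still believed healthy."""
        if not self.healthy:
            raise RuntimeError("no healthy paths left")
        candidates = sorted(self.healthy)
        path = candidates[self._next % len(candidates)]
        self._next += 1
        return path

    def on_timeout(self, seq: int, path: int) -> None:
        """Silent loss (no trimmed header): assume the path failed and drop it."""
        self.healthy.discard(path)
        self.retransmit_queue.append(seq)

    def on_trimmed_header(self, seq: int, path: int) -> None:
        """Trimmed header: congestion, not failure, so resend but keep the path."""
        self.retransmit_queue.append(seq)


sender = SprayingSender(num_paths=4)
print([sender.pick_path() for _ in range(4)])  # [0, 1, 2, 3]
sender.on_timeout(seq=7, path=2)         # path 2 is taken out of rotation
sender.on_trimmed_header(seq=8, path=1)  # path 1 stays in rotation
print(sorted({sender.pick_path() for _ in range(6)}))  # [0, 1, 3]
print(sender.retransmit_queue)  # [7, 8]
```

The key design point the sketch tries to capture is that the two loss signals lead to different actions: a trimmed header only costs one retransmission, while silent loss removes the path from rotation.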

A key architectural advantage: MRC's multipath design allows a two-tier Ethernet switch topology to connect more than 100,000 GPUs, a scale that would require three or four switch tiers in a conventional 800 Gb/s network. Fewer tiers means lower power consumption, fewer components, and lower network cost at scale.
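
The tier arithmetic can be sketched with the standard fat-tree (folded Clos) bound: with switches of radix k, a non-blocking n-tier fat tree supports 2 * (k/2)^n endpoints. The sketch assumes a 51.2 Tb/s switch ASIC, a strictly non-blocking fabric, and that each GPU attaches one 100 Gb/s link to each of the eight planes, so a single plane must reach every GPU. The function names are hypothetical, and real designs with other switch sizes or oversubscription ratios will land on different tier counts.

```python
SWITCH_GBPS = 51_200  # assumed 51.2 Tb/s switch ASIC


def fat_tree_endpoints(switch_gbps: int, port_gbps: int, tiers: int) -> int:
    """Max endpoints of a non-blocking fat tree built from identical switches.

    With radix k, an n-tier fat tree supports 2 * (k / 2) ** n endpoints:
    k for one switch, k**2 / 2 for two tiers, k**3 / 4 for three tiers.
    """
    k = switch_gbps // port_gbps  # ports per switch at this link speed
    return 2 * (k // 2) ** tiers


def min_tiers(switch_gbps: int, port_gbps: int, gpus: int) -> int:
    """Smallest number of switch tiers whose fat tree reaches `gpus` endpoints."""
    if switch_gbps // port_gbps < 4:
        raise ValueError("radix too small to build a multi-tier fabric")
    tiers = 1
    while fat_tree_endpoints(switch_gbps, port_gbps, tiers) < gpus:
        tiers += 1
    return tiers


print(fat_tree_endpoints(SWITCH_GBPS, 800, tiers=2))  # 2048   (k = 64)
print(fat_tree_endpoints(SWITCH_GBPS, 800, tiers=3))  # 65536
print(fat_tree_endpoints(SWITCH_GBPS, 100, tiers=2))  # 131072 (k = 512, per plane)
print(min_tiers(SWITCH_GBPS, 800, 100_000))  # 4
print(min_tiers(SWITCH_GBPS, 100, 100_000))  # 2
```

Under these idealized assumptions, 800 Gb/s ports need four tiers to pass 100,000 GPUs, while eight 100 Gb/s planes reach 131,072 GPUs in two tiers; a larger switch or an oversubscribed design could bring the 800 Gb/s case down to three tiers.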

Production Deployments

MRC is not theoretical. It is deployed across all of OpenAI's largest NVIDIA GB200 supercomputers used to train frontier models.
