OpenAI & Partners Launch MRC: The Open AI Networking Protocol Redefining GPU-Scale Training
SPOTO AI 2026-05-13 10:56:23

Background: Why AI Networking Needed a New Protocol

On May 5, 2026, OpenAI published a landmark engineering announcement: the release of Multipath Reliable Connection (MRC), a new open networking protocol co-developed with AMD, Broadcom, Intel, Microsoft, and NVIDIA. The release marks a pivotal moment in AI infrastructure engineering.

Training frontier AI models requires clusters containing hundreds of thousands of GPUs working in tight synchronization. A single step in model training can involve many millions of data transfers, and one late transfer can stall an entire job, leaving thousands of expensive GPUs idle. Traditional Ethernet transports, specifically RoCEv2 (RDMA over Converged Ethernet version 2), route all data between two endpoints over a single fixed path. As clusters scale up, a single congested link or failed switch can bring an entire training run to a halt or force a costly restart from a saved checkpoint.
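The single-path behavior described above comes from ECMP-style flow hashing: every packet of a flow hashes to the same path, so one hot or broken link affects the whole transfer. A minimal sketch of that property (the hash function and path count here are illustrative, not what any particular switch ASIC uses):

```python
import hashlib

def ecmp_path(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
              n_paths: int) -> int:
    """Pick a path by hashing the flow 5-tuple, as ECMP routing does.

    Illustrative only: real switches use vendor-specific hardware hashes,
    but the key property is the same -- every packet of a given flow maps
    to one fixed path.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|udp".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths

# One RoCEv2 flow between two GPUs: every packet lands on the same path,
# so congestion or failure on that one path stalls the whole transfer.
paths = {ecmp_path("10.0.0.1", "10.0.1.1", 4791, 4791, 64) for _ in range(1000)}
assert len(paths) == 1
```

However many parallel links exist, the flow cannot use more than one of them, which is exactly the limitation MRC is designed to remove.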

What Is MRC?

MRC stands for Multipath Reliable Connection. It is a new network transport protocol built into the latest 800 Gb/s network interfaces. MRC extends RoCEv2 and draws on techniques developed by the Ultra Ethernet Consortium (UEC), combining them with SRv6-based source routing to support large-scale AI networking fabrics. The result is a protocol that can spread a single transfer across hundreds of paths, route around failures in microseconds, and operate with a simpler network control plane.

MRC directly addresses two critical failure modes in large AI clusters: traffic congestion and link/switch failures. It is already deployed in production and has been used to train multiple OpenAI frontier models.

How MRC Works

MRC replaces single-path data transfer with intelligent multipath packet distribution. Key mechanisms include:

  • Adaptive Packet Spraying: Instead of sending all packets along one path, MRC distributes them across multiple paths simultaneously. This virtually eliminates core congestion and reduces GPU idle time during synchronized training sessions.
  • Multiplanar Network Design: Rather than treating one 800 Gb/s interface as a single link, MRC splits it into multiple smaller links—for example, eight parallel 100 Gb/s networks (planes). Each plane provides a complete east-west path between all GPUs, delivering redundancy and boosting switch radix efficiency.
  • Microsecond Path Failover: When MRC detects packet loss on a path, it immediately stops using that path and reroutes traffic. Training jobs can survive link flaps and even live switch reboots without measurable disruption—previously, a single failure would crash an entire job.
  • Packet Trimming: When a switch would drop a packet due to buffer pressure, MRC trims the payload and forwards only the header to the destination. This triggers an explicit retransmission request and avoids false-positive path failure assumptions.
  • Static Source Routing (SRv6): OpenAI eliminated dynamic routing protocols such as BGP in favor of IPv6 Segment Routing. The sender encodes the full route—including switch identifiers—directly into the destination address, eliminating entire classes of routing failures.
  • High-Frequency Telemetry: MRC includes continuous reporting of network conditions such as congestion signals, packet loss, and path utilization, enabling real-time microsecond-level routing decisions.
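The spraying and failover mechanisms above can be sketched as a toy sender. This is an illustrative model only: the class name, integer path IDs, round-robin spraying policy, and loss callback are assumptions for the demo, not the actual MRC implementation.

```python
import itertools

class MultipathSender:
    """Toy model of MRC-style packet spraying with fast path failover."""

    def __init__(self, n_paths: int):
        self.healthy = set(range(n_paths))
        self._cycle = itertools.cycle(sorted(self.healthy))

    def next_path(self) -> int:
        # Spray: successive packets rotate across all healthy paths
        # instead of pinning the whole flow to one.
        if not self.healthy:
            raise RuntimeError("no healthy paths remain")
        for path in self._cycle:
            if path in self.healthy:
                return path

    def on_loss(self, path: int) -> None:
        # Failover: stop using a lossy path immediately; subsequent
        # packets are rerouted over the remaining paths.
        self.healthy.discard(path)

sender = MultipathSender(8)
used = {sender.next_path() for _ in range(32)}
assert used == set(range(8))   # packets spread across all 8 paths
sender.on_loss(3)
after = {sender.next_path() for _ in range(32)}
assert 3 not in after          # the failed path is avoided from then on
```

In the real protocol, the loss signal would come from mechanisms like packet trimming and high-frequency telemetry rather than an explicit callback, but the control flow is the same: detect, exclude, keep sending.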

A key architectural advantage: MRC's multipath design allows a two-tier Ethernet switch topology to connect more than 100,000 GPUs, a scale that conventional 800 Gb/s networks need three or four switch tiers to reach. This reduces power consumption, component count, and network cost at scale.
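A rough capacity calculation shows why the multiplanar split changes the tier count. The 51.2 Tb/s ASIC figure and the 64x800G versus 512x100G port configurations below are assumed for illustration; the formula is the standard non-blocking leaf-spine bound, not an MRC-specific result.

```python
def max_hosts_two_tier(radix: int) -> int:
    """Max endpoints in a non-blocking two-tier leaf-spine fabric.

    Each leaf switch uses radix/2 ports down (to hosts) and radix/2 up
    (to spines); a spine with `radix` ports can reach `radix` leaves,
    giving radix * radix/2 endpoints total.
    """
    return radix * (radix // 2)

# Same assumed 51.2 Tb/s switch silicon, two ways of exposing its ports:
assert max_hosts_two_tier(64) == 2_048       # 64 x 800G ports: far short of 100k GPUs
assert max_hosts_two_tier(512) == 131_072    # 512 x 100G planes: >100k GPUs in two tiers
```

Treating each 800 Gb/s interface as eight 100 Gb/s plane links multiplies the effective switch radix, which is what lets the fabric stay at two tiers instead of three or four.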

Production Deployments

MRC is not theoretical. It is deployed across all of OpenAI's largest NVIDIA GB200 supercomputers used to train frontier models.
