NVIDIA Announces Mellanox InfiniBand for Exascale AI Supercomputing

Published 2020-11-17 08:29 by Hilbert Hagedoorn

As computing requirements continue to grow exponentially in areas such as drug discovery, climate research and genomics, NVIDIA Mellanox 400G InfiniBand is accelerating this work through a dramatic leap in performance offered on the world's only fully offloadable, in-network computing platform.

The seventh generation of Mellanox InfiniBand provides ultra-low latency, doubles data throughput with NDR 400 Gb/s, and adds new NVIDIA In-Network Computing engines for additional acceleration.

The world's leading infrastructure manufacturers—including Atos, Dell Technologies, Fujitsu, GIGABYTE, Inspur, Lenovo and Supermicro—plan to integrate NVIDIA Mellanox 400G InfiniBand into their enterprise solutions and HPC offerings. These commitments are complemented by extensive support from leading storage infrastructure partners including DDN and IBM Storage, among others.

"The most important work of our customers is based on AI and increasingly complex applications that demand faster, smarter, more scalable networks," said Gilad Shainer, senior vice president of networking at NVIDIA. "The NVIDIA Mellanox 400G InfiniBand's massive throughput and smart acceleration engines let HPC, AI and hyperscale cloud infrastructures achieve unmatched performance with less cost and complexity."

Today's announcement builds on Mellanox InfiniBand's lead as the industry's most robust solution for AI supercomputing. The NVIDIA Mellanox NDR 400G InfiniBand offers 3x the switch port density and boosts AI acceleration power by 32x. In addition, it increases aggregated bi-directional switch system throughput 5x, to 1.64 petabits per second, enabling users to run larger workloads with fewer constraints.

Expanding Ecosystem for Expanding Workloads
Early interest in the next generation of Mellanox InfiniBand is coming from some of the world's largest scientific research organizations.

"Microsoft Azure's partnership with NVIDIA Networking stems from our shared passion for helping scientists and researchers drive innovation and creativity through scalable HPC and AI. In HPC, Azure HBv2 VMs are the first to bring HDR InfiniBand to the cloud and achieve supercomputing scale and performance for MPI customer applications with demonstrated scaling to eclipse 80,000 cores for MPI HPC," said Nidhi Chappell, head of product, Azure HPC and AI at Microsoft Corp. "In AI, to meet the high-ambition needs of AI innovation, the Azure NDv4 VMs also leverage HDR InfiniBand with 200 Gb/s per GPU, a massive total of 1.6Tb/s of interconnect bandwidth per VM, and scale to thousands of GPUs under the same low-latency InfiniBand fabric to bring AI supercomputing to the masses. Microsoft applauds the continued innovation in NVIDIA's Mellanox InfiniBand product line, and we look forward to continuing our strong partnership together."

"High-performance interconnects are cornerstone technologies required for exascale and beyond. Los Alamos National Laboratory continues to be at the forefront of HPC networking technologies," said Steve Poole, chief architect for next-generation platforms at Los Alamos National Laboratory. "The Lab will continue their relationship working with NVIDIA in evaluating and analyzing their latest 400 Gb/s technology aimed at solving the diverse workload requirements at Los Alamos."

"Amid the new age of exascale computing, researchers and scientists are pushing the limits of applying mathematical modeling to quantum chemistry, molecular dynamics and civil safety," said Professor Thomas Lippert, head of the Jülich Supercomputing Centre. "We are committed to leveraging the next generation of Mellanox InfiniBand to further our track record of building Europe's leading, next-generation supercomputers."

"InfiniBand continues to maintain its pace of innovation and performance, underlining the differentiation that has made it the most commonly used high-performance server and storage interconnect for HPC and AI systems," said Addison Snell, CEO of Intersect360 Research. "As applications continue to demand increased network throughput, the need for high-performance solutions, such as NVIDIA Mellanox 400G InfiniBand, has the potential to keep expanding into new use cases and markets."

Product Specifications and Availability
Offloading operations is crucial for AI workloads. The third-generation NVIDIA Mellanox SHARP technology allows deep learning training operations to be offloaded and accelerated by the InfiniBand network, resulting in 32x higher AI acceleration power. When combined with the NVIDIA Magnum IO software stack, it provides out-of-the-box accelerated scientific computing.

Edge switches based on the Mellanox InfiniBand architecture carry an aggregated bi-directional throughput of 51.2Tb/s, with a landmark capacity of more than 66.5 billion packets per second. The modular switches based on Mellanox InfiniBand will carry an aggregated bi-directional throughput of up to 1.64 petabits per second, 5x higher than the last generation.
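
As a rough sanity check on these figures, the sketch below works out how many 400 Gb/s ports each throughput number would imply. It assumes, and this is not stated in the announcement, that aggregated bi-directional throughput is simply ports x per-port rate x 2 directions, with every port running at the NDR rate. The function name implied_ports is hypothetical and used only for illustration.

```python
# Back-of-the-envelope check of the quoted switch throughput figures.
# Assumption (not from the announcement): aggregated bi-directional
# throughput = ports * per-port rate * 2 directions, all ports at NDR.

NDR_PORT_GBPS = 400  # NDR per-port rate quoted in the article


def implied_ports(aggregate_bidir_gbps, port_gbps=NDR_PORT_GBPS):
    """Return the port count implied by a bi-directional throughput figure."""
    return aggregate_bidir_gbps / (2 * port_gbps)


edge_ports = implied_ports(51_200)        # 51.2 Tb/s expressed in Gb/s
modular_ports = implied_ports(1_640_000)  # 1.64 Pb/s expressed in Gb/s

print(f"edge switch:    ~{edge_ports:.0f} ports")
print(f"modular switch: ~{modular_ports:.0f} ports")
```

Under that assumption the edge figure corresponds to 64 ports, and the modular figure to roughly 2,050. Since 1.64 petabits per second is a rounded value, that would also be consistent with 2,048 ports, which works out to 1.6384 petabits per second.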

The Mellanox InfiniBand architecture is based on industry standards to ensure backwards and future compatibility and protect data center investments. Solutions based on the architecture are expected to sample in the second quarter of 2021.
