NVIDIA Networking

End-to-End High Performance InfiniBand Networking for HPC, AI, and the Data Center

Quantum InfiniBand Platform

Delivering Extreme Performance for the Exascale AI Era

Whether you are exploring mountains of data, researching scientific problems, training neural networks, or modeling financial markets, you need a computing platform with the highest data throughput available. GPUs consume data much faster than CPUs, and as GPU computing horsepower increases, so does the demand for I/O bandwidth.

InfiniBand is the preferred choice for world-leading supercomputers, displacing lower-performance and proprietary interconnect options. The end-to-end NVIDIA InfiniBand-based network enables extremely low latency and high data throughput and message rates. Its high-value features, such as smart In-Network Computing acceleration engines, combined with advanced self-healing network capabilities, congestion control, quality of service, and adaptive routing, enable leading performance and scalability for high-performance computing, artificial intelligence, and other compute- and data-intensive applications. The performance advantages of InfiniBand are second to none, while its open, industry-standards-backed guarantee of backward and forward compatibility across generations ensures users protect their data center investments.

Complex workloads demand ultra-fast processing of high-resolution simulations, extreme-size datasets, and highly parallelized algorithms. As these computing requirements continue to grow, the NVIDIA Quantum InfiniBand platform, the world's only fully offloadable, In-Network Computing interconnect technology, provides the dramatic leap in performance needed for HPC and AI with less cost and complexity. NVIDIA's InfiniBand architecture gives AI developers, scientific researchers, and product designers even faster networking performance and richer feature sets to take on the world's most challenging problems.

As high-performance computing and artificial intelligence applications become more complex, the demand for the most advanced high-speed networking becomes critical for extreme-scale systems. NVIDIA Quantum-2 is the industry-leading switch platform in power and density, with NDR 400 gigabit-per-second (Gb/s) InfiniBand throughput. NVIDIA Quantum InfiniBand also dramatically boosts the performance of complex computations while data moves through the data center network: it can participate in the application's runtime, improving performance while reducing the amount of data that traverses the network.

NVIDIA Quantum InfiniBand switch systems deliver the highest performance and port density available. Innovative In-Network Computing capabilities, such as the NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™, and advanced management features, such as self-healing network capabilities, quality of service, and enhanced congestion control, provide a performance boost for industrial, AI, and scientific applications. The extensive switch portfolio enables compute clusters to operate at any scale, while reducing capital expenses, operational costs, and infrastructure complexity.
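
For context, the collective operations that SHARP accelerates are the standard MPI reductions found in HPC and AI codes. The sketch below is a minimal, generic MPI_Allreduce example rather than NVIDIA sample code; whether the reduction is actually offloaded into the switch fabric depends on the MPI library and cluster configuration (for example, a SHARP-enabled collectives stack), which is assumed here.

    /* Minimal MPI_Allreduce sketch: each rank contributes one value and
       receives the global sum. On a SHARP-enabled fabric, this reduction
       can be executed inside the switch network rather than on the hosts. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double local = (double)rank;   /* each rank's contribution */
        double global = 0.0;
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("Sum of ranks 0..%d = %.0f\n", size - 1, global);

        MPI_Finalize();
        return 0;
    }

Note that the application code itself does not change when In-Network Computing is enabled; the offload decision is handled by the communication stack and fabric management.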

NVIDIA Spectrum Ethernet Platform

Best in Class Ethernet Connectivity

Formerly known as Mellanox Ethernet switches, NVIDIA Spectrum switches provide efficient Ethernet interconnect solutions for server and storage systems in enterprise data center, cloud computing, Web 2.0, data analytics, deep learning, high-performance, and embedded environments. The Spectrum switch ASIC delivers leading Ethernet performance, efficiency, throughput, low latency, and scalability by integrating advanced networking functionality for Ethernet fabrics.

The NVIDIA Ethernet switch platform delivers cloud-scale efficiency to data centers of all sizes. The NVIDIA Cumulus operational model was built for automating public and private cloud networks using a NetDevOps approach, starting with a Digital Twin of the network hosted on the NVIDIA AIR platform. Security compliance, provisioning, monitoring, and interoperability are validated in the Digital Twin before being implemented in the physical network. NVIDIA Spectrum switch platforms deliver the highest levels of network performance and visibility. Providing cloud-scale fabric validation, NVIDIA NetQ validates Digital Twin networks as well as physical networks by leveraging hardware-accelerated telemetry.

The NVIDIA Spectrum Ethernet switch family includes leaf and spine switches in flexible form factors with 16 to 128 physical ports and speeds ranging from 10GbE to 400GbE. Spectrum switches offer groundbreaking performance of up to 400GbE and industry-leading scale for AI, cloud, and enterprise, with a choice of network operating systems, including NVIDIA Cumulus Linux™, SONiC, Onyx, and DENT/SwitchDev.

These switches can be deployed in Layer 2 and Layer 3 cloud designs, in overlay-based virtualized networks, or as part of high-performance, mission-critical Ethernet storage fabrics.

BlueField DPUs

Software-Defined, Hardware-Accelerated Data Center Infrastructure-on-a-Chip

The NVIDIA BlueField Data Processing Unit (DPU) combines powerful computing, high-speed networking, and extensive programmability to deliver software-defined, hardware-accelerated solutions for the most demanding workloads. By offloading, accelerating, and isolating a broad range of advanced networking, storage, and security services, BlueField DPUs provide a secure and accelerated infrastructure. From accelerated AI computing to cloud-native supercomputing, BlueField redefines what’s possible.

The NVIDIA BlueField-3 DPU is the world's most advanced data center infrastructure on a chip, providing software-defined, hardware-accelerated networking, storage, security, and management services at 400Gb/s. BlueField-3 offloads, accelerates, and isolates data center infrastructure from business applications, transforming traditional computing environments into efficient, high-performance, zero-trust data centers, from cloud to core to edge. Providing industry-standard APIs and an optimized developer experience, the NVIDIA DOCA SDK preserves investments in DPU-accelerated applications and services by providing a future-proof runtime environment as DPU generations continue to evolve.

Paired with NVIDIA GPUs, BlueField-3 enables modern, cloud-native data center platforms for the age of AI.

ConnectX SmartNICs

Accelerated Networking and Security for the Most Advanced HPC and AI Workloads

NVIDIA ConnectX® InfiniBand and Ethernet adapters provide ultra-low latency, extreme throughput, and innovative In-Network Computing engines, such as MPI Tag Matching and All-to-All hardware offloads, delivering the acceleration, scalability, and feature-rich technology needed for today's and tomorrow's workloads.
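
As an illustration of what the MPI Tag Matching engine offloads, the hedged sketch below posts several tagged, nonblocking receives and lets the sender deliver messages out of order. Matching each arriving message to the correct posted receive is the work the adapter hardware can take over from the CPU; the example itself is plain MPI and assumes no NVIDIA-specific API.

    /* Tagged nonblocking receives: run with at least two ranks. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        enum { N = 4 };
        if (rank == 0) {
            int data[N];
            MPI_Request reqs[N];
            /* Post receives for several distinct tags up front. */
            for (int tag = 0; tag < N; tag++)
                MPI_Irecv(&data[tag], 1, MPI_INT, 1, tag, MPI_COMM_WORLD, &reqs[tag]);
            MPI_Waitall(N, reqs, MPI_STATUSES_IGNORE);
            for (int tag = 0; tag < N; tag++)
                printf("tag %d -> %d\n", tag, data[tag]);
        } else if (rank == 1) {
            /* Send the messages out of order; the receiver's tag matching
               (in software, or offloaded to the NIC) sorts them out. */
            for (int tag = N - 1; tag >= 0; tag--) {
                int value = tag * 10;
                MPI_Send(&value, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
            }
        }

        MPI_Finalize();
        return 0;
    }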

The NVIDIA ConnectX-7 SmartNIC provides hardware-accelerated networking, storage, security, and manageability services at data center scale for cloud, artificial intelligence, and enterprise workloads. ConnectX-7 empowers agile, high-performance networking solutions with features such as Accelerated Switching and Packet Processing (ASAP2), advanced RoCE, GPUDirect Storage, and in-line hardware acceleration for TLS/IPsec/MACsec encryption and decryption. Providing up to four ports of connectivity and 400Gb/s of throughput, ConnectX-7 enables organizations to meet their current and future networking needs in both high-bandwidth and high-density environments.
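
For a sense of how software sees these adapters, the short sketch below uses the standard libibverbs API (a generic RDMA interface, not a ConnectX-specific one) to enumerate RDMA-capable devices and print a few of their reported limits; the attribute fields shown are generic verbs fields, and the actual values depend on the installed adapter and driver.

    /* Enumerate RDMA devices and query basic capabilities via libibverbs.
       Build with: gcc query.c -libverbs (assumes rdma-core is installed). */
    #include <stdio.h>
    #include <infiniband/verbs.h>

    int main(void) {
        int num = 0;
        struct ibv_device **devs = ibv_get_device_list(&num);
        if (!devs) {
            perror("ibv_get_device_list");
            return 1;
        }
        for (int i = 0; i < num; i++) {
            struct ibv_context *ctx = ibv_open_device(devs[i]);
            if (!ctx)
                continue;
            struct ibv_device_attr attr;
            if (ibv_query_device(ctx, &attr) == 0)
                printf("%s: max QPs=%d, max CQs=%d, max MR size=%llu\n",
                       ibv_get_device_name(devs[i]),
                       attr.max_qp, attr.max_cq,
                       (unsigned long long)attr.max_mr_size);
            ibv_close_device(ctx);
        }
        ibv_free_device_list(devs);
        return 0;
    }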

Accelerated Data Science and ML/AI 

ConnectX-7 features a range of in-hardware acceleration engines for data science, machine learning, and artificial intelligence. GPUDirect RDMA (GDR) is a prominent feature that unlocks high-throughput, low-latency network connectivity in data-center-scale computing clusters. GPUDirect RDMA allows efficient, zero-copy data transfer between GPUs by leveraging the hardware engines in the ConnectX-7 ASIC.

GPUDirect Storage (GDS) provides a direct data path between local or remote storage, such as NVMe or NVMe-oF, and GPU memory. The role of ConnectX-7 is to enable this in a distributed environment, where the GPU and the storage media are not hosted in the same enclosure. ConnectX-7 GDS provides higher bandwidth, lower latency, and increased capacity between storage and GPUs. This becomes especially important as dataset sizes no longer fit into system memory and data I/O to the GPUs grows to be the defining bottleneck in processing time. Enabling a direct path can reduce, if not totally alleviate, this bottleneck for scale-out AI and data science.
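
To make the data path concrete, the sketch below hands a GPU-resident buffer directly to MPI send and receive calls. It assumes a CUDA-aware MPI library built with GPUDirect RDMA support (an assumption about the cluster software stack, not something shown in this text); with that in place, the adapter can move data to and from GPU memory without a staging copy through host memory.

    /* GPU-to-GPU transfer of a device buffer; run with two ranks on a
       CUDA-aware MPI (GPUDirect RDMA assumed to be enabled). */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const size_t count = 1 << 20;                 /* 1M floats */
        float *d_buf = NULL;
        cudaMalloc((void **)&d_buf, count * sizeof(float));

        if (rank == 0) {
            cudaMemset(d_buf, 0, count * sizeof(float));
            /* The device pointer is passed straight to MPI; with GPUDirect
               RDMA the adapter reads it from GPU memory directly. */
            MPI_Send(d_buf, (int)count, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(d_buf, (int)count, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }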