NVIDIA GB300 NVL72: 50x Faster AI with Blackwell Ultra GPUs



Overview

  • NVIDIA GB300 NVL72 is a fully liquid-cooled, rack-scale AI server platform.

  • Integrates 36 Grace CPUs (Arm-based) and 72 Blackwell Ultra GPUs.

  • Built for AI reasoning and test-time inference scaling.


Performance Highlights

  • Delivers 50x higher AI reasoning output vs. Hopper-based platforms.

  • Achieves:

    • 5x throughput per megawatt

    • 10x throughput per user

    • These two gains multiply (5 × 10) to the 50x total AI factory output.
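The 50x headline is the product of the two per-axis gains listed above. A minimal sanity check of that arithmetic (variable names are illustrative, not NVIDIA terminology):

```python
# NVIDIA's 50x "AI factory output" claim vs. Hopper-based platforms is
# the product of two independent multipliers stated in the article.
throughput_per_megawatt_gain = 5   # 5x tokens/sec per megawatt (energy axis)
throughput_per_user_gain = 10      # 10x tokens/sec per user (latency axis)

total_output_gain = throughput_per_megawatt_gain * throughput_per_user_gain
print(total_output_gain)  # 50, matching the stated 50x total output
```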


Key Features

  • AI Inference Scaling:

    • Blackwell Ultra Tensor Cores deliver:

      • 1.5x more AI compute FLOPS

      • 2x attention-layer acceleration vs. standard Blackwell.

  • Memory:

    • 288 GB of HBM3e per GPU.

    • Supports longer context lengths and larger batch sizes.

  • Architecture:

    • Based on the NVIDIA Blackwell Ultra architecture, an enhanced version of Blackwell.

  • SuperNIC Networking:

    • 2× ConnectX-8 SuperNICs per IO module.

    • Each GPU gets 800 Gb/s network bandwidth.

    • Supports RDMA with Quantum-X800 InfiniBand or Spectrum-X Ethernet.

  • Grace CPUs:

    • Roughly 2x the energy efficiency of leading server CPUs.

    • High bandwidth and modern performance for data center workloads.

  • NVLink (Gen 5):

    • 130 TB/s of aggregate GPU interconnect bandwidth across the rack.

    • Enables high-speed GPU-to-GPU communication.

  • Modular Superchip:

    • Each GB300 Superchip has:

      • 4 GPUs

      • 2 Grace CPUs

      • 4 ConnectX-8 SuperNICs

    • 18 of these superchips make up the full NVL72 system.
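The rack-level totals follow directly from the per-superchip and per-GPU figures above. A small sketch deriving them (constant names are illustrative; the configuration assumed is exactly the one stated in this summary):

```python
# Rack-level totals implied by the per-component specs:
# 18 superchips, each with 4 Blackwell Ultra GPUs and 2 Grace CPUs.
SUPERCHIPS = 18
GPUS_PER_SUPERCHIP = 4
CPUS_PER_SUPERCHIP = 2
HBM3E_PER_GPU_GB = 288    # GB of HBM3e per Blackwell Ultra GPU
NET_PER_GPU_GBPS = 800    # Gb/s of ConnectX-8 bandwidth per GPU

gpus = SUPERCHIPS * GPUS_PER_SUPERCHIP            # 72 GPUs
cpus = SUPERCHIPS * CPUS_PER_SUPERCHIP            # 36 Grace CPUs
hbm_total_tb = gpus * HBM3E_PER_GPU_GB / 1024     # ~20.25 TB of HBM3e rack-wide
net_total_tbps = gpus * NET_PER_GPU_GBPS / 1000   # 57.6 Tb/s aggregate NIC bandwidth

print(gpus, cpus, round(hbm_total_tb, 2), net_total_tbps)
# 72 36 20.25 57.6
```

These totals match the system's advertised 72-GPU / 36-CPU configuration; the memory and network sums are simple multiplications, not figures quoted by NVIDIA.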


Availability

  • Expected release in the second half of 2025 through NVIDIA partners.


Pricing

  • Estimated between $3.7 million and $4 million per rack-scale system.



Conclusion

  • Tailored for hyperscalers and research institutions.

  • Designed to process demanding AI workloads like DeepSeek’s R1 model (1,000 tokens/sec per user).

  • Pushes the frontier of AI reasoning performance and scalability.