Here’s a summary in bullet points of the article “NVIDIA GB300 NVL72: 50x Faster AI with Blackwell Ultra GPUs”:
Overview
- NVIDIA GB300 NVL72 is a fully liquid-cooled, rack-scale AI server platform.
- Integrates 36 Arm-based Grace CPUs and 72 Blackwell Ultra GPUs.
- Built for AI reasoning and test-time inference scaling.
Performance Highlights
- Delivers 50x higher AI reasoning output vs. Hopper-based platforms.
- Achieves:
  - 5x throughput per megawatt
  - 10x throughput per user
- Together these multiply to 50x total AI factory output.
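The headline figure composes from the two quoted gains. A minimal sketch of that arithmetic, assuming (as the article implies) the per-megawatt and per-user factors multiply independently:

```python
# Consistency check on the article's headline claim.
per_megawatt_gain = 5    # 5x throughput per megawatt vs. Hopper
per_user_gain = 10       # 10x throughput per user vs. Hopper

# The article treats total AI factory output as the product of the two.
total_output_gain = per_megawatt_gain * per_user_gain
print(total_output_gain)  # 50
```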
Key Features
- AI Inference Scaling:
  - Blackwell Ultra Tensor Cores deliver:
    - 1.5x more AI compute FLOPS
    - 2x faster attention-layer acceleration vs. standard Blackwell
- Memory:
  - 288 GB of HBM3e per GPU.
  - Supports longer context lengths and larger batch sizes.
- Architecture:
  - Based on the new NVIDIA Blackwell architecture.
- SuperNIC Networking:
  - 2x ConnectX-8 SuperNICs per IO module.
  - Each GPU gets 800 Gb/s of network bandwidth.
  - Supports RDMA over Quantum-X800 InfiniBand or Spectrum-X Ethernet.
- Grace CPUs:
  - Roughly double the energy efficiency of leading server CPUs.
  - High memory bandwidth and strong performance for data-center workloads.
- NVLink (Gen 5):
  - 130 TB/s of aggregate interconnect bandwidth.
  - Enables high-speed GPU-to-GPU communication across the rack.
- Modular Superchip:
  - Each GB300 Superchip combines:
    - 4 Blackwell Ultra GPUs
    - 2 Grace CPUs
    - 4 ConnectX-8 SuperNICs
  - 18 of these superchips make up the full NVL72 system.
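The superchip composition above determines the rack-level totals. A quick sketch of that arithmetic; per-superchip counts and per-GPU bandwidth are from the summary, while the aggregate-bandwidth figure is my own multiplication, not a number from the article:

```python
# Rack-level totals implied by the GB300 superchip composition.
SUPERCHIPS = 18
GPUS_PER_SUPERCHIP = 4
CPUS_PER_SUPERCHIP = 2
NICS_PER_SUPERCHIP = 4
GBPS_PER_GPU = 800  # network bandwidth per GPU, per the summary

gpus = SUPERCHIPS * GPUS_PER_SUPERCHIP  # 72 Blackwell Ultra GPUs
cpus = SUPERCHIPS * CPUS_PER_SUPERCHIP  # 36 Grace CPUs
nics = SUPERCHIPS * NICS_PER_SUPERCHIP  # 72 ConnectX-8 SuperNICs
total_gbps = gpus * GBPS_PER_GPU        # 57,600 Gb/s aggregate (derived)
print(gpus, cpus, nics, total_gbps)
```

The GPU and CPU totals match the 72-GPU / 36-CPU figures quoted in the Overview.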
Availability
- Expected in the second half of 2025 through NVIDIA partners.
Pricing
- Estimated between $3.7 million and $4 million per unit.
Conclusion
- Tailored for hyperscalers and research institutions.
- Designed for demanding AI workloads such as DeepSeek’s R1 model at 1,000 tokens per second.
- Pushes the frontier of AI reasoning performance and scalability.