On November 13, NVIDIA unveiled the HGX H200 AI computing platform. Built on the Hopper architecture, it is equipped with the H200 Tensor Core GPU and high-end memory to handle the vast amounts of data generated by AI and high-performance computing.
This marks an upgrade from the previous generation H100, with a 1.4x increase in memory bandwidth and a 1.8x increase in capacity, enhancing its capabilities for processing intensive generative AI tasks.
The memory changes in H200 represent a significant upgrade, as it is the first GPU to adopt HBM3e. This results in a notable increase in GPU memory bandwidth, from 3.35TB per second in H100 to 4.8TB per second.
The total memory capacity also sees a substantial boost, rising from 80GB in H100 to 141GB. When compared to H100, these enhancements nearly double the inference speed for the Llama 2 model.
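The 1.4x and 1.8x figures follow directly from the bandwidth and capacity numbers quoted above. The short Python sketch below recomputes them; the dictionary and function names are illustrative only and do not come from NVIDIA.

```python
# Figures quoted in the article for H100 and H200.
H100 = {"bandwidth_tb_s": 3.35, "capacity_gb": 80}
H200 = {"bandwidth_tb_s": 4.8, "capacity_gb": 141}


def uplift(new: float, old: float) -> float:
    """Return new/old, i.e. how many times larger `new` is than `old`."""
    return new / old


bandwidth_ratio = uplift(H200["bandwidth_tb_s"], H100["bandwidth_tb_s"])
capacity_ratio = uplift(H200["capacity_gb"], H100["capacity_gb"])

print(f"Bandwidth: {bandwidth_ratio:.2f}x")  # Bandwidth: 1.43x
print(f"Capacity:  {capacity_ratio:.2f}x")   # Capacity:  1.76x
```

Rounded to one decimal place, these match the 1.4x bandwidth and 1.8x capacity increases cited by NVIDIA.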
H200 is designed to be compatible with systems that already support H100, according to NVIDIA. The company states that cloud service providers can seamlessly integrate H200 into their product portfolios without the need for any modifications.
This implies that NVIDIA’s server manufacturing partners, including ASRock, ASUS, Dell, Eviden, GIGABYTE, HPE, Ingrasys, Lenovo, Quanta Cloud, Supermicro, Wistron, and Wiwynn, have the flexibility to replace existing processors with H200.
The initial shipments of H200 are expected in the second quarter of 2024, with cloud service giants such as Amazon, Google, Microsoft, and Oracle anticipated to be among the first to adopt H200.
What is HBM?
“The integration of faster and more extensive HBM memory serves to accelerate performance across computationally demanding tasks including generative AI models and [high-performance computing] applications while optimizing GPU utilization and efficiency,” said Ian Buck, the Vice President of High-Performance Computing Products at NVIDIA.
HBM (High Bandwidth Memory) refers to memory built by stacking DRAM dies vertically, like building blocks, and encapsulating them through advanced packaging. This approach increases density while maintaining or even reducing the overall footprint, leading to improved storage efficiency.
TrendForce reported that the HBM market’s dominant product for 2023 is HBM2e, employed by the NVIDIA A100/A800, AMD MI200, and most CSPs’ (Cloud Service Providers) self-developed accelerator chips.
As demand for AI accelerator chips evolves, mainstream demand in 2023 is projected to begin shifting from HBM2e to HBM3, with the two accounting for an estimated 50% and 39% of demand, respectively.
As production of accelerator chips using HBM3 gradually ramps up, market demand in 2024 is expected to shift decisively to HBM3, which will overtake HBM2e with an estimated share of around 60%.
Since manufacturers plan to introduce new HBM3e products in 2024, HBM3 and HBM3e are expected to become mainstream in the market next year.
TrendForce clarifies that what the market currently calls HBM3 should be subdivided into two categories based on speed. One category is HBM3 running at 5.6 to 6.4 Gbps, while the other is the 8 Gbps HBM3e, which also goes by several other names, including HBM3P, HBM3A, HBM3+, and HBM3 Gen2.
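To put these per-pin speeds in context, the sketch below converts them into approximate peak bandwidth per HBM stack. It assumes a 1024-bit interface per stack, which is the width defined for HBM3 by JEDEC but is not stated in the TrendForce report; the constant and function names are hypothetical.

```python
# Assumption (not from the article): each HBM3/HBM3e stack exposes a
# 1024-bit data interface, as defined for HBM3 by JEDEC.
INTERFACE_WIDTH_BITS = 1024
BITS_PER_BYTE = 8


def stack_bandwidth_gb_s(pin_speed_gbps: float) -> float:
    """Peak bandwidth of one stack in GB/s for a given per-pin data rate."""
    return pin_speed_gbps * INTERFACE_WIDTH_BITS / BITS_PER_BYTE


# Per-pin speeds quoted in the article for HBM3 and HBM3e.
for label, speed in [("HBM3 (low)", 5.6), ("HBM3 (high)", 6.4), ("HBM3e", 8.0)]:
    print(f"{label}: {stack_bandwidth_gb_s(speed):.1f} GB/s per stack")
```

Under that assumption, moving from 6.4 Gbps HBM3 to 8 Gbps HBM3e raises peak per-stack bandwidth by 25%.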
HBM3e will be built from 24Gb mono dies, so in an 8-layer (8Hi) stack the capacity of a single HBM3e stack will rise to 24GB.
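The 24GB figure is a unit conversion: eight 24-gigabit dies total 192 gigabits, which is 24 gigabytes. A minimal Python sketch of that arithmetic follows; the function name is illustrative.

```python
BITS_PER_BYTE = 8


def stack_capacity_gb(die_density_gbit: int, layers: int) -> float:
    """Capacity of one HBM stack in gigabytes (GB).

    die_density_gbit: capacity of a single DRAM die in gigabits (Gb)
    layers: number of dies in the stack (8 for an 8Hi stack)
    """
    return die_density_gbit * layers / BITS_PER_BYTE


# 24Gb mono dies in an 8Hi stack, as described in the article.
print(stack_capacity_gb(24, 8))  # 24.0
```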
According to a previous TrendForce news release, the three major manufacturers currently leading the HBM competition, SK hynix, Samsung, and Micron, have made the following progress.
SK hynix and Samsung began their efforts with HBM3, which is used in NVIDIA’s H100/H800 and AMD’s MI300 series products. These two manufacturers were previously expected to sample HBM3e in Q1 2024. Meanwhile, Micron chose to skip HBM3 and develop HBM3e directly.
However, according to the latest TrendForce survey, Micron had already provided HBM3e samples to NVIDIA for verification by the end of July this year, while SK hynix did so in mid-August and Samsung in early October.
(Image: Nvidia)