Press Releases
High Bandwidth Memory (HBM) is emerging as the preferred solution for overcoming the memory transfer speed restrictions that the limited bandwidth of DDR SDRAM imposes on high-speed computation. HBM is recognized for its revolutionary transmission efficiency and plays a pivotal role in allowing core computational components to operate at their maximum capacity. Top-tier AI server GPUs have set a new industry standard by primarily using HBM. TrendForce forecasts that global demand for HBM will grow by almost 60% year over year in 2023, reaching 290 million GB, with a further 30% growth in 2024.
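The growth figures above imply a 2022 baseline and a 2024 level that the text does not state outright. The following is a minimal Python sketch of that arithmetic, assuming "almost 60%" is treated as exactly 60% and that the 30% growth for 2024 applies to the 2023 total. The function names are illustrative only, and the derived values are rough implications rather than TrendForce figures.

```python
def implied_base(current: float, growth_rate: float) -> float:
    """Back out the prior-year value from this year's value and its YoY growth."""
    return current / (1.0 + growth_rate)


def project(current: float, growth_rate: float) -> float:
    """Project next year's value from this year's value and next year's YoY growth."""
    return current * (1.0 + growth_rate)


# Figures quoted in the text (assumption: "almost 60%" rounded to 60%).
HBM_2023_MILLION_GB = 290.0
GROWTH_2023 = 0.60
GROWTH_2024 = 0.30

demand_2022 = implied_base(HBM_2023_MILLION_GB, GROWTH_2023)
demand_2024 = project(HBM_2023_MILLION_GB, GROWTH_2024)

print(f"Implied 2022 demand: about {demand_2022:.0f} million GB")
print(f"Projected 2024 demand: about {demand_2024:.0f} million GB")
```

Under these assumptions, the 2022 baseline works out to roughly 181 million GB and the 2024 level to roughly 377 million GB.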
TrendForce forecasts that by 2025, assuming five large-scale AIGC products equivalent to ChatGPT, 25 mid-size AIGC products equivalent to Midjourney, and 80 small AIGC products, the minimum computing resources required globally could range from 145,600 to 233,700 Nvidia A100 GPUs. Emerging technologies such as supercomputers, 8K video streaming, and AR/VR are also expected to increase the workload on cloud computing systems as demand for high-speed computing escalates.
HBM is unequivocally a superior solution for building high-speed computing platforms, thanks to its higher bandwidth and lower energy consumption compared to DDR SDRAM. The slow pace of DDR improvement is clear when comparing DDR4 SDRAM and DDR5 SDRAM, released in 2014 and 2020 respectively: their bandwidths differ by only a factor of two. Regardless of whether DDR5 or the future DDR6 is used, the quest for higher transmission performance will inevitably lead to an increase in power consumption, which could adversely affect system performance. Taking HBM3 and DDR5 as examples, the former’s bandwidth is 15 times that of the latter and can be further enhanced by adding more stacked chips. Furthermore, HBM can replace a portion of GDDR SDRAM or DDR SDRAM, thus managing power consumption more effectively.
TrendForce concludes that the current driving forces behind the increasing demand are AI servers equipped with Nvidia A100 and H100 or AMD MI300 chips, along with large CSPs such as Google and AWS, which are developing their own ASICs. It is estimated that the shipment volume of AI servers, including those equipped with GPUs, FPGAs, and ASICs, will reach nearly 1.2 million units in 2023, marking an annual growth rate of almost 38%. TrendForce also anticipates a concurrent surge in the shipment volume of AI chips, with growth potentially exceeding 50%.
In-Depth Analyses
With the advancements in AIGC models such as ChatGPT and Midjourney, we are witnessing the rise of more super-sized language models, opening up new possibilities for High-Performance Computing (HPC) platforms.
According to TrendForce, by 2025, the global demand for computational resources in the AIGC industry – assuming 5 super-sized AIGC products equivalent to ChatGPT, 25 medium-sized AIGC products equivalent to Midjourney, and 80 small-sized AIGC products – would be approximately equivalent to 145,600 – 233,700 units of NVIDIA A100 GPUs. This highlights the significant impact of AIGC on computational requirements.
Additionally, the rapid development of supercomputing, 8K video streaming, and AR/VR will also lead to an increased workload on cloud computing systems. This calls for highly efficient computing platforms that can handle parallel processing of vast amounts of data.
However, a critical concern is whether hardware advancements can keep pace with the demands of these emerging applications.
HBM: The Fast Lane to High-Performance Computing
While the performance of core computing components like CPUs, GPUs, and ASICs has improved due to semiconductor advancements, their overall efficiency can be hindered by the limited bandwidth of DDR SDRAM.
For example, from 2014 to 2020, CPU performance increased over threefold, while DDR SDRAM bandwidth only doubled. Additionally, the pursuit of higher transmission performance through technologies like DDR5 or future DDR6 increases power consumption, posing long-term impacts on computing systems’ efficiency.
Recognizing this challenge, major chip manufacturers quickly turned their attention to new solutions. In 2013, AMD and SK Hynix made separate debuts with their pioneering products featuring High Bandwidth Memory (HBM), a revolutionary technology that allows for stacking on GPUs and effectively replacing GDDR SDRAM. It was recognized as an industry standard by JEDEC the same year.
In 2015, AMD introduced Fiji, the first high-end consumer GPU with integrated HBM. NVIDIA followed in 2016 with the P100, the first AI server GPU with HBM, marking the beginning of a new era of HBM integration in server GPUs.
HBM’s rise as the mainstream technology sought after by key players can be attributed to its exceptional bandwidth and lower power consumption when compared to DDR SDRAM. For example, HBM3 delivers 15 times the bandwidth of DDR5 and can further increase the total bandwidth by adding more stacked dies. Additionally, at system level, HBM can effectively manage power consumption by replacing a portion of GDDR SDRAM or DDR SDRAM.
As computing power demands increase, HBM’s exceptional transmission efficiency unlocks the full potential of core computing components. Integrating HBM into server GPUs has become a prominent trend, propelling the global HBM market to grow at a compound annual rate of 40-45% from 2023 to 2025, according to TrendForce.
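To make the compound annual growth rate concrete, here is a short Python sketch that converts the quoted 40-45% range into a cumulative multiplier. It assumes the period "from 2023 to 2025" spans two compounding years with 2023 as the base year; that reading is an assumption, as is the helper name.

```python
def cagr_multiplier(rate: float, years: int) -> float:
    """Cumulative growth multiplier after compounding `rate` for `years` years."""
    return (1.0 + rate) ** years


# Assumption: 2023 is the base year, so 2023 -> 2025 is two compounding periods.
YEARS = 2025 - 2023

low = cagr_multiplier(0.40, YEARS)
high = cagr_multiplier(0.45, YEARS)

print(f"Implied 2025 market size: {low:.2f}x to {high:.2f}x the 2023 level")
```

Read this way, the forecast implies that the HBM market would roughly double between 2023 and 2025.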
The Crucial Role of 2.5D Packaging
In the midst of this trend, the crucial role of 2.5D packaging technology in enabling such integration cannot be overlooked.
TSMC has been laying the groundwork for 2.5D packaging technology with CoWoS (Chip on Wafer on Substrate) since 2011. This technology enables the integration of logic chips on the same silicon interposer. The third-generation CoWoS technology, introduced in 2016, allowed the integration of logic chips with HBM and was adopted by NVIDIA for its P100 GPU.
With development in CoWoS technology, the interposer area has expanded, accommodating more stacked HBM dies. The 5th-generation CoWoS, launched in 2021, can integrate 8 HBM stacks and 2 core computing components. The upcoming 6th-generation CoWoS, expected in 2023, will support up to 12 HBM stacks, meeting the requirements of HBM3.
TSMC’s CoWoS platform has become the foundation for high-performance computing platforms. While other semiconductor leaders like Samsung, Intel, and ASE are also venturing into 2.5D packaging technology with HBM integration, we think TSMC is poised to be the biggest winner in this emerging field, considering its technological expertise, production capacity, and order capabilities.
In conclusion, the remarkable transmission efficiency of HBM, facilitated by the advancements in 2.5D packaging technologies, creates an exciting prospect for the seamless convergence of these innovations. The future holds immense potential for enhanced computing experiences.
Press Releases
According to TrendForce’s latest report on the server industry, emerging applications in recent years have accelerated the pace of AI and HPC development, and the complexity of machine learning models and of inference workloads, which involve increasingly sophisticated calculations, has grown correspondingly, resulting in more data to be processed. Because users are confronted with an ever-growing volume of data along with the constraints of existing hardware, they must make tradeoffs among performance, memory capacity, latency, and cost. HBM (High Bandwidth Memory) and CXL (Compute Express Link) have thus emerged in response to this conundrum. In terms of functionality, HBM is a new type of DRAM that addresses more diverse and complex computational needs via its high I/O speeds, whereas CXL is an interconnect standard that allows different processors, or xPUs, to more easily share the same memory resources.
HBM breaks through bandwidth limitations of traditional DRAM solutions through vertical stacking of DRAM dies
Memory suppliers developed HBM in order to be free from the previous bandwidth constraints posed by traditional memory solutions. Regarding memory architecture, HBM consists of a base logic die with DRAM dies vertically stacked on top of the logic die. The 3D-stacked DRAM dies are interconnected with TSV and microbumps, thereby enabling HBM’s high-bandwidth design. The mainstream HBM memory stacks involve four or eight DRAM die layers, which are referred to as “4-hi” or “8-hi”, respectively. Notably, the latest HBM product currently in mass production is HBM2e. This generation of HBM contains four or eight layers of 16Gb DRAM dies, resulting in a memory capacity of 8GB or 16GB per single HBM stack, respectively, with a bandwidth of 410-460GB/s. Samples of the next generation of HBM products, named HBM3, have already been submitted to relevant organizations for validation, and these products will likely enter mass production in 2022.
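The HBM2e capacity and bandwidth figures above follow from simple arithmetic, sketched below in Python. The sketch assumes a 1024-bit interface per stack and per-pin data rates of 3.2 to 3.6 Gb/s; neither value is stated in the text, and both are chosen here because they reproduce the quoted 410-460GB/s range. The function names are illustrative.

```python
# Assumption (not stated in the text): each HBM2e stack exposes a 1024-bit interface.
INTERFACE_WIDTH_BITS = 1024
BITS_PER_BYTE = 8


def stack_capacity_gbytes(layers: int, die_density_gbit: int = 16) -> float:
    """Capacity of one stack in GB: number of DRAM dies times die density (Gb), in bytes."""
    return layers * die_density_gbit / BITS_PER_BYTE


def stack_bandwidth_gbytes_per_s(pin_rate_gbps: float,
                                 width_bits: int = INTERFACE_WIDTH_BITS) -> float:
    """Peak bandwidth of one stack in GB/s: per-pin rate (Gb/s) times interface width."""
    return pin_rate_gbps * width_bits / BITS_PER_BYTE


for layers in (4, 8):
    print(f"{layers}-hi stack: {stack_capacity_gbytes(layers):.0f} GB")

# Assumed per-pin data rates (Gb/s) that match the quoted bandwidth range.
for pin_rate in (3.2, 3.6):
    bandwidth = stack_bandwidth_gbytes_per_s(pin_rate)
    print(f"{pin_rate} Gb/s per pin: {bandwidth:.1f} GB/s per stack")
```

The same bandwidth formula explains why adding stacks raises total bandwidth: each additional stack contributes its own interface.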
TrendForce’s investigations indicate that HBM comprises less than 1% of total DRAM bit demand for 2021 primarily because of two reasons. First, the vast majority of consumer applications have yet to adopt HBM due to cost considerations. Second, the server industry allocates less than 1% of its hardware to AI applications; more specifically, servers that are equipped with AI accelerators account for less than 1% of all servers currently in use, not to mention the fact that most AI accelerators still use GDDR5(x) and GDDR6 memories, as opposed to HBM, to support their data processing needs.
Although HBM currently remains in the developmental phase, as applications become increasingly reliant on AI (more precise AI needs to be supported by more complex models), computing hardware will require the integration of HBM to operate these applications effectively. In particular, FPGAs and ASICs represent the two hardware categories that are most closely related to AI development, with Intel’s Stratix and Agilex-M as well as Xilinx’s Versal HBM being examples of FPGAs with onboard HBM. Regarding ASICs, on the other hand, most CSPs are gradually adopting their own self-designed ASICs, such as Google’s TPU, Tencent’s Enflame DTU, and Baidu’s Kunlun, all of which are equipped with HBM, for AI deployments. In addition, Intel will also release a high-end version of its Sapphire Rapids server CPU equipped with HBM by the end of 2022. Taking these developments into account, TrendForce believes that an increasing number of HBM applications will emerge going forward due to HBM’s critical role in overcoming hardware-related bottlenecks in AI development.
A new memory standard born out of demand from high-speed computing, CXL will be more effective in integrating resources of whole system
Built on the PCIe Gen5 physical interface, CXL is an interconnect standard that provides high-speed, low-latency connections between the CPU and accelerators such as GPUs and FPGAs. It enables memory virtualization so that different devices can share the same memory pool, thereby raising the performance of a whole computer system while reducing its cost. Hence, CXL can effectively deal with the heavy workloads related to AI and HPC applications.
CXL is just one of several interconnection technologies that feature memory sharing. Other examples that are also in the market include NVLink from NVIDIA and Gen-Z from AMD and Xilinx. Their existence is an indication that the major ICT vendors are increasingly attentive to the integration of various resources within a computer system. TrendForce currently believes that CXL will come out on top in the competition mainly because it is introduced and promoted by Intel, which has an enormous advantage with respect to the market share for CPUs. With Intel’s support in the area of processors, CXL advocates and hardware providers that back the standard will be effective in organizing themselves into a supply chain for the related solutions. The major ICT companies that have in turn joined the CXL Consortium include AMD, ARM, NVIDIA, Google, Microsoft, Facebook (Meta), Alibaba, and Dell. All in all, CXL appears to be the most favored among memory protocols.
The consolidation of memory resources among the CPU and other devices can reduce communication latency and boost the computing performance needed for AI and HPC applications. For this reason, Intel will provide CXL support for its next-generation server CPU Sapphire Rapids. Likewise, memory suppliers have also incorporated CXL support into their respective product roadmaps. Samsung has announced that it will be launching CXL-supported DDR5 DRAM modules that will further expand server memory capacity so as to meet the enormous resource demand of AI computing. There is also a chance that CXL support will be extended to NAND Flash solutions in the future, thus benefiting the development of both types of memory products.
Synergy between HBM and CXL will contribute significantly to AI development; their visibility will increase across different applications starting in 2023
TrendForce believes that the market penetration rate of CXL will rise going forward as this interface standard is built into more and more CPUs. Also, the combination of HBM and CXL will be increasingly visible in the future hardware designs of AI servers. In the case of HBM, it will contribute to a further ramp-up of data processing speed by increasing the memory bandwidth of the CPU or the accelerator. As for CXL, it will enable high-speed interconnections among CPU and other devices. By working together, HBM and CXL will raise computing power and thereby expedite the development of AI applications.
The latest advances in memory pooling and sharing will help overcome the current hardware bottlenecks in the designs of different AI models and continue the trend of more sophisticated architectures. TrendForce anticipates that the adoption rate of CXL-supported Sapphire Rapids processors will reach a certain level, and memory suppliers will also have put their HBM3 products and their CXL-supported DRAM and SSD products into mass production. Hence, examples of HBM-CXL synergy in different applications will become increasingly visible from 2023 onward.