Nvidia


2023-08-09

AI GPU Bottleneck Explained: Causes and Prospects for Resolution

Charlie Boyle, Vice President of NVIDIA’s DGX Systems, recently addressed the issue of limited GPU production at the company.

Boyle clarified that the current GPU shortage is not a result of NVIDIA misjudging demand or constraints in Taiwan Semiconductor Manufacturing Company’s (TSMC) wafer production. The primary bottleneck for GPUs lies in the packaging process.

It’s worth noting that the NVIDIA A100 and H100 GPUs are currently manufactured by TSMC using their advanced CoWoS (Chip-on-Wafer-on-Substrate) packaging technology. TSMC has indicated that it may take up to a year and a half, including the completion of additional wafer fabs and expansion of existing facilities, to normalize the backlog of packaging orders.

Furthermore, due to the significant strain on TSMC’s CoWoS capacity, there have been reports of overflow of NVIDIA GPU packaging orders to other manufacturers.

Sources familiar with the matter have revealed that NVIDIA is in discussions with potential alternative suppliers, including Samsung, as secondary suppliers for the 2.5D packaging of NVIDIA’s A100 and H100 GPUs. Other potential suppliers include Amkor and the Siliconware Precision Industries Co., Ltd. (SPIL), a subsidiary of ASE Technology Holding.

In December 2022, Samsung established its Advanced Packaging (AVP) division to seize opportunities in high-end packaging and testing. Sources suggest that if NVIDIA approves of Samsung’s 2.5D packaging process yield, a portion of AI GPU packaging orders may be placed with Samsung.

TrendForce’s research in June this year indicated that driven by strong demand for high-end AI chips and High-Bandwidth Memory (HBM), TSMC’s CoWoS monthly capacity could reach 12,000 units by the end of 2023. Particularly, demand from NVIDIA for A100 and H100 GPUs in AI servers has led to nearly a 50% increase in CoWoS capacity compared to the beginning of the year. Coupled with the growth in demand for high-end AI chips from companies like AMD and Google, the second half of the year is expected to witness tighter CoWoS capacity. This robust demand is projected to continue into 2024, with advanced packaging capacity potentially growing by 30-40% if the necessary equipment is in place.

(Photo credit: NVIDIA)

2023-08-08

[News] US Tech Giants Unite for AI Server Domination, Boosting Taiwan Supply Chain

According to the news from Commercial Times, in a recent press conference, the four major American cloud service providers (CSPs) collectively expressed their intention to expand their investment in AI application services. Simultaneously, they are continuing to enhance their cloud infrastructure. Apple has also initiated its foray into AI development, and both Intel and AMD have emphasized the robust demand for AI servers. These developments are expected to provide a significant boost to the post-market prospects of Taiwan’s AI server supply chain.

Industry insiders have highlighted the ongoing growth of the AI spillover effect, benefiting various sectors ranging from GPU modules, substrates, cooling systems, power supplies, chassis, and rails, to PCB manufacturers.

The American CSP players, including Microsoft, Google, Meta, and Amazon, which recently released their financial reports, have demonstrated growth in their cloud computing and AI-related service segments in their latest quarterly performance reports. Microsoft, Google, and Amazon are particularly competitive in the cloud services arena, and all have expressed optimistic outlooks for future operations.

The direct beneficiaries among Taiwan’s cloud data center suppliers are those in Tier 1, who are poised to reap positive effects on their average selling prices (ASP) and gross margins, driven by the strong demand for AI servers from these CSP giants in the latter half of the year.

Among them, the ODM manufacturers with over six years of collaboration with NVIDIA in multi-GPU architecture AI high-performance computing/cloud computing, including Quanta, Wistron, Wistron, Inventec, Foxconn, and Gigabyte, are expected to see operational benefits further reflected in the latter half of the year. Foxconn and Inventec are the main suppliers of GPU modules and GPU substrates, respectively, and are likely to witness noticeable shipment growth starting in the third quarter.

Furthermore, AI servers not only incorporate multiple GPU modules but also exhibit improvements in aspects such as chassis height, weight, and thermal design power (TDP) compared to standard servers. As a result, cooling solution providers like Asia Vital Components, Auras Technology, and SUNON; power supply companies such as Delta Electronics and Lite-On Technology; chassis manufacturers Chenbro; rail industry players like King Slide, and PCB/CCL manufacturers such as EMC, GCE are also poised to benefit from the increasing demand for AI servers.

(Source: https://ctee.com.tw/news/tech/915830.html)

2023-07-31

High-Tech PCB Manufacturers Poised to Gain from Remarkable Increase in AI Server PCB Revenue

Looking at the impact of AI server development on the PCB industry, mainstream AI servers, compared to general servers, incorporate 4 to 8 GPUs. Due to the need for high-frequency and high-speed data transmission, the number of PCB layers increases, and there’s an upgrade in the adoption of CCL grade as well. This surge in GPU integration drives the AI server PCB output value to surpass that of general servers by several times. However, this advancement also brings about higher technological barriers, presenting an opportunity for high-tech PCB manufacturers to benefit.

TrendForce’s perspective: 

  • The increased value of AI server PCBs primarily comes from GPU boards.

Taking the NVIDIA DGX A100 as an example, its PCB can be divided into CPU boards, GPU boards, and accessory boards. The overall value of the PCB is about 5 to 6 times higher than that of a general server, with approximately 94% of the incremental value attributed to the GPU boards. This is mainly due to the fact that general servers typically do not include GPUs, while the NVIDIA DGX A100 is equipped with 8 GPUs.

Further analysis reveals that CPU boards, which consist of CPU boards, CPU mainboards, and functional accessory boards, make up about 20% of the overall AI server PCB value. On the other hand, GPU boards, including GPU boards, NV Switch, OAM (OCP Accelerator Module), and UBB (Unit Baseboard), account for around 79% of the total AI server PCB value. Accessory boards, composed of components such as power supplies, HDD, and cooling systems, contribute to only about 1% of the overall AI server PCB value.

  • The technological barriers of AI servers are rising, leading to a decrease in the number of suppliers.

Since AI servers require multiple card interconnections with more extensive and denser wiring compared to general servers, and AI GPUs have more pins and an increased number of memory chips, GPU board assemblies may reach 20 layers or more. With the increase in the number of layers, the yield rate decreases.

Additionally, due to the demand for high-frequency and high-speed transmission, CCL materials have evolved from Low Loss grade to Ultra Low Loss grade. As the technological barriers rise, the number of manufacturers capable of entering the AI server supply chain also decreases.

Currently, the suppliers for CPU boards in AI servers include Ibiden, AT&S, Shinko, and Unimicron, while the mainboard PCB suppliers consist of GCE and Tripod. For GPU boards, Ibiden serves as the supplier, and for OAM PCBs, Unimicron and Zhending are the suppliers, with GCE, ACCL, and Tripod currently undergoing certification. The CCL suppliers include EMC. For UBB PCBs, the suppliers are GCE, WUS, and ACCL, with TUC and Panasonic being the CCL suppliers.

Regarding ABF boards, Taiwanese manufacturers have not yet obtained orders for NVIDIA AI GPUs. The main reason for this is the limited production volume of NVIDIA AI GPUs, with an estimated output of only about 1.5 million units in 2023. Additionally, Ibiden’s yield rate for ABF boards with 16 layers or more is approximately 10% to 20% higher than that of Taiwanese manufacturers. However, with TSMC’s continuous expansion of CoWoS capacity, it is expected that the production volume of NVIDIA AI GPUs will reach over 2.7 million units in 2024, and Taiwanese ABF board manufacturers are likely to gain a low single-digit percentage market share.

(Photo credit: Google)

2023-07-06

ASE, Amkor, UMC and Samsung Getting a Slice of the CoWoS Market from AI Chips, Challenging TSMC

AI Chips and High-Performance Computing (HPC) have been continuously shaking up the entire supply chain, with CoWoS packaging technology being the latest area to experience the tremors.

In the previous piece, “HBM and 2.5D Packaging: the Essential Backbone Behind AI Server,” we discovered that the leading AI chip players, Nvidia and AMD, have been dedicated users of TSMC’s CoWoS technology. Much of the groundbreaking tech used in their flagship product series – such as Nvidia’s A100 and H100, and AMD’s Instinct MI250X and MI300 – have their roots in TSMC’s CoWoS tech.

However, with AI’s exponential growth, chip demand from not just Nvidia and AMD has skyrocketed, but other giants like Google and Amazon are also catching up in the AI field, bringing an onslaught of chip demand. The surge of orders is already testing the limits of TSMC’s CoWoS capacity. While TSMC is planning to increase its production in the latter half of 2023, there’s a snag – the lead time of the packaging equipment is proving to be a bottleneck, severely curtailing the pace of this necessary capacity expansion.

Nvidia Shakes the foundation of the CoWoS Supply Chain

In these times of booming demand, maintaining a stable supply is viewed as the primary goal for chipmakers, including Nvidia. While TSMC is struggling to keep up with customer needs, other chipmakers are starting to tweak their outsourcing strategies, moving towards a more diversified supply chain model. This shift is now opening opportunities for other foundries and OSATs.

Interestingly, in this reshuffling of the supply chain, UMC (United Microelectronics Corporation) is reportedly becoming one of Nvidia’s key partners in the interposer sector for the first time, with plans for capacity expansion on the horizon.

From a technical viewpoint, interposer has always been the cornerstone of TSMC’s CoWoS process and technology progression. As the interposer area enlarges, it allows for more memory stack particles and core components to be integrated. This is crucial for increasingly complex multi-chip designs, underscoring Nvidia’s intention to support UMC as a backup resource to safeguard supply continuity.

Meanwhile, as Nvidia secures production capacity, it is observed that the two leading OSAT companies, Amkor and SPIL (as part of ASE), are establishing themselves in the Chip-on-Wafer (CoW) and Wafer-on-Substrate (WoS) processes.

The ASE Group is no stranger to the 2.5D packaging arena. It unveiled its proprietary 2.5D packaging tech as early as 2017, a technology capable of integrating core computational elements and High Bandwidth Memory (HBM) onto the silicon interposer. This approach was once utilized in AMD’s MI200 series server GPU. Also under the ASE Group umbrella, SPIL boasts unique Fan-Out Embedded Bridge (FO-EB) technology. Bypassing silicon interposers, the platform leverages silicon bridges and redistribution layers (RDL) for integration, which provides ASE another competitive edge.

Could Samsung’s Turnkey Service Break New Ground?

In the shifting landscape of the supply chain, the Samsung Device Solutions division’s turnkey service, spanning from foundry operations to Advanced Package (AVP), stands out as an emerging player that can’t be ignored.

After its 2018 split, Samsung Foundry started taking orders beyond System LSI for business stability. In 2023, the AVP department, initially serving Samsung’s memory and foundry businesses, has also expanded its reach to external clients.

Our research indicates that Samsung’s AVP division is making aggressive strides into the AI field. Currently in active talks with key customers in the U.S. and China, Samsung is positioning its foundry-to-packaging turnkey solutions and standalone advanced packaging processes as viable, mature options.

In terms of technology roadmap, Samsung has invested significantly in 2.5D packaging R&D. Mirroring TSMC, the company launched two 2.5D packaging technologies in 2021: the I-Cube4, capable of integrating four HBM stacks and one core component onto a silicon interposer, and the H-Cube, designed to extend packaging area by integrating HDI PCB beneath the ABF substrate, primarily for designs incorporating six or more HBM stack particles.

Besides, recognizing Japan’s dominance in packaging materials and technologies, Samsung recently launched a R&D center there to swiftly upscale its AVP business.

Given all these circumstances, it seems to be only a matter of time before Samsung carves out its own significant share in the AI chip market. Despite TSMC’s industry dominance and pivotal role in AI chip advancements, the rising demand for advanced packaging is set to undeniably reshape supply chain dynamics and the future of the semiconductor industry.

(Source: Nvidia)

2023-06-14

AI Servers: The Savior of the Supply Chain, Examining Key Industries

NVIDIA’s robust financial report reveals the true impact of AI on the technology industry, particularly in the AI server supply chain.

  • Page 43
  • 46 page(s)
  • 230 result(s)

Get in touch with us