AI GPU


2024-10-31

[News] Datacenter GPUs May Have an Astonishingly Short Lifespan of Only 1 to 3 Years

According to a report from Tom’s Hardware, while major tech companies are investing heavily in datacenter GPUs, the lifespan of these GPUs may only be 1 to 3 years, depending on their utilization rates.

The report, citing a general architect at Alphabet, noted that because GPUs run under heavy AI training and inference workloads, they tend to wear out more quickly than other components.

According to the report, in datacenters operated by cloud service providers (CSPs), the utilization rate of GPUs for AI workloads ranges from approximately 60% to 70%.

Citing the same architect at Alphabet, the report indicated that at this utilization rate a GPU typically survives for 1 to 2 years, or 3 years at most. While the report stated that this claim cannot be considered 100% accurate and requires further confirmation, it highlighted that modern datacenter GPUs for AI and HPC applications consume and dissipate 700W of power or more, which puts significant stress on the chips.

One way to extend the life of the GPUs is to reduce the utilization rate, according to the report. However, a lower utilization rate means that the GPUs depreciate more slowly and take longer to return their capital, which isn’t ideal for business. Therefore, the report pointed out that most cloud service providers will run their GPUs at a high utilization rate.

The report also references a study conducted by Meta, which describes training its Llama 3 405B model on a cluster powered by 16,384 NVIDIA H100 80GB GPUs. According to the report, the cluster's model FLOPs utilization (MFU) in that study was about 38% (using BF16). During a 54-day pre-training snapshot, out of 419 unforeseen disruptions, 148 (30.1%) were caused by GPU failures (including NVLink failures) and 72 (17.2%) were due to HBM3 memory failures.

Meta’s results, according to the report, are quite favorable for NVIDIA’s H100 GPUs. If GPUs and their memory fail at Meta’s rate, the annualized failure rate will be about 9%, and over 3 years it will be about 27%. However, as the report pointed out, GPUs will likely fail more frequently after a year of heavy use.
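The roughly 9% and 27% figures can be reproduced from the numbers quoted above. The following is a minimal sketch, assuming that the 148 GPU failures and 72 HBM3 failures each take out one GPU, that the failure rate stays constant over time, and that the 3-year figure is a simple linear multiple of the annual one. The function and variable names are illustrative and do not come from the report or from Meta's study.

```python
# Back-of-the-envelope check of the annualized failure rate.
# Inputs are the figures quoted from Meta's study; the constant-rate,
# linear scaling is an assumption made for illustration.

GPU_FAILURES = 148     # GPU failures, including NVLink failures
HBM3_FAILURES = 72     # HBM3 memory failures
CLUSTER_GPUS = 16_384  # H100 GPUs in the cluster
SNAPSHOT_DAYS = 54     # length of the pre-training snapshot


def annualized_failure_rate(failures: int, gpus: int, days: int) -> float:
    """Scale the failure fraction observed over `days` to a 365-day year."""
    return failures / gpus * (365 / days)


annual_rate = annualized_failure_rate(
    GPU_FAILURES + HBM3_FAILURES, CLUSTER_GPUS, SNAPSHOT_DAYS
)
three_year_rate = 3 * annual_rate  # linear extrapolation, no compounding

print(f"Annualized failure rate: {annual_rate:.1%}")
print(f"Three-year failure rate: {three_year_rate:.1%}")
```

Because the extrapolation is linear, it does not capture the faster wear after the first year of heavy use that the report mentions.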

(Photo credit: NVIDIA)

Please note that this article cites information from Tom’s Hardware and Meta.

2024-08-05

[News] NVIDIA’s Backup Plan? Intel Reportedly Secures Packaging Orders from the AI Giant

As demand for AI GPUs increases, TSMC’s CoWoS advanced packaging capacity is struggling to keep up. According to a report from Commercial Times, NVIDIA has reportedly turned to Intel for advanced packaging solutions.

According to industry sources cited by the same report, TSMC’s CoWoS-S and Intel’s Foveros packaging technologies are similar, allowing clients to turn to Intel and quickly secure the capacity they need.

Despite its current struggles with its transformation, Intel has been gradually developing its foundry services. In addition to clients like Qualcomm and Microsoft, Intel’s advanced packaging has also attracted interest from companies like Cisco and AWS.

Under the IDM 2.0 strategy, Intel has opened up its wafer outsourcing and foundry services to customers, establishing the independent IFS foundry service. Earlier this year, Intel secured a major USD 15 billion foundry order from Microsoft for the first system-level AI foundry service, which is expected to use the Intel 18A process.

The report from Commercial Times further suggested that Microsoft’s move is anticipated to reduce its heavy reliance on TSMC. The report also indicates that chip customers, including NVIDIA, have engaged with Intel. Intel’s flexible foundry strategy, which can provide advanced packaging, software, and chiplet services tailored to customer needs, has been well-received by chipmakers.

Sources cited by the same report reveal that the U.S. has begun allocating specialized funds to increase investments in the advanced packaging sector as well. This move could highlight the importance of advanced packaging as the next key area for global competition in production capacity.

In November last year, the U.S. Department of Commerce’s National Institute of Standards and Technology (NIST) released a report titled “National Advanced Packaging Manufacturing Program,” highlighting that advanced packaging technology is one of the key technologies in semiconductor manufacturing.

Additionally, the U.S. Department of Commerce plans to invest approximately USD 3 billion to advance the National Advanced Packaging Manufacturing Program. Alongside Amkor, Intel is one of the major players in domestic advanced packaging in the U.S.

The main focus of advanced packaging is on interconnect density, power efficiency, and scaling. From Foveros to hybrid bonding technology, Intel is gradually scaling down bump pitch, which allows for higher current loads and better thermal performance.

Furthermore, in May last year, Intel’s advanced packaging technology roadmap outlined plans to transition from traditional substrates to more advanced glass substrates.

(Photo credit: Intel)

Please note that this article cites information from Commercial Times and NIST.

2023-10-13

[News] TSMC’s Investor Meeting on the 19th with Market’s Attention on Five Key Topics

TSMC is set to conduct an investor meeting on the 19th, with Morgan Stanley, UBS, and Bank of America Securities releasing their latest reports ahead of the event. These reports highlight five main areas of interest:

1. Q4 Operational Outlook
2. Future Gross Margin Trends
3. Potential Adjustments to Full-Year Revenue Estimates and Capital Expenditure
4. Economic and Operational Outlook for the Coming Year
5. 2nm Production Plans

Despite market uncertainties surrounding factors such as end-market demand, the Chinese mainland’s economic trajectory, and semiconductor industry cycles, Morgan Stanley Securities anticipates a 10% QoQ increase in TSMC’s Q4 revenue. It attributes this to strong demand for AI GPUs and ASICs, urgent orders for products such as smartphone system-on-chips (SoCs) and PC GPUs, and sustained demand for Apple’s iPhones. Additionally, the gross margin is expected to benefit from the depreciation of the New Taiwan Dollar, potentially reaching 53%, surpassing the market consensus of 52.2%.

Bank of America Securities similarly projects 10% QoQ revenue growth for TSMC in Q4, with a gross margin estimate of 52.7%. UBS Securities, on the other hand, has lowered its Q4 revenue growth forecast from 10% to 7% while maintaining its expectation of a 10% YoY decline in full-year revenue.

In terms of capital expenditures, Morgan Stanley Securities, taking into account factors such as Intel’s 3nm outsourcing and delays in the U.S. factory expansion, estimates that TSMC’s capital expenditures will remain around $28 billion for both this year and the next. UBS Securities, however, believes that due to a slower short-term business recovery, capital expenditures for this year and the next will be adjusted to $31 billion and $30 billion, respectively.

(Photo credit: TSMC)
