News
Meta Platforms, the parent company of Facebook, announced the latest generation of its Meta Training and Inference Accelerator (MTIA) AI chip on April 10th, fabricated using TSMC’s 5nm process. According to a report from Commercial Times, this move is expected to reduce Meta’s reliance on NVIDIA’s chips and enhance computational power for AI services.
Meta’s shift towards AI services requires greater computational capabilities, and last year the company introduced its own AI models to compete with OpenAI’s ChatGPT. The latest AI chip, Artemis, is an upgraded version of the MTIA introduced last year, which assists platforms like Facebook and Instagram with content ranking and recommendations.
Meta’s new generation AI chip will be produced by TSMC using the 5nm process. Meta reveals that Artemis offers triple the performance of the first-generation MTIA.
In October last year, Meta announced plans to invest USD 35 billion to establish infrastructure supporting AI, including data centers and hardware. CEO Mark Zuckerberg told investors, “In terms of investment priorities, AI will be our biggest investment area in 2024 for both engineering and compute resources.”
Meta’s proprietary AI chips are deployed in data centers to power AI applications. Meta has several ongoing projects aimed at expanding MTIA’s application scope, including supporting generative AI workloads.
The trend of tech giants developing their own AI chips is evident, with Meta joining competitors like Amazon, Microsoft, and Google in internal AI chip development to reduce reliance on NVIDIA. Google recently unveiled its latest data center AI chip, TPU v5p, on the 9th. Meanwhile, Intel is targeting NVIDIA’s H100 with its new AI chip, Gaudi 3.
(Photo credit: Meta)
News
Intel, Qualcomm, Google, and other tech giants are reportedly joining forces with over a hundred startups to challenge NVIDIA’s dominance in the market, as per a report from Reuters. Reportedly, their goal is to collectively penetrate the artificial intelligence (AI) software domain, guiding developers to migrate away from NVIDIA’s CUDA software platform.
NVIDIA’s CUDA is a parallel computing platform and programming model designed specifically to accelerate GPU computing. It allows GPU users to fully leverage their chip’s computational power in AI and other applications. As per a previous report from TrendForce, NVIDIA introduced the CUDA architecture in 2006, and it has since become nearly ubiquitous in educational institutions. As a result, almost all AI engineers encounter CUDA during their academic tenure.
However, tech giants are now reportedly aiming to disrupt the current status quo. According to a report from Reuters on March 25th, Intel, Qualcomm, and Google are teaming up to challenge NVIDIA’s dominant position. They plan to provide alternative solutions for developers to reduce dependence on NVIDIA, encourage application migration to other platforms, and thereby break NVIDIA’s software monopoly and weaken its market influence.
The same report from Reuters further indicated that several tech companies have formed the “UXL Foundation,” named after the concept of “Unified Acceleration” (UXL), which aims to harness the power of accelerated computing on any hardware.
The project plans to leverage Intel’s oneAPI technology to develop software and tools supporting multiple AI accelerator chips. The goal is to reduce the technical barriers developers face when dealing with different hardware platforms, streamline the development process, enhance efficiency, and accelerate innovation and application of AI technology.
Vinesh Sukumar, Head of AI and Machine Learning Platform at Qualcomm, stated, “We’re actually showing developers how you migrate out from an NVIDIA platform.”
Bill Magro, Head of High-Performance Computing at Google, said, “It’s about specifically – in the context of machine learning frameworks – how do we create an open ecosystem, and promote productivity and choice in hardware.” The foundation is said to aim to finalize technical specifications in the first half of this year and to refine technical details by the end of the year.
However, CUDA software has established a solid foundation in the AI field, making it unlikely to be shaken overnight. Jay Goldberg, CEO of financial and strategic advisory firm D2D Advisory, believes that CUDA’s importance lies not only in its software capabilities but also in its 15-year history of usage. A vast amount of code has been built around it, deeply ingraining CUDA in numerous AI and high-performance computing projects. Changing this status quo would require overcoming significant inertia and dependency.
(Photo credit: NVIDIA)
News
In 2023, “generative AI” was undeniably the hottest term in the tech industry.
The launch of the generative application ChatGPT by OpenAI has sparked a frenzy in the market, prompting various tech giants to join the race.
As per a report from TechNews, currently, NVIDIA dominates the market by providing AI accelerators, but this has led to a shortage of their AI accelerators in the market. Even OpenAI intends to develop its own chips to avoid being constrained by tight supply chains.
On the other hand, due to restrictions arising from the US-China tech war, while NVIDIA has offered reduced versions of its products to Chinese clients, recent reports suggest that these reduced versions are not favored by Chinese customers.
Instead, Chinese firms are turning to Huawei for assistance or simultaneously developing their own chips, expected to keep pace with the continued advancement of large-scale language models.
In the current wave of AI development, NVIDIA undoubtedly stands as the frontrunner in AI computing power. Its A100/H100 series chips have secured orders from top clients worldwide in the AI market.
As per analyst Stacy Rasgon from the Wall Street investment bank Bernstein Research, the cost of each query using ChatGPT is approximately USD 0.04. If ChatGPT queries were to scale to one-tenth of Google’s search volume, the initial deployment would require approximately USD 48.1 billion worth of GPUs for computation, with an annual requirement of about USD 16 billion worth of chips to sustain operations, along with a similar amount for related chips to execute tasks.
Therefore, whether to reduce costs, decrease overreliance on NVIDIA, or even enhance bargaining power further, global tech giants have initiated plans to develop their own AI accelerators.
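To put the Bernstein estimate above in perspective, the following minimal sketch adds up the quoted figures over several years. The dollar amounts come from the article; the function name, the yearly structure, and the reading of “a similar amount for related chips” as an equal annual sum are assumptions made only for illustration.

```python
# Figures quoted in the article, in USD billions.
INITIAL_GPU_SPEND = 48.1     # one-off GPU purchase for the initial deployment
ANNUAL_CHIP_SPEND = 16.0     # chips needed each year to sustain operations
ANNUAL_RELATED_SPEND = 16.0  # assumption: "a similar amount" means an equal sum


def cumulative_spend(years: int) -> float:
    """Total hardware spend after `years` of operation, in USD billions."""
    if years < 0:
        raise ValueError("years must be non-negative")
    yearly = ANNUAL_CHIP_SPEND + ANNUAL_RELATED_SPEND
    return INITIAL_GPU_SPEND + years * yearly


for year in range(4):
    print(f"after year {year}: ~USD {cumulative_spend(year):.1f} billion")
```

Under these assumptions, recurring chip purchases overtake the initial GPU outlay within two years, which is consistent with the cost motive for in-house accelerators described above.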
Per reports by technology media The Information, citing industry sources, six global tech giants, including Microsoft, OpenAI, Tesla, Google, Amazon, and Meta, are all investing in developing their own AI accelerator chips. These companies are expected to compete with NVIDIA’s flagship H100 AI accelerator chips.
Progress of Global Companies’ In-house Chip Development
Rumors surrounding Microsoft’s in-house AI chip development have never ceased.
At the annual Microsoft Ignite 2023 conference, the company finally unveiled the Azure Maia 100 AI chip for data centers and the Azure Cobalt 100 cloud computing processor. In fact, rumors of Microsoft developing an AI-specific chip have been circulating since 2019, aimed at powering large language models.
The Azure Maia 100, introduced at the conference, is an AI accelerator chip designed for tasks such as running OpenAI models, ChatGPT, Bing, GitHub Copilot, and other AI workloads.
According to Microsoft, the Azure Maia 100 is the first-generation product in the series, manufactured using a 5-nanometer process. The Azure Cobalt is an Arm-based cloud computing processor equipped with 128 computing cores, offering a 40% performance improvement compared to several generations of Azure Arm chips. It provides support for services such as Microsoft Teams and Azure SQL. Both chips are produced by TSMC, and Microsoft is already designing the second generation.
OpenAI is also exploring the production of in-house AI accelerator chips and has begun evaluating potential acquisition targets. According to earlier reports from Reuters citing industry sources, OpenAI has been discussing various solutions to address the shortage of AI chips since at least 2022.
Although OpenAI has not made a final decision, options to address the shortage of AI chips include developing their own AI chips or further collaborating with chip manufacturers like NVIDIA.
OpenAI has not provided an official comment on this matter at the moment.
Electric car manufacturer Tesla is also actively involved in the development of AI accelerator chips. Tesla primarily focuses on the demand for autonomous driving and has introduced two AI chips to date: the Full Self-Driving (FSD) chip and the Dojo D1 chip.
The FSD chip is used in Tesla vehicles’ autonomous driving systems, while the Dojo D1 chip is employed in Tesla’s supercomputers. It serves as a general-purpose CPU and is used to build the AI training chips that power the Dojo system.
Google began secretly developing a chip focused on AI machine learning algorithms as early as 2013 and deployed it in its internal cloud computing data centers to replace NVIDIA’s GPUs.
The custom chip, called the Tensor Processing Unit (TPU), was unveiled in 2016. It is designed to execute large-scale matrix operations for deep learning models used in natural language processing, computer vision, and recommendation systems.
In fact, Google had already constructed the TPU v4 AI chip in its data centers by 2020. However, it wasn’t until April 2023 that technical details of the chip were publicly disclosed.
As for Amazon Web Services (AWS), the cloud computing service provider under Amazon, it has been a pioneer in developing its own chips since the introduction of the Nitro1 chip in 2013. AWS has since developed three product lines of in-house chips, including network chips, server chips, and AI machine learning chips.
Among them, AWS’s lineup of self-developed AI chips includes the inference chip Inferentia and the training chip Trainium.
AWS unveiled Inferentia 2 (Inf2) in early 2023, specifically designed for artificial intelligence. It triples computational performance while quadrupling total accelerator memory.
It supports distributed inference through direct ultra-high-speed connections between chips and can handle models with up to 175 billion parameters, making it the most powerful in-house chip in today’s AI chip market.
Meanwhile, Meta, until 2022, continued using CPUs and custom-designed chipsets tailored for accelerating AI algorithms to execute its AI tasks.
However, due to the inefficiency of CPUs compared to GPUs in executing AI tasks, Meta scrapped its plans for a large-scale rollout of custom-designed chips in 2022. Instead, it opted to purchase NVIDIA GPUs worth billions of dollars.
Still, amidst the surge of other major players developing in-house AI accelerator chips, Meta has also ventured into internal chip development.
On May 19, 2023, Meta further unveiled its AI training and inference chip project. The chip boasts a power consumption of only 25 watts, which is 1/20th of the power consumption of comparable products from NVIDIA. It utilizes the RISC-V open-source architecture. According to market reports, the chip will also be produced using TSMC’s 7-nanometer manufacturing process.
China’s Progress on In-House Chip Development
China’s journey in developing in-house chips presents a different picture. In October last year, the United States expanded its ban on selling AI chips to China.
Although NVIDIA promptly tailored new chips for the Chinese market to comply with US export regulations, recent reports suggest that major Chinese cloud computing clients such as Alibaba and Tencent are less inclined to purchase the downgraded H20 chips. Instead, they have begun shifting their orders to domestic suppliers, including Huawei.
This shift in strategy indicates that Chinese companies are relying more on domestically developed chips, transferring some of their orders for advanced semiconductors to Chinese suppliers.
TrendForce indicates that currently about 80% of high-end AI chips purchased by Chinese cloud operators are from NVIDIA, but this figure may decrease to 50% to 60% over the next five years.
If the United States continues to strengthen chip controls in the future, it could potentially exert additional pressure on NVIDIA’s sales in China.
(Photo credit: NVIDIA)
Insights
Microsoft announced its in-house AI chip, Azure Maia 100, at the Ignite developer conference in Seattle on November 15, 2023. This chip is designed to handle OpenAI models, Bing, GitHub Copilot, ChatGPT, and other AI services. Support for Copilot and Azure OpenAI is expected to commence in early 2024.
TrendForce’s Insights:
Microsoft has not disclosed detailed specifications for Azure Maia 100. Currently, it is known that the chip will be manufactured using TSMC’s 5nm process, featuring 105 billion transistors and supporting at least INT8 and INT4 precision formats. While Microsoft has indicated that the chip will be used for both training and inference, the computational formats it supports suggest a focus on inference applications.
This emphasis is inferred from its incorporation of the INT4 low-precision computational format, which is less common among other CSPs’ AI ASICs. Lower precision reduces power consumption, shortens inference times, and enhances efficiency. The drawback is a sacrifice in accuracy.
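The precision trade-off described above can be illustrated with a minimal sketch of symmetric uniform quantization. This is a generic textbook scheme, not a description of how Azure Maia 100 handles INT8 or INT4; the function names and the sample weights are made up for illustration.

```python
def quantize(values, bits):
    """Quantize floats to signed `bits`-bit integer levels and map them back."""
    qmax = 2 ** (bits - 1) - 1            # 127 for INT8, 7 for INT4
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0  # real-valued size of one integer step
    # Round to the nearest level and clamp to the representable range.
    levels = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return [level * scale for level in levels]


def mean_abs_error(original, restored):
    """Average absolute difference between original and restored values."""
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)


# Hypothetical weights, chosen only to make the effect visible.
weights = [0.91, -0.47, 0.13, 0.02, -0.88, 0.35, -0.06, 0.59]

err8 = mean_abs_error(weights, quantize(weights, 8))
err4 = mean_abs_error(weights, quantize(weights, 4))

print(f"INT8: 8 bits per weight, mean abs error {err8:.4f}")
print(f"INT4: 4 bits per weight, mean abs error {err4:.4f}")
```

INT4 halves the storage per value relative to INT8, but with only 15 usable levels instead of 255 the rounding error on these sample weights is noticeably larger, which is the accuracy cost noted above.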
Microsoft initiated its in-house AI chip project, “Athena,” in 2019, developing it in collaboration with OpenAI. With Azure Maia 100, Microsoft, like other CSPs, aims to reduce costs and decrease dependency on NVIDIA. Although Microsoft entered the field of proprietary AI chips later than its primary competitors, its formidable ecosystem is expected to gradually demonstrate a competitive advantage.
Google led the way with its first in-house AI chip, TPU v1, introduced as early as 2016, and has since iterated to the fifth generation with TPU v5e. Amazon followed suit in 2018 with Inferentia for inference, introduced Trainium for training in 2020, and launched the second generation, Inferentia2, in 2023, with Trainium2 expected in 2024.
Meta plans to debut its inaugural in-house AI chip, MTIA v1, in 2025. Given the releases from major competitors, Meta has expedited its timeline and is set to unveil the second-generation in-house AI chip, MTIA v2, in 2026.
Both MTIA v1 and MTIA v2 adopt the RISC-V architecture, whereas other CSPs opt for the ARM architecture. RISC-V is a fully open-source architecture, requiring no instruction set licensing fees. RISC-V also has fewer instructions (approximately 200) than ARM (approximately 1,000).
This choice allows chips utilizing the RISC-V architecture to achieve lower power consumption. However, the RISC-V ecosystem is currently less mature, resulting in fewer manufacturers adopting it. Nevertheless, with the growing trend in data centers towards energy efficiency, it is anticipated that more companies will start incorporating RISC-V architecture into their in-house AI chips in the future.
The competition among AI chips will ultimately hinge on the competition of ecosystems. NVIDIA introduced the CUDA architecture in 2006, and it has since become nearly ubiquitous in educational institutions. As a result, almost all AI engineers encounter CUDA during their academic tenure.
In 2017, NVIDIA further solidified its ecosystem by launching the RAPIDS AI acceleration integration solution and the GPU Cloud service platform. Notably, over 70% of NVIDIA’s workforce comprises software engineers, emphasizing its status as a software company. The performance of NVIDIA’s AI chips can be further enhanced through software innovations.
Microsoft, for its part, possesses a robust ecosystem of its own in Windows. The recent Intel Arc A770 GPU showed a 1.7x performance improvement in AI-driven Stable Diffusion with Microsoft Olive. This demonstrates that, similar to NVIDIA, Microsoft has the capability to enhance GPU performance through software.
Consequently, Microsoft’s in-house AI chips are poised to achieve superior performance in software collaboration compared to other CSP manufacturers, providing Microsoft with a competitive advantage in the AI competition.
News
Rumors swirl around AMD’s upcoming chip architecture, codenamed “Prometheus,” featuring the Zen 5C core. As reported by TechNews, the chip is poised to use both TSMC’s 3nm and Samsung’s 4nm processes simultaneously, marking a shift in the competitive landscape from process nodes, yield, and cost to factors like capacity, ecosystem, and geopolitics, all of which depend on customer considerations.
Examining yields, TSMC claims an estimated 80% yield for its 4nm process, while Samsung has surged from 50% to an impressive 75%, aligning with TSMC’s standards and raising the likelihood of chip customers returning. Speculation abounds that major players such as Qualcomm and Nvidia may reconsider their suppliers, with industry sources suggesting Samsung’s 4nm capacity is roughly half of TSMC’s.
Revegnus, a reputable X (formerly Twitter) source, unveiled information from high-level Apple meetings, indicating a 63% yield for TSMC’s 3nm process but at double the price of the 4nm process. In the 4nm realm, Samsung’s yield mirrors TSMC’s, with Samsung showing a faster-than-expected yield recovery.
Consequently, with Samsung’s significant improvements in yield and capacity, coupled with TSMC’s decision to raise prices, major clients may explore secondary suppliers to diversify outsourcing orders, factoring in considerations such as cost and geopolitics. Recent reports suggest Samsung is in final negotiations for a 4nm collaboration with AMD, planning to shift some 4nm processor orders from TSMC to Samsung.
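The interplay of price and yield cited above can be made concrete with a rough cost-per-good-die calculation. The yields (80% for TSMC 4nm, 63% for TSMC 3nm) and the 2x price ratio are the figures quoted in the article; the function name, the normalized wafer prices, and the simplification that both nodes fit the same number of dies on a wafer are assumptions for illustration only. In practice a 3nm die would be smaller, which would narrow the gap.

```python
def cost_per_good_die(wafer_price, dies_per_wafer, yield_rate):
    """Wafer price spread over only the dies that pass testing."""
    if not 0 < yield_rate <= 1:
        raise ValueError("yield_rate must be in (0, 1]")
    return wafer_price / (dies_per_wafer * yield_rate)


# Hypothetical die count; it cancels out of the ratio computed below.
DIES_PER_WAFER = 600

# Wafer prices are normalized so that a 4nm wafer costs 1.0.
cost_4nm = cost_per_good_die(1.0, DIES_PER_WAFER, 0.80)
cost_3nm = cost_per_good_die(2.0, DIES_PER_WAFER, 0.63)

ratio = cost_3nm / cost_4nm
print(f"3nm good die costs about {ratio:.2f}x a 4nm good die")
```

Under these assumptions a good 3nm die costs roughly two and a half times a good 4nm die, which helps explain why customers would weigh a second 4nm supplier once its yield is comparable.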
Beyond AMD, the Tensor G3 processor in Google’s Pixel 8 series this year adopts Samsung’s 4nm process. Samsung’s new fab in Taylor, Texas, counts the company’s own Galaxy smartphones as its inaugural customer, producing Exynos processors.
Furthermore, Samsung announced that U.S.-based AI solution provider Groq will entrust the company to manufacture next-generation AI chips using the 4nm process, slated to commence production in 2025, marking the first order for the new Texas plant.
TSMC’s 4nm clients include longstanding partners like Apple, Nvidia, Qualcomm, MediaTek, AMD, and Intel. In addition, reports indicate that Google’s Tensor G4 may move to TSMC’s 4nm process, while Tensor G5 will be produced using TSMC’s 3nm process. This would end Google’s current collaboration with Samsung, although TSMC’s debut as the manufacturer of Tensor chips is anticipated to be delayed until 2025.
Last year, rumors circulated that Tesla, the electric vehicle giant, would shift orders for its 5th generation self-driving chip, Hardware 5 (HW 5.0), to TSMC. This decision was prompted by Samsung’s lagging 4nm process yield at that time. However, with Samsung’s improved yield, the industry now expects Tesla to split orders between the two companies.