News
The fusion of AIGC with end-user devices is highlighting the importance of personalized user experiences, cost efficiency, and faster response times in generative AI applications. Major companies like Lenovo and Xiaomi are ramping up their efforts in the development of edge AI, extending the generative AI wave from the cloud to the edge and end-user devices.
On October 24th, Lenovo hosted its ninth Tech World event, Lenovo Tech World 2023, announcing deepened collaborations with companies like Microsoft, NVIDIA, Intel, AMD, and Qualcomm in the areas of smart devices, infrastructure, and solutions. At the event, Lenovo also unveiled its first AI-powered PC, which runs a compact AI model designed for end-user applications and offers features such as photo editing, intelligent video editing, document editing, and automatic task-solving based on user thought patterns.
Smartphone manufacturers are also making a major push into edge AI. Xiaomi recently announced its first use of the Qualcomm Snapdragon 8 Gen 3, significantly enhancing its ability to handle LLMs on end-user devices, and has embedded AI LLMs into its HyperOS system to enhance the user experience.
During the 2023 vivo Developer Conference on November 1st, vivo introduced their self-developed Blue Heart model, offering five products with parameters ranging from billions to trillions, covering various core scenarios. Major smartphone manufacturers like Huawei, OPPO, and Honor are also actively engaged in developing LLMs.
Speeding up Practical Use of AI Models in Business
While integrating AI models into end-user devices enhances user experiences and boosts the consumer electronics market, it is equally significant for advancing the practical use of AI models. As reported by Jiwei, Jian Luan, head of Xiaomi's AI Lab Big Model Team, explains that large AI models have gained attention because they effectively drive the production of large-scale informational content, enabled by the extensive data, tasks, and parameters involved in training them. The next step, making these models lightweight enough to run effectively on end-user devices, will be the main focus of industry development.
In fact, combining generative AI with smart terminals offers several advantages.
Users have long complained about the lack of intelligence in AI devices, noting that AI systems reset to a blank state after each interaction, a common issue with cloud-based LLMs. Handling such concerns at the end-user device level can simplify the process.
In other words, the expansion of generative AI from the cloud to the edge integrates AI technology with hardware devices like PCs and smartphones. This is becoming a major trend in the commercial application and development of large AI models. It has the potential to enhance or resolve challenges in AI development related to personalization, security and privacy risks, high computing costs, subpar performance, and limited interactivity, thereby accelerating the commercial use of AI models.
Integrated Chips for End-User Devices: CPU+GPU+NPU
The lightweight transformation and localization of AI LLMs rely on advancements in chip technology. Leading manufacturers like Qualcomm, Intel, NVIDIA, AMD, and others have been introducing products in this direction. Qualcomm’s Snapdragon X Elite, the first processor in the Snapdragon X series designed for PCs, integrates a dedicated Neural Processing Unit (NPU) capable of supporting large-scale language models with billions of parameters.
The Snapdragon 8 Gen 3 platform supports over 20 AI LLMs from companies like Microsoft, Meta, OpenAI, and Baidu. Intel's latest Meteor Lake is the first of the company's PC processors to integrate an NPU, combining it with the processor's existing AI capabilities to improve the efficiency of AI functions in PCs. NVIDIA and AMD also plan to launch Arm-based PC chips in 2025 to enter the edge AI market.
Kedar Kondap, Senior Vice President and General Manager of Compute and Gaming Business at Qualcomm, emphasizes the advantages of LLM localization. He envisions highly intelligent PCs that actively understand user thoughts, provide privacy protection, and offer immediate responses. He highlights that addressing these needs at the end-user level provides several advantages compared to solving them in the cloud, such as simplifying complex processes and offering enhanced user experiences.
To meet the increased demand for AI computing when extending LLMs from the cloud to the edge and end-user devices, the integration of CPU+GPU+NPU is expected to be the future of processor development. This underscores the significance of Chiplet technology.
Feng Wu, Chief Engineer of Signal Integrity and Power Integrity at Sanechips/ZTE, explains that die-to-die and fabric interconnects make it possible to connect more computing units densely and efficiently, achieving hyperscale computing at the chip level.
Additionally, by connecting the CPU, GPU, and NPU at high speeds in the same system, chip-level heterogeneity enhances data transfer rates, reduces data access power, increases data processing speed, and lowers storage access power to meet the parameter requirements of LLMs.
(Image: Qualcomm)
In-Depth Analyses
In just a short span of six months, AI has evolved from generating text, images, music, and code to automating tasks and producing agents, showcasing astonishing capabilities. TrendForce has issued a new report titled “Surge in AIGC Applications to Drive Long-Term Demand for AI Servers.” Beyond highlighting the latest developments in AI, the report also delves into strategies adopted by governments and industries to ensure the positive trajectory of AI’s development. It analyzes the projected timeline for the widespread implementation of AIGC applications and their impact on the demand for AI servers.
While the AIGC application frenzy in the first half of 2023 has raised concerns, it has also prompted governments and industries to actively address potential risks and issues stemming from AIGC applications, along with devising corresponding solutions. Currently, both the government and industries have strategies in place to regulate AIGC applications in terms of legal oversight, privacy protection, identity establishment, reliability enhancement, security augmentation, and copyright maintenance.
Considering the time required for governments to draft legislation and for industries to enhance AI's reliability, security, and copyright protection, the rules governing AIGC applications are expected to gradually solidify between late 2024 and early 2025, paving the way for an AIGC application surge around 2025.
Beyond the five major categories of AIGC applications—text generation, image generation, music generation, video generation, and code generation—AIGC technology-based applications like AI customer service, personalized AI assistants, AI search, and AI productivity tools are also gaining prominence. In the realm of gaming, whether in VR or open-world games, AIGC technology is set to significantly enhance immersion and freedom, ushering in revolutionary experiences.
To secure a dominant position in the AI technology industry and embrace the upcoming AIGC application wave, application service providers, tech giants, national institutions, and startups are competing to bolster their AI computing resources. As core computing components experience increased shipments, the shipment volume of AI servers, which serve as foundational computing units, is also expected to surge.
With institutions and enterprises aggressively building out computing resources in 2023, AI server shipment volume is projected to grow substantially. Given the limited upstream semiconductor capacity, this momentum is likely to extend into 2024.
By 2025, the AIGC application frenzy is expected to stimulate AI server shipments further. Consequently, driven by institutions and businesses preemptively establishing computing resources and by the projected timeline for large-scale AIGC deployment, the AI server market is anticipated to see sustained demand growth. Because AI servers involve more intricate manufacturing and a higher degree of customization, their profitability exceeds that of general servers. With the continual growth in AI server shipments, relevant server brands and ODM manufacturers are poised to reap significant benefits.
Press Releases
Over the past few decades, semiconductor manufacturing technology has evolved from the 10,000nm process in 1971 to the 3nm process in 2022, driven by the need to increase the number of transistors on chips for enhanced computational performance. However, as applications like artificial intelligence (AI) and AIGC rapidly advance, demand for higher core chip performance at the device level is growing.
While process technology improvements may encounter bottlenecks, the need for computing resources continues to rise. This underscores the importance of advanced packaging techniques to boost the number of transistors on chips.
In recent years, “advanced packaging” has gained significant attention. Think of “packaging” as a protective shell for electronic chips, safeguarding them from adverse environmental effects. Chip packaging involves fixation, enhanced heat dissipation, electrical connections, and signal interconnections with the outside world. The term “advanced packaging” primarily focuses on packaging techniques for chips with process nodes below 7nm.
Amid the AI boom, which has driven demand for AI servers and NVIDIA GPU graphics chips, CoWoS (Chip-on-Wafer-on-Substrate) packaging has faced a supply shortage.
But what exactly is CoWoS?
CoWoS is a 2.5D and 3D packaging technology, composed of “CoW” (Chip-on-Wafer) and “WoS” (Wafer-on-Substrate). CoWoS involves stacking chips and then packaging them onto a substrate, creating a 2.5D or 3D configuration. This approach reduces chip space, while also lowering power consumption and costs. The concept is illustrated in the diagram below, where logic chips and High-Bandwidth Memory (HBM) are interconnected on an interposer through tiny metal wires. “Through-Silicon Vias (TSV)” technology links the assembly to the substrate beneath, ultimately connecting to external circuits via solder balls.
The difference between 2.5D and 3D packaging lies in their stacking methods. 2.5D packaging involves horizontal chip stacking on an interposer or through silicon bridges, mainly for combining logic and high-bandwidth memory chips. 3D packaging vertically stacks chips, primarily targeting high-performance logic chips and System-on-Chip (SoC) designs.
When discussing advanced packaging, it is worth noting that Taiwan Semiconductor Manufacturing Company (TSMC), rather than the traditional packaging and testing houses, is at the forefront. CoW, the most precision-intensive part of CoWoS, is predominantly produced by TSMC. This has paved the way for TSMC's comprehensive service offering, which maintains high yields in both fabrication and packaging and positions the company strongly to serve high-end clients in the future.
Applications of CoWoS
The shift towards multiple small chips and memory stacking is becoming an inevitable trend for high-end chips. CoWoS packaging finds application in a wide range of fields, including High-Performance Computing (HPC), AI, data centers, 5G, Internet of Things (IoT), automotive electronics, and more. In various major trends, CoWoS packaging is set to play a vital role.
In the past, chip performance was primarily reliant on semiconductor process improvements. However, with devices approaching physical limits and chip miniaturization becoming increasingly challenging, maintaining small form factors and high chip performance has required improvements not only in advanced processes but also in chip architecture. This has led to a transition from single-layer chips to multi-layer stacking. As a result, advanced packaging has become a key driver in extending Moore’s Law and is leading the charge in the semiconductor industry.
(Photo credit: TSMC)
Press Releases
High Bandwidth Memory (HBM) is emerging as the preferred solution for overcoming the memory transfer speed restrictions imposed by the bandwidth limitations of DDR SDRAM in high-speed computation. HBM is recognized for its revolutionary transmission efficiency and plays a pivotal role in allowing core computational components to operate at maximum capacity. Top-tier AI server GPUs have set a new industry standard by primarily using HBM. TrendForce forecasts that global demand for HBM will grow by nearly 60% year-on-year in 2023, reaching 290 million GB, with a further 30% growth in 2024.
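As a quick arithmetic check on these projections, the figures can be chained together (a sketch using only the growth rates quoted above; the implied 2022 baseline is derived, not reported by TrendForce):

```python
# Back-of-the-envelope check of the HBM demand figures quoted above.
# The 2022 baseline is implied by the stated growth rate, not reported.

demand_2023_gb = 290e6   # ~290 million GB forecast for 2023
growth_2023 = 0.60       # "nearly 60%" year-on-year growth in 2023
growth_2024 = 0.30       # "further 30%" growth in 2024

implied_2022 = demand_2023_gb / (1 + growth_2023)
projected_2024 = demand_2023_gb * (1 + growth_2024)

print(f"Implied 2022 demand:   {implied_2022 / 1e6:.0f} million GB")
print(f"Projected 2024 demand: {projected_2024 / 1e6:.0f} million GB")
```

With these inputs, the implied 2022 base works out to roughly 181 million GB and the 2024 projection to about 377 million GB.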
In TrendForce's forecast for 2025, which takes into account five large-scale AIGC products equivalent to ChatGPT, 25 mid-size AIGC products equivalent to Midjourney, and 80 small AIGC products, the minimum computing resources required globally would range from 145,600 to 233,700 NVIDIA A100 GPUs. Emerging technologies such as supercomputers, 8K video streaming, and AR/VR are also expected to increase the workload on cloud computing systems as demand for high-speed computing escalates.
HBM is unequivocally a superior solution for building high-speed computing platforms, thanks to its higher bandwidth and lower energy consumption compared to DDR SDRAM. This distinction is clear when comparing DDR4 SDRAM and DDR5 SDRAM, released in 2014 and 2020 respectively, whose bandwidths only differed by a factor of two. Regardless of whether DDR5 or the future DDR6 is used, the quest for higher transmission performance will inevitably lead to an increase in power consumption, which could potentially affect system performance adversely. Taking HBM3 and DDR5 as examples, the former’s bandwidth is 15 times that of the latter and can be further enhanced by adding more stacked chips. Furthermore, HBM can replace a portion of GDDR SDRAM or DDR SDRAM, thus managing power consumption more effectively.
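To put the bandwidth gap in concrete terms, a rough per-device comparison can be sketched. The figures below are typical published values assumed for illustration (a DDR5-6400 module with a standard 64-bit data bus versus a single HBM3 stack running 6.4 Gb/s per pin over a 1024-bit interface), not numbers taken from the report:

```python
# Rough per-device peak-bandwidth comparison: one DDR5 module vs. one HBM3 stack.
# Figures are typical datasheet values assumed for illustration.

ddr5_transfer_rate = 6400e6   # transfers/s for a DDR5-6400 module
ddr5_bus_bits = 64            # standard DIMM data-bus width

hbm3_pin_rate = 6.4e9         # bits/s per pin for HBM3
hbm3_bus_bits = 1024          # interface width of a single HBM3 stack

ddr5_gbps = ddr5_transfer_rate * ddr5_bus_bits / 8 / 1e9   # GB/s
hbm3_gbps = hbm3_pin_rate * hbm3_bus_bits / 8 / 1e9        # GB/s

print(f"DDR5-6400 module: {ddr5_gbps:.1f} GB/s")
print(f"HBM3 stack:       {hbm3_gbps:.1f} GB/s")
print(f"Ratio:            {hbm3_gbps / ddr5_gbps:.0f}x")
```

Under these assumptions the ratio comes out to roughly 16x per device, in the same ballpark as the 15x cited above; the exact multiple depends on which DDR5 speed grade and HBM3 configuration are compared.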
TrendForce concludes that the current driving force behind the increasing demand is AI servers equipped with NVIDIA A100 and H100 GPUs or AMD MI300 accelerators, along with large CSPs such as Google and AWS that are developing their own ASICs. It is estimated that shipments of AI servers, including those equipped with GPUs, FPGAs, and ASICs, will reach nearly 1.2 million units in 2023, an annual growth rate of almost 38%. TrendForce also anticipates a concurrent surge in AI chip shipments, with growth potentially exceeding 50%.
In-Depth Analyses
OpenAI's ChatGPT, Microsoft's Copilot, Google's Bard, and Elon Musk's latest venture, TruthGPT – what will be the next buzzword in AI? In just under six months, the AI competition has heated up, stirring ripples in the once-calm AI server market as AI-generated content (AIGC) models take center stage.
The unprecedented convenience brought by AIGC has attracted a massive number of users, with OpenAI's mainstream model, GPT-3, receiving up to 25 million daily visits, often resulting in server overload and disconnection issues.
Because the evolution of these models has increased training parameters and data volumes, making computational power even scarcer, OpenAI has reluctantly adopted measures such as paid access and traffic restrictions to stabilize server load.
High-end Cloud Computing is gaining momentum
According to TrendForce, AI servers currently have a penetration rate of merely 1% in global data centers, far from sufficient to cope with the surge in data demand from the usage side. Therefore, besides optimizing software to reduce computational loads, increasing the number of high-end AI servers will be another crucial solution.
Take GPT-3, for instance: the model requires at least 4,750 AI servers, each with 8 GPUs, and every similarly large language model, such as ChatGPT, will need 3,125 to 5,000 units. Considering ChatGPT and Microsoft's other applications as a whole, the demand for AI servers is estimated to reach some 25,000 units to meet basic computing power requirements.
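The arithmetic behind these estimates can be reproduced directly from the figures quoted above (a sketch restating the source's numbers; only the implied GPU total is derived):

```python
# Reproduce the AI-server estimates quoted above.

gpus_per_server = 8
gpt3_min_servers = 4750          # minimum servers required to run GPT-3

# Derived: total GPUs implied by the GPT-3 minimum.
gpt3_min_gpus = gpt3_min_servers * gpus_per_server

per_model_range = (3125, 5000)   # servers needed per ChatGPT-scale LLM
microsoft_estimate = 25000       # ChatGPT plus Microsoft's other applications

print(f"GPT-3 minimum: {gpt3_min_servers:,} servers = {gpt3_min_gpus:,} GPUs")
print(f"Per ChatGPT-scale model: {per_model_range[0]:,}-{per_model_range[1]:,} servers")
print(f"Microsoft applications overall: ~{microsoft_estimate:,} servers")
```

The GPT-3 minimum alone thus implies 38,000 GPUs, which helps explain why GPU supply dominates both the cost and the supply-chain discussion that follows.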
While the emerging applications of AIGC and their vast commercial potential have revealed the technical roadmap ahead, they have also shed light on bottlenecks in the supply chain.
The down-to-earth problem: cost
Compared to general-purpose servers that use CPUs as their main computational power, AI servers rely heavily on GPUs, with NVIDIA's DGX A100 and DGX H100, offering computational performance of up to 5 PetaFLOPS, serving as the primary sources of AI server computing power. Given that GPUs account for over 70% of an AI server's cost, the increased adoption of high-end GPUs has made the architecture more expensive.
Moreover, a significant amount of data transmission occurs during the operation, which drives up the demand for DDR5 and High Bandwidth Memory (HBM). The high power consumption generated during operation also promotes the upgrade of components such as PCBs and cooling systems, which further raises the overall cost.
Not to mention the technical hurdles posed by the complex design architecture: for example, a new approach to heterogeneous computing architecture is urgently needed to enhance overall computing efficiency.
The high cost and complexity of AI servers have inevitably limited their development to large manufacturers. Two leading companies, HPE and Dell, have taken different strategies to enter the market.
With the booming market for AIGC applications, we seem to be one step closer to a future metaverse centered around fully virtualized content. However, it remains unclear whether the hardware infrastructure can keep up with the surge in demand. This persistent challenge will continue to test the capabilities of cloud server manufacturers to balance cost and performance.
(Photo credit: Google)