Liquid-Cooled AI Servers Face Bottlenecks

Sep 30, 2024

As Blackwell chip shipments increase, customers are expected to become more willing to adopt liquid cooling.

Industry insiders report that the supply of universal quick disconnects (UQDs) for liquid cooling solutions is tightening, which could become a major bottleneck for the growth of liquid-cooled AI servers.

 

Server ODMs indicate that Nvidia's Blackwell AI chips, including the B100 and B200, will begin shipping in 2024, while the GB200 solution will not enter mass production until the end of 2024 or into 2025. Most B100 and B200 customers still use air-cooled designs, but ODMs report that liquid cooling penetration continues to rise and expect customers' willingness to adopt it to grow along with Blackwell shipments.

 

▲ Liquid-Cooled AI Servers

 

 

I Expanding capacity to meet the rising demand for liquid cooling

 

Companies in the supply chain are expanding production in preparation for the liquid cooling era. Thermal module manufacturer Amlogic plans to increase its monthly production capacity for cold plates from 30,000 to 300,000 units.

 

Auras has set up a new factory in Thailand to address customers' geopolitical concerns and demands. The factory is expected to begin mass production in the third quarter. In addition to expanding local production capacity for cold plates, Auras plans to produce coolant distribution units (CDUs) and coolant distribution manifolds (CDMs) locally, with a planned monthly capacity of approximately 2,000-3,000 sets.

 

Thermal module manufacturer AVC stated in its recent earnings call that its monthly production capacity for cold plate modules in China and Vietnam is about 115,000 units. Counted in individual cold plates, that is equivalent to a monthly output of around 420,000 units.

 

AVC plans to expand its capacity by 50% by the end of the year. AVC also plans to increase its monthly production capacity for CDUs to 1,000 units and for CDMs to 30,000 sets. AVC emphasized that this planned capacity can be flexibly adjusted based on customer order demands.
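
The AVC figures above imply a simple conversion between modules and individual cold plates. The following is a back-of-the-envelope sketch, assuming that the 50% expansion applies uniformly to cold plate module capacity and that the plates-per-module ratio stays fixed. Neither assumption is stated in the source, and the function name is purely illustrative.

```python
def expanded_capacity(modules_per_month, plates_per_month, expansion=0.50):
    """Project monthly capacity after a uniform expansion (an assumption)."""
    # Average number of cold plates per module implied by the quoted figures.
    plates_per_module = plates_per_month / modules_per_month
    new_modules = modules_per_month * (1 + expansion)
    new_plates = new_modules * plates_per_module
    return plates_per_module, new_modules, new_plates


# Figures quoted from AVC's earnings call: 115,000 modules, 420,000 plates.
ratio, modules, plates = expanded_capacity(115_000, 420_000)
print(f"{ratio:.2f} plates per module")
print(f"{modules:,.0f} modules/month, {plates:,.0f} plates/month after expansion")
```

Under these assumptions, the quoted figures work out to roughly 3.65 cold plates per module, and a 50% expansion would bring capacity to about 172,500 modules, or 630,000 cold plates, per month.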

 

Goli Thermal Processing, a thermal technology company, is expanding capacity at its Zhongli plant in Taiwan in response to rising customer demand for liquid cooling. Monthly CDM capacity is expected to rise from 1,000 units to 2,000 units by the end of the third quarter and to 4,000 units by the end of the year, while annual CDU production is expected to reach 2,000 units by year-end.

 

These manufacturers have high expectations for liquid cooling demand, partly because of computing efficiency and data center PUE (power usage effectiveness) standards in China and the EU. The most significant factor, however, is Nvidia lifting its self-imposed restrictions on thermal specifications for chip manufacturers.

 

 

II Rapid growth of liquid cooling leads to UQD shortages

 

As the industry eagerly anticipates the arrival of the liquid cooling era, UQDs have become the biggest bottleneck to growth. Thermal module manufacturers point out that UQD supply has recently tightened. Although liquid cooling's current market share is only in the single digits, UQDs may become difficult to obtain if that share climbs into double digits.

 

UQD suppliers are primarily European and American, including the major U.S. companies Parker Hannifin and CPC, Switzerland's Stäubli International, Denmark's Danfoss, and Sweden's CEJN. Taiwanese connector giant Lotes is also actively entering the market and has begun delivering samples.

 

Anbo Technology's chairman, Liang Zhijian, pointed out that the primary concern in liquid cooling is avoiding leaks, and UQDs are the components most prone to leaking, which makes UQD supply the most constrained among liquid cooling components. The constraint is not only technical: existing manufacturers also hold patents, and Anbo Technology is researching ways to work around these patent barriers.

 

Industry insiders note that, beyond the patent protections held by incumbent UQD manufacturers, UQDs must also pass multiple validations, including OCP certification and customer validation, which are time-consuming and labor-intensive. Moreover, the existing European and American manufacturers have no intention of expanding capacity, which will become a major bottleneck for the rapid development of liquid cooling.

 

Supermicro is one of the fastest-growing suppliers of liquid-cooled systems. Founder and CEO Charles Liang (Liang Jianhou) pointed out that liquid cooling has accounted for only about 1% of the server market over the past 30 years, but he estimates that the penetration rate will jump to 30% by 2025.

 

 

III Liquid cooling as a potential solution; AI boom puts pressure on the grid

 

The rapid development of generative artificial intelligence has driven unprecedented expansion in data centers, raising concerns about their impact on the power grid. These energy-intensive facilities could contribute to blackouts and higher energy costs. According to estimates from the Electric Power Research Institute, data centers could consume 9% of U.S. electricity by 2030, roughly double their current share. A single large data center can consume as much electricity as hundreds of thousands of households.

 

The increasing power demand from artificial intelligence is particularly concerning. A query to an early AI model consumed about ten times the energy of a Google search, and newer chips require even more power. Experts warn that the future development of artificial intelligence may be limited by our ability to generate sufficient power.

 

Some countries face severe challenges. For example, data centers may account for 30% of Ireland's electricity consumption by 2026. In the U.S., data center electricity consumption is concentrated in 15 states, with Texas and Virginia the highest. California is in a critical situation, with new data centers expected to generate electricity demand exceeding the output of nuclear power plants.

 

 

IV Data Centers: Huge Energy Consumption and the Rise of Liquid Cooling

 

The computational demands of artificial intelligence are driving up server temperatures and carbon emissions, resulting in a significant increase in demand for cooling systems. Cooling systems account for 40% of a data center's total power consumption, making them the second-largest source of power consumption after the servers themselves. The global server cooling market is projected to grow from $20 billion in 2024 to $90 billion by 2027. Liquid cooling's share of the data center cooling market is expected to rise from 1% to 22%, with its market value projected to grow from $317 million to $7.8 billion over the next three years.
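
To put the market projections quoted above in perspective, the implied compound annual growth rate (CAGR) can be computed directly. This is a minimal sketch that assumes smooth compounding over exactly three years for both forecasts; the source gives only the start and end values, and the helper name `cagr` is illustrative.

```python
def cagr(start, end, years):
    """Compound annual growth rate implied by two values `years` apart."""
    return (end / start) ** (1 / years) - 1


# Figures quoted in the paragraph above, in US$ billions.
cooling_market = cagr(20.0, 90.0, 3)   # server cooling market, 2024 -> 2027
liquid_cooling = cagr(0.317, 7.8, 3)   # liquid cooling, "next three years"

print(f"Server cooling market: {cooling_market:.0%} per year")
print(f"Liquid cooling market: {liquid_cooling:.0%} per year")
```

Under that assumption, the overall cooling market forecast implies growth on the order of 65% per year, while the liquid cooling forecast implies the market nearly tripling each year.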

 

Liquid cooling solutions, which use water or other coolants to remove heat from servers, are becoming increasingly popular. Newer approaches include immersion cooling (submerging servers in non-conductive liquid) and direct liquid cooling (circulating coolant through cold plates mounted on heat-generating components). Although currently more expensive than air cooling, liquid cooling can reduce a data center's power consumption by 10% or more.
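
As a rough illustration of what a 10% reduction means in absolute terms, the sketch below estimates annual energy savings for a facility running at constant load. The 10 MW facility size is a hypothetical value chosen for the example, not a figure from the article, and the constant-load assumption is a simplification.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours


def annual_savings_mwh(facility_mw, reduction=0.10):
    """Energy saved per year (MWh) for a facility running at constant load."""
    return facility_mw * reduction * HOURS_PER_YEAR


# Hypothetical 10 MW data center with the 10% reduction cited above.
savings = annual_savings_mwh(10.0)
print(f"Estimated savings: {savings:,.0f} MWh per year")
```

The estimate scales linearly, so a larger facility or a larger percentage reduction can be evaluated by changing the two arguments.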

 

Research firm Global Market Insights predicts that the global liquid cooling market for data centers will grow from $2.1 billion in 2022 to $12.2 billion by 2032. A survey by the Uptime Institute found that 16% of data center managers believe liquid cooling will become the primary cooling method within 1-3 years, while 41% think it will take 4-6 years. Hybrid cooling methods are therefore more likely to prevail in the short term.

 

Upsite Technologies, a leader in the management of air cooling systems for data centers, noted that while technology is continually advancing, achieving 100% liquid-cooled data centers is unlikely in the short term, as liquid cooling equipment still requires air cooling for heat dissipation. Although liquid cooling is more efficient, it is challenging to implement on a large scale and requires significant upfront investment. Air cooling is less costly but also less efficient. Thus, hybrid cooling facilities are becoming increasingly popular to maximize the benefits of both liquid and air cooling.

 

 

V The Data Center Energy Crisis Calls for Urgent Action

 

Due to their environmental impact, data centers are receiving increasing scrutiny. Governments worldwide are implementing regulations to control their energy consumption and carbon footprint. China's "Green Data Center" guidelines and similar initiatives in Germany, Singapore, and Japan exemplify this trend. Industry experts such as Schneider Electric emphasize the need for comprehensive environmental metrics to assess the sustainability of data centers, including factors beyond energy use, such as water resources and waste generation.

 

The U.S. government is pressuring large tech companies to invest in clean energy and recognize the significant impact that the growing power demands of artificial intelligence have on the environment.

 

 

VI Finding Suitable Power Sources: Data Center and Energy Challenges

 

Data centers need a diversified energy mix to balance reliability and sustainability while meeting growing demands. Renewable energy sources like solar and wind are attractive due to their low carbon footprints. However, their dependence on weather conditions can lead to unstable output, making them unsuitable as the sole power source for data centers. Building redundant facilities to compensate for this inconsistency may be necessary but costly.

 

Nuclear power has emerged as a potential solution. Traditional nuclear power plants provide reliable base-load power, generating stable electricity crucial for data center operations. Moreover, the global nuclear power market is expected to achieve steady growth over the next decade.

 

Innovations in the nuclear sector offer further possibilities. Small modular reactors (SMRs) are being developed as smaller, safer, and more scalable alternatives to traditional nuclear power plants. Although still in the research and development stage, SMRs have the potential to be deployed directly at data centers, providing dedicated clean energy.

 

▲  Global liquid cooling market (US$b)

 

 

However, the widespread adoption of SMRs faces significant obstacles. Regulatory and manufacturing challenges could delay their commercial deployment by several years. The U.S. government is actively exploring solutions, including partnerships with tech giants to reduce costs and streamline processes.

 

Another way to reduce data center energy demands is to optimize artificial intelligence workloads. By shifting some AI tasks from the cloud to local devices with smaller, less resource-intensive AI models, overall energy consumption can be reduced.

 

The future development of data centers requires a multifaceted approach. Adopting a diversified energy mix, including reliable energy sources like nuclear power, while actively seeking renewable and innovative solutions like SMRs, is essential. Additionally, optimizing AI workloads on local devices can further promote the sustainability of data centers.

 

 

 
