Is Liquid Cooling the Ultimate Cooling Solution for AI Servers?
Aug 20, 2024
I Traditional Air Cooling is Overburdened; Liquid Cooling Technology Emerges
The three pillars of AI are chips, computing power, and algorithms. Chips and computing power are the productivity drivers, and to achieve extraordinary computing power, people continuously seek faster processors and more efficient hardware. As AI computing power increases and the performance of CPUs and GPUs improves, chip power consumption and heat generation rise accordingly. For example, NVIDIA's H200 series GPU chips have a TDP (Thermal Design Power) of up to 700W.
Traditional air cooling has its limits. The principle of air cooling is to let air flow over the surface of heat sinks to dissipate heat. Currently, the best air coolers on the market can handle up to 275W of power. Traditional air cooling can no longer meet the cooling requirements of mainstream AI servers.
Liquid cooling technology has emerged. The principle of liquid cooling is to circulate a liquid coolant within the server, absorbing heat from the heat-generating components through thermal exchange. Liquid cooling offers higher cooling efficiency and allows precise control of the coolant to adapt to different working environments, though it comes with higher costs.
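To make the efficiency gap concrete, the sketch below applies the sensible-heat relation Q = m_dot * c_p * delta_T to the 700W example above. The fluid properties are approximate room-temperature textbook values and the 10 K coolant temperature rise is an assumption chosen for illustration; neither comes from the article.

```python
# Approximate fluid properties near room temperature (assumed textbook values).
AIR = {"density": 1.2, "cp": 1005.0}      # kg/m^3, J/(kg*K)
WATER = {"density": 997.0, "cp": 4186.0}  # kg/m^3, J/(kg*K)


def volumetric_flow_m3_per_s(heat_w, fluid, delta_t_k):
    """Volume flow needed to carry heat_w when the coolant warms by delta_t_k."""
    mass_flow_kg_per_s = heat_w / (fluid["cp"] * delta_t_k)
    return mass_flow_kg_per_s / fluid["density"]


heat_w = 700.0   # chip power from the example above
delta_t_k = 10.0  # assumed coolant temperature rise

air_flow = volumetric_flow_m3_per_s(heat_w, AIR, delta_t_k)
water_flow = volumetric_flow_m3_per_s(heat_w, WATER, delta_t_k)

print(f"air:   {air_flow * 1000:.1f} L/s")
print(f"water: {water_flow * 1000 * 60:.2f} L/min")
print(f"air needs about {air_flow / water_flow:.0f}x the volume flow of water")
```

Under these assumptions, air needs on the order of a few thousand times the volume flow of water to carry the same heat, which is the basic reason liquid loops scale to higher chip power.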
Liquid cooling has gained official recognition. In June 2023, China's three major telecom operators jointly released the "White Paper on Liquid Cooling Technology for Telecom Operators (2023)," clearly promoting two technical routes: cold plate liquid cooling and single-phase immersion liquid cooling. They also outlined a promotion plan for 2023-2025:

▲ A promotion plan for 2023-2025
II What is Liquid Cooling?
Liquid cooling uses a liquid coolant to transfer the heat generated by the internal components of data center IT equipment to the outside, keeping the heat-generating parts within a safe operating temperature range. Mainstream liquid cooling solutions fall into two types: cold plate and immersion. Immersion liquid cooling is further divided into phase-change and non-phase-change immersion cooling, depending on whether the coolant undergoes a phase change.

▲ Liquid cooling technology
Cold plate liquid cooling uses a cold plate, a heat sink made of a thermally conductive metal such as copper or aluminum, mounted on the heat-generating component. Heat is conducted through the plate into the liquid coolant flowing inside it, and the coolant carries the heat away.
Immersion liquid cooling is a direct-contact method in which IT equipment, such as servers (including CPUs, memory, I/O, etc.), is fully submerged in the cooling liquid. Heat is removed as the liquid circulates. In phase-change immersion cooling, which is developing rapidly, the coolant absorbs heat from the components and vaporizes; the vapor is then condensed back into liquid for reuse. This method offers higher cooling efficiency and lower noise.
III Should All Server Cooling Solutions Be Replaced with Liquid Cooling?
Obviously not. Data center cooling methods include both forced air cooling and liquid cooling. While air cooling technology is relatively mature, liquid cooling is a newer requirement driven by the increased cooling needs of data centers in recent years.
Currently, data centers primarily have three cooling solutions:
1. Designing new data centers that rely solely on liquid cooling, creating smaller, more efficient data centers with massive computing power.
2. Designing data centers that primarily use air cooling but include liquid cooling options to simplify future transitions.
3. Integrating liquid cooling into existing air-cooled facilities, often by converting part of the air cooling system to liquid cooling.

▲ Recommended Cooling Methods for Different Types of Data Centers
IV What Challenges Does Liquid Cooling Face?
1. Cost Issues
Liquid cooling systems are expensive. Materials (coolant, heat sinks, piping, etc.) are costly, design and manufacturing are complex, installation and maintenance costs are high, and achieving high reliability requires further investment.
2. Reliability Issues
Stability: The chemical stability of the liquid coolant may be insufficient, leading to degradation, oxidation, or the production of harmful substances during use, affecting the cooling effect and system stability.
Leakage: In the event of liquid leaks, pipeline blockages, or other incidents, existing indirect liquid cooling systems may fail to detect faults in time, compromising server safety.
Compatibility: Some data center equipment may not be suitable for liquid cooling systems, requiring additional adaptation and modification.
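The leakage and blockage risks above are typically addressed by watching loop flow and pressure. The following is a minimal sketch of such a check, not a description of any vendor's detection logic: the `LoopReading` fields, the `diagnose` function, and both tolerance values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LoopReading:
    supply_flow_lpm: float  # coolant flow sent to the servers, L/min
    return_flow_lpm: float  # coolant flow coming back, L/min
    pressure_kpa: float     # supply-side pressure, kPa


def diagnose(reading, nominal_flow_lpm, nominal_pressure_kpa,
             flow_tol=0.05, pressure_tol=0.15):
    """Return a list of suspected faults (tolerances are illustrative)."""
    faults = []
    # Coolant that leaves on the supply side but does not return suggests a leak.
    missing_flow = reading.supply_flow_lpm - reading.return_flow_lpm
    if missing_flow > flow_tol * nominal_flow_lpm:
        faults.append("possible leak")
    # Low flow combined with high pressure suggests a restriction in the loop.
    low_flow = reading.supply_flow_lpm < (1 - flow_tol) * nominal_flow_lpm
    high_pressure = reading.pressure_kpa > (1 + pressure_tol) * nominal_pressure_kpa
    if low_flow and high_pressure:
        faults.append("possible blockage")
    return faults


print(diagnose(LoopReading(20.0, 18.0, 200.0), 20.0, 200.0))
print(diagnose(LoopReading(12.0, 12.0, 260.0), 20.0, 200.0))
```

A real system would also need sensor filtering and dedicated leak sensors; the point here is only that both fault types leave a signature in readings the loop already has.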
V What Are the Upstream and Downstream Industries in Liquid Cooling?
The liquid cooling industry ecosystem includes upstream component suppliers, midstream liquid-cooled server providers, and downstream computing power users, with a focus on cold plate and immersion liquid cooling systems.

▲ Cold Plate Liquid Cooling System Principle

▲ Immersion Liquid Cooling System Principle
1. Upstream

▲ Upstream of Cold Plate Liquid Cooling System
The cold plate liquid cooling system mainly consists of a CDU, cold source, coolant, liquid cooling pipelines, and liquid cooling cabinet. The immersion liquid cooling system mainly consists of a CDU, cold source, liquid cooling pipelines, immersion chamber, IT equipment, and coolant.
Cooling Distribution Unit (CDU)
The CDU exchanges heat between the high-temperature coolant on the secondary side and the cold source on the primary side. It distributes coolant to the liquid-cooled IT equipment and monitors temperature, pressure, and flow. Its functions include heat exchange, circulation driving, coolant purification, and liquid storage. The CDU mainly consists of a heat exchanger/condenser, circulation pump, filter, liquid storage tank, and accessories (valves, pipelines, connectors, sensors, etc.).
Cold Source
Depending on site conditions, the outdoor cold source of the liquid cooling system can be a dry cooler, a closed cooling tower, or a chiller.
Liquid Cooling Pipelines
Pipelines are necessary components that connect the heat exchange cold plate, cooling distribution unit, heat exchange unit, and outdoor cold source. Generally, the circulation pipelines can be divided into direct connection (asynchronous) and loop connection (synchronous) types based on their different connection methods.
Coolant
Common coolants for cold plate systems include water, ethylene glycol, propylene glycol, etc. These coolants have high specific heat capacity and low viscosity, allowing them to quickly absorb the heat generated by equipment and transfer it to the heat exchanger for dissipation.
Common immersion coolants (insulating) include mineral oil, fluorocarbon compounds, etc. Among them, phase-change cooling can only use non-flammable fluorocarbon compounds. The compatibility between the coolant medium and the materials of various components is one of the key factors affecting the performance of the immersion phase-change liquid cooling system.
Currently, there are few types of fluorinated liquids available for immersion phase-change cooling, with the main supplier being 3M. China's immersion liquid cooling is still in its infancy.
Sealed Chamber
The chamber contains the phase-change coolant. It must be sealed at the cold plate, pipelines, connectors, and other parts so that no coolant leaks, which protects the integrity of power and signal transmission and the safety of the data center. Custom sealing components are therefore required, such as rubber gaskets, liquid outlet connectors, power connectors, and signal connectors.
Liquid Cooling Cabinet
The liquid cooling cabinet mainly consists of a liquid cooling plate, manifold, pipelines, connectors, quick connectors, liquid distributor, and optional compressor.
The liquid cooling plate is the heat transfer unit, equivalent to the evaporator in a refrigeration system, and is a key part of the cooling system design. It generally consists of a cover plate, fins, base plate, and seal.
The manifold is a device that connects the cooling distribution unit to the liquid cooling server cold plate. Typically built into the cabinet, it evenly distributes the coolant flow to each cold plate layer and collects the coolant after it absorbs heat, sending it through connecting pipes to the cooling distribution unit.
The quick connector comprises a main body and an end terminal. The main body is the primary unit for achieving the liquid quick connector's connection and sealing functions, while the end terminal serves as the installation port for mounting and securing the liquid quick connector.
Control System
The control system monitors and regulates the operation of the cooling system. It uses temperature and pressure sensors to track how much heat is being removed and adjusts the pump speed or coolant flow rate in response.
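As an illustration of this kind of regulation, here is a minimal sketch of a proportional control step that raises pump speed when the coolant outlet temperature is above target and lowers it when below. The function name, the gain, and the speed limits are assumptions made for the example; production controllers are usually more elaborate (for example, full PID with rate limits).

```python
def next_pump_speed(current_speed_pct, outlet_temp_c, target_temp_c,
                    gain_pct_per_c=5.0, min_pct=20.0, max_pct=100.0):
    """One proportional control step for pump speed, in percent of full speed."""
    # Positive error means the coolant is running hotter than desired.
    error_c = outlet_temp_c - target_temp_c
    proposed = current_speed_pct + gain_pct_per_c * error_c
    # Clamp so the pump never stalls and never exceeds its rated speed.
    return max(min_pct, min(max_pct, proposed))


speed = 50.0
for outlet_temp in (47.0, 46.0, 45.0, 44.0):  # hypothetical sensor readings, deg C
    speed = next_pump_speed(speed, outlet_temp, target_temp_c=45.0)
    print(f"outlet {outlet_temp:.1f} C -> pump {speed:.0f}%")
```

The lower clamp keeps some coolant moving even when the load is light, which is a common design choice for protecting components from sudden load spikes.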
Filtration and Purification Devices
Filters are used to remove debris and particles from the liquid coolant, ensuring system performance and longevity. Methods such as UV sterilization or chemical treatment can purify the coolant.
2. Midstream
The midstream primarily consists of liquid cooling system and liquid-cooled server providers, which integrate key components such as radiators, pumps, pipes, and coolants into a single system. These companies need a certain level of technical expertise and production capacity to ensure product quality and performance.
3. Downstream
Downstream applications include data centers, supercomputers, AI servers, electronic devices, new energy vehicles, lasers, inverters, and other industrial equipment. Liquid cooling technology is even used in the aerospace field. Because of their high cooling efficiency, liquid-cooled data centers and their infrastructure are mainly applied in high-performance computing. Currently, industries using liquid cooling technology include the internet, finance, telecommunications, energy, biology, and healthcare.
Among these, the internet industry is the core customer of liquid-cooled data centers, with significant data demand, a wide user base, and large business volumes. Its data centers have high computing power requirements, with single-cabinet power densities exceeding 10kW.
VI At What Stage is the Industrialization of Liquid Cooling?
Currently, most validation work on liquid-cooled data centers has been completed, and the next three years will see liquid cooling technology deployed on a large scale. On the technical front, cold plate cooling remains the mainstream liquid cooling solution in the market. On the industrial development front, the full-stack delivery model, from infrastructure to AI equipment, has become an industry consensus. On the customer front, the internet and telecommunications sectors, the two largest customer groups for data centers, have responded positively to liquid cooling technology.
According to the 2023 semi-annual report disclosed by relevant industry companies, their liquid-cooled data centers have accumulated over 260MW of capacity, involving multiple industries such as technology, energy, government, finance, and the internet.
VII How Big is the Market for AI Liquid Cooling?
According to data from Caitong Securities, the market size of China's liquid-cooled data centers will grow from 6.999 billion yuan in 2022 to over 35.877 billion yuan in 2025, with the AI data center liquid cooling market size increasing from 5.552 billion yuan in 2022 to 27.964 billion yuan in 2025, representing a compound annual growth rate of 76.2%. Based on product classification, immersion liquid cooling products, with their higher cooling efficiency and the ability to effectively reduce the PUE (Power Usage Effectiveness) of data centers, are expected to capture a larger market share, growing from 18% in 2019 (with cold plate liquid cooling products accounting for 82%) to 41% in 2025.
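The growth rates implied by the market-size figures quoted above can be checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. This is a quick arithmetic sketch using only the numbers in the paragraph; the three-year span from 2022 to 2025 is an assumption about how the period is counted.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes in billions of yuan, as quoted above (2022 -> 2025).
total_market = cagr(6.999, 35.877, years=3)
ai_market = cagr(5.552, 27.964, years=3)

print(f"liquid-cooled data centers overall: {total_market:.1%}")
print(f"AI data center liquid cooling:      {ai_market:.1%}")
```

With the figures as quoted, both series work out to a little over 70% per year across the three years, somewhat below the 76.2% cited, so the cited rate may have been computed over a different period or basis in the source report.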
As the market demands higher cooling and energy efficiency for data center infrastructure, the industry predicts that the penetration rate of liquid cooling products will reach 20% by 2025, with market share expanding continuously. The springtime of large-scale applications for AI liquid-cooled servers is quietly approaching!
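PUE, the energy efficiency metric mentioned above, is the total energy used by the facility divided by the energy used by the IT equipment, so a value closer to 1.0 means less overhead. The sketch below shows how lower cooling overhead moves the number; all of the kW figures are hypothetical and are not measurements from any data center.

```python
def pue(it_load_kw, cooling_kw, other_overhead_kw):
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    total_kw = it_load_kw + cooling_kw + other_overhead_kw
    return total_kw / it_load_kw


# Hypothetical facility with a 1000 kW IT load and 100 kW of non-cooling overhead.
higher_cooling_overhead = pue(1000.0, cooling_kw=400.0, other_overhead_kw=100.0)
lower_cooling_overhead = pue(1000.0, cooling_kw=100.0, other_overhead_kw=100.0)

print(f"PUE with 400 kW of cooling: {higher_cooling_overhead:.2f}")
print(f"PUE with 100 kW of cooling: {lower_cooling_overhead:.2f}")
```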
VIII What Are the Well-Known Liquid Cooling Companies?
The major players in the overseas cold plate liquid cooling market include IBM, CoolIT Systems, Asetek, and Motivair, while the main players in the immersion liquid cooling market are GRC, LiquidStack, and Midas.
