Liquid Cooling - The Next Generation Cooling Technology for Data Centers and Server Clusters

Sep 19, 2024

 

I Overview of the Liquid Cooling Industry

 

1. Development History of Liquid Cooling

The history of liquid-cooled servers goes back to 1967, when IBM became the first to develop a computer with a cold-water cooling system, the System/360.

 

Although China's liquid cooling industry developed relatively late, its technology has rapidly advanced. In 2011, Sugon pioneered the liquid cooling industry in China. Over the next five years, Chinese companies such as Huawei, Inspur, and Sugon began to enter the market and ramp up production. Since 2019, liquid cooling technology has made significant advancements across major manufacturers.

 

▲ Development History of the Liquid Cooling Industry at Home and Abroad

 

2. Liquid-Cooled Server Industry Chain

The liquid cooling industry ecosystem encompasses the upstream, midstream, and downstream sectors, including upstream component suppliers, midstream liquid-cooled server providers, and downstream computing power users.

Upstream: Mainly product components and liquid cooling equipment suppliers, such as quick connectors, CDUs, solenoid valves, immersion liquid cooling tanks, manifolds, and coolants.

 

Midstream: Primarily liquid-cooled server and chip manufacturers, as well as liquid cooling integrated facilities, modules, and cabinets.

 

Downstream: Includes the three major telecom operators, internet companies such as Baidu, Alibaba, Tencent, and JD, as well as clients in the information industry for telecom, internet, government, finance, transportation, and energy applications.

 

▲ Overview of the Liquid-Cooled Server Industry Chain

 

 

II Basic Concepts of Liquid Cooling

 

1. Classification of Liquid Cooling Technology

Liquid cooling uses liquid as the coolant, circulating it to transfer heat from internal components of IT equipment in data centers to the outside, ensuring safe operation.

 

Advantages of Liquid Cooling: It removes heat very efficiently, supports very high heat density, and is largely unaffected by altitude, region, or ambient temperature.

 

Currently, there are three main types of liquid cooling technology: cold plate liquid cooling, spray cooling, and immersion liquid cooling.

 

▲ Classification of Liquid Cooling Technologies

 

2. Comparison of Three Liquid Cooling Technologies - Cold Plate Liquid Cooling

Cold plate liquid cooling uses a closed cavity made of thermally conductive metals (like copper or aluminum) to indirectly transfer heat from the heat-generating components to the liquid coolant circulating in a closed loop. The system typically consists of a cooling tower, CDU, primary and secondary liquid cooling pipelines, coolant, and liquid-cooled cabinets. Liquid-cooled cabinets contain components like cold plates, internal piping, fluid connectors, and manifolds.

 

Cold plate liquid cooling is a non-contact method with more than 10 years of research behind it, and it is the most mature of the three mainstream liquid cooling solutions. It is an effective way to deploy high-power equipment, improve energy efficiency, reduce cooling operating costs, and lower total cost of ownership (TCO). However, because it does not achieve 100% liquid cooling, its energy savings are limited when cabinet power consumption is low or when only a small share of the heat is removed by liquid. Additionally, cold plates must be designed around existing component layouts, which makes structural design and implementation more complex and standardization harder to promote.

 

▲ Principle Diagram of Cold Plate Liquid Cooling System

 

3. Comparison of Three Liquid Cooling Technologies - Immersion Liquid Cooling

Immersion liquid cooling involves completely submerging heat-generating components in a coolant, allowing direct contact between the components and the coolant for heat exchange. The system's outdoor components include a cooling tower, primary pipeline, and primary coolant, while indoor components include a CDU, immersion chamber, IT equipment, secondary pipeline, and secondary coolant. Since IT equipment is fully submerged in the secondary coolant, non-conductive liquids such as mineral oil, silicone oil, or fluorinated liquids must be used.

 

Based on whether there is a phase change during heat exchange, immersion liquid cooling can be divided into two types:

 

1) Single-phase immersion cooling: The secondary coolant only undergoes a temperature change during heat transfer, without a phase change, and the heat is transferred entirely through sensible heat.

2) Two-phase immersion cooling: The secondary coolant undergoes a phase change during heat transfer, using latent heat to transfer the heat.
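The difference between the two types can be made concrete with the textbook heat-transfer relations: sensible heat is Q = m * c_p * ΔT, and latent heat is Q = m * h_fg. The sketch below only illustrates those two formulas. The function names and all fluid properties are hypothetical round numbers, not data for any real coolant.

```python
# Illustrative comparison of sensible vs. latent heat transfer.
# All fluid properties used here are assumed round numbers, NOT data
# for a real immersion coolant.


def sensible_heat_kj(mass_kg: float, specific_heat_kj_per_kg_k: float,
                     delta_t_k: float) -> float:
    """Q = m * c_p * dT: heat absorbed by a liquid that only warms up
    (single-phase immersion cooling)."""
    return mass_kg * specific_heat_kj_per_kg_k * delta_t_k


def latent_heat_kj(mass_kg: float, heat_of_vaporization_kj_per_kg: float) -> float:
    """Q = m * h_fg: heat absorbed when the liquid boils at constant
    temperature (two-phase immersion cooling)."""
    return mass_kg * heat_of_vaporization_kj_per_kg


if __name__ == "__main__":
    mass = 1.0      # kg of coolant (assumed)
    c_p = 1.1       # kJ/(kg*K), hypothetical specific heat
    delta_t = 10.0  # K, assumed temperature rise across the tank
    h_fg = 100.0    # kJ/kg, hypothetical heat of vaporization

    print(f"single-phase: {sensible_heat_kj(mass, c_p, delta_t):.1f} kJ per kg")
    print(f"two-phase:    {latent_heat_kj(mass, h_fg):.1f} kJ per kg")
```

With these assumed values, a kilogram of boiling coolant absorbs several times more heat than a kilogram that only warms by 10 K. This is the general reason two-phase systems can carry more heat per unit of coolant; the actual ratio depends on the fluid chosen.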

 

Compared to traditional air cooling and cold plate liquid cooling, immersion liquid cooling offers advantages such as energy savings (PUE < 1.13), compactness, high reliability, and low noise. However, it also faces challenges such as component selection limitations, maintenance restrictions, and specific room environment requirements.

 

▲ Two-phase Immersion Cooling

 

4. Comparison of Three Liquid Cooling Technologies - Spray Cooling

Spray cooling is a direct contact method where coolant is precisely sprayed onto chip-level components through gravity or system pressure, cooling the heat-generating components or the thermally conductive elements connected to them. The system typically consists of a cooling tower, CDU, primary and secondary liquid cooling pipelines, coolant, and spray-cooled cabinets. Spray-cooled cabinets usually contain a piping system, liquid distribution system, spray modules, and return system.

 

Spray cooling also achieves 100% liquid cooling, and its structure is more innovative than immersion cooling. However, its energy-saving performance is inferior to immersion liquid cooling, and it shares similar limitations with immersion cooling.

 

▲ Spray Cooling

 

5. Comparison of Three Liquid Cooling Technologies

 

▲ Comparison of Data Center Liquid Cooling Methods

 

 

III Drivers of Liquid Cooling Industry Development

 

1. The Surge in Data Volumes is Driving the Continuous Upgrade of Computing Power

Global data volume and computing power are growing rapidly. According to IDC, the global datasphere reached 103.66 ZB in 2022, and China's data volume is expected to grow from 23.88 ZB in 2022 to 76.6 ZB in 2027, a CAGR of 26.3% that could be the fastest in the world. IDC also predicts that the data generated globally over the next three years will exceed all the data created over the past 30 years, driving exponential growth in the computing power required for data storage, transmission, and processing.
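As a quick check on the growth figures quoted here, the compound annual growth rate (CAGR) can be recomputed from the start and end values. This is a minimal sketch using the standard CAGR formula; the function name `cagr` is illustrative and is not taken from IDC.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # China's data volume as quoted in the text: 23.88 ZB (2022) to 76.6 ZB (2027).
    rate = cagr(23.88, 76.6, 2027 - 2022)
    print(f"Implied CAGR 2022-2027: {rate:.1%}")
```

The result agrees with the 26.3% quoted from IDC to within rounding. The same function can be applied to the other forecasts in this article.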

 

The intelligent upgrade of computing power is becoming a trend, with intelligent computing driving most of the growth. The need to process large volumes of complex data calls for more powerful and efficient computing resources to support artificial intelligence applications. As a result, computing infrastructure is being built at a faster pace and is becoming the "foundation" of the digital economy. Demand for data capabilities and demand for computing power reinforce each other. IDC predicts that China's intelligent computing power will continue to grow rapidly, reaching 1,117.4 EFLOPS by 2027, with a CAGR of 33.9% from 2022 to 2027.

 

▲ Global Data Volume Scale and Forecast

 

2. AIGC's Leap Forward is Driving a Surge in Computing Power Demand

As models and algorithms continue to evolve, parameter counts and model complexity have increased significantly, raising computing power requirements. The emergence of large generative models, represented by ChatGPT and GPT-4, has accelerated the development of the AIGC field and pushed computing power demand even higher.

 

Training Side: The GPT-3 model contains about 174.6 billion parameters, and training it once requires approximately 3,640 PF-days (equivalent to running for 3,640 days at a sustained 1 petaFLOPS, that is, 10^15 floating-point operations per second). The number of parameters in GPT-4 could reach 1.8 trillion, with training demand rising to about 68 times that of GPT-3 and requiring 90-100 days of training on 25,000 A100 GPUs.

Inference Side: For GPT-3, according to estimates by Tianyi Think Tank, the computational demand for generating 500 tokens (about 350 words) reaches 1.75 PFLOPS.
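To make the training figures easier to compare, the sketch below converts PF-days into a total operation count and into wall-clock days at a given sustained throughput. It assumes the usual definition of one PF-day as 10^15 floating-point operations per second sustained for 24 hours. The function names are illustrative.

```python
SECONDS_PER_DAY = 86_400
PETA = 1e15  # one petaFLOPS = 1e15 floating-point operations per second


def pf_days_to_flop(pf_days: float) -> float:
    """Total floating-point operations represented by a PF-day budget."""
    return pf_days * PETA * SECONDS_PER_DAY


def days_at_sustained_rate(pf_days: float, sustained_pflops: float) -> float:
    """Wall-clock days needed to spend the budget at a sustained rate."""
    if sustained_pflops <= 0:
        raise ValueError("sustained_pflops must be positive")
    return pf_days / sustained_pflops


def gpu_days(gpu_count: int, days: float) -> float:
    """GPU-days consumed by running gpu_count GPUs for the given days."""
    return gpu_count * days


if __name__ == "__main__":
    budget = 3_640  # PF-days quoted in the text for one GPT-3 training run
    print(f"Total operations: {pf_days_to_flop(budget):.3e} FLOP")
    print(f"Days at 1 PFLOPS:  {days_at_sustained_rate(budget, 1.0):.0f}")
    print(f"Days at 10 PFLOPS: {days_at_sustained_rate(budget, 10.0):.0f}")
    # GPT-4 estimate quoted in the text: 25,000 A100 GPUs for 90-100 days.
    print(f"GPU-days: {gpu_days(25_000, 90):,.0f} to {gpu_days(25_000, 100):,.0f}")
```

This is unit arithmetic only. It says nothing about the utilization that real clusters achieve, which is typically well below peak.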

 

▲ Evolution of Large Model Parameters (2018-2023)


3. The Data Center Market Continues to Expand

Data centers are an important part of new infrastructure. In recent years, with the rapid development of the artificial intelligence and data application industry chains, data center construction in China has accelerated and the number of racks has grown steadily. Counted in standard 2.5 kW racks, China had 5.2 million data center racks in use in 2021. Racks in large-scale data centers grew even faster, reaching 4.2 million, or about 80% of the total. By the end of 2022, the total number of data center racks in China approached 6 million, ranking among the highest in the world. Forecasts put the number of data center racks in China at 7.76 million in 2023, with the data center market reaching 247 billion RMB.

 

▲ Forecast of the Total Number of Data Center Racks in China (2017-2023) (Unit: 10,000 Racks)

 

4. Energy Consumption and Heat Dissipation Issues in Data Centers Are Becoming More Prominent

With the increasing number of data centers, their electricity consumption has also soared. According to relevant statistics, in 2021, China's data centers consumed 216.6 billion kWh, and it is expected to exceed 380 billion kWh by 2030. At the same time, the rapidly growing scale of data centers is exacerbating heat dissipation challenges:

 

Overall level: Traditional data centers have enormous energy costs, and cooling accounts for a high proportion of their energy consumption. Data centers have long been energy-intensive: nationally, they account for about 2%-3% of total electricity consumption. By 2030, with data center electricity consumption expected to exceed 380 billion kWh, carbon emissions are expected to surpass 200 million tons. According to the "Uptime Institute Global Data Center Survey Report 2022," the average annual Power Usage Effectiveness (PUE) of the global data center sample was 1.55 in 2022, and the average annual PUE has stayed between 1.55 and 1.65 since 2014. Because PUE is the ratio of total facility energy to IT equipment energy, this means that non-IT overhead, most of it cooling, accounts for roughly 35%-39% of total energy consumption.
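The link between a PUE value and the 35%-39% figure follows from the definition of PUE as total facility energy divided by IT equipment energy. The sketch below works through that arithmetic. The function name is illustrative. Note that the non-IT share also includes power distribution losses and lighting, so reading all of it as cooling is an upper bound.

```python
def non_it_share(pue: float) -> float:
    """Fraction of total facility energy NOT consumed by IT equipment.

    PUE = total_energy / it_energy, so it_energy / total_energy = 1 / PUE
    and the remaining share is 1 - 1 / PUE.
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return 1.0 - 1.0 / pue


if __name__ == "__main__":
    # 1.55 and 1.65 are the survey values quoted in the text; 1.13 is the
    # immersion-cooling bound quoted earlier in the article.
    for pue in (1.65, 1.55, 1.13):
        print(f"PUE {pue:.2f}: non-IT share = {non_it_share(pue):.1%}")
```

At a PUE of 1.13, the bound quoted for immersion cooling, the same formula gives an overhead share of under 12%.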

 

Micro level: Increasing computing density creates cooling challenges. Improvements in computing performance drive up server power consumption and thermal density. As Moore's Law fades, the industry keeps improving the energy efficiency of chips and systems through technologies such as heterogeneous computing, but this has also led to a rapid increase in the power consumption of individual chips. Mainstream processor chips currently draw about 200W, some newly released CPUs exceed 350W, and heterogeneous acceleration chips such as GPGPUs have surpassed 700W. Traditional air cooling can no longer meet the heat dissipation needs of such devices, so data centers and servers require more efficient cooling technologies for high-power, high-thermal-density, and high-computing-density chips and systems.

 

Micro level: High temperatures negatively impact electronic components. In high-temperature environments, machine materials, wire insulation, and waterproof seals age faster, creating safety hazards. More than half of electronic component failures are caused by high temperatures. When the temperature of a semiconductor component rises by 10°C, its reverse leakage current doubles, which raises the risk of fire and other safety incidents and can ultimately bring a data center down.
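The "doubles every 10°C" rule of thumb quoted above compounds quickly. The sketch below applies that rule as stated in the text. It is not a device model, and the function name and the 10°C default are illustrative assumptions.

```python
def leakage_multiplier(delta_t_c: float, doubling_step_c: float = 10.0) -> float:
    """Relative reverse leakage current after a temperature rise of delta_t_c,
    under the rule of thumb that it doubles every doubling_step_c degrees."""
    if doubling_step_c <= 0:
        raise ValueError("doubling_step_c must be positive")
    return 2.0 ** (delta_t_c / doubling_step_c)


if __name__ == "__main__":
    for rise in (10, 20, 30):
        print(f"+{rise} degC -> leakage x{leakage_multiplier(rise):.0f}")
```

Under this rule, a 30°C rise multiplies leakage by eight, which is why holding component temperatures down matters more than a linear intuition would suggest.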

 

5. Rapid Increase in Single-Rack Power Density Calls for a Cooling Revolution in Data Centers

Limited by data center construction area and environmental regulations, increasing single-rack power density has become a key solution to reconcile the growing demand for computing power with the limited capacity of data centers. According to data released by Colocation America, in 2020, the global average single-rack power of data centers reached 16.5kW, a 175% increase compared to 2008. According to CCID Consulting, with the rapid increase in data center computing power, high-power single racks will become widespread. It is expected that by 2025, the global average single-rack power of data centers will reach 25kW.
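The rack-density figures can be turned into a rough sense of floor-space impact. The sketch below first recovers the 2008 baseline implied by "16.5kW, a 175% increase", then counts how many racks a fixed IT load would need at each density. The 1,000 kW load is a hypothetical example, not a figure from the sources cited, and the function names are illustrative.

```python
import math


def baseline_from_increase(current: float, percent_increase: float) -> float:
    """Value before a given percentage increase: current / (1 + p / 100)."""
    return current / (1.0 + percent_increase / 100.0)


def racks_needed(total_it_load_kw: float, rack_power_kw: float) -> int:
    """Whole racks required to host a given IT load at a given rack density."""
    if rack_power_kw <= 0:
        raise ValueError("rack_power_kw must be positive")
    return math.ceil(total_it_load_kw / rack_power_kw)


if __name__ == "__main__":
    kw_2008 = baseline_from_increase(16.5, 175.0)
    print(f"Implied 2008 average: {kw_2008:.1f} kW per rack")

    load_kw = 1_000.0  # hypothetical 1 MW IT load, for illustration only
    for label, density in (("2008", kw_2008), ("2020", 16.5), ("2025 forecast", 25.0)):
        print(f"{label}: {racks_needed(load_kw, density)} racks")
```

For the same load, the rack count falls to roughly a quarter between the implied 2008 density and the 2025 forecast. The heat that each remaining rack must reject rises by the same factor.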

 

Liquid cooling technology, with its highly efficient cooling effect, can significantly improve the efficiency and stability of servers, while allowing more servers to be arranged in a given data center space, thereby improving data center operational efficiency.

 

▲ Correspondence Between Rack Density and Cooling Methods

 

 

IV Liquid Cooling Market Outlook and Application

 

1. China's Liquid Cooling Server Market is Expected to Reach $8.9 Billion by 2027

Two trends have driven explosive growth in China's liquid cooling server market in recent years: the green development of data centers, and intensifying competition in artificial intelligence, which has sharply increased demand for high-performance computing power.

 

According to IDC data, the scale of China's liquid cooling server market reached $1.01 billion in 2022, a year-on-year growth of 189.9%. In 2023, the market is expected to maintain rapid growth, with the market size projected to reach $1.51 billion. By 2027, the size of China's liquid cooling server market is expected to reach $8.9 billion.

 

▲ Market Size and Forecast of Liquid Cooling Servers in China (Unit: 100 Million USD)

 

2. Industry Application Structure of Liquid Cooling Data Centers in China

Liquid cooling data centers, working alongside air cooling, support the development of various industries. The data center cooling market is expected to settle into a pattern in which air cooling and liquid cooling develop side by side. Liquid cooling will not completely replace air cooling; instead, the cooling solution for each data center will be chosen according to the customer's needs.

 

In 2019, liquid cooling data centers were mainly used in applications represented by supercomputing. As business volumes in the internet, finance, and telecommunications industries grow rapidly, demand for liquid cooling in these industries' data centers will continue to increase. It is expected that by 2025 the internet industry will account for 24.0% of liquid cooling data center applications, the financial industry for 25.0%, and the telecommunications industry for 23.0%. Meanwhile, industries such as energy, biotechnology, healthcare, and government will integrate liquid cooling data centers into the new ecosystem of general-purpose data centers at a faster pace, although their share of the total will decline slightly. By 2025, the energy industry is expected to account for 10.5%, biotechnology for 8.5%, and healthcare for 6.5%, while other sectors such as government will drop to 2.5%.

 

▲ Industry Application Structure and Forecast for Liquid Cooling Data Centers in China (2019-2025)

 

3. Competitive Landscape of China's Liquid Cooling Server Market: Sugon Leads, Followed by Huawei, Alibaba, and Others

Domestic manufacturers, led by Sugon, have accumulated substantial commercial experience, and based on product revenue, market share, customer feedback, and other indicators, Sugon is a key market leader, with Huawei, Alibaba, and Lenovo following closely behind. IBM China is positioned as a potential challenger.

 

The liquid cooling server industry in China has high technical barriers, and first-movers have an advantage. Currently, major domestic manufacturers are still in the experimental or initial application stages of liquid cooling technology, and the competitive landscape has not yet been clearly defined. In addition, due to data security concerns, there are certain geographical barriers to the supply of data center infrastructure in China, making it difficult for foreign manufacturers to enter the Chinese market.

 

▲ Competitiveness Matrix of Liquid Cooling Data Center Vendors in China (2020)

 

4. Telecom Operators Propose a Three-Year Vision for Liquid Cooling in Data Centers

As industry leaders in the data center sector, the three major telecom operators are at the forefront of liquid cooling technology exploration and application. In June 2023, the three major operators jointly released the "White Paper on Liquid Cooling Technology for Telecom Operators (2023)" and outlined a three-year vision and roadmap for liquid cooling applications.

 

The overall goal of the three-year vision is to bring together industry forces, tackle challenges, build an ecosystem, and expand applications by uniting upstream and downstream players across industry, academia, and research. The focus is on original, leading-edge work on key core technologies and on building a high-level liquid cooling ecosystem. The operators aim to establish an open ecosystem, promote the decoupling of liquid cooling cabinets from servers, and lead the formation of unified standards that reduce PUE (Power Usage Effectiveness) while achieving the lowest TCO (Total Cost of Ownership). They also plan to use their scale to expand applications significantly.

 

In terms of the implementation timeline, the three major operators have laid out detailed plans for 2023-2025.

 

  • 2023: Conduct technical verification, thoroughly testing liquid cooling performance, reducing PUE, and preparing planning, construction, and maintenance technical capabilities;
  • 2024: Conduct large-scale testing, promote the decoupling of liquid cooling cabinets and servers, encourage competition, mature the industry ecosystem, and reduce total lifecycle costs;
  • By 2025: Achieve large-scale application, with liquid cooling technology being used in more than 50% of projects, and jointly promote the formation of a unified, standardized, cost-optimal, and widely applied liquid cooling ecosystem. The telecommunications industry aims to become a leader in liquid cooling technology, a leader in the industrial chain, and a leader in promoting applications.

 

▲ Three-Year Vision for Liquid Cooling by Telecom Operators

 

 

V Conclusion

 

Great Development Potential in the Liquid Cooling Industry, Along with Challenges and Opportunities

Currently, China's liquid cooling industry is in its early stages, and the penetration rate of liquid cooling applications remains relatively low. However, as the next-generation cooling technology for data centers and server clusters, the market potential and future prospects of liquid cooling have attracted considerable attention, with application penetration expected to increase rapidly.

 

Nevertheless, the liquid cooling industry in China still faces several development challenges:

 

1. Immature Industry Ecosystem

Although liquid cooling technology has been developing for more than a decade both domestically and internationally, the ecosystem remains incomplete. Products vary widely, and standardization is lacking. There is currently no unified interface standard for servers and cabinets. Cabinets and servers are tightly coupled, and the product forms of servers, coolants, cooling pipelines, and power supply systems differ from one manufacturer to another, so products from different manufacturers cannot interoperate. This limits competition and hinders high-quality industry development.

 

2. Evolving Liquid Cooling System Architecture

The liquid cooling system architectures in the industry differ, with variations between distributed and centralized cooling and power supply setups. Some manufacturers' servers have evolved into high-temperature servers, which allow for the reduction of water chiller units, further simplifying the cooling source architecture and promoting cost reduction and efficiency.

 

3. High Costs of Liquid Cooling Systems

Compared to traditional air cooling products, liquid cooling still presents challenges in terms of high initial investment and total lifecycle costs. This issue could affect the large-scale adoption and promotion of liquid cooling products in the early stages.

 

 

 
