Fully Liquid-Cooled Cold Plate Server
Aug 21, 2024
To further advance the development of liquid cooling technology and mature the ecosystem, Inspur Information, in collaboration with Intel, has focused on optimizing liquid cooling designs for general-purpose high-density servers.
In addition to the widespread industry adoption of CPU and GPU liquid cooling, in-depth exploration and research have been conducted on liquid cooling for high-power memory, solid-state drives (SSDs), OCP network cards, PSUs, PCIe cards, and optical modules.
This effort has achieved the highest liquid cooling coverage in the industry, meets deployment requirements for different levels of liquid cooling coverage, and provides general infrastructure capabilities and diverse technical support for customers in industries such as the internet and telecommunications.
The fully liquid-cooled cold plate system is based on Inspur Information's i24, a 2U four-node high-density computing server. Each liquid-cooled node supports two Intel 5th Generation Xeon Scalable Processors, paired with 16 DDR5 memory modules, one PCIe expansion card, and one OCP 3.0 network card. The entire system supports up to eight SSDs, meeting customers' storage needs while delivering high-density computing power. The main heat-generating components of the server are the CPUs, memory, I/O cards, local hard drives, and the chassis power supply.
The liquid cooling solution enables approximately 95% of the system's heat to be directly removed by liquid through cold plate contact with the heat source. The remaining 5% of the heat is carried away by the cooling water in the air-liquid heat exchanger located behind the PSU, achieving nearly 100% liquid heat capture at the system level.
I System Composition and Pipeline Layout
1. Overview of the Fully Liquid-Cooled Server System
The 2U four-node fully liquid-cooled server system consists of nodes, chassis, mid-plane, and SSD modules. The nodes connect to the chassis components through blind-mate connections: quick connectors for water, plus power connectors and signal connectors.

▲ Figure 1. 2U Four-Node Fully Liquid-Cooled Server
2. Overview of the Fully Liquid-Cooled Server Single Node
The fully liquid-cooled server node consists of a node shell, motherboard, CPU chips, memory modules, memory cold plate, CPU cold plate, I/O cold plate, power supply, and the rear heat exchanger for the power supply.

▲ Figure 2. Fully Liquid-Cooled Server Node
II Selection of Flow Pattern and Flow Rate Calculation
To simplify flow path design, this fully liquid-cooled server routes the coolant through a series flow path. The coolant flows from low-power components to high-power components for heat dissipation. The detailed flow direction is shown in the diagram and table below.

▲ Figure 3. Series Flow Path of a 2U Four-Node Fully Liquid-Cooled Server

▲ Table 3. Cooling Medium Flow Sequence
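In a series flow path, every component sees the same flow rate, and the coolant temperature rises step by step as it picks up heat along the way. The following is a minimal sketch of that accumulation, not the design calculation used for the i24. The component order, the power figures, the inlet temperature, and the fluid properties are all hypothetical placeholders; only the low-power-to-high-power ordering comes from the text above.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    power_w: float  # heat picked up by the coolant here (hypothetical value)


def series_outlet_temps(components, inlet_c, flow_lpm, rho=1020.0, cp=3900.0):
    """Return (name, outlet temperature in deg C) for each component in a series path.

    rho (kg/m^3) and cp (J/(kg*K)) are assumed, rough coolant properties.
    """
    if flow_lpm <= 0:
        raise ValueError("flow_lpm must be positive")
    flow_m3s = flow_lpm / 1000.0 / 60.0        # L/min -> m^3/s
    capacity_rate = rho * cp * flow_m3s        # W per kelvin of coolant temperature rise
    temps = []
    t = inlet_c
    for comp in components:
        t += comp.power_w / capacity_rate      # each stage adds its own temperature rise
        temps.append((comp.name, t))
    return temps


if __name__ == "__main__":
    # Hypothetical per-node loads, ordered from low power to high power.
    path = [
        Component("IO cold plate", 60.0),
        Component("Memory cold plate", 200.0),
        Component("CPU cold plates", 600.0),
    ]
    for name, t_out in series_outlet_temps(path, inlet_c=40.0, flow_lpm=1.5):
        print(f"{name}: outlet {t_out:.1f} C")
```

Placing the highest-power components last means they receive the warmest coolant, so the outlet temperature of the final stage is the value to compare against the return water limit.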
The flow rate of the fully liquid-cooled server must meet the system's cooling requirements:
- To ensure the long-term reliability of the secondary-side piping material, the return water temperature on the secondary side should not exceed 65°C.
- To ensure that all components of the fully liquid-cooled server meet the cooling requirements within the defined boundary conditions, a copper cold plate and PG25 (a 25% propylene glycol solution) are selected for flow rate design analysis.
To meet the requirement that the return water temperature on the secondary side does not exceed 65°C, the minimum flow rate of PG25 per node, Qmin, is calculated using the following formula:
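Qmin = Psys / (ρ × C × ∆T) ≈ 1.3 LPM
where Psys is the heat load to be removed, ρ and C are the density and specific heat capacity of PG25, and ∆T is the allowed coolant temperature rise between supply and return.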
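The formula above is an energy balance on the coolant: the heat load divided by the volumetric heat capacity of the fluid and the allowed temperature rise. The sketch below shows the unit handling only. The source does not state the heat load, the temperature rise, or the PG25 properties behind the 1.3 LPM result, so the values in the example call are assumptions for illustration and are not expected to reproduce that figure.

```python
def min_flow_lpm(heat_load_w, delta_t_k, rho_kg_m3, cp_j_kg_k):
    """Minimum coolant flow in litres per minute: Q = P / (rho * C * dT)."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    if rho_kg_m3 <= 0 or cp_j_kg_k <= 0:
        raise ValueError("fluid properties must be positive")
    flow_m3s = heat_load_w / (rho_kg_m3 * cp_j_kg_k * delta_t_k)  # m^3/s
    return flow_m3s * 1000.0 * 60.0                               # m^3/s -> L/min


if __name__ == "__main__":
    # Assumed values, not from the source: 1000 W heat load, 10 K rise,
    # and rough PG25 properties of 1020 kg/m^3 and 3900 J/(kg*K).
    q = min_flow_lpm(heat_load_w=1000.0, delta_t_k=10.0,
                     rho_kg_m3=1020.0, cp_j_kg_k=3900.0)
    print(f"Minimum flow: {q:.2f} LPM")
```

Because the required flow scales inversely with ∆T, a lower supply temperature (and therefore a larger allowed rise before reaching 65°C) reduces the minimum flow for the same heat load.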
III Key Component Design of the Fully Liquid-Cooled Server Cold Plate
1. CPU Cold Plate Design
The CPU cold plate module is a reference design optimized based on Intel's 5th Generation Xeon Scalable Processor cold plate design requirements. It considers factors such as cooling, structural performance, yield rate, cost, and compatibility with different materials. The CPU cold plate is primarily composed of a CPU cold plate aluminum bracket, CPU cold plate, and cold plate connectors.

▲ Figure 4. CPU Cold Plate Module
2. Memory Liquid Cooling Design
The memory liquid cooling design adopts an innovative sleeper heatsink liquid cooling solution, named for the way the memory modules are arranged like sleepers on a railway track. This solution combines traditional air cooling and cold plate cooling. The heatsink, which has built-in heat pipes (or pure aluminum/copper plates, vapor chambers, etc.), transfers heat from the memory modules to both ends. The heat is then transferred to the cold plate through selected thermal pads, and finally the coolant within the cold plate carries the heat away.
The memory and heatsink can be assembled into the smallest maintenance unit outside the system (referred to as the memory module hereafter). The memory cold plate features a memory module fixing structure to ensure good contact between the heatsink and the memory cold plate. This fixing structure can be secured with screws or maintained without tools as needed. The top of the memory cold plate cools the memory, while the bottom can cool other heat-generating components on the motherboard, such as VRs. To simplify the memory cold plate design, an adapter bracket can be designed between the memory and motherboard to meet the height constraints of different motherboards.

▲ Figure 5. Sleeper Heatsink Liquid Cooling Solution
Compared with the existing tubing memory liquid cooling solutions on the market, the sleeper heatsink liquid cooling solution has the following main advantages:
Easy Maintenance: Memory maintenance is as simple as maintaining an air-cooled memory module, without the need to remove the heatsink and fasteners. This greatly improves the assembly efficiency and reliability of liquid-cooled memory, reducing potential damage to memory chips and thermal pads during disassembly and reassembly in the system.
Good Compatibility: The heat dissipation performance of this solution is not affected by different memory chip thicknesses and memory spacing. It is compatible with memory spacings of 7.5mm and above. The decoupled design of the heatsink and cold plate allows for reuse and standardization of memory liquid cooling.
Higher Cost-Effectiveness: The memory heatsink can be selected based on memory power consumption, using different processes and cooling technologies, and the number of heatsinks can be configured to match the installed memory. In a 7.5mm memory spacing scenario, it can meet the cooling needs of memory modules exceeding 30W.
Ease of Manufacturing and Assembly: There are no liquid cooling tubes between memory slots, eliminating the need for complex tube welding and process control. The traditional air-cooled heatsink and general CPU cold plate manufacturing processes can be used. When assembling the heatsink, the heat dissipation performance is not sensitive to tolerances between the heatsink and the motherboard in the direction perpendicular to the memory chip plane, avoiding poor thermal contact and making assembly easier.
Good Reliability: The sleeper liquid cooling solution avoids potential damage to memory chips and thermal pads during assembly and can withstand multiple insertions and removals. Additionally, it avoids the signal contact failures that tilting between memory and slots can cause when memory is installed with tubing liquid cooling solutions, significantly improving system reliability.
3. Hard Drive Liquid Cooling Design
The innovative solid-state drive (SSD) liquid cooling solution utilizes a heatsink with built-in heat pipes to transfer heat from the hard drive area to the cold plate outside the hard drive area through direct contact with thermal pads, achieving heat exchange.
This SSD liquid cooling solution consists mainly of an SSD module equipped with a heatsink, an SSD cold plate, a hard drive module locking mechanism, and a hard drive bracket. The hard drive module locking mechanism is fixed on the hard drive bracket to provide appropriate preloading force, ensuring long-term contact reliability between the SSD module and the SSD cold plate. To facilitate the installation of the hard drive cold plate loop in a confined space, the hard drive bracket is designed with a drawer-type installation method in the depth direction of the server.

▲ Figure 6. Innovative Solid-State Drive Liquid Cooling Solution
The advanced features of this solution compared to existing hard drive liquid cooling attempts in the industry include:
- Supports over 30 hot-swaps without system power-off.
- No risk of shearing damage to thermal interface materials during hard drive installation; the locking mechanism design ensures long-term contact reliability.
- Low processing requirements for the liquid cooling solution; only traditional air cooling and CPU cold plate processing techniques are needed.
- No water runs between hard drives; multiple hard drives can share the same cold plate, reducing the number of joints and lowering the risk of leaks.
- Flexibly adapts to systems with different thicknesses and quantities of solid-state drives (SSDs).
4. PCIe/OCP Card Liquid Cooling Design
PCIe Liquid Cooling Solution
The PCIe card liquid cooling solution is based on the existing air-cooled PCIe card. It achieves cooling for the optical module and the main chips on the PCIe card by developing a PCIe card cooling module that can contact the system cold plate. The heat from the optical module is transferred via heat pipes to the main cooling module on the PCIe card chip, and the cooling module then exchanges heat with the IO cold plate through an appropriate thermal interface material.
The liquid-cooled PCIe card consists mainly of a QSFP heat sink clip, PCIe chip cooling module, and the PCIe card itself. The QSFP heat sink clip is designed with appropriate elasticity to ensure proper floating when the QSFP heat sink and cage on the PCIe cooling module are mated, providing a good user experience, avoiding damage to the optical module, and ensuring stable contact for effective cooling.

▲ Figure 7. PCIe Card Liquid Cooling Module
OCP3.0 Liquid Cooling Solution
The OCP3.0 card liquid cooling solution is similar to the PCIe card, where a custom liquid-cooled heatsink is used for the OCP3.0 card. The heat generated by the chips on the card is transferred to the liquid-cooled heatsink, and the heat is finally dissipated through contact between the heatsink and the system's IO cold plate.
The OCP3.0 liquid cooling module mainly consists of the heatsink module, the OCP3.0 card, and its bracket. Due to space constraints, a spring screw is used as the locking mechanism to ensure long-term contact reliability between the heatsink module and the IO cold plate after the liquid-cooled OCP3.0 card is assembled.

▲ Figure 8. OCP3.0 Liquid Cooling Module
Considering the ease of future maintenance and the need for multiple hot-swaps of the OCP3.0 card, the design of the locking mechanism and the selection of thermal interface materials have been optimized to enhance the overall reliability and ease of operation and maintenance.
IO Cold Plate Solution
The IO cold plate is a multifunctional cold plate that not only dissipates heat from the heating components within the motherboard's IO area but also cools the liquid-cooled PCIe card and the liquid-cooled OCP3.0 card.

▲ Figure 9. IO Cold Plate

▲ Figure 10. Position of Liquid-Cooled PCIe Card, Liquid-Cooled OCP3.0, and IO Cold Plate
The IO cold plate primarily consists of the IO cold plate body and copper tube channels. The IO cold plate body is made of aluminum alloy, while the copper tubes form the cooling fluid channels and enhance heat dissipation. The specific design needs to be optimized based on the motherboard layout and component cooling requirements. The heatsink modules on the liquid-cooled PCIe card and the liquid-cooled OCP3.0 card contact the IO cold plate along the arrow direction. The material selected for the cooling fluid channels must be compatible with the system's cooling fluid and with the other wetted materials in the loop.
This IO cold plate liquid cooling solution meets the multi-dimensional assembly requirements of multiple components. The mixed use of copper and aluminum materials addresses material compatibility issues, ensures cooling effectiveness, helps reduce the cold plate's weight by 60%, and lowers costs.
5. Power Supply Cold Plate Design
The power supply liquid cooling solution involves cooling the PSU fan's exhaust air by attaching an external air-to-liquid heat exchanger to the existing air-cooled power supply, thereby reducing the system's preheating of the external data center environment.
The PSU rear heat exchanger has a multi-layer structure, with channels and fins stacked on each other. The size of the PSU rear heat exchanger must balance cooling requirements, weight, and cost while ensuring it does not interfere with the power cord insertion/removal function and meets the space constraints of the system cabinet. The PSU rear heat exchanger is independently mounted on the node bracket.

▲ Figure 11. PSU Rear Heat Exchanger
This innovative power supply liquid cooling solution eliminates the need to develop new liquid-cooled power supplies, shortening the development cycle and reducing development costs. Its versatility allows it to adapt to power supply solutions from multiple vendors, saving more than 60% in cost compared to customized liquid-cooled power supplies.
For applications involving entire cabinets, the power supply liquid cooling can also utilize a centralized air-liquid heat exchanger solution. This involves sealing the front and rear doors of the cabinet and placing a centralized air-liquid heat exchanger at the bottom of the cabinet, replacing the distributed air-liquid heat exchanger structure behind the PSU with a centralized one.
The centralized air-liquid heat exchanger consists of aluminum corrugated fins coated with a hydrophilic layer to enhance heat exchange, combined with high heat transfer coefficient copper pipes. It can provide at least 8kW of cooling capacity with a 10°C temperature difference. The heat exchanger's flow path is optimized through simulation to handle more flow at low resistance. It features anti-condensation design and comprehensive leak detection to eliminate safety risks. A special hinge design meets high load requirements, and a card-slot connection design facilitates installation and maintenance.
With over 95% of the heat from a single liquid-cooled server removed by the cold plates, less than 5% of the heat needs to be handled by the air-liquid heat exchanger. Each node requires only 40-50W of air-liquid heat exchange, so a single centralized air-liquid heat exchanger with 8kW of heat exchange capacity can cool at least 150 nodes, at a cost far lower than that of 150 distributed air-liquid heat exchangers.
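The node count follows from simple division, which the sketch below reproduces using the figures quoted above (8kW capacity, 40-50W of air-side heat per node, a 95% cold plate share). The 1000 W node heat in the first call is an assumed value chosen only to show how the 5% residual relates to the per-node figure; the function names are illustrative.

```python
import math


def air_side_heat_w(node_heat_w, cold_plate_fraction=0.95):
    """Heat left for the air-liquid heat exchanger after cold plate capture."""
    if not 0.0 <= cold_plate_fraction <= 1.0:
        raise ValueError("cold_plate_fraction must be between 0 and 1")
    return node_heat_w * (1.0 - cold_plate_fraction)


def nodes_supported(exchanger_capacity_w, air_heat_per_node_w):
    """Whole number of nodes one centralized exchanger can serve."""
    if air_heat_per_node_w <= 0:
        raise ValueError("air_heat_per_node_w must be positive")
    return math.floor(exchanger_capacity_w / air_heat_per_node_w)


if __name__ == "__main__":
    # Assumed node heat of 1000 W: about 50 W remains on the air side.
    print(f"{air_side_heat_w(1000.0):.0f} W")
    print(nodes_supported(8000.0, 50.0))  # 160
    print(nodes_supported(8000.0, 40.0))  # 200
```

Both ends of the 40-50W range give more than 150 nodes, so the "at least 150 nodes" figure leaves some margin.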
This solution allows the server power supplies to remain unmodified, with the heat generated being collected and exchanged uniformly by the centralized air-liquid heat exchanger at the rear of the cabinet. The heat forms a self-contained circulation within the cabinet, having no impact on the data center environment, truly achieving "Rack as a computer."
