SC21 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Liquid Cooling Challenges and Facility Experiences: What's Next?


Authors: Chris DePrater (Lawrence Livermore National Laboratory), David Grant (Oak Ridge National Laboratory (ORNL)), Dale Sartor (Lawrence Berkeley National Laboratory (LBNL)), Herbert Huber (Leibniz Supercomputing Centre), Fumiyoshi Shoji (RIKEN Center for Computational Science (R-CCS)), Dave Martinez (Sandia National Laboratories)

Abstract: Liquid cooling is key to dealing with heat density, reducing energy consumption and increasing performance. Despite more than a decade's experience with liquid cooling, major supercomputing centers still face challenges with adoption. One reason is that system load and density continue to grow; another is that load variability is also increasing. This BoF presents gaps, issues and lessons learned. It discusses what is needed for liquid cooling to be implemented more readily. Topics covered include design tools to assess the relative efficiencies of cooling systems, water quality control and heat recovery.

Long Description: Designing, implementing and maintaining cooling systems is difficult now and will only become harder with exascale-class systems.

● Although facility models for electrical systems are widely deployed, the same is not available for mechanical cooling systems. What research is being done in this area? Could it be valuable and relevant to the design process?
● Mechanical liquid cooling control systems could improve overall HPC facility operational efficiency, but they are often designed with less than optimal efficiency. How would the cooling, power and control components have to be designed to better accommodate HPC requirements?
● There is a definite trend towards higher water temperatures for HPC liquid cooling systems, as described by ASHRAE’s Wx classification scheme. Do these higher temperatures add to the potential for heat reuse while making water quality harder to maintain? Where can large quantities of heat be used? Is there a better way to design and build in higher water quality?
● With every computer refresh, retooling of the facility is required. This adds to the complexity of installation, potentially takes away from time when the computer is doing mission-critical science, and adds to operational costs. Can liquid cooling infrastructure be designed to facilitate changes? Is there an opportunity for standardization?

This BoF will bring together a cross-functional group of experts with HPC facility and operations managers to discuss the challenges of liquid cooling mechanical system design and operation, and to collectively ask how we can make inroads in solving some of the challenges to fully exploit the benefits of liquid cooling.

Besides operations managers from HPC sites, this BoF will attract many participants from the vendor community: system integrators, liquid cooling suppliers, and architecture and engineering firms. We hope to attract participation from mechanical design software and services vendors, as well as water management companies. Although these companies have not been well represented at SC in prior years, this year’s hybrid conference offers a unique opportunity to recruit virtual participants for the panel and other relevant sessions. (The cost of virtual registration and the time needed for targeted participation are small compared to traveling and participating in person.)

We know that there will be strong participation from large-scale HPC facilities in the United States, Europe and Japan via the Energy Efficient HPC Working Group. The EE HPC WG has active collaboration with ASHRAE, the Open Compute Project and the Green Grid. This BoF could be very effective at providing guidance for those bodies.

Expert panelists may include:
- Wangda Zuo, University of Colorado, working on developing open-source modeling tools for data center cooling systems.
- Michael Lesniak, Ecolab, an expert for warm and cool liquid critical closed loop water systems in cloud, enterprise and HPC data centers.
- Vali Sorell, Microsoft, industry expert on data center mechanical systems and voting member of ASHRAE Technical Committee 9.9.
- Rich Donaldson, kW Engineering Principal Engineer, focus on mechanical systems for mission critical facilities and high performance computing environments.



