Authors: Grant Stewart (Los Alamos National Laboratory), Jimmy Daley (NVIDIA Corporation), Ethan Thomason (Bureau Veritas), Joe Prisco (IBM Corporation), Herbert Huber (Leibniz Supercomputing Centre), Fumiyoshi Shoji (RIKEN Center for Computational Science (R-CCS))
Abstract: Next-generation large-scale HPC systems, with facility provisioning requirements of up to 80 MW, millisecond-scale multi-MW power swings, and extremely high heat density, will demand stiff and responsive power grid capabilities. At the same time, “zero-carbon” mandates, the utility resiliency challenges of climate change, and reliance on intermittent renewable generation will narrow the availability of such grid capabilities. This BoF will feature a panel of experts and encourage open audience discussion of HPC system and facility development trends and realities in this changing environment. What mitigation techniques are available? Are there any game changers on the horizon?
Long Description: The 2008 “Exascale Computing Study” posed a challenge to achieve exascale performance with a system power of no more than 20 MW. The total power draw targeted by that challenge has since doubled and is achievable, but there are new dimensions to the power management problem that are equally challenging and must be considered. These challenges include: dynamic and repetitive operational events that cause large power swings, distributing high-voltage electricity (e.g. 480 V) to end components, high available fault currents, extreme power density demands, no UPS or other auxiliary power supply on the compute system, and reconfiguring and upgrading the HPC facility with each new supercomputer.
These are some of the current challenges for Lawrence Livermore National Laboratory (LLNL) and the utilities that serve the LLNL site as they fit up the site to support LLNL's next supercomputer. The same challenges are being addressed by other exascale-class sites such as Oak Ridge National Laboratory, Los Alamos National Laboratory, and RIKEN.
Anna Maria Bailey has been instrumental in transforming the LLNL electrical distribution system from commercial grade in the 1990s, to industrial grade over the following two decades, and now to utility grade with exascale. She writes, “These solutions for exascale can take up to seven years to plan for. What changed here - the big game changers - is dealing with utilities. We are no longer making just local decisions. It is something that's going to take careful planning over a long period of time and a lot of support as well.”
Power availability from the utility is also becoming a challenge for supercomputing centers. This became critically apparent with the 2011 Fukushima Daiichi nuclear disaster in Japan, when even mission-critical facilities such as supercomputing centers were required to implement facility-level power capping. Similar events have occurred in California over the past few years due to other natural disasters. This trend will continue with the increasing impact of global warming and the transition to less predictable energy sources such as renewables. As a result, sites are increasingly requiring power management and power capping features as part of their supercomputer procurements.
There are many other opportunities and challenges the HPC community faces with respect to electricity and power distribution systems. An obvious one is improving energy efficiency and reducing the cost of power and energy through careful negotiation and management of electrical service provider contracts and relationships. Others include fitting up facilities with on-site generation capabilities and ‘stiffening’ distribution for both the site and the provider.
These are all new and extremely relevant challenges for the HPC community. They are most apparent at sites that are currently deploying exascale-class systems, but they also affect sites in locations where power availability is most constrained. The concern is not just the operational cost of 20 MW or 40 MW, which is itself a significant challenge, but how we should plan for and manage power and energy distribution.