Authors: Siddhartha Jana (Intel Corporation, Energy Efficient HPC Working Group), Ryan Grant (Queen's University, Canada), Tapasya Patki (Lawrence Livermore National Laboratory)
Abstract: This proposed BoF will bring together academia, government research laboratories, and industry to discuss and contribute to the two active community-driven, vendor-neutral forums focusing on energy efficiency in HPC software stacks. For more than 6 years, these two complementary forums, HPC-PowerStack and PowerAPI, have led the efforts in identifying and building software solutions across the software stack.
This highly interactive BoF will enable the community to discuss ongoing challenges in designing cost-effective, cohesive, portable, and interoperable implementations of HPC software that enable monitoring and control of system efficiency. Attendees will also help brainstorm solutions to imminent Exascale power challenges.
Long Description:
Motivation & Relevance:
With growing concerns about Exascale power challenges, there have been several standalone R&D efforts in energy-efficient solutions. However, the majority of these techniques have been designed in accordance with vendor- or site-specific restrictions. State-of-the-art specifications like Redfish provide high-level interfaces but stop short of defining which software components should actually interoperate in a cohesive and coordinated stack. We believe coordination is critical for avoiding the underutilization of system Watts and FLOPS.
For 5+ years, two complementary forums, HPC-PowerStack and PowerAPI, have been addressing power challenges from within the software stack. The efforts have focused on: (A) identifying the key software actors needed in a system power stack; (B) reaching a consensus on their roles and responsibilities; (C) designing communication protocols for bidirectional control and feedback signals among them to enable scalable coordination at multiple granularities; (D) establishing unified hierarchical communication models/APIs to access power monitoring and control knobs in hardware and software; and (E) leveraging existing prototypes and building a community that actively participates in open development and engineering efforts.
Pre-SC21 Activities:
Within the PowerStack consortium, 40+ representatives from vendors, labs, and academia have convened twice a year (aligned with the SC and ISC timeframes) for knowledge transfer and collaboration on community-wide standardization efforts for designing a power-management stack. Likewise, the PowerAPI community has convened monthly to focus on the design of the API specification that enables interoperability between the stack components. Over these past ~5 years, the community has arrived at a consensus on two points: (1) job/application-awareness is going to be critical for boosting system-wide optimization, which implies the need to drive interoperation between a job-level runtime and the job scheduler; and (2) hierarchical control systems are good models for scalable global optimization across the system, so the power stack should be a multi-tiered system with bidirectional control and feedback signals flowing between the layers. Today’s systems are inefficiently designed in that they break this hierarchical model, and we as a community need to work towards fixing this. These conclusions are in alignment with the feedback received during the ISC19, SC19, and ISC21 BoFs.
BoF-Goals:
While the community has made steady progress towards designing power management solutions, there are still open questions in the stack’s design. Some of them will be best answered through prototyping and experience gained from the development of current state-of-the-art products. Also, since designing an entire stack from the ground up is a gargantuan effort, it is extremely important that the entire global HPC community be made aware of this effort and be willing to contribute to it. Hence, this BoF.
The goals are to: (1) make attendees aware of the emerging community effort to design a common power stack and discuss the lessons learned during the past seminar; (2) provide updates on the current and future prototyping efforts that have begun; and (3) align efforts across the community so that the SC19 BoF attendees reach a consensus on sharing R&D resources, avoid duplicating effort, agree on common interfaces, and reap the rewards together as a community.
URL: http://hpcpowerstack.github.io