Memory Demands in Disaggregated HPC: How Accurate Do We Need to Be?
Parallel Programming Languages and Models
Time: Monday, 15 November 2021, 9:10am - 9:30am CST
Description: Jobs running on HPC systems vary dramatically in their resource requirements (e.g., memory or cores). Because HPC applications run on self-contained servers whose capacities are fixed at design time, resource provisioning often fails to match the needs of submitted jobs, leaving resources stranded and underutilized. The root cause is that HPC systems assume the prevalent server-based architecture, which couples memory and processing resources within each server. To meet these varied demands efficiently, disaggregated memory has been proposed to allow flexible, fine-grained allocation of memory capacity to compute jobs.
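To make "stranded" resources concrete, the sketch below compares a server-coupled allocation, where a job must take whole nodes large enough for both its cores and its memory, with a disaggregated one. It is a minimal illustration under stated assumptions, not the paper's model: the node shape (32 cores, 128 GB), the two example jobs, and the helper names `coupled_allocation` and `disaggregated_allocation` are all hypothetical, and the disaggregated case assumes memory comes entirely from a shared pool in exactly the requested amount.

```python
import math
from dataclasses import dataclass

# Hypothetical node shape chosen for illustration; not taken from the paper.
CORES_PER_NODE = 32
MEM_PER_NODE_GB = 128


@dataclass
class Job:
    name: str
    cores: int
    mem_gb: int


def coupled_allocation(job: Job) -> tuple[int, int, int]:
    """Server-coupled: whole nodes, enough to cover BOTH cores and memory.

    Returns (nodes, stranded_cores, stranded_mem_gb).
    """
    nodes = max(math.ceil(job.cores / CORES_PER_NODE),
                math.ceil(job.mem_gb / MEM_PER_NODE_GB))
    return (nodes,
            nodes * CORES_PER_NODE - job.cores,
            nodes * MEM_PER_NODE_GB - job.mem_gb)


def disaggregated_allocation(job: Job) -> tuple[int, int, int]:
    """Disaggregated (assumed): nodes are sized by cores only, and memory is
    drawn from a shared pool in exactly the requested amount, so leftover pool
    memory stays available to other jobs instead of being stranded.
    """
    nodes = math.ceil(job.cores / CORES_PER_NODE)
    return nodes, nodes * CORES_PER_NODE - job.cores, 0


jobs = [Job("memory-heavy", cores=16, mem_gb=512),
        Job("compute-heavy", cores=128, mem_gb=64)]

for job in jobs:
    print(job.name, "coupled:", coupled_allocation(job),
          "disaggregated:", disaggregated_allocation(job))
```

With these made-up numbers, the memory-heavy job needs four coupled nodes just for memory and strands 112 cores, while the compute-heavy job strands 448 GB of node memory; under the assumed memory pool, neither job strands any memory, and only the cores left over on a partially used node remain idle.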
This paper takes an important step towards understanding the facets of resource allocation and job requirements on disaggregated memory systems. We analyze the implications for HPC system operation, user experience, and system performance when users overestimate their jobs' resource requirements. To conduct our studies, we leverage a disaggregated-memory simulation infrastructure implemented on a popular HPC resource manager. Our results show that the effect of doubling the memory demand on response time can be less than 8%.
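The cost of overestimation in a shared pool can be sketched with simple arithmetic: if every job reserves more memory than it actually touches, fewer jobs fit in the pool at once, and the excess sits reserved but idle. The sketch below is a hypothetical illustration only. The pool size, the per-job footprint and the function names are assumptions, and it computes capacity, not response time. Whether reduced concurrency actually lengthens response times depends on whether pool memory is the bottleneck, which is what the paper measures through simulation.

```python
# Hypothetical figures for illustration only; not taken from the paper.
POOL_GB = 1024    # capacity of a shared disaggregated memory pool
ACTUAL_GB = 48    # memory each job really touches


def concurrent_jobs(pool_gb: float, actual_gb: float, factor: float) -> int:
    """How many identical jobs fit in the pool when each one reserves
    factor * actual_gb (factor > 1 models user overestimation)."""
    return int(pool_gb // (actual_gb * factor))


def idle_reserved_gb(pool_gb: float, actual_gb: float, factor: float) -> float:
    """Pool memory reserved by the jobs that fit but never actually used."""
    return concurrent_jobs(pool_gb, actual_gb, factor) * actual_gb * (factor - 1)


for factor in (1.0, 1.5, 2.0):
    print(f"factor={factor}: jobs={concurrent_jobs(POOL_GB, ACTUAL_GB, factor)}, "
          f"idle reserved={idle_reserved_gb(POOL_GB, ACTUAL_GB, factor):.0f} GB")
```

Here a factor of 2.0 models a user who doubles the memory demand: 10 jobs fit instead of 21, and 480 GB of the pool is reserved without being used.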