LF Energy Summit 2023 showcased groundbreaking discussions and presentations at the forefront of sustainable technology. One such session, titled “Improving High Throughput Computing’s Energy Efficiency – a Measurement-Based Case Study” (video below), captivated the audience. Damu Ding of the University of Oxford presented a comprehensive study on improving energy efficiency in high throughput computing (HTC), shedding light on strategies to minimize carbon emissions while optimizing performance.
Collaborative Research for a Greener Future
Ding’s presentation highlighted a collaboration among renowned institutions: Oxford University, STFC, the University of Bristol, and Newcastle University. The work was driven by the need to quantify carbon emissions within the UK’s research computing infrastructure, in line with the UK research infrastructure’s net zero goals. The collective effort focused on a sub-project termed “country,” targeting the energy consumption of the STFC RAL data center, a crucial aspect of the study.
Unveiling High Throughput Computing
Ding deftly differentiated HTC from high-performance computing (HPC). Both involve compute-intensive workloads, but HTC runs a large number of small, independent jobs over long periods, whereas HPC concentrates on fewer, tightly coupled jobs that each demand substantial resources at once. The core of the presentation lay in identifying ways to improve the energy efficiency of HTC, particularly within the STFC RAL data center.
Data-Driven Analysis and Insights
The study followed a two-pronged approach: leveraging datasets provided by STFC’s data centers and conducting meticulous energy efficiency experiments within Oxford’s lab. The data center’s energy consumption was characterized by consistently high power usage and utilization due to its HTC workload. In particular, the presenter focused on specific data center racks equipped with AMD CPUs. Surprisingly, even with varying CPU configurations, power consumption remained relatively consistent.
CPU Upgrades and Energy Consumption Modeling
Ding delved into the impact of CPU upgrades on energy consumption, revealing that upgrading CPUs did not reduce power consumption as much as expected. The presentation also introduced a nuanced energy consumption model that splits power draw into instant power, baseline power (drawn even when the machine is idle), and active power (added by running jobs). The goal was to minimize total energy consumption per job, underscoring the importance of efficient resource utilization.
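As a rough illustration of that model, the sketch below amortizes a server’s baseline power across concurrently running jobs. The function name and all numbers are illustrative assumptions, not measurements from the study.

```python
# Hedged sketch of the energy model described above: instant power is the
# sum of a baseline component (drawn even when idle) and an active
# component (added by running jobs). All constants are made up for
# illustration, not taken from the STFC/Oxford measurements.

def energy_per_job(baseline_w, active_w, runtime_s, n_jobs):
    """Energy in joules attributed to each of n_jobs concurrent jobs.

    The baseline draw is amortized across every job running at once.
    """
    instant_w = baseline_w + active_w       # instantaneous draw in watts
    return instant_w * runtime_s / n_jobs   # joules per job

# Amortizing a 100 W baseline over more concurrent jobs lowers the
# energy attributed to each individual job.
few = energy_per_job(baseline_w=100, active_w=160, runtime_s=3600, n_jobs=8)
many = energy_per_job(baseline_w=100, active_w=160, runtime_s=3600, n_jobs=32)
```

This is one way to see why high utilization matters for HTC efficiency: idle baseline power is pure overhead unless jobs are there to share it.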
Experiments Unveil Insights
The experiments conducted within Oxford’s lab employed two servers with different configurations, Intel and AMD, each boasting 16 cores and 32 threads. The workload mirrored the STFC’s high throughput computing requirements. Ding shared intriguing observations, such as the direct relationship between CPU frequency and instant power, where higher frequencies led to increased power consumption but shorter job execution times.
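That frequency/power/runtime relationship implies an energy trade-off, which the hypothetical sketch below illustrates using a textbook cubic dynamic-power model (power roughly proportional to f³ under frequency-proportional voltage scaling). The model form and every constant here are assumptions for illustration, not values fitted in the Oxford experiments.

```python
# Illustrative frequency/energy trade-off: higher CPU frequency raises
# instant power but shortens job runtime, so total energy per job has a
# minimum somewhere in between.

def job_energy(freq_ghz, work_gcycles=7200.0, baseline_w=100.0, k=2.0):
    """Energy (J) for one job at a given frequency.

    Runtime falls as 1/f; active power rises as k * f**3 (textbook
    dynamic-power assumption, not a measured model).
    """
    runtime_s = work_gcycles / freq_ghz
    active_w = k * freq_ghz ** 3
    return (baseline_w + active_w) * runtime_s

# Sweeping frequency shows the minimum: running too slowly wastes
# baseline power, running too fast wastes active power.
energies = {f / 10: job_energy(f / 10) for f in range(10, 41, 5)}
best = min(energies, key=energies.get)
```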
Sensible Strategies for Optimal Efficiency
Ding also discussed the sensitivity analysis of CPU cores and the number of jobs, showcasing how manipulating these factors impacted instant power, job runtime, and total energy consumption. These findings underscored the need to strike a balance between various factors to achieve optimal energy efficiency while maintaining performance.
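The kind of sensitivity sweep described above can be sketched as follows. The linear per-job power model and the constants are simplifying assumptions for illustration only, not the study’s fitted values.

```python
# Sketch of a sensitivity analysis over the number of concurrent jobs:
# as jobs are added, instant power rises while the energy attributed to
# each job falls. Per-job power and runtime values are invented for
# illustration.

def rack_metrics(n_jobs, baseline_w=100.0, per_job_w=10.0, job_runtime_s=3600.0):
    """Return (instant power in W, energy per job in J) for n_jobs."""
    instant_w = baseline_w + per_job_w * n_jobs
    total_energy_j = instant_w * job_runtime_s   # jobs run concurrently
    return instant_w, total_energy_j / n_jobs

sweep = {n: rack_metrics(n) for n in (1, 8, 16, 32)}
```

The sweep makes the balancing act concrete: packing more jobs per server improves energy per job but pushes instant power (and heat) upward.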
Recommendations for Sustainable Computing
In light of the research, Ding offered pragmatic recommendations for organizations aiming to improve energy efficiency. Shutting servers down to conserve energy may not be practical because of trade-offs in reliability and hardware life cycle, so the focus shifted toward minimizing baseline power. Implementing policies that encourage energy-efficient computing practices and prioritizing carbon-conscious workloads emerged as potent strategies for reducing carbon emissions.
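One common realization of the “carbon-conscious workloads” idea is to defer flexible jobs to the hours when grid carbon intensity is lowest. The sketch below is a minimal, hypothetical version of that policy; the intensity figures and job names are invented, and a real deployment would pull a forecast from a grid-intensity API rather than a hard-coded table.

```python
# Minimal carbon-aware scheduling sketch: assign flexible (non-urgent)
# jobs to the lowest-carbon hours of the day. All data is illustrative.

def schedule_flexible_jobs(jobs, intensity_by_hour, hours_needed):
    """Pick the hours_needed lowest-carbon hours and assign jobs in order."""
    green_hours = sorted(intensity_by_hour, key=intensity_by_hour.get)[:hours_needed]
    return {job: hour for job, hour in zip(jobs, sorted(green_hours))}

# Example: hypothetical grid carbon intensity in gCO2/kWh by hour.
intensity = {9: 210, 12: 140, 15: 120, 18: 260, 21: 180}
plan = schedule_flexible_jobs(["analysis", "backup"], intensity, hours_needed=2)
# Both jobs land in the two greenest hours (12:00 and 15:00).
```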