Does the data centre industry need a new performance metric?
Malcolm Howe
Originally published in Inside_Networks (UK) magazine and Mission Critical Magazine in December 2021
Data centres need more than PUE for measured sustainability gains
According to the report recently released by the UN’s Intergovernmental Panel on Climate Change (IPCC), global warming is more widespread and accelerating more rapidly than previously thought. And it is down to human activity.
For the data centre industry, sustainability and meeting net zero emissions targets have never been more important. However, achieving these goals will require a paradigm shift in how we approach data centre design. Firstly, we need to reconsider how we typically benchmark data centre efficiency and, secondly, we need to look at data centre cooling.
Power Usage Effectiveness (PUE) has become a globally recognised metric for reporting the energy efficiency of a data centre. It is the ratio of total facility power to the power consumed by the IT equipment. In an ideal scenario, PUE would be 1, with 100% of the power delivered to the facility going to the IT equipment. Any value above 1 therefore reflects the energy consumed by the data centre infrastructure – power and cooling – in provisioning power to the IT equipment.
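As a minimal sketch of the ratio described above, the Python below computes PUE from two readings. The function name `pue` and the site figures are hypothetical, chosen only to show the arithmetic. In practice PUE is normally calculated from energy (kWh) measured over a period such as a year, rather than from a single instantaneous power reading.

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    if total_facility_kw < it_load_kw:
        raise ValueError("facility power cannot be less than IT power")
    return total_facility_kw / it_load_kw


# Hypothetical site: 1,000 kW of IT load plus 400 kW of power and cooling overhead.
it_kw = 1000.0
overhead_kw = 400.0
print(f"PUE = {pue(it_kw + overhead_kw, it_kw):.2f}")  # PUE = 1.40
```

A facility with no overhead at all would return exactly 1.0, the ideal case mentioned in the text.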
Because PUE is specific to each individual data centre, it is not a reliable basis for comparing one facility with another. Nor does it give a good indication of environmental performance. Rather, PUE provides trend data against which efficiency improvements at a particular site can be monitored for their relative effectiveness. However, in the drive towards sustainability, the industry must adopt a more precise means of measuring energy efficiency.
Additionally, IT equipment is a broad church and there are questions about how the power delivered to it is consumed. While most IT power is generally consumed by central processing units (CPUs) and graphics processing units (GPUs), a significant amount is also used by on-board fans that draw cooling air across the server components. There is a strong argument that this fan power should sit on the facility power side of the PUE equation, as it contributes only to cooling the IT equipment.
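To see what moving fan power across the PUE boundary would do, the sketch below recalculates PUE for a hypothetical site under both accounting choices. The 10% fan share and the site figures are assumptions for illustration, not measurements.

```python
def pue(facility_kw: float, it_kw: float) -> float:
    """PUE = total facility power / power credited to the IT load."""
    return facility_kw / it_kw


# Hypothetical figures, for illustration only.
it_kw = 1000.0        # power delivered to the racks, as conventionally metered
overhead_kw = 400.0   # power and cooling overhead outside the racks
fan_share = 0.10      # assumed fraction of IT power drawn by on-board server fans

facility_kw = it_kw + overhead_kw
fan_kw = it_kw * fan_share

# Conventional accounting: fan power is counted as IT load.
conventional = pue(facility_kw, it_kw)

# Alternative accounting: fan power is treated as cooling overhead, so the
# facility total is unchanged but the IT denominator shrinks.
reallocated = pue(facility_kw, it_kw - fan_kw)

print(f"PUE with fans as IT load:  {conventional:.3f}")  # 1.400
print(f"PUE with fans as overhead: {reallocated:.3f}")   # 1.556
```

Nothing physical changes between the two results; only the accounting moves, which illustrates how much a reported PUE depends on where the IT boundary is drawn.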
To accurately assess the overall efficiency of an entire facility, it is therefore essential to measure energy usage at server level rather than at rack level, because PUE considers only the power delivered to the rack, not the use to which that power is put. Some 10 years ago, Michael Patterson of Intel and the Energy Efficiency HPC Working Group proposed two alternative metrics: IT Power Usage Effectiveness (ITUE) and Total Power Usage Effectiveness (TUE).
The purpose of ITUE is to gauge power usage efficiency for the IT equipment, rather than for the data centre. To do so, ITUE accounts for the impact of rack-level ancillary components such as server cooling fans, power supply units and voltage regulators, which can consume a significant proportion of the energy supplied.
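A minimal sketch of ITUE, assuming it is taken as the total power delivered to a server divided by the power that reaches its compute components (processors, memory, storage and network). The `ServerPower` breakdown and its wattages are hypothetical and serve only to show how fans, PSU losses and voltage-regulator losses push ITUE above 1.

```python
from dataclasses import dataclass


@dataclass
class ServerPower:
    """Hypothetical per-server power breakdown in watts."""
    compute_w: float   # CPUs, GPUs, memory, storage and network components
    fans_w: float      # on-board cooling fans
    psu_loss_w: float  # power supply conversion losses
    vr_loss_w: float   # voltage regulator losses

    @property
    def total_w(self) -> float:
        """Total power delivered to the server at its inlet."""
        return self.compute_w + self.fans_w + self.psu_loss_w + self.vr_loss_w


def itue(server: ServerPower) -> float:
    """ITUE: total IT equipment power / power reaching the compute components."""
    return server.total_w / server.compute_w


# Illustrative numbers only, not measurements.
air_cooled = ServerPower(compute_w=400.0, fans_w=50.0, psu_loss_w=40.0, vr_loss_w=20.0)
print(f"ITUE = {itue(air_cooled):.3f}")
```

Removing or shrinking any of the ancillary terms lowers ITUE towards 1, which is the server-level counterpart of the ideal PUE.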
ITUE also has its shortcomings, in that power demands external to the rack are not considered. This can be addressed by combining ITUE with PUE to obtain a third metric, Total Power Usage Effectiveness or TUE.
TUE is obtained by multiplying ITUE (a server-specific value) by PUE (a data centre infrastructure value). Although TUE has been around for over 10 years, much of the industry has ignored it.
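The multiplication can be sketched in a few lines. The site figures below are hypothetical. Because ITUE is IT power over compute power and PUE is facility power over IT power, the IT term cancels and TUE reduces to total facility power divided by the power that actually reaches the compute components.

```python
def pue(facility_kw: float, it_kw: float) -> float:
    """Data centre infrastructure metric: facility power / IT equipment power."""
    return facility_kw / it_kw


def itue(it_kw: float, compute_kw: float) -> float:
    """Server-level metric: IT equipment power / power reaching compute components."""
    return it_kw / compute_kw


def tue(facility_kw: float, it_kw: float, compute_kw: float) -> float:
    """TUE = ITUE x PUE, which reduces to facility power / compute power."""
    return itue(it_kw, compute_kw) * pue(facility_kw, it_kw)


# Hypothetical site figures (kW).
facility_kw, it_kw, compute_kw = 1400.0, 1000.0, 800.0
print(f"PUE = {pue(facility_kw, it_kw):.2f}")              # 1.40
print(f"ITUE = {itue(it_kw, compute_kw):.2f}")             # 1.25
print(f"TUE = {tue(facility_kw, it_kw, compute_kw):.2f}")  # 1.75
```

In this example a respectable-looking PUE of 1.40 hides the fact that 1.75 W enters the building for every watt delivered to the compute components.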
Whilst the debate about creating new metrics continues, raising awareness of TUE and promoting it as an industry benchmark for data centre energy efficiency could be very valuable in helping data centre operators implement improvements in performance.
Because the IT equipment is the single biggest consumer of energy in any data centre, implementing solutions that yield improvements at rack level must be among the first steps taken towards achieving net zero carbon targets.
Since its inception, the data centre industry has focussed on engineering secure and controlled environmental conditions within the technical space. Over time, the traditional approach of air cooling with chillers and computer room air handlers (CRAHs) has been augmented by new methods of delivery, including direct economisation, indirect air cooling and evaporative cooling. Different containment systems have been devised and the ASHRAE TC 9.9 thermal guidelines have been broadened. All these measures have driven incremental improvements in data centre energy efficiency.
The challenge ahead, however, is two-fold. Firstly, how can the energy efficiency of air cooling be improved further by any significant degree? Temperature deltas can be widened, and supply temperatures and humidity bands marginally increased. Secondly, the rise in rack power means that air cooling is no longer an effective strategy for removing heat from the technical space: the volume of air needed to dissipate heat from high-density loads is itself becoming a challenge.
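To put numbers on the airflow problem, a sensible-heat balance (heat removed = air density × volumetric flow × specific heat × temperature rise) gives the volume of air a rack needs. This is a sketch assuming typical air properties and a 12 K temperature rise across the servers; the constants and the `airflow_m3s` helper are illustrative, and real values depend on altitude, humidity and server design.

```python
AIR_DENSITY = 1.2      # kg/m^3, approximate value for air near room conditions (assumed)
AIR_CP = 1005.0        # J/(kg*K), approximate specific heat of air (assumed)
M3S_TO_CFM = 2118.88   # cubic feet per minute in one cubic metre per second


def airflow_m3s(heat_kw: float, delta_t_k: float) -> float:
    """Volumetric airflow needed to carry heat_kw away with a delta_t_k air temperature rise.

    Sensible heat balance: Q = rho * V * cp * dT  =>  V = Q / (rho * cp * dT).
    """
    return heat_kw * 1000.0 / (AIR_DENSITY * AIR_CP * delta_t_k)


delta_t = 12.0  # K, assumed server inlet-to-outlet temperature rise
for rack_kw in (7.0, 15.0, 30.0, 50.0):
    v = airflow_m3s(rack_kw, delta_t)
    print(f"{rack_kw:5.1f} kW rack: {v:5.2f} m^3/s ({v * M3S_TO_CFM:6.0f} CFM)")
```

Required airflow grows in direct proportion to rack power, and in a given system fan power rises far faster than airflow (roughly with its cube, per the fan affinity laws), which is why high-density air-cooled racks become progressively harder to serve.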
Air cooling is simply being outpaced by the changing demands of the hardware it serves. Higher power densities are compelling IT leaders to innovate. Hyperscalers and internet giants are already well advanced in their experimentation with immersion liquid-cooled servers and are reaping the benefits of increased energy efficiency.
The level of granularity provided by TUE gives a better understanding of what is taking place inside both the facility and the rack. It specifically highlights an important attribute of the precision immersion cooling architecture, namely the comparatively small proportion of server power that is required for parasitic loads such as server fans.
For example, in a conventional 7.0 kW air-cooled rack, as much as 10% of the power delivered to the IT equipment is consumed by the server and PSU cooling fans. Additionally, some data centre operators positively pressurise the cold aisle to assist the server fans and achieve more effective airflow through the rack. By comparison, the pumps required to circulate dielectric fluid within a precision immersion liquid-cooled chassis draw substantially less power.
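The 7.0 kW example can be turned into a quick comparison. The 10% fan share comes from the text; the 2% pump share for an immersion chassis is a purely hypothetical figure chosen for illustration, as is the assumption of continuous operation all year. The `split_rack_power` helper is likewise illustrative.

```python
from __future__ import annotations

HOURS_PER_YEAR = 8760  # assumes continuous operation all year


def split_rack_power(rack_kw: float, cooling_fraction: float) -> tuple[float, float]:
    """Return (kW spent on in-chassis cooling, kW left for the IT components)."""
    parasitic_kw = rack_kw * cooling_fraction
    return parasitic_kw, rack_kw - parasitic_kw


RACK_KW = 7.0
SCENARIOS = {
    "air-cooled (server and PSU fans)": 0.10,  # upper figure quoted in the text
    "immersion (circulation pumps)": 0.02,     # hypothetical value, illustration only
}

for label, fraction in SCENARIOS.items():
    parasitic_kw, useful_kw = split_rack_power(RACK_KW, fraction)
    annual_kwh = parasitic_kw * HOURS_PER_YEAR
    print(f"{label}: {parasitic_kw:.2f} kW on cooling, {useful_kw:.2f} kW for IT work, "
          f"~{annual_kwh:,.0f} kWh/year of parasitic load")
```

At the quoted 10%, fans in a single 7.0 kW rack account for roughly 6,100 kWh a year before any facility cooling overhead is added on top.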
The heat-removal properties of dielectric fluids are orders of magnitude greater than those of air, and the power needed to circulate enough fluid to dissipate the heat from the electronic components of a liquid-cooled server is far less than that needed to maintain adequate airflow across an air-cooled server of equivalent power. Further, many facility water systems serving liquid-cooled installations operate at comparatively high temperatures (above 45°C for ASHRAE liquid cooling class W5), which means that reliance on energy-intensive chiller plant may be reduced or avoided altogether.
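One way to see the difference is to compare volumetric heat capacity, the heat a cubic metre of coolant absorbs per kelvin of temperature rise. The sketch below uses approximate air properties and hypothetical values for a hydrocarbon-type dielectric fluid; actual fluids vary, so the datasheet of the chosen fluid should be used. Volume flow is only an indicator: pump power also depends on the pressure drop through the chassis and pipework.

```python
# Assumed, approximate property values for illustration; check the actual fluid's datasheet.
AIR = {"density": 1.2, "cp": 1005.0}           # kg/m^3, J/(kg*K)
DIELECTRIC = {"density": 800.0, "cp": 2200.0}  # hypothetical hydrocarbon-type coolant


def volumetric_heat_capacity(fluid: dict) -> float:
    """Heat carried per cubic metre per kelvin of temperature rise, in J/(m^3*K)."""
    return fluid["density"] * fluid["cp"]


def flow_m3s(fluid: dict, heat_kw: float, delta_t_k: float) -> float:
    """Volumetric flow needed to absorb heat_kw with a delta_t_k temperature rise."""
    return heat_kw * 1000.0 / (volumetric_heat_capacity(fluid) * delta_t_k)


ratio = volumetric_heat_capacity(DIELECTRIC) / volumetric_heat_capacity(AIR)
print(f"Dielectric fluid carries ~{ratio:.0f}x more heat per unit volume than air")

heat_kw, delta_t = 7.0, 10.0  # assumed rack load and coolant temperature rise
print(f"Air:   {flow_m3s(AIR, heat_kw, delta_t) * 1000:.1f} L/s")
print(f"Fluid: {flow_m3s(DIELECTRIC, heat_kw, delta_t) * 1000:.2f} L/s")
```

With these assumed properties, removing 7 kW takes several hundred litres of air per second but well under one litre of fluid per second.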
By adopting liquid cooling, data centre owners and operators can potentially drive up efficiencies at all levels of the system, achieving worthwhile improvements in energy efficiency. Increasing rack power density, free from the constraints of increased cooling air movement, would also allow for more compact data centres.
Climate change is driving an urgent need to adopt better energy efficiency metrics and cooling solutions, and to challenge hardware design practices. Once hardware manufacturers start optimising equipment for liquid-cooled environments, there is the potential to realise significant gains in rack space and power utilisation, as well as in sustainability. Adapting existing air-cooled facilities to support liquid-cooled racks could also prolong their useful working life.
While some transitions, such as new metrics and optimised server designs, will take time, there is no doubt about the efficacy of liquid cooling. Initiating the transition to liquid cooling now will start reducing the environmental impact of compute services sooner.