Opinion
Sponsored
DATA + CLOUD
AI

Balancing AI’s potential and pitfalls in data center operations

As AI drives unprecedented demand for data centers, new approaches to power are needed to ensure momentum is not hindered.

Published
October 18, 2024

Photo credit: Shutterstock

The number of greenfield hyperscale data centers is surging — and is expected to continue growing. Grand View Research anticipates growth of 13% each year through 2030. 
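To put that rate in perspective, the short sketch below compounds 13% annual growth over several years. The base year (2024, the year of publication) and the helper name `cumulative_growth` are assumptions made for illustration; the article does not state the forecast's baseline or exactly what quantity it measures.

```python
def cumulative_growth(annual_rate: float, years: int) -> float:
    """Total growth multiple after compounding annual_rate for the given number of years."""
    return (1.0 + annual_rate) ** years


# Assumption: growth is compounded from 2024 (publication year) through 2030.
BASE_YEAR = 2024
ANNUAL_RATE = 0.13  # 13% per year, as cited in the article

for year in range(BASE_YEAR + 1, 2031):
    multiple = cumulative_growth(ANNUAL_RATE, year - BASE_YEAR)
    print(f"{year}: {multiple:.2f}x the {BASE_YEAR} level")
```

Under these assumptions, six years of 13% compounding roughly doubles the starting figure by 2030.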

This new infrastructure buildout is fueling the most challenging workloads that the computing industry has ever undertaken, with compute requirements of large language model training growing at a clip of at least 1.5 times Moore’s Law (which observes that the number of transistors on an integrated circuit doubles every two years with minimal rise in cost).

But when we peer into underlying GPU power requirements to fuel this training, we discover platforms that gobble energy at startling rates. NVIDIA’s new Blackwell GPUs, the solution of choice for hyperscalers’ largest training deployments in 2024, consume a whopping 1200 watts per GPU, more than 70% higher than the previous generation. (That said, the company says they contribute to more efficient processes because their increased power means that fewer are required per workload.) When you look at the Grace Blackwell platform, the data is even more eye-opening, with an energy draw of 2700 W per system. 

Given that traditional data centers deliver five to 10 kW per server rack, it is abundantly clear that these new powerhouses require fundamental power delivery changes to maintain rack density.
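A rough back-of-the-envelope calculation makes the mismatch concrete. The sketch below divides the rack power budgets mentioned in this article by the quoted per-device draw. It is a simplified upper bound only: it assumes the entire rack budget goes to the accelerators and ignores CPUs, networking, cooling, and power conversion losses. The helper name `devices_per_rack` is hypothetical.

```python
# Device power figures quoted in the article (watts).
BLACKWELL_GPU_W = 1200
GRACE_BLACKWELL_SYSTEM_W = 2700

# Rack power budgets mentioned in the article (watts):
# traditional racks (5 to 10 kW), new configurations (30 kW), forecasts (200 kW).
RACK_BUDGETS_W = [5_000, 10_000, 30_000, 200_000]


def devices_per_rack(rack_budget_w: int, device_w: int) -> int:
    """Upper bound on devices that fit in a rack power budget (hypothetical helper).

    Assumes all rack power is available to the devices, which overstates
    what a real deployment could support.
    """
    return rack_budget_w // device_w


for budget in RACK_BUDGETS_W:
    gpus = devices_per_rack(budget, BLACKWELL_GPU_W)
    systems = devices_per_rack(budget, GRACE_BLACKWELL_SYSTEM_W)
    print(f"{budget / 1000:>5.0f} kW rack: up to {gpus} GPUs "
          f"or {systems} Grace Blackwell systems")
```

Even under these generous assumptions, a 10 kW rack could power at most eight such GPUs, while a 200 kW rack could power up to 166.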

What are hyperscalers doing to address this? Well, in greenfield environments, the solution starts with delivery of more power per rack with new configurations: 30 kW per rack and beyond, with some reports forecasting rack power scaling up to 200 kW. This enables providers to deploy increased compute density per rack to scale compute capability in training clusters further.

This also, of course, produces far more heat, since nearly all of the electrical power a rack draws ends up as heat. Operators are turning to direct liquid and immersion cooling as the only viable ways to dissipate the heat generated by these powerful configurations. The debate over when liquid cooling should replace air cooling, at least in these hyperscale environments, is all but over. Air cooling simply cannot handle the heat output of these high-powered GPUs.

Operators are also paying special attention to the cradle-to-grave sustainability of data center environments, with more focus on embedded carbon, power consumption in use, and infrastructure circularity in alignment with corporate carbon commitments. In pursuing these efforts, 63% of respondents to ZincFive’s 2024 Data Center Energy Storage Industry Insights Report survey reported that their organizations' sustainability programs resulted in reduced costs. The same survey found that sustainability was the second-highest consideration when selecting energy storage solutions.


As operators build out high-density, high-power racks, power backup must also be rethought, given the sheer scale of power draw within the cluster and how critical training runs are to the underlying business opportunity. Here, new approaches to both immediate and long-term battery backup are coming under consideration.

In particular, new battery chemistries have the potential to offer something that lithium-ion or lead-acid cannot. A nickel-zinc chemistry, for instance, delivers immediate backup power tailored to unexpected AI training cluster outages, providing power failover before servers reboot or generators start. It also has improved power density, taking up less valuable floor space in the data center, and has no risk of thermal runaway.

As we approach the second half of the decade, the growth of hyperscalers is set to continue, driven by the expanding potential of AI. With use cases limited only by human creativity, AI adoption will inevitably grow — but faces constraints due to the capacity of data centers to scale and meet evolving power demands.

Finding safe, reliable, and sustainable infrastructure and energy solutions will fuel the next wave of innovation, whether through new technologies, strategies, or partnerships. These solutions will have to be identified and implemented across various fronts, including the sourcing, use, and storage of power. Fortunately, many of these solutions are already emerging.

Tim Hysell is the CEO and co-founder of ZincFive, a nickel-zinc battery storage company. The opinions represented in this contributed article are solely those of the author, and do not reflect the views of Latitude Media or any of its staff.
