
Nvidia data center exec on near-term efficiency for AI computing

Dion Harris talks technology improvements, accelerated computing, and efficiency potential.

Listen to the episode on: Apple Podcasts | Spotify

Image credit: Nvidia / Lisa Martine Jenkins


The world’s biggest tech companies buy a lot of renewable energy. But with tens of gigawatts of new data centers in development, they’re turning to a wider array of clean, firm options — including nuclear plants and enhanced geothermal.

These options will take years to scale. So in the meantime, what short-term options are there for bringing down the energy intensity of AI? 

Dion Harris, the director of accelerated solutions for Nvidia’s data center portfolio, is bullish on the combined impact of accelerated computing and AI efficiency gains. By improving the technologies used to power artificial intelligence, the chip giant is working to reduce how much energy each GPU requires. 

Nvidia’s dominant position in the chip industry means it has an outsized impact on energy use with every chip or process improvement.

“In the short-term, most companies who are building hyperscale data centers can't just put one up tomorrow or even six months from now,” Harris told Latitude Media. “So the first question is, ‘how can we make our data center as efficient as possible?’”

In Harris’ experience with Nvidia’s HPC end-user segments, which span everything from supercomputing to energy, it comes down to two things. 

The first is increased demand for new architectures that enable companies to “get more out of their existing power allocation.” Nvidia’s Blackwell architecture for generative AI, released in March, consumes 1,200 watts per GPU — significantly more than previous generations. The idea, however, is that a data center would need far fewer of them for the same result. Nvidia said Blackwell delivers 30 times better performance on large language model workloads and 25 times lower energy consumption than previous iterations.

Historically, each generation’s performance gains have tended to come with higher energy use. But at Blackwell’s unveiling, CEO Jensen Huang said the faster processing speed actually lowers the chip’s energy consumption during training: training a large AI model would require 8,000 older-generation GPUs drawing 15 megawatts of power, versus 2,000 Blackwell GPUs drawing 4 MW over the same period of time. 
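The arithmetic behind that comparison is worth spelling out, since it captures the counterintuitive point: each new GPU draws slightly *more* power, yet the fleet as a whole draws far less. A minimal sketch, using only the figures quoted above and assuming equal training time for both fleets:

```python
# Sanity-check the training-power comparison from Blackwell's unveiling.
# Figures are those quoted above; equal training time is assumed.

old_gpus, old_mw = 8_000, 15        # older-generation fleet
new_gpus, new_mw = 2_000, 4         # Blackwell fleet

# Power draw per GPU, in kilowatts: the newer chip draws slightly more...
old_kw_per_gpu = old_mw * 1_000 / old_gpus   # 1.875 kW per older GPU
new_kw_per_gpu = new_mw * 1_000 / new_gpus   # 2.0 kW per Blackwell GPU

# ...but the fleet as a whole needs much less power for the same job.
power_ratio = old_mw / new_mw                # 3.75x less total power

print(f"Per GPU: {old_kw_per_gpu:.3f} kW vs {new_kw_per_gpu:.1f} kW")
print(f"Total training power reduced {power_ratio:.2f}x")
```

The per-GPU increase (1.875 kW to 2 kW) is dwarfed by the 4x reduction in fleet size, which is where the net savings come from.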

Companies are also looking into other technologies that can improve performance efficiency, such as direct liquid cooling.

The promise of accelerated computing

Harris is also bullish on accelerated computing, as his colleague Marc Spieler, Nvidia’s senior managing director for the energy industry, discussed on a recent episode of With Great Power (a Latitude Studios partner podcast, produced for GridX).

“Data centers are growing in electric consumption considerably; new data centers are much more dense than previous ones,” he said. “Accelerated computing requires a higher energy footprint per server, but it reduces a considerable amount of servers.”

Accelerated computing is often paired with AI. It involves separating an application’s data-intensive parts from everything else so they can be processed on a dedicated hardware accelerator.

“That's very tangible, because you're taking a workload that's consuming energy, and you're making it much more productive to the point where it consumes less,” Harris said. 

“And those workloads are pretty varied,” he added. “AI is the one that is really kind of the shiny object right now that captures most of the attention, but there are tons of other workloads,” from data processing to climate or weather simulations. 


It’s not an easy process, though; Harris described it as “painful,” even after Nvidia’s roughly 18 years of experience. 

“It’s domain-by-domain. You literally have to look at every application, look at every line of code and see what can be ported over to the GPU,” Harris said. “But once you do it, it has incredible benefits. And there are a ton of workloads that still need to be accelerated.” 

While the specific energy efficiency and processing gains depend on the workload being accelerated, an Nvidia analysis found that moving a science workload from a traditional CPU-based enterprise server to a GPU-accelerated system speeds it up by 20 to 40 times. The same switch can speed up feature engineering by 180 times. 

“NVIDIA estimates that if all AI, HPC and data analytics workloads that are still running on CPU servers were CUDA GPU-accelerated, data centers would save 40 terawatt-hours of energy annually,” the analysis found. That is roughly the annual energy consumption of 5 million U.S. homes. 
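The household equivalence is easy to sanity-check. A quick sketch of the implied per-home consumption (the comparison to the U.S. average in the comment is an outside reference point, not a figure from Nvidia's analysis):

```python
# Check the "40 TWh ~ 5 million U.S. homes" equivalence quoted above.
savings_twh = 40
homes = 5_000_000

# 1 TWh = 1 billion kWh
kwh_per_home = savings_twh * 1e9 / homes

print(f"Implied consumption: {kwh_per_home:,.0f} kWh per home per year")
# ~8,000 kWh/year, in the same range as typical U.S. household usage
# (the EIA puts the average at roughly 10,000 kWh/year, so the
# equivalence is, if anything, on the generous side for "homes").
```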

Efficiency as a feedback loop

But efficiency isn’t only about the technology or computing processes. Harris emphasized that AI has the potential to make existing workflows more efficient — even if training the models initially requires a lot of energy. 

For instance, he said, climate and weather organizations like the National Oceanic and Atmospheric Administration rely on supercomputers that produce forecasts at very high resolutions, which are then used by sectors from transportation to disaster recovery. Running that processing with an AI model cut energy consumption by a factor of 3,000, Nvidia found, even accounting for the initial training and annual fine-tuning of the model. 

The company is also working to make AI useful to utilities straining to accommodate new load, partly created by AI computing itself. All of these process shifts are still underway, but Harris believes that “as we come into next year, you'll really start to see the impacts of some of that work taking shape.”

“The key thing is that this is all very nascent — AI is very young,” he added. “I think the next part is understanding the long term impacts of how it can improve overall data center efficiency or energy consumption.”

EVENT
Transition-AI 2024 | Washington DC | December 3rd

Join industry experts for a one-day conference on the impacts of AI on the power sector across three themes: reliability, customer experience, and load growth.

Register

Nvidia CEO Jensen Huang echoed that sentiment last week at an event on AI and energy: “Using artificial intelligence to solve problems will use less energy than using calculation to solve problems,” he said. “Although it consumes energy to train the models, the models that are created will do the work much more energy-efficiently.”

This is the tension in Nvidia’s technological progress: each more powerful, more energy-intensive GPU advances AI, fueling demand both for the technology and for the energy to run it. But each new Blackwell chip, for instance, performs the work of several smaller chips on its own, reducing the energy intensity of each workload. 

However, as the number of workloads grows, so too does AI’s overall footprint. Even with computational efficiency improvements, the expansion of data centers is creating very real grid capacity challenges — which could result in a lot of new fossil gas plants being built. 

Recognizing that challenge, Huang said that he supports the recent Microsoft deal with Constellation Energy to revive the Three Mile Island nuclear facility. 

“No one energy source will be sufficient for the world,” Huang said. “But there’s no better way than to not waste energy,” he added, citing the importance of accelerated computing as an example. 
