As demand for compute balloons, the AI infrastructure company Crusoe will soon be offering an intermittent processing option.
Photo credit: Crusoe
Growing concern over data center electricity loads and long interconnection times, along with the race to the top of the artificial intelligence dogpile, has spurred discussion about whether this new load can be more flexible than its predecessors.
A slew of startups and spinouts focused on this corner of expected growth, flexible interconnection for data centers, has popped up in the last few years. And they’re promising everything from islanding capabilities to purchasing curtailed energy — and even taking on the common assumption that data centers need 99.999% uptime.
That final experiment is largely specific to artificial intelligence workloads, many of which are inherently batchable. It’s a segment of the data center development conversation that’s small but growing, though it has yet to take hold among the hyperscalers.
But Crusoe, a vertically integrated AI infrastructure company that taps into excess or curtailed clean energy, will soon be offering this intermittent processing at its data centers, chief product officer Patrick McGregor told Latitude Media. (Crusoe’s claim to fame is its deployment of data center modules at oil wells, where it makes use of otherwise-wasted natural gas.)
That said, McGregor doesn’t anticipate that customer interest in intermittent training or batchable workloads will be driven by reports of the dire climate impacts of the AI boom. Instead, he said, Crusoe expects customers will be drawn to a flexible processing offering out of a desire to cut spending; any intermittent offerings would come at “significantly lower cost” than those with guaranteed uptime.
“A venture-funded startup spends close to 50% of the money that they raise on us — on infrastructure providers,” McGregor said. “So even if it’s just a haircut versus a massive drop, it’s very worth it for them to explore.”
Most of the interest Crusoe has received so far on flexible workload capabilities is attached to the inherently lower price tag, he added.
It is perhaps intuitive that early-stage, budget-constrained startups would be more amenable to flexible compute. Intermittency is a more viable option in the initial stages of model development, when companies are either exploring and experimenting, or training a model. The latter is where most of the cost of compute is incurred, McGregor said.
But once a company moves to push a model out to customers, intermittency becomes much less of an option.
For smaller, younger companies, moving toward a more flexible approach that only trains a model when the sun is shining or the wind is blowing — or in the case of Crusoe, when excess natural gas is being flared — is a relatively simple move, he added. That’s in large part because most model training already involves “checkpoints” at which work is saved up until a certain point, in the event of a server crash, for example.
“It’s possible to modify these work pipelines,” McGregor explained. “It’s pretty straightforward for [startups] to make that change, and cost is the biggest thing.”
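The checkpointing McGregor describes can be sketched in a few lines. The code below is an illustrative toy, not Crusoe’s actual pipeline: the `power_available` callback, the JSON checkpoint file, and the stand-in training step are all hypothetical. The point is simply that a loop which persists its state after each step can pause when energy is curtailed and resume later without repeating completed work.

```python
import json
import os
import tempfile

# Hypothetical checkpoint location; real training frameworks write much
# larger checkpoints (model weights, optimizer state) to durable storage.
CHECKPOINT = os.path.join(tempfile.gettempdir(), "train_ckpt.json")

def load_checkpoint():
    """Return the last saved state, or a fresh one if none exists."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"step": 0, "loss": None}

def save_checkpoint(state):
    """Write to a temp file, then rename, so a crash mid-write
    can't leave a corrupt checkpoint behind."""
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CHECKPOINT)

def train(total_steps, power_available):
    """Run training steps only while power_available(step) is True,
    resuming from the last checkpoint on each call."""
    state = load_checkpoint()
    while state["step"] < total_steps:
        if not power_available(state["step"]):
            return state  # pause: the curtailed energy is gone
        state["step"] += 1
        state["loss"] = 1.0 / state["step"]  # stand-in for a real training step
        save_checkpoint(state)
    return state

# Simulate an outage partway through, then resume when power returns.
if os.path.exists(CHECKPOINT):
    os.remove(CHECKPOINT)
first = train(10, lambda step: step < 4)   # "power" drops after step 4
resumed = train(10, lambda step: True)     # power restored; picks up at step 4
print(first["step"], resumed["step"])      # 4 10
```

The second call to `train` reads the checkpoint and continues from step 4 rather than step 0, which is the property that makes intermittent, energy-following training workable.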