
Can AI revolutionize materials discovery?

Google DeepMind’s Ekin Dogus Cubuk says AI may be more useful for optimizing existing materials than for discovering new ones.

Published September 19, 2024
Listen to the episode on: Apple Podcasts | Spotify

AI is working its way across climate tech, helping companies discover giant lodes of ore, catch battery defects, and monitor energy infrastructure. Could it help us find revolutionary new materials, too?

Turns out, it’s complicated. 

In this episode, Shayle talks to Ekin Dogus Cubuk, a researcher focused on materials at Google DeepMind. DeepMind is one of several players, including Microsoft, trying to discover new materials that could be used in things like better battery chemistries, powerful carbon-capture sorbents, and room-temperature superconductors. But so far, Dogus says, AI-powered approaches haven't actually yielded any commercially deployable materials.

Shayle and Dogus cover topics like:

  • Existing approaches to materials discovery, like experimentation and density functional theory, and how AI could complement those techniques
  • Why AI may actually require a lot more lab work — and larger datasets — before it becomes useful for material discovery
  • The types of material properties that AI may be especially useful for, such as optical or electric qualities

Recommended resources

  • Latitude Media: Armed with AI, Microsoft found a new battery material in just two weeks
  • Google DeepMind: Millions of new materials discovered with deep learning
  • Latitude Media: Could AI-powered defect detection boost battery manufacturing?

Catalyst is brought to you by Kraken, the advanced operating system for energy. Kraken is helping utilities offer excellent customer service and develop innovative products and tariffs through the connection and optimization of smart home energy assets. Already licensed by major players across the globe, including Origin Energy, E.ON, and EDF, Kraken can help you create a smarter, greener grid. Visit kraken.tech.

Catalyst is brought to you by Anza, a revolutionary platform enabling solar and energy storage equipment buyers and developers to save time, increase profits, and reduce risk. Instantly see pricing, product, and counterparty data and comparison tools. Learn more at go.anzarenewables.com/latitude.

Catalyst is brought to you by Antenna Group, the global leader in integrated marketing, public relations, creative, and public affairs for energy and climate brands. If you're a startup, investor, or enterprise that's trying to make a name for yourself, Antenna Group's team of industry insiders is ready to help tell your story and accelerate your growth engine. Learn more at antennagroup.com.


Transcript

Shayle Kann: I'm Shayle Kann, and this is Catalyst.

Ekin Dogus Cubuk: There's a certain amount that we know as humans. And maybe we can use computation to predict a bit outside of that circle, that sphere. But the farther we get from the sphere, the less good our approximations will be.

Shayle Kann: Will materials discovery be the killer app for AI in climate tech, or is it a lot harder than we think it is?

I'm Shayle Kann. I invest in revolutionary climate technologies at Energy Impact Partners. Welcome. Well, as the AI boom of the past couple of years has taken off, the way that I've thought about the intersection between AI and what I do, which is climate tech, is that there are two distinct components. The first is the impact of the growth of AI, or really the growth of AI data centers and compute, on energy and therefore on climate. That one we've talked about a bunch here. The other, though, which we haven't talked about as much, is the actual application of AI to climate tech. And to be honest, one of the reasons we haven't talked about it so much is that I'm still kind of searching for what I think is a real, tangible opportunity that would drive big impact.

Of course, the world abounds with ways to use AI to do things more efficiently. But to really move the needle on gigatons of emissions, I think, is trickier. But here's one category I've been pretty curious about, which is AI for materials discovery. Undoubtedly, a big part of the technical challenge of getting to net zero is a materials challenge. And one of the areas where, at least on the surface, you can pretty easily imagine AI creating a step-function improvement is in doing the complex and currently quite slow work of discovering new materials. So is this our AI climate killer app? Let's find out. For this one, I spoke to Dogus Cubuk, who is a research scientist studying materials discovery at Google DeepMind. Here's Dogus. Dogus, welcome.

Ekin Dogus Cubuk: Hi, Shayle. Thanks for having me.

Shayle Kann: I'm really excited to talk to you about AI for materials discovery. I want to start by talking pre-AI. So obviously in the history of humanity, we've discovered many, many new materials. We've commercialized many of them. Just talk to me about before AI. Just walk me through the process of new materials discovery, broadly.

Ekin Dogus Cubuk: Yeah, that's a great question, and it goes back really far. So if you think about the invention of money, for example, one of the milestones people talk about is when we started using gold as money. And if you think about how that happened, it turns out a lot of ores have gold and silver mixed together, so they're alloys. And I think the big innovation there was when some humans found a way of separating the gold and the silver from each other. And then once you had pure gold, that was used as money. It's interesting to think about those times. But in more recent times, I feel like one thing that's very relevant to this conversation is that a lot of materials discovery has been by random trial and error, and it's been very serendipitous. Actually, the more I look into this, the more I realize that almost all fields involve some kind of important serendipitous discovery.

So one of the fun examples we often talk about is light bulbs. Around 1905 or so, people were realizing that tungsten is a good material for a filament. But tungsten wasn't ductile enough to be wound up as a coil. And then apparently, by mistake one time, it was dropped into a pool of liquid mercury. And it turns out that when mercury and tungsten react, the tungsten becomes more ductile, so then came ductile tungsten. We can talk more about this, but I think history is just full of examples like this. If you think about the invention of the lithium-ion battery, one of the stories is that at Exxon they were looking to discover superconductors. And they had reason to think that lithium intercalation could be interesting for studying superconductivity. But as they were intercalating lithium, they realized it's actually really good at storing energy.

And this is one of the reasons I'm really excited about trying to see if AI and simulations can be useful here, because a lot of the discoveries have just been randomly trying relevant materials. Maybe another example I can give you: in 1947, when Bardeen and his collaborators first invented the solid-state transistor, you can look at their notebooks and notice that they tried so many different materials for all the different parts of it, because they just didn't know what would work. At the time they were hoping silicon would work, but it turns out silicon didn't work for them, so they ended up switching to germanium, and that worked. And then I think they had a problem with the glue, so they had to change the glue. They had a problem with the metal electrode in the device. What I'm trying to say is that Bardeen was maybe one of the best solid-state physicists who ever lived, but even he was just randomly trying materials to make this transistor work.

Shayle Kann: Yeah, it's funny. When you talk about the accidentally dropping tungsten into a bed of... What is it, liquid mercury? It makes me think sometimes of... I have a two-and-a-half-year-old son. I should just put him in a chemistry lab with a bunch of materials in a bunch of different places and just give him enough time and eventually he's going to accidentally do something that's going to discover some amazing new material. Obviously I'm a better father than that. But the transistor one is I think a good one to talk about because... Okay, sure. In the arc of human history, many of the important discoveries have been made purely accidentally. But certainly in the past few decades, I would presume we've developed a body of knowledge about the characteristics and properties of various materials. And so if you're trying to solve a material discovery problem, I don't know, 10 years ago, probably you're not just doing totally random trial and error, right? What was the depth of our knowledge and our ability to iterate on different designs of materials and so on that went beyond the purely random? Again, prior to AI.

Ekin Dogus Cubuk: Yeah, that's a great question. And obviously even the examples I was giving, they were not purely random. For example, Bardeen knew that the semiconductor would be something like silicon or germanium. They both have four electrons in the shell. There was a lot of physical understanding. They knew about the surface states of silicon. So it's quite different than I think, as you said, a random kid going in... Although your kid is probably very smart. A random kid just randomly trying stuff. But here comes, I think, a very interesting philosophical contradiction. And I think this is true for science, but also for machine learning. The better you know a system, the more you can continue optimizing the system. But it doesn't necessarily mean that knowledge will help you discover something different. And I think this is probably why a lot of important discoveries are serendipitous, because let's say you're in a company and your company is really an expert on material A.

So you're right. You've developed so many important sets of expertise that you can really optimize A. But most likely those skills don't help you discover C, which is very different. Or even B. And this I think is definitely an issue with machine learning. So in machine learning, we know that we do really well on the training set distribution, on the kinds of things you're trained on. And the farther you get from the training set distribution, the worse your predictions are. And this is also true for science. The closer things are to the textbooks, the better our theories are at predicting them. And this is partly why I think materials discovery has become so difficult in the commercial space because... Like if you think about plastics, we're still mostly using things that we discovered 70 years ago, 80 years ago. And we've gotten so good at manufacturing them, optimizing them.

So now for someone to come up and say, "Oh, I discovered a completely different one," it's quite difficult. And for this reason, it just lends itself to us optimizing known materials and not necessarily discovering completely new ones that might be better. And that's probably also why a lot of the materials we're using today in many technologies are quite simple. If you think about the transistor, it's just pure silicon. Or I think in MRI machines, the superconductors they're using are quite a bit simpler and older than the newer cuprates. So yeah, it's quite common that we're having difficulty discovering new materials, even if we're pretty good at modeling some of the older materials.

Shayle Kann: Okay. So to my layman's ear, I guess what I'm hearing is that what we have gotten pretty good at historically is taking a sort of incremental step in material discovery. We know a system, we know a category of material and so on. We can optimize the hell out of it. Maybe this is what we've done with plastics over 80 years. What we have had a harder and harder time doing, in part because maybe we've discovered the low-hanging fruit, is finding entirely novel materials. So that's maybe a good segue to talking about the new world of AI and to what extent it has a role to play in helping to crack that code. Because the fundamental thing I wonder about is presumably the thing that makes it difficult to discover an entirely novel option C, or whatever you called it before, is that the possibility space is virtually endless.

It's a huge number of possibilities of things that you could do. And so the question is: does the kind of computation that AI is introducing make that easier, in the sense that you can just run a million combinations if, theoretically, you can simulate the properties of materials? Or does it actually make it just as hard for exactly the reason you described, which is that we have a corpus of data we're going to train these models on, but that corpus of data is grounded in what we already know? And so definitionally, it's going to be hard for it to find the next thing.

Ekin Dogus Cubuk: Yeah, that's a great question. And it's not just AI. So there are two things that have been happening in recent decades. Simulations are becoming more and more commonplace, and that's probably very correlated with why AI is becoming commonplace: our computing infrastructure is growing and computers are getting cheaper. So now we are getting to a point where we're better at training neural networks, we're better at using them, and we're better at simulating atoms and materials, and we're doing it for cheaper. But I think, exactly as you said, both simulations like density functional theory and machine learning have the same bias as regular pre-simulation, theoretical science. And maybe it's even worse, because humans, as biased as they might be when doing science, also clearly have this ability to extrapolate. Humans have found ways of discovering things that were beyond their theories.

There's been these paradigm shifts. And AI hasn't really done this yet. Even today's best AI models seem to be really good at doing the textbook stuff, like high school and college material. But when you think about being more creative and trying to shift the paradigm, it's been more difficult. Okay, so that's the pessimistic part. But I think the optimistic part is that even the less creative parts of science could really benefit from becoming more efficient. So let me give you an example. If you think about high-temperature superconductivity, as you know, this is different than conventional superconductivity, but it can have a much higher transition temperature. And we still don't know, as physicists, where high-temperature superconductivity comes from. It's a crazy thing. It's been around for 40 years, and we don't know why it happens. But we can still optimize it.

So the first high-temperature cuprate superconductor that was discovered is called LBCO. And the name matters: the L is lanthanum, the B is barium, and the CO is copper oxide. When Müller and his collaborator first discovered this at IBM, I think people thought Müller was crazy for considering cuprates as superconductors, because all the other known superconductors were BCS superconductors, and cuprates were different. But if you look at Müller's Nobel Prize speech, he actually talks about how he used the old understanding, conventional superconductivity, to be able to consider cuprates in the first place. We now know that it's actually not a great transfer, because cuprates are quite different from conventional superconductors. So LBCO turned out to be quite interesting, but not good enough. So then what people did, even though they didn't know why it was a superconductor, is start replacing elements with similar elements.

And then the first one really made it. And the reason I'm saying it made it is because its transition temperature was above liquid nitrogen, and that was YBCO. What you notice is that just the lanthanum was replaced with another element, yttrium. So humans were able to find a good enough superconductor even though they didn't understand why it was a superconductor. I think what computational machine learning can give us here is that even if it can't do the paradigm shift and go from cuprates to a completely different superconductor, it can at least help us do this optimization, the exploitation part, to go from LBCO to YBCO faster. And when we start talking about our own work, you'll see clear examples of this.

Shayle Kann: So would I be right in that example to think... We have some high-temperature superconductors. And one of the things that, as a non-physicist, has always bugged me about that terminology, like high-temperature superconductors, still very cold, need to be extremely cold. The holy grail of course is a room-temperature superconductor, which we have not yet discovered. So would it be right to think maybe the type of thing that computational machine learning might be good at is optimizing and tweaking the recipes that we've got for high-temperature superconductors? Probably less likely, at least today, to discover the room-temperature superconductor, because that probably requires some completely orthogonal type of thinking?

Ekin Dogus Cubuk: Yeah, I think that's exactly right. So if you look at the last few years, the most promising discoveries on the computational side have been looking at hydrides, which might have conventional superconductivity, and at high enough pressure they might have a high enough temperature. There's been some really good work coming out of the Pickard group and a few other groups where they use simulations to study these. And the hope, exactly like you said, is to maybe take a known kind of superconductivity and optimize it to get as close to room temperature as possible.

Shayle Kann: Maybe that's a good segue to giving some examples of the types of things that... In recent years, since the boom in AI, what has been proven to work so far? What have we discovered collectively using AI, that perhaps either we wouldn't have otherwise or would've taken a whole lot longer and more work and effort? Tangibly, what have we shown?

Ekin Dogus Cubuk: I think not a whole lot. And I would actually make the question a bit larger and ask what simulation has given us that was not possible before. Because we have to realize that simulations started becoming a thing in materials science in the early 2000s, or even the late 1990s. So it's been several decades at this point. And it's important to ask what that has given us as actual materials in products that we didn't have before. And I think that's the crucial question. So there's one example that often gets talked about: I think one of the cathode materials, maybe from the Ceder group and the Materials Project, is in Duracell batteries.

So that's one example that's known. And this one is a bit outside of materials science, but in topological insulators, the first three-dimensional topological insulators were proposed in a DFT paper. But otherwise, the record is actually not that great. I mean, materials discovery in general is quite hard for the reasons we mentioned, so it's usually just experimentalists trying stuff somewhat randomly. But the good news is, I think, that simulations and machine learning have been making progress, not yet in putting materials in devices and products, but at least in making useful predictions.

Shayle Kann: We talked about this briefly, but I guess I want to understand it a little bit better. Folks are going to be familiar with AI in the form of LLMs and things like that, all the generative AI world. And one of the benefits that that world has is that the corpus of training data for those models is enormous. You're training LLMs on the internet, basically. How does it look when you're trying to do simulations for the purpose of material science? How big is the data upon which you can train? Is it big enough? And do we need to be generating an enormous amount of synthetic data in order to sufficiently train these models? Is that a real constraint here?

Ekin Dogus Cubuk: Yeah, that's a great question. So if you look at the ICSD, which is the Inorganic Crystal Structure Database, there are a bit more than 200,000 inorganic crystals there. So that kind of tells you that it's quite a bit smaller than internet scale. One piece of good news for us is, as you said, we can simulate data. And the simulations come from our physical approximations of quantum mechanics, so they tend to be somewhat informative. And with density functional theory simulations, we can do a lot more than 200,000. If you look at our GNoME paper, we had results for several million training points. And people have been pushing that, so now there are many different groups that have results for something like 50 million points. So that's one thing. But then the question is how many experimental data points is worth how many computational data points. But this is actually not that different from the internet scale.

So when LLMs are trained on the internet, the data is not very high quality. It's just sentences, and there aren't necessarily very good labels for them. But as you mentioned, what can be quite effective is that you pre-train on the internet and then you fine-tune on specific tasks. And that might be pretty similar for us. If we end up having these hundreds of millions of points from computation and then a few points, like a hundred thousand points, from experiment, maybe then we can get some good results. Part of the big issue is, of the 200,000 crystals on the ICSD I mentioned, for many of them we actually don't know the properties. What's their band gap? What's their electronic conductivity? That becomes an even smaller set. Then you may have 1,000, 2,000 data points. And it's a real problem, I think.
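The pre-train-on-simulation, fine-tune-on-experiment recipe Dogus describes can be illustrated with a deliberately toy sketch. Everything below, from the linear model to the bias term, is an invented stand-in for illustration, not DeepMind's actual setup: a model is first fit to plentiful but systematically biased "simulated" labels, then corrected with a small "experimental" set.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                        # toy feature dimension
w_true = rng.normal(size=d)                  # "experimental" ground truth
w_sim = w_true + 0.3 * rng.normal(size=d)    # simulation with a systematic bias

# Plentiful cheap simulated labels vs. a small experimental set
X_sim, X_exp = rng.normal(size=(50_000, d)), rng.normal(size=(200, d))
y_sim, y_exp = X_sim @ w_sim, X_exp @ w_true

def fit(X, y, w, lr=0.1, steps=200):
    """Full-batch gradient descent on mean squared error, from warm start w."""
    for _ in range(steps):
        w = w - lr * X.T @ (X @ w - y) / len(y)
    return w

w_pre = fit(X_sim, y_sim, np.zeros(d))  # pre-train on simulation only
w_ft = fit(X_exp, y_exp, w_pre)         # fine-tune on the small experimental set

X_test = rng.normal(size=(1_000, d))
err_pre = np.mean((X_test @ w_pre - X_test @ w_true) ** 2)
err_ft = np.mean((X_test @ w_ft - X_test @ w_true) ** 2)
print(err_ft < err_pre)  # fine-tuning corrects the simulation bias
```

The point of the sketch is the ordering, not the model: pre-training gets you close for cheap, and a small amount of high-quality experimental data closes the gap that the biased simulator cannot.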

Shayle Kann: Yeah. That really drives home why this is challenging. You have 1,000 to 2,000 data points where you actually understand the properties of the thing you're training your model on. Those numbers, even I know, are tiny. I guess the other question... I think people could imagine that the world of materials discovery, particularly the way you described it historically when this stuff happened semi-accidentally, is a lot of physical trial and error. People are doing something in a lab, seeing what happens, measuring those results, then inferring something and moving on.

And then you can imagine the world in which AI, ML, et cetera, replaces the lab work, because you're able to utilize computation to figure out what's going to happen if you were to do that stuff in the lab. Do you see that as being a realistic future? Are we going to replace lab work? I mean, I could make a case, I guess, based on what you just said, that's kind of the opposite. Because at least for a while, in order to get sufficient training data where we know the properties of the things, like there's a chicken or egg problem, and actually maybe you have to do a lot more lab work up front to get the data to train the model that then replaces the lab work.

Ekin Dogus Cubuk: Yeah. I mean, that's a great question. I can't imagine a future where we completely eliminate lab work. First of all, we don't know if quantum mechanical simulations will ever become good enough to correctly predict experiments all the time. But also, going back to the philosophical perspective, there's a certain amount that we know as humans. And maybe we can use computation to predict a bit outside of that circle, that sphere. But the farther we get from the sphere, the less good our approximations will be. So this is that issue where the things we know well, we can predict well. But the things we really want to predict are the things we don't know well.

So from that perspective, experiments and the real universe will always be needed to validate our predictions and train the next model. There have been these efforts. You might have heard about the TRI effort, and I think there are a few other efforts coming up where, exactly as you proposed, they're trying to create a lot of experimental synthesis data so that you can bootstrap and start using machine learning and computation. Because currently, part of the issue is we don't even have a good, large dataset that you can train or validate your synthesis predictions on.

Shayle Kann: This is sort of an aside, and I don't know if you'll have an opinion on this, but one thing I'm curious about. So as I'm sure you're aware, there are lots of startups now emerging who are saying, "We're going to do AI for materials discovery. And we've got some kind of a black box machine. You input what you need out of a new material and we're going to tell you what material you should use," obviously with more in between there. So I live in climate tech world, and so there's a bunch of applications of where a novel material could have a big impact on climate change. I find a lot of them, the startups at least, they start by saying, "We're going to find a metal organic framework for carbon capture." That seems to be a very common example. Weirdly common. And I'm curious why that would be, and whether it tells you something about the types of problems that these models actually can attack early on.

Ekin Dogus Cubuk: This is actually something we thought about a lot. Part of the issue is... Let's say you're trying to discover a battery material. Let's say you discover an amazing electrolyte, solid electrolyte. One issue you might face is that by itself isn't a battery. And then you have to put it in a battery. And then will it work with the cathode, the anode, the interface, with the manufacturing line? So I'm wondering if one of the reasons MOFs have become quite popular for these startups is it might be like a standalone material as a product. I guess you could take the MOF, put it in a room, and it will capture some amount of carbon. And you don't have to worry as much about the other parts. But I think that's a good question because you don't currently see MOFs as being very commercially impactful. So maybe they're also betting on the fact that in the future it might be.

Shayle Kann: Yeah, MOFs are one of those categories, like graphene, where you're like... For a long time it's been the holy grail of lots of different things, and you can imagine a million different applications. And people have tried. Maybe now the time is nigh for MOFs to really take off. But I think that other point is actually a really interesting one. In battery world, nothing exists in a vacuum, so you can't create a novel material and then be done with it. You have to figure out not only the material, but then its interaction with all the other materials, which are also in flux. And that's part of what makes batteries so difficult. So maybe that tells you that the types of problems that these models can attack early on are the ones that are self-contained. If you solve that problem with that material, that's all you really need, and you don't need to deal with all this other interaction and stuff like that.

Ekin Dogus Cubuk: Yeah. That's exactly right. And one other potential factor is, and I see this often at Google, when somebody outside of materials science gets interested in trying to contribute to materials science, a common reason is that they worry about the climate and they want to help. So it's less often that I see a non-materials scientist come to me and say, "How can I improve the ionic conductivity in the SEI?" More often I hear them say, "How can I help with the carbon capture problem?" And maybe that's a reason the startups gain more interest: more people are excited to work there because they're potentially going to contribute to carbon capture.

Shayle Kann: Are there any particular areas that you're most excited about, like domains or materials requirements? What do you think, where might we see... As you said, we haven't really yet proven a whole lot about the ability of AI in these new methods to discover new materials. Where might we? Where are you most optimistic?

Ekin Dogus Cubuk: Okay. So different applications, I feel like, have different issues. Maybe I can cover a few and say why some of them might be more promising. If you think about something like optical or electronic properties, I think one of the limitations is that DFT itself isn't as good at predicting electronic properties as it is at predicting structural properties. DFT tends to be better at predicting the formation energy, the stability of a material, but not necessarily the band gap. And the band gap is crucial for understanding the optical properties, like how light and electrons will interact. So that's one reason, for example, that if people are using the current state of DFT, they might be less successful at discovering optical applications than something else.

Shayle Kann: Can you define DFT for folks who are not familiar?

Ekin Dogus Cubuk: Oh, yeah. DFT stands for density functional theory, and it's been really, really impactful in materials science. Basically what happened is people were trying to figure out how to simulate the quantum mechanical aspects of a material. Because what's really interesting in materials science, at least for me, is that the properties really depend on how atoms interact, and atoms interact at the quantum mechanical scale. There have been many methods proposed over the century, but it seems like DFT has really taken off as the simulation tool that's efficient enough and fast enough, but also sometimes accurate enough. If you look at citations now, I think Walter Kohn, who got the Nobel Prize for it, has an incredible number of citations, because everyone uses DFT these days to try to simulate materials.

Shayle Kann: Okay. I interrupted you, but so DFT is a technique essentially. And so you're saying there's certain things that DFT is better at than others. What is that going to lead us to in terms of where we might use DFT to make a globally significant discovery?

Ekin Dogus Cubuk: That's right. So if you think about batteries, there are many aspects of batteries that seem like a better fit for DFT. For example, you'd like your battery materials to be stable. And you'd like, for example, the electrolyte to conduct ions; if it's a lithium battery, lithium should go through it fast. For predictions of this type, DFT seems a bit better. It's probably not a surprise that the one example I gave you earlier was a battery material in Duracell batteries. And a lot of DFT practitioners study batteries at some point. For my PhD, for example, I studied silicon as an anode material. Now, if you think about catalysis, there's a lot of excitement around it because it's a very important application. But one issue is that in heterogeneous catalysis, the surface is very messy, and it's dynamic over the catalyst's use.

So if you don't really know what's happening at the surface, you might not be able to predict what's going to happen as a function of the structure. So that's one challenge with catalysis. Superconductivity is very exciting, of course, both from a scientific perspective and from a climate and technological perspective. But superconductivity often involves very complex quantum mechanical interactions, so it's yet to be seen whether DFT can be useful there. So yeah, every vertical has these different issues, and it's not yet clear for which ones machine learning will actually be helpful.
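When Dogus says DFT is good at predicting stability and formation energy, the standard zero-kelvin criterion (a gloss on the conversation, not something spelled out in the episode) is the formation-energy convex hull: a compound is predicted stable if its formation energy sits on the lower convex envelope of all competing compositions, and anything above the hull is predicted to decompose. A minimal sketch for a hypothetical binary A-B system, with invented numbers:

```python
import numpy as np

def lower_hull(points):
    """Lower convex envelope of (x, y) points, left to right (monotone chain)."""
    pts = sorted(points)
    hull = []
    for p in pts:
        # Pop while the last two hull points and p do not make a left turn
        while len(hull) >= 2 and (
            (hull[-1][0] - hull[-2][0]) * (p[1] - hull[-2][1])
            - (hull[-1][1] - hull[-2][1]) * (p[0] - hull[-2][0])
        ) <= 0:
            hull.pop()
        hull.append(p)
    return hull

def energy_above_hull(x, e_form, hull):
    """Distance of a phase above the hull; 0 means predicted stable at 0 K."""
    hx = [p[0] for p in hull]
    he = [p[1] for p in hull]
    return e_form - np.interp(x, hx, he)

# Hypothetical formation energies per atom (eV) across a binary A-B system;
# the pure-element endpoints sit at 0 by definition of formation energy.
entries = [(0.0, 0.0), (0.25, -0.10), (0.50, -0.40), (0.75, -0.15), (1.0, 0.0)]
hull = lower_hull(entries)                   # [(0.0, 0.0), (0.5, -0.4), (1.0, 0.0)]
print(energy_above_hull(0.25, -0.10, hull))  # ~0.10 eV/atom above hull: unstable
print(energy_above_hull(0.50, -0.40, hull))  # 0.0: on the hull, predicted stable
```

Real pipelines (e.g. the Materials Project) do this in higher-dimensional composition space with DFT energies, but the geometric idea is the same.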

Shayle Kann: Are we going to see a watershed moment in this space, in the same sense that GPT-3 was for LLMs? Is there something like that that could or will happen? Or is it going to be more steady progress, perhaps faster than historically, but more consistent as opposed to step function?

Ekin Dogus Cubuk: Yeah. So a very good, I think, comparison point is AlphaFold. I think when AlphaFold came out, people saw it as a watershed moment. And I think part of what was good for AlphaFold is that there was this competition that people really cared about. There was this problem people really cared about, protein folding. And doing well on that, much better than previous methods, kind of made it clear that it's a very useful tool.

Shayle Kann: And it's objective. You could objectively measure whether you were better at it.

Ekin Dogus Cubuk: That's right. In materials science, I think one of the issues is that the experimental data is actually quite noisy. This is something you might hear often, that simulations and DFT aren't very accurate, and that's true. But maybe one thing people don't notice is that the experimental uncertainty is usually at the same level as the computational errors. And the reason this is really bad is that it means even if you want to improve your simulations, the experimental labels are noisier than where you want to get to. One thing that is clear is that for CASP to become such an impactful dataset, a lot of experimentalists spent a lot of careful effort trying to get consistent and useful data. And I think now, maybe because machine learning really needs this kind of high-precision, large dataset, there are bigger efforts trying to create a CASP-like database for materials, but it's not there yet.

Shayle Kann: Well, I guess to wrap up then, I mean I've asked you to talk a lot about the field. Curious to hear what you're working on and what you're most excited about in terms of your work at DeepMind.

Ekin Dogus Cubuk: Yeah. So last year we published our GNoME paper. And that paper was mainly about seeing whether machine learning can be used to discover materials that are stable at zero kelvin. Ideally we'd like to discover materials that are stable at room conditions, but that's quite far beyond where the field is, especially back then. And what we realized is that even for zero kelvin stability prediction, which is a simpler task, there weren't that many known stable materials from DFT. There were about, at most, 48,000, of which about 28,000 had come from computation and 20,000 from previous experiments. So we saw that machine learning can really speed up this process. Part of the reason is, you've seen with LLMs and with vision models that the more training data you put in, the better results you get. And how much better your results get is actually predictable. It's kind of like a power law.
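The zero-kelvin stability Dogus describes is conventionally judged by whether a phase sits on the convex hull of formation energies: a material is stable if no mixture of other phases has lower energy at the same composition. A minimal sketch for a hypothetical binary A-B system; the compositions and energies below are made up for illustration and are not from GNoME:

```python
# Hypothetical set of known phases in a binary A-B system:
# composition x_B -> formation energy in eV/atom (illustrative values).
# These three points happen to form a convex hull themselves.
phases = {0.0: 0.0, 0.5: -0.8, 1.0: 0.0}

def energy_above_hull(x: float, e: float) -> float:
    """Distance of a candidate phase (x, e) above the hull built from `phases`.

    0.0 means the candidate is on the hull (stable at 0 K);
    a positive value means it is metastable or unstable.
    """
    lo = max(c for c in phases if c <= x)
    hi = min(c for c in phases if c >= x)
    if lo == hi:
        hull_e = phases[lo]
    else:
        t = (x - lo) / (hi - lo)
        hull_e = (1 - t) * phases[lo] + t * phases[hi]
    return e - hull_e

# A candidate at x_B = 0.25 with -0.3 eV/atom sits 0.1 eV/atom above
# the tie-line between the x=0.0 and x=0.5 phases, so it is not stable.
print(round(energy_above_hull(0.25, -0.3), 3))  # 0.1
```

Production tools like pymatgen build the full multi-dimensional hull; this two-endpoint interpolation only works because the listed phases are already convex.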

This goes back to a paper from Baidu Research, from 2016 I think. And it seems to apply to all kinds of deep learning, including quantum mechanics and materials science. So we realized that as we make our model better and better, its predictive ability improves to a point where it can actually discover crystals that are stable at zero kelvin. So that's what we did last year. And as we said in that paper, one of our next goals is to predict not just zero kelvin stability but finite temperature stability. And this is much harder, because at finite temperature there are entropy effects. And we are interested in finding materials that are not just stable but also exciting, like superconductors, battery materials, materials that will really impact technology. That's another thread we're going towards.
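The "power law" Dogus mentions refers to the empirical scaling-law behavior reported by Baidu Research: model error falls predictably as a power of the training-set size. A minimal sketch, with illustrative constants that are assumptions rather than fitted values from any real experiment:

```python
# Empirical power-law scaling of model error with dataset size:
#   error(N) ~ a * N**(-b)
# The constants a and b are illustrative, not from any published fit.
a, b = 2.0, 0.25

def predicted_error(n_examples: int) -> float:
    """Predicted error for a model trained on n_examples data points."""
    return a * n_examples ** (-b)

# Doubling the dataset shrinks the error by a constant factor 2**(-b),
# regardless of N. This is what makes progress "predictable": you can
# extrapolate how much data you need to reach a target accuracy.
ratio = predicted_error(2_000_000) / predicted_error(1_000_000)
print(round(ratio, 4))  # 0.8409
```

In practice a and b are fitted on log-log plots of measured error versus dataset size, and the fitted line is extrapolated forward.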

And finally, one other thing that we really care about is taking DFT and making it more predictive. So DFT has been around for decades, but it's mainly been a theoretically developed tool. The equations that describe it are as simple as a really good theoretical physicist can write down by hand. But these models can actually be a lot more complicated, because we now have data and we have machine learning. So we'd also like to improve DFT, which is something else we're working on.

Shayle Kann: Right, Dogus. This was a lot of fun. I'm still lost in half of this stuff. But I feel like I have a better understanding of the overall state of affairs, which is really what I was hoping to get out of this. So really appreciate it. Thanks for the time.

Ekin Dogus Cubuk: Awesome. Yeah, this was super fun. Thank you.

Shayle Kann: Dogus Cubuk is a research scientist at Google DeepMind focused on materials discovery. This show is a production of Latitude Media. You can head over to latitudemedia.com for links to today's topics. Latitude is supported by Prelude Ventures. Prelude backs visionaries accelerating climate innovation that will reshape the global economy for the betterment of people and planet. Learn more at preludeventures.com. This episode was produced by Daniel Waldorf, mixing by Roy Campanella and Sean Marquand, theme song by Sean Marquand. I'm Shayle Kann, and this is Catalyst.
