DRNets can solve Sudoku and accelerate scientific discovery
Suppose you are driving with a friend in a familiar neighborhood and the friend asks you to turn at the next intersection. The friend doesn’t say which way to turn, but since you both know it’s a one-way street, the direction is understood.
This kind of reasoning is at the heart of a new artificial intelligence framework – successfully tested on overlapping Sudoku puzzles – that could accelerate discovery in materials science, renewable energy technology and other areas.
An interdisciplinary research team led by Carla Gomes, the Ronald C. and Antonia V. Nielsen Professor of Computing and Information Science in the Cornell Ann S. Bowers College of Computing and Information Science, has developed Deep Reasoning Networks (DRNets), which combine deep learning – even with a relatively small amount of data – with an understanding of the subject’s boundaries and rules, known as “constraint reasoning.”
Di Chen, a computer science doctoral student in Gomes’ group, is first author of “Automating Crystal-Structure Phase Mapping by Combining Deep Learning with Constraint Reasoning,” published Sept. 16 in Nature Machine Intelligence.
Gomes and John Gregoire, Ph.D. ’09, a research professor at the California Institute of Technology, are senior authors. Gregoire is a former postdoctoral researcher in the lab of co-author R. Bruce van Dover, the Walter S. Carpenter, Jr., Professor of Engineering.
DRNets, introduced at the 37th International Conference on Machine Learning, held virtually in July 2020, takes machine learning a step further by adding constraint reasoning – the ability to consider rules and prior scientific knowledge – in order to solve problems with very little data as input.
You can teach a machine to recognize a dog by showing it 1,000 photos of dogs, Gomes said, but scientific discovery isn’t like that.
“You’re not going to have lots and lots of labeled data,” she said. “And usually the examples you have don’t exactly match what you’re looking for, but then you reason about what you know scientifically about the field and you can derive new knowledge from that.”
Gomes’ group, which has been using AI and machine learning techniques to accelerate materials discovery for more than a decade, tested the DRNets framework by demixing overlapping handwritten Sudoku puzzles – grids with two digits or letters in each cell. The computer had to separate each mixture into two solved Sudoku puzzles, without any training data, which it was able to do with close to 100% accuracy.
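To make the demixing task concrete, here is a minimal sketch of the constraint-reasoning half of the problem on a toy 4x4 Sudoku. It is not the DRNets method: it assumes the two symbols in each cell have already been read correctly (DRNets works from handwritten images), and it uses plain backtracking search. The function name `demix` and the example grids are hypothetical, chosen only for illustration.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]
BOX = 2            # toy puzzle: 2x2 boxes, so a 4x4 grid (assumption)
N = BOX * BOX


def demix(pairs: List[List[Tuple[int, int]]]) -> Optional[Tuple[Grid, Grid]]:
    """Split a grid of digit pairs into two valid Sudokus, or return None."""
    first = [[0] * N for _ in range(N)]
    second = [[0] * N for _ in range(N)]

    def fits(grid: Grid, r: int, c: int, v: int) -> bool:
        # v may not already appear in the row, column or box of cell (r, c).
        if v in grid[r]:
            return False
        if any(grid[i][c] == v for i in range(N)):
            return False
        br, bc = r - r % BOX, c - c % BOX
        return all(grid[i][j] != v
                   for i in range(br, br + BOX)
                   for j in range(bc, bc + BOX))

    def solve(k: int) -> bool:
        if k == N * N:
            return True
        r, c = divmod(k, N)
        a, b = pairs[r][c]
        # Try both ways of assigning the two digits to the two puzzles.
        for x, y in ((a, b), (b, a)):
            if fits(first, r, c, x) and fits(second, r, c, y):
                first[r][c], second[r][c] = x, y
                if solve(k + 1):
                    return True
                first[r][c], second[r][c] = 0, 0   # undo and backtrack
        return False

    return (first, second) if solve(0) else None


# Two solved 4x4 Sudokus (illustrative data), overlaid cell by cell.
puzzle_a = [[1, 2, 3, 4], [3, 4, 1, 2], [2, 1, 4, 3], [4, 3, 2, 1]]
puzzle_b = [[2, 1, 4, 3], [4, 3, 2, 1], [1, 2, 3, 4], [3, 4, 1, 2]]
mixed = [list(zip(ra, rb)) for ra, rb in zip(puzzle_a, puzzle_b)]

result = demix(mixed)
```

The search never sees an example of a correct separation. The Sudoku rules alone rule out the wrong assignments, which is the role constraint reasoning plays in the article's description. A mixture can admit more than one valid separation, so the sketch returns the first one it finds.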
The researchers then put DRNets to work on a real-world problem: automating crystal-structure phase mapping of solar fuel materials, using X-ray diffraction (XRD) patterns. Crystal-structure phase mapping involves separating the source XRD signals of the desired crystal structures from “noisy” mixtures of XRD patterns, a task for which labeled training data is generally not available.
Using embedded thermodynamic rules, a small amount of unlabeled data (307 XRD patterns in total) and minimal information about the elements of the chemical system – in this case, bismuth, copper and vanadium oxide (Bi-Cu-V) – DRNets was able to identify and separate a total of 13 crystal phases (single-phase materials) in 19 unique mixtures of single-phase materials.
The DRNets results, verified by manual analysis, enable the discovery of complex mixtures of crystalline materials that convert solar energy into storable chemical fuels.
“The 13 phases and their mixtures comprise scientific knowledge derived from the thousands of features in the measured XRD patterns,” Gregoire said, noting that human experts and prior algorithms “were unable to extract this knowledge from the XRD patterns due to the high level of complexity. Humans can reason about physical rules and computers can process complex data, but scientific discovery requires the integration of these approaches.”
Gomes said: “Verifying that a solution for a chemical system satisfies the rules of physics is easier than producing it, in the same way that it is easier to verify that a completed Sudoku is correct than to complete it.”
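The asymmetry Gomes describes is easy to see in code. The sketch below is illustrative and not taken from the DRNets work; `is_valid_sudoku` and the sample grids are hypothetical names and data. It checks a completed Sudoku in a single pass over its rows, columns and boxes, whereas filling in a blank grid generally requires search.

```python
from typing import List


def is_valid_sudoku(grid: List[List[int]], box: int = 3) -> bool:
    """Return True if every row, column and box holds each digit exactly once."""
    n = box * box
    if len(grid) != n or any(len(row) != n for row in grid):
        return False
    digits = set(range(1, n + 1))
    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(n)} for c in range(n)]
    boxes = [
        {grid[r][c] for r in range(br, br + box) for c in range(bc, bc + box)}
        for br in range(0, n, box)
        for bc in range(0, n, box)
    ]
    # Each unit has n cells, so matching the full digit set means no repeats.
    return all(unit == digits for unit in rows + cols + boxes)


# A completed 9x9 grid built from a simple shifting pattern (illustrative data).
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]

# Swapping two neighbouring cells keeps the row valid but breaks two columns.
broken = [row[:] for row in solved]
broken[0][0], broken[0][1] = broken[0][1], broken[0][0]

print(is_valid_sudoku(solved))   # True
print(is_valid_sudoku(broken))   # False
```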
The key to DRNets is the idea of an “interpretable latent space”. Basically, it gives DRNets the ability to reason about the constraints of the domain – in this case materials science – from input data.
“This is really the big breakthrough in our methodology: We are doing this without having data on which to train the computer,” Gomes said, noting that in the Sudoku experiments, “the machine has never seen what an overlapping ‘6’ and ‘D’ looks like, but it can solve the problem by reasoning, using prior knowledge of the Sudoku rules.”
“Likewise,” she said, “DRNets reason about the thermodynamic rules and known crystal phases to demix XRD patterns, with no data to train on.”
DRNets builds on the group’s previous work involving citizen science related to species distribution, carried out in collaboration with the eBird program of the Cornell Lab of Ornithology. The need to capture and interpret interactions between species and their local environments was the initial motivation and inspiration for the interpretable latent space in DRNets, said Gomes, a pioneer in the emerging field of computational sustainability.
Other contributors include Bart Selman, professor of computer science at Cornell Bowers CIS, and computer science doctoral students Yiwei Bai, Sebastian Ament and Wenting Zhao.
Funding for this work came from the National Science Foundation; the Air Force Office of Scientific Research’s Multidisciplinary University Research Initiatives Program; the U.S. Army DEVCOM Army Research Laboratory’s Defense University Research Instrumentation Program; the Toyota Research Institute; and the Department of Energy.