Recent research efforts like the Materials Genome Initiative and the Materials Project have produced a wealth of computational tools for designing new materials that are useful for a range of applications, from energy and electronics to aeronautics and civil engineering.
But developing processes for producing these materials has continued to depend on a combination of experience, intuition and manual literature reviews.
A team of researchers at MIT, the University of Massachusetts at Amherst and the University of California at Berkeley hopes to close the materials-science automation gap with a new artificial intelligence system that would pore through research papers to deduce “recipes” for producing particular materials.
"Computational materials scientists have made a lot of progress in the 'what' to make -- what material to design based on desired properties," said Elsa Olivetti, the Atlantic Richfield Assistant Professor of Energy Studies in MIT's Department of Materials Science and Engineering (DMSE). "But because of that success, the bottleneck has shifted to, 'Okay, now how do I make it?'"
The researchers envision a database that contains materials recipes extracted from millions of papers. Scientists and engineers could enter the name of a target material and any other criteria (precursor materials, reaction conditions, fabrication processes, etc.) and pull up suggested recipes.
As a step toward realizing this, Olivetti and her colleagues have developed a machine-learning system that can analyze a research paper, deduce which of its paragraphs contain materials recipes, and then classify the words in those paragraphs according to their roles within the recipes. These roles include names of target materials, numeric quantities, names of pieces of equipment, operating conditions, descriptive adjectives and more.
The researchers also demonstrate that a machine-learning system can analyze the extracted data to infer general characteristics of classes of materials, such as the different temperature ranges that their synthesis requires, or particular characteristics of individual materials, such as the physical forms they will take when their fabrication conditions vary.
The researchers trained their system using a combination of supervised and unsupervised machine-learning techniques. “Supervised” means that the training data fed to the system is first annotated by humans; the system tries to find correlations between the raw data and the annotations. “Unsupervised” means that the training data is unannotated, and the system instead learns to cluster data together according to structural similarities.
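The difference between the two approaches can be seen in a toy example. The sketch below is only an illustration and not the researchers' system: the one-dimensional feature values, the "low" and "high" labels, and the function names are all assumptions made up for this example. The supervised half learns from human-provided labels, while the unsupervised half groups unlabeled values purely by similarity.

```python
from statistics import mean

# Hypothetical 1-D feature values; the numbers and labels are made up.
labeled = [(1.0, "low"), (1.5, "low"), (8.0, "high"), (9.0, "high")]
unlabeled = [1.2, 0.8, 8.5, 9.5]


def train_supervised(examples):
    """Learn one centroid per human-provided label."""
    by_label = {}
    for value, label in examples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def predict(centroids, value):
    """Return the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def cluster_unsupervised(values, iterations=10):
    """Two-cluster k-means in 1-D: no labels, only grouping by similarity."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[idx].append(v)
        # Keep the old center if a group happens to be empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return groups


centroids = train_supervised(labeled)
supervised_prediction = predict(centroids, 7.0)
clusters = cluster_unsupervised(unlabeled)
```

The supervised model can name its answer because humans supplied the names; the clustering step only discovers that two groups exist and leaves their meaning to be assigned afterward.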
Because materials-recipe extraction is a new research area, the researchers didn’t have the luxury of large, annotated datasets accumulated over years by diverse teams of researchers. Instead, they had to annotate their data themselves, ultimately labeling about 100 papers.
By machine-learning standards, this is a small dataset. To improve it, they used an algorithm that was developed at Google called Word2vec. Word2vec looks at the contexts in which words occur—the words’ syntactic roles within sentences and the other words around them—and groups together words that tend to have similar contexts.
With Word2vec, the researchers were able to greatly expand their training set, because the machine-learning system could infer that a label attached to any given word was likely to apply to other words clustered with it. Instead of 100 papers, the researchers could train their system on around 640,000 papers.
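The idea of spreading a label from an annotated word to similar, unannotated words can be sketched roughly as follows. This is a simplified stand-in, not the researchers' method: the two-dimensional vectors are hand-written placeholders rather than real Word2vec output (which is learned from text and typically has hundreds of dimensions), and the words, labels, similarity threshold and the `propagate` helper are all hypothetical.

```python
import math

# Toy 2-D "embeddings", hand-written for illustration only.
vectors = {
    "furnace": (0.9, 0.1),
    "oven": (0.85, 0.2),
    "autoclave": (0.8, 0.15),
    "heated": (0.1, 0.9),
    "stirred": (0.2, 0.95),
    "annealed": (0.15, 0.8),
}

# The small set of words that humans annotated (illustrative labels).
seed_labels = {"furnace": "equipment", "heated": "operation"}


def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def propagate(vectors, seed_labels, threshold=0.9):
    """Give each unlabeled word the label of its most similar seed word,
    provided the similarity clears the (assumed) threshold."""
    labels = dict(seed_labels)
    for word, vec in vectors.items():
        if word in seed_labels:
            continue
        best = max(seed_labels, key=lambda s: cosine(vec, vectors[s]))
        if cosine(vec, vectors[best]) >= threshold:
            labels[word] = seed_labels[best]
    return labels


expanded = propagate(vectors, seed_labels)
```

Starting from two annotated words, every word in the toy vocabulary ends up with a label, which mirrors in miniature how a small hand-labeled set can be stretched over a much larger body of text.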
To test the accuracy of the system, they had to rely on the labeled data because they had no criterion for evaluating its performance on the unlabeled data. In those tests, the system was able to identify with 99 percent accuracy the paragraphs that contained recipes and to label with 86 percent accuracy the words within those paragraphs.
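Accuracy in this sense is simply the share of predictions that match the human annotations. A minimal sketch of that calculation, using made-up labels rather than the researchers' data (the role names and the `accuracy` helper are assumptions for illustration):

```python
def accuracy(predicted, gold):
    """Fraction of predicted labels that match the human (gold) labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Illustrative word-role labels for five words; not real evaluation data.
gold = ["material", "quantity", "equipment", "condition", "other"]
predicted = ["material", "quantity", "equipment", "other", "other"]

score = accuracy(predicted, gold)  # 4 of 5 labels match
```

Because the calculation needs a human label for every prediction, it can only be run on the annotated papers, which is why the unlabeled data could not be used for testing.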
The researchers want to improve the system’s accuracy further. In their ongoing work, they are exploring a battery of deep learning techniques that can make further generalizations about the structure of materials recipes, with the goal of automatically devising recipes for materials not considered in the existing literature.
Much of Olivetti’s prior research has concentrated on finding more cost-effective and environmentally responsible ways to produce materials. She hopes that a database of recipes could help advance that work.
A paper on this research was published in Chemistry of Materials.