The team of Professor Frank Glorius at the Institute of Organic Chemistry, University of Münster, has developed an evolutionary algorithm that identifies the structures within a molecule that matter most for a given question. These structures are then used to encode the molecule's properties for various machine learning models. The work is published in the journal Chem.
The algorithm, based on evolutionary principles such as reproduction, mutation, and selection, creates tailored “molecular fingerprints”. These fingerprints have been used to accurately predict chemical reactions and quantum-chemical properties. The researchers emphasize that molecules must be converted into a computer-readable format before machine learning can be applied. Various methods have been developed for this conversion, but determining which one is most suitable for a specific question, such as assessing the toxicity of a chemical compound, remains a challenge.
The new algorithm helps identify the optimal molecular fingerprint for each case: starting from a pool of randomly generated fingerprints, it selects the best-performing ones. Graduate student Felix Katzenburg explains, “Following nature’s example, we use mutations, that is, random changes in fingerprint components, or recombine components of two fingerprints.”
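The loop of random generation, mutation, recombination, and selection described here can be illustrated with a minimal sketch. It is not the Münster team's implementation: the toy molecules, the fragment labels, the lookup-table fitness score, and every function name below are assumptions chosen only to keep the example self-contained.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy data: each molecule is a set of fragment labels plus a
# binary property label (for example 1 = toxic). None of this is real data.
MOLECULES = [
    ({"C=O", "OH", "ring"}, 1),
    ({"C=O", "NH2"}, 1),
    ({"OH", "ring"}, 0),
    ({"NH2", "ring", "Cl"}, 0),
    ({"C=O", "Cl"}, 1),
    ({"OH", "Cl"}, 0),
]
VOCABULARY = sorted({frag for frags, _ in MOLECULES for frag in frags})
PRINT_SIZE = 2  # number of fragments in one fingerprint (an assumption)


def encode(fragments, fingerprint):
    """Turn a molecule into a bit vector: one bit per fingerprint component."""
    return tuple(int(frag in fragments) for frag in fingerprint)


def fitness(fingerprint):
    """Accuracy of a lookup-table model that predicts, for every distinct
    bit vector, the majority label of the molecules sharing that vector."""
    groups = defaultdict(Counter)
    for fragments, label in MOLECULES:
        groups[encode(fragments, fingerprint)][label] += 1
    correct = sum(max(counts.values()) for counts in groups.values())
    return correct / len(MOLECULES)


def random_fingerprint(rng):
    return tuple(sorted(rng.sample(VOCABULARY, PRINT_SIZE)))


def mutate(fingerprint, rng):
    """Mutation: swap one component for a fragment not yet in the fingerprint."""
    unused = [frag for frag in VOCABULARY if frag not in fingerprint]
    child = list(fingerprint)
    child[rng.randrange(len(child))] = rng.choice(unused)
    return tuple(sorted(child))


def recombine(parent_a, parent_b, rng):
    """Recombination: draw the child's components from both parents."""
    pool = sorted(set(parent_a) | set(parent_b))
    return tuple(sorted(rng.sample(pool, PRINT_SIZE)))


def evolve(generations=20, population_size=8, seed=0):
    rng = random.Random(seed)
    population = [random_fingerprint(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the better-scoring half of the population.
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        children = []
        while len(survivors) + len(children) < population_size:
            if rng.random() < 0.5:
                children.append(mutate(rng.choice(survivors), rng))
            else:
                parent_a, parent_b = rng.sample(survivors, 2)
                children.append(recombine(parent_a, parent_b, rng))
        population = survivors + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print(best, fitness(best))
```

In this toy data the label depends only on whether the `C=O` fragment is present, so any fingerprint containing it scores 1.0. A realistic setup would extract substructures with a cheminformatics toolkit and score each fingerprint by the validation performance of a trained model.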
This method allows researchers to understand the reasoning behind the model’s predictions, because it reveals which parts of a molecule influence them. With this insight, researchers can modify those structures deliberately. While the Münster team acknowledges that its method may not always yield the best results, it highlights the method’s adaptability to any molecular dataset without requiring specialized knowledge of chemical bonds.
“When significant human experience or extensive data are available, other methods like neural networks may be more effective,” notes Katzenburg. However, the study’s primary goal was to develop a universal method for encoding molecules that can be applied to diverse molecular datasets.