Bioautomated and Machine Learning in Biology: Unlocking Life’s Secrets

Machine Learning Breakthrough in Biology: Bioautomated Reduces Time and Cost for Researchers

Machine learning is a powerful tool for analyzing data and making predictions, and it is useful across many fields of science. However, not all researchers have the experience and resources needed to build and tune machine learning models for their own tasks. How can this process be made more accessible and efficient?

Professor Jim Collins and his colleagues at the Massachusetts Institute of Technology have developed a solution to this problem: Bioautomated, an automated machine learning system designed specifically for biological data. In an article published in the journal Cell Systems, they describe its capabilities and potential.

Bioautomated can select and build a suitable machine learning model for a given data set on its own, and it also handles pre-processing and formatting of the data. This significantly reduces the time spent on tasks that are often slow and labor-intensive. The system supports several task types, including binary classification, multi-class classification, and regression, and it works with different types of biological data, such as DNA, RNA, proteins, and glycans.
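To make the pre-processing step more concrete, here is a minimal sketch of one common way to turn sequences into numeric input for a model: one-hot encoding with padding to a fixed length. This illustrates the general technique only. The names `ALPHABETS` and `one_hot_encode` are hypothetical and are not taken from Bioautomated, whose actual encoding may differ.

```python
# Illustrative sketch only: ALPHABETS and one_hot_encode are hypothetical
# names, not part of the Bioautomated code base.
ALPHABETS = {
    "dna": "ACGT",
    "rna": "ACGU",
}


def one_hot_encode(sequence, alphabet, length):
    """Encode a sequence as `length` rows with one column per alphabet symbol.

    Sequences longer than `length` are truncated; shorter ones are padded
    with all-zero rows. Symbols outside the alphabet (for example, N in DNA)
    are encoded as all-zero rows.
    """
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    rows = []
    for symbol in sequence.upper()[:length]:
        row = [0] * len(alphabet)
        if symbol in index:
            row[index[symbol]] = 1
        rows.append(row)
    # Pad so that every encoded sequence has the same shape.
    missing = length - len(rows)
    for _ in range(missing):
        rows.append([0] * len(alphabet))
    return rows


encoded = one_hot_encode("ACgt", ALPHABETS["dna"], 5)
for row in encoded:
    print(row)
```

Every sequence comes out with the same shape, which is what most classification and regression models expect as input. Protein or glycan data could be handled the same way by supplying a different alphabet string.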

“The fundamental language of biology is based on sequences,” explains Luis Soenksen, a postdoctoral researcher at MIT’s Abdul Latif Jameel Clinic for Machine Learning in Health and co-first author of the article. “Biological sequences, such as DNA, RNA, proteins, and glycans, have an amazing ability to provide standardized information. Many automated machine learning tools are designed for text data, so it made sense to extend them to biological sequences.”

The time savings are substantial. Bioautomated can reduce a multi-month process to just a few hours, which makes it convenient for researchers who want to incorporate machine learning into their projects. “Our tool explores models that are best suited for small and sparse datasets of biological data, as well as for more complex neural networks,” says Jacqueline Valeri, a doctoral student in biological engineering in the Collins laboratory and co-first author of the article.

Bioautomated has already been tested on real tasks in biology and has shown promising results. For instance, the system successfully predicted the function of unknown proteins based on their sequence, identified the role of glycans in the human immune system, and even pinpointed potential drug targets for
