Could Machine Learning Assist in Predicting Protein Folding?

Protein folding, the process by which a chain of amino acids takes on the “folded” 3D structure that makes it biologically functional, has intrigued scientists for decades. Variations in folding can change a protein’s function entirely, or even deactivate it.

Because of this variation, it has historically been difficult for researchers to predict how a protein will fold in a new environment. In the pharmaceutical industry, for example, drug developers have struggled to determine whether a new string of amino acids, the building blocks of a protein, is even viable for use. To overcome such issues, researchers at the University of Washington are now using machine learning to examine new protein models more quickly and more accurately than ever before.

Background: What is Machine Learning?

Machine learning is a branch of artificial intelligence that combines data and algorithms to mimic the way humans learn and recognize patterns. Like a human, a machine learning model must be trained before it can do its job. To do this, researchers give the model “training data” that resembles the data it will encounter in the task at hand.

The machine uses the training data and algorithms to learn what it is looking for; once it is trained, researchers can feed it real data and get usable results. In drug or vaccine development, applying this process to protein folding can significantly improve both cost and efficiency.
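To make the train-then-predict idea concrete, here is a minimal sketch in Python. It is not the researchers' method: it uses a very simple technique (a nearest-centroid classifier), and the two numeric features and the "folds" / "misfolds" labels are made up purely for illustration.

```python
from collections import defaultdict
from math import dist


def train(examples):
    """Learn one centroid (average feature vector) per label."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Return the label whose centroid lies closest to the new features."""
    return min(model, key=lambda label: dist(model[label], features))


# Made-up training data: (feature_a, feature_b) paired with a known label.
TRAINING_DATA = [
    ((0.9, 0.1), "folds"),
    ((0.8, 0.2), "folds"),
    ((0.2, 0.9), "misfolds"),
    ((0.1, 0.8), "misfolds"),
]

model = train(TRAINING_DATA)      # training step: learn from labeled examples
print(predict(model, (0.7, 0.3))) # prediction step: prints "folds"
```

Real protein models learn from vastly more data and far richer features, but the two stages are the same: first learn from labeled examples, then apply what was learned to new input.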

Analysis: Protein Folding Predictions Via Machine Learning

To understand the benefits of machine learning in protein folding, researchers from the University of Washington tested multiple algorithms for developing new protein shapes. This helped the researchers understand how the algorithms worked and determine specifically which shapes were viable. The team also wrote its own algorithm, called ProteinMPNN, for generating amino acid sequences; it was found to be 200 times faster than the current best software.

In another experiment, the researchers used a machine learning tool called AlphaFold to predict whether the amino acid sequences created by the ProteinMPNN algorithm would fold up in a specific way. According to project scientist Basile Wicky, “We found that proteins made using ProteinMPNN were much more likely to fold up as intended, and we could create very complex protein assemblages using these methods.”
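The workflow described here, in which one tool proposes sequences and another checks whether they are likely to fold as intended, can be sketched as a generate-and-filter loop. The Python below is only an illustration of that pattern. It does not call ProteinMPNN or AlphaFold: `propose_sequences` and `toy_fold_score` are hypothetical stand-ins, and the scoring rule is a placeholder rather than a real measure of foldability.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def propose_sequences(count, length, seed=0):
    """Hypothetical stand-in for a sequence designer: random sequences."""
    rng = random.Random(seed)
    return [
        "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        for _ in range(count)
    ]


def toy_fold_score(sequence):
    """Hypothetical stand-in for a structure predictor's confidence.

    Returns the fraction of hydrophobic residues. This is NOT a real
    measure of foldability, only a placeholder number between 0 and 1.
    """
    hydrophobic = set("AILMFVWY")
    return sum(residue in hydrophobic for residue in sequence) / len(sequence)


def design_and_filter(count, length, threshold):
    """Generate candidates, score each one, keep those at or above threshold."""
    candidates = propose_sequences(count, length)
    return [seq for seq in candidates if toy_fold_score(seq) >= threshold]


kept = design_and_filter(count=50, length=30, threshold=0.5)
print(f"{len(kept)} of 50 candidates passed the toy filter")
```

The point of the pattern is that a fast generator makes it cheap to propose many candidates, while the checking step weeds out those unlikely to behave as intended before anyone has to make them in a lab.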

Outlook: Better and Safer Drugs

“With these new software tools, researchers should be able to find solutions to long-standing challenges in medicine, energy, and technology,” explained David Baker, senior author and professor at the University of Washington. The software could also influence drug design, helping researchers develop safer and more potent drugs.

Current drug development involves a great deal of trial and error, which machine learning software could help reduce. The software may also prove cost-effective for drug companies, since it helps reduce the need for other materials.

As Baker adds, “This is the very beginning of machine learning in protein design. In the coming months, we will be working to improve these tools to create even more dynamic and functional proteins.”

The researchers have published their results in the journal Science.

Kenna Hughes-Castleberry is a staff writer at the Debrief and the Science Communicator at JILA (a partnership between the University of Colorado Boulder and NIST). Her writing beats include deep tech, the metaverse, and quantum technology. You can find more of her work at her website: https://kennacastleberry.com/