If humans could predict a protein's structure solely from DNA information, it would be a medical superpower against disease, and artificial intelligence is our best hope thus far of obtaining it. Such a feat is now one step closer with the creation of “AlphaFold”, a neural network that Google’s AI company DeepMind designed to do exactly that. In the Critical Assessment of Structure Prediction (CASP), a biennial protein structure prediction contest, AlphaFold was declared the winner out of 98 competitors after producing the most accurate predictions for 25 of the 43 proteins, working from their genetic sequences alone. The runner-up managed only three.
In a nutshell (or smaller, really), proteins are key players in every living thing’s physiological processes. The instructions for building them are encoded in DNA, and they are responsible for contracting muscles, metabolizing food into energy, fighting disease, and transmitting signals, among a great many other things. What a protein does in the body depends on its unique 3D shape. For example, antibodies have “hooks” that attach to and tag viruses and bacteria, and collagen proteins are cord-shaped, which lets them transmit tension between tissues such as cartilage, ligaments, bones, and skin.
That is why the ability to predict protein shapes could help scientists learn how specific defects affect the body, repair faulty proteins with targeted therapies, and even design new proteins. To illustrate how much depends on getting the shape right, misfolded proteins are linked to many health problems, such as type 2 diabetes and Parkinson’s disease.
Some medical progress has been made on protein folding problems, for instance drug therapies that bind to proteins and alter their function. However, the human body can produce around 2 million different types of proteins, and so far only about 100,000 of them have been identified. On top of that, a typical protein could in principle fold into an estimated googol cubed, or 10 to the power of 300, possible 3D configurations. Clearly, this is not really a job for a human. As DeepMind’s website explains, “[According to] Levinthal’s paradox, it would take longer than the age of the universe to enumerate all the possible configurations of a typical protein before reaching the right 3D structure.”
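To get a feel for the scale behind Levinthal’s paradox, here is a rough back-of-envelope calculation in Python. It is only an illustration under stated assumptions: the search speed of one trillion configurations per second is a hypothetical figure chosen for the example, and the age of the universe is taken as roughly 13.8 billion years.

```python
import math

# "A googol cubed": a googol is 10**100, so cubing it gives 10**300.
GOOGOL = 10 ** 100
configurations = GOOGOL ** 3

# Hypothetical assumption for illustration only: a search that checks
# one trillion (10**12) configurations every second.
CHECKS_PER_SECOND = 10 ** 12

# Age of the universe, roughly 13.8 billion years, converted to seconds.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_S = 13.8e9 * SECONDS_PER_YEAR

# Exact integer arithmetic: 10**300 // 10**12 == 10**288 seconds.
seconds_needed = configurations // CHECKS_PER_SECOND

# How many lifetimes of the universe the exhaustive search would take.
ratio = seconds_needed / AGE_OF_UNIVERSE_S

print(f"configurations: 10^{round(math.log10(configurations))}")
# prints "configurations: 10^300"
print(f"universe lifetimes needed: about 10^{math.floor(math.log10(ratio))}")
# prints "universe lifetimes needed: about 10^270"
```

Even if the assumed search were a million times faster, it would still need about 10^264 lifetimes of the universe, which is why brute-force enumeration is hopeless.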
DeepMind is no stranger to achieving incredible things with its AI software. In 2015, a program the company built, known as the deep Q-network (DQN) agent, learned to play 49 classic Atari video games, making it the first computer program capable of independently learning a large variety of tasks. The company’s AlphaGo program went on to defeat top human players at the ancient Chinese board game Go, and its AlphaZero program beat Stockfish, one of the strongest chess programs. AlphaGo was later reworked into “AlphaGo Zero”, which learned Go without any prior human knowledge: it taught itself by playing against itself and went on to beat earlier versions of AlphaGo.
AlphaFold was trained on thousands of proteins with known structures until it could accurately predict their 3D shapes. It marked a significant gain in accuracy over existing prediction methods, and computational prediction also promises to be far more cost-effective than laboratory techniques. Experimental methods for determining protein structures, such as cryo-electron microscopy and nuclear magnetic resonance, depend on a lot of trial and error, which can take years of work and cost tens of thousands of dollars per structure. Considering the complexity of the field, AlphaFold’s achievement in the CASP contest is, to say the least, a sign of the expanding possibilities for scientific research and discovery using artificial intelligence.