In October, the Royal Swedish Academy of Sciences awarded the Nobel Prize in Chemistry to biochemist David Baker “for computational protein design” and to researcher Demis Hassabis and chemist John Jumper “for protein structure prediction.”
Their groundbreaking work has opened a new avenue in the study of proteins, a path that many laboratories are currently exploring.
New proteins. Among these labs is the U.S. startup EvolutionaryScale. The company recently used an AI model to create an artificial fluorescent protein, esmGFP. According to a study published in Science, generating this protein amounts to simulating 500 million years of natural evolution.
GFP. The newly developed protein is part of a family of proteins known as green fluorescent proteins (GFP). These proteins are found in nature, particularly in some jellyfish. The discovery of GFP also earned three researchers a Nobel Prize in Chemistry in 2008.
Although esmGFP is related to the GFP family, it differs in structure and shape from its natural counterparts, though it retains portions that resemble natural GFPs.
Simulated evolution. Scientists have no evidence that the simulated protein exists in nature. However, its existence and functionality allow them to imagine an alternative reality where evolution could have taken different paths.
According to the team’s estimates, the differences between this simulated protein and natural proteins are comparable to 500 million years of natural evolution.
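The article does not say how the team measured these differences. One common way to quantify how far apart two proteins are is sequence identity: the share of aligned positions where both proteins carry the same amino acid. The sketch below is only an illustration under that assumption; the function name `percent_identity` and the two short sequences are made up for the example and are not taken from the study.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return the percentage of aligned positions with identical residues.

    Both inputs are assumed to be already aligned (same length, with '-'
    marking gaps). Positions where both sequences have a gap are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # nothing to compare at this position
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical toy fragments, not real GFP or esmGFP sequences.
natural_fragment = "MSKGEELFTG"
designed_fragment = "MSKGAELFSG"
print(f"{percent_identity(natural_fragment, designed_fragment):.1f}% identical")
```

The lower the identity between a designed protein and its closest natural relative, the more mutations separate them, which is the kind of gap that can then be compared with the pace of natural evolution.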
ESM3. The research team developed a generative language model called ESM3 (Evolutionary Scale Model 3). Although it is a language model, it generates proteins rather than text.
ESM3 works with a protein's sequence, three-dimensional structure, and function together, which lets researchers explore a vast number of combinations. This represents a significant improvement over models that consider only the sequence of amino acids in a protein without accounting for the shape created by the molecule’s folds. Both sequence and shape are crucial, given that they influence protein function.
To train the model, researchers used 771 billion tokens derived from 3.15 billion protein sequences, 236 million protein structures, and 539 million proteins annotated with their functions.
Future applications. Exploring proteins that evolution could have produced but never did lets scientists speculate about alternative paths that evolutionary processes might have taken. This exploration not only encourages imaginative thinking about “what might have been” but can also lead to practical applications.
One significant practical application of these hypothetical proteins lies in medicine. Discovering new proteins that have functions similar to those produced naturally by our bodies could be beneficial in combating certain disorders.
Image | EvolutionaryScale