A few days ago the news broke: the AlphaFold system from Google’s DeepMind can predict the tertiary structure of proteins with fairly high precision. Despite its enormous implications and the turning point it may mark for biology, and for biomedical research in general, the news seems to have gone largely unnoticed by the general public. In this post, we explain why it is so important and how it can benefit everyone.
Understanding how proteins work
Proteins are amino acid sequences, that is, chains of different molecules (amino acids) linked together in an order that is unique to each protein. Each amino acid has a different chemical composition: there are 20 standard amino acids encoded by the genetic code (22 if selenocysteine and pyrrolysine are counted), such as alanine, glutamine, lysine and methionine.
As we said, proteins combine amino acids in different ways and vary greatly in length, so the number of possible amino acid combinations is enormous. The number of protein molecules is also huge: a single cell can contain tens of millions of them, around 42 million according to this study published in Cell Biology.
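A quick calculation gives a sense of scale. Assuming only the 20 standard amino acids and ignoring any biological constraint, a chain of L residues can be written in 20^L different ways, so the count explodes even for short proteins. This is only an illustrative sketch; the helper name `possible_sequences` is hypothetical.

```python
def possible_sequences(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct chains of `length` residues drawn from `alphabet_size` amino acids."""
    # Each position can independently hold any of the amino acids,
    # so the possibilities multiply: alphabet_size ** length.
    return alphabet_size ** length


if __name__ == "__main__":
    for n in (10, 100, 300):
        count = possible_sequences(n)
        # Python integers have arbitrary precision, so counting digits gives the exact order of magnitude
        print(f"{n} residues: about 10^{len(str(count)) - 1} possible sequences")
```

Even a modest protein of 100 residues already has more than 10^130 possible sequences under this simplified model.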
Although we define proteins as sequences of amino acids, and this is very important, the key to understanding how a protein works is its tertiary structure: its three-dimensional shape once it has gone through the whole process of adopting its final conformation (on the way, it first forms a secondary structure). The process by which an amino acid sequence reaches its tertiary structure is known as protein folding.
The function of a protein depends on its tertiary structure, so knowing that structure is key to investigating what the protein does and its role in each biochemical process. If a protein folds incorrectly, it can no longer perform its biological function.
Examples of the importance of protein structure
Proteins are involved in a large share of diseases, and protein folding failures lie behind both rare and more common ones. Knowing the structure of the key proteins of pathogens is also important for developing treatments.
Take the most recent case: COVID-19. The S (spike) protein on the coronavirus envelope is what binds to the ACE2 receptors on our cells. Knowing the S protein and its structure allows us to develop new drugs (or evaluate existing ones) that limit the binding of the virus to our cells and therefore its entry.
Another disease that is just as relevant today, though overshadowed by the pandemic, and that is caused by errors in protein folding is Alzheimer’s disease, where misfolding of the amyloid-β peptide (Aβ42) produces amyloid plaques (the correctly folded peptide protects against oxidative stress, activates kinases…). Parkinson’s disease, ALS and cancer are other diseases in which understanding protein misfolding is very important.
I hope this has made clear why it is so important to know the tertiary structure of proteins.
Determining protein folding is not easy
As explained above, the amino acid sequences of proteins are highly variable, and, especially in very long proteins, the chemical interactions between their amino acids can give rise to millions of possible combinations.
To determine a structure, researchers rely on experimental methods that are very expensive and laborious: solving a single protein structure can take months or even years. For example, we recently published the story of a blog reader who, as part of her doctoral thesis, set out to determine the structure of part of the proteasome, a task that often proved complicated, laborious and slow.
The main methods include X-ray crystallography, which often relies on a synchrotron (a very expensive particle accelerator) as its X-ray source, and cryo-electron microscopy.
What AlphaFold changes
AlphaFold, the openly available system from Google’s DeepMind, uses artificial intelligence to predict the tertiary structure of a protein from its amino acid sequence. AlphaFold is not new: an experimental version of the method was already presented at CASP14, and the fundamental change now is the precision with which it has been applied to the human proteome.
There are currently about 180,000 experimentally determined structures in protein tertiary structure databases. The structures predicted by AlphaFold are theoretical, but DeepMind plans to release 100 million protein structures from humans and other key organisms, covering proteins of up to 2,700 amino acids in length.
Here you can see how AlphaFold works with an example of a protein from Drosophila melanogaster, one of the most widely used model organisms in genetic studies.
The system is not entirely accurate: 58% of amino acid residues are predicted with a pLDDT above 70, and at the level of whole proteins the figure is lower. (pLDDT, on a scale of 0 to 100, is AlphaFold’s own estimate of the lDDT score, which measures how well the interatomic distances of a reference protein structure are reproduced in a second structure compared with it.) One reason is that, at least for now, the system does not take into account the cellular environment or interactions with other molecules.
Even with this limited precision, the time that can be saved in determining protein structures is enormous. Thanks to the Human Genome Project, we already know the sequence of almost all the proteins in the human body; with AlphaFold we can also learn about their structure, accelerating research into a myriad of diseases. It could be especially helpful for rare diseases, since it can reduce part of the research costs.
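To make the lDDT idea more concrete, here is a simplified sketch of how the score itself can be computed when a reference structure is available. It is not AlphaFold’s code: pLDDT is the model’s prediction of this score, made without any reference. The sketch assumes one C-alpha coordinate per residue (the full score uses all atoms) and uses the tolerances of 0.5, 1, 2 and 4 Å and the 15 Å inclusion radius from the standard lDDT definition; the function names `per_residue_lddt` and `fraction_confident` are hypothetical.

```python
import math

# Distance tolerances (angstroms) and inclusion radius of the standard lDDT definition
THRESHOLDS = (0.5, 1.0, 2.0, 4.0)
INCLUSION_RADIUS = 15.0


def per_residue_lddt(reference, model):
    """Simplified C-alpha lDDT per residue, on a 0-100 scale.

    reference, model: lists of (x, y, z) tuples, one per residue, same order.
    """
    if len(reference) != len(model):
        raise ValueError("structures must have the same number of residues")
    scores = []
    for i in range(len(reference)):
        preserved = total = 0
        for j in range(len(reference)):
            if i == j:
                continue
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= INCLUSION_RADIUS:
                continue  # only distances that are local in the reference are scored
            diff = abs(d_ref - math.dist(model[i], model[j]))
            # Each pair is checked against all four tolerances
            preserved += sum(diff < t for t in THRESHOLDS)
            total += len(THRESHOLDS)
        scores.append(100.0 * preserved / total if total else 0.0)
    return scores


def fraction_confident(scores, cutoff=70.0):
    """Share of residues whose score is above `cutoff` (hypothetical helper)."""
    if not scores:
        return 0.0
    return sum(s > cutoff for s in scores) / len(scores)
```

Counting residues above 70 with `fraction_confident` mirrors how the 58% figure is expressed, although that figure comes from AlphaFold’s predicted scores rather than from a comparison with experimental structures.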