Planned 3D Human Protein Structures Database Publication
DeepMind announced its partnership with the European Molecular Biology Laboratory (EMBL), Europe’s flagship laboratory for the life sciences, to create the most comprehensive and accurate database to date of predicted protein structure models for the human proteome. This will cover all ~20,000 proteins expressed by the human genome, and the data will be freely and openly available to the scientific community. The database and artificial intelligence system provide structural biologists with powerful new tools to examine the three-dimensional structure of a protein and offer a treasure trove of data that could unlock future advances and herald a new era of AI-enabled biology.
The recognition of AlphaFold in December 2020 by the organizers of the Critical Assessment of protein Structure Prediction (CASP) benchmark as a solution to the 50-year-old grand challenge of protein structure prediction was an astonishing breakthrough for the field. The AlphaFold Protein Structure Database builds on this innovation and the discoveries of generations of scientists, from the early pioneers of protein imaging and crystallography to the thousands of prediction specialists and structural biologists who have spent years experimenting with proteins since then. The database dramatically expands the accumulated knowledge of protein structures, more than doubling the number of high-accuracy human protein structures available to researchers. Advancing our understanding of these building blocks of life, which underlie every biological process in every living being, will help researchers in a wide variety of fields accelerate their work.
Last week, the methodology behind the latest, highly innovative version of AlphaFold, the sophisticated AI system announced last December that powers these structure predictions, was published in Nature together with its open source code. Today’s announcement coincides with a second Nature article that provides the most comprehensive picture of the proteins that make up the human proteome, and with the release of predicted structures for 20 additional organisms that are important for biological research.
“Our goal at DeepMind has always been to develop AI and then use it as a tool to help accelerate the pace of scientific discovery itself, thus advancing our understanding of the world around us,” said the founder and CEO of DeepMind, Demis Hassabis, PhD. “We used AlphaFold to generate the most complete and accurate picture of the human proteome. We believe this represents the most significant contribution AI has made to the advancement of scientific knowledge to date, and is a great illustration of the types of benefits AI can bring to society.”
AlphaFold is already helping scientists accelerate discovery
The ability to computationally predict the shape of a protein from its amino acid sequence, rather than determining it experimentally through years of painstaking, laborious, and often expensive techniques, is already helping scientists achieve in a matter of months what used to take years.
“The AlphaFold database is a perfect example of the virtuous circle of open science,” said Edith Heard, Director General of EMBL. “AlphaFold was trained using data from public resources built by the scientific community, so it makes sense that its predictions are public. Sharing AlphaFold predictions openly and freely will allow researchers around the world to gain new knowledge and drive discovery. I believe AlphaFold is truly a revolution for the life sciences, just like genomics was decades ago, and I am very proud that EMBL was able to help DeepMind enable open access to this remarkable resource.”
AlphaFold is already in use by partners such as the Drugs for Neglected Diseases Initiative (DNDi), which has advanced its research into life-saving cures for diseases that disproportionately affect the poorest regions of the world. The Center for Enzyme Innovation (CEI) is using AlphaFold to help design faster enzymes to recycle some of our most polluting single-use plastics. For scientists who rely on the experimental determination of protein structure, AlphaFold’s predictions have helped speed up their research. For example, a team from the University of Colorado Boulder is finding promise in using AlphaFold predictions to study antibiotic resistance, while a group from the University of California, San Francisco has used them to improve their understanding of the biology of SARS-CoV-2.
The AlphaFold protein structure database
The AlphaFold Protein Structure Database draws on numerous contributions from the international scientific community, as well as AlphaFold’s sophisticated algorithmic innovations and EMBL-EBI’s decades of experience in sharing the world’s biological data. DeepMind and EMBL’s European Bioinformatics Institute (EMBL-EBI) provide access to AlphaFold’s predictions so others can use the system as a tool to enable and accelerate research and open whole new avenues of scientific discovery.
“This will be one of the most important datasets since the mapping of the human genome,” said Ewan Birney, Deputy Director General of EMBL and Director of EMBL-EBI. “Making AlphaFold’s predictions accessible to the international scientific community opens up many new avenues of research, from neglected diseases to new enzymes for biotechnology and everything in between. This is a formidable new scientific tool, which complements existing technologies and will allow us to push the limits of our understanding of the world.”
In addition to the human proteome, the database launches with approximately 350,000 structures, covering 20 biologically significant organisms such as E. coli, the fruit fly, the mouse, the zebrafish, the malaria parasite and the tuberculosis bacterium. These organisms have been the subject of countless research papers and many major breakthroughs. The structures will allow researchers in a wide variety of fields, from neuroscience to medicine, to speed up their work.
The future of AlphaFold
The database and system will be periodically updated as we continue to invest in future improvements to AlphaFold, and over the coming months we plan to significantly expand coverage to almost every sequenced protein known to science: over 100 million structures covering most of the UniProt reference database.
Reference: Tunyasuvunakool K, Adler J, Wu Z et al. Highly accurate protein structure prediction for the human proteome. Nature. 2021: 1-9. doi: 10.1038/s41586-021-03828-1