Convergent evolution refers to the independent emergence of similar forms and functions in different lineages, such as the wings of birds and bats. Recently, a research team led by Dr. ZOU Zhengting from the Institute of Zoology, Chinese Academy of Sciences, proposed a novel approach based on cutting-edge deep learning methods to investigate the complex molecular basis of this phenomenon. The study was published on Sep 23 in PNAS.
Many conventional methods have been developed to find convergent changes at individual sites of specific proteins that could explain macroscopic functional convergence. Such cases are valuable for understanding the genomic solutions by which living organisms adapt to often-similar environmental factors.
However, protein function depends on high-order features, such as three-dimensional structure, that are more complex than the individual amino acid residues composing the protein. Current methods cannot take these features into account when evaluating adaptive convergence at the molecular level.
ZOU and colleagues proposed harnessing recently developed pretrained protein language models (PLMs) to lift this long-standing constraint. In several known biological cases, they showed that the output embeddings of PLMs can indeed reflect the similarity of high-order protein features, even when the sequences are highly dissimilar at the site level.
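The contrast between site-level similarity and embedding-level similarity can be illustrated with a minimal sketch. It is not the authors' code: the per-residue embeddings below are random toy arrays standing in for real PLM output, and the helper names (mean_pool, cosine_distance, euclidean_distance, site_identity), the mean-pooling step, and the choice of these two distance metrics are all assumptions made for illustration.

```python
import numpy as np

def mean_pool(per_residue):
    """Average per-residue vectors (length x dim) into one protein-level vector."""
    return np.asarray(per_residue, dtype=float).mean(axis=0)

def cosine_distance(u, v):
    """1 - cosine similarity between two vectors."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def euclidean_distance(u, v):
    """Straight-line distance between two vectors."""
    return float(np.linalg.norm(np.asarray(u) - np.asarray(v)))

def site_identity(seq_a, seq_b):
    """Fraction of identical positions in two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty, and equal in length")
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)

# Toy inputs (hypothetical): two sequences that differ at most sites, paired
# with made-up per-residue embeddings that are nearly identical.
seq_a, seq_b = "MKTAYI", "MRSGFL"
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(len(seq_a), 8))
emb_b = emb_a + rng.normal(scale=0.05, size=emb_a.shape)

vec_a, vec_b = mean_pool(emb_a), mean_pool(emb_b)
print("site identity:     ", site_identity(seq_a, seq_b))
print("cosine distance:   ", cosine_distance(vec_a, vec_b))
print("euclidean distance:", euclidean_distance(vec_a, vec_b))
```

In a real analysis the toy arrays would be replaced by embeddings from an actual pretrained PLM; the point of the sketch is only that sequence identity and embedding distance are separate measurements that need not agree.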
The team went on to develop an analysis framework, ACEP (Adaptive Convergence by Embedding of Proteins), which uses PLM embeddings to enable genome-wide detection of adaptive convergent evolution of high-order protein features.
ACEP was then applied to echolocating mammals. Between echolocating bats and toothed whales, ACEP identified both known and previously unknown candidate genes showing putatively adaptive convergence of high-order protein features.
The team conducted further analyses to investigate the potential molecular mechanisms underlying the detected convergence, and discovered intriguing relationships between ACEP significance and convergent physicochemical features of proteins, such as net electric charge density.
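As a rough illustration of what a physicochemical feature such as net charge density can look like in practice, the sketch below counts charged residues per unit sequence length. This is a common simplification and an assumption here, not necessarily the definition used in the study: lysine (K) and arginine (R) are counted as +1, aspartate (D) and glutamate (E) as -1, and histidine, termini, and pH effects are ignored. The function name net_charge_density is hypothetical.

```python
# Simplified charge assignment at roughly neutral pH (an assumption):
# K and R carry +1, D and E carry -1, everything else is treated as neutral.
POSITIVE = frozenset("KR")
NEGATIVE = frozenset("DE")

def net_charge_density(sequence):
    """Return (positive residues - negative residues) / sequence length."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    positive = sum(residue in POSITIVE for residue in seq)
    negative = sum(residue in NEGATIVE for residue in seq)
    return (positive - negative) / len(seq)

# Toy peptides (hypothetical), one enriched in basic and one in acidic residues.
for peptide in ("KRAAKR", "DEAADE"):
    print(peptide, net_charge_density(peptide))
```

Comparing such a value across lineages is one simple way to ask whether proteins have shifted in the same physicochemical direction, independent of which individual sites changed.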
Overall, these findings indicate the importance of high-order protein features as a molecular basis of functional convergent evolution during organismal adaptation to the environment, and provide a novel methodology to detect such relationships. The effectiveness of ACEP demonstrates the power of artificial intelligence (AI) techniques in elucidating the complex genetic basis of phenotypic and functional evolution.
Reference: Language models reveal a complex sequence basis for adaptive convergent evolution of protein functions.
https://doi.org/10.1073/pnas.2418254122

Figure: (A) Diagram of the ACEP pipeline. (B) Phylogeny of the 115 mammalian species involved in the ACEP test of echolocating mammals. (C) ACEP test result for a known echolocation-related gene, SLC26A5 (Prestin). (D) Venn diagram showing the two sets of ACEP-significant genes, based on different distance metrics, that are associated with the functional term “Sensory Perception”. (Image by ZOU Zhengting’s group)