Improving protein secondary structure prediction by deep language models and transformer networks

Citation Data: bioRxiv, ISSN: 2692-8205

Publication Year: 2022

Metrics: Citations 0; Usage 0; Captures 5 (Readers 5); Mentions 0; Social Media 0

Article Description

Protein secondary structure prediction is useful for many applications. It can be framed as a language translation problem: translating a sequence of residues drawn from the 20 standard amino acids into a sequence of secondary structure symbols (e.g., alpha helix, beta strand, and coil). Here, we develop a novel protein secondary structure predictor called TransPross, based on the transformer network and the attention mechanism widely used in natural language processing. TransPross extracts evolutionary information directly from the protein language (i.e., the raw multiple sequence alignment (MSA) of a protein) to predict secondary structure. This differs from traditional methods, which first generate an MSA and then compute expert-curated statistical profiles from it as input. The attention mechanism in TransPross can effectively capture long-range residue-residue interactions in protein sequences to predict secondary structures. Benchmarked on several datasets, TransPross outperforms state-of-the-art methods. Moreover, our experiments show that the prediction accuracy of TransPross correlates positively with MSA depth, and that it achieves an average prediction accuracy (Q3 score) above 80% for hard targets with few homologous sequences in their MSAs.
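The abstract reports accuracy as a Q3 score. The sketch below illustrates the commonly used definition: the percentage of residues whose predicted three-state label matches the reference label. It is not TransPross code. The function names `q3_score` and `mean_q3` are hypothetical, the labels are assumed to be already reduced to the three symbols H (helix), E (strand), and C (coil), and "average" is assumed to mean the mean of per-target scores rather than a pooled per-residue count.

```python
from statistics import mean


def q3_score(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted 3-state label (H/E/C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    allowed = set("HEC")  # assumption: labels already reduced to three states
    if not set(predicted) <= allowed or not set(observed) <= allowed:
        raise ValueError("only H, E and C labels are allowed")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def mean_q3(pairs: list[tuple[str, str]]) -> float:
    """Mean of per-target Q3 scores over (predicted, observed) pairs (hypothetical helper)."""
    return mean(q3_score(p, o) for p, o in pairs)


if __name__ == "__main__":
    # 8 of 10 residues match, so Q3 = 80.0
    print(q3_score("HHHHCCEEEC", "HHHCCCEEEE"))  # 80.0
    # Per-target mean of 80.0 and 100.0
    print(mean_q3([("HHHHCCEEEC", "HHHCCCEEEE"), ("HEC", "HEC")]))  # 90.0
```

Averaging per target weights short and long proteins equally; pooling all residues before dividing would instead weight long proteins more heavily, so the two conventions can give different numbers on the same benchmark.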

Bibliographic Details

DOI: 10.1101/2022.11.21.517442

URL ID: http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=85146502486&origin=inward; http://dx.doi.org/10.1101/2022.11.21.517442; https://dx.doi.org/10.1101/2022.11.21.517442; https://www.biorxiv.org/content/10.1101/2022.11.21.517442v1

AUTHOR(S)

Tianqi Wu; Jianlin Cheng; Weihang Cheng

PUBLISHER(S)

Cold Spring Harbor Laboratory

TAG(S)

Biochemistry, Genetics and Molecular Biology; Agricultural and Biological Sciences; Immunology and Microbiology; Neuroscience; Pharmacology, Toxicology and Pharmaceutics
