MP3vec: A Reusable Machine-Constructed Feature Representation for Protein Sequences

Abstract

Machine Learning (ML) methods have been used with varying degrees of success on protein prediction tasks, but they face two inherent limitations. First, prediction performance often depends on the features extracted from the proteins. Second, experimental data may be insufficient to construct reliable ML models. Here we introduce MP3vec, a transferable representation for protein sequences designed specifically for sequence-to-sequence learning tasks. We use transfer learning to generate MP3vecs: we train a deep neural network on the source problem of protein secondary structure prediction, then extract the representations learned by the trained network for use in related downstream prediction tasks. ML methods using MP3vecs perform as well as or better than the state of the art on the target problems, while being orders of magnitude faster to train. We suggest that MP3vec can act as a strong baseline for comparative work on the use of ML in protein-prediction tasks, and for future extensions with domain-specific features.

Citations

  • CrossRef: 0
  • Web of Science: 0
  • Scopus: 0

Authors (4)

  • Sanket Rajan Gupte
  • Dharm Skandh Jain
  • Ashwin Srinivasan
  • Raviprasad Aduri

Details

Category:
Conference activity
Type:
publication in a peer-reviewed collective work (including conference proceedings)
Title of issue:
2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 421-425
Language:
English
Publication year:
2020
Bibliographic description:
Gupte S. R., Jain D. S., Srinivasan A., Aduri R.: MP3vec: A Reusable Machine-Constructed Feature Representation for Protein Sequences // 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2020, pp. 421-425
DOI:
10.1109/bibm49941.2020.9313301
Verified by:
Gdańsk University of Technology
