VinaR - Repository of the Vinča Nuclear Institute

Automated feature engineering improves prediction of protein–protein interactions

Authors
Šumonja, Neven
Gemović, Branislava S.
Veljković, Nevena V.
Perović, Vladimir R.
Article (Published version)
© 2019, Springer-Verlag GmbH Austria, part of Springer Nature
Abstract
Over the last decade, various machine learning (ML) and statistical approaches for protein–protein interaction (PPI) prediction have been developed to help annotate functional interactions among proteins, which are essential for our system-level understanding of life. Efficient ML approaches require informative and non-redundant features. In this paper, we introduce novel types of expert-crafted sequence, evolutionary and graph features and apply automatic feature engineering to further expand the feature space and improve predictive modeling. The two-step automatic feature-engineering process encompasses a hybrid method for feature generation and unsupervised feature selection, followed by supervised feature selection through a genetic algorithm (GA). The optimization of both steps allows the feature-engineering procedure to operate on a large transformed feature space with no considerable computational cost and to efficiently provide newly engineered features. Based on GA and correlation filtering, we developed a stacking algorithm, GA-STACK, for automatic ensembling of different ML algorithms to improve prediction performance. We introduced a unified method, HP-GAS, for the prediction of human PPIs, which incorporates GA-STACK and rests on both expert-crafted and 40% of newly engineered features. Extensive cross-validation and comparison with state-of-the-art methods showed that HP-GAS is currently the most efficient method for proteome-wide forecasting of protein interactions, with a prediction efficacy of 0.93 AUC and 0.85 accuracy. We implemented the HP-GAS method as a free standalone application, which is a time-efficient and easy-to-use tool. HP-GAS software with supplementary data can be downloaded from: http://www.vinca.rs/180/tools/HP-GAS.php. © 2019, Springer-Verlag GmbH Austria, part of Springer Nature.
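The abstract mentions correlation filtering as part of unsupervised feature selection. The following is a minimal sketch of one common form of that idea, a greedy Pearson-correlation filter, and is not taken from the HP-GAS software. The function names, the toy feature table, and the 0.95 threshold are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant column: treated as uncorrelated here
    return cov / math.sqrt(vx * vy)


def correlation_filter(columns, threshold=0.95):
    """Greedy filter (hypothetical helper): keep a feature only if its
    absolute correlation with every already-kept feature is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy feature table (made-up values, not data from the paper).
features = {
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [2.0, 4.0, 6.0, 8.0],    # exact multiple of f1, so redundant
    "f3": [1.0, -1.0, 1.0, -1.0],  # only weakly correlated with f1
}
selected = correlation_filter(features)
print(selected)
```

In this sketch the order of the columns decides which member of a correlated pair survives. A real pipeline would likely rank features first, for example by variance or relevance, before filtering.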

Keywords:
Protein-protein interactions / Human proteome / Graph / Sequence / Evolutionary features / Machine learning
Source:
Amino Acids, 2019, 51(8), 1187-1200
Funding / projects:
  • Application of the EIIP/ISM bioinformatics platform in discovery of novel therapeutic targets and potential therapeutic molecules (RS-173001)

DOI: 10.1007/s00726-019-02756-9

ISSN: 0939-4451; 1438-2199

PubMed: 31278492

WoS: 000480488100007

Scopus: 2-s2.0-85068834039
URI
https://vinar.vin.bg.ac.rs/handle/123456789/8395
Collections
  • Radovi istraživača
Institution/Community
Vinča
Šumonja, N., Gemović, B. S., Veljković, N. V., & Perović, V. R. (2019). Automated feature engineering improves prediction of protein–protein interactions. Amino Acids, 51(8), 1187-1200.
https://doi.org/10.1007/s00726-019-02756-9
Šumonja N, Gemović BS, Veljković NV, Perović VR. Automated feature engineering improves prediction of protein–protein interactions. Amino Acids. 2019;51(8):1187-1200.
doi:10.1007/s00726-019-02756-9
Šumonja, Neven, Branislava S. Gemović, Nevena V. Veljković, and Vladimir R. Perović. "Automated feature engineering improves prediction of protein–protein interactions." Amino Acids 51, no. 8 (2019): 1187-1200.
https://doi.org/10.1007/s00726-019-02756-9

DSpace software copyright © 2002-2015  DuraSpace