Automated feature engineering improves prediction of protein–protein interactions
Article (Published version)

© 2019, Springer-Verlag GmbH Austria, part of Springer Nature
Abstract
Over the last decade, various machine learning (ML) and statistical approaches for protein–protein interaction (PPI) prediction have been developed to help annotate functional interactions among proteins, which are essential for our system-level understanding of life. Efficient ML approaches require informative and non-redundant features. In this paper, we introduce novel types of expert-crafted sequence, evolutionary and graph features and apply automatic feature engineering to further expand the feature space and improve predictive modeling. The two-step automatic feature-engineering process encompasses a hybrid method for feature generation and unsupervised feature selection, followed by supervised feature selection through a genetic algorithm (GA). The optimization of both steps allows the feature-engineering procedure to operate on a large transformed feature space with no considerable computational cost and to efficiently provide newly engineered features. Based on GA and correlation filtering, we developed a stacking algorithm, GA-STACK, for automatic ensembling of different ML algorithms to improve prediction performance. We introduced a unified method, HP-GAS, for the prediction of human PPIs, which incorporates GA-STACK and rests on both expert-crafted and 40% of newly engineered features. Extensive cross-validation and comparison with state-of-the-art methods showed that HP-GAS is currently the most efficient method for proteome-wide forecasting of protein interactions, with a prediction efficacy of 0.93 AUC and 0.85 accuracy. We implemented the HP-GAS method as a free standalone application, a time-efficient and easy-to-use tool. HP-GAS software with supplementary data can be downloaded from: http://www.vinca.rs/180/tools/HP-GAS.php.
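The two selection steps named in the abstract (unsupervised correlation filtering, then supervised selection with a genetic algorithm) can be illustrated with a minimal sketch. This is not the HP-GAS implementation: the function names, the 0.95 correlation threshold, the nearest-centroid fitness with a per-feature penalty, and the synthetic data are all assumptions chosen only to keep the example small and self-contained.

```python
import numpy as np

def correlation_filter(X, threshold=0.9):
    """Unsupervised step (assumed form): greedily keep a column only if its
    absolute Pearson correlation with every already-kept column is below threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return kept

def centroid_accuracy(X, y):
    """Toy fitness (assumption): accuracy of a nearest-centroid classifier on X, y."""
    c0 = X[y == 0].mean(axis=0)
    c1 = X[y == 1].mean(axis=0)
    pred = np.linalg.norm(X - c1, axis=1) < np.linalg.norm(X - c0, axis=1)
    return float(np.mean(pred.astype(int) == y))

def ga_select(X, y, pop_size=20, generations=15, mutation_rate=0.1, seed=0):
    """Supervised step (assumed form): evolve boolean feature masks with
    truncation selection, one-point crossover and bit-flip mutation."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    if n < 2:  # nothing to cross over
        return np.ones(n, dtype=bool)
    pop = rng.integers(0, 2, size=(pop_size, n)).astype(bool)

    def fitness(mask):
        if not mask.any():
            return 0.0
        # Small penalty per selected feature favours compact subsets.
        return centroid_accuracy(X[:, mask], y) - 0.01 * mask.sum()

    for _ in range(generations):
        scores = np.array([fitness(m) for m in pop])
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n)
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n) < mutation_rate
            children.append(child)
        pop = np.vstack([parents, np.array(children)])
    scores = np.array([fitness(m) for m in pop])
    return pop[np.argmax(scores)]

# Synthetic data: two informative columns, a near-copy of column 0, three noise columns.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200)
signal = y[:, None] + 0.3 * rng.normal(size=(200, 2))
duplicate = 2.0 * signal[:, :1] + 1e-3 * rng.normal(size=(200, 1))
noise = rng.normal(size=(200, 3))
X = np.hstack([signal, duplicate, noise])

kept = correlation_filter(X, threshold=0.95)      # drops the redundant copy
mask = ga_select(X[:, kept], y)                   # boolean mask over kept columns
selected = [kept[i] for i in np.flatnonzero(mask)]
print("kept after filtering:", kept)
print("selected by GA:", selected)
```

Running the filter first shrinks the search space the GA has to explore, which is the general reason for ordering the steps this way; a realistic fitness function would use cross-validated performance of an actual classifier rather than the toy score used here.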
Keywords:
Protein-protein interactions / Human proteome / Graph / Sequence / Evolutionary features / Machine learning
Source:
Amino Acids, 2019, 51, 8, 1187-1200
DOI: 10.1007/s00726-019-02756-9
ISSN: 0939-4451; 1438-2199
PubMed: 31278492
WoS: 000480488100007
Scopus: 2-s2.0-85068834039
Collections
Institution/Community: Vinča
URL: https://vinar.vin.bg.ac.rs/handle/123456789/8395
Šumonja, N., Gemović, B. S., Veljković, N. V., & Perović, V. R. (2019). Automated feature engineering improves prediction of protein–protein interactions. Amino Acids, 51(8), 1187-1200. https://doi.org/10.1007/s00726-019-02756-9
Šumonja N, Gemović BS, Veljković NV, Perović VR. Automated feature engineering improves prediction of protein–protein interactions. Amino Acids. 2019;51(8):1187-1200. doi:10.1007/s00726-019-02756-9
Šumonja, Neven, Gemović, Branislava S., Veljković, Nevena V., Perović, Vladimir R., "Automated feature engineering improves prediction of protein–protein interactions," Amino Acids 51, no. 8 (2019): 1187-1200, https://doi.org/10.1007/s00726-019-02756-9