Automated feature engineering improves prediction of protein–protein interactions
Journal article (Published version)
© 2019, Springer-Verlag GmbH Austria, part of Springer Nature
Abstract
Over the last decade, various machine learning (ML) and statistical approaches for protein–protein interaction (PPI) prediction have been developed to help annotate functional interactions among proteins, which are essential for our system-level understanding of life. Efficient ML approaches require informative and non-redundant features. In this paper, we introduce novel types of expert-crafted sequence, evolutionary and graph features and apply automatic feature engineering to further expand the feature space and improve predictive modeling. The two-step automatic feature-engineering process encompasses a hybrid method for feature generation and unsupervised feature selection, followed by supervised feature selection through a genetic algorithm (GA). The optimization of both steps allows the feature-engineering procedure to operate on a large transformed feature space with no considerable computational cost and to efficiently provide newly engineered features. Based on GA and correlation filtering, we developed a stacking algorithm, GA-STACK, for automatic ensembling of different ML algorithms to improve prediction performance. We introduced a unified method, HP-GAS, for the prediction of human PPIs, which incorporates GA-STACK and rests on both expert-crafted features and 40% of newly engineered features. Extensive cross-validation and comparison with state-of-the-art methods showed that HP-GAS is currently the most efficient method for proteome-wide forecasting of protein interactions, with a prediction efficacy of 0.93 AUC and 0.85 accuracy. We implemented the HP-GAS method as a free standalone application which is a time-efficient and easy-to-use tool. HP-GAS software with supplementary data can be downloaded from: http://www.vinca.rs/180/tools/HP-GAS.php.
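The two-step selection idea in the abstract (an unsupervised filter followed by GA-based supervised selection) can be illustrated with a minimal sketch. This is not the HP-GAS implementation: the function names `correlation_filter` and `ga_select`, the 0.95 threshold, the GA settings, and the toy fitness function are all assumptions made for illustration, and the abstract does not say that the unsupervised step uses a correlation filter. In a real pipeline the fitness would presumably be a cross-validated model score.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def correlation_filter(columns, threshold=0.95):
    """Unsupervised step (assumed): keep a column only if its |r| with
    every already-kept column is at most `threshold`."""
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) <= threshold for k in kept):
            kept.append(j)
    return kept


def ga_select(n_features, fitness, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Supervised step (assumed): evolve 0/1 masks over features to maximise
    `fitness(mask)`. Requires n_features >= 2 and pop_size >= 4."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; the best masks survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation with probability p_mut per position
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


# Toy data: column 1 is an exact multiple of column 0, so the filter drops it.
columns = [
    [1, 2, 3, 4, 5, 6],
    [2, 4, 6, 8, 10, 12],
    [1, 0, 1, 0, 1, 0],
    [3, 1, 4, 1, 5, 9],
]
kept = correlation_filter(columns)

# Toy fitness (hypothetical): reward features 0 and 2, penalise all others.
informative = {0, 2}


def toy_fitness(mask):
    return sum(1.0 if i in informative else -0.5 for i, bit in enumerate(mask) if bit)


best_mask = ga_select(8, toy_fitness)
```

Running the filter before the GA shrinks the search space the GA has to explore, which matches the motivation the abstract gives for optimizing both steps.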
Keywords:
Protein-protein interactions / Human proteome / Graph / Sequence / Evolutionary features / Machine learning
Source:
Amino Acids, 2019, 51, 8, 1187-1200
Funding / projects:
- Application of the EIIP/ISM bioinformatics platform in the discovery of new therapeutic targets and potential therapeutic molecules (RS-MESTD-Basic Research (BR or ON)-173001)
DOI: 10.1007/s00726-019-02756-9
ISSN: 0939-4451; 1438-2199
PubMed: 31278492
WoS: 000480488100007
Scopus: 2-s2.0-85068834039
Collections
Institution/group
Vinča
URL: https://vinar.vin.bg.ac.rs/handle/123456789/8395
Šumonja, N., Gemović, B. S., Veljković, N. V., & Perović, V. R. (2019). Automated feature engineering improves prediction of protein–protein interactions. Amino Acids, 51(8), 1187-1200. https://doi.org/10.1007/s00726-019-02756-9
Šumonja N, Gemović BS, Veljković NV, Perović VR. Automated feature engineering improves prediction of protein–protein interactions. Amino Acids. 2019;51(8):1187-1200. doi:10.1007/s00726-019-02756-9
Šumonja, Neven, Gemović, Branislava S., Veljković, Nevena V., Perović, Vladimir R., "Automated feature engineering improves prediction of protein–protein interactions" in Amino Acids, 51, no. 8 (2019):1187-1200, https://doi.org/10.1007/s00726-019-02756-9