Development of machine learning model for predicting possible side-effects of computationally synthesized lead hypertension drugs molecules
Thesis   Open access

Takudzwa Ndhlovu
Master of Artificial Intelligence, University of Johannesburg
2024
Handle:
https://hdl.handle.net/10210/519312

Abstract

Hypertension is a critical public health challenge that affects over one billion adults globally. Although medications and treatments continue to improve, only 20% of people with hypertension successfully manage their condition. A significant contributing factor is the debilitating side effects associated with antihypertensive drugs, such as headaches, heart palpitations, hypotension, and hyperkalemia. These adverse effects often lead to poor patient adherence to prescribed treatments. With the advent of artificial intelligence (AI) in computational drug discovery, a large number of novel lead antihypertensive drug molecules are being generated. However, predicting the potential side effects of these newly synthesized compounds remains a significant challenge, especially given the complexity of biological interactions and the limited data available on these molecules. Traditional methods for evaluating drug side effects are time-consuming, expensive, and only applicable in the later stages of drug development. This research addresses these challenges by developing a machine learning model, specifically a gradient boosting classifier, to predict the side effects of AI-generated hypertension drug candidates early in the drug development process. The model was trained on engineered features derived from molecular structures, including functional groups and molecular properties, and used SMOTE oversampling to address class imbalance. Evaluation with cross-validation showed strong performance, with high recall and AUC-ROC scores across multiple side effect categories. Key findings include the identification of polar surface area and hydrogen bond donors as significant predictors of adverse effects. The gradient boosting classifier outperformed the baseline random forest model, achieving an average F1 score of 87.22%, with a 34.43% improvement in AUC-ROC after oversampling.
Functional group analysis revealed key insights into chemical predictors of side effects, with groups such as phenols and carboxylic acids prominently influencing multiple adverse conditions. When tested against real-world hypertension drug data, the model demonstrated strong predictive capabilities for respiratory and cardiovascular side effects. The research suggests that streamlining side effect prediction can contribute to safer and more effective antihypertensive drugs.
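
The abstract names SMOTE oversampling as the remedy for class imbalance. As an illustration only, the core idea of SMOTE (creating synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours) can be sketched in pure Python. This is not the thesis's implementation: the function name `smote`, its parameters, and the descriptor values are hypothetical and chosen only for the example.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points built from minority-class feature vectors.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up descriptors: [polar surface area, hydrogen bond donor count].
minority = [[90.0, 3.0], [110.0, 4.0], [100.0, 5.0], [95.0, 2.0]]
majority_count = 10  # assumed size of the majority class

# Add enough synthetic points to match the majority class size.
new_points = smote(minority, n_new=majority_count - len(minority))
balanced_minority = minority + new_points
```

Because each synthetic point lies on a line segment between two real minority samples, it stays within the range of the observed descriptor values. Count-like features such as hydrogen bond donors become fractional under this interpolation, which is one reason oversampling is normally applied to the training folds only and never to the data used for evaluation.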