Abstract
Hypertension is a critical public health challenge that affects over one billion adults
globally. While medications and treatment continue to improve, only 20% of those
with hypertension are successfully managing their condition. A significant factor contributing
to this issue is the debilitating side effects associated with antihypertensive
drugs, such as headaches, heart palpitations, hypotension, and hyperkalemia. These
adverse effects often lead to poor patient adherence to prescribed treatments.
With the advent of artificial intelligence (AI) in computational drug discovery, a large
number of novel lead antihypertensive drug molecules are being generated. However,
predicting the potential side effects of these newly synthesized compounds remains a
significant challenge, especially given the complexity of biological interactions and
the limited availability of extensive data on these molecules. Traditional methods for
evaluating drug side effects are time-consuming, expensive, and only applicable in
the later stages of drug development.
This research addresses these challenges by developing a machine learning model,
specifically a gradient boosting classifier, to predict the side effects of AI-generated
hypertension drug candidates early in the drug development process. The model was
trained on engineered features derived from molecular structures, including functional
groups and molecular properties, and leveraged SMOTE oversampling to address
data imbalance. Cross-validation demonstrated the model’s
strong performance, with high recall and AUC-ROC scores across multiple side effect
categories. Key findings include the identification of polar surface area and hydrogen
bond donors as significant predictors of adverse effects. The gradient boosting
classifier outperformed the baseline random forest model, achieving an average
F1 score of 87.22%, with a 34.43% improvement in AUC-ROC after oversampling.
Functional group analysis identified chemical predictors of side effects,
with groups such as phenols and carboxylic acids prominently influencing multiple
adverse conditions. When tested against real-world hypertension drug data, the
model demonstrated strong predictive capabilities for respiratory and cardiovascular
side effects. This research suggests that streamlining side effect prediction can
contribute to safer and more effective antihypertensive drugs.
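
The SMOTE oversampling step mentioned above can be illustrated with a minimal sketch. It is not the implementation used in this research, which the abstract does not specify. It assumes the core SMOTE idea only: each synthetic minority sample is an interpolation between a real minority sample and one of its k nearest minority neighbours. The function name `smote`, its parameters, and the toy feature vectors are hypothetical and chosen for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from minority-class feature vectors.

    Illustrative sketch of the basic SMOTE interpolation rule; the
    name and signature are assumptions, not the authors' code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: made-up two-feature vectors (values are not real data).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the oversampled class stays inside the region the real samples already occupy. In practice, oversampling of this kind is applied to the training folds only, so that synthetic points do not leak into the data used for evaluation.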