Abstract
Background: Hospital readmission is a critical metric of healthcare quality and cost. Machine learning (ML) models have shown promise in predicting readmission risk, yet their comparative performance across diverse patient populations and algorithms remains underexplored. Methods: We conducted a retrospective cohort study using electronic health records from a large urban hospital (2018–2022). After preprocessing, 12,347 admissions were included. Six ML algorithms (logistic regression, random forest, gradient boosting, support vector machine, k-nearest neighbors, and a deep neural network) were trained to predict 30-day all-cause readmission. Model performance was evaluated using area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and calibration. Results: Gradient boosting achieved the highest AUC (0.84), followed by random forest (0.82) and the deep neural network (0.81). Logistic regression had the lowest AUC (0.72). Sensitivity ranged from 0.65 (logistic regression) to 0.79 (gradient boosting). Key predictors included the number of prior admissions, comorbidity index, length of stay, and discharge disposition. Conclusions: In this single-center cohort, ensemble and deep learning methods outperformed traditional logistic regression for readmission prediction. However, model interpretability and clinical integration remain challenges. Future work should focus on external validation and on fairness across demographic subgroups.
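As an illustration of the discrimination metrics named in the Methods, the following is a minimal sketch of how AUC, sensitivity, and specificity could be computed from predicted risks. It is not the study's code: the function names, the 0.5 decision threshold, and the four toy records are assumptions made only for this example, and calibration is left out because the abstract does not state how it was assessed.

```python
def auc_score(y_true, y_score):
    """AUC as the probability that a randomly chosen readmitted case
    receives a higher score than a randomly chosen non-readmitted case.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, y_score, threshold=0.5):
    """Sensitivity and specificity after classifying score >= threshold
    as a predicted readmission. The threshold is an assumed value."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, y_score):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy labels and scores, invented for illustration (not study data).
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print("AUC:", auc_score(labels, scores))
    print("Sensitivity, specificity:", sensitivity_specificity(labels, scores))
```

The pairwise AUC loop is quadratic in the number of cases, which is fine for a small example; for a cohort of this size a rank-based implementation or an established library routine would be the more practical choice.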