Abstract: Objective To develop a risk prediction model for postpartum depression (PPD) and to identify its predictors. Methods A total of 835 women who gave birth in hospital were enrolled and divided, according to time period, into a training set of 722 women and a test set of 113 women. The outcome variable was the occurrence of PPD at 6 weeks postpartum. Three supervised machine learning algorithms (logistic regression, support vector machine and random forest) were used to build risk prediction models; features were screened by sequential forward selection, and model hyperparameters were tuned by grid search. The trained models underwent ten-fold cross-validation on the training set and external validation on the test set. Results The overall incidence of PPD at 6 weeks was 22.6% (189/835). Fourteen predictors were eventually included. Among the three models, the random forest model had the best predictive performance, with an area under the receiver operating characteristic curve, Brier score, accuracy, precision, recall and F1 score of 0.943, 0.073, 0.903, 0.684, 0.722 and 0.703, respectively. Conclusion The prediction model based on the random forest algorithm can help health care workers identify women at high risk of PPD.
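For readers less familiar with the evaluation metrics named in the Results, the following minimal Python sketch shows how accuracy, precision, recall, F1 score and the Brier score are computed. The confusion-matrix counts (13 true positives, 6 false positives, 5 false negatives, 89 true negatives) are an assumption: they are not reported in the abstract and were chosen only because they sum to the 113-woman test set and round to the reported values. The names `Confusion`, `classification_metrics` and `brier_score`, and the toy probabilities, are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of a binary confusion matrix (PPD = positive class)."""
    tp: int  # women with PPD predicted as high risk
    fp: int  # women without PPD predicted as high risk
    fn: int  # women with PPD predicted as low risk
    tn: int  # women without PPD predicted as low risk


def classification_metrics(c: Confusion) -> dict:
    """Return accuracy, precision, recall and F1 from confusion counts."""
    total = c.tp + c.fp + c.fn + c.tn
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall.
        "f1": 2 * precision * recall / (precision + recall),
    }


def brier_score(probs, labels) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


# ASSUMPTION: hypothetical counts, not taken from the paper; they merely
# reproduce the reported metrics after rounding to three decimals.
example = Confusion(tp=13, fp=6, fn=5, tn=89)
metrics = {k: round(v, 3) for k, v in classification_metrics(example).items()}
print(metrics)
# {'accuracy': 0.903, 'precision': 0.684, 'recall': 0.722, 'f1': 0.703}

# Toy predicted probabilities and outcomes, purely to show the Brier formula.
toy_brier = brier_score([0.9, 0.2, 0.6], [1, 0, 0])
print(round(toy_brier, 3))
```

A lower Brier score indicates better-calibrated probabilities. The moderate precision alongside high accuracy reflects the class imbalance implied by the 22.6% incidence.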