Abstract:

Objective: To construct predictive models for the onset of diabetic foot in patients with type 2 diabetes using five machine learning algorithms, to select the best-performing model, and to provide evidence that helps healthcare workers identify individuals at high risk of diabetic foot early and accurately.

Methods: Through literature review and expert consultation, a list of risk factors for diabetic foot ulcer was compiled and used to create a questionnaire. A total of 984 patients with type 2 diabetes who were admitted from March 2018 to October 2021 and received follow-up management were enrolled. Data were collected, and the predictive variables were screened using Lasso regression. The patients were randomly divided into a training set of 787 patients and a validation set of 197 patients, a ratio of 8:2. Models were built on the training set using logistic regression, decision trees, support vector machines, random forests, and extreme gradient boosting, and were internally validated on the validation set. The optimal model was determined by a comprehensive evaluation of the area under the receiver operating characteristic curve (AUC) and the F1 score. A risk scoring table for diabetic foot ulcer in patients with type 2 diabetes was then constructed from the optimal model and validated.

Results: The incidence of diabetic foot ulcers was 22.05% (217 of 984 patients). Lasso regression identified 8 predictors: age, total cholesterol, smoking, tingling pain, cold and wet skin on the foot, foot deformity, toenail deformity, and footwear discomfort. The random forest model had an AUC of 0.787, an accuracy of 0.838, a precision of 0.591, a sensitivity of 0.361, a specificity of 0.944, and an F1 score of 0.448, indicating better predictive performance than the other models. The diabetic foot ulcer risk scoring table based on the random forest model had a score range of 0 to 101 points, with an optimal cut-off value of 43 points and an AUC of 0.745.

Conclusion: The model built with the random forest algorithm has the best overall predictive performance, and the diabetic foot risk scoring table based on this model can be used for early screening of patients at high risk of diabetic foot.
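The F1 score used for model selection is the harmonic mean of precision and sensitivity (recall). The short sketch below recomputes it from the precision and sensitivity reported for the random forest model; the function name `f1_score` and the variable names are illustrative choices, not taken from the study.

```python
def f1_score(precision: float, sensitivity: float) -> float:
    """Return the harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        # Avoid division by zero when the model finds no true positives.
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# Values reported in the abstract for the random forest model.
reported_precision = 0.591
reported_sensitivity = 0.361

f1 = f1_score(reported_precision, reported_sensitivity)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.448"
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the low sensitivity (0.361) keeps the F1 score well below the accuracy (0.838), which is why the two metrics are worth reading together.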