To train multimodal models and perform radiograph case–control matching, we needed to handle missing data in numerous PT and HP variable fields. For categorical variables, missing entries were replaced by an explicit “(Missing)” value. The only PT variable with missing data was BMI. To impute BMI, we trained linear regression models on the subset of data with available BMIs using each combination of predictor variables, possibly with imputed HP variables (see Supplementary Table 20). We used the model with all predictor sets and imputed HP variables to impute all missing BMI entries. For other continuous variables, we simply performed median imputation.
Do you have any questions about this protocol?
Post your question to gather feedback from the community. We will also invite the authors of this article to respond.