To make the final model of the weakly-supervised approach easy to interpret, we trained a simple machine learning model: a logistic regression [28]. Logistic regression is arguably the simplest yet still powerful algorithm for binary classification. It has the advantage of being easily comprehensible thanks to its closed-form prediction function

$$p = \sigma(w^\top x + b)$$

with $p$ being the probability of atypia, $x$ the input features, $\sigma$ the sigmoid function and $w$, $b$ the learnt parameters (weights and intercept). Before training, features are standardized to zero mean and unit standard deviation. To keep the cross-validation results reliable, the standardization statistics are computed on the training set only, and the validation set is transformed with those same statistics. As we have weak labels, the default parameters of scikit-learn [35] are used, with no need for hyper-parameter tuning and no risk of overfitting.
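As an illustration of the standardization and fitting steps described above, the following is a minimal sketch, not the code used in this study. It assumes scikit-learn's `StandardScaler` and `LogisticRegression` chained in a pipeline. The data are synthetic stand-ins for per-nucleus features, and all variable names are hypothetical. The last lines check that the pipeline output matches the sigmoid of a weighted sum of the standardized features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for per-nucleus features (assumption: 6 features).
X_train = rng.normal(loc=5.0, scale=2.0, size=(200, 6))
y_train = (X_train[:, 0] + rng.normal(size=200) > 5.0).astype(int)
X_val = rng.normal(loc=5.0, scale=2.0, size=(50, 6))

# The scaler is fitted on the training data only; the validation data
# are transformed with the training mean and standard deviation.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
p_val = model.predict_proba(X_val)[:, 1]

# Recompute the prediction by hand: sigmoid of a weighted sum.
scaler = model.named_steps["standardscaler"]
clf = model.named_steps["logisticregression"]
z_val = scaler.transform(X_val)
p_manual = 1.0 / (1.0 + np.exp(-(z_val @ clf.coef_[0] + clf.intercept_[0])))
```

Wrapping both steps in one pipeline is a design choice that makes it hard to leak validation statistics into training, since any cross-validation routine refits the scaler inside each fold.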
Using 5-fold cross-validation at the image level, we can predict for each nucleus an atypia score between 0 and 1. Having a continuous probability as the atypia score allows us to set a binary threshold to detect atypia. The choice of this threshold could allow us to sharpen our definition of atypia: for example, a threshold of 0.5 will consider more nuclei as atypical, whereas a threshold of 0.9 will only detect the most atypical nuclei. The results shown in Fig. 4 are obtained by simply averaging the predictions of all cells, without thresholding.
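The image-level cross-validation, thresholding and averaging described above can be sketched as follows. This is an illustrative example under stated assumptions, not the study's implementation: it uses scikit-learn's `GroupKFold` so that all nuclei of one image fall in the same fold, it assumes each nucleus inherits the label of its image as a weak label, and it averages scores per image. The data and the names `image_id`, `image_label` and `image_score` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_images, nuclei_per_image, n_features = 20, 30, 6

# One group id per nucleus, identifying the image it comes from.
image_id = np.repeat(np.arange(n_images), nuclei_per_image)

# Assumption: every nucleus inherits the (weak) label of its image.
image_label = np.arange(n_images) % 2
y = image_label[image_id]

# Synthetic features; the first one is shifted for positive images.
X = rng.normal(size=(n_images * nuclei_per_image, n_features))
X[:, 0] += 1.5 * y

# 5-fold cross-validation split at the image level.
model = make_pipeline(StandardScaler(), LogisticRegression())
cv = GroupKFold(n_splits=5)
scores = cross_val_predict(
    model, X, y, groups=image_id, cv=cv, method="predict_proba"
)[:, 1]

# Two possible binary definitions of atypia.
atypical_05 = scores >= 0.5
atypical_09 = scores >= 0.9

# Averaging the continuous scores, without thresholding.
image_score = np.array(
    [scores[image_id == i].mean() for i in range(n_images)]
)
```

Each nucleus is scored by a model that never saw any nucleus from the same image, and the stricter threshold can only select a subset of the nuclei selected by the looser one.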
Moreover, we can use the fitted parameters of the logistic regression to understand what atypia means for the trained model. Figure 5a shows the model weights: the larger the absolute value of a weight, the more impact the corresponding feature has on the final decision. A positive weight means that a larger value of the feature gives a larger atypia score, while a negative weight means the opposite. This lets us examine in detail how the model defines atypia. Cytoplasm ratio (border-to-border distance divided by center-to-center distance), nucleus volume and neighbours' volumes are positively correlated with atypia, while neighbours' distance, number of neighbours and volume over compactness are negatively correlated with atypia.
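Reading the weights of a fitted model can be sketched as follows. This is a minimal example on synthetic data, so the signs and magnitudes it produces are not the values reported in Fig. 5a. The feature names are taken from the text above, but their order and the data-generating rule are assumptions made for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Feature names from the text; the column order is hypothetical.
feature_names = [
    "cytoplasm ratio",
    "nucleus volume",
    "neighbours volume",
    "neighbours distance",
    "number of neighbours",
    "volume over compactness",
]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, len(feature_names)))

# Synthetic rule: column 0 raises the score, column 3 lowers it.
logit = 2.0 * X[:, 0] - 2.0 * X[:, 3]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# One weight per standardized feature.
weights = model.named_steps["logisticregression"].coef_[0]

# Rank features by absolute weight, largest impact first.
order = np.argsort(-np.abs(weights))
for i in order:
    print(f"{feature_names[i]:>24s}  {weights[i]:+.3f}")
```

Because the features are standardized before fitting, the weights are on a common scale, which is what makes comparing their absolute values meaningful.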