2.5 Material Attribute-Category Convolutional Neural Network Training

This protocol is extracted from the research article:

Learning Medical Materials From Radiography Images

**Front Artif Intell**, Jun 18, 2021; DOI: 10.3389/frai.2021.638299

Procedure

The convolutional layers in the backbone network are pretrained on ImageNet (Deng et al., 2009) for robust feature extraction, while the fully connected layers and auxiliary network are initialized with random weights. Training then optimizes these weights with respect to the target function, and starting from pretrained convolutional weights converges faster than initializing the entire network randomly. Fast training is important if the MAC-CNN is to be used in many different expert domains with little correlation to each other.
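This mixed initialization can be sketched as follows. The parameter names, shapes, and the simple scaled-Gaussian initialization are illustrative assumptions, not the article's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pretrained backbone weights (stand-in for ImageNet conv filters).
pretrained_conv = {"conv1": rng.standard_normal((64, 3, 7, 7))}

def init_mac_cnn_params(pretrained, fc_shape=(512, 10), aux_shape=(512, 6)):
    """Copy the pretrained convolutional weights unchanged; initialize the
    fully connected category head and auxiliary attribute head randomly."""
    params = {name: w.copy() for name, w in pretrained.items()}
    params["fc"] = rng.standard_normal(fc_shape) * 0.01   # random init
    params["aux"] = rng.standard_normal(aux_shape) * 0.01  # random init
    return params

params = init_mac_cnn_params(pretrained_conv)
```

Only the randomly initialized heads start far from a useful solution, which is why the overall optimization converges faster than full random initialization.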

Like the D-CNN, we reduce overfitting by saving the MAC-CNN model from the training epoch with the lowest validation-set loss, which is not necessarily the model from the final epoch. This allows the model to be trained for more epochs while mitigating potential overfitting late in training. To improve the MAC-CNN’s training convergence, we also use a learning rate scheduler that reduces the learning rate by a factor of 10 following epochs where validation set loss increases.
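The checkpointing and scheduling logic described above can be sketched as a plain training loop. The `run_epoch` callback is a hypothetical stand-in for one real MAC-CNN training epoch that returns the validation-set loss:

```python
import copy

def train_with_checkpointing(model, epochs, run_epoch, initial_lr=1e-3):
    """run_epoch(model, lr) -> validation loss (hypothetical stand-in)."""
    lr = initial_lr
    best_loss = float("inf")
    best_model = copy.deepcopy(model)
    prev_loss = float("inf")
    for _ in range(epochs):
        val_loss = run_epoch(model, lr)
        # Keep the model from the epoch with the lowest validation loss,
        # not necessarily the final epoch.
        if val_loss < best_loss:
            best_loss = val_loss
            best_model = copy.deepcopy(model)
        # Reduce the learning rate by a factor of 10 after epochs
        # where validation loss increased.
        if val_loss > prev_loss:
            lr /= 10.0
        prev_loss = val_loss
    return best_model, best_loss, lr
```

For a validation-loss sequence like 1.0, 0.5, 0.8, 0.4, the loop keeps the epoch-4 model and reduces the learning rate once, after epoch 3.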

We train the network parameters $\mathbf{\Theta}$, dependent on the material attribute-category matrix $\mathbf{A}$, to classify patches into *K* material categories and *M* material attributes simultaneously. The training set is a set of *N* pairs of raw feature vectors and material category labels of the form $T=\left\{\left({\mathbf{x}}_{i},{\mathbf{y}}_{i}\right)\right\}$, where ${\mathbf{x}}_{i}$ is the raw feature vector of image patch *i* and ${\mathbf{y}}_{i}$ is a one-hot encoded label vector over its *K* material categories. Equation 7 formalizes the definition of these training pairs.
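Constructing such training pairs is straightforward; the feature dimension and counts below are illustrative placeholders, not values from the article:

```python
import numpy as np

rng = np.random.default_rng(1)

K = 4   # number of material categories (illustrative)
N = 8   # number of training patches (illustrative)
D = 16  # raw feature dimension per patch (illustrative)

def make_training_pairs(features, labels, num_categories):
    """Build T = {(x_i, y_i)}: raw feature vector plus one-hot category label."""
    one_hot = np.eye(num_categories)[labels]  # (N, K) one-hot matrix
    return list(zip(features, one_hot))

features = rng.standard_normal((N, D))
labels = rng.integers(0, K, size=N)
T = make_training_pairs(features, labels, K)
```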

The loss function and minimization objective for the MAC-CNN are given in Eq. 8, which follows from the loss function used in Schwartz and Nishino (2020).^{3} The loss function combines the negative log-likelihood of the *K* material category predictions for each image patch ${\mathbf{x}}_{i}\in T$.
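The negative log-likelihood term over the *K* category predictions can be illustrated as follows, assuming the network's category outputs are already normalized probabilities (e.g., after a softmax):

```python
import numpy as np

def category_nll(probs, one_hot_labels, eps=1e-12):
    """Mean negative log-likelihood of the true category for each patch.

    probs: (N, K) predicted category probabilities
    one_hot_labels: (N, K) one-hot ground-truth category labels
    """
    picked = np.sum(probs * one_hot_labels, axis=1)  # probability of true class
    return -np.mean(np.log(picked + eps))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([[1, 0, 0],
                   [0, 1, 0]])
loss = category_nll(probs, labels)
```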

The ${\gamma}_{1}$-weighted term represents the KL-divergence between the *M* material attribute predictions for ${\mathbf{x}}_{i}$ and a Beta distribution with parameters $a = b = 0.5$. The Beta distribution is again chosen as the comparison distribution for the reasons discussed in Section 2.2.
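One way to picture this term is a discrete KL-divergence between a histogram of attribute predictions and the Beta(0.5, 0.5) density, whose pdf is $1/(\pi\sqrt{x(1-x)})$. The histogram approximation below is an illustrative choice, not the article's exact formulation:

```python
import numpy as np

def kl_to_beta_half(preds, bins=20, eps=1e-12):
    """Approximate KL divergence between the empirical distribution of
    attribute predictions in (0, 1) and Beta(0.5, 0.5), via histograms."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    p, _ = np.histogram(preds, bins=edges)
    p = p / p.sum()                                      # empirical bin probs
    centers = (edges[:-1] + edges[1:]) / 2
    q = 1.0 / (np.pi * np.sqrt(centers * (1 - centers)))  # Beta(0.5, 0.5) pdf
    q = q / q.sum()                                      # normalize over bins
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

rng = np.random.default_rng(2)
kl_match = kl_to_beta_half(rng.beta(0.5, 0.5, size=10_000))     # prior-like
kl_mismatch = kl_to_beta_half(rng.uniform(0.4, 0.6, size=10_000))
```

Predictions distributed like the Beta prior yield a divergence near zero, while predictions clustered around 0.5 (the low-density region of Beta(0.5, 0.5)) are penalized heavily.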

The ${\gamma}_{2}$-weighted term constrains the loss to the material attributes encoded in the $\mathbf{A}$ matrix. It is the mean squared error between the rows of $\mathbf{A}$, where each row gives one category’s probability distribution over attributes, and the material attribute predictions on the samples ${T}_{k}$ for each category.
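A minimal sketch of this constraint, assuming the per-category predictions are aggregated by their mean (an illustrative aggregation choice):

```python
import numpy as np

def attribute_constraint_mse(A, attr_preds, categories):
    """MSE between each category's row of A and the mean attribute
    prediction over that category's samples T_k (illustrative sketch).

    A: (K, M) material attribute-category matrix
    attr_preds: (N, M) per-patch attribute predictions
    categories: (N,) integer category label per patch
    """
    errs = []
    for k in range(A.shape[0]):
        mask = categories == k
        if not mask.any():
            continue  # no samples of this category in the batch
        mean_pred = attr_preds[mask].mean(axis=0)
        errs.append(np.mean((A[k] - mean_pred) ** 2))
    return float(np.mean(errs))

A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
preds = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
cats = np.array([0, 1])
```

When each category's mean prediction matches its row of $\mathbf{A}$, the term is zero; it grows as the predicted attribute distributions drift away from the encoded ones.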

The hyperparameters ${\gamma}_{1}$, ${\gamma}_{2}$ assign weights to their respective loss terms and are chosen at training time.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

