The goal of DL-based reconstruction is to obtain, from undersampled k-space data, an image whose quality is comparable to that of an image reconstructed from a fully sampled data set. With a CNN, the typical practice is first to apply the Fourier transform to the zero-filled k-space data to obtain an aliased image, and then to use training data to find the CNN that maps the aliased image (input x) to the reconstructed image (output y). This nonlinear mapping can be represented as y = F(x; θ), where θ denotes the parameters of the network. Figure 1 illustrates the architecture of the CNN.
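The zero-filling step described above can be sketched as follows. This is an illustrative NumPy implementation, not the authors' code; the function name and the centered-FFT convention are assumptions.

```python
import numpy as np

def zero_filled_recon(kspace_undersampled, mask):
    """Inverse Fourier transform of zero-filled k-space -> aliased image.

    kspace_undersampled: 2D complex array with unsampled entries set to 0.
    mask: boolean array marking acquired k-space locations.
    """
    # Zero-filling: unsampled locations are set (or kept) at zero.
    kspace = kspace_undersampled * mask
    # Centered 2D inverse FFT gives the aliased image that serves as the
    # CNN input x; the network then predicts y = F(x; theta).
    image = np.fft.ifftshift(np.fft.ifft2(np.fft.fftshift(kspace)))
    return np.abs(image)
```

With a fully sampled mask this round-trips the original image, which is a quick sanity check before introducing undersampling.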
CNN architecture. The DL module comprises a skip-connection-based CNN. n64k3s1p1 indicates 64 filters of kernel size 3 with stride of 1 and padding of 1. Except for the last layer, each convolutional layer is followed by a BN and a ReLU. CNN, convolutional neural network; DL, deep learning; BN, batch normalization; ReLU, rectified linear unit.
In each layer, a convolution between the feature maps from the previous layer and a set of filters was performed as H_l = σ_l(W_l * H_{l−1} + B_l), where W_l and B_l represent the filters and biases, respectively, * denotes the convolution operation, and σ_l is a nonlinear operator. W_l corresponds to n_l filters of support n_{l−1} × f × f, where n_{l−1} is the number of channels in the previous layer and f is the spatial size of a filter. Here we chose n_l = 64 and f = 3 for all layers except the last, for which we chose n_l = 1. We used the rectified linear unit (ReLU) for σ_l, applying max(0, x) elementwise to the filter responses. The output H_l is composed of n_l feature maps.
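The per-layer computation H_l = σ_l(W_l * H_{l−1} + B_l) can be written out explicitly. The sketch below is a plain NumPy reference implementation for illustration (the deep-learning convention of cross-correlation is used, and batch normalization is omitted); the function name and argument layout are assumptions.

```python
import numpy as np

def conv_layer(h_prev, weights, biases, relu=True):
    """One CNN layer: H_l = sigma_l(W_l * H_{l-1} + B_l).

    h_prev:  (n_{l-1}, H, W) feature maps from the previous layer.
    weights: (n_l, n_{l-1}, f, f) filters (f = 3 in the text).
    biases:  (n_l,) one bias per output feature map.
    """
    n_out, n_in, f, _ = weights.shape
    pad = f // 2  # padding of 1 for f = 3 preserves the spatial size
    hp = np.pad(h_prev, ((0, 0), (pad, pad), (pad, pad)))
    H, W = h_prev.shape[1:]
    out = np.empty((n_out, H, W))
    for k in range(n_out):
        acc = np.zeros((H, W))
        for c in range(n_in):
            for i in range(f):
                for j in range(f):
                    acc += weights[k, c, i, j] * hp[c, i:i + H, j:j + W]
        out[k] = acc + biases[k]
    # ReLU nonlinearity: max(0, x) applied elementwise
    return np.maximum(out, 0) if relu else out
```

A filter whose only nonzero tap is the center weight acts as the identity on nonnegative inputs, which makes the padding and indexing easy to verify.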
During training, the objective was to learn the nonlinear relationship, or more specifically, the parameter θ, which represents all of the filter coefficients and biases. Learning was achieved by minimizing a loss function between the network prediction and the corresponding ground truth data. For MR image reconstruction, given a set of ground truth images y_i and the aliased input images x_i obtained from the corresponding undersampled k-space data, we used the mean squared error (MSE) between them as the loss function, L(θ) = (1/t) Σ_{i=1}^{t} ‖F(x_i; θ) − y_i‖², where t is the number of training samples. To increase convergence speed and mitigate the vanishing gradient problem in deep networks, a skip connection was added and the loss function was minimized using a residual learning algorithm (14,15). Weight decay was applied to prevent overfitting and improve generalization (16).
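The MSE loss and the skip-connection (residual) formulation above can be sketched as follows. This is an illustrative sketch under the assumption that the skip connection adds the input image to the network output, as is standard in residual learning; the function names are hypothetical.

```python
import numpy as np

def mse_loss(predictions, targets):
    """L(theta) = (1/t) * sum_i ||F(x_i; theta) - y_i||^2."""
    t = len(predictions)
    return sum(np.sum((p - y) ** 2) for p, y in zip(predictions, targets)) / t

def residual_prediction(x_aliased, residual_net):
    """Skip connection: the CNN only predicts the aliasing artifacts
    (the residual), and the input image is added back."""
    return x_aliased + residual_net(x_aliased)
```

Learning the residual rather than the full image is what lets the network converge faster: the identity part of the mapping is carried by the skip connection for free.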
For training, we used 2D fat-suppressed sagittal intermediate-weighted turbo spin-echo images and 2D non-fat-suppressed coronal intermediate-weighted turbo spin-echo images from the baseline visits of 212 randomly selected patients (424 knee MRI examinations) enrolled in the OAI. This training cohort consisted of 59 men and 153 women (mean age, 64 years; range, 45–79 years). The sequence parameters for the sagittal images were as follows: repetition time (TR) =3,000 ms, echo time (TE) =30 ms, resolution =0.36 mm × 0.46 mm × 3 mm, and original image size =384×307. The sequence parameters for the coronal images were as follows: TR =3,000 ms, TE =29 ms, resolution =0.36 mm × 0.46 mm × 3 mm, and original image size =384×307.
Multiple models were trained to determine the optimal model design. All models were trained on 5,000 randomly selected images and tested on a separate set of 500 randomly selected images.
First, to determine the best model performance with regard to network depth, models were constructed with different numbers of convolutional layers (3, 10, 15, 20, and 25), and the error rates of these models were compared to select the model with the optimal number of layers (DCNN).
Second, to analyze whether it is necessary to train models with images in each imaging plane, coronal images were reconstructed from models trained from only coronal images (COR model), and these images were compared to coronal images reconstructed from models trained from only sagittal images (SAG model).
Lastly, to analyze whether it is necessary to train models with pathologic lesions, models were trained using images from patients with different degrees of osteoarthritis (OA). Specifically, 3 different models were trained using sagittal data from 2,700 patients: a “healthy” model using images from patients without OA [Kellgren-Lawrence (KL) grade =0], a “mixed” model using images from patients without OA (KL =0) and with OA (KL =2–4), and a “lesion” model using images from patients with OA (KL =2–4).
For training each model, every image was additionally rotated by 90 degrees three times to augment the training set size. The entire dataset was normalized to a constant range based on the maximum intensity of the dataset. For k-space undersampling, variable-density sampling with an AF of 6 was used. All data were shuffled before training.
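The preprocessing steps above (rotation augmentation, intensity normalization, and variable-density undersampling with AF = 6) can be sketched as follows. The exact sampling pattern is not specified in the text, so the fully sampled center fraction and the quadratic density profile below are assumptions, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_rotations(image):
    """Each image plus its three 90-degree rotations (4x the data)."""
    return [np.rot90(image, k) for k in range(4)]

def normalize_dataset(images):
    """Scale the whole dataset by its single maximum intensity."""
    peak = max(float(im.max()) for im in images)
    return [im / peak for im in images]

def variable_density_mask(n_lines, af=6, center_frac=0.08):
    """Illustrative 1D variable-density mask with acceleration factor af."""
    mask = np.zeros(n_lines, dtype=bool)
    center = n_lines // 2
    half = max(1, int(n_lines * center_frac / 2))
    mask[center - half:center + half] = True      # fully sampled center
    n_target = n_lines // af                      # total lines to keep
    # Sample remaining lines with probability decaying away from the center.
    dist = np.abs(np.arange(n_lines) - center) / center
    prob = (1.0 - dist) ** 2
    prob[mask] = 0.0
    prob /= prob.sum()
    extra = max(0, n_target - int(mask.sum()))
    picks = rng.choice(n_lines, size=extra, replace=False, p=prob)
    mask[picks] = True
    return mask
```

Applying the mask along the phase-encode direction of fully sampled k-space then yields the retrospectively undersampled training inputs.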
The same hyperparameters and training iterations were used for the CNNs of different depths to ensure a fair comparison between the algorithms. Specifically, a learning rate of 0.0001, momentum of 0.0, and weight decay of 0.0001 were used. Although smaller learning rates are generally preferred for deeper networks, we used a fixed learning rate because it is not trivial to tune the rate as a function of the number of layers. Padding was used in each layer during training so that the output size matched the input size. Each network was trained for 1 million iterations (training time typically 8 hours) on Caffe for Windows using a Matlab interface with two NVIDIA GTX 1080Ti graphics processing units, each with 3,584 CUDA cores and 11 GB memory. Once fully trained, the models were able to reconstruct images from undersampled data with an average reconstruction time of 0.0001 s/image.
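The update rule implied by these hyperparameters (SGD with momentum and L2 weight decay) can be sketched as follows; this is a generic reference implementation for illustration, not the Caffe solver the authors used.

```python
def sgd_step(params, grads, lr=1e-4, momentum=0.0, weight_decay=1e-4,
             velocity=None):
    """One SGD update with the hyperparameters quoted in the text.

    With momentum = 0 this reduces to:
        theta <- theta - lr * (grad + weight_decay * theta)
    """
    if velocity is None:
        velocity = [0.0 for _ in params]
    new_params, new_velocity = [], []
    for p, g, v in zip(params, grads, velocity):
        g = g + weight_decay * p      # L2 weight decay term
        v = momentum * v + g          # momentum accumulation (0.0 here)
        new_params.append(p - lr * v)
        new_velocity.append(v)
    return new_params, new_velocity
```

With momentum set to 0.0, as in the text, the velocity term is inert and each step is plain gradient descent with a decay penalty pulling weights toward zero.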