2.2.2. Discriminator

In the general GAN [17], the following adversarial min-max problem is used for training:
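$$\min_{G}\max_{D}\;\mathbb{E}_{I_{GT}\sim P_{train}}\!\left[\log D(I_{GT})\right]+\mathbb{E}_{G(I_{N})\sim P_{G}}\!\left[\log\!\left(1-D\!\left(G(I_{N})\right)\right)\right]\tag{3}$$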

where I_GT and I_N are a ground-truth image and an input noisy image, respectively, and P_train and P_G are the data distributions of the ground-truth images and of the images produced by the G, respectively. D(·) denotes the output of the D, which indicates the probability that the current input is a ground-truth image. G(I_N) denotes the output of the G for a given noisy image, i.e., the image restored from the noisy input by the G. Therefore, the D is trained so that D(I_GT) is close to 1 and D(G(I_N)) is close to 0.
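For concreteness, this objective corresponds to the usual pair of binary cross-entropy losses. The PyTorch sketch below is only illustrative (the function and variable names are not from the paper, and the generator term uses the common non-saturating form):

```python
import torch
import torch.nn.functional as F

def gan_losses(D, G, I_GT, I_N):
    # Restored image produced by the generator from the noisy input.
    I_R = G(I_N)

    d_real = D(I_GT)           # should approach 1 for ground-truth images
    d_fake = D(I_R.detach())   # should approach 0; detach so d_loss does not update G

    # Discriminator objective: maximize log D(I_GT) + log(1 - D(G(I_N))).
    d_loss = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))

    # Generator objective (non-saturating variant): maximize log D(G(I_N)).
    g_loss = F.binary_cross_entropy(D(I_R), torch.ones_like(d_fake))
    return d_loss, g_loss
```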

We utilize this training process of the general GAN for the training of the proposed method. A general CNN denoiser [1,10] is trained using the MSE between the ground-truth and restored images. In this case, small structural information, such as weak edges or texture, can be lost because the training is driven only toward reducing the MSE of the entire image. We alleviate this problem by combining the MSE with a gradient-based structural loss whose weight is adjusted by the result of the D. In the proposed method, the D takes the gradients of a given ground-truth image (X_GT) and the gradients of the restored image (X_Y) as inputs, as shown in Figure 1b, so that it can estimate how well the G restores gradient information. (I_GT and G(I_N) in Equation (3) are replaced with X_GT and X_Y, respectively.) For example, a high D(X_GT) and a low D(X_Y) indicate that the gradient-restoration performance of the G is lower than the classification performance of the D. In this case, the weight of the structural loss is increased for the training of the G, whereas in the opposite case, the weight of the MSE-based loss is increased. Through this training strategy, the proposed GAN reproduces structural information that is most similar to that of the ground-truth image while maintaining the quality of noise suppression in smooth regions. The loss function and training process of the proposed method are described in detail in Section 2.3.
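The exact formulation is given in Section 2.3. Purely as a rough illustration of the idea, the balance between the MSE and the structural term could be driven by the gap between D(X_GT) and D(X_Y); the gradient operator, the weighting rule, and all names in the sketch below are assumptions made for this sketch, not the paper's loss:

```python
import torch
import torch.nn.functional as F

def image_gradients(x):
    # First-order differences stacked as channels (NCHW input); a simple
    # stand-in for the gradient extraction used in the paper.
    dx = x[:, :, :-1, 1:] - x[:, :, :-1, :-1]
    dy = x[:, :, 1:, :-1] - x[:, :, :-1, :-1]
    return torch.cat([dx, dy], dim=1)

def generator_loss(D, I_GT, I_R):
    X_GT = image_gradients(I_GT)   # gradients of the ground-truth image
    X_Y  = image_gradients(I_R)    # gradients of the restored image

    mse        = F.mse_loss(I_R, I_GT)   # pixel-wise fidelity term
    structural = F.mse_loss(X_Y, X_GT)   # gradient-based structural term

    # If the D separates the two gradient maps easily (high D(X_GT), low D(X_Y)),
    # the restored structure is lagging, so the structural term gets more weight;
    # otherwise the MSE term dominates. This weighting rule is illustrative only.
    with torch.no_grad():
        w = (D(X_GT) - D(X_Y)).mean().clamp(0.0, 1.0)

    return (1.0 - w) * mse + w * structural
```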

Table 2 shows the structure of the proposed D. The D is composed of 13 convolution layers, as in the G, and BN and ReLU are applied between two consecutive convolution layers. After the 13th convolution layer, two dense layers are connected. Finally, a sigmoid activation function is applied to produce the scalar probability that the input image is the original noise-free image. Because the G must deduce the original pixel values from a noisy input patch, whereas the D only has to determine the probability that the input patch is the original patch, we consider the problem of the G to be harder than that of the D. Hence, we set the number of feature channels of the D (F_D1 and F_D2) to 1/3 of that of the G, so as to balance the performance of the G and the D.

Table 2. The structure of the discriminator.

1 N: the number of pixels in the resultant image of the 13th layer.
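Since the entries of Table 2 are not reproduced above, the following PyTorch sketch only mirrors the textual description (13 convolution layers with BN and ReLU between consecutive convolutions, two dense layers, and a sigmoid output). The channel counts, kernel sizes, input format, and patch size are placeholders, not the F_D1 and F_D2 values from Table 2:

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Sketch of the D described in Table 2; all hyperparameters below are
    placeholders, not the values from the table."""

    def __init__(self, in_channels=2, fd1=21, fd2=21, patch_size=64):
        # in_channels=2 assumes a two-channel gradient input (horizontal/vertical);
        # this is an assumption, not stated in the paper.
        super().__init__()
        layers = [nn.Conv2d(in_channels, fd1, kernel_size=3, padding=1)]
        for _ in range(12):  # 12 more convolutions, 13 in total
            layers += [nn.BatchNorm2d(fd1), nn.ReLU(inplace=True),
                       nn.Conv2d(fd1, fd1, kernel_size=3, padding=1)]
        self.features = nn.Sequential(*layers)

        # N in the table footnote is the number of pixels in the 13th layer's
        # output; here the flattened size also includes the channel dimension.
        n = fd1 * patch_size * patch_size
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n, fd2),
            nn.ReLU(inplace=True),
            nn.Linear(fd2, 1),
            nn.Sigmoid(),  # probability that the input is the noise-free original
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```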
