Unpaired image-to-image translation is an important problem. In many cases, building a paired data set requires considerable time and effort, which unpaired image-to-image translation avoids. The CycleGAN proposed by Zhu et al. [15] is well suited to such problems. Given generators F_{X→Y}: X → Y and G_{Y→X}: Y → X, Zhu et al. proposed the cycle consistency loss ‖G_{Y→X}(F_{X→Y}(x)) − x‖₁ + ‖F_{X→Y}(G_{Y→X}(y)) − y‖₁ to complete the translation between domain X and domain Y. This unsupervised method can effectively achieve conversion between two data distributions. However, as mentioned above, the data distribution between our two domains is extremely uneven. In our experiment, we used the CycleGAN model to construct the data set, but the results were very disappointing.
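The cycle consistency loss can be sketched as follows. This is a minimal illustrative implementation, not the authors' code; `F` and `G` are stand-ins for the two generators, and the mean absolute difference approximates the L1 norm over pixels.

```python
import numpy as np

def cycle_consistency_loss(F, G, x, y):
    """L_cyc = ||G(F(x)) - x||_1 + ||F(G(y)) - y||_1, averaged over pixels.

    F maps domain X to domain Y; G maps domain Y back to domain X.
    """
    forward_cycle = np.mean(np.abs(G(F(x)) - x))   # x -> Y -> back to X
    backward_cycle = np.mean(np.abs(F(G(y)) - y))  # y -> X -> back to Y
    return forward_cycle + backward_cycle

# Toy check: identity "generators" reconstruct perfectly, so the loss is zero.
identity = lambda img: img
x = np.random.rand(4, 4)
y = np.random.rand(4, 4)
print(cycle_consistency_loss(identity, identity, x, y))  # -> 0.0
```

In the full CycleGAN objective this term is added to the two adversarial losses, weighted by a hyperparameter.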
Figure 2(a) shows the result of training with the CycleGAN model. A large number of sharp holes appear in the image. The positions in the input stained image corresponding to these holes have normal textures, indicating that the model cannot effectively learn the features of the corresponding hole regions. We then introduced a large number of data augmentation methods into the original CycleGAN, such as random jitter, random horizontal and vertical flips, random jittered rotation (first interpolate and zoom, then rotate by a random small angle, and finally crop to the original image size), and elastic deformation [16]. The results are shown in Figure 2(b): the number of cavities is obviously reduced and the overall image quality is slightly improved, but the cavities are still not completely eliminated.
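Two of the augmentations above can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it implements only the random flips and the random-jitter crop (pad, then crop a randomly shifted window), and the jitter amount of 4 pixels is an assumed value. Rotation and elastic deformation are omitted for brevity.

```python
import numpy as np

def augment(img, rng, jitter=4):
    """Apply random flips and random-jitter cropping to a 2-D image.

    img: array of shape (H, W). Returns an augmented copy of the same shape.
    """
    if rng.random() < 0.5:        # random horizontal flip
        img = img[:, ::-1]
    if rng.random() < 0.5:        # random vertical flip
        img = img[::-1, :]
    h, w = img.shape
    # Reflect-pad by `jitter` pixels, then crop an H x W window at a random offset.
    padded = np.pad(img, jitter, mode="reflect")
    top = rng.integers(0, 2 * jitter + 1)
    left = rng.integers(0, 2 * jitter + 1)
    return padded[top:top + h, left:left + w]

rng = np.random.default_rng(0)
out = augment(np.arange(64.0).reshape(8, 8), rng)
print(out.shape)  # -> (8, 8)
```

Because the crop is taken from a reflect-padded image, the output keeps the original size while the content is shifted by up to `jitter` pixels in each direction.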
Figure 2: Comparison of the data sets constructed with differently trained models: (a) results generated by the CycleGAN model; (b) results synthesized by the improved CycleGAN model; (c) enlarged view of a cavity area in the synthesized image; (d) results after introducing domain consistency network training; (e) results using our modified generator structure together with the domain consistency network.
We zoomed in on the position of a hole, as shown in Figure 2(c). It can be seen that the generator and discriminator did not function at all in these positions. We hypothesize that the feature distribution of the unstained image domain is sparser than that of the stained image domain. The essence of domain-to-domain translation is to learn the features of images in one domain and then translate those features under the supervision of the image feature rules of the target domain. Under this assumption, when converting from the stained domain to the unstained domain, some features of the stained domain learned by the generator may be difficult to represent reasonably with the limited positive-sample image features of the unstained domain distribution. This hinders the discriminator's training in the regions where these features are located. The generator synthesizes a black hole in such a region that the discriminator accepts as real, resulting in a local mode collapse. As training continues, the quality in regions with sufficient supervised positive-sample features keeps improving, while these regions remain unchanged, so the black holes become increasingly conspicuous.