All experiments in this study were carried out on a machine equipped with an Intel(R) Xeon(R) Gold 6240 CPU @ 2.60 GHz, an NVIDIA Tesla V100 GPU with 32 GB of memory, and 32 GB of RAM. The software environment consisted of Python 3.9, PyTorch 1.11.0, and CUDA 11.0 running on a Linux operating system.
Considering the actual production and application scenario, we adopted a transfer-learning approach to reduce the time cost of training. The models in this paper were pretrained on the public ImageNet-21k dataset (14 million images; 21,843 classes).
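For illustration only, loading ImageNet-21k pretrained weights can be done with, for example, the timm library; the backbone identifier and the number of target classes below are placeholders, since they are not specified in this section:

# Illustrative sketch: loading an ImageNet-21k pretrained backbone with timm.
# The backbone name and the number of downstream classes are assumptions;
# the text only states that the models were pretrained on ImageNet-21k.
import timm

NUM_CLASSES = 10  # hypothetical number of downstream classes

model = timm.create_model(
    "vit_base_patch16_224_in21k",  # assumed ImageNet-21k pretrained backbone
    pretrained=True,               # download the ImageNet-21k weights
    num_classes=NUM_CLASSES,       # re-initialize the classifier head for the new task
)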
During the experiments, the model was fine-tuned for 40 epochs with a batch size of 64. Adam was used as the optimizer for the gradient descent calculation; the learning rate was set to , with the first 10 epochs used for learning-rate warmup. After training for 30 epochs, the learning rate was decayed to 0.1 of its initial value. Random Erasing was applied with a probability of 0.5 during training, meaning that 50% of the input images underwent Random Erasing. The label-smoothing factor α was set to 0.1.
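A minimal PyTorch sketch of this fine-tuning configuration is given below for clarity; the backbone model and the base learning rate are placeholders (the exact learning-rate value is not reproduced in the text), while the remaining hyperparameters follow the description above:

# Sketch of the fine-tuning configuration (PyTorch 1.11 / torchvision).
import torch
import torch.nn as nn
from torchvision import transforms

EPOCHS = 40
WARMUP_EPOCHS = 10
BATCH_SIZE = 64
BASE_LR = 1e-4  # placeholder: the exact base learning rate is not shown in the text

# Random Erasing is applied to 50% of the input images after tensor conversion.
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.5),
])

# Placeholder backbone standing in for the pretrained model from the previous step.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10))

optimizer = torch.optim.Adam(model.parameters(), lr=BASE_LR)

def lr_lambda(epoch):
    # Linear warmup over the first 10 epochs, constant rate until epoch 30,
    # then decay to 0.1x of the base learning rate.
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS
    if epoch < 30:
        return 1.0
    return 0.1

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_lambda)

# Cross-entropy loss with label smoothing (alpha = 0.1), as in Equation (6) below.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)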
To evaluate the progress of model training, we applied the cross-entropy loss function, which measures the difference between two probability distributions and has been widely used in machine learning and deep learning [20]. In model training, the cross-entropy loss represents how close the predicted data distribution is to the actual data distribution, as defined below in Equation (6):

L = -\sum_{i=1}^{C} y_i \log(p_i), (6)

where C represents the number of categories and y_i is the sample's label: before label smoothing is applied, y_i equals 1 if the true category of the sample is i and 0 otherwise. p_i represents the predicted probability that the observed sample belongs to category i.
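As a sanity check, the label-smoothed form of Equation (6) can be computed directly and compared with PyTorch's built-in implementation; the logits and class count below are illustrative only, and PyTorch's smoothing convention (α/C added to every class, 1 − α added to the true class) is assumed:

# Illustrative check of Equation (6) with label smoothing (alpha = 0.1).
import torch
import torch.nn.functional as F

# Hypothetical logits for a single sample with C = 3 classes.
logits = torch.tensor([[2.0, 0.5, -1.0]])
target = torch.tensor([0])  # true class index
alpha, C = 0.1, logits.shape[1]

# Smoothed label: alpha/C for every class, plus (1 - alpha) on the true class.
y = torch.full((1, C), alpha / C)
y[0, target.item()] += 1.0 - alpha

# Equation (6): L = -sum_i y_i * log(p_i), with p_i from a softmax over the logits.
log_p = F.log_softmax(logits, dim=1)
manual_loss = -(y * log_p).sum(dim=1).mean()

# Built-in equivalent (PyTorch >= 1.10); both values agree up to rounding error.
builtin_loss = F.cross_entropy(logits, target, label_smoothing=alpha)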