All experiments in this study were carried out on a machine equipped with an Intel(R) Xeon(R) Gold 6240 CPU @ 2.60 GHz, an NVIDIA Tesla V100 GPU with 32 GB of memory, and 32 GB of RAM. The software environment consisted of Python 3.9, PyTorch 1.11.0, and CUDA 11.0 running on a Linux operating system.
Considering the actual production and application scenario, we adopted a transfer-learning approach to reduce the time cost of training. The models in this paper were pretrained on the public ImageNet-21k dataset (14 million images; 21,843 classes).
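For illustration only, loading ImageNet-21k pretrained weights can be done with, for example, the timm library; the backbone identifier and the number of target classes below are placeholders, since they are not specified in this section:

# Illustrative sketch: loading an ImageNet-21k pretrained backbone with timm.
# The backbone name and the number of downstream classes are assumptions;
# the text only states that the models were pretrained on ImageNet-21k.
import timm

NUM_CLASSES = 10  # hypothetical number of downstream classes

model = timm.create_model(
    "vit_base_patch16_224_in21k",  # assumed ImageNet-21k pretrained backbone
    pretrained=True,               # download the ImageNet-21k weights
    num_classes=NUM_CLASSES,       # re-initialize the classifier head for the new task
)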
During the experiments, the model was fine-tuned for 40 epochs with a batch size of 64. Adam was used as the optimizer for the gradient descent calculation; the learning rate was set to , with the first 10 epochs used for learning-rate warmup. After training for 30 epochs, the learning rate was decayed to 0.1 of its initial value. Random Erasing was applied with a probability of 0.5 during training, meaning that 50% of the input images underwent Random Erasing. The label-smoothing factor α was set to 0.1.
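A minimal PyTorch sketch of this fine-tuning configuration is given below for clarity; the backbone model and the base learning rate are placeholders (the exact learning-rate value is not reproduced in the text), while the remaining hyperparameters follow the description above:

# Sketch of the fine-tuning configuration (PyTorch 1.11 / torchvision).
import torch
import torch.nn as nn
from torchvision import transforms

EPOCHS = 40
WARMUP_EPOCHS = 10
BATCH_SIZE = 64
BASE_LR = 1e-4  # placeholder: the exact base learning rate is not shown in the text

# Random Erasing is applied to 50% of the input images after tensor conversion.
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.5),
])

# Placeholder backbone standing in for the pretrained model from the previous step.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10))

optimizer = torch.optim.Adam(model.parameters(), lr=BASE_LR)

def lr_lambda(epoch):
    # Linear warmup over the first 10 epochs, constant rate until epoch 30,
    # then decay to 0.1x of the base learning rate.
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS
    if epoch < 30:
        return 1.0
    return 0.1

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_lambda)

# Cross-entropy loss with label smoothing (alpha = 0.1), as in Equation (6) below.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)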
To evaluate the progress of model training, we applied the cross-entropy loss function, which measures the difference between two probability distributions and has been widely used in machine learning and deep learning [20]. In model training, the cross-entropy loss represents how close the predicted data distribution is to the actual data distribution, as defined below in Equation (6):

L = -\sum_{i=1}^{C} y_i \log(p_i), (6)

where C represents the number of categories and y_i is the sample's label: before label smoothing is applied, y_i equals 1 if the true category of the sample is i and 0 otherwise. p_i represents the predicted probability that the observed sample belongs to category i.
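As a sanity check, the label-smoothed form of Equation (6) can be computed directly and compared with PyTorch's built-in implementation; the logits and class count below are illustrative only, and PyTorch's smoothing convention (α/C added to every class, 1 − α added to the true class) is assumed:

# Illustrative check of Equation (6) with label smoothing (alpha = 0.1).
import torch
import torch.nn.functional as F

# Hypothetical logits for a single sample with C = 3 classes.
logits = torch.tensor([[2.0, 0.5, -1.0]])
target = torch.tensor([0])  # true class index
alpha, C = 0.1, logits.shape[1]

# Smoothed label: alpha/C for every class, plus (1 - alpha) on the true class.
y = torch.full((1, C), alpha / C)
y[0, target.item()] += 1.0 - alpha

# Equation (6): L = -sum_i y_i * log(p_i), with p_i from a softmax over the logits.
log_p = F.log_softmax(logits, dim=1)
manual_loss = -(y * log_p).sum(dim=1).mean()

# Built-in equivalent (PyTorch >= 1.10); both values agree up to rounding error.
builtin_loss = F.cross_entropy(logits, target, label_smoothing=alpha)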