3.6.4. Why Were Leaky Rectified Linear Unit and Mish Used as the Activation Functions for the YOLOv4 Models?

The Leaky Rectified Linear Unit (Leaky ReLU) is a modified version of ReLU. The difference is that the former allows a small nonzero gradient over its entire domain, unlike ReLU (Figure 6). Deep neural networks using Leaky ReLU were found to reach convergence slightly faster than those using ReLU. Compared with its more recent counterparts Swish and Mish, Leaky ReLU is slightly less accurate but has lower standard deviations [24]. Moreover, Leaky ReLU performs better at IoU thresholds below 75% and with large objects, and has a lower computational cost owing to its lower complexity [24].
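As a minimal illustration (not part of the original protocol), the sketch below defines Leaky ReLU in NumPy; the 0.1 negative slope is an assumed value commonly used in Darknet-style YOLO configurations, not one specified in this section.

```python
import numpy as np

def leaky_relu(x, negative_slope=0.1):
    """Leaky ReLU: identity for x >= 0, a small linear slope for x < 0.
    The 0.1 slope is an assumed value (Darknet-style YOLO configs);
    the original Leaky ReLU paper used 0.01."""
    return np.where(x >= 0, x, negative_slope * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu(x))  # [-0.2  -0.05  0.    1.5 ]
```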

Figure 6. Activation functions: (left) Rectified Linear Unit (ReLU); (center) Leaky ReLU; (right) Mish.

Mish, on the other hand, is a smooth, continuous, self-regularized, nonmonotonic activation function that yields smoother loss landscapes, which eases optimization and improves generalization. It has a wider minimum and can therefore reach lower loss. Because of these properties, neural networks implementing Mish achieved higher accuracy and lower standard deviations in object detection. Moreover, it retains a key property of its predecessors (Swish and Leaky ReLU): it is unbounded above and bounded below. The former avoids saturation (which generally slows down training), whereas the latter provides a stronger regularization effect (helping the model fit properly).
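For comparison, here is a minimal NumPy sketch of Mish under its standard definition, mish(x) = x · tanh(softplus(x)); it is unbounded above and bounded below, consistent with the properties described above.

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus: ln(1 + e^x)
    return np.logaddexp(0.0, x)

def mish(x):
    # Mish: x * tanh(softplus(x)) -- smooth and nonmonotonic,
    # unbounded above and bounded below.
    return x * np.tanh(softplus(x))

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(mish(x))  # approximately [-0.25 -0.22  0.    1.4 ]
```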

Thus, Leaky ReLU is more suitable when the goal is to maximize speed without sacrificing much accuracy, whereas Mish is the better option when accuracy should be maximized. Table 4 summarizes the activation functions used and their corresponding effects on each YOLOv4 model.

Table 4. Summary of the activation functions used in each YOLOv4 model and the reasons for their selection.
