Yolo (You Only Look Once) is a single-stage detector designed for real-time object detection that performs object classification and localization at the same time. In object detection, a high real-time frame rate and high detection accuracy are the primary objectives. The YoloV4 object detection model is benchmarked on the MS COCO dataset [32], achieving an inference speed of 65 fps with an accuracy of 43.5% AP (65.7% AP50) on a Tesla V100 [33]. An object detector first compresses the features of an input image through a convolutional neural network backbone. The feature layers from the backbone are then mixed and combined in the neck of the detector, and the objects in the image are detected in the head. Because YoloV4 is a single-stage detector, the class prediction and the object localization are produced in the same pass.
The backbone of YoloV4 is CSPDarknet53, whose convolutional architecture is based on a modified DenseNet [34]. The modification adds cross-stage partial connections, which separate the feature map of the base layer into two parts: one part passes through the dense block, and the other is sent directly to the next stage. The main advantages of the DenseNet architecture are that it alleviates the vanishing-gradient problem, strengthens backpropagation, and needs fewer network parameters. The cross-stage partial connections additionally remove the computational bottleneck of DenseNet and improve learning.
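The cross-stage partial routing described above can be illustrated with a minimal sketch. It is not the CSPDarknet53 implementation: the names `csp_stage` and `toy_dense_block` are hypothetical, the even channel split is an assumption, and the "layer" inside the dense block is a channel-wise average standing in for a real convolution. The sketch only shows that one part of the feature map bypasses the dense block while the other part is processed and then rejoined.

```python
def toy_dense_block(channels, num_layers=2):
    """Toy stand-in for a dense block (hypothetical helper).

    Each "layer" reads all feature channels produced so far and appends
    one new channel, mimicking dense connectivity. The element-wise mean
    replaces a real convolution purely for illustration.
    """
    features = [list(c) for c in channels]
    for _ in range(num_layers):
        n = len(features)
        new_channel = [sum(vals) / n for vals in zip(*features)]
        features.append(new_channel)
    return features


def csp_stage(channels, block):
    """Split the channels, process one part, and rejoin both parts.

    Assumption: the feature map is split evenly along the channel axis.
    """
    split = len(channels) // 2
    bypass, routed = channels[:split], channels[split:]
    processed = block(routed)      # only this part enters the dense block
    return bypass + processed      # the bypassed part rejoins unchanged


# Four channels, each flattened to two values for brevity.
feature_map = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
output = csp_stage(feature_map, toy_dense_block)
print(len(output))   # 2 bypassed channels + 4 channels from the dense block
print(output[:2])    # the bypassed channels are untouched
```

Because only part of the channels enters the dense block, the block processes a narrower input, which is where the reduction in computation comes from in this simplified picture.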
Feature aggregation occurs in the neck of the YoloV4 object detector. YoloV4 uses a path aggregation network (PANet) for feature aggregation, together with a spatial pyramid pooling block placed after CSPDarknet53 to enlarge the receptive field and separate out the most important features from the backbone.
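How a spatial pyramid pooling block enlarges the receptive field can be sketched as follows. This is an illustrative sketch, not the YoloV4 code: the function names are hypothetical, and the kernel sizes (5, 9, 13), the stride of 1, and the size-preserving border handling are assumptions that the text above does not state. The same feature map is max-pooled at several window sizes, and the results are stacked with the original map.

```python
def max_pool_same(fmap, k):
    """Stride-1 max pooling that keeps the spatial size.

    Window positions outside the map are ignored, which matches padding
    with negative infinity.
    """
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    pooled = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                fmap[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1))
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


def spp_block(fmap, kernel_sizes=(5, 9, 13)):
    """Return the input map followed by one pooled map per kernel size.

    The kernel sizes are an assumption made for this sketch.
    """
    return [fmap] + [max_pool_same(fmap, k) for k in kernel_sizes]


# A 6x6 map with a single strong activation at row 3, column 3.
fmap = [[0.0] * 6 for _ in range(6)]
fmap[3][3] = 1.0
branches = spp_block(fmap)
print(len(branches))                       # input + three pooled maps
print(branches[1][1][1], branches[1][0][0])  # 5x5 window reaches (3, 3) from (1, 1) but not from (0, 0)
```

Larger windows let a location "see" activations further away without changing the resolution of the map, so the stacked output mixes local and wide-context responses.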
The head of the YoloV4 detector uses anchor-based detection at three levels of detection granularity, the same as in YoloV3 [35], a previous version of Yolo. YoloV4 also adds several new features. The "bag of freebies" includes different data augmentation techniques, DropBlock regularization, the complete IoU (CIoU) loss, etc. The "bag of specials" includes the Mish activation, DIoU-NMS, a modified path aggregation network, etc. [33]. The complete structural diagram of YoloV4, with its different blocks, is presented in Appendix A, Figure A1.
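As an illustration of the complete IoU loss mentioned above, the following minimal sketch evaluates it for a single pair of boxes, using the commonly cited form of the loss: 1 - IoU, plus a normalized center-distance term, plus an aspect-ratio consistency term. The function names, the corner format `(x1, y1, x2, y2)`, and the small `eps` stabilizer are assumptions made for this sketch, not details taken from the YoloV4 code.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss for one predicted box and one ground-truth box."""
    overlap = iou(pred, target)

    # Squared distance between the two box centers.
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    center_dist2 = (pcx - tcx) ** 2 + (pcy - tcy) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    enc_w = max(pred[2], target[2]) - min(pred[0], target[0])
    enc_h = max(pred[3], target[3]) - min(pred[1], target[1])
    diag2 = enc_w ** 2 + enc_h ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    tw, th = target[2] - target[0], target[3] - target[1]
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1 - overlap) + v + eps)

    return 1 - overlap + center_dist2 / diag2 + alpha * v


print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes: loss is close to 0
print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes: loss exceeds 1
```

Unlike a plain IoU loss, the center-distance term still changes with the prediction when the boxes do not overlap at all, which is the usual motivation for this family of losses.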