YOLOv3 architecture

Jayakrishnan Ajayakumar
Andrew J. Curtis
Vanessa Rouzier
Jean William Pape
Sandra Bempah
Meer Taifur Alam
Md. Mahbubul Alam
Mohammed H. Rashid
Afsar Ali
John Glenn Morris

YOLOv3 utilizes Darknet-53 [40] as its backbone network for feature extraction. Each image in the training set, for example the muddy water (M.Water) image in Fig. 1, is divided into an N×N grid (N is typically 7). For each grid cell, the network outputs five bounding boxes, each with an "objectness" score, plus K class probabilities, where K is the total number of classes. Each grid cell therefore produces 25 + K values in total (5 boxes × 4 coordinates + 5 objectness scores + K class probabilities). Rather than predicting the absolute coordinates of the bounding box centers, YOLOv3 predicts an offset relative to the coordinates of the grid cell. Each grid cell is trained to predict only the bounding boxes whose centers lie in that cell. The confidence for predictions in each grid cell is given by Eq. 1.
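These per-cell quantities can be sketched as follows. The function names, the N = 7 grid, and the specific cell indices are illustrative assumptions, not the authors' code; the sigmoid-offset decoding follows the standard YOLOv3 formulation.

```python
import numpy as np

def decode_center(t_x, t_y, grid_x, grid_y, N=7):
    """Turn predicted offsets (t_x, t_y) for the cell at (grid_x, grid_y)
    into box-center coordinates normalized to [0, 1] over the image.
    The sigmoid constrains the center to stay inside its own grid cell."""
    sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
    b_x = (grid_x + sigmoid(t_x)) / N
    b_y = (grid_y + sigmoid(t_y)) / N
    return b_x, b_y

def values_per_cell(K, boxes=5):
    """Per-cell output size: 5 boxes x 4 coordinates + 5 objectness
    scores + K class probabilities = 25 + K for 5 boxes."""
    return boxes * 4 + boxes + K
```

For example, a zero offset predicted by the center cell of a 7×7 grid decodes to the center of the image, and a 20-class detector produces 45 values per cell.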

The YOLOv3 model. Object detection is posed as a regression problem

Here Pr(Object) is 1 if a target is in the grid cell and 0 otherwise. IOU^truth_pred (intersection over union) is defined as the overlap ratio between the predicted bounding box and the ground-truth bounding box (Eq. 2). The confidence therefore estimates both whether a grid cell contains an object and how accurate the predicted bounding box is.
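The definitions behind Eq. 1 and Eq. 2 can be sketched directly; this is a minimal illustration in which the (x_min, y_min, x_max, y_max) box format and function names are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union (Eq. 2) of two axis-aligned boxes,
    each given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes get zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def confidence(pr_object, iou_pred_truth):
    """Eq. 1: confidence = Pr(Object) * IOU^truth_pred."""
    return pr_object * iou_pred_truth
```

A perfectly predicted box in an object-containing cell yields confidence 1; an empty cell (Pr(Object) = 0) yields 0 regardless of overlap.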

In order to reduce the detection error, anchor boxes, which are a priori bounding boxes (5 per grid cell), are generated by applying a k-means algorithm to the heights and widths of the training-set bounding boxes. These priors make the network more likely to predict appropriately sized bounding boxes, which also speeds up training [40]. For training, YOLOv3 uses sum-squared error in the output as the optimization criterion. The loss function is a combination of errors on the bounding box prediction, object prediction, and class prediction (Eq. 3).
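The anchor-generation step can be sketched as a small k-means over (width, height) pairs. The 1 − IOU distance follows the YOLO papers' anchor clustering; the function names, random initialization, and stopping rule here are assumptions, not the authors' exact procedure.

```python
import numpy as np

def anchor_kmeans(wh, k=5, iters=100, seed=0):
    """Cluster (width, height) pairs with k-means under the 1 - IOU
    distance, where boxes are anchored at a common corner so only
    shape matters. Returns k anchor-box priors."""
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), size=k, replace=False)]
    for _ in range(iters):
        # IOU between every box and every center (boxes share a corner).
        inter = (np.minimum(wh[:, None, 0], centers[None, :, 0]) *
                 np.minimum(wh[:, None, 1], centers[None, :, 1]))
        union = ((wh[:, 0] * wh[:, 1])[:, None] +
                 (centers[:, 0] * centers[:, 1])[None, :] - inter)
        # Minimizing 1 - IOU is the same as maximizing IOU.
        assign = np.argmax(inter / union, axis=1)
        new = np.array([wh[assign == j].mean(axis=0) if np.any(assign == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers
```

Using IOU rather than Euclidean distance keeps large and small boxes from being penalized unequally, so the resulting anchors track the shapes actually present in the training data.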
