Finding two corresponding image points that depict the same scene point in both images of a stereo camera setup (Appendix A) means to match the pixels of the reference image, typically the image of the left camera $I_l$, against each pixel in the matching image $I_r$ within a certain disparity range $d \in [d_{\min}, d_{\max}]$. In this, the similarity between the two pixels is modeled by a similarity measure from which a cost function can be deduced, which typically reaches its minimum when both pixels coincide. This so-called matching cost $c$ between a pixel $\mathbf{p} = (x, y)$ in the left image and a corresponding pixel $\mathbf{q} = (x - d, y)$, located according to a disparity shift $d$ in the right image, is stored in the three-dimensional cost volume $\mathcal{C}$:

$$\mathcal{C}(\mathbf{p}, d) = c\big(I_l(x, y),\, I_r(x - d, y)\big). \qquad (1)$$
Thus, the objective in finding two corresponding image points is to minimize the matching cost computed by Equation (1).
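To illustrate the construction of the cost volume and the minimization of the matching cost, the following C++ sketch fills $\mathcal{C}$ for every pixel and disparity candidate and extracts, per pixel, the disparity with the smallest cost. It is a minimal sketch only: the structure CostVolume, the placeholder cost function matchingCost (here a simple absolute intensity difference) and the winner-takes-all extraction are illustrative assumptions and do not reflect the optimized real-time implementation described in this work.

```cpp
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

// Illustrative cost volume: one float cost per pixel and disparity candidate.
struct CostVolume {
    int width, height, numDisparities;
    std::vector<float> data;  // stored as [y][x][d]

    CostVolume(int w, int h, int nd)
        : width(w), height(h), numDisparities(nd),
          data(static_cast<size_t>(w) * h * nd, 0.0f) {}

    float& at(int x, int y, int d) {
        return data[(static_cast<size_t>(y) * width + x) * numDisparities + d];
    }
};

// Placeholder pixel-wise cost (absolute intensity difference); in the approach
// described above this would be the Hamming distance of the census transform
// or the inverted, truncated NCC.
inline float matchingCost(const uint8_t* left, const uint8_t* right,
                          int width, int x, int y, int d) {
    return static_cast<float>(std::abs(static_cast<int>(left[y * width + x]) -
                                       static_cast<int>(right[y * width + (x - d)])));
}

// Fill the cost volume: each reference pixel (x, y) is compared against the
// candidate pixel (x - d, y) in the matching image for every disparity d.
void computeCostVolume(const uint8_t* left, const uint8_t* right,
                       int width, int height, CostVolume& volume) {
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            for (int d = 0; d < volume.numDisparities; ++d)
                volume.at(x, y, d) = (x - d >= 0)
                    ? matchingCost(left, right, width, x, y, d)
                    : std::numeric_limits<float>::max();
}

// Winner-takes-all: the disparity estimate for (x, y) is the d with minimal cost.
int bestDisparity(CostVolume& volume, int x, int y) {
    int best = 0;
    for (int d = 1; d < volume.numDisparities; ++d)
        if (volume.at(x, y, d) < volume.at(x, y, best))
            best = d;
    return best;
}
```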
When relying on distinctive image features such as SIFT [34], SURF [35] or ORB [36], a unique matching between two image points can be found. However, such image features can only be calculated in descriptive image regions, resulting in a very sparse correspondence field. Thus, in the case of dense image matching, Equation (1) is evaluated for each pixel in the reference image, computing a pixel-wise matching cost $c$ according to the chosen similarity measure, which indicates the similarity between the pixel pair. This cost is then stored inside a three-dimensional cost volume from which the disparity map is later extracted. In our work, we have implemented the two most commonly used similarity measures for real-time dense image matching, namely the Hamming distance of the non-parametric census transform (CT) [37] and the normalized cross-correlation (NCC). Since the Hamming distance of the CT is minimal when both image patches are most similar, it can directly be used as the matching cost $c$. The NCC, however, is equal to 1 when both image patches are equal. Thus, the NCC is inverted and truncated before being evaluated as matching cost: $c = 1 - \max(0, \mathrm{NCC})$.
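The following sketch outlines how the two similarity measures can be evaluated for a single pixel pair. The 5×5 support window, the 32-bit census descriptor and the helper names are illustrative assumptions, not the exact configuration used in our implementation; border handling and vectorized optimizations are omitted for brevity.

```cpp
#include <algorithm>
#include <bitset>
#include <cmath>
#include <cstdint>

// Census transform over a 5x5 window: encode, bit by bit, whether each
// neighbor is darker than the center pixel. Assumes (x, y) lies at least
// two pixels away from the image border.
uint32_t censusTransform5x5(const uint8_t* img, int width, int x, int y) {
    uint32_t descriptor = 0;
    const uint8_t center = img[y * width + x];
    for (int dy = -2; dy <= 2; ++dy)
        for (int dx = -2; dx <= 2; ++dx) {
            if (dx == 0 && dy == 0) continue;
            descriptor = (descriptor << 1) |
                         (img[(y + dy) * width + (x + dx)] < center ? 1u : 0u);
        }
    return descriptor;
}

// The Hamming distance of two census descriptors is zero for identical
// patches and can therefore be used directly as the matching cost.
inline int censusCost(uint32_t leftDesc, uint32_t rightDesc) {
    return static_cast<int>(std::bitset<32>(leftDesc ^ rightDesc).count());
}

// NCC over a 5x5 window; it equals 1 for identical patches, so the cost is
// formed by inverting and truncating it: cost = 1 - max(0, NCC).
float nccCost(const uint8_t* left, const uint8_t* right,
              int width, int xl, int xr, int y) {
    const int n = 25;
    float meanL = 0.0f, meanR = 0.0f;
    for (int dy = -2; dy <= 2; ++dy)
        for (int dx = -2; dx <= 2; ++dx) {
            meanL += left[(y + dy) * width + (xl + dx)];
            meanR += right[(y + dy) * width + (xr + dx)];
        }
    meanL /= n;
    meanR /= n;

    float num = 0.0f, varL = 0.0f, varR = 0.0f;
    for (int dy = -2; dy <= 2; ++dy)
        for (int dx = -2; dx <= 2; ++dx) {
            const float l = left[(y + dy) * width + (xl + dx)] - meanL;
            const float r = right[(y + dy) * width + (xr + dx)] - meanR;
            num += l * r;
            varL += l * l;
            varR += r * r;
        }
    const float ncc = num / (std::sqrt(varL * varR) + 1e-6f);
    return 1.0f - std::max(0.0f, ncc);
}
```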
Since the disparity is only evaluated along the same pixel row (Equation (1)), it is assumed that the input images $I_l$ and $I_r$ are rectified prior to the process of image matching. Similar to the approach described by Ruf et al. [19], we use a standard calibration routine implemented in the OpenCV library [38] to calculate two rectification maps, which allow an efficient resampling of the input images such that the epipolar lines lie horizontally on the image rows.
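A minimal sketch of this rectification step using the standard OpenCV routines cv::stereoRectify, cv::initUndistortRectifyMap and cv::remap is given below. It assumes that the intrinsic parameters (camera matrices and distortion coefficients) and the extrinsic parameters (R, T) between the two cameras are already available from a prior calibration; the variable names, map type and interpolation mode are illustrative choices, not necessarily those of our implementation.

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/imgproc.hpp>

// Rectify a stereo image pair so that the epipolar lines coincide with the
// image rows, using precomputed calibration parameters.
void rectifyStereoPair(const cv::Mat& leftRaw, const cv::Mat& rightRaw,
                       const cv::Mat& Kl, const cv::Mat& Dl,
                       const cv::Mat& Kr, const cv::Mat& Dr,
                       const cv::Mat& R, const cv::Mat& T,
                       cv::Mat& leftRect, cv::Mat& rightRect) {
    const cv::Size imageSize = leftRaw.size();
    cv::Mat R1, R2, P1, P2, Q;

    // Compute the rectifying rotations and projection matrices for both cameras.
    cv::stereoRectify(Kl, Dl, Kr, Dr, imageSize, R, T, R1, R2, P1, P2, Q);

    // Precompute the two rectification maps once; they can be reused for every
    // incoming image pair, which allows an efficient resampling.
    cv::Mat mapLx, mapLy, mapRx, mapRy;
    cv::initUndistortRectifyMap(Kl, Dl, R1, P1, imageSize, CV_32FC1, mapLx, mapLy);
    cv::initUndistortRectifyMap(Kr, Dr, R2, P2, imageSize, CV_32FC1, mapRx, mapRy);

    // Resample the input images according to the rectification maps.
    cv::remap(leftRaw, leftRect, mapLx, mapLy, cv::INTER_LINEAR);
    cv::remap(rightRaw, rightRect, mapRx, mapRy, cv::INTER_LINEAR);
}
```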