We demonstrate a localization system based on a single RGB image that not only reaches sub-meter localization accuracy but also estimates orientation.
The proposed system consists of three components, as shown in Figure 1:
Data preparation, shown in Figure 1a: We collected RGB images from the target scenarios, then extracted CNN features from all RGB images using pre-trained CNN models. All of this work was done offline.
Image retrieval, shown in Figure 1b: We loaded the CNN features of all images in the database, ranked them by their similarity to the CNN features extracted from the captured image, and output the set of most similar images.
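A minimal sketch of the ranking step, assuming the features are L2-normalized so that cosine similarity reduces to a dot product; the toy 4-d "features" and names (`db`, `q`) are illustrative only.

```python
# Hedged sketch of similarity ranking over database CNN features.
import numpy as np

def retrieve_top_k(db_features: np.ndarray, query: np.ndarray, k: int = 2):
    """Return indices and scores of the k database images most similar
    to the query (features assumed L2-normalized)."""
    sims = db_features @ query   # cosine similarity per database image
    order = np.argsort(-sims)    # descending similarity
    return order[:k], sims[order[:k]]

def normalize(v):
    return v / np.linalg.norm(v)

# Toy database of three normalized "CNN features".
db = np.stack([normalize(np.array([1.0, 0.0, 0.0, 0.0])),
               normalize(np.array([1.0, 1.0, 0.0, 0.0])),
               normalize(np.array([0.0, 0.0, 1.0, 0.0]))])
q = normalize(np.array([1.0, 0.1, 0.0, 0.0]))

idx, scores = retrieve_top_k(db, q, k=2)
print(idx)  # most similar database image first: [0 1]
```

Returning the top two matches is what the pose-estimation stage needs, since it uses a pair of retrieved images with known poses.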
Pose estimation, shown in Figure 1c: We performed image retrieval on the query image and obtained the two most similar images along with their poses. Feature points were then extracted from the query image and the retrieved images. We applied 2D-to-2D correspondence between the feature points of the two retrieved images to compute the scale in the monocular setting, and then applied the same procedure to the feature points of the query image and the best-matching image to compute the pose of the query image.
Figure 1. Overview of our visual indoor positioning method. The process is composed of (a) database construction; (b) image retrieval; and (c) pose estimation stages.