Pose Estimation for Two-View Panoramas based on Keypoint Matching: a Comparative Study and Critical Analysis

Federal University of Rio Grande do Sul
Conference on Computer Vision and Pattern Recognition (CVPR) 2022 Workshops - OMNICV

Abstract

Pose estimation is a crucial problem in several computer vision and robotics applications. For the two-view scenario, the typical pipeline consists of finding point correspondences between the two views and using them to estimate the pose. However, most available keypoint extraction and matching methods were designed to work with perspective images and may fail under the non-affine distortions present in wide-angle or omnidirectional media, which have become increasingly popular in recent years. This paper presents a comprehensive comparative analysis of different keypoint matching algorithms for panoramas coupled with different linear and non-linear approaches for pose estimation. As an additional contribution, we explore a recent approach for mitigating spherical distortions using tangent plane projections, which can be coupled with any planar descriptor and allows the adaptation of recent learning-based methods. We evaluate the combinations of keypoint matching and pose estimation methods using the rotation and translation error of the estimated pose in different scenarios (indoor and outdoor), and our results indicate that SPHORB and ``tangent SIFT'' are competitive algorithms. We also show that tangent plane adaptations frequently present competitive results, and that some optimization steps consistently improve the performance of all methods.
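As an illustration of the typical two-view pipeline mentioned above, the sketch below maps matched equirectangular (ERP) keypoints to unit bearing vectors, estimates the essential matrix with the linear 8-point algorithm, and decomposes it into the four candidate relative poses. This is only a minimal sketch under one common ERP-to-sphere convention; the helper names are hypothetical and do not reflect the paper's exact implementation, and in practice the 8-point step is wrapped in a robust estimator such as RANSAC.

import numpy as np

def erp_to_bearing(uv, width, height):
    """Map ERP pixel coordinates (u, v) to unit bearing vectors on the sphere."""
    u, v = uv[:, 0], uv[:, 1]
    lon = (u / width) * 2.0 * np.pi - np.pi      # longitude in [-pi, pi)
    lat = np.pi / 2.0 - (v / height) * np.pi     # latitude in [-pi/2, pi/2]
    return np.stack([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)], axis=1)

def essential_8pt(b1, b2):
    """Linear 8-point estimate of E from unit bearings (b2^T E b1 = 0)."""
    A = np.einsum('ni,nj->nij', b2, b1).reshape(-1, 9)   # one row per match
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)                             # smallest singular vector
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt             # enforce rank 2

def decompose_essential(E):
    """Return the four (R, t) candidates; the valid one is usually selected
    by triangulating the matches and keeping the most consistent solution."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
    t = U[:, 2]
    return [(U @ W @ Vt, t), (U @ W @ Vt, -t),
            (U @ W.T @ Vt, t), (U @ W.T @ Vt, -t)]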


Approach

Our main goal is to evaluate different feature matching algorithms coupled with pose estimation algorithms in the context of panoramas. As noted in previous works, spherically-adapted methods present better results than their planar counterparts. However, such conclusions were based on limited datasets and on generic keypoint detection/matching metrics, which might not correlate directly with pose estimation quality. Hence, we also analyze planar methods, either by applying them directly to equirectangular (ERP) images or by locally adapting them with tangent plane projections, as sketched below.
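To make the tangent-plane adaptation concrete, the following sketch renders gnomonic tangent patches from an ERP panorama and runs OpenCV's SIFT on each patch. This is an assumed implementation for illustration only: the patch centers, field of view, patch size, and the input file name are arbitrary choices, and the back-projection of the detected keypoints to spherical coordinates (needed before matching and pose estimation) is omitted.

import cv2
import numpy as np

def tangent_patch(erp, lon0, lat0, fov_deg=60.0, size=256):
    """Render a square tangent-plane (gnomonic) patch centered at (lon0, lat0)."""
    H, W = erp.shape[:2]
    half = np.tan(np.radians(fov_deg) / 2.0)
    x, y = np.meshgrid(np.linspace(-half, half, size),
                       np.linspace(half, -half, size))
    rho = np.sqrt(x**2 + y**2) + 1e-12
    c = np.arctan(rho)
    # Inverse gnomonic projection: plane coordinates (x, y) -> (lon, lat).
    lat = np.arcsin(np.clip(np.cos(c) * np.sin(lat0)
                            + y * np.sin(c) * np.cos(lat0) / rho, -1.0, 1.0))
    lon = lon0 + np.arctan2(x * np.sin(c),
                            rho * np.cos(lat0) * np.cos(c)
                            - y * np.sin(lat0) * np.sin(c))
    # Sample the ERP image at the corresponding pixel coordinates.
    map_x = ((lon + np.pi) % (2.0 * np.pi)) / (2.0 * np.pi) * W
    map_y = (np.pi / 2.0 - lat) / np.pi * H
    return cv2.remap(erp, map_x.astype(np.float32), map_y.astype(np.float32),
                     interpolation=cv2.INTER_LINEAR, borderMode=cv2.BORDER_WRAP)

# "Tangent SIFT"-style detection: run a planar detector on each low-distortion patch.
sift = cv2.SIFT_create()
erp = cv2.imread('panorama.png', cv2.IMREAD_GRAYSCALE)   # hypothetical input
centers = [(lon, lat) for lat in (-np.pi / 4, 0.0, np.pi / 4)
           for lon in np.linspace(-np.pi, np.pi, 8, endpoint=False)]
features = []
for lon0, lat0 in centers:
    patch = tangent_patch(erp, lon0, lat0)
    kps, desc = sift.detectAndCompute(patch, None)
    features.append((kps, desc))   # back-projection to ERP coordinates omitted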


Results

Table 5 shows the accuracy achieved by all descriptors using all pose estimation methods for both translation and rotation errors. Although the accuracy values for the rotation component are still higher than those for translation at similar thresholds, they are considerably lower than the values obtained when pure rotations were applied.
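For reference, the sketch below shows one common way to compute the error metrics behind such accuracy tables. The definitions are assumptions (geodesic rotation error and the angle between unit-norm translation directions, since translation scale is unobservable from two views), not necessarily the paper's exact protocol.

import numpy as np

def rotation_error_deg(R_est, R_gt):
    """Geodesic distance (in degrees) between two rotation matrices."""
    cos_theta = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def translation_error_deg(t_est, t_gt):
    """Angle (in degrees) between translation directions (scale is unobservable)."""
    t_est = t_est / np.linalg.norm(t_est)
    t_gt = t_gt / np.linalg.norm(t_gt)
    return np.degrees(np.arccos(np.clip(np.dot(t_est, t_gt), -1.0, 1.0)))

def accuracy(errors_deg, threshold_deg):
    """Fraction of image pairs whose error falls below the angular threshold."""
    return float(np.mean(np.asarray(errors_deg) < threshold_deg))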


Table 6 is similar to Table 5, but reports the results for the outdoor scenes. Again, rotation results are better than translation ones for all angular thresholds, which confirms that translation is more challenging even in a mixed setup for outdoor images.


BibTeX

@InProceedings{Murrugarra_2022_OMNICV_CVPRW,
  author={Murrugarra-Llerena, Jeffri and Da Silveira, Thiago L. T. and Jung, Claudio R.},
  booktitle={2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)}, 
  title={Pose Estimation for Two-View Panoramas based on Keypoint Matching: a Comparative Study and Critical Analysis}, 
  year={2022},
  volume={},
  number={},
  pages={5198-5207},
  keywords={Learning systems;Couplings;Computer vision;Planarization;Pose estimation;Nonlinear distortion;Pipelines},
  doi={10.1109/CVPRW56347.2022.00568}}
