Open Access
ARTICLE
Dual Branch PnP Based Network for Monocular 6D Pose Estimation
1 Department of Computer Science and Technology, Huaqiao University, Xiamen, 361000, China
2 Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen, 361000, China
3 Fujian Key Laboratory of Big Data Intelligence and Security, Huaqiao University, Xiamen, 361000, China
4 College of Mechanical Engineering and Automation, Huaqiao University, Xiamen, 361000, China
* Corresponding Author: Hong-Bo Zhang
Intelligent Automation & Soft Computing 2023, 36(3), 3243-3256. https://doi.org/10.32604/iasc.2023.035812
Received 05 September 2022; Accepted 04 November 2022; Issue published 15 March 2023
Abstract
Monocular 6D pose estimation is a fundamental task in computer vision and robotics. In recent years, 2D-3D correspondence-based methods have achieved strong performance in multiview and depth-based settings. For monocular 6D pose estimation, however, these methods are limited by the accuracy of the predicted 2D-3D correspondences and by the robustness of the perspective-n-point (PnP) algorithm, and a gap remains between current results and the expected estimation accuracy. To obtain a more effective feature representation, this paper analyzes the influence of inaccurate 2D-3D matching on 6D pose regression, compares the effectiveness of intermediate representations, and proposes edge enhancement to strengthen the shape information of the object. Furthermore, although the transformation from 3D model points to 2D pixel points is composed of a rotation and a translation, the two quantities are essentially different, and a single shared network is ill-suited to regress both. To improve the effectiveness of the PnP stage, this paper therefore designs a dual-branch PnP network that predicts rotation and translation separately. The proposed method is evaluated on the public LM, LM-O and YCB-Video datasets, achieving ADD(-S) scores of 94.2 and 62.84 on LM and LM-O, respectively, and an AUC of ADD(-S) of 81.1 on YCB-Video. These experimental results show that the proposed method outperforms comparable methods.
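For context, the classical pipeline the abstract refers to recovers pose from predicted 2D-3D correspondences with a PnP solver. Below is a minimal sketch of that baseline using OpenCV; the model points, ground-truth pose, and camera intrinsics are illustrative placeholders, not data from the paper.

```python
# Minimal sketch of the correspondence-based baseline: predicted 2D-3D
# matches -> RANSAC PnP -> 6D pose. All numeric values are assumptions.
import numpy as np
import cv2

# Hypothetical non-coplanar 3D model points (object frame, meters)
object_pts = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0],
                       [0.0, 0.0, 0.1], [0.1, 0.1, 0.0], [0.0, 0.1, 0.1]],
                      dtype=np.float64)
K = np.array([[572.4, 0.0, 325.3],   # assumed pinhole intrinsics
              [0.0, 573.6, 242.0],
              [0.0, 0.0, 1.0]])

# Synthesize consistent 2D points from an assumed ground-truth pose
rvec_gt = np.array([[0.1], [0.2], [0.3]])
tvec_gt = np.array([[0.0], [0.0], [0.5]])
image_pts, _ = cv2.projectPoints(object_pts, rvec_gt, tvec_gt, K, None)
image_pts = image_pts.reshape(-1, 2)

# RANSAC-based PnP is the usual choice because noisy correspondences
# (the failure mode discussed in the abstract) corrupt plain PnP.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(object_pts, image_pts, K, None)
R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
print(ok, R.shape, tvec.ravel())
```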
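The dual-branch idea in the abstract can be pictured as a shared feature trunk followed by separate regression heads for rotation and translation. The sketch below is not the authors' implementation; the layer sizes and the quaternion parameterization of rotation are assumptions made for illustration.

```python
# Minimal sketch of a dual-branch pose regression head: one trunk over
# features from the correspondence/edge-enhanced extractor, then separate
# branches, since rotation and translation are essentially different.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualBranchPoseHead(nn.Module):
    def __init__(self, in_dim: int = 512, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        # Rotation branch: predicts a unit quaternion (4 values)
        self.rot_branch = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 4))
        # Translation branch: predicts a 3D offset (x, y, z)
        self.trans_branch = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 3))

    def forward(self, feats: torch.Tensor):
        h = self.trunk(feats)
        quat = F.normalize(self.rot_branch(h), dim=-1)  # unit quaternion
        trans = self.trans_branch(h)
        return quat, trans

# Usage: feats would come from the upstream feature extractor.
head = DualBranchPoseHead()
quat, trans = head(torch.randn(8, 512))
print(quat.shape, trans.shape)  # torch.Size([8, 4]) torch.Size([8, 3])
```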
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.