State-of-the-art research in traditional computer vision is increasingly being leveraged in the surgical domain. A particular focus in computer-assisted surgery is to replace marker-based tracking systems for instrument localization with purely image-based 6DoF pose estimation using deep learning. However, state-of-the-art single-view pose estimation methods do not yet meet the accuracy required for surgical navigation. In this context, we investigate the benefits of multi-view setups for highly accurate and occlusion-robust 6DoF pose estimation of surgical instruments and derive recommendations for an ideal camera system that addresses the challenges in the operating room.
Our contributions are threefold. First, we present a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured with static and head-mounted cameras and including rich annotations of the surgeon, instruments, and patient anatomy. Second, we perform an extensive evaluation of three state-of-the-art single-view and multi-view pose estimation methods, analyzing the impact of the number and placement of cameras, limited real-world training data, and static, hybrid, or fully mobile camera setups on pose accuracy, occlusion robustness, and generalizability. Third, we design a multi-camera system for marker-less surgical instrument tracking that, under optimal conditions, achieves an average position error of 1.01 mm and orientation error of 0.89° for a surgical drill, and 2.79 mm and 3.33° for a screwdriver. Our results demonstrate that marker-less tracking of surgical instruments is becoming a feasible alternative to existing marker-based systems.
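For reference, the position and orientation errors quoted above correspond to the standard 6DoF pose error measures: the Euclidean distance between predicted and ground-truth translations, and the geodesic angle between predicted and ground-truth rotations. The sketch below illustrates these metrics; the function names and inputs are ours for illustration and are not taken from the paper's evaluation code.

```python
# Minimal sketch of standard 6DoF pose error metrics (illustrative only).
import numpy as np

def position_error_mm(t_pred: np.ndarray, t_gt: np.ndarray) -> float:
    """Euclidean distance between predicted and ground-truth translations (in mm)."""
    return float(np.linalg.norm(t_pred - t_gt))

def orientation_error_deg(R_pred: np.ndarray, R_gt: np.ndarray) -> float:
    """Geodesic angle between predicted and ground-truth 3x3 rotation matrices (in degrees)."""
    cos_angle = (np.trace(R_gt.T @ R_pred) - 1.0) / 2.0
    cos_angle = np.clip(cos_angle, -1.0, 1.0)  # guard against numerical drift outside [-1, 1]
    return float(np.degrees(np.arccos(cos_angle)))
```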
We provide download and visualization scripts, as well as a Python wrapper for our dataset, on GitHub: https://github.com/jonashein/mvpsp_dataset
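To give a sense of what the dataset contains, the sketch below loads one RGB-D frame together with its pose annotations. The directory layout, file names, and annotation format shown here are assumptions for illustration only; the actual wrapper API and data organization are documented in the linked repository.

```python
# Hypothetical usage sketch: illustrates the kind of data provided
# (multi-view RGB-D frames plus per-frame 6DoF instrument poses).
# The layout and file names below are assumed, not the dataset's actual structure.
from pathlib import Path
import json
import imageio.v3 as iio

def load_frame(sequence_dir: Path, camera: str, frame_idx: int):
    """Load one RGB-D frame and its annotations from an assumed directory layout."""
    rgb = iio.imread(sequence_dir / camera / "rgb" / f"{frame_idx:06d}.png")
    depth = iio.imread(sequence_dir / camera / "depth" / f"{frame_idx:06d}.png")
    with open(sequence_dir / camera / "poses" / f"{frame_idx:06d}.json") as f:
        poses = json.load(f)  # e.g. per-instrument 4x4 model-to-camera transforms
    return rgb, depth, poses
```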
Qualitative results: Synthetic Training vs. Synth-Real Training on the OR-X Bright and OR-X Dark test sets.
@article{hein_next-generation_2025,
title = {Next-generation surgical navigation: Marker-less multi-view 6DoF pose estimation of surgical instruments},
issn = {1361-8415},
url = {https://www.sciencedirect.com/science/article/pii/S1361841525001604},
doi = {10.1016/j.media.2025.103613},
shorttitle = {Next-generation surgical navigation},
pages = {103613},
journaltitle = {Medical Image Analysis},
year = {2025},
author = {Hein, Jonas and Cavalcanti, Nicola and Suter, Daniel and Zingg, Lukas and Carrillo, Fabio and Calvet, Lilian and Farshad, Mazda and Navab, Nassir and Pollefeys, Marc and Fürnstahl, Philipp},
keywords = {Deep Learning, Marker-less tracking, Multi-view {RGB}-D video dataset, Object pose estimation, Surgical instruments, Surgical navigation},
}