State-of-the-art research in traditional computer vision is increasingly being leveraged in the surgical domain. A particular focus in computer-assisted surgery is to replace marker-based tracking systems for instrument localization with purely image-based 6DoF pose estimation using deep learning. However, state-of-the-art single-view pose estimation methods do not yet meet the accuracy required for surgical navigation. In this context, we investigate the benefits of multi-view setups for highly accurate and occlusion-robust 6DoF pose estimation of surgical instruments and derive recommendations for an ideal camera system that addresses the challenges in the operating room.
Our contributions are threefold. First, we present a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured with static and head-mounted cameras and including rich annotations of the surgeon, instruments, and patient anatomy. Second, we perform an extensive evaluation of three state-of-the-art single-view and multi-view pose estimation methods, analyzing the impact of the number and placement of cameras, limited real-world training data, and static, hybrid, or fully mobile camera setups on pose accuracy, occlusion robustness, and generalizability. Third, we design a multi-camera system for marker-less surgical instrument tracking that, under optimal conditions, achieves an average position error of 1.01 mm and orientation error of 0.89° for a surgical drill, and 2.79 mm and 3.33° for a screwdriver. Our results demonstrate that marker-less tracking of surgical instruments is becoming a feasible alternative to existing marker-based systems.
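For reference, the position and orientation errors quoted above correspond to the standard 6DoF pose error measures: the Euclidean distance between predicted and ground-truth translations, and the geodesic angle between predicted and ground-truth rotations. The sketch below illustrates these metrics; the function names and inputs are ours for illustration and are not taken from the paper's evaluation code.

```python
# Minimal sketch of standard 6DoF pose error metrics (illustrative only).
import numpy as np

def position_error_mm(t_pred: np.ndarray, t_gt: np.ndarray) -> float:
    """Euclidean distance between predicted and ground-truth translations (in mm)."""
    return float(np.linalg.norm(t_pred - t_gt))

def orientation_error_deg(R_pred: np.ndarray, R_gt: np.ndarray) -> float:
    """Geodesic angle between predicted and ground-truth 3x3 rotation matrices (in degrees)."""
    cos_angle = (np.trace(R_gt.T @ R_pred) - 1.0) / 2.0
    cos_angle = np.clip(cos_angle, -1.0, 1.0)  # guard against numerical drift outside [-1, 1]
    return float(np.degrees(np.arccos(cos_angle)))
```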
We provide download and visualization scripts, as well as a Python wrapper for our dataset, on GitHub: https://github.com/jonashein/mvpsp_dataset
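To give a sense of what the dataset contains, the sketch below loads one RGB-D frame together with its pose annotations. The directory layout, file names, and annotation format shown here are assumptions for illustration only; the actual wrapper API and data organization are documented in the linked repository.

```python
# Hypothetical usage sketch: illustrates the kind of data provided
# (multi-view RGB-D frames plus per-frame 6DoF instrument poses).
# The layout and file names below are assumed, not the dataset's actual structure.
from pathlib import Path
import json
import imageio.v3 as iio

def load_frame(sequence_dir: Path, camera: str, frame_idx: int):
    """Load one RGB-D frame and its annotations from an assumed directory layout."""
    rgb = iio.imread(sequence_dir / camera / "rgb" / f"{frame_idx:06d}.png")
    depth = iio.imread(sequence_dir / camera / "depth" / f"{frame_idx:06d}.png")
    with open(sequence_dir / camera / "poses" / f"{frame_idx:06d}.json") as f:
        poses = json.load(f)  # e.g. per-instrument 4x4 model-to-camera transforms
    return rgb, depth, poses
```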
Qualitative results: Synthetic Training vs. Synth-Real Training on the OR-X Bright and OR-X Dark test sets.
@article{hein_next-generation_2025,
title = {Next-generation surgical navigation: Marker-less multi-view 6DoF pose estimation of surgical instruments},
issn = {1361-8415},
url = {https://www.sciencedirect.com/science/article/pii/S1361841525001604},
doi = {10.1016/j.media.2025.103613},
shorttitle = {Next-generation surgical navigation},
pages = {103613},
journaltitle = {Medical Image Analysis},
year = {2025},
author = {Hein, Jonas and Cavalcanti, Nicola and Suter, Daniel and Zingg, Lukas and Carrillo, Fabio and Calvet, Lilian and Farshad, Mazda and Navab, Nassir and Pollefeys, Marc and Fürnstahl, Philipp},
keywords = {Deep Learning, Marker-less tracking, Multi-view {RGB}-D video dataset, Object pose estimation, Surgical instruments, Surgical navigation},
}