One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation

Division of Robotics, Perception, and Learning at KTH Royal Institute of Technology
Accepted to ICRA 2025

Overview of OneMap

Video 1

Our method deployed on a Boston Dynamics Spot robot, searching a sequence of three objects. All computations are executed on the on-board Jetson Orin AGX.

Abstract

The ability to search efficiently for objects in complex environments is fundamental to many real-world robot applications. Recent advances in open-vocabulary vision models have produced semantically informed object navigation methods that allow a robot to search for an arbitrary object without prior training. However, these zero-shot methods have so far treated the environment as unknown for each consecutive query. In this paper, we introduce a new benchmark for zero-shot multi-object navigation, in which the robot can leverage information gathered during previous searches to find new objects more efficiently. To address this problem, we build a reusable open-vocabulary feature map tailored for real-time object search. We further propose a probabilistic-semantic map update that mitigates common sources of error in semantic feature extraction, and we leverage the resulting semantic uncertainty for informed multi-object exploration. We evaluate our method on a set of object navigation tasks both in simulation and on a real robot, running in real time on a Jetson Orin AGX. We demonstrate that it outperforms existing state-of-the-art approaches on both single-object and multi-object navigation tasks.
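To make the idea of a reusable, uncertainty-aware feature map more concrete, the following is a minimal sketch and not the implementation from the paper. It assumes a 2D grid with one feature vector and one scalar variance per cell, fuses observations with an inverse-variance (Kalman-style) update, and scores cells against a query embedding by cosine similarity. The class `FeatureMap`, its methods, the toy 3-dimensional embeddings, and the chosen variances are all hypothetical; a real system would obtain features and the query embedding from an open-vocabulary vision-language model.

```python
import numpy as np


class FeatureMap:
    """Toy 2D grid map: one feature vector and one scalar variance per cell.

    Hypothetical illustration only; not the data structure used in the paper.
    """

    def __init__(self, height, width, dim, prior_var=1e3):
        self.features = np.zeros((height, width, dim))
        # A large prior variance means "nothing is known about this cell yet".
        self.variance = np.full((height, width), prior_var)

    def update(self, row, col, obs_feature, obs_var):
        """Fuse one observation into a cell with an inverse-variance update."""
        prior_var = self.variance[row, col]
        gain = prior_var / (prior_var + obs_var)  # in (0, 1): trust in the observation
        self.features[row, col] += gain * (obs_feature - self.features[row, col])
        self.variance[row, col] = (1.0 - gain) * prior_var

    def query(self, text_embedding):
        """Cosine similarity between every cell feature and a query embedding."""
        dots = self.features @ text_embedding
        norms = np.linalg.norm(self.features, axis=-1) * np.linalg.norm(text_embedding)
        # Cells that were never observed have zero norm; give them similarity 0.
        return np.divide(dots, norms, out=np.zeros_like(dots), where=norms > 0)


fmap = FeatureMap(4, 4, dim=3)
# A confident observation (low variance) followed by a noisy, conflicting one.
fmap.update(1, 2, np.array([1.0, 0.0, 0.0]), obs_var=0.5)
fmap.update(1, 2, np.array([0.0, 1.0, 0.0]), obs_var=50.0)

# The map is reused for a new query without being rebuilt.
similarity = fmap.query(np.array([1.0, 0.0, 0.0]))
best_cell = tuple(int(i) for i in np.unravel_index(np.argmax(similarity), similarity.shape))
```

In this sketch the noisy second observation barely shifts the stored feature because its variance is much larger than the cell's current variance. The per-cell variance could also serve as an exploration signal, for example by preferring cells that are both promising and still uncertain, although how the paper combines the two is not specified here.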

BibTeX

@INPROCEEDINGS{11128393,
  author={Busch, Finn Lukas and Homberger, Timon and Ortega-Peimbert, Jesús and Yang, Quantao and Andersson, Olov},
  booktitle={2025 IEEE International Conference on Robotics and Automation (ICRA)},
  title={One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation},
  year={2025},
  pages={14835--14842},
  keywords={Training;Three-dimensional displays;Uncertainty;Navigation;Semantics;Benchmark testing;Search problems;Probabilistic logic;Real-time systems;Videos},
  doi={10.1109/ICRA55743.2025.11128393},
}