Next Best View (NBV) algorithms aim to acquire an optimal set of images for efficient 3D reconstruction of a scene using minimal resources, time, or number of captures. Existing approaches often rely on prior scene knowledge or additional image captures, and typically develop policies that maximize coverage. Yet for many real scenes with complex geometry and self-occlusions, maximizing coverage does not directly translate into better reconstruction quality. In this paper, we propose the View Introspection Network (VIN), which is trained to directly predict the reconstruction-quality improvement of candidate views, and the VIN-NBV policy: a greedy, sequential, sampling-based policy in which, at each acquisition step, we sample multiple query views and choose the one with the highest VIN-predicted improvement score. We design the VIN to perform 3D-aware featurization of the reconstruction built from prior acquisitions and, for each query view, to create a feature that can be decoded into an improvement score. We train the VIN using imitation learning to predict this reconstruction-improvement score. We show that VIN-NBV improves reconstruction quality by ~30% over a coverage-maximization baseline when operating under constraints on the number of acquisitions or the time in motion.
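The greedy sampling-based loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `vin_score`, `capture`, `reconstruct`, and `sample_views` are hypothetical callables standing in for the trained VIN, the camera, the reconstruction backend, and the query-view sampler.

```python
import numpy as np

def vin_nbv_policy(vin_score, capture, reconstruct, sample_views,
                   initial_views, num_acquisitions=10, num_candidates=32):
    """Greedy sequential NBV loop: at each step, score sampled candidate
    views with the VIN and acquire the one with the highest predicted
    reconstruction-improvement score."""
    acquired = list(initial_views)
    images = [capture(v) for v in acquired]
    recon = reconstruct(images)  # reconstruction from prior acquisitions

    for _ in range(num_acquisitions - len(initial_views)):
        candidates = sample_views(num_candidates)           # query views
        scores = [vin_score(recon, v) for v in candidates]  # predicted improvement
        best = candidates[int(np.argmax(scores))]           # greedy choice
        acquired.append(best)
        images.append(capture(best))
        recon = reconstruct(images)                         # update reconstruction

    return recon, acquired
```

Because the loop only needs a scoring function, the stopping condition and selection rule are easy to swap out, which is what makes custom termination criteria straightforward to add.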
Overview of the VIN-NBV policy and the VIN architecture. The VIN is trained to predict the reconstruction improvement of a query view given a set of prior acquisitions; the VIN-NBV policy uses the VIN to select the next best view to acquire. The design of our policy makes it easy to extend with custom termination criteria and decision-making logic.
We report the final average Chamfer distance of our method against prior works on the OmniObject3D houses category with 20 captures, and plot how the average Chamfer distance evolves as more acquisitions are made. Our method outperforms all prior works, and its Chamfer distance continues to improve with additional acquisitions.
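For reference, the symmetric Chamfer distance between a reconstructed point cloud and the ground truth can be computed as below. This is a brute-force sketch for small clouds; the exact variant used in the evaluation (squared vs. unsquared distances, averaging convention) may differ.

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point clouds p (N,3) and q (M,3):
    mean nearest-neighbor distance from p to q plus from q to p.
    Brute-force O(N*M) pairwise computation."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # (N, M) pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```

Lower is better: identical clouds score 0, and the value grows as the reconstruction drifts from the ground-truth surface.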
An interactive comparison of the final reconstruction after 10 total acquisitions using our method (VIN-NBV) and our coverage baseline (Cov-NBV). Click the different object names to visualize more objects.
Controls: Drag on any panel to rotate the shared camera; use the mouse wheel to zoom. Click here to reset view to default.
We provide an interactive comparison of the final reconstructions under different time-in-motion limits, where the robot must complete all acquisitions within a fixed budget: it may move for at most 15, 30, 45, or 60 seconds during acquisition. We compare our method (VIN-NBV) with the coverage baseline (Cov-NBV) and show the final results.
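A time-in-motion budget can be enforced with a simple gate around the acquisition loop, as in the sketch below. Here `select_next_view` and `travel_time` are hypothetical stand-ins for the VIN-NBV policy and the robot's motion planner; acquisition stops once the next move would exceed the budget.

```python
def acquire_under_motion_budget(select_next_view, travel_time, capture,
                                start_view, budget_s=30.0):
    """Budget-constrained acquisition sketch: keep acquiring greedily
    selected views until the next move would exceed the time-in-motion
    budget (in seconds)."""
    current, spent = start_view, 0.0
    acquired = [capture(current)]
    while True:
        nxt = select_next_view(current)
        t = travel_time(current, nxt)
        if spent + t > budget_s:   # next move would break the budget
            break
        spent += t
        current = nxt
        acquired.append(capture(current))
    return acquired, spent
```

Tightening `budget_s` from 60 down to 15 seconds reproduces the kind of constraint the comparison above visualizes: fewer, more carefully chosen views.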
Controls: Drag on any panel to rotate the shared camera; use the mouse wheel to zoom. Click here to reset view to default.
@misc{frahm2025vinnbvviewintrospectionnetwork,
title={VIN-NBV: A View Introspection Network for Next-Best-View Selection for Resource-Efficient 3D Reconstruction},
author={Noah Frahm and Dongxu Zhao and Andrea Dunn Beltran and Ron Alterovitz and Jan-Michael Frahm and Junier Oliva and Roni Sengupta},
year={2025},
eprint={2505.06219},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2505.06219},
}