RegNeRF enables realistic view synthesis from as few as 3 input images.
Neural Radiance Fields (NeRF) have emerged as a powerful representation for the task of novel view synthesis due to their simplicity and state-of-the-art performance. Though NeRF can produce photorealistic renderings of unseen viewpoints when many input views are available, its performance drops significantly when this number is reduced. We observe that the majority of artifacts in sparse input scenarios are caused by errors in the estimated scene geometry, and by divergent behavior at the start of training. We address this by regularizing the geometry and appearance of patches rendered from unobserved viewpoints, and annealing the ray sampling space during training. We additionally use a normalizing flow model to regularize the color of unobserved viewpoints. Our model outperforms not only other methods that optimize over a single scene, but in many cases also conditional models that are extensively pre-trained on large multi-view datasets.
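The abstract mentions annealing the ray sampling space during training. One simple way to do this is to shrink the near/far interval around its midpoint at the start of optimization and widen it back to the full range over a fixed number of iterations. The sketch below illustrates that idea. The linear schedule and the names `annealed_bounds`, `anneal_steps`, and `start_frac` are illustrative assumptions; this is not the released implementation.

```python
def annealed_bounds(step, near, far, anneal_steps, start_frac=0.5):
    """Return the (near, far) sampling bounds to use at training step `step`.

    Hypothetical schedule: the interval is centred on the midpoint of
    [near, far] and scaled by eta = step / anneal_steps, clamped to
    [start_frac, 1]. Early steps therefore sample only a central fraction
    of the scene, and the full interval is used from `anneal_steps` onward.
    """
    if anneal_steps <= 0:
        return near, far  # annealing disabled: always use the full range
    mid = 0.5 * (near + far)
    # Linear ramp, clamped below by start_frac and above by 1.0.
    eta = min(max(step / anneal_steps, start_frac), 1.0)
    return mid + (near - mid) * eta, mid + (far - mid) * eta


if __name__ == "__main__":
    for step in (0, 50, 75, 100, 200):
        print(step, annealed_bounds(step, near=2.0, far=6.0, anneal_steps=100))
```

Restricting early samples to the centre of the scene is intended to keep the optimization from placing density near the cameras before the geometry has settled. The clamp keeps the interval from collapsing to a single depth at step 0.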
TL;DR: We regularize unseen views during optimization to enable view synthesis from sparse inputs with as few as 3 input images.
NeRF optimizes the reconstruction loss for a given set of input images (blue cameras). For sparse inputs, however, this leads to degenerate solutions. In this work, we propose to sample unobserved views (red cameras) and regularize the geometry and appearance of patches rendered from those views. More specifically, we cast rays through the scene and render patches from unobserved viewpoints for a given radiance field f. We then regularize appearance by feeding the predicted RGB patches through a trained normalizing flow model φ and maximizing the predicted log-likelihood. We regularize geometry by enforcing a smoothness loss on the rendered depth patches. Further, we avoid divergence at early stages of optimization by annealing the scene sampling space over the first iterations. Our approach leads to 3D-consistent representations even for sparse input scenarios with as few as 3 input views, from which realistic novel views can be rendered.
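The geometry regularizer penalizes depth changes between neighbouring pixels of a depth patch rendered from an unobserved viewpoint. The sketch below shows one common form of such a penalty: the sum of squared differences between vertically and horizontally adjacent depths. The exact form, the function name `depth_smoothness_loss`, and the plain-list representation are illustrative assumptions. A real pipeline would compute this on differentiable arrays (for example in JAX) and average it over a batch of patches.

```python
def depth_smoothness_loss(depth_patch):
    """Squared-difference smoothness penalty on a rendered depth patch.

    `depth_patch` is a rectangular list of rows of expected ray depths.
    Every pixel is compared with its neighbour below and to the right, and
    the squared differences are summed. A constant-depth patch costs zero.
    """
    rows, cols = len(depth_patch), len(depth_patch[0])
    loss = 0.0
    for i in range(rows):
        for j in range(cols):
            d = depth_patch[i][j]
            if i + 1 < rows:  # vertical neighbour
                loss += (d - depth_patch[i + 1][j]) ** 2
            if j + 1 < cols:  # horizontal neighbour
                loss += (d - depth_patch[i][j + 1]) ** 2
    return loss


if __name__ == "__main__":
    print(depth_smoothness_loss([[1.0, 1.0], [1.0, 1.0]]))  # flat patch
    print(depth_smoothness_loss([[1.0, 2.0], [3.0, 4.0]]))  # sloped patch
```

Because the penalty only looks at neighbouring pixels, it favours locally smooth surfaces without pinning the absolute depth. The reconstruction loss on the observed views still determines where the surfaces lie.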
While mip-NeRF leads to degenerate view synthesis and predicted scene geometry, our method enables realistic view synthesis from 3 input views.
For 6 input views, mip-NeRF improves but predicted renderings and the optimized scene geometry still contain floating artifacts. Our approach leads to smooth predicted scene geometry and realistic novel views.
For 9 input views, mip-NeRF and our method both produce high-quality view synthesis. For mip-NeRF, however, small floating artifacts near the table are still visible in far-away novel views, whereas our predicted scene geometry appears more realistic.
If you want to cite our work, please use:
@InProceedings{Niemeyer2021Regnerf,
  author    = {Michael Niemeyer and Jonathan T. Barron and Ben Mildenhall and Mehdi S. M. Sajjadi and Andreas Geiger and Noha Radwan},
  title     = {RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs},
  booktitle = {Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2022},
}