Tutorial at IEEE ISMAR 2023
A Beginner's Guide to Neural Rendering
WORK IN PROGRESS
Updated 20 October 2023, 11:45 am
You may have heard of NeRFs (Neural Radiance Fields), or neural rendering more generally. Neural rendering brings together deep learning and computer graphics to generate extremely compelling 3D content from a set of 2D images. In this full-day tutorial, we'll start by learning the core principles of neural networks, deep learning, and volume rendering to prepare ourselves to scale NeRF Mountain. Later in the day, we'll dissect the original NeRF paper in detail, explore extensions to the method and advancements in neural rendering, and see a lot of cool examples. We'll close with a forward-looking discussion of the opportunities and challenges of using neural rendering in MR. By the end of this tutorial, you should have a solid grasp of the NeRF method and its underlying technologies, including neural networks and deep learning.
Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains
In general, MLP networks (such as those employed in the NeRF paper) struggle to learn high-frequency functions. This paper demonstrates that a simple Fourier feature mapping of the input before passing it to the MLP network yields substantially better results.
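The mapping itself is tiny: project the input through a random Gaussian matrix B, then take sines and cosines. A minimal sketch (the function name, shapes, and the choice of sigma here are illustrative):

```python
import numpy as np

def fourier_features(v, B):
    """Gaussian Fourier feature mapping: gamma(v) = [sin(2*pi*B v), cos(2*pi*B v)].

    v: input coordinates, shape (..., d)
    B: random projection matrix, shape (m, d), entries drawn from N(0, sigma^2)
    returns features of shape (..., 2m)
    """
    proj = 2.0 * np.pi * (v @ B.T)                              # (..., m)
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

rng = np.random.default_rng(0)
B = rng.normal(0.0, 10.0, size=(256, 3))   # sigma controls the frequency bandwidth
x = rng.uniform(size=(5, 3))               # five 3D points
feats = fourier_features(x, B)             # shape (5, 512), fed to the MLP
```

The B matrix is fixed (not trained); sigma is the one hyperparameter, trading off over- versus under-fitting of high-frequency detail.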
NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections
Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields
Base NeRF works well on well-structured data, i.e., data where all of the viewing positions are at approximately the same distance from the model. But base NeRF has no concept of scale or aliasing, so it struggles when resolution or scale changes.
By simply changing the feature mapping - which effectively creates an image pyramid, or mipmap, hence the name - we can achieve markedly superior results.
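Concretely, Mip-NeRF replaces per-point positional encoding with an integrated positional encoding over a Gaussian that approximates each conical frustum of a pixel's viewing cone. A sketch of that encoding, assuming a diagonal covariance (names are illustrative):

```python
import numpy as np

def integrated_pos_enc(mean, var, num_freqs):
    """Expected positional encoding of a Gaussian with the given mean and
    diagonal variance, using E[sin(a x)] = sin(a*mu) * exp(-0.5 * a^2 * var).

    High frequencies are damped by exp(-0.5 * (2^l)^2 * var): a wide Gaussian
    (a fat, distant cone section) contributes little high-frequency detail,
    which is what gives the mipmap-like anti-aliasing behaviour.
    """
    feats = []
    for l in range(num_freqs):
        scale = 2.0 ** l
        damp = np.exp(-0.5 * (scale ** 2) * var)
        feats.append(np.sin(scale * mean) * damp)
        feats.append(np.cos(scale * mean) * damp)
    return np.concatenate(feats, axis=-1)

pe_sharp = integrated_pos_enc(np.array([0.5]), np.array([0.0]), 4)    # point-like
pe_blur = integrated_pos_enc(np.array([0.5]), np.array([100.0]), 4)   # wide Gaussian
```

With zero variance this reduces to standard NeRF positional encoding; with large variance every frequency is attenuated toward zero.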
D-NeRF: Neural Radiance Fields for Dynamic Scenes
D-NeRF (short for Dynamic-NeRF) extends NeRF to be able to handle animation sequences involving deformable objects. Naïvely adding the time index as a 6th parameter to the NeRF neural network does not yield satisfactory results, as it fails to exploit temporal redundancies in the input stream.
D-NeRF solves this problem by training two NNs instead of one: The first NN maps each point of the scene at time t back to that point in a canonical space - for convenience, the location of that point at t = 0. The second NN is a conventional NeRF network that computes the color and opacity of the transformed point from the desired viewing angle.
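The composition of the two networks can be sketched in a few lines; the toy networks below are hypothetical stand-ins for the trained MLPs:

```python
import numpy as np

def render_point(x, t, view_dir, deform_net, canonical_nerf):
    """D-NeRF-style query: deform the point back to canonical (t = 0) space,
    then query a standard NeRF there.

    deform_net(x, t) -> delta_x      (displacement to the canonical frame;
                                      constrained to be zero at t = 0)
    canonical_nerf(x, d) -> (rgb, sigma)
    """
    delta_x = deform_net(x, t)
    return canonical_nerf(x + delta_x, view_dir)

# Toy stand-ins: a rigid drift along x over time, and a trivial radiance field.
toy_deform = lambda x, t: t * np.array([0.1, 0.0, 0.0])
toy_nerf = lambda x, d: (np.clip(x, 0.0, 1.0), float(np.sum(x)))

rgb, sigma = render_point(np.array([0.2, 0.3, 0.4]), 0.0,
                          np.array([0.0, 0.0, 1.0]), toy_deform, toy_nerf)
```

At t = 0 the deformation is the identity, so the query reduces to a plain NeRF lookup in the canonical space.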
pixelNeRF: Neural Radiance Fields from One or Few Images
Both pixelNeRF and the subsequent reference, RegNeRF, address the data-hungry nature of base NeRF. Base NeRF produces very compelling results when tens of input images are available; pixelNeRF and RegNeRF present different strategies for reducing this number to a few, or even just one.
pixelNeRF, like D-NeRF, uses two NNs to accomplish this goal. pixelNeRF pretrains a convolutional NN (CNN) on ImageNet data, which is used as a "prior" for the conventional NeRF network. When the desired viewing position/direction is similar to an input view, the input view is weighted more heavily; when it is substantially different, the pretrained prior is weighted more heavily.
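Mechanically, the CNN conditions the NeRF MLP by projecting each 3D query point into the input view and sampling the feature map at that pixel. A simplified sketch (pixelNeRF uses bilinear interpolation; nearest-neighbour lookup here keeps it short, and all names are illustrative):

```python
import numpy as np

def sample_image_feature(feat_map, point_cam, focal, center):
    """Project a 3D point (in the input camera's coordinates) onto the input
    image and fetch the CNN feature at that pixel.  The returned vector is
    concatenated with the positional encoding as input to the NeRF MLP."""
    u = int(round(focal * point_cam[0] / point_cam[2] + center[0]))
    v = int(round(focal * point_cam[1] / point_cam[2] + center[1]))
    h, w = feat_map.shape[:2]
    u = min(max(u, 0), w - 1)   # clamp to the image bounds
    v = min(max(v, 0), h - 1)
    return feat_map[v, u]

# Toy 4x4 feature map with 2 channels; a point on the optical axis at depth 1
# projects to the principal point (2, 2).
fm = np.arange(32, dtype=float).reshape(4, 4, 2)
f = sample_image_feature(fm, np.array([0.0, 0.0, 1.0]), 1.0, (2, 2))
```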
RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs
RegNeRF takes as its starting point Mip-NeRF and then, similar to pixelNeRF, attempts to produce compelling results for small sets of input images, in this case as few as 3.
To accomplish this, it generates additional input viewpoints that better sample the pose space, and modifies the standard NeRF cost/loss function to regularize the geometry and color of patches observed from those viewpoints.
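The geometry part of that regularizer is essentially a depth-smoothness penalty on small patches rendered from the unobserved viewpoints. A minimal sketch of such a penalty (a simplified stand-in, not the paper's exact formulation):

```python
import numpy as np

def depth_smoothness_loss(depth_patch):
    """Penalise depth differences between neighbouring pixels of a rendered
    patch, discouraging the floaters and broken geometry that plague NeRF
    when only a handful of input views are available."""
    dx = depth_patch[:, 1:] - depth_patch[:, :-1]   # horizontal neighbours
    dy = depth_patch[1:, :] - depth_patch[:-1, :]   # vertical neighbours
    return float((dx ** 2).sum() + (dy ** 2).sum())

flat = depth_smoothness_loss(np.ones((4, 4)))                     # smooth: 0
ramp = depth_smoothness_loss(np.tile(np.arange(4.0), (4, 1)))     # sloped: > 0
```

This term is added to the usual photometric loss, alongside a separate regularizer on the appearance of the same patches.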
KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs
NeRFs are visually impressive, but out of the box, they are slow. Rendering a NeRF involves making millions of queries on a deep MLP NN, each of which consists of ~1M floating point operations (FLOPs). The key insight of KiloNeRF is to replace the single (big) MLP with many (tiny) MLPs.
The scene is subdivided into a coarse voxel grid, each cell of which corresponds to one of the small MLPs. These small MLPs are then trained using teacher-student distillation from a base NeRF model to maintain visual quality.
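The routing step is just arithmetic: map a 3D sample point to the grid cell that owns it, then query that cell's tiny MLP. A sketch of the lookup (names and the flattening convention are illustrative):

```python
import numpy as np

def mlp_index(x, scene_min, scene_max, grid_res):
    """Return the flat index of the tiny MLP responsible for point x,
    for a scene bounding box subdivided into grid_res^3 voxels."""
    rel = (x - scene_min) / (scene_max - scene_min)            # normalise to [0, 1]
    ijk = np.clip((rel * grid_res).astype(int), 0, grid_res - 1)
    return int(ijk[0] * grid_res * grid_res + ijk[1] * grid_res + ijk[2])

lo, hi = np.zeros(3), np.ones(3)
first = mlp_index(lo, lo, hi, 16)   # corner cell
last = mlp_index(hi, lo, hi, 16)    # opposite corner cell
```

Each tiny MLP only has to model its own voxel's contents, so it can be orders of magnitude smaller (and faster to evaluate) than the single network it replaces.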
Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields
Mip-NeRF 360, as the name suggests, is an extension of Mip-NeRF with optimizations that enable it to perform well on a particular type of dataset: unbounded (in depth) images taken from a 360° orbit of an object or objects of interest. It employs non-linear scene parameterization (the 2D mip-NeRF Gaussians are mapped into a 3D "ball"), online distillation (a simpler "coarse" network), and a novel distortion-based regularizer (which addresses "floaters" and "background collapse") to do so.
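The scene parameterization is the easiest of the three to show. Sketched here for a single point (the paper applies it to the mip-NeRF Gaussians via a linearization):

```python
import numpy as np

def contract(x):
    """Mip-NeRF 360's scene contraction: points inside the unit ball are left
    alone; points outside are squashed into the shell between radius 1 and 2,
    so an unbounded scene fits in a ball of radius 2."""
    n = np.linalg.norm(x)
    if n <= 1.0:
        return x
    return (2.0 - 1.0 / n) * (x / n)

near = contract(np.array([0.5, 0.0, 0.0]))    # unchanged
far = contract(np.array([10.0, 0.0, 0.0]))    # pulled in toward radius 2
```

Distant background content thus gets proportionally less of the representation's capacity, matching how little detail it contributes to any rendered pixel.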
TensoRF: Tensorial Radiance Fields
TensoRF is perhaps the method that is "most different" from base NeRF of those on this page, certainly of those addressed so far. While NeRF-based methods model radiance fields purely as the weights and biases of an MLP NN, TensoRF explicitly models a radiance field as a 4D tensor: a voxel grid (3D) with per-voxel multi-channel features. The authors then perform various decompositions of this 4D tensor into compact matrix and vector factors.
This new representation significantly improves reconstruction (training) time and delivers visually and quantitatively superior results to base NeRF, while maintaining a very small memory footprint.
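The memory win is easy to see with the simplest of those decompositions, a CP (rank-1 sum) factorization. Sketched here for a single-channel density grid rather than the paper's full multi-channel tensor:

```python
import numpy as np

def cp_density(vx, vy, vz):
    """Reconstruct a dense N^3 grid from R rank-1 components.

    vx, vy, vz: factor vectors along each axis, each of shape (R, N).
    Storage is 3*R*N numbers instead of N^3 for the dense grid.
    """
    return np.einsum('ri,rj,rk->ijk', vx, vy, vz)

# One rank-1 component of all-ones reconstructs a constant grid.
ones = np.ones((1, 4))
grid = cp_density(ones, ones, ones)   # shape (4, 4, 4)
```

For, say, R = 96 components on a 300-voxel-per-side grid, that is roughly 86K numbers per channel versus 27M for the dense grid, which is where the small footprint comes from.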
Block-NeRF: Scalable Large Scene Neural View Synthesis
Block-NeRF builds on techniques from NeRF in the Wild and Mip-NeRF to generate coherent city-scale radiance fields from millions of input images.
To accomplish this, it trains many individual NNs - in this work, roughly one per city block - using a combination of NeRF-W and mip-NeRF techniques, and composites them via a simple interpolation weighted by each block's distance from the viewpoint.
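The compositing step amounts to an inverse-distance weighted blend of the per-block renders. A simplified sketch (the paper additionally culls far-away blocks and aligns per-image appearance codes before blending):

```python
import numpy as np

def composite(renders, block_centers, viewpoint, p=4):
    """Blend per-block renders with weights proportional to each block
    centre's inverse distance from the viewpoint, raised to the power p.

    renders: shape (B, H, W, 3); block_centers: shape (B, 3).
    """
    d = np.linalg.norm(block_centers - viewpoint, axis=1)
    w = d ** (-p)
    w = w / w.sum()                          # normalise to a convex combination
    return np.tensordot(w, renders, axes=1)  # weighted sum over the block axis

renders = np.stack([np.zeros((2, 2, 3)), np.ones((2, 2, 3))])
centers = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
out = composite(renders, centers, np.array([1.0, 0.0, 0.0]))
```

Near a block boundary both neighbours get meaningful weight, so the transition between blocks stays smooth.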
NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images
Base NeRF works well on clean (i.e., not noisy), low dynamic range, sRGB images. RawNeRF demonstrates the utility of the NeRF pipeline applied to noisy unprocessed camera output, which can preserve more of the scene's dynamic range.
It turns out that the NeRF pipeline is extremely robust to zero-mean camera noise, enabling robust denoising and manipulation of focus, exposure, and tonemapping in the resulting images, in addition to the novel view synthesis most associated with NeRF.
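The key training change is the loss: a plain L2 in raw space would be dominated by bright pixels, so RawNeRF rescales each pixel's error by (a stop-gradient of) the predicted intensity. A sketch of that weighting (simplified; in a real framework the weight would be wrapped in a stop-gradient op):

```python
import numpy as np

def rawnerf_weighted_loss(pred, target, eps=1e-3):
    """Relative L2 loss in linear raw space: dividing each residual by the
    (detached) prediction gives dark regions comparable influence to bright
    ones, approximating an L2 on tonemapped values."""
    w = 1.0 / (np.maximum(pred, 0.0) + eps)   # stop-gradient in practice
    return float(np.mean(((pred - target) * w) ** 2))

# The same absolute error counts for far more in a dark pixel than a bright one.
dark = rawnerf_weighted_loss(np.array([0.01]), np.array([0.02]))
bright = rawnerf_weighted_loss(np.array([0.50]), np.array([0.51]))
```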