HeadNeRF: A Real-time NeRF-based Parametric Head Model provides a parametric head model which could generates photorealistic face images conditioned on identity, expression, head pose and appearance (lighting). Compared with traditional mesh and texture, it provides higher fidelity, inherently differetiable and doesn’t required a 3D dataset; compared with GAN, it provides rendering at different head pose with accurate 3D information. This is achived with NeRF. In addition, it could render in real time (5ms) with a model GPU.
This my reading note on Learning Transferable Visual Models From Natural Language Supervision. The proposed method is called Contrastive Language-Image Pre-training or CLIP. State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. We demonstrate that the simple pre-training task of predicting which caption (freeform text instead of strict labeling) goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks.
This is my note on rotation in 3D space. There are many different ways of representating the rotation in 3D space, e.g., 3x3 rotation matrix, Euler angle (pitch, yaw and roll), Rodrigues axis-angle representation and quanterion. The relationship and conversion between those representation will be described as below. You could also use scipy.spatial.transform.Rotation to convert between methods.
The term power of attorney (POA) refers to a legal authorization that gives a designated person the power to act for someone else. As such, a POA gives the agent or attorney-in-fact the authority to act on behalf of the principal. The agent may be given broad or limited authority to make decisions about the principal’s property, finances, investments, or medical care.
pixelNeRF: Neural Radiance Fields from One or Few Images tries to learn a discontinuous neutral scene representation from one or few input images. To this end, pixelNeRF introduced an architecture that conditions a NeRF on image inputs in a fully convolutional manner. This allows the network to be trained across multiple scenes to learn a scene prior, enabling it to perform novel view synthesis in a feed-forward manner from a sparse set of views (as few as one).
NeuMan: Neural Human Radiance Field from a Single Video proposes a novel framework to reconstruct the human and the scene that can be ren- dered with novel human poses and views from just a single in-the-wild video. Given a video captured by a moving camera, we train two NeRF models: a human NeRF model (condition on SMPL) and a scene NeRF model. Our method is able to learn subject specific details, including cloth wrinkles and ac- cessories, from just a 10 seconds video clip, and to provide high quality renderings of the human under novel poses, from novel views, together with the background.
Nerfies: Deformable Neural Radiance Fields present the first method capable ofphotorealistically reconstructing deformable scenes using photos/videos cap- tured casually from mobile phones. Our approach augments neural radiance fields (NeRF) by optimizing an additional continuous volumetric deformation field that warps each observed point into a canonical 5D NeRF. To avoid local minima, we propose a coarse-to-fine optimization method for coordinate-based models that allows for more robust optimization. To avoid overfit, we propose an elastic regularization ofthe deformation field that further improves robustness.
This note discusses NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections. NeRF-W addresses the central limitation of NeRF that we address here is its assumption that the world is geometrically, materially, and photometrically static — that the density and radiance of the world is constant. NeRF-W instead models per-image appearance variations (such as exposure, lighting, weather) as well as model the scene as the union of shared and image-dependent elements, thereby enabling the unsuper- vised decomposition of scene content into “static” and “transient” components.
This is my reading note for GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields. The paper aims to provide more control to 3D object rendering NeRF. For example moving the objects in the 3D scene, adding/deleting objects and so on. To acheive this, GIRAFFE proposed to model the objects and background in the scene separately and then composite together for the rendering. In addition, different from NeRF, GIRAFFE uses a learned discriminator instead of L2 or L1 loss as loss function, thus it is a GAN.
Instant Neural Graphics Primitives with a Multiresolution Hash Encoding tries to reduce inference cost with a versatile new input encoding that permits the use of a smaller network without sacrificing quality. This is achieved via a small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are op- timized through stochastic gradient descent.