The term power of attorney (POA) refers to a legal authorization that gives a designated person the power to act for someone else. As such, a POA gives the agent or attorney-in-fact the authority to act on behalf of the principal. The agent may be given broad or limited authority to make decisions about the principal’s property, finances, investments, or medical care.
pixelNeRF: Neural Radiance Fields from One or Few Images tries to learn a continuous neural scene representation from one or few input images. To this end, pixelNeRF introduces an architecture that conditions a NeRF on image inputs in a fully convolutional manner. This allows the network to be trained across multiple scenes to learn a scene prior, enabling it to perform novel view synthesis in a feed-forward manner from a sparse set of views (as few as one).
NeuMan: Neural Human Radiance Field from a Single Video proposes a novel framework to reconstruct the human and the scene so that both can be rendered with novel human poses and views from just a single in-the-wild video. Given a video captured by a moving camera, the method trains two NeRF models: a human NeRF model (conditioned on SMPL) and a scene NeRF model. It is able to learn subject-specific details, including cloth wrinkles and accessories, from just a 10-second video clip, and to provide high-quality renderings of the human under novel poses, from novel views, together with the background.
Nerfies: Deformable Neural Radiance Fields presents the first method capable of photorealistically reconstructing deformable scenes using photos/videos captured casually from mobile phones. The approach augments neural radiance fields (NeRF) by optimizing an additional continuous volumetric deformation field that warps each observed point into a canonical 5D NeRF. To avoid local minima, the authors propose a coarse-to-fine optimization method for coordinate-based models that allows for more robust optimization. To avoid overfitting, they propose an elastic regularization of the deformation field that further improves robustness.
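The coarse-to-fine idea can be illustrated with a windowed positional encoding, where high-frequency bands are faded in as a parameter `alpha` grows during training. The sketch below is only a minimal pure-Python illustration of that windowing, assuming a cosine easing window per frequency band; the function names and the 1D input are my own simplifications, not the authors' code.

```python
import math


def window_weight(alpha, band):
    """Cosine easing weight in [0, 1] for frequency band `band`.

    The band is fully off while alpha <= band and fully on once
    alpha >= band + 1 (assumed window shape for this sketch).
    """
    x = min(max(alpha - band, 0.0), 1.0)
    return 0.5 * (1.0 - math.cos(math.pi * x))


def windowed_posenc(x, num_bands, alpha):
    """Positional encoding of a scalar x with per-band window weights."""
    feats = []
    for band in range(num_bands):
        w = window_weight(alpha, band)
        freq = (2.0 ** band) * math.pi
        feats.append(w * math.sin(freq * x))
        feats.append(w * math.cos(freq * x))
    return feats


if __name__ == "__main__":
    # Early in training only the low-frequency bands contribute.
    print(windowed_posenc(0.3, num_bands=4, alpha=1.0))
    # Later, every band is switched on.
    print(windowed_posenc(0.3, num_bands=4, alpha=4.0))
```

Raising `alpha` linearly from 0 to `num_bands` over training lets the model fit smooth, low-frequency deformations first and add detail later.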
This note discusses NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections. The central limitation of NeRF that NeRF-W addresses is its assumption that the world is geometrically, materially, and photometrically static, that is, that the density and radiance of the world are constant. NeRF-W instead models per-image appearance variations (such as exposure, lighting, and weather) and models the scene as the union of shared and image-dependent elements, thereby enabling the unsupervised decomposition of scene content into “static” and “transient” components.
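To make the static/transient split concrete, here is a small sketch of how two sets of densities and colors can be composited along one ray. It assumes the usual NeRF quadrature with alpha values `1 - exp(-sigma * delta)` and a transmittance that accumulates both densities; the appearance embeddings and the uncertainty term of NeRF-W are deliberately left out, and the function name is hypothetical.

```python
import math


def render_ray(static, transient, deltas):
    """Composite static and transient samples along a single ray.

    static, transient: lists of (sigma, (r, g, b)) per sample.
    deltas: distance between consecutive samples.
    Returns the rendered RGB color as a list of three floats.
    """
    color = [0.0, 0.0, 0.0]
    optical_depth = 0.0  # accumulated (sigma_static + sigma_transient) * delta
    for (s_sigma, s_rgb), (t_sigma, t_rgb), delta in zip(static, transient, deltas):
        transmittance = math.exp(-optical_depth)
        alpha_static = 1.0 - math.exp(-s_sigma * delta)
        alpha_transient = 1.0 - math.exp(-t_sigma * delta)
        for c in range(3):
            color[c] += transmittance * (
                alpha_static * s_rgb[c] + alpha_transient * t_rgb[c]
            )
        optical_depth += (s_sigma + t_sigma) * delta
    return color


if __name__ == "__main__":
    # A transient occluder (green) in front of a static surface (red).
    static = [(0.0, (0.0, 0.0, 0.0)), (5.0, (1.0, 0.0, 0.0))]
    transient = [(0.5, (0.0, 1.0, 0.0)), (0.0, (0.0, 0.0, 0.0))]
    print(render_ray(static, transient, [1.0, 1.0]))
```

Setting every transient density to zero reduces this to plain NeRF rendering, which is how a "static only" image of the scene can be produced.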
This is my reading note for GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields. The paper aims to provide more control over NeRF-based 3D object rendering, for example moving objects in the 3D scene or adding and deleting objects. To achieve this, GIRAFFE models the objects and the background in the scene separately and then composites them together for rendering. In addition, unlike NeRF, GIRAFFE uses a learned discriminator instead of an L2 or L1 loss as its loss function, so it is a GAN.
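The compositing step can be sketched for a single 3D point: each object (and the background) predicts a density and a feature vector, the densities are summed, and the features are averaged with the densities as weights. The code below is a minimal illustration under that assumption; the function name and the plain-list data layout are mine.

```python
def composite(densities, features):
    """Combine per-entity predictions at one 3D point.

    densities: one non-negative density per entity (objects + background).
    features: one feature vector (list of floats) per entity.
    Returns (total_density, density_weighted_mean_feature).
    """
    dim = len(features[0])
    total = sum(densities)
    if total == 0.0:
        # Empty space: no entity contributes a feature here.
        return 0.0, [0.0] * dim
    combined = [
        sum(d * f[k] for d, f in zip(densities, features)) / total
        for k in range(dim)
    ]
    return total, combined


if __name__ == "__main__":
    # Two entities overlap at this point; the denser one dominates.
    print(composite([1.0, 3.0], [[1.0, 0.0], [0.0, 1.0]]))  # (4.0, [0.25, 0.75])
```

Because every entity is evaluated separately, moving or removing an object only changes its own contribution before this combination step.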
Instant Neural Graphics Primitives with a Multiresolution Hash Encoding tries to reduce inference cost with a versatile new input encoding that permits the use of a smaller network without sacrificing quality. This is achieved by augmenting a small neural network with a multiresolution hash table of trainable feature vectors whose values are optimized through stochastic gradient descent.
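A rough 2D sketch of the encoding may help: at each resolution level the query point is located in a grid, the corner indices are hashed into a fixed-size table of feature vectors, and the corner features are bilinearly interpolated and concatenated across levels. The class name, the default sizes, and the random initialization range are assumptions for illustration; in the real method the table entries are trained together with the network, which is not shown here.

```python
import math
import random

# Per-dimension multipliers for the spatial hash (first one is 1).
PRIMES = (1, 2654435761)


def hash_index(ix, iy, table_size):
    """Hash integer grid coordinates into a table slot via XOR of products."""
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1])) % table_size


class HashEncoding2D:
    def __init__(self, num_levels=4, table_size=2 ** 10, feat_dim=2,
                 base_res=4, growth=2.0, seed=0):
        rng = random.Random(seed)
        self.table_size = table_size
        self.feat_dim = feat_dim
        # Grid resolution grows geometrically from coarse to fine.
        self.resolutions = [
            int(math.floor(base_res * growth ** level))
            for level in range(num_levels)
        ]
        # One table of small random feature vectors per level (untrained).
        self.tables = [
            [[rng.uniform(-1e-4, 1e-4) for _ in range(feat_dim)]
             for _ in range(table_size)]
            for _ in range(num_levels)
        ]

    def encode(self, x, y):
        """Encode a point in [0, 1]^2 as concatenated per-level features."""
        out = []
        for res, table in zip(self.resolutions, self.tables):
            gx, gy = x * res, y * res
            x0, y0 = int(math.floor(gx)), int(math.floor(gy))
            tx, ty = gx - x0, gy - y0
            feat = [0.0] * self.feat_dim
            # Bilinear interpolation over the four surrounding grid corners.
            for dx, wx in ((0, 1.0 - tx), (1, tx)):
                for dy, wy in ((0, 1.0 - ty), (1, ty)):
                    entry = table[hash_index(x0 + dx, y0 + dy, self.table_size)]
                    for k in range(self.feat_dim):
                        feat[k] += wx * wy * entry[k]
            out.extend(feat)
        return out


if __name__ == "__main__":
    enc = HashEncoding2D()
    print(len(enc.encode(0.3, 0.7)))  # num_levels * feat_dim values
```

The encoded vector would then be fed to a small MLP; most of the model capacity lives in the tables, which is why the network itself can stay small.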
This is my 4th note on diffusion models. For the previous notes, please refer to diffusion and stable diffusion. The content is based on the papers listed in Diffusion Explained and Diffusion Models: A Comprehensive Survey of Methods and Applications.
This is my reading note on Hierarchical Text-Conditional Image Generation with CLIP Latents. This paper proposes a two-stage model (unCLIP) for generating images from text: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding.
This is my 2nd reading note on diffusion models, which focuses on Stable Diffusion, aka High-Resolution Image Synthesis with Latent Diffusion Models. By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. However, as mentioned in diffusion, DMs suffer from high computational cost. The proposed Latent Diffusion Models (LDMs) reduce the computational cost by operating in a latent space and introduce cross-attention to enable multi-modal conditioning.
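The cross-attention conditioning can be sketched as scaled dot-product attention in which the queries come from the latent image features and the keys and values come from the conditioning tokens (for example, encoded text). This pure-Python version is only an illustration: it is single-head and omits the learned query, key, and value projections, and the function names are my own.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head attention: latent queries attend to context tokens.

    queries: one vector per latent position.
    keys, values: one vector each per conditioning token.
    Returns one attended value vector per latent position.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


if __name__ == "__main__":
    latent = [[1.0, 0.0], [0.0, 1.0]]     # two latent positions
    ctx_keys = [[1.0, 0.0], [0.0, 1.0]]   # two conditioning tokens
    ctx_vals = [[10.0, 0.0], [0.0, 10.0]]
    print(cross_attention(latent, ctx_keys, ctx_vals))
```

Each latent position ends up with a mixture of conditioning values weighted by similarity, which is how the denoiser can be steered by text or other modalities.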