FineRecon Depth-aware Feed-forward Network for Detailed 3D Reconstruction

This is my reading note for FineRecon: Depth-aware Feed-forward Network for Detailed 3D Reconstruction. It proposes a high-detail surface reconstruction algorithm based on a voxel volume and multi-view geometry. It has two major novelties: it improves reconstruction accuracy with a novel MVS depth-guidance strategy, and it enables the reconstruction of sub-voxel detail with a novel TSDF prediction architecture that can be queried at any 3D point, using point back-projected fine-grained image features.
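To make "point back-projected image features" concrete, here is a minimal sketch of the general idea: project a 3D query point into an image with a pinhole camera, then bilinearly sample a feature map at that pixel. This is not FineRecon's implementation. The function names, the single-channel feature map, and the assumption that the point is already in camera coordinates are all hypothetical simplifications.

```python
def project_point(point_cam, fx, fy, cx, cy):
    """Project a 3D point (camera coordinates) to pixel coordinates (u, v).

    Assumes a simple pinhole model; returns None for points behind the camera.
    """
    x, y, z = point_cam
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def bilinear_sample(feat, u, v):
    """Bilinearly interpolate a 2D feature map (list of rows) at pixel (u, v).

    Returns None when the pixel falls outside the map.
    """
    h, w = len(feat), len(feat[0])
    if not (0 <= u <= w - 1 and 0 <= v <= h - 1):
        return None
    u0, v0 = int(u), int(v)
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    du, dv = u - u0, v - v0
    top = feat[v0][u0] * (1 - du) + feat[v0][u1] * du
    bottom = feat[v1][u0] * (1 - du) + feat[v1][u1] * du
    return top * (1 - dv) + bottom * dv


# Toy 2x2 single-channel feature map (hypothetical values).
feat = [[0.0, 1.0],
        [2.0, 3.0]]

pixel = project_point((0.5, 0.5, 1.0), fx=1.0, fy=1.0, cx=0.0, cy=0.0)
feature = bilinear_sample(feat, *pixel)
```

Because the sample is interpolated at a continuous pixel location, the query point does not have to sit on a voxel center, which is what allows a prediction at any 3D point.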

AVIS Autonomous Visual Information Seeking with Large Language Models

This is my reading note on AVIS Autonomous Visual Information Seeking with Large Language Models. The paper proposes a method for using an LLM to call tools or APIs to answer different visual questions. The biggest contribution is that the paper collects data on how real humans use the same set of tools and APIs to answer different visual questions. The collected data is used to generate a transition graph between states and the actions to take.

Billion-scale similarity search with GPUs

This is my reading note on Billion-scale similarity search with GPUs. FAISS (Facebook AI Similarity Search) is an open-source library that allows developers to quickly search for similar embeddings of multimedia documents. FAISS uses indexing structures such as locality-sensitive hashing (LSH), inverted file indexes (IVF), and product quantization (PQ) to speed up the search. It also supports GPUs, which can further accelerate the search. FAISS was developed by Facebook AI Research (FAIR).
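As a rough illustration of the inverted-file (IVF) idea mentioned above, here is a minimal pure-Python sketch. It does not use the FAISS API. All function names are hypothetical, and the centroids are given by hand rather than learned with k-means as a real index would do.

```python
def l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, v):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: l2(centroids[i], v))


def build_ivf(vectors, centroids):
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for idx, v in enumerate(vectors):
        lists[nearest(centroids, v)].append(idx)
    return lists


def search_ivf(query, vectors, centroids, lists, k=1, nprobe=1):
    """Scan only the nprobe closest inverted lists and return the top-k ids."""
    probe = sorted(range(len(centroids)),
                   key=lambda i: l2(centroids[i], query))[:nprobe]
    candidates = [idx for c in probe for idx in lists[c]]
    candidates.sort(key=lambda idx: l2(vectors[idx], query))
    return candidates[:k]


# Toy data: two clusters and hand-picked centroids (assumed, not learned).
vectors = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centroids = [(0.0, 0.0), (5.0, 5.0)]

lists = build_ivf(vectors, centroids)
result = search_ivf((4.9, 5.1), vectors, centroids, lists, k=1, nprobe=1)
```

The speedup comes from scanning only the vectors in the probed lists instead of the whole database. Raising `nprobe` trades speed for recall.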

ProPainter Improving Propagation and Transformer for Video Inpainting

This is my reading note for ProPainter: Improving Propagation and Transformer for Video Inpainting. This paper proposes a video inpainting method that removes objects from video while preserving spatio-temporal consistency. The method is built on a flow-based transformer. It makes two contributions: 1) a consistency check that improves flow-based propagation, applied in both the image and feature domains; 2) a reduction in the number of tokens of the transformer, both spatially and temporally.
