AlphaPose: Multi-Person Human Pose Estimation

This is my reading note for RMPE: Regional Multi-person Pose Estimation; the code is available at MVIG-SJTU/AlphaPose. The paper proposes a novel regional multi-person pose estimation (RMPE) framework that makes pose estimation robust to inaccurate human bounding boxes. The framework consists of three components: the Symmetric Spatial Transformer Network (SSTN), Parametric Pose Non-Maximum-Suppression (NMS), and the Pose-Guided Proposals Generator (PGPG).
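
To illustrate what pose NMS does, here is a minimal greedy sketch in Python with NumPy. It assumes each detected pose is a (K, 2) array of keypoint coordinates with one confidence score. It keeps the highest-scoring pose and drops any remaining pose that lies too close to it. The distance used here (mean keypoint distance normalized by a box scale) and the fixed threshold are simplifying assumptions standing in for the paper's parametric pose distance, whose parameters are learned; the names pose_distance and pose_nms are hypothetical.

```python
import numpy as np

def pose_distance(pose_a, pose_b, box_scale):
    """Mean Euclidean keypoint distance normalized by a reference box scale.
    Simplified stand-in (assumption) for RMPE's parametric pose distance."""
    return float(np.mean(np.linalg.norm(pose_a - pose_b, axis=1)) / box_scale)

def pose_nms(poses, scores, box_scales, threshold=0.1):
    """Greedy NMS over poses.
    poses: (N, K, 2) keypoint coordinates; scores: (N,) pose confidences;
    box_scales: (N,) e.g. sqrt of each box's area. Returns kept indices."""
    order = list(np.argsort(-np.asarray(scores)))  # highest score first
    keep = []
    while order:
        ref = order.pop(0)
        keep.append(int(ref))
        # Eliminate poses that are near-duplicates of the reference pose
        order = [i for i in order
                 if pose_distance(poses[ref], poses[i], box_scales[ref]) > threshold]
    return keep
```

Compared with box-level NMS, the redundancy test here runs on keypoints, so two overlapping boxes that contain different people can both survive.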

Transformer Introduction

This is my reading note for Transformers in Vision: A Survey. Compared with recurrent networks such as Long Short-Term Memory (LSTM), Transformers can model long-range dependencies between elements of an input sequence and process the sequence in parallel. Unlike convolutional networks, Transformers need minimal inductive biases in their design and are naturally suited to act as set functions. Furthermore, their straightforward design lets the same kind of processing blocks handle multiple modalities (e.g., images, videos, text, and speech), and it scales well to very large networks and huge datasets.
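
The parallelism and set-function properties both come from self-attention. Below is a minimal single-head scaled dot-product self-attention sketch in NumPy. The random projection matrices are assumptions made for illustration. All n × n token interactions are computed in one matrix product instead of step by step as in an LSTM. Because no positional encoding is added, permuting the input tokens simply permutes the output rows, which is what makes the block a set function.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.
    x: (n, d) sequence of n tokens; w_q, w_k, w_v: (d, d_k) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (n, n): every token vs every token
    return softmax(scores, axis=-1) @ v       # all positions computed at once

rng = np.random.default_rng(0)
n, d = 5, 8
x = rng.normal(size=(n, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)

# Without positional encodings, permuting the tokens permutes the output
perm = rng.permutation(n)
assert np.allclose(self_attention(x[perm], w_q, w_k, w_v), out[perm])
```

In practice, positional encodings are added to x when token order matters (e.g., image patches in ViT).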

Swin Transformer

ViT demonstrates that transformers alone can serve as a backbone for vision tasks. However, because a transformer performs global self-attention, where the relationship between each token and every other token is computed, its complexity grows quadratically with image size. This makes it inefficient for dense prediction tasks such as semantic segmentation. To address this, the Swin Transformer is proposed in Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. It reduces computation by performing self-attention within local windows, and it stacks layers that operate on windows at different resolutions to form a hierarchy.
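
The sketch below illustrates the pieces described above in NumPy. It covers splitting an (H, W, C) feature map into non-overlapping M × M windows, the cyclic shift that gives the "shifted windows" their name, and 2× patch merging between stages. It also includes the cost formulas from the Swin paper for global MSA, Ω = 4hwC² + 2(hw)²C, and window MSA, Ω = 4hwC² + 2M²hwC. This is an illustrative sketch rather than the official implementation. In particular, it omits the attention mask that the real model applies after the cyclic shift, and the zero projection matrix in the usage lines is just a placeholder for learned weights.

```python
import numpy as np

def window_partition(x, m):
    """Split an (H, W, C) map into non-overlapping m x m windows.
    Returns (num_windows, m*m, C); attention then runs inside each window."""
    h, w, c = x.shape
    assert h % m == 0 and w % m == 0, "H and W must be divisible by m"
    x = x.reshape(h // m, m, w // m, m, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, m * m, c)

def cyclic_shift(x, m):
    """Roll the map by m // 2 so the next layer's windows straddle the
    previous layer's window borders (the shifted-window step)."""
    s = m // 2
    return np.roll(x, shift=(-s, -s), axis=(0, 1))

def patch_merging(x, w_reduce):
    """Downsample 2x: concatenate each 2x2 neighborhood (4C channels),
    then project with w_reduce of shape (4C, 2C)."""
    x0, x1 = x[0::2, 0::2], x[1::2, 0::2]
    x2, x3 = x[0::2, 1::2], x[1::2, 1::2]
    return np.concatenate([x0, x1, x2, x3], axis=-1) @ w_reduce

def msa_flops(h, w, c):
    """Global multi-head self-attention cost: quadratic in h*w."""
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c

def wmsa_flops(h, w, c, m):
    """Window self-attention cost: linear in h*w for a fixed window size m."""
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c

# Usage on a 56 x 56 map with 96 channels and 7 x 7 windows
x = np.random.default_rng(0).normal(size=(56, 56, 96))
windows = window_partition(x, 7)                    # (64, 49, 96)
shifted = window_partition(cyclic_shift(x, 7), 7)   # same shape, shifted grid
merged = patch_merging(x, np.zeros((4 * 96, 2 * 96)))  # (28, 28, 192)
ratio = msa_flops(56, 56, 96) / wmsa_flops(56, 56, 96, 7)
```

Because the window size M is fixed, the attention term grows linearly with the number of patches hw instead of quadratically, and patch merging is what produces the coarser resolutions of the hierarchy.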

CVPR 2021 Transformer Papers

This post summarizes the transformer papers at CVPR 2021, collected from CVPR2021-Papers-with-Code. Because transformers capture the interaction between queries (Q) and keys (K), they have begun to see applications in tracking (e.g., Transformer Tracking; Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking), local feature matching (e.g., LoFTR: Detector-Free Local Feature Matching with Transformers), and image retrieval (e.g., Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers; Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning).
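
The query/key interaction behind these applications can be sketched as cross-attention. Queries come from one set (e.g., a search region, or image A), while keys and values come from another (e.g., a template, or image B). The NumPy sketch below pairs that with a dual-softmax, mutual-nearest-neighbour matcher in the spirit of coarse matching in detector-free matchers such as LoFTR. The identity projections, the temperature of 0.1, and the 0.2 confidence threshold are assumptions for illustration. This is not LoFTR's implementation.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, kv_feats):
    """Queries from one set attend to keys/values from another set.
    q_feats: (n, d), kv_feats: (m, d); learned projections omitted (identity)."""
    scores = q_feats @ kv_feats.T / np.sqrt(q_feats.shape[-1])  # (n, m) Q-K scores
    return softmax(scores, axis=1) @ kv_feats                    # (n, d)

def dual_softmax_matches(feats_a, feats_b, temperature=0.1, threshold=0.2):
    """Match features across two images: softmax the similarity matrix over
    rows and over columns, multiply, and keep confident mutual nearest neighbours."""
    sim = feats_a @ feats_b.T / temperature
    conf = softmax(sim, axis=0) * softmax(sim, axis=1)
    matches = []
    for i in range(conf.shape[0]):
        j = int(conf[i].argmax())
        if conf[:, j].argmax() == i and conf[i, j] > threshold:
            matches.append((i, j))
    return matches
```

The dual softmax rewards a pair only when each feature is the other's best candidate, which filters out one-sided, ambiguous matches.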

Test Drive of the Volkswagen ID.4 and Ford Mach-E

As a Tesla stockholder, I decided to test drive the Volkswagen ID.4 and the Ford Mach-E to evaluate my long position in Tesla. I am located in the San Jose Bay Area. My impression is that both are very good cars, and compared with a Tesla, they make the transition from gasoline cars to EVs significantly easier. Rather than being a risk to Tesla (at least for the next few years), I would say they are a risk to Volkswagen's and Ford's other models.

Computing Discounted Cash Flow for Buying a House as an Investment

In this post, I apply discounted cash flow (DCF) analysis to evaluate whether buying a house is a good investment, using the numbers for a typical townhouse in the Bay Area. Based on my analysis, buying a house to rent out in the Bay Area may not be a good investment: if the value of selling the house at the end of the investment period is not considered, it could take more than 40 years for the rent to pay back the investment. This could be even worse if you cannot use a mortgage when buying the house.
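
The payback calculation can be sketched in a few lines of Python. It discounts each year's net rent back to today and finds the first year in which the cumulative discounted rent covers the upfront cost. Like the post, it ignores the sale value at the end. The function names, the optional rent-growth parameter, and the numbers in the usage line are hypothetical placeholders, not the figures used in the analysis. The sketch also does not model mortgage financing.

```python
def npv(rate, cashflows):
    """Net present value; cashflows[t] occurs at the end of year t (t=0 is today)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

def discounted_payback_years(initial_cost, annual_net_rent, rate,
                             rent_growth=0.0, max_years=100):
    """First year in which cumulative discounted net rent covers initial_cost,
    ignoring any sale value; returns None if not reached within max_years."""
    cumulative = 0.0
    for year in range(1, max_years + 1):
        rent = annual_net_rent * (1 + rent_growth) ** (year - 1)  # grows yearly
        cumulative += rent / (1 + rate) ** year                    # discount to today
        if cumulative >= initial_cost:
            return year
    return None

# Hypothetical inputs: upfront cost, net rent after expenses, discount rate, rent growth
years = discounted_payback_years(1_000_000, 30_000, rate=0.03, rent_growth=0.02)
```

Discounting is what stretches the payback period: with a 5% discount rate, 10 units of rent per year take 15 years (not 10) to repay an upfront cost of 100, and if the rent is small enough relative to the cost, the discounted sum never catches up at all.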

Read More