Transformer Introduction

These are my reading notes on Transformers in Vision: A Survey. Transformers can model long-range dependencies between input sequence elements and, unlike recurrent networks such as long short-term memory (LSTM), support parallel processing of the sequence. Unlike convolutional networks, Transformers require minimal inductive biases in their design and are naturally suited as set functions. Furthermore, their straightforward design allows multiple modalities (e.g., images, videos, text and speech) to be processed with similar building blocks, and it scales well to very large networks and huge datasets.

Swin Transformer

ViT showed that a transformer alone can serve as a backbone for vision tasks. However, because a transformer performs global self-attention, computing the relationship between each token and every other token, its complexity grows quadratically with the number of tokens and therefore with image resolution. This makes it inefficient for dense tasks such as semantic segmentation. To this end, the Swin Transformer is proposed in Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. It addresses the computation issue by performing self-attention within local windows and by stacking multiple stages whose windows cover feature maps at different resolutions.
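To make the cost difference concrete, here is a minimal sketch that counts how many token pairs are compared by global self-attention versus attention restricted to non-overlapping local windows. The function names and the default patch and window sizes are illustrative assumptions, and the count ignores the shifted-window step and all constant factors.

```python
def global_attention_pairs(height, width, patch=4):
    """Token pairs compared when every token attends to every other token."""
    tokens = (height // patch) * (width // patch)
    return tokens * tokens


def window_attention_pairs(height, width, patch=4, window=7):
    """Token pairs compared when attention stays inside non-overlapping windows.

    Assumes the token grid divides evenly into window x window blocks.
    """
    rows, cols = height // patch, width // patch
    num_windows = (rows // window) * (cols // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window * tokens_per_window


for size in (224, 448):
    print(size, global_attention_pairs(size, size), window_attention_pairs(size, size))
```

Doubling the image side multiplies the global count by 16 but the windowed count by only 4, since the windowed cost is linear in the number of tokens.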

CVPR 2021 Transformer Papers

This post summarizes the papers on transformers at CVPR 2021. The list is drawn from CVPR2021-Papers-with-Code. Because transformers capture the interaction between the query (Q) and the dictionary (K), they are beginning to see applications in tracking (e.g., Transformer Tracking, Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking), local feature matching (e.g., LoFTR: Detector-Free Local Feature Matching with Transformers) and image retrieval (e.g., Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers, Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning).

Reading Notes on 思想史 (Ideas: A History)

For my reading notes on the book Ideas: A History from Fire to Freud (Chinese title: 思想史: 从火到弗洛伊德), I would like to cite 西方哲学史思维导图+脉络图（完整版）, i.e. "Mind Map and Timeline of the History of Western Philosophy (Complete Edition)".

My Paper Reading List For Facial Landmark Detection

Facial landmark detection is the task of detecting key landmarks on the face and tracking them (being robust to rigid and non-rigid facial deformations due to head movements and facial expressions).

Test Drive of Volkswagen ID.4 and Ford Mach-E

As a Tesla stock owner, I decided to test drive the Volkswagen ID.4 and the Ford Mach-E to evaluate my long position in Tesla. I am located in San Jose, in the Bay Area. In my experience, both are very good cars, and compared with Tesla, they significantly reduce the effort of transitioning from gasoline to EV. Rather than being a risk to Tesla (at least for the next few years), I would say they are a risk to the other models of Volkswagen and Ford.

ViT: An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale

Vision Transformer (ViT) is a pure transformer architecture (no CNN is required) applied directly to a sequence of image patches for classification tasks. The position of each patch in the sequence, which ViT encodes with position embeddings, captures the spatial information of the patches, similar to the positions of words in a sentence.
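As a rough illustration of how an image becomes a sequence of patches, here is a minimal sketch in plain Python. It assumes a single-channel image stored as a list of rows and a hypothetical helper name, `image_to_patch_sequence`; it covers only the patch splitting, not the linear projection or the position embeddings.

```python
def image_to_patch_sequence(image, patch_size):
    """Split an image (list of rows of pixel values) into flattened patches.

    Patches are emitted in row-major order, so a patch's index in the
    returned sequence encodes where it sat in the image.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    sequence = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            sequence.append(patch)
    return sequence


# A toy 4x4 image split into 2x2 patches gives a sequence of 4 patches.
toy_image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
toy_sequence = image_to_patch_sequence(toy_image, 2)
print(toy_sequence)
```

With a 224x224 input and 16x16 patches, the same splitting yields 14 x 14 = 196 patches, which play the role of the "words".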

Compute Discounted Cash Flow for Buying a House as Investment

In this post, I apply discounted cash flow (DCF) analysis to evaluate whether buying a house as an investment is a good idea. Here I use the numbers for a typical townhouse in the Bay Area. Based on my analysis, buying a house to rent out in the Bay Area may not be a good investment: it could take more than 40 years for the rent to pay back the investment, if the value of selling the house at the end of the investment period is not considered. The result could be even worse if you cannot use a mortgage when buying the house.
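The payback reasoning can be sketched as follows. This is a simplified illustration, not the calculation from the post: it assumes a constant net rent received at the end of each year, a fixed discount rate, no mortgage and no resale value, and every name and number in it is hypothetical.

```python
def present_value(cash_flow, rate, year):
    """Discount a cash flow received at the end of the given year back to today."""
    return cash_flow / (1 + rate) ** year


def discounted_payback_years(initial_investment, annual_net_rent, discount_rate, max_years=100):
    """Return the first year in which cumulative discounted rent covers the
    initial investment, or None if that does not happen within max_years."""
    cumulative = 0.0
    for year in range(1, max_years + 1):
        cumulative += present_value(annual_net_rent, discount_rate, year)
        if cumulative >= initial_investment:
            return year
    return None


# Hypothetical inputs, for illustration only.
print(discounted_payback_years(1000, 100, 0.0))
print(discounted_payback_years(1000, 100, 0.05))
print(discounted_payback_years(3000, 100, 0.05))
```

Because discounted rent forms a geometric series, the cumulative value is bounded by annual_net_rent / discount_rate. When the investment exceeds that bound, rent alone never pays it back, which is why the function can return None.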

My Paper Reading List for 3D Face Reconstructions

Here is my paper reading list for 3D face reconstruction, based on Papers with Code. 3D face reconstruction is the task of reconstructing a face from an image into a 3D form (or mesh). Most of the papers on the list were published between 2017 and 2020.

Differences between US Business Entities

These are my reading notes on C Corp vs. S Corp, Partnership, Proprietorship, and LLC: What Is the Best Business Entity?. You may consider creating a business, perhaps to save on taxes. There are different types of business entities in the US, and you should understand the differences between them before deciding which type to choose.