Tag: resnet
- Battle of the Backbones A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks (29 Oct 2023)
This is my reading note for Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks. This paper benchmarks different vision backbones and found that supervised ConvNext may show best performance. After it, supervised swin-transformer and clip based transformer is also very competitive. Different vision tasks shows highly correlated performance for different backbones.
- DiffusionDet Diffusion Model for Object Detection (06 Oct 2023)
This is my reading note for DiffusionDet: Diffusion Model for Object Detection. This paper formulates the object detection problem as a diffusion process: recover object bounding box from noisy estimation. The initial estimation could be from purely random Gaussian noise. One benefit of this method is that it could automatically handle different number of bounding boxes
- Pix2seq A Language Modeling Framework for Object Detection (28 Sep 2022)
Pix2seq: A Language Modeling Framework for Object Detection casts object detection as a language modeling task conditioned on the observed pixel inputs. Object descriptions (e.g., bounding boxes and class labels) are expressed as sequences of discrete tokens, and we train a neural network to perceive the image and generate the desired sequence. Our approach is based mainly on the intuition that if a neural network knows about where and what the objects are, we just need to teach it how to read them out. Experiment results are shown in Table 1, which indicates Pix2seq achieves state of art result on coco.
- CLIP Learning Transferable Visual Models From Natural Language Supervision (27 Sep 2022)
This my reading note on Learning Transferable Visual Models From Natural Language Supervision. The proposed method is called Contrastive Language-Image Pre-training or CLIP. State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. We demonstrate that the simple pre-training task of predicting which caption (freeform text instead of strict labeling) goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks.
- Modern Convolution Neutral Network (20 Mar 2022)
Convolutional Neural Networks (ConvNets or CNNs) are a class of neural networks algorithms that are mostly used in visual recognition tasks such as image classification, object detection, and image segmentation. The use of ConvNets in visual recognition is inarguably one of the biggest inventions of decade 2010s in deep learning community.
- Must-read AI Papers (16 Feb 2021)
I will create a new reading note series based on Must-read AI Papers from Crossminds.ai.
- Convolution Nerual Network Backbone (15 Apr 2019)
Convolution Nerual Network (CNN) has been used in many visual tasks. You may find the networks for varying types of visual tasks share similar set of feature extraction layer, which is referred as backbone. Researchers typically use backbone which has been succesful in ImageNet competion and combine them with different loss functions to solve different type of visual tasks.
- ResNet and Its Variations (08 Apr 2019)
ResNet