Pyramid in Neural Network
[dilated-convolution
fcn
panet
multi-scale
zigzagnet
m2det
u-net
deep-learning
nas-fpn
thundernet
ssd
fpn
psp
pyramid
]
Pyramids are a widely used technique in vision tasks. They are applied when you want to obtain features at different scales. There are generally two types of pyramid:
- image pyramid: an image pyramid is generated first, and feature extraction is then applied to each level of the pyramid. An image pyramid is very effective but has a high computational cost.
- feature pyramid: features are extracted from the original image, and each subsequent level of features is built on the features of the previous level. A feature pyramid may reduce the computational cost by reusing the features from lower levels.
In this post, we will focus on deep learning approaches based on feature pyramids.
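To make the difference between the two pyramid types concrete, here is a minimal pure-Python sketch. The `extract` function is a hypothetical stand-in for a real feature extractor (a CNN in practice), and the 2x2 average downsampling is just one simple choice of rescaling; neither comes from a specific paper.

```python
def downsample2x(img):
    """Halve height and width by averaging each 2x2 block."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
             + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def extract(img):
    """Hypothetical feature extractor: absolute difference with the right
    neighbour (the last column is compared with itself, so size is kept)."""
    return [
        [abs(row[min(c + 1, len(row) - 1)] - row[c]) for c in range(len(row))]
        for row in img
    ]


def image_pyramid_features(img, levels):
    """Image pyramid: rescale the image, then run the extractor at every scale."""
    feats = []
    for _ in range(levels):
        feats.append(extract(img))   # one extractor call per level
        img = downsample2x(img)
    return feats


def feature_pyramid(img, levels):
    """Feature pyramid: run the extractor once, then derive every further
    level from the features of the previous level."""
    feats = [extract(img)]           # single extractor call
    for _ in range(levels - 1):
        feats.append(downsample2x(feats[-1]))
    return feats
```

Both functions return maps of the same sizes, but the image pyramid pays for one extractor call per level, while the feature pyramid reuses the features it already computed.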
SSD
In SSD, features of different scales are used to detect objects of different scales independently.
U-Net
In U-Net, higher-level features are upsampled and combined with lower-level, higher-resolution features to generate a high-level, high-resolution feature map. This combination is sometimes referred to as a skip connection.
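A single decoder step of this kind can be sketched as follows. This is a simplified illustration: feature maps are plain lists of channels, nearest-neighbour upsampling stands in for U-Net's learned up-convolution, and the convolutions that normally follow the concatenation are left out. The name `decoder_step` is made up for this example.

```python
def upsample2x(channel):
    """Nearest-neighbour upsampling: repeat every value in a 2x2 block."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def decoder_step(coarse, skip):
    """Combine a coarse, high-level map with a finer, low-level one.

    coarse, skip: lists of channels, each channel a 2-D list.
    `skip` is assumed to be exactly twice the height and width of `coarse`.
    """
    upsampled = [upsample2x(ch) for ch in coarse]
    return skip + upsampled  # skip connection: concatenate along channels
```

For example, a one-channel 1x1 `coarse` map and a two-channel 2x2 `skip` map give a three-channel 2x2 result.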
FPN
In U-Net only the final layer is used as output, but in FPN all decoder layers are used as outputs. Note that in SSD, lower-level features do not contain any information from higher-level features; in FPN, lower-level features do contain information from higher-level features, via the top-down pathway.
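The top-down flow of information can be sketched in a few lines. This is a simplified, single-channel illustration that assumes each level is exactly half the size of the previous one; the 1x1 lateral convolutions and 3x3 smoothing convolutions of the actual FPN are omitted, and the function names are made up for this example.

```python
def upsample2x(fm):
    """Nearest-neighbour upsampling: repeat every value in a 2x2 block."""
    out = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a, b):
    """Element-wise sum of two equally sized 2-D maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down(backbone_feats):
    """backbone_feats: finest level first, each half the size of the previous.

    Returns one merged map per level (finest first); every one of them is
    an output, unlike U-Net where only the last decoder map is used.
    """
    merged = [backbone_feats[-1]]  # the coarsest level is used as is
    for lateral in reversed(backbone_feats[:-1]):
        merged.append(add(lateral, upsample2x(merged[-1])))
    return merged[::-1]
```

With constant maps of 1, 10 and 100 at sizes 4x4, 2x2 and 1x1, the finest output is 111 everywhere: it has accumulated information from every coarser level.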
Path Aggregation Network for Instance Segmentation
In PANet, a bottom-up network is appended after the top-down network.
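The extra bottom-up pass can be sketched as below, taking the outputs of a top-down pass as input. This is only an illustration: 2x2 average pooling stands in for PANet's strided convolution, the maps are single-channel, and `bottom_up_augment` is a made-up name.

```python
def downsample2x(fm):
    """Stand-in for a strided convolution: average every 2x2 block."""
    h, w = len(fm) // 2, len(fm[0]) // 2
    return [
        [
            (fm[2 * r][2 * c] + fm[2 * r][2 * c + 1]
             + fm[2 * r + 1][2 * c] + fm[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def add(a, b):
    """Element-wise sum of two equally sized 2-D maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bottom_up_augment(top_down_feats):
    """top_down_feats: finest level first, each half the size of the previous.

    Starts from the finest map and pushes it back up, merging with the
    top-down map at every coarser level. Returns maps finest first.
    """
    out = [top_down_feats[0]]
    for p in top_down_feats[1:]:
        out.append(add(p, downsample2x(out[-1])))
    return out
```

After this pass the coarsest map also receives fine-level information through a short path, instead of only through the full backbone.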
ZigZagNet: Fusing Top-Down and Bottom-Up Context for Object Segmentation
ZigZagNet improves on PANet by adding connections between each layer of the top-down network and each layer of the bottom-up network.
ThunderNet: Towards Real-time Generic Object Detection
ThunderNet improves the efficiency of FPN to enable real-time applications on embedded systems.
Rethinking on Multi-Stage Networks for Human Pose Estimation
Add one more stage of FPN?
M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network
M2Det can be viewed as an FPN nested inside another FPN, so that each level of features contains information from all scales.
NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection
Lets an algorithm search for the optimal FPN architecture for us.
Rethinking Atrous Convolution for Semantic Image Segmentation
Uses dilated (atrous) convolution to build a feature pyramid.
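The idea is easiest to see in one dimension. The sketch below applies the same kernel with several dilation rates to the same signal, so each branch sees a different scale without any resizing. It is a simplified illustration: no padding is used (so larger dilations give shorter outputs), and the function names and rates are assumptions made for this example.

```python
def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D correlation whose kernel taps are `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output
    return [
        sum(k * signal[i + j * dilation] for j, k in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]


def atrous_pyramid(signal, kernel, dilations):
    """Run the same kernel at several dilation rates on the same input."""
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}
```

With a 3-tap kernel, dilation rates 1, 2 and 4 cover 3, 5 and 9 input samples per output while still using only three weights each.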
Pyramid Scene Parsing Network
Uses different pooling strides to build a feature pyramid.
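A minimal sketch of pooling one feature map at several strides is shown below. The bin sizes 1, 2, 3 and 6 are those reported in the PSPNet paper; everything else is a simplification: the map is single-channel and square, its size must be divisible by every bin size, and the 1x1 convolution, upsampling and concatenation that follow in the real module are omitted.

```python
def avg_pool_to_grid(fm, bins):
    """Average-pool a square map into a bins x bins grid.

    Assumes len(fm) is divisible by `bins`; the stride equals the window size.
    """
    step = len(fm) // bins
    return [
        [
            sum(
                fm[r][c]
                for r in range(br * step, (br + 1) * step)
                for c in range(bc * step, (bc + 1) * step)
            ) / (step * step)
            for bc in range(bins)
        ]
        for br in range(bins)
    ]


def pyramid_pooling(fm, bin_sizes=(1, 2, 3, 6)):
    """Pool the same feature map at several strides, coarsest first."""
    return [avg_pool_to_grid(fm, b) for b in bin_sizes]
```

On a 6x6 map this yields a 1x1 global summary, 2x2 and 3x3 regional summaries, and a 6x6 map identical to the input.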
Unified Perceptual Parsing for Scene Understanding
Adds a Pyramid Pooling Module (PPM, from Pyramid Scene Parsing Network) to the FPN.
Parsing R-CNN for Instance-Level Human Analysis
Combines the atrous convolution approach of Rethinking Atrous Convolution for Semantic Image Segmentation with a non-local block.