Pyramid in Neural Network

[ zigzagnet  deep-learning  panet  psp  m2det  multi-scale  ssd  fpn  dilated-convolution  fcn  u-net  pyramid  nas-fpn  thundernet  ]

Pyramids are a widely used technique in vision tasks. They are applied when you want to obtain features at different scales. There are generally two types of pyramid:

  • image pyramid: an image pyramid is generated first, and feature extraction is then applied to each level of the pyramid. This is very effective but has a high computational cost.
  • feature pyramid: features are extracted from the original image, and each subsequent level is built on the features of the previous level. A feature pyramid can reduce the computational cost by reusing features from lower levels.
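
To make the difference concrete, here is a minimal NumPy sketch of the two strategies. The `extract` function is a hypothetical stand-in for a real feature extractor (it is just a horizontal gradient magnitude), and 2x2 average pooling is an assumed choice for moving between levels.

```python
import numpy as np

def downsample2x(x):
    """Halve height and width by averaging non-overlapping 2x2 blocks."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def extract(x):
    """Hypothetical feature extractor: horizontal gradient magnitude."""
    return np.abs(x - np.roll(x, 1, axis=1))

def image_pyramid_features(image, levels):
    # Rescale the image first, then run the extractor on every scale.
    feats, current = [], image
    for _ in range(levels):
        feats.append(extract(current))
        current = downsample2x(current)
    return feats

def feature_pyramid(image, levels):
    # Run the extractor once, then derive each level from the previous one.
    feats = [extract(image)]
    for _ in range(levels - 1):
        feats.append(downsample2x(feats[-1]))
    return feats

image = np.arange(64.0).reshape(8, 8)
print([f.shape for f in image_pyramid_features(image, 3)])
print([f.shape for f in feature_pyramid(image, 3)])
```

Both functions return maps of the same sizes, but the image pyramid calls the extractor once per level while the feature pyramid calls it only once.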

In this post, we will focus on approaches that are both feature-pyramid-based and deep-learning-based.


In SSD, features of different scales are used to detect objects of different scales independently.


In U-Net, higher-level feature maps are upsampled and combined with lower-level, higher-resolution feature maps to generate a feature map that is both high-level and high-resolution. This combination is sometimes referred to as a skip connection.
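
A skip connection of this kind can be sketched in a few lines of NumPy. This is only an illustration under assumptions: feature maps are stored as (channels, height, width), upsampling is nearest-neighbour, and the maps are merged by channel concatenation. A real network would follow the merge with learned convolutions.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour upsampling of a (C, H, W) map by a factor of 2."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_connect(high_level, low_level):
    """Upsample the coarse map and concatenate it with the fine map."""
    up = upsample2x(high_level)
    if up.shape[1:] != low_level.shape[1:]:
        raise ValueError("spatial sizes must match after upsampling")
    return np.concatenate([up, low_level], axis=0)

coarse = np.ones((8, 4, 4))   # high-level, low-resolution features
fine = np.zeros((4, 8, 8))    # low-level, high-resolution features
merged = skip_connect(coarse, fine)
print(merged.shape)  # (12, 8, 8)
```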


In U-Net, only the final layer is used as output, but in FPN, all decoder layers are used as outputs. Note that in SSD, lower-level features don’t contain any information from higher-level features; in FPN, lower-level features do contain information from higher-level features, passed along the top-down pathway.
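
The top-down pathway can be sketched as follows. This is a simplified illustration, not the full FPN: it assumes every level already has the same number of channels and that each level is half the resolution of the previous one, so the learned lateral and smoothing convolutions are left out.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour upsampling of a (C, H, W) map by a factor of 2."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def top_down(features):
    """Merge a list of (C, H, W) maps ordered from finest to coarsest.

    Returns one output per level; every level is kept, not just the finest.
    """
    outputs = [features[-1]]  # the coarsest level passes through unchanged
    for lateral in reversed(features[:-1]):
        outputs.append(lateral + upsample2x(outputs[-1]))
    return outputs[::-1]      # back to finest-to-coarsest order

c3 = np.full((2, 8, 8), 1.0)
c4 = np.full((2, 4, 4), 10.0)
c5 = np.full((2, 2, 2), 100.0)
p3, p4, p5 = top_down([c3, c4, c5])
print(p3[0, 0, 0], p4[0, 0, 0], p5[0, 0, 0])  # 111.0 110.0 100.0
```

The finest output carries contributions from every coarser level, which is the information flow that SSD lacks.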

Path Aggregation Network for Instance Segmentation

In PANet, a bottom-up network is appended after the top-down network.
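
As a rough sketch, the extra bottom-up pass takes the outputs of the top-down network and pushes fine-level information back up to the coarse levels. The 2x2 average pooling and plain addition used here are assumptions for illustration, standing in for learned layers.

```python
import numpy as np

def downsample2x(x):
    """Halve the spatial size of a (C, H, W) map by 2x2 average pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def bottom_up(pyramid):
    """pyramid: top-down outputs as (C, H, W) maps, finest to coarsest."""
    outputs = [pyramid[0]]  # the finest level passes through unchanged
    for level in pyramid[1:]:
        outputs.append(level + downsample2x(outputs[-1]))
    return outputs

p3 = np.full((1, 8, 8), 1.0)
p4 = np.full((1, 4, 4), 10.0)
p5 = np.full((1, 2, 2), 100.0)
n3, n4, n5 = bottom_up([p3, p4, p5])
print(n3[0, 0, 0], n4[0, 0, 0], n5[0, 0, 0])  # 1.0 11.0 111.0
```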

ZigZagNet: Fusing Top-Down and Bottom-Up Context for Object Segmentation

ZigZagNet improves on PANet by adding connections between the layers of the top-down network and the layers of the bottom-up network.

ThunderNet: Towards Real-time Generic Object Detection

ThunderNet improves the efficiency of FPN to enable real-time applications on embedded systems.

Rethinking on Multi-Stage Networks for Human Pose Estimation

Add one more stage of FPN?

M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network

M2Det can be viewed as an FPN nested inside another FPN, so each level of features contains information from all scales.

NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection

Let an algorithm search for the optimal FPN architecture for us.

Rethinking Atrous Convolution for Semantic Image Segmentation

Uses dilated (atrous) convolution to achieve a feature pyramid.
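
The idea is easiest to see in one dimension. In the sketch below (an illustration, not the paper's implementation), the same three-tap kernel is applied with different dilation rates, so each branch sees a different receptive field without any change in the number of weights. The rates 1, 2 and 4 are arbitrary choices for the example.

```python
import numpy as np

def dilated_conv1d(signal, kernel, dilation):
    """Valid 1D correlation with kernel taps spaced `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output
    out_len = len(signal) - span + 1
    out = np.zeros(out_len)
    for i, w in enumerate(kernel):
        start = i * dilation
        out += w * signal[start:start + out_len]
    return out

signal = np.arange(10, dtype=float)
kernel = np.array([1.0, 1.0, 1.0])

# One branch per dilation rate: same weights, growing receptive field.
branches = {d: dilated_conv1d(signal, kernel, d) for d in (1, 2, 4)}
for d, out in branches.items():
    print(d, out)
```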

Pyramid Scene Parsing Network

Uses different pooling strides to achieve a feature pyramid.
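
A minimal sketch of this kind of pyramid pooling is shown below. The input is average-pooled into grids of several sizes (a coarser grid means a larger pooling stride), each pooled map is upsampled back to the input size, and everything is concatenated along the channel axis. The grid sizes (1, 2, 4) and the requirement that they divide the input size evenly are simplifying assumptions for the example.

```python
import numpy as np

def pool_to_bins(x, bins):
    """Average-pool a (C, H, W) map into a bins x bins grid."""
    c, h, w = x.shape
    return x.reshape(c, bins, h // bins, bins, w // bins).mean(axis=(2, 4))

def pyramid_pooling(x, bin_sizes=(1, 2, 4)):
    """Concatenate x with pooled-and-upsampled copies of itself."""
    c, h, w = x.shape
    branches = [x]
    for b in bin_sizes:
        pooled = pool_to_bins(x, b)
        # Nearest-neighbour upsampling back to the input resolution.
        up = pooled.repeat(h // b, axis=1).repeat(w // b, axis=2)
        branches.append(up)
    return np.concatenate(branches, axis=0)

x = np.arange(2 * 8 * 8, dtype=float).reshape(2, 8, 8)
out = pyramid_pooling(x)
print(out.shape)  # (8, 8, 8)
```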

Unified Perceptual Parsing for Scene Understanding

Adds a perceptual parsing module to the FPN.

Parsing R-CNN for Instance-Level Human Analysis

Combines the approach of Rethinking Atrous Convolution for Semantic Image Segmentation with a non-local module.

Written on June 24, 2019