Pyramids in Neural Networks

[ dilated-convolution fcn panet multi-scale zigzagnet m2det u-net deep-learning nas-fpn thundernet ssd fpn psp pyramid ]

Pyramids are a widely used technique in vision tasks. They are applied when you want to obtain features at different scales. There are generally two types of pyramids:

image pyramid: an image pyramid is generated first, and feature extraction is then applied to each level of the pyramid. An image pyramid is very effective but has a high computational cost.
feature pyramid: features are extracted from the original image, and each subsequent level of features is built on the features of the previous level. A feature pyramid can reduce the computational cost by reusing the features from lower levels.
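
To make the difference in cost concrete, here is a minimal NumPy sketch of the two strategies. The names `extract` and `downsample2x` are hypothetical stand-ins: `extract` plays the role of an expensive feature extractor, and `downsample2x` plays the role of whatever pooling or strided stage produces the next level. It illustrates the idea and is not how any particular network implements it.

```python
import numpy as np

def downsample2x(x):
    """Halve height and width with 2x2 average pooling (H and W must be even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def extract(x):
    """Hypothetical feature extractor: absolute horizontal gradient."""
    return np.abs(np.diff(x, axis=1, append=x[:, -1:]))

def image_pyramid_features(image, levels):
    # Build the image pyramid first, then run the extractor on every level.
    feats = []
    for _ in range(levels):
        feats.append(extract(image))
        image = downsample2x(image)
    return feats

def feature_pyramid(image, levels):
    # Run the extractor once, then build each level from the previous level's features.
    feats = [extract(image)]
    for _ in range(levels - 1):
        feats.append(downsample2x(feats[-1]))
    return feats

image = np.arange(64, dtype=float).reshape(8, 8)
print([f.shape for f in image_pyramid_features(image, 3)])
print([f.shape for f in feature_pyramid(image, 3)])
```

Both functions return maps of the same sizes, but the image pyramid calls `extract` once per level, while the feature pyramid calls it only once.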

In this post, we will focus on approaches that are based on feature pyramids and deep learning.

SSD

In SSD, features of different scales are used to detect objects of different scales independently.
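
As a rough illustration of how each scale contributes its own detections, the sketch below counts the default boxes predicted at each feature map. The feature-map sizes and boxes per location are the values reported for the SSD300 configuration in the SSD paper; they are an assumption here and are not taken from this post.

```python
# Assumed SSD300 configuration: square feature-map side lengths and
# the number of default boxes predicted at every location of each map.
feature_map_sizes = [38, 19, 10, 5, 3, 1]
boxes_per_location = [4, 6, 6, 6, 4, 4]

def boxes_per_level(sizes, boxes):
    """Each level predicts on its own: side * side locations, each with `b` boxes."""
    return [s * s * b for s, b in zip(sizes, boxes)]

per_level = boxes_per_level(feature_map_sizes, boxes_per_location)
print(per_level)       # [5776, 2166, 600, 150, 36, 4]
print(sum(per_level))  # 8732
```

The high-resolution maps contribute most of the boxes and handle small objects, while the coarse maps contribute a few large boxes. No level reads from any other level.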

U-Net

In U-Net, higher-level feature maps are upscaled and combined with lower-level, higher-resolution feature maps to generate a high-level, high-resolution feature map. This combination is sometimes referred to as a skip connection.
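
A minimal NumPy sketch of one such skip connection is shown below, assuming feature maps stored as (channels, height, width) arrays. Nearest-neighbour upsampling stands in for the learned up-convolution, and the convolutions that normally follow the concatenation are omitted. The function names are hypothetical.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour upsampling of a (C, H, W) feature map by a factor of 2."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_connect(deep_low_res, shallow_high_res):
    """Upsample the deep features and concatenate them with the shallow ones along channels."""
    up = upsample2x(deep_low_res)
    return np.concatenate([shallow_high_res, up], axis=0)

deep = np.ones((8, 4, 4))      # high-level, low-resolution features
shallow = np.zeros((4, 8, 8))  # low-level, high-resolution features
merged = skip_connect(deep, shallow)
print(merged.shape)  # (12, 8, 8)
```

The merged map keeps the spatial resolution of the shallow features and carries the channels of both inputs.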

FPN

In U-Net, only the final layer is used as output, but in FPN, all decoder layers are used as outputs. Note that in SSD, a lower-level feature map does not contain any information from higher-level features, whereas in FPN it does, because information flows through the top-down pathway.
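
The top-down pathway can be sketched in NumPy as follows. This assumes the input maps already share the same channel count (in FPN this is achieved by 1x1 lateral convolutions) and that each level is exactly half the resolution of the one before it. The smoothing convolution applied to each merged map is omitted, and `fpn_top_down` is a hypothetical name.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour upsampling of a (C, H, W) feature map by a factor of 2."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_top_down(laterals):
    """Merge (C, H, W) maps ordered from highest to lowest resolution.

    Starts at the coarsest map and adds its upsampled version to the next
    finer map, repeating down to the finest level. Every merged map is an output.
    """
    outputs = [laterals[-1]]
    for lateral in reversed(laterals[:-1]):
        outputs.append(lateral + upsample2x(outputs[-1]))
    return outputs[::-1]  # back to highest-resolution-first order

laterals = [
    np.full((2, 8, 8), 1.0),    # low level, high resolution
    np.full((2, 4, 4), 10.0),
    np.full((2, 2, 2), 100.0),  # high level, low resolution
]
outs = fpn_top_down(laterals)
print([o.shape for o in outs])         # [(2, 8, 8), (2, 4, 4), (2, 2, 2)]
print([float(o[0, 0, 0]) for o in outs])  # [111.0, 110.0, 100.0]
```

The finest output holds contributions from all three levels (1 + 10 + 100), which is the top-down flow of information that SSD lacks. Unlike the U-Net sketch of a single final map, all three maps are returned as outputs.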