Image Segmentation

[ deep-learning  image-segmentation  fcn  unet  segnet  dilated-convolutions  refinenet  pspnet  deeplab  mask-rcnn  skip  scribble  ]

There are two types of image segmentation:

  • semantic segmentation: assign a class label to each pixel
  • instance segmentation: extract the boundary/mask of each instance in the image

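For concreteness, the two label formats can be contrasted with a tiny NumPy example (the 4x4 "image", the class ids, and the two-object layout are made up for illustration):

```python
import numpy as np

# Semantic map: every pixel gets a class id (0 = background, 1 = cat).
# Both cats collapse into the same class; the map cannot tell them apart.
semantic = np.array([[1, 1, 0, 0],
                     [1, 1, 0, 0],
                     [0, 0, 1, 1],
                     [0, 0, 1, 1]])

# Instance masks: one boolean mask per object, so the two cats stay separate
# even though they share a class.
cat_a = np.zeros((4, 4), dtype=bool); cat_a[:2, :2] = True
cat_b = np.zeros((4, 4), dtype=bool); cat_b[2:, 2:] = True
instance_masks = [("cat", cat_a), ("cat", cat_b)]

# The semantic map is exactly the union of the same-class instance masks.
assert np.array_equal(semantic.astype(bool), cat_a | cat_b)
```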

  • Fully Convolutional Network (FCN): FCN converts the fully connected layers of traditional image-classification networks, e.g., AlexNet, VGG16, into convolutional layers. Thus it can generate a per-pixel probability map.
  • UNet: combines two parts: the left part uses convolution and max-pooling to extract features; the right part uses upsampling and skip connections (inputs from the corresponding layers of the left part) to generate the label map.
  • SegNet: similar to UNet, but it doesn’t use skip connections to copy features from the lower layers of the left part (referred to as the encoder network); instead it reuses the encoder’s max-pooling indices for upsampling in the decoder.
  • Dilated Convolutions: the problem with FCN is that pooling followed by up-sampling causes information loss. Dilated convolution resolves this by adding dilation to the convolution, which enlarges the receptive field without reducing the output size the way pooling does.
  • RefineNet: similar to UNet, but uses ResNet as the backbone.
  • PSPNet: applies the idea of spatial pyramid pooling to image segmentation.
  • DeepLab: combines atrous convolution (similar to dilated convolution) with the pyramid pooling idea of PSPNet.
  • Mask R-CNN: uses the idea of object detection for instance segmentation: each detected bounding box gets a per-pixel response map, to which a per-pixel classifier is then applied to generate the mask.
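The effect of dilation described above can be sketched in NumPy: a minimal 1-D "valid" dilated convolution (the helper name and the toy signal are assumptions for illustration) shows the receptive field growing with the dilation rate while no pooling is involved:

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """'Valid' 1-D convolution (correlation) with a dilated kernel.

    Dilation d spaces the kernel taps d apart, so a length-k kernel
    covers a receptive field of d*(k-1)+1 samples without any pooling.
    """
    k = len(w)
    span = dilation * (k - 1) + 1
    out_len = len(x) - span + 1
    return np.array([
        sum(w[j] * x[i + j * dilation] for j in range(k))
        for i in range(out_len)
    ])

x = np.arange(10, dtype=float)
w = np.array([1.0, 1.0, 1.0])

# dilation=1 is an ordinary convolution; dilation=2 widens the receptive
# field from 3 to 5 samples, and the output only shrinks by the larger
# kernel span -- there is no pooling factor to undo later.
print(dilated_conv1d(x, w, 1))  # [ 3.  6.  9. 12. 15. 18. 21. 24.]
print(dilated_conv1d(x, w, 2))  # [ 6.  9. 12. 15. 18. 21.]
```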

Common Techniques

  • transposed convolution: the conjugate (adjoint) of a convolution operator, whose forward propagation is the backward propagation of the corresponding convolution operation, and vice versa
  • skip: combine the outputs of intermediate layers to obtain features at multiple levels
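The pairing between convolution and transposed convolution is easiest to see in matrix form. A small NumPy sketch (the helper `conv_matrix` is a made-up name for illustration) shows that multiplying by the transpose maps a short output back to the original input length, which is why transposed convolution is used for upsampling:

```python
import numpy as np

def conv_matrix(w, in_len):
    """Matrix form of a 'valid' 1-D convolution with kernel w."""
    k = len(w)
    out_len = in_len - k + 1
    M = np.zeros((out_len, in_len))
    for i in range(out_len):
        M[i, i:i + k] = w  # each row slides the kernel one step
    return M

w = np.array([1.0, 2.0, 1.0])
C = conv_matrix(w, in_len=6)   # maps length-6 inputs to length-4 outputs

x = np.arange(6, dtype=float)
y = C @ x                      # ordinary convolution: output is shorter

# Transposed convolution: multiply by C.T, mapping length-4 back to
# length-6. The forward pass of C.T is exactly the backward pass of C.
z = C.T @ y
print(y.shape, z.shape)        # (4,) (6,)
```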

Labels for Training Data

  • Scribble: uses a few simple scribbles as the labels of the training image. Cost function is \(\sum_{i}\psi _i^{scr}\left(y_i|X,S\right)+\sum_{i}-\log P\left(y_i| X,\theta\right)+\sum_{i,j}\psi _{ij}\left(y_i,y_j|X\right)\)
  • Image-level label: the label is provided at the image level and there is no pixel-level label, as in the image classification case. Cost function is \(\underset{\theta ,P}{\text{minimize}}\quad D(P(X)||Q(X|\theta ))\quad \text{subject to}\quad A\overrightarrow{P} \geqslant \overrightarrow{b},\ \sum_{X}P(X)=1\)
  • Bounding box and label: the labels are bounding boxes with their class labels, as in the object detection case. Cost function is \(P\left ( x,y,z;\theta \right ) = P\left ( x \right )\left (\prod_{m=1}^{M} P\left ( y_m|x;\theta \right )\right )P\left ( z|y \right )\)
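The structure shared by these cost functions, a network likelihood term plus label-consistency regularizers, can be sketched for the scribble case. All the numbers below (the probabilities, the scribble set, and the pairwise weight) are made-up assumptions for illustration, not values from the paper:

```python
import numpy as np

probs = np.array([[0.9, 0.1],   # P(y_i | X, theta) for 4 pixels, 2 classes
                  [0.6, 0.4],   # (assumed network outputs)
                  [0.3, 0.7],
                  [0.2, 0.8]])
labels = np.array([0, 0, 1, 1])  # a candidate labeling y
scribbles = {0: 0, 3: 1}         # pixel index -> scribbled class (assumed)
LAMBDA = 0.5                     # assumed pairwise weight

# Unary scribble term: zero when a scribbled pixel keeps its scribble's
# class, a huge penalty otherwise.
unary_scr = sum(0.0 if labels[i] == c else 1e9 for i, c in scribbles.items())

# Network term: negative log-likelihood of the labeling under the model.
nll = -np.log(probs[np.arange(len(labels)), labels]).sum()

# Pairwise term: penalize neighboring pixels that disagree (Potts model).
pairwise = LAMBDA * sum(labels[i] != labels[i + 1]
                        for i in range(len(labels) - 1))

energy = unary_scr + nll + pairwise
print(energy)
```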
Written on April 2, 2019