Efficient Deep Neural Network
In this post, we will introduce some neural networks which are suitable for running on mobile devices.
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
SqueezeNet is one of the first works to reduce network size and computation cost. It uses the following techniques:
- Replace 3x3 filters with 1x1 filters.
- Decrease the number of input channels to 3x3 filters.
- Downsample late in the network so that convolution layers have large activation maps.
Its building block is the fire module, which consists of a squeeze convolution layer (1x1 filters only) feeding into an expand layer that has a mix of 1x1 and 3x3 convolution filters. Bypass connections are applied across layers.
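To get a feel for how much the squeeze layer saves, here is a minimal sketch that counts the weights of a fire module and compares them with a plain 3x3 convolution producing the same number of output channels. The function names and the channel sizes (96 in, 16 squeeze, 64 + 64 expand) are illustrative assumptions, and biases are ignored.

```python
def fire_module_params(c_in, squeeze, expand_1x1, expand_3x3):
    """Weight count of a fire module (biases ignored)."""
    squeeze_params = c_in * squeeze * 1 * 1          # 1x1 squeeze layer
    expand_1x1_params = squeeze * expand_1x1 * 1 * 1  # 1x1 branch of the expand layer
    expand_3x3_params = squeeze * expand_3x3 * 3 * 3  # 3x3 branch of the expand layer
    return squeeze_params + expand_1x1_params + expand_3x3_params


def plain_3x3_params(c_in, c_out):
    """Weight count of a single standard 3x3 convolution."""
    return c_in * c_out * 3 * 3


# Illustrative sizes (assumed, not taken from the text above).
fire = fire_module_params(c_in=96, squeeze=16, expand_1x1=64, expand_3x3=64)
plain = plain_3x3_params(c_in=96, c_out=64 + 64)
print(fire, plain, plain / fire)
```

Most of the saving comes from the 3x3 filters seeing only the 16 squeezed channels instead of all 96 input channels.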
Flattened Convolutional Neural Networks for Feedforward Acceleration
The flattened network identifies redundancy in the parameters of convolution filters and addresses it via low-rank approximation. A convolution can be written as a matrix product $F \times X$; if the filter can be approximated by a low-rank factorization $F = A \times B$, then $F \times X = A \times (B \times X)$, which is cheaper to evaluate when the rank is small. The flattened network takes this to the extreme and uses rank-one filters, implemented as 1D filters. For a filter of size $K_x \times K_y \times M \times N$, the computation saved over a standard convolution comes from replacing each full filter with these 1D filters.
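The low-rank identity can be checked on a toy example. The sketch below builds a rank-one $F = A \times B$ from small integer matrices (chosen arbitrarily for illustration), confirms that both evaluation orders give the same result, and counts the scalar multiplications each order needs. The helper names are hypothetical.

```python
def matmul(a, b):
    """Naive matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matmul_cost(rows, inner, cols):
    """Scalar multiplications needed by a naive (rows x inner) @ (inner x cols) product."""
    return rows * inner * cols


A = [[1], [2], [3]]           # 3 x 1
B = [[4, 5, 6]]               # 1 x 3
F = matmul(A, B)              # 3 x 3, rank one by construction
X = [[1, 0], [0, 1], [2, 2]]  # 3 x 2

direct = matmul(F, X)               # F x X
factored = matmul(A, matmul(B, X))  # A x (B x X)

direct_cost = matmul_cost(3, 3, 2)                           # F x X
factored_cost = matmul_cost(1, 3, 2) + matmul_cost(3, 1, 2)  # B x X, then A x (...)
print(direct == factored, direct_cost, factored_cost)
```

If each $K_x \times K_y \times M$ filter is assumed to factor into three 1D filters of lengths $K_x$, $K_y$ and $M$ (an assumption about the layout, not stated above), the same counting gives $K_x + K_y + M$ multiplications per output value against $K_x K_y M$ for the full filter.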
Xception: Deep Learning with Depthwise Separable Convolutions
Xception follows the same idea as MobileNet but applies depthwise separable convolutions to the Inception architecture.
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
MobileNet mainly uses depthwise separable convolution to reduce the number of parameters and the computational cost. It factorizes a standard convolution into a depthwise convolution (each filter has one input channel and one output channel) followed by a pointwise convolution (kernel size 1x1). The depthwise convolution extracts spatial information and the pointwise convolution combines features across channels. For a convolution of size $K_x \times K_y \times N$ on an input of size $P \times Q \times M$, the ratio of parameter count, and likewise of computational cost, for separable convolution over standard convolution would be:

$$\frac{K_x K_y M + M N}{K_x K_y M N} = \frac{1}{N} + \frac{1}{K_x K_y}$$
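As a rough check of the saving, the sketch below counts parameters and multiplications for both kinds of convolution. It assumes stride 1 and padding that keeps the output at $P \times Q$, ignores biases, and uses illustrative sizes; the function names are hypothetical.

```python
from fractions import Fraction


def standard_conv(kx, ky, m, n, p, q):
    """(parameters, multiplications) of a standard convolution."""
    params = kx * ky * m * n
    return params, params * p * q


def separable_conv(kx, ky, m, n, p, q):
    """(parameters, multiplications) of depthwise + pointwise convolution."""
    depthwise = kx * ky * m  # one kx x ky filter per input channel
    pointwise = m * n        # 1x1 convolution mixing M channels into N
    params = depthwise + pointwise
    return params, params * p * q


# Illustrative sizes (assumed): 3x3 kernel, 32 -> 64 channels, 112x112 feature map.
std_params, std_mults = standard_conv(3, 3, 32, 64, 112, 112)
sep_params, sep_mults = separable_conv(3, 3, 32, 64, 112, 112)

param_ratio = Fraction(sep_params, std_params)
cost_ratio = Fraction(sep_mults, std_mults)
print(param_ratio, cost_ratio, float(param_ratio))
```

With a 3x3 kernel the ratio is dominated by the $1 / (K_x K_y)$ term, so the separable version needs roughly one eighth to one ninth of the parameters and multiplications.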
MobileNet V2 differs from V1 in its inverted residual with linear bottleneck. This module takes a low-dimensional compressed representation as input, first expands it to a high dimension, and then filters it with a lightweight depthwise convolution. The features are subsequently projected back to a low-dimensional representation with a linear convolution.
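The expand, filter, project sequence can be traced with a small weight count. This is only a sketch: the expansion factor of 6, the 3x3 depthwise kernel, and the 24-channel example are assumptions made for illustration, biases and normalization layers are ignored, and the function name is hypothetical.

```python
def inverted_residual_params(c_in, c_out, expansion=6, kernel=3):
    """Weight counts for the three stages of an inverted residual block."""
    hidden = c_in * expansion
    expand = c_in * hidden                # 1x1 convolution up to the high dimension
    depthwise = kernel * kernel * hidden  # one kernel x kernel filter per hidden channel
    project = hidden * c_out              # 1x1 linear convolution back down
    return {
        "hidden_channels": hidden,
        "expand": expand,
        "depthwise": depthwise,
        "project": project,
        "total": expand + depthwise + project,
    }


block = inverted_residual_params(c_in=24, c_out=24)
print(block)
```

Even though the hidden representation is six times wider than the input, the depthwise stage stays cheap because each of its filters only touches a single channel.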
Factorized convolutional neural networks
The factorized network is similar to MobileNet, but also incorporates the idea of residual networks.
MixNet finds that larger kernel sizes (up to 9x9) tend to improve performance on image classification and object detection compared with the 3x3 kernel used in MobileNet V3. As a result, MixNet proposes running convolutions with different kernel sizes in parallel and combining them via concatenation, an operation named MdConv. MdConv is similar to a group-wise convolution in which each group has a different kernel size.
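```python
import tensorflow as tf


def mdconv(x, filters, **args):
    # One channel group per kernel; each entry of `filters` may have a different kernel size.
    G = len(filters)
    y = []
    for xi, fi in zip(tf.split(x, G, axis=-1), filters):
        # **args forwards the remaining depthwise_conv2d arguments (strides, padding, ...).
        # The padding must keep the spatial size equal across groups for the concat to work.
        y.append(tf.nn.depthwise_conv2d(xi, fi, **args))
    return tf.concat(y, axis=-1)
```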