Performance Guaranteed Network Acceleration via High-Order Residual Quantization


Researchers from Shanghai Jiao Tong University have proposed a new method for compressing neural networks. Their method can reduce the network to roughly 1/30 of its original size while maintaining accuracy.

Architecture

Unlike existing pruning or quantization methods, the proposed method recursively binarizes the residual of the network: at each stage, the residual left by the current binary approximation is binarized to form a new binary term, and the updated approximation is then used to compute the next residual.
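To make the recursion concrete, below is a minimal NumPy sketch of the residual-binarization idea described above: a real-valued tensor is approximated as a sum of scaled sign tensors, where each new term binarizes whatever the previous approximation missed. The function name, the order-2 default, and the use of the mean absolute value as the scaling factor are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def residual_binarize(x, order=2):
    """Approximate x as a sum of scaled binary tensors by recursively
    binarizing the residual (order-K residual quantization sketch)."""
    residual = x.astype(np.float64)
    scales, binaries = [], []
    for _ in range(order):
        # Scale each binary term by the mean absolute value of the residual
        # (the least-squares-optimal scale for a sign() binarization).
        beta = np.mean(np.abs(residual))
        b = np.sign(residual)
        b[b == 0] = 1.0  # keep entries strictly binary
        scales.append(beta)
        binaries.append(b)
        # The next stage binarizes what the current approximation missed.
        residual = residual - beta * b
    approx = sum(beta * b for beta, b in zip(scales, binaries))
    return approx, scales, binaries

if __name__ == "__main__":
    x = np.random.randn(4, 4)
    for k in (1, 2, 3):
        approx, _, _ = residual_binarize(x, order=k)
        err = np.linalg.norm(x - approx) / np.linalg.norm(x)
        print(f"order {k}: relative approximation error {err:.3f}")
```

Running the snippet shows the relative error shrinking as the order grows, which is the intuition behind using a higher-order binary expansion instead of a single binarization.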

Experimental results on MNIST

Experimental results on CIFAR-10

Speedup ratio versus number of bits

Written on August 29, 2017