Performance Guaranteed Network Acceleration via High-Order Residual Quantization
Scholars from Shanghai Jiao Tong University have proposed a new method for compressing neural networks. Their method is able to reduce the size of a network to roughly 1/30 of its original size while maintaining accuracy.
Unlike existing pruning or quantization methods, the proposed method is based on recursively binarizing the network's quantization residual: in each stage, the residual left over from the previous stage is binarized to form a new binary term; the updated approximation is then used to compute the residual for the next stage.
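A minimal sketch of that recursive binarization idea, assuming the scale factor of each binary term is chosen as the mean absolute value of the current residual (the closed-form optimum for fitting a scaled sign tensor, as in XNOR-Net-style binarization); the function name `horq` and the `order` parameter are illustrative, not the paper's API:

```python
import numpy as np

def horq(x, order=2):
    """Approximate a real-valued tensor x as a sum of scaled binary
    tensors, x ~= sum_i beta_i * H_i. Each stage binarizes the current
    residual, then subtracts the binary term to form the next residual."""
    residual = x.astype(np.float64)
    betas, binaries = [], []
    for _ in range(order):
        beta = np.mean(np.abs(residual))        # optimal scale for a binary fit
        h = np.where(residual >= 0, 1.0, -1.0)  # binary tensor in {-1, +1}
        betas.append(beta)
        binaries.append(h)
        residual = residual - beta * h          # pass residual to next stage
    return betas, binaries

# Usage: a second-order approximation of a random weight tensor.
x = np.random.randn(4, 4)
betas, binaries = horq(x, order=2)
approx = sum(b * h for b, h in zip(betas, binaries))
print("relative error:", np.linalg.norm(x - approx) / np.linalg.norm(x))
```

Each added stage shrinks the reconstruction error, so the order trades accuracy against the number of binary terms stored and computed.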
Written on August 29, 2017