Performance Guaranteed Network Acceleration via High-Order Residual Quantization
Scholars from Shanghai Jiao Tong University have proposed a new method for compressing neural networks. Their method is able to reduce the size of a network to roughly 1/30 of its original size while maintaining its accuracy.
Unlike existing pruning or quantization methods, the proposed method is based on recursively binarizing the residuals of the network: at each stage, the residual of the current binary approximation is binarized to form a new binary term with its own scaling factor; the residual of this refined approximation is then computed and passed to the next stage.
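The recursive scheme above can be sketched on a single tensor. This is a minimal illustration, not the authors' implementation: it assumes each stage approximates the residual as a scalar scale times a sign tensor (with the scale taken as the mean absolute value, which minimizes the squared error of that approximation), and that the approximation is the sum of all such terms.

```python
import numpy as np

def residual_binarize(x, order=2):
    """Approximate x as a sum of scaled binary tensors by
    recursively binarizing the residual (illustrative sketch)."""
    residual = x.astype(np.float64)
    terms = []
    for _ in range(order):
        beta = np.mean(np.abs(residual))  # least-squares scale for sign(residual)
        b = np.where(residual >= 0, 1.0, -1.0)  # binary tensor in {-1, +1}
        terms.append((beta, b))
        residual = residual - beta * b  # next stage binarizes this residual
    return terms

def reconstruct(terms):
    """Sum the scaled binary terms back into a dense approximation."""
    return sum(beta * b for beta, b in terms)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4))
for k in (1, 2, 3):
    approx = reconstruct(residual_binarize(x, order=k))
    err = np.linalg.norm(x - approx) / np.linalg.norm(x)
    print(f"order {k}: relative error {err:.3f}")
```

Each additional order adds one binary tensor plus one scalar, so storage stays close to 1 bit per weight per order while the approximation error shrinks stage by stage.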