Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block when using identity mappings as the skip connections and after-addition activation. A series of ablation experiments support the importance of these identity mappings. This motivates us to propose a new residual unit, which makes training easier and improves generalization. We report improved results using a 1001-layer ResNet on CIFAR-10 (4.62% error) and CIFAR-100, and a 200-layer ResNet on ImageNet. Code is available at: https://github.com/KaimingHe/resnet-1k-layers.
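For reference, the propagation formulation alluded to in the abstract can be written as follows (a transcription in the paper's notation, where $x_l$ denotes the input to the $l$-th residual unit, $\mathcal{F}$ its residual function with weights $\mathcal{W}_i$, and $\mathcal{E}$ the training loss). When both the skip connection and the after-addition activation are identity mappings:

$$
x_L = x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i),
\qquad
\frac{\partial \mathcal{E}}{\partial x_l}
= \frac{\partial \mathcal{E}}{\partial x_L}
\left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i) \right).
$$

Any deeper unit $L$ receives $x_l$ additively, and the constant term $1$ in the gradient means part of the error signal reaches any shallower unit without passing through any weight layer, which is the sense in which signals propagate "directly".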
Identity Mappings in Deep Residual Networks
This paper shows that using identity mappings for the skip connections and pre-activation in residual blocks lets signals flow unimpeded, making very deep networks easier to train. Through theoretical analysis and ablation studies, the authors introduce a pre-activation residual unit that enables successful training of 1000-layer ResNets and improves accuracy on CIFAR-10/100 and ImageNet; this design is commonly known as ResNet-v2 and has influenced numerous later deep vision models.
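As a concrete illustration of the pre-activation residual unit described above, here is a minimal sketch in PyTorch. This is not the authors' released implementation (linked above, which is written in Torch); the class name, channel count, and 3x3 kernel sizes are illustrative assumptions.

```python
# Minimal sketch of a "pre-activation" residual unit (ResNet-v2 style).
# BN and ReLU come *before* each convolution, and nothing is applied after
# the addition, so the skip path stays a pure identity mapping.
import torch
import torch.nn as nn

class PreActBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(torch.relu(self.bn2(out)))
        return x + out  # identity skip; no post-addition activation

# Example usage with an assumed input shape:
# block = PreActBlock(channels=16)
# y = block(torch.randn(1, 16, 32, 32))  # output shape matches the input
```

The key difference from the original residual unit is that batch normalization and ReLU precede the weight layers, so the information path from x to x + F(x) is kept clear of any nonlinearity or normalization.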
Introduction
Information
- Website: arxiv.org
- Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
- Published date: 2016/03/16