Do Better ImageNet Models Transfer Better? Paper Summary & Analysis

Paper Objective

It is often implicitly assumed that models which perform well on ImageNet will also perform well on other computer vision (CV) tasks. This paper empirically investigates whether that assumption holds: do ImageNet-trained models transfer better to other CV datasets because of what they learned from ImageNet, or simply because their architectures are well suited to CV tasks in general? More broadly, the paper asks whether CV as a field is overfitting to the ImageNet dataset.

Background

The background needed for the paper is a basic familiarity with modern computer vision architectures and with ImageNet's role as the standard dataset for training state-of-the-art models. In addition, the reader should be familiar with transfer learning in computer vision and the two methods used in the paper's experiments (a minimal code sketch of both follows the list):

  1. Fixed feature extraction: the final classification layer of the ImageNet-trained network is removed in favour of a linear classifier that outputs predictions over the classes of the new (target) dataset, while the rest of the network is kept frozen.
  2. Fine-tuning: the weights of the ImageNet-pretrained model are used as the initialisation for a model that is then trained end-to-end on the new (target) dataset.
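Below is a minimal PyTorch/torchvision sketch of the two settings. The ResNet-50 backbone, the 100-class target dataset, and the SGD hyperparameters are illustrative choices for this post, not the authors' exact experimental setup.

```python
# Sketch of the two transfer settings, using torchvision for illustration.
import torch
import torch.nn as nn
from torchvision import models

num_target_classes = 100  # placeholder size for the target dataset

# 1) Fixed feature extraction: freeze the ImageNet-pretrained backbone and
#    train only a new linear classifier on top of the penultimate features.
feature_extractor = models.resnet50(weights="IMAGENET1K_V1")
for p in feature_extractor.parameters():
    p.requires_grad = False
feature_extractor.fc = nn.Linear(feature_extractor.fc.in_features,
                                 num_target_classes)  # only this layer trains

# 2) Fine-tuning: same ImageNet initialisation, but every weight is updated
#    on the target dataset (typically with a modest learning rate).
finetune_model = models.resnet50(weights="IMAGENET1K_V1")
finetune_model.fc = nn.Linear(finetune_model.fc.in_features, num_target_classes)
optimizer = torch.optim.SGD(finetune_model.parameters(), lr=1e-3, momentum=0.9)
```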

Paper Contributions

The paper performs a rigorous statistical analysis comparing ImageNet accuracy with transfer performance under both fixed feature extraction and fine-tuning. Using Spearman rank correlation, the authors test whether the relationship between ImageNet accuracy and transfer accuracy is statistically robust. Their main contribution is this analysis, which establishes with high confidence that better ImageNet performance is strongly correlated with better transfer performance, and so reassuringly demonstrates that computer vision, as a field, has not overfit to ImageNet as a dataset.
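As a toy illustration of this kind of rank-correlation analysis, the snippet below computes a Spearman coefficient with scipy; the accuracy numbers are invented placeholders, not results from the paper.

```python
# Hypothetical ImageNet top-1 accuracies and mean transfer accuracies
# for a handful of architectures (placeholder values only).
from scipy.stats import spearmanr

imagenet_top1   = [71.5, 76.1, 77.2, 78.8, 80.1]
transfer_scores = [82.3, 85.0, 85.9, 86.4, 87.7]

rho, p_value = spearmanr(imagenet_top1, transfer_scores)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```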

A secondary finding is that several regularization and training choices which improve ImageNet accuracy can actually hurt the transferability of fixed features:

  1. the absence of a scale parameter (γ) for batch normalization layers
  2. the use of label smoothing (see the sketch after this list)
  3. dropout
  4. the presence of an auxiliary classifier head
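Of these, label smoothing is the simplest to illustrate: the one-hot training target is mixed with a uniform distribution over the classes. A minimal sketch follows, with a hypothetical smoothing factor of 0.1.

```python
import numpy as np

def smooth_labels(one_hot: np.ndarray, epsilon: float = 0.1) -> np.ndarray:
    """Mix a one-hot target with the uniform distribution over classes."""
    num_classes = one_hot.shape[-1]
    return (1.0 - epsilon) * one_hot + epsilon / num_classes

# Example: 4-class one-hot target for class 2.
target = np.array([0.0, 0.0, 1.0, 0.0])
print(smooth_labels(target))  # [0.025 0.025 0.925 0.025]
```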
Paper: https://openaccess.thecvf.com/content_CVPR_2019/papers/Kornblith_Do_Better_ImageNet_Models_Transfer_Better_CVPR_2019_paper.pdf

Conclusion

Ultimately, because of this strong correlation, the CV community can safely continue to use ImageNet as a core benchmark for understanding model performance. It remains unclear, however, why certain regularization techniques reduce transfer performance rather than improving it, as one might expect. Nonetheless, this research shows that it is generally better to start from an ImageNet-pretrained model for other CV tasks than to initialize from random; even when there are no final accuracy gains, the drastically improved speed of convergence easily pays for itself.

Cornell Data Science

Cornell Data Science is an engineering project team @Cornell that seeks to prepare students for a career in data science.