“MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” Paper Summary & Analysis

Introduction

To increase model accuracy, data scientists often keep growing the size of their models. However, this comes at the cost of efficiency and heavier computational hardware requirements, and the resulting oversized models can become unsuitable for some tasks. Self-driving cars, for example, need to identify hazards quickly; if their systems respond too slowly, the consequences can be dangerous. The goal of this paper is to develop smaller, thinner CNNs that perform comparably to larger models while remaining lightweight enough to run on devices with limited computational power, such as mobile phones. Rather than pruning pre-existing networks to shrink them, the paper builds a lightweight network from scratch and emphasizes thinner, deeper networks over shallower ones.

The MobileNet Architecture

This paper approaches the task of building more lightweight CNNs by introducing depthwise separable convolutions into the architecture. A depthwise separable convolution is a form of factorized convolution that splits a standard convolution into a depthwise convolution (one filter applied per input channel) and a 1x1 convolution, referred to as a pointwise convolution, that combines the depthwise outputs across channels.

[Figures: standard vs. depthwise separable convolution filters. Sources: https://www.cs.cornell.edu/courses/cs5670/2019sp/lectures/lec21_cnns.pdf, https://towardsdatascience.com/a-basic-introduction-to-separable-convolutions-b99ec3102728, https://arxiv.org/pdf/1704.04861.pdf]
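To make the factorization concrete, below is a minimal sketch of a depthwise separable convolution block in the MobileNet style, with batch norm and ReLU after both the depthwise and pointwise stages as in the paper. The sketch assumes PyTorch (not the paper's original implementation), and the layer names and channel counts are illustrative.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel 3x3 (depthwise) conv
    followed by a 1x1 (pointwise) conv that mixes channels.
    Illustrative sketch, not the authors' code."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # Depthwise stage: groups=in_channels applies one 3x3 filter per input channel.
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        self.bn1 = nn.BatchNorm2d(in_channels)
        # Pointwise stage: a 1x1 conv combines the depthwise outputs into out_channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.bn1(self.depthwise(x)))
        x = self.relu(self.bn2(self.pointwise(x)))
        return x


# Rough weight-count comparison against a standard 3x3 convolution,
# using illustrative numbers (256 input and 256 output channels):
standard = 3 * 3 * 256 * 256          # ~590k weights
separable = 3 * 3 * 256 + 256 * 256   # ~68k weights
print(standard / separable)           # roughly an 8-9x reduction
```

The savings come from the pointwise stage no longer paying the 3x3 spatial cost and the depthwise stage no longer paying the cross-channel cost, which is why MobileNets can stay deep while remaining small.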

A Critique of the Paper

While the paper shows compelling evidence for the effectiveness of MobileNets on a variety of tasks, it offers little explanation for the differences in performance. This is especially puzzling in the case of the PlaNet MobileNet architecture, which outperforms the original PlaNet model at two localization scales despite having significantly fewer parameters.

Applications

Because of their small size and efficiency, MobileNets let users trade latency against accuracy by adjusting two hyper-parameters (a width multiplier and a resolution multiplier, sketched below) to find the right-sized model for their use case. The paper evaluates MobileNets on a number of applications, including large-scale geolocalization, face attribute classification, object detection, and face embeddings. We see potential applications in contexts where a loss of accuracy is less costly (for example, VR applications versus automated vehicles) than the ability to meet scale constraints and the improved user experience from faster responses. In addition, a MobileNet might even be coupled with a more complex model, giving a timely response that is then “confirmed” by the complex model, which is likely to have higher accuracy.
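As a rough sketch of how those two hyper-parameters work: the width multiplier α thins every layer by scaling its channel counts, and the resolution multiplier ρ shrinks the input image. The helper functions and numbers below are illustrative, not settings from the paper's code.

```python
# Illustrative sketch of MobileNet's two trade-off hyper-parameters.

def scale_channels(base_channels, alpha):
    """Width multiplier: e.g. a 512-channel layer with alpha=0.5 becomes 256 channels."""
    return max(1, int(base_channels * alpha))

def scale_resolution(base_resolution, rho):
    """Resolution multiplier: e.g. a 224px input with rho ~= 0.857 becomes 192px."""
    return int(base_resolution * rho)

# The cost of a depthwise separable layer shrinks roughly with alpha^2
# (channels appear in both the depthwise and pointwise terms) and with rho^2
# (spatial area), so small changes buy large latency reductions.
for alpha in (1.0, 0.75, 0.5, 0.25):
    channels = scale_channels(512, alpha)
    print(f"alpha={alpha}: 512 -> {channels} channels, ~{alpha**2:.2f}x the multiply-adds")
```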
