SDE-Net Paper Summary & Analysis

Paper: https://arxiv.org/pdf/2008.10546.pdf

Discussion led by Phillip Si & Melinda Fang, Intelligent Systems subteam

Objectives of the Paper

Three-sentence paper summary (adapted from the abstract): Uncertainty quantification is a fundamental yet unsolved problem for deep learning. The Bayesian framework provides a principled way of estimating uncertainty but often does not scale to modern deep neural networks. The authors propose a new method for quantifying the uncertainty of DNNs from a dynamical-system perspective.

Deep learning techniques have demonstrated high performance on numerous tasks, but recent research on adversarial examples and neural network stability has shown that these models can fail unexpectedly, making current neural networks difficult to deploy in high-risk areas like aviation, the judiciary, and medicine. Research is therefore needed to give neural networks better uncertainty estimation, allowing humans to intervene whenever the model's confidence is low. A great example of this is self-driving cars, where it is far preferable for the network to inform the driver that it cannot operate reliably than to blindly make decisions.

To address this, it is useful for models to provide a confidence or uncertainty value with each prediction. In particular, whenever a model is given data unlike anything it saw during training, known as out-of-distribution (OOD) data, it should report high uncertainty in its prediction.

Typically, a classifier model has to decide between two outcomes even when it has no clue, which amounts to flipping a coin. In real life, a model used for diagnosis should care not only about accuracy but also about how certain each prediction is. If the uncertainty is high, a doctor can take that into account during their decision process.

Some models are used in low-risk settings (think Netflix movie recommendations), so they may not require explanations. For higher-stakes applications, however, interpretability is needed to increase the social acceptance of AI.

There are two types of uncertainty: aleatoric uncertainty, the natural randomness inherent in the task (for example, sensor or label noise), and epistemic uncertainty, the model uncertainty caused by an insufficient amount of training data. In many cases it is preferable to separate these two sources, as a particular ML task may care more about one kind of uncertainty than the other.

Paper Contributions

Kong et al. approach the problem of distinguishing between aleatoric and epistemic uncertainty by building an uncertainty-aware neural network. This network models the passage of samples through the hidden layers as a dynamical process, akin to the motion of particles in a liquid at the molecular level, hence the use of Brownian motion in the formulation. If the model receives an input similar to what it has seen before, the variance of the Brownian motion stays fairly small; conversely, if the input falls in a region the model is unfamiliar with, the variance becomes large.

The proposed way to quantify uncertainty in deep neural networks (DNNs) is to view the DNN's transformations as the state evolution of a stochastic dynamical system. A Brownian motion term is then introduced to capture epistemic uncertainty. The authors propose SDE-Net, a neural stochastic differential equation model, and analyze the existence and uniqueness of its solution.
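Concretely, the hidden state x_t of the network is modeled as the solution of a stochastic differential equation (the notation below is our paraphrase of the paper's formulation, not a verbatim quote):

dx_t = f(x_t, t; θ_f) dt + g(x_0; θ_g) dW_t

Here f is the drift net, g is the diffusion net, and W_t is a Brownian motion. Because the diffusion term g is a function of the input x_0, the network can assign a large diffusion (high epistemic uncertainty) to unfamiliar inputs and a small one to familiar inputs.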

This paper introduces SDE-Net, which consists of two components, the drift net and the diffusion net. The drift net captures aleatoric uncertainty and dominates the dynamics for in-distribution data. The model output is represented as a categorical distribution for classification and as a Gaussian distribution for regression. The diffusion net controls the diffusion of the system and dominates for out-of-distribution data; it is trained so that the Brownian motion has low variance for in-distribution inputs and high variance for out-of-distribution inputs. In addition, the SDE-Net parameters are shared across layers (time steps) to reduce memory requirements.
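To make the layer-as-SDE idea concrete, here is a minimal PyTorch sketch (not the authors' code) of an SDE-Net-style block: an Euler-Maruyama discretization of the dynamics above, with small stand-in drift and diffusion nets whose architecture is purely illustrative.

```python
# Minimal sketch of an SDE-Net-style forward pass:
# Euler-Maruyama discretization of dx = f(x, t) dt + g(x0) dW.
import torch
import torch.nn as nn


class SDEBlock(nn.Module):
    def __init__(self, dim: int, num_steps: int = 10):
        super().__init__()
        self.num_steps = num_steps
        # Drift net f(x, t): the deterministic transformation (aleatoric side);
        # its parameters are shared across all time steps.
        self.drift = nn.Sequential(
            nn.Linear(dim + 1, dim), nn.Tanh(), nn.Linear(dim, dim)
        )
        # Diffusion net g(x0): a per-sample diffusion magnitude (epistemic side);
        # trained to be small in-distribution and large out-of-distribution.
        self.diffusion = nn.Sequential(nn.Linear(dim, 1), nn.Softplus())

    def forward(self, x0: torch.Tensor) -> torch.Tensor:
        x = x0
        dt = 1.0 / self.num_steps
        sigma = self.diffusion(x0)  # g(x0), shape (batch, 1)
        for step in range(self.num_steps):
            t = torch.full((x.size(0), 1), step * dt)
            drift = self.drift(torch.cat([x, t], dim=1))  # f(x, t)
            noise = torch.randn_like(x)                   # Brownian increment
            x = x + drift * dt + sigma * noise * dt ** 0.5
        return x


# Repeated stochastic passes over the same input give different outputs;
# their spread reflects the uncertainty the diffusion net assigns.
block = SDEBlock(dim=16)
x = torch.randn(4, 16)
samples = torch.stack([block(x) for _ in range(10)])
print(samples.var(dim=0).mean())
```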

The authors tested how the estimated uncertainty can improve model robustness on three benchmarks, out-of-distribution (OOD) detection, misclassification detection, and adversarial sample detection, comparing against current state-of-the-art methods on each task. The architecture they used was a ResNet with the residual blocks replaced by an SDE-Net block. On OOD detection, SDE-Net had worse RMSE than DeepEnsemble but better AUROC and AUPR, and a similar story held for the other tasks and datasets: in general, slightly worse performance on the headline metric that state-of-the-art models are tuned for, but better performance on most or all of the other metrics. In active learning, however, SDE-Net proved far superior on the Year Prediction MSD regression dataset: its RMSE consistently decreased with further acquisitions, while the other methods showed negligible change after a certain point or even got worse.
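As a rough illustration of how such uncertainty estimates can be turned into a detection score, the sketch below rates each input by its predictive entropy averaged over repeated stochastic forward passes and evaluates OOD detection with AUROC. The toy dropout model and the synthetic "OOD" shift are our own stand-ins, not the paper's setup; any stochastic classifier (such as an SDE-Net) could be dropped in.

```python
import torch
import torch.nn as nn
from sklearn.metrics import roc_auc_score


def uncertainty_score(model, x: torch.Tensor, n_passes: int = 10) -> torch.Tensor:
    """Predictive entropy of the mean softmax over repeated stochastic passes."""
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=1) for _ in range(n_passes)]
        )
    mean_probs = probs.mean(dim=0)
    return -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=1)


# Toy stand-in: dropout keeps the forward pass stochastic, as the SDE block would.
model = nn.Sequential(nn.Linear(16, 32), nn.Dropout(0.5), nn.ReLU(), nn.Linear(32, 10))
model.train()  # keep dropout active so repeated passes differ

id_scores = uncertainty_score(model, torch.randn(100, 16))            # "in-distribution"
ood_scores = uncertainty_score(model, 5 + 3 * torch.randn(100, 16))   # shifted "OOD"
labels = [0] * len(id_scores) + [1] * len(ood_scores)
print("AUROC:", roc_auc_score(labels, torch.cat([id_scores, ood_scores]).numpy()))
```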

Paper Limitations, Further Research, and/or Potential Applications

SDE-Net provides future directions for modeling uncertainties with neural nets.

Their approach not only outperforms many other state-of-the-art techniques but is also efficient for modern DNNs, which typically have a very large number of parameters. It can potentially equip neural networks with meaningful uncertainty quantification for safety-critical applications such as self-driving vehicles and automated medical diagnosis, where it is important to know what a model does not know.

However, some results are puzzling, particularly in the active learning section of the paper. In addition, the choice of metrics seems somewhat biased toward their model; for example, in the classification results the secondary metrics appear to be weighted more heavily than classification accuracy, which is usually the main metric for evaluating results on these datasets.