Bayesian Online Training of Neural Networks

We view the neural network as a probabilistic model. In this project, we propose an online Variational Bayes approximation. In an online approach, data points arrive one at a time, and the parameters of the neural network are updated after each observation. The Bayesian framework makes online learning natural to implement: via Bayes' rule, the posterior after each observation simply becomes the prior for the next one. Assuming the parameters change only slightly between observations lets us derive an analytical approximation to the posterior distribution of the updated parameters. Using this approximation, we can build an online learning algorithm without resorting to a learning-rate hyperparameter.
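Concretely, and in our own notation rather than the paper's, the online update is the recursive application of Bayes' rule, where the posterior after observation n-1 serves as the prior for observation n; online Variational Bayes then projects each (intractable) intermediate posterior back onto a tractable family Q, here factorized Gaussians:

```latex
% Recursive Bayesian update: yesterday's posterior is today's prior.
p(\theta \mid D_n) = \frac{p(x_n \mid \theta)\, p(\theta \mid D_{n-1})}{p(x_n \mid D_{n-1})}

% Online Variational Bayes: project the updated posterior onto the family Q.
q_n(\theta) = \arg\min_{q \in \mathcal{Q}}
  \mathrm{KL}\!\left( q(\theta) \,\middle\|\, \tfrac{1}{Z_n}\, p(x_n \mid \theta)\, q_{n-1}(\theta) \right)
```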
What are Bayesian Neural Networks?
Neural networks with a prior distribution over their weights, which makes the weights themselves uncertain. This uncertainty in the weights propagates into uncertainty about the predictions and provides robustness against overfitting.
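In equations (our notation): prediction averages over the weight posterior instead of committing to a single weight vector, which is exactly where the predictive uncertainty comes from. In practice the integral is approximated by Monte Carlo with K sampled networks:

```latex
p(y \mid x, D) = \int p(y \mid x, \theta)\, p(\theta \mid D)\, d\theta
  \approx \frac{1}{K} \sum_{k=1}^{K} p\left(y \mid x, \theta^{(k)}\right),
  \qquad \theta^{(k)} \sim p(\theta \mid D)
```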
Why do we need Bayesian Learning?
1. Uncertainty - regular neural networks compute point estimates of their weights and therefore make point predictions as well. Moreover, NNs tend to be overly confident about the predicted class, value, or action (when classifying cats and dogs it is not too bad to classify some dogs as cats, but in medical or autonomous-driving applications such overconfidence may be fatal). A BNN instead lets us sample many networks from the weight distribution and read the uncertainty off the spread of their outputs, as shown in the sketch after this list.
2. Overfitting - modern NNs are prone to overfitting (they do not generalize well) and require large datasets to alleviate this problem. The Bayesian approach acts as a natural regularizer.
3. Catastrophic Forgetting - the tendency of NNs to rapidly lose previously learned performance when their input distribution changes (e.g., when switching tasks or data sources). Under Bayes' rule, weights the model is already certain about (the important ones) change little, while uncertain weights remain free to adapt, thus enabling Continual Learning.
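Below is a minimal NumPy sketch (not the tf-bgd API; the network shape, names, and values are illustrative assumptions) of how weight uncertainty turns into predictive uncertainty: each weight has a posterior mean mu and standard deviation sigma, and we sample K networks and look at the spread of their outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 1-hidden-layer regression net: x -> tanh(x @ W1 + b1) @ W2 + b2
shapes = {"W1": (1, 16), "b1": (16,), "W2": (16, 1), "b2": (1,)}
mu    = {k: rng.normal(0.0, 0.1, s) for k, s in shapes.items()}  # posterior means
sigma = {k: 0.05 * np.ones(s)       for k, s in shapes.items()}  # posterior stds

def forward(theta, x):
    h = np.tanh(x @ theta["W1"] + theta["b1"])
    return h @ theta["W2"] + theta["b2"]

def predict(x, K=100):
    """Monte Carlo estimate of the predictive mean and std at inputs x."""
    samples = []
    for _ in range(K):
        # Reparameterization: theta = mu + eps * sigma, with eps ~ N(0, 1)
        theta = {k: mu[k] + rng.standard_normal(shapes[k]) * sigma[k]
                 for k in shapes}
        samples.append(forward(theta, x))
    samples = np.stack(samples)                  # shape (K, N, 1)
    return samples.mean(axis=0), samples.std(axis=0)

x = np.linspace(-3, 3, 50).reshape(-1, 1)
mean, std = predict(x)   # std is the model's predictive uncertainty
```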
Bayesian Gradient Descent (BGD) Algorithm
An algorithm developed to enable continual learning in BNNs and thus mitigate catastrophic forgetting. The original paper: "Bayesian Gradient Descent: Online Variational Bayes Learning with Increased Robustness to Catastrophic Forgetting and Weight Pruning" by Chen Zeno, Itay Golan, Elad Hoffer, and Daniel Soudry.
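As a rough illustration, here is a minimal NumPy sketch of a single BGD step under our reading of the paper's update rules; the toy quadratic loss, its gradient, and all names are our own assumptions (in practice the gradients come from backprop, and tf-bgd is the actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5])          # toy optimum

def grad_loss(theta):
    return theta - target                     # gradient of L(theta) = 0.5*||theta - target||^2

def bgd_step(mu, sigma, K=10, eta=1.0):
    """One BGD update of the variational parameters of N(mu, sigma^2).

    K   : number of Monte Carlo samples of the weights
    eta : scaling constant from the paper (the theory suggests ~1)
    """
    eps = rng.standard_normal((K, mu.size))           # eps ~ N(0, 1)
    thetas = mu + eps * sigma                         # reparameterization
    grads = np.stack([grad_loss(t) for t in thetas])  # (K, d)
    g = grads.mean(axis=0)                            # E[dL/dtheta]
    g_eps = (grads * eps).mean(axis=0)                # E[dL/dtheta * eps]
    # Mean update: the effective per-weight step size is eta * sigma^2,
    # so "certain" weights (small sigma) barely move -- this is what
    # protects important weights and mitigates catastrophic forgetting.
    mu_new = mu - eta * sigma**2 * g
    # Closed-form std update (the paper's fixed-point solution):
    sigma_new = (sigma * np.sqrt(1.0 + (0.5 * sigma * g_eps)**2)
                 - 0.5 * sigma**2 * g_eps)
    return mu_new, sigma_new

mu, sigma = np.zeros(3), np.ones(3)
for _ in range(200):
    mu, sigma = bgd_step(mu, sigma)
# mu approaches the target while sigma shrinks as evidence accumulates
```

Note how the per-weight step size sigma^2 is adapted automatically from the uncertainty, which is why no hand-tuned learning rate is needed.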
This Project's Main Goal
We implement the BGD algorithm as a "black box" that lets users experiment with a BNN trained by BGD (see the video at the top) and examine the uncertainty of its output on a simple regression task as a proof of concept. The implementation is Python-based, uses the TensorFlow framework, and is called "tf-bgd".
Download and Learn
The project is available on GitHub at: https://github.com/taldatech/tf-bgd
Video: https://youtu.be/fa-xLXTzZ8I