Implicit bias in optimization algorithms

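In many machine learning problems, most notably the training of deep neural networks, different optimization algorithms reach different solutions even when they minimize the same objective. This tendency of an optimizer to favor particular solutions is known as the implicit bias of the optimization algorithm.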

Soudry et al. have shown that gradient descent (GD) minimizing the logistic loss on linearly separable data converges in direction, given enough time, to the L2 max-margin solution, which is the solution obtained by a hard-margin SVM.
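
The result above can be illustrated with a minimal sketch. It assumes a small hand-made 2D dataset, a linear classifier with no bias term, and plain GD on the summed logistic loss; the data, the step size, the step counts, and the names `logistic_gd`, `cosine` and `w_svm` are illustrative choices, not taken from the project. For this particular dataset the hard-margin solution can be derived by hand, so no SVM solver is needed.

```python
import numpy as np

# Hypothetical toy data, linearly separable through the origin; labels in {-1, +1}.
X = np.array([[2.0, 0.0], [0.0, 1.0], [3.0, 3.0],
              [-2.0, 0.0], [0.0, -1.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

# Hard-margin SVM without bias: minimize ||w||^2 subject to y_i * <w, x_i> >= 1.
# For this data the constraints reduce to w1 >= 0.5 and w2 >= 1,
# so the minimum-norm solution is (0.5, 1).
w_svm = np.array([0.5, 1.0])


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def logistic_gd(X, y, lr=0.1, steps=50_000):
    """Plain gradient descent on the summed logistic loss, starting from zero."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        margins = y * (X @ w)
        # Gradient of sum_i log(1 + exp(-m_i)) is -sum_i y_i x_i / (1 + exp(m_i)).
        grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).sum(axis=0)
        w -= lr * grad
    return w


w_early = logistic_gd(X, y, steps=10)
w_late = logistic_gd(X, y, steps=50_000)
cos_early = cosine(w_early, w_svm)
cos_late = cosine(w_late, w_svm)

print(f"after 10 steps:     ||w|| = {np.linalg.norm(w_early):.3f}, cosine to max-margin = {cos_early:.4f}")
print(f"after 50,000 steps: ||w|| = {np.linalg.norm(w_late):.3f}, cosine to max-margin = {cos_late:.4f}")
```

The norm of the weight vector keeps growing while its direction drifts toward the max-margin direction. The drift is slow, so the cosine approaches 1 only gradually as the number of steps increases.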

In our project, we empirically compare the behavior of different optimization algorithms on linearly separable data. We analyze the impact of hyperparameters and initialization on the result, and attempt to characterize the bias of the solutions obtained. GD serves as the baseline against which all performance and behavior are compared.
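
A comparison of this kind could be set up roughly as follows. This is only a sketch under stated assumptions: the toy dataset, the choice of heavy-ball momentum as the second optimizer, the hyperparameter values, and the names `gd`, `heavy_ball` and `report` are all hypothetical and are not the project's actual setup. Each run is summarized by the cosine between the learned direction and the hand-derived max-margin direction, and by the normalized L2 margin.

```python
import numpy as np

# Hypothetical toy data, linearly separable through the origin; labels in {-1, +1}.
X = np.array([[2.0, 0.0], [0.0, 1.0], [3.0, 3.0],
              [-2.0, 0.0], [0.0, -1.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
w_svm = np.array([0.5, 1.0])  # hand-derived hard-margin solution for this data


def grad(w):
    """Gradient of the summed logistic loss for a bias-free linear classifier."""
    margins = y * (X @ w)
    return -(X * (y / (1.0 + np.exp(margins)))[:, None]).sum(axis=0)


def gd(w0, lr=0.1, steps=20_000):
    """Baseline: plain gradient descent."""
    w = w0.copy()
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def heavy_ball(w0, lr=0.1, beta=0.9, steps=20_000):
    """Gradient descent with heavy-ball momentum (illustrative second optimizer)."""
    w, v = w0.copy(), np.zeros_like(w0)
    for _ in range(steps):
        v = beta * v - lr * grad(w)
        w += v
    return w


def report(w):
    """Return (cosine to the max-margin direction, normalized L2 margin)."""
    direction = w / np.linalg.norm(w)
    cos = float(direction @ w_svm / np.linalg.norm(w_svm))
    margin = float(np.min(y * (X @ direction)))
    return cos, margin


rng = np.random.default_rng(0)  # fixed seed so the random initialization is reproducible
inits = {"zero": np.zeros(2), "random": rng.normal(size=2)}
results = {}
for opt_name, opt in {"gd": gd, "heavy_ball": heavy_ball}.items():
    for init_name, w0 in inits.items():
        results[(opt_name, init_name)] = report(opt(w0))
        cos, margin = results[(opt_name, init_name)]
        print(f"{opt_name:10s} init={init_name:6s} cosine={cos:.4f} margin={margin:.4f}")
```

Other optimizers, learning rates, or initialization scales can be added by extending the two dictionaries. The largest attainable normalized margin on this data is 1 / ||w_svm||, which gives a reference value for the second column.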