# On computational barriers in data science and the paradoxes of deep learning

**Presented by:** Anders Hansen (University of Cambridge)
**Date:** Thursday 2 November 2017, 11:10 to 12:00
**Venue:** INI Seminar Room 1

**Abstract:**
The use of regularisation techniques such as l^1 and Total Variation in Basis Pursuit and Lasso, as well as linear and semidefinite programming and neural networks (deep learning), has seen great success in data science. Yet, we will discuss the following paradox: it is impossible to design algorithms that accurately find minimisers for these problems when given inaccurate input data, even when the inaccuracies can be made arbitrarily small. The paradox implies that any algorithm designed to solve these problems will fail in the following way: for fixed dimensions and any small accuracy parameter epsilon > 0, one can choose an arbitrarily large time T and find an input such that the algorithm will run for longer than T without reaching epsilon accuracy. Moreover, it is impossible to determine when the algorithm should halt in order to achieve an epsilon-accurate solution. The largest epsilon for which this failure occurs is called the Breakdown-epsilon. Typically, the Breakdown-epsilon is > 1/2 even when the input is bounded by one, is well-conditioned, and the objective function can be computed with arbitrary accuracy.
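For reference, the standard formulations of the two regularised problems named above (these definitions are supplied here for context; with measurement matrix $A \in \mathbb{R}^{m \times n}$, data $y \in \mathbb{R}^m$, and regularisation parameter $\lambda > 0$) are

$$\text{(Basis Pursuit)} \qquad \min_{x \in \mathbb{R}^n} \|x\|_1 \quad \text{subject to} \quad Ax = y,$$

$$\text{(Lasso)} \qquad \min_{x \in \mathbb{R}^n} \tfrac{1}{2}\|Ax - y\|_2^2 + \lambda\|x\|_1.$$

The paradox concerns computing minimisers of such problems when $A$ and $y$ are only known approximately.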

Despite the paradox, we explain why many modern algorithms empirically perform very well in real-world scenarios. In particular, when restricting to subclasses of problems, the Breakdown-epsilon may shrink. Moreover, one can typically find algorithms that run in polynomial time in L and n, where L < log(1/Breakdown-epsilon) is the number of correct digits in the computed solution and n is the size of the input data. However, the Breakdown-epsilon is the breaking point: for L > log(1/Breakdown-epsilon), any algorithm, even a randomised one, becomes arbitrarily slow and cannot halt with a guarantee of L correct digits in the output.
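A minimal toy sketch (not from the talk; the 1x2 example below is an illustrative assumption) of how a Breakdown-epsilon larger than 1/2 can arise: for Basis Pursuit with matrix A = [1, a] and data y = 1, the l^1 minimiser jumps discontinuously as a crosses 1, so inputs that agree to twelve digits can have minimisers a distance of roughly sqrt(2) apart.

```python
import numpy as np

def basis_pursuit_minimiser(a, y=1.0):
    """Exact l^1 minimiser of: min |x1| + |x2| subject to x1 + a*x2 = y.

    Feasible points are (y - a*t, t); for |a| < 1 the l^1 norm
    y + t*(1 - a) is minimised at t = 0, and for |a| > 1 it is
    minimised at t = y/a. Hence the minimiser jumps at a = 1.
    """
    if abs(a) < 1.0:
        return np.array([y, 0.0])
    return np.array([0.0, y / a])

eps = 1e-12                                  # tiny inaccuracy in the input
x_lo = basis_pursuit_minimiser(1.0 - eps)    # minimiser (1, 0)
x_hi = basis_pursuit_minimiser(1.0 + eps)    # minimiser ~ (0, 1)
gap = np.linalg.norm(x_lo - x_hi)
print(gap)                                   # ~ sqrt(2), far above 1/2
```

Since no algorithm reading the input to finite accuracy can distinguish a = 1 - eps from a = 1 + eps, it cannot guarantee the minimiser to better than about sqrt(2)/2 here, matching the claim that the Breakdown-epsilon typically exceeds 1/2.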

The above result leads to the paradoxes of deep learning: (1) one cannot guarantee the existence of algorithms that accurately train a neural network, even if the objective has a unique minimum and no spurious local minima. Moreover, (2) one can have a 100% success rate on arbitrarily many test cases, yet uncountably many misclassifications on elements that are arbitrarily close to the training set.

This is joint work with Alex Bastounis (Cambridge) and Verner Vlacic (ETH).