
Timetable (VMVW02)

Generative models, parameter learning and sparsity

Monday 30th October 2017 to Friday 3rd November 2017

Monday 30th October 2017
09:00 to 09:40 Registration
09:40 to 09:50 Welcome from Christie Marr (INI Deputy Director)
09:50 to 10:40 James Nagy
Spectral Computed Tomography
Co-authors: Martin Andersen (Technical University of Denmark), Yunyi Hu (Emory University)

An active area of interest in tomographic imaging is the goal of quantitative imaging, where in addition to producing an image, information about the material composition of the object is recovered. In order to obtain material composition information, it is necessary to better model the image formation (i.e., forward) problem and/or to collect additional independent measurements. In x-ray computed tomography (CT), better modeling of the physics can be done by using the more accurate polyenergetic representation of source x-ray beams, which requires solving a challenging nonlinear ill-posed inverse problem. In this talk we explore the mathematical and computational problem of polyenergetic CT when it is used in combination with new energy-windowed spectral CT detectors. We formulate this as a regularized nonlinear least squares problem, which we solve by a Gauss-Newton scheme. Because the approximate Hessian system in the Gauss-Newton scheme is very ill-conditioned, we propose a preconditioner that effectively clusters eigenvalues and, therefore, accelerates convergence when the conjugate gradient method is used to solve the linear subsystems. Numerical experiments illustrate the convergence, effectiveness, and significance of the proposed method.
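The Gauss-Newton/CG loop described above can be sketched generically. The toy exponential (Beer-Lambert-style) forward model, the problem sizes, and the plain unpreconditioned CG solver below are illustrative stand-ins, not the talk's spectral CT model or its eigenvalue-clustering preconditioner:

```python
import numpy as np

# Toy nonlinear inverse problem with a Beer-Lambert-style forward model
# F(x) = exp(-A x); this is NOT the talk's spectral CT model, just a stand-in
# with the same regularized nonlinear least-squares structure.
rng = np.random.default_rng(0)
m, n = 40, 10
A = rng.uniform(0.0, 1.0, (m, n)) / n
x_true = rng.uniform(0.5, 1.5, n)
d = np.exp(-A @ x_true)          # noise-free data
lam = 1e-6                       # Tikhonov regularization parameter

def residual(x):
    return np.exp(-A @ x) - d

def jacobian(x):
    return -np.exp(-A @ x)[:, None] * A

def cg(H, b, tol=1e-12, maxit=500):
    """Plain conjugate gradients; a preconditioner would be applied here."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(maxit):
        Hp = H @ p
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

x = np.zeros(n)
for _ in range(20):                          # Gauss-Newton outer loop
    J = jacobian(x)
    g = J.T @ residual(x) + lam * x          # gradient of the objective
    H = J.T @ J + lam * np.eye(n)            # Gauss-Newton Hessian
    x = x + cg(H, -g)                        # inner CG solve by CG

err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
```

A preconditioner of the kind proposed in the talk would replace the plain residual `r` inside `cg` with the solution of a preconditioning system, clustering the eigenvalues of the Hessian.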
10:40 to 11:10 Morning Coffee
11:10 to 12:00 Eldad Haber
12:00 to 12:50 Christoph Brune
Cancer ID - From Spectral Segmentation to Deep Learning
One of the most important challenges in health is the fight against cancer. A desired goal is the early detection and guided therapy of cancer patients. A very promising approach is the detection and quantification of circulating tumor cells in blood, called liquid biopsy. However, this task is similar to looking for needles in a haystack, where the needles even have unclear shapes and materials. There is a strong need for reliable image segmentation, classification and a better understanding of the generative composition of tumor cells. For a robust and reproducible quantification of tumor cell features, automatic multi-scale segmentation is the key. In recent years, new theory and algorithms for nonlinear, non-local eigenvalue problems via spectral decomposition have been developed and shown to result in promising segmentation and classification results. We analyze different nonlinear segmentation approaches and evaluate how informative the resulting spectral responses are. The success of our analysis is supported by results on simulated cells and first European clinical studies. In the last part of this talk we switch the viewpoint and study first results for deep learning of tumor cells. Via generative models there is hope for understanding tumor cells much better; however, many mathematical questions arise. This is joint work with Leonie Zeune, Stephan van Gils, Guus van Dalum and Leon Terstappen.
12:50 to 14:00 Lunch @ Wolfson Court
14:00 to 14:50 Lars Ruthotto
PDE-based Algorithms for Convolution Neural Networks
This talk presents a new framework for image classification that exploits the relationship of the training of deep Convolution Neural Networks (CNNs) to the problem of optimally controlling a system of nonlinear partial differential equations (PDEs). This new interpretation leads to a variational model for CNNs, which provides new theoretical insight into CNNs and new approaches for designing learning algorithms. We exemplify the myriad benefits of the continuous network in three ways. First, we show how to scale deep CNNs across image resolutions using multigrid methods. Second, we show how to scale the depth of deep CNNs in a shallow-to-deep manner to gradually increase the flexibility of the classifier. Third, we analyze the stability of CNNs and present stable variants that are also reversible (i.e., information can be propagated from input to output layer and vice versa), which in combination allows training arbitrarily deep networks with limited computational resources. This is joint work with Eldad Haber (UBC), Lili Meng (UBC), Bo Chang (UBC), Seong-Hwan Jun (UBC), Elliot Holtham (Xtract Technologies).
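The continuous-network viewpoint can be illustrated with a minimal sketch: a residual block is one forward-Euler step of an ODE, and an antisymmetric weight matrix is one way to keep forward propagation stable. The width, depth and step size below are arbitrary choices, not the paper's architecture:

```python
import numpy as np

# A residual block  x_{k+1} = x_k + h * tanh(K_k x_k + b_k)  is one forward
# Euler step of the ODE  x'(t) = tanh(K(t) x + b(t)).  Width, depth and step
# size are arbitrary; antisymmetric K is one of the stability devices for
# continuous-depth networks (the multigrid and reversible variants are not shown).
rng = np.random.default_rng(1)
width, depth, h = 8, 50, 0.1

def resnet_forward(x, Ks, bs, h):
    for K, b in zip(Ks, bs):
        x = x + h * np.tanh(K @ x + b)   # one forward Euler step per layer
    return x

Ks = []
for _ in range(depth):
    J = rng.normal(size=(width, width))
    Ks.append(0.5 * (J - J.T))           # antisymmetric -> stable dynamics
bs = [np.zeros(width)] * depth

x0 = rng.normal(size=width)
xN = resnet_forward(x0, Ks, bs, h)
growth = np.linalg.norm(xN) / np.linalg.norm(x0)  # should stay moderate
```

With an antisymmetric kernel the underlying ODE has purely imaginary eigenvalues, so the signal norm neither explodes nor dies out as depth grows, which is the stability property the talk exploits.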
14:50 to 15:40 Gitta Kutyniok (Technische Universität Berlin)
Optimal Approximation with Sparsely Connected Deep Neural Networks
Despite the outstanding success of deep neural networks in real-world applications, most of the related research is empirically driven and a mathematical foundation is almost completely missing. One central task of a neural network is to approximate a function, which for instance encodes a classification task. In this talk, we will be concerned with the question of how well a function can be approximated by a neural network with sparse connectivity. Using methods from approximation theory and applied harmonic analysis, we will derive a fundamental lower bound on the sparsity of a neural network. By explicitly constructing neural networks based on certain representation systems, so-called $\alpha$-shearlets, we will then demonstrate that this lower bound can in fact be attained. Finally, we present numerical experiments, which surprisingly show that the standard backpropagation algorithm already generates deep neural networks obeying those optimal approximation rates. This is joint work with H. Bölcskei (ETH Zurich), P. Grohs (Uni Vienna), and P. Petersen (TU Berlin).
15:40 to 16:00 Eva-Maria Brinkmann
Enhancing fMRI Reconstruction by Means of the ICBTV-Regularisation Combined with Suitable Subsampling Strategies and Temporal Smoothing
Based on the magnetic resonance imaging (MRI) technology, fMRI is a noninvasive functional neuroimaging method, which provides maps of the brain at different time steps, thus depicting brain activity by detecting changes in the blood flow and hence constituting an important tool in brain research.
An fMRI screening typically consists of three stages: At first, there is a short low-resolution prescan to ensure the proper positioning of the proband or patient. Secondly, an anatomical high-resolution MRI scan is executed, and finally the actual fMRI scan takes place, where a series of data is acquired via fast MRI scans at consecutive time steps, thus illustrating the brain activity after a stimulus. In order to achieve an adequate temporal resolution in the fMRI data series, usually only a specific portion of the entire k-space is sampled.
Based on the assumption that the full high-resolution MR image and the fast acquired actual fMRI frames share a similar edge set (and hence the sparsity pattern with respect to the gradient), we propose to use the Infimal Convolution of Bregman Distances of the TV functional (ICBTV), first introduced in [1], to enhance the quality of the reconstructed fMRI data by using the full high-resolution MRI scan as a prior. Since in fMRI the hemodynamic response is commonly modelled by a smooth function, we moreover discuss the effect of suitable subsampling strategies in combination with temporal regularisation.

This is joint work with Julian Rasch, Martin Burger (both WWU Münster) and with Ville Kolehmainen (University of Eastern Finland).

[1] M. Moeller, E.-M. Brinkmann, M. Burger, and T. Seybold: Color Bregman TV. SIAM J. Imaging Sci. 7(4) (2014), pp. 2771-2806.
16:00 to 16:30 Afternoon Tea
16:30 to 17:20 Joan Bruna
Geometry and Topology of Neural Network Optimization
Co-author: Daniel Freeman (UC Berkeley)

The loss surface of deep neural networks has recently attracted interest in the optimization and machine learning communities as a prime example of a high-dimensional non-convex problem. Some insights were recently gained using spin glass models and mean-field approximations, but at the expense of simplifying the nonlinear nature of the model.

In this work, we do not make any such assumption and study conditions on the data distribution and model architecture that prevent the existence of bad local minima. We first take a topological approach and characterize the absence of bad local minima by studying the connectedness of the loss surface level sets. Our theoretical work quantifies and formalizes two important facts: (i) the landscape of deep linear networks has a radically different topology from that of deep half-rectified ones, and (ii) the energy landscape in the non-linear case is fundamentally controlled by the interplay between the smoothness of the data distribution and model over-parametrization. Our main theoretical contribution is to prove that half-rectified single layer networks are asymptotically connected, and we provide explicit bounds that reveal the aforementioned interplay.

The conditioning of gradient descent is the next challenge we address. We study this question through the geometry of the level sets, and we introduce an algorithm to efficiently estimate the regularity of such sets on large-scale networks. Our empirical results show that these level sets remain connected throughout the learning phase, suggesting near-convex behavior, but they become exponentially more curved as the energy level decays, in accordance with what is observed in practice with very low curvature attractors. Joint work with Daniel Freeman (UC Berkeley).
17:20 to 18:10 Justin Romberg
Structured solutions to nonlinear systems of equations
We consider the question of estimating a solution to a system of equations that involve convex nonlinearities, a problem that is common in machine learning and signal processing. Because of these nonlinearities, conventional estimators based on empirical risk minimization generally involve solving a non-convex optimization program. We propose a method (called "anchored regression") that is based on convex programming and amounts to maximizing a linear functional (perhaps augmented by a regularizer) over a convex set. The proposed convex program is formulated in the natural space of the problem, and avoids the introduction of auxiliary variables, making it computationally favorable. Working in the native space also provides us with the flexibility to incorporate structural priors (e.g., sparsity) on the solution. For our analysis, we model the equations as being drawn from a fixed set according to a probability law. Our main results provide guarantees on the accuracy of the estimator in terms of the number of equations we are solving, the amount of noise present, a measure of statistical complexity of the random equations, and the geometry of the regularizer at the true solution. We also provide recipes for constructing the anchor vector (that determines the linear functional to maximize) directly from the observed data. We will discuss applications of this technique to nonlinear problems including phase retrieval, blind deconvolution, and inverting the action of a neural network. This is joint work with Sohail Bahmani.
18:10 to 19:10 Poster Session & Welcome Wine Reception at INI
Tuesday 31st October 2017
09:00 to 09:50 Stacey Levine
Denoising Geometric Image Features
Given a noisy image, it can sometimes be more productive to denoise a transformed version of the image rather than process the image data directly. In this talk we will discuss several novel frameworks for image denoising, including one that involves smoothing the noisy image's level line curvature and another that regularizes the components of the noisy image in a moving frame that encodes its local geometry. Both frameworks satisfy some unexpected properties that justify these approaches. Experiments confirm an improvement over the usual denoising paradigm in terms of both PSNR and SSIM. Moreover, this approach provides a mechanism for preserving geometry in solutions of sparse patch based models that typically exploit self similarity. This is joint work with Thomas Batard, Marcelo Bertalmio, and Gabriela Ghimpeteanu.
09:50 to 10:40 Ozan Öktem
Task Oriented Reconstruction using Deep Learning
Co-Author: Jonas Adler 

Machine learning has been used in image reconstruction for several years, mostly driven by the recent advent of deep learning. Deep learning based reconstruction methods have been shown to give good reconstruction quality by learning a reconstruction operator that maps data directly to a reconstruction. These methods typically perform very well when performance is measured using classical quality measures, such as the RMSE, but they tend to produce over-smoothed images, reducing their usefulness in applications.

We propose a framework based on statistical decision theory that allows learning a reconstruction operator that is optimal with respect to a given task, which can be, for example, segmentation of a tumor or classification. In this framework, deep learning is used not only to solve the inverse problem, but also to simultaneously learn how to use the reconstructed image in order to complete an end-task. We demonstrate that the framework is computationally feasible and that it can improve human interpretability of the reconstructions. We also suggest new research directions in the field of data driven, task oriented image reconstruction.

Related publications: (accepted for publication in Inverse Problems) (submitted to IEEE Transactions on Medical Imaging)
10:40 to 11:10 Morning Coffee
11:10 to 12:00 Lior Horesh
Accelerated Free-Form Model Discovery of Interpretable Models using Small Data
The ability to abstract the behavior of a system or a phenomenon and distill it into a consistent mathematical model is instrumental for a broad range of applications. Historically, models were manually derived in a first principles fashion. The first principles approach often offers the derivation of interpretable models of remarkable levels of universality using little data. Nevertheless, their derivation is time consuming and relies heavily upon domain expertise. Conversely, with the rising pervasiveness of data-driven approaches, the rapid derivation and deployment of models has become a reality. Scalability is gained through dependence upon exploitable structure (functional form). Such structures, in turn, yield non-interpretable models, require Big Data for training, and provide limited predictive power outside the span of the training set. In this talk, we will introduce an accelerated model discovery approach that attempts to bridge between the two approaches, to enable the discovery of universal, interpretable models, using Small Data. To accomplish that, the proposed algorithm searches for free-form symbolic models, where neither the structure nor the set of operator primitives is predetermined. The discovered models are provably globally optimal, promoting superior predictive power for unseen input. We will demonstrate the algorithm by re-discovering some fundamental laws of science, and point to ongoing work on the discovery of new models for hitherto unexplained phenomena. References: Globally optimal symbolic regression, NIPS Interpretable ML Workshop, 2017; Globally optimal Mixed Integer Non-Linear Programming (MINLP) formulation for symbolic regression, IBM Technical Report ID 219095, 2016.
12:00 to 12:50 Martin Benning
Nonlinear Eigenanalysis of sparsity-promoting regularisation operators
In this talk we analyse eigenfunctions of nonlinear, variational regularisation operators. We show that they are closely related to a generalisation of singular vectors of compact operators, and demonstrate key mathematical properties. We use them to show how a systematic bias of variational regularisation methods can be corrected with the help of iterative regularisation methods, and discuss conditions that guarantee the decomposition of an additive composition of multiple eigenfunctions. In the last part of the talk, we focus on utilising the concept of nonlinear eigenanalysis to learn parametrised regularisations that can effectively separate different geometric structures. This is joint work with Joana Sarah Grah, Guy Gilboa, Carola-Bibiane Schönlieb, Marie Foged Schmidt and Martin Burger.
12:50 to 14:00 Lunch @ Wolfson Court
14:00 to 14:50 Alfred Hero
The tensor graphical lasso (Teralasso)
Co-authors: Kristjian Greenewald (Harvard University), Shuheng Zhou (University of Michigan), Alfred Hero (University of Michigan)

We propose a new ultrasparse graphical model for representing multiway data based on a Kronecker sum representation of the process inverse covariance matrix. This statistical model decomposes the inverse covariance into a linear Kronecker sum representation with sparse Kronecker factors.

Under the assumption that the multiway observations are matrix-normal, the l1 sparsity regularized log-likelihood function is convex and admits significantly faster statistical rates of convergence than other sparse matrix-normal algorithms such as graphical lasso or Kronecker graphical lasso.

We specify a scalable composite gradient descent method for minimizing the objective function and analyze both the statistical and the computational convergence rates, showing that the composite gradient descent algorithm is guaranteed to converge at a geometric rate to the global minimizer. We will illustrate the method on several real multiway datasets, showing that we can recover sparse graphical structures in high dimensional data.
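A minimal sketch of the Kronecker-sum structure underlying the model; the sparsity-regularized estimation itself is not shown, and the sizes and factors are illustrative:

```python
import numpy as np

# Teralasso-style precision matrix: Kronecker sum of two factors,
# Omega = Psi1 (+) Psi2 = kron(Psi1, I_q) + kron(I_p, Psi2).
# Small random SPD factors are used purely for illustration.
rng = np.random.default_rng(2)
p, q = 4, 3

def random_spd(k):
    M = rng.normal(size=(k, k))
    return M @ M.T + k * np.eye(k)

Psi1, Psi2 = random_spd(p), random_spd(q)
Omega = np.kron(Psi1, np.eye(q)) + np.kron(np.eye(p), Psi2)

# Structural fact the model exploits: the eigenvalues of a Kronecker sum are
# exactly all pairwise sums of the factor eigenvalues.
ev = np.sort(np.linalg.eigvalsh(Omega))
ev_pairwise = np.sort(np.add.outer(np.linalg.eigvalsh(Psi1),
                                   np.linalg.eigvalsh(Psi2)).ravel())
max_gap = np.max(np.abs(ev - ev_pairwise))
```

Because the spectrum of the Kronecker sum separates into factor spectra, operations such as inversion and log-determinant scale with p and q rather than with the full dimension p*q, which is what makes the model tractable for multiway data.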

14:50 to 15:40 Francis Bach
Breaking the Curse of Dimensionality with Convex Neural Networks

We consider neural networks with a single hidden layer and non-decreasing positively homogeneous activation functions like the rectified linear units. By letting the number of hidden units grow unbounded and using classical non-Euclidean regularization tools on the output weights, they lead to a convex optimization problem and we provide a detailed theoretical analysis of their generalization performance, with a study of both the approximation and the estimation errors. We show in particular that they are adaptive to unknown underlying linear structures, such as the dependence on the projection of the input variables onto a low-dimensional subspace. Moreover, when using sparsity-inducing norms on the input weights, we show that high-dimensional non-linear variable selection may be achieved, without any strong assumption regarding the data and with a total number of variables potentially exponential in the number of observations. However, solving this convex optimization problem in infinite dimensions is only possible if the non-convex subproblem of addition of a new unit can be solved efficiently. We provide a simple geometric interpretation for our choice of activation functions and describe simple conditions for convex relaxations of the finite-dimensional non-convex subproblem to achieve the same generalization error bounds, even when constant-factor approximations cannot be found. We were not able to find strong enough convex relaxations to obtain provably polynomial-time algorithms and leave open the existence or non-existence of such tractable algorithms with non-exponential sample complexities.

Related links:
 - JMLR paper
15:40 to 16:00 Jonas Adler
Learned forward operators: Variational regularization for black-box models
In inverse problems, correct modelling of the forward model is typically one of the most important components to obtain good reconstruction quality. Still, most work is done on highly simplified forward models. For example, in Computed Tomography (CT), the true forward model, given by the solution operator for the radiative transport equation, is typically approximated by the ray transform. The primary reason for this gross simplification is that the higher quality forward models are both computationally costly, and typically do not have an adjoint of the derivative of the forward operator that can be feasibly evaluated. The community is not unaware of this mismatch, but the work has been focused on “the model is right, let's fix the data”. We instead propose going the other way around by using machine learning in order to learn a mapping from the simplified model to the complicated model using deep neural networks. Hence, instead of learning how to correct complicated data so that it matches a simplified forward model, we accept that the data is always right and instead correct the forward model. We then use this learned forward operator, which is given as a composition of a simplified forward operator and a convolutional neural network, as a forward operator in a classical variational regularization scheme. We give a theoretical argument as to why correcting the forward model is more stable than correcting the data and provide numerical examples in Cone Beam CT reconstruction.
16:00 to 16:30 Afternoon Tea
16:30 to 17:20 Julianne Chung
Advancements in Hybrid Iterative Methods for Inverse Problems
Hybrid iterative methods are increasingly being used to solve large, ill-posed inverse problems, due to their desirable properties of (1) avoiding semi-convergence, whereby later reconstructions are no longer dominated by noise, and (2) enabling adaptive and automatic regularization parameter selection. In this talk, we describe some recent advancements in hybrid iterative methods for computing solutions to large-scale inverse problems. First, we consider a hybrid approach based on the generalized Golub-Kahan bidiagonalization for computing Tikhonov regularized solutions to problems where explicit computation of the square root and inverse of the covariance kernel for the prior covariance matrix is not feasible. This is useful for large-scale problems where covariance kernels are defined on irregular grids or are only available via matrix-vector multiplication, e.g., those from the Matern class. Second, we describe flexible hybrid methods for solving l_p regularized inverse problems, where we approximate the p-norm penalization term as a sequence of 2-norm penalization terms using adaptive regularization matrices, and we exploit flexible preconditioning techniques to efficiently incorporate the weight updates. We introduce a flexible Golub-Kahan approach within a Krylov-Tikhonov hybrid framework, such that our approaches extend to general (non-square) l_p regularized problems. Numerical examples from dynamic photoacoustic tomography and space-time deblurring demonstrate the range of applicability and effectiveness of these approaches. This is joint work with Arvind Saibaba, North Carolina State University, and Silvia Gazzola, University of Bath.
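The idea of approximating an l_p penalty by a sequence of 2-norm penalties with adaptive regularization matrices can be sketched with plain iteratively reweighted least squares; the flexible Golub-Kahan/Krylov machinery that makes this scalable is not reproduced here, and all sizes and parameters are illustrative:

```python
import numpy as np

# Approximating an l1 penalty (p = 1) by a sequence of weighted 2-norm
# penalties: each subproblem is a Tikhonov problem with an adaptive diagonal
# regularization matrix, as in iteratively reweighted least squares.  The
# flexible Krylov machinery of the talk is not reproduced; sizes, the
# regularization parameter and the smoothing eps are illustrative.
rng = np.random.default_rng(3)
m, n, p = 60, 100, 1.0
A = rng.normal(size=(m, n))
x_true = np.zeros(n)
x_true[[5, 17, 42]] = [2.0, -1.5, 3.0]     # sparse ground truth
b = A @ x_true

lam, eps = 1e-3, 1e-6
x = np.zeros(n)
for _ in range(50):
    w = (x ** 2 + eps) ** (p / 2 - 1)      # weights ~ |x_i|^(p-2), smoothed
    x = np.linalg.solve(A.T @ A + lam * np.diag(w), A.T @ b)

err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
```

In the hybrid framework, each weighted subproblem would be solved approximately in a flexible Krylov subspace with the weight updates absorbed into the (iteration-dependent) preconditioner, instead of the dense solve used above.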
17:20 to 18:10 Andreas Hauptmann
Learning iterative reconstruction for high resolution photoacoustic tomography
Recent advances in deep learning for tomographic reconstructions have shown great potential to create accurate and high quality images with a considerable speed-up. In this work we present a deep neural network that is specifically designed to provide high resolution 3D images from restricted photoacoustic measurements. The network is designed to represent an iterative scheme and incorporates gradient information of the data fit to compensate for limited view artefacts. Due to the high complexity of the photoacoustic forward operator, we separate training and computation of the gradient information. A suitable prior for the desired image structures is learned as part of the training. The resulting network is trained and tested on a set of segmented vessels from lung CT scans and then applied to in-vivo photoacoustic measurement data.
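The overall structure of such a learned iterative scheme, an unrolled loop whose update combines the current iterate with the gradient of the data fit, can be sketched as below. The linear toy problem and the scalar "learned" parameters stand in for the photoacoustic forward operator and the trained network:

```python
import numpy as np

# Skeleton of a learned iterative scheme: an unrolled loop whose update
# combines the current iterate with the gradient of the data fit.  The
# linear toy operator A and the scalar per-iteration parameters theta_k
# stand in for the photoacoustic forward operator and the trained network;
# nothing here is actually learned.
rng = np.random.default_rng(7)
m, n, K = 30, 5, 30
A = rng.normal(size=(m, n)) / np.sqrt(m)
x_true = rng.normal(size=n)
y = A @ x_true                                # simulated measurement data

def learned_update(x, grad, theta):
    # placeholder for the trained network acting on (iterate, gradient)
    return x - theta * grad

thetas = np.full(K, 0.5)                      # would be learned in training
x = np.zeros(n)
for k in range(K):
    grad = A.T @ (A @ x - y)                  # gradient of 0.5||Ax - y||^2
    x = learned_update(x, grad, thetas[k])

rel_res = np.linalg.norm(A @ x - y) / np.linalg.norm(y)
```

Separating the (expensive) gradient computation from the network evaluation, as the abstract describes, corresponds to precomputing `grad` outside the learned component.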
Wednesday 1st November 2017
09:00 to 09:50 Mila Nikolova
Below the Surface of the Non-Local Bayesian Image Denoising Method
Joint work with Pablo Arias (CMLA, ENS Cachan, CNRS, University Paris-Saclay).

The non-local Bayesian (NLB) patch-based approach of Lebrun, Buades, and Morel [1] is considered a state-of-the-art method for the restoration of (color) images corrupted by white Gaussian noise. It gave rise to numerous ramifications, e.g., possible improvements, processing of various data sets, and video. This work is the first attempt to analyse the method in depth in order to understand the main phenomena underlying its effectiveness. Our analysis, corroborated by numerical tests, shows several unexpected facts. In a variational setting, the first-step Bayesian approach to learn the prior for patches is equivalent to a pseudo-Tikhonov regularisation where the regularisation parameters can be positive or negative. Practically very good results in this step are mainly due to the aggregation stage, whose importance needs to be re-evaluated.

[1] Lebrun, M., Buades, A., Morel, J.M.: A nonlocal Bayesian image denoising algorithm. SIAM J. Imaging Sci. 6(3), 1665-1688 (2013)
09:50 to 10:40 Xavier Bresson
Convolutional Neural Networks on Graphs
Convolutional neural networks have greatly improved state-of-the-art performance in computer vision and speech analysis tasks, owing to their ability to extract multiple levels of representations of data. In this talk, we are interested in generalizing convolutional neural networks from low-dimensional regular grids, where image, video and speech are represented, to high-dimensional irregular domains, such as social networks, telecommunication networks, or word embeddings. We present a formulation of convolutional neural networks on graphs in the context of spectral graph theory, which provides the necessary mathematical background and efficient numerical schemes to design fast localized convolutional filters on graphs. Numerical experiments demonstrate the ability of the system to learn local stationary features on graphs.
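A core ingredient can be sketched in a few lines: a filter that is a degree-K polynomial of the graph Laplacian is automatically K-hop localized, which is what enables fast localized convolutions on graphs. The path graph and the filter coefficients below are illustrative:

```python
import numpy as np

# A filter that is a degree-K polynomial of the graph Laplacian is K-hop
# localized: applied to a delta signal, it cannot reach nodes more than K
# edges away.  The 10-node path graph and the coefficients are illustrative.
n = 10
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0        # path graph adjacency
L = np.diag(W.sum(axis=1)) - W             # combinatorial graph Laplacian

def poly_filter(L, signal, coeffs):
    """Apply sum_k coeffs[k] * L^k to a graph signal."""
    out = np.zeros_like(signal)
    Lk_x = signal.copy()
    for c in coeffs:
        out += c * Lk_x
        Lk_x = L @ Lk_x
    return out

x = np.zeros(n)
x[0] = 1.0                                 # delta signal at node 0
y = poly_filter(L, x, coeffs=[0.5, -0.3, 0.2])   # degree-2 filter

support = np.nonzero(np.abs(y) > 1e-12)[0]  # stays within 2 hops of node 0
```

In practice the polynomial is expressed in a Chebyshev basis for numerical stability, and the coefficients are the learned parameters of each convolutional layer; only matrix-vector products with the (sparse) Laplacian are needed.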
10:40 to 11:10 Morning Coffee
11:10 to 12:00 Julie Delon
High-Dimensional Mixture Models For Unsupervised Image Denoising (HDMI)
This work addresses the problem of patch-based image denoising through the unsupervised learning of a probabilistic high-dimensional mixture model on the noisy patches. The model, named HDMI, proposes a full modeling of the process that is supposed to have generated the noisy patches. To overcome the potential estimation problems due to the high dimension of the patches, the HDMI model adopts a parsimonious modeling which assumes that the data live in group-specific subspaces of low dimensionalities. This parsimonious modeling in turn allows a numerically stable computation of the conditional expectation of the image, which is applied for denoising. The use of such a model also makes it possible to rely on model selection tools to automatically determine the intrinsic dimensions of the subspaces and the variance of the noise. This yields a blind denoising algorithm that demonstrates state-of-the-art performance, both when the noise level is known and unknown. Joint work with Charles Bouveyron and Antoine Houdard.
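For a single mixture component, the conditional-expectation denoiser reduces to a Wiener-type filter. Below is a one-component sketch with a rank-2 covariance, standing in for the full HDMI mixture and its model-selection step; the dimensions and noise level are made up:

```python
import numpy as np

# One-component sketch of the conditional-expectation denoiser: with a
# Gaussian patch model x ~ N(mu, Sigma) whose covariance lives in a
# low-dimensional subspace, and y = x + noise, the denoised patch is
# E[x | y] = mu + Sigma (Sigma + sigma2 I)^{-1} (y - mu).
rng = np.random.default_rng(5)
d, r, sigma2 = 16, 2, 0.05                  # patch dim, subspace dim, noise var

U, _ = np.linalg.qr(rng.normal(size=(d, r)))   # group-specific subspace basis
evals = np.array([3.0, 1.5])                   # signal variances in subspace
Sigma = U @ np.diag(evals) @ U.T               # low-rank signal covariance
mu = np.zeros(d)

x = U @ (np.sqrt(evals) * rng.normal(size=r))  # clean patch in the subspace
y = x + np.sqrt(sigma2) * rng.normal(size=d)   # noisy patch

W = Sigma @ np.linalg.inv(Sigma + sigma2 * np.eye(d))  # Wiener-type filter
x_hat = mu + W @ (y - mu)

mse_noisy = np.mean((y - x) ** 2)
mse_denoised = np.mean((x_hat - x) ** 2)
```

Because the signal covariance is low-rank, the filter suppresses noise in all directions orthogonal to the group's subspace, which is where most of the MSE reduction comes from.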
12:00 to 12:50 Bangti Jin
Sparse Recovery by l0 Penalty
Sparsity is one of the powerful tools for signal recovery, and has achieved great success in many practical applications. Conventionally this is realized numerically by imposing an l1 penalty, which is the convex relaxation of the l0 penalty. In this talk, I will discuss our recent efforts in the efficient numerical solution of the l0 problem. I will describe a primal dual active set algorithm, and present some numerical results to illustrate its convergence. This talk is based on joint work with Dr. Yuling Jiao and Dr. Xiliang Lu.
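The talk concerns a primal dual active set algorithm; as a simpler stand-in for the l0-penalized least-squares problem, the sketch below runs iterative hard thresholding, whose thresholding step is the proximal map of the l0 penalty. Problem sizes and the penalty parameter are illustrative:

```python
import numpy as np

# Iterative hard thresholding for  min 0.5||Ax - b||^2 + lam ||x||_0 ,
# shown as a simpler stand-in for the primal dual active set algorithm of
# the talk.  The thresholding step is the exact proximal map of the l0
# penalty: keep z_i when z_i^2 > 2 * lam * step.  Sizes and lam are made up.
rng = np.random.default_rng(6)
m, n = 50, 100
A = rng.normal(size=(m, n)) / np.sqrt(m)       # roughly unit-norm columns
x_true = np.zeros(n)
x_true[[3, 30, 77]] = [1.5, -2.0, 1.0]
b = A @ x_true

lam = 0.05
step = 0.9 / np.linalg.norm(A, 2) ** 2         # step < 1/||A||^2
x = np.zeros(n)
for _ in range(300):
    z = x - step * (A.T @ (A @ x - b))         # gradient step on the data fit
    x = np.where(z ** 2 > 2 * lam * step, z, 0.0)   # l0 proximal map

support = set(np.nonzero(x)[0])
err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
```

A primal dual active set method would instead use primal and dual variables jointly to decide the active set and then solve a small least-squares problem on it, typically converging in far fewer (often superlinear) iterations.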
12:50 to 14:00 Lunch @ Wolfson Court
14:00 to 17:00 Free Afternoon
Thursday 2nd November 2017
09:00 to 09:50 Silvia Gazzola
Krylov Subspace Methods for Sparse Reconstruction
Krylov subspace methods are popular numerical linear algebra tools that can be successfully employed to regularize linear large-scale inverse problems, such as those arising in image deblurring and computed tomography. Though they are commonly used as purely iterative regularization methods (where the number of iterations acts as a regularization parameter), they can also be employed in a hybrid fashion, i.e., to solve Tikhonov regularized problems (where both the number of iterations and the Tikhonov parameter play the role of regularization parameters, which can be chosen adaptively). Krylov subspace methods can naturally handle unconstrained penalized least squares problems. The goal of this talk is to present a common framework that exploits a flexible version of well-known Krylov methods such as CGLS and GMRES to handle nonnegativity constraints and regularization terms expressed with respect to the 1-norm, resulting in an efficient way to enforce sparse reconstructions of the solution. Numerical experiments and comparisons with other well-known methods for the computation of nonnegative and sparse solutions will be presented. These results have been obtained working jointly with James Nagy (Emory University), Paolo Novati (University of Trieste), Yves Wiaux (Heriot-Watt University), and Julianne Chung (Virginia Polytechnic Institute and State University).
09:50 to 10:40 Pierre Weiss
Generating sampling patterns in MRI
In this work I will describe a few recent results for the generation of sampling patterns in MRI. In the first part of my talk, I will provide mathematical models describing the sampling problem in MRI. This will allow me to show that the traditional way mathematicians look at an MRI scanner is usually far too idealized and that important ingredients are currently missing from the theories. The mathematical modelling shows that a natural way to generate a pattern consists in projecting a density onto a set of admissible measures. I will then describe two projection algorithms. The first one is based on a distance defined through a convolution mapping the measures to L^2, while the second is based on the L^2 transportation distance. After describing a few original applications of this formalism, I will show how it allows scanning times in MRI systems to be significantly reduced, with real in vivo experiments. An outcome of this work is that compressed sensing, as it stands, only allows for moderate acceleration factors, while other ideas that take advantage of all the degrees of freedom of an MRI scanner yield far more significant improvements.
10:40 to 11:10 Morning Coffee
11:10 to 12:00 Anders Hansen
On computational barriers in data science and the paradoxes of deep learning
The use of regularisation techniques such as l^1 and Total Variation in Basis Pursuit and Lasso, as well as linear and semidefinite programming and neural networks (deep learning) has seen great success in data science. Yet, we will discuss the following paradox: it is impossible to design algorithms to find minimisers accurately for these problems when given inaccurate input data, even when the inaccuracies can be made arbitrarily small. The paradox implies that any algorithm designed to solve these problems will fail in the following way: For fixed dimensions and any small accuracy parameter epsilon > 0, one can choose an arbitrary large time T and find an input such that the algorithm will run for longer than T and still not have reached epsilon accuracy. Moreover, it is impossible to determine when the algorithm should halt to achieve an epsilon accurate solution. The largest epsilon for which this failure happens is called the Breakdown-epsilon. Typically, the Breakdown-epsilon > 1/2 even when the input is bounded by one, is well-conditioned, and the objective function can be computed with arbitrary accuracy.

Despite the paradox we explain why empirically many modern algorithms perform very well in real-world scenarios. In particular, when restricting to subclasses of problems the Breakdown-epsilon may shrink. Moreover, typically one can find polynomial time algorithms in L and n as long as L < log(1/Breakdown-epsilon); once L > log(1/Breakdown-epsilon), any algorithm, even randomised, becomes arbitrarily slow and will not be able to halt and guarantee L correct digits in the output.

The above result leads to the paradoxes of deep learning: (1) One cannot guarantee the existence of algorithms for accurately training the neural network, even if there is one minimum and no local minima. Moreover, (2) one can have 100% success rate on arbitrarily many test cases, yet uncountably many misclassifications on elements that are arbitrarily close to the training set.

This is joint work with Alex Bastounis (Cambridge) and Verner Vlacic (ETH).
12:00 to 12:50 Josiane Zerubia
Stochastic geometry for automatic object detection and tracking
In this talk, we combine methods from probability theory and stochastic geometry to put forward new solutions to the multiple object detection and tracking problem in high resolution remotely sensed image sequences. First, we present a spatial marked point process model to detect a pre-defined class of objects based on their visual and geometric characteristics. Then, we extend this model to the temporal domain and create a framework based on spatio-temporal marked point process models to jointly detect and track multiple objects in image sequences. We propose the use of simple parametric shapes to describe the appearance of these objects. We build new, dedicated energy-based models consisting of several terms that take into account both the image evidence and physical constraints such as object dynamics, track persistence and mutual exclusion. We construct a suitable optimization scheme that allows us to find strong local minima of the proposed highly non-convex energy. As the simulation of such models comes with a high computational cost, we turn our attention to recent filter implementations for multiple object tracking, which are known to be less computationally expensive. We propose a hybrid sampler combining the Kalman filter with the standard Reversible Jump MCMC. High performance computing techniques are also used to increase the computational efficiency of our method. We provide an analysis of the proposed framework, which yields very good detection and tracking performance at the price of an increased complexity of the models. Tests have been conducted on both high resolution satellite and UAV image sequences.
12:50 to 14:00 Lunch @ Wolfson Court
14:00 to 14:50 Mario Figueiredo
Divide and Conquer: Patch-based Image Denoising, Restoration, and Beyond
Patch-based image processing methods can be seen as an application of the “divide and conquer” strategy: since it is admittedly too difficult to formulate a global prior for an entire image, methods in this class process overlapping patches thereof, and combine the results to obtain an image estimate. A particular class of patch-based methods uses Gaussian mixture models (GMMs) to model the patches, in what can be seen as yet another application of the divide and conquer principle, now in the space of patch configurations. Different components of the GMM specialize in modeling different types of typical patch configurations. Although many other statistical image models exist, using a GMM for patches has several relevant advantages: (i) the corresponding minimum mean squared error (MMSE) estimate can be obtained in closed form; (ii) the variance of the estimate can also be computed, providing a principled way to weight the estimates when combining the patch estimates to obtain the full image estimate; (iii) the GMM parameters can be estimated from a dataset of clean patches, from the noisy image itself, or from a combination of the two; (iv) theoretically, a GMM can approximate any probability density arbitrarily well (under mild conditions). In this talk, I will overview the class of patch/GMM-based approaches to image restoration. After reviewing the first members of this family of methods, which simply addressed denoising, I will describe several more recent advances, namely: use of class-adapted GMMs (i.e., tailored to specific image classes, such as faces, fingerprints, text); tackling inverse problems other than denoising (namely, deblurring, hyperspectral super-resolution, compressive imaging), by plugging GMM-based denoisers into the loop of an iterative algorithm (in what has recently been called the plug-and-play approach); joint restoration/segmentation of images; application to blind deblurring.
This is joint work with Afonso Teodoro, José Bioucas-Dias, Marina Ljubenović, and Milad Niknejad.
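As a concrete illustration of points (i)-(ii) above — the closed-form MMSE estimate and its per-component Wiener filters — here is a minimal sketch; the two-component prior, its parameters, and the 4-pixel "patch" are invented toy values, not the speaker's actual models.

```python
import numpy as np

def gmm_mmse_denoise(y, weights, means, covs, sigma2):
    """MMSE estimate of a clean patch x from y = x + n, n ~ N(0, sigma2*I),
    under the GMM prior p(x) = sum_k w_k N(mu_k, Sigma_k)."""
    d = y.size
    log_resp, wiener = [], []
    for w, mu, S in zip(weights, means, covs):
        C = S + sigma2 * np.eye(d)          # covariance of y under component k
        diff = y - mu
        sol = np.linalg.solve(C, diff)
        _, logdet = np.linalg.slogdet(C)
        # log of w_k * N(y; mu_k, C), up to a constant shared by all components
        log_resp.append(np.log(w) - 0.5 * (logdet + diff @ sol))
        # per-component Wiener (posterior mean) estimate
        wiener.append(mu + S @ sol)
    r = np.exp(np.array(log_resp) - max(log_resp))
    r /= r.sum()                             # posterior component responsibilities
    return sum(rk * xk for rk, xk in zip(r, wiener))

# Toy two-component prior on 4-pixel "patches"
means = [np.zeros(4), np.full(4, 5.0)]
covs = [np.eye(4), 2.0 * np.eye(4)]
y = np.array([4.8, 5.1, 5.0, 4.9])
x_hat = gmm_mmse_denoise(y, [0.5, 0.5], means, covs, 0.1)
```

The per-component posterior variances mentioned in (ii) can be read off the same quantities, which is what makes the weighted patch aggregation principled.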
14:50 to 15:40 Marcelo Pereyra
Bayesian analysis and computation for convex inverse problems: theory, methods, and algorithms
This talk presents some new developments in theory, methods, and algorithms for performing Bayesian inference in high-dimensional inverse problems that are convex, with application to mathematical and computational imaging. These include new efficient stochastic simulation and optimisation Bayesian computation methods that tightly combine proximal optimisation with Markov chain Monte Carlo techniques; strategies for estimating unknown model parameters and performing model selection; methods for calculating Bayesian confidence intervals for images and performing uncertainty quantification analyses; and new theory regarding the role of convexity in maximum-a-posteriori and minimum-mean-square-error estimation. The new theory, methods, and algorithms are illustrated with a range of mathematical imaging experiments.
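To give a flavour of how proximal optimisation and Langevin MCMC can be combined, here is a minimal Moreau-Yosida-regularised unadjusted Langevin sketch for a Gaussian-plus-l1 posterior; the target, step sizes, and all parameter values are illustrative assumptions, not the speaker's algorithms.

```python
import numpy as np

def soft(x, t):
    """Proximal map of t*||.||_1 (soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def myula(y, sigma2, alpha, lam, gamma, n_iter, rng):
    """Langevin sampler for p(x|y) ~ exp(-||y-x||^2/(2*sigma2) - alpha*||x||_1);
    the non-smooth l1 term enters through its Moreau-Yosida envelope, whose
    gradient is expressed via the proximal map."""
    x = y.copy()
    samples = []
    for _ in range(n_iter):
        grad_f = (x - y) / sigma2                     # gradient of the smooth part
        grad_g = (x - soft(x, lam * alpha)) / lam     # gradient of the MY envelope
        x = x - gamma * (grad_f + grad_g) + np.sqrt(2 * gamma) * rng.standard_normal(x.size)
        samples.append(x.copy())
    return np.array(samples)

rng = np.random.default_rng(1)
y = np.array([2.0, -1.0, 0.1])
chain = myula(y, sigma2=1.0, alpha=1.0, lam=0.1, gamma=0.05, n_iter=2000, rng=rng)
post_mean = chain[500:].mean(axis=0)   # discard burn-in, estimate the posterior mean
```

From such a chain one can estimate posterior means and credible intervals, which is the kind of uncertainty quantification the talk discusses.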
15:40 to 16:00 Pol del Aguila Pla
Cell detection by functional inverse diffusion and group sparsity
Biological assays in which particles generated by cells bind to a surface and can be imaged to reveal the cells' location are ubiquitous in biochemical, pharmacological and medical research. In this talk, I will describe the physics of these processes, a 3D radiation-diffusion-adsorption-desorption partial differential equation, and present our novel parametrization of its solution (i.e., the observation model) in terms of convolutional operators. Then, I will present our proposal to invert this observation model through a functional optimization problem with group-sparsity regularization and explain the reasoning behind this choice of regularizer. I will also present the results needed to derive the accelerated proximal gradient algorithm for this problem, and justify why we chose to formulate the algorithm in the original function spaces where our observation model operates. Finally, I will briefly comment on our choice of discretization, and show the final performance of our algorithm on both synthetic and real data. arXiv preprints: arXiv:1710.0164, arXiv:1710.01622
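A finite-dimensional sketch of an accelerated proximal gradient (FISTA-type) iteration with a group-sparsity regularizer, of the kind the talk formulates in function spaces; the matrix, groups, and regularization weight below are synthetic stand-ins, not the talk's operators.

```python
import numpy as np

def prox_group(x, groups, t):
    """Proximal map of t * sum_g ||x_g||_2 (group soft thresholding)."""
    out = x.copy()
    for g in groups:
        nrm = np.linalg.norm(x[g])
        out[g] = 0.0 if nrm <= t else (1 - t / nrm) * x[g]
    return out

def fista_group(A, y, groups, alpha, n_iter=300):
    """Minimise 0.5*||A x - y||^2 + alpha * sum_g ||x_g||_2."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1]); z = x.copy(); t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ z - y)
        x_new = prox_group(z - grad / L, groups, alpha / L)
        t_new = 0.5 * (1 + np.sqrt(1 + 4 * t * t))
        z = x_new + ((t - 1) / t_new) * (x_new - x)   # Nesterov extrapolation
        x, t = x_new, t_new
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 12))
groups = [slice(0, 4), slice(4, 8), slice(8, 12)]
x_true = np.zeros(12); x_true[0:4] = rng.standard_normal(4)   # only one group active
y = A @ x_true
x_hat = fista_group(A, y, groups, alpha=0.5)
```

The group norm zeroes out entire groups at once, which is the behaviour that motivates this choice of regularizer in the talk.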
16:00 to 16:30 Afternoon Tea
16:30 to 17:20 Claire Boyer
Structured compressed sensing and recent theoretical advances on optimal sampling
Joint works with Jérémie Bigot and Pierre Weiss on the one hand, and Ben Adcock on the other. First, we will theoretically justify the applicability of compressed sensing (CS) in real-life applications. To do so, CS theorems compatible with physical acquisition constraints will be introduced. These results encompass not only structure in the acquisition but also structured sparsity of the signal of interest. This theory considerably extends the standard framework of CS. Secondly, recent advances on optimal sampling in CS will be presented, in the sense that the sampling strategy minimizes the bound on the required number of measurements for CS recovery.
17:20 to 18:10 Tuomo Valkonen
What do regularisers do?
Which regulariser is the best? Is any of them any good? Do they introduce artefacts? What other qualitative properties do they have? These are some of the questions on which I want to shed some light. Specifically, I will firstly discuss recent work on natural conditions, based on an analytical study of a bilevel learning approach, that ensure regularisation does indeed improve an image. Based on a more computational bilevel-learning study, I will also try to answer which learned regulariser is the best one. Secondly, I will discuss geometrical aspects of the solutions to higher-order regularised imaging problems.
19:30 to 22:00 Formal Dinner at Corpus Christi College
Friday 3rd November 2017
09:00 to 09:50 Irene Waldspurger (Université Paris-Dauphine; INRIA Paris - Rocquencourt)
Alternating projections for phase retrieval with random sensing vectors
Phase retrieval consists in reconstructing an unknown element in a complex vector space from the modulus of linear measurements. The first reconstruction algorithms for which theoretical guarantees could be proven relied on convexification techniques. It has only recently been realized that similar guarantees hold for non-convex local search methods, that are faster and simpler, provided that their starting point is carefully chosen. We will explain how to establish these guarantees for the most well-known local search method: alternating projections. We will also discuss the role of the initialization method.
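A minimal real-valued sketch of alternating projections with a spectral initialisation for measurements b = |Ax|; the complex case and the talk's precise initialisation may differ, and all problem sizes here are illustrative.

```python
import numpy as np

def alternating_projections(A, b, n_iter=200):
    """Phase retrieval from b = |A x| by alternating projections: project onto
    the measurement set {z : |z| = b}, then onto the range of A."""
    # Spectral initialisation: leading eigenvector of sum_i b_i^2 a_i a_i^T
    M = A.T @ (b[:, None] ** 2 * A)
    _, V = np.linalg.eigh(M)
    x = V[:, -1] * np.linalg.norm(b) / np.linalg.norm(A @ V[:, -1])
    misfits = []
    for _ in range(n_iter):
        z = b * np.sign(A @ x)                     # keep the measured modulus, current sign
        x, *_ = np.linalg.lstsq(A, z, rcond=None)  # least-squares projection onto range(A)
        misfits.append(np.linalg.norm(np.abs(A @ x) - b))
    return x, misfits

rng = np.random.default_rng(3)
n, m = 20, 200
A = rng.standard_normal((m, n))      # random Gaussian sensing vectors
x_true = rng.standard_normal(n)
b = np.abs(A @ x_true)
x_hat, misfits = alternating_projections(A, b)
```

The careful starting point is what the recent guarantees hinge on: from a spectral initialisation, the local search provably converges for random sensing vectors.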
09:50 to 10:40 Martin Holler
Analysis and applications of structural-prior-based total variation regularization for inverse problems
Structural priors and joint regularization techniques, such as parallel level set methods and joint total variation, have become quite popular recently in the context of variational image processing. Their main application scenarios are particular settings in multi-modality/multi-spectral imaging, where there is an expected correlation between different channels of the image data. In this context, one can distinguish between two different approaches for exploiting such correlations: joint reconstruction techniques that treat all available channels equally, and structural prior techniques that assume some ground truth structural information to be available. This talk focuses on a particular instance of the second type of methods, namely structural total-variation-type functionals, i.e., functionals which integrate a spatially-dependent pointwise function of the image gradient for regularization. While this type of method has been shown to work well in practical applications, some of its analytical properties are not immediate. Those include a proper definition for BV functions and non-smooth a priori data, as well as existence results and regularization properties for standard inverse problem settings. In this talk we address some of these issues and show how they can partially be overcome using duality. Employing the framework of functions of a measure, we define structural-TV-type functionals via lower semicontinuous relaxation. Since the relaxed functionals are, in general, not explicitly available, we show that instead of the classical Tikhonov regularization problem, one can equivalently solve a saddle-point problem where no a priori knowledge of the relaxation is needed. In particular, motivated by concrete applications, we deduce corresponding results for linear inverse problems with norm and Poisson log-likelihood data discrepancy terms. The talk concludes with proof-of-concept numerical examples. This is joint work with M. Hintermüller and K. Papafitsoros (both from the Weierstrass Institute Berlin).
10:40 to 11:10 Morning Coffee
11:10 to 12:00 Raymond Chan
A Nuclear-norm Model for Multi-Frame Super-resolution Reconstruction
In this talk, we give a new variational approach to obtain super-resolution images from multiple low-resolution image frames extracted from video clips. First, the displacements between the low-resolution frames and the reference frame are computed by an optical flow algorithm. The displacement matrix is then decomposed into the product of two matrices corresponding to the integer and fractional displacement matrices, respectively. The integer displacement matrices give rise to a non-convex low-rank prior, which is then convexified to give the nuclear-norm regularization term. By adding a standard 2-norm data fidelity term to it, we obtain our proposed nuclear-norm model. The alternating direction method of multipliers can then be used to solve the model. Comparison of our method with other models on synthetic and real video clips shows that our resulting images are more accurate with fewer artifacts; they also provide much finer and more discernible details. Joint work with Rui Zhao. Research supported by HKRGC.
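When ADMM is applied to a nuclear-norm model, the key subproblem is the proximal map of the nuclear norm, i.e. singular value thresholding; a minimal sketch of that building block (the 3x3 matrix is a toy example, not the talk's displacement data):

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: the proximal map of tau*||.||_* (nuclear norm).
    Soft-thresholds the singular values, which promotes low rank."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

X = np.diag([5.0, 3.0, 0.5])   # singular values 5, 3, 0.5
Y = svt(X, 1.0)                # singular values become 4, 2, 0 -> rank drops to 2
```

Because the smallest singular values are set exactly to zero, the iterates of the ADMM scheme are driven toward the low-rank structure that the integer displacement matrices induce.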
12:00 to 12:50 Mihaela Pricop-Jeckstadt
From spatial learning to machine learning: an unsupervised approach with applications to behavioral science
In this talk we consider an example-driven approach for identifying ability patterns from data partitions based on learning behaviour in the water maze experiment. A modification of the k-means algorithm for longitudinal data, as introduced in [1], is used to identify clusters based on the learning variable (see [3]). The association between these clusters and the flying-ability variables is statistically tested in order to characterize the partitions in terms of flying traits. Since the learning variables seem to reflect flying abilities, we propose a new sparse clustering algorithm in an approach that models the covariance matrix by a Kronecker product. Consistency and an EM algorithm are also studied in this framework. References: [1] Genolini, C., Ecochard, R., Benghezal, M. et al., "kmlShape: An Efficient Method to Cluster Longitudinal Data (Time-Series) According to Their Shapes", PLOS ONE, Vol. 11, 2016. [2] Sung, K.K. and Poggio, T., "Example-based learning for view-based human face detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, 39-51, 1998. [3] Rosipal, R. and Kraemer, N., "Overview and recent advances in partial least squares", Subspace, Latent Structure and Feature Selection, Lecture Notes in Computer Science, Vol. 3940, 34-51, 2006.
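As a simplified stand-in for the longitudinal k-means of [1] (kmlShape additionally matches curve shapes rather than raw values), plain Lloyd's k-means on whole trajectories already illustrates the clustering step; the "slow" and "fast" learning curves below are invented toy data.

```python
import numpy as np

def kmeans_trajectories(X, k, n_iter=20):
    """Lloyd's k-means on longitudinal profiles (rows of X are time series).
    Deterministic init: centroids spread evenly over the rows."""
    idx = np.linspace(0, len(X) - 1, k).astype(int)
    centroids = X[idx].astype(float).copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # squared Euclidean distance of every curve to every centroid
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)              # assign each curve to nearest centroid
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

t = np.linspace(0, 1, 10)
slow = np.stack([0.3 * t + 0.01 * i for i in range(3)])   # slow "learners"
fast = np.stack([2.0 * t + 0.01 * i for i in range(3)])   # fast "learners"
X = np.vstack([slow, fast])
labels, centroids = kmeans_trajectories(X, k=2)
```

In the talk's setting the resulting partitions are then tested for association with the ability variables.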
12:50 to 14:00 Lunch @ Wolfson Court
14:00 to 14:50 Robert Plemmons
Sparse Recovery Algorithms for 3D Imaging using Point Spread Function Engineering
Co-authors: Chao Wang (Mathematics, Chinese University of Hong Kong), Raymond Chan (Mathematics, Chinese University of Hong Kong), Sudhakar Prasad (Physics, University of New Mexico)

Imaging and localizing point sources with high accuracy in a 3D volume is an important but challenging task. For example, super-resolution 3D single molecule localization is an area of intense interest in biology (cell imaging, folding, membrane behavior, etc.), in chemistry (spectral diffusion, molecular distortions, etc.), and in physics (structures of materials, quantum optics, etc.). We consider here the high-resolution imaging problem of 3D point source image recovery from 2D data using methods based on point spread function (PSF) design. The methods involve a new technique, recently patented by S. Prasad, for applying rotating point spread functions with a single lobe to obtain depth from defocus. The amount of rotation of the PSF encodes the depth position of the point source. The distribution of point sources is discretized on a cubical lattice where the indices of nonzero entries represent the 3D locations of point sources. The values of these entries are the point source fluxes. Finding the locations and fluxes is a large-scale sparse 3D inverse problem and we have developed solution algorithms based on sparse recovery using non-convex optimization. Applications to high-resolution single molecule localization microscopy are described, as well as localization of space debris using a space-based telescope. Sparse recovery optimization methods, including the Continuous Exact L0 (CEL0) algorithm, are used in our numerical experiments.
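The talk's experiments use CEL0; as a generic stand-in, iterative hard thresholding shows the shape of such non-convex sparse-recovery iterations on a synthetic problem, with the 3D lattice flattened into a single sparse vector (all sizes and values are invented).

```python
import numpy as np

def iht(A, y, s, n_iter=300):
    """Iterative hard thresholding for min ||y - A x||^2 s.t. ||x||_0 <= s.
    A generic non-convex sparse solver; the talk's experiments use CEL0 instead."""
    L = np.linalg.norm(A, 2) ** 2            # step 1/L keeps the objective monotone
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = x + A.T @ (y - A @ x) / L        # gradient step on the data fidelity
        keep = np.argsort(np.abs(g))[-s:]    # keep only the s largest entries
        x = np.zeros_like(x)
        x[keep] = g[keep]
    return x

rng = np.random.default_rng(7)
A = rng.standard_normal((60, 120))           # forward model stand-in (flattened lattice)
x_true = np.zeros(120)
x_true[[5, 40, 99]] = [2.0, -3.0, 1.5]       # three point sources with their fluxes
y = A @ x_true
x_hat = iht(A, y, s=3)
```

The indices of the nonzero entries of the recovered vector play the role of the 3D source locations, and the entry values the fluxes.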
14:50 to 15:40 Jeff Calder
The weighted p-Laplacian and semi-supervised learning
Semi-supervised learning refers to machine learning algorithms that make use of both labeled data and unlabeled data for learning tasks. Examples include large scale nonparametric regression and classification problems, such as predicting voting preferences of social media users, or classifying medical images. In today's big data world, there is an abundance of unlabeled data, while labeled data often requires expert labeling and is expensive to obtain. This has led to a resurgence of semi-supervised learning techniques, which use the topological or geometric properties of large amounts of unlabeled data to aid the learning task. In this talk, I will discuss some new rigorous PDE scaling limits for existing semi-supervised learning algorithms and their practical implications. I will also discuss how these scaling limits suggest new ideas for fast algorithms for semi-supervised learning.
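For intuition, the p = 2 case of graph-based semi-supervised learning reduces to solving a graph Laplacian system (harmonic extension of the labels to the unlabeled nodes); a tiny path-graph sketch, with the graph, weights, and labels all invented for illustration:

```python
import numpy as np

def laplace_learning(W, labeled, values):
    """Solve L u = 0 on unlabeled nodes with u fixed on labeled nodes,
    where L is the combinatorial graph Laplacian (p = 2 case)."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    unlabeled = [i for i in range(n) if i not in labeled]
    u = np.zeros(n)
    u[labeled] = values
    # Harmonic extension: L_uu u_u = -L_ul u_l
    L_uu = L[np.ix_(unlabeled, unlabeled)]
    L_ul = L[np.ix_(unlabeled, labeled)]
    u[unlabeled] = np.linalg.solve(L_uu, -L_ul @ np.asarray(values, float))
    return u

# Path graph on 5 nodes, labels at the two endpoints
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
u = laplace_learning(W, labeled=[0, 4], values=[0.0, 1.0])
```

On this path graph the harmonic extension interpolates the labels linearly; the scaling limits in the talk study how such discrete solutions behave as the number of unlabeled points grows.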
15:40 to 16:10 Afternoon Tea