Generative models, parameter learning and sparsity
Monday 30th October 2017 to Friday 3rd November 2017
09:00 to 09:40  Registration  
09:40 to 09:50  Welcome from Christie Marr (INI Deputy Director)  
09:50 to 10:40 
James Nagy Spectral Computed Tomography
Coauthors: Martin Andersen (Technical University of Denmark), Yunyi Hu (Emory University) An active area of interest in tomographic imaging is the goal of quantitative imaging, where in addition to producing an image, information about the material composition of the object is recovered. In order to obtain material composition information, it is necessary to better model of the image formation (i.e., forward) problem and/or to collect additional independent measurements. In xray computed tomography (CT), better modeling of the physics can be done by using the more accurate polyenergetic representation of source xray beams, which requires solving a challenging nonlinear illposed inverse problem. In this talk we explore the mathematical and computational problem of polyenergetic CT when it is used in combination with new energywindowed spectral CT detectors. We formulate this as a regularized nonlinear least squares problem, which we solve by a GaussNewton scheme. Because the approximate Hessian system in the GaussNewton scheme is very illconditioned, we propose a preconditioner that effectively clusters eigenvalues and, therefore, accelerates convergence when the conjugate gradient method is used to solve the linear subsystems. Numerical experiments illustrate the convergence, effectiveness, and significance of the proposed method. 
INI 1  
10:40 to 11:10  Morning Coffee  
11:10 to 12:00 
Eldad Haber tba 
INI 1  
12:00 to 12:50 
Christoph Brune Cancer ID  From Spectral Segmentation to Deep Learning
One of the most important challenges in health is the fight against cancer. A desired goal is the early detection and guided therapy of cancer patients. A very promising approach is the detection and quantification of circulating tumor cells in blood, called liquid biopsy. However, this task is similar to looking for needles in a haystack, where the needles even have unclear shapes and materials. There is a strong need for reliable image segmentation, classification and a better understanding of the generative composition of tumor cells.
For a robust and reproducible quantification of tumor cell features, automatic multiscale segmentation is the key. In recent years, new theory and algorithms for nonlinear, nonlocal eigenvalue problems via spectral decomposition have been developed and shown to result in promising segmentation and classification results. We analyze different nonlinear segmentation approaches and evaluate how informative the resulting spectral responses are. The success of our analysis is supported by results of simulated cells and first European clinical studies.
In the last part of this talk we switch the viewpoint and study first results for deep learning of tumor cells. Via generative models there is hope for understanding tumor cells much better, however many mathematical questions arise.
This is a joint work with Leonie Zeune, Stephan van Gils, Guus van Dalum and Leon Terstappen.

INI 1  
12:50 to 14:00  Lunch @ Wolfson Court  
14:00 to 14:50 
Lars Ruthotto PDEbased Algorithms for Convolution Neural Network
This talk presents a new framework for image classification that exploits the relationship between the training of deep Convolution Neural Networks (CNNs) to the problem of optimally controlling a system of nonlinear partial differential equations (PDEs). This new interpretation leads to a variational model for CNNs, which provides new theoretical insight into CNNs and new approaches for designing learning algorithms.
We exemplify the myriad benefits of the continuous network in three ways. First, we show how to scale deep CNNs across image resolutions using multigrid methods. Second, we show how to scale the depth of deep CNNS in a shallowtodeep manner to gradually increase the flexibility of the classifier. Third, we analyze the stability of CNNs and present stable variants that are also reversible (i.e., information can be propagated from input to output layer and vice versa), which in combination allows training arbitrarily deep networks with limited computational resources.
This is joint work with Eldad Haber (UBC), Lili Meng (UBC), Bo Chang (UBC), SeongHwan Jun (UBC), Elliot Holtham (Xtract Technologies)

INI 1  
14:50 to 15:40 
Gitta Kutyniok (Technische Universität Berlin) Optimal Approximation with Sparsely Connected Deep Neural Networks
Despite the outstanding success of deep neural networks in realworld applications, most of the related research is
empirically driven and a mathematical foundation is almost completely missing. One central task of a neural network
is to approximate a function, which for instance encodes a classification task. In this talk, we will be concerned
with the question, how well a function can be approximated by a neural network with sparse connectivity. Using methods
from approximation theory and applied harmonic analysis, we will derive a fundamental lower bound on the sparsity of
a neural network. By explicitly constructing neural networks based on certain representation systems, socalled
$\alpha$shearlets, we will then demonstrate that this lower bound can in fact be attained. Finally, we present
numerical experiments, which surprisingly show that already the standard backpropagation algorithm generates deep
neural networks obeying those optimal approximation rates. This is joint work with H. Bölcskei (ETH Zurich), P. Grohs
(Uni Vienna), and P. Petersen (TU Berlin).

INI 1  
15:40 to 16:00 
EvaMaria Brinkmann Enhancing fMRI Reconstruction by Means of the ICBTVRegularisation Combined with Suitable Subsampling Strategies and Temporal Smoothing
Based on
the magnetic resonance imaging (MRI) technology, fMRI is a noninvasive
functional neuroimaging method, which provides maps of the brain at different
time steps, thus depicting brain activity by detecting changes in the blood
flow and hence constituting an important tool in brain research. An fMRI screening typically consists of three stages: At first, there is a short lowresolution prescan to ensure the proper positioning of the proband or patient. Secondly, an anatomical high resolution MRI scan is executed and finally the actual fMRI scan is taking place, where a series of data is acquired via fast MRI scans at consecutive time steps thus illustrating the brain activity after a stimulus. In order to achieve an adequate temporal resolution in the fMRI data series, usually only a specific portion of the entire kspace is sampled. Based on the assumption that the full highresolution MR image and the fast acquired actual fMRI frames share a similar edge set (and hence the sparsity pattern with respect to the gradient), we propose to use the Infimal Convolution of Bregman Distances of the TV functional (ICBTV), first introduced in \cite{Moeller_et_al}, to enhance the quality of the reconstructed fMRI data by using the full highresolution MRI scan as a prior. Since in fMRI the hemodynamic response is commonly modelled by a smooth function, we moreover discuss the effect of suitable subsampling strategies in combination with temporal regularisation. This is joint work with Julian Rasch, Martin Burger (both WWU Münster) and with Ville Kolehmainen (University of Eastern Finland). [1] {Moeller_et_al} M. Moeller, E.M. Brinkmann, M. Burger, and T. Seybold: Color Bregman TV. SIAM J. Imag. Sci. 7(4) (2014), pp. 27712806. 
INI 1  
16:00 to 16:30  Afternoon Tea  
16:30 to 17:20 
Joan Bruna Geometry and Topology of Neural Network Optimization
Coauthor: Daniel Freeman (UC Berkeley) The loss surface of deep neural networks has recently attracted interest in the optimization and machine learning communities as a prime example of highdimensional nonconvex problem. Some insights were recently gained using spin glass models and meanfield approximations, but at the expense of simplifying the nonlinear nature of the model. In this work, we do not make any such assumption and study conditions on the data distribution and model architecture that prevent the existence of bad local minima. We first take a topological approach and characterize absence of bad local minima by studying the connectedness of the loss surface level sets. Our theoretical work quantifies and formalizes two important facts: (i) the landscape of deep linear networks has a radically different topology from that of deep halfrectified ones, and (ii) that the energy landscape in the nonlinear case is fundamentally controlled by the interplay between the smoothness of the data distribution and model overparametrization. Our main theoretical contribution is to prove that halfrectified single layer networks are asymptotically connected, and we provide explicit bounds that reveal the aforementioned interplay. The conditioning of gradient descent is the next challenge we address. We study this question through the geometry of the level sets, and we introduce an algorithm to efficiently estimate the regularity of such sets on largescale networks. Our empirical results show that these level sets remain connected throughout all the learning phase, suggesting a near convex behavior, but they become exponentially more curvy as the energy level decays, in accordance to what is observed in practice with very low curvature attractors. Joint work with Daniel Freeman (UC Berkeley). 
INI 1  
17:20 to 18:10 
Justin Romberg Structured solutions to nonlinear systems of equations
We consider the question of estimating a solution to a system of equations that involve convex nonlinearities, a problem that is common in machine learning and signal processing. Because of these nonlinearities, conventional estimators based on empirical risk minimization generally involve solving a nonconvex optimization program. We propose a method (called "anchored regression”) that is based on convex programming and amounts to maximizing a linear functional (perhaps augmented by a regularizer) over a convex set.
The proposed convex program is formulated in the natural space of the problem, and avoids the introduction of auxiliary variables, making it computationally favorable. Working in the native space also provides us with the flexibility to incorporate structural priors (e.g., sparsity) on the solution.
For our analysis, we model the equations as being drawn from a fixed set according to a probability law. Our main results provide guarantees on the accuracy of the estimator in terms of the number of equations weare solving, the amount of noise present, a measure of statistical complexity of the random equations, and thegeometry of the regularizer at the true solution. We also provide recipes for constructing the anchor vector (that determines the linear functional to maximize) directly from the observed data.
We will discuss applications of this technique to nonlinear problems including phase retrieval, blind deconvolution, and inverting the action of a neural network.
This is joint work with Sohail Bahmani.

INI 1  
18:10 to 19:10  Poster Session & Welcome Wine Reception at INI 
09:00 to 09:50 
Stacey Levine Denoising Geometric Image Features
Given a noisy image, it can sometimes be more productive to denoise a transformed version of the image rather than process the image data directly. In this talk we will discuss several novel frameworks for image denoising, including one that involves smoothing the noisy image’s level line curvature and another that regularizes the components of the noisy image in a moving frame that encodes its local geometry. Both frameworks satisfy some nice unexpected properties that provide justification for this framework. Experiments confirm an improvement over the usual denoising paradigm in terms of both PSNR and SSIM. Moreover, this approach provides a mechanism for preserving geometry in solutions of sparse patch based models that typically exploit self similarity. This is joint work with Thomas Batard, Marcelo Bertalmio, and Gabriela Ghimpeteanu.

INI 1  
09:50 to 10:40 
Ozan Öktem Task Oriented Reconstruction using Deep Learning
CoAuthor: Jonas
Adler Machine learning has been used if image reconstruction for several years, mostly driven by the recent advent of deep learning. Deep learning based reconstruction methods have been shown to give good reconstruction quality by learning a reconstruction operator that maps data directly to reconstruction. These methods typically perform very well when performance is measured using classical quantified, such as the RMSE, but they tend to produce oversmoothed images, reducing their usefulness in applications. We propose a framework based on statistical decision theory that allows learning a reconstruction operator that is optimal with respect to a given task, which can be segmentation of a tumor or classification. In this framework, deep learning is used not only to solve the inverse problem, but also to simultaneously learn how to use the reconstructed image in order to complete an endtask. We demonstrate that the framework is computationally feasible and that it can improve human interpretability of the reconstructions. We also suggest new research directions in the field of data driven, task oriented image reconstruction. Related publications: http://arxiv.org/abs/1704.04058 (accepted for publication in Inverse Problems) http://arxiv.org/abs/1707.06474 (submitted to IEEE Transaction for Medical Imaging) 
INI 1  
10:40 to 11:10  Morning Coffee  
11:10 to 12:00 
Lior Horesh Accelerated FreeForm Model Discovery of Interpretable Models using Small Data
The ability to abstract the behavior of a system or a phenomenon and distill it into a consistent mathematical model is instrumental for a broad range of applications. Historically, models were manually derived in a first principles fashion. The first principles approach often offers the derivation of interpretable models of remarkable levels of universality using little data. Nevertheless, their derivation is time consuming and relies heavily upon domain expertise. Conversely, with the rising pervasiveness of datadriven approaches, the rapid derivation and deployment of models has become a reality. Scalability is gained through dependence upon exploitable structure (functional form). Such structures, in turn, yield noninterpretable models, require Big Data for training, and provide limited predictive power outside the training set span. In this talk, we will introduce an accelerated model discovery approach that attempts to bridge between the two conducts, to enable the discovery of universal, interpretable models, using Small Data. To accomplish that, the proposed algorithm searches for freeform symbolic models, where neither the structure nor the set of operator primitives are predetermined. The discovered models are provably globally optimal, promoting superior predictive power for unseen input. Demonstration of the algorithm in rediscovery of some fundamental laws of science will be provided, and references to ongoing work in the discovery of new models for, hitherto, unexplainable phenomena.
Globally optimal symbolic regression, NIPS Interpretable ML Workshop, 2017, https://arxiv.org/abs/1710.10720
Globally optimal Mixed Integer NonLinear Programming (MINLP) formulation for symbolic regression, IBM Technical Report ID 219095, 2016

INI 1  
12:00 to 12:50 
Martin Benning Nonlinear Eigenanalysis of sparsitypromoting regularisation operators
In this talk we analyse Eigenfunctions of nonlinear, variational regularisation operators. We show that they are closely related to a generalisation of singular vectors of compact operators, and demonstrate key mathematical properties. We use them to show how a systematic bias of variational regularisation methods can be corrected with the help of iterative regularisation methods, and discuss conditions that guarantee the decomposition of an additive composition of multiple Eigenfunctions. In the last part of the talk, we focus on utilising the concept of nonlinear Eigenanalysis to learn parametrised regularisations that can effectively separate different geometric structures.
This is joint work with Joana Sarah Grah, Guy Gilboa, CarolaBibiane Schönlieb, Marie Foged Schmidt and Martin Burger.

INI 1  
12:50 to 14:00  Lunch @ Wolfson Court  
14:00 to 14:50 
Alfred Hero The tensor graphical lasso (Teralasso)
Coauthors: Kristjian Greenewald (Harvard University), Shuheng Zhou (University of Michigan), Alfred Hero (University of Michigan) We propose a new ultrasparse graphical model for representing multiway data based on a Kronecker sum representation of the process inverse covariance matrix. This statistical model decomposes the inverse covariance into a linear Kronecker sum representation with sparse Kronecker factors. Under the assumption that the multiway observations are matrixnormal the l1 sparsity regularized loglikelihood function is convex and admits significantly faster statistical rates of convergence than other sparse matrix normal algorithms such as graphical lasso or Kronecker graphical lasso. We specify a scalable composite gradient descent method for minimizing the objective function and analyze both the statistical and the computational convergence ratesm, showing that the composite gradient descent algorithm is guaranteed to converge at a geometric rate to the global minimizer. We will illustrate the method on several real multiway datasets, showing that we can recover sparse graphical structures in high dimensional data. Related Links

INI 1  
14:50 to 15:40 
Francis Bach Breaking the Curse of Dimensionality with Convex Neural Networks We consider neural networks with a single hidden layer and nondecreasing positively homogeneous activation functions like the rectified linear units. By letting the number of hidden units grow unbounded and using classical nonEuclidean regularization tools on the output weights, they lead to a convex optimization problem and we provide a detailed theoretical analysis of their generalization performance, with a study of both the approximation and the estimation errors. We show in particular that they are adaptive to unknown underlying linear structures, such as the dependence on the projection of the input variables onto a lowdimensional subspace. Moreover, when using sparsityinducing norms on the input weights, we show that highdimensional nonlinear variable selection may be achieved, without any strong assumption regarding the data and with a total number of variables potentially exponential in the number of observations. However, solving this convex optimization pro blem in infinite dimensions is only possible if the nonconvex subproblem of addition of a new unit can be solved efficiently. We provide a simple geometric interpretation for our choice of activation functions and describe simple conditions for convex relaxations of the finitedimensional nonconvex subproblem to achieve the same generalization error bounds, even when constantfactor approximations cannot be found. We were not able to find strong enough convex relaxations to obtain provably polynomialtime algorithms and leave open the existence or nonexistence of such tractable algorithms with nonexponential sample complexities. Related links: http://jmlr.org/papers/volume18/14546/14546.pdf  JMLR paper 
INI 1  
15:40 to 16:00 
Jonas Adler Learned forward operators: Variational regularization for blackbox models
In inverse problems, correct modelling of the forward model is typically one of the most important components to obtain good reconstruction quality. Still, most work is done on highly simplified forward models. For example, in Computed Tomography (CT), the true forward model, given by the solution operator for the radiative transport equation, is typically approximated by the raytransform. The primary reason for this gross simplification is that the higher quality forward models are both computationally costly, and typically do not have an adjoint of the derivative of the forward operator that can be feasibly evaluated.
The community is not unaware of this missmatch, but the work has been focused on “the model is right, lets fix the data”. We instead propose going the other way around by using machine learning in order to learn a mapping from the simplified model to the complicated model using deep neural networks. Hence instead of learning how to correct complicated data so that it matches a simplified forward model, we accept that the data is always right and instead correct the forward model.
We then use this learned forward operator, which is given as a composition of a simplified forward operator and a convolutional neural network, as a forward operator in a classical variational regularization scheme. We give a theoretical argument as to why correcting the forward model is more stable than correcting the data and provide numerical examples in Cone Beam CT reconstruction.

INI 1  
16:00 to 16:30  Afternoon Tea  
16:30 to 17:20 
Julianne Chung Advancements in Hybrid Iterative Methods for Inverse Problems
Hybrid iterative methods are increasingly being used to solve large, illposed inverse problems, due to their desirable properties of (1) avoiding semiconvergence, whereby later reconstructions are no longer dominated by noise, and (2) enabling adaptive and automatic regularization parameter selection. In this talk, we describe some recent advancements in hybrid iterative methods for computing solutions to largescale inverse problems. First, we consider a hybrid approach based on the generalized GolubKahan bidiagonalization for computing Tikhonov regularized solutions to problems where explicit computation of the square root and inverse of the covariance kernel for the prior covariance matrix is not feasible. This is useful for largescale problems where covariance kernels are defined on irregular grids or are only available via matrixvector multiplication, e.g., those from the Matern class. Second, we describe flexible hybrid methods for solving l_p regularized inverse problems, where we approximate the pnorm penalization term as a sequence of 2norm penalization terms using adaptive regularization matrices, and we exploit flexible preconditioning techniques to efficiently incorporate the weight updates. We introduce a flexible GolubKahan approach within a KrylovTikhonov hybrid framework, such that our approaches extend to general (nonsquare) l_p regularized problems. Numerical examples from dynamic photoacoustic tomography and spacetime deblurring demonstrate the range of applicability and effectiveness of these approaches.
This is joint work with Arvind Saibaba, North Carolina State University, and Silvia Gazzola, University of Bath.

INI 1  
17:20 to 18:10 
Andreas Hauptmann Learning iterative reconstruction for high resolution photoacoustic tomography
Recent advances in deep learning for tomographic reconstructions have shown great potential to create accurate and high quality images with a considerable speedup. In this work we present a deep neural network that is specifically designed to provide high resolution 3D images from restricted photoacoustic measurements. The network is designed to represent an iterative scheme and incorporates gradient information of the data fit to compensate for limited view artefacts. Due to the high complexity of the photoacoustic forward operator, we separate training and computation of the gradient information. A suitable prior for the desired image structures is learned as part of the training. The resulting network is trained and tested on a set of segmented vessels from lung CT scans and then applied to invivo photoacoustic measurement data.

INI 1 
09:00 to 09:50 
Mila Nikolova Below the Surface of the NonLocal Bayesian Image Denoising Method
joint work with Pablo Arias
CMLA, ENS Cachan, CNRS, University ParisSaclay
The nonlocal Bayesian (NLB) patchbased approach of Lebrun, Buades, and Morel [1] is considered as a stateoftheart method for the restoration of (color) images corrupted by white Gaussian noise. It gave rise to numerous ramiifications like e.g., possible improvements, processing of various data sets and video. This work is the first attempt to analyse the method in depth in order to understand the main phenomena underlying its effectiveness. Our analysis, corroborated by numerical tests, shows several unexpected facts. In a variational setting, the firststep Bayesian approach to learn the prior for patches is equivalent to a pseudoTikhonov regularisation where the regularisation parameters can be positive or negative. Practically very good results in this step are mainly due to the aggregation stage  whose importance needs to be reevaluated.
Reference
[1] Lebrun, M., Buades, A., Morel, J.M.: A nonlocal Bayesian image denoising algorithm. SIAM J. Imaging Sci.6(3), 16651688 (2013)

INI 1  
09:50 to 10:40 
Xavier Bresson Convolutional Neural Networks on Graphs
Convolutional neural networks have greatly improved stateoftheart performances in computer vision and speech analysis tasks, due to its high ability to extract multiple levels of representations of data. In this talk, we are interested in generalizing convolutional neural networks from lowdimensional regular grids, where image, video and speech are represented, to highdimensional irregular domains, such as social networks, telecommunication networks, or words' embedding. We present a formulation of convolutional neural networks on graphs in the context of spectral graph theory, which provides the necessary mathematical background and efficient numerical schemes to design fast localized convolutional filters on graphs. Numerical experiments demonstrate the ability of the system to learn local stationary features on graphs.

INI 1  
10:40 to 11:10  Morning Coffee  
11:10 to 12:00 
Julie Delon HighDimensional Mixture Models For Unsupervised Image Denoising (HDMI)
This work addresses the problem of patchbased image denoising through the unsupervised learning of a probabilistic highdimensional mixture models on the noisy patches. The model, named HDMI, proposes a full modeling of the process that is supposed to have generated the noisy patches. To overcome the potential estimation problems due to the high dimension of the patches, the HDMI model adopts a parsimonious modeling which assumes that the data live in groupspecific subspaces of low dimensionalities. This parsimonious modeling allows in turn to get a numerically stable computation of the conditional expectation of the image which is applied for denoising. The use of such a model also permits to rely on model selection tools to automatically determine the intrinsic dimensions of the subspaces and the variance of the noise. This yields a blind denoising algorithm that demonstrates stateoftheart performance, both when the noise level is known and unknown.
Joint work with Charles Bouveyron and Antoine Houdard.

INI 1  
12:00 to 12:50 
Bangti Jin Sparse Recovery by l0 Penalty
Sparsity is one of the powerful tools for signal recovery, and has achieved great success in many practical applications. Conventionally this is realized numerically by imposing an l1 penalty, which is the convex relaxation of the l0 penalty. In this talk, I will discuss our recent efforts in the efficient numerical solution of the l0 problem. I will describe a primal dual active set algorithm, and present some numerical results to illustrate its convergence. This talk is based on joint work with Dr. Yuling Jiao and Dr. Xiliang Lu.

INI 1  
12:50 to 14:00  Lunch @ Wolfson Court  
14:00 to 17:00  Free Afternoon 
09:00 to 09:50 
Silvia Gazzola Krylov Subspace Methods for Sparse Reconstruction
Krylov
subspace methods are popular numerical linear algebra tools that can be
successfully employed to regularize linear largescale inverse problems, such
as those arising in image deblurring and computed tomography. Though they are
commonly used as purely iterative regularization methods (where the number of
iterations acts as a regularization parameter), they can be also employed in a
hybrid fashion, i.e., to solve Tikhonov regularized problems (where both the
number of iterations and and the Tikhonov parameter play the role of
regularizations parameters, which can be chosen adaptively). Krylov subspace
methods can naturally handle unconstrained penalized least squares problems.
The goal of this talk is to present a common framework that exploits a flexible
version of wellknown Krylov methods such as CGLS and GMRES to handle
nonnegativity constraints and regularization terms expressed with respect to
the 1norm, resulting in an efficient way to enforce sparse reconstructions of
the solution. Numerical experiments and comparisons with other wellknown
methods for the computation of nonnegative and sparse solutions will be
presented. These results have been obtained working jointly with James Nagy
(Emory University), Paolo Novati (University of Trieste), Yves Wiaux
(HeriotWatt University), and Julianne Chung (Virginia Polytechnic Institute
and State University).

INI 1  
09:50 to 10:40 
Pierre Weiss Generating sampling patterns in MRI
In this work I will describe a few recent results for the generation of sampling patterns in MRI.
In the first part of my talk, I will provide mathematical models describing the sampling problem in MRI. This will allow me to show that the traditional way mathematicians look at an MRI scanner is usually way too idealized and that important ingredients are currently missing in the theories.
The mathematical modelling shows that a natural way to generate a pattern consists in projecting a density onto a set of admissible measures. I will then describe two projection algorithms. The first one is based on a distance defined through a convolution mapping the measures to L^2, while the second is based on the L^2 transportation distance. After describing a few original applications of this formalism, I will show how it allows to significantly improve scanning times in MRI systems with real in vivo experiments.
An outcome of this work is that compressed sensing, as it stands, only allows for moderate acceleration factors, while other ideas that take advantage of all the degrees of freedom of an MRI scanner yield way more significant improvements.

INI 1  
10:40 to 11:10  Morning Coffee  
11:10 to 12:00 
Anders Hansen On computational barriers in data science and the paradoxes of deep learning
The use of regularisation techniques such as l^1 and Total Variation in Basis Pursuit and Lasso, as well as linear and semidefinite programming and neural networks (deep learning) has seen great success in data science. Yet, we will discuss the following paradox: it is impossible to design algorithms to find minimisers accurately for these problems when given inaccurate input data, even when the inaccuracies can be made arbitrarily small. The paradox implies that any algorithm designed to solve these problems will fail in the following way: For fixed dimensions and any small accuracy parameter epsilon > 0, one can choose an arbitrary large time T and find an input such that the algorithm will run for longer than T and still not have reached epsilon accuracy. Moreover, it is impossible to determine when the algorithm should halt to achieve an epsilon accurate solution. The largest epsilon for which this failure happens is called the Breakdownepsilon. Typically, the Breakdownepsilon > 1/2 even when the the input is bounded by one, is wellconditioned, and the objective function can be computed with arbitrary accuracy. Despite the paradox we explain why empirically many modern algorithms perform very well in realworld scenarios. In particular, when restricting to subclasses of problems the Breakdown epsilon may shrink. Moreover, typically one can find polynomial time algorithms in L and n, where L log(1/Breakdownepsilon), any algorithm, even randomised, becomes arbitrarily slow and will not be able to halt and guarantee L correct digits in the output. The above result leads to the paradoxes of deep learning: (1) One cannot guarantee the existence of algorithms for accurately training the neural network, even if there is one minimum and no local minima. Moreover, (2) one can have 100% success rate on arbitrarily many test cases, yet uncountably many misclassifications on elements that are arbitrarily close to the training set. This is joint work with Alex Bastounis (Cambridge) and Verner Vlacic (ETH) 
INI 1  
12:00 to 12:50 
Josiane Zerubia Stochastic geometry for automatic object detection and tracking
In this talk, we combine the methods from probability theory and stochastic geometry to put forward new solutions to the multiple object detection and tracking problem in high resolution remotely sensed image sequences. First, we present a spatial marked point process model to detect a predefined class of objects based on their visual and geometric characteristics. Then, we extend this model to the temporal domain and create a framework based on spatiotemporal marked point process models to jointly detect and track multiple objects in image sequences. We propose the use of simple parametric shapes to describe the appearance of these objects. We build new, dedicated energy based models consisting of several terms that take into account both the image evidence and physical constraints such as object dynamics, track persistence and mutual exclusion. We construct a suitable optimization scheme that allows us to find strong local minima of the proposed highly nonconvex energy.
As the simulation of such models comes with a high computational cost, we turn our attention to the recent filter implementations for multiple objects tracking, which are known to be less computationally expensive. We propose a hybrid sampler by combining the Kalman filter with the standard Reversible Jump MCMC. High performance computing techniques are also used to increase the computational efficiency of our method. We provide an analysis of the proposed framework. This analysis yields a very good detection and tracking performance at the price of an increased complexity of the models. Tests have been conducted both on high resolution satellite and UAV image sequences

INI 1  
12:50 to 14:00  Lunch @ Wolfson Court  
14:00 to 14:50 
Mario Figueiredo Divide and Conquer: Patchbased Image Denoising, Restoration, and Beyond
Patchbased image processing methods can be seen as an application of the “divide and conquer” strategy: since it is admittedly too difficult to formulate a global prior for an entire image, methods in this class process overlapping patches thereof, and combine the results to obtain an image estimate. A particular class of patchbased methods uses Gaussian mixtures models (GMM) to model the patches, in what can be seen as yet another application of the divide and conquer principle, now in the space of patch configurations. Different components of the GMM specialize in modeling different types of typical patch configurations. Although many other statistical image models exist, using a GMM for patches has several relevant advantages: (i) the corresponding minimum mean squared error (MMSE) estimate can be obtained in closed form; (ii) the variance of the estimate can also be computed, providing a principled way to weight the estimates when combining the patch estimates to obtain the full image estimate; (iii) the GMM parameters can be estimated from a dataset of clean patches, from the noisy image itself, or from a combination of the two; (iv) theoretically, a GMM can approximate arbitrarily well any probability density (under mild conditions). In this talk, I will overview the class of patch/GMMbased approaches to image restoration. After reviewing the first members of this family of methods, which simply addressed denoising, I will describe several more recent advances, namely: use of classadapted GMMs (i.e., tailored to specific image classes, such as faces, fingerprints, text); tackling inverse problems other than denoising (namely, deblurring, hyperspectral superresolution, compressive imaging), by plugging GMMbased denoisers in the loop of an iterative algorithm (in what has recently been called the plugandplay approach); joint restoration/segmentation of images; application to blind deblurring.
This is joint work with Afonso Teodoro, José BioucasDias, Marina Ljubenović, and Milad Niknejad.

INI 1  
14:50 to 15:40 
Marcelo Pereyra Bayesian analysis and computation for convex inverse problems: theory, methods, and algorithms
This talk presents some new developments in theory, methods, and algorithms for performing Bayesian inference in highdimensional inverse problems that are convex, with application to mathematical and computational imaging. These include new efficient stochastic simulation and optimisation Bayesian computation methods that tightly combine proximal optimisation with Markov chain Monte Carlo techniques; strategies for estimating unknown model parameters and performing model selection, methods for calculating Bayesian confidence intervals for images and performing uncertainty quantification analyses; and new theory regarding the role of convexity in maximumaposteriori and minimummeansquareerror estimation. The new theory, methods, and algorithms are illustrated with a range of mathematical imaging experiments.

INI 1  
15:40 to 16:00 
Pol del Aguila Pla Cell detection by functional inverse diffusion and group sparsity
Biological assays in which particles generated by cells bind to a surface and can be imaged to reveal the cells' location are ubiquitous in biochemical, pharmacological and medical research. In this talk, I will describe the physics of these processes, a 3D radiationdiffusionadsorptiondesorption partial differential equation, and present our novel parametrization of its solution (i.e., the observation model) in terms of convolutional operators. Then, I will present our proposal to invert this observation model through a functional optimization problem with groupsparsity regularization and explain the reasoning behind this choice of regularizer. I will also present the results needed to derive the accelerated proximal gradient algorithm for this problem, and justify why we chose to formulate the algorithm in the original function spaces where our observation model operates. Finally, I will briefly comment on our choice of discretization, and show the final performance of our algorithm in both synthetic and real data.
arXiv preprints: arXiv:1710.0164 , arXiv:1710.01622

INI 1  
16:00 to 16:30  Afternoon Tea  
16:30 to 17:20 
Claire Boyer Structured compressed sensing and recent theoretical advances on optimal sampling
Joint works with Jérémie Bigot and Pierre Weiss on the one hand, and Ben Adcock on the other hand.
First, we will theoretically justify the applicability of compressed sensing (CS) in reallife applications. To do so, CS theorems compatible with physical acquisition constraints will be introduced. These results do not only encompass structure in the acquisition but also structured sparsity of the signal of interest. This theory considerably extends the standard framework of CS.
Secondly, recent advances on optimal sampling in CS will be presented, in the sense that the sampling strategy minimizes the bound on the required number of measurements for CS recovery.

INI 1  
17:20 to 18:10 
Tuomo Valkonen What do regularisers do?
Which regulariser is the best? Is any of them any
good? Do they introduce artefacts? What other qualitative properties do
they have? These are some questions, on which I want to shed some light of
the early dawn. Specifically, I will firstly discuss recent work
on natural conditions, based on an analytical study of a bilevel learning
approach, to ensure that regularisation does indeed improve an image. Based on
a more computational study, based on bilevel learning, I will also try to answer
the question, which constructed regulariser is the best one. Secondly, I
will discuss geometrical aspects of the solutions to higherorder regularised
imaging problems.

INI 1  
19:30 to 22:00  Formal Dinner at Corpus Christi College 
09:00 to 09:50 
Irene Waldspurger (Université ParisDauphine); (INRIA Paris  Rocquencourt) Alternating projections for phase retrieval with random sensing vectors
Phase retrieval consists in reconstructing an unknown element in a complex vector space from the modulus of linear measurements. The first reconstruction algorithms for which theoretical guarantees could be proven relied on convexification techniques. It has only recently been realized that similar guarantees hold for nonconvex local search methods, that are faster and simpler, provided that their starting point is carefully chosen. We will explain how to establish these guarantees for the most wellknown local search method: alternating projections. We will also discuss the role of the initialization method.

INI 1  
09:50 to 10:40 
Martin Holler Analysis and applications of structuralpriorbased total variation regularization for inverse problems
Structural priors and joint regularization techniques, such as parallel level set methods and joint total variation, have become quite popular recently in the context of variational image processing. Their main application scenario are particular settings in multimodality/multispectral imaging, where there is an expected correlation between different channels of the image data. In this context, one can distinguish between two different approaches for exploiting such correlations: Joint reconstruction techniques that tread all available channels equally and structural prior techniques that assume some ground truth structural information to be available.
This talk focuses on a particular instance of the second type of methods, namely structural totalvariationtype functionals, i.e., functionals which integrate a spatiallydependent pointwise function of the image gradient for regularization. While this type of methods has been shown to work well in practical applications, some of their analytical properties are not immediate. Those include a proper definition for BV functions and nonsmooth apriory data as well as existence results and regularization properties for standard inverse problem settings.
In this talk we address some of these issues and show how they can partially be overcome using duality. Employing the framework of functions of a measure, we define structuralTVtype functionals via lowersemicontinuous relaxation. Since the relaxed functionals are, in general, not explicitly available, we show that instead of the classical Tikhonov regularization problem, one can equivalently solve a saddlepoint problem where no a priori knowledge of the relaxation is needed. In particular, motivated by concrete applications, we deduce corresponding results for linear inverse problems with norm and Poisson loglikelihood data discrepancy terms. The talk concludes with proofofconcept numerical examples.
This is joint work with M. Hintermüller and K. Papafitsoros (both from the Weierstrass Institute Berlin)

INI 1  
10:40 to 11:10  Morning Coffee  
11:10 to 12:00 
Raymond Chan A Nuclearnorm Model for MultiFrame Superresolution Reconstruction
In this talk, we give a new variational approach to obtain superresolution
images from multiple lowresolution image frames extracted from video clips.
First the displacement between the lowresolution frames and the reference
frame are computed by an optical flow algorithm. The displacement matrix
is then decomposed into product of two matrices corresponding to the integer and fractional
displacement matrices respectively. The integer displacement matrices give rise to a nonconvex
lowrank prior which is then convexified to give the nuclearnorm regularization
term. By adding a standard 2norm data fidelity term to it, we obtain our proposed
nuclearnorm model. Alternating direction method of multipliers can then be used
to solve the model. Comparison of our method with other models on synthetic
and real video clips shows that our resulting images are more accurate
with less artifacts. It also provides much finer and discernable details.
Joint work with Rui Zhao. Research supported by HKRGC.

INI 1  
12:00 to 12:50 
Mihaela Pricopjeckstadt From spatial learning to machine learning: an unsupervised approach with applications to behavioral science
In this talk we consider an exampledriven approach for identifying ability patterns from data partitions based on learning behaviour in the water maze experiment. A modification of the kmeans algorithm for longitudinal data as introduced in [1] is used to identify clusters based on the learning variable (see [3]). The association between these clusters and the flying ability variables is statistically tested in order to characterize the partitions in terms of flying traits. Since the learning variables seem to reflect flying abilities, we propose a new sparse clustering algorithm in an approach modelling the covariance matrix by a Kronecker product. Consistency and an EMalgorithm are studied in this framework also.
References:
1. Genolini, C.; Ecochard, R.; Benghezal, M. et al., ''kmlShape: An Efficient Method to Cluster Longitudinal Data (TimeSeries) According to Their Shapes'', PLOS ONE; Vol. 11, 2016.2.
Sung, K.K. and Poggio, T., ''Examplebased learning for viewbased human face detection''; IEEE Transactions on pattern analysis and machine intelligence, Vol. 20, 3951, 1998.;
3. Rosipal, R. and Kraemer, N. , ''Overview and recent advances in partial least squares'', Subspace, latent structure and feature selection, Book Series: Lecture Notes in Computer Science; Vol. 3940, 3451, 2006.

INI 1  
12:50 to 14:00  Lunch @ Wolfson Court  
14:00 to 14:50 
Robert Plemmons Sparse Recovery Algorithms for 3D Imaging using Point Spread Function Engineering
Coauthors: Chao Wang (Mathematics, Chinese University of Hong Kong), Raymond Chan (Mathematics, Chinese University of Hong Kong), Sudhakar Prasad (Physics, University of New Mexico)
Imaging and localizing point sources with high accuracy in a 3D volume is an important but challenging task. For example, superresolution 3D single molecule localization is an area of intense interest in biology (cell imaging, folding, membrane behavior, etc.), in chemistry (spectral diffusion, molecular distortions, etc.), and in physics (structures of materials, quantum optics, etc.). We consider here the highresolution imaging problem of 3D point source image recovery from 2D data using methods based on point spread function (PSF) design. The methods involve a new technique, recently patented by S. Prasad, for applying rotating point spread functions with a single lobe to obtain depth from defocus. The amount of rotation of the PSF encodes the depth position of the point source. The distribution of point sources is discretized on a cubical lattice where the indexes of nonzero entries represent the 3D locations of point sources. The values of these entries are the point source fluxes. Finding the locations and fluxes is a largescale sparse 3D inverse problem and we have developed solution algorithms based on sparse recovery using nonconvex optimization. Applications to highresolution single molecule localization microscopy are described, as well as localization of space debris using a spacebased telescope. Sparse recovery optimization methods, including the Continuous Exact L0 (CEL0) algorithm, are used in our numerical experiments. 
INI 1  
14:50 to 15:40 
Jeff Calder The weighted pLaplacian and semisupervised learning
Semisupervised learning refers to machine learning algorithms that make use of both labeled data and unlabeled data for learning tasks. Examples include large scale nonparametric regression and classification problems, such as predicting voting preferences of social media users, or classifying medical images. In today's big data world, there is an abundance of unlabeled data, while labeled data often requires expert labeling and is expensive to obtain. This has led to a resurgence of semisupervised learning techniques, which use the topological or geometric properties of large amounts of unlabeled data to aid the learning task. In this talk, I will discuss some new rigorous PDE scaling limits for existing semisupervised learning algorithms and their practical implications. I will also discuss how these scaling limits suggest new ideas for fast algorithms for semisupervised learning.

INI 1  
15:40 to 16:10  Afternoon Tea 