What is Mathematical Methods for Arbitrary Data Sources?

The lecture series will collect talks on mathematical disciplines related to all kinds of data, ranging from statistics and machine learning to model-based approaches and inverse problems. Each pair of talks will address a specific direction, e.g., a NoMADS session related to nonlocal approaches or a DeepMADS session related to deep learning.

The series is created in the spirit of the One World seminar format pioneered by the seminars in probability and PDE.

Using Zoom

For this online seminar we use Zoom as the video conferencing service. Approximately 15 minutes prior to the beginning of each lecture, a Zoom link will be provided on this website and via the mailing list.

Mailing list

Please subscribe to our mailing list by filling out this form.

Materials and Video Recordings

Additional material, slides, and video recordings are attached to the respective sessions in our session archive. Video recordings of past sessions are also available on our YouTube channel.

Program

  • Jun 8, 2020: Session IV

    Times are given in German time (UTC+2).

    14:00-14:45 — Michael Unser (École polytechnique fédérale de Lausanne, CH): Representer theorems for machine learning and inverse problems

    15:00-15:45 — Vincent Duval (Inria, FR): Representing the solutions of total variation regularized problems

    The total (gradient) variation is a regularizer which has been widely used in inverse problems arising in image processing, following the pioneering work of Rudin, Osher and Fatemi. In this talk, I will describe the structure of the solutions of total variation regularized variational problems when one has a finite number of measurements. First, I will present a general representation principle for the solutions of convex problems; then I will apply it to the total variation by describing the faces of its unit ball.

    This is joint work with Claire Boyer, Antonin Chambolle, Yohann De Castro, Frédéric de Gournay and Pierre Weiss.
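
    As a point of reference for the abstract above, a common form of a total variation regularized problem with finitely many measurements reads (this generic formulation is an editorial illustration and need not match the talk's exact setting):

    $$ \min_{u \in \mathrm{BV}(\Omega)} \; \frac{1}{2} \sum_{i=1}^{m} \big( \ell_i(u) - y_i \big)^2 + \lambda \, \mathrm{TV}(u), $$

    where the $\ell_i$ are linear measurement functionals, $y \in \mathbb{R}^m$ is the data, $\lambda > 0$ is a regularization parameter and $\mathrm{TV}(u) = |Du|(\Omega)$ is the total variation. Representer-type results of the kind mentioned above state, roughly, that some solution can be written as a combination of at most $m$ extreme points of the unit ball of the regularizer.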
  • Jun 15, 2020: Session V

    Times are given in German time (UTC+2).

    14:00-14:45 — Andrea Braides (Università Roma Tor Vergata, IT): tba

    15:00-15:45 — Nicolás García Trillos (University of Wisconsin-Madison, US): Regularity theory and uniform convergence in the large data limit of graph Laplacian eigenvectors on random data clouds

    Graph Laplacians are omnipresent objects in machine learning that have been used in supervised, unsupervised and semi-supervised settings due to their versatility in extracting local and global geometric information from data clouds. In this talk I will present an overview of how the mathematical theory built around them has grown deeper, layer by layer, since the appearance of the first results on pointwise consistency in the 2000s, up to the most recent developments; this line of research has found strong connections between PDEs built on proximity graphs on data clouds and PDEs on manifolds, and has given a more precise mathematical meaning to the task of "manifold learning". In the first part of the talk I will highlight how ideas from optimal transport made possible some of the initial steps, which provided L2-type error estimates between the spectra of graph Laplacians and Laplace-Beltrami operators. In the second part of the talk, which is based on recent work with Jeff Calder and Marta Lewicka, I will present a newly developed regularity theory for graph Laplacians which, among other things, allows us to bootstrap the L2 error estimates obtained through optimal transport and upgrade them to uniform convergence and almost C^{0,1} convergence rates. The talk can be seen as a tale of how a flow of ideas from optimal transport, PDEs and, more generally, analysis has made possible a finer understanding of concrete objects that are popular in data analysis and machine learning.
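
    To make the central object of the abstract above concrete, here is a minimal editorial sketch (not the speaker's code) of a graph Laplacian built on a random data cloud together with its lowest eigenvectors; the epsilon-graph construction, the Gaussian weights and the sample size are illustrative assumptions.

    ```python
    # Minimal sketch: unnormalized graph Laplacian on a random data cloud and its
    # lowest eigenpairs. The epsilon-graph with Gaussian weights is an illustrative
    # choice, not necessarily the construction used in the talk.
    import numpy as np
    from scipy.spatial.distance import cdist

    rng = np.random.default_rng(0)
    X = rng.uniform(size=(500, 2))                  # data cloud: n points in R^2
    eps = 0.15                                      # connectivity radius

    dist = cdist(X, X)                              # pairwise Euclidean distances
    W = np.exp(-(dist / eps) ** 2) * (dist < eps)   # weights on an epsilon-graph
    np.fill_diagonal(W, 0.0)                        # no self-loops

    L = np.diag(W.sum(axis=1)) - W                  # unnormalized Laplacian: degree - weights

    # The smallest eigenvalues/eigenvectors are the discrete analogues of the
    # Laplace-Beltrami eigenpairs appearing in the large-data limit.
    vals, vecs = np.linalg.eigh(L)
    print(vals[:6])                                 # first eigenvalue is ~0 (constants)
    ```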
  • Jun 29, 2020: Session VI

    Times are given in German time (UTC+2).

    14:00-14:45 — Jana de Wiljes (Universität Potsdam, DE): tba

    15:00-15:45 — tba

Past Sessions

  • Apr 20, 2020: Session I

    Gabriel Peyré (CNRS and École Normale Supérieure, FR): Scaling Optimal Transport for High-dimensional Learning

    Optimal transport (OT) has recently gained a lot of interest in machine learning. It is a natural tool to compare probability distributions in a geometrically faithful way. It finds applications in both supervised learning (using geometric loss functions) and unsupervised learning (to perform generative model fitting). OT is however plagued by the curse of dimensionality, since it might require a number of samples which grows exponentially with the dimension. In this talk, I will review entropic regularization methods which define geometric loss functions approximating OT with a better sample complexity. More information and references can be found on the website of our book Computational Optimal Transport. (A minimal numerical sketch of entropic regularization is included after this session's entries.)
    Materials: video, slides
    Marie-Therese Wolfram (Warwick University, UK), joint work with Andrew Stuart (Caltech, US): Inverse Optimal Transport

    Discrete optimal transportation problems arise in various contexts in engineering, the sciences and the social sciences. Examples include the marriage market in economics or international migration flows in demographics. Often the underlying cost criterion is unknown, or only partly known, and the observed optimal solutions are corrupted by noise. In this talk we discuss a systematic approach to infer unknown costs from noisy observations of optimal transportation plans. The proposed methodologies are developed within the Bayesian framework for inverse problems and require only the ability to solve the forward optimal transport problem, which is a linear program, and to generate random numbers. We illustrate our approach using the example of international migration flows. Here, reported migration flow data captures (noisily) the number of individuals moving from one country to another in a given period of time. It can be interpreted as a noisy observation of an optimal transportation map, with costs related to the geographical position of countries. We use a graph-based formulation of the problem, with countries at the nodes of the graph and non-zero weighted adjacencies only on edges between countries which share a border. We use the proposed algorithm to estimate the weights, which represent the cost of transition, and to quantify the uncertainty in these weights.
    Materials: video, slides
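
    The entropic regularization discussed in Gabriel Peyré's abstract can be illustrated with a short Sinkhorn-style computation. This is an editorial sketch under illustrative assumptions (Gaussian point clouds, squared Euclidean cost, a moderate regularization parameter), not code from the talk.

    ```python
    # Minimal editorial sketch of entropically regularized optimal transport
    # solved with Sinkhorn iterations. Point clouds, cost and epsilon are
    # illustrative assumptions, not the setting of the talk.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(60, 2))             # samples of the source distribution
    y = rng.normal(size=(80, 2)) + 1.0       # samples of the (shifted) target
    a = np.full(60, 1 / 60)                  # uniform source weights
    b = np.full(80, 1 / 80)                  # uniform target weights

    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared Euclidean cost
    eps = 0.5                                           # entropic regularization
    K = np.exp(-C / eps)                                # Gibbs kernel

    u = np.ones(60)
    v = np.ones(80)
    for _ in range(500):                     # Sinkhorn: alternate marginal scalings
        u = a / (K @ v)
        v = b / (K.T @ u)

    P = u[:, None] * K * v[None, :]          # approximate optimal coupling
    print("transport cost estimate:", (P * C).sum())
    print("total coupling mass (should be ~1):", P.sum())
    ```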
  • May 4, 2020: Session II

    Lorenzo Rosasco (Università di Genova, IT and MIT, US): Efficient learning with random projections

    Despite stunning performance, state-of-the-art machine learning approaches are often computationally intensive, and efficiency remains a challenge. Dimensionality reduction, if performed efficiently, provides a way to reduce the computational requirements of downstream tasks, but possibly at the expense of the obtained accuracy. In this talk, we discuss the interplay between accuracy and efficiency when dimensionality reduction is performed by means of, possibly data-dependent, random projections. The latter are related to discretization methods for integral operators, to sampling methods in randomized numerical linear algebra and to sketching methods. Our results show that there are a number of different tasks and regimes where, using random projections and regularization, efficiency can be improved at no cost in accuracy. Theoretical results are used to derive scalable and fast kernel methods for datasets with millions of points.
    Materials: video, slides
    Michaël Fanuel (KU Leuven, BE), joint work with Joachim Schreurs and Johan Suykens: Diversity sampling in kernel methods

    A well-known technique for large-scale kernel methods is the Nyström approximation. Based on a subset of landmarks, it gives a low-rank approximation of the kernel matrix and is known to provide a form of implicit regularization. We will discuss the impact of sampling diverse landmarks for constructing the Nyström approximation in supervised and unsupervised problems. In particular, three methods will be considered: uniform sampling, leverage score sampling and Determinantal Point Processes (DPP). The implicit regularization due to the diversity of the landmarks will be made explicit by numerical simulations and analysed further in the case of DPP sampling by some theoretical results. (A minimal numerical sketch of the Nyström approximation is included after this session's entries.)
    Materials: video, slides
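
    Both abstracts above build on the Nyström approximation; the following editorial sketch shows the basic construction with uniform landmark sampling. The Gaussian kernel, its bandwidth, the data and the jitter term are illustrative assumptions.

    ```python
    # Minimal sketch of the Nystrom low-rank approximation of a kernel matrix
    # using uniformly sampled landmarks. Kernel choice, bandwidth and the small
    # jitter on the landmark block are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))           # data set
    m = 50                                   # number of landmarks

    def gaussian_kernel(A, B, sigma=1.0):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    idx = rng.choice(len(X), size=m, replace=False)   # uniform landmark sampling
    K_nm = gaussian_kernel(X, X[idx])                 # n x m cross-kernel block
    K_mm = gaussian_kernel(X[idx], X[idx])            # m x m landmark block

    # Nystrom approximation: K ~ K_nm K_mm^+ K_nm^T (pseudo-inverse with jitter)
    K_mm_inv = np.linalg.pinv(K_mm + 1e-8 * np.eye(m))
    K_approx = K_nm @ K_mm_inv @ K_nm.T

    K_exact = gaussian_kernel(X, X)
    rel_err = np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact)
    print(f"relative Frobenius error of the Nystrom approximation: {rel_err:.3e}")
    ```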
  • May 18, 2020: Session III

    Please note: The second talk by Francis Bach was jointly organized with the One World Optimization Seminar.

    Lars Ruthotto (Emory University, US): Machine Learning meets Optimal Transport: Old solutions for new problems and vice versa

    This talk presents new connections between optimal transport (OT), which has been a critical problem in applied mathematics for centuries, and machine learning (ML), which has been receiving enormous attention in the past decades. In recent years, OT and ML have become increasingly intertwined. This talk contributes to this booming intersection by providing efficient and scalable computational methods for OT and ML.
    The first part of the talk shows how neural networks can be used to efficiently approximate the optimal transport map between two densities in high dimensions. To avoid the curse-of-dimensionality, we combine Lagrangian and Eulerian viewpoints and employ neural networks to solve the underlying Hamilton-Jacobi-Bellman equation. Our approach avoids any space discretization and can be implemented in existing machine learning frameworks. We present numerical results for OT in up to 100 dimensions and validate our solver in a two-dimensional setting.
    The second part of the talk shows how optimal transport theory can improve the efficiency of training generative models and density estimators, which are critical in machine learning. We consider continuous normalizing flows (CNF), which have emerged as one of the most promising approaches for variational inference in the ML community. Our numerical implementation is a discretize-optimize method whose forward problem relies on manually derived gradients and the Laplacian of the neural network, and which uses automatic differentiation in the optimization. In common benchmark challenges, our method outperforms state-of-the-art CNF approaches by reducing the network size by 8x, accelerating training by 10x-40x, and allowing 30x-50x faster inference.
    Materials: video, slides
    Francis Bach (Université PSL, FR), joint work with Lénaïc Chizat: On the convergence of gradient descent for wide two-layer neural networks

    Many supervised learning methods are naturally cast as optimization problems. For prediction models which are linear in their parameters, this often leads to convex problems for which many guarantees exist. Models which are non-linear in their parameters, such as neural networks, lead to non-convex optimization problems for which guarantees are harder to obtain. In this talk, I will consider two-layer neural networks with homogeneous activation functions where the number of hidden neurons tends to infinity, and show how qualitative convergence guarantees may be derived. I will also highlight open problems related to the quantitative behavior of gradient descent for such models. (A schematic parameterization of such two-layer networks is sketched after this session's entries.)
    Materials: video, slides
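
    For reference, the two-layer networks discussed in Francis Bach's abstract are typically parameterized (in the mean-field normalization commonly used in this line of work; the talk's exact setting may differ) as

    $$ f(x) \;=\; \frac{1}{m} \sum_{j=1}^{m} a_j \, \sigma\big( w_j^{\top} x \big), $$

    with a positively homogeneous activation $\sigma$ such as the ReLU $\sigma(t) = \max(t, 0)$. Gradient descent is then applied to an empirical loss $\frac{1}{n} \sum_{i=1}^{n} \ell\big( f(x_i), y_i \big)$, and the limit of infinitely many hidden neurons, $m \to \infty$, is analyzed by viewing the neurons $(a_j, w_j)$ as an empirical measure evolving under a Wasserstein gradient flow.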

Organizers

Leon Bungert
Martin Burger
Antonio Esposito
Janic Föcke
Daniel Tenbrinck
Philipp Wacker

Other One World Seminars