SLIDE 1

Partial Differential Equations Approaches to Optimization and Regularization of Deep Neural Networks

Celebrating 75 Years of Mathematics of Computation ICERM

Nov 2, 2018 Adam Oberman McGill Dept of Math and Stats

supported by NSERC, Simons Fellowship, AFOSR FA9550-18-1-0167

SLIDE 2

Background: AI

  • Artificial Intelligence is loosely defined as intelligence exhibited by machines.
  • Operationally: R&D in CS academic sub-disciplines: Computer Vision, Natural Language Processing (NLP), Robotics, etc.

AlphaGo used DL to beat the world champion at Go.

SLIDE 3
Artificial General Intelligence (AGI)

  • AI: specific tasks; AGI: general cognitive abilities.
  • AGI is a small research area within AI: build machines that can successfully perform any task that a human might do.
  • So far, no progress on AGI.

SLIDE 4

Deep Learning vs. traditional Machine Learning

  • Machine Learning (ML) has been around for some time.
  • Deep Learning is a newer branch of ML which uses Deep Neural Networks.
  • ML has theory: error estimates and convergence proofs.
  • DL has less theory, but it can effectively solve substantially larger-scale problems.

SLIDE 5
  • ImageNet: total number of classes: m = 21,841
  • Total number of images: n = 14,197,122
  • Color images: d = 3 × 256 × 256 = 196,608

Facebook used 256 GPUs, working in parallel, to train on ImageNet. It is still an academic dataset: the total number of images on Facebook is much larger.

What are DNNs (in Math Language)?

SLIDE 6

What is the function? Looking for a map from images to labels

[Diagram] x in M = manifold of images → f(x) in the list of word labels.

SLIDE 7

Doing the impossible?

In theory, due to the curse of dimensionality, it is impossible to accurately interpolate a high-dimensional function. In practice, it is possible using the Deep Neural Network architecture, training to fit the data with SGD. However, we don't know why it works.

Can train a computer to caption images more accurately than human performance.

SLIDE 8

Loss Functions versus Error

Classification problem: map an image to a discrete label space {1, 2, 3, …, 10}. In practice: map to a probability vector, then assign the label of the arg max. Classification is not differentiable, so, in order to train, use a loss function as a surrogate.
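A minimal numpy illustration of the surrogate idea (all values illustrative, not from the talk): the 0-1 classification error is piecewise constant in the scores, while the cross-entropy loss on the probability vector is smooth and can be differentiated for training.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

scores = np.array([1.0, -0.5, 2.0, 1.9])    # network scores for 4 classes
y = 3                                       # correct label

p = softmax(scores)
zero_one_error = float(np.argmax(p) != y)   # not differentiable in the scores
cross_entropy = -np.log(p[y])               # smooth surrogate used for training
print(zero_one_error, cross_entropy)
```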

SLIDE 9

DNNs in Math Language: high dimensional function fitting

Data fitting problem: f is a parameterized map from images to probability vectors on labels; y is the correct label. Try to fit the data by minimizing the loss. Training: minimize the expected loss by taking stochastic (approximate) gradients. Note: we train on an empirical distribution sampled from the density rho.

$$\min_w \ \mathbb{E}_{x\sim\rho_n}\,\ell\big(f(x;w),\,y(x)\big) \;=\; \min_w \ \frac{1}{n}\sum_{i=1}^{n}\ell\big(f(x_i;w),\,y(x_i)\big)$$
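A minimal sketch of this training procedure on a toy least-squares problem (assumptions mine; the deck's setting is a DNN, not least squares): stochastic gradients over mini-batches sampled from the empirical distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

w = np.zeros(d)
h = 0.05                                    # learning rate
for k in range(2000):
    idx = rng.choice(n, size=32)            # mini-batch from the empirical distribution
    g = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)   # stochastic gradient of the loss
    w -= h * g

print(np.linalg.norm(w - w_true))           # approaches 0 (up to noise)
```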
SLIDE 10
Generalization: training set and test set

Goal: generalization: the hope is that the training error is a good estimate of the generalization loss, which is the expected loss on unseen images drawn from the same distribution.

$$\mathbb{E}_{x\sim\rho}\,\ell\big(f(x;w),\,y(x)\big) \;=\; \int \ell\big(f(x;w),\,y(x)\big)\,d\rho(x)$$

Testing: reserve some data and approximate the generalization loss/error on the test data, which is a surrogate for the true expected error on the full density.

$$\mathbb{E}_{x\sim\rho_{\mathrm{test}}}\,\ell\big(f(x;w),\,y(x)\big) \;=\; \frac{1}{n_{\mathrm{test}}}\sum_{i\in\mathrm{test}}\ell\big(f(x_i;w),\,y(x_i)\big)$$

Orange curve: overtrained. Green curve: better generalization.
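The overtraining phenomenon in the figure can be reproduced in a toy setting; a short illustrative experiment (polynomial regression, assumptions mine, not from the talk): the higher-capacity fit drives training error down while test error grows.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 40)
y = np.sin(3 * x) + 0.2 * rng.standard_normal(40)
x_tr, y_tr, x_te, y_te = x[:30], y[:30], x[30:], y[30:]   # train/test split

for degree in (3, 15):                     # moderate vs. high capacity
    c = np.polyfit(x_tr, y_tr, degree)
    train_err = np.mean((np.polyval(c, x_tr) - y_tr) ** 2)
    test_err = np.mean((np.polyval(c, x_te) - y_te) ** 2)
    print(degree, train_err, test_err)     # degree 15: typically smaller train, larger test error
```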

SLIDE 11

Challenges for deep learning

“It is not clear that the existing AI paradigm is immediately amenable to any sort of software engineering validation and verification. This is a serious issue, and is a potential roadblock to DoD’s use of these modern AI systems, especially when considering the liability and accountability of using AI.” (JASON report)

SLIDE 12

Mary Shaw’s evolution of software engineering discipline

Better theory improves reliability, and the discipline evolves.

SLIDE 13

Entropy-SGD

Pratik Chaudhari UCLA (now Amazon/ U Penn)

Stefano Soatto, UCLA Comp Sci; Stanley Osher, UCLA Math; Guillaume Carlier, CEREMADE, U. Paris IX Dauphine

  • 2017 UCLA PhD student (at time of research)
  • 2018 (present) Amazon research
  • Fall 2019 Faculty in ESE at U Penn

Deep Relaxation: partial differential equations for optimizing deep neural networks. Pratik Chaudhari, Adam M. Oberman, Stanley Osher, Stefano Soatto, Guillaume Carlier, 2017

SLIDE 14

Entropy SGD results in Deep Neural Networks (Pratik)

Visualization of the improvement in training loss (left) and the improvement in validation error (right); dimension = 1.67 million.

SLIDE 15

Different Interpretation: Regularization using Viscous Hamilton-Jacobi PDE

Solution of the PDE in one dimension. Cartoon: the algorithm only solves the PDE for a time depending on Hf(x).
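The one-dimensional picture is easy to reproduce; a minimal finite-difference sketch, under the assumption that the PDE in question is the viscous Hamilton-Jacobi equation u_t + ½|u_x|² = ε u_xx with u(x, 0) = f(x): solving it for a short time smooths a nonconvex objective.

```python
import numpy as np

# Explicit finite differences for u_t + (1/2)|u_x|^2 = eps * u_xx, u(x,0) = f(x).
N, eps = 400, 0.1
x = np.linspace(-2.0, 2.0, N)
dx = x[1] - x[0]
u = x ** 2 + 0.3 * np.cos(8 * x)          # nonconvex initial objective f(x)

dt = 0.2 * dx ** 2 / eps                  # conservative explicit time step
for _ in range(500):
    ux = (u[2:] - u[:-2]) / (2 * dx)      # centered first derivative
    uxx = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx ** 2
    u[1:-1] += dt * (eps * uxx - 0.5 * ux ** 2)

# The smoothed surrogate has fewer spurious local minima than f.
print(x[np.argmin(u)], u.min())
```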

SLIDE 16

Expected Improvement Theorem in continuous time

SLIDE 17

Adaptive-SGD

joint with PhD Student Mariana Prazeres

SLIDE 18

Model for mini-batch gradients: k = 1 means.

[Equations: full gradient vs. mini-batch gradient]

For k > 1 means, the same calculation applies if we restrict to the active indices. This leads to smaller active batch sizes, and higher variance.
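A small numpy check of this model (illustrative, not from the talk): the mini-batch gradient is an unbiased estimate of the full gradient, with variance decaying roughly like 1/(batch size); smaller active batches therefore mean higher variance.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 5000, 10
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d)
w = rng.standard_normal(d)

def grad(idx):
    # least-squares gradient restricted to the sampled indices
    return X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)

g_full = grad(np.arange(n))
for m in (10, 100, 1000):
    devs = [np.sum((grad(rng.choice(n, size=m)) - g_full) ** 2) for _ in range(200)]
    print(m, np.mean(devs))     # mean squared deviation shrinks roughly like 1/m
```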
SLIDE 19

Motivation: quality of gMB depends on x

[Figure: mini-batch gradient directions far from and near the minimizer x*]

  • Far from x*, mb gradients point in a good direction.
  • Near x*, require more samples, or small steps (so that directions average in time).

SLIDE 20

Adapt in Space instead of Time

[Figure: two regions of the same objective; one where MB = 10 suffices, another where MB = 60 is needed]

The ideal learning rate/batch size combination should depend on x (space) rather than k (time).

SLIDE 21

Adaptive SGD

  • Adaptively, depending on x, decide on the MB size, or the learning rate.
  • Use the following formula (derived later); a hedged sketch follows below.
  • f large: learning rate large (OK to use a small MB).
  • g small: var(MB) restricts the learning rate (so use a large MB).
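The formula itself did not survive extraction; the sketch below is a Polyak-type rule consistent with the two bullets (an assumption, not the slide's formula): the step is large when the objective gap is large, and is throttled when the mini-batch variance dominates.

```python
import numpy as np

def adaptive_learning_rate(f_x, f_star, g_mb, var_mb, eps=1e-12):
    # Large gap f(x) - f* -> large step; large mini-batch variance -> smaller step.
    return (f_x - f_star) / (np.sum(g_mb ** 2) + var_mb + eps)

h = adaptive_learning_rate(f_x=1.0, f_star=0.0,
                           g_mb=np.array([0.3, -0.4]), var_mb=0.05)
print(h)
```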
SLIDE 22

Benchmarks: Fix MB and adapt h

[Figure, left: f(x) - f* for a typical run; right: f(x) - f* averaged over 40 runs; curves: Scheduled, Scheduled 1/t, Adaptive]

Left: it is not too clear what is happening with one path. Right: average over several runs to see the trend.

SLIDE 23
Paths of Scheduled and Adaptive SGD

[Figure: sample paths of Scheduled SGD and Adaptive SGD]

The variance of the paths is clear from this figure.

SLIDE 24

Proof of Convergence with Rate

The rate is the same order as for SGD, but with a better constant.

SLIDE 25

Proof of Convergence and Generalization for Lipschitz Regularized DNNs

joint with Jeff Calder

Lipschitz regularized Deep Neural Networks converge and generalize. O. and Jeff Calder, 2018.

SLIDE 26

Background

  • On the generalization/convergence result
SLIDE 27

Problem: traditional ML theory does not apply to Deep Learning

A new idea is needed to make Deep Learning more reliable.

“Understanding Deep Learning requires rethinking generalization,” Zhang et al. (2016)

SLIDE 28

Inspiration: an old idea: Total Variation Denoising [Rudin-Osher-Fatemi, 1992]

Used in early, high-profile image reconstruction of video images.

Stanley Osher

  • Minimize a variational functional (written below): a combination of a loss term, fitting the original noisy image, and a regularization term.
  • Regularization is large on noise, small on images.
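For reference, the Rudin-Osher-Fatemi functional has the standard form (a fact about the 1992 model, with f the noisy image and λ the regularization weight):

$$\min_u \ \int_\Omega \big(u(x) - f(x)\big)^2\,dx \;+\; \lambda \int_\Omega |\nabla u(x)|\,dx$$

The total variation term is large on noise and small on piecewise-smooth images, which is exactly the property the bullet above describes.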
SLIDE 29

Regularization: from images to maps

[Diagram] x in M = manifold of images → f(x) = word labels. The learned map is well-behaved on the data manifold, but very bad off the manifold (without regularization).

SLIDE 30

What is new in our result?

  • Bartlett proved generalization under the assumption of Lipschitz regularity.
  • However, DNNs are not uniformly Lipschitz.
  • By adding regularization to the objective function in the training, we obtain the uniform Lipschitz bounds.
SLIDE 31

Approaches to regularization

  • A. Machine Learning: learn the data using an appropriate (smooth) parameterized class of functions.
  • B. Algorithmic: use an algorithm which selects the best solution (e.g. Stochastic Gradient Descent as a regularizer, adversarial training).
  • C. Inverse problems: allow for a broad class of functions, but modify the loss to choose the right one.

A. $$\min_w \ \mathbb{E}_{x\sim\rho}\,\ell\big(f(x;w),\,y(x)\big)$$

B. $$w_{k+1} \;=\; w_k \;-\; h_k\,\nabla_{\mathrm{mb}}\,\ell(\dots,\,w_k)$$

C. $$\min_w \ \mathbb{E}_{x\sim\rho}\,\ell\big(f(x;w),\,y(x)\big) \;+\; \big\|\nabla_x f\big\|_{L^p(X,\rho(x))}$$
SLIDE 32

Comment for math audience

  • Our result may not be surprising to math experts.
  • However, it is a new approach to generalization theory.
  • Speaking personally, the hard work was giving a math interpretation of the problem (1.5 years).
  • Once the model was set up correctly, and we realized we could implement it in a DNN, the math was relatively easier.
  • The paper and proof were done in about 6 weeks.
SLIDE 33

Convergence: two cases

Clean labels:

  • relevant in benchmark data sets and applications,
  • simpler proof, since the clean label function is a minimizer,
  • regime of perfect data interpolation possible with DNNs.

Noisy labels:

  • relevant in applications,
  • familiar setting for calculus of variations.

SLIDE 34

Statement and proof sketch of the generalization/convergence result

SLIDE 35

Lipschitz Regularization of DNNs

Data functional: augment the expected loss function on the data with a Lipschitz regularization term, where L0 is the Lipschitz constant of the data and n is the number of data points. We are interested in the limit as we sample more points. The limiting functional is given below.
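A plausible form of these functionals, under the assumption that the penalty is the excess Lipschitz constant over L0 (λ a regularization weight; an assumption consistent with the description above, not legible in the slides):

$$J_n[f] \;=\; \frac{1}{n}\sum_{i=1}^{n}\ell\big(f(x_i),\,y(x_i)\big) \;+\; \lambda\,\big(\mathrm{Lip}(f) - L_0\big)^{+},$$

with limiting functional

$$J[f] \;=\; \mathbb{E}_{x\sim\rho}\,\ell\big(f(x),\,y(x)\big) \;+\; \lambda\,\big(\mathrm{Lip}(f) - L_0\big)^{+}.$$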

SLIDE 36

Convergence theorem for Noisy Labels

Convergence on the data manifold; Lipschitz off the manifold.

SLIDE 37

Convergence for clean labels (with a rate)

  • Rate of convergence, on the data manifold, of the minimizers.
  • The rate depends on n, the number of data points sampled, and m, the number of labels.
  • Probabilistic bound, where we obtain a given error with high probability.
  • Uniform sampling vs. random sampling: the log term and the probability go away.

SLIDE 38

Proof

SLIDE 39

Generalization follows

SLIDE 40

Lipschitz Regularization improves Adversarial Robustness

Chris Finlay (current PhD student)

Improved robustness to adversarial examples using Lipschitz regularization of the loss. Chris Finlay, O., Bilal Abbasi; Oct 2018; arXiv.

Bilal Abbasi (former PhD now working in AI)

SLIDE 41

Adversarial Attacks

SLIDE 42

Adversarial Attacks on the loss

SLIDE 43

Scale measures visible attacks

DNNs are vulnerable to attacks which are invisible to the human eye. Undefended networks have a 100% error rate at perturbation size 0.1 (in the max norm).
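A generic sketch of such an attack (FGSM, named later in the deck; a standard implementation, not the authors' code): a signed-gradient step of max-norm size eps.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.1):
    # One signed-gradient step: the perturbation satisfies ||delta||_inf = eps.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()

model = torch.nn.Linear(784, 10)            # toy model for illustration
x = torch.randn(8, 784)
y = torch.randint(0, 10, (8,))
x_adv = fgsm_attack(model, x, y, eps=0.1)
```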

SLIDE 44

Implementation of Lipschitz Regularization of the Loss
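A hedged sketch of one standard way to implement this, penalizing the input-gradient norm of the loss via double backpropagation (an illustration of the idea named in the title, not necessarily the authors' exact method):

```python
import torch
import torch.nn.functional as F

def lipschitz_regularized_loss(model, x, y, lam=0.1):
    # Penalize the norm of the gradient of the loss with respect to the input.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    (grad_x,) = torch.autograd.grad(loss, x, create_graph=True)
    penalty = grad_x.flatten(1).norm(dim=1).pow(2).mean()
    return loss + lam * penalty

model = torch.nn.Linear(784, 10)            # toy model for illustration
x, y = torch.randn(8, 784), torch.randint(0, 10, (8,))
loss = lipschitz_regularized_loss(model, x, y)
loss.backward()                             # trains through the gradient penalty
```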

SLIDE 45

Robustness bounds follow from the Lipschitz constant of the loss, so training the model to have a better Lipschitz constant will improve the adversarial robustness bounds.
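The bound follows directly from the definition of the Lipschitz constant: writing L for the Lipschitz constant of the map x ↦ ℓ(f(x), y),

$$\big|\ell\big(f(x+\delta),\,y\big) - \ell\big(f(x),\,y\big)\big| \;\le\; L\,\|\delta\|,$$

so an adversarial perturbation of size ε can change the loss by at most Lε; a smaller L gives a stronger robustness guarantee.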

SLIDE 46

Arms race of attack methods and defences

We tested against a toolbox of attacks, and plotted the error curve as a function of the adversarial size.

Strongest attacks:

  1. Iterative l2-projected gradient
  2. Iterative Fast Gradient Signed Method (FGSM)

SLIDE 47

Adversarial Training: interpretation as regularization

SLIDE 48

Adversarial Training augmented with Lipschitz Regularization

SLIDE 49

AT + Tulip Results (2-norm)

A significant improvement over state-of-the-art results comes from augmenting AT with Lipschitz regularization.

SLIDE 50

AT + Tulip Results (2-norm vs max-norm)

2-Lip > AT

  • 2 > AT
  • 1 > baseline (for all noise levels on both datasets)
SLIDE 51

Other current areas of interest in AI with connections to mathematics

We are looking for collaborators: these are possible new projects

SLIDE 52

Reinforcement Learning

  • Related to dynamic programming
  • Computationally intensive and unstable

Related math: dynamic programming, optimal control

SLIDE 53

Recurrent NN

SLIDE 54

Generative Networks (GANs)

Wasserstein GANs: optimal transportation (OT) mapping between random noise (Gaussians) and the target distribution of images. Related math: optimal transportation algorithms and convergence (Peyré-Cuturi).
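As a pointer to that related math, a minimal entropic-OT (Sinkhorn) iteration in the Peyré-Cuturi style (an illustrative sketch with a toy cost matrix, not from the talk):

```python
import numpy as np

def sinkhorn(mu, nu, C, reg=0.1, iters=200):
    # Entropic optimal transport: alternate scaling of the Gibbs kernel.
    K = np.exp(-C / reg)
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    return u[:, None] * K * v[None, :]     # transport plan between mu and nu

mu = np.full(5, 0.2)                       # source distribution
nu = np.full(5, 0.2)                       # target distribution
C = (np.arange(5)[:, None] - np.arange(5)[None, :]) ** 2.0   # squared-distance cost
P = sinkhorn(mu, nu, C)
print(P.sum(), np.round(P, 3))
```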

SLIDE 55

Squeeze Nets

Inference (evaluating the data and assigning a label) is costly (typically 0.1 seconds on a power-hungry, high-memory GPU) in terms of:

  • Memory (to store the weights)
  • Computation (multiplying the matrices times the vectors)
  • Power (the energy in Joules used by the chip)
  • Time

Research effort to make lean NNs. How?

  • Quantization: low-bit number representation and arithmetic. (Related math: non-smooth optimization, when the ReLUs are also quantized.)
  • Pruning: trim off the small weights, and retrain (see the sketch below).
  • Hyperparameter optimization: train over multiple architectures and params.

Mostly an engineering effort, but could be combined with more math on the training.
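A minimal magnitude-pruning sketch (illustrative, not from the talk): keep the largest weights by absolute value and zero out the rest; in practice the pruned network is then retrained.

```python
import numpy as np

def prune_by_magnitude(weights, keep=0.1):
    # Zero all but the largest `keep` fraction of weights (by absolute value).
    threshold = np.quantile(np.abs(weights), 1.0 - keep)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

w = np.random.default_rng(3).standard_normal(1000)
w_pruned = prune_by_magnitude(w, keep=0.1)
print(np.mean(w_pruned != 0.0))            # about 10% of weights survive
```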