

SLIDE 1

The landscape of non-convex losses for statistical learning problems

Song Mei

Stanford University

October 19, 2017

Song Mei (Stanford University) The landscape of non-convex optimization October 19, 2017 1 / 32

SLIDE 2

Deep learning


SLIDE 4

Convolutional Neural Network

SLIDE 5

Non-convex optimization

SLIDE 6

Why does the non-convex neural network perform well?


SLIDE 8

Why does some non-convex optimization perform well?

- Stochastic gradient descent escapes bad local minima.
- Good initialization escapes bad local minima.
- Globally, there are fewer bad local minima.
- ...

SLIDE 9

Non-convex optimization: analysis of global geometry

Number and locations of saddle points and local minima.

SLIDE 10

Let's do it!

The objective function

$$\min_{W_i} \; \frac{1}{n} \sum_{i=1}^{n} \big\{ y_i - \sigma(W_k \cdots \sigma(W_2\, \sigma(W_1 x_i))) \big\}^2$$

SLIDE 11

Let's do it!

The objective function

$$\min_{W_i} \; \frac{1}{n} \sum_{i=1}^{n} \big\{ y_i - \sigma(W_2\, \sigma(W_1 x_i)) \big\}^2$$

SLIDE 12

Let's do it!

The objective function

$$\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \big\{ y_i - \sigma(\langle \theta, x_i \rangle) \big\}^2$$

SLIDE 13

Binary linear classification

The model

$z_i = (x_i, y_i)$, with $x_i \in \mathbb{R}^d$, $y_i \in \{0, 1\}$.

SLIDE 14

One node neural network

The model

$z_i = (x_i, y_i)$, with $x_i \in \mathbb{R}^d$, $y_i \in \{0, 1\}$.

- Convex logit loss ($\ell_c$ is convex in $\theta$):
  $$\ell_c(\theta; z) = -y \langle x, \theta \rangle + \log\{1 + \exp(\langle x, \theta \rangle)\}.$$
- Non-convex loss ($\ell$ is not convex in $\theta$):
  $$\ell(\theta; z) = \{y - \sigma(\langle x, \theta \rangle)\}^2, \quad \text{where } \sigma(t) = 1/(1 + e^{-t}).$$
- Empirical risk:
  $$\widehat{R}_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ell(\theta; z_i) = \frac{1}{n} \sum_{i=1}^{n} \{y_i - \sigma(\langle \theta, x_i \rangle)\}^2.$$
- Empirical risk minimizer:
  $$\widehat{\theta}_n = \arg\min_{\theta \in \mathsf{B}^d(R)} \widehat{R}_n(\theta).$$

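The two losses on slide 14 are easy to compare numerically. Below is a minimal sketch (not from the talk; the one-dimensional data point and grid are arbitrary choices) that checks convexity along a line via discrete second differences: they stay nonnegative for the logit loss but go negative for the squared loss, confirming that $\ell$ is non-convex in $\theta$.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def logit_loss(theta, x, y):
    # Convex logit loss: -y<x,theta> + log(1 + exp(<x,theta>))
    s = x @ theta
    return -y * s + np.log1p(np.exp(s))

def sq_loss(theta, x, y):
    # Non-convex squared loss: (y - sigma(<x,theta>))^2
    return (y - sigmoid(x @ theta)) ** 2

# One data point, d = 1: scan each loss along theta to compare curvature.
x, y = np.array([1.0]), 1.0
ts = np.linspace(-6.0, 6.0, 241)
logit_vals = np.array([logit_loss(np.array([t]), x, y) for t in ts])
sq_vals = np.array([sq_loss(np.array([t]), x, y) for t in ts])

# Discrete second differences: nonnegative everywhere iff convex on the line.
d2_logit = np.diff(logit_vals, 2)
d2_sq = np.diff(sq_vals, 2)
print(d2_logit.min(), d2_sq.min())
```

The same second-difference check along random directions is a quick way to probe the full empirical risk $\widehat{R}_n$ for non-convexity.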


SLIDE 20

A negative theoretical result

Theorem (Auer et al., 1996)

For the one-node neural network, for all $n, d > 0$, there exists a dataset $(x_i, y_i)_{i=1}^{n}$ such that the empirical risk $\widehat{R}_n(\theta)$ has $\lfloor n/d \rfloor^d$ distinct local minima.

The landscape of $\widehat{R}_n(\theta)$ is very rough. Is this the end of the world for deep learning?


SLIDE 25

Real data experiment

- The "Australian" data set from Statlog: $d = 11$, $n = 683$.
- Random initialization $\theta^{(0)} \sim \mathsf{N}(0, I_d)$.
- Run gradient descent and track the path $\theta^{(k)}$.
- Generate multiple paths with independent initializations.
- Plot the standard deviation over paths, $\mathrm{std}(\theta^{(k)})$, versus $k$.

Figure: $\mathrm{std}(\theta^{(k)})$ versus the number of iterations (y-axis on a log scale from $10^{-7}$ to $10^{-1}$).
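The experiment on slide 25 can be sketched in a few lines. The Statlog data is not bundled here, so the sketch below substitutes synthetic data drawn from the logistic model with the slide's dimensions ($d = 11$, $n = 683$); the learning rate, iteration count, and number of paths are arbitrary assumptions, not the talk's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Synthetic stand-in for the Statlog data: logistic model with true theta0.
d, n = 11, 683
theta0 = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
Y = (rng.random(n) < sigmoid(X @ theta0)).astype(float)

def grad_Rn(theta):
    # Gradient of R_n(theta) = mean_i (y_i - sigma(<theta, x_i>))^2.
    p = sigmoid(X @ theta)
    return (-2.0 / n) * (X.T @ ((Y - p) * p * (1 - p)))

# Several gradient-descent paths from independent N(0, I_d) initializations.
n_paths, n_iters, lr = 10, 2000, 2.0
paths = []
for _ in range(n_paths):
    theta = rng.normal(size=d)
    for _ in range(n_iters):
        theta = theta - lr * grad_Rn(theta)
    paths.append(theta)
paths = np.stack(paths)

# Standard deviation across paths: small if every run hits the same minimizer.
spread = paths.std(axis=0).max()
print(spread)
```

A small `spread` after many iterations is the numerical signature of the unique minimizer observed on the slide.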


SLIDE 28

One node neural network

On real data, we "always" observe a unique minimum! Why? Data generated by nature is not against us!

SLIDE 29

A negative theoretical result (recap)

Theorem (Auer et al., 1996)

For the one-node neural network, for all $n, d > 0$, there exists a dataset $(x_i, y_i)_{i=1}^{n}$ such that the empirical risk $\widehat{R}_n(\theta)$ has $\lfloor n/d \rfloor^d$ distinct local minima.

The landscape of $\widehat{R}_n(\theta)$ is very rough.

SLIDE 30

A positive result

Theorem (Mei, Bai, Montanari, 2016)

Assume the $Y_i$ are generated via $\mathbb{P}(Y_i = 1 \mid X_i) = \sigma(\langle X_i, \theta_0 \rangle)$. Under mild assumptions on the $X_i$, as soon as $n = \Omega(d \log d)$, with high probability:
(a) $\widehat{R}_n(\theta)$ has a unique local minimizer $\widehat{\theta}_n$ in $\mathsf{B}^d(0, R)$.
(b) $\widehat{\theta}_n$ satisfies $\|\widehat{\theta}_n - \theta_0\|_2 = O(\sqrt{(d \log n)/n})$.
(c) Gradient descent converges exponentially fast to $\widehat{\theta}_n$.

The landscape of $\widehat{R}_n(\theta)$ is actually smooth!


SLIDE 33

Why does assuming a statistical model make the landscape of the empirical risk smooth?

1. Assuming a statistical model $(X_i, Y_i) \overset{\text{i.i.d.}}{\sim} \mathbb{P}$, $i = 1, \dots, n$, we can define the population risk
   $$R(\theta) = \mathbb{E}\big[\widehat{R}_n(\theta)\big] = \mathbb{E}_{X,Y}\big[(Y - \sigma(\langle \theta, X \rangle))^2\big].$$
   The population risk is usually very smooth.

2. We can transfer the good properties of the population risk to the empirical risk using a uniform convergence argument, so the empirical risk will also be smooth.
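The transfer from population to empirical risk can be illustrated with a quick Monte Carlo check (a sketch under the logistic model of slide 30; the dimension, query point, and sample sizes are arbitrary assumptions). The population risk is approximated by a very large sample, and $|\widehat{R}_n(\theta) - R(\theta)|$ at a fixed $\theta$ typically shrinks at the usual $O(1/\sqrt{n})$ rate; the theorem on slide 38 upgrades this pointwise statement to a supremum over $\theta$, and to gradients and Hessians.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

d = 5
theta0 = np.ones(d) / np.sqrt(d)
theta = np.zeros(d)
theta[0] = 1.0                      # a fixed query point

def emp_risk(theta_q, n):
    # Empirical risk R_n at theta_q on a fresh sample from the model.
    X = rng.normal(size=(n, d))
    Y = (rng.random(n) < sigmoid(X @ theta0)).astype(float)
    return np.mean((Y - sigmoid(X @ theta_q)) ** 2)

# Population risk R(theta) approximated with a very large sample.
R_pop = emp_risk(theta, 2_000_000)

# |R_n - R| at the query point for increasing n.
devs = {n: abs(emp_risk(theta, n) - R_pop) for n in (100, 10_000, 1_000_000)}
print(devs)
```

Uniform convergence is a stronger statement than this pointwise check: the bound holds simultaneously over the whole ball $\mathsf{B}^d(0, R)$.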


SLIDE 37

Population risk and empirical risk

The population risk has good properties under mild assumptions.

Figure: Population risk (contour plot over $(\theta_1, \theta_2)$, with $\theta_0$ marked).

Figure: An instance of the empirical risk ($\theta_0 = [1, 0]$, $\hat\theta_n = [0.816, -0.268]$).

How can we relate the properties of the empirical risk to the population risk? Uniform convergence!

SLIDE 38

Uniform convergence of gradients and Hessians

Theorem (Uniform convergence; informal)

Under these settings, for any $\delta > 0$, there exists a positive constant $C$, depending on $(R, \delta)$ but independent of $n$ and $d$, such that as long as $n \ge C d \log d$, we have

1. $\mathbb{P}\Big( \sup_{\theta \in \mathsf{B}^d(0,R)} \big\| \nabla \widehat{R}_n(\theta) - \nabla R(\theta) \big\|_2 \le \sqrt{\tfrac{C d \log n}{n}} \Big) \ge 1 - \delta.$

2. $\mathbb{P}\Big( \sup_{\theta \in \mathsf{B}^d(0,R)} \big\| \nabla^2 \widehat{R}_n(\theta) - \nabla^2 R(\theta) \big\|_{\mathrm{op}} \le \sqrt{\tfrac{C d \log n}{n}} \Big) \ge 1 - \delta.$

The proof is based on concentration inequalities and covering numbers.

SLIDE 39

Uniform convergence implies a unique minimum of the empirical risk

The landscape of the non-convex empirical risk:
1. What we thought: barriers and many local minima.
2. What hopefully is true: only good local minima, smooth far from the minima.
3. What we will prove: a uniformly smooth surface.

Figure: Landscape of the empirical risk (population risk vs. empirical risk, risk global minimizer vs. ERM, in each of the three scenarios).

SLIDE 40

Numerical experiment

Figure: Probability of finding a unique local minimum (success rate versus $n/(d \log d)$, for $d = 20, 40, 80, 160, 320$).

SLIDE 41

Extensions

- Robust regression, Gaussian mixture models, etc.; high-dimensional settings $d \gg n$. [Mei et al., 2017]
- ReLU activation. [Tian, 2017]
- Two-layer neural networks. [Soltanolkotabi et al., 2017], [Zhong et al., 2017]
- Deep neural networks. [Choromanska et al., 2015]

SLIDE 42

Interlude

Before studying complex neural networks, maybe we can first study some simpler non-convex optimization problems.

SLIDE 43

MaxCut Problem

- $G$: a positively weighted graph; $A_G$: its adjacency matrix.
- MaxCut of $G$, known to be NP-hard:
  $$\max_{x \in \{\pm 1\}^n} \; \frac{1}{4} \sum_{i,j=1}^{n} A_{G,ij} (1 - x_i x_j). \tag{MaxCut}$$
- SDP relaxation, with a $0.878$-approximation guarantee [Goemans and Williamson, 1995]:
  $$\max_{X \in \mathbb{R}^{n \times n}} \; \frac{1}{4} \sum_{i,j=1}^{n} A_{G,ij} (1 - X_{ij}), \quad \text{subject to } X_{ii} = 1, \; X \succeq 0. \tag{SDPCut}$$


SLIDE 46

The MaxCut SDP problem

- $A \in \mathbb{R}^{n \times n}$ symmetric.
- MaxCut SDP:
  $$\max_{X \in \mathbb{R}^{n \times n}} \; \langle A, X \rangle, \quad \text{subject to } X_{ii} = 1, \; i \in [n], \; X \succeq 0. \tag{SDP}$$
- Applications: the MaxCut problem, $\mathbb{Z}_2$ synchronization, the stochastic block model, ...


SLIDE 49

Burer-Monteiro approach

- Convex formulation ($n$ up to $10^3$ using interior-point methods):
  $$\max_{X \in \mathbb{R}^{n \times n}} \; \langle A, X \rangle, \quad \text{subject to } X_{ii} = 1, \; i \in [n], \; X \succeq 0. \tag{SDP}$$
- Change of variables: $X = \sigma \sigma^{\mathsf{T}}$, $\sigma \in \mathbb{R}^{n \times k}$, $k \ll n$.
- Non-convex formulation ($n$ up to $10^5$):
  $$\max_{\sigma \in \mathbb{R}^{n \times k}} \; \langle \sigma, A \sigma \rangle, \quad \text{subject to } \sigma = [\sigma_1, \dots, \sigma_n]^{\mathsf{T}}, \; \|\sigma_i\|_2 = 1, \; i \in [n]. \tag{$k$-Ncvx-SDP}$$

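The non-convex formulation above lends itself to a simple first-order method: ascend the Euclidean gradient $2A\sigma$, project onto the tangent space of the product of spheres, and renormalize the rows (a retraction). The sketch below is a minimal illustration, not the solver the talk has in mind (Riemannian trust-region methods are common in practice); the random symmetric $A$, the step size, and the iteration count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Random symmetric cost matrix standing in for A (an assumption).
n, k = 60, 5
B = rng.normal(size=(n, n))
A = (B + B.T) / 2

def normalize_rows(S):
    return S / np.linalg.norm(S, axis=1, keepdims=True)

def f(S):
    # Objective <sigma, A sigma> = trace(sigma^T A sigma).
    return float(np.trace(S.T @ A @ S))

# Projected gradient ascent on the product of unit spheres (rows of sigma).
S = normalize_rows(rng.normal(size=(n, k)))
val0 = f(S)
lr = 0.3 / np.linalg.norm(A, 2)       # conservative step size
for _ in range(3000):
    G = 2 * A @ S                                      # Euclidean gradient
    G = G - np.sum(G * S, axis=1, keepdims=True) * S   # tangent projection
    S = normalize_rows(S + lr * G)                     # step + retraction

# Rows are unit vectors, so ||S||_F^2 = n and f(S) <= n * lambda_max(A),
# an upper bound that also dominates SDP(A).
upper = n * np.linalg.eigvalsh(A)[-1]
print(val0, f(S), upper)
```

By the landscape theorem on slide 61, any local maximizer this ascent finds has value within a gap of order $O(1/k)$ of $\mathrm{SDP}(A)$.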

SLIDE 52

Related literature

- For $k \ge \sqrt{2n}$, the global maxima of the non-convex formulation coincide with the global maximizer of the convex formulation [Pataki, 1998], [Barvinok, 2001], [Burer and Monteiro, 2003].
- For $k \ge \sqrt{2n}$, the non-convex formulation has no spurious local maxima [Boumal et al., 2016].
- What if $k$ remains of order $1$ as $n \to \infty$? Are there spurious local maxima? Sadly, yes.
- How good are these local maxima? Empirically, they are good!


SLIDE 58

Geometry

$$\max_{\sigma \in \mathbb{R}^{n \times k}} \; \langle \sigma, A \sigma \rangle =: f(\sigma), \quad \text{subject to } \|\sigma_i\|_2 = 1.$$

$$\mathcal{M}_k = \{ \sigma \in \mathbb{R}^{n \times k} : \|\sigma_i\|_2 = 1 \}.$$

Definition ($\varepsilon$-approximate concave point)

We call $\sigma \in \mathcal{M}_k$ an $\varepsilon$-approximate concave point of $f$ on $\mathcal{M}_k$ if, for any tangent vector $u \in T_\sigma \mathcal{M}_k$, we have
$$\langle u, \mathrm{Hess}\, f(\sigma)[u] \rangle \le \varepsilon \langle u, u \rangle. \tag{1}$$

Remark

A local maximizer is a $0$-approximate concave point. An $\varepsilon$-approximate concave point is nearly locally optimal.


SLIDE 61

Landscape Theorem

Theorem (A Grothendieck-type inequality)

For any $\varepsilon$-approximate concave point $\sigma \in \mathcal{M}_k$ of the rank-$k$ non-convex problem, we have
$$f(\sigma) \ge \mathrm{SDP}(A) - \frac{1}{k-1}\big(\mathrm{SDP}(A) + \mathrm{SDP}(-A)\big) - \frac{n}{2}\,\varepsilon. \tag{2}$$
$\mathrm{SDP}(A)$: the maximum value of the SDP with input matrix $A$.

Geometric interpretation: the function values of all local maxima are within a gap of order $O(1/k)$ of the global maximum.

SLIDE 62

Landscape of non-convex SDP

- $f(\sigma) \ge \mathrm{SDP}(A) - \frac{1}{k-1}\big(\mathrm{SDP}(A) + \mathrm{SDP}(-A)\big) - \frac{n}{2}\,\varepsilon$.

Figure: Attainable values range from $-\mathrm{SDP}(-A)$ to $\mathrm{SDP}(A)$; every saddle point with $\varepsilon$ curvature and every local optimizer lies within $\mathrm{Gap} = \frac{1}{k-1}\big(\mathrm{SDP}(A) + \mathrm{SDP}(-A)\big) + n\varepsilon/2$ of the global optimizer.

SLIDE 63

Approximate MaxCut Guarantee

Theorem (Approximate MaxCut Guarantee)

For any $k \ge 3$, if $\sigma^\star$ is a local maximizer of the corresponding rank-$k$ non-convex problem, then we can use $\sigma^\star$ to find a $0.878 \times (1 - 1/(k-1))$-approximate MaxCut.

The global maximizer yields a $0.878$-approximate MaxCut. Any local maximizer yields a $0.878 \times (1 - 1/(k-1))$-approximate MaxCut.

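The slide does not spell out how a cut is extracted from $\sigma^\star$; the standard route is Goemans-Williamson hyperplane rounding applied to the rows $\sigma_i$. The sketch below illustrates that rounding on a random weighted graph, using row-normalized top eigenvectors as a crude stand-in for a local maximizer of the rank-$k$ problem (an assumption for illustration, not the talk's procedure).

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy positively weighted graph (an assumption).
n, k = 40, 4
W = np.triu(rng.random((n, n)), 1)
A = W + W.T

def cut_value(x):
    # (1/4) * sum_ij A_ij (1 - x_i x_j): total weight crossing the cut.
    return 0.25 * float(np.sum(A * (1 - np.outer(x, x))))

# Crude stand-in for a rank-k local maximizer: row-normalized top eigenvectors.
vecs = np.linalg.eigh(A)[1][:, -k:]
sigma = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

# Hyperplane rounding: sign of each row's projection onto a random direction.
best = 0.0
for _ in range(50):
    g = rng.normal(size=k)
    x = np.sign(sigma @ g)
    x[x == 0] = 1.0
    best = max(best, cut_value(x))

total = A.sum() / 2.0   # total edge weight; any cut is at most this
print(best, total)
```

With a genuine local maximizer $\sigma^\star$ in place of the eigenvector stand-in, the theorem guarantees the rounded cut is $0.878 \times (1 - 1/(k-1))$-approximate in expectation.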

SLIDE 65

Group Synchronization

- $\mathrm{SO}(d)$ synchronization, Orthogonal-Cut SDP:
  $$\max_{X \in \mathbb{R}^{nk \times nk}} \; \langle A, X \rangle, \quad \text{subject to } X_{ii} = I_k, \; X \succeq 0. \tag{3}$$
- A similar guarantee holds.

SLIDE 66

Conclusion

- Studied the global geometry of some non-convex optimization problems.
- Empirical risk minimization: uniform convergence excludes spurious local minima.
- Non-convex MaxCut SDP: all local maxima are near the global maximum.

What I did not emphasize: the Kac-Rice formula.