Regularization prescriptions and convex duality: density estimation - PowerPoint PPT Presentation

Regularization prescriptions and convex duality: density estimation and Renyi entropies Ivan Mizera University of Alberta Department of Mathematical and Statistical Sciences Edmonton, Alberta, Canada Linz, October 2008 joint work with Roger Koenker (University of Illinois at Urbana-Champaign) Gratefully acknowledging the support of the Natural Sciences and Engineering Research Council of Canada

Density estimation (say) A useful heuristics: maximum likelihood Given the datapoints X 1 , X 2 , . . . , X n , solve n � f ( X i ) � max ! f i = 1 or equivalently n � − log f ( X i ) � min ! f i = 1 under the side conditions � f � 0, f = 1 1

Note that useful... 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 0 0 5 10 15 20 25 2

Dirac catastrophe! 3

Preventing the disaster for general case • Sieves (...) 4

Preventing the disaster for general case • Sieves (...) • Regularization � n � − log f ( X i ) � min ! f � 0, f = 1 f i = 1 4

Preventing the disaster for general case • Sieves (...) • Regularization � n � − log f ( X i ) � min ! J ( f ) � Λ , f � 0, f = 1 f i = 1 4

Preventing the disaster for general case • Sieves (...) • Regularization � n � − log f ( x i ) + λJ ( f ) � min ! f � 0, f = 1 f i = 1 4

Preventing the disaster for general case • Sieves (...) • Regularization � n � log f ( x i ) + λJ ( f ) � min ! f � 0, f = 1 − f i = 1 J ( · ) - penalty (penalizing complexity, lack of smoothness etc.) � | ( log f ) ′′ | = TV (( log f ) ′ ) for instance, J ( f ) = � | ( log f ) ′′′ | = TV (( log f ) ′′ ) or also J ( f ) = Good (1971), Good and Gaskins (1971), Silverman (1982), Leonard (1978), Gu (2002), Wahba, Lin, and Leng (2002) See also: Eggermont and LaRiccia (2001) Ramsay and Silverman (2006) Hartigan (2000), Hartigan and Hartigan (1985) Davies and Kovac (2004) 4

See also in particular Roger Koenker and Ivan Mizera (2007) Density estimation by total variation regularization Roger Koenker and Ivan Mizera (2006) The alter egos of the regularized maximum likelihood density estimators: deregularized maximum-entropy, Shannon, R´ enyi, Simpson, Gini, and stretched strings Roger Koenker, Ivan Mizera, and Jungmo Yoon (200?) What do kernel density estimators optimize? Roger Koenker and Ivan Mizera (2008): Primal and dual formulations relevant for the numerical estimation of a probability density via regularization Roger Koenker and Ivan Mizera (200?) Quasi-concave density estimation http://www.stat.ualberta.ca/ ∼ mizera/ http://www.econ.uiuc.edu/ ∼ roger/ 5

Preventing the disaster for special cases • Shape constraint: monotonicity � n � log f ( X i ) � min ! f � 0, f = 1 − f i = 1 6

Preventing the disaster for special cases • Shape constraint: monotonicity � n � log f ( X i ) � min ! f decreasing, f � 0, f = 1 − f i = 1 Grenander (1956), Jongbloed (1998), Groeneboom, Jongbloed, and Wellner (2001),... 6

Preventing the disaster for special cases • Shape constraint: monotonicity � n � log f ( X i ) � min ! f decreasing, f � 0, f = 1 − f i = 1 Grenander (1956), Jongbloed (1998), Groeneboom, Jongbloed, and Wellner (2001),... • Shape constraint: (strong) unimodality � n � log f ( X i ) � min ! f � 0, f = 1 − f i = 1 6

Preventing the disaster for special cases • Shape constraint: monotonicity � n � log f ( X i ) � min ! f decreasing, f � 0, f = 1 − f i = 1 Grenander (1956), Jongbloed (1998), Groeneboom, Jongbloed, and Wellner (2001),... • Shape constraint: (strong) unimodality � n � log f ( X i ) � min ! − log f convex, f � 0, f = 1 − f i = 1 Eggermont and LaRiccia (2000), Walther (2000) Rufibach and Dumbgen (2006) Pal, Woodroofe, and Meyer (2006) 6

Note Shape constraint: no regularization parameter to be set... ... but of course, we need to believe that the shape is plausible 7

Note Shape constraint: no regularization parameter to be set... ... but of course, we need to believe that the shape is plausible Regularization via TV penalty... ... vs log-concavity shape constraint: The differential operator is the same, only the constraint is somewhat different � | ( log f ) ′′ | � Λ , in the dual | ( log f ) ′′ | � Λ Log-concavity: ( log f ) ′′ � 0 7

Note Shape constraint: no regularization parameter to be set... ... but of course, we need to believe that the shape is plausible Regularization via TV penalty... ... vs log-concavity shape constraint: The differential operator is the same, only the constraint is somewhat different � | ( log f ) ′′ | � Λ , in the dual | ( log f ) ′′ | � Λ Log-concavity: ( log f ) ′′ � 0 Only the functional analysis may be a bit more difficult... ... so let us do the shape-constrained case first 7

The hidden charm of log-concave distributions A density f is called log-concave if − log f is convex. (Usual conventions: − log 0 = ∞ , convex where finite, ...) 8

The hidden charm of log-concave distributions A density f is called log-concave if − log f is convex. (Usual conventions: − log 0 = ∞ , convex where finite, ...) Schoenberg 1940’s, Karlin 1950’s (monotone likelihood ratio) Karlin (1968) - monograph about their mathematics Barlow and Proschan (1975) - reliability Flinn and Heckman (1975) - social choice Caplin and Nalebuff (1991a,b) - voting theory Devroye (1984) - how to simulate from them Mizera (1994) - M-estimators 8

The hidden charm of log-concave distributions A density f is called log-concave if − log f is convex. (Usual conventions: − log 0 = ∞ , convex where finite, ...) Schoenberg 1940’s, Karlin 1950’s (monotone likelihood ratio) Karlin (1968) - monograph about their mathematics Barlow and Proschan (1975) - reliability Flinn and Heckman (1975) - social choice Caplin and Nalebuff (1991a,b) - voting theory Devroye (1984) - how to simulate from them Mizera (1994) - M-estimators Uniform, Normal, Exponential, Logistic, Weibull, Gamma... - all log-concave If f is log-concave, then - it is unimodal (“strongly”) - the convolution with any unimodal density is unimodal - the convolution with any log-concave density is log-concave - f = e − g , with g convex... 8

The hidden charm of log-concave distributions A density f is called log-concave if − log f is convex. (Usual conventions: − log 0 = ∞ , convex where finite, ...) Schoenberg 1940’s, Karlin 1950’s (monotone likelihood ratio) Karlin (1968) - monograph about their mathematics Barlow and Proschan (1975) - reliability Flinn and Heckman (1975) - social choice Caplin and Nalebuff (1991a,b) - voting theory Devroye (1984) - how to simulate from them Mizera (1994) - M-estimators Uniform, Normal, Exponential, Logistic, Weibull, Gamma... - all log-concave If f is log-concave, then - it is unimodal (“strongly”) - the convolution with any unimodal density is unimodal - the convolution with any log-concave density is log-concave - f = e − g , with g convex... No heavy tails! t -distributions (finance!): not log-concave (!!) 8

A convex problem Let g = − log f ; let K be the cone of convex functions. The original problem is transformed: � n � e − g = 1 g ( X i ) � min ! g ∈ K , g i = 1 9

A convex problem Let g = − log f ; let K be the cone of convex functions. The original problem is transformed: � n � e − g � min g ( X i ) + ! g ∈ K g i = 1 9

A convex problem Let g = − log f ; let K be the cone of convex functions. The original problem is transformed: � n � e − g � min g ( X i ) + ! g ∈ K g i = 1 and generalized: let ψ be convex and nonincreasing (like e − x ) � n � e − g � min g ( X i ) + ! g ∈ K g i = 1 9

A convex problem Let g = − log f ; let K be the cone of convex functions. The original problem is transformed: � n � e − g � min g ( X i ) + ! g ∈ K g i = 1 and generalized: let ψ be convex and nonincreasing (like e − x ) � n � ψ ( g ) � min ! g ∈ K g ( X i ) + g i = 1 9

Primal and dual Recall: K is the cone of convex functions; ψ is convex and nonincreasing The strong Fenchel dual of � n 1 � g ( X i ) + ψ ( g ) dx � min ! g ∈ K (P) n g i = 1 is � f = d ( P n − G ) ψ ∗ (− f ) dx � max G ∈ K ∗ ! , (D) − dx f Extremal relation: f = − ψ ′ ( g ) . For penalized estimation, in discretized setting: Koenker and Mizera (2007b) 10

Remarks ψ ∗ ( y ) = x ∈ dom ψ ( yx − ψ ( x )) is the conjugate of ψ sup if primal solutions g are sought in some space, then dual solutions G are sought in a dual space for instance, if g ∈ C ( X ) , and X is compact, then G ∈ C ( X ) ∗ , the space of (signed) Radon measures on X . The equality f = d ( P n − G ) is thus a feasibility constraint dx (for other G , the dual objective is − ∞ ) K ∗ is the dual cone to K - a collection of (signed) Radon � measures such that gdG � 0 for any convex g . Dual: good for computation... 11

Dual: good not only for computation Couldn’t we have here heavy-tailed distribution too? ...possibly going beyond log-concavity? Recall: the strong Fenchel dual of � n 1 � g ( X i ) + ψ ( g ) dx � min ! g ∈ K (P) n g i = 1 is � f = d ( P n − G ) ψ ∗ (− f ) dx � max G ∈ K ∗ − ! , (D) dx f Extremal relation: f = − ψ ′ ( g ) . 12

Regularization prescriptions and convex duality: density estimation - PowerPoint PPT Presentation

Regularization prescriptions and convex duality: density estimation and Renyi entropies Ivan Mizera University of Alberta Department of Mathematical and Statistical Sciences Edmonton, Alberta, Canada Linz, October 2008 joint work with Roger

CS675: Convex and Combinatorial Optimization Spring 2018 Duality of Convex Sets and Functions

CS675: Convex and Combinatorial Optimization Fall 2019 Geometric Duality of Convex Sets and

Review of duality so far LP/QP duality, cone duality, set duality All are halfspace bounds

Convex Hell 362 dnc CS 16: Convex Hull Whoops, I mean... Convex Hull Whats a Convex Hull?

Convex hull 1 - 1 Convex hull 1 - 2 Convex hull 1 - 3 Convex hull Definition, extremal

constrained convex optimization virgil pavlu 1 convex set a set X in a vector space is convex if

CS133 Computational Geometry Convex Hull 1 Convex Hull Given a set of n points, find the

CS675: Convex and Combinatorial Optimization Fall 2019 Duality of Convex Optimization Problems

16. Review of convex optimization Convex sets and functions Convex programming models

CS675: Convex and Combinatorial Optimization Spring 2018 Convex Sets Instructor: Shaddin Dughmi

CS675: Convex and Combinatorial Optimization Fall 2019 Convex Functions Instructor: Shaddin

CS675: Convex and Combinatorial Optimization Fall 2019 Convex Sets Instructor: Shaddin Dughmi

CS675: Convex and Combinatorial Optimization Fall 2014 Convex Functions Instructor: Shaddin

Convex hulls of spheres and convex hulls of convex polytopes lying on parallel hyperplanes

Convex Analysis Jos e De Don a September 2004 Centre of Complex Dynamic Systems and

Convex hull: basic facts Convex hull: basic facts CG Lecture 1 CG Lecture 1 Problem : give a set

Advanced Computational Modeling of Social Systems Lars-Erik Cederman and Luc Girardin Center for

Lattice paths with catastrophes Gascom 2016 Cyril Banderier and Michael Wallner Laboratoire

A lattice study of N =2 Landau-Ginzburg model using a Nicolai map based on arXiv:1005.4671

GDR Terascale Avoiding the Goldstone Boson Catastrophe in general renormalisable field theories

Conformal Gravity The missing symmetry in GR? Reinoud J. Slagter ASFYON, The Netherlands Slagter

Thermostatic Controls for Noisy Gradient Systems and Applications to Machine Learning Ben

Making Decisions 10 AI Slides (6e) c Lin Zuoquan@PKU 1998-2020 10 1 10 Making Decisions

QCD Daniel de Florian Dpto. de Fsica- FCEyN- UBA 1 DISCLAIMER(S) Purpose(s) of these