La théorie PAC-Bayes en apprentissage supervisé (PAC-Bayes theory in supervised learning) — PowerPoint presentation


SLIDE 1

La théorie PAC-Bayes en apprentissage supervisé (PAC-Bayes theory in supervised learning)

Presentation at the LRI of Université Paris XI
François Laviolette, Laboratoire du GRAAL, Université Laval, Québec, Canada
14 December 2010

SLIDE 2

Summary

Today, I intend to present the mathematics underlying the PAC-Bayes theory, to present algorithms that consist in minimizing a PAC-Bayes bound, and to compare the latter with existing algorithms.

SLIDE 3

Definitions

Each example (x, y) ∈ X × {−1, +1} is drawn according to D. The (true) risk R(h) and the training error R_S(h) are defined as:

$$R(h) \;\stackrel{\mathrm{def}}{=}\; \mathop{\mathbf{E}}_{(x,y)\sim D} I\big(h(x)\neq y\big), \qquad R_S(h) \;\stackrel{\mathrm{def}}{=}\; \frac{1}{m}\sum_{i=1}^{m} I\big(h(x_i)\neq y_i\big).$$

The learner's goal is to choose a posterior distribution Q on a space H of classifiers such that the risk of the Q-weighted majority vote B_Q is as small as possible, where

$$B_Q(x) \;\stackrel{\mathrm{def}}{=}\; \operatorname{sgn}\!\Big[\mathop{\mathbf{E}}_{h\sim Q} h(x)\Big].$$

B_Q is also called the Bayes classifier.

SLIDE 4

The Gibbs classifier

The PAC-Bayes approach does not directly bound the risk of B_Q; it bounds the risk of the Gibbs classifier G_Q: to predict the label of x, G_Q draws h from H according to Q and predicts h(x).

The risk and the training error of G_Q are thus defined as:

$$R(G_Q) = \mathop{\mathbf{E}}_{h\sim Q} R(h), \qquad R_S(G_Q) = \mathop{\mathbf{E}}_{h\sim Q} R_S(h).$$
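To make these definitions concrete, here is a minimal sketch (the data layout and function names are mine, not from the slides) that computes the empirical Gibbs risk and the empirical risk of the Q-weighted majority vote for a finite set of ±1-valued voters:

```python
import numpy as np

def gibbs_risk(predictions, y, Q):
    """Empirical Gibbs risk R_S(G_Q) = E_{h~Q} R_S(h).

    predictions: (n_voters, m) array of +/-1 outputs h_j(x_i)
    y:           (m,) array of +/-1 labels
    Q:           (n_voters,) posterior weights summing to 1
    """
    per_voter_risk = np.mean(predictions != y, axis=1)   # R_S(h_j) for each voter
    return float(Q @ per_voter_risk)

def bayes_risk(predictions, y, Q):
    """Empirical risk of the majority vote B_Q(x) = sgn(E_{h~Q} h(x))."""
    vote = np.sign(Q @ predictions)                       # sign of E_{h~Q} h(x_i)
    return float(np.mean(vote != y))

# Tiny usage example with 3 voters and 4 examples (hypothetical data).
preds = np.array([[ 1,  1, -1,  1],
                  [ 1, -1, -1,  1],
                  [-1,  1,  1,  1]])
y = np.array([1, 1, -1, 1])
Q = np.array([0.5, 0.3, 0.2])
print(gibbs_risk(preds, y, Q), bayes_risk(preds, y, Q))
```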

SLIDE 5

G_Q, B_Q, and KL(Q‖P)

If B_Q misclassifies x, then at least half of the classifiers (under measure Q) err on x. Hence:

$$R(B_Q) \le 2R(G_Q).$$

Thus, an upper bound on R(G_Q) gives rise to an upper bound on R(B_Q).

PAC-Bayes makes use of a prior distribution P on H. The risk bound depends on the Kullback-Leibler divergence:

$$\mathrm{KL}(Q\|P) \;\stackrel{\mathrm{def}}{=}\; \mathop{\mathbf{E}}_{h\sim Q} \ln\frac{Q(h)}{P(h)}.$$
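For a finite H, the KL term is a one-liner; a small sketch (naming mine):

```python
import numpy as np

def kl_divergence(Q, P):
    """KL(Q||P) = E_{h~Q} ln(Q(h)/P(h)) for discrete distributions over a finite H."""
    Q, P = np.asarray(Q, float), np.asarray(P, float)
    mask = Q > 0                       # terms with Q(h) = 0 contribute 0
    return float(np.sum(Q[mask] * np.log(Q[mask] / P[mask])))

print(kl_divergence([0.5, 0.3, 0.2], [1/3, 1/3, 1/3]))  # ≈ 0.069
```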

SLIDE 6

A PAC-Bayes bound to rule them all!
— J.R.R. Tolkien, roughly; or John Langford, less roughly.

Theorem 1 (Germain et al., 2009). For any distribution D on X × Y, for any set H of classifiers, for any prior distribution P of support H, for any δ ∈ (0, 1], and for any convex function $\mathcal{D} : [0,1]\times[0,1] \to \mathbb{R}$, we have

$$\Pr_{S\sim D^m}\!\left( \forall\, Q \text{ on } H:\;\; \mathcal{D}\big(R_S(G_Q), R(G_Q)\big) \;\le\; \frac{1}{m}\left[ \mathrm{KL}(Q\|P) + \ln\!\left( \frac{1}{\delta}\mathop{\mathbf{E}}_{S\sim D^m}\mathop{\mathbf{E}}_{h\sim P} e^{m\mathcal{D}(R_S(h),R(h))} \right) \right] \right) \;\ge\; 1-\delta.$$

SLIDE 7

Proof of Theorem 1

Since $\mathop{\mathbf{E}}_{h\sim P} e^{m\mathcal{D}(R_S(h),R(h))}$ is a non-negative random variable, Markov's inequality gives

$$\Pr_{S\sim D^m}\!\left( \mathop{\mathbf{E}}_{h\sim P} e^{m\mathcal{D}(R_S(h),R(h))} \;\le\; \frac{1}{\delta}\mathop{\mathbf{E}}_{S\sim D^m}\mathop{\mathbf{E}}_{h\sim P} e^{m\mathcal{D}(R_S(h),R(h))} \right) \ge 1-\delta.$$

Hence, by taking the logarithm on each side of the inequality and by transforming the expectation over P into an expectation over Q:

$$\Pr_{S\sim D^m}\!\left( \forall Q:\;\; \ln\!\left[ \mathop{\mathbf{E}}_{h\sim Q} \frac{P(h)}{Q(h)}\, e^{m\mathcal{D}(R_S(h),R(h))} \right] \le \ln\!\left[ \frac{1}{\delta}\mathop{\mathbf{E}}_{S\sim D^m}\mathop{\mathbf{E}}_{h\sim P} e^{m\mathcal{D}(R_S(h),R(h))} \right] \right) \ge 1-\delta.$$

Then, exploiting the fact that the logarithm is a concave function, an application of Jensen's inequality gives

$$\Pr_{S\sim D^m}\!\left( \forall Q:\;\; \mathop{\mathbf{E}}_{h\sim Q} \ln\!\left[ \frac{P(h)}{Q(h)}\, e^{m\mathcal{D}(R_S(h),R(h))} \right] \le \ln\!\left[ \frac{1}{\delta}\mathop{\mathbf{E}}_{S\sim D^m}\mathop{\mathbf{E}}_{h\sim P} e^{m\mathcal{D}(R_S(h),R(h))} \right] \right) \ge 1-\delta.$$

SLIDE 8

Proof of Theorem 1 (continued)

$$\Pr_{S\sim D^m}\!\left( \forall Q:\;\; \mathop{\mathbf{E}}_{h\sim Q} \ln\!\left[ \frac{P(h)}{Q(h)}\, e^{m\mathcal{D}(R_S(h),R(h))} \right] \le \ln\!\left[ \frac{1}{\delta}\mathop{\mathbf{E}}_{S\sim D^m}\mathop{\mathbf{E}}_{h\sim P} e^{m\mathcal{D}(R_S(h),R(h))} \right] \right) \ge 1-\delta.$$

From basic logarithm properties, and from the fact that $\mathop{\mathbf{E}}_{h\sim Q} \ln\frac{P(h)}{Q(h)} \stackrel{\mathrm{def}}{=} -\mathrm{KL}(Q\|P)$, we now have

$$\Pr_{S\sim D^m}\!\left( \forall Q:\;\; -\mathrm{KL}(Q\|P) + \mathop{\mathbf{E}}_{h\sim Q} m\,\mathcal{D}\big(R_S(h),R(h)\big) \le \ln\!\left[ \frac{1}{\delta}\mathop{\mathbf{E}}_{S\sim D^m}\mathop{\mathbf{E}}_{h\sim P} e^{m\mathcal{D}(R_S(h),R(h))} \right] \right) \ge 1-\delta.$$

Then, since $\mathcal{D}$ is assumed convex, again by Jensen's inequality we have

$$\mathop{\mathbf{E}}_{h\sim Q} m\,\mathcal{D}\big(R_S(h),R(h)\big) \;\ge\; m\,\mathcal{D}\Big( \mathop{\mathbf{E}}_{h\sim Q} R_S(h),\; \mathop{\mathbf{E}}_{h\sim Q} R(h) \Big) \;=\; m\,\mathcal{D}\big( R_S(G_Q), R(G_Q) \big),$$

which, combined with the previous inequality, immediately implies the result. ∎

SLIDE 9

Applicability of Theorem 1

How can we estimate $\ln\!\left[ \frac{1}{\delta}\mathop{\mathbf{E}}_{S\sim D^m}\mathop{\mathbf{E}}_{h\sim P} e^{m\mathcal{D}(R_S(h),R(h))} \right]$?

SLIDE 10

Seeger's bound (2002)

Seeger Bound. For any D, any H, any P of support H, and any δ ∈ (0, 1], we have

$$\Pr_{S\sim D^m}\!\left( \forall\, Q \text{ on } H:\;\; \mathrm{kl}\big(R_S(G_Q), R(G_Q)\big) \le \frac{1}{m}\left[ \mathrm{KL}(Q\|P) + \ln\frac{\xi(m)}{\delta} \right] \right) \ge 1-\delta,$$

where $\mathrm{kl}(q,p) \stackrel{\mathrm{def}}{=} q\ln\frac{q}{p} + (1-q)\ln\frac{1-q}{1-p}$, and where $\xi(m) \stackrel{\mathrm{def}}{=} \sum_{k=0}^{m} \binom{m}{k} (k/m)^k (1-k/m)^{m-k}$.

Note: ξ(m) ≤ 2√m.
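Using the bound in practice requires computing ξ(m) and numerically inverting the kl term. Below is a minimal sketch (function names mine) that returns the Seeger upper bound on R(G_Q) by bisection on the true risk; it assumes the empirical Gibbs risk and KL(Q‖P) have already been computed:

```python
from math import lgamma, log, exp

def kl_bernoulli(q, p):
    """kl(q, p) = q ln(q/p) + (1 - q) ln((1 - q)/(1 - p)), with 0 ln 0 = 0."""
    p = min(max(p, 1e-12), 1 - 1e-12)
    out = 0.0
    if q > 0:
        out += q * log(q / p)
    if q < 1:
        out += (1 - q) * log((1 - q) / (1 - p))
    return out

def xi(m):
    """xi(m) = sum_{k=0}^m C(m,k) (k/m)^k (1-k/m)^(m-k), computed in log-space."""
    total = 0.0
    for k in range(m + 1):
        log_term = lgamma(m + 1) - lgamma(k + 1) - lgamma(m - k + 1)
        if 0 < k < m:
            log_term += k * log(k / m) + (m - k) * log(1 - k / m)
        total += exp(log_term)
    return total                        # always <= 2 * sqrt(m)

def seeger_bound(gibbs_emp_risk, kl_qp, m, delta):
    """Largest p with kl(R_S(G_Q), p) <= (KL(Q||P) + ln(xi(m)/delta)) / m, by bisection."""
    rhs = (kl_qp + log(xi(m) / delta)) / m
    lo, hi = gibbs_emp_risk, 1.0 - 1e-9
    for _ in range(100):
        mid = (lo + hi) / 2
        if kl_bernoulli(gibbs_emp_risk, mid) <= rhs:
            lo = mid
        else:
            hi = mid
    return lo

print(seeger_bound(gibbs_emp_risk=0.1, kl_qp=5.0, m=1000, delta=0.05))  # ≈ 0.15
```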

SLIDE 11

Graphical illustration of the Seeger bound

[Figure: plot of kl(0.1 ‖ R(G_Q)) as a function of R(G_Q); the intersections with the bound's right-hand side give a lower bound ("Borne Inf") and an upper bound ("Borne Sup") on R(G_Q).]

SLIDE 12

Proof of the Seeger bound

Follows immediately from Theorem 1 by choosing $\mathcal{D}(q,p) = \mathrm{kl}(q,p)$. Indeed, in that case we have

$$\mathop{\mathbf{E}}_{S\sim D^m}\mathop{\mathbf{E}}_{h\sim P} e^{m\mathcal{D}(R_S(h),R(h))}
= \mathop{\mathbf{E}}_{h\sim P}\mathop{\mathbf{E}}_{S\sim D^m} \left( \frac{R_S(h)}{R(h)} \right)^{m R_S(h)} \left( \frac{1-R_S(h)}{1-R(h)} \right)^{m(1-R_S(h))}$$
$$= \mathop{\mathbf{E}}_{h\sim P} \sum_{k=0}^{m} \Pr_{S\sim D^m}\!\big( R_S(h)=\tfrac{k}{m} \big) \left( \frac{k/m}{R(h)} \right)^{k} \left( \frac{1-k/m}{1-R(h)} \right)^{m-k}$$
$$= \sum_{k=0}^{m} \binom{m}{k} (k/m)^k (1-k/m)^{m-k} \qquad (1)$$
$$\le 2\sqrt{m}.$$

Note that, in Line (1) of the proof, $\Pr_{S\sim D^m}\big( R_S(h)=\tfrac{k}{m} \big)$ is replaced by the probability mass function of the binomial distribution. This is only true if the examples of S are drawn i.i.d. (i.e., S ∼ D^m). So this result is no longer valid in the non-i.i.d. case, even though Theorem 1 is.

SLIDE 13

McAllester's bound (1998)

Put $\mathcal{D}(q,p) = \tfrac{1}{2}(q-p)^2$. Theorem 1 then gives:

McAllester Bound. For any D, any H, any P of support H, and any δ ∈ (0, 1], we have

$$\Pr_{S\sim D^m}\!\left( \forall\, Q \text{ on } H:\;\; \tfrac{1}{2}\big( R_S(G_Q) - R(G_Q) \big)^2 \le \frac{1}{m}\left[ \mathrm{KL}(Q\|P) + \ln\frac{\xi(m)}{\delta} \right] \right) \ge 1-\delta,$$

where $\xi(m) \stackrel{\mathrm{def}}{=} \sum_{k=0}^{m} \binom{m}{k} (k/m)^k (1-k/m)^{m-k}$.
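This quadratic form inverts in closed form; a short sketch (naming mine) of the resulting upper bound on R(G_Q):

```python
from math import log, sqrt

def mcallester_bound(gibbs_emp_risk, kl_qp, m, delta, xi_m=None):
    """R(G_Q) <= R_S(G_Q) + sqrt(2 (KL(Q||P) + ln(xi(m)/delta)) / m).

    If xi(m) is not supplied, use the relaxation xi(m) <= 2*sqrt(m).
    """
    if xi_m is None:
        xi_m = 2 * sqrt(m)
    return gibbs_emp_risk + sqrt(2 * (kl_qp + log(xi_m / delta)) / m)

print(mcallester_bound(0.1, kl_qp=5.0, m=1000, delta=0.05))   # ≈ 0.256
```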

SLIDE 14

Catoni's bound (2004)

In Theorem 1, let $\mathcal{D}(q,p) = \mathcal{F}(p) - C\cdot q$. Then:

Catoni's bound. For any D, any H, any P of support H, any δ ∈ (0, 1], and any positive real number C, we have

$$\Pr_{S\sim D^m}\!\left( \forall\, Q \text{ on } H:\;\; R(G_Q) \le \frac{1}{1-e^{-C}}\left\{ 1 - \exp\!\left[ -\left( C\cdot R_S(G_Q) + \frac{1}{m}\Big[ \mathrm{KL}(Q\|P) + \ln\frac{1}{\delta} \Big] \right) \right] \right\} \right) \ge 1-\delta,$$

because

$$\mathop{\mathbf{E}}_{S\sim D^m}\mathop{\mathbf{E}}_{h\sim P} e^{m\mathcal{D}(R_S(h),R(h))} = \mathop{\mathbf{E}}_{h\sim P} e^{m\mathcal{F}(R(h))}\big( R(h)\,e^{-C} + (1-R(h)) \big)^m.$$
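A small helper (naming mine) that evaluates this bound for a given trade-off parameter C; the grid search in the usage line is only an illustration (a union bound over the grid would be needed for the result to remain a valid bound):

```python
from math import exp, log

def catoni_bound(gibbs_emp_risk, kl_qp, m, delta, C):
    """Catoni's PAC-Bayes upper bound on R(G_Q) for a fixed C > 0."""
    inner = C * gibbs_emp_risk + (kl_qp + log(1 / delta)) / m
    return (1 - exp(-inner)) / (1 - exp(-C))

print(min(catoni_bound(0.1, 5.0, 1000, 0.05, C) for C in (0.1, 0.5, 1.0, 2.0, 5.0)))
```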

SLIDE 15

Bounding $\mathop{\mathbf{E}}_{S\sim D^m}\mathop{\mathbf{E}}_{h\sim P} e^{m\mathcal{D}(R_S(h),R(h))}$: other ways

Via concentration inequalities:
- used in the original proof of Seeger (and in the one due to Langford);
- used by Higgs (2009) to generalize Seeger's bound to the transductive case;
- used by Ralaivola et al. (2008) for the non-i.i.d. case.

Via martingales:
- used by Lever et al. (2010) to generalize PAC-Bayes bounds to U-statistics of order > 1.

SLIDE 16

Observations about Catoni’s bound

G_Q minimizes Catoni's bound if and only if it minimizes the following cost function (linear in R_S(G_Q)):

$$C\, m\, R_S(G_Q) + \mathrm{KL}(Q\|P).$$

We have a hyperparameter C to tune (in contrast with Seeger's bound). Seeger's bound is always tighter, except for a narrow range of C values. In fact, if we replaced ξ(m) by one, the LS-bound would always be tighter.

SLIDE 17

Observations about Catoni’s bound (cont)

Given any prior P, the posterior Q* minimizing Catoni's bound is given by the Boltzmann distribution:

$$Q^*(h) = \frac{1}{Z}\, P(h)\, e^{-C\, m\, R_S(h)}.$$

We could sample from Q* by Markov chain Monte Carlo, but since the mixing time is unknown, we have little control over the precision of the approximation.

To avoid MCMC, let us analyse the case where Q is chosen from a parameterized set of distributions over the (continuous) space of linear classifiers.
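For a finite H this optimal posterior can be computed exactly, with no MCMC; a small sketch (naming mine):

```python
import numpy as np

def boltzmann_posterior(prior, emp_risks, C, m):
    """Q*(h) proportional to P(h) exp(-C m R_S(h)), the minimizer of Catoni's bound on a finite H."""
    log_w = np.log(prior) - C * m * np.asarray(emp_risks)
    log_w -= log_w.max()              # for numerical stability before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

# Example: 3 voters, uniform prior, training errors 0.1, 0.2, 0.4, C = 1, m = 50.
print(boltzmann_posterior(np.array([1/3, 1/3, 1/3]), [0.1, 0.2, 0.4], C=1.0, m=50))
```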

SLIDE 18

The problem of bounding R(GQ) instead of R(BQ)

The main problem of PAC-Bayes theory is that it allows us to bound the Gibbs risk whereas, most of the time, it is the Bayes risk we are interested in. For this problem I will discuss two possible answers:

Answer #1: if a not-too-small "part" of the classifiers of H are strong, then one can obtain a fairly tight bound (example: if H is the set of all linear classifiers in a high-dimensional feature space, as in the SVM).

Answer #2: otherwise, extend the PAC-Bayes bound to something other than the Gibbs risk.

SLIDE 19

Specialization to Linear classifiers

Each x is mapped to a high-dimensional feature vector φ(x):

$$\boldsymbol{\phi}(x) \;\stackrel{\mathrm{def}}{=}\; \big( \phi_1(x), \ldots, \phi_N(x) \big).$$

φ is often given implicitly by a Mercer kernel $k(x, x') = \boldsymbol{\phi}(x)\cdot\boldsymbol{\phi}(x')$. The output $h_{\mathbf{v}}(x)$ of the linear classifier $h_{\mathbf{v}}$ with weight vector v is given by

$$h_{\mathbf{v}}(x) = \operatorname{sgn}\big( \mathbf{v}\cdot\boldsymbol{\phi}(x) \big).$$

Let us moreover suppose that each posterior $Q_{\mathbf{w}}$ is an isotropic Gaussian centered on w:

$$Q_{\mathbf{w}}(\mathbf{v}) = \left( \frac{1}{\sqrt{2\pi}} \right)^{N} \exp\!\left( -\tfrac{1}{2}\,\|\mathbf{v}-\mathbf{w}\|^2 \right).$$

SLIDE 20

Bayes-equivalent classifiers

With this choice for $Q_{\mathbf{w}}$, the majority vote $B_{Q_{\mathbf{w}}}$ is the same classifier as $h_{\mathbf{w}}$, since:

$$B_{Q_{\mathbf{w}}}(x) = \operatorname{sgn}\!\Big[ \mathop{\mathbf{E}}_{\mathbf{v}\sim Q_{\mathbf{w}}} \operatorname{sgn}\big( \mathbf{v}\cdot\boldsymbol{\phi}(x) \big) \Big] = \operatorname{sgn}\big( \mathbf{w}\cdot\boldsymbol{\phi}(x) \big) = h_{\mathbf{w}}(x).$$

Thus $R(h_{\mathbf{w}}) = R(B_{Q_{\mathbf{w}}}) \le 2R(G_{Q_{\mathbf{w}}})$: an upper bound on $R(G_{Q_{\mathbf{w}}})$ also provides an upper bound on $R(h_{\mathbf{w}})$.

The prior $P_{\mathbf{w}_p}$ is also an isotropic Gaussian, centered on $\mathbf{w}_p$. Consequently:

$$\mathrm{KL}(Q_{\mathbf{w}}\|P_{\mathbf{w}_p}) = \tfrac{1}{2}\,\|\mathbf{w}-\mathbf{w}_p\|^2.$$

SLIDE 21

Gibbs’ risk

We need to compute the Gibbs risk $R_{(x,y)}(G_{Q_{\mathbf{w}}})$ on a single example (x, y):

$$R_{(x,y)}(G_{Q_{\mathbf{w}}}) \;\stackrel{\mathrm{def}}{=}\; \int_{\mathbb{R}^N} Q_{\mathbf{w}}(\mathbf{v})\; I\big( y\,\mathbf{v}\cdot\boldsymbol{\phi}(x) < 0 \big)\, d\mathbf{v},$$

since we have

$$R(G_{Q_{\mathbf{w}}}) = \mathop{\mathbf{E}}_{(x,y)\sim D} R_{(x,y)}(G_{Q_{\mathbf{w}}}) \qquad\text{and}\qquad R_S(G_{Q_{\mathbf{w}}}) = \frac{1}{m}\sum_{i=1}^{m} R_{(x_i,y_i)}(G_{Q_{\mathbf{w}}}).$$

Moreover, as in Langford (2005), the Gaussian integral gives:

$$R_{(x,y)}(G_{Q_{\mathbf{w}}}) = \Phi\big( \|\mathbf{w}\|\; \Gamma_{\mathbf{w}}(x,y) \big),$$

where

$$\Gamma_{\mathbf{w}}(x,y) \;\stackrel{\mathrm{def}}{=}\; \frac{y\,\mathbf{w}\cdot\boldsymbol{\phi}(x)}{\|\mathbf{w}\|\,\|\boldsymbol{\phi}(x)\|} \qquad\text{and}\qquad \Phi(a) \;\stackrel{\mathrm{def}}{=}\; \frac{1}{\sqrt{2\pi}} \int_{a}^{\infty} \exp\!\left( -\tfrac{1}{2}x^2 \right) dx.$$
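A quick numerical check of this closed form, using scipy's Gaussian survival function for Φ (the Monte Carlo verification and all names are mine):

```python
import numpy as np
from scipy.stats import norm

def example_gibbs_risk(w, phi_x, y):
    """R_{(x,y)}(G_{Q_w}) = Phi(||w|| Gamma_w(x,y)) = Phi(y w.phi(x) / ||phi(x)||),
    where Phi is the standard Gaussian tail (survival) function."""
    return norm.sf(y * np.dot(w, phi_x) / np.linalg.norm(phi_x))

rng = np.random.default_rng(0)
w, phi_x, y = np.array([1.0, -0.5]), np.array([0.3, 0.8]), +1
# Monte Carlo check of the Gaussian integral: draw v ~ N(w, I) and estimate the risk.
v = rng.normal(size=(200_000, 2)) + w
mc = np.mean(y * (v @ phi_x) < 0)
print(example_gibbs_risk(w, phi_x, y), mc)   # the two values should roughly agree
```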

SLIDE 22

Probit loss

SLIDE 23

Objective function from Catoni’s bound

Recall that, to minimize Catoni's bound for fixed C and $\mathbf{w}_p$, we need to find w that minimizes:

$$C\, m\, R_S(G_{Q_{\mathbf{w}}}) + \mathrm{KL}(Q_{\mathbf{w}}\|P_{\mathbf{w}_p}),$$

which, according to the preceding slides, corresponds to minimizing

$$C \sum_{i=1}^{m} \Phi\!\left( \frac{y_i\, \mathbf{w}\cdot\boldsymbol{\phi}(x_i)}{\|\boldsymbol{\phi}(x_i)\|} \right) + \tfrac{1}{2}\,\|\mathbf{w}-\mathbf{w}_p\|^2.$$

SLIDE 24

Objective function from Catoni’s bound

So PAC-Bayes tells us to minimize

$$C \sum_{i=1}^{m} \Phi\!\left( \frac{y_i\, \mathbf{w}\cdot\boldsymbol{\phi}(x_i)}{\|\boldsymbol{\phi}(x_i)\|} \right) + \tfrac{1}{2}\,\|\mathbf{w}-\mathbf{w}_p\|^2.$$

Note that, when $\mathbf{w}_p = 0$ (absence of prior knowledge), this is very similar to the SVM. Indeed, the SVM minimizes:

$$C \sum_{i=1}^{m} \max\!\big( 0,\; 1 - y_i\, \mathbf{w}\cdot\boldsymbol{\phi}(x_i) \big) + \tfrac{1}{2}\,\|\mathbf{w}\|^2;$$

the probit loss is simply replaced by the convex hinge loss. Up to a convex relaxation, PAC-Bayes theory has rediscovered the SVM!
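A side-by-side sketch of the two objectives (the data, the names, and the use of scipy are mine); since the probit loss is smooth, the PAC-Bayes objective can then be minimized with any gradient-based optimizer:

```python
import numpy as np
from scipy.stats import norm

def pacbayes_objective(w, X, y, C, w_p=None):
    """C * sum_i Phi(y_i w.x_i / ||x_i||) + 0.5 ||w - w_p||^2  (probit loss)."""
    w_p = np.zeros_like(w) if w_p is None else w_p
    margins = y * (X @ w) / np.linalg.norm(X, axis=1)
    return C * norm.sf(margins).sum() + 0.5 * np.sum((w - w_p) ** 2)

def svm_objective(w, X, y, C):
    """C * sum_i max(0, 1 - y_i w.x_i) + 0.5 ||w||^2  (hinge loss)."""
    return C * np.maximum(0.0, 1.0 - y * (X @ w)).sum() + 0.5 * np.sum(w ** 2)

# Hypothetical toy data: the two objectives penalize the same kind of mistakes.
X = np.array([[1.0, 2.0], [2.0, -1.0], [-1.0, -1.5]])
y = np.array([+1, -1, -1])
w = np.array([0.5, 1.0])
print(pacbayes_objective(w, X, y, C=1.0), svm_objective(w, X, y, C=1.0))
```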

SLIDE 25

Numerical results [ICML09]

Dataset                            SVM             PBGD1                   PBGD2                   PBGD3
Name           |S|    |T|    n     RT(w)   Bnd     RT(w)  GT(w)   Bnd      RT(w)  GT(w)   Bnd      RT(w)  GT(w)
Usvotes        235    200    16    0.055   0.370   0.080  0.117   0.244    0.050  0.050   0.153    0.075  0.085
Credit-A       353    300    15    0.183   0.591   0.150  0.196   0.341    0.150  0.152   0.248    0.160  0.267
Glass          107    107     9    0.178   0.571   0.168  0.349   0.539    0.215  0.232   0.430    0.168  0.316
Haberman       144    150     3    0.280   0.423   0.280  0.285   0.417    0.327  0.323   0.444    0.253  0.250
Heart          150    147    13    0.197   0.513   0.190  0.236   0.441    0.184  0.190   0.400    0.197  0.246
Sonar          104    104    60    0.163   0.599   0.250  0.379   0.560    0.173  0.231   0.477    0.144  0.243
BreastCancer   343    340     9    0.038   0.146   0.044  0.056   0.132    0.041  0.046   0.101    0.047  0.051
Tic-tac-toe    479    479     9    0.081   0.555   0.365  0.369   0.426    0.173  0.193   0.287    0.077  0.107
Ionosphere     176    175    34    0.097   0.531   0.114  0.242   0.395    0.103  0.151   0.376    0.091  0.165
Wdbc           285    284    30    0.074   0.400   0.074  0.204   0.366    0.067  0.119   0.298    0.074  0.210
MNIST:0vs8     500   1916   784    0.003   0.257   0.009  0.053   0.202    0.007  0.015   0.058    0.004  0.011
MNIST:1vs7     500   1922   784    0.011   0.216   0.014  0.045   0.161    0.009  0.015   0.052    0.010  0.012
MNIST:1vs8     500   1936   784    0.011   0.306   0.014  0.066   0.204    0.011  0.019   0.060    0.010  0.024
MNIST:2vs3     500   1905   784    0.020   0.348   0.038  0.112   0.265    0.028  0.043   0.096    0.023  0.036
Letter:AvsB    500   1055    16    0.001   0.491   0.005  0.043   0.170    0.003  0.009   0.064    0.001  0.408
Letter:DvsO    500   1058    16    0.014   0.395   0.017  0.095   0.267    0.024  0.030   0.086    0.013  0.031
Letter:OvsQ    500   1036    16    0.015   0.332   0.029  0.130   0.299    0.019  0.032   0.078    0.014  0.045
Adult         1809  10000    14    0.159   0.535   0.173  0.198   0.274    0.180  0.181   0.224    0.164  0.174
Mushroom      4062   4062    22    0.000   0.213   0.007  0.032   0.119    0.001  0.003   0.011    0.000  0.001

SLIDE 26

Majority vote of weak classifiers

The classical PAC-Bayes theory bounds the risk of the majority vote, R(B_Q), through twice the Gibbs risk, 2R(G_Q). In the case of linear classifiers, where there exists a Q such that R(G_Q) is relatively small, this seems a good idea; but what if the set H of voters is composed only of weak voters (as in boosting)?

In that case, the Gibbs risk cannot be a good predictor for the Bayes risk. Indeed, it is well known that voting can dramatically improve performance when the "community" of classifiers tends to compensate for individual errors.

So what can we do in this case?

SLIDE 27

Answer # 1

Suppose H = {h_1, ..., h_n, h_{n+1}, ..., h_{2n}} with h_{i+n} = −h_i, and consider instead the set of all majority votes over H:

$$H_{\mathrm{MV}} \;\stackrel{\mathrm{def}}{=}\; \big\{ \operatorname{sgn}\big( \mathbf{v}\cdot\boldsymbol{\phi}(x) \big) \;:\; \mathbf{v}\in\mathbb{R}^{|H|} \big\}, \qquad\text{where}\qquad \boldsymbol{\phi}(x) \;\stackrel{\mathrm{def}}{=}\; \big( h_1(x), \ldots, h_{2n}(x) \big).$$

Then we are back to the linear-classifier specialization.

SLIDE 28

Numerical result [ICML09], with decision stumps as weak learners

SLIDE 29

Answer #2: generalize the PAC-Bayes theorem to something other than the Gibbs risk!

Consider the margin on an example:

$$M_Q(x,y) \;\stackrel{\mathrm{def}}{=}\; \mathop{\mathbf{E}}_{h\sim Q} y\, h(x),$$

and any convex margin loss function ζ_Q(α) that can be expanded in a Taylor series around M_Q(x,y) = 0:

$$\zeta_Q\big( M_Q(x,y) \big) \;\stackrel{\mathrm{def}}{=}\; \sum_{k=0}^{\infty} a_k \big( M_Q(x,y) \big)^k,$$

and that upper-bounds the risk of the majority vote B_Q, i.e.,

$$\zeta_Q\big( M_Q(x,y) \big) \;\ge\; I\big( M_Q(x,y)\le 0 \big) \qquad \forall\, Q, x, y.$$

Conclusion: if we can obtain a PAC-Bayes bound on ζ_Q(M_Q(x,y)), we will then have a "new" bound on R(B_Q).

SLIDE 30

Note: 1 − M_Q(x, y) = 2 R_{(x,y)}(G_Q). Thus the green and the black curves illustrate: R(B_Q) ≤ 2R(G_Q).

SLIDE 31

Catoni’s bound for a general loss

If we define

$$\zeta_Q \;\stackrel{\mathrm{def}}{=}\; \mathop{\mathbf{E}}_{(x,y)\sim D} \zeta_Q\big( M_Q(x,y) \big), \qquad \widehat{\zeta}_Q \;\stackrel{\mathrm{def}}{=}\; \frac{1}{m}\sum_{i=1}^{m} \zeta_Q\big( M_Q(x_i,y_i) \big), \qquad c_a \;\stackrel{\mathrm{def}}{=}\; \zeta(1), \qquad \bar{k} \;\stackrel{\mathrm{def}}{=}\; \zeta'(1),$$

then Catoni's bound becomes: [formula displayed as a figure on the original slide]

SLIDE 32

Answer # 2 (cont)

The trick! ζ_Q(M_Q(x,y)) can be expressed in terms of the risk, on example (x, y), of a Gibbs classifier described by a transformed posterior Q̄ on N × H^∞, where

$$c_a \;\stackrel{\mathrm{def}}{=}\; \sum_{k=0}^{\infty} a_k \qquad\text{and}\qquad R_{\{(x,y)\}}(G_{\bar{Q}}) \;\stackrel{\mathrm{def}}{=}\; \frac{1}{c_a} \sum_{k=1}^{\infty} |a_k| \mathop{\mathbf{E}}_{h_1\sim Q}\cdots\mathop{\mathbf{E}}_{h_k\sim Q} I\Big( (-y)^k\, h_1(x)\cdots h_k(x) = -\operatorname{sgn}(a_k) \Big).$$

Since $R_{\{(x,y)\}}(G_{\bar{Q}})$ is the expectation of a Boolean random variable, Catoni's bound holds if we replace (P, Q) by (P̄, Q̄).

SLIDE 33

Minimizing Catoni’s bound for a general loss

Minimizing this version of Catoni's bound is equivalent to finding the Q that minimizes

$$f(Q) \;\stackrel{\mathrm{def}}{=}\; C \sum_{i=1}^{m} \zeta_Q(x_i, y_i) + \mathrm{KL}(Q\|P), \qquad\text{where}\quad C \;\stackrel{\mathrm{def}}{=}\; C'/(2\, c_a\, \bar{k}).$$

SLIDE 34

Minimizing Catoni’s bound for a general loss

To compare the proposed learning algorithms with AdaBoost, we will consider, for ζ_Q(x, y), the exponential loss given by

$$\exp\!\left( -\frac{1}{\gamma}\, y \sum_{h\in H} Q(h)\, h(x) \right) = \exp\!\left( -\frac{1}{\gamma}\, M_Q(x,y) \right).$$

Because of its simplicity, let us also consider, for ζ_Q(x, y), the quadratic loss given by

$$\left( \frac{1}{\gamma}\, y \sum_{h\in H} Q(h)\, h(x) - 1 \right)^{2} = \left( \frac{1}{\gamma}\, M_Q(x,y) - 1 \right)^{2}.$$
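Both losses are simple functions of the margin; a small sketch (naming mine) evaluating them on a vector of Q-margins:

```python
import numpy as np

def exponential_loss(margins, gamma):
    """zeta_Q = exp(-M_Q(x,y)/gamma), the AdaBoost-style margin loss."""
    return np.exp(-np.asarray(margins) / gamma)

def quadratic_loss(margins, gamma):
    """zeta_Q = (M_Q(x,y)/gamma - 1)^2, the quadratic margin loss."""
    return (np.asarray(margins) / gamma - 1.0) ** 2

margins = np.array([-0.5, 0.0, 0.3, 0.9])      # hypothetical Q-margins in [-1, 1]
print(exponential_loss(margins, gamma=0.5))
print(quadratic_loss(margins, gamma=0.5))
# Both losses upper-bound the 0-1 loss I(M_Q <= 0) on these margins.
```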

SLIDE 35

Empirical results [NIPS09]

SLIDE 36

From KL(Q‖P) to ℓ2 regularization

We can recover ℓ2 regularization if we upper-bound KL(Q‖P) by a quadratic function.

SLIDE 37

PAC-Bayes vs Boosting and Ridge regression (cont)

With this approximation, the objective function to minimize becomes

$$f_{\ell_2}(\mathbf{w}) = C'' \sum_{i=1}^{m} \zeta\!\left( \frac{1}{\gamma}\, y_i\, \mathbf{w}\cdot\mathbf{h}(x_i) \right) + \frac{\|\mathbf{w}\|^2}{2},$$

subject to the ℓ∞ constraint |w_j| ≤ 1/n for all j ∈ {1, ..., n}. Here ‖w‖ denotes the Euclidean norm of w, and ζ(x) = (x − 1)² for the quadratic loss and e^{−x} for the exponential loss.

If, instead, we minimize f_{ℓ2} for v def= w/γ and remove the ℓ∞ constraint, we recover exactly:
- ridge regression, for the quadratic loss case!
- ℓ2-regularized boosting, for the exponential loss case!!

SLIDE 38

Answer#2 and kernel methods

Note that, in contrast with the Answer #1 approach, the Answer #2 approach cannot, as presently stated, yield kernel-based algorithms. For that, we need to extend the PAC-Bayes theorem to the sample compression setting (to be submitted to ICML).

SLIDE 39

MinCq, another bound minimization algorithm

Definition. Recall that the Q-margin realized on an example (x, y) is:

$$M_Q(x,y) \;\stackrel{\mathrm{def}}{=}\; y\cdot\mathop{\mathbf{E}}_{h\sim Q} h(x).$$

Now consider the first moment $M_Q^{D'}$ and the second moment $M_{Q^2}^{D'}$ of the Q-margin, viewed as a random variable on the probability space generated by D′ (D′ being either D or S):

$$M_Q^{D'} \;\stackrel{\mathrm{def}}{=}\; \mathop{\mathbf{E}}_{(x,y)\sim D'} M_Q(x,y) \;=\; \mathop{\mathbf{E}}_{h\sim Q}\mathop{\mathbf{E}}_{(x,y)\sim D'} y\, h(x),$$

$$M_{Q^2}^{D'} \;\stackrel{\mathrm{def}}{=}\; \mathop{\mathbf{E}}_{(x,y)\sim D'} \big( M_Q(x,y) \big)^2 \;=\; \mathop{\mathbf{E}}_{(h,h')\sim Q^2}\mathop{\mathbf{E}}_{(x,y)\sim D'} h(x)\, h'(x).$$

Note that, since y² = 1, there is no label y in the last equation.
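On a sample S, both moments are plain averages; a minimal sketch (naming mine), reusing the voter-prediction matrix layout from the earlier example:

```python
import numpy as np

def margin_moments(predictions, y, Q):
    """Empirical first and second moments of the Q-margin M_Q(x, y) = y E_{h~Q} h(x).

    predictions: (n_voters, m) array of +/-1 voter outputs
    y:           (m,) array of +/-1 labels
    Q:           (n_voters,) posterior weights summing to 1
    """
    vote = Q @ predictions            # E_{h~Q} h(x_i) for every example
    margins = y * vote                # M_Q(x_i, y_i)
    return margins.mean(), (margins ** 2).mean()

preds = np.array([[ 1,  1, -1,  1],
                  [ 1, -1, -1,  1],
                  [-1,  1,  1,  1]])
y = np.array([1, 1, -1, 1])
Q = np.array([0.5, 0.3, 0.2])
m1, m2 = margin_moments(preds, y, Q)
print(m1, m2)                         # 0.65 and 0.47 on this toy data
```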

SLIDE 40

MinCq is based on the following theorem

Theorem (The C-bound). For any distribution Q over a class H of functions and any distribution D′ over X × Y, if $M_Q^{D'} \ge 0$ then we have

$$R_{D'}(B_Q) \;\le\; \mathcal{C}_Q^{D'} \;\stackrel{\mathrm{def}}{=}\; \frac{\operatorname{Var}_{(x,y)\sim D'}\big( M_Q(x,y) \big)}{\mathop{\mathbf{E}}_{(x,y)\sim D'}\big( M_Q(x,y) \big)^2} \;=\; 1 - \frac{\big( M_Q^{D'} \big)^2}{M_{Q^2}^{D'}}.$$

Proof. Since $B_Q(x) \stackrel{\mathrm{def}}{=} \operatorname{sgn}\big[ \mathop{\mathbf{E}}_{h\sim Q} h(x) \big]$, B_Q misclassifies an example if its Q-margin is strictly negative and classifies it correctly if its Q-margin is strictly positive. Hence we have

$$R_{D'}(B_Q) \;\le\; \Pr_{(x,y)\sim D'}\big( M_Q(x,y) \le 0 \big).$$

The result follows from the Cantelli–Chebyshev inequality. ∎
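With the moments of the previous sketch, the empirical C-bound is one line (again, naming mine):

```python
def c_bound(first_moment, second_moment):
    """Empirical C-bound: 1 - (M_Q^S)^2 / M_{Q^2}^S, valid when M_Q^S >= 0."""
    return 1.0 - first_moment ** 2 / second_moment

print(c_bound(0.65, 0.47))   # with the toy moments above: ≈ 0.101
```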

SLIDE 41

From the C-bound to the MinCq learning algorithm

Our first attempts to minimize the C-bound confronted us with two problems.

Problem 1: empirical C-bound minimization without any regularization tends to overfit the training data.

Problem 2: most of the time, the distributions Q minimizing the empirical C-bound $\mathcal{C}_Q^S$ are such that both $M_Q^S$ and $M_{Q^2}^S$ are very close to 0. Since $\mathcal{C}_Q^S = 1 - (M_Q^S)^2 / M_{Q^2}^S$, this gives a 0/0 numerical instability.

Moreover, since $(M_Q^D)^2 / M_{Q^2}^D$ can only be estimated empirically by $(M_Q^S)^2 / M_{Q^2}^S$, Problem 2 amplifies Problem 1.

SLIDE 42

Solution: restricting to quasi-uniform distributions

Definition. Assume that H is finite and auto-complemented, meaning that h_{i+n}(x) = −h_i(x) for any x ∈ X and any i. A distribution Q is quasi-uniform if Q(h_i) + Q(h_{i+n}) = 1/n for any i ∈ {1, ..., n}.

SLIDE 43

Quasi-uniform distributions form a rich family

Proposition. For every distribution Q on H, there exists a quasi-uniform distribution Q′ on H that gives the same majority vote as Q and has the same empirical and true C-bound values, i.e.,

$$B_{Q'}(x) = B_Q(x)\;\; \forall x\in X, \qquad \mathcal{C}_{Q'}^{S} = \mathcal{C}_{Q}^{S} \qquad\text{and}\qquad \mathcal{C}_{Q'}^{D} = \mathcal{C}_{Q}^{D}.$$

Proposition. For every µ ∈ (0, 1] and every quasi-uniform distribution Q on H having empirical margin $M_Q^S \ge \mu$, there exists a quasi-uniform distribution Q′ on H having empirical margin exactly equal to µ:

$$M_{Q'}^{S} = \mu, \qquad B_{Q'}(x) = B_Q(x)\;\; \forall x\in X, \qquad \mathcal{C}_{Q'}^{S} = \mathcal{C}_{Q}^{S} \qquad\text{and}\qquad \mathcal{C}_{Q'}^{D} = \mathcal{C}_{Q}^{D}.$$

SLIDE 44

Quasi-uniform distributions have a nice PAC-Bayes property... no KL(Q‖P) term!

Theorem. For any distribution D, for any m ≥ 8, for any auto-complemented family H of B-bounded real-valued functions, and for any δ ∈ (0, 1], we have

$$\Pr_{S\sim D^m}\!\left( \begin{array}{l} \text{for every quasi-uniform distribution } Q \text{ on } H:\\[6pt] \displaystyle M_Q^S - \frac{2B\sqrt{\ln\frac{2\sqrt{m}}{\delta}}}{\sqrt{2m}} \;\le\; M_Q^D \;\le\; M_Q^S + \frac{2B\sqrt{\ln\frac{2\sqrt{m}}{\delta}}}{\sqrt{2m}}\\[14pt] \displaystyle \text{and}\quad M_{Q^2}^S - \frac{2B^2\sqrt{\ln\frac{2\sqrt{m}}{\delta}}}{\sqrt{2m}} \;\le\; M_{Q^2}^D \;\le\; M_{Q^2}^S + \frac{2B^2\sqrt{\ln\frac{2\sqrt{m}}{\delta}}}{\sqrt{2m}} \end{array} \right) \ge 1-\delta.$$

SLIDE 45

The algorithm MinCq

Definition (the MinCq algorithm). Given a set H of voters, a training set S, and an S-realizable µ > 0: among all quasi-uniform distributions Q with empirical margin $M_Q^S$ exactly equal to µ, the MinCq algorithm finds one that minimizes $M_{Q^2}^S$.

MinCq is a quadratic program.
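A compact sketch of this quadratic program; the use of cvxpy, the variable names, and the data layout are my choices, not from the slides. The n free variables are the weights of the first n voters; each complement h_{n+j} implicitly receives 1/n minus the weight of h_j:

```python
import numpy as np
import cvxpy as cp

def mincq(predictions, y, mu):
    """MinCq sketch: among quasi-uniform Q with empirical margin exactly mu,
    minimize the empirical second moment of the margin.

    predictions: (m, n) array, column j = h_j(x_i) for the n base voters
                 (the auto-complement h_{n+j} = -h_j is handled implicitly)
    """
    m, n = predictions.shape
    q = cp.Variable(n)                       # Q(h_1), ..., Q(h_n)
    w = 2 * q - 1.0 / n                      # effective vote weights: Q(h_j) - Q(h_{n+j})
    votes = predictions @ w                  # E_{h~Q} h(x_i)
    first_moment = cp.sum(cp.multiply(y, votes)) / m
    second_moment = cp.sum_squares(votes) / m
    problem = cp.Problem(cp.Minimize(second_moment),
                         [first_moment == mu, q >= 0, q <= 1.0 / n])
    problem.solve()
    return q.value

# Usage on hypothetical toy voters (4 examples, 3 base voters).
preds = np.array([[ 1,  1, -1],
                  [ 1, -1,  1],
                  [-1, -1,  1],
                  [ 1,  1,  1]])
y = np.array([1, 1, -1, 1])
print(mincq(preds, y, mu=0.1))
```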

SLIDE 46

Empirical results

SLIDE 47

Conclusion

Theorem 1, being relatively simple, is a good starting point for an introduction to PAC-Bayes theory. Again because of its simplicity, it is an interesting tool for developing new PAC-Bayes bounds (not necessarily for binary classification under the i.i.d. assumption). Up to some convex relaxations, PAC-Bayes rediscovers existing algorithms; this is nice, and it should be interesting for paradigms other than i.i.d. supervised learning, where our knowledge is not as "extended".

SLIDE 48

Conclusion

Minimizing PAC-Bayes bounds seems to produce well-performing algorithms! But these algorithms nevertheless need some parameters to be tuned via cross-validation in order to perform as well as the state of the art.

Why is this so? Possibly because the losses in those bounds are based only on the margin; the U-statistic involved here is therefore of order one. What if we consider higher orders? Note: PAC-Bayes bounds for U-statistics of higher order will be in a non-i.i.d. setting.

SLIDE 49

QUESTIONS ?

SLIDE 50

Suggested readings

Pascal Germain, Alexandre Lacasse, François Laviolette, and Mario Marchand. A PAC-Bayes risk bound for general loss functions. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 449–456. MIT Press, Cambridge, MA, 2007.

Pascal Germain, Alexandre Lacasse, François Laviolette, Mario Marchand, and Sara Shanian. From PAC-Bayes bounds to KL regularization. In J. Lafferty and C. Williams, editors, Advances in Neural Information Processing Systems 22. MIT Press, Cambridge, MA, 2009.

Alexandre Lacasse, François Laviolette, Mario Marchand, Pascal Germain, and Nicolas Usunier. PAC-Bayes bounds for the risk of the majority vote and the variance of the Gibbs classifier. In Proceedings of the 2006 Conference on Neural Information Processing Systems (NIPS-06), 2007.
