SLIDE 1

On adaptation for the posterior distribution under local and sup-norm

Judith Rousseau, Marc Hoffmann and Johannes Schmidt-Hieber

ENSAE - CREST and CEREMADE, Université Paris-Dauphine

Brown


SLIDE 2

Outline

1. Bayesian nonparametrics: posterior concentration (generalities, adaptation, idea of the proof)

2. Why adaptation is easy: the white noise model

3. What about $f(x_0)$? Or $\|f - f_0\|_\infty$?

4. A series of negative results



SLIDE 4

Generalities

◮ Model: $Y_1^n \mid \theta \sim p^n_\theta$ (density w.r.t. $\mu$), $\theta \in \Theta$

◮ A priori: $\theta \sim \Pi$ (prior distribution) → posterior distribution

$$d\Pi(\theta \mid Y_1^n) = \frac{p^n_\theta(Y_1^n)\, d\Pi(\theta)}{m(Y_1^n)}, \qquad Y_1^n = (Y_1, \dots, Y_n)$$

◮ Posterior concentration: for $d(\cdot,\cdot)$ a loss on $\Theta$ and $\theta_0 \in \Theta$ the true parameter,

$$E_{\theta_0}\big(\Pi[U_{\epsilon_n} \mid Y_1^n]\big) = 1 + o(1), \qquad U_{\epsilon_n} = \{\theta :\ d(\theta, \theta_0) \le \epsilon_n\}, \quad \epsilon_n \downarrow 0$$

◮ Minimax concentration rates on a class $\Theta_\alpha(L)$:

$$\sup_{\theta_0 \in \Theta_\alpha(L)} E_{\theta_0}\Big(\Pi\big[U^c_{M\epsilon_n(\alpha)} \mid Y_1^n\big]\Big) = o(1),$$

where $\epsilon_n(\alpha)$ = minimax rate under $d(\cdot,\cdot)$ over $\Theta_\alpha(L)$.
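As a toy illustration of this definition (a minimal conjugate-Gaussian sketch with illustrative constants, not from the talk), the posterior mass of the shrinking ball $U_{\epsilon_n}$ can be computed in closed form and seen to tend to 1 whenever $\sqrt{n}\,\epsilon_n \to \infty$:

```python
import numpy as np
from scipy.stats import norm

def posterior_ball_mass(y, theta0, eps):
    """Pi[{|theta - theta0| <= eps} | Y^n] for the toy model
    Y_i ~ N(theta, 1) i.i.d. with conjugate prior theta ~ N(0, 1)."""
    n = len(y)
    post_var = 1.0 / (n + 1.0)        # conjugate Gaussian update
    post_mean = y.sum() * post_var
    sd = np.sqrt(post_var)
    return norm.cdf(theta0 + eps, post_mean, sd) - norm.cdf(theta0 - eps, post_mean, sd)

rng = np.random.default_rng(0)
theta0 = 0.5
for n in [100, 1_000, 10_000]:
    y = theta0 + rng.standard_normal(n)
    eps_n = n ** (-1 / 3)             # any eps_n with sqrt(n) * eps_n -> infinity
    print(n, round(posterior_ball_mass(y, theta0, eps_n), 4))
# The printed masses increase towards 1: E_theta0(Pi[U_eps_n | Y^n]) = 1 + o(1).
```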


SLIDE 5

Examples of models and losses for which nice results exist

◮ Density estimation: $Y_i \sim p_\theta$ i.i.d.

$$d(p_\theta, p_{\theta'})^2 = \int \big(\sqrt{p_\theta} - \sqrt{p_{\theta'}}\big)^2(x)\, dx, \qquad \text{or} \qquad d(p_\theta, p_{\theta'}) = \int |p_\theta - p_{\theta'}|(x)\, dx$$

◮ Regression function: $Y_i = f(x_i) + \epsilon_i$, $\epsilon_i \sim N(0, \sigma^2)$, $\theta = (f, \sigma)$

$$d(p_\theta, p_{\theta'}) = \|f - f'\|_2, \qquad \text{or} \qquad d(p_\theta, p_{\theta'})^2 = n^{-1} \sum_{i=1}^n H^2\big(p_\theta(y \mid X_i), p_{\theta'}(y \mid X_i)\big), \quad H = \text{Hellinger}$$

◮ White noise:

$$dY(t) = f(t)\,dt + n^{-1/2} dW(t) \ \Leftrightarrow\ Y_i = \theta_i + n^{-1/2}\epsilon_i, \ i \in \mathbb{N}, \qquad d(p_\theta, p_{\theta'}) = \|f - f'\|_2$$
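As a quick numerical companion (a sketch with made-up stand-in densities, not from the talk), the two density-estimation losses above can be approximated on a grid:

```python
import numpy as np
from scipy.stats import norm

# Grid approximation of the two density-estimation losses, using two
# Gaussians as stand-ins for p_theta and p_theta'.
x = np.linspace(-10.0, 10.0, 20_001)
dx = x[1] - x[0]
p = norm.pdf(x, loc=0.0, scale=1.0)
q = norm.pdf(x, loc=0.5, scale=1.2)

hellinger_sq = np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dx  # d(p,q)^2 = ∫ (√p − √q)²
l1 = np.sum(np.abs(p - q)) * dx                             # d(p,q)   = ∫ |p − q|
print(hellinger_sq, l1)
```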


SLIDE 6

Examples: functional classes

$\Theta_\alpha(L)$ = Hölder ball $H(\alpha, L)$; $\epsilon_n(\alpha) = n^{-\alpha/(2\alpha+1)}$ = minimax rate over $H(\alpha, L)$.

◮ Density example, Hellinger loss. Prior = DPM:

$$f(x) = f_{P,\sigma}(x) = \int \varphi_\sigma(x - \mu)\, dP(\mu), \qquad \sigma \sim I\Gamma(a, b), \quad P \sim DP(A, G_0)$$

$$\sup_{f_0 \in \Theta_\alpha(L)} E_{f_0}\Big( \Pi\big[ U^c_{M(n/\log n)^{-\alpha/(2\alpha+1)}}(f_0) \mid Y_1^n \big] \Big) = o(1), \qquad U_\epsilon(f_0) = \{f :\ h(f_0, f) \le \epsilon\}$$

[Is the log n term necessary?]

$$\Rightarrow\ E_{f_0}\big[ h(\hat f, f_0)^2 \big] \lesssim (n/\log n)^{-2\alpha/(2\alpha+1)}, \qquad \hat f(x) = E^\pi[f(x) \mid Y^n]$$
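To make the DPM prior concrete, here is a minimal sketch of one random density $f_{P,\sigma}$ drawn from it via truncated stick-breaking; the truncation level, base measure $G_0 = N(0,1)$ and hyperparameter values are illustrative assumptions, not values from the talk.

```python
import numpy as np

def draw_dpm_density(A=1.0, a=2.0, b=1.0, n_atoms=200, rng=None):
    """One draw of f_{P,sigma}(x) = ∫ phi_sigma(x - mu) dP(mu) with
    P ~ DP(A, G0) (truncated stick-breaking, G0 = N(0, 1)) and
    sigma ~ Inverse-Gamma(a, b)."""
    rng = rng or np.random.default_rng()
    v = rng.beta(1.0, A, size=n_atoms)                         # stick-breaking fractions
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))  # mixture weights
    mu = rng.standard_normal(n_atoms)                          # atoms ~ G0 = N(0, 1)
    sigma = 1.0 / rng.gamma(a, 1.0 / b)                        # Inverse-Gamma(a, b) draw

    def f(x):
        u = (np.atleast_1d(x)[:, None] - mu) / sigma
        phi = np.exp(-0.5 * u ** 2) / (sigma * np.sqrt(2.0 * np.pi))
        return (w * phi).sum(axis=1)                           # sum_j w_j phi_sigma(x - mu_j)

    return f

f = draw_dpm_density(rng=np.random.default_rng(1))
print(f(np.linspace(-3.0, 3.0, 7)))   # one random density drawn from the prior
```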



SLIDE 8

Adaptation

For such $d(\cdot,\cdot)$, adaptation is easy: with a prior that depends on neither $n$ nor $\alpha$,

$$\sup_{\alpha_1 \le \alpha \le \alpha_2}\ \sup_{\theta_0 \in \Theta_\alpha(L)} E_{\theta_0}\Big( \Pi\big[ U^c_{M(n/\log n)^{-\alpha/(2\alpha+1)}} \mid Y_1^n \big] \Big) = o(1).$$

◮ Why?



SLIDE 10

Idea of the proof

Let $U_n = U_{M(n/\log n)^{-\alpha/(2\alpha+1)}}$, $\ell_n(\theta) = \log p^n_\theta(Y_1^n)$ and $\bar\epsilon_n = (n/\log n)^{-\alpha/(2\alpha+1)}$. Then

$$\Pi[U_n^c \mid Y_1^n] = \frac{\int_{U_n^c} e^{\ell_n(\theta) - \ell_n(\theta_0)}\, d\Pi(\theta)}{\int_\Theta e^{\ell_n(\theta) - \ell_n(\theta_0)}\, d\Pi(\theta)} =: \frac{N_n}{D_n}$$

and, for any test $\phi_n = \phi_n(Y_1^n) \in [0, 1]$,

$$P_{\theta_0}\Big( \Pi[U_n^c \mid Y_1^n] > e^{-\tau n \bar\epsilon_n^2} \Big) \le E^n_{\theta_0}[\phi_n] + P_{\theta_0}\Big( D_n < e^{-c n \bar\epsilon_n^2} \Big) + e^{(c+\tau) n \bar\epsilon_n^2} \int_{U_n^c} E_\theta[1 - \phi_n]\, d\Pi(\theta).$$
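For completeness, this bound follows from a standard two-step argument not displayed on the slide (written here assuming, as a simplification, an indicator test $\phi_n \in \{0,1\}$; the $[0,1]$-valued case is analogous). Split on the event $\{D_n \ge e^{-cn\bar\epsilon_n^2}\}$:

$$P_{\theta_0}\Big( \frac{N_n}{D_n} > e^{-\tau n\bar\epsilon_n^2} \Big) \le E^n_{\theta_0}[\phi_n] + P_{\theta_0}\big( D_n < e^{-cn\bar\epsilon_n^2} \big) + P_{\theta_0}\big( N_n(1 - \phi_n) > e^{-(c+\tau)n\bar\epsilon_n^2} \big),$$

then bound the last term by Markov's inequality, Fubini, and the change of measure $E_{\theta_0}\big[ e^{\ell_n(\theta) - \ell_n(\theta_0)}(1 - \phi_n) \big] = E_\theta[1 - \phi_n]$:

$$P_{\theta_0}\big( N_n(1 - \phi_n) > e^{-(c+\tau)n\bar\epsilon_n^2} \big) \le e^{(c+\tau)n\bar\epsilon_n^2} \int_{U_n^c} E_\theta[1 - \phi_n]\, d\Pi(\theta).$$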


SLIDE 11

Constraints

$$E^n_{\theta_0}[\phi_n] = o(1) \quad \& \quad \sup_{d(\theta, \theta_0) > M\bar\epsilon_n} E_\theta[1 - \phi_n] = o\big(e^{-cn\bar\epsilon_n^2}\big) \qquad \to \text{ this is where the loss } d(\cdot,\cdot) \text{ enters: such tests must exist}$$

$$P_{\theta_0}\Big( D_n < e^{-cn\bar\epsilon_n^2} \Big) = o(1). \quad \text{We need:} \quad D_n \ge \int_{S_n} e^{\ell_n(\theta) - \ell_n(\theta_0)}\, d\Pi(\theta) \ge e^{-2n\bar\epsilon_n^2}\, \Pi\big( S_n \cap \{\ell_n(\theta) - \ell_n(\theta_0) > -2n\bar\epsilon_n^2\} \big)$$

◮ OK if $S_n = \{ KL(p^n_{\theta_0}, p^n_\theta) \le n\bar\epsilon_n^2;\ V(p^n_{\theta_0}, p^n_\theta) \le n\bar\epsilon_n^2 \}$ and $\Pi(S_n) \ge e^{-cn\bar\epsilon_n^2}$ → links $d(\cdot,\cdot)$ with $KL(\cdot,\cdot)$.
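As a worked instance (standard, though not spelled out on the slide): in the white noise model used on the next slide, $Y_{ik} \sim N(\theta_{ik}, 1/n)$ independently, so

$$KL(p^n_{\theta_0}, p^n_\theta) = \frac{n}{2}\, \|\theta_0 - \theta\|_2^2, \qquad V(p^n_{\theta_0}, p^n_\theta) = n\, \|\theta_0 - \theta\|_2^2,$$

and $S_n$ is (up to constants) the $L_2$ ball $\{\|\theta - \theta_0\|_2 \le \bar\epsilon_n\}$; the condition $\Pi(S_n) \ge e^{-cn\bar\epsilon_n^2}$ becomes a prior-mass lower bound on that ball.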


SLIDE 12

Example: white noise model + L2 loss

$$Y_{ik} = \theta_{ik} + n^{-1/2}\epsilon_{ik}, \quad \epsilon_{ik} \sim N(0, 1), \ i \in \mathbb{N}, \ k \le 2^{i-1} \qquad \big( dY(t) = f(t)\,dt + n^{-1/2} dW(t) \big)$$

◮ Hölder class ($\alpha$): $\theta_0 \in \{\theta :\ |\theta_{ik}| \le L\, 2^{-i(\alpha + 1/2)}\ \forall i, k\}$

◮ Prior: spike and slab

$$\theta_{ik} \sim (1 - p_n)\,\delta_{(0)} + p_n\, g, \qquad \text{e.g. } g = N(0, v), \ p_n = 1/n$$

◮ Concentration:

$$S_n \approx \{ \|\theta - \theta_0\|_2^2 \le (n/\log n)^{-2\alpha/(2\alpha+1)} \} \ \to\ \forall j \ge J_{n,\alpha},\ k \le 2^j:\ \theta_{j,k} = 0, \qquad 2^{J_{n,\alpha}} = (n/\log n)^{1/(2\alpha+1)} =: R_n,$$

so $\Pi(S_n) \gtrsim e^{-C R_n \log n} =: e^{-C n \epsilon_n^2}$, and

$$E_{\theta_0}[\phi_n] = o(1), \qquad \sup_{\theta \in \Theta_n;\ \|\theta - \theta_0\|_2 > M\epsilon_n} E_\theta[1 - \phi_n] \le e^{-c n \epsilon_n^2}.$$
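Since the spike-and-slab prior is conjugate coordinate-by-coordinate in this model, the posterior is available in closed form; the following sketch (illustrative constants and a hypothetical Hölder-type truth, not from the talk) shows the resulting selective shrinkage across resolution levels:

```python
import numpy as np
from scipy.stats import norm

def spike_slab_posterior_mean(y, n, p=None, v=1.0):
    """Exact posterior mean of theta for Y ~ N(theta, 1/n) under the prior
    theta ~ (1 - p) delta_0 + p N(0, v), computed coordinatewise (conjugacy)."""
    p = 1.0 / n if p is None else p
    s2 = 1.0 / n
    m_slab = norm.pdf(y, 0.0, np.sqrt(v + s2))   # marginal density, slab component
    m_spike = norm.pdf(y, 0.0, np.sqrt(s2))      # marginal density, spike component
    w = p * m_slab / (p * m_slab + (1.0 - p) * m_spike)   # P(slab | y)
    return w * (v / (v + s2)) * y                # E[theta | y]

# Demo: coefficients of a hypothetical alpha-smooth truth with
# |theta_{ik}| = L 2^{-i(alpha+1/2)}, observed in white noise.
rng = np.random.default_rng(2)
n, alpha, L = 10_000, 1.0, 1.0
for i in range(1, 12):
    theta0 = L * 2 ** (-i * (alpha + 0.5)) * np.ones(2 ** (i - 1))
    y = theta0 + rng.standard_normal(theta0.size) / np.sqrt(n)
    theta_hat = spike_slab_posterior_mean(y, n)
    print(i, float(np.abs(theta_hat).max()))
# Low levels (large coefficients) are kept nearly unshrunk; from roughly the
# level where |theta0_{ik}| ~ sqrt(log(n)/n) onwards, coefficients go to ~0.
```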

SLIDE 13

What about $f(x_0)$? Or $\|f - f_0\|_\infty$?

$$Y_{ik} = \theta_{ik} + n^{-1/2}\epsilon_{ik}, \quad \epsilon_{ik} \sim N(0, 1), \qquad \theta_0 \in \{\theta :\ |\theta_{ik}| \le L\, 2^{-i(\alpha + 1/2)}\ \forall i, k\}$$

◮ Prior: spike and slab, $\theta_{ik} \sim (1 - p_n)\,\delta_{(0)} + p_n\, g$, $p_n = 1/n$

◮ Losses:

$$l(\theta, \theta_0) = \Big( \sum_{i,k} (\theta_{ik} - \theta^o_{ik})\, \psi_{ik}(x_0)\, 2^{i/2} \Big)^2 \quad \text{(local)}, \qquad l(\theta, \theta_0) = \|\theta - \theta_0\|_\infty = \sum_i \max_k |\theta_{ik} - \theta^o_{ik}|\, 2^{i/2} \quad \text{(sup)}$$

◮ Bayesian concentration: $\forall \alpha > 0$, $\exists \theta_0 \in \Theta_\alpha(L)$ s.t.

$$E_{\theta_0}\Big( \Pi\big[ l(\theta, \theta_0) \le n^{-(\alpha - 1/2)/(2\alpha+1)} (\log n)^q \mid Y_1^n \big] \Big) = o(1)$$

Sub-optimal: take $\theta^o_{i0} = \rho_n 2^{-i/2}$ for $i \le I_n$ and $\theta^o_{ik} = 0$ otherwise: $\forall J > 0$,

$$\sum_{i > J} \sum_k (\theta^o_{ik})^2 \le n^{-2\alpha/(2\alpha+1)}, \qquad \sum_{i > J} \max_k |\theta^o_{ik}|\, 2^{i/2} > n^{-(\alpha - 1/2)/(2\alpha+1)} (\log n)^q$$
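Both losses are easy to evaluate from the level-by-level coefficient arrays; here is a small sketch (hypothetical helper names, made-up demo numbers, and an assumed level-by-level storage convention):

```python
import numpy as np

def sup_loss(theta, theta0):
    """l(theta, theta0) = sum_i 2^{i/2} max_k |theta_{ik} - theta0_{ik}|;
    coefficients are given level by level as lists of arrays (level i = 1 first)."""
    return sum(2 ** (i / 2) * np.abs(t - t0).max()
               for i, (t, t0) in enumerate(zip(theta, theta0), start=1))

def local_loss(theta, theta0, psi_at_x0):
    """l(theta, theta0) = (sum_{ik} (theta_{ik} - theta0_{ik}) psi_{ik}(x0) 2^{i/2})^2;
    psi_at_x0[i-1] holds the basis evaluations psi_{ik}(x0) at level i."""
    s = sum(2 ** (i / 2) * ((t - t0) * psi).sum()
            for i, (t, t0, psi) in enumerate(zip(theta, theta0, psi_at_x0), start=1))
    return s ** 2

# Tiny demo with made-up coefficients and basis evaluations (illustrative only).
theta  = [np.array([0.3]), np.array([0.1, -0.05])]   # levels i = 1, 2
theta0 = [np.zeros(1), np.zeros(2)]
psi    = [np.array([1.0]), np.array([0.0, -1.0])]    # psi_{ik}(x0), made up
print(local_loss(theta, theta0, psi), sup_loss(theta, theta0))
```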


SLIDE 14

Risk?

$$Y_{ik} = \theta_{ik} + n^{-1/2}\epsilon_{ik}, \quad \epsilon_{ik} \sim N(0, 1), \qquad \theta_0 \in \{\theta :\ |\theta_{ik}| \le L\, 2^{-i(\alpha + 1/2)}\ \forall i, k\}$$

◮ Prior: $\theta_{ik} \sim (1 - p_n)\,\delta_{(0)} + p_n\, g$, $p_n = 1/n$

◮ Sub-optimal concentration, BUT for $\hat\theta = E^\pi[\theta \mid Y^n]$,

$$\limsup_n\ \sup_{\alpha_1 \le \alpha \le \alpha_2}\ (n/\log n)^{2\alpha/(2\alpha+1)}\ \sup_{\theta_0 \in \Theta_\alpha} E^n_{\theta_0}\big[ l(\hat\theta, \theta_0) \big] < +\infty$$

Questions

◮ Question 1: How general is this (negative) result?

◮ Question 2: What does it tell us about posterior concentration?



SLIDE 17

A first general result

$H(\alpha_1, L) \cup H(\alpha_2, L) \subset \Theta$, $\alpha_1 < \alpha_2$.

◮ Local loss $l(\theta, \theta_0) = (\theta(x) - \theta_0(x))^2$

Result: there exists no prior that leads to adaptive minimax concentration over any collection of Hölder balls: for every prior $\pi$ on $\Theta$ and every $M > 0$,

$$\max_j\ \sup_{\theta_0 \in H(\alpha_j, L)} E_{\theta_0}\Big( \Pi\big[ l(\theta, \theta_0) > M n^{-2\alpha_j/(2\alpha_j+1)} \mid Y^n \big] \Big) = 1 + o(1)$$

◮ What do we lose?

◮ $L_\infty$ and local loss: if $\exists \theta_0 \in \Theta$ with

$$P_{\theta_0}\Big( \Pi\big[ l(\theta, \theta_0) > M n^{-2\alpha_2/(2\alpha_2+1)} \mid Y^n \big] > e^{-n\tau} \Big) = o(1), \quad \tau > 0,$$

then things are worse:

$$\max_j\ \sup_{\theta_0 \in H(\alpha_j, L)} E_{\theta_0}\Big( \Pi\big[ l(\theta, \theta_0) > n^{-(2\alpha_j - \tau)/(2\alpha_j+1)} \mid Y^n \big] \Big) = 1 + o(1)$$


SLIDE 18

Still not completely satisfying

◮ For the local loss: if we could find a prior losing only a $\log n$ factor, then who cares!

◮ For the $L_\infty$ loss: something smaller than $e^{-n\tau}$ is to be expected because of the tests.

Can we be more precise? Slightly.



SLIDE 21

Another negative result

$H(\alpha_1, L) \cup H(\alpha_2, L) \subset \Theta$, $\alpha_1 < \alpha_2$, $\epsilon_n(\alpha) = (n/\log n)^{-\alpha/(2\alpha+1)}$, $2^{J_{n,\alpha_2}} = (n/\log n)^{1/(2\alpha_2+1)}$.

If there exists $\theta_0 \in H(\alpha_2, L)$ such that

$$\pi\big( \|\theta - \theta_0\|_2 \le c\,\epsilon_n(\alpha_2) \big) \ge e^{-n\epsilon_n^2(\alpha_2)},$$

$$\pi\Big( \sum_{j \ge J_{n,\alpha_2}} \sum_k \theta_{jk}^2 > A\,\epsilon_n(\alpha_2)^2 \Big) \le e^{-B n \epsilon_n^2(\alpha_2)},$$

and $\exists \rho_n \downarrow 0$ s.t.

$$\pi\Big( \sum_{j \ge J_{n,\alpha_2}} 2^{j/2} \max_k |\theta_{jk}| > \rho_n\,\epsilon_n(\alpha_1) \Big) \le e^{-B n \epsilon_n^2(\alpha_2)},$$

then there exists $\theta_1 \in H(\alpha_1, L)$ with

$$E_{\theta_1}\big( \Pi[\, l(\theta, \theta_1) \gg \epsilon_n(\alpha_1) \mid Y^n \,] \big) = 1 + o(1).$$


SLIDE 22

Conclusion

◮ Bayesian methods are great for risks related to Kullback–Leibler: $L_2$ in regression, Hellinger or $L_1$ in density estimation, etc.

◮ How to understand some specific features of these big models? More tricky.

◮ Can we prove that, for every prior $\pi$, there is no adaptation in $L_\infty$ for concentration rates?

◮ Why should we care? → the interpretation of credible bands!?

◮ Are these negative results related to the non-existence of adaptive confidence bands in $L_\infty$?

◮ If there is no adaptive prior: it is important to understand the types of $\theta_0$ that won't work, e.g. . . .


SLIDE 23

THANK YOU
