On adaptation for the posterior distribution under local and sup-norm


  1. On adaptation for the posterior distribution under local and sup-norm
Judith Rousseau, Marc Hoffmann, and Johannes Schmidt-Hieber
ENSAE-CREST and CEREMADE, Université Paris-Dauphine
Brown

  2. Outline
1. Bayesian nonparametrics: posterior concentration
   - Generalities
   - Adaptation
   - Idea of the proof
2. Why adaptation is easy: the white noise model
3. What about $f(x_0)$? or $\|f - f_0\|_\infty$?
4. A series of negative results


  4. Generalities
◮ Model: $Y_1^n \mid \theta \sim p_\theta^n$ (density wrt $\mu$), $\theta \in \Theta$, $Y_1^n = (Y_1, \ldots, Y_n)$.
A priori $\theta \sim \Pi$ (prior distribution) $\rightarrow$ posterior distribution
$$d\Pi(\theta \mid Y_1^n) = \frac{p_\theta^n(Y_1^n)\, d\Pi(\theta)}{m(Y_1^n)}.$$
◮ Posterior concentration: $d(\cdot,\cdot)$ a loss on $\Theta$ and $\theta_0 \in \Theta$ the truth,
$$E_{\theta_0}\big(\Pi[U_{\epsilon_n} \mid Y_1^n]\big) = 1 + o(1), \qquad U_{\epsilon_n} = \{\theta :\ d(\theta, \theta_0) \le \epsilon_n\}, \quad \epsilon_n \downarrow 0.$$
◮ Minimax concentration rates on a class $\Theta_\alpha(L)$:
$$\sup_{\theta_0 \in \Theta_\alpha(L)} E_{\theta_0}\big(\Pi[U^c_{M\epsilon_n(\alpha)} \mid Y_1^n]\big) = o(1),$$
where $\epsilon_n(\alpha)$ = minimax rate under $d(\cdot,\cdot)$ over $\Theta_\alpha(L)$.
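To make the displayed quantity $\Pi[U_{\epsilon_n} \mid Y_1^n]$ concrete, here is a minimal numerical sketch (mine, not from the talk): a Gaussian location model with a uniform prior on a grid, where the posterior mass of the ball $U_\epsilon = \{\theta : |\theta - \theta_0| \le \epsilon\}$ can be read off directly. The grid, sample size, and $\epsilon$ are illustrative choices.

```python
# Minimal sketch (illustrative, not from the slides): posterior concentration
# in a toy Gaussian location model with a flat prior on a grid.
import numpy as np

rng = np.random.default_rng(0)
theta0, n, eps = 0.3, 1000, 0.1
y = theta0 + rng.standard_normal(n)          # Y_i = theta0 + eps_i

grid = np.linspace(-1, 1, 2001)              # discretized Theta
# log p_theta^n(Y_1^n) up to a constant, for each grid point
loglik = -0.5 * ((y[:, None] - grid[None, :]) ** 2).sum(axis=0)
post = np.exp(loglik - loglik.max())         # unnormalized posterior, flat prior
post /= post.sum()

mass = post[np.abs(grid - theta0) <= eps].sum()
print(f"Pi[U_eps | Y^n] = {mass:.4f}")       # close to 1 for large n
```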

  5. Examples of models and losses for which nice results exist
◮ Density estimation: $Y_i \sim p_\theta$ i.i.d.,
$$d(p_\theta, p_{\theta'})^2 = \int \big(\sqrt{p_\theta} - \sqrt{p_{\theta'}}\big)^2(x)\, dx \qquad \text{or} \qquad d(p_\theta, p_{\theta'}) = \int |p_\theta - p_{\theta'}|(x)\, dx.$$
◮ Regression function: $Y_i = f(x_i) + \epsilon_i$, $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$, $\theta = (f, \sigma)$,
$$d(p_\theta, p_{\theta'}) = \|f - f'\|_2 \qquad \text{or} \qquad d(p_\theta, p_{\theta'})^2 = n^{-1} \sum_{i=1}^n H^2\big(p_\theta(y \mid X_i),\, p_{\theta'}(y \mid X_i)\big), \quad H = \text{Hellinger}.$$
◮ White noise: $dY(t) = f(t)\,dt + n^{-1/2}\,dW(t) \ \Leftrightarrow\ Y_i = \theta_i + n^{-1/2}\epsilon_i$, $i \in \mathbb{N}$,
$$d(p_\theta, p_{\theta'}) = \|f - f'\|_2.$$
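As a quick illustration of the two density losses (a sketch with illustrative Gaussian densities, not part of the talk), both distances can be approximated on a grid:

```python
# Hedged sketch: squared Hellinger and L1 distances between two densities,
# computed by Riemann sums for two Gaussians (illustrative choices).
import numpy as np

x = np.linspace(-10, 10, 20001)
dx = x[1] - x[0]

def gauss(x, mu, sig):
    return np.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))

p, q = gauss(x, 0.0, 1.0), gauss(x, 0.5, 1.0)
hellinger_sq = np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dx   # d(p,q)^2
l1 = np.sum(np.abs(p - q)) * dx                              # L1 loss
print(hellinger_sq, l1)
```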

  6. Examples: functional classes
$\Theta_\alpha(L)$ = Hölder ball $\mathcal{H}(\alpha, L)$; $\epsilon_n(\alpha) = n^{-\alpha/(2\alpha+1)}$ = minimax rate over $\mathcal{H}(\alpha, L)$.
◮ Density example, Hellinger loss. Prior = Dirichlet process mixture (DPM):
$$f(x) = f_{P,\sigma}(x) = \int \phi_\sigma(x - \mu)\, dP(\mu), \qquad P \sim DP(A, G_0), \quad \sigma \sim I\Gamma(a, b).$$
Then
$$\sup_{f_0 \in \Theta_\alpha(L)} E_{f_0}\big(\Pi[U^c_{M(n/\log n)^{-\alpha/(2\alpha+1)}}(f_0) \mid Y_1^n]\big) = o(1), \qquad U_\epsilon(f_0) = \{f :\ h(f_0, f) \le \epsilon\}.$$
[Is the $\log n$ term necessary?]
This transfers to the posterior mean $\hat{f}(x) = E^\pi[f(x) \mid Y^n]$:
$$E_{f_0}\big[h(\hat{f}, f_0)^2\big] \lesssim (n/\log n)^{-2\alpha/(2\alpha+1)}.$$
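The DPM prior above is easy to sample from. Below is a hedged sketch of one prior draw $f_{P,\sigma}$ via truncated stick-breaking; the base measure $G_0 = \mathcal{N}(0,1)$, the hyperparameters $(A, a, b)$, and the truncation level $K$ are illustrative assumptions, not values from the talk.

```python
# Sketch of one draw f_{P,sigma} from the DPM prior of the slide:
# P ~ DP(A, G0) via truncated stick-breaking, sigma^2 ~ Inverse-Gamma(a, b).
# G0 = N(0,1), (A, a, b) and truncation K are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
A, a, b, K = 1.0, 2.0, 1.0, 200

v = rng.beta(1.0, A, size=K)                     # stick-breaking proportions
w = v * np.concatenate(([1.0], np.cumprod(1 - v)[:-1]))   # mixture weights
mu = rng.standard_normal(K)                      # atoms mu_j ~ G0 = N(0,1)
sigma = 1.0 / np.sqrt(rng.gamma(a, 1.0 / b))     # sigma^2 ~ IGamma(a, b)

def f(x):
    # f(x) = sum_j w_j * phi_sigma(x - mu_j)
    z = (x[:, None] - mu[None, :]) / sigma
    phi = np.exp(-0.5 * z**2) / (sigma * np.sqrt(2 * np.pi))
    return phi @ w

x = np.linspace(-4, 4, 801)
print(f(x).sum() * (x[1] - x[0]))   # ~1 up to truncation and tail error
```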


  8. Adaptation
For such $d(\cdot,\cdot)$, adaptation is easy: the prior does not depend on $\alpha$, yet
$$\sup_{\alpha_1 \le \alpha \le \alpha_2}\ \sup_{\theta_0 \in \Theta_\alpha(L)} E_{\theta_0}\big(\Pi[U^c_{M(n/\log n)^{-\alpha/(2\alpha+1)}} \mid Y_1^n]\big) = o(1).$$
◮ Why?


  10. Idea of the proof
Write $U_n = U_{M(n/\log n)^{-\alpha/(2\alpha+1)}}$, $\bar{\epsilon}_n = (n/\log n)^{-\alpha/(2\alpha+1)}$, and $\ell_n(\theta) = \log p_\theta^n(Y_1^n)$. Then
$$\Pi[U_n^c \mid Y_1^n] = \frac{\int_{U_n^c} e^{\ell_n(\theta) - \ell_n(\theta_0)}\, d\Pi(\theta)}{\int_\Theta e^{\ell_n(\theta) - \ell_n(\theta_0)}\, d\Pi(\theta)} := \frac{N_n}{D_n}.$$
For any test $\phi_n = \phi_n(Y_1^n) \in [0,1]$,
$$P_{\theta_0}\big(\Pi[U_n^c \mid Y_1^n] > e^{-\tau n \bar{\epsilon}_n^2}\big) \le E_{\theta_0}[\phi_n] + P_{\theta_0}\big(D_n < e^{-c n \bar{\epsilon}_n^2}\big) + e^{(c+\tau) n \bar{\epsilon}_n^2} \int_{U_n^c} E_\theta[1 - \phi_n]\, d\Pi(\theta).$$
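For completeness, here is a sketch of where the three terms come from (the usual testing-plus-denominator decomposition in posterior concentration proofs; notation as on the slide):

```latex
% On {D_n >= e^{-c n \bar\epsilon_n^2}}, the event
% {\Pi[U_n^c | Y_1^n] > e^{-\tau n \bar\epsilon_n^2}} forces
% N_n > e^{-(c+\tau) n \bar\epsilon_n^2}. Splitting along the test \phi_n:
\begin{align*}
P_{\theta_0}\Big(\Pi[U_n^c \mid Y_1^n] > e^{-\tau n \bar\epsilon_n^2}\Big)
  &\le E_{\theta_0}[\phi_n]
     + P_{\theta_0}\Big(D_n < e^{-c n \bar\epsilon_n^2}\Big)
     + E_{\theta_0}\Big[(1-\phi_n)\,\mathbf{1}\{N_n > e^{-(c+\tau) n \bar\epsilon_n^2}\}\Big] \\
  &\le E_{\theta_0}[\phi_n]
     + P_{\theta_0}\Big(D_n < e^{-c n \bar\epsilon_n^2}\Big)
     + e^{(c+\tau) n \bar\epsilon_n^2}\, E_{\theta_0}\big[(1-\phi_n)\, N_n\big],
\end{align*}
% using (1-\phi_n)\mathbf{1}\{N_n > t\} \le (1-\phi_n) N_n / t. Fubini and the
% change of measure E_{\theta_0}[e^{\ell_n(\theta)-\ell_n(\theta_0)}(1-\phi_n)]
% = E_\theta[1-\phi_n] then give
% E_{\theta_0}[(1-\phi_n) N_n] = \int_{U_n^c} E_\theta[1-\phi_n]\, d\Pi(\theta).
```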

  11. Constraints
We need:
◮ Tests: $E_{\theta_0}[\phi_n] = o(1)$ and $\sup_{d(\theta,\theta_0) > M\bar{\epsilon}_n} E_\theta[1 - \phi_n] = o\big(e^{-c n \bar{\epsilon}_n^2}\big)$ $\rightarrow$ links with $d(\cdot,\cdot)$.
◮ Denominator: $P_{\theta_0}\big(D_n < e^{-c n \bar{\epsilon}_n^2}\big) = o(1)$. Since
$$D_n \ge \int_{S_n} e^{\ell_n(\theta) - \ell_n(\theta_0)}\, d\Pi(\theta) \ge e^{-2n\bar{\epsilon}_n^2}\, \Pi\big(S_n \cap \{\ell_n(\theta) - \ell_n(\theta_0) > -2n\bar{\epsilon}_n^2\}\big),$$
this is OK if $S_n = \{KL(p^n_{\theta_0}, p^n_\theta) \le n\bar{\epsilon}_n^2;\ V(p^n_{\theta_0}, p^n_\theta) \le n\bar{\epsilon}_n^2\}$ and $\Pi(S_n) \ge e^{-c n \bar{\epsilon}_n^2}$ $\rightarrow$ links $d(\cdot,\cdot)$ with $KL(\cdot,\cdot)$.

  12. Example: white noise model + $L_2$ loss
$$Y_{ik} = \theta_{ik} + n^{-1/2}\epsilon_{ik}, \qquad \epsilon_{ik} \sim \mathcal{N}(0,1), \quad i \in \mathbb{N},\ k \le 2^{i-1} \qquad \big(dY(t) = f(t)\,dt + n^{-1/2}\,dW(t)\big).$$
◮ Hölder class ($\alpha$): $\theta_0 \in \{\theta :\ |\theta_{ik}| \le L\, 2^{-i(\alpha + 1/2)}\ \forall i, k\}$.
◮ Prior: spike and slab, $\theta_{ik} \sim (1 - p_n)\delta_0 + p_n g$, e.g. $g = \mathcal{N}(0, v)$, $p_n = 1/n$.
◮ Concentration: $S_n \approx \{\|\theta - \theta_0\|^2 \le (n/\log n)^{-2\alpha/(2\alpha+1)}\}$ $\rightarrow$ $\theta_{jk} = 0$ for all $j \ge J_{n,\alpha}$, $k \le 2^j$, where $2^{J_{n,\alpha}} = (n/\log n)^{1/(2\alpha+1)} := R_n$, and
$$\Pi(S_n) \gtrsim e^{-C R_n \log n} := e^{-C n \epsilon_n^2},$$
$$E_{\theta_0}[\phi_n] = o(1), \qquad \sup_{\theta \in \Theta_n;\ \|\theta - \theta_0\| \gtrsim \epsilon_n} E_\theta[1 - \phi_n] \le e^{-c n \epsilon_n^2}.$$
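In this model the spike-and-slab posterior is available coordinatewise in closed form, which makes the behaviour easy to probe numerically. A hedged sketch (the slab $g = \mathcal{N}(0, v)$ and all numeric values below are illustrative):

```python
# Hedged sketch of the spike-and-slab posterior in the sequence model
# Y = theta + n^{-1/2} eps, prior (1 - p_n) delta_0 + p_n N(0, v).
# Coordinatewise the posterior is a point mass at 0 with probability
# 1 - pi_hat, and N(m, s2) with probability pi_hat (conjugacy).
import numpy as np

def norm_pdf(y, var):
    return np.exp(-0.5 * y**2 / var) / np.sqrt(2 * np.pi * var)

def spike_slab_posterior(y, n, p, v):
    # marginal densities of Y under slab (theta ~ N(0,v)) and spike (theta = 0)
    m_slab = norm_pdf(y, v + 1.0 / n)
    m_spike = norm_pdf(y, 1.0 / n)
    pi_hat = p * m_slab / (p * m_slab + (1 - p) * m_spike)  # P(theta != 0 | Y)
    m = y * n * v / (1 + n * v)          # slab posterior mean
    s2 = v / (1 + n * v)                 # slab posterior variance
    return pi_hat, m, s2

rng = np.random.default_rng(2)
n, p, v = 10_000, 1.0 / 10_000, 1.0
theta0 = 0.5                              # one "signal" coordinate
y = theta0 + rng.standard_normal() / np.sqrt(n)
print(spike_slab_posterior(y, n, p, v))   # pi_hat near 1, m near theta0
```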

  13. What about $f(x_0)$? or $\|f - f_0\|_\infty$?
$$Y_{ik} = \theta_{ik} + n^{-1/2}\epsilon_{ik}, \qquad \epsilon_{ik} \sim \mathcal{N}(0,1), \qquad \theta_0 \in \{\theta :\ |\theta_{ik}| \le L\,2^{-i(\alpha+1/2)}\ \forall i, k\}.$$
◮ Prior: spike and slab, $\theta_{ik} \sim (1 - p_n)\delta_0 + p_n g$, $p_n = 1/n$.
◮ Losses:
$$l(\theta, \theta_0) = \Big(\sum_{ik} (\theta_{ik} - \theta^0_{ik})\,\psi_{ik}(x_0)\Big)^2, \quad |\psi_{ik}(x_0)| \lesssim 2^{i/2} \qquad \text{(local)}$$
$$l(\theta, \theta_0) = \|\theta - \theta_0\|_\infty = \sum_i 2^{i/2} \max_k |\theta_{ik} - \theta^0_{ik}| \qquad \text{(sup)}$$
◮ Bayesian concentration is sub-optimal: for all $\alpha > 0$ there exists $\theta_0 \in \Theta_\alpha(L)$ such that
$$E_{\theta_0}\Big(\Pi\big[l(\theta, \theta_0) \le n^{-(\alpha - 1/2)/(2\alpha+1)} (\log n)^q \mid Y_1^n\big]\Big) = o(1).$$
The sub-optimal $\theta^0$ has $\theta^0_{ik} = 0$ for $i \le I_n$ and $\theta^0_{ik_0} = \rho_n 2^{-i/2}$ above, so that for all $J > 0$
$$\sum_{i > J} \sum_k (\theta^0_{ik})^2 \le n^{-2\alpha/(2\alpha+1)}, \qquad \sum_{i > J} \max_k |\theta^0_{ik}| > n^{-(\alpha-1/2)/(2\alpha+1)} (\log n)^q.$$
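Both losses are straightforward to compute from the coefficient arrays. A small sketch (the flat $\psi_{ik}(x_0) \sim 2^{i/2}$ stand-in below is an illustrative simplification of actual wavelet evaluations):

```python
# Sketch of the two losses on wavelet-type coefficients: theta is a list of
# arrays, level i holding 2^i entries (illustrative indexing convention).
import numpy as np

def local_loss(theta, theta0, psi_x0):
    # l(theta, theta0) = ( sum_{ik} (theta_ik - theta0_ik) psi_ik(x0) )^2
    s = sum(((t - t0) * psi).sum() for t, t0, psi in zip(theta, theta0, psi_x0))
    return s ** 2

def sup_loss(theta, theta0):
    # l(theta, theta0) = sum_i 2^{i/2} max_k |theta_ik - theta0_ik|
    return sum(2 ** (i / 2) * np.abs(t - t0).max()
               for i, (t, t0) in enumerate(zip(theta, theta0)))

levels = 8
theta0 = [np.zeros(2 ** i) for i in range(levels)]
theta = [t + 0.01 for t in theta0]
psi_x0 = [2 ** (i / 2) * np.ones(2 ** i) for i in range(levels)]  # stand-in
print(local_loss(theta, theta0, psi_x0), sup_loss(theta, theta0))
```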

  14. Risk?
$$Y_{ik} = \theta_{ik} + n^{-1/2}\epsilon_{ik}, \qquad \epsilon_{ik} \sim \mathcal{N}(0,1), \qquad \theta_0 \in \{\theta :\ |\theta_{ik}| \le L\,2^{-i(\alpha+1/2)}\ \forall i, k\}.$$
• Prior: $\theta_{ik} \sim (1 - p_n)\delta_0 + p_n g$, $p_n = 1/n$.
• Sub-optimal posterior concentration, BUT the posterior mean $\hat{\theta} = E^\pi[\theta \mid Y^n]$ attains the adaptive rate (illustrated in the sketch after this slide):
$$\limsup_n\ \sup_{\alpha_1 \le \alpha \le \alpha_2}\ \sup_{\theta_0 \in \Theta_\alpha} (n/\log n)^{2\alpha/(2\alpha+1)}\, E^n_{\theta_0}\big[l(\hat{\theta}, \theta_0)\big] < +\infty.$$
Questions:
◮ Question 1: How general is this (negative) result?
◮ Question 2: What does it tell us about posterior concentration?
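A hedged Monte Carlo sketch of the risk statement for the $L_2$ loss (all sizes are illustrative; the closed-form posterior mean is the coordinatewise spike-and-slab computation from slide 12):

```python
# Hedged sketch: L2 risk of the spike-and-slab posterior mean theta_hat,
# compared with the rate (n/log n)^{-2 alpha/(2 alpha + 1)}. The truth is an
# arbitrary illustrative point of the Hoelder ball.
import numpy as np

def posterior_mean(y, n, p, v):
    # coordinatewise E[theta | Y] for prior (1 - p) delta_0 + p N(0, v)
    log_slab = -0.5 * y**2 / (v + 1 / n) - 0.5 * np.log(v + 1 / n)
    log_spike = -0.5 * y**2 * n + 0.5 * np.log(n)
    pi_hat = 1 / (1 + (1 - p) / p * np.exp(log_spike - log_slab))
    return pi_hat * y * n * v / (1 + n * v)

rng = np.random.default_rng(3)
n, alpha, L, levels = 100_000, 1.0, 1.0, 15
risks = []
for rep in range(50):
    sq = 0.0
    for i in range(levels):
        theta0 = L * 2 ** (-i * (alpha + 0.5)) * rng.uniform(-1, 1, size=2**i)
        y = theta0 + rng.standard_normal(2**i) / np.sqrt(n)
        sq += ((posterior_mean(y, n, 1 / n, 1.0) - theta0) ** 2).sum()
    risks.append(sq)
rate = (n / np.log(n)) ** (-2 * alpha / (2 * alpha + 1))
print(np.mean(risks), rate)   # comparable up to constants and log factors
```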

  17. A first general result
$\mathcal{H}(\alpha_1, L) \cup \mathcal{H}(\alpha_2, L) \subset \Theta$, $\alpha_1 < \alpha_2$.
◮ Local loss $l(\theta, \theta_0) = (\theta(x) - \theta_0(x))^2$. Result: no prior leads to adaptive minimax concentration over a collection of Hölder balls. For every prior $\pi$ on $\Theta$ and every $M > 0$,
$$\max_j\ \sup_{\theta_0 \in \mathcal{H}(\alpha_j, L)} E_{\theta_0}\Big(\Pi\big[l(\theta, \theta_0) > M n^{-2\alpha_j/(2\alpha_j+1)} \mid Y^n\big]\Big) = 1.$$
• What do we lose?
◮ $L_\infty$ and local loss: if there exists $\theta_0 \in \Theta$ such that
$$P_{\theta_0}\Big(\Pi\big[l(\theta, \theta_0) > M n^{-2\alpha_2/(2\alpha_2+1)} \mid Y^n\big] > e^{-n^\tau}\Big) = o(1), \qquad \tau > 0,$$
then, worse,
$$\max_j\ \sup_{\theta_0 \in \mathcal{H}(\alpha_j, L)} E_{\theta_0}\Big(\Pi\big[l(\theta, \theta_0) > n^{-(2\alpha_j - \tau)/(2\alpha_j+1)} \mid Y^n\big]\Big) = 1.$$
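To see numerically how much the second statement loses, a tiny illustration (values are arbitrary): the gap between the two exponents is the polynomial factor $n^{\tau/(2\alpha_j+1)}$.

```python
# Tiny numeric illustration (not from the talk) of the exponent gap:
# n^{-2a/(2a+1)} (minimax) vs n^{-(2a - tau)/(2a+1)} (degraded).
n, a, tau = 1e6, 1.0, 0.5
minimax = n ** (-2 * a / (2 * a + 1))
degraded = n ** (-(2 * a - tau) / (2 * a + 1))
print(minimax, degraded, degraded / minimax)   # ratio = n^{tau/(2a+1)} = 10
```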

  18. Still not completely satisfying
• For the local loss: if we could find a prior that loses only a $\log n$ factor, then who cares!
• $L_\infty$ loss: something smaller than $e^{-n^\tau}$ is to be expected because of the tests. Can we be more precise? Slightly.

  19. Another negative result
$\mathcal{H}(\alpha_1, L) \cup \mathcal{H}(\alpha_2, L) \subset \Theta$, $\alpha_1 < \alpha_2$; $\epsilon_n(\alpha) = (n/\log n)^{-\alpha/(2\alpha+1)}$ and $2^{J_{n,\alpha_2}} = (n/\log n)^{1/(2\alpha_2+1)}$.
If there exists $\theta_0 \in \mathcal{H}(\alpha_2, L)$ with
$$\pi\big(\|\theta - \theta_0\|_2 \le c\,\epsilon_n(\alpha_2)\big) \gtrsim e^{-n \epsilon_n^2(\alpha_2)} \qquad \text{and} \qquad \pi\Big(\sum_{j \ge J_{n,\alpha_2}} \sum_k \theta_{jk}^2 > A\,\epsilon_n(\alpha_2)^2\Big) \le e^{-B n \epsilon_n^2(\alpha_2)},$$
then there exists $\theta_1 \in \mathcal{H}(\alpha_1, L)$ with
$$E_{\theta_1}\big(\Pi[l(\theta, \theta_1) \gg \epsilon_n(\alpha_1) \mid Y^n]\big) = 1.$$
