Linking losses for density ratio and class-probability estimation


  1. Linking losses for density ratio and class-probability estimation. Aditya Krishna Menon, Cheng Soon Ong. NICTA and The Australian National University.

  2. Linking losses for density ratio and class-probability estimation. Aditya Krishna Menon, Cheng Soon Ong. Data61 and The Australian National University.

  3. Class-probability estimation (CPE). From labelled instances... [figure: positive and negative labelled instances]

  4. Class-probability estimation (CPE). From labelled instances, estimate the probability of an instance being positive, e.g. using logistic regression. [figure: labelled instances, with an estimated probability of 0.6 for one instance]
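
A minimal sketch of CPE with logistic regression, using scikit-learn; the synthetic data and variable names are illustrative, not from the talk:

```python
# Minimal CPE sketch (illustrative; not the authors' code).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic labelled instances: positives centred at +1, negatives at -1.
X_pos = rng.normal(loc=+1.0, scale=1.0, size=(100, 2))
X_neg = rng.normal(loc=-1.0, scale=1.0, size=(100, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 100 + [-1] * 100)

clf = LogisticRegression().fit(X, y)
# Estimated eta(x) = P(Y = +1 | X = x) for a query instance.
eta_hat = clf.predict_proba([[0.5, 0.5]])[0, clf.classes_.tolist().index(1)]
print(eta_hat)
```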

  5. Density ratio estimation (DRE). Given samples from densities p, q... [figure: the densities p and q]

  6. Density ratio estimation (DRE). Given samples from densities p, q, estimate the density ratio r = p/q. [figure: the densities p, q and their ratio r]

  7. Application: covariate shift adaptation. Marginal training distribution... [figure: training instances]

  8. Application: covariate shift adaptation. Marginal training distribution ≠ marginal test distribution. [figure: training and test instances drawn from different marginals]

  9. Application: covariate shift adaptation. Marginal training distribution ≠ marginal test distribution. This can be overcome by reweighting training instances, using the ratio between the test and training densities, e.g. in a weighted class-probability estimator. [figure: training and test instances drawn from different marginals]
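
A minimal sketch of covariate-shift correction by importance weighting, assuming an estimated density ratio `r_hat` (test density over training density) is already available; the function and variable names are illustrative:

```python
# Importance-weighted CPE sketch (illustrative; assumes r_hat estimates p_test / p_train).
import numpy as np
from sklearn.linear_model import LogisticRegression

def reweighted_fit(X_train, y_train, r_hat):
    """Fit a class-probability estimator with per-instance weights r_hat(x)."""
    weights = r_hat(X_train)  # test/train density ratio at each training point
    clf = LogisticRegression()
    clf.fit(X_train, y_train, sample_weight=weights)
    return clf
```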

  10. This paper. A formal link between CPE and DRE. [diagram: proper (Bregman) losses (logistic, exponential, square hinge) on the CPE side; KLIEP, LSIF and ranking on the DRE side]

  11. This paper. A formal link between CPE and DRE: existing DRE approaches → implicitly performing CPE. [diagram: proper (Bregman) losses (logistic, exponential, square hinge) on the CPE side; KLIEP, LSIF and ranking on the DRE side]

  12. This paper. A formal link between CPE and DRE: existing DRE approaches → implicitly performing CPE; CPE → Bregman minimisation for DRE. [diagram: proper (Bregman) losses (logistic, exponential, square hinge) on the CPE side; KLIEP, LSIF and ranking on the DRE side]

  13. This paper. A formal link between CPE and DRE: existing DRE approaches → implicitly performing CPE; CPE → Bregman minimisation for DRE; a new application of DRE losses to “top ranking”. [diagram: proper (Bregman) losses (logistic, exponential, square hinge) on the CPE side; KLIEP, LSIF and ranking on the DRE side]

  14. DRE and CPE: formally

  15. Distributions for learning with binary labels. Fix an instance space X (e.g. Rⁿ). Let D be a distribution over X × {±1}, with P(Y = 1) = 1/2 and (P(x), Q(x)) = (P(X = x | Y = 1), P(X = x | Y = −1)). [figure: the two class-conditional densities]

  16. Distributions for learning with binary labels. Fix an instance space X (e.g. Rⁿ). Let D be a distribution over X × {±1}, with P(Y = 1) = 1/2, (P(x), Q(x)) = (P(X = x | Y = 1), P(X = x | Y = −1)), and (M(x), η(x)) = (P(X = x), P(Y = 1 | X = x)). [figures: the class-conditional densities; the marginal and class-probability function]
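
To make the two parametrisations concrete, a small sketch (with assumed Gaussian class-conditionals, not from the talk) computing η from the class-conditionals under P(Y = 1) = 1/2:

```python
# Relating (P, Q) to (M, eta) when P(Y = 1) = 1/2 (illustrative densities).
import numpy as np
from scipy.stats import norm

p = norm(loc=+1.0, scale=1.0).pdf   # class-conditional density for y = +1
q = norm(loc=-1.0, scale=1.0).pdf   # class-conditional density for y = -1

x = np.linspace(-3, 3, 7)
m = 0.5 * p(x) + 0.5 * q(x)         # marginal M(x)
eta = 0.5 * p(x) / m                # eta(x) = P(Y = 1 | X = x) by Bayes' rule
print(eta)
```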

  17. Scorers, losses, risks. A scorer is any s : X → R, e.g. the linear scorer s : x ↦ ⟨w, x⟩. [figure: labelled instances]

  18. Scorers, losses, risks. A scorer is any s : X → R, e.g. the linear scorer s : x ↦ ⟨w, x⟩. A loss is any ℓ : {±1} × R → R₊, e.g. the logistic loss ℓ : (y, v) ↦ log(1 + e^(−yv)). [figures: labelled instances; the logistic loss curve]

  19. Scorers, losses, risks. A scorer is any s : X → R, e.g. the linear scorer s : x ↦ ⟨w, x⟩. A loss is any ℓ : {±1} × R → R₊, e.g. the logistic loss ℓ : (y, v) ↦ log(1 + e^(−yv)). The risk of a scorer s with respect to a loss ℓ and distribution D is L(s; D, ℓ) = E_{(X,Y)∼D}[ℓ(Y, s(X))], i.e. the average loss on a random sample. [figures: labelled instances; the logistic loss curve; scored instances]
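
A minimal sketch of the empirical counterpart of this risk: the sample average of the logistic loss under a linear scorer (all names are illustrative):

```python
# Empirical risk of a linear scorer under the logistic loss (illustrative).
import numpy as np

def logistic_loss(y, v):
    """ell(y, v) = log(1 + exp(-y * v))."""
    return np.log1p(np.exp(-y * v))

def empirical_risk(w, X, y):
    """Average loss of the linear scorer s(x) = <w, x> over the sample (X, y)."""
    scores = X @ w
    return np.mean(logistic_loss(y, scores))
```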

  20. CPE versus DRE. Given samples S ∼ D^N, with D = (P, Q) = (M, η):

  21. CPE versus DRE. Given samples S ∼ D^N, with D = (P, Q) = (M, η): class-probability estimation (CPE) estimates η, the class-probability function. [figure: labelled instances with an estimated probability]

  22. CPE versus DRE. Given samples S ∼ D^N, with D = (P, Q) = (M, η): class-probability estimation (CPE) estimates η, the class-probability function; density ratio estimation (DRE) estimates r = p/q, the class-conditional density ratio. [figures: labelled instances with an estimated probability; the densities p, q and their ratio r]

  23. CPE approaches: proper composite losses. For suitable S ⊆ R^X, find argmin_{s ∈ S} L(s; D, ℓ), where ℓ is such that, for some invertible Ψ : [0, 1] → R, argmin_{s ∈ R^X} L(s; D, ℓ) = Ψ ∘ η, and one estimates η̂ = Ψ⁻¹ ∘ s.

  24. CPE approaches: proper composite losses. For suitable S ⊆ R^X, find argmin_{s ∈ S} L(s; D, ℓ), where ℓ is such that, for some invertible Ψ : [0, 1] → R, argmin_{s ∈ R^X} L(s; D, ℓ) = Ψ ∘ η, and one estimates η̂ = Ψ⁻¹ ∘ s. Such an ℓ is called strictly proper composite with link Ψ.

  25. Examples of proper composite losses. Logistic loss: Ψ⁻¹ : v ↦ σ(v). Exponential loss: Ψ⁻¹ : v ↦ σ(2v). Square hinge loss: Ψ⁻¹ : v ↦ min(max(0, (v + 1)/2), 1). [figures: the three loss curves]
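
A small sketch of these three inverse links as functions mapping a real-valued score to a probability estimate, where σ denotes the sigmoid (function names are illustrative):

```python
# Inverse links Psi^{-1} for three proper composite losses (illustrative).
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def inv_link_logistic(v):
    return sigmoid(v)                           # logistic loss: Psi^{-1}(v) = sigma(v)

def inv_link_exponential(v):
    return sigmoid(2.0 * v)                     # exponential loss: Psi^{-1}(v) = sigma(2v)

def inv_link_square_hinge(v):
    return np.clip((v + 1.0) / 2.0, 0.0, 1.0)   # square hinge: (v + 1)/2 clamped to [0, 1]
```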

  26. DRE approaches: divergence minimisation. For suitable S ⊆ R^X, find: KLIEP (Sugiyama et al., 2008): argmin_{s ∈ S} KL(p ‖ q ⊙ s), a constrained KL minimisation. LSIF (Kanamori et al., 2009): argmin_{s ∈ S} E_{X∼Q}[(r(X) − s(X))²], a direct least-squares minimisation.

  27. Story so far. [diagram: logistic, exponential and square hinge losses on the CPE side; KLIEP and LSIF on the DRE side]

  28. Roadmap. We begin by showing that existing DRE losses implicitly perform CPE. [diagram: proper losses (logistic, exponential, square hinge) on the CPE side; KLIEP and LSIF on the DRE side]

  29. Existing DRE losses are proper composite

  30. Existing DRE approaches. Suppose D = (P, Q). KLIEP (Sugiyama et al., 2008): argmin_{s ∈ S} KL(p ‖ q ⊙ s). LSIF (Kanamori et al., 2009): argmin_{s ∈ S} E_{X∼Q}[(r(X) − s(X))²].

  31. Existing DRE approaches as loss minimisation. Suppose D = (P, Q). KLIEP (Sugiyama et al., 2008): argmin_{s ∈ S} E_{(X,Y)∼D}[ℓ(Y, s(X))], with ℓ(−1, v) = a · v and ℓ(1, v) = −log v for suitable a > 0. LSIF (Kanamori et al., 2009): argmin_{s ∈ S} E_{(X,Y)∼D}[ℓ(Y, s(X))], with ℓ(−1, v) = (1/2) · v² and ℓ(1, v) = −v.

  32. Existing DRE approaches as loss minimisation. Suppose D = (P, Q). KLIEP (Sugiyama et al., 2008): argmin_{s ∈ S} E_{(X,Y)∼D}[ℓ(Y, s(X))], with ℓ(−1, v) = a · v and ℓ(1, v) = −log v for suitable a > 0. LSIF (Kanamori et al., 2009): argmin_{s ∈ S} E_{(X,Y)∼D}[ℓ(Y, s(X))], with ℓ(−1, v) = (1/2) · v² and ℓ(1, v) = −v. These are no ordinary losses.
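
A sketch of these two partial losses written as plain functions of (y, v), following the definitions on the slide; the scorer output v is assumed positive, and the function names are illustrative:

```python
# KLIEP and LSIF losses written as partial losses ell(y, v) (illustrative; v > 0 assumed).
import numpy as np

def kliep_loss(y, v, a=1.0):
    """ell(-1, v) = a * v and ell(1, v) = -log v, for a > 0."""
    return np.where(y == 1, -np.log(v), a * v)

def lsif_loss(y, v):
    """ell(-1, v) = v**2 / 2 and ell(1, v) = -v."""
    return np.where(y == 1, -v, 0.5 * v ** 2)
```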

  33. Existing DRE approaches as CPE. For u ∈ [0, 1], let Ψ_dr : u ↦ u / (1 − u). Lemma: the LSIF loss is strictly proper composite with link Ψ_dr; the KLIEP loss with a > 0 is strictly proper composite with link a⁻¹ · Ψ_dr.

  34. Existing DRE approaches as CPE. For u ∈ [0, 1], let Ψ_dr : u ↦ u / (1 − u). Lemma: the LSIF loss is strictly proper composite with link Ψ_dr; the KLIEP loss with a > 0 is strictly proper composite with link a⁻¹ · Ψ_dr. KLIEP and LSIF perform CPE in disguise!
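
Concretely, the link Ψ_dr and its inverse let a DRE score be read as a class probability and vice versa; a small sketch (function names are illustrative):

```python
# The DRE link Psi_dr(u) = u / (1 - u) and its inverse (illustrative).
def psi_dr(u):
    """Map a class probability u in [0, 1) to a density-ratio value."""
    return u / (1.0 - u)

def psi_dr_inv(v):
    """Map a non-negative DRE score v back to a class probability."""
    return v / (1.0 + v)

# e.g. a score of 3 corresponds to eta_hat = 0.75, and eta_hat = 0.75
# corresponds to a density ratio of 3.
assert abs(psi_dr(psi_dr_inv(3.0)) - 3.0) < 1e-12
```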

  35. Proof. For LSIF and KLIEP (with a = 1), ℓ′(1, v) / ℓ′(−1, v) = −1/v, so that...

  36. Proof. For LSIF and KLIEP (with a = 1), ℓ′(1, v) / ℓ′(−1, v) = −1/v, so that f(v) = 1 / (1 − ℓ′(1, v) / ℓ′(−1, v)) = v / (1 + v).

  37. Proof. For LSIF and KLIEP (with a = 1), ℓ′(1, v) / ℓ′(−1, v) = −1/v, so that f(v) = 1 / (1 − ℓ′(1, v) / ℓ′(−1, v)) = v / (1 + v) = Ψ_dr⁻¹(v).

  38. Proof. For LSIF and KLIEP (with a = 1), ℓ′(1, v) / ℓ′(−1, v) = −1/v, so that f(v) = 1 / (1 − ℓ′(1, v) / ℓ′(−1, v)) = v / (1 + v) = Ψ_dr⁻¹(v). Proper compositeness follows from (Reid and Williamson, 2010). The link Ψ_dr is especially suitable for DRE...
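
A quick symbolic check of the derivative ratio used in this proof, for the partial losses stated on the earlier slides; sympy is used purely for verification, and this is not the authors' code:

```python
# Verify ell'(1, v) / ell'(-1, v) = -1/v for the LSIF and KLIEP (a = 1) losses.
import sympy as sp

v = sp.symbols("v", positive=True)

# LSIF: ell(1, v) = -v, ell(-1, v) = v**2 / 2.
lsif_ratio = sp.diff(-v, v) / sp.diff(v ** 2 / 2, v)

# KLIEP (a = 1): ell(1, v) = -log v, ell(-1, v) = v.
kliep_ratio = sp.diff(-sp.log(v), v) / sp.diff(v, v)

assert sp.simplify(lsif_ratio + 1 / v) == 0
assert sp.simplify(kliep_ratio + 1 / v) == 0
```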

  39. Another view of Ψ_dr. Bayes' rule shows that the targets of DRE and CPE are linked: (∀x ∈ X) r(x) = p(x) / q(x)...
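
The transcript cuts off here; a hedged sketch of where this Bayes' rule computation presumably leads, using only the definitions above (in particular P(Y = 1) = 1/2):

```latex
% Bayes' rule link between the DRE target r and the CPE target \eta,
% assuming P(Y = 1) = 1/2 as in the earlier setup.
\begin{align*}
  \eta(x) &= P(Y = 1 \mid X = x)
           = \frac{\tfrac{1}{2}\, p(x)}{\tfrac{1}{2}\, p(x) + \tfrac{1}{2}\, q(x)}
           = \frac{p(x)}{p(x) + q(x)}, \\
  r(x)    &= \frac{p(x)}{q(x)}
           = \frac{\eta(x)}{1 - \eta(x)}
           = \Psi_{\mathrm{dr}}(\eta(x)).
\end{align*}
```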
