Linking losses for density ratio and class-probability estimation

Linking losses for density ratio and class-probability estimation
Aditya Krishna Menon, Cheng Soon Ong
NICTA and The Australian National University
0 / 34

Linking losses for density ratio and class-probability estimation
Aditya Krishna Menon, Cheng Soon Ong
Data61 and The Australian National University
0 / 34

Class-probability estimation (CPE)
From labelled instances, estimate the probability of an instance being positive, e.g. using logistic regression.
[Figure: scatter of labelled instances, with one instance annotated 0.6]
1 / 34

Density ratio estimation (DRE)
Given samples from densities p, q, estimate the density ratio r = p / q.
[Figure: densities p and q and their ratio r; horizontal axis from −5 to 5, vertical axis from 0 to 2]
2 / 34

Application: covariate shift adaptation
Marginal training distribution ≠ marginal test distribution.
[Figure: two sets of labelled instances]
Can overcome by reweighting training instances:
  use the ratio between test and train densities,
  e.g. weighted class-probability estimator.
3 / 34
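The reweighting idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that the train and test marginals are known 1-D Gaussians (N(0, 1) and N(1, 1)); in practice the ratio would itself have to be estimated. All function names here are hypothetical.

```python
import math
import random

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def importance_weight(x, train=(0.0, 1.0), test=(1.0, 1.0)):
    """Test-over-train density ratio at x (both marginals assumed known)."""
    return gaussian_pdf(x, *test) / gaussian_pdf(x, *train)

def weighted_risk(losses, weights):
    """Importance-weighted average of per-instance training losses."""
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)

# Toy check: reweighted training samples approximate a test-distribution average.
rng = random.Random(0)
xs_train = [rng.gauss(0.0, 1.0) for _ in range(20000)]
weights = [importance_weight(x) for x in xs_train]
# The identity stands in for a per-instance loss, so the target is E_test[X] = 1.
estimate = weighted_risk(xs_train, weights)
```

The unweighted training average of the same quantity would be close to 0, the training mean, rather than the test mean of 1.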

This paper
Formal link between CPE and DRE:
  existing DRE approaches → implicitly performing CPE
  CPE → Bregman minimisation for DRE
  new application of DRE losses to “top ranking”
[Diagram: CPE (Logistic, Exponential, Square Hinge) and DRE (KLIEP, LSIF), connected by links labelled Proper losses, Bregman, Ranking]
4 / 34

DRE and CPE: formally 5 / 34

Distributions for learning with binary labels
Fix an instance space $\mathcal{X}$ (e.g. $\mathbb{R}^n$).
Let $D$ be a distribution over $\mathcal{X} \times \{\pm 1\}$, with $\mathbb{P}(Y = 1) = \frac{1}{2}$ and
  $(P(x), Q(x)) = (\mathbb{P}(X = x \mid Y = 1), \mathbb{P}(X = x \mid Y = -1))$ (class conditionals),
  $(M(x), \eta(x)) = (\mathbb{P}(X = x), \mathbb{P}(Y = 1 \mid X = x))$ (marginal and class-probability function).
[Figure: two panels over x from −3 to 3, titled "Class conditionals" and "Marginal and class-probability function"]
6 / 34

Scorers, losses, risks
A scorer is any $s \colon \mathcal{X} \to \mathbb{R}$,
  e.g. linear scorer $s \colon x \mapsto \langle w, x \rangle$.
A loss is any $\ell \colon \{\pm 1\} \times \mathbb{R} \to \mathbb{R}_+$,
  e.g. logistic loss $\ell \colon (y, v) \mapsto \log(1 + e^{-yv})$.
The risk of scorer $s$ wrt loss $\ell$ and distribution $D$ is
  $L(s; D, \ell) = \mathbb{E}_{(X, Y) \sim D}[\ell(Y, s(X))]$,
  the average loss on a random sample.
[Figures: scatter of labelled instances; plot of $\ell(y, v)$ against $v$ from −3 to 3]
7 / 34
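As a rough illustration of these three definitions, the sketch below computes the empirical counterpart of the risk (the average loss over a finite sample) for a linear scorer and the logistic loss. The toy sample and the helper names are assumptions made for this example only.

```python
import math

def linear_scorer(w):
    """Return the scorer s: x -> <w, x> for a fixed weight vector w."""
    def s(x):
        return sum(wi * xi for wi, xi in zip(w, x))
    return s

def logistic_loss(y, v):
    """log(1 + exp(-y * v)), written in a numerically stable form."""
    m = -y * v
    return max(m, 0.0) + math.log1p(math.exp(-abs(m)))

def empirical_risk(scorer, loss, sample):
    """Average loss of the scorer over a list of (x, y) pairs."""
    return sum(loss(y, scorer(x)) for x, y in sample) / len(sample)

# Hypothetical toy sample of (instance, label) pairs with labels in {+1, -1}.
sample = [((1.0, 2.0), 1), ((-1.0, 0.5), -1), ((0.0, -1.0), -1)]
risk_at_zero = empirical_risk(linear_scorer((0.0, 0.0)), logistic_loss, sample)
```

With all-zero weights every score is 0, so each instance incurs loss log 2 and the empirical risk equals log 2 regardless of the labels.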

CPE versus DRE
Given samples $S \sim D^N$, with $D = (P, Q) = (M, \eta)$:
Class-probability estimation (CPE): estimate $\eta$, the class-probability function.
Density ratio estimation (DRE): estimate $r = p / q$, the class-conditional density ratio.
[Figures: scatter of labelled instances with one annotated 0.6; densities p and q and their ratio r over x from −5 to 5]
8 / 34

CPE approaches: proper composite losses
For suitable $\mathcal{S} \subseteq \mathbb{R}^{\mathcal{X}}$, find
  $\operatorname{argmin}_{s \in \mathcal{S}} L(s; D, \ell)$
where $\ell$ is such that, for some invertible $\Psi \colon [0, 1] \to \mathbb{R}$,
  $\operatorname{argmin}_{s \in \mathbb{R}^{\mathcal{X}}} L(s; D, \ell) = \Psi \circ \eta$;
estimate $\hat{\eta} = \Psi^{-1} \circ s$.
Such an $\ell$ is called strictly proper composite with link $\Psi$.
9 / 34

Examples of proper composite losses
Logistic loss: $\Psi^{-1} \colon v \mapsto \sigma(v)$
Exponential loss: $\Psi^{-1} \colon v \mapsto \sigma(2v)$
Square hinge loss: $\Psi^{-1} \colon v \mapsto \min(\max(0, (v + 1)/2), 1)$
[Figures: each loss $\ell(y, v)$ plotted against $v$ from −3 to 3]
10 / 34
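The inverse links above can be checked numerically. The sketch below assumes the standard forms of the exponential loss, $e^{-yv}$, and the square hinge loss, $\max(0, 1 - yv)^2$, which the slide names but does not write out. For a fixed class probability it minimises the conditional risk $\eta \, \ell(1, v) + (1 - \eta) \, \ell(-1, v)$ over $v$ and maps the minimiser back through $\Psi^{-1}$. The helper names are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# name -> (loss l(y, v), inverse link Psi^{-1}(v)); loss forms are assumed standard.
LOSSES = {
    "logistic": (lambda y, v: math.log1p(math.exp(-y * v)), sigmoid),
    "exponential": (lambda y, v: math.exp(-y * v), lambda v: sigmoid(2.0 * v)),
    "square_hinge": (
        lambda y, v: max(0.0, 1.0 - y * v) ** 2,
        lambda v: min(max(0.0, (v + 1.0) / 2.0), 1.0),
    ),
}

def minimise(f, lo, hi, iters=200):
    """Ternary search for the minimiser of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def recovered_eta(name, eta):
    """Minimise the conditional risk at class probability eta, then invert the link."""
    loss, inverse_link = LOSSES[name]
    v_star = minimise(lambda v: eta * loss(1, v) + (1.0 - eta) * loss(-1, v), -10.0, 10.0)
    return inverse_link(v_star)

recovered = {name: recovered_eta(name, 0.3) for name in LOSSES}
```

For each loss the recovered value should agree with the class probability 0.3 up to the accuracy of the search, which is what "strictly proper composite with link Ψ" asserts pointwise.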

DRE approaches: divergence minimisation
For suitable $\mathcal{S} \subseteq \mathbb{R}^{\mathcal{X}}$, find
KLIEP (Sugiyama et al., 2008): $\operatorname{argmin}_{s \in \mathcal{S}} \mathrm{KL}(p \,\|\, q \odot s)$ (constrained KL minimisation)
LSIF (Kanamori et al., 2009): $\operatorname{argmin}_{s \in \mathcal{S}} \mathbb{E}_{X \sim Q}\left[(r(X) - s(X))^2\right]$ (direct least squares minimisation)
11 / 34

Story so far
[Diagram: CPE (Logistic, Exponential, Square Hinge) and DRE (KLIEP, LSIF)]
12 / 34

Roadmap
We begin by showing existing DRE losses implicitly perform CPE.
[Diagram: CPE (Logistic, Exponential, Square Hinge) and DRE (KLIEP, LSIF), connected by a link labelled Proper losses]
12 / 34

Existing DRE losses are proper composite 13 / 34

Existing DRE approaches
Suppose $D = (P, Q)$.
KLIEP (Sugiyama et al., 2008): $\operatorname{argmin}_{s \in \mathcal{S}} \mathrm{KL}(p \,\|\, q \odot s)$
LSIF (Kanamori et al., 2009): $\operatorname{argmin}_{s \in \mathcal{S}} \mathbb{E}_{X \sim Q}\left[(r(X) - s(X))^2\right]$
14 / 34

Existing DRE approaches as loss minimisation
Suppose $D = (P, Q)$.
KLIEP (Sugiyama et al., 2008): $\operatorname{argmin}_{s \in \mathcal{S}} \mathbb{E}_{(X, Y) \sim D}[\ell(Y, s(X))]$ with
  $\ell(-1, v) = a \cdot v$ and $\ell(1, v) = -\log v$ for suitable $a > 0$.
LSIF (Kanamori et al., 2009): $\operatorname{argmin}_{s \in \mathcal{S}} \mathbb{E}_{(X, Y) \sim D}[\ell(Y, s(X))]$ with
  $\ell(-1, v) = \frac{1}{2} \cdot v^2$ and $\ell(1, v) = -v$.
These are no ordinary losses.
14 / 34

Existing DRE approaches as CPE
For $u \in [0, 1]$, let $\Psi_{\mathrm{dr}} \colon u \mapsto \frac{u}{1 - u}$.
Lemma. The LSIF loss is strictly proper composite with link $\Psi_{\mathrm{dr}}$. The KLIEP loss with $a > 0$ is strictly proper composite with link $a^{-1} \cdot \Psi_{\mathrm{dr}}$.
KLIEP and LSIF perform CPE in disguise!
15 / 34
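A quick numerical sanity check of the lemma, offered as a sketch rather than a proof: for a fixed class probability $\eta$, the minimiser of the conditional risk $\eta \, \ell(1, v) + (1 - \eta) \, \ell(-1, v)$ should equal $\Psi_{\mathrm{dr}}(\eta)$ for LSIF and $a^{-1} \cdot \Psi_{\mathrm{dr}}(\eta)$ for KLIEP. The losses are the ones given on the loss-minimisation slide; the search interval and helper names are assumptions for this example.

```python
import math

def lsif_loss(y, v):
    """LSIF loss: l(1, v) = -v and l(-1, v) = v^2 / 2."""
    return -v if y == 1 else 0.5 * v * v

def make_kliep_loss(a):
    """KLIEP loss for a > 0: l(1, v) = -log v and l(-1, v) = a * v."""
    def loss(y, v):
        return -math.log(v) if y == 1 else a * v
    return loss

def psi_dr(u):
    """The link u -> u / (1 - u)."""
    return u / (1.0 - u)

def minimise(f, lo, hi, iters=200):
    """Ternary search for the minimiser of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def risk_minimiser(loss, eta, lo=1e-6, hi=50.0):
    """Minimise the conditional risk at class probability eta over v > 0."""
    return minimise(lambda v: eta * loss(1, v) + (1.0 - eta) * loss(-1, v), lo, hi)

eta = 0.6
v_lsif = risk_minimiser(lsif_loss, eta)              # expected: psi_dr(0.6) = 1.5
v_kliep = risk_minimiser(make_kliep_loss(2.0), eta)  # expected: 1.5 / 2 = 0.75
```

Both expected values follow from setting the derivative of the conditional risk to zero: $-\eta + (1 - \eta) v = 0$ for LSIF and $-\eta / v + (1 - \eta) a = 0$ for KLIEP.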

Proof
For LSIF and KLIEP (with $a = 1$), $\frac{\ell'(1, v)}{\ell'(-1, v)} = -\frac{1}{v}$, so that
  $f(v) = \frac{1}{1 - \frac{\ell'(1, v)}{\ell'(-1, v)}} = \frac{v}{1 + v} = \Psi_{\mathrm{dr}}^{-1}(v)$.
Proper compositeness follows from (Reid and Williamson, 2010).
The link $\Psi_{\mathrm{dr}}$ is especially suitable for DRE...
16 / 34
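The intermediate steps of this proof can be spelled out as follows. This is a worked expansion using the LSIF and KLIEP losses from the loss-minimisation slide, with derivatives taken in $v$; it adds no assumption beyond $a = 1$ and $v > 0$.

```latex
\begin{align*}
\text{LSIF:}\quad & \ell'(1, v) = -1, \quad \ell'(-1, v) = v
  && \Rightarrow\ \frac{\ell'(1, v)}{\ell'(-1, v)} = -\frac{1}{v}, \\
\text{KLIEP } (a = 1):\quad & \ell'(1, v) = -\frac{1}{v}, \quad \ell'(-1, v) = 1
  && \Rightarrow\ \frac{\ell'(1, v)}{\ell'(-1, v)} = -\frac{1}{v}, \\
f(v) &= \frac{1}{1 + \frac{1}{v}} = \frac{v}{1 + v}, \\
u = \frac{v}{1 + v} &\iff v = \frac{u}{1 - u} = \Psi_{\mathrm{dr}}(u).
\end{align*}
```

The last line confirms that $f$ is the inverse of $\Psi_{\mathrm{dr}}$.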

Another view of $\Psi_{\mathrm{dr}}$
Bayes' rule shows targets of DRE and CPE are linked:
  $(\forall x \in \mathcal{X}) \quad r(x) \doteq \frac{p(x)}{q(x)}$
17 / 34
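One way to see the link concretely: with $\mathbb{P}(Y = 1) = \frac{1}{2}$, Bayes' rule gives $\eta(x) = p(x) / (p(x) + q(x))$, and hence $\eta(x) / (1 - \eta(x)) = p(x) / q(x) = r(x)$. The sketch below checks this identity numerically for two Gaussian class conditionals, which are chosen arbitrarily for illustration and are not taken from the slides.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def class_probability(x, p, q, prior=0.5):
    """eta(x) = P(Y = 1 | X = x) from class conditionals p, q via Bayes' rule."""
    positive = prior * p(x)
    return positive / (positive + (1.0 - prior) * q(x))

def psi_dr(u):
    """The link u -> u / (1 - u)."""
    return u / (1.0 - u)

# Assumed class conditionals for the example: p = N(1, 1), q = N(-1, 1).
def p(x):
    return gaussian_pdf(x, 1.0, 1.0)

def q(x):
    return gaussian_pdf(x, -1.0, 1.0)

points = [-2.0, 0.0, 0.7, 3.0]
ratios = [p(x) / q(x) for x in points]
linked = [psi_dr(class_probability(x, p, q)) for x in points]
```

Up to floating-point error, `linked` and `ratios` agree at every point, so applying $\Psi_{\mathrm{dr}}$ to the class-probability function yields the density ratio.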
