Learning from Corrupted Binary Labels via Class-Probability Estimation - PowerPoint PPT Presentation



SLIDE 1

Learning from Corrupted Binary Labels via Class-Probability Estimation

Aditya Krishna Menon, Brendan van Rooyen, Cheng Soon Ong, Robert C. Williamson

National ICT Australia and The Australian National University

SLIDE 2

Learning from binary labels

[Figure: instances labelled positive (+) and negative (−)]

SLIDE 3

Learning from binary labels

[Figure: instances labelled positive (+) and negative (−), with one unlabelled instance (?)]

SLIDE 4

Learning from binary labels

[Figure: instances labelled positive (+) and negative (−)]

SLIDE 5

Learning from noisy labels

[Figure: instances labelled positive (+) and negative (−)]

SLIDE 6

Learning from positive and unlabelled data

[Figure: two instances labelled positive (+), the rest unlabelled (?)]

SLIDE 7

Learning from binary labels

[Figure: instances labelled + and −]

[Diagram: nature → learner]

S ~ D^n

Goal: good classification wrt distribution D

SLIDE 8

Learning from corrupted labels

[Figure: instances labelled + and −]

[Diagram: nature → corruptor → learner]

S ~ D^n, corrupted to S̄ ~ D̄^n

Goal: good classification wrt (unobserved) distribution D

SLIDE 9

Paper summary

Can we learn a good classifier from corrupted samples?

SLIDE 10

Paper summary

Can we learn a good classifier from corrupted samples?

Prior work: in special cases (with a rich enough model), yes!

SLIDE 11

Paper summary

Can we learn a good classifier from corrupted samples?

Prior work: in special cases (with a rich enough model), yes!

    can treat samples as if uncorrupted! (Elkan and Noto, 2008), (Zhang and Lee, 2008), (Natarajan et al., 2013), (du Plessis and Sugiyama, 2014) ...

SLIDE 12

Paper summary

Can we learn a good classifier from corrupted samples?

Prior work: in special cases (with a rich enough model), yes!

    can treat samples as if uncorrupted! (Elkan and Noto, 2008), (Zhang and Lee, 2008), (Natarajan et al., 2013), (du Plessis and Sugiyama, 2014) ...

This work: unified treatment via class-probability estimation

    analysis for a general class of corruptions

SLIDE 13

Assumed corruption model

SLIDE 14

Learning from binary labels: distributions

Fix instance space X (e.g. R^N)
Underlying distribution D over X × {±1}
Constituent components of D:

    (P(x), Q(x), π) = (P[X = x | Y = 1], P[X = x | Y = −1], P[Y = 1])

SLIDE 15

Learning from binary labels: distributions

Fix instance space X (e.g. R^N)
Underlying distribution D over X × {±1}
Constituent components of D:

    (P(x), Q(x), π) = (P[X = x | Y = 1], P[X = x | Y = −1], P[Y = 1])
    (M(x), η(x)) = (P[X = x], P[Y = 1 | X = x])

SLIDE 16

Learning from corrupted binary labels

[Diagram: nature → corruptor → learner]

S ~ D^n, corrupted to S̄ ~ D̄^n

Samples from corrupted distribution D̄ = (P̄, Q̄, π̄)
Goal: good classification wrt (unobserved) distribution D

SLIDE 17

Learning from corrupted binary labels

[Diagram: nature → corruptor → learner]

S ~ D^n, corrupted to S̄ ~ D̄^n

Samples from corrupted distribution D̄ = (P̄, Q̄, π̄), where

    P̄ = (1 − α)·P + α·Q
    Q̄ = β·P + (1 − β)·Q

and π̄ is arbitrary; α, β are noise rates

    mutually contaminated distributions (Scott et al., 2013)

Goal: good classification wrt (unobserved) distribution D
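The corruption model above is easy to state as a sampling procedure. A minimal sketch: the function and helper names (`sample_contaminated`, `sample_p`, `sample_q`) are mine, and the 1-D Gaussian class-conditionals are an illustrative choice, not from the slides.

```python
import random

def sample_contaminated(n, alpha, beta, pi_bar, sample_p, sample_q):
    """Draw n samples (x, ybar) from the corrupted distribution
    (Pbar, Qbar, pi_bar), where
        Pbar = (1 - alpha)*P + alpha*Q
        Qbar = beta*P + (1 - beta)*Q.
    sample_p / sample_q each draw one instance from P / Q."""
    data = []
    for _ in range(n):
        ybar = +1 if random.random() < pi_bar else -1
        if ybar == +1:
            # corrupted positive: truly from P w.p. 1 - alpha, else from Q
            x = sample_p() if random.random() < 1 - alpha else sample_q()
        else:
            # corrupted negative: truly from P w.p. beta, else from Q
            x = sample_p() if random.random() < beta else sample_q()
        data.append((x, ybar))
    return data

# Illustrative class-conditionals: 1-D Gaussians centred at +1 and -1.
random.seed(0)
P = lambda: random.gauss(+1.0, 1.0)
Q = lambda: random.gauss(-1.0, 1.0)
S_corr = sample_contaminated(10000, alpha=0.2, beta=0.1, pi_bar=0.5,
                             sample_p=P, sample_q=Q)
pos = [x for x, y in S_corr if y == +1]
# Corrupted positives have mean (1-alpha)*(+1) + alpha*(-1) = 0.6 in expectation.
print(sum(pos) / len(pos))
```

The corrupted positives are visibly a mixture of the two clean classes, which is exactly what the mutually contaminated model asserts.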

SLIDE 18

Special cases

Label noise (labels flipped w.p. ρ):

    π̄ = (1 − 2ρ)·π + ρ
    α = π̄⁻¹·(1 − π)·ρ
    β = (1 − π̄)⁻¹·π·ρ

PU learning (observe M instead of Q):

    π̄ = arbitrary
    P̄ = 1·P + 0·Q
    Q̄ = M = π·P + (1 − π)·Q

[Figures: noisily labelled sample (+/−); positive and unlabelled sample (+/?)]

SLIDE 19

Corrupted class-probabilities

Structure of corrupted class-probabilities underpins analysis

SLIDE 20

Corrupted class-probabilities

Structure of corrupted class-probabilities underpins analysis

Proposition

For any D, D̄,

    η̄(x) = φ_{α,β,π̄}(η(x))

where φ_{α,β,π̄} is strictly monotone for fixed α, β, π̄.

SLIDE 21

Corrupted class-probabilities

Structure of corrupted class-probabilities underpins analysis

Proposition

For any D, D̄,

    η̄(x) = φ_{α,β,π̄}(η(x))

where φ_{α,β,π̄} is strictly monotone for fixed α, β, π̄.

Follows from Bayes' rule:

    η̄(x) / (1 − η̄(x)) = π̄/(1 − π̄) · P̄(x)/Q̄(x)

SLIDE 22

Corrupted class-probabilities

Structure of corrupted class-probabilities underpins analysis

Proposition

For any D, D̄,

    η̄(x) = φ_{α,β,π̄}(η(x))

where φ_{α,β,π̄} is strictly monotone for fixed α, β, π̄.

Follows from Bayes' rule:

    η̄(x) / (1 − η̄(x)) = π̄/(1 − π̄) · P̄(x)/Q̄(x)
                       = π̄/(1 − π̄) · [(1 − α)·P(x)/Q(x) + α] / [β·P(x)/Q(x) + (1 − β)].
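The odds identity above pins down φ_{α,β,π̄} completely. A small sketch (the function name `phi` is mine; parameter values are arbitrary illustrative choices) that computes η̄ from η and checks strict monotonicity on a grid, assuming α + β < 1:

```python
def phi(eta, alpha, beta, pi, pi_bar):
    """Corrupted class-probability eta_bar = phi_{alpha,beta,pi_bar}(eta).
    Assumes alpha + beta < 1 and 0 < eta < 1."""
    # Recover the likelihood ratio r = P(x)/Q(x) from the clean eta via
    # Bayes' rule: eta/(1-eta) = pi/(1-pi) * r.
    r = (eta / (1 - eta)) * ((1 - pi) / pi)
    # Corrupted odds, per the derivation on the slide.
    odds = (pi_bar / (1 - pi_bar)) * ((1 - alpha) * r + alpha) \
           / (beta * r + (1 - beta))
    return odds / (1 + odds)

# phi is strictly increasing in eta (checked on a grid).
etas = [i / 100 for i in range(1, 100)]
vals = [phi(e, alpha=0.2, beta=0.1, pi=0.4, pi_bar=0.5) for e in etas]
print(all(a < b for a, b in zip(vals, vals[1:])))  # → True
```

Monotonicity holds because the corrupted odds have derivative proportional to (1 − α)(1 − β) − αβ = 1 − α − β > 0 in the likelihood ratio r.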

SLIDE 23

Corrupted class-probabilities: special cases

Label noise:

    η̄(x) = (1 − 2ρ)·η(x) + ρ
    ρ unknown
    (Natarajan et al., 2013)

PU learning:

    η̄(x) = π̄·η(x) / (π̄·η(x) + (1 − π̄)·π)
    π unknown
    (Ward et al., 2009)

SLIDE 24

Roadmap

[Diagram: nature (D) → corruptor (D̄) → class-prob estimator (η̂, kernel logistic regression) → classifier]

SLIDE 25

Roadmap

Exploit the monotone relationship between η and η̄

[Diagram: nature (D) → corruptor (D̄) → class-prob estimator (η̂, kernel logistic regression) → classifier (?)]

SLIDE 26

Classification with noise rates

SLIDE 27

Class-probabilities and classification

Many classification measures optimised by sign(η(x) − t):

    0-1 error → t = 1/2
    Balanced error → t = π
    F-score → optimal t depends on D
        (Lipton et al., 2014), (Koyejo et al., 2014)

SLIDE 28

Class-probabilities and classification

Many classification measures optimised by sign(η(x) − t):

    0-1 error → t = 1/2
    Balanced error → t = π
    F-score → optimal t depends on D
        (Lipton et al., 2014), (Koyejo et al., 2014)

We can relate this to thresholding of η̄!

SLIDE 29

Corrupted class-probabilities and classification

By the monotone relationship,

    η(x) > t  ⟺  η̄(x) > φ_{α,β,π̄}(t).

Threshold η̄ at φ_{α,β,π̄}(t) → optimal classification on D
Can translate into a regret bound, e.g. for 0-1 loss
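The threshold correspondence can be verified numerically. A sketch, with φ restated from the earlier slides so the snippet is self-contained (parameter values are arbitrary illustrative choices):

```python
def phi(eta, alpha, beta, pi, pi_bar):
    # eta_bar = phi_{alpha,beta,pi_bar}(eta), via the odds derivation;
    # assumes alpha + beta < 1 and 0 < eta < 1.
    r = (eta / (1 - eta)) * ((1 - pi) / pi)
    odds = (pi_bar / (1 - pi_bar)) * ((1 - alpha) * r + alpha) \
           / (beta * r + (1 - beta))
    return odds / (1 + odds)

params = dict(alpha=0.25, beta=0.1, pi=0.4, pi_bar=0.55)
t = 0.5                     # clean threshold, e.g. for 0-1 error
t_corr = phi(t, **params)   # corrected threshold for corrupted probabilities

# Thresholding the clean eta at t agrees with thresholding the corrupted
# phi(eta) at phi(t), on a grid of clean class-probabilities.
agree = all((e > t) == (phi(e, **params) > t_corr)
            for e in [i / 1000 for i in range(1, 1000)])
print(agree)  # → True
```

This is exactly why a class-probability estimator trained on corrupted data suffices: only the threshold needs correcting.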

SLIDE 30

Story so far

Classification scheme requires: η̄, t, α, β, π̄

[Diagram: nature (D) → corruptor (D̄) → class-prob estimator (η̂) → classifier sign(η̂(x) − φ_{α̂,β̂,π̂}(t)); a noise oracle supplies α̂, β̂, π̂]

SLIDE 31

Story so far

Classification scheme requires:

    η̄ → class-probability estimation
    t
    α, β, π̄

[Diagram: nature (D) → corruptor (D̄) → class-prob estimator (η̂, kernel logistic regression) → classifier sign(η̂(x) − φ_{α̂,β̂,π̂}(t)); a noise oracle supplies α̂, β̂, π̂]

SLIDE 32

Story so far

Classification scheme requires:

    η̄ → class-probability estimation
    t → if unknown, alternate approach (see poster)
    α, β, π̄

[Diagram: as above]

SLIDE 33

Story so far

Classification scheme requires:

    η̄ → class-probability estimation
    t → if unknown, alternate approach (see poster)
    α, β, π̄ → can we estimate these?

[Diagram: nature (D) → corruptor (D̄) → class-prob estimator (η̂, kernel logistic regression) → classifier sign(η̂(x) − φ_{α̂,β̂,π̂}(t)); noise estimator supplying α̂, β̂, π̂: ?]

SLIDE 34

Estimating noise rates: some bad news

π strongly non-identifiable!

    π̄ allowed to be arbitrary (e.g. PU learning)

α, β non-identifiable without assumptions (Scott et al., 2013)

Can we estimate α, β under assumptions?

SLIDE 35

Weak separability assumption

Assume that D is "weakly separable":

    min_{x∈X} η(x) = 0
    max_{x∈X} η(x) = 1

i.e. ∃ deterministically +'ve and −'ve instances
weaker than full separability

SLIDE 36

Weak separability assumption

Assume that D is "weakly separable":

    min_{x∈X} η(x) = 0
    max_{x∈X} η(x) = 1

i.e. ∃ deterministically +'ve and −'ve instances
weaker than full separability

Assumed range of η constrains observed range of η̄!

SLIDE 37

Estimating noise rates

Proposition

Pick any weakly separable D. Then, for any D̄,

    α = η̄_min·(η̄_max − π̄) / (π̄·(η̄_max − η̄_min))
    β = (1 − η̄_max)·(π̄ − η̄_min) / ((1 − π̄)·(η̄_max − η̄_min))

where

    η̄_min = min_{x∈X} η̄(x)
    η̄_max = max_{x∈X} η̄(x)

α, β can be estimated from corrupted data alone
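The estimators in the proposition are straightforward to implement once η̄_min, η̄_max, and π̄ are known (or estimated from the range of η̂). A sketch (the function name is mine), sanity-checked against the symmetric label-noise case of slide 18, where η̄_min = ρ and η̄_max = 1 − ρ:

```python
def estimate_noise_rates(eta_bar_min, eta_bar_max, pi_bar):
    """alpha, beta from the proposition; assumes D is weakly separable,
    so that eta_bar_min / eta_bar_max are attained corrupted
    class-probabilities."""
    spread = eta_bar_max - eta_bar_min
    alpha = eta_bar_min * (eta_bar_max - pi_bar) / (pi_bar * spread)
    beta = (1 - eta_bar_max) * (pi_bar - eta_bar_min) / ((1 - pi_bar) * spread)
    return alpha, beta

# Symmetric label noise with flip rate rho and clean base rate pi:
# eta_bar = (1 - 2 rho) eta + rho, so eta_bar_min = rho,
# eta_bar_max = 1 - rho, and pi_bar = (1 - 2 rho) pi + rho.
pi, rho = 0.3, 0.2
pi_bar = (1 - 2 * rho) * pi + rho
a, b = estimate_noise_rates(rho, 1 - rho, pi_bar)
# Slide 18 gives alpha = rho (1 - pi) / pi_bar, beta = rho pi / (1 - pi_bar).
print(abs(a - rho * (1 - pi) / pi_bar) < 1e-12,
      abs(b - rho * pi / (1 - pi_bar)) < 1e-12)  # → True True
```

In practice η̄_min and η̄_max are replaced by the extremes of the estimated η̂ over the corrupted sample, which is why reliable class-probability estimation matters.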

SLIDE 38

Estimating noise rates: special cases

Label noise:

    ρ = 1 − η̄_max = η̄_min
    π = (π̄ − η̄_min) / (η̄_max − η̄_min)

PU learning:

    α = 0
    β = π = (1 − η̄_max)/η̄_max · π̄/(1 − π̄)

(Elkan and Noto, 2008), (Liu and Tao, 2014)
c.f. mixture proportion estimate of (Scott et al., 2013)
In these cases, π can be estimated as well

SLIDE 39

Story so far

Optimal classification in general requires α, β, π̄

[Diagram: nature (D) → corruptor (D̄) → class-prob estimator (η̂, kernel logistic regression) → classifier sign(η̂(x) − φ_{α̂,β̂,π̂}(t)); noise estimator obtains α̂, β̂, π̂ from the range of η̂]

SLIDE 40

Story so far

Optimal classification in general requires α, β, π̄

    when does φ_{α,β,π̄}(t) not depend on α, β, π̄?

[Diagram: nature (D) → corruptor (D̄) → class-prob estimator (η̂, kernel logistic regression) → classifier sign(η̂(x) − φ_{α̂,β̂,π̂}(t)); noise estimator obtains α̂, β̂, π̂ from the range of η̂]

SLIDE 41

Classification without noise rates

SLIDE 42

Balanced error (BER) of classifier

Balanced error (BER) of a classifier f : X → {±1} is:

    BER_D(f) = (FPR_D(f) + FNR_D(f)) / 2

for false positive and negative rates FPR_D(f), FNR_D(f)

    average classification performance on each class
    optimal classifier is sign(η(x) − π)

SLIDE 43

BER "immunity" under corruption

Proposition (c.f. (Zhang and Lee, 2008))

For any D, D̄, and classifier f : X → {±1},

    BER_D̄(f) = (1 − α − β)·BER_D(f) + (α + β)/2

SLIDE 44

BER "immunity" under corruption

Proposition (c.f. (Zhang and Lee, 2008))

For any D, D̄, and classifier f : X → {±1},

    BER_D̄(f) = (1 − α − β)·BER_D(f) + (α + β)/2

BER-optimal classifiers on clean and corrupted distributions coincide:

    sign(η(x) − π) = sign(η̄(x) − π̄)

SLIDE 45

BER "immunity" under corruption

Proposition (c.f. (Zhang and Lee, 2008))

For any D, D̄, and classifier f : X → {±1},

    BER_D̄(f) = (1 − α − β)·BER_D(f) + (α + β)/2

BER-optimal classifiers on clean and corrupted distributions coincide:

    sign(η(x) − π) = sign(η̄(x) − π̄)

Minimise clean BER → don't need to know corruption rates!

    threshold on η̄ does not need α, β, π̄
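The proposition can be checked at the population level directly from the corruption model of slide 17; the class-wise error rates and noise rates below are arbitrary illustrative values, not from the slides.

```python
# Pick clean class-wise error rates for some classifier f, and noise
# rates with alpha + beta < 1.
alpha, beta = 0.2, 0.1
fpr, fnr = 0.15, 0.25  # FPR_D(f), FNR_D(f)

# From the corruption model Pbar = (1-alpha) P + alpha Q and
# Qbar = beta P + (1-beta) Q:
#   FNR_Dbar(f) = Pbar(f = -1) = (1 - alpha) FNR_D + alpha (1 - FPR_D)
#   FPR_Dbar(f) = Qbar(f = +1) = beta (1 - FNR_D) + (1 - beta) FPR_D
fnr_bar = (1 - alpha) * fnr + alpha * (1 - fpr)
fpr_bar = beta * (1 - fnr) + (1 - beta) * fpr

ber_corrupted = (fpr_bar + fnr_bar) / 2
ber_affine = (1 - alpha - beta) * (fpr + fnr) / 2 + (alpha + beta) / 2
print(ber_corrupted, ber_affine)  # the two quantities agree
```

Since the corrupted BER is an increasing affine transform of the clean BER, ranking classifiers by corrupted BER ranks them by clean BER, with no knowledge of α or β needed.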

SLIDE 46

BER "immunity" & class-probability estimation

Trivially, we also have

    regret^BER_D(f) = (1 − α − β)⁻¹ · regret^BER_D̄(f).

i.e. good corrupted BER ⟹ good clean BER

    can make regret^BER_D̄(f) → 0 by class-probability estimation

Similar result for AUC (see poster)

SLIDE 47

BER "immunity" under corruption: proof

From (Scott et al., 2013),

    [FPR_D̄(f), FNR_D̄(f)] = [FPR_D(f), FNR_D(f)] · [1−β, −α; −β, 1−α] + [β, α],

SLIDE 48

BER "immunity" under corruption: proof

From (Scott et al., 2013),

    [FPR_D̄(f), FNR_D̄(f)] = [FPR_D(f), FNR_D(f)] · [1−β, −α; −β, 1−α] + [β, α],

and

    (1, 1)^T is an eigenvector of [1−β, −α; −β, 1−α], with eigenvalue 1 − α − β
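The eigenvector claim is a two-line check; the matrix below is written in the row-vector convention of the proof, and the noise rates are arbitrary illustrative values.

```python
alpha, beta = 0.2, 0.1

# Corruption matrix from the proof, with alpha + beta < 1.
M = [[1 - beta, -alpha],
     [-beta, 1 - alpha]]

# Applying M to (1, 1)^T sums each row; both sums equal 1 - alpha - beta,
# so (1, 1)^T is an eigenvector with eigenvalue 1 - alpha - beta.
v = [sum(row) for row in M]
print(v)
```

Summing FPR and FNR projects onto this eigenvector, which is why BER, and only BER among class-wise-rate averages of this form, transforms affinely under corruption.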
SLIDE 49

Are other measures "immune"?

BER is the only (non-trivial) performance measure for which:

    corrupted risk = affine transform of clean risk
        because of the eigenvector interpretation
    corrupted threshold is independent of α, β, π̄
        because of the nature of φ_{α,β,π̄}

    (see poster)

Other performance measures → need (one of) α, β, π̄

SLIDE 50

Experiments

SLIDE 51

Experimental setup

Injected label noise on UCI datasets
Estimate corrupted class-probabilities via neural network

    well-specified if D is linearly separable:

        η(x) = σ(⟨w, x⟩) ⟹ η̄(x) = a·σ(⟨w, x⟩) + b

Evaluate:

    reliability of noise estimates
    BER performance on clean test set
        corrupted data used for training and validation
    0-1 performance on clean test set (see poster)

SLIDE 52

Experimental results: noise rates

Estimated noise rates are generally reliable

[Figure: bias of noise estimate (mean and median) vs ground-truth noise rate in {0.1, 0.2, 0.3, 0.4, 0.49}, on segment, spambase, and mnist]

SLIDE 53

Experimental results: BER immunity

Generally, low observed degradation in BER

Dataset    Noise                     1 − AUC (%)     BER (%)
segment    None                      0.00 ± 0.00     0.00 ± 0.00
           (ρ+, ρ−) = (0.1, 0.0)     0.00 ± 0.00     0.01 ± 0.00
           (ρ+, ρ−) = (0.1, 0.2)     0.02 ± 0.01     0.90 ± 0.08
           (ρ+, ρ−) = (0.2, 0.4)     0.03 ± 0.01     3.24 ± 0.20
spambase   None                      2.49 ± 0.00     6.93 ± 0.00
           (ρ+, ρ−) = (0.1, 0.0)     2.67 ± 0.02     7.10 ± 0.03
           (ρ+, ρ−) = (0.1, 0.2)     3.01 ± 0.03     7.66 ± 0.05
           (ρ+, ρ−) = (0.2, 0.4)     4.91 ± 0.09     10.52 ± 0.13
mnist      None                      0.92 ± 0.00     3.63 ± 0.00
           (ρ+, ρ−) = (0.1, 0.0)     0.95 ± 0.01     3.56 ± 0.01
           (ρ+, ρ−) = (0.1, 0.2)     0.97 ± 0.01     3.63 ± 0.02
           (ρ+, ρ−) = (0.2, 0.4)     1.17 ± 0.02     4.06 ± 0.03

SLIDE 54

Conclusion

SLIDE 55

Learning from corrupted binary labels

Monotone relationship η̄(x) = φ_{α,β,π̄}(η(x)) facilitates:

[Diagram: nature (D) → corruptor (D̄) → class-prob estimator (η̂, kernel logistic regression) → classifier sign(η̂(x) − φ_{α̂,β̂,π̂}(t)); noise estimator obtains α̂, β̂, π̂ from the range of η̂ (omit for BER)]

SLIDE 56

Future work

Better noise estimators in special cases?

    c.f. (Elkan and Noto, 2008) when D separable

Fusion with the "loss transfer" approach (Natarajan et al., 2013)

    assumes noise rates known; better for misspecified models?
        c.f. non-robustness of convex surrogate minimisation

SLIDE 57

Thanks!

Drop by the poster for more (Paper ID 69)