Positive-Unlabeled Classification under Class Prior Shift and Asymmetric Error (PowerPoint PPT Presentation)
Nontawat Charoenphakdee and Masashi Sugiyama


SLIDE 1

Positive-Unlabeled Classification under Class Prior Shift and Asymmetric Error

Nontawat Charoenphakdee 1,2 and Masashi Sugiyama 2,1

1 The University of Tokyo, 2 RIKEN AIP

SLIDE 2

Supervised binary classification (PN classification)

Positive and Negative data are given.

[Diagram: Data collection → Features (input) and Labels (output: +/-) → Machine learning → Binary Classifier. Note: no noise robustness.]

SLIDE 3

Positive-unlabeled classification (PU classification)

Positive and Unlabeled data are given.

[Diagram: Data collection → Features (input) and Labels (output: + only) → Machine learning → Binary Classifier. Note: no noise robustness.]

SLIDE 4

Why PU classification?

  • Unlabeled data are cheaper to obtain.
  • Sometimes, negative data are hard to describe.
  • In some real-world applications, collecting negative data is impossible.

Applications:

  • Bioinformatics (Yang+, 2012; Singh-Blom+, 2013; Ren+, 2015)
  • Text classification (Li+, 2003)
  • Time series classification (Nguyen+, 2011)
  • Medical diagnosis (Zuluaga+, 2011)
  • Remote-sensing classification (Li+, 2011)

SLIDE 5

Class prior shift

The ratio of positive to negative examples differs between the training and test data.

Examples:

  • Collecting unlabeled data from the internet.
  • Collecting unlabeled data from all users/patients/etc. for a personalized application.

[Figure: train vs. test class distributions (pos./neg.). The decision boundary is also shifted, leading to low accuracy!]

SLIDE 6

Class prior shift (cont.)

Existing PU classification work assumes that the class priors of the training and test data are the same (du Plessis+, 2014, 2015; Kiryo+, 2017). Existing class prior shift work is not applicable since it requires positive-negative data (Saerens+, 2002; du Plessis+, 2012).

SLIDE 7

PU classification under class prior shift

Given: Two sets of data, Positive and Unlabeled (observed). The test data are unobserved, and their class prior differs from the training class prior: class prior shift!

Q: Does class prior shift heavily degrade the performance?

SLIDE 8

Classifier may fail miserably under class prior shift…

Dataset   Accuracy (no shift)   Accuracy (shifted)   Accuracy (shifted, our method)
banana    90.1 (0.6)            82.3 (0.5)           87.9 (0.3)
ijcnn1    72.9 (0.4)            37.8 (0.7)           71.7 (0.3)
MNIST     86.0 (0.4)            69.8 (0.7)           82.5 (0.6)
susy      79.5 (0.5)            57.5 (0.9)           75.9 (0.5)
cod-rna   87.4 (0.6)            78.5 (0.6)           84.7 (0.4)
magic     76.7 (0.5)            60.6 (1.4)           79.0 (0.5)

Accuracy drops heavily under shift! Accuracy is reported as mean (std. error) over 10 trials with the density ratio method.
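The degradation above can be reproduced qualitatively on synthetic data. The sketch below (my own toy setup, not the paper's benchmarks) uses two overlapping 1-D Gaussian classes and compares the Bayes-optimal threshold for the training prior 0.7 against the threshold for the true test prior 0.3:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two overlapping 1-D Gaussian classes: positive at +1, negative at -1.
def sample(n, prior):
    n_pos = int(n * prior)
    x = np.concatenate([rng.normal(+1.0, 1.0, n_pos),
                        rng.normal(-1.0, 1.0, n - n_pos)])
    y = np.concatenate([np.ones(n_pos), -np.ones(n - n_pos)])
    return x, y

# Bayes-optimal threshold for class prior pi: classify + iff
# pi * N(x; +1, 1) > (1 - pi) * N(x; -1, 1)  <=>  x > 0.5 * log((1 - pi) / pi)
def threshold(pi):
    return 0.5 * np.log((1.0 - pi) / pi)

x_te, y_te = sample(100_000, prior=0.3)  # test data with shifted prior 0.3

acc_train_thr = np.mean(np.sign(x_te - threshold(0.7)) == y_te)  # tuned for 0.7
acc_test_thr  = np.mean(np.sign(x_te - threshold(0.3)) == y_te)  # tuned for 0.3
print(f"threshold for train prior 0.7: acc = {acc_train_thr:.3f}")
print(f"threshold for test  prior 0.3: acc = {acc_test_thr:.3f}")
```

Using the threshold tuned for the training prior costs several points of accuracy, mirroring the pattern in the table.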

SLIDE 9

Problem setting

  • Given: Two sets of data (Positive and Unlabeled) and the test class prior
  • Goal: Find a prediction function that minimizes the classification risk under the test distribution

SLIDE 10

Proposed methods

We propose two approaches for PU classification under class prior shift:

  • Risk minimization approach: learn a classifier based on the empirical risk minimization principle (Vapnik, 1998).
  • Density ratio approach:
    1. Estimate the density ratio of the positive and unlabeled densities.
    2. Use an appropriate threshold to classify.

Later, we will show that our methods are also applicable to PU classification with asymmetric error.

SLIDE 11

Risk minimization approach

Consider the classification risk under the test class prior. Using the relation between the unlabeled density and the class-conditional densities, we can rewrite the risk in terms of positive and unlabeled expectations. This is equivalent to existing methods (du Plessis+, 2015) when there is no class prior shift.

Since we have no access to the true distribution, we minimize the empirical error instead (Vapnik, 1998).
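The slide's equations did not survive extraction. For context, the standard no-shift PU risk rewrite (du Plessis+, 2014, 2015), which the shifted version builds on, is:

```latex
% Using the training marginal p(x) = \pi p_+(x) + (1-\pi) p_-(x):
\[
  R(g) \;=\; \pi\,\mathbb{E}_{p_+}\!\left[\ell(g(x))\right]
        \;+\; (1-\pi)\,\mathbb{E}_{p_-}\!\left[\ell(-g(x))\right]
\]
\[
  \phantom{R(g)} \;=\; \pi\,\mathbb{E}_{p_+}\!\left[\ell(g(x))\right]
        \;+\; \mathbb{E}_{p}\!\left[\ell(-g(x))\right]
        \;-\; \pi\,\mathbb{E}_{p_+}\!\left[\ell(-g(x))\right],
\]
% since (1-\pi) p_-(x) = p(x) - \pi p_+(x): the expectation over the
% (unavailable) negative density is replaced by positive and unlabeled ones.
```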

SLIDE 12

Surrogate losses for binary classification

Directly minimizing the 0-1 loss is difficult:

  • NP-hard, discontinuous, not differentiable (Ben-David+, 2003; Feldman+, 2012)

In practice, we minimize a surrogate loss (regularization can also be added).
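As a quick illustration, here are the 0-1 loss and two convex surrogates used later in the experiments; the double hinge form follows du Plessis+ (2015), and the exact scalings are hedged as illustrative:

```python
import numpy as np

# Margin z = y * g(x). Surrogate losses replace the hard-to-optimize 0-1 loss.
def zero_one(z):
    return (z <= 0).astype(float)        # discontinuous, not differentiable

def squared(z):
    return 0.25 * (z - 1.0) ** 2         # squared loss: smooth and convex

def double_hinge(z):
    # Double hinge loss (du Plessis+, 2015): convex, piecewise linear.
    return np.maximum(-z, np.maximum(0.0, 0.5 * (1.0 - z)))

z = np.linspace(-2, 2, 5)
for name, loss in [("0-1", zero_one), ("squared", squared), ("double hinge", double_hinge)]:
    print(name, loss(z))
```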

SLIDE 13

Density ratio estimation

Goal: Estimate the density ratio from two sets of data.

Applications: outlier detection (Hido+, 2011), change-point detection (Liu+, 2013), robot control (Hachiya+, 2009), event detection in images/movies/text (Yamanaka+, 2011; Matsugu+, 2011; Liu+, 2012), etc.

Please check this book to learn more about density ratio estimation (Sugiyama+, 2012).

Naïve approach: estimate the two densities separately, then divide. This does not work well (the estimation error is amplified by the division operation).
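The amplification effect is easy to see numerically. In this toy sketch (assumed Gaussian densities, an arbitrary error level `eps`), a small additive error on each density estimate produces a huge error in the ratio wherever the denominator density is small:

```python
import numpy as np

# Toy illustration: small additive errors in the density estimates blow up
# after division where the denominator density is close to zero.
x = np.linspace(-3, 3, 601)
p_num = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)          # numerator density
p_den = np.exp(-0.5 * (x - 1)**2) / np.sqrt(2 * np.pi)    # denominator density

eps = 0.01                                # same small error on both estimates
r_true = p_num / p_den
r_hat = (p_num + eps) / np.maximum(p_den - eps, 1e-12)    # perturbed ratio

print("max density error:", eps)
print("max ratio error:  ", np.max(np.abs(r_hat - r_true)))
```

Direct methods such as uLSIF (next slide) estimate the ratio as a whole and avoid this division step.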

SLIDE 14

Unconstrained least-squares importance fitting (uLSIF) (Kanamori+, 2012)

Goal: Estimate the density ratio.
How: Estimate it by minimizing a squared-loss objective.

Squared-loss decomposition: expanding the square yields a constant term that can be safely ignored, leaving an empirical minimization problem.

SLIDE 15

Unconstrained least-squares importance fitting (cont.) (Kanamori+, 2012)

Model: linear-in-parameter model with basis functions (e.g., Gaussian kernels).
Objective: regularized empirical squared loss, with a regularization parameter on an identity-matrix regularizer.

The global solution can be computed analytically. Parameter tuning (regularization, basis) can be done by cross-validation.
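A minimal sketch of the analytic uLSIF solution for the PU setting, estimating r(x) = p_pos(x)/p_unl(x). The function name `ulsif_fit`, the Gaussian-basis setup, and all hyperparameter values are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def ulsif_fit(x_pos, x_unl, centers, sigma=1.0, lam=0.1):
    """uLSIF sketch: minimize (1/2) w'H w - h'w + (lam/2) w'w, where
    H = mean of phi(x) phi(x)' over unlabeled x, h = mean of phi(x) over
    positive x. Global solution: w = (H + lam I)^{-1} h (Kanamori+, 2012)."""
    def phi(x):  # Gaussian basis functions centered at `centers`
        d2 = (x[:, None] - centers[None, :]) ** 2
        return np.exp(-d2 / (2 * sigma**2))

    Phi_u, Phi_p = phi(x_unl), phi(x_pos)
    H = Phi_u.T @ Phi_u / len(x_unl)
    h = Phi_p.mean(axis=0)
    w = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return lambda x: phi(x) @ w

rng = np.random.default_rng(0)
x_pos = rng.normal(0.0, 1.0, 500)                    # positive samples
x_unl = np.concatenate([rng.normal(0.0, 1.0, 350),   # unlabeled = 0.7 pos
                        rng.normal(3.0, 1.0, 150)])  #           + 0.3 neg
r_hat = ulsif_fit(x_pos, x_unl, centers=np.linspace(-3, 6, 20))
print(r_hat(np.array([0.0, 3.0])))  # larger near the positive mode
```

In practice `sigma` and `lam` would be chosen by cross-validation, as the slide notes.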

SLIDE 16

Density ratio approach

Consider the Bayes-optimal classifier of binary classification (no prior shift). We can rewrite it as a thresholding rule on the density ratio of the positive and unlabeled densities; another formulation uses the inverse ratio.

Q1: How should the rule be modified when class prior shift occurs?
Q2: Which formulation is preferable?

SLIDE 17

Q1: Density ratio approach (shift)

Consider the Bayes-optimal classifier under the shifted test class prior. Rewriting it in terms of the density ratio shows that simply modifying the classification threshold can solve this problem!
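The shifted threshold can be checked numerically. The formula below is my own reconstruction under the label-shift assumption (class-conditional densities fixed, priors pi_tr and pi_te differ); the sketch verifies that thresholding the true ratio reproduces the test-prior Bayes rule on a grid of Gaussian class-conditionals:

```python
import numpy as np

# Under label shift, the test-prior Bayes classifier can be written as
# thresholding r(x) = p_pos(x) / p_unl(x), where p_unl is the *training*
# marginal with prior pi_tr. Solving the inequality gives the threshold
#   theta = (1 - pi_te) / (pi_te * (1 - pi_tr) + (1 - pi_te) * pi_tr).
def shifted_threshold(pi_tr, pi_te):
    return (1 - pi_te) / (pi_te * (1 - pi_tr) + (1 - pi_te) * pi_tr)

def gauss(x, mu):  # N(x; mu, 1)
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

pi_tr, pi_te = 0.7, 0.3
x = np.linspace(-4, 4, 801)
p_pos, p_neg = gauss(x, +1), gauss(x, -1)
p_unl = pi_tr * p_pos + (1 - pi_tr) * p_neg       # training marginal

ratio_rule = (p_pos / p_unl) > shifted_threshold(pi_tr, pi_te)
bayes_rule = pi_te * p_pos > (1 - pi_te) * p_neg  # Bayes rule, test prior
print("rules agree everywhere:", np.all(ratio_rule == bayes_rule))

# Sanity check: with no shift, the threshold reduces to 1 / (2 * pi_tr).
print(np.isclose(shifted_threshold(0.7, 0.7), 1 / (2 * 0.7)))
```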

SLIDE 18

Q2: Difficulty of density ratio estimation

In general, a density ratio is unbounded: it diverges wherever the denominator density approaches zero. This raises issues of robustness and stability.

We show that the density ratio is bounded in PU classification.

SLIDE 19

Q2: Density ratio in PU

In PU classification, the density ratio is bounded: the ratio of the positive density to the unlabeled density is both lower and upper bounded, while the inverse ratio is unbounded from above.

Insight: estimating the bounded ratio is preferable. Our experimental results agree with this observation.
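The boundedness claim follows in one line from the mixture structure of the unlabeled density (my reconstruction of the slide's argument):

```latex
% Let p(x) = \pi p_+(x) + (1-\pi) p_-(x) be the unlabeled (marginal) density
% with training class prior \pi \in (0, 1). Then
\[
  r(x) \;=\; \frac{p_+(x)}{p(x)}
        \;=\; \frac{p_+(x)}{\pi p_+(x) + (1-\pi) p_-(x)}
        \;\le\; \frac{p_+(x)}{\pi p_+(x)} \;=\; \frac{1}{\pi},
\]
\[
  \text{so}\quad 0 \;\le\; r(x) \;\le\; \frac{1}{\pi},
  \qquad\text{whereas}\qquad
  \frac{p(x)}{p_+(x)} \;\ge\; \pi \;\text{ but is unbounded from above.}
\]
```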

SLIDE 20

Experiments: class prior shift (train 0.7 → test 0.3)

Datasets: banana, ijcnn1, MNIST, susy, cod-rna, magic

Methods:

  • Density ratio (𝒒-uLSIF)
  • Density ratio (𝒗-uLSIF)
  • Linear-in-input model (Lin): double hinge loss (DH-Lin), squared loss (Sq-Lin)
  • Kernel model (Ker): double hinge loss (DH-Ker), squared loss (Sq-Ker)

Parameter selection: (regularization, kernel width) by 5-fold cross-validation. We also investigated the case where a wrong test class prior is given.

Results are reported as mean and std. error of accuracy over 10 trials. Outperforming methods are bolded based on a one-sided t-test at significance level 5%. Dataset information and more experiments can be found in the paper.

SLIDE 21

Results: class prior shift

[Results table: traditional PU vs. our methods, both when a wrong test prior is given and when the correct test prior is given.]

Preferable method in our experiments: density ratio (𝒒-uLSIF).

SLIDE 22

PU classification with asymmetric error

  • Given: Two sets of data: Positive and Unlabeled.
  • Goal: Find a prediction function that minimizes a cost-sensitive risk, in which the two error types incur different costs.

This setting reduces to the symmetric error setting when the two costs are equal.

SLIDE 23

The equivalence of prior shift and asymmetric error

We can relate these problems based on the analysis of the Bayes-optimal classifier.
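A sketch of how the relation works; the cost symbols and priors below are generic notation of my own, not necessarily the paper's:

```latex
% Cost-sensitive Bayes rule under training prior \pi, with false-negative
% cost c_{\mathrm{FN}} and false-positive cost c_{\mathrm{FP}}: predict +1 iff
%   c_{\mathrm{FN}}\,\pi\,p_+(x) > c_{\mathrm{FP}}\,(1-\pi)\,p_-(x).
% Bayes rule under a shifted prior \pi' with symmetric costs: predict +1 iff
%   \pi'\,p_+(x) > (1-\pi')\,p_-(x).
% The two rules coincide whenever
\[
  \frac{\pi'}{1-\pi'} \;=\; \frac{c_{\mathrm{FN}}\,\pi}{c_{\mathrm{FP}}\,(1-\pi)},
\]
% so each asymmetric-error problem corresponds to a class-prior-shifted
% problem, and vice versa.
```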

SLIDE 24

Conclusion

Class prior shift may heavily degrade the performance of positive-unlabeled classification (PU classification).

  • Proposed two approaches for handling this problem effectively:

▪ Risk minimization approach
▪ Density ratio approach

  • Showed the equivalence of the class prior shift and asymmetric error problems in PU classification.

▪ Our methods are applicable to both problems.
▪ Also applicable when considering both problems simultaneously.

  • Poster: #31, May 2nd, 7:00-9:00 PM