Learning with Non-decomposable Performance Measures: Stochastic Optimization & Statistical Consistency (PowerPoint PPT presentation)

Harikrishna Narasimhan
Department of Computer Science and Automation, Indian Institute of Science, Bangalore


SLIDE 1

Learning with Non-decomposable Performance Measures:

Stochastic Optimization & Statistical Consistency

Harikrishna Narasimhan

Department of Computer Science and Automation, Indian Institute of Science, Bangalore

SLIDE 2

SLIDE 3

performance measure?

SLIDE 4

0-1 Classification Error

SLIDE 5

0-1 Classification Error

point-wise loss

SLIDE 6

Text Retrieval

F-measure

F = (2 × Precision × Recall) / (Precision + Recall)
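In code, the F-measure formula reads as follows (a minimal sketch with hypothetical example labels; binary labels assumed to be in {0, 1}):

```python
def f_measure(y_true, y_pred):
    """F1: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    pred_pos = sum(1 for p in y_pred if p == 1)    # predicted positives
    actual_pos = sum(1 for t in y_true if t == 1)  # actual positives
    if tp == 0 or pred_pos == 0 or actual_pos == 0:
        return 0.0
    precision = tp / pred_pos
    recall = tp / actual_pos
    return 2 * precision * recall / (precision + recall)

# Example: 2 of 3 predicted positives are correct; 2 of 4 actual positives found.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0]
print(f_measure(y_true, y_pred))  # precision = 2/3, recall = 1/2, F = 4/7
```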

SLIDE 7

Medical Diagnosis

Area Under the ROC Curve (AUC)

(ROC curve: True Positive Rate vs. False Positive Rate)
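One standard way to compute AUC is via its equivalent pairwise form: the fraction of (positive, negative) pairs that the scores rank correctly. A minimal sketch, assuming binary labels in {0, 1} and real-valued scores:

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly
    (ties count half); equivalent to the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p in pos for n in neg)
    return correct / (len(pos) * len(neg))

labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
print(auc(labels, scores))  # 3 of the 4 pos/neg pairs are ordered correctly: 0.75
```

The O(n²) double loop keeps the definition visible; a rank-based O(n log n) version gives the same value.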

SLIDE 8

Information Retrieval

Precision@K

No. of positive objects in Top-K positions
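Precision@K depends on the whole ranking, not on any single point. A minimal sketch (hypothetical labels/scores; labels in {0, 1}):

```python
def precision_at_k(labels, scores, k):
    """Fraction of the top-k highest-scored instances that are positive."""
    ranked = sorted(zip(scores, labels), reverse=True)  # best score first
    return sum(y for _, y in ranked[:k]) / k

labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(precision_at_k(labels, scores, 3))  # top-3 holds 2 positives: 2/3
```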

SLIDE 9

http://www.tagxedo.com

SLIDE 10

Non-decomposable Performance Measures

SLIDE 11

Non-decomposable Performance Measures

cannot be expressed as a sum of point-wise errors!
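A quick numeric check of non-decomposability (illustrative labels and predictions, not from the talk): the pooled 0-1 error of two batches is the point-weighted average of the batch errors, but the pooled F-measure is not:

```python
def error(batch):  # batch: list of (true, predicted) labels in {0, 1}
    return sum(t != p for t, p in batch) / len(batch)

def f1(batch):
    tp = sum(t == 1 and p == 1 for t, p in batch)
    pred_pos = sum(p == 1 for _, p in batch)
    act_pos = sum(t == 1 for t, _ in batch)
    if tp == 0:
        return 0.0
    prec, rec = tp / pred_pos, tp / act_pos
    return 2 * prec * rec / (prec + rec)

batch1 = [(1, 1)]              # one correctly predicted positive
batch2 = [(1, 0), (0, 1)]      # one false negative, one false positive
pooled = batch1 + batch2

# 0-1 error decomposes: pooled error equals the weighted average of batch errors.
assert abs(error(pooled) - (1 * error(batch1) + 2 * error(batch2)) / 3) < 1e-12

# F-measure does not: pooled F is 0.5, but the same weighted average of
# batch F values is (1 * 1.0 + 2 * 0.0) / 3 = 1/3.
assert f1(batch1) == 1.0 and f1(batch2) == 0.0 and f1(pooled) == 0.5
```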

SLIDE 12

Performance Measures

Algorithms

SLIDE 13

Performance Measures

Algorithms

Q1: Efficient Optimization?

SLIDE 14

Performance Measures

Algorithms

Q1: Efficient Optimization? Q2: Statistical Consistency?

SLIDE 15

Performance Measures

Algorithms

Efficient Learning Algorithms

Kar, P., Narasimhan, H. and Jain, P. “Online and Stochastic Gradient Methods for Non-decomposable Loss Functions”, NIPS 2014 (to appear).
Narasimhan, H. and Agarwal, S. “SVMpAUC-tight: A new support vector method for optimizing partial AUC based on a tight convex upper bound”, KDD 2013.
Narasimhan, H. and Agarwal, S. “A structural SVM based approach for optimizing partial AUC”, ICML 2013.

SLIDE 16

Performance Measures

Algorithms

Statistical Consistency of Learning Algorithms

Narasimhan, H. and Agarwal, S. “On the statistical consistency of plug-in classifiers for non-decomposable performance measures”, NIPS 2014 (to appear).
Narasimhan, H. and Agarwal, S. “On the relationship between binary classification, bipartite ranking, and binary class probability estimation”, NIPS 2013.
Menon, A., Narasimhan, H., Agarwal, S. and Chawla, S. “On the statistical consistency of algorithms for binary classification under class imbalance”, ICML 2013.

SLIDE 17

Part I

Stochastic Gradient Methods for Non-decomposable Performance Measures

SLIDE 18

Part I

Stochastic Gradient Methods for Non-decomposable Performance Measures

Part II

Statistical Consistency of Plug-in Methods for Non-decomposable Performance Measures

SLIDE 19

Part I

Stochastic Gradient Methods for Non-decomposable Performance Measures

Purushottam Kar (MSR, Bangalore) and Prateek Jain (MSR, Bangalore)

SLIDE 20

Stochastic Gradient Descent

convex (point-wise)

SLIDE 21

Stochastic Gradient Descent

convex (point-wise)

SLIDE 22

Stochastic Gradient Descent

convex (point-wise)

SLIDE 23

Stochastic Gradient Descent

convex (point-wise)

SLIDE 24

Stochastic Gradient Descent

convex (point-wise)

point-wise update
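As a concrete contrast for the later slides, here is a minimal sketch (not the talk's code) of vanilla SGD on a convex point-wise loss: the logistic loss of a linear model, where every update touches exactly one random sample. The toy dataset is an illustrative assumption.

```python
import math
import random

def sgd_logistic(data, dim, steps=1000, eta=0.1):
    """Vanilla SGD on the (convex, point-wise) logistic loss of a linear model."""
    w = [0.0] * dim
    for _ in range(steps):
        x, y = random.choice(data)                 # one (x, y) pair, y in {-1, +1}
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        g = -y / (1.0 + math.exp(margin))          # scalar factor of grad log(1+e^{-m})
        w = [wi - eta * g * xi for wi, xi in zip(w, x)]  # point-wise update
    return w

random.seed(0)
data = [((1.0, 1.0), 1), ((0.8, 1.2), 1), ((-1.0, -1.0), -1), ((-1.1, -0.7), -1)]
w = sgd_logistic(data, dim=2)
# The learned linear model separates the two clusters.
assert all((sum(wi * xi for wi, xi in zip(w, x)) > 0) == (y == 1) for x, y in data)
```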

SLIDE 25

Stochastic Gradient Descent

SLIDE 26

Stochastic Gradient Descent

Note on Proof: – Unbiased gradient estimates (estimated gradient = true gradient)

SLIDE 27

Stochastic Gradient Descent

Note on Proof: – Unbiased gradient estimates (estimated gradient = true gradient) – Point-wise arguments!

SLIDE 28

Stochastic Gradient Descent

convex function of all points!

SLIDE 29

Stochastic Gradient Descent

convex function of all points!

point-wise update

SLIDE 30

Previous Work

SLIDE 31

Previous Work

  • Stochastic methods for pair-wise performance measures (Zhao et al., 11; Kar et al., 13)

– Finite buffer sampling schemes

SLIDE 32

Previous Work

  • Stochastic methods for pair-wise performance measures (Zhao et al., 11; Kar et al., 13)

– Finite buffer sampling schemes

pair-wise decomposability

SLIDE 33

Previous Work

  • Stochastic methods for pair-wise performance measures (Zhao et al., 11; Kar et al., 13)

– Finite buffer sampling schemes

  • Online learning with non-additive regret (Rakhlin et al., 11)

– Algorithms provided are not tractable; instantiation to popular losses is not clear

pair-wise decomposability

SLIDE 34
  • Convex surrogates for non-decomposable measures
  • Mini-batch stochastic methods
  • Convergence guarantees
  • Experimental results

Part I

Stochastic Gradient Methods for Non-decomposable Performance Measures

Part II

Statistical Consistency of Plug-in Methods for Non-decomposable Performance Measures

SLIDE 35

non-decomposable performance measures … non-convex, discontinuous

convex

SLIDE 36

non-decomposable performance measures … non-convex, discontinuous … convex relaxation (Joachims, 2005)

convex

(SVMPerf Package)

SLIDE 37

F-measure

SLIDE 38

F-measure

(true labeling) (predicted labeling)

SLIDE 39

F-measure

Convex Surrogate Loss (SVMPerf: Joachims, 05)

SLIDE 40

F-measure

non-decomposable

Convex Surrogate Loss (SVMPerf: Joachims, 05)

SLIDE 41

Precision@K

  • No. of positive instances in the Top-K positions of the ranked list

SLIDE 42

Precision@K

  • No. of positive instances in the Top-K positions of the ranked list

non-decomposable

Convex Surrogate Loss (SVMPerf: Joachims, 05)

SLIDE 43

(Partial) AUC

SLIDE 44

(Partial) AUC

SLIDE 45

(Partial) AUC

non-decomposable

Convex Surrogate Loss (Narasimhan & Agarwal, 13)

SLIDE 46
  • Convex surrogates for non-decomposable measures
  • Mini-batch stochastic methods
  • Convergence guarantees
  • Experimental results

Part I

Stochastic Gradient Methods for Non-decomposable Performance Measures

Part II

Statistical Consistency of Plug-in Methods for Non-decomposable Performance Measures

SLIDE 47

x1 y1 x2 y2 x3 y3 x4 y4 …

point-wise updates?

SLIDE 48

x1 y1 x2 y2 x3 y3 x4 y4 …

SLIDE 49

x1 y1 x2 y2 x3 y3 x4 y4 …

SLIDE 50

1-Pass Mini-Batch

SLIDE 51

2-Pass Mini-Batch
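The 1-pass mini-batch idea can be sketched structurally. This is an illustrative skeleton, not the paper's algorithm: `batch_grad` stands in for the gradient of whatever non-decomposable surrogate is being optimized, evaluated jointly on a buffer of s points; the 2-pass variant would make an additional pass over the data.

```python
def one_pass_minibatch(stream, dim, s, eta, batch_grad):
    """Single pass over the data; each gradient step uses a buffer of s points
    jointly, since a non-decomposable surrogate has no per-point gradient."""
    w = [0.0] * dim
    buffer = []
    for x, y in stream:
        buffer.append((x, y))
        if len(buffer) == s:
            g = batch_grad(w, buffer)                   # uses ALL s points at once
            w = [wi - eta * gi for wi, gi in zip(w, g)]
            buffer = []                                 # 1-pass: each point seen once
    return w

# Mechanics check with a stand-in gradient that just records buffer sizes:
calls = []
def dummy_grad(w, buf):
    calls.append(len(buf))
    return [0.0] * len(w)

stream = [((1.0,), 1)] * 10
one_pass_minibatch(stream, dim=1, s=3, eta=0.1, batch_grad=dummy_grad)
assert calls == [3, 3, 3]  # 10 points, s = 3: three full buffers, one leftover point
```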

SLIDE 52
  • Convex surrogates for non-decomposable measures
  • Mini-batch stochastic methods
  • Convergence guarantees
  • Experimental results

Part I

Stochastic Gradient Methods for Non-decomposable Performance Measures

Part II

Statistical Consistency of Plug-in Methods for Non-decomposable Performance Measures

SLIDE 53

But first, some intuition

(‘s’ random points)

SLIDE 54

But first, some intuition

(‘s’ random points) (population of ‘n’ points)

SLIDE 55

But first, some intuition

How well does the loss evaluated on ‘s’ random points generalize to the entire population?

(‘s’ random points) (population of ‘n’ points)

SLIDE 56

But first, some intuition

How well does the loss evaluated on ‘s’ random points generalize to the entire population?

(‘s’ random points) (population of ‘n’ points)

SLIDE 57

Uniform Convergence

SLIDE 58

Uniform Convergence

decreases with mini-batch length ‘s’

SLIDE 59

Convergence Guarantee

SLIDE 60

Convergence Guarantee

decreases with mini-batch length ‘s’

SLIDE 61

Convergence Guarantee

decreases with mini-batch length ‘s’ (no. of updates)

SLIDE 62

Convergence Guarantee

decreases with mini-batch length ‘s’ (no. of updates)

increases with mini-batch length ‘s’

SLIDE 63

Instantiation to Specific Measures

SLIDE 64
  • Convex surrogates for non-decomposable measures
  • Mini-batch stochastic methods
  • Convergence guarantees
  • Experimental results

Part I

Stochastic Gradient Methods for Non-decomposable Performance Measures

Part II

Statistical Consistency of Plug-in Methods for Non-decomposable Performance Measures

SLIDE 65

Experimental Results

(Partial AUC)

Datasets: PPI, KDD Cup 08, IJCNN, Letter

SLIDE 66

Experimental Results

(Partial AUC)

Datasets: PPI, KDD Cup 08, IJCNN, Letter

Batch Methods

SLIDE 67

Experimental Results

(Precision@K)

Datasets: PPI, KDD Cup 08, IJCNN, Letter

Batch Method

SLIDE 68

Experimental Results

(Robustness to Epoch Lengths)

SLIDE 69
  • Convex surrogates for non-decomposable measures
  • Mini-batch stochastic methods
  • Convergence guarantees
  • Experimental results

Part I

Stochastic Gradient Methods for Non-decomposable Performance Measures

Part II

Statistical Consistency of Plug-in Methods for Non-decomposable Performance Measures

SLIDE 70

Part II

Statistical Consistency of Plug-in Methods for Non-decomposable Performance Measures

Rohit Vaish (IISc, Bangalore) and Shivani Agarwal (IISc, Bangalore)

SLIDE 71

Our goal:
In practice: (surrogate)

SLIDE 72

Our goal:
In practice: (surrogate)

SLIDE 73

Our goal:
In practice: (surrogate)

SLIDE 74

Our goal:
In practice: (surrogate)

Part I was about solving this problem for non-decomposable measures with linear predictors

SLIDE 75

Our goal:
In practice: (surrogate)

?

SLIDE 76

Does the given learning algorithm for a performance measure converge, in the limit of infinite training data, to the (Bayes) optimal predictor for the measure?

SLIDE 77

Statistical Consistency

Data Space / Model Space

SLIDE 78

Statistical Consistency

Data Space / Model Space

SLIDE 79

Statistical Consistency

Data Space / Model Space

regret

SLIDE 80

Statistical Consistency

Data Space / Model Space

regret

regret → 0 in probability?
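The consistency condition sketched on this slide can be written out explicitly. The notation below (a performance measure P_D for which higher values are better) is an assumption for illustration, not necessarily the talk's exact symbols:

```latex
% Regret of a classifier h under distribution D, for a performance
% measure \mathcal{P}_D where higher values are better:
\[
  \mathrm{regret}_D[h] \;=\; \mathcal{P}_D^{\,*} - \mathcal{P}_D[h],
  \qquad
  \mathcal{P}_D^{\,*} \;=\; \sup_{h'} \mathcal{P}_D[h'] .
\]
% Statistical consistency: the regret of the model h_S learned from
% sample S vanishes in probability as the sample grows:
\[
  \mathrm{regret}_D[h_S] \;\xrightarrow{\;P\;}\; 0
  \quad \text{as} \quad |S| \to \infty .
\]
```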

SLIDE 81

Statistical Consistency

Underlying (unknown) distribution D over instances and labels

SLIDE 82

Statistical Consistency

Underlying (unknown) distribution D over instances and labels

SLIDE 83

Statistical Consistency

Underlying (unknown) distribution D over instances and labels

SLIDE 84

Statistical Consistency

Underlying (unknown) distribution D over instances and labels

SLIDE 85

Statistical Consistency

SLIDE 86

Statistical Consistency

  • Decomposable measures

– 0-1 classification error: Zhang, 04; Bartlett et al., 06
– Cost-weighted classification error: Scott, 12
– Balanced classification error: Narasimhan et al., 13
– Logistic, squared, exponential losses (strictly proper losses): Reid & Williamson, 09, 10

  • Pair-wise measures

– AUC: Clemencon et al., 08; Agarwal et al., 14

SLIDE 87

Statistical Consistency

  • Decomposable measures

– 0-1 classification error: Zhang, 04; Bartlett et al., 06
– Cost-weighted classification error: Scott, 12
– Balanced classification error: Narasimhan et al., 13
– Logistic, squared, exponential losses (strictly proper losses): Reid & Williamson, 09, 10

  • Pair-wise measures

– AUC: Clemencon et al., 08; Agarwal et al., 14

  • General non-decomposable measure?
SLIDE 88
  • Plug-in methods for classification measures
  • Main consistency result
  • Experimental results
  • Proof intuition

Part I

Stochastic Gradient Methods for Non-decomposable Performance Measures

Part II

Statistical Consistency of Plug-in Methods for Non-decomposable Performance Measures

SLIDE 89

Plug-in Method

Training Set

SLIDE 90

Plug-in Method

Training Set → Class Probability Estimate

SLIDE 91

Plug-in Method

Training Set → Class Probability Estimate → Threshold Choice
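The plug-in recipe on these slides (estimate the class probability, then choose a threshold) can be sketched as follows. The AM-measure target and the identity probability estimator in the toy usage are illustrative assumptions, not necessarily the talk's exact choices:

```python
def am_measure(pairs):
    """AM: arithmetic mean of TPR and TNR over (true, predicted) label pairs."""
    pos = [p for t, p in pairs if t == 1]
    neg = [p for t, p in pairs if t == 0]
    tpr = sum(pos) / len(pos) if pos else 0.0
    tnr = sum(1 - p for p in neg) / len(neg) if neg else 0.0
    return (tpr + tnr) / 2

def plugin_classifier(eta_hat, s2, measure):
    """Threshold the class-probability estimate eta_hat at the value that
    maximizes `measure` on the validation sample s2."""
    grid = sorted({eta_hat(x) for x, _ in s2})       # candidate thresholds
    best_t, best_v = 0.5, float("-inf")
    for t in grid:
        preds = [(y, 1 if eta_hat(x) >= t else 0) for x, y in s2]
        v = measure(preds)
        if v > best_v:
            best_t, best_v = t, v
    return (lambda x: 1 if eta_hat(x) >= best_t else 0), best_t

# Toy usage: scores already calibrated, so eta_hat is the identity.
eta_hat = lambda x: x
s2 = [(0.9, 1), (0.7, 1), (0.8, 1), (0.6, 0), (0.2, 0), (0.1, 0)]
clf, t = plugin_classifier(eta_hat, s2, am_measure)
assert t == 0.7 and all(clf(x) == y for x, y in s2)
```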

SLIDE 92

Classification Measures

(2×2 confusion matrix over labels +1 and −1)
SLIDE 93

Classification Measures

(2×2 confusion matrix over labels +1 and −1, annotated with true positive rate (TPR) and true negative rate (TNR))
SLIDE 94

Classification Measures

SLIDE 95

AM-measure (1 - BER)

Classification Measures

SLIDE 96

Classification Measures

G-mean
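The two measures named on these slides, written in terms of the class-wise rates (a minimal sketch with hypothetical example labels): AM is the arithmetic mean of TPR and TNR, i.e. 1 − balanced error rate, and G-mean is their geometric mean.

```python
import math

def rates(pairs):  # pairs of (true, predicted) labels in {0, 1}
    pos = [p for t, p in pairs if t == 1]
    neg = [p for t, p in pairs if t == 0]
    tpr = sum(pos) / len(pos)          # true positive rate
    tnr = sum(1 - p for p in neg) / len(neg)  # true negative rate
    return tpr, tnr

def am(pairs):
    tpr, tnr = rates(pairs)
    return (tpr + tnr) / 2             # = 1 - balanced error rate

def g_mean(pairs):
    tpr, tnr = rates(pairs)
    return math.sqrt(tpr * tnr)

pairs = [(1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (0, 0)]
# TPR = 1/2, TNR = 3/4
print(am(pairs), g_mean(pairs))  # 0.625 and sqrt(0.375)
```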

SLIDE 97

F-measure

Classification Measures

where Prec = proportion of points with y = 1 among points with h(x) = 1

SLIDE 98

Classification Measures

non-decomposable

SLIDE 99

More formally,

Underlying (unknown) distribution D with:

SLIDE 100

More formally,

Underlying (unknown) distribution D with: proportion of positives
SLIDE 101

More formally,

Underlying (unknown) distribution D with: proportion of positives

Plug-in Method: (S1, S2) ~ Dⁿ

estimate: (using S1)
SLIDE 102

More formally,

Underlying (unknown) distribution D with: proportion of positives

Plug-in Method: (S1, S2) ~ Dⁿ

estimate: (using S1)
threshold: (using S2)

SLIDE 103
  • Plug-in methods for classification measures
  • Main consistency results
  • Experimental results
  • Proof intuition

Part I

Stochastic Gradient Methods for Non-decomposable Performance Measures

Part II

Statistical Consistency of Plug-in Methods for Non-decomposable Performance Measures

SLIDE 104

But first, some intuition

SLIDE 105

But first, some intuition

SLIDE 106

But first, some intuition

Optimal classifier for ?

SLIDE 107

But first, some intuition

0.5

Classification error

SLIDE 108

But first, some intuition

0.5

Classification error / General non-decomposable measure

?

SLIDE 109

Main Consistency Result

SLIDE 110

Main Consistency Result

(w.r.t. S1)

SLIDE 111

Main Consistency Result

(w.r.t. S1)

SLIDE 112

Main Consistency Result

(w.r.t. S1)

?

SLIDE 113

Instantiation to Specific Measures

(Menon et al., 13) (Ye et al., 12)

SLIDE 114

Instantiation to Specific Measures

(Menon et al., 13) (Ye et al., 12)

SLIDE 115

Instantiation to Specific Measures

SLIDE 116
  • Plug-in methods for classification measures
  • Main consistency result
  • Experimental results
  • Proof intuition

Part I

Stochastic Gradient Methods for Non-decomposable Performance Measures

Part II

Statistical Consistency of Plug-in Methods for Non-decomposable Performance Measures

SLIDE 117

Experimental Results

  • Synthetic data:

– Gaussian class conditionals, equal covariance, p = 0.1
– Optimal classifier can be computed by hand
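A sketch of this synthetic setup. The class means and unit variances below are illustrative assumptions; the slide specifies only Gaussian class conditionals with equal covariance and p = 0.1. With equal covariances, the class probability η(x) = P(y = 1 | x) is a logistic function of x, which is what makes the optimal classifier computable by hand:

```python
import math
import random

def sample(n, p=0.1, mu_pos=1.0, mu_neg=-1.0, seed=0):
    """Draw n points: y = 1 with probability p, x ~ N(mu_y, 1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = 1 if rng.random() < p else 0
        data.append((rng.gauss(mu_pos if y == 1 else mu_neg, 1.0), y))
    return data

def eta(x, p=0.1, mu_pos=1.0, mu_neg=-1.0):
    """P(y=1|x) by Bayes' rule with equal unit variances: logistic in x."""
    a = mu_pos - mu_neg                                   # log-odds slope
    b = (mu_neg ** 2 - mu_pos ** 2) / 2 + math.log(p / (1 - p))
    return 1.0 / (1.0 + math.exp(-(a * x + b)))

# eta crosses 1/2 exactly where the log-odds a*x + b vanish:
assert abs(eta(math.log(9) / 2) - 0.5) < 1e-9
```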

SLIDE 118

Experimental Results

  • Synthetic data:

– Gaussian class conditionals, equal covariance, p = 0.1
– Optimal classifier can be computed by hand

SLIDE 119

Experimental Results

  • Synthetic data:

– Gaussian class conditionals, equal covariance, p = 0.1
– Optimal classifier can be computed by hand

regret does not converge to zero
SLIDE 120

Experimental Results

  • Synthetic data:

– Gaussian class conditionals, equal covariance, p = 0.1
– Optimal classifier can be computed by hand

SLIDE 121
  • Plug-in methods for classification measures
  • Main consistency result
  • Experimental results
  • Proof intuition

Part I

Stochastic Gradient Methods for Non-decomposable Performance Measures

Part II

Statistical Consistency of Plug-in Methods for Non-decomposable Performance Measures

SLIDE 122

Proof Intuition

SLIDE 123

Proof Intuition

implies, for any fixed ‘c’

SLIDE 124

Proof Intuition

implies, for any fixed ‘c’

uniform convergence generalization bound for

SLIDE 125
  • Plug-in methods for classification measures
  • Main consistency result
  • Experimental results
  • Proof intuition

Part I

Stochastic Gradient Methods for Non-decomposable Performance Measures

Part II

Statistical Consistency of Plug-in Methods for Non-decomposable Performance Measures

SLIDE 126

Questions?