SLIDE 1

Learning From Non-iid Data: Fast Rates for the One-vs-All Multiclass Plug-in Classifiers

Vu Dinh (1), Lam Si Tung Ho (2), Nguyen Viet Cuong (3), Duy Duc Nguyen (4), Binh T. Nguyen (5)

(1) Purdue University
(2) University of California, Los Angeles
(3) National University of Singapore
(4) University of Wisconsin-Madison
(5) University of Science, Vietnam

V.Dinh, L.S.T.Ho, N.V.Cuong, D.D.Nguyen, B.T.Nguyen Fast Rates for One-vs-All Multiclass Plug-in Classifiers 1/6

SLIDE 2

Introduction

Fast and super fast learning rates for plug-in classifiers
  - Multiclass setting
  - Non-iid data

Non-iid data
  - Exponentially strongly mixing data
  - Converging drifting data

Generalization of previous results for the binary-class, iid case

The algorithm does not need to know the exponent in the margin assumption

The rates have nice properties
  - They do not depend on the number of classes
  - They retain the optimal learning rate for the Hölder class in the iid case

SLIDE 3

Assumptions

1. All label distribution functions $\eta_j(X)$ are Hölder continuous with exponent $\beta$.

2. The marginal distribution $P_X$ satisfies the strong density assumption: its density has positive upper and lower bounds on a compact regular set of $\mathbb{R}^d$.

3. $P$ satisfies the multiclass margin assumption.
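To make the role of the label distribution functions concrete, here is a minimal sketch of a one-vs-all plug-in rule: estimate each $\eta_j$ by regressing the indicator of class j on X, then predict the class with the largest estimate. The slides do not specify the regression estimator, so a box-kernel local average is swapped in here purely for illustration; the names `estimate_eta`, `plug_in_classify`, and the `bandwidth` parameter are hypothetical and not taken from the presentation.

```python
import numpy as np

def estimate_eta(X_train, y_train, X_query, num_classes, bandwidth):
    """Local-average estimates of eta_j(x) = P(Y = j | X = x) for every class j.

    Assumption: a box kernel of radius `bandwidth` stands in for whatever
    regression estimator the authors actually use.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train, dtype=int)

    # Pairwise Euclidean distances, shape (num_query, num_train).
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    weights = (dists <= bandwidth).astype(float)

    # One-vs-all targets: column j is the indicator 1{Y = j}.
    onehot = np.eye(num_classes)[y_train]

    sums = weights @ onehot                      # (num_query, num_classes)
    counts = weights.sum(axis=1, keepdims=True)  # (num_query, 1)

    # Query points with no neighbour in the window get all-zero estimates.
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

def plug_in_classify(X_train, y_train, X_query, num_classes, bandwidth):
    """Plug-in rule: predict the class whose estimated eta_j is largest."""
    eta_hat = estimate_eta(X_train, y_train, X_query, num_classes, bandwidth)
    return np.argmax(eta_hat, axis=1)

# Toy usage on three well-separated one-dimensional clusters.
X_demo = [[0.0], [0.1], [1.0], [1.1], [2.0], [2.1]]
y_demo = [0, 0, 1, 1, 2, 2]
demo_pred = plug_in_classify(X_demo, y_demo, [[0.05], [1.05], [2.05]], 3, 0.3)
```

The sketch ignores the non-iid structure of the sample entirely; it only shows the shape of the decision rule, not the estimator or tuning behind the stated rates.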

SLIDE 4

Fast Rates for Exponentially Strongly Mixing Data

Theorem. We can construct a one-vs-all multiclass plug-in classifier $f_n$ that satisfies: there exist $C_1, C_2 > 0$ such that for all large enough $n$,

$$\mathbb{E} R(f_n) - R(f^*) \le C_1 \, n^{-C_2 \beta (1+\alpha)/(2\beta+d)}.$$

$\alpha$: constant in the margin assumption
$\beta$: exponent in the Hölder continuity assumption
$d$: dimension of the input space $\mathbb{R}^d$

The expected risk of the plug-in classifier converges to the optimal risk at rate $n^{-C_2 \beta (1+\alpha)/(2\beta+d)}$.

Fast rate when $C_2 \beta (1+\alpha)/(2\beta+d) > 1/2$
Super fast rate when $C_2 \beta (1+\alpha)/(2\beta+d) > 1$
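As a small worked step, the two conditions can be rearranged into thresholds on the margin constant. This uses only that $C_2 > 0$, $\beta > 0$, and $2\beta + d > 0$; the value of $C_2$ is not given on the slide, so the thresholds are stated in terms of it.

$$C_2 \frac{\beta(1+\alpha)}{2\beta+d} > \frac{1}{2} \iff \alpha > \frac{2\beta+d}{2 C_2 \beta} - 1, \qquad C_2 \frac{\beta(1+\alpha)}{2\beta+d} > 1 \iff \alpha > \frac{2\beta+d}{C_2 \beta} - 1.$$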

SLIDE 5

Fast Rates for Drifting Data

Theorem. We can construct a one-vs-all multiclass plug-in classifier $f_n$ that satisfies: there exists $C > 0$ such that for all large enough $n$,

$$\mathbb{E} R(f_n) - R(f^*) \le C \, n^{-\beta (1+\alpha)/(2\beta+d)}.$$

The expected risk of the plug-in classifier converges to the optimal risk at rate $n^{-\beta (1+\alpha)/(2\beta+d)}$.

Fast rate when $\beta (1+\alpha)/(2\beta+d) > 1/2$
Super fast rate when $\beta (1+\alpha)/(2\beta+d) > 1$
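Because the drifting-data exponent involves no unknown constant, it can be evaluated directly. The sketch below computes β(1+α)/(2β+d) and reports which regime it falls in. The function names, the input checks (α ≥ 0, β > 0, d ≥ 1), and the label "neither" for exponents of at most 1/2 are assumptions made for illustration, not terminology from the slides.

```python
def rate_exponent(alpha: float, beta: float, d: int) -> float:
    """Exponent beta * (1 + alpha) / (2 * beta + d) from the drifting-data bound."""
    # Assumed parameter ranges; the slides do not state them explicitly.
    if alpha < 0 or beta <= 0 or d < 1:
        raise ValueError("expected alpha >= 0, beta > 0, d >= 1")
    return beta * (1 + alpha) / (2 * beta + d)

def regime(alpha: float, beta: float, d: int) -> str:
    """Classify the rate using the thresholds 1/2 and 1 given on the slide."""
    exponent = rate_exponent(alpha, beta, d)
    if exponent > 1:
        return "super fast"
    if exponent > 0.5:
        return "fast"
    return "neither"  # hypothetical label for exponents of at most 1/2

# Example: alpha = 1, beta = 1, d = 1 gives exponent 2/3.
example_regime = regime(1, 1, 1)
```

For instance, with α = 1, β = 1, d = 1 the exponent is 2/3, which exceeds 1/2 but not 1, so the rate is fast without being super fast.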

SLIDE 6

Thank you.
