Optimal and Adaptive Algorithms for Online Boosting
Alina Beygelzimer1 Satyen Kale1 Haipeng Luo2
1Yahoo! Labs, NYC 2Computer Science Department, Princeton University
December 11, 2015
Boosting: An Example
Idea: combine weak rules of ...
◮ (“Attn: Beneficiary Contractor Foreign Money Transfer ...”, spam)
◮ (“Let’s meet to discuss QPR –Edo”, not spam)
◮ e.g. contains the word “money” ⇒ spam.
◮ e.g. spam that doesn’t contain “money”.
◮ e.g. empty “to address” ⇒ spam.
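The two weak rules above can be sketched as small functions and combined by a weighted vote. This is only an illustration: the email representation (a dict with hypothetical `to` and `body` fields), the `to` values, and the vote weights 0.4 and 0.6 are assumptions made for the example, not values from the talk.

```python
def contains_money(email):
    """Weak rule: the word "money" in the body suggests spam (+1), else -1."""
    return 1 if "money" in email["body"].lower() else -1


def empty_to_address(email):
    """Weak rule: an empty "to" address suggests spam (+1), else -1."""
    return 1 if not email["to"].strip() else -1


# Illustrative weights only; a boosting algorithm would learn these.
RULES = [(0.4, contains_money), (0.6, empty_to_address)]


def combined_vote(email, rules=RULES):
    """Weighted vote over the weak rules: +1 means spam, -1 means not spam."""
    score = sum(weight * rule(email) for weight, rule in rules)
    return 1 if score > 0 else -1


# The two labelled examples from the slide; the "to" fields are assumed.
spam_email = {"to": "", "body": "Attn: Beneficiary Contractor Foreign Money Transfer ..."}
ham_email = {"to": "me@example.com", "body": "Let's meet to discuss QPR -Edo"}

# Spam that does not contain "money": the first rule misses it,
# but the second rule carries the vote.
spam_without_money = {"to": "", "body": "You have won a prize"}
```

Neither rule is reliable alone; the combined vote still flags `spam_without_money` because the second rule outweighs the first on that message.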
◮ e.g. spam detection.
◮ $w_t^i = \Pr\left[k_t^i \text{ heads in } N-i \text{ flips of a } \tfrac{\gamma}{2}\text{-biased coin}\right] \le \frac{4}{\sqrt{N-i}}$
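The weight above is a binomial probability, so it can be evaluated directly and compared against the stated bound. The sketch below assumes that a "γ/2-biased coin" means a coin with heads probability 1/2 + γ/2; the slide text does not spell this out, and the function names are hypothetical.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n independent flips with Pr[heads] = p."""
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def weight(k, flips_remaining, gamma):
    """Pr[k heads in flips_remaining flips of the biased coin].

    Assumption: the coin's heads probability is 1/2 + gamma/2.
    """
    p_heads = 0.5 + gamma / 2
    return binomial_pmf(k, flips_remaining, p_heads)


def weight_bound(flips_remaining):
    """The upper bound 4 / sqrt(N - i) quoted on the slide."""
    return 4 / math.sqrt(flips_remaining)
```

Here `flips_remaining` plays the role of N − i and `k` the role of k_t^i; how k_t^i is computed is not recoverable from the slide text, so it is left as an input.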
◮ for each i, add output of weak learner with step-size α found by line
◮ final prediction is weighted majority with weights α
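The two steps above can be sketched in a batch setting. The slide text is cut off after "line", so this sketch assumes the step size α is chosen by a one-dimensional search, here a ternary search minimising the logistic loss on a bounded interval. The choice of loss, the search interval, and all function names are assumptions for illustration, not the authors' algorithm.

```python
import math


def logistic_loss(scores, labels):
    """Sum of log(1 + exp(-y * s)) over examples, with labels y in {-1, +1}."""
    return sum(math.log1p(math.exp(-y * s)) for s, y in zip(scores, labels))


def search_alpha(scores, weak_preds, labels, lo=-2.0, hi=2.0, iters=60):
    """Ternary search on [lo, hi] for the step size minimising the convex loss."""
    def loss_at(alpha):
        shifted = [s + alpha * h for s, h in zip(scores, weak_preds)]
        return logistic_loss(shifted, labels)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if loss_at(m1) < loss_at(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


def boost(weak_outputs, labels):
    """For each weak learner i, add alpha_i times its output to the running scores."""
    scores = [0.0] * len(labels)
    alphas = []
    for preds in weak_outputs:
        alpha = search_alpha(scores, preds, labels)
        alphas.append(alpha)
        scores = [s + alpha * h for s, h in zip(scores, preds)]
    return alphas, scores


def predict(scores):
    """Weighted majority: the sign of the alpha-weighted sum of weak outputs."""
    return [1 if s > 0 else -1 for s in scores]


# Three weak learners, each wrong on a different one of four examples.
labels = [1, 1, -1, -1]
weak_outputs = [[1, 1, 1, -1], [1, -1, -1, -1], [-1, 1, -1, -1]]
alphas, scores = boost(weak_outputs, labels)
```

Each weak learner in the toy data is right on three of four examples, so every step size comes out positive and the weighted majority recovers all four labels.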