Machine Learning
The Perceptron Mistake Bound
Some slides based on lectures from Dan Roth, Avrim Blum and others
Where are we?
– The Perceptron Algorithm
– Variants of Perceptron
– Perceptron Mistake Bound
– Convergence
We can always find such an R: just take the distance of the farthest data point from the origin.
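As a quick illustration, R can be computed as the largest example norm; the toy data here is a hypothetical stand-in, not from the slides:

```python
import numpy as np

# Hypothetical examples, one per row.
X = np.array([[1.0, 2.0],
              [0.5, -1.0],
              [3.0, 0.0]])

# R is the distance of the farthest example from the origin.
R = np.max(np.linalg.norm(X, axis=1))
print(R)  # 3.0
```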
The data has a margin γ. Importantly, this means the data is separable; γ is the complexity parameter that quantifies how well separated the data is.
If u had not been a unit vector, then we could scale γ in the mistake bound. This would change the final mistake bound to (‖u‖R/γ)².
– Effectively scales inputs, but does not change the behavior
– That is, for every example (x_i, y_i), we have ‖x_i‖ ≤ R
– That is, for every example (x_i, y_i), we have y_i uᵀx_i ≥ γ
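These two conditions can be checked numerically. Here X, y, and the unit-length separator u are hypothetical stand-ins chosen for illustration:

```python
import numpy as np

# Hypothetical data and an assumed unit-length separator u.
X = np.array([[1.0, 1.0], [2.0, 0.5], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
u = np.array([1.0, 0.0])

R = np.max(np.linalg.norm(X, axis=1))  # every ||x_i|| <= R
gamma = np.min(y * (X @ u))            # every y_i u^T x_i >= gamma
print(R, gamma)
```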
If sgn(w_tᵀx_i) ≠ y_i:
Update w_{t+1} ← w_t + y_i x_i
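A minimal runnable sketch of the algorithm built around this update rule; the function name and loop structure are assumptions, not from the slides:

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Perceptron with the update w_{t+1} <- w_t + y_i * x_i on each mistake.

    X: (m, d) array of examples; y: labels in {-1, +1}.
    Returns the final weight vector and the number of mistakes made.
    """
    w = np.zeros(X.shape[1])
    mistakes = 0
    for _ in range(max_epochs):
        clean_pass = True
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i) <= 0:   # predicted sign disagrees with label
                w = w + y_i * x_i      # the update rule above
                mistakes += 1
                clean_pass = False
        if clean_pass:                 # a full pass with no mistakes: done
            break
    return w, mistakes
```

On linearly separable data this loop terminates, and the bound proved in the following slides caps the total number of mistakes.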
Because the data is separable with margin γ: each mistake adds y_i uᵀx_i ≥ γ to uᵀw_t, so after t mistakes uᵀw_t ≥ tγ. Call this claim (1).
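The step above, written out as a short derivation (using the update rule and the margin condition):

```latex
\begin{aligned}
\mathbf{u}^{\mathsf T}\mathbf{w}_{t+1}
  &= \mathbf{u}^{\mathsf T}\left(\mathbf{w}_t + y_i \mathbf{x}_i\right) \\
  &= \mathbf{u}^{\mathsf T}\mathbf{w}_t + y_i\,\mathbf{u}^{\mathsf T}\mathbf{x}_i \\
  &\ge \mathbf{u}^{\mathsf T}\mathbf{w}_t + \gamma
      \qquad \text{(separability with margin } \gamma\text{)}
\end{aligned}
```

Since w starts at the zero vector, after t mistakes uᵀw_t ≥ tγ.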
The weight vector is updated only when there is a mistake, that is, when y_i w_tᵀx_i < 0.
Also, ‖x_i‖ ≤ R, by the definition of R.
(2): After t mistakes, ‖w_t‖² ≤ tR².
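A short derivation of this claim, expanding the update and using y_i² = 1:

```latex
\begin{aligned}
\|\mathbf{w}_{t+1}\|^2
  &= \|\mathbf{w}_t + y_i \mathbf{x}_i\|^2
   = \|\mathbf{w}_t\|^2 + 2 y_i\,\mathbf{w}_t^{\mathsf T}\mathbf{x}_i + \|\mathbf{x}_i\|^2 \\
  &\le \|\mathbf{w}_t\|^2 + \|\mathbf{x}_i\|^2
      \qquad \text{(mistake: } y_i\,\mathbf{w}_t^{\mathsf T}\mathbf{x}_i < 0\text{)} \\
  &\le \|\mathbf{w}_t\|^2 + R^2
      \qquad \text{(} \|\mathbf{x}_i\| \le R\text{)}
\end{aligned}
```

Since w starts at the zero vector, after t mistakes ‖w_t‖² ≤ tR².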
From (2), taking square roots: ‖w_t‖ ≤ √t · R.
uᵀw_t = ‖u‖‖w_t‖ · cos(angle between them). But ‖u‖ = 1 and the cosine is at most 1, so uᵀw_t ≤ ‖w_t‖.
The step uᵀw_t ≤ ‖w_t‖ is an instance of the Cauchy–Schwarz inequality; combined with (2), uᵀw_t ≤ √t · R.
Putting it together: from (1), tγ ≤ uᵀw_t; since uᵀw_t ≤ ‖w_t‖, and from (2), ‖w_t‖ ≤ √t · R, we get tγ ≤ √t · R.
Number of mistakes: tγ ≤ √t · R gives √t ≤ R/γ, so t ≤ (R/γ)² = R²/γ².
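The bound can be checked empirically. This self-contained sketch uses the same hypothetical toy data and assumed unit separator u as before:

```python
import numpy as np

# Hypothetical separable toy data; u is an assumed unit-length separator.
X = np.array([[1.0, 1.0], [2.0, 0.5], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
u = np.array([1.0, 0.0])

R = np.max(np.linalg.norm(X, axis=1))   # largest example norm
gamma = np.min(y * (X @ u))             # worst-case margin under u
bound = (R / gamma) ** 2                # the theorem's mistake bound

# Run the Perceptron and count mistakes until a full clean pass.
w, mistakes = np.zeros(X.shape[1]), 0
for _ in range(100):
    clean_pass = True
    for x_i, y_i in zip(X, y):
        if y_i * (w @ x_i) <= 0:        # mistake on (x_i, y_i)
            w = w + y_i * x_i           # Perceptron update
            mistakes += 1
            clean_pass = False
    if clean_pass:
        break

print(mistakes, bound)
```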
– For Boolean functions with n attributes, show that R² = n.
– How many mistakes will the Perceptron algorithm make for disjunctions with n attributes?
– How many mistakes will the Perceptron algorithm make for k-disjunctions with n attributes?
– Find a sequence of examples that will force the Perceptron algorithm to make O(n) mistakes for a concept that is a k-disjunction.
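For the first exercise: if Boolean examples are encoded as vectors in {−1, +1}ⁿ (an assumption about the encoding, not stated on the slide), every example has squared norm exactly n, so R² = n. A quick check:

```python
import numpy as np
from itertools import product

n = 5
X = np.array(list(product([-1, 1], repeat=n)))  # all 2^n Boolean examples
norms_sq = (X ** 2).sum(axis=1)                 # squared norm of each example
print(norms_sq.min(), norms_sq.max())           # both equal n, so R^2 = n
```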