10701 Machine Learning
Recitation 7 - Tail bounds and Averages
Ahmed Hefny Slides mostly by Alex Smola
Why this stuff? Can machine learning work?
Yes, otherwise: no Google, no …
[Figure: two-class classification example; unlabeled points marked "?"; Class 1 vs. Class 2]
Tools for analyzing learning algorithms.
Basic building blocks of learning theory.
How well behaved are averages?
[Figure: empirical averages for sample sizes 24, 120, 60, 12]
Is it guaranteed to converge? How quickly does it converge?
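To get a feel for both questions, one can simulate running averages; a minimal sketch (assumes NumPy; a fair die chosen to match the example above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Running average of fair die rolls: it converges to the mean 3.5,
# and the fluctuations shrink roughly like 1/sqrt(n).
rolls = rng.integers(1, 7, size=10_000)  # die faces 1..6
running_avg = np.cumsum(rolls) / np.arange(1, rolls.size + 1)
for n in (12, 24, 60, 120, 10_000):
    print(n, running_avg[n - 1])
```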
(same trick works for intervals)
Convergence in probability
Almost sure convergence
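For an average $\bar{X}_n$ and limit $\mu$, the two standard definitions are:

$$\bar{X}_n \xrightarrow{P} \mu \iff \lim_{n\to\infty} \Pr\left(\left|\bar{X}_n - \mu\right| > \varepsilon\right) = 0 \quad \text{for all } \varepsilon > 0$$

$$\bar{X}_n \xrightarrow{\text{a.s.}} \mu \iff \Pr\left(\lim_{n\to\infty} \bar{X}_n = \mu\right) = 1$$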
[Figure: 5 sample traces of the running average]
Central Limit Theorem: for independent random variables $X_i$ with mean $\mu_i$ and standard deviation $\sigma_i$, the rescaled sum
$$\left(\sum_i \sigma_i^2\right)^{-1/2} \sum_i \left(X_i - \mu_i\right)$$
converges to a Normal Distribution.
[Figure: convergence to the Normal distribution; unscaled vs. scaled sums]
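The effect in the figure is easy to reproduce; a minimal sketch (assumes NumPy; uniform draws chosen arbitrarily as the non-Gaussian source):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 1000, 5000

# Sums of n uniform draws: unscaled sums drift and spread with n,
# but (sum - n*mu) / (sigma*sqrt(n)) settles to a standard Normal.
x = rng.random((trials, n))
mu, sigma = 0.5, np.sqrt(1.0 / 12.0)  # mean and std of U[0, 1]
scaled = (x.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))
print(scaled.mean(), scaled.std())    # ~0 and ~1
```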
Markov's inequality: for a non-negative random variable X with mean μ,
$$\Pr\left(X \ge \varepsilon\right) \le \frac{\mu}{\varepsilon}$$
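The proof is the standard one-liner:
$$\mu = \mathbb{E}[X] \ge \mathbb{E}\left[X \,\mathbf{1}\{X \ge \varepsilon\}\right] \ge \varepsilon \Pr\left(X \ge \varepsilon\right)$$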
Chebyshev's inequality: for a random variable X with mean μ and variance σ²,
$$\Pr\left(\left|X - \mu\right| \ge \varepsilon\right) \le \frac{\sigma^2}{\varepsilon^2}$$
Setting the right-hand side equal to the confidence δ and solving for ε yields the result $\varepsilon = \sigma/\sqrt{\delta}$: with probability at least 1 − δ, the estimate is within ε of the mean.
Correct? Approximately Correct? Probably Approximately Correct!
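The derivation is Markov's inequality applied to the non-negative variable $(X - \mu)^2$:
$$\Pr\left(\left|X - \mu\right| \ge \varepsilon\right) = \Pr\left(\left(X - \mu\right)^2 \ge \varepsilon^2\right) \le \frac{\mathbb{E}\left[\left(X - \mu\right)^2\right]}{\varepsilon^2} = \frac{\sigma^2}{\varepsilon^2}$$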
Markov: ε = μ/δ scales properly in μ but is expensive in δ.
Chebyshev: ε = σ/√δ gives proper scaling in σ but is still bad in δ. Can we get logarithmic scaling in δ?
Chernoff bound: let p be the probability of getting a head and $\nu_n$ the empirical frequency of heads in n tosses; then
$$\Pr\left(\nu_n - p \ge \varepsilon\right) \le \exp\left(-2n\varepsilon^2\right)$$
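A quick Monte Carlo sanity check of this bound; a minimal sketch (assumes NumPy; the parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, eps, trials = 200, 0.5, 0.1, 100_000

# Empirical head frequency in each of `trials` coin-flip experiments.
nu = rng.binomial(n, p, size=trials) / n

empirical = np.mean(nu - p >= eps)   # observed tail probability
bound = np.exp(-2 * n * eps**2)      # Chernoff bound
print(empirical, "<=", bound)
```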
Pinsker's inequality
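This is presumably how the exponential bound above is obtained (the standard route): the Chernoff bound is exact in terms of a KL divergence, and Pinsker's inequality (Bernoulli case) lower-bounds it:
$$\Pr\left(\nu_n - p \ge \varepsilon\right) \le \exp\left(-n\,\mathrm{KL}\left(p+\varepsilon \,\|\, p\right)\right), \qquad \mathrm{KL}\left(q \,\|\, p\right) \ge 2\left(q - p\right)^2$$
Combining the two gives $\exp(-2n\varepsilon^2)$.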
This is key to combining bounds, since we only pay logarithmically in terms of their combination.
Hoeffding's Inequality
For independent $X_j \in [a_j, b_j]$ with mean $\mu$ and empirical average $\bar{\nu}_n$:
$$\Pr\left(\left|\bar{\nu}_n - \mu\right| \ge \varepsilon\right) \le 2\exp\left(-\frac{2n^2\varepsilon^2}{\sum_j \left(b_j - a_j\right)^2}\right)$$
Tighter Bounds, More Assumptions
Bernstein's inequality: if additionally $\left|X_j - \mathbb{E}[X_j]\right| \le M$, then
$$\Pr\left(\left|\bar{\nu}_n - \mu\right| \ge \varepsilon\right) \le 2\exp\left(-\frac{n^2\varepsilon^2/2}{\sum_j \mathrm{Var}\left[X_j\right] + Mn\varepsilon/3}\right)$$
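To see why the extra variance assumption pays off, compare the two bounds numerically; a minimal sketch (assumes NumPy; the variance and range values are arbitrary):

```python
import numpy as np

n, eps = 1000, 0.05
a, b = 0.0, 1.0     # each X_j lies in [a, b]
var, M = 0.01, 1.0  # per-variable variance, centered bound |X_j - E X_j| <= M

hoeffding = 2 * np.exp(-2 * n**2 * eps**2 / (n * (b - a)**2))
bernstein = 2 * np.exp(-(n**2 * eps**2 / 2) / (n * var + M * n * eps / 3))
print(hoeffding, bernstein)  # Bernstein is far smaller when var << (b - a)**2
```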
$$\Pr\left(A \cup B\right) = \Pr(A) + \Pr(B) - \Pr\left(A \cap B\right) \le \Pr(A) + \Pr(B)$$
In general,
$$\Pr\Big(\bigcup_j A_j\Big) \le \sum_j \Pr\left(A_j\right)$$
Proof sketch of the CLT via the Fourier transform: the density of a sum of independent variables is a convolution, so its transform is the product of the individual transforms. Expanding each transform around the mean, the higher-order terms vanish as n grows; for this the integral would have to converge, so we need to assume that we can bound the tail. (We assume they're all identical, WLOG.)
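In symbols, a standard version of the sketch (assuming, after centering and rescaling, zero mean and unit variance):
$$\mathbb{E}\left[e^{i\omega S_n}\right] = \left(\mathbb{E}\left[e^{i\omega X/\sqrt{n}}\right]\right)^n = \left(1 - \frac{\omega^2}{2n} + o\left(n^{-1}\right)\right)^n \longrightarrow e^{-\omega^2/2}, \quad \text{where } S_n = n^{-1/2}\sum_{j=1}^n X_j$$
The limit is the transform of a standard Normal distribution.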
We looked at basic building blocks of learning theory
Evaluate a classifier C on N data points and estimate its accuracy. Can we upper-bound the estimation error?
Yes: Chernoff bound / Hoeffding's inequality.
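Concretely, with $\hat{a}$ the empirical accuracy and $a$ the true accuracy (a standard instantiation of the bound above):
$$\Pr\left(\left|\hat{a} - a\right| \ge \varepsilon\right) \le 2\exp\left(-2N\varepsilon^2\right), \qquad \text{i.e. } \varepsilon = \sqrt{\frac{\log(2/\delta)}{2N}} \text{ at confidence } 1 - \delta$$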
Evaluate a set of classifiers on N data points and pick the one with the best accuracy. Can we upper-bound the estimation error?
Yes: Chernoff bound / Hoeffding's inequality + union bound.
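A minimal sketch of the resulting guarantee (the helper name and parameters are illustrative, not from the slides; the formula is Hoeffding combined with a union bound over K classifiers):

```python
import math

def estimation_error_bound(n_points: int, n_classifiers: int, delta: float) -> float:
    """Deviation eps such that, with probability at least 1 - delta, every
    empirical accuracy is within eps of its true accuracy.
    From Hoeffding + union bound: 2 * K * exp(-2 * N * eps**2) <= delta."""
    return math.sqrt(math.log(2 * n_classifiers / delta) / (2 * n_points))

# Example: 100 classifiers evaluated on 10,000 points at 95% confidence.
print(estimation_error_bound(10_000, 100, 0.05))  # ~0.02
```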
What if the set of classifiers is infinite?