SLIDE 47
Generalization bounds
Theorem (Shamir & NT 2007)
For any confidence parameter δ > 0, we have with probability of at least 1 − δ over a sample of size m, for any T defined via p(t|x) and any constants a, b_1, …, b_{|T|}, c simultaneously:

\[
\bigl|I(X;T) - \hat{I}(X;T)\bigr| \;\le\; \sum_{t \in T} f\!\left(\frac{\bigl\|\,n(\delta)\,p(t|x) - b_t\,\bigr\|}{\sqrt{m}}\right) + \frac{\bigl\|\,n(\delta)\,H(T|x) - a\,\bigr\|}{\sqrt{m}},
\]
\[
\bigl|I(Y;T) - \hat{I}(Y;T)\bigr| \;\le\; 2\sum_{t \in T} f\!\left(\frac{\bigl\|\,n(\delta)\,p(t|x) - b_t\,\bigr\|}{\sqrt{m}}\right) + \frac{\bigl\|\,n(\delta)\,\hat{H}(T|y) - c\,\bigr\|}{\sqrt{m}},
\]

where \(n(\delta) = 2 + \sqrt{(|Y|+2)/\delta}\),
and \(f(x)\) is monotonically increasing and concave in \(|x|\), defined as:

\[
f(x) =
\begin{cases}
|x|\,\log\dfrac{1}{|x|}, & |x| \le 1/e,\\[4pt]
1/e, & |x| > 1/e.
\end{cases}
\]
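To make the hatted quantities and the bound concrete, here is a minimal numerical sketch in Python/NumPy. All function names are ours, entropies are in nats, and we read the norms as Euclidean norms over the m sample points x_1, …, x_m, an interpretation the slide does not spell out; treat this as an illustrative reading under those assumptions, not the authors' implementation.

import numpy as np

CAP = 1.0 / np.e

def f(x):
    # f from the theorem: |x| log(1/|x|) for |x| <= 1/e, capped at 1/e above it
    ax = abs(x)
    if ax == 0.0:
        return 0.0
    return float(ax * np.log(1.0 / ax)) if ax <= CAP else CAP

def n_delta(delta, y_card):
    # n(delta) = 2 + sqrt((|Y| + 2) / delta), as stated in the theorem
    return 2.0 + np.sqrt((y_card + 2) / delta)

def entropy(p):
    # Shannon entropy (nats) of a probability vector
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def empirical_I_XT(P):
    # Plug-in estimate I-hat(X;T), where P[i, t] = p(t | x_i) on the m samples:
    # H-hat(T) under the empirical marginal, minus the average exact H(T | x_i)
    m = P.shape[0]
    return entropy(P.mean(axis=0)) - np.mean([entropy(P[i]) for i in range(m)])

def bound_I_XT(P, delta, y_card, b, a):
    # Right-hand side of the first bound; norms taken over the m sample points
    # (an assumed reading of the slide's notation)
    m, k = P.shape
    nd = n_delta(delta, y_card)
    H_Tx = np.array([entropy(P[i]) for i in range(m)])  # H(T|x_i), known exactly
    sum_term = sum(f(np.linalg.norm(nd * P[:, t] - b[t]) / np.sqrt(m))
                   for t in range(k))
    return sum_term + np.linalg.norm(nd * H_Tx - a) / np.sqrt(m)

# Demo: a random soft clustering of m = 1000 points into |T| = 3 clusters,
# with |Y| = 2 and confidence delta = 0.05; b and a are centred heuristically.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=1000)
nd = n_delta(0.05, 2)
b = nd * P.mean(axis=0)
a = nd * np.mean([entropy(P[i]) for i in range(1000)])
print(empirical_I_XT(P), bound_I_XT(P, 0.05, 2, b, a))

Since the theorem holds for all constants a, b_1, …, b_{|T|}, c simultaneously, they can be chosen after seeing the sample to make the right-hand side as small as possible; the centring choices in the demo above are one such heuristic.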