Robust Models in Information Retrieval
Nedim Lipka, Benno Stein | Bauhaus-Universität Weimar
[www.webis.de]
Outline
❑ Introduction
❑ Bias and Variance
❑ Robust Models in IR
❑ Summary
❑ Excursus: Bias
© Stein, TIR'11
❑ set O of real-world objects o
❑ feature space X with feature vectors x
❑ classification function c : X → Y (closed form unknown)
❑ sample S = {(x, y) | x ∈ X, y = c(x)}
❑ hypothesis h ∈ H that minimizes P(h(x) ≠ c(x))
❑ sample error: err_S(h) = |{(x, y) ∈ S : h(x) ≠ y}| / |S|
❑ err(h∗) := min_{h ∈ H} err(h) defines a lower bound for err(h) ➜ restriction bias
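Under the definitions above, err_S(h) is just the disagreement rate between h and the true labeling on the sample S. A minimal sketch; the concrete hypotheses, labeling function c, and data below are invented for illustration:

```python
import numpy as np

def sample_error(h, S):
    """err_S(h) = |{(x, y) in S : h(x) != y}| / |S|."""
    return sum(1 for x, y in S if h(x) != y) / len(S)

# Illustrative labeling function c: sign of the first coordinate.
c = lambda x: 1 if x[0] >= 0 else -1

# Sample S = {(x, c(x))} drawn from a 2-D feature space.
points = np.random.RandomState(0).uniform(-1, 1, (100, 2))
S = [((x1, x2), c((x1, x2))) for x1, x2 in points]

h_good = lambda x: 1 if x[0] >= 0 else -1   # agrees with c everywhere
h_bad  = lambda x: 1                        # constant hypothesis

print(sample_error(h_good, S))  # 0.0
print(sample_error(h_bad, S))   # fraction of negatively labeled points
```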
❑ err_S(h∗_α1) > err_S(h∗_α2)
❑ err(h∗_α1) < err(h∗_α2)
❑ A feature vector x and its predicted class label ŷ = h(x).
❑ h is characterized by a weight vector θ,
❑ where θ has been estimated based on a random sample S = {(x, c(x))}.
❑ A series of samples S_i, S_i ⊆ U, entails a series of hypotheses h(θ_i),
❑ giving for a feature vector x a series of class labels ŷ_i.
❑ σ²(Ẑ) is the variance of Ẑ (= variance of the prediction).
❑ The prediction variance grows with the ratio |θ| : |S| (many parameters, few training examples).
❑ The prediction variance grows as the ratio |S| : |U| shrinks (the sample becomes less representative of the universe).
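The variance of the prediction can be estimated empirically in the spirit of the slide: draw repeated samples S_i from a universe U, fit a hypothesis h(θ_i) on each, and measure the spread of the predictions at a fixed point. A sketch under invented assumptions; the least-squares linear learner and the synthetic data are illustrative stand-ins, not the talk's setup:

```python
import numpy as np

rng = np.random.RandomState(1)

# Illustrative universe U: 1-D inputs with noisy linear labels.
U_x = rng.uniform(-1, 1, 1000)
U_y = 2.0 * U_x + rng.normal(0, 0.5, 1000)

def fit_theta(idx):
    """Estimate theta = (slope, intercept) by least squares on S_i = U[idx]."""
    X = np.column_stack([U_x[idx], np.ones(len(idx))])
    theta, *_ = np.linalg.lstsq(X, U_y[idx], rcond=None)
    return theta

def prediction_variance(sample_size, x0=0.8, repeats=200):
    """sigma^2 over the series of predictions h(theta_i)(x0)."""
    preds = []
    for _ in range(repeats):
        idx = rng.choice(len(U_x), size=sample_size, replace=False)  # S_i ⊆ U
        slope, intercept = fit_theta(idx)
        preds.append(slope * x0 + intercept)
    return np.var(preds)

v_small, v_large = prediction_variance(30), prediction_variance(300)
print(v_small, v_large)  # larger |S| : |U| ratio -> smaller prediction variance
```

Refitting on larger samples shrinks the spread of the predictions, matching the |S| : |U| bullet above.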
[Figure: MSE, bias, and variance as a function of the parameter number (hypothesis complexity)]
❑ err_S(h∗_α1) > err_S(h∗_α2)
❑ err(h∗_α1) < err(h∗_α2)
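The tradeoff in the figure can be reproduced numerically: refit models of growing parameter number on fresh samples and split their error at a fixed test point into squared bias and variance. A sketch under invented assumptions; the sinusoidal target, noise level, and polynomial learners are illustrative, not the talk's experiment:

```python
import numpy as np

rng = np.random.RandomState(2)
true_f = lambda x: np.sin(3 * x)  # assumed ground-truth function

def bias2_and_variance(degree, n_samples=200, n_train=30):
    """Estimate bias^2 and variance of a degree-d polynomial fit at x0."""
    x0 = 0.5
    preds = []
    for _ in range(n_samples):
        x = rng.uniform(-1, 1, n_train)
        y = true_f(x) + rng.normal(0, 0.3, n_train)
        coeffs = np.polyfit(x, y, degree)        # fit h on this sample
        preds.append(np.polyval(coeffs, x0))     # predict at the fixed point
    preds = np.array(preds)
    bias2 = (preds.mean() - true_f(x0)) ** 2
    return bias2, preds.var()

for d in (1, 3, 9):
    b2, var = bias2_and_variance(d)
    print(f"degree {d}: bias^2={b2:.4f}  variance={var:.4f}")
```

Low-degree fits show high bias and low variance; high-degree fits show the reverse, which is the MSE curve from the figure in miniature.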
❑ Even when training and test sets are chosen properly, a model selection based on the sample error can be misleading.
❑ Rationale: the concept of representativeness gets lost for extreme ratios |S| : |U|.
❑ This behavior is consistent with the concept of the bias-variance tradeoff.
❑ err_S(h∗_α1) > err_S(h∗_α2)
❑ err(h∗_α1) < err(h∗_α2)
❑ Topic classification for the web is learned on extremely small samples.
❑ The web generalization error of a classifier h cannot be computed.
❑ Corpus
❑ Corpus size
❑ Considered classes
❑ Sample size
❑ Ratio of sample size to corpus size
❑ Inductive learner
❑ Model formation functions α
[Figure: sample error err_S, with the optimum model marked]
Web genres: Help, Article, Discussion, Shop, Non-personal home page, Personal home page, Link collection, Download
❑ The sizes of existing genre corpora vary between 200 and 2500 documents.
❑ The number of web genres in these corpora is between 3 and 16.
❑ The researchers report very good (too good?) classification results.
❑ Corpus A
❑ Considered classes
❑ Corpus B
❑ Considered classes
❑ Sample sizes
❑ Inductive learner
❑ Model formation functions α
[Figure: predictive accuracy (45-75) vs. number of training instances (100-300), Corpus A (KI-04); curves 〈α1, h〉 and 〈α2, h〉]
❑ err_S(h∗_α1) < err_S(h∗_α2)
[Figure: export accuracy (45-75) vs. number of training instances (100-700); training corpus A (KI-04), test corpus B (7-Web-Genre); curves 〈α1, h〉 and 〈α2, h〉]
❑ err(h∗_α1) > err(h∗_α2)
❑ a bias over-estimation of the less complex classifier, or
❑ a variance under-estimation of the more complex classifier.