

  1. Robust Models in Information Retrieval · Nedim Lipka, Benno Stein · Bauhaus-Universität Weimar [www.webis.de]

  2. Robust Models in Information Retrieval Outline · Introduction · Bias and Variance · Robust Models in IR · Summary · Excursus: Bias Types

  3. Introduction

  4.–7. Introduction: Classification Task

Given:
❑ a set O of real-world objects o
❑ a feature space X with feature vectors x
❑ a classification function (closed form unknown) c : X → Y
❑ a sample S = { (x, y) | x ∈ X, y = c(x) }

Searched:
❑ a hypothesis h ∈ H that minimizes P(h(x) ≠ c(x)), the generalization error err(h).

Measuring the effectiveness of h:
❑ err_S(h) = (1/|S|) · Σ_{x ∈ S} loss_{0/1}(h(x), c(x))
  err_S(h) is called the test error if S is not used for the construction of h.
❑ err(h*) := min_{h ∈ H} err(h) defines a lower bound for err(h) ➜ restriction bias.
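The empirical error err_S(h) above is simply the fraction of sample points the hypothesis mislabels under the 0/1 loss. A minimal Python sketch; the toy sample, labels, and candidate hypothesis are illustrative, not from the talk:

```python
from typing import Callable, Sequence, Tuple

def err_S(h: Callable[[Tuple[float, ...]], int],
          S: Sequence[Tuple[Tuple[float, ...], int]]) -> float:
    """Empirical 0/1 error of hypothesis h on the sample S = [(x, y), ...]."""
    return sum(1 for x, y in S if h(x) != y) / len(S)

# Toy sample: y = c(x) is 1 iff the first coordinate is positive.
S = [((0.5, 1.0), 1), ((-0.3, 0.2), 0), ((0.9, -0.1), 1), ((-0.7, 0.4), 0)]
h = lambda x: 1 if x[0] > 0.0 else 0   # a candidate hypothesis from H
print(err_S(h, S))                     # 0.0: h agrees with c on this sample
```

If S was not used to construct h, this quantity is the test error; computed on the training sample, it typically underestimates err(h).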

  8.–11. Introduction: Model Formation Task

The process (the function) α for deriving x from o is called model formation:

  α : O → X

Choosing between different model formation functions α1, ..., αm
➜ choosing between different feature spaces X_α1, ..., X_αm
➜ choosing between different hypothesis spaces H_α1, ..., H_αm

[Figure: each model formation function αi induces its own feature space X_αi, with feature vectors x, and a corresponding hypothesis space H_αi, with hypotheses h.]

We call the model under α1 more robust than the model under α2 ⇔

  err_S(h*_α1) > err_S(h*_α2)  and  err(h*_α1) < err(h*_α2)

i.e., the more robust model fits the sample less tightly but generalizes better.
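A hedged sketch of what choosing between model formation functions looks like in code; alpha_1, alpha_2, and the feature choices are illustrative names, not from the talk:

```python
# Two hypothetical model formation functions mapping a raw text object o
# to a feature vector x, each inducing a different feature space.
def alpha_1(o: str) -> tuple:
    return (len(o.split()),)             # X_alpha1: document length only

def alpha_2(o: str) -> tuple:
    words = o.split()
    avg_len = sum(map(len, words)) / max(len(words), 1)
    return (len(words), avg_len)         # X_alpha2: length + avg word length

def more_robust(err_S_1: float, err_1: float,
                err_S_2: float, err_2: float) -> bool:
    """Definition from the slide: model 1 is more robust than model 2 iff it
    has the higher sample error but the lower generalization error."""
    return err_S_1 > err_S_2 and err_1 < err_2
```

Since err(h*) is unknown in practice, the comparison would typically be approximated by contrasting training error with error on held-out data for the best hypothesis found in each space.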

  12.–14. Introduction: The Whole Picture

[Figure: in the real world, objects O are classified into classes Y (object classification); model formation α maps the objects O into the feature space X; feature vector classification c maps X to Y.]

Learning means searching for an h ∈ H such that P(h(x) ≠ c(x)) is minimized.
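The picture composes into a two-stage pipeline: model formation first, then classification in feature space. A minimal sketch, reusing the illustrative alpha_1 from above with an arbitrary stand-in hypothesis:

```python
# End-to-end classification of a real-world object o: model formation
# followed by feature vector classification (the learned h stands in for c).
def classify(o: str, alpha, h) -> int:
    x = alpha(o)      # alpha : O -> X
    return h(x)       # h     : X -> Y

label = classify("a short example document", alpha_1,
                 lambda x: 1 if x[0] > 3 else 0)
```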

  15. Bias and Variance

  16.–18. Bias and Variance: Error Decomposition

Consider:
❑ a feature vector x and its predicted class label ŷ = h(x), where
❑ h is characterized by a weight vector θ, where
❑ θ has been estimated based on a random sample S = { (x, c(x)) }.
➜ θ ≡ θ(S), and hence h ≡ h(θ_S)

Observations:
❑ A series of samples S_i, S_i ⊆ U, entails a series of hypotheses h(θ_i),
❑ giving for a feature vector x a series of class labels ŷ_i = h(θ_i, x).
➜ ŷ is considered a random variable, denoted Z.

Consequences:
❑ σ²(Z) is the variance of Z (= the variance of the prediction).
❑ |θ| : |S| ↑ ➜ σ²(Z) ↑ (more model parameters per training example: higher prediction variance)
❑ |S| : |U| ↓ ➜ σ²(Z) ↑ (a sample covering less of the universe U: higher prediction variance)
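The second consequence can be simulated: repeatedly draw samples S_i from a universe U, refit θ_i, and measure the spread of the predictions at one fixed x. A hedged sketch; the one-parameter least-squares model and all constants are illustrative, not the talk's setup:

```python
import random

random.seed(0)
# Universe U of (x, y) pairs with y = 2x + noise.
U = [(float(u), 2.0 * u + random.gauss(0.0, 5.0)) for u in range(1, 101)]

def train(S):
    """Least-squares theta for the one-parameter model h(theta, x) = theta * x."""
    return sum(x * y for x, y in S) / sum(x * x for x, _ in S)

def prediction_variance(sample_size: int, x_query: float = 50.0,
                        runs: int = 500) -> float:
    preds = []
    for _ in range(runs):
        S_i = random.sample(U, sample_size)   # a sample S_i ⊆ U
        preds.append(train(S_i) * x_query)    # ŷ_i = h(theta_i, x)
    mean = sum(preds) / len(preds)
    return sum((z - mean) ** 2 for z in preds) / len(preds)

# Shrinking |S| relative to |U| increases the variance of the prediction Z.
print(prediction_variance(sample_size=5))
print(prediction_variance(sample_size=50))
```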

  19.–22. Bias and Variance: Error Decomposition (continued)

Let Z and Y denote the random variables for ŷ (= h(θ_S, x)) and y (= c(x)).

MSE(Z) = E((Z − Y)²)
       = E(Z² − 2·Z·Y + Y²)
       = E(Z²) − 2·E(Z·Y) + E(Y²)
       = (E(Z))² + σ²(Z) − 2·E(Z·Y) + E(Y²)
       = (E(Z))² + σ²(Z) − 2·E(Z·Y) + (E(Y))² + σ²(Y)
       = (E(Z) − E(Y))² + σ²(Z) + σ²(Y)        (with Z and Y independent: E(Z·Y) = E(Z)·E(Y))
       = (E(Z − Y))² + σ²(Z) + σ²(Y)
       = (bias(Z))² + σ²(Z) + IrreducibleError

If Y is constant: MSE(Z) = (E(Z) − Y)² + σ²(Z).
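The constant-Y case can be verified numerically: with the biased variance estimator, empirical MSE equals squared bias plus variance as an exact algebraic identity. A short sketch with an illustrative prediction distribution:

```python
import random

random.seed(1)
Y = 1.0                                                      # constant true label y = c(x)
Z = [1.3 + random.gauss(0.0, 0.2) for _ in range(100_000)]   # predictions ŷ

mse    = sum((z - Y) ** 2 for z in Z) / len(Z)
mean_Z = sum(Z) / len(Z)                             # estimate of E(Z)
bias2  = (mean_Z - Y) ** 2                           # (E(Z) - Y)^2
var_Z  = sum((z - mean_Z) ** 2 for z in Z) / len(Z)  # sigma^2(Z)

print(f"MSE          = {mse:.6f}")
print(f"bias^2 + var = {bias2 + var_Z:.6f}")         # identical: exact decomposition
```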
