

  1. Bilinear Text Regression and Applications
     Vasileios Lampos
     Department of Computer Science, University College London
     May 2014
     V. Lampos — v.lampos@ucl.ac.uk

  2. Outline
     - Linear Regression Methods
     - Bilinear Regression Methods
     - Applications
     - Conclusions

  3. Recap on regression methods

  4. Regression basics — Ordinary Least Squares (1/2)
     - observations $x_i \in \mathbb{R}^m$ — $X$, $i \in \{1, \ldots, n\}$
     - responses $y_i \in \mathbb{R}$ — $y$, $i \in \{1, \ldots, n\}$
     - weights, bias $w_j, \beta \in \mathbb{R}$ — $w_* = [w; \beta]$, $j \in \{1, \ldots, m\}$

     Ordinary Least Squares (OLS):
     $$\underset{w,\beta}{\operatorname{argmin}} \sum_{i=1}^{n} \Big( y_i - \beta - \sum_{j=1}^{m} x_{ij} w_j \Big)^2$$
     or, in matrix form,
     $$\underset{w_*}{\operatorname{argmin}} \; \| X_* w_* - y \|_{\ell_2}^2, \quad \text{where } X_* = [X \;\; \mathbf{1}]$$
     $$\Rightarrow \; w_* = \left( X_*^{\mathsf{T}} X_* \right)^{-1} X_*^{\mathsf{T}} y$$
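The closed form above can be sketched in a few lines of numpy. This is a minimal illustration (function name and data are made up for the example), appending the column of ones so the last coefficient plays the role of the bias $\beta$:

```python
import numpy as np

def ols_fit(X, y):
    """Ordinary least squares via the closed form w* = (X*^T X*)^(-1) X*^T y.
    A column of ones is appended to X so the last entry of w* is the bias beta.
    np.linalg.solve is used instead of an explicit inverse for stability."""
    X_star = np.hstack([X, np.ones((X.shape[0], 1))])   # X* = [X 1]
    w_star = np.linalg.solve(X_star.T @ X_star, X_star.T @ y)
    return w_star[:-1], w_star[-1]                      # weights w, bias beta

# usage: recover a known linear model from noise-free synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 3.0
w, b = ols_fit(X, y)
```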

  5. Regression basics — Ordinary Least Squares (2/2)
     (same setting as before)
     $$\underset{w_*}{\operatorname{argmin}} \; \| X_* w_* - y \|_{\ell_2}^2 \;\Rightarrow\; w_* = \left( X_*^{\mathsf{T}} X_* \right)^{-1} X_*^{\mathsf{T}} y$$
     Why not?
     - $X_*^{\mathsf{T}} X_*$ may be singular (thus difficult to invert)
     - high-dimensional models are difficult to interpret
     - unsatisfactory prediction accuracy (estimates have large variance)

  6. Regression basics — Ridge Regression (1/2)
     (same setting as before)
     Ridge Regression (RR) (Hoerl & Kennard, 1970):
     $$w_* = \big( \underbrace{X_*^{\mathsf{T}} X_* + \lambda I}_{\text{non-singular}} \big)^{-1} X_*^{\mathsf{T}} y$$
     $$\underset{w,\beta}{\operatorname{argmin}} \sum_{i=1}^{n} \Big( y_i - \beta - \sum_{j=1}^{m} x_{ij} w_j \Big)^2 + \lambda \sum_{j=1}^{m} w_j^2$$
     or
     $$\underset{w_*}{\operatorname{argmin}} \; \| X_* w_* - y \|_{\ell_2}^2 + \lambda \| w \|_{\ell_2}^2$$
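A hedged numpy sketch of the ridge closed form (illustrative names and data). Note that the scalar objective on the slide penalises only the $w_j$, not $\beta$, so the penalty matrix below leaves the bias coordinate unshrunk:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression via (X*^T X* + lam*P)^(-1) X*^T y, where P penalises
    only the m weights and not the appended bias column, matching the slide's
    objective which leaves beta unregularised."""
    n, m = X.shape
    X_star = np.hstack([X, np.ones((n, 1))])  # append bias column
    P = np.eye(m + 1)
    P[-1, -1] = 0.0                           # do not shrink the bias
    w_star = np.linalg.solve(X_star.T @ X_star + lam * P, X_star.T @ y)
    return w_star[:-1], w_star[-1]

# shrinkage demo: a larger lambda yields a smaller weight norm
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
y = X @ rng.normal(size=5) + 2.0
w_small, _ = ridge_fit(X, y, 0.01)
w_large, _ = ridge_fit(X, y, 100.0)
```

With $\lambda \to 0$ this reduces to the OLS solution; the added diagonal is exactly what makes the system non-singular even for collinear predictors.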

  7. Regression basics — Ridge Regression (2/2)
     $$\underset{w_*}{\operatorname{argmin}} \; \| X_* w_* - y \|_{\ell_2}^2 + \lambda \| w \|_{\ell_2}^2$$
     + size constraint on the weight coefficients (regularisation) → resolves problems caused by collinear variables
     + fewer degrees of freedom, better predictive accuracy than OLS
     - does not perform feature selection (all coefficients remain nonzero)

  8. Regression basics — Lasso
     (same setting as before)
     $\ell_1$-norm regularisation, or lasso (Tibshirani, 1996):
     $$\underset{w,\beta}{\operatorname{argmin}} \sum_{i=1}^{n} \Big( y_i - \beta - \sum_{j=1}^{m} x_{ij} w_j \Big)^2 + \lambda \sum_{j=1}^{m} |w_j|$$
     or
     $$\underset{w_*}{\operatorname{argmin}} \; \| X_* w_* - y \|_{\ell_2}^2 + \lambda \| w \|_{\ell_1}$$
     - no closed-form solution — a quadratic programming problem
     + Least Angle Regression explores the entire regularisation path (Efron et al., 2004)
     + sparse $w$: interpretability, better performance (Hastie et al., 2009)
     - if $m > n$, at most $n$ variables can be selected
     - strongly correlated predictors → model-inconsistent (Zhao & Yu, 2006)
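Although there is no closed form, the lasso is easy to solve iteratively. The sketch below uses proximal gradient descent (ISTA) rather than the LAR path algorithm cited on the slide — an assumption made purely for brevity — and handles the unpenalised intercept by centring:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage toward zero)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=3000):
    """Lasso via proximal gradient descent (ISTA) on
    ||X w - y||_2^2 + lam * ||w||_1. Illustrative solver only, not LAR.
    The intercept is left unpenalised by centring X and y first."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.zeros(X.shape[1])
    L = 2.0 * np.linalg.norm(Xc, 2) ** 2          # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = 2.0 * Xc.T @ (Xc @ w - yc)
        w = soft_threshold(w - grad / L, lam / L)  # gradient step, then shrink
    return w, y_mean - x_mean @ w                  # weights, bias

# sparsity demo: irrelevant features are driven (exactly) to zero
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
y = X @ np.array([2.0, 0.0, 0.0, -3.0, 0.0]) + 1.0
w, b = lasso_ista(X, y, lam=20.0)
```

The soft-thresholding step is what produces exact zeros, i.e. the feature selection that ridge regression cannot perform.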

  9. Regression basics — Lasso for Text Regression
     - n-gram frequencies $x_i \in \mathbb{R}^m$ — $X$, $i \in \{1, \ldots, n\}$
     - flu rates $y_i \in \mathbb{R}$ — $y$, $i \in \{1, \ldots, n\}$
     $\ell_1$-norm regularisation (lasso):
     $$\underset{w_*}{\operatorname{argmin}} \; \| X_* w_* - y \|_{\ell_2}^2 + \lambda \| w \|_{\ell_1}$$
     Selected (stemmed) features: 'unwel', 'temperatur', 'headach', 'appetit', 'symptom', 'diarrhoea', 'muscl', 'feel', ...
     [Figure 1: Flu rate predictions for the UK by applying lasso on Twitter data — HPA rates vs. inferred rates over time (Lampos & Cristianini, 2010)]

  10. Regression basics — Elastic Net
     (same setting as before)
     [Linear] Elastic Net (LEN) (Zou & Hastie, 2005):
     $$\underset{w_*}{\operatorname{argmin}} \; \underbrace{\| X_* w_* - y \|_{\ell_2}^2}_{\text{OLS}} + \underbrace{\lambda_1 \| w \|_{\ell_1}}_{\text{lasso reg.}} + \underbrace{\lambda_2 \| w \|_{\ell_2}^2}_{\text{RR reg.}}$$
     + a 'compromise' between ridge regression (handles collinear predictors) and lasso (favours sparsity)
     + the entire regularisation path can be explored by modifying LAR
     + if $m > n$, the number of selected variables is not limited to $n$
     - may select redundant variables
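The 'compromise' can be seen directly in a solver: the ridge term joins the smooth part of the objective, while the lasso term is still handled by soft-thresholding. A proximal-gradient sketch (again an illustrative solver, not the LAR-based one the slide refers to), demonstrating the grouping effect on two identical predictors:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def elastic_net(X, y, lam1, lam2, n_iter=3000):
    """Elastic net via proximal gradient on
    ||X w - y||_2^2 + lam1*||w||_1 + lam2*||w||_2^2.
    The lam2 (ridge) term is differentiable and enters the gradient; the
    lam1 (lasso) term enters through the proximal shrinkage step."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.zeros(X.shape[1])
    L = 2.0 * (np.linalg.norm(Xc, 2) ** 2 + lam2)  # Lipschitz constant
    for _ in range(n_iter):
        grad = 2.0 * Xc.T @ (Xc @ w - yc) + 2.0 * lam2 * w
        w = soft_threshold(w - grad / L, lam1 / L)
    return w, y_mean - x_mean @ w

# grouping-effect demo: columns 0 and 1 are identical copies of one signal;
# the ridge term makes the elastic net share weight equally between them,
# whereas a pure lasso would arbitrarily pick one
rng = np.random.default_rng(3)
z = rng.normal(size=(80, 2))
X = np.column_stack([z[:, 0], z[:, 0], z[:, 1]])
y = 3.0 * z[:, 0] + z[:, 1]
w, b = elastic_net(X, y, lam1=2.0, lam2=5.0)
```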

  11. Would a slightly different text regression approach be more suitable for Social Media content?

  12. About Twitter (1/2)
     Tweet examples:
     - @PaulLondon: I would strongly support a coalition government. It is the best thing for our country right now. #electionsUK2010
     - @JohnsonMP: Socialism is something forgotten in our country #supportLabour
     - @FarageNOT: Far-right 'movements' come along with crises in capitalism #UKIP
     - @JohnK 1999: RT @HannahB: Stop talking about politics and listen to Justin!! Bieber rules, peace and love ♥ ♥ ♥
     The Twitter basics:
     - 140 characters per status (tweet)
     - users follow and are followed
     - embedded usage of topics (#elections)
     - retweets (RT), @replies, @mentions, favourites
     - real-time nature
     - biased user demographics

  13. About Twitter (2/2)
     (tweet examples as on the previous slide)
     + contains a vast amount of information about various topics
     + this information ($X$) can be used to assist predictions ($y$) (Lampos & Cristianini, 2012; Sakaki et al., 2010; Bollen et al., 2011)
     - $f: X \to y$, where $f$ usually formulates a linear regression task
     - $X$ represents word frequencies only...
     + is it possible to incorporate a user contribution somehow?
       + word selection
       + user selection

  14. Bilinear Text Regression

  15. Bilinear Text Regression — The general idea (1/2)
     Linear regression: $f(x_i) = x_i^{\mathsf{T}} w + \beta$
     - observations $x_i \in \mathbb{R}^m$ — $X$, $i \in \{1, \ldots, n\}$
     - responses $y_i \in \mathbb{R}$ — $y$, $i \in \{1, \ldots, n\}$
     - weights, bias $w_j, \beta \in \mathbb{R}$ — $w_* = [w; \beta]$, $j \in \{1, \ldots, m\}$

     Bilinear regression: $f(Q_i) = u^{\mathsf{T}} Q_i w + \beta$
     - users $p \in \mathbb{Z}^+$
     - observations $Q_i \in \mathbb{R}^{p \times m}$, $i \in \{1, \ldots, n\}$
     - responses $y_i \in \mathbb{R}$, $i \in \{1, \ldots, n\}$
     - weights, bias $u_k, w_j, \beta \in \mathbb{R}$, $k \in \{1, \ldots, p\}$, $j \in \{1, \ldots, m\}$
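A useful observation about the bilinear form: fixing $u$ makes $f$ linear in $(w, \beta)$ with features $Q_i^{\mathsf{T}} u$, and fixing $w$ makes it linear in $(u, \beta)$ with features $Q_i w$, so the model can be fitted by alternating between two ordinary regression sub-problems. The sketch below is an illustrative alternating-least-squares loop under that observation (the talk's actual optimisation scheme may differ); a small ridge term keeps each sub-problem well conditioned:

```python
import numpy as np

def bilinear_fit(Q, y, n_rounds=25, lam=0.1):
    """Alternating least squares for f(Q_i) = u^T Q_i w + beta, with Q of
    shape (n, p, m). The w-step treats Q_i^T u as features; the u-step
    treats Q_i w as features. Each step is a closed-form ridge regression
    (lam keeps the normal equations well conditioned). A sketch of the
    general idea only."""
    n, p, m = Q.shape

    def ridge(F, t):
        F1 = np.hstack([F, np.ones((F.shape[0], 1))])   # append bias column
        P = lam * np.eye(F1.shape[1])
        P[-1, -1] = 0.0                                 # leave the bias unpenalised
        s = np.linalg.solve(F1.T @ F1 + P, F1.T @ t)
        return s[:-1], s[-1]

    u = np.ones(p) / p                                  # neutral user weights to start
    for _ in range(n_rounds):
        w, beta = ridge(np.einsum('ipm,p->im', Q, u), y)  # w-step: features Q_i^T u
        u, beta = ridge(np.einsum('ipm,m->ip', Q, w), y)  # u-step: features Q_i w
    return u, w, beta

# demo: recover a planted bilinear signal; u and w are only identified up to
# a scaling (c*u, w/c gives the same f), so we check predictions, not parameters
rng = np.random.default_rng(4)
n, p, m = 200, 4, 6
Q = rng.normal(size=(n, p, m))
y = np.einsum('ipm,p,m->i', Q, rng.normal(size=p), rng.normal(size=m)) + 0.5
u, w, beta = bilinear_fit(Q, y)
y_hat = np.einsum('ipm,p,m->i', Q, u, w) + beta
```

The objective is not jointly convex in $(u, w)$, but each alternating step is a convex problem solved exactly, which is what makes this decomposition attractive.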
