  1. Statistical Natural Language Processing
     Statistical models: learning, inference, estimation, prediction
     Çağrı Çöltekin, Seminar für Sprachwissenschaft, University of Tübingen, Summer Semester 2017

  2. Overview
     • Many methods/tools we use in NLP can broadly be classified as statistical models
     • Statistical models have a central role in ML and statistical data analysis
     • We will go through an overview of statistical modeling in this lecture

  3. Models in science and practice
     Modeling is a basic activity in science and practice. A few examples:
     • Galilean model of the solar system
     • Bohr model of the atom
     • Animal models in medicine
     • Scale models of buildings, bridges, cars, …
     • Econometric models
     • Models of the atmosphere

  4. What do we do with models?
     • Inference: learn more about the reality being modeled
       – verify or compare hypotheses on the model
     • Prediction: predict future events/behavior using the model

  5. Models are not reality
     "All models are wrong, some are useful." (Box and Draper 1986, p. 424)
     • All models make some (simplifying) assumptions that do not match reality
     • (Some) models are useful despite, or sometimes because of, these assumptions/simplifications

  6. Statistical models
     • Statistical models are mathematical models that take uncertainty into account
     • Statistical models are models of data
     • We express a statistical model in the form: outcome = model prediction + error
     • The 'error' or uncertainty is part of the model description

  7. Parametric models
     Most statistical models are described by a set of parameters w:
         y = f(x; w) + ϵ
     • x is the input to the model
     • y is the quantity or label to be predicted for a given input
     • w is the parameter(s) of the model
     • f(x; w) is the model's estimate (ŷ) of y given the input x
     • ϵ represents the uncertainty or noise that the model cannot explain or account for (it may include additional parameters)
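The form y = f(x; w) + ϵ can be sketched in a few lines of code. A minimal, hypothetical example: f is taken to be a straight line through the origin, and the parameter value 2.0, the noise level, and the inputs are illustrative choices, not from the slides.

```python
import random

def f(x, w):
    """The deterministic part of the model: f(x; w)."""
    return w * x

def generate(xs, w_true=2.0, noise_sd=0.5, seed=0):
    """Simulate y = f(x; w) + eps with Gaussian noise for each input."""
    rng = random.Random(seed)
    return [f(x, w_true) + rng.gauss(0.0, noise_sd) for x in xs]

xs = [0.0, 1.0, 2.0, 3.0]
ys = generate(xs)                    # observed outcomes include the noise term
y_hat = [f(x, 2.0) for x in xs]      # the model's estimate ignores the noise
```

The separation between `f` and `generate` mirrors the slide's distinction: the model prediction is deterministic given w, while the observed outcome also carries ϵ.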

  8. Parametric models
         y = f(x; w) + ϵ
     • In machine learning (and in this course), the focus is on prediction: given x, make accurate predictions of y
     • In statistics, the focus is on inference (testing hypotheses or explaining the observed phenomena)
       – for example, does x have an effect on y?
     • For both purposes, finding a good estimate of w is important
     • For inference, the properties of ϵ (e.g., its distribution and variance) are important

  9. What are good estimates / estimators?
     • Bias of an estimate is the difference between the expected value of the estimate and the value being estimated:
           B(ŵ) = E[ŵ] − w
     • An unbiased estimator has 0 bias
     • Variance of an estimate is, simply, its variance: the expected value of the squared deviations from the mean estimate:
           var(ŵ) = E[(ŵ − E[ŵ])²]
     • We want low bias and low variance. But there is a trade-off: reducing one tends to increase the other

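The bias and variance definitions above can be checked by simulation: draw many samples from a known population, compute the estimator on each, and look at the distribution of the estimates. A sketch using the sample mean as the estimator (the population parameters, sample size, and trial count are arbitrary illustrative choices); its bias should be near 0 and its variance near σ²/n.

```python
import random
import statistics

random.seed(1)
true_mean, true_sd = 90.0, 30.0   # known population (illustrative values)
n, trials = 5, 20000              # sample size and number of simulated samples

estimates = []
for _ in range(trials):
    sample = [random.gauss(true_mean, true_sd) for _ in range(n)]
    estimates.append(statistics.mean(sample))  # the estimator w-hat

bias = statistics.mean(estimates) - true_mean  # B(w-hat) = E[w-hat] - w, near 0
variance = statistics.pvariance(estimates)     # var(w-hat), near sd^2/n = 180
```

This also illustrates why variance matters: even an unbiased estimator can be far from the true value on any single sample when its variance is large.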

  11. Estimating parameters: the Bayesian approach
     Given the training data x, we find the posterior distribution:
         p(w|x) = p(x|w) p(w) / p(x)
     • The result, the posterior, is a distribution over the parameter(s)
     • A prior distribution p(w) is required for the estimation
     • One can get a point estimate of w, for example, by calculating the expected value of the posterior
     • The posterior distribution also contains information on the uncertainty of the estimate
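Bayes' formula above can be applied numerically on a grid of candidate parameter values: evaluate prior × likelihood at each candidate and normalize. A sketch using the small five-tweet sample that appears later in these slides, with the vague N(70, 1000²) prior and the data standard deviation treated as known (the grid range and step are illustrative choices):

```python
import math

def normal_pdf(v, mean, sd):
    """Density of N(mean, sd^2) at v."""
    return math.exp(-((v - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

data = [87, 101, 88, 45, 138]              # small sample from the slides
grid = [mu / 10 for mu in range(0, 2001)]  # candidate mu values, 0.0 .. 200.0

prior = [normal_pdf(mu, 70, 1000) for mu in grid]
like = [math.prod(normal_pdf(x, mu, 33.34) for x in data) for mu in grid]
unnorm = [p * l for p, l in zip(prior, like)]   # p(x|mu) p(mu) at each point
total = sum(unnorm)
posterior = [u / total for u in unnorm]         # normalized: p(mu|x) on the grid

# Point estimate: the expected value of the posterior
post_mean = sum(mu * p for mu, p in zip(grid, posterior))
```

Note that p(x), the awkward denominator in Bayes' formula, is handled here simply by normalizing the grid values so they sum to 1.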

  12. Estimating parameters: the frequentist approach
     Maximum likelihood estimation (MLE): given the training data x, we find the value of w that maximizes the likelihood:
         ŵ = arg max_w p(x|w)
     • The likelihood function L(w|x) = p(x|w) is a function of the parameters
     • The problem becomes searching for the maximum value of a function
     • Note that we cannot make probabilistic statements about w
     • Uncertainty of the estimate is less straightforward to quantify

  13. A simple example
     Problem: We want to estimate the average number of characters in tweets.
     Data: We have two data sets (samples):
     • small: x = (87, 101, 88, 45, 138)
       – The mean of the sample (x̄) is 91.8
       – The variance of the sample (sd²) is 1111.7 (sd = 33.34)
     • large: x = (87, 101, 88, 45, 138, 66, 79, 78, 140, 102)
       – x̄ = 92.4
       – sd² = 876.71 (sd = 29.61)
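The reported summary statistics can be reproduced with Python's standard `statistics` module (`statistics.variance` uses the n−1 denominator, which matches the slide's numbers):

```python
import statistics

# The two samples of tweet lengths from the slide
small = [87, 101, 88, 45, 138]
large = [87, 101, 88, 45, 138, 66, 79, 78, 140, 102]

small_mean = statistics.mean(small)      # 91.8
small_var = statistics.variance(small)   # 1111.7 (sample variance, n-1 denominator)
large_mean = statistics.mean(large)      # 92.4
large_var = statistics.variance(large)   # 876.71
```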

  14. A simple example: the task
     • We are interested in the mean length of all tweets (a large population)
     • We only have samples
     • Questions:
       – Given a sample, what is the most likely population mean?
       – How certain is our estimate of the population mean?

  15. A simple example: the model
         y = µ + ϵ, where ϵ ∼ N(0, σ²)
     Equivalently, the model is
         y ∼ N(µ, σ²)
     • This model is known as the mean/constant/intercept model
     • It is related to well-known statistical tests such as the t-test
     • We are normally interested in conditional models, models with predictors (we won't cover them here)

  16. A simple example: Bayesian estimation / inference
     We simply use Bayes' formula:
         p(µ|x) = p(x|µ) p(µ) / p(x)
     • With a normal prior, the posterior will also be normal, and can be calculated analytically
     • With a vague prior (high variance/entropy), the posterior mean is (almost) the same as the mean of the data
     • With a prior with lower variance, the posterior mean lies between the prior mean and the data mean
     • The posterior variance indicates the uncertainty of our estimate. With more data, we get a more certain estimate

  17. A simple example: Bayesian estimation, vague prior, small sample
     [Figure: prior, likelihood, and posterior densities over µ]
     • Prior: N(µ = 70, 1000²)
     • Likelihood: N(91.8, 33.34²)
     • Posterior: N(91.78, 14.91²)

  18. A simple example: Bayesian estimation, vague prior, larger sample
     [Figure: prior, likelihood, and posterior densities over µ]
     • Prior: N(µ = 70, 1000²)
     • Likelihood: N(92.4, 29.61²)
     • Posterior: N(92.39, 9.36²)

  19. A simple example: Bayesian estimation, stronger prior, small sample
     [Figure: prior, likelihood, and posterior densities over µ]
     • Prior: N(µ = 70, 50²)
     • Likelihood: N(91.8, 33.34²)
     • Posterior: N(90.02, 14.29²)
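Because a normal prior is conjugate for a normal likelihood with known variance, the posteriors shown in these three slides have a closed form: the posterior precision is the sum of the prior precision and the data precision, and the posterior mean is the precision-weighted average of the prior mean and the data mean. A sketch reproducing the "stronger prior, small sample" numbers (the other two slides follow by changing the arguments):

```python
def posterior(prior_mean, prior_sd, data_mean, data_sd, n):
    """Normal-normal conjugate update for mu, with the data sd treated as known."""
    prior_prec = 1.0 / prior_sd ** 2       # precision of the prior
    data_prec = n / data_sd ** 2           # precision contributed by n observations
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * data_mean)
    return post_mean, post_var ** 0.5

# Stronger prior N(70, 50^2), small 5-tweet sample with mean 91.8, sd 33.34
post_mean, post_sd = posterior(prior_mean=70, prior_sd=50,
                               data_mean=91.8, data_sd=33.34, n=5)
# post_mean is about 90.02 and post_sd about 14.29, matching the slide
```

The precision-weighted form also explains the qualitative claims earlier: a vague prior (tiny prior precision) leaves the posterior mean at the data mean, while more data (larger n) shrinks the posterior variance.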

  20. A simple example: MLE estimation
         µ̂ = arg max_µ L(µ; x)
            = arg max_µ p(x|µ)
            = arg max_µ ∏_{x∈x} p(x|µ)
            = arg max_µ ∏_{x∈x} (1 / (σ√2π)) e^(−(x−µ)² / (2σ²))
            = x̄
     • For the 5-tweet sample: µ̂ = x̄ = 91.8 (cf. 91.78)
     • For the 10-tweet sample: µ̂ = x̄ = 92.4 (cf. 92.39)
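The result µ̂ = x̄ can also be verified numerically, treating MLE as the slide's "search for the maximum value of a function": maximize the log-likelihood over a grid of candidate values (a sketch; the grid range and the fixed σ are illustrative choices).

```python
import math

def log_likelihood(mu, data, sd=33.34):
    """Log-likelihood of mu under a normal model with known sd."""
    return sum(-((x - mu) ** 2) / (2 * sd ** 2)
               - math.log(sd * math.sqrt(2 * math.pi)) for x in data)

data = [87, 101, 88, 45, 138]                    # the 5-tweet sample
grid = [mu / 100 for mu in range(8000, 10001)]   # candidates 80.00 .. 100.00
mle = max(grid, key=lambda mu: log_likelihood(mu, data))
# mle lands on the sample mean, 91.8
```

Maximizing the log-likelihood rather than the likelihood itself is the usual trick: the logarithm turns the product over data points into a sum and does not change where the maximum lies.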
