SLIDE 1

Statistical Natural Language Processing

Statistical models: learning, inference, estimation, prediction

Çağrı Çöltekin

University of Tübingen
Seminar für Sprachwissenschaft

Summer Semester 2017

SLIDE 2

Overview

  • Many methods/tools we use in NLP can broadly be classified as statistical models
  • Statistical models have a central role in ML and statistical data analysis
  • We will go through an overview of statistical modeling in this lecture

SLIDE 3

Models in science and practice

Modeling is a basic activity in science and practice. A few examples:

  • Galilean model of the solar system
  • Bohr model of the atom
  • Animal models in medicine
  • Scale models of buildings, bridges, cars, …
  • Econometric models
  • Models of atmosphere

SLIDE 4

What do we do with models?

  • Inference: learn more about the reality being modeled
    – verify or compare hypotheses on the model
  • Prediction: predict the (future) events/behavior using the model

SLIDE 5

Models are not reality

All models are wrong, some are useful.

  • All models make some (simplifying) assumptions that do not match with reality
  • (Some) models are useful despite (or, sometimes, because of) these assumptions/simplifications

Box and Draper (1986, p. 424)

SLIDE 6

Statistical models

  • Statistical models are mathematical models that take uncertainty into account
  • Statistical models are models of data
  • We express a statistical model in the form:

    outcome = model prediction + error

  • The ‘error’ or uncertainty is part of the model description

Ç. Çöltekin, SfS / University of Tübingen Summer Semester 2017 5 / 22

slide-7
SLIDE 7

Statistical models: learning, inference, estimation, prediction

Parametric models

Most statistical models are described by a set of parameters w:

  y = f(x; w) + ϵ

where
  • x is the input to the model
  • y is the quantity or label assigned to a given input
  • w is the parameter(s) of the model
  • f(x; w) is the model’s estimate (ŷ) of y given the input x
  • ϵ represents the uncertainty or noise that we cannot explain or account for (may include additional parameters)
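As a minimal sketch (not from the slides) of fitting such a parametric model, the following Python snippet estimates w for a linear choice f(x; w) = w0 + w1·x by least squares; the data are simulated for illustration:

  import numpy as np

  # Hypothetical data generated as y = 2 + 3x + epsilon (illustrative values)
  rng = np.random.default_rng(42)
  x = rng.uniform(0, 10, size=50)
  y = 2 + 3 * x + rng.normal(0, 1, size=50)

  # Least-squares estimate of w = (w0, w1) for f(x; w) = w0 + w1*x
  X = np.column_stack([np.ones_like(x), x])      # design matrix
  w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # estimated parameters
  residuals = y - X @ w_hat                      # estimates of the noise epsilon
  print("w_hat:", w_hat)                         # close to (2, 3)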

SLIDE 8

Parametric models

y = f(x; w) + ϵ

  • In machine learning (and in this course), the focus is on prediction: given x, make accurate predictions of y
  • In statistics, the focus is on inference (testing hypotheses or explaining the observed phenomena)
    – for example, does x have an effect on y?
  • For both purposes, finding a good estimate of w is important
  • For inference, the properties of ϵ (e.g., its distribution and variance) are important

SLIDE 9

What are good estimates / estimators?

Bias of an estimate is the difference between the value being estimated and the expected value of the estimate:

  B(ŵ) = E[ŵ] − w

  • An unbiased estimator has 0 bias

Variance of an estimate is, simply, its variance: the expected value of the squared deviations from the mean estimate:

  var(ŵ) = E[(ŵ − E[ŵ])²]

We want low bias and low variance, but there is a trade-off: reducing one tends to increase the other (e.g., forcing low variance typically results in higher bias), as in the simulation sketch below.
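The bias and variance of an estimator can be made concrete with a small simulation (a sketch, not from the slides): the variance estimator that divides by n is biased but has lower variance than the unbiased n − 1 version.

  import numpy as np

  # Estimate bias and variance of two variance estimators by simulation
  rng = np.random.default_rng(0)
  true_var = 4.0                    # data ~ N(0, 2^2), so the true variance is 4
  n, trials = 5, 100_000
  samples = rng.normal(0, 2, size=(trials, n))

  var_mle = samples.var(axis=1, ddof=0)   # divide by n (biased)
  var_unb = samples.var(axis=1, ddof=1)   # divide by n-1 (unbiased)

  # B(w_hat) = E[w_hat] - w, approximated by averaging over many trials
  print("bias (ddof=0):", var_mle.mean() - true_var)   # about -true_var/n = -0.8
  print("bias (ddof=1):", var_unb.mean() - true_var)   # about 0
  print("var of estimator (ddof=0):", var_mle.var())   # smaller
  print("var of estimator (ddof=1):", var_unb.var())   # larger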

SLIDE 11

Estimating parameters: Bayesian approach

Given the training data x, we find the posterior distribution using Bayes’ rule:

  p(w|x) = p(x|w) p(w) / p(x)

  • The result, the posterior, is a distribution over the parameter(s)
  • One can get a point estimate of w, for example, by calculating the expected value of the posterior
  • The posterior distribution also contains information on the uncertainty of the estimate
  • A prior distribution is required for the estimation
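A minimal sketch of Bayesian estimation by grid approximation (assuming normal data with a known σ; the data are the tweet-length sample used in the example later in this lecture):

  import numpy as np

  x = np.array([87, 101, 88, 45, 138])   # small tweet-length sample
  sigma = 33.34                          # sample sd, treated as known noise sd
  mu_grid = np.linspace(0, 200, 2001)    # candidate parameter values

  log_prior = np.zeros_like(mu_grid)     # flat (vague) prior over the grid
  # log p(x|mu): sum over data of log N(x_i; mu, sigma^2), up to a constant
  log_lik = -0.5 * ((x[:, None] - mu_grid) ** 2).sum(axis=0) / sigma**2

  post = np.exp(log_prior + log_lik - (log_prior + log_lik).max())
  post /= post.sum()                     # normalized posterior p(mu|x) on the grid
  print("posterior mean:", (mu_grid * post).sum())   # ~ 91.8 with a flat prior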

SLIDE 12

Estimating parameters: frequentist approach

Maximum likelihood estimation (MLE)

Given the training data x, we find the value of w that maximizes the likelihood:

  ŵ = arg max_w p(x|w)

  • The likelihood function L(w|x) = p(x|w) is a function of the parameters
  • The problem becomes searching for the maximum value of a function (sketched below)
  • Note that we cannot make probabilistic statements about w
  • Uncertainty of the estimate is less straightforward
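As a sketch of MLE as function maximization (an illustrative coin example, not from the slides), with hypothetical data of 7 heads in 10 flips:

  import numpy as np
  from scipy.optimize import minimize_scalar

  heads, n = 7, 10    # hypothetical coin-flip data

  def neg_log_lik(w):
      # likelihood p(x|w) = w^heads * (1-w)^(n-heads); minimize its negative log
      return -(heads * np.log(w) + (n - heads) * np.log(1 - w))

  res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
  print("numerical MLE:", round(res.x, 4))   # ~ 0.7
  print("closed form  :", heads / n)         # 0.7, the known Bernoulli MLE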

SLIDE 13

A simple example

definition

Problem: We want to estimate the average number of characters in tweets.
Data: We have two data sets (samples):

small x = (87, 101, 88, 45, 138)
  – The mean of the sample (x̄) is 91.8
  – The variance of the sample (sd²) is 1111.7 (sd = 33.34)

large x = (87, 101, 88, 45, 138, 66, 79, 78, 140, 102)
  – x̄ = 92.4
  – sd² = 876.71 (sd = 29.61)
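These statistics are easy to verify, for example in Python:

  import numpy as np

  small = np.array([87, 101, 88, 45, 138])
  large = np.array([87, 101, 88, 45, 138, 66, 79, 78, 140, 102])

  for name, x in [("small", small), ("large", large)]:
      sd = x.std(ddof=1)   # ddof=1: sample sd (divide by n-1), as on this slide
      print(name, "mean:", x.mean(), "var:", round(sd**2, 2), "sd:", round(sd, 2))
  # small mean: 91.8 var: 1111.7 sd: 33.34
  # large mean: 92.4 var: 876.71 sd: 29.61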

SLIDE 14

A simple example

the task

  • We are interested in the mean of all tweets (a large population)
  • We only have samples
  • Questions:
    – Given a sample, what is the most likely population mean?
    – How certain is our estimate of the population mean?

SLIDE 15

A simple example

the model

  y = µ + ϵ, where ϵ ∼ N(0, σ²)

Equivalently, y ∼ N(µ, σ²).

  • The model is known as the mean/constant/intercept model
  • It is related to well-known statistical tests such as the t-test (we won’t cover it here)

We are normally interested in conditional models, models with predictors. A small simulation of this model follows.
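A two-line simulation sketch of this model (with made-up values for µ and σ):

  import numpy as np

  # Simulate y = mu + epsilon with epsilon ~ N(0, sigma^2); illustrative parameters
  rng = np.random.default_rng(1)
  mu, sigma = 92.0, 30.0
  y = mu + rng.normal(0, sigma, size=1000)   # equivalently rng.normal(mu, sigma, 1000)
  print(y.mean(), y.std())                   # close to mu and sigma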

SLIDE 16

A simple example

Bayesian estimation / inference

We simply use Bayes’ formula:

  p(µ|x) = p(x|µ) p(µ) / p(x)

  • With a vague prior (high variance/entropy), the posterior mean is (almost) the same as the mean of the data
  • With a prior with lower variance, the posterior is between the prior mean and the data mean
  • The posterior variance indicates the uncertainty of our estimate; with more data, we get a more certain estimate
  • With a normal prior, the posterior will also be normal, and can be calculated analytically (see the sketch below)
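The analytic update is the standard conjugate normal-normal formula; a sketch (treating the sample sd as the known σ) that reproduces the numbers on the next three slides:

  import numpy as np

  def normal_posterior(x, sigma, prior_mu, prior_sd):
      # Posterior for the mean of N(mu, sigma^2) data with known sigma:
      # precisions (1/variance) add, and the posterior mean is a
      # precision-weighted average of the prior mean and the data mean.
      n = len(x)
      post_var = 1 / (1 / prior_sd**2 + n / sigma**2)
      post_mu = post_var * (prior_mu / prior_sd**2 + n * np.mean(x) / sigma**2)
      return post_mu, np.sqrt(post_var)

  small = [87, 101, 88, 45, 138]
  large = [87, 101, 88, 45, 138, 66, 79, 78, 140, 102]

  print(normal_posterior(small, 33.34, 70, 1000))  # ~ (91.78, 14.91): vague prior
  print(normal_posterior(large, 29.61, 70, 1000))  # ~ (92.39, 9.36)
  print(normal_posterior(small, 33.34, 70, 50))    # ~ (90.02, 14.29): stronger prior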

SLIDE 17

A simple example

Bayesian estimation: vague prior, small sample

[Figure: densities of the prior N(70, 1000²), likelihood N(91.8, 33.34²), and posterior N(91.78, 14.91²)]

SLIDE 18

A simple example

Bayesian estimation: vague prior, larger sample

[Figure: densities of the prior N(70, 1000²), likelihood N(92.4, 29.61²), and posterior N(92.39, 9.36²)]

SLIDE 19

A simple example

Bayesian estimation: stronger prior, small sample

[Figure: densities of the prior N(70, 50²), likelihood N(91.8, 33.34²), and posterior N(90.02, 14.29²)]

SLIDE 20

A simple example

MLE estimation

  µ̂ = arg max_µ L(µ; x)
     = arg max_µ p(x|µ)
     = arg max_µ ∏_{x∈x} p(x|µ)
     = arg max_µ ∏_{x∈x} (1 / (σ√2π)) e^(−(x−µ)² / 2σ²)
     = x̄

  • For the 5-tweet sample: µ̂ = x̄ = 91.8 (cf. the posterior mean 91.78)
  • For the 10-tweet sample: µ̂ = x̄ = 92.4 (cf. the posterior mean 92.39)
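A quick numeric check (a sketch; constants and σ are dropped, since they do not affect the argmax) that the sample mean maximizes the log-likelihood:

  import numpy as np

  x = np.array([87, 101, 88, 45, 138])
  mu_grid = np.linspace(0, 200, 20001)    # step 0.01

  # log L(mu; x) up to additive/multiplicative constants: -sum_i (x_i - mu)^2;
  # sigma only rescales this, so it cannot change where the maximum is
  log_lik = -((x[:, None] - mu_grid) ** 2).sum(axis=0)
  print("argmax mu:", mu_grid[log_lik.argmax()])   # 91.8, the sample mean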

SLIDE 21

Classical (frequentist) inference

  • We express the uncertainty in terms of the sampling distribution
  • The central limit theorem says that the means of samples of size n have a standard deviation of SE_x̄ = sd_x / √n
    – For the 5-tweet sample: SE_x̄ = 33.34/√5 = 14.91
    – For the 10-tweet sample: SE_x̄ = 29.61/√10 = 9.36
  • A rough estimate for a 95% confidence interval is x̄ ± 2 SE_x̄
    – For the 5-tweet sample: 91.8 ± 2 × 14.91 = [61.98, 121.62]
    – For the 10-tweet sample: 92.4 ± 2 × 9.36 = [73.68, 111.12]
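The same numbers in a few lines of Python (a quick check):

  import numpy as np

  for x in ([87, 101, 88, 45, 138],
            [87, 101, 88, 45, 138, 66, 79, 78, 140, 102]):
      x = np.array(x)
      se = x.std(ddof=1) / np.sqrt(len(x))            # standard error of the mean
      lo, hi = x.mean() - 2 * se, x.mean() + 2 * se   # rough 95% CI
      print(f"n={len(x)}: mean={x.mean():.1f}, SE={se:.2f}, CI=[{lo:.2f}, {hi:.2f}]")
  # n=5:  mean=91.8, SE=14.91, CI=[61.98, 121.62]
  # n=10: mean=92.4, SE=9.36, CI=[73.67, 111.13]  (with the unrounded SE)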

SLIDE 22

Confidence intervals

[Figure: sampling distribution of the sample mean, x̄ ∼ N(91.8, 14.91²), with the central 95% of sample means marked]

SLIDE 23

Summary / concluding remarks

  • Statistical models are important tools in statistical analysis and machine learning
  • There are two major approaches to estimation and inference:
    – The Bayesian approach admits a prior distribution, and uses probability theory for inference
    – The frequentist approach emphasizes unbiased estimates (often MLE); inference is based on the sampling distribution
  • The results often agree, but not necessarily

SLIDE 24

Next

  Wed: N-gram language models (1)
  Fri: Exercises
  Mon: ML intro: regression and logistic regression

SLIDE 25

Further reading / references

Box, George E. P. and Norman R. Draper (1986). Empirical Model-Building and Response Surfaces. New York: John Wiley & Sons. ISBN: 0-471-81033-9.