Counting Words: Type-rich populations, samples and statistical models
Populations & samples
Baroni & Evert


  3. . . . and its solution
     ➥ We need a model for the population
     ◮ This model embodies our hypothesis that the distribution of type probabilities has a certain general shape (more precisely, we speak of a family of models)
     ◮ The exact form of the distribution is then determined by a small number of parameters (typically 2 or 3)
     ◮ These parameters can be estimated with relative ease

  4. Examples of population models
     [Figure: four example population models, plotting the type probabilities π_k against the type index k for k = 1 … 50]

  8. The Zipf-Mandelbrot law as a population model
     What is the right family of models for lexical frequency distributions?
     ◮ We have already seen that the Zipf-Mandelbrot law captures the distribution of observed frequencies very well, across many phenomena and data sets
     ◮ Re-phrase the law for type probabilities instead of frequencies:
           π_k := C / (k + b)^a
     ◮ Two free parameters: a > 1 and b ≥ 0
     ◮ C is not a parameter but a normalization constant, needed to ensure that ∑_k π_k = 1
     ➥ the Zipf-Mandelbrot population model
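The formula above is easy to explore numerically. The following sketch (not part of the original slides) computes ZM type probabilities in Python; the helper name zm_probabilities and the use of SciPy's Hurwitz zeta function for the normalization constant are my own choices.

```python
# Sketch: Zipf-Mandelbrot type probabilities pi_k = C / (k + b)^a.
# The normalization constant C comes from the Hurwitz zeta function,
# since sum_{k>=1} (k + b)^(-a) = zeta(a, b + 1) for a > 1.
import numpy as np
from scipy.special import zeta

def zm_probabilities(a, b, k_max):
    """Return pi_1, ..., pi_{k_max} of the (infinite) ZM population model."""
    C = 1.0 / zeta(a, b + 1.0)
    k = np.arange(1, k_max + 1)
    return C / (k + b) ** a

# The four parameter settings shown on the next slide
for a, b in [(1.2, 1.5), (2.0, 10.0), (2.0, 15.0), (5.0, 40.0)]:
    pi = zm_probabilities(a, b, k_max=50)
    print(f"a={a}, b={b}: pi_1={pi[0]:.4f}, pi_50={pi[-1]:.6f}")
```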

  10. The parameters of the Zipf-Mandelbrot model
     [Figures (slides 9–10): π_k against k for the parameter settings (a = 1.2, b = 1.5), (a = 2, b = 10), (a = 2, b = 15) and (a = 5, b = 40), shown on linear axes and on double-logarithmic axes]

  13. The finite Zipf-Mandelbrot model
     ◮ The Zipf-Mandelbrot population model characterizes an infinite type population: there is no upper bound on k, and the type probabilities π_k can become arbitrarily small
     ◮ π = 10^−6 (once every million words), π = 10^−9 (once every billion words), π = 10^−12 (once on the entire Internet), π = 10^−100 (once in the universe?)
     ◮ Alternative: a finite (but often very large) number of types in the population
     ◮ We call this the population vocabulary size S (and write S = ∞ for an infinite type population)

  16. The finite Zipf-Mandelbrot model
     ◮ The finite Zipf-Mandelbrot model simply stops after the first S types (w_1, …, w_S)
     ◮ S becomes a new parameter of the model ➜ the finite Zipf-Mandelbrot model has 3 parameters
     ◮ NB: C will not have the same value as for the corresponding infinite ZM model
     Abbreviations: ZM for the Zipf-Mandelbrot model, fZM for the finite Zipf-Mandelbrot model
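A minimal companion sketch for the finite model, following the description on this slide: keep only the first S types and re-normalize. The helper name fzm_probabilities is my own, and this is an illustration of the idea rather than the exact formulation used in the course software.

```python
# Sketch: finite Zipf-Mandelbrot (fZM) probabilities -- same shape as ZM,
# but only S types, so the normalization constant C differs from the ZM case.
import numpy as np

def fzm_probabilities(a, b, S):
    """Return pi_1, ..., pi_S with pi_k proportional to 1 / (k + b)^a."""
    k = np.arange(1, S + 1)
    weights = 1.0 / (k + b) ** a
    return weights / weights.sum()   # re-normalize so that sum_k pi_k = 1

pi = fzm_probabilities(a=2.0, b=15.0, S=1000)
print(pi.sum(), pi[0])               # 1.0 (up to rounding) and pi_1
```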

  20. The next steps
     Once we have a population model . . .
     ◮ We still need to estimate the values of its parameters
        ◮ we'll see later how we can do this
     ◮ We want to simulate random samples from the population described by the model
        ◮ basic assumption: real data sets (such as corpora) are random samples from this population
        ◮ this allows us to predict vocabulary growth, the number of previously unseen types as more text is added to a corpus, the frequency spectrum of a larger data set, etc.
        ◮ it will also allow us to estimate the model parameters

  21. Outline
     The type population
     Sampling from the population
     Parameter estimation
     A practical example

  23. Sampling from a population model
     Assume we believe that the population we are interested in can be described by a Zipf-Mandelbrot model:
     [Figure: π_k against k for a ZM model with a = 3 and b = 50, on linear and logarithmic axes]
     Use computer simulation to sample from this model:
     ◮ Draw N tokens from the population such that in each step, type w_k has probability π_k to be picked
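A small simulation sketch of this sampling step. It reuses the fzm_probabilities() helper from the earlier sketch and truncates the a = 3, b = 50 model at S = 10,000 types as a stand-in for the infinite ZM population shown above; the seed and sample size are arbitrary choices.

```python
# Sketch: draw N tokens such that, at each step, type w_k is picked
# with probability pi_k (sampling with replacement).
import numpy as np

rng = np.random.default_rng(42)                      # arbitrary seed
pi = fzm_probabilities(a=3.0, b=50.0, S=10_000)      # truncated stand-in for the ZM model
types = np.arange(1, len(pi) + 1)                    # population ranks k = 1 ... S
sample = rng.choice(types, size=1000, p=pi)          # N = 1000 tokens
print(sample[:9])                                    # a token sequence like "#1" on the next slides
```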

  29. Sampling from a population model
     #1:   1  42  34  23 108  18  48  18   1 . . .
           time order room school town course area course time . . .
     #2: 286  28  23  36   3   4   7   4   8 . . .
     #3:   2  11 105  21  11  17  17   1  16 . . .
     #4:  44   3 110  34 223   2  25  20  28 . . .
     #5:  24  81  54  11   8  61   1  31  35 . . .
     #6:   3  65   9 165   5  42  16  20   7 . . .
     #7:  10  21  11  60 164  54  18  16 203 . . .
     #8:  11   7 147   5  24  19  15  85  37 . . .
     . . .

  33. Sampling from a population model
     In this way, we can . . .
     ◮ draw samples of arbitrary size N
        ◮ the computer can do it efficiently even for large N
     ◮ draw as many samples as we need
     ◮ compute type frequency lists, frequency spectra and vocabulary growth curves from these samples
        ◮ i.e., we can analyze them with the same methods that we have applied to the observed data sets
     Here are some results for samples of size N = 1000 . . .

  34. Samples: type frequency list & spectrum (sample #1)

     Type frequency list:            Frequency spectrum:
     rank r   f_r   type k            m    V_m
          1    37        6            1     83
          2    36        1            2     22
          3    33        3            3     20
          4    31        7            4     12
          5    31       10            5     10
          6    30        5            6      5
          7    28       12            7      5
          8    27        2            8      3
          9    24        4            9      3
         10    24       16           10      3
         11    23        8
         12    22       14
        ...   ...      ...

  35. Samples: type frequency list & spectrum (sample #2)

     Type frequency list:            Frequency spectrum:
     rank r   f_r   type k            m    V_m
          1    39        2            1     76
          2    34        3            2     27
          3    30        5            3     17
          4    29       10            4     10
          5    28        8            5      6
          6    26        1            6      5
          7    25       13            7      7
          8    24        7            8      3
          9    23        6           10      4
         10    23       11           11      2
         11    20        4
         12    19       17
        ...   ...      ...
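For concreteness, here is a sketch of how the two tables above can be computed from a simulated sample (continuing the earlier sketches; the function names are my own).

```python
# Sketch: type frequency list (Zipf ranking) and frequency spectrum of a sample.
from collections import Counter

def type_frequency_list(sample):
    """Return [(rank r, frequency f_r, population type k), ...], highest f_r first."""
    counts = Counter(sample)
    ranked = sorted(counts.items(), key=lambda item: -item[1])
    return [(r, f, k) for r, (k, f) in enumerate(ranked, start=1)]

def frequency_spectrum(sample):
    """Return {m: V_m}, the number of types occurring exactly m times."""
    return Counter(Counter(sample).values())

flist = type_frequency_list(sample)          # 'sample' from the sampling sketch above
spectrum = frequency_spectrum(sample)
print(flist[:5])                             # top of the Zipf ranking
print(sorted(spectrum.items())[:5])          # (1, V_1), (2, V_2), ...
```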

  37. Random variation in type-frequency lists
     [Figure: sample #1 vs. sample #2; the top row plots frequency against the Zipf rank (r ↔ f_r), the bottom row against the population type index (k ↔ f_k)]

  41. Random variation in type-frequency lists
     ◮ Random variation leads to different type frequencies f_k in every new sample
        ◮ particularly obvious when we plot them in population order (bottom row, k ↔ f_k)
     ◮ Different ordering of types in the Zipf ranking for every new sample
        ◮ Zipf rank r in the sample ≠ population rank k!
        ◮ leads to severe problems with statistical methods
     ◮ Individual types are irrelevant for our purposes, so let us take a perspective that abstracts away from them
        ◮ frequency spectrum
        ◮ vocabulary growth curve
     ➥ a considerable amount of random variation is still visible

  42. Random variation: frequency spectrum
     [Figure: frequency spectra V_m of samples #1–#4]

  43. Random variation: vocabulary growth curve
     [Figure: vocabulary growth curves V(N) and V_1(N) of samples #1–#4, for N = 0 … 1000]
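A sketch of how such vocabulary growth curves can be traced for a simulated sample: record the vocabulary size V(N) and the number of hapax legomena V_1(N) as the sample grows. The function name and the measurement step are my own choices.

```python
# Sketch: vocabulary growth curve V(N) and hapax growth curve V_1(N).
from collections import Counter

def growth_curves(sample, step=50):
    """Return (sizes, V, V1) measured every `step` tokens."""
    counts = Counter()
    sizes, V, V1 = [], [], []
    for i, token in enumerate(sample, start=1):
        counts[token] += 1
        if i % step == 0:
            sizes.append(i)
            V.append(len(counts))
            V1.append(sum(1 for f in counts.values() if f == 1))
    return sizes, V, V1

sizes, V, V1 = growth_curves(sample)        # 'sample' from the sampling sketch above
print(sizes[-1], V[-1], V1[-1])             # N, V(N), V_1(N) at the full sample size
```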

  47. Expected values
     ◮ There is no reason why we should choose a particular sample to make a prediction for the real data – each one is equally likely or unlikely
     ➥ Take the average over a large number of samples
     ◮ Such averages are called expected values or expectations in statistics (frequentist approach)
     ◮ Notation: E[V(N)] and E[V_m(N)]
        ◮ indicates that we are referring to expected values for a sample of size N
        ◮ rather than to the specific values V and V_m observed in a particular sample or a real-world data set
        ◮ usually we can omit the sample size: E[V] and E[V_m]
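A brute-force sketch of such expectations: draw many samples and average V and V_1 over them. The number of replicates and the helper names are my own choices; the closed-form shortcut mentioned on the next slides is sketched further below.

```python
# Sketch: Monte Carlo estimate of E[V(N)] and E[V_1(N)] by averaging
# over many simulated samples from the same population model.
import numpy as np
from collections import Counter

def V_and_V1(sample):
    counts = Counter(sample)
    return len(counts), sum(1 for f in counts.values() if f == 1)

rng = np.random.default_rng(1)
pi = fzm_probabilities(a=3.0, b=50.0, S=10_000)   # helper from the earlier sketch
types = np.arange(1, len(pi) + 1)

results = np.array([V_and_V1(rng.choice(types, size=1000, p=pi))
                    for _ in range(200)])
print("E[V] ~", results[:, 0].mean(), "  E[V_1] ~", results[:, 1].mean())
```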

  48. The expected frequency spectrum
     [Figure: observed spectra V_m of samples #1–#4 compared with the expected spectrum E[V_m]]

  49. The expected vocabulary growth curve
     [Figure: observed growth curves V(N) and V_1(N) of sample #1 compared with the expected curves E[V(N)] and E[V_1(N)]]

  51. Great expectations made easy
     ◮ Fortunately, we don't have to take many thousands of samples to calculate expectations: there is a (relatively simple) mathematical solution (➜ Wednesday)
     ◮ This solution also allows us to estimate the amount of random variation ➜ variance and confidence intervals
        ◮ example: expected VGCs with confidence intervals
        ◮ we won't pursue variance any further in this course
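The slides defer the mathematics to a later session; as a preview, the sketch below uses the standard formulas that follow from the random-sampling assumption, E[V(N)] = ∑_k (1 − (1 − π_k)^N) and E[V_m(N)] = ∑_k P(X_k = m) with X_k ~ Binomial(N, π_k). Treat this as an illustration, not necessarily the exact formulation used in the course.

```python
# Sketch: closed-form expected vocabulary size and frequency spectrum
# under the random-sample (binomial) model.
import numpy as np
from scipy.stats import binom

def expected_V(pi, N):
    """E[V(N)] = sum_k (1 - (1 - pi_k)^N)."""
    return float(np.sum(1.0 - (1.0 - pi) ** N))

def expected_Vm(pi, N, m):
    """E[V_m(N)] = sum_k C(N, m) * pi_k^m * (1 - pi_k)^(N - m)."""
    return float(np.sum(binom.pmf(m, N, pi)))

pi = fzm_probabilities(a=3.0, b=50.0, S=10_000)   # helper from the earlier sketch
print(expected_V(pi, 1000), expected_Vm(pi, 1000, m=1))
```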

  52. Confidence intervals for the expected VGC
     [Figure: expected growth curves E[V(N)] and E[V_1(N)] with confidence intervals, compared with the observed curves of sample #1]

  55. A mini-example
     ◮ G. K. Zipf claimed that the distribution of English word frequencies follows Zipf's law with a ≈ 1
        ◮ a ≈ 1.5 seems a more reasonable value when you look at larger text samples than Zipf did
     ◮ The most frequent word in English is the with π ≈ .06
     ◮ A Zipf-Mandelbrot law with a = 1.5 and b = 7.5 yields a population model where π_1 ≈ .06 (by trial & error)
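A quick numerical check of the trial-and-error claim above (my own sketch, reusing SciPy's Hurwitz zeta function for the normalization): with a = 1.5 and b = 7.5 the infinite ZM model indeed gives π_1 close to .06.

```python
# Sketch: pi_1 of the infinite ZM model with a = 1.5 and b = 7.5.
from scipy.special import zeta

a, b = 1.5, 7.5
C = 1.0 / zeta(a, b + 1.0)     # normalization constant of the infinite model
pi_1 = C / (1.0 + b) ** a
print(round(pi_1, 3))          # about 0.057, i.e. close to the target pi = .06
```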

  59. A mini-example
     ◮ How many different words do we expect to find in a 1-million-word text?
        ◮ N = 1,000,000 ➜ E[V(N)] = 33026.7
        ◮ 95% confidence interval: V(N) = 32753.6 … 33299.7
     ◮ How many do we really find?
        ◮ Brown corpus: 1 million words of edited American English
        ◮ V = 45215 ➜ the ZM model is not quite right
     ◮ Physicists (and some mathematicians) are happy as long as they get the order of magnitude right . . .
     ☞ The model was not based on actual data!

  60. Outline
     The type population
     Sampling from the population
     Parameter estimation
     A practical example

  63. Estimating model parameters
     ◮ Parameter settings in the mini-example were based on general assumptions (claims from the literature)
     ◮ But we also have empirical data on the word frequency distribution of English available (the Brown corpus)
     ◮ Choose the parameters so that the population model matches the empirical distribution as well as possible
     ◮ E.g. by trial and error . . .
        ◮ guess parameters
        ◮ compare model predictions for a sample of size N_0 with the observed data (N_0 tokens), based on the frequency spectrum or vocabulary growth curve
        ◮ change parameters & repeat until satisfied
     ◮ This process is called parameter estimation

  70. Parameter estimation by trial & error
     [Figures (slides 64–70): observed vs. expected frequency spectrum (V_m vs. E[V_m]) and vocabulary growth curve (V(N) vs. E[V(N)]) for ZM models with
      a = 1.5, b = 7.5 · a = 1.3, b = 7.5 · a = 1.3, b = 0.2 · a = 1.5, b = 7.5 · a = 1.7, b = 7.5 · a = 1.7, b = 80 · a = 2, b = 550]

  73. Automatic parameter estimation
     ◮ Parameter estimation by trial & error is tedious ➜ let the computer do the work!
     ◮ We need a cost function to quantify the "distance" between the model expectations and the observed data
        ◮ based on vocabulary size and frequency spectrum (these are the most convenient criteria)
     ◮ The computer estimates the parameters by automatic minimization of the cost function
        ◮ clever algorithms exist that find out quickly in which direction they have to "push" the parameters to approach the minimum
        ◮ implemented in standard software packages

  78. Cost functions for parameter estimation
     ◮ Cost functions compare the expected frequency spectrum E[V_m(N_0)] with the observed spectrum V_m(N_0)
     ◮ Choice #1: how to weight the differences
        ◮ absolute values of differences: ∑_{m=1}^{M} |V_m − E[V_m]|
        ◮ mean squared error: (1/M) ∑_{m=1}^{M} (V_m − E[V_m])²
        ◮ chi-squared criterion: scale by estimated variances
     ◮ Choice #2: how many spectrum elements to use
        ◮ typically between M = 2 and M = 15
        ◮ what happens if M < the number of parameters?
     ◮ For many applications, it is important to match V precisely: additional constraint E[V(N_0)] = V(N_0)
        ◮ general principle: you can match as many constraints as there are free parameters in the model
     ◮ A felicitous choice of cost function and M can substantially improve the quality of the estimated model
        ◮ It isn't a science, it's an art . . .
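To make the procedure concrete, here is a hedged sketch of automatic estimation with a mean-squared-error cost over the first M spectrum elements and a general-purpose optimizer. The observed spectrum values are V_1 … V_5 of sample #1 from the earlier slide (N_0 = 1000); the fzm_probabilities() and expected_Vm() helpers are the earlier sketches, and the whole setup is an illustration rather than the estimation method implemented in the course software.

```python
# Sketch: estimate (a, b) by minimizing a cost function that compares the
# expected spectrum E[V_m(N0)] with an observed spectrum V_m(N0).
import numpy as np
from scipy.optimize import minimize

obs_spectrum = {1: 83, 2: 22, 3: 20, 4: 12, 5: 10}   # V_1 ... V_5 of sample #1
N0, M, S = 1000, 5, 10_000
obs_spec = np.array([obs_spectrum[m] for m in range(1, M + 1)])

def cost(params):
    a, b = params
    if a <= 1.0 or b < 0.0:          # keep the optimizer inside the valid parameter region
        return 1e9
    pi = fzm_probabilities(a, b, S)
    exp_spec = np.array([expected_Vm(pi, N0, m) for m in range(1, M + 1)])
    return float(np.mean((obs_spec - exp_spec) ** 2))   # mean squared error

result = minimize(cost, x0=[1.5, 7.5], method="Nelder-Mead")
print(result.x)                      # estimated (a, b)
```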

  83. Goodness-of-fit
     ◮ The automatic estimation procedure minimizes the cost function until no further improvement can be found
        ◮ this is a so-called local minimum of the cost function
        ◮ not necessarily the global minimum that we want to find
     ◮ Key question: is the estimated model good enough?
     ◮ In other words: does the model provide a plausible explanation of the observed data as a random sample from the population?
     ◮ Can be measured by a goodness-of-fit test
        ◮ use special tests for such models (Baayen 2001)
        ◮ the p-value specifies whether the model is plausible
        ◮ small p-value ➜ reject the model as an explanation for the data
        ➥ we want to achieve a high p-value
     ◮ Typically, we find p < .001 – but the models can still be useful for many purposes!
