Counting Words: Type probabilities Population models Type-rich - - PowerPoint PPT Presentation

Populations & samples Baroni & Evert The population

Type probabilities Population models ZM & fZM

Sampling from the population

Random samples Expectation Mini-example

Parameter estimation

Trial & error Automatic estimation

A practical example

Counting Words: Type-rich populations, samples, and statistical models

Marco Baroni & Stefan Evert, Málaga, 8 August 2006

Outline

◮ The type population
◮ Sampling from the population
◮ Parameter estimation
◮ A practical example

Why we need the population

There are two reasons why we want to construct a model of the type population distribution:

◮ The population distribution is interesting in itself, for theoretical reasons or in NLP applications
◮ We know how to simulate sampling from the population
  ➜ once we have a population model, we can obtain estimates of $V(N)$, $V_1(N)$ and similar quantities for arbitrary sample sizes $N$

A third reason:

◮ The bell-bottom shape of the observed Zipf ranking does not fit Zipf’s law (type frequencies must be integers!)
◮ It is more natural to characterize occurrence probabilities (for which there is no such restriction) by Zipf’s law
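
Here is a minimal sketch of how a population model yields such estimates. It assumes, as in the sampling experiments later on, that the $N$ tokens are drawn independently. Under that assumption, type $w_k$ is missing from the sample with probability $(1-\pi_k)^N$ and occurs exactly once with probability $N\pi_k(1-\pi_k)^{N-1}$. The expected values of $V(N)$ and $V_1(N)$ are therefore plain sums over the population. The function names and the two-type toy population are hypothetical.

```python
from typing import Sequence

def expected_vocabulary(pi: Sequence[float], N: int) -> float:
    """E[V(N)]: expected number of distinct types in a sample of N tokens."""
    # A type is observed unless all N independent draws miss it.
    return sum(1.0 - (1.0 - p) ** N for p in pi)

def expected_hapaxes(pi: Sequence[float], N: int) -> float:
    """E[V_1(N)]: expected number of types that occur exactly once."""
    # Binomial probability of exactly one hit among N independent draws.
    return sum(N * p * (1.0 - p) ** (N - 1) for p in pi)

toy_population = [0.5, 0.5]  # hypothetical population with two equiprobable types
for n in (1, 2, 10):
    print(n, expected_vocabulary(toy_population, n), expected_hapaxes(toy_population, n))
```

For a realistic model, `pi` would hold ZM or fZM probabilities (truncated to a finite list). The same two functions then give predictions for any sample size $N$.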

A population of types

◮ A type population is characterized by
  a) a set of types $w_k$
  b) the corresponding occurrence probabilities $\pi_k$
◮ The actual “identities” of the types are irrelevant (for word frequency distributions)
  ◮ we don’t care whether $w_{43194}$ is wormhole or heatwave
◮ It is customary (and convenient) to arrange types in order of decreasing probability: $\pi_1 \ge \pi_2 \ge \pi_3 \ge \cdots$
◮ NB: this is usually not the same ordering as in the observed Zipf ranking (we will see examples of this later)
Today’s quiz . . .

Does everybody remember what probabilities are?

◮ $0 \le \pi_k \le 1$ (for all $k$)
◮ $\sum_k \pi_k = \pi_1 + \pi_2 + \pi_3 + \cdots = 1$
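
Both conditions are easy to check for a candidate list of type probabilities. This is a minimal sketch; the function name and the floating-point tolerance are illustrative assumptions.

```python
import math
from typing import Sequence

def is_probability_vector(pi: Sequence[float], tol: float = 1e-9) -> bool:
    """Check the two defining properties of type probabilities."""
    # Condition 1: every pi_k lies in [0, 1].
    if any(p < 0.0 or p > 1.0 for p in pi):
        return False
    # Condition 2: the pi_k sum to 1 (up to floating-point tolerance).
    return math.isclose(math.fsum(pi), 1.0, rel_tol=0.0, abs_tol=tol)

print(is_probability_vector([0.5, 0.3, 0.2]))  # True
print(is_probability_vector([0.6, 0.6]))       # False: the sum is 1.2
```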

Today’s quiz (cont’d)

And what their interpretation is?

◮ $\pi_k$ = relative frequency of $w_k$ in a huge body of text
  ◮ e.g. population = “written English”, formalized as all English writing that has ever been published
  ◮ also: $\pi_k$ = chance that a token drawn at random belongs to type $w_k$
◮ $\pi_k$ = output probability for $w_k$ in a generative model
  ◮ e.g. psycholinguistic model of a human speaker
  ◮ $\pi_k$ = probability that the next word uttered by the speaker belongs to type $w_k$ (without knowledge about context and previous words)
◮ analogous interpretations for other linguistic and non-linguistic phenomena

The problem with probabilities . . .

◮ We cannot measure these probabilities directly
◮ In principle, such probabilities can be estimated from a sample (that’s what most of statistics is about), e.g. $\pi \approx f/n$
◮ But we cannot reliably estimate thousands or millions of $\pi_k$’s from any finite sample (just think of all the unseen types that do not occur in the sample)
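
A small simulation illustrates the problem. It assumes, purely for illustration, a hypothetical population of 10,000 equiprobable types, each with $\pi_k = 10^{-4}$. In a sample of $n = 1000$ tokens, every observed type receives an estimate $f/n \ge 0.001$, which is ten times its true probability. At the same time, at least 9,000 types are never seen and receive an estimate of 0.

```python
import random
from collections import Counter

def relative_frequency_estimates(tokens):
    """Estimate pi_k as f_k / n for every type that occurs in the sample."""
    n = len(tokens)
    return {w: f / n for w, f in Counter(tokens).items()}

population = list(range(1, 10_001))  # hypothetical: 10,000 types, pi_k = 1e-4 each
rng = random.Random(42)
sample = rng.choices(population, k=1000)  # n = 1000 tokens

estimates = relative_frequency_estimates(sample)
unseen = len(population) - len(estimates)  # unseen types get the estimate 0
print(f"observed types: {len(estimates)}, unseen types: {unseen}")
print(f"smallest nonzero estimate: {min(estimates.values())}")
```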

. . . and its solution

➥ We need a model for the population

◮ This model embodies our hypothesis that the distribution of type probabilities has a certain general shape (more precisely, we speak of a family of models)
◮ The exact form of the distribution is then determined by a small number of parameters (typically 2 or 3)
◮ These parameters can be estimated with relative ease

Examples of population models

[Figure: four example population models, each plotting type probability $\pi_k$ against population rank $k$ for $k = 1, \ldots, 50$]

The Zipf-Mandelbrot law as a population model

What is the right family of models for lexical frequency distributions?

◮ We have already seen that the Zipf-Mandelbrot law captures the distribution of observed frequencies very well, across many phenomena and data sets
◮ Re-phrase the law for type probabilities instead of frequencies: $\pi_k := C/(k + b)^a$
◮ Two free parameters: $a > 1$ and $b \ge 0$
◮ $C$ is not a parameter but a normalization constant, needed to ensure that $\sum_k \pi_k = 1$

➥ the Zipf-Mandelbrot population model
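
Since $C$ is fixed by $a$ and $b$, it can be computed numerically. This minimal sketch uses only the standard library. The infinite sum is evaluated exactly for the first `terms` types, and the remaining tail is approximated by an integral. That approximation is chosen here for simplicity; a Hurwitz zeta function, where available, gives the same quantity. The case $a = 2$, $b = 0$ serves as a check: $\sum_k 1/k^2 = \pi^2/6$, so $C = 6/\pi^2$.

```python
import math

def zm_normalizer(a: float, b: float, terms: int = 100_000) -> float:
    """Approximate C = 1 / sum_{k>=1} (k + b)^(-a) for the infinite ZM model.

    The series is summed for k = 1..terms; the tail is approximated by the
    integral of (x + b)^(-a) from terms + 0.5 to infinity. It converges only for a > 1.
    """
    if a <= 1 or b < 0:
        raise ValueError("the ZM model requires a > 1 and b >= 0")
    head = math.fsum((k + b) ** -a for k in range(1, terms + 1))
    tail = (terms + 0.5 + b) ** (1 - a) / (a - 1)
    return 1.0 / (head + tail)

def zm_prob(k: int, a: float, b: float, C: float) -> float:
    """Type probability pi_k = C / (k + b)^a for population rank k >= 1."""
    return C / (k + b) ** a

C = zm_normalizer(2.0, 0.0)
print(C, 6 / math.pi ** 2)  # the two values should agree closely
```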

The parameters of the Zipf-Mandelbrot model

[Figure: ZM type probabilities $\pi_k$ against rank $k$ on linear axes ($k = 1, \ldots, 50$), in four panels: $a = 1.2, b = 1.5$; $a = 2, b = 10$; $a = 2, b = 15$; $a = 5, b = 40$]

The parameters of the Zipf-Mandelbrot model

[Figure: the same four ZM models ($a = 1.2, b = 1.5$; $a = 2, b = 10$; $a = 2, b = 15$; $a = 5, b = 40$) plotted on logarithmic axes ($k = 1, \ldots, 100$)]
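
The logarithmic view follows directly from the formula. Taking logarithms gives $\log \pi_k = \log C - a \log(k + b)$. For $k \gg b$ the curve therefore approaches a straight line with slope $-a$, while for small $k$ the offset $b$ flattens it. Here is a quick numerical check; the rank pairs are arbitrary choices for illustration.

```python
import math

def loglog_slope(k1: int, k2: int, a: float, b: float) -> float:
    """Slope of the ZM curve between ranks k1 and k2 on log-log axes.

    C cancels out: log(pi_k2 / pi_k1) = -a * log((k2 + b) / (k1 + b)).
    """
    return -a * math.log((k2 + b) / (k1 + b)) / math.log(k2 / k1)

for a, b in [(1.2, 1.5), (2, 10), (2, 15), (5, 40)]:
    head = loglog_slope(1, 2, a, b)            # near the top of the ranking
    tail = loglog_slope(10_000, 20_000, a, b)  # far out in the tail
    print(f"a={a}, b={b}: slope near k=1 {head:.3f}, near k=10^4 {tail:.3f}")
```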

The finite Zipf-Mandelbrot model

◮ The Zipf-Mandelbrot population model characterizes an infinite type population: there is no upper bound on $k$, and the type probabilities $\pi_k$ can become arbitrarily small
  ◮ $\pi = 10^{-6}$ (once every million words), $\pi = 10^{-9}$ (once every billion words), $\pi = 10^{-12}$ (once on the entire Internet), $\pi = 10^{-100}$ (once in the universe?)
◮ Alternative: a finite (but often very large) number of types in the population
  ◮ We call this the population vocabulary size $S$ (and write $S = \infty$ for an infinite type population)
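
To see how quickly an infinite ZM population reaches such tiny probabilities, note that $C/(k + b)^a < p$ holds exactly when $k > (C/p)^{1/a} - b$. The sketch below takes $a = 2$, $b = 0$, for which $C = 6/\pi^2$ is known exactly; the function name is hypothetical. The correction loops step one rank at a time. They are meant for moderate thresholds, where adjacent ranks are still distinguishable in floating point, not for values like $10^{-100}$.

```python
import math

def first_rank_below(p: float, a: float, b: float, C: float) -> int:
    """Smallest population rank k >= 1 with pi_k = C / (k + b)^a < p."""
    # Closed-form starting point from k > (C/p)^(1/a) - b ...
    k = max(1, math.floor((C / p) ** (1 / a) - b))
    # ... then correct for floating-point rounding in either direction.
    while C / (k + b) ** a >= p:
        k += 1
    while k > 1 and C / (k - 1 + b) ** a < p:
        k -= 1
    return k

C = 6 / math.pi ** 2  # exact ZM constant for a = 2, b = 0
for p in (1e-6, 1e-9, 1e-12):
    print(p, first_rank_below(p, a=2, b=0, C=C))
```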

The finite Zipf-Mandelbrot model

◮ The finite Zipf-Mandelbrot model simply stops after the first $S$ types ($w_1, \ldots, w_S$)
◮ $S$ becomes a new parameter of the model
  ➜ the finite Zipf-Mandelbrot model has 3 parameters
◮ NB: $C$ will not have the same value as for the corresponding infinite ZM model

Abbreviations: ZM for Zipf-Mandelbrot model, and fZM for finite Zipf-Mandelbrot model
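
Here is a minimal sketch of the finite model as described on this slide: the ZM law is truncated after $S$ types and renormalized. It illustrates the NB. With $a = 2$, $b = 0$, the finite-model constant $C = \pi_1$ is larger than the infinite-model value $6/\pi^2 \approx 0.608$, because the same probability mass is spread over fewer types. The sketch follows the slide's simplified description, not necessarily the parametrization used by any particular software package. The function name is hypothetical.

```python
import math

def fzm_probabilities(a: float, b: float, S: int) -> list[float]:
    """Type probabilities pi_1..pi_S of a ZM law truncated after S types.

    pi_k is proportional to (k + b)^(-a) for k <= S; all later types get 0.
    C is computed from the finite sum over the S retained types.
    """
    if S < 1 or b < 0:
        raise ValueError("need S >= 1 and b >= 0")
    weights = [(k + b) ** -a for k in range(1, S + 1)]
    C = 1.0 / math.fsum(weights)
    return [C * w for w in weights]

pi = fzm_probabilities(a=2.0, b=0.0, S=1_000)
print(pi[0])             # C of the finite model (pi_1 = C / 1^a)
print(6 / math.pi ** 2)  # C of the infinite model with the same a, b
```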

The next steps

Once we have a population model . . .

◮ We still need to estimate the values of its parameters
  ◮ we’ll see later how we can do this
◮ We want to simulate random samples from the population described by the model
  ◮ basic assumption: real data sets (such as corpora) are random samples from this population
  ◮ this allows us to predict vocabulary growth, the number of previously unseen types as more text is added to a corpus, the frequency spectrum of a larger data set, etc.
  ◮ it will also allow us to estimate the model parameters

Sampling from a population model

Assume we believe that the population we are interested in can be described by a Zipf-Mandelbrot model:

[Figure: ZM model with $a = 3$, $b = 50$, plotted on linear axes ($k = 1, \ldots, 50$) and on logarithmic axes ($k = 1, \ldots, 100$)]

Use computer simulation to sample from this model:

◮ Draw $N$ tokens from the population such that in each step, type $w_k$ has probability $\pi_k$ of being picked
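
Here is a minimal sketch of this procedure using Python's standard `random` module. It rests on one explicit assumption. A computer cannot store infinitely many types, so the infinite ZM population is truncated after `n_types` types. For $a = 3$, $b = 50$ and 100,000 types, the discarded tail carries less than one millionth of the probability mass. Tokens are represented by their population rank $k$; the function name is hypothetical.

```python
import itertools
import random

def sample_zm_tokens(a, b, N, n_types=100_000, seed=None):
    """Draw N tokens (as population ranks k) from a ZM population.

    Assumption: the infinite population is approximated by its first n_types types.
    """
    ranks = range(1, n_types + 1)
    # Unnormalized weights (k + b)^(-a); C cancels out when sampling.
    cum_weights = list(itertools.accumulate((k + b) ** -a for k in ranks))
    rng = random.Random(seed)
    # Each draw independently picks type w_k with probability pi_k.
    return rng.choices(ranks, cum_weights=cum_weights, k=N)

sample = sample_zm_tokens(a=3, b=50, N=1000, seed=1)
print(sample[:9])
```

With `cum_weights`, each draw in `random.choices` is a binary search over the cumulative weights. Samples with $N$ in the millions therefore remain cheap to draw.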

Sampling from a population model

#1: 1 42 34 23 108 18 48 18 1 . . .
    time order room school town course area course time . . .
#2: 286 28 23 36 3 4 7 4 8 . . .
#3: 2 11 105 21 11 17 17 1 16 . . .
#4: 44 3 110 34 223 2 25 20 28 . . .
#5: 24 81 54 11 8 61 1 31 35 . . .
#6: 3 65 9 165 5 42 16 20 7 . . .
#7: 10 21 11 60 164 54 18 16 203 . . .
#8: 11 7 147 5 24 19 15 85 37 . . .
. . .

Sampling from a population model

In this way, we can . . .

◮ draw samples of arbitrary size $N$
  ◮ the computer can do it efficiently even for large $N$
◮ draw as many samples as we need
◮ compute type frequency lists, frequency spectra and vocabulary growth curves from these samples
  ◮ i.e., we can analyze them with the same methods that we have applied to the observed data sets

Here are some results for samples of size N = 1000 . . .

Samples: type frequency list & spectrum

Sample #1, type frequency list (Zipf ranking):

 r   f_r   type k
 1    37     6
 2    36     1
 3    33     3
 4    31     7
 5    31    10
 6    30     5
 7    28    12
 8    27     2
 9    24     4
10    24    16
11    23     8
12    22    14
. . .

Sample #1, frequency spectrum:

 m   V_m
 1    83
 2    22
 3    20
 4    12
 5    10
 6     5
 7     5
 8     3
 9     3
10     3
. . .

Samples: type frequency list & spectrum

Sample #2, type frequency list (Zipf ranking):

 r   f_r   type k
 1    39     2
 2    34     3
 3    30     5
 4    29    10
 5    28     8
 6    26     1
 7    25    13
 8    24     7
 9    23     6
10    23    11
11    20     4
12    19    17
. . .

Sample #2, frequency spectrum:

 m   V_m
 1    76
 2    27
 3    17
 4    10
 5     6
 6     5
 7     7
 8     3
10     4
11     2
. . .
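
Tables like the two above can be computed from any token sample. This is a minimal sketch with hypothetical helper names. It is applied to the first nine tokens of sequence #1 from the sampling slide, with types written as population ranks $k$.

```python
from collections import Counter

def type_frequency_list(tokens):
    """Zipf ranking: (type, frequency) pairs in order of decreasing frequency."""
    return Counter(tokens).most_common()

def frequency_spectrum(tokens):
    """Map each frequency class m to V_m, the number of types seen exactly m times."""
    return dict(sorted(Counter(Counter(tokens).values()).items()))

def vocabulary_growth(tokens):
    """V(n) for n = 1..N: number of distinct types among the first n tokens."""
    seen, curve = set(), []
    for t in tokens:
        seen.add(t)
        curve.append(len(seen))
    return curve

tokens = [1, 42, 34, 23, 108, 18, 48, 18, 1]  # start of sequence #1
print(frequency_spectrum(tokens))  # {1: 5, 2: 2}
print(vocabulary_growth(tokens))   # [1, 2, 3, 4, 5, 6, 7, 7, 7]
```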

Random variation in type-frequency lists

[Figure: type frequencies of samples #1 and #2, plotted against Zipf rank (top row, $r \leftrightarrow f_r$) and against population rank (bottom row, $k \leftrightarrow f_k$), for ranks 1 to 50]

slide-58
SLIDE 58

Random variation in type-frequency lists

◮ Random variation leads to different type frequencies fk in every new sample
  ◮ particularly obvious when we plot them in population order (bottom row, k ↔ fk)

◮ Different ordering of types in the Zipf ranking for every new sample
  ◮ Zipf rank r in sample ≠ population rank k!
  ◮ leads to severe problems with statistical methods

◮ Individual types are irrelevant for our purposes, so let us take a perspective that abstracts away from them
  ◮ frequency spectrum
  ◮ vocabulary growth curve

➥ considerable amount of random variation still visible
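To see this effect concretely, the Python sketch below draws two samples of 1,000 tokens from the same population and compares the type frequencies fk and the resulting Zipf rankings. The population (a Zipf-Mandelbrot law truncated at S = 5,000 types, with a = 1.5 and b = 7.5) and all helper names are illustrative assumptions, not part of the original material.

```python
import random
from collections import Counter

def zm_population(a, b, S):
    """Type probabilities pi_k proportional to (k + b)^(-a) for k = 1..S.
    A finite truncation of the Zipf-Mandelbrot law (assumption for this sketch)."""
    weights = [(k + b) ** -a for k in range(1, S + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def draw_sample(probs, N, rng):
    """Draw N tokens; each type is identified by its population rank k."""
    ranks = range(1, len(probs) + 1)
    return Counter(rng.choices(ranks, weights=probs, k=N))

def sample_ranking(freqs):
    """Population ranks k listed in sample Zipf order r = 1, 2, ...
    (ties in frequency are broken by population rank)."""
    return [k for k, _ in sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))]

rng = random.Random(42)
pi = zm_population(a=1.5, b=7.5, S=5000)   # illustrative parameters
s1 = draw_sample(pi, 1000, rng)
s2 = draw_sample(pi, 1000, rng)

for k in range(1, 6):
    print(f"k = {k}: f_k = {s1[k]} (sample #1), {s2[k]} (sample #2)")
print("population ranks in Zipf order, sample #1:", sample_ranking(s1)[:10])
print("population ranks in Zipf order, sample #2:", sample_ranking(s2)[:10])
```

The second pair of lines typically shows population ranks out of order: the type at sample rank r is often not the type with population rank k = r.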

slide-59
SLIDE 59

Random variation: frequency spectrum

[Figure: frequency spectra (m vs. Vm) of four random samples (Sample #1 to #4) from the same population]
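The frequency spectrum of a sample is obtained by counting how often each type occurs, then counting how many types share each frequency m. A minimal Python sketch on a hypothetical toy token list:

```python
from collections import Counter

def frequency_spectrum(tokens):
    """Return {m: V_m}: the number of types that occur exactly m times."""
    type_freqs = Counter(tokens)                   # type -> frequency f
    return dict(sorted(Counter(type_freqs.values()).items()))

tokens = "the cat saw the dog and the dog saw a cat".split()
spc = frequency_spectrum(tokens)
print(spc)   # {1: 2, 2: 3, 3: 1}
```

For any sample, summing Vm over all m gives the vocabulary size V, and summing m · Vm gives the sample size N.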

slide-60
SLIDE 60

Random variation: vocabulary growth curve

[Figure: vocabulary growth curves V(N) and V1(N) of four random samples (Sample #1 to #4), for N up to 1000]
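A vocabulary growth curve records V(N) and V1(N) as the sample grows token by token. A minimal Python sketch that updates both counts incrementally (the function name and the toy input are illustrative):

```python
from collections import Counter

def vocabulary_growth(tokens, step=1):
    """Return a list of (N, V(N), V1(N)) after every `step` tokens."""
    freqs = Counter()
    v1 = 0                                  # number of hapax legomena so far
    curve = []
    for n, tok in enumerate(tokens, start=1):
        f = freqs[tok]                      # 0 for unseen types (no key is inserted)
        if f == 0:
            v1 += 1                         # a new type enters as a hapax
        elif f == 1:
            v1 -= 1                         # a hapax is seen for the second time
        freqs[tok] = f + 1
        if n % step == 0:
            curve.append((n, len(freqs), v1))
    return curve

tokens = "the cat saw the dog and the dog saw a cat".split()
for n, v, v1 in vocabulary_growth(tokens):
    print(n, v, v1)
```

Updating V1(N) incrementally avoids recomputing the whole frequency list at every point of the curve.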

slide-64
SLIDE 64

Expected values

◮ There is no reason to choose any particular sample to make a prediction for the real data: each one is equally likely (or unlikely)
➥ Take the average over a large number of samples

◮ Such averages are called expected values or expectations in statistics (frequentist approach)

◮ Notation: E[V(N)] and E[Vm(N)]
  ◮ indicates that we are referring to expected values for a sample of size N
  ◮ rather than to the specific values V and Vm observed in a particular sample or a real-world data set

◮ Usually we can omit the sample size: E[V] and E[Vm]
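The averaging idea can be sketched directly in Python: draw many samples of size N from a fixed population and average V(N) and V1(N). The population (a finite Zipf-Mandelbrot law with S = 2,000 types) and the number of samples are illustrative assumptions; the result only approximates the expectation and becomes more accurate as n_samples grows.

```python
import random
from collections import Counter
from itertools import accumulate

def zm_population(a, b, S):
    """Finite Zipf-Mandelbrot population: pi_k proportional to (k + b)^(-a), k = 1..S."""
    weights = [(k + b) ** -a for k in range(1, S + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def monte_carlo_expectations(probs, N, n_samples, seed=1):
    """Average V(N) and V1(N) over n_samples independent random samples of size N."""
    rng = random.Random(seed)
    cum = list(accumulate(probs))          # cumulative weights, computed once
    types = range(len(probs))
    sum_v = sum_v1 = 0
    for _ in range(n_samples):
        freqs = Counter(rng.choices(types, cum_weights=cum, k=N))
        sum_v += len(freqs)                                    # V(N)
        sum_v1 += sum(1 for f in freqs.values() if f == 1)     # V1(N)
    return sum_v / n_samples, sum_v1 / n_samples

pi = zm_population(a=1.5, b=7.5, S=2000)
ev, ev1 = monte_carlo_expectations(pi, N=1000, n_samples=200)
print(f"E[V(1000)] ~ {ev:.1f}, E[V1(1000)] ~ {ev1:.1f}")
```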

slide-65
SLIDE 65

The expected frequency spectrum

[Figure: observed frequency spectra Vm of four random samples (Sample #1 to #4), each compared with the expected spectrum E[Vm]]

slide-66
SLIDE 66

The expected vocabulary growth curve

[Figure: vocabulary growth of Sample #1 compared with the expectation; two panels: V(N) vs. E[V(N)] and V1(N) vs. E[V1(N)], for N up to 1000]

slide-68
SLIDE 68

Great expectations made easy

◮ Fortunately, we don’t have to take many thousands of samples to calculate expectations: there is a (relatively simple) mathematical solution (➜ Wednesday)

◮ This solution also allows us to estimate the amount of random variation ➜ variance and confidence intervals
  ◮ example: expected VGCs with confidence intervals
  ◮ we won’t pursue variance any further in this course
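The mathematics is deferred to Wednesday; as a hedged preview, one standard formulation assumes Poisson sampling, under which each type k occurs independently with a Poisson-distributed frequency of mean N·πk. Then E[V(N)] = Σk (1 − e^(−Nπk)), E[Vm(N)] = Σk (Nπk)^m e^(−Nπk) / m!, and Var[V(N)] = Σk e^(−Nπk) (1 − e^(−Nπk)). The Python sketch below implements these formulas plus a normal-approximation 95% interval; whether this matches the derivation used later in the course is an assumption.

```python
import math

def expected_v(probs, N):
    """E[V(N)] = sum_k (1 - exp(-N pi_k))."""
    return sum(-math.expm1(-N * p) for p in probs)

def expected_vm(probs, N, m):
    """E[V_m(N)] = sum_k (N pi_k)^m exp(-N pi_k) / m!, evaluated in log space."""
    log_m_fact = math.lgamma(m + 1)
    return sum(math.exp(m * math.log(N * p) - N * p - log_m_fact)
               for p in probs if p > 0)

def variance_v(probs, N):
    """Var[V(N)] = sum_k exp(-N pi_k) (1 - exp(-N pi_k)): types are independent
    under Poisson sampling, so the Bernoulli variances simply add up."""
    return sum(math.exp(-N * p) * -math.expm1(-N * p) for p in probs)

def confidence_interval_v(probs, N, z=1.96):
    """Approximate 95% interval for V(N) from a normal approximation."""
    ev = expected_v(probs, N)
    half_width = z * math.sqrt(variance_v(probs, N))
    return ev - half_width, ev + half_width

# Toy population of four equiprobable types (illustrative only)
pi = [0.25] * 4
print(expected_v(pi, 4))              # equals 4 * (1 - e^-1)
print(confidence_interval_v(pi, 4))
```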

slide-69
SLIDE 69

Confidence intervals for the expected VGC

[Figure: vocabulary growth of Sample #1 compared with the expected curves E[V(N)] and E[V1(N)] and their confidence intervals, for N up to 1000]

slide-72
SLIDE 72

A mini-example

◮ G. K. Zipf claimed that the distribution of English word frequencies follows Zipf’s law with a ≈ 1
  ◮ a ≈ 1.5 seems a more reasonable value when you look at larger text samples than Zipf did

◮ The most frequent word in English is “the”, with π ≈ .06
  ◮ Zipf-Mandelbrot law with a = 1.5 and b = 7.5 yields a population model where π1 ≈ .06 (found by trial & error)
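For an infinite Zipf-Mandelbrot population πk = C / (k + b)^a, the normalizing constant is C = 1 / ζ(a, b + 1), where ζ is the Hurwitz zeta function, so π1 can be checked directly. A minimal Python sketch; the Euler-Maclaurin approximation of ζ and the function names are choices made here, not taken from the slides.

```python
import math

def hurwitz_zeta(s, q, n_terms=50):
    """sum_{j>=0} (q + j)^(-s) for s > 1, q > 0: direct sum of n_terms terms
    plus an Euler-Maclaurin estimate of the remaining tail."""
    head = sum((q + j) ** -s for j in range(n_terms))
    x = q + n_terms
    tail = (x ** (1 - s) / (s - 1)         # integral of the tail
            + x ** -s / 2                   # boundary correction
            + s * x ** (-s - 1) / 12)       # first Bernoulli-number correction
    return head + tail

def zm_top_probability(a, b):
    """pi_1 of an infinite Zipf-Mandelbrot population pi_k = C / (k + b)^a."""
    C = 1.0 / hurwitz_zeta(a, b + 1)        # sum_{k>=1} (k + b)^-a = zeta(a, b + 1)
    return C * (1 + b) ** -a

print(round(zm_top_probability(1.5, 7.5), 3))   # 0.057
```

With a = 1.5 and b = 7.5 this gives π1 ≈ .057, consistent with the rounded value of .06 on the slide.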

slide-76
SLIDE 76

A mini-example

◮ How many different words do we expect to find in a 1-million-word text?
  ◮ N = 1,000,000 ➜ E[V(N)] = 33026.7
  ◮ 95% confidence interval: V(N) = 32753.6 . . . 33299.7

◮ How many do we really find?
  ◮ Brown corpus: 1 million words of edited American English
  ◮ V = 45215 ➜ ZM model is not quite right
  ◮ Physicists (and some mathematicians) are happy as long as they get the order of magnitude right . . .

☞ Model was not based on actual data!
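The expected vocabulary size for the mini-example can be approximated with the Poisson-sampling formulas, treating the population as an infinite, rank-based Zipf-Mandelbrot law. This is an assumption: the slides do not say how E[V(N)] = 33026.7 was computed, so this sketch need not reproduce that figure exactly. The first K types are summed exactly; the remaining ones, each with a tiny expected frequency x, are handled by a second-order Taylor expansion in x.

```python
import math

def hurwitz_zeta(s, q, n_terms=50):
    """sum_{j>=0} (q + j)^(-s) for s > 1: direct sum plus Euler-Maclaurin tail."""
    head = sum((q + j) ** -s for j in range(n_terms))
    x = q + n_terms
    return head + x ** (1 - s) / (s - 1) + x ** -s / 2 + s * x ** (-s - 1) / 12

def zm_expected_v_ci(a, b, N, K=200_000, z=1.96):
    """E[V(N)] and an approximate 95% interval for an infinite ZM population
    pi_k = C (k + b)^(-a), assuming Poisson sampling (types independent)."""
    C = 1.0 / hurwitz_zeta(a, b + 1)       # normalizing constant
    ev = var = 0.0
    for k in range(1, K + 1):              # exact terms for the first K types
        x = N * C * (k + b) ** -a          # expected frequency of type k
        seen = -math.expm1(-x)             # P(type k occurs at least once)
        ev += seen
        var += seen * (1.0 - seen)
    # Remaining types: 1 - e^-x ~ x - x^2/2 and e^-x (1 - e^-x) ~ x - 3x^2/2,
    # with the sums over x and x^2 expressed through the Hurwitz zeta function.
    s1 = N * C * hurwitz_zeta(a, K + 1 + b)
    s2 = (N * C) ** 2 * hurwitz_zeta(2 * a, K + 1 + b)
    ev += s1 - s2 / 2
    var += s1 - 1.5 * s2
    half = z * math.sqrt(var)
    return ev, (ev - half, ev + half)

ev, (lo, hi) = zm_expected_v_ci(a=1.5, b=7.5, N=1_000_000)
print(f"E[V(N)] = {ev:.1f}, 95% interval {lo:.1f} ... {hi:.1f}")
```

Whatever the exact value, this model stays well below the V = 45215 observed in the Brown corpus.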

slide-77
SLIDE 77

Outline

The type population Sampling from the population Parameter estimation A practical example

slide-80
SLIDE 80

Estimating model parameters

◮ Parameter settings in the mini-example were based on general assumptions (claims from the literature)

◮ But we also have empirical data on the word frequency distribution of English available (the Brown corpus)

◮ Choose parameters so that the population model matches the empirical distribution as well as possible

◮ E.g. by trial and error . . .
  ◮ guess parameters
  ◮ compare model predictions for a sample of size N0 with the observed data (N0 tokens)
  ◮ based on frequency spectrum or vocabulary growth curve
  ◮ change parameters & repeat until satisfied

◮ This process is called parameter estimation
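The trial & error loop can be sketched in Python as follows. Since the Brown corpus data are not reproduced here, a sample from a hypothetical “hidden” finite ZM population stands in for the observed data; the number of spectrum elements (M = 5), the truncation at S = 3,000 types and the list of guesses are all illustrative. Expected spectra use the Poisson-sampling formula E[Vm(N)] = Σk (Nπk)^m e^(−Nπk) / m!, and each guess is scored by the mean squared difference over m = 1..M.

```python
import math
import random
from collections import Counter

def zm_population(a, b, S):
    """Finite ZM population (truncated at S types): pi_k proportional to (k + b)^(-a)."""
    weights = [(k + b) ** -a for k in range(1, S + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def expected_spectrum(probs, N, M):
    """[E[V_1(N)], ..., E[V_M(N)]] under Poisson sampling."""
    return [sum(math.exp(m * math.log(N * p) - N * p - math.lgamma(m + 1)) for p in probs)
            for m in range(1, M + 1)]

def observed_spectrum(freqs, M):
    """[V_1, ..., V_M] from a Counter of type frequencies."""
    spc = Counter(freqs.values())
    return [spc.get(m, 0) for m in range(1, M + 1)]

def mse(observed, expected):
    return sum((o - e) ** 2 for o, e in zip(observed, expected)) / len(observed)

# Synthetic stand-in for real data: one sample from a "hidden" population
S, N0, M = 3000, 20000, 5
rng = random.Random(7)
sample = Counter(rng.choices(range(S), weights=zm_population(2.0, 20.0, S), k=N0))
obs = observed_spectrum(sample, M)

# Trial & error: evaluate a few hand-picked guesses and keep the best one
guesses = [(1.5, 7.5), (1.3, 7.5), (1.7, 7.5), (2.0, 20.0), (2.2, 40.0)]
costs = {}
for a, b in guesses:
    costs[(a, b)] = mse(obs, expected_spectrum(zm_population(a, b, S), N0, M))
    print(f"a = {a}, b = {b}: cost = {costs[(a, b)]:.1f}")
best = min(costs, key=costs.get)
print("best guess so far:", best)
```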

slide-81
SLIDE 81

Parameter estimation by trial & error

[Figure: observed data vs. ZM model (a = 1.5, b = 7.5); two panels: frequency spectrum Vm vs. E[Vm], and vocabulary growth curve V(N) vs. E[V(N)] for N up to 1,000,000]

slide-82
SLIDE 82

Parameter estimation by trial & error

[Figure: observed data vs. ZM model (a = 1.3, b = 7.5); two panels: frequency spectrum Vm vs. E[Vm], and vocabulary growth curve V(N) vs. E[V(N)] for N up to 1,000,000]

slide-83
SLIDE 83

Parameter estimation by trial & error

[Figure: observed data vs. ZM model (a = 1.3, b = 0.2); two panels: frequency spectrum Vm vs. E[Vm], and vocabulary growth curve V(N) vs. E[V(N)] for N up to 1,000,000]

slide-84
SLIDE 84

Parameter estimation by trial & error

[Figure: observed data vs. ZM model, back to a = 1.5, b = 7.5; two panels: frequency spectrum Vm vs. E[Vm], and vocabulary growth curve V(N) vs. E[V(N)] for N up to 1,000,000]

slide-85
SLIDE 85

Parameter estimation by trial & error

[Figure: observed data vs. ZM model (a = 1.7, b = 7.5); two panels: frequency spectrum Vm vs. E[Vm], and vocabulary growth curve V(N) vs. E[V(N)] for N up to 1,000,000]

slide-86
SLIDE 86

Parameter estimation by trial & error

[Figure: observed data vs. ZM model (a = 1.7, b = 80); two panels: frequency spectrum Vm vs. E[Vm], and vocabulary growth curve V(N) vs. E[V(N)] for N up to 1,000,000]

slide-87
SLIDE 87

Parameter estimation by trial & error

[Figure: observed data vs. ZM model (a = 2, b = 550); two panels: frequency spectrum Vm vs. E[Vm], and vocabulary growth curve V(N) vs. E[V(N)] for N up to 1,000,000]

slide-90
SLIDE 90

Automatic parameter estimation

◮ Parameter estimation by trial & error is tedious ➜ let the computer do the work!

◮ Need a cost function to quantify the “distance” between model expectations and observed data
  ◮ based on vocabulary size and frequency spectrum (these are the most convenient criteria)

◮ The computer estimates parameters by automatic minimization of the cost function
  ◮ clever algorithms exist that quickly find out in which direction they have to “push” the parameters to approach the minimum
  ◮ implemented in standard software packages
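As a minimal stand-in for the algorithms in standard packages, the Python sketch below uses compass search, a simple derivative-free method: it tries a step up and down along each parameter, accepts any improvement, and halves the step sizes when nothing helps. Standard packages typically use more sophisticated methods (e.g. Nelder-Mead or quasi-Newton algorithms). The “observed” spectrum here is synthetic (the exact expectation under hypothetical parameters a = 2.0, b = 20), and the truncation at S = 500 types is an assumption made to keep the example fast.

```python
import math

def zm_population(a, b, S):
    """Finite ZM population (truncated at S types): pi_k proportional to (k + b)^(-a)."""
    weights = [(k + b) ** -a for k in range(1, S + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def expected_spectrum(probs, N, M):
    """[E[V_1(N)], ..., E[V_M(N)]] under Poisson sampling."""
    out = []
    for m in range(1, M + 1):
        log_m_fact = math.lgamma(m + 1)
        out.append(sum(math.exp(m * math.log(N * p) - N * p - log_m_fact) for p in probs))
    return out

def make_cost(observed, N, S):
    """Mean squared error between observed and expected spectrum, as a function of (a, b)."""
    def cost(params):
        a, b = params
        if a <= 0 or b <= -1:              # outside the valid parameter range
            return math.inf
        expected = expected_spectrum(zm_population(a, b, S), N, len(observed))
        return sum((o - e) ** 2 for o, e in zip(observed, expected)) / len(observed)
    return cost

def compass_search(f, x0, steps, max_halvings=12, max_iter=60):
    """Derivative-free minimization along the coordinate directions."""
    x, fx = list(x0), f(x0)
    steps = list(steps)
    halvings = 0
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for direction in (1, -1):
                y = list(x)
                y[i] += direction * steps[i]
                fy = f(y)
                if fy < fx:                # accept the first improving move
                    x, fx, improved = y, fy, True
                    break
        if not improved:                   # no move helps: refine the step sizes
            halvings += 1
            if halvings > max_halvings:
                break
            steps = [s / 2 for s in steps]
    return x, fx

S, N0 = 500, 20000
observed = expected_spectrum(zm_population(2.0, 20.0, S), N0, 5)
cost = make_cost(observed, N0, S)

start = [1.5, 7.5]
best, best_cost = compass_search(cost, start, steps=[0.1, 5.0])
print(f"start {start}: cost {cost(start):.2f}")
print(f"found a = {best[0]:.3f}, b = {best[1]:.2f}: cost {best_cost:.4f}")
```

Like any local method, compass search stops wherever no small step improves the cost, which need not be the global minimum.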

slide-92
SLIDE 92

Cost functions for parameter estimation

◮ Cost functions compare the expected frequency spectrum E[Vm(N0)] with the observed spectrum Vm(N0)

◮ Choice #1: how to weight differences
  ◮ absolute values of differences: $\sum_{m=1}^{M} \left| V_m - E[V_m] \right|$
  ◮ mean squared error: $\frac{1}{M} \sum_{m=1}^{M} \left( V_m - E[V_m] \right)^2$
  ◮ chi-squared criterion: scale by estimated variances
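The three weighting schemes can be written as small Python functions. The chi-squared variant needs a variance for each spectrum element; the example uses E[Vm] itself as a crude stand-in, an assumption made here for illustration rather than the variance estimate a real implementation would use.

```python
def _check(observed, expected):
    if len(observed) != len(expected):
        raise ValueError("spectra must have the same length M")

def sum_abs_error(observed, expected):
    """sum_{m=1}^{M} |V_m - E[V_m]|"""
    _check(observed, expected)
    return sum(abs(o - e) for o, e in zip(observed, expected))

def mean_squared_error(observed, expected):
    """(1/M) sum_{m=1}^{M} (V_m - E[V_m])^2"""
    _check(observed, expected)
    return sum((o - e) ** 2 for o, e in zip(observed, expected)) / len(observed)

def chi_squared_cost(observed, expected, variances):
    """sum_{m=1}^{M} (V_m - E[V_m])^2 / Var[V_m]: spectrum elements with large
    variances (typically the large V_m) no longer dominate the cost."""
    _check(observed, expected)
    return sum((o - e) ** 2 / v for o, e, v in zip(observed, expected, variances))

obs = [10, 4, 2]              # toy observed V_1..V_3 (illustrative numbers)
expected = [8.0, 5.0, 2.0]    # toy expected E[V_1]..E[V_3]
print(sum_abs_error(obs, expected), mean_squared_error(obs, expected))
# Assumption: E[V_m] used as a crude stand-in for Var[V_m]
print(chi_squared_cost(obs, expected, variances=expected))
```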

slide-95
SLIDE 95

Cost functions for parameter estimation

◮ Cost functions compare the expected frequency spectrum E[Vm(N0)] with the observed spectrum Vm(N0)

◮ Choice #1: how to weight differences

◮ Choice #2: how many spectrum elements to use
  ◮ typically between M = 2 and M = 15
  ◮ what happens if M < number of parameters?

◮ For many applications, it is important to match V precisely: additional constraint E[V(N0)] = V(N0)
  ◮ general principle: you can match as many constraints as there are free parameters in the model

◮ A felicitous choice of cost function and M can substantially improve the quality of the estimated model
  ◮ It isn’t a science, it’s an art . . .
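One simple way to impose E[V(N0)] = V(N0) is to spend one free parameter on it: for fixed a and b, search for the population size S of a finite ZM population whose expected vocabulary matches the observed one, leaving a and b to the cost function. The Python sketch below does this by bisection over integers, assuming that E[V(N0)] increases with S; the function names and the Poisson-sampling formula are choices made here.

```python
import math

def zm_population(a, b, S):
    """Finite ZM population with S types: pi_k proportional to (k + b)^(-a)."""
    weights = [(k + b) ** -a for k in range(1, S + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def expected_v(probs, N):
    """E[V(N)] under Poisson sampling."""
    return sum(-math.expm1(-N * p) for p in probs)

def solve_population_size(a, b, N0, V0, S_max=100_000):
    """Smallest S such that E[V(N0)] >= V0 for fixed a, b (bisection over integers).
    Assumes E[V(N0)] grows with S."""
    if expected_v(zm_population(a, b, S_max), N0) < V0:
        raise ValueError("constraint cannot be met with S <= S_max")
    lo, hi = 1, S_max
    while lo < hi:
        mid = (lo + hi) // 2
        if expected_v(zm_population(a, b, mid), N0) >= V0:
            hi = mid
        else:
            lo = mid + 1
    return lo

# Illustration: recover S from a vocabulary size generated with a known S = 800
target = expected_v(zm_population(1.5, 7.5, 800), 5000)
print(solve_population_size(1.5, 7.5, N0=5000, V0=target, S_max=5000))   # 800
```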

slide-100
SLIDE 100

Goodness-of-fit

◮ The automatic estimation procedure minimizes the cost function until no further improvement can be found
  ◮ this is a so-called local minimum of the cost function
  ◮ not necessarily the global minimum that we want to find

◮ Key question: is the estimated model good enough?
  ◮ In other words: does the model provide a plausible explanation of the observed data as a random sample from the population?

◮ Can be measured by a goodness-of-fit test
  ◮ use special tests for such models (Baayen 2001)
  ◮ p-value specifies whether the model is plausible
  ◮ small p-value ➜ reject model as explanation for data
➥ we want to achieve a high p-value

◮ Typically, we find p < .001, but the models can still be useful for many purposes!
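A goodness-of-fit p-value can be sketched with a Pearson-style statistic over the first M spectrum elements. This is a deliberate simplification: it treats the Vm as independent with Var[Vm] ≈ E[Vm], whereas the spectrum elements of a real sample are correlated, which the special tests cited above are designed to account for. The chi-squared tail probability is computed from the power series of the regularized incomplete gamma function; p-values below roughly 1e−15 come out as 0 because of floating-point cancellation.

```python
import math

def regularized_lower_gamma(s, x, tol=1e-15, max_terms=100_000):
    """P(s, x) = e^-x x^s sum_{n>=0} x^n / Gamma(s + n + 1) (power series)."""
    if x <= 0:
        return 0.0
    if x > s + 40 * math.sqrt(s) + 40:     # far in the upper tail: P is 1 to double precision
        return 1.0
    term = math.exp(s * math.log(x) - x - math.lgamma(s + 1))   # n = 0 term
    total, n = term, 0
    while term > tol * total and n < max_terms:
        n += 1
        term *= x / (s + n)                 # ratio of consecutive series terms
        total += term
    return min(total, 1.0)

def chi2_sf(stat, df):
    """Upper-tail probability of a chi-squared distribution with df degrees of freedom."""
    return 1.0 - regularized_lower_gamma(df / 2.0, stat / 2.0)

def simple_gof_test(observed, expected, n_params):
    """Pearson-style statistic over M spectrum elements, treated as independent
    with Var[V_m] ~ E[V_m] (a simplification); df = M - n_params."""
    df = len(observed) - n_params
    if df < 1:
        raise ValueError("need more spectrum elements than estimated parameters")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, df, chi2_sf(stat, df)

stat, df, p = simple_gof_test([1250, 420, 210, 130, 95],
                              [1200.0, 450.0, 220.0, 125.0, 90.0], n_params=2)
print(f"X2 = {stat:.2f}, df = {df}, p = {p:.4f}")
```

The degrees of freedom are reduced by the number of estimated parameters, which is one reason why M must exceed the number of free parameters.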

slide-101
SLIDE 101

Mini-example (cont’d)

[Figure: observed data vs. ZM model (a = 1.5, b = 7.5); two panels: frequency spectrum Vm vs. E[Vm], and vocabulary growth curve V(N) vs. E[V(N)] for N up to 1,000,000]

◮ We started with a = 1.5 and b = 7.5 (general assumptions)

slide-102
SLIDE 102

Mini-example (cont’d)

[Figure: observed data vs. ZM model (a = 2, b = 550); two panels: frequency spectrum Vm vs. E[Vm], and vocabulary growth curve V(N) vs. E[V(N)] for N up to 1,000,000]

◮ By trial & error we found a = 2.0 and b = 550

slide-103
SLIDE 103

Mini-example (cont’d)

[Figure: observed data vs. expected values under the automatically estimated parameters (a = 2.39, b = 1968.49); two panels: frequency spectrum Vm vs. E[Vm], and vocabulary growth curve V(N) vs. E[V(N)] for N up to 1,000,000]

◮ Automatic estimation procedure: a = 2.39 and b = 1968
◮ Goodness-of-fit: p ≈ 0 (but much better than before!)

slide-104
SLIDE 104

Outline

The type population Sampling from the population Parameter estimation A practical example

slide-108
SLIDE 108

Practical example: Oliver Twist

◮ A practical example: extrapolate vocabulary growth in Dickens’ novel Oliver Twist

◮ Observed data: N0 = 157302, V(N0) = 10710

◮ Our choices (experimentation & experience):
  ◮ population model: finite Zipf-Mandelbrot
  ◮ cost function: chi-squared type
  ◮ number of spectrum elements: M = 10
  ◮ additional constraint: E[V(N0)] = V(N0)

◮ Automatic parameter estimation yields a = 1.45, b = 34.6, S = 20587
  ◮ the population vocabulary size S is extremely small
  ◮ but this model extrapolates only the vocabulary used in Oliver Twist, not the full vocabulary of Charles Dickens
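To make the extrapolation concrete, the Python sketch below plugs the reported values a = 1.45, b = 34.6 and S = 20587 into a discrete, rank-based finite ZM population (πk ∝ (k + b)^(−a) for k = 1..S) and evaluates E[V(N)] and E[V1(N)] below and beyond N0 under Poisson sampling. This discrete form is an assumption: the model actually fitted on the slide may be parameterized differently, so the printed values need not reproduce the constraint E[V(N0)] = V(N0) = 10710.

```python
import math

def zm_population(a, b, S):
    """Finite, rank-based ZM population: pi_k proportional to (k + b)^(-a), k = 1..S."""
    weights = [(k + b) ** -a for k in range(1, S + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def expected_v_v1(probs, N):
    """E[V(N)] and E[V1(N)] under Poisson sampling."""
    ev = sum(-math.expm1(-N * p) for p in probs)
    ev1 = sum(N * p * math.exp(-N * p) for p in probs)
    return ev, ev1

# Parameter values as reported on the slide, plugged into the discrete model above
pi = zm_population(a=1.45, b=34.6, S=20587)
N0 = 157302
curve = []
for N in (N0 // 2, N0, 2 * N0, 4 * N0):
    ev, ev1 = expected_v_v1(pi, N)
    curve.append((N, ev, ev1))
    print(f"N = {N:>7}: E[V(N)] = {ev:8.1f}, E[V1(N)] = {ev1:8.1f}")
```

In this model E[V(N)] can never exceed S = 20587, however large N becomes.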

slide-109
SLIDE 109

Results for Oliver Twist

[Figure: Oliver Twist, observed vs. expected under the fitted model (a = 1.45, b = 34.59, S = 20587); two panels: frequency spectrum Vm vs. E[Vm], and vocabulary growth curves V(N), V1(N) vs. E[V(N)], E[V1(N)] with N up to about 350,000]

◮ Goodness-of-fit: p = 3.6 · 10^−40
  ◮ but visually, the approximation is very good
