Populations & samples Baroni & Evert The population
Type probabilities Population models ZM & fZM
Sampling from the population
Random samples Expectation Mini-example
Parameter estimation
Trial & error Automatic estimation
A practical example
Counting Words: Type-rich populations, samples, and statistical models
◮ we don’t care whether w43194 is wormhole or heatwave
◮ e.g. population = “written English”, formalized as all
◮ also: πk = chances that a token drawn at random
◮ e.g. psycholinguistic model of a human speaker
◮ πk = probability that next word uttered by the speaker
[Figure: type probabilities πk plotted against population rank k]
[Figure: ZM type probabilities πk against rank k, four panels: a = 1.2, b = 1.5; a = 2, b = 10; a = 2, b = 15; a = 5, b = 40]
[Figure: the same four ZM models on logarithmic axes (πk against k): a = 1.2, b = 1.5; a = 2, b = 10; a = 2, b = 15; a = 5, b = 40]
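The plots are easier to read with the formula in hand. As a sketch, assuming the standard Zipf-Mandelbrot form πk = C / (k + b)^a (with a > 1, so that the probabilities of infinitely many types can sum to 1), the code below computes πk for the first few ranks. The helper name `zm_probabilities` and the integral approximation of the normalising constant C are our own illustrative choices, not part of the original material.

```python
import math

def zm_probabilities(a, b, n_types):
    """Hypothetical helper: ZM type probabilities pi_k = C / (k + b)**a
    for the first n_types population ranks k = 1, 2, ..., n_types.

    The ZM population has infinitely many types, so C must normalise the
    infinite series; the tail beyond n_types is approximated by an
    integral (an assumption of this sketch, accurate for large n_types).
    """
    if a <= 1 or b <= -1:
        raise ValueError("this sketch needs a > 1 and b > -1")
    head = [(k + b) ** -a for k in range(1, n_types + 1)]
    # sum_{k > n_types} (k + b)^-a  ~  integral from n_types + 0.5 to infinity
    tail = (n_types + 0.5 + b) ** (1 - a) / (a - 1)
    C = 1.0 / (sum(head) + tail)
    return [C * w for w in head]

# Parameter settings used in the panels above
for a, b in [(1.2, 1.5), (2, 10), (2, 15), (5, 40)]:
    pi = zm_probabilities(a, b, 50)
    print(f"a={a}, b={b}: pi_1={pi[0]:.4f}  pi_10={pi[9]:.4f}  pi_50={pi[49]:.6f}")
```

Raising a makes the probabilities fall off more steeply with k; raising b makes the first few ranks more similar to each other, flattening the head of the curve.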
◮ we’ll see later how we can do this
◮ basic assumption: real data sets (such as corpora) are
◮ this allows us to predict vocabulary growth, the number
◮ it will also allow us to estimate the model parameters
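The basic assumption can be made concrete by simulation: a sample of N tokens is read here as N independent draws from the population, each type wk being drawn with probability πk. A minimal sketch, assuming a hypothetical toy population of 1,000 types with πk proportional to 1/k (chosen for illustration, not a fitted ZM model); `draw_sample` is a name we introduce.

```python
import random
from collections import Counter

def draw_sample(probs, n_tokens, seed=None):
    """Draw n_tokens independent tokens from a population in which type w_k
    (represented by its population rank k = 1, 2, ...) has probability probs[k-1]."""
    rng = random.Random(seed)
    ranks = range(1, len(probs) + 1)
    return rng.choices(ranks, weights=probs, k=n_tokens)

# Hypothetical toy population: 1,000 types with pi_k proportional to 1/k
weights = [1 / k for k in range(1, 1001)]
total = sum(weights)
probs = [w / total for w in weights]

# Two samples of the same size from the same population
for seed in (1, 2):
    sample = draw_sample(probs, 1000, seed=seed)
    freqs = Counter(sample)
    V = len(freqs)
    V1 = sum(1 for f in freqs.values() if f == 1)
    print(f"sample with seed {seed}: N = {len(sample)}, V = {V}, V1 = {V1}")
```

The two samples come from the same population but will generally differ in V and V1, which is exactly the sampling variation the model has to account for.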
[Figure: ZM model with a = 3, b = 50: πk against k on linear and logarithmic axes]
◮ the computer can do it efficiently even for large N
◮ i.e., we can analyze them with the same methods that we
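Whether the tokens come from a real corpus or from a simulated random sample, the frequency spectrum and the vocabulary growth curve are computed in the same way. A minimal sketch; the function names and the toy sentence are hypothetical.

```python
from collections import Counter

def frequency_spectrum(tokens):
    """Return V and the spectrum {m: V_m}, where V_m is the number of
    types that occur exactly m times."""
    type_freqs = Counter(tokens)             # frequency f of each type
    spectrum = Counter(type_freqs.values())  # m -> V_m
    return len(type_freqs), dict(sorted(spectrum.items()))

def vocabulary_growth(tokens, step):
    """Return (N, V(N), V1(N)) after every `step` tokens."""
    freqs = Counter()
    v1 = 0
    curve = []
    for n, tok in enumerate(tokens, start=1):
        freqs[tok] += 1
        if freqs[tok] == 1:      # new type: a hapax for now
            v1 += 1
        elif freqs[tok] == 2:    # former hapax seen a second time
            v1 -= 1
        if n % step == 0:
            curve.append((n, len(freqs), v1))
    return curve

tokens = "the cat saw the dog and the dog saw a cat".split()
print(frequency_spectrum(tokens))    # (6, {1: 2, 2: 3, 3: 1})
print(vocabulary_growth(tokens, 4))  # [(4, 3, 2), (8, 5, 3)]
```

A useful sanity check: the sum of m·Vm over the spectrum equals the sample size N (here 1·2 + 2·3 + 3·1 = 11).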
[Figure: Zipf rankings of two random samples (Sample #1, Sample #2): frequency fr against sample rank r]
[Figure: the same two samples plotted in population order: frequency fk against population rank k]
◮ particularly obvious when we plot them in population
◮ Zipf rank r in sample ≠ population rank k!
◮ leads to severe problems with statistical methods
◮ frequency spectrum
◮ vocabulary growth curve
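The mismatch between sample rank and population rank is easy to reproduce. A minimal sketch, assuming a hypothetical toy population of 50 types with πk proportional to 1/k: we draw 200 tokens, rank the observed types by sample frequency (the Zipf ranking) and compare each sample rank r with the type's population rank k.

```python
import random
from collections import Counter

# Hypothetical toy population: 50 types, pi_k proportional to 1/k,
# so population rank k = 1 is the most probable type
K = 50
weights = [1 / k for k in range(1, K + 1)]

rng = random.Random(42)
sample = rng.choices(range(1, K + 1), weights=weights, k=200)
freqs = Counter(sample)

# Zipf ranking of the sample: types sorted by decreasing sample frequency
# (ties broken by population rank, only to make the order deterministic)
zipf_order = sorted(freqs, key=lambda k: (-freqs[k], k))
pairs = list(enumerate(zipf_order, start=1))   # (sample rank r, population rank k)
mismatches = [(r, k) for r, k in pairs if r != k]

print(f"{len(freqs)} of {K} types observed; {len(mismatches)} have r != k")
print("first pairs (r, k):", pairs[:10])
```

The tie-breaking rule favours r = k, so the number of mismatches is if anything an underestimate; types that happen not to occur in the sample shift all lower-ranked types as well.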
[Figure: frequency spectra of four random samples (Sample #1 to #4): Vm against m]
[Figure: vocabulary growth curves V(N) and V1(N) of four random samples (Sample #1 to #4), N up to 1000]
◮ indicates that we are referring to expected values for a
◮ rather than to the specific values V and Vm
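The expected values can be computed directly from the type probabilities. Treating the N tokens as independent draws (the binomial model), type wk appears at least once with probability 1 − (1 − πk)^N and exactly m times with binomial probability C(N, m) πk^m (1 − πk)^(N − m); summing over types gives E[V(N)] and E[Vm(N)]. A minimal sketch; the function names and the toy population are hypothetical.

```python
import math

def expected_V(probs, N):
    """E[V(N)] = sum_k (1 - (1 - pi_k)^N): type w_k is counted
    if it occurs at least once among N independent draws."""
    return sum(1.0 - (1.0 - p) ** N for p in probs)

def expected_Vm(probs, N, m):
    """E[V_m(N)] = sum_k C(N, m) pi_k^m (1 - pi_k)^(N - m), computed in
    log space so that large N does not overflow (requires 0 < pi_k < 1)."""
    log_comb = math.lgamma(N + 1) - math.lgamma(m + 1) - math.lgamma(N - m + 1)
    return sum(math.exp(log_comb + m * math.log(p) + (N - m) * math.log1p(-p))
               for p in probs)

# Hypothetical toy population: 1,000 types with pi_k proportional to 1/k
weights = [1 / k for k in range(1, 1001)]
total = sum(weights)
probs = [w / total for w in weights]

N = 1000
print(f"E[V({N})]  = {expected_V(probs, N):.1f}")
print(f"E[V1({N})] = {expected_Vm(probs, N, 1):.1f}")
```

Summing E[Vm(N)] over m from 1 to N gives back E[V(N)], since every observed type occurs some number m ≥ 1 of times.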
[Figure: observed frequency spectra Vm of Samples #1 to #4 compared with the expected spectrum E[Vm]]
[Figure: observed vocabulary growth curves V(N) and V1(N) of Sample #1 compared with the expected curves E[V(N)] and E[V1(N)]]
◮ example: expected VGCs with confidence intervals
◮ we won’t pursue variance any further in this course
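For readers who want to see where such intervals can come from, here is a sketch under two assumptions of our own (the slides do not say how their intervals were computed): the Poisson approximation, under which type wk is missing from a sample of size N with probability q = exp(−N·πk) independently of other types, so that Var[V(N)] is the sum of q(1 − q); and a normal approximation, giving the interval E[V(N)] ± 1.96·sd. The function name and toy population are hypothetical.

```python
import math

def expected_vgc_with_ci(probs, sizes, z=1.96):
    """For each sample size N, return (N, E[V(N)], lower, upper).

    Assumption: Poisson approximation, under which type w_k is missing
    from the sample with probability q_k = exp(-N * pi_k), independently
    of the other types, so Var[V(N)] = sum_k q_k (1 - q_k).
    The interval is the normal approximation E +/- z * sd.
    """
    rows = []
    for N in sizes:
        q = [math.exp(-N * p) for p in probs]
        ev = sum(1.0 - qk for qk in q)
        sd = math.sqrt(sum(qk * (1.0 - qk) for qk in q))
        rows.append((N, ev, ev - z * sd, ev + z * sd))
    return rows

# Hypothetical toy population: 1,000 types with pi_k proportional to 1/k
weights = [1 / k for k in range(1, 1001)]
total = sum(weights)
probs = [w / total for w in weights]

for N, ev, lo, hi in expected_vgc_with_ci(probs, [200, 400, 600, 800, 1000]):
    print(f"N = {N:4d}: E[V] = {ev:6.1f}, 95% interval {lo:6.1f} to {hi:6.1f}")
```

Note that the Poisson expectation differs slightly from the binomial E[V(N)]; for large N and small πk the two are practically identical.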
[Figure: expected vocabulary growth curves E[V(N)] and E[V1(N)], compared with V(N) and V1(N) of Sample #1]
◮ a ≈ 1.5 seems a more reasonable value when you
◮ N = 1,000,000 ➜ E[V(N)] = 33026.7
◮ 95%-confidence interval: V(N) = 32753.6 to 33299.7
◮ Brown corpus: 1 million words of edited American English
◮ V = 45215, far outside this interval ➜ ZM model is not quite right
◮ Physicists (and some mathematicians) are happy as long
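A back-of-the-envelope check shows just how far off the observed vocabulary size is. Assuming the quoted interval is a symmetric normal-approximation interval E ± 1.96·sd (our assumption; it is symmetric up to rounding), its half-width gives back the standard deviation, and the Brown corpus value can be expressed as a z-score.

```python
# Figures quoted above: ZM model vs. Brown corpus at N = 1,000,000
expected_V = 33026.7
ci_low, ci_high = 32753.6, 33299.7
observed_V = 45215

# Assumption: the interval is E +/- 1.96 * sd, so its half-width gives sd
sd = (ci_high - ci_low) / (2 * 1.96)
z = (observed_V - expected_V) / sd
print(f"sd ≈ {sd:.1f}, z ≈ {z:.1f}")  # sd ≈ 139.3, z ≈ 87.5
```

A vocabulary roughly 87 standard deviations above its expected value cannot be put down to sampling variation; the model itself must be wrong.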
◮ guess parameters
◮ compare model predictions for sample of size N0
◮ based on frequency spectrum or vocabulary growth curve
◮ change parameters & repeat until satisfied
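The trial-and-error loop can be sketched in a few lines. To keep the example self-contained we make several assumptions of our own: the ZM population is truncated to K types and renormalised (the real ZM model has infinitely many types), expectations use the Poisson approximation, and the "observed" data are a simulated sample from hidden parameters a = 1.4, b = 3 chosen arbitrarily. The guesses are the first settings tried in the plots that follow, but the printed numbers will not reproduce those plots, since the data differ. All names are hypothetical.

```python
import math
import random
from collections import Counter

def zm_probs(a, b, K):
    """Hypothetical: ZM weights (k + b)^-a, truncated to K types and renormalised."""
    w = [(k + b) ** -a for k in range(1, K + 1)]
    total = sum(w)
    return [x / total for x in w]

def expected_V_V1(probs, N):
    """E[V(N)] and E[V1(N)] under the Poisson approximation."""
    ev = sum(1.0 - math.exp(-N * p) for p in probs)
    ev1 = sum(N * p * math.exp(-N * p) for p in probs)
    return ev, ev1

# "Observed" data: a simulated sample of N0 tokens from a hidden population
K, N0 = 20_000, 10_000
rng = random.Random(0)
sample = rng.choices(range(1, K + 1), weights=zm_probs(1.4, 3.0, K), k=N0)
freqs = Counter(sample)
V_obs = len(freqs)
V1_obs = sum(1 for f in freqs.values() if f == 1)

# Trial & error: try a guess, compare with the observed values, adjust, repeat
for a, b in [(1.5, 7.5), (1.3, 7.5), (1.3, 0.2), (1.7, 7.5)]:
    ev, ev1 = expected_V_V1(zm_probs(a, b, K), N0)
    print(f"a={a}, b={b}: E[V]={ev:7.1f} (obs {V_obs}), E[V1]={ev1:7.1f} (obs {V1_obs})")
```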
[Figure: ZM model, a = 1.5, b = 7.5: observed Vm vs. expected E[Vm], and observed V(N) vs. expected E[V(N)] for N up to 1,000,000]
[Figure: ZM model, a = 1.3, b = 7.5: observed Vm vs. expected E[Vm], and observed V(N) vs. expected E[V(N)] for N up to 1,000,000]
[Figure: ZM model, a = 1.3, b = 0.2: observed Vm vs. expected E[Vm], and observed V(N) vs. expected E[V(N)] for N up to 1,000,000]
[Figure: ZM model, a = 1.5, b = 7.5: observed Vm vs. expected E[Vm], and observed V(N) vs. expected E[V(N)] for N up to 1,000,000]
[Figure: ZM model, a = 1.7, b = 7.5: observed Vm vs. expected E[Vm], and observed V(N) vs. expected E[V(N)] for N up to 1,000,000]
[Figure: ZM model, a = 1.7, b = 80: observed Vm vs. expected E[Vm], and observed V(N) vs. expected E[V(N)] for N up to 1,000,000]
[Figure: ZM model, a = 2, b = 550: observed Vm vs. expected E[Vm], and observed V(N) vs. expected E[V(N)] for N up to 1,000,000]
◮ based on vocabulary size and vocabulary spectrum
◮ clever algorithms exist that find out quickly in which
◮ implemented in standard software packages
◮ absolute values of differences: $\sum_{m=1}^{M} |V_m - E[V_m]|$
◮ chi-squared criterion: scale by estimated variances: $X^2 = \sum_{m=1}^{M} \frac{(V_m - E[V_m])^2}{\mathrm{Var}[V_m]}$
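The two cost functions translate directly into code. A minimal sketch over the first M spectrum elements; the function names are our own, and the observed values, expectations and variances are hypothetical numbers (in practice E[Vm] and Var[Vm] would come from the model being fitted).

```python
def cost_abs(observed, expected):
    """Sum of absolute differences |V_m - E[V_m]| for m = 1..M."""
    return sum(abs(o - e) for o, e in zip(observed, expected))

def cost_chisq(observed, expected, variances):
    """Squared differences scaled by the estimated variances Var[V_m]."""
    return sum((o - e) ** 2 / v for o, e, v in zip(observed, expected, variances))

# Hypothetical numbers for M = 3 spectrum elements V_1, V_2, V_3
observed = [120, 40, 20]
expected = [110, 45, 18]
variances = [100, 40, 16]

print(cost_abs(observed, expected))               # 17
print(cost_chisq(observed, expected, variances))  # 1.875
```

Without scaling, the largest spectrum elements (typically V1) dominate the absolute-difference cost, partly because they also fluctuate most; the chi-squared version weights each difference by how much that element is expected to vary.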
◮ typically between M = 2 and M = 15
◮ what happens if M < number of parameters?
◮ general principle: you can match as many constraints
◮ It isn’t a science, it’s an art . . .
◮ this is a so-called local minimum of the cost function
◮ not necessarily the global minimum that we want to find
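The difference between a local and a global minimum is easy to see on a one-dimensional toy example. The cost function below is hypothetical and unrelated to the LNRE models; it simply has two valleys. Plain gradient descent ends up in whichever valley it starts in. Restarting from several initial values and keeping the best result is a common remedy that we add here for illustration; it is not taken from the slides.

```python
def toy_cost(x):
    """Hypothetical 1-D cost with two minima: a local one near x = 1.97
    and the global one near x = -2.03."""
    return (x * x - 4) ** 2 + x

def toy_grad(x):
    """Derivative of toy_cost."""
    return 4 * x * (x * x - 4) + 1

def descend(x, step=0.01, iters=10_000, tol=1e-10):
    """Plain gradient descent: follow the slope downhill from x."""
    for _ in range(iters):
        g = toy_grad(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x

for start in (3.0, -3.0):
    x = descend(start)
    print(f"start {start:+.1f} -> x = {x:+.3f}, cost = {toy_cost(x):+.3f}")

# Remedy (our addition): several starting points, keep the lowest cost
best = min((descend(s) for s in (-3.0, -1.0, 0.5, 3.0)), key=toy_cost)
print(f"best of several starts: x = {best:+.3f}")
```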
◮ use special goodness-of-fit tests for such models (Baayen 2001)
◮ the p-value indicates whether the model is plausible
◮ small p-value ➜ reject the model as an explanation for the data
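Whatever the exact test statistic, the last step turns a chi-squared-distributed value with some number of degrees of freedom into a p-value. The sketch below covers only that step. It assumes the statistic and its degrees of freedom have already been computed; the specific tests from Baayen (2001) are not reproduced here. The 0.05 rejection level in `reject_model` is a conventional choice, assumed for illustration. The tail probability uses the series for the regularized incomplete gamma function, accumulated in log space.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for X ~ chi-squared with df degrees of freedom."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    log_z = math.log(z)
    # Series for the regularized lower incomplete gamma function P(s, z):
    # P(s, z) = sum_{n>=0} exp((s + n) log z - z - lgamma(s + n + 1)),
    # accumulated in log space to avoid underflow for large z.
    log_terms = []
    peak = -math.inf
    n = 0
    while True:
        lt = (s + n) * log_z - z - math.lgamma(s + n + 1)
        log_terms.append(lt)
        peak = max(peak, lt)
        # Terms shrink once s + n + 1 > z; stop when they are negligible
        if s + n + 1 > z and lt < peak - 40:
            break
        n += 1
    lower = math.exp(peak) * math.fsum(math.exp(t - peak) for t in log_terms)
    return min(1.0, max(0.0, 1.0 - lower))

def reject_model(statistic, df, alpha=0.05):
    """Reject the model if its p-value falls below alpha (assumed conventional level)."""
    return chi2_sf(statistic, df) < alpha
```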
[Figure: ZM model with a = 1.5, b = 7.5. Panels: observed frequency spectrum V_m vs. expected E[V_m], by m; observed vocabulary growth V(N) vs. expected E[V(N)], by N.]
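The figures compare observed values with the model's expectations. As a sketch of where the expected curves come from, the code below computes E[V(N)] = Σ_k (1 − (1 − π_k)^N) and E[V_m(N)] = Σ_k C(N, m) π_k^m (1 − π_k)^(N−m) for a sample of N independently drawn tokens. It assumes a finite population with Zipf-Mandelbrot type probabilities π_k = C / (k + b)^a for k = 1, ..., S. For the plain (infinite) ZM model this means truncating the population at S types, which is only an approximation. The function names are hypothetical.

```python
import math

def fzm_probs(a, b, S):
    """Type probabilities pi_k = C / (k + b)**a for k = 1..S, normalized to sum to 1."""
    weights = [(k + b) ** -a for k in range(1, S + 1)]
    C = 1.0 / sum(weights)
    return [C * w for w in weights]

def expected_V(probs, N):
    """E[V(N)] = sum_k 1 - (1 - pi_k)**N, computed stably via log1p/expm1."""
    return sum(-math.expm1(N * math.log1p(-p)) for p in probs)

def expected_Vm(probs, N, m):
    """E[V_m(N)] = sum_k binom(N, m) * pi_k**m * (1 - pi_k)**(N - m)."""
    if m > N:
        return 0.0  # no type can occur more often than the sample size
    log_binom = math.lgamma(N + 1) - math.lgamma(m + 1) - math.lgamma(N - m + 1)
    return sum(
        math.exp(log_binom + m * math.log(p) + (N - m) * math.log1p(-p))
        for p in probs
    )
```

For example, `expected_V(fzm_probs(1.45, 34.59, 20587), N)` evaluates the expected vocabulary size at sample size N for the finite ZM parameters reported later in this section. Working in log space keeps the binomial terms from overflowing or underflowing when N is large.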
[Figure: ZM model with a = 2, b = 550. Panels: observed frequency spectrum V_m vs. expected E[V_m], by m; observed vocabulary growth V(N) vs. expected E[V(N)], by N.]
[Figure: observed vs. expected values for a = 2.39, b = 1968.49. Panels: frequency spectrum V_m vs. E[V_m], by m; vocabulary growth V(N) vs. E[V(N)], by N.]
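Trial-and-error fitting means trying candidate (a, b) values and judging how well the expected spectrum matches the observed one. The loop below mechanizes this with a coarse grid and a chi-squared-type cost. It is a sketch under several assumptions. A finite ZM population with fixed S stands in for the ZM model. The "observed" spectrum is synthetic: the noise-free expectation of a known model, not real data. The grid values and function names are hypothetical.

```python
import math

def fzm_spectrum(a, b, S, N, M):
    """Expected spectrum [E[V_1(N)], ..., E[V_M(N)]] for a finite ZM population."""
    weights = [(k + b) ** -a for k in range(1, S + 1)]
    Z = sum(weights)
    probs = [w / Z for w in weights]
    spectrum = []
    for m in range(1, M + 1):
        log_binom = math.lgamma(N + 1) - math.lgamma(m + 1) - math.lgamma(N - m + 1)
        spectrum.append(sum(
            math.exp(log_binom + m * math.log(p) + (N - m) * math.log1p(-p))
            for p in probs
        ))
    return spectrum

def chisq_cost(observed, expected):
    # Assumed chi-squared-type cost over all supplied spectrum elements
    return sum((v - e) ** 2 / e for v, e in zip(observed, expected))

def grid_search(observed, S, N, a_grid, b_grid):
    """Try every (a, b) pair on the grid; return (cost, a, b) with the lowest cost."""
    M = len(observed)
    best = None
    for a in a_grid:
        for b in b_grid:
            c = chisq_cost(observed, fzm_spectrum(a, b, S, N, M))
            if best is None or c < best[0]:
                best = (c, a, b)
    return best

# Synthetic "observed" spectrum: expected values of a known model, no sampling noise
S, N = 2000, 10000
observed = fzm_spectrum(1.5, 10.0, S, N, M=5)
cost, a_hat, b_hat = grid_search(observed, S, N, [1.2, 1.5, 1.8], [1.0, 10.0, 100.0])
```

With real data the minimum cost would not be zero, and a grid this coarse would only bracket the best values. It would serve as a starting point for finer search or automatic estimation.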
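◮ population model: finite Zipf-Mandelbrot (fZM)
◮ cost function: chi-squared type
◮ number of spectrum elements: M = 10
◮ additional constraint: E[V(N0)] = V(N0), i.e. the expected vocabulary size must equal the observed vocabulary size at the full sample size N0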
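One simple way to impose the constraint E[V(N0)] = V(N0) is to treat it as an equation in one parameter and solve it for given values of the others. This follows the principle that each constraint uses up one free parameter; it is an assumption, not necessarily how the authors or their software do it. The sketch solves for the population size S by integer bisection, for given a and b. The cost over the M = 10 spectrum elements would then only need to be minimized over a and b. The helper names and the `S_max` bound are hypothetical.

```python
import math

def expected_V_fzm(a, b, S, N):
    """E[V(N)] for a finite ZM population with S types (independent token sampling)."""
    weights = [(k + b) ** -a for k in range(1, S + 1)]
    Z = sum(weights)
    return sum(-math.expm1(N * math.log1p(-w / Z)) for w in weights)

def solve_S(a, b, N0, V0, S_max=100_000):
    """Find S with E[V(N0)] at S - 1 below V0 and at S at least V0 (integer bisection).

    S must be at least V0, since E[V(N0)] can never exceed the number of types.
    """
    lo, hi = max(2, int(V0)), S_max
    if expected_V_fzm(a, b, lo, N0) >= V0 or expected_V_fzm(a, b, hi, N0) < V0:
        raise ValueError("constraint is not bracketed by S in [V0, S_max]")
    # Invariant: E[V(N0)] < V0 at lo and E[V(N0)] >= V0 at hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if expected_V_fzm(a, b, mid, N0) >= V0:
            hi = mid
        else:
            lo = mid
    return hi
```

Bisection only needs a sign change between the two ends of the bracket. It does not require E[V(N0)] to be monotone in S, although it then returns one crossing point rather than a unique solution.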
◮ the estimated population vocabulary size S is extremely small
◮ but this model extrapolates only the vocabulary used in
[Figure: observed vs. expected values for the fitted finite ZM model, a = 1.45, b = 34.59, S = 20587. Panels: frequency spectrum V_m vs. E[V_m], by m; vocabulary growth V(N) vs. E[V(N)] and hapax growth V1(N) vs. E[V1(N)], by N.]
◮ but visually, the approximation is very good