An Introduction to Maximum Likelihood and Optimisation


slide-1
SLIDE 1

Centre for Central Banking Studies

An Introduction to Maximum Likelihood and Optimisation

Nairobi, April 25, 2016

Pawel Zabczyk, pawel.zabczyk@bankofengland.co.uk

The Bank of England does not accept any liability for misleading or inaccurate information or omissions in the information provided.

slide-2
SLIDE 2

Centre for Central Banking Studies Modelling and Forecasting 2

Goals for this session

By the end of these sessions we will have:

  • Covered the idea behind maximum likelihood (ML) estimation
  • Presented some results on the properties of ML estimators
  • Discussed the basics of numerical optimisation
  • Estimated several simple models (including linear regressions) using maximum likelihood in EViews

These results:

  • Will help us estimate state space models
  • Are a necessary prerequisite for Bayesian estimation
slide-8
SLIDE 8

Plan Ahead

These results:

  • Will help us estimate state space models
  • Are a necessary prerequisite for Bayesian estimation
slide-10
SLIDE 10

Do you know this man?

slide-11
SLIDE 11

Archie Karas

Considered by some to be the greatest gambler in the world!

  • Grew up in poverty in Greece
  • Won money playing marbles so he could eat
  • Ran away from home after an argument with his dad
  • Started working as a waiter on a cruise ship earning $60/month
  • Arrived in Portland and headed to Los Angeles
  • Started playing competitive pool; so good no one wanted to play him
  • Switched to poker: won and lost $2 million
  • With the last $50 in his pocket drove to Las Vegas
slide-19
SLIDE 19

Archie Karas

  • Started playing Razz, which is like poker except that the weakest hand wins
  • Figured it was great for him given his bad luck
  • Borrowed $10,000 from an old friend...
  • Repaid the loan with 50% interest in 3 hours and had plenty left over to play
  • Kept on playing for 3 years, and seemed unable to lose
  • Accumulated $40 million in the process (on many games)
  • Then came the losses:
      • $11 million playing craps, in 3 weeks
      • $17 million trying to get back the 11, in 1 week
      • $2 million playing poker
  • Caught cheating at a blackjack table in 2013 (4 years of probation)
  • Placed in Nevada’s Black Book in 2015
slide-31
SLIDE 31

Thought experiment

  • Imagine you are head of security in a casino (pre-2015)
  • Archie comes through the door...
  • He wants to play high-stakes coin toss
  • Every game costs $1 million
  • Heads: he wins $2 million (stake + $1 million extra)
  • Tails: he loses his $1 million stake
  • Archie insists on using his own ‘lucky’ coin
  • You have limited time and can inspect the coin
  • How do you decide whether to allow him to play or not?
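
The stakes above imply that the casino's expected profit per game depends only on the probability of heads. A minimal sketch of that arithmetic, assuming payoffs are measured in $ millions and writing q for the probability of heads (the function name and default stake are illustrative and do not come from the slides):

```python
def casino_expected_profit(q: float, stake: float = 1.0) -> float:
    """Expected casino profit per game, in $ millions.

    Heads (probability q): the casino returns the stake plus an equal
    amount, so its net loss is `stake`.
    Tails (probability 1 - q): the casino keeps the stake, so its net
    gain is `stake`.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability between 0 and 1")
    return (1.0 - q) * stake - q * stake


# Compare a coin biased towards tails, a fair coin, and one biased towards heads.
for q in (0.4, 0.5, 0.6):
    print(f"q = {q:.1f}: expected casino profit = {casino_expected_profit(q):+.2f}")
```

With a fair coin the game breaks even, and any probability of heads above 0.5 makes it a losing game for the casino. This is why an estimate of that probability is what the head of security needs.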
slide-40
SLIDE 40

A primer on maximum likelihood

  • We really care about the probability of ending up with heads: $q^*$
  • The probability $q^*$ is unknown: Archie may have done something to the coin
  • Maximum likelihood provides a way of estimating $q^*$
slide-43
SLIDE 43

Idea

  • Generate data $x$ by tossing the coin and noting the results
  • We then form a likelihood function $L(q \mid x)$

$$L(q \mid x) \equiv p(x \mid q) = \prod_{i=1}^{n} p(x_i \mid q)$$

  • Here
      • $q$ is a number between 0 and 1
      • $p(x \mid q)$ denotes the probability of observing $x$ if the probability of heads equals $q$
      • $p(x_i \mid q)$ is the probability of observing $x_i$ in throw $i$ conditional on $q$
  • A maximum likelihood (ML) estimator of the unknown parameter $q^*$ is the argument $\hat{q}$ which maximises the likelihood function $L(\cdot \mid x)$
  • Of course $\hat{q}$ is conditional on the entire sample $x$!
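
The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: tosses are coded as the strings "H" and "T" and treated as independent, the sample is hypothetical, and a simple grid search stands in for a proper optimiser. All function names are introduced here and do not appear in the slides.

```python
import math
from typing import Sequence


def toss_probability(x: str, q: float) -> float:
    """Probability of a single toss x ('H' or 'T') when P(heads) = q."""
    if x == "H":
        return q
    if x == "T":
        return 1.0 - q
    raise ValueError(f"unexpected outcome: {x!r}")


def likelihood(q: float, tosses: Sequence[str]) -> float:
    """L(q | x): product of per-toss probabilities (independent tosses assumed)."""
    return math.prod(toss_probability(x, q) for x in tosses)


def grid_ml_estimate(tosses: Sequence[str], grid_size: int = 1001) -> float:
    """Return the point on an evenly spaced grid over [0, 1] with the highest likelihood."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda q: likelihood(q, tosses))


sample = list("HHTH")  # hypothetical data, for illustration only
print("L(0.5 | sample) =", likelihood(0.5, sample))
print("grid ML estimate =", grid_ml_estimate(sample))
```

Changing `sample` changes the estimate, which is the sense in which the estimator is conditional on the entire sample.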

slide-51
SLIDE 51

Binomial example

  • What is the probability of observing tails (T) if the probability of heads (H) is $p(H) = q^*$?
  • Single unknown parameter $q^*$ to be estimated
  • Assume we observed a sequence: HTHH...TT
  • What is the corresponding probability as a function of $q^*$?

$$p(H \mid q^*) \cdot p(T \mid q^*) \cdot p(H \mid q^*) \cdot p(H \mid q^*) \cdots p(T \mid q^*) \cdot p(T \mid q^*)$$

slide-55
SLIDE 55

Binomial example (ctd.)

  • To estimate $q^*$ define the likelihood

$$L(q \mid HTHH \ldots TT) = p(H \mid q) \cdot p(T \mid q) \cdot p(H \mid q) \cdot p(H \mid q) \cdots p(T \mid q) \cdot p(T \mid q) = q^{k}(1 - q)^{n-k}$$

  • What are $k$ and $n$?
  • The ML estimator of the unknown parameter $q^*$ is the $\hat{q}$ which maximises the likelihood function $L(q \mid HTHH \ldots TT)$
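
One way to check the closed form is to compare it with the term-by-term product for a concrete sequence. The sketch below reads k as the number of heads and n as the total number of tosses. The sequence and the trial value of q are hypothetical, chosen only for illustration.

```python
import math


def closed_form_likelihood(q: float, k: int, n: int) -> float:
    """q^k (1 - q)^(n - k): likelihood of one particular sequence of n tosses with k heads."""
    return q**k * (1.0 - q) ** (n - k)


sequence = "HTHHTT"        # hypothetical sequence, for illustration only
n = len(sequence)          # total number of tosses
k = sequence.count("H")    # number of heads

q = 0.3                    # arbitrary trial value of the parameter
term_by_term = math.prod(q if toss == "H" else 1.0 - q for toss in sequence)

print("n =", n, "k =", k)
print("closed form matches product:",
      math.isclose(term_by_term, closed_form_likelihood(q, k, n)))
```

Only the counts k and n enter the closed form, so the order in which heads and tails appeared does not affect the likelihood.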

slide-58
SLIDE 58

Binomial example (ctd.)

  • Exercise 1. Find the ML estimator of $q^*$ conditional on observing the sequence

H, H, H, T, T, H, T, H, T, H

  • What is the likelihood function $L(q \mid H, H, H, T, T, H, T, H, T, H)$?
  • We can plot this function to see if it has a maximum
  • It is important to understand what this function shows us!
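
Before plotting, it can help to tabulate the likelihood on a coarse grid. The following sketch does this for the sequence in Exercise 1, assuming independent tosses. The grid spacing of 0.1 and the names used are choices made here for illustration.

```python
observed = ["H", "H", "H", "T", "T", "H", "T", "H", "T", "H"]
n = len(observed)          # number of tosses
k = observed.count("H")    # number of heads


def exercise_likelihood(q: float) -> float:
    """Likelihood of the observed sequence when P(heads) = q."""
    return q**k * (1.0 - q) ** (n - k)


# Evaluate the likelihood at q = 0.0, 0.1, ..., 1.0.
grid = [i / 10 for i in range(11)]
for q in grid:
    print(f"q = {q:.1f}  L(q) = {exercise_likelihood(q):.6f}")

best_q = max(grid, key=exercise_likelihood)
print("grid maximiser:", best_q)
```

Each printed value is the probability of seeing exactly this sequence if the coin's probability of heads were q. It is a function of q for fixed data, not a probability distribution over q. On this grid the largest value occurs at q = 0.6, where the likelihood is about 1.19 × 10⁻³.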
slide-62
SLIDE 62

Binomial example (ctd.)

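[Figure: the likelihood L(q|H, H, H, T, T, H, T, H, T, H) plotted against q; horizontal axis ticks from 0.1 to 1, vertical axis ticks from 0.2 to 1.2, scaled by 10^-3]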

slide-63
SLIDE 63

Binomial example (ctd.)

  • We could solve for the maximum analytically
  • To do so take the derivative of the likelihood function $L(q \mid H, H, H, T, T, H, T, H, T, H) \equiv q^6 (1 - q)^4$ with respect to q and set it equal to zero

$$\frac{\partial L(q \mid H, H, H, T, T, H, T, H, T, H)}{\partial q} \equiv 6q^5(1 - q)^4 - 4q^6(1 - q)^3 = q^5(1 - q)^3 \left(6(1 - q) - 4q\right) \equiv 0$$

  • Eliminating 0 and 1 as maximum candidates (why?) $\Rightarrow 6(1 - \hat{q}) - 4\hat{q} = 0 \Leftrightarrow 10\hat{q} = 6 \Leftrightarrow \hat{q} = 0.6$
  • ML estimator of $q^*$ is 0.6!
  • Note that this does not account for any prior beliefs we may have had about Archie being dishonest
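
The algebra above can be checked with exact rational arithmetic. The sketch below uses Python's `fractions` module (a choice made here for illustration) to confirm that the derivative vanishes at 3/5. It also suggests an answer to the "why?": the likelihood is zero at both endpoints and strictly positive inside the interval, so 0 and 1 cannot be maxima.

```python
from fractions import Fraction


def likelihood(q):
    """L(q) = q^6 (1 - q)^4 for 6 heads and 4 tails."""
    return q ** 6 * (1 - q) ** 4


def likelihood_derivative(q):
    """dL/dq = 6 q^5 (1 - q)^4 - 4 q^6 (1 - q)^3."""
    return 6 * q ** 5 * (1 - q) ** 4 - 4 * q ** 6 * (1 - q) ** 3


q_hat = Fraction(6, 10)   # candidate from 10 q = 6, kept exact as 3/5

print(likelihood_derivative(q_hat))                       # 0: first-order condition holds
print(likelihood(Fraction(0)), likelihood(Fraction(1)))   # 0 0: endpoints give zero likelihood
print(float(likelihood(q_hat)))                           # strictly positive interior value
```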

slide-68
SLIDE 68

Univariate regression

  • Now, consider a univariate regression model, where $B^*$ and $\sigma^*$ are to be estimated

$$y_t = B^* x_t + v_t, \qquad v_t \sim \text{N.i.d.}(0, (\sigma^*)^2)$$

Independence and the assumption of normality imply

$$p(v_t) = \frac{1}{\sqrt{2\pi}\,\sigma^*} \exp\left(-\frac{1}{2(\sigma^*)^2} v_t^2\right)$$

  • What is the likelihood function $L(B, \sigma^2 \mid x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n)$?
  • Idea: estimate $B^*$ and $\sigma^*$ by finding the maximum of the likelihood function with respect to B and σ
  • Just like the coin toss example except now we have two parameters!
  • In what follows $x \equiv (x_1, x_2, \ldots, x_n)$, $y \equiv (y_1, y_2, \ldots, y_n)$
slide-73
SLIDE 73

ML estimates of population parameters (ctd.)

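  • Turns out to be easier to differentiate the log of the likelihood rather than the likelihood itself
  • This is equivalent, since $L(B, \sigma^2 \mid x, y) > 0$ and because

$$\frac{\partial}{\partial B} \ln\left(L(B, \sigma^2 \mid x, y)\right) = \frac{1}{L(B, \sigma^2 \mid x, y)} \frac{\partial}{\partial B} L(B, \sigma^2 \mid x, y)$$

  • How does this help us?
  • We then have

$$\ln\left(L(B, \sigma^2 \mid x, y)\right) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2}\sum_{i=1}^{n} \frac{(y_i - Bx_i)^2}{\sigma^2}$$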
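
The log-likelihood formula can be written as a short Python function. The sketch below is illustrative only: the five data points and the trial values B = 2 and σ² = 0.5 are made up here. It also sums the logs of the individual normal densities, to show that both routes give the same number.

```python
import math


def log_likelihood(B, sigma2, x, y):
    """ln L(B, sigma^2 | x, y) for y_i = B x_i + v_i with v_i ~ N(0, sigma^2)."""
    n = len(x)
    ssr = sum((yi - B * xi) ** 2 for xi, yi in zip(x, y))   # sum of squared residuals
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * n * math.log(sigma2)
            - 0.5 * ssr / sigma2)


def log_likelihood_from_densities(B, sigma2, x, y):
    """Same quantity, obtained by summing the log of each normal density p(v_i)."""
    total = 0.0
    for xi, yi in zip(x, y):
        v = yi - B * xi
        density = math.exp(-v ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
        total += math.log(density)
    return total


# Hypothetical data, made up purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

print(log_likelihood(2.0, 0.5, x, y))
print(log_likelihood_from_densities(2.0, 0.5, x, y))
```

Working with the sum of logs rather than the product of densities also avoids numerical underflow when n is large.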

slide-77
SLIDE 77

ML estimates of population parameters (ctd.)

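  • The first order conditions, or the likelihood equations are

$$\left.\frac{\partial \ln\left(L(B, \sigma^2 \mid x, y)\right)}{\partial B}\right|_{(\hat{B}, \hat{\sigma})} = \frac{1}{\hat{\sigma}^2} \sum_{i=1}^{n} (y_i - \hat{B}x_i)x_i = 0$$

$$\left.\frac{\partial \ln\left(L(B, \sigma^2 \mid x, y)\right)}{\partial \sigma^2}\right|_{(\hat{B}, \hat{\sigma})} = -\frac{n}{2\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4} \sum_{i=1}^{n} (y_i - \hat{B}x_i)^2 = 0$$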
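
A quick way to gain confidence in the two derivatives is to compare them with central finite differences of the log-likelihood. In the sketch below the data, the evaluation point (B0, s0) and the step size h are all hypothetical choices made for illustration.

```python
import math


def log_likelihood(B, sigma2, x, y):
    """ln L(B, sigma^2 | x, y) for the univariate normal regression."""
    n = len(x)
    ssr = sum((yi - B * xi) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * n * math.log(sigma2) - 0.5 * ssr / sigma2


def score_B(B, sigma2, x, y):
    """d ln L / dB = (1 / sigma^2) * sum (y_i - B x_i) x_i."""
    return sum((yi - B * xi) * xi for xi, yi in zip(x, y)) / sigma2


def score_sigma2(B, sigma2, x, y):
    """d ln L / d sigma^2 = -n / (2 sigma^2) + sum (y_i - B x_i)^2 / (2 sigma^4)."""
    n = len(x)
    ssr = sum((yi - B * xi) ** 2 for xi, yi in zip(x, y))
    return -n / (2 * sigma2) + ssr / (2 * sigma2 ** 2)


# Hypothetical data and evaluation point, made up purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
B0, s0, h = 1.8, 0.7, 1e-6

# Central finite differences of the log-likelihood in each direction.
num_B = (log_likelihood(B0 + h, s0, x, y) - log_likelihood(B0 - h, s0, x, y)) / (2 * h)
num_s = (log_likelihood(B0, s0 + h, x, y) - log_likelihood(B0, s0 - h, x, y)) / (2 * h)

print(score_B(B0, s0, x, y), num_B)
print(score_sigma2(B0, s0, x, y), num_s)
```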

slide-78
SLIDE 78

ML estimates of population parameters (ctd.)

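  • From the first of these

$$\sum_{i=1}^{n} \left(y_i x_i - \hat{B} x_i x_i\right) = 0 \Rightarrow \hat{B} = \left(\sum_{i=1}^{n} x_i x_i\right)^{-1} \sum_{i=1}^{n} x_i y_i = \left(x'x\right)^{-1} x'y$$

  • The maximum likelihood estimate of $B^*$ is therefore $\hat{B}$ given by the familiar formula above
  • From the second we obtain

$$-n + \frac{1}{\hat{\sigma}^2} \sum_{i=1}^{n} (y_i - \hat{B}x_i)^2 = 0 \Rightarrow \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{B}x_i)^2 = \frac{\left(y - \hat{B}x\right)'\left(y - \hat{B}x\right)}{n}$$

  • ML estimate of $\sigma^2$ divided through by n (not n − k) so biased in small samples but not asymptotically
  • This is quite typical of ML estimates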
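
The closed-form estimates can be computed directly and compared with nearby parameter values. The sketch below uses made-up data and assumes k = 1, since the model has a single regressor. It shows that small moves away from the estimates lower the log-likelihood, and that dividing by n gives a smaller variance estimate than dividing by n − k.

```python
import math


def log_likelihood(B, sigma2, x, y):
    """ln L(B, sigma^2 | x, y) for the univariate normal regression."""
    n = len(x)
    ssr = sum((yi - B * xi) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * n * math.log(sigma2) - 0.5 * ssr / sigma2


# Hypothetical data, made up purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

# Closed-form ML estimates from the two first order conditions.
B_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
ssr = sum((yi - B_hat * xi) ** 2 for xi, yi in zip(x, y))
sigma2_ml = ssr / n               # ML estimate: divides by n
sigma2_unbiased = ssr / (n - 1)   # degrees-of-freedom correction with k = 1

best = log_likelihood(B_hat, sigma2_ml, x, y)
perturbed = [
    log_likelihood(B_hat + dB, sigma2_ml + ds, x, y)
    for dB, ds in [(0.05, 0.0), (-0.05, 0.0), (0.0, 0.01), (0.0, -0.01)]
]

print(f"B_hat = {B_hat:.4f}, sigma2_ml = {sigma2_ml:.5f}, sigma2_unbiased = {sigma2_unbiased:.5f}")
print(f"ln L at estimates = {best:.4f}, highest nearby value = {max(perturbed):.4f}")
```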
slide-83
SLIDE 83

Summary of ML approach to linear regression

  • We wanted to estimate $B^*$ and $\sigma^*$ in the following model

$$y_t = B^* x_t + v_t, \qquad v_t \sim N(0, (\sigma^*)^2)$$

  • To do that, we wrote down the likelihood function

$$L(B, \sigma^2 \mid x, y) = \left(2\pi\sigma^2\right)^{-n/2} \exp\left(-\frac{(y - Bx)'(y - Bx)}{2\sigma^2}\right)$$

  • Plugging in data and maximising $L(B, \sigma^2 \mid x, y)$ with respect to B and $\sigma^2$ gave us the standard OLS estimator $\hat{B} = (x'x)^{-1}(x'y)$ and $\hat{\sigma}^2 = \left(y - \hat{B}x\right)'\left(y - \hat{B}x\right)/n$
  • The procedure only incorporated information in x and y!
  • The Bayesian approach will allow us to combine prior beliefs about $B^*$ and $\sigma^*$ with information in x and y

slide-88
SLIDE 88

ML score and information

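  • At the optimum there are two matrices we will use later in testing
  • The efficient score matrix is

$$\frac{\partial \ln L(\theta)}{\partial \theta} = S(\theta)$$

  • This should be zero at the ML estimate
  • The information matrix is given by

$$E\left[-\frac{\partial^2 \ln L(\theta)}{\partial\theta\,\partial\theta'}\right] = I(\theta)$$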
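
For the coin toss example, both objects have simple closed forms: differentiating ln L(q) = k ln q + (n − k) ln(1 − q) gives the score, and taking the expectation of minus the second derivative (using E[k] = nq) gives the information. The sketch below works them out for the earlier data of 6 heads in 10 tosses. The function names are illustrative choices made here.

```python
from fractions import Fraction


def score(q, k, n):
    """S(q) = d ln L / dq = k / q - (n - k) / (1 - q)."""
    return k / q - (n - k) / (1 - q)


def negative_second_derivative(q, k, n):
    """-d^2 ln L / dq^2 = k / q^2 + (n - k) / (1 - q)^2."""
    return k / q ** 2 + (n - k) / (1 - q) ** 2


def information(q, n):
    """I(q) = E[-d^2 ln L / dq^2] = n / (q (1 - q)), using E[k] = n q."""
    return n / (q * (1 - q))


k, n = 6, 10
q_hat = Fraction(k, n)   # ML estimate, kept exact as 3/5

print(score(q_hat, k, n))                        # 0
print(negative_second_derivative(q_hat, k, n))   # 125/3
print(information(q_hat, n))                     # 125/3
```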
slide-92
SLIDE 92

ML score and information

  • I(θ) turns out to be the most important matrix, as the Cramér-Rao lower bound states that

$$\mathrm{Var}(\hat{\theta}_{ml}) \geq \left[I(\hat{\theta}_{ml})\right]^{-1}$$

  • This means that the covariance of an estimate exceeds the inverse of the information matrix by a positive semi-definite matrix
  • Thus the inverse provides the lower bound that an estimate can achieve!
  • Fortunately, a large number of ML estimators achieve this lower bound
  • I.e. in many cases it can be shown that the inverse of the information matrix is the covariance of the estimate
  • In those cases the Cramér-Rao lower bound gives the covariance of $\hat{\theta}$
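
The coin toss estimator is one case where the bound is attained exactly. The sketch below assumes a true probability of heads of 3/5 and n = 10 tosses (both hypothetical). It computes the exact variance of the estimator k/n by summing over the binomial distribution of k, and compares it with the inverse information q(1 − q)/n.

```python
from fractions import Fraction
from math import comb


def estimator_variance(q, n):
    """Exact variance of q_hat = k / n when k ~ Binomial(n, q)."""
    pmf = [comb(n, k) * q ** k * (1 - q) ** (n - k) for k in range(n + 1)]
    mean = sum(Fraction(k, n) * p for k, p in enumerate(pmf))
    return sum((Fraction(k, n) - mean) ** 2 * p for k, p in enumerate(pmf))


def inverse_information(q, n):
    """[I(q)]^{-1} = q (1 - q) / n for n independent tosses."""
    return q * (1 - q) / n


q_true = Fraction(3, 5)   # hypothetical true probability of heads
n = 10

print(estimator_variance(q_true, n))    # 3/125
print(inverse_information(q_true, n))   # 3/125
```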

slide-98
SLIDE 98

Asymptotic inference

  • We expect that any normalised statistic is distributed as

$$\sqrt{n}\,(\hat{\theta} - \theta) \xrightarrow{d} N\left(0, I_A(\hat{\theta})^{-1}\right)$$

  • Where $I_A(\theta)^{-1} = \operatorname{plim}\, n I(\theta)^{-1}$
  • Usually replace $I_A(\theta)^{-1}/n$ by $I(\theta)^{-1}$
  • If we can formulate any statistic like this we know its distribution is $\chi^2_m$ with m the number of restrictions
  • Typically consider Wald (W), Likelihood ratio (LR) and Lagrange multiplier (LM) statistics
  • Quick look at LR here
slide-104
SLIDE 104

Likelihood ratio test

  • Let there be some restriction on the parameters of a model θ such that Rθ − r = 0
  • Calculate the ratio

$$\lambda = \frac{L(\hat{\theta}^r)}{L(\hat{\theta})} < 1$$

  • For the two estimates of θ where we use the superscript r to indicate an alternative which is restricted in some way
  • If we know some critical value for the size we could use this as a test
  • There are examples where this is true, but we usually construct the following asymptotic test

slide-109
SLIDE 109

Likelihood ratio test

  • Write a 2nd order Taylor series approximation

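$$\begin{aligned}
\ln L(\theta^r) &\approx \ln L(\theta) + \left(\frac{\partial \ln L(\theta)}{\partial\theta}\right)'(\theta^r - \theta) + \frac{1}{2}(\theta^r - \theta)'\left(\frac{\partial^2 \ln L(\theta)}{\partial\theta\,\partial\theta'}\right)(\theta^r - \theta) \\
&= \ln L(\theta) + S(\theta)'(\theta^r - \theta) - \frac{1}{2}(\theta^r - \theta)' I(\theta)(\theta^r - \theta) \\
&= \ln L(\theta) - \frac{1}{2}(\theta^r - \theta)' I(\theta)(\theta^r - \theta)
\end{aligned}$$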

  • The Taylor expansion is about the unrestricted value
  • At the unrestricted optimum S(θ) = 0
slide-112
SLIDE 112

Likelihood ratio test

  • Estimate the model with and without the restriction
  • Construct the statistic:

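$$-2\left[\ln L(\hat{\theta}^r) - \ln L(\hat{\theta})\right] = (\hat{\theta}^r - \hat{\theta})' I(\hat{\theta})(\hat{\theta}^r - \hat{\theta}) \qquad \text{(LR)}$$

  • This is the LR (likelihood ratio) statistic written $-2 \ln \lambda$
  • Distributed $\chi^2_m$ where m is the number of restrictions involved in calculating $\hat{\theta}^r$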

  • This holds asymptotically
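
As an illustration, the LR statistic can be computed for the coin toss data (6 heads in 10 tosses) under the hypothetical restriction q = 0.5, i.e. a fair coin, which is an assumption made here and not a test carried out in the slides. The sketch also evaluates the quadratic form on the right-hand side, and uses the fact that a χ² variable with one degree of freedom is a squared standard normal to obtain a p-value from `math.erfc`. With only 10 tosses the asymptotic approximation should be read as rough.

```python
import math


def log_likelihood(q, k, n):
    """ln L(q) = k ln q + (n - k) ln(1 - q)."""
    return k * math.log(q) + (n - k) * math.log(1 - q)


k, n = 6, 10
q_hat = k / n            # unrestricted ML estimate
q_restricted = 0.5       # hypothetical restriction: the coin is fair (m = 1)

# LR statistic: -2 [ln L(restricted) - ln L(unrestricted)]
lr = -2 * (log_likelihood(q_restricted, k, n) - log_likelihood(q_hat, k, n))

# Quadratic-form version, with the information evaluated at the unrestricted estimate.
info = n / (q_hat * (1 - q_hat))
quadratic = (q_restricted - q_hat) ** 2 * info

# Survival function of chi-squared(1): P(Z^2 > s) = erfc(sqrt(s / 2)).
p_value = math.erfc(math.sqrt(lr / 2))
critical_5pct = 3.841    # 5% critical value of chi-squared with 1 degree of freedom

print(f"LR = {lr:.4f}, quadratic approximation = {quadratic:.4f}")
print(f"p-value = {p_value:.3f}, reject at 5%: {lr > critical_5pct}")
```

The two versions of the statistic are close but not identical, which reflects the second-order Taylor approximation behind the equality.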
slide-117
SLIDE 117

Orienteering

slide-118
SLIDE 118

Perils of Orienteering

slide-119
SLIDE 119

Nonlinear optimisation

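  • Suppose we have an unconstrained function f(·) which we wish to maximise
  • Start by using a second-order Taylor expansion

$$f(\theta) \approx f(\tilde{\theta}) + (\theta - \tilde{\theta})' g(\tilde{\theta}) + \tfrac{1}{2} (\theta - \tilde{\theta})' G(\tilde{\theta}) (\theta - \tilde{\theta}) = F(\theta)$$

  • The gradient is:

$$g(\tilde{\theta}) = \frac{\partial f(\tilde{\theta})}{\partial \tilde{\theta}}$$

  • The Hessian is:

$$G(\tilde{\theta}) = \frac{\partial^2 f(\tilde{\theta})}{\partial \tilde{\theta} \, \partial \tilde{\theta}'}$$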

slide-123
SLIDE 123

Nonlinear optimisation

  • We can find the optimum of the approximating function straightforwardly as

$$\frac{\partial F(\theta)}{\partial \theta} = g(\tilde{\theta}) + G(\tilde{\theta})(\theta - \tilde{\theta}) = 0$$

  • Can be rearranged to give

$$\theta = \tilde{\theta} - G(\tilde{\theta})^{-1} g(\tilde{\theta})$$

  • If the underlying function is actually quadratic then this converges in one step

  • What should we be careful about?
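
The one-step property can be checked on a small example. The sketch below assumes a hypothetical concave quadratic f(θ1, θ2) = −θ1² − 2θ2² + θ1θ2 + 3θ1, chosen purely for illustration, and applies the update above once from an arbitrary starting point. One thing to be careful about is visible in the code: the Hessian has to be invertible, and it has to be negative definite for the stationary point to be a maximum.

```python
def solve_2x2(G, g):
    """Solve G d = g for a 2x2 matrix G using Cramer's rule."""
    (a, b), (c, d) = G
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("Hessian is (numerically) singular")
    return [(d * g[0] - b * g[1]) / det, (a * g[1] - c * g[0]) / det]

def gradient(theta):
    """Gradient of f(t1, t2) = -t1**2 - 2*t2**2 + t1*t2 + 3*t1."""
    t1, t2 = theta
    return [-2.0 * t1 + t2 + 3.0, t1 - 4.0 * t2]

# The Hessian of a quadratic is constant; this one is negative definite.
HESSIAN = [[-2.0, 1.0], [1.0, -4.0]]

def newton_step(theta_tilde):
    """One update: theta = theta_tilde - G^{-1} g(theta_tilde)."""
    d = solve_2x2(HESSIAN, gradient(theta_tilde))
    return [t - di for t, di in zip(theta_tilde, d)]

theta_new = newton_step([5.0, -3.0])
print("after one step:", theta_new)
print("gradient there:", gradient(theta_new))
```

Starting from any other point gives the same answer, because for a quadratic the approximation F coincides with f.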
slide-127
SLIDE 127

The Newton-Raphson algorithm

  • If not at the maximum, we update the guess until there is a ‘sufficiently small’ change in the parameter vector or the gradient is close enough to zero
  • We use the recursive formula

$$\theta_{\tau+1} = \theta_\tau - G(\theta_\tau)^{-1} g(\theta_\tau) = \theta_\tau - d_\tau$$

  • The expression $d_\tau = G(\theta_\tau)^{-1} g(\theta_\tau)$ gives the ascent direction (the step taken is $-d_\tau$)
  • This works exactly like orienteering: the question is how far to go before checking the terrain again
  • Here, the step size is fixed to 1
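
A minimal sketch of the recursion for a scalar parameter is given below. The example is an assumption made for illustration: a Poisson log-likelihood written in terms of the log rate θ, for which the ML estimate is known to be the log of the sample mean, so the result can be checked. The function name `newton_raphson` and the data are hypothetical.

```python
import math

def newton_raphson(grad, hess, theta0, tol=1e-10, max_iter=100):
    """Scalar Newton-Raphson with the step size fixed to 1."""
    theta = theta0
    for _ in range(max_iter):
        g = grad(theta)
        if abs(g) < tol:                  # gradient close enough to zero
            return theta
        d = g / hess(theta)               # d_tau = G^{-1} g
        theta_new = theta - d             # theta_{tau+1} = theta_tau - d_tau
        if abs(theta_new - theta) < tol:  # sufficiently small parameter change
            return theta_new
        theta = theta_new
    raise RuntimeError("Newton-Raphson did not converge")

counts = [2, 5, 3, 4, 6, 1, 3, 4]  # made-up count data
n, total = len(counts), sum(counts)

# Poisson log-likelihood in theta = log(rate), constants dropped:
#   l(theta) = total * theta - n * exp(theta)
grad = lambda th: total - n * math.exp(th)
hess = lambda th: -n * math.exp(th)

theta_hat = newton_raphson(grad, hess, theta0=0.0)
print("Newton-Raphson estimate:", theta_hat)
print("closed form log(mean):  ", math.log(total / n))
```

Both stopping rules from the slide appear in the loop; which one triggers first depends on the problem and the tolerance.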
slide-132
SLIDE 132

The Newton-Raphson algorithm (ctd.)

  • A simple modification to this iterative step is to include an additional parameter to give

$$\theta_{\tau+1} = \theta_\tau - \alpha_\tau d_\tau$$

  • Where $\alpha_\tau$ is the line search parameter
  • Can choose $\alpha_\tau$ to maximise the amount of ascent
  • Broadly:
      • $d$ tells you which direction to go in
      • $\alpha$ tells you how far
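
The effect of the line search parameter can be illustrated as follows. Two assumptions are made here. First, the objective f(θ) = −√(1 + θ²) is a hypothetical textbook-style example on which the fixed step of 1 overshoots whenever |θ| > 1. Second, α is chosen by simple step halving (backtracking) rather than by maximising the ascent exactly, which is a cheaper substitute for the exact line search described above.

```python
import math

def f(theta):
    return -math.sqrt(1.0 + theta ** 2)

def grad(theta):
    return -theta / math.sqrt(1.0 + theta ** 2)

def hess(theta):
    return -((1.0 + theta ** 2) ** -1.5)

def newton_line_search(f, grad, hess, theta0, tol=1e-10, max_iter=200):
    """Newton-Raphson where alpha is halved until the step does not lower f."""
    theta = theta0
    for _ in range(max_iter):
        g = grad(theta)
        if abs(g) < tol:
            return theta
        d = g / hess(theta)   # direction, as in the plain algorithm
        alpha = 1.0           # start from the full Newton step
        while f(theta - alpha * d) < f(theta) and alpha > 1e-12:
            alpha *= 0.5      # backtrack: the step went too far
        theta = theta - alpha * d
    raise RuntimeError("line search Newton did not converge")

start = 2.0
full_step = start - grad(start) / hess(start)
print("full Newton step from 2.0 lands at:", full_step)
print("with step halving the maximiser is:",
      newton_line_search(f, grad, hess, start))
```

Here the full step moves further away from the maximum at zero, while the shortened steps move towards it.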
slide-138
SLIDE 138

Variations

  • The most expensive part of this algorithm is calculating $G(\theta)^{-1}$; when this is replaced by the identity matrix I (with the sign chosen so that the step follows the gradient), the method is called steepest ascent
  • Gradient-based quasi-Newton methods are popular: e.g. DFP (Davidon-Fletcher-Powell) and BFGS (Broyden-Fletcher-Goldfarb-Shanno)
  • ‘Broyden’ frequently used in maximum likelihood problems
  • Very complex nonlinear and discontinuous functions may require non-derivative-based direct search methods
  • These are slow to converge
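
As a rough illustration of the trade-off, the sketch below runs steepest ascent on a hypothetical concave quadratic, f(θ1, θ2) = −θ1² − 2θ2² + θ1θ2 + 3θ1, with a fixed step length. The objective, the step length `alpha = 0.2` and the function names are all assumptions made for this example. No Hessian is inverted, but many more iterations are needed than the single step a full Newton update would take on a quadratic.

```python
def grad(theta):
    """Gradient of f(t1, t2) = -t1**2 - 2*t2**2 + t1*t2 + 3*t1."""
    t1, t2 = theta
    return [-2.0 * t1 + t2 + 3.0, t1 - 4.0 * t2]

def steepest_ascent(grad, theta0, alpha=0.2, tol=1e-10, max_iter=10_000):
    """Move along the gradient with a fixed step length alpha."""
    theta = list(theta0)
    for iteration in range(max_iter):
        g = grad(theta)
        if max(abs(gi) for gi in g) < tol:   # gradient close enough to zero
            return theta, iteration
        theta = [t + alpha * gi for t, gi in zip(theta, g)]
    raise RuntimeError("steepest ascent did not converge")

theta_hat, n_iter = steepest_ascent(grad, [5.0, -3.0])
print("maximiser:", theta_hat)
print("iterations used:", n_iter)
```

The fixed step length has to be small enough for the iteration to be stable on this function; too large a value makes the iterates diverge.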
slide-143
SLIDE 143

Scoring

  • For maximum likelihood, one method often used is the method of scoring

$$\theta_{\tau+1} = \theta_\tau + I(\theta_\tau)^{-1} S(\theta_\tau)$$

  • The known form of the information matrix may make the iteration cheap as we can calculate the inverse directly
  • It is often modified to include a line search parameter
  • Note that the efficient score is zero at the optimum so the updates terminate
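
A small sketch of the scoring iteration follows. It assumes, purely for illustration, a Cauchy location model with unit scale, where the Fisher information has the known form I(θ) = n/2 and so never needs to be recomputed or numerically inverted. The data are made up, the sample median is used as the starting value, and the function names are hypothetical.

```python
def score(theta, data):
    """Score of the Cauchy(theta, 1) log-likelihood."""
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in data)

def fisher_scoring(data, theta0, tol=1e-10, max_iter=500):
    """Iterate theta <- theta + I^{-1} S(theta), with I = n / 2 known."""
    info = len(data) / 2.0        # Fisher information, constant in theta
    theta = theta0
    for _ in range(max_iter):
        s = score(theta, data)
        if abs(s) < tol:          # the score is zero at the optimum
            return theta
        theta = theta + s / info
    raise RuntimeError("scoring did not converge")

data = [-1.2, 0.3, 0.8, 1.1, 1.5, 2.0, 7.5]   # made-up sample
median = sorted(data)[len(data) // 2]
theta_hat = fisher_scoring(data, theta0=median)
print("scoring estimate of the location:", theta_hat)
print("score at the estimate:", score(theta_hat, data))
```

The Cauchy likelihood can have several local maxima, so the result depends on the starting value; the median is a common and reasonable choice.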

slide-147
SLIDE 147

Fundamentals of Bayesian Econometrics

  • Classical econometrics (e.g. ML) treats the parameters of a model as fixed, unknown constants to be estimated
  • Within the Bayesian framework, the parameters are treated as random variables that have probability distributions
  • Prior distributions are used to summarise ex-ante information about parameters and are then updated by sample information (the likelihood function) to form a posterior distribution
  • The procedure thus provides a natural framework for accommodating different sources of information and thereby sharpening macroeconomic analysis
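
The updating of a prior by the likelihood can be made concrete with the simplest conjugate case. The sketch below assumes a normal prior for an unknown mean and normally distributed observations with known variance, so the posterior is also normal and available in closed form. The numbers and the function name `normal_posterior` are hypothetical.

```python
def normal_posterior(prior_mean, prior_var, data, noise_var):
    """Posterior of a normal mean under a normal prior and known noise variance."""
    n = len(data)
    sample_mean = sum(data) / n
    prior_precision = 1.0 / prior_var
    data_precision = n / noise_var
    # Precisions add; the posterior mean is a precision-weighted average
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean
                            + data_precision * sample_mean)
    return post_mean, post_var

observations = [2.1, 1.7, 2.6, 1.9, 2.3]   # made-up data
post_mean, post_var = normal_posterior(prior_mean=1.0, prior_var=0.5,
                                       data=observations, noise_var=0.25)
print("posterior mean:", post_mean)
print("posterior variance:", post_var)
```

The posterior mean lies between the prior mean and the sample mean, and the posterior variance is smaller than the prior variance, which is the sense in which the sample information sharpens the prior.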

slide-151
SLIDE 151

Advantages of Bayesian Techniques

  • Bayesian estimation of econometric models has become increasingly popular:
      1. Provides an easy way of incorporating parameter restrictions, making estimation of more complicated models possible
      2. Provides usable results in small samples
      3. Evidence of improved forecasting performance
