SLIDE 1

Judgment in forecasting Nigel Harvey University College London

ISF Thessaloniki 2019

SLIDE 2

Applied psychology in forecasting research: Topics I’ll cover

  • Applied psychology focusses on task performance: a) the

characteristics of performance, b) factors that affect performance, and c) methods for improving performance.

  • Forecasting has been subjected to this approach (mainly by

management scientists) because every stage of the forecasting process involves some judgment. The main tasks are a) judgmental forecasting and b) judgmental adjustment of statistical forecasts.

  • As in most applied science, experimental methods are important.

Experiments are models of the real system. We find things out about the model, see if they are true of the real system, and, if not, include more features of the system in the model until they are true.

SLIDE 3

Applied psychology : A little institutional history

  • In the UK, research into applied psychology was funded in the

1940s by the government to meet the needs of the war effort. Research, led by Kenneth Craik at the Cambridge Applied Psychology Unit, focussed on tracking tasks (target acquisition, piloting) and vigilance tasks (radar).

  • Despite the applied orientation, theoretical advances were made.
  • They found that human identification and anticipation of signals in

continuous tracking tasks improves with practice: people start by correcting position errors and, then, successively, learn to eliminate errors in velocity, acceleration, and jerk.

SLIDE 4

Applied psychology : A little personal history

  • I took over George Drew’s motor skills teaching at UCL (1976).
  • I studied step tracking: people saw an array of windows, moved a

cursor to where they judged the next signal to be, corrected if necessary, and repeated. An AR(2) algorithm produced the signal: parameters were set to give a nondeterministic seasonal pattern.

  • To meet needs of 1980s industry, UK funding moved from motor to

cognitive skills. To obtain funding, I adapted my step-tracking program to examine judgmental forecasting. Participants produced 100 forecasts from a rolling window of the signal, generated 100 instances and repeated. They acquired AR(1) then AR(2). But performance in forecasting and generating tasks was uncorrelated.
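The AR(2) set-up described above can be sketched as follows. This is a minimal illustration, not the original program; the parameter values are assumptions chosen so that the AR polynomial has complex roots and the series shows the stochastic, quasi-seasonal cycling the slide mentions.

```python
import numpy as np

def generate_ar2(n, phi1, phi2, noise_sd=1.0, seed=0):
    """Generate an AR(2) series: y[t] = phi1*y[t-1] + phi2*y[t-2] + noise."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n)
    for t in range(2, n):
        y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + rng.normal(0.0, noise_sd)
    return y

# phi1**2 + 4*phi2 < 0 gives complex roots, i.e. quasi-cyclical
# ("nondeterministic seasonal") behaviour; these values are illustrative.
series = generate_ar2(200, phi1=1.2, phi2=-0.8)
```

A rolling window of such a series can then be shown to participants one step at a time, as in the forecasting task described above.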

SLIDE 5

Business forecasting: Changing forecasting practice

                                                 Fildes & Goodwin (2007)   Fildes & Petropoulos (2014)
Judgment alone                                           25%                       15.6%
Statistics alone                                         25%                       28.7%
Average of judgmental and statistical forecasts          17%                       18.5%
Judgmental adjustment of statistical forecasts           34%                       37.1%

  • Statistics may be basic use of Excel.
  • There have been many other surveys, reviewed by De Baets (2019).
  • Pure judgmental forecasting has become considerably less common and each of the other three approaches slightly more common. Here I consider a) judgmental forecasting, b) judgmental adjustment, and c) judgmental selection of forecasting models.

SLIDE 6

Judgmental forecasting

SLIDE 7

Characteristics of performance in judgmental forecasting tasks

  • Trend damping (e.g., Bolger & Harvey, 1993; Eggleton, 1982;

Keren, 1983; Lawrence & Makridakis, 1989).

  • Misperception of sequential dependence (e.g., Bolger & Harvey,

1993; Eggleton, 1982; Reimers & Harvey, 2011).

  • Framing effects – e.g., over-forecasting of desirable variables

and under-forecasting of undesirable ones (e.g., Eggleton, 1982; Harvey & Reimers, 2013; Lawrence & Makridakis, 1989).

  • Addition of noise to forecasts (e.g., Harvey, 1995; Harvey, Ewart

& West, 1997).

SLIDE 8

Trend damping

  • Forecasts for upward trends are too low; those for downward ones

are too high. (Not a context effect: it occurs with a single trial.)

  • Effects are greater with positively accelerated functions but they are

clearly present with linear ones.

  • For very shallow trends, the opposite effect (anti-damping) can occur.
  • Two broad accounts: (1) under-adjustment from anchor provided by

the last data point (on the trend line on average); (2) people expect from their real-life experience with the ecology that accelerating trends will become sigmoid or will be part of long-term cycles.
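Account (1), anchoring with under-adjustment, can be written as a one-line model. This is a hedged sketch: the damping weight is a free parameter for illustration, not an estimate from the studies cited.

```python
import numpy as np

def damped_forecast(series, horizon=1, damping=0.5):
    """Anchor on the last data point and add only a damped fraction
    of the latest trend step -- damping < 1 reproduces trend damping."""
    anchor = series[-1]
    trend = series[-1] - series[-2]
    return anchor + damping * trend * horizon

upward = np.array([100.0, 110.0, 120.0, 130.0, 140.0])
print(damped_forecast(upward))  # 145.0, below the trend-line value of 150
```

With damping > 1, the same model produces the anti-damping seen with very shallow trends.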

SLIDE 9

Trend damping: typical data series and judgmental forecasts

SLIDE 10

Typical damping with accelerating upward trend

[Figure: data series and judgmental predictions over time, showing damping of an accelerating upward trend]
SLIDE 11

Anti-damping found with shallow decelerating trend

[Figure: data series and judgmental predictions over time, showing anti-damping with a shallow decelerating trend]
SLIDE 12

Misperception of sequential dependence

  • People forecast from un-trended independent series as if they see

some sequential dependence in them. Forecasts should lie on the mean but they are much too close to the last data point.

  • With high degrees of sequential dependence (e.g., AR1 = 0.8), the opposite effect occurs. Forecasts are too far from the last data point.
  • Under-adjustment from the anchor provided by the last data point

can explain the error when sequential dependence is absent or low but not when it is high.

  • Real-life data series tend to show modest autocorrelation. People

use this ecological knowledge as an a priori hypothesis and make adjustment from it on the basis of limited and noisy data series.

SLIDE 13

Misperception of sequential dependence: typical data series and judgmental forecasts

SLIDE 14

Measuring the perceived autocorrelation implied by participants’ forecasts
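One way to measure this (an illustrative sketch, not necessarily the exact procedure used in the studies above): for an AR(1) series the optimal forecast is mean + phi·(last − mean), so regressing forecast deviations on last-point deviations recovers the autocorrelation that a forecaster's judgments imply.

```python
import numpy as np

def implied_autocorrelation(last_points, forecasts, series_mean):
    """Least-squares slope (through the origin) of forecast deviations
    on last-point deviations: the phi the forecasts behave as if true."""
    x = np.asarray(last_points) - series_mean
    f = np.asarray(forecasts) - series_mean
    return float(np.sum(x * f) / np.sum(x * x))

# A forecaster who always splits the difference between the series
# mean (50) and the last data point implies phi = 0.5.
last = [60.0, 40.0, 70.0]
fcsts = [55.0, 45.0, 60.0]
print(implied_autocorrelation(last, fcsts, 50.0))  # 0.5
```

For an independent series (true phi = 0) any implied phi above zero shows forecasts that are too close to the last data point.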

SLIDE 15

Misperception of sequential dependence: Cumulative distribution

[Figure: cumulative frequency of implied autocorrelation for series with actual autocorrelation 0.0, 0.4, and 0.8]
SLIDE 16

Framing effects

  • People tend to over-forecast from a series when it is labelled as

representing something desirable (e.g., profits) but to under-forecast from exactly the same series when it is labelled as representing something undesirable (e.g., losses).

  • The most likely explanation is that people are unknowingly affected

by optimism (Weinstein, 1980).

  • However, it can also be argued that people expect action to be

taken to reverse upward trends of undesirable quantities and downward trends of desirable ones (O’Connor, Remus & Griggs, 1997).

SLIDE 17

Framing effects: ‘profit’ versus ‘loss’ labels

SLIDE 18

Addition of noise to forecasts

  • When people make forecasts from noisy series, their forecasts are

scattered around the line representing optimal forecasts.

  • This is so despite the forecasters knowing that they should be

forecasting the most likely point of the true outcomes rather than representing the sort of series that will appear after the outcomes are known. The effect has been found with forecasters who are familiar with regression.

  • It is possible that people see illusory patterns in the noise and

attempt to take those apparent patterns into account when making their forecasts.

SLIDE 19

Appropriate forecasts from noisy series

SLIDE 20

More noise is added when series are noisier

SLIDE 21

Expectations for real series

  • The findings demonstrate the effects with artificial time series generated

via an algorithm. Would they appear for real series? Lawrence & O’Connor (1995) found that, on average, forecasts were not too close to the last data point with real series.

  • This is exactly what would be expected from the ecological account of the

‘biases’ that have been found. Had Lawrence and O’Connor (1995) looked only at real series that showed independence, they would have found forecasts too close to the last data point. Had they looked only at real series with strong sequential dependence, they would have found forecasts too far from the last data point. But averaged across all the real series (representative of the ecology), there was no overall bias.

SLIDE 22

Some factors affecting judgmental forecasting performance

  • Prior forecasting context (Harvey & Reimers, 2013; Reimers &

Harvey, 2011).

  • Graphical or tabular format used to present data series (Harvey

& Bolger, 1996; Lawrence, 1983; Lawrence, Edmundson & O’Connor, 1985; Theocharis, Smith & Harvey, 2019).

  • Length of data series (Andersson et al, 2012; Lawrence &

O’Connor, 1992; Theocharis & Harvey, 2019; Wagenaar & Timmers, 1978).

  • Order in which a sequence of forecasts is made (Theocharis &

Harvey, 2016).

SLIDE 23

Prior forecasting context

  • People’s forecasts from trended series are affected by the

steepness of trends in other series they have previously forecast. After forecasting from shallow-trended series, they expect trends to be shallower than after forecasting from series with steeper trends.

  • Similarly, if people have previously forecast from series that were

highly sequentially dependent, they expect later series to be more sequentially dependent than if they have previously forecast from less sequentially dependent series.

  • Implications for situations where people make many forecasts in a

session – e.g., many SKUs in demand forecasting.

SLIDE 24

Noisy power law functions with exponent k = 1.5 (left) and 2.0 (right)
Prior low context: k = 0.2, 0.4, 1.0, 1.5, 2.0
Prior high context: k = 1.25, 1.5, 1.75, 2.0, 2.25
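Stimuli of this kind can be generated as below. This is an illustrative sketch; the scale and noise level are assumptions, not the values used in the experiment.

```python
import numpy as np

def noisy_power_law(n, k, scale=1.0, noise_sd=5.0, seed=0):
    """y_t = scale * t**k + Gaussian noise; accelerating for k > 1."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, n + 1, dtype=float)
    return scale * t ** k + rng.normal(0.0, noise_sd, n)

steep = noisy_power_law(50, k=2.0)    # like a high-context stimulus
shallow = noisy_power_law(50, k=0.2)  # like a low-context stimulus
```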

[Figure: judgmental predictions over time for the two power-law series]
SLIDE 25

Forecasting series with AR1 = 0.4 after forecasting series with AR1 = 0.0, 0.1, 0.2, & 0.3 (low context) or series with AR1 = 0.5, 0.6, 0.7 & 0.8 (high context).

[Figure: cumulative frequency of implied autocorrelation for the low-context and high-context groups]
SLIDE 26

Effects of data format used to present series and make forecasts

  • Forecasting trended series is better from graphs but there may be

some advantage of tabular presentation with un-trended series.

  • Bar graphs yield lower forecasts than line or point graphs – worse

for upward trended series (more damping) but better for downward trended series (less damping). The effect is reversed for hanging bar graphs – they yield higher forecasts than line or point graphs. This format effect may be related to where attention is drawn.

  • With un-trended series, misperception of sequential dependence is

greater in line graphs than point graphs. The line connecting points appears to imply some dependence between them.

  • Implications for forecasting support systems?

SLIDE 27

Graphical versus tabular format presentation of data series

                 No trend   Up trend   Down trend
Graphical  ME     -1.09       2.52       -2.60
           VE      1.75       1.92        1.84
           RMSE    3.12       4.74        4.46
Tabular    ME     -0.35       4.44       -4.02
           VE      1.57       1.93        1.92
           RMSE    3.06       5.91        5.32

SLIDE 28

Forecasting from bar, point, and line graphs

SLIDE 29

Lower forecasts from bar graphs than line or point graphs

SLIDE 30

Effect reverses with hanging bars below the x-axis

SLIDE 31

[Figure: ADFM at horizons 1-5 for line graphs, point graphs, and exponential smoothing (alpha = 0.2)]

Forecasts from independent series are closer to the last data point (further from the mean) with line graphs than point graphs

SLIDE 32

Effects of length of data series presented to forecasters

  • Early work from different laboratories showed forecast accuracy to

be higher when fewer data points are presented to forecasters. It was suggested that people can be overloaded by too much information and that this damages their performance.

  • Later findings from experiments using a different range of numbers of data points yielded the expected result: accuracy was better with more data points.

  • Now it appears that accuracy is nonlinearly related to number of

data points with lowest accuracy around 5-10 data points.

SLIDE 33

Forecasting error as a function of series length for various types of data: (a) linear trend, (b) cyclical trend, (c) fractal, (d) AR1 = 0.9

SLIDE 34

Order in which forecasts for different horizons are made

  • Forecasters usually make forecasts starting with the most

immediate horizon and then working out to the most distant horizon.

  • Each forecast contains some random (variable) error. As each forecast acts as an anchor for the next one, random error accumulates across horizons.

  • If the most distant horizon is forecast first, it will still contain random

error but it will act as an anchor to constrain the values of the forecasts for earlier horizons. The advantage of this end-anchoring will be greater for more distant horizons because with normal forecasting random error accumulates across horizons.
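The argument can be checked with a small simulation. This is a sketch under simple assumptions (independent Gaussian error at each step; end-anchored forecasts interpolated linearly between the last datum and the end anchor), not a model fitted to the data.

```python
import numpy as np

rng = np.random.default_rng(1)
trials, horizons, err_sd = 10_000, 10, 1.0

# Sequential forecasting: each forecast anchors on the previous one,
# so each step's random error is added to all errors already made.
seq_err = np.cumsum(rng.normal(0.0, err_sd, (trials, horizons)), axis=1)

# End-anchoring: the most distant forecast carries one error term;
# earlier horizons interpolate between the last datum (zero error)
# and that anchor, so they inherit only a fraction of it.
frac = np.arange(1, horizons + 1) / horizons
end_err = rng.normal(0.0, err_sd, trials)[:, None] * frac

print(seq_err.std(axis=0)[-1])  # grows like sqrt(horizon), about 3.16 here
print(end_err.std(axis=0)[-1])  # stays about 1.0
```

As the slide argues, the advantage of end-anchoring is largest at the most distant horizons, where sequential error has accumulated most.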

SLIDE 35

Effects of end-anchoring for seasonally trended, linearly trended and un-trended autocorrelated series.

[Figure: forecasts with end-anchoring (dashed) and without end-anchoring (solid)]
SLIDE 36

Approaches to improving performance

  • Personnel selection: Eroglu and Croxton (2010) studied judgmental adjustment of statistical forecasts rather than judgmental forecasting. Openness to experience, extraversion, agreeableness, and conscientiousness each increased some biases but decreased others. So these variables do not provide a good basis for selection.
  • Training: There is evidence that experienced forecasters perform

no better than lay people in judgmental forecasting from time series information alone (Lawrence et al, 1985). Experience may still help in assessment of effects of interventions (e.g., promotions).

  • Ergonomics: It should be possible to facilitate performance with

forecasting support systems that provide forecasters with advice.

SLIDE 37

Judgmental adjustment of statistical forecasts

SLIDE 38

Increased popularity suggests a decrease in algorithm aversion

  • Meehl (1954) showed that algorithms improve clinical decisions. Yet, 50 years later, doctors are still reluctant to use them (Grove, 2005).

  • People may be reluctant to become dependent on algorithms in case

they lose their skills and become unable to perform tasks when they are unavailable or inappropriate (Bainbridge, 1983).

  • Dietvorst et al (2014) showed people using algorithms hold them to

higher standards than they expect from people. So when an algorithm errs, they tend to lose all confidence in it. In contrast, when humans err, they provide excuses even for experts (tiredness, momentary lapse of attention). Algorithms cannot be excused.

  • Use of algorithms can be increased by allowing people to make

changes to the result (Dietvorst, Simmons & Massey, 2016).

SLIDE 39

Characteristics of performance

  • Fildes, Goodwin, Lawrence & Nikolopoulos (2009) analysed 60,000 forecasts from four supply-chain companies. They found statistical forecasts helped in three of them. But larger adjustments increased accuracy (presumably from genuine knowledge) whereas smaller ones damaged it (presumably by adding noise).
  • Also, positive upward adjustments were less likely to improve accuracy and were more often in the wrong direction. Like results from Franses & Legerstee (2009), this suggests an optimism effect.
  • Two of the four characteristics of performance (noise addition, optimism) are the same as those in judgmental forecasting. What of the other two (trend damping, misperception of sequential dependence)?

SLIDE 40

Incorporating promotional effects

  • In non-experimental work, it is difficult to separate out periods with

and without promotions. SFs are based on data from all periods.

  • In Goodwin and Fildes' (1999) experimental research, the statistical forecasts were based on non-promotional periods only. We might expect these 'cleansed' statistical forecasts to be more effective because they provide a baseline for forecasting non-promotional periods; the effects of promotions just have to be added to them.

  • De Baets & Harvey (2018) ask a) whether judgmentally adjusted

SFs are better than judgmental forecasts, b) whether cleansed SFs help more, c) whether the % promotional periods is important.

SLIDE 41

SLIDE 42

Mean Absolute Error

1) SF helps but type of SF does not matter.
2) Fewer promotions in the data lowers MAE, but only when no promotion is planned.
3) SF gives greater benefit with five than with 20 promotions. Type of SF does not matter.

                   No SF   Non-cleansed SF   Cleansed SF
P=20  F=No promo    35.1        31.7            28.7
P=20  F=Promo       33.5        30.2            30.1
P=5   F=No promo    30.0        23.7            24.4
P=5   F=Promo       34.5        29.4            31.1
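The two error measures reported on this slide and the next are straightforward; a quick reference sketch:

```python
import numpy as np

def mae(forecasts, outcomes):
    """Mean Absolute Error: average size of the error, sign ignored."""
    f, o = np.asarray(forecasts), np.asarray(outcomes)
    return float(np.mean(np.abs(f - o)))

def mean_error(forecasts, outcomes):
    """Mean Error: positive means forecasts are, on average, too high."""
    f, o = np.asarray(forecasts), np.asarray(outcomes)
    return float(np.mean(f - o))

print(mae([30, 25], [28, 30]))         # 3.5
print(mean_error([30, 25], [28, 30]))  # -1.5
```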

SLIDE 43

Mean Error (+ means that forecast is too high)

1) Clear anchoring on the series mean. Also, higher forecasts with more promotions because the mean is higher.
2) Forecasts are lower with the cleansed SF than with the non-cleansed SF. The SF may contribute to anchoring and is lower when it is cleansed.
3) Failure to adjust down with no promotion is greater than failure to adjust upwards with promotion. Optimism about promotion effects?

                   No SF   Non-cleansed SF   Cleansed SF
P=20  F=No promo    18.2        18.4            14.1
P=20  F=Promo       -8.1        -3.8            -6.5
P=5   F=No promo    12.6         9.8             8.1
P=5   F=Promo      -10.9        -7.0           -11.4

SLIDE 44

Improving performance

  • Goodwin (2000) found that improved forecasts resulted from a)

making the statistical forecast the default that required an explicit request to be changed and b) requiring forecasters to record a reason for changing a forecast.

  • Goodwin, Fildes, Lawrence & Stephens (2010) prevented forecasters

from making small adjustments but this resulted in an increase in damaging large adjustments. Also guidance to focus adjustments on promotional periods was resented and largely ignored.

  • It is now possible to produce SFs that take account of promotional effects. Would this help?

SLIDE 45

SFs that take account of promotional effects

  • Recently, it has become possible to generate statistical forecasts

that take account of sporadic perturbations (Huang, Fildes, Soopramanien, 2014; Kourentzes & Petropoulos, 2016; Trapero, Pedregal, Fildes & Kourentzes, 2013).

  • However, there is usually a time gap between statistical advances

and their use by practitioners (Lawrence, 2000; Sanders & Manrodt, 2003).

  • Would use of SFs that take account of promotional effects improve

forecast accuracy?

SLIDE 46

SFs are based on the generating equation minus noise

SLIDE 47

But some error persists even with optimal SFs

                   MAE    Mean Error
P=20  F=No promo   23.1      +2.5
P=20  F=Promo      25.6      +2.0
P=5   F=No promo   19.5      -0.6
P=5   F=Promo      23.0      -2.8

MAE is larger with more promotions in the data and for forecasts made when promotions are planned. ME is affected by the number of promotions in the data. Problems with acceptance within organizations remain.

SLIDE 48

Judgmental selection of statistical forecasting models

SLIDE 49

Using judgment to select a statistical forecasting approach

  • Forecasting support systems such as ForecastPro offer an automatic

selection of appropriate forecasting techniques or the option of the forecaster making the selection. Can people do this?

  • In their selection condition, Petropoulos, Kourentzes, Nikolopoulos & Siemsen (2018) asked people to look at data (non-seasonal and non-trended; seasonal but non-trended; trended but non-seasonal; trended and seasonal) and then to select one of four exponential smoothing models, each appropriate for one of these types of data.

  • In their build condition, people decided if the data contained trends

and/or seasonality, and the system used this information to choose the model.
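The build condition amounts to a simple mapping from judged features to a model. This is a sketch with hypothetical labels; the study used four exponential smoothing variants of this kind, not necessarily these names.

```python
def pick_model(has_trend: bool, has_seasonality: bool) -> str:
    """Map judged data features to an exponential smoothing variant."""
    if has_trend and has_seasonality:
        return "Holt-Winters (trend + seasonal)"
    if has_trend:
        return "Holt's linear trend"
    if has_seasonality:
        return "seasonal exponential smoothing"
    return "simple exponential smoothing"

print(pick_model(True, False))  # Holt's linear trend
```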

SLIDE 50

Petropoulos (In press): People were not as good as the system at selecting a model but they could identify data features that allowed selection of a model as well as the system

SLIDE 51

Model selection when past performance of models is displayed

  • De Baets & Harvey (under review) displayed the past performance of

different models overlaid on outcomes. Forecasters could use this information as a basis for their choice. One group had to choose between good and bad models and another between good and intermediate models. We expected the latter task to be more difficult.

  • Data and models were not as well-matched as in Petropoulos et al

(2018). Data series had AR1 = 0.0, AR1 = 0.8, or AR1 = -0.8.

  • Models were either naïve forecast (worst), exponential smoothing

(intermediate), or autoregressive parameterized with appropriate AR1.
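The three model classes can be sketched as one-step forecasters. These are illustrative implementations; the smoothing parameter is an assumption, and the AR(1) model is only "best" when its phi matches the series.

```python
import numpy as np

def naive_forecast(y):
    """Worst model here: simply repeat the last observation."""
    return y[-1]

def ses_forecast(y, alpha=0.5):
    """Intermediate model: simple exponential smoothing of the series."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level

def ar1_forecast(y, phi):
    """Best model when phi matches the series' true autocorrelation."""
    m = np.mean(y)
    return m + phi * (y[-1] - m)

y = np.array([5.0, -4.0, 3.0, -2.0, 4.0])  # alternating, as with AR1 = -0.8
print(ar1_forecast(y, phi=-0.8))            # pulls the forecast below the mean
```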

SLIDE 52

Intermediate forecast in orange; best forecast in green

SLIDE 53

Best forecast in green; worst forecast in orange

SLIDE 54

Model selection when past performance of models is displayed

  • People were significantly better at choosing between good and bad

forecasts than at choosing between good and intermediate ones.

[Figure: percentage correctly identified under low and high noise, for good-poor and good-intermediate choices]
SLIDE 55

Model selection when past performance of models is displayed

  • But forecasters were punished more severely for wrong choices between good and bad models than for wrong choices between good and intermediate ones.

[Figure: MAE of selected models under low and high noise, for good-poor and good-intermediate choices]
SLIDE 56

However, the forecasters’ errors were less than the errors obtained by averaging the forecasts of the two models

Noise level   Contrasted forecast qualities   Participant MAE     MAE of averaged models   df   t       p
Low           Good vs. poor                   4.87 (SD = 2.89)    7.83 (SD = 7.69)         41   -6.66   <.001
Low           Good vs. intermediate           2.32 (SD = 1.06)    3.50 (SD = 3.12)         48   -7.89   <.001
High          Good vs. poor                   8.17 (SD = 5.29)    12.47 (SD = 11.88)       52   -5.93   <.001
High          Good vs. intermediate           3.73 (SD = 1.36)    4.60 (SD = 3.98)         46   -4.40   <.001

SLIDE 57

Summary

  • We now know a considerable amount about judgmental forecasting.
  • Judgmental adjustment is a more complex topic but a good start has been made. There are still questions in need of answers – e.g., when making adjustments, do people increase trend damping and base forecasts on misperceived sequential dependence in the series?

  • Studies of judgmental choice of forecasting models are in their

infancy.

  • Still much to do.
