
Introduction Forecast Rodeo Dataset Models Results Conclusion

Improving Subseasonal Forecasting in the Western U.S.

Paulo Orenstein

March 22, 2019

Photo credit: IIP Photo Archive

Joint work with Jessica Hwang, Lester Mackey, Judah Cohen, Karl Pfeiffer

Paulo Orenstein Improving Subseasonal Forecasting Stanford University 1 / 27

Goals

◮ Bring awareness to subseasonal forecasting, an important problem for water management and weather extremes
◮ Introduce an example of a crowdsourced, social good project
◮ Present the SubseasonalRodeo Dataset
◮ Discuss effective machine learning methods for the problem
    - multitask model selection
    - weighted locally linear regression
    - ensembling
◮ Encourage you to improve on our results!

Motivation

◮ Long-term weather prediction (> 2 months): hopeless, use historical climate
◮ Short-term weather prediction (< 2 weeks): accurate predictions possible using physics-based models
◮ Medium-term (subseasonal) weather prediction: physics-based models are no longer accurate
◮ Subseasonal forecasts are important
    - allocate water resources
    - manage wildfires
    - prepare for droughts, floods and other weather extremes
    - crop planting, irrigation scheduling, and fertilizer application
◮ Can statistical/ML/non-physics models extend the forecast horizon beyond short-term prediction?

“During the past eight years, every state in the Western United States has experienced drought that has affected the economy both locally and nationally through impacts to agricultural production, water supply, and energy.”
David Raff, USBR

Forecasting systems in use now

◮ CFSv2 (Climate Forecast System, version 2): operational forecasting system for the US, physics-based model representing “coupled atmosphere-ocean-land surface-sea ice system”
◮ NMME (North American Multi-Model Ensemble): ensemble of CFSv2 and about 10 other physics-based models from various North American modeling centers
◮ Both are examples of Numerical Weather Prediction models
    - simulate future weather using partial differential equations and supercomputers
    - initialized many times with current weather conditions; use the average of predictions
    - initial error doubles every 5 days
◮ Can we do better?
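To get a feel for the "initial error doubles every 5 days" figure, the sketch below computes the implied error amplification at a given lead time. It assumes purely exponential growth with no saturation, which is a simplification introduced here rather than something the slides state; `DOUBLING_DAYS` and `amplification` are illustrative names.

```python
DOUBLING_DAYS = 5.0  # from the slide: initial error doubles every 5 days


def amplification(lead_days, doubling_days=DOUBLING_DAYS):
    """Factor by which an initial error has grown after `lead_days`.

    Assumes unbounded exponential growth (a simplifying assumption;
    real forecast error eventually saturates).
    """
    return 2.0 ** (lead_days / doubling_days)


if __name__ == "__main__":
    # Lead times chosen to bracket the short-term and subseasonal ranges.
    for lead in (5, 14, 28, 42):
        print(f"day {lead:2d}: initial error x {amplification(lead):.1f}")
```

Under this assumption, a two-week lead already multiplies the initial error several times over, which is one way to read the claim that physics-based models lose accuracy beyond the short term.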

Subseasonal Climate Forecast Rodeo

◮ Year-long, real-time forecasting competition sponsored by US Bureau of Reclamation and NOAA
◮ Four categories
    - two variables: two-week average temperature and two-week accumulated precipitation
    - two forecasting horizons: 3-4 weeks out and 5-6 weeks out
◮ Submission frequency: every two weeks
    - first submission: April 18, 2017
    - last submission: April 3, 2018
◮ Region: 17 states in western US, G = 514 grid points
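As a quick check on the schedule, the sketch below enumerates the submission dates implied by a two-week cadence between the first and last submissions given above. The function and constant names are illustrative; only the two dates and the 14-day step come from the slides.

```python
from datetime import date, timedelta

FIRST_SUBMISSION = date(2017, 4, 18)
LAST_SUBMISSION = date(2018, 4, 3)
STEP = timedelta(days=14)  # "every two weeks"


def submission_dates(first=FIRST_SUBMISSION, last=LAST_SUBMISSION, step=STEP):
    """Return every submission date from `first` to `last`, inclusive."""
    dates = []
    current = first
    while current <= last:
        dates.append(current)
        current += step
    return dates


if __name__ == "__main__":
    dates = submission_dates()
    print(len(dates), "submissions, from", dates[0], "to", dates[-1])
```

The two endpoints are exactly 350 days apart, so the cadence lands on the last submission date without a remainder.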

Forecast Rodeo region

[Figure: the Forecast Rodeo region]

Contest scoring/objective

◮ For the two-week period beginning on $t$
    - observed average temperature or total precipitation: $y_t \in \mathbb{R}^G$
    - climatology for a month-day combination $d$: $c_d = \frac{1}{30} \sum_{t \,:\, \mathrm{monthday}(t) = d,\ 1981 \le \mathrm{year}(t) \le 2010} y_t$, the long-term average over 1981-2010 for the month-day $d$
    - observed anomaly: $a_t = y_t - c_{\mathrm{monthday}(t)}$
◮ Given a forecast $\hat{y}_t$, or equivalently, forecast anomalies $\hat{a}_t = \hat{y}_t - c_{\mathrm{monthday}(t)}$, the cosine similarity or skill is
    $$\mathrm{skill}(\hat{a}_t, a_t) = \cos(\hat{a}_t, a_t) = \frac{\langle \hat{a}_t, a_t \rangle}{\|\hat{a}_t\|_2 \, \|a_t\|_2}$$
    Highest average skill over the contest period = winner
◮ Benchmarks: debiased CFSv2 and “damped persistence”
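The scoring rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not the contest's scoring code: vectors are lists with one entry per grid point, and the function names and the dictionary layout of `observations_by_year` are assumptions made for the example.

```python
import math


def climatology(observations_by_year):
    """Average the observations for one month-day over the given years.

    `observations_by_year` maps year -> list of grid-point values
    (an assumed layout). For the contest definition, pass 1981-2010.
    """
    years = list(observations_by_year)
    grid_size = len(observations_by_year[years[0]])
    return [
        sum(observations_by_year[year][g] for year in years) / len(years)
        for g in range(grid_size)
    ]


def anomaly(values, clim):
    """Anomaly = value minus climatology, grid point by grid point."""
    return [y - c for y, c in zip(values, clim)]


def skill(forecast_anomaly, observed_anomaly):
    """Cosine similarity between forecast and observed anomaly vectors."""
    dot = sum(f * o for f, o in zip(forecast_anomaly, observed_anomaly))
    norm_f = math.sqrt(sum(f * f for f in forecast_anomaly))
    norm_o = math.sqrt(sum(o * o for o in observed_anomaly))
    if norm_f == 0.0 or norm_o == 0.0:
        raise ValueError("skill is undefined for an all-zero anomaly vector")
    return dot / (norm_f * norm_o)
```

Because the skill is a cosine, it ignores the overall magnitude of the forecast anomaly and rewards only its spatial pattern, which is why the slides later call it a non-standard loss function.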

Our dataset

◮ No data provided!
◮ Gathered historical data on various weather variables, 1980 to present
    - temperature
    - precipitation
    - sea surface temperature
    - sea ice concentration
    - multivariate El Niño / Southern Oscillation index
    - Madden-Julian oscillation
    - relative humidity
    - pressure
    - geopotential height
    - historical NMME forecasts
◮ Some are spatiotemporal, some are only temporal

Our dataset

◮ This was an enormous amount of work (seriously)
◮ Postprocessing and transformation
    - aggregated to two-week averages or sums
    - applied PCA to some variables to reduce dimensionality down to 3 principal components
    - chose 1-3 fixed lags for each variable according to forecast horizon and data availability
◮ Data processing challenges: weird data coding, weird data formats, huge data, inconsistent/untimely data updates, real-time processing
◮ Statistical challenges: spatiotemporal data, correlated predictors, complex dependence structure (careful holdout for cross-validation), non-standard loss function
◮ SubseasonalRodeo Dataset: https://doi.org/10.7910/DVN/IHBANG
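Two of the postprocessing steps above, two-week aggregation and fixed lags, can be sketched on a single daily series as follows. This is only an illustration under assumptions: the helper names are hypothetical, the series is a plain list of daily values, and the PCA step is not shown.

```python
def two_week_mean(daily, window=14):
    """Average each run of `window` consecutive daily values.

    Entry i of the result covers days i .. i + window - 1, i.e. the
    two-week period beginning on day i.
    """
    if len(daily) < window:
        return []
    total = sum(daily[:window])
    means = [total / window]
    for i in range(window, len(daily)):
        # Slide the window forward by one day.
        total += daily[i] - daily[i - window]
        means.append(total / window)
    return means


def lagged(series, lag):
    """Shift a series so entry t holds the value from `lag` steps earlier.

    The first `lag` entries have no history and are set to None; the
    output has the same length as the input.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    padded = [None] * lag + list(series)
    return padded[: len(series)]
```

For sums instead of averages (for example accumulated precipitation), the same sliding window applies without the division by `window`.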

Data matrix

lat  lon  date        rhum_shift30  pres_shift30  ...  target
47   260  1979-02-09  86.539415     96061.320731  ...  -18.464830
47   261  1979-02-09  89.957313     96419.183454  ...  -18.329887
47   262  1979-02-09  92.553695     97493.990932  ...  -18.289105
48   236  1979-02-09  93.731037     97277.973493  ...    2.575200
...  ...  ...         ...           ...           ...  ...

Dimensions: $10^6 \times 30$

Our models

◮ Two regression models
    - MultiLLR (local linear regression with multitask model selection): uses lagged predictors based on all weather variables, chosen using multitask model selection tailored to the cosine similarity objective
    - AutoKNN (k-nearest-neighbors autoregression): uses lagged temperature or precipitation only, and a skill-specific form of nearest neighbor modeling
◮ Ensemble of the two models performs better than either model individually

MultiLLR: Local linear regression with multitask model selection

◮ Subset the training data to dates within 8 weeks of the target date to be predicted
    - This is the “local” part
◮ Using backward stepwise selection with a linear regression subroutine, choose a common set of relevant predictors for all grid points
    - This is the “linear regression with multitask model selection” part
    - Don’t expect all features to be relevant at all times of year
◮ Backward stepwise has to be customized
    - At each step, remove the variable whose removal decreases predictive performance the least
    - Predictive performance is the leave-one-year-out cross-validated cosine similarity on the target date’s day of year, averaged across all historical years
    - To properly leave one year out around t, need to hold out from 4 weeks before t to 48 weeks after t
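The customized backward stepwise procedure can be sketched as a greedy loop around a user-supplied scoring function. This is a hedged illustration, not the authors' implementation: `score` stands in for the leave-one-year-out cross-validated cosine similarity, and the `tolerance` stopping rule is an assumption, since the slides do not say when elimination stops. `holdout_window` encodes the "4 weeks before t to 48 weeks after t" rule.

```python
from datetime import date, timedelta


def holdout_window(target):
    """Dates to exclude when leaving out the year around `target`:
    from 4 weeks before to 48 weeks after, as stated on the slide."""
    return target - timedelta(weeks=4), target + timedelta(weeks=48)


def backward_stepwise(features, score, tolerance=0.01):
    """Greedy backward elimination over a list of feature names.

    `score(subset)` must return the predictive performance of a feature
    subset (higher is better). `tolerance` is an assumed stopping rule:
    stop once the least harmful removal costs more than this much score.
    """
    selected = list(features)
    current = score(selected)
    while len(selected) > 1:
        # Score every subset obtained by dropping one remaining feature.
        candidates = [
            (score([f for f in selected if f != drop]), drop)
            for drop in selected
        ]
        best_score, drop = max(candidates, key=lambda pair: pair[0])
        if current - best_score > tolerance:
            break
        selected.remove(drop)
        current = best_score
    return selected
```

In the multitask setting described above, a single call selects one feature set shared by all grid points, so `score` would evaluate the cosine similarity of the full grid-wide forecast rather than a per-location error.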

Inclusion frequencies of candidate variables

[Figure: inclusion frequency of each candidate variable, in two panels.
Panel "precipitation, weeks 5-6": precip_shift86, phase_shift31, precip_shift86_anom, rhum_shift44, nmme0_wo_ccsm3_nasa, tmp2m_shift86_anom, icec_2_shift44, mei_shift59, sst_2_shift44, precip_shift43_anom, wind_hgt_10_1_shift44, wind_hgt_10_2_shift44, precip_shift43, tmp2m_shift43_anom, icec_1_shift44, icec_3_shift44, sst_1_shift44, sst_3_shift44, tmp2m_shift43, tmp2m_shift86, ones, nmme_wo_ccsm3_nasa, pres_shift44.
Panel "temperature, weeks 5-6": phase_shift31, wind_hgt_10_1_shift44, icec_3_shift44, icec_2_shift44, mei_shift59, wind_hgt_10_2_shift44, tmp2m_shift43_anom, rhum_shift44, sst_2_shift44, icec_1_shift44, sst_1_shift44, nmme0_wo_ccsm3_nasa, tmp2m_shift43, sst_3_shift44, nmme_wo_ccsm3_nasa, tmp2m_shift86, tmp2m_shift86_anom, pres_shift44, ones.
Axis: inclusion frequency.]

AutoKNN: Multitask k-nearest-neighbor autoregression

◮ For each target date t, find the 20 most similar historical dates by looking at cosine similarity between the anomaly trajectory in the 60 days leading up to t and the one leading up to each historical date
◮ Call the anomalies of the 20 most similar historical dates knn1 through knn20
◮ Perform weighted local linear regression using knn1 through knn20 and fixed lagged measurements of temperature or precipitation to predict the future anomaly
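The neighbor search in the first step can be sketched as below. This is a simplified illustration under stated assumptions: anomalies are a single scalar daily series (the slides describe a multitask version over all grid points), dates are integer indices, every earlier date with a full history window is treated as a candidate, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0


def most_similar_dates(anomalies, target, history=60, k=20):
    """Return the k earlier dates whose trailing anomaly trajectory is most
    similar to the trajectory leading up to `target`.

    `anomalies[i]` is the anomaly on day i; the trajectory for day t is
    the `history` values on days t - history .. t - 1.
    """
    def trajectory(t):
        return anomalies[t - history:t]

    reference = trajectory(target)
    scored = [
        (cosine(reference, trajectory(t)), t)
        for t in range(history, target)  # candidates with a full window
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in scored[:k]]
```

The anomalies observed at the returned dates would then play the role of knn1 through knn20 as regression features in the final step.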

Learned nearest neighbors

[Figure: month of the learned neighbors (nbr Jan to nbr Dec) against month of the target date (target Jan to target Dec); panels: Precipitation, Temperature]

[Figure: temperature, weeks 3−4; axes: target date (11−Mar to 18−Jun) and neighbor rank (5 to 20); legend: neighbor_month (Jan to Dec)]

[Figure: temperature, weeks 3−4; axes: target date (11−Mar to 18−Jun) and neighbor rank (5 to 20); legend: neighbor_year (1980 to 2010)]

Ensemble of the two models

◮ We ensemble by averaging the ℓ2-normalized forecasted anomalies:
    $$\hat{a}_{\mathrm{ensemble}} = \frac{1}{2} \frac{\hat{a}_{\mathrm{LLR}}}{\|\hat{a}_{\mathrm{LLR}}\|_2} + \frac{1}{2} \frac{\hat{a}_{\mathrm{KNN}}}{\|\hat{a}_{\mathrm{KNN}}\|_2}$$
◮ This is guaranteed to improve the average skill, thanks to...

Proposition

Consider an observed anomaly vector $a$ and $m$ distinct forecast anomaly vectors $(\hat{a}_i)_{i=1}^m$. For any vector of weights $p \in \mathbb{R}^m$ with $\sum_{i=1}^m p_i = 1$ and $p_i \ge 0$, let
$$\bar{a}(p) = \sum_{i=1}^m p_i \frac{\hat{a}_i}{\|\hat{a}_i\|_2}$$
be the weighted average of the ℓ2-normalized forecast anomalies. Then,
$$\left| \sum_{i=1}^m p_i \cos(\hat{a}_i, a) \right| \le \left| \cos(\bar{a}(p), a) \right|.$$
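The ensembling rule and the inequality in the proposition can be checked numerically with a short sketch. The function names are illustrative and the example vectors are made up for the demonstration. The inequality holds because the inner product of the ensemble with a/‖a‖₂ equals the weighted average of the individual cosines, while the ensemble's own norm is at most 1 by the triangle inequality.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity, computed as the dot product of unit vectors."""
    u, v = l2_normalize(u), l2_normalize(v)
    return sum(a * b for a, b in zip(u, v))


def ensemble(forecasts, weights):
    """Weighted average of the l2-normalized forecast anomaly vectors."""
    normalized = [l2_normalize(f) for f in forecasts]
    return [
        sum(w * f[g] for w, f in zip(weights, normalized))
        for g in range(len(forecasts[0]))
    ]


if __name__ == "__main__":
    observed = [1.0, 2.0, -1.0]                       # toy anomaly vector
    forecasts = [[1.0, 1.0, 0.0], [0.0, 2.0, -2.0]]   # toy forecasts
    weights = [0.5, 0.5]
    average_skill = sum(
        w * cosine(f, observed) for w, f in zip(weights, forecasts)
    )
    ensemble_skill = cosine(ensemble(forecasts, weights), observed)
    print("average of individual skills:", average_skill)
    print("skill of the ensemble:", ensemble_skill)
```

With equal weights on two forecasts, `ensemble` reduces to the two-model formula on the slide.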

Introduction Forecast Rodeo Dataset Models Results Conclusion

Results

◮ In the contest period (2017-2018), our models beat both of the contest baselines

(and the top competitor) by a lot

unfortunately, only arrived at these model late in the competition

◮ In a historical evaluation period (2011-2017), our models beat a reconstructed

baseline by a lot

◮ Ensembling the two models helps significantly

slide-66
SLIDE 66

Contest period, 2017-2018

task               LLR     KNN     ensemble  cfsv2   damped   top competitor
temp, weeks 3-4    0.2856  0.2807  0.3414    0.1589   0.1952  0.2855
temp, weeks 5-6    0.2371  0.2817  0.3077    0.2192  -0.0762  0.2357
precip, weeks 3-4  0.1675  0.2156  0.2388    0.0713  -0.1463  0.2144
precip, weeks 5-6  0.2219  0.1870  0.2412    0.0227  -0.1613  0.2162

slide-67
SLIDE 67

Contest period, 2017-2018

[Figure: distribution of skill for multillr, autoknn, ensemble, debiased cfsv2, and damped. Panels: temperature, weeks 3-4; temperature, weeks 5-6; precipitation, weeks 3-4; precipitation, weeks 5-6. Axes: skill, count.]

slide-68
SLIDE 68


Historical evaluation period, 2011-2017

task               LLR     KNN     ensemble  cfsv2   ens-cfsv2
temp, weeks 3-4    0.2230  0.3111  0.3073    0.2557  0.3508
temp, weeks 5-6    0.2204  0.2810  0.2962    0.2142  0.3279
precip, weeks 3-4  0.1573  0.1513  0.1893    0.0860  0.1964
precip, weeks 5-6  0.1312  0.1403  0.1703    0.0691  0.1755
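As a quick arithmetic check on the historical table above, the sketch below computes how much the ensemble gains over the mean of the LLR and KNN skills, and how much ens-cfsv2 gains over cfsv2 alone. The numbers are copied from the table; the dictionary layout and variable names are hypothetical, and reading ens-cfsv2 as the ensemble combined with cfsv2 is an assumption based on the column name.

```python
# Skill values copied from the historical evaluation table (2011-2017).
historical = {
    "temp, weeks 3-4": {"LLR": 0.2230, "KNN": 0.3111, "ensemble": 0.3073,
                        "cfsv2": 0.2557, "ens-cfsv2": 0.3508},
    "temp, weeks 5-6": {"LLR": 0.2204, "KNN": 0.2810, "ensemble": 0.2962,
                        "cfsv2": 0.2142, "ens-cfsv2": 0.3279},
    "precip, weeks 3-4": {"LLR": 0.1573, "KNN": 0.1513, "ensemble": 0.1893,
                          "cfsv2": 0.0860, "ens-cfsv2": 0.1964},
    "precip, weeks 5-6": {"LLR": 0.1312, "KNN": 0.1403, "ensemble": 0.1703,
                          "cfsv2": 0.0691, "ens-cfsv2": 0.1755},
}

gains = {}
for task, s in historical.items():
    mean_individual = (s["LLR"] + s["KNN"]) / 2
    gains[task] = {
        # Ensemble skill minus the mean skill of its two components.
        "over_mean": s["ensemble"] - mean_individual,
        # Combined forecast skill minus the skill of cfsv2 alone.
        "over_cfsv2": s["ens-cfsv2"] - s["cfsv2"],
    }
    print(f"{task}: ensemble vs. mean(LLR, KNN) {gains[task]['over_mean']:+.4f}, "
          f"ens-cfsv2 vs. cfsv2 {gains[task]['over_cfsv2']:+.4f}")
```

In every row the ensemble exceeds the mean of its two components, although for temperature, weeks 3-4 it sits slightly below KNN alone (0.3073 vs. 0.3111).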

slide-69
SLIDE 69

Historical evaluation period, 2011-2017

[Figure: skill by year for the methods ensemble, cfs, and ensemble-cfs. Panels: temperature 3-4, temperature 5-6, precipitation 3-4, precipitation 5-6. Axes: year, skill.]

slide-70
SLIDE 70

Conclusion

◮ Subseasonal forecasting is an important and largely unsolved problem in weather prediction


◮ Simple statistical models can significantly improve subseasonal forecasting compared to physics-based models


◮ Ensembling statistical and physics-based forecasts produces further improvements


◮ We’ve released the dataset on Dataverse (https://doi.org/10.7910/DVN/IHBANG) and our paper on arXiv (https://arxiv.org/abs/1809.07394)


◮ More sophisticated modeling approaches can almost certainly do even better. Try your own!
