Machine learning techniques in predicting uncertainty of environmental models



slide-1
SLIDE 1

Machine learning techniques in predicting uncertainty of environmental models

Dimitri Solomatine

Professor of Hydroinformatics, IHE Delft Institute for Water Education Delft, The Netherlands


slide-2
SLIDE 2

Outline

  • Introduction: what are we analysing?
  • Machine learning methods to (a) analyse and (b) predict the model uncertainty
  • Suggested approach: “escalation” of uncertainty
  • Examples

D.P. Solomatine. Escalation of uncertainty.

slide-3
SLIDE 3

Example for a quick start: deterministic forecasts and 90% uncertainty bounds

[Figure: deterministic forecast with 90% uncertainty bounds. Axes: Time (days) vs. Discharge (m3/s).]

slide-4
SLIDE 4

Sources of model uncertainty: perceptual, structure, parameters, data

[Figure: schematic of a conceptual hydrological model (the HBV model of slide 48), including the transform function that produces the runoff.]

Legend:

  • SF – Snow
  • RF – Rain
  • EA – Evapotranspiration
  • SP – Snow cover
  • IN – Infiltration
  • R – Recharge
  • SM – Soil moisture
  • CFLUX – Capillary transport
  • UZ – Storage in upper reservoir
  • PERC – Percolation
  • LZ – Storage in lower reservoir
  • Q0 – Fast runoff component
  • Q1 – Slow runoff component
  • Q – Total runoff (Q = Q0 + Q1)

$$ y = M(x, p) + \varepsilon_s + \varepsilon_\theta + \varepsilon_x + \varepsilon_y $$

slide-5
SLIDE 5

Traditional steps in uncertainty analysis of a calibrated model

  • Identification of sources of uncertainty (input, parameter, model structure)
  • Quantification of uncertainty (e.g. as a distribution)
  • Studying propagation of uncertainty through the model (e.g. by Monte Carlo simulation)
  • Quantification of uncertainty in the model outputs, i.e. identification of the output distribution (pdf) or its characteristics (mean, st. dev., quantiles)
  • If possible, reduction of uncertainty (e.g. model improvement, more accurate measurements, etc.)
  • Application of the uncertain information in the decision-making process

slide-6
SLIDE 6

Data uncertainty (input, parameters): propagation of uncertainty through the model

  • ŷ = M(x, p)
  • x = input, p = parameters
  • Uncertainty in x and p propagates to the output y
  • pdf of parameters → pdf of output: pdf_p → pdf_y
  • pdf of inputs → pdf of output: pdf_x → pdf_y

slide-7
SLIDE 7

Monte Carlo Simulation


slide-8
SLIDE 8

Monte Carlo casino: roulette wheel

It is a random number generator that uses a uniform distribution over the range [0, 36]

slide-9
SLIDE 9

Single model run (no uncertainty)

[Figure: schematic of the HBV conceptual model (legend as on slide 4).]

Workflow: input (single time series) and a single parameter vector P (FC, ALPHA, K, MAXBAS, etc.) → run the model → output (single time series)

slide-10
SLIDE 10

Monte Carlo simulation in analysing parametric uncertainty


slide-11
SLIDE 11

Sampling parameters and multiple model runs

[Figure: schematic of the HBV conceptual model (legend as on slide 4).]

Workflow: sample one parameter vector P (FC, ALPHA, K, MAXBAS, etc.) from the distributions → run the model with the input (single time series) → single output. Do this multiple times → ensemble of multiple output time series
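The sampling loop on this slide can be sketched in a few lines of Python. This is an illustration only: `toy_model` is a hypothetical two-parameter linear reservoir standing in for the process model (it is not HBV), and the rainfall series and the uniform parameter ranges are invented for the example.

```python
import numpy as np

def toy_model(rain, k, c):
    """Hypothetical stand-in for the process model: a single linear reservoir.

    k is the outflow coefficient (fraction of storage released per step),
    c is the runoff coefficient (fraction of rain entering the storage).
    """
    storage = 0.0
    flow = np.empty(len(rain))
    for t, r in enumerate(rain):
        storage += c * r          # part of the rain enters the storage
        flow[t] = k * storage     # outflow is proportional to the storage
        storage -= flow[t]
    return flow

rng = np.random.default_rng(seed=0)
rain = rng.gamma(shape=0.5, scale=4.0, size=200)   # one fixed input time series

n_runs = 500
ensemble = np.empty((n_runs, len(rain)))
for i in range(n_runs):
    # Sample one parameter vector from the assumed (uniform) distributions...
    k = rng.uniform(0.05, 0.5)
    c = rng.uniform(0.3, 0.9)
    # ...and run the model once with it.
    ensemble[i] = toy_model(rain, k, c)

# Rows are model runs, columns are time steps: the ensemble of output series
print(ensemble.shape)
```

Sampling the input instead of the parameters (slide 13) would follow the same pattern, with the rainfall series perturbed inside the loop and the parameter vector held fixed.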

slide-12
SLIDE 12

Monte Carlo sampling: illustration

$$ y = M(x, s, \theta) + \varepsilon_s + \varepsilon_\theta + \varepsilon_x + \varepsilon_y $$

slide-13
SLIDE 13

Sampling rainfall and multiple model runs

[Figure: schematic of the HBV conceptual model (legend as on slide 4).]

Workflow: sample one input time series from the distributions → run the model with the single parameter vector P (FC, ALPHA, K, MAXBAS, etc.) → single output. Do this multiple times → ensemble of multiple output time series

slide-14
SLIDE 14

Representing uncertainty of model output by the confidence bounds

Instead of fitting a theoretical distribution, we can use the mean, standard deviation and quantiles. E.g., the 5% and 95% quantiles form the 90% confidence bounds.

[Figure: discharge hydrograph with the 5% (q5) and 95% (q95) quantile bounds. Axes: Time (hr) vs. Discharge (m3/s).]
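A minimal sketch of how such bounds could be computed from an ensemble of model runs. The ensemble here is synthetic (random numbers, not model output), and its size is an assumption made for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
# Hypothetical ensemble: 1000 Monte Carlo runs x 50 time steps
ensemble = 20.0 + rng.normal(0.0, 2.0, size=(1000, 50))

# Quantiles are taken across the runs (axis 0), separately for every time step
q5, q95 = np.quantile(ensemble, [0.05, 0.95], axis=0)
mean = ensemble.mean(axis=0)
std = ensemble.std(axis=0, ddof=1)

# By construction, about 90% of the runs fall between q5 and q95 at each step
inside = (ensemble >= q5) & (ensemble <= q95)
coverage = inside.mean()
```

No distributional form is assumed: the bounds come directly from the sorted ensemble values at each time step.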

slide-15
SLIDE 15

Propagation of parameters/data uncertainty by Monte Carlo simulation is a typical practical approach. But is it the only one?


slide-16
SLIDE 16

QUESTION 1. On assumptions

  • We are assuming some known distributions of parameters or inputs. How safe is this?
  • Could we take a safer route and assume less?
  • Let’s take a step back and pose QUESTION 1: what is the uncertainty of the calibrated model itself?

slide-17
SLIDE 17

Residual uncertainty: uncertainty of a calibrated (“optimal”) model

Uncertainty of an optimal model M(x, θ)

  • Model M is calibrated on measured data y
  • We say the uncertainty of model M is manifested in the residual model error ε = ŷ – y
  • This error incorporates all uncertainties due to observational errors, inaccurately estimated parameters and inadequate model structure

[Figure: output Y against time, showing the actual value y* (unknown), the measured value y and the model output ŷ, with the model error and the observation error marked.]

slide-18
SLIDE 18

ESCALATION (“build up”) of model uncertainty [message 1]

  1. Study the (residual) uncertainty of an optimal model M(p*)
  2. Add and study (typically, by MC simulation):
      A) uncertainty of M(p*) due to DATA uncertainty
      B) uncertainty of M(p) due to PARAMETER uncertainty
  3. Add and study uncertainty of M(p) due to STRUCTURAL uncertainty
  4. Study uncertainty of a model class M(p), given the probabilistic properties of parameters and data

slide-19
SLIDE 19

QUESTION 2. On what is analysed

  • In uncertainty analysis (UA) we always use past data, so estimates of uncertainty are about the PAST.
  • QUESTION 2: how can we assess the model uncertainty for new inputs, i.e. for the future?
  • We pose this question for all sources of uncertainty (and not only the residual one).

[Figure: the discharge hydrograph with q5 and q95 bounds from slide 14, with a question mark over the future period. Axes: Time (hr) vs. Discharge (m3/s).]

slide-20
SLIDE 20

Models of Residual Uncertainty : Using Methods of Computational Intelligence


slide-21
SLIDE 21

CI in building models of natural processes - why not build a model of uncertainty?

CI provides methods to build data-driven models

Ideally, such models are “ultimate models” since they are not polluted by theories

[Figure: the input data X feed both the natural process, giving the actual (observed) output Y, and the data-driven model M(p, x), giving the predicted output Y’. The parameters p are chosen so that Error(p) → min.]

Measured data (instances and attributes):

              Inputs                    Output
              x1     x2     …    xn     y
Instance 1    x11    x12    …    x1n    y1
Instance 2    x21    x22    …    x2n    y2
…
Instance K    xK1    xK2    …    xKn    yK

slide-22
SLIDE 22

Example of a data-driven (statistical, CI) model

  • observed data characterise the input-output relationship X → Y
  • model parameters are found by optimization
  • the model then predicts the output for a new input without actual knowledge of what drives Y

[Figure: linear regression model Y = a0 + a1 X, with X (e.g. rainfall) on the horizontal axis and Y (e.g. flow) on the vertical axis. For a new input value the model predicts a new output value, shown next to the actual output value. Three candidate lines are drawn in red, green and blue.]

Which model is “better”: green, red or blue?
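The linear regression example on this slide can be sketched as follows. The rainfall and flow values are synthetic, and the "true" coefficients 2.0 and 0.8 are assumptions chosen only so that the fit has something to recover.

```python
import numpy as np

rng = np.random.default_rng(seed=6)
rain = rng.uniform(0.0, 30.0, size=100)                 # X, hypothetical rainfall
flow = 2.0 + 0.8 * rain + rng.normal(0.0, 1.0, 100)     # Y, hypothetical flow

# Find a0 and a1 by optimization: minimise the sum of squared errors
design = np.column_stack([np.ones_like(rain), rain])
(a0, a1), *_ = np.linalg.lstsq(design, flow, rcond=None)

# The fitted model predicts the output for a new input value
new_rain = 12.0
predicted_flow = a0 + a1 * new_rain
```

Under the least-squares criterion, the "better" of several candidate lines is the one with the smallest sum of squared errors on the observed data.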

slide-23
SLIDE 23

CI models: are they indeed intelligent?

Artificial neural network

$$ Y_k = g_{out}\Big( b_{0k} + \sum_{j} b_{jk}\, g_{hid}\big( a_{0j} + \sum_{i} a_{ij} x_i \big) \Big), \quad \text{where } g(u) = \frac{1}{1 + e^{-u}} $$

slide-24
SLIDE 24

Data-driven model as an error corrector for a process (physically-based) model

[Figure: the input data drive both the physical system, which gives the observed output, and the process model M (e.g. hydrologic) with its model parameters, which gives the model output. The model errors are used to build a data-driven model EM that forecasts the error of model M. The forecasted error is combined with the model output to give the improved output.]

slide-25
SLIDE 25

Data-driven model to predict the residual error distribution

Train a data-driven model (e.g. a neural network) to forecast the residual error pdf (i.e. the uncertainty of the model M output)

[Figure: same scheme as on slide 24, but the data-driven model U is built on the model errors to forecast the pdf (quantiles) of the residual error distribution. The result is the model output plus its uncertainty estimates.]

slide-26
SLIDE 26

Some of the models of residual uncertainty

  • QR (1978) (quantile regression) [Koenker & Bassett]: an autoregressive linear model of the model residuals predicts the distribution quantiles
  • DUMBRAE (2012) (Dynamic Uncertainty Model By Regression on Absolute Error) [Pianosi & Raso]: an autoregressive model of the model residuals (it corrects the model residual first and then carries out the uncertainty prediction by an autoregressive model)
  • UNEEC (2006, 2009) (UNcertainty Estimation based on local Errors and Clustering) [Shrestha & Solomatine]: it takes into account all variables influencing such uncertainty and uses (non-linear) machine learning methods (neural networks, model trees, instance-based learning, etc.)

slide-27
SLIDE 27

UNEEC method

UNcertainty Estimation based on local Errors and Clustering

  • a machine learning model of the past residual errors of the optimal process model is built

D.P. Solomatine, D.L. Shrestha (2009). A novel method to estimate model uncertainty using machine learning techniques. Water Resources Res. 45, W00B11.

slide-28
SLIDE 28

UNEEC: assumptions, constraints

Assumptions

  • Model error is an indicator of the model uncertainty
  • Model error depends on the current condition of a natural system and can be predicted
  • Model errors are similar for similar conditions

Constraints

  • Model structure and parameters are fixed
  • The error model needs to be re-trained when the catchment characteristics change (e.g. land use change)
  • Data hungry: more data are needed for reliable results

slide-29
SLIDE 29

Idea 1: assess CDF from all historical data about errors

[Figure: past records (examples) of the error (Qt – Qt’) plotted against time, with the error distribution built from them. The prediction interval lies between the α/2 and (1 – α/2) levels of the cumulative distribution of the N records.]
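Idea 1 could look like the sketch below, assuming that Qt is the observed flow and Qt’ the model output, so that the error is observed minus modelled. All data are synthetic and the 90% level is an assumption.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
# Hypothetical past records: observed flow and the model's estimate of it
q_obs = 10.0 + rng.gamma(2.0, 2.0, size=2000)
q_mod = q_obs + rng.normal(0.5, 1.5, size=2000)   # model with bias and noise

errors = q_obs - q_mod           # historical model errors
alpha = 0.10                     # 1 - alpha = 90% prediction interval

# Empirical CDF: read off the alpha/2 and 1 - alpha/2 quantiles of the errors
pi_lower, pi_upper = np.quantile(errors, [alpha / 2, 1 - alpha / 2])

# The same error limits are attached to any new model output
new_model_output = 14.2
bounds = (new_model_output + pi_lower, new_model_output + pi_upper)
```

The interval has the same width for every new record, whatever the hydro-meteorological condition. That limitation is what motivates the local (cluster-based) modelling of errors in Idea 2.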

slide-30
SLIDE 30

Idea 2: local modelling of errors (link the CDF to “characteristic variables”)

[Figure: past records (examples) in the space of inputs, with axes Flow Qt-1 and Rainfall Rt-2 and the error (Qt – Qt’) as output, grouped into clusters. Each cluster has its own error distribution, so the prediction interval between the α/2 and (1 – α/2) levels is different for each cluster.]

slide-31
SLIDE 31

Idea 3: Use fuzzy clustering of examples to generate training data sets

[Figure: past records (examples) in the space of inputs, with axes Flow Qt-1 and Rainfall Rt-2 and the error limits (or prediction intervals) as output, together with a new record.]

$$ PI^{L}_{example} = \sum_{clus=1}^{N_{clus}} \mu_{clus,example}\, PIC^{L}_{clus} $$

where $\mu_{clus,example}$ is the membership grade of the example to cluster clus.

Eager learning (ANN or M5 model tree). Train regression (ANN) models:

$$ PI^{L} = f^{L}(X), \qquad PI^{U} = f^{U}(X) $$

New record: the trained $f^{L}$ and $f^{U}$ models will estimate the prediction interval.

slide-32
SLIDE 32

Using instance-based learning

[Figure: past records (examples) in the space of inputs, with axes Flow Qt-1 and Rainfall Rt-2 and the error limits (or prediction intervals) as output, together with a new record.]

$$ PI^{L}_{example} = \sum_{clus=1}^{N_{clus}} \mu_{clus,example}\, PIC^{L}_{clus} $$

where $\mu_{clus,example}$ is the membership grade of the example to cluster clus.

Instance-based learning: the distance function is computed to estimate the fuzzy weight.

slide-33
SLIDE 33
UNEEC details. Step 1: clustering

  • Clustering (finding groups of data in the space characterising the hydro-meteorological condition): K-means clustering, fuzzy C-means clustering
  • Objective function:

$$ \min J_m(U, V) = \sum_{j=1}^{c} \sum_{i=1}^{N} \mu_{i,j}^{m}\, D_{i,j}^{2} $$

Constraint:

$$ \sum_{j=1}^{c} \mu_{i,j} = 1, \quad \forall i $$

Distance:

$$ D_{i,j}^{2} = \lVert x_i - v_j \rVert_A^{2} $$

Degree of fuzzification:

$$ m \ge 1 $$
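A compact sketch of fuzzy C-means using the standard alternating update of centres and memberships. It is not the implementation used in UNEEC: it assumes the Euclidean distance (norm matrix A equal to the identity) and m > 1, which the membership update needs, and the two-cluster data set is synthetic.

```python
import numpy as np

def fuzzy_c_means(x, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Minimal fuzzy C-means with Euclidean distance.

    x: (N, d) data, c: number of clusters, m > 1: degree of fuzzification.
    Returns memberships mu of shape (N, c) and cluster centres v of shape (c, d).
    """
    rng = np.random.default_rng(seed)
    mu = rng.random((len(x), c))
    mu /= mu.sum(axis=1, keepdims=True)          # constraint: each row sums to 1
    for _ in range(n_iter):
        w = mu ** m
        v = (w.T @ x) / w.sum(axis=0)[:, None]   # membership-weighted centres
        # Squared distances D_ij^2 between every point i and every centre j
        d2 = ((x[:, None, :] - v[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)               # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        mu_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(mu_new - mu).max() < tol
        mu = mu_new
        if converged:
            break
    w = mu ** m
    v = (w.T @ x) / w.sum(axis=0)[:, None]       # centres for the final memberships
    return mu, v

# Usage on two synthetic, well separated groups of points
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 0.3, (100, 2)), rng.normal(3.0, 0.3, (100, 2))])
mu, v = fuzzy_c_means(x, c=2)
```

Unlike K-means, every example keeps a membership grade in every cluster, and the later steps use these grades as weights.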

slide-34
SLIDE 34

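UNEEC details. Step 2: Determining Prediction Interval (PI) for each cluster

$$ PIC_j^{L} = e_i, \quad i: \sum_{k=1}^{i} \mu_{k,j} < \frac{\alpha}{2} \sum_{i=1}^{N} \mu_{i,j} $$

$$ PIC_j^{U} = e_i, \quad i: \sum_{k=1}^{i} \mu_{k,j} < \left(1 - \frac{\alpha}{2}\right) \sum_{i=1}^{N} \mu_{i,j} $$

[Figure: the errors e_i of the examples sorted in ascending order, with the memberships μ_{i,j} in cluster j accumulated along them. The levels (α/2) Σ μ_{i,j} and (1 – α/2) Σ μ_{i,j} mark the lower and upper limits of cluster j.]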
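A sketch of the membership-weighted quantiles in Step 2, reading the condition as "the largest i for which the cumulative membership is still below the level". The function name and the test data are hypothetical.

```python
import numpy as np

def cluster_prediction_interval(errors, mu_j, alpha=0.10):
    """Lower and upper error limits (PIC_L, PIC_U) for one cluster j.

    errors: (N,) model errors; mu_j: (N,) memberships of the examples in cluster j.
    """
    order = np.argsort(errors)                 # sort the errors in ascending order
    e_sorted = errors[order]
    cum_mu = np.cumsum(mu_j[order])            # running sum of memberships
    total = cum_mu[-1]

    def last_below(level):
        # count of sorted examples whose cumulative membership is below the level
        n_below = np.searchsorted(cum_mu, level, side="left")
        return max(n_below - 1, 0)             # index of the last such example

    lower = e_sorted[last_below(alpha / 2 * total)]
    upper = e_sorted[last_below((1 - alpha / 2) * total)]
    return lower, upper

# Usage with synthetic errors and random membership grades
rng = np.random.default_rng(seed=8)
errors = rng.normal(0.0, 2.0, size=500)
mu_j = rng.random(500)
pic_lower, pic_upper = cluster_prediction_interval(errors, mu_j)
```

Examples with a small membership in the cluster barely move the cumulative sum, so they have little influence on that cluster's limits.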

slide-35
SLIDE 35

UNEEC details. Step 3, 4, 5: Building and using the model

Step 3: Generation of prediction intervals for each example

$$ PI_i^{L} = \sum_{j=1}^{c} \mu_{i,j}\, PIC_j^{L}, \qquad PI_i^{U} = \sum_{j=1}^{c} \mu_{i,j}\, PIC_j^{U} $$

Step 4: Building the uncertainty model

$$ PI^{L} = f_u^{L}(X_u), \qquad PI^{U} = f_u^{U}(X_u) $$

Step 5: Using the uncertainty model

$$ PI^{L} = f_u^{L}(X_v), \qquad PI^{U} = f_u^{U}(X_v) $$

Model outputs with uncertainty bounds (independent computation):

$$ PL_i^{U} = \hat{y}_i + PI_i^{U}, \qquad PL_i^{L} = \hat{y}_i + PI_i^{L} $$
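Steps 3 and 5 reduce to a matrix product and an addition, as the sketch below shows with invented numbers for three clusters and four examples. Step 4, training the regression models on the resulting intervals, is omitted here.

```python
import numpy as np

# Hypothetical cluster limits for c = 3 clusters (e.g. low, medium, high flows)
pic_lower = np.array([-0.5, -1.5, -4.0])   # lower limit of each cluster
pic_upper = np.array([0.4, 1.8, 5.0])      # upper limit of each cluster

# Memberships mu[i, j] of N = 4 examples; every row sums to 1
mu = np.array([
    [1.0, 0.0, 0.0],
    [0.6, 0.4, 0.0],
    [0.1, 0.7, 0.2],
    [0.0, 0.1, 0.9],
])

# Step 3: PI_i = sum_j mu_ij * PIC_j, a membership-weighted mix of cluster limits
pi_lower = mu @ pic_lower
pi_upper = mu @ pic_upper

# Step 5: add the error limits to the deterministic model output
y_hat = np.array([2.0, 4.0, 9.0, 20.0])
pl_lower = y_hat + pi_lower
pl_upper = y_hat + pi_upper
```

An example that belongs fully to one cluster inherits that cluster's limits unchanged. An example between clusters gets limits that vary smoothly with its memberships.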

slide-36
SLIDE 36

UNEEC methodology


slide-37
SLIDE 37

Extensions (simplifications) of UNEEC: without clustering and using instance-based learning

  • Based on the Master study of Omar Wani (2015)
      • SKIBLUE (Streamflow-Centric K nearest neighbour Instance-Based Learning and Uncertainty Estimation)
      • O. Wani, J. Beckers, A.H. Weerts, D.P. Solomatine. Non-parametric Predictive Uncertainty Estimation Using Instance Based Learning with Applications to Hydrologic Forecasting. HESS-D, 2016.
  • Based on the Master study of Ms. Jingyi Chen (2015)
      • UNEEC-IBL
      • Jingyi Chen (2015). Uncertainty Prediction in Hydrological Modelling: Case of Dapoling-Wangjiaba Catchment in Huai River Basin. UNESCO-IHE Master thesis

[Figure: three-dimensional scatter plot of the error e(t+1) against Rt and Qt.]

slide-38
SLIDE 38

Models of Parametric Uncertainty: … and again using Methods of Computational Intelligence

––– Escalating uncertainty –––

Assuming now uncertainty in parameters and/or data… Running Monte Carlo simulations… But how to estimate output uncertainty for the new model runs?

slide-39
SLIDE 39

MLUE method

Machine Learning in Uncertainty Estimation

  • a machine learning model of the process model’s Monte Carlo simulation results is built

D. L. Shrestha, N. Kayastha, and D. P. Solomatine (2009). A novel approach to parameter uncertainty analysis of hydrological models using neural networks. HESS, 13, 1235–1248.

slide-40
SLIDE 40

Monte Carlo simulation of parametric uncertainty

$$ y = M(x, s, \theta) + \varepsilon_s + \varepsilon_\theta + \varepsilon_x + \varepsilon_y $$

slide-41
SLIDE 41

Issues with MC for new model runs in real-time

  • Issues with re-running MC for new inputs:
      1) convergence of the Monte Carlo simulation is very slow (O(N^-0.5)), so a larger number of runs is needed to establish a reliable estimate of uncertainties
      2) the number of simulations increases exponentially with the dimension of the parameter vector (O(n^d)) to cover the entire parameter domain
  • Idea:
      • encapsulate the results of MC simulation in a machine learning model
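Both issues can be illustrated numerically. The sketch below uses the sample mean of a standard normal variable as the quantity being estimated, which is an assumption made for simplicity. The grid of 10 levels for each of 9 parameters is likewise only an example.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

def mc_error(n_samples, n_repeats=400):
    """Spread (std) of the Monte Carlo estimate of a mean over repeated experiments."""
    estimates = rng.normal(0.0, 1.0, size=(n_repeats, n_samples)).mean(axis=1)
    return estimates.std()

err_100 = mc_error(100)
err_10000 = mc_error(10_000)

# Issue 1: 100 times more samples reduce the error only about sqrt(100) = 10 times
ratio = err_100 / err_10000

# Issue 2: covering a d-dimensional parameter domain with n levels per parameter
n_levels, n_params = 10, 9
grid_runs = n_levels ** n_params     # one billion model runs
```

Neither cost is affordable when the bounds have to be recomputed for each new input in real time, which is the motivation for replacing the repeated simulation with a trained model.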

slide-42
SLIDE 42

MLUE Methodology (1)

  • Consider the sources of uncertainty to be analysed within the framework of Monte Carlo simulation
  • Execute the MC simulations to generate the data: y_i(t) = M(X(t), p_i)
  • Estimate the uncertainty measures of the MC realizations, e.g., mean, variance, prediction intervals, quantiles
      • to start with, estimate two quantiles (say, 5% and 95%), forming the prediction interval PI

slide-43
SLIDE 43

MLUE Methodology (2)

  • Analyze the dependency of the uncertainty measures (quantiles) on the input and state variables of the hydrological model
      • we used correlation and average mutual information analysis
  • Select the input variables for the machine learning model based on the dependency analysis
  • Train the machine learning model U to predict the uncertainty measures of the MC realizations: PI = U(X)
  • Validate the machine learning model U by estimating the uncertainty measures with the “new” input data
  • Use model U
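The MLUE steps can be sketched end to end with deliberately simple stand-ins: a hypothetical one-parameter `toy_model` in place of the hydrological model, and plain linear regression in place of the machine learning model U. The dependency analysis step is skipped because the toy problem has a single input.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def toy_model(x, p):
    """Hypothetical process model: the output is the input scaled by parameter p."""
    return p * x

# 1. MC simulation on the training inputs: y_i(t) = M(X(t), p_i)
x_train = rng.uniform(1.0, 10.0, size=300)
params = rng.uniform(0.5, 1.5, size=2000)
runs = toy_model(x_train[None, :], params[:, None])     # (runs, time steps)

# 2. Uncertainty measures of the MC realizations: 5% and 95% quantiles
q5, q95 = np.quantile(runs, [0.05, 0.95], axis=0)

# 3. Train model U (here: linear regression) to predict the quantiles from X
design = np.column_stack([np.ones_like(x_train), x_train])
coef_lower, *_ = np.linalg.lstsq(design, q5, rcond=None)
coef_upper, *_ = np.linalg.lstsq(design, q95, rcond=None)

def model_u(x_new):
    """Predict the lower and upper quantile for new inputs without re-running MC."""
    d = np.column_stack([np.ones_like(x_new), x_new])
    return d @ coef_lower, d @ coef_upper

# 4. Validate on "new" inputs against an MC run with the same parameter sample
x_new = rng.uniform(1.0, 10.0, size=50)
pred_lower, pred_upper = model_u(x_new)
ref_lower, ref_upper = np.quantile(
    toy_model(x_new[None, :], params[:, None]), [0.05, 0.95], axis=0)
rmse_lower = np.sqrt(np.mean((pred_lower - ref_lower) ** 2))
rmse_upper = np.sqrt(np.mean((pred_upper - ref_upper) ** 2))
```

In this toy case the quantiles are exactly linear in the input, so the surrogate is almost perfect. For a real hydrological model the relationship is non-linear, which is why the slides use model trees, locally weighted regression and neural networks.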

slide-44
SLIDE 44

Validation

Measuring the predictive capability of the uncertainty model U (i.e. the accuracy of the uncertainty models in approximating the quantiles of the model outputs generated by MC simulations)

  • Coefficient of correlation (r) and root mean squared error (RMSE)

Measuring the statistics of the uncertainty estimation (i.e. goodness of the model U as an uncertainty estimator)

  • Prediction interval coverage probability (PICP) and mean prediction interval (MPI) (Shrestha & Solomatine 2006, 2008)

$$ PICP = \frac{1}{n} \sum_{t=1}^{n} C, \quad \text{with } C = \begin{cases} 1, & PL_t^{L} \le y_t \le PL_t^{U} \\ 0, & \text{otherwise} \end{cases} $$

$$ MPI = \frac{1}{n} \sum_{t=1}^{n} \left( PL_t^{U} - PL_t^{L} \right) $$

Visualizing, e.g. scatter and time plots of the prediction intervals obtained from the MC simulation and their predicted values
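The two statistics translate directly into code. The five observations and limits below are invented for illustration.

```python
import numpy as np

def picp(y, pl_lower, pl_upper):
    """Prediction interval coverage probability: fraction of observations inside."""
    inside = (y >= pl_lower) & (y <= pl_upper)
    return inside.mean()

def mpi(pl_lower, pl_upper):
    """Mean prediction interval: average width of the interval."""
    return np.mean(pl_upper - pl_lower)

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
pl_lower = np.array([0.5, 1.5, 3.2, 3.0, 4.0])
pl_upper = np.array([1.5, 2.5, 4.0, 5.0, 4.5])

coverage = picp(y, pl_lower, pl_upper)   # 3 of the 5 observations are inside
width = mpi(pl_lower, pl_upper)
```

The two measures pull in opposite directions: very wide limits give a PICP close to 1 but a large MPI, so they are best reported together.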

slide-45
SLIDE 45

Applications

UNEEC and MLUE were tested and compared to other methods on 5 different cases:

Brue, Bagmati, Sieve, Severn, Dapoling-Wangjiaba

slide-46
SLIDE 46
Study area: Brue catchment, UK

  • Catchment area: 135 km2
  • Location: south west of England
  • Average annual rainfall: 867 mm
  • Mean river flow: 1.92 m3/s
  • Calibration data: 24/06/94 - 24/06/95
  • Validation data: 24/06/95 - 31/05/96

slide-47
SLIDE 47

Study area: Brue catchment, UK

[Figure: hourly discharge (m3/s) and rainfall (mm/hour) against time (hours), 1994/6/24 05:00 - 1996/05/31 13:00, split into calibration data (8760 points) and validation data (8217 points).]

slide-48
SLIDE 48

Conceptual Hydrological model HBV

[Figure: schematic of the HBV model. The legend is given on slide 4.]

slide-49
SLIDE 49

Data Analysis

  • Analysis of dependency between various combinations of the input variables and the output:
      • Correlation
      • Average mutual information (AMI) between REt and the PIs (the optimal lag time is around 7-9 hours)
  • Additional analysis of the correlation and AMI between the PIs and the observed discharge Qt was carried out: the discharges with lags of 0, 1 and 2 have a very high correlation with the PIs

slide-50
SLIDE 50

Experimental setup

  • MC simulation
      • 9 parameters of the HBV model are sampled uniformly from the feasible ranges
      • Nash-Sutcliffe coefficient of efficiency (CE) is used as the error measure
      • Convergence: stabilized after 10,000 runs (75,000 runs made)
      • Only 25,000 “good” models are considered (the rejection threshold is set to 0) to compute the prediction quantiles
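A sketch of the error measure and the rejection step, assuming that the threshold of 0 applies to CE. The "observations" and the ensemble are synthetic: each run is the observed series distorted by a random gain and bias, not an HBV simulation.

```python
import numpy as np

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe coefficient of efficiency: 1 is a perfect fit,
    0 means the model is as good as the mean of the observations."""
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

rng = np.random.default_rng(seed=5)
obs = 5.0 + np.sin(np.linspace(0.0, 12.0, 300))

# Hypothetical ensemble of 1000 runs with random gain and bias errors
gains = rng.uniform(0.0, 2.0, size=1000)
biases = rng.uniform(-1.0, 1.0, size=1000)
sims = gains[:, None] * (obs - 5.0) + 5.0 + biases[:, None]

ce = np.array([nash_sutcliffe(obs, s) for s in sims])
good = sims[ce > 0.0]                        # reject runs with CE at or below 0
q5, q95 = np.quantile(good, [0.05, 0.95], axis=0)
```

Only the retained runs enter the quantile computation, so the choice of the rejection threshold directly affects the width of the resulting bounds.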

slide-51
SLIDE 51

Experimental setup

  • Machine learning model U
      • PI = U(REt-5a, Qt-1, ΔQt-1)
          • PI: lower or upper prediction interval
          • REt-5a: average of REt-5, REt-6, REt-7, REt-8 and REt-9
          • ΔQt-1: Qt-1 - Qt-2
      • Input variables were selected based on the analysis of their relatedness to the output error (average mutual information)
      • Methods:
          • M5 model trees
          • locally weighted regression
          • MLP neural networks

$$ AMI = \sum_{i,j} P_{XY}(x_i, y_j) \log_2 \left[ \frac{P_{XY}(x_i, y_j)}{P_X(x_i)\, P_Y(y_j)} \right] $$
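The AMI formula can be estimated from a two-dimensional histogram, and scanning it over lags gives the kind of lag analysis described on slide 49. The sketch below is an assumption-laden illustration: the bin count, the synthetic rainfall and the built-in lag of 7 steps are all invented.

```python
import numpy as np

def average_mutual_information(x, y, bins=10):
    """AMI (in bits) estimated from a 2-D histogram of x and y."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = counts / counts.sum()              # joint probabilities P_XY
    p_x = p_xy.sum(axis=1, keepdims=True)     # marginal P_X, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)     # marginal P_Y, shape (1, bins)
    nz = p_xy > 0                             # empty cells contribute nothing
    return np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz]))

rng = np.random.default_rng(seed=4)
rain = rng.gamma(0.5, 4.0, size=5000)
lag_true = 7
# Synthetic response: the rainfall delayed by lag_true steps, plus noise
response = np.roll(rain, lag_true) + rng.normal(0.0, 0.5, size=5000)

# AMI between rain at time t and the response at time t + lag
ami_by_lag = {lag: average_mutual_information(rain[:-lag], response[lag:])
              for lag in range(1, 13)}
best_lag = max(ami_by_lag, key=ami_by_lag.get)
```

Unlike the correlation coefficient, AMI also picks up non-linear dependence, but the estimate is sensitive to the number of bins and to the sample size.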

slide-52
SLIDE 52

Results


slide-53
SLIDE 53

UNEEC: Clustering result example

[Figure: discharge (m3/s) against time (hours), with the points marked by cluster C1 to C5.]

slide-54
SLIDE 54

UNEEC: Performance (MLP ANN)

[Figure: scatter plots of the predicted against the target lower interval (m3/s) and of the predicted against the target upper interval (m3/s).]

Prediction interval   Data set      Mean     Std. dev.   RMSE     Corr. coef.
lower                 training      110.91   53.6        5.9582   0.9937
lower                 CV            112.18   52.64       6.0852   0.9934
lower                 training+CV   111.35   53.32       5.9582   0.9937
upper                 training      115.16   55.11       3.9002   0.9975
upper                 CV            116.69   54.18       3.9332   0.9974
upper                 training+CV   115.66   54.79       3.9002   0.9975

slide-55
SLIDE 55

UNEEC: Estimation of prediction intervals

[Figure: two sets of panels against time (hours), one for the whole series and one zoomed in on hours 4700 to 5000. Each shows the observed discharge (m3/s), the prediction bounds (m3/s) obtained by instance-based learning and by regression, the residuals (m3/s), and the clusters C1 to C5.]

slide-56
SLIDE 56

MLUE: Estimation of prediction intervals


slide-57
SLIDE 57

MLUE: Performances

  • Predictive capability

         Corr. coef.         RMSE
         PIL      PIU        PIL      PIU
MT       0.841    0.792      0.614    1.641
LWR      0.822    0.798      0.643    1.604
ANN      0.847    0.806      0.584    1.568

  • Goodness of uncertainty measures

              MCS      MT       LWR      ANN
PICP (%)      77.24    66.97    75.16    65.54
MPI (m3/s)    2.09     2.03     1.93     1.96

MCS = Monte Carlo simulation, MT = M5 model tree, LWR = locally weighted regression, ANN = MLP neural network

slide-58
SLIDE 58

Extensions

Estimation of several quantiles: 5%, 10% to 90% in steps of 10%, and 95%

  • i.e. estimating the cdf of the MC realizations by machine learning models

slide-59
SLIDE 59

Use of Machine learning methods: conclusions

  • Machine learning methods are able to replicate:
      • Past performance of a process model
      • Results of Monte Carlo simulations
  • The methods are computationally efficient and can be used in real-time applications
  • They are applicable to various kinds of models
  • The results demonstrate that interpretable uncertainty estimates are generated
  • Future work:
      • Other ML methods are to be tested
      • The methods can be applied in the context of other sources of uncertainty: input, structure, or combined

slide-60
SLIDE 60

References

UNEEC and extensions:

  • D.L. Shrestha, D.P. Solomatine. Machine learning approaches for estimation of prediction interval for the model output. Neural Networks, 19(2), 225-235, 2006.
  • D.P. Solomatine, D.L. Shrestha. A novel method to estimate model uncertainty using machine learning techniques. Water Resour. Res., 45, W00B11, 2009.
  • N. Dogulu, P. López López, D.P. Solomatine, A.H. Weerts, D.L. Shrestha. Estimation of predictive hydrologic uncertainty using the quantile regression and UNEEC methods and their comparison on contrasting catchments. Hydrol. Earth Syst. Sci., 19, 3181-3201, 2015.
  • O. Wani, J. Beckers, A.H. Weerts, D.P. Solomatine. Non-parametric Predictive Uncertainty Estimation Using Instance Based Learning with Applications to Hydrologic Forecasting. HESS-D, 2016.

MLUE:

  • D.L. Shrestha, N. Kayastha, D.P. Solomatine. A novel approach to parameter uncertainty analysis of hydrological models using neural networks. Hydrol. Earth Syst. Sci., 13, 1235-1248, 2009.
  • D.L. Shrestha, N. Kayastha, D.P. Solomatine, R. Price. Encapsulation of parametric uncertainty statistics by various predictive machine learning models: MLUE method. J. Hydroinformatics, 16(1), 95-113, 2014.

slide-61
SLIDE 61

Conclusions

  • Uncertainty analysis should always contain explicit answers to two questions:
      1) what type of uncertainty is to be analysed: residual (which does not need MC), or parametric/data (which needs MC)
      2) what is required: just an analysis of the past, or also a model predicting the future uncertainty
  • It is advisable:
      • to go explicitly through all stages of uncertainty escalation, starting from residual uncertainty
      • to try to build predictive models of uncertainty at all stages
      • to complement the deterministic models M with a family of uncertainty models U

slide-62
SLIDE 62

Want to know more?

  • We teach Master courses:
      • Hydroinformatics
      • Flood Risk Management

slide-63
SLIDE 63

Thank you for your attention
