Bottom-up Estimation and Top-down Prediction for Multi-level Models

SLIDE 1

Bottom-up Estimation and Top-down Prediction for Multi-level Models: Solar Energy Prediction Combining Information from Multiple Sources

Jae-Kwang Kim

Department of Statistics, Iowa State University

Ross-Royall Symposium: Johns Hopkins University Feb 26, 2016

SLIDE 2

Collaborators

◮ Youngdeok Hwang (IBM Research)
◮ Siyuan Lu (IBM Research)

SLIDE 3

Outline

◮ Introduction
◮ Modeling approach
◮ Application: Solar Energy Prediction
◮ Conclusion

SLIDE 4

Mountain Climbing for Problem Solving!

Figure: diagram with the nodes Real Problem, Stat Problem, Math Problem, Math Solution, Stat Solution, Real Solution.

We need a map (abstraction) to move from problem to solution!

SLIDE 5

Real Problem: Solar Energy Prediction

◮ Solar electricity is now projected to supply 14% of the total demand of the contiguous U.S. by 2030, and 27% by 2050.

SLIDE 6

IBM Solar Forecasting

Figure: Sky camera for short-term forecasting (located at Watson)

◮ Research program funded by the U.S. Department of Energy’s SunShot Initiative.

SLIDE 7

Monitoring Network

◮ Global Horizontal Irradiance (GHI): the total amount of shortwave radiation received from above by a horizontal surface.
◮ GHI measurements are collected every 15 minutes from 1,528 sensor units.

SLIDE 8

Weather Models

◮ Predictions of GHI are available from two widely used weather models: the North American Mesoscale Forecast System (NAM) and the Short-Range Ensemble Forecast (SREF).
◮ We want to combine the GHI measurements with the weather model outcomes to obtain the solar energy prediction.

SLIDE 9

Statistical Model: Basic setup

◮ The population is divided into $H$ exhaustive and non-overlapping groups, where group $h$ has $n_h$ units, for $h = 1, \ldots, H$.
◮ For group $h$, $n_h$ units are selected for measurement.
◮ From the $i$-th unit of group $h$, the measurements and their associated covariates, $(y_{hij}, x_{hij})$, are available for $j = 1, \ldots, n_{hi}$.

SLIDE 10

Multi-level Model

◮ Consider the level one and level two models,

$$y_{hi} \sim f_1(y_{hi} \mid x_{hi}; \theta_{hi}), \qquad \theta_{hi} \sim f_2(\theta_{hi} \mid z_{hi}; \zeta_h).$$

◮ $y_{hi} = (y_{hi1}, \ldots, y_{hin_{hi}})^\top$: observations at unit $(hi)$.
◮ $x_{hi} = (x_{hi1}^\top, \ldots, x_{hin_{hi}}^\top)^\top$: covariates associated with unit $(hi)$ (= two weather model outcomes).
◮ $z_{hi}$: unit-specific covariate.
◮ Note that $\theta_{hi}$ is a parameter in the level 1 model, but a random variable (latent variable) in the level 2 model.
◮ We can build a level 3 model on $\zeta_h$ if necessary:

$$\zeta_h \sim f_3(\zeta_h \mid q_h; \alpha).$$

SLIDE 11

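Data Structure Under Two-level Model

Figure: tree diagram. $\zeta_h$ is linked to $\theta_{h1}, \theta_{h2}, \theta_{h3}$ through $f_2$, and each $\theta_{hi}$ is linked to its observations $y_{hi1}, \ldots, y_{hin_i}$ through $f_1$.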

SLIDE 12

Why Multi-level Models?

1. To reflect reality: to allow for structural heterogeneity (= variety in big data) across areas.
2. To borrow strength: we need to predict at locations with no direct measurement.

SLIDE 13

Real Problems Become Statistical Problems!

1. Parameter estimation
2. Prediction
3. Uncertainty quantification

A Bayesian method using MCMC computation is a useful tool.

SLIDE 14

Classical Solutions Do Not Necessarily Work in Reality!

1. No single data file exists, as the data are stored in the cloud (Hadoop Distributed File System).
2. Micro-level data are not always available to the analyst, for confidentiality and security reasons.
3. The classical solution, based on an MCMC algorithm, is time consuming, and the computational cost can be huge for big data.

This is a typical big data problem.

SLIDE 15

New Solution: Divide-and-Conquer Approach

◮ Three steps for parameter estimation in each level:
  1. Summarization: Find a summary (= measurement) for the latent variable to obtain the sampling error model.
  2. Combine: Combine the sampling error model and the latent variable model.
  3. Learning: Estimate the parameters from the summary data.
◮ Apply the three steps to the level two model, and then repeat them for the level three model.

SLIDE 16

Modeling Structure

Figure: schematic of the modeling structure. Sites 1, 2 and 3 each have a sensor, a storage unit and a Level 1 model; a single Level 2 model sits above the sites. Labels: individual data, unit summary, group summary.

SLIDE 17

Summarization

◮ Find a measurement for $\theta_{hi}$.
◮ For each unit, treat $(x_{hi}, y_{hi})$ as a single data set to obtain the best estimator $\hat\theta_{hi}$ of $\theta_{hi}$ by treating $\theta_{hi}$ as a fixed parameter.
◮ Obtain the sampling distribution of $\hat\theta_{hi}$ as a function of $\theta_{hi}$:

$$\hat\theta_{hi} \sim g_1(\hat\theta_{hi} \mid \theta_{hi}).$$

SLIDE 18

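Summarization Step under Two-Level Model Structure

Figure: tree diagram. $\zeta_h$ is linked to $\theta_{h1}, \theta_{h2}, \theta_{h3}$ through $f_2$, and each $\theta_{hi}$ is linked to its summary $\hat\theta_{hi}$ through $g_1$.

$g_1(\hat\theta_{hi} \mid \theta_{hi})$: sampling error model, $\hat\theta_{hi} \sim N(\theta_{hi}, \hat V(\hat\theta_{hi}))$.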
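The summarization step can be sketched for a single unit as follows. This is only an illustration under a simplifying assumption that is not in the slides at this point: a linear level one model with constant-variance errors, fitted by ordinary least squares. The function name `summarize_unit` and the simulated data are hypothetical.

```python
import numpy as np

def summarize_unit(x, y):
    """Return the unit summary (theta_hat, V_hat) under y = x @ theta + noise."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = x.shape
    # Least-squares estimate, treating theta as a fixed parameter.
    theta_hat, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ theta_hat
    sigma2_hat = resid @ resid / (n - p)           # residual variance estimate
    V_hat = sigma2_hat * np.linalg.inv(x.T @ x)    # estimated Var(theta_hat)
    return theta_hat, V_hat

# Simulated unit: two covariates standing in for the two weather model outcomes.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
theta_true = np.array([0.6, 0.3])
y = x @ theta_true + 0.1 * rng.normal(size=200)

theta_hat, V_hat = summarize_unit(x, y)
```

Only the pair `(theta_hat, V_hat)` has to leave the unit, which is what makes the approach usable when micro-level data cannot be pooled.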

SLIDE 19

Combining

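◮ The marginal distribution of $\hat\theta_{hi}$ is

$$m_2(\hat\theta_{hi} \mid z_{hi}; \zeta_h) = \int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)\, d\theta_{hi}, \tag{1}$$

which combines $g_1(\hat\theta_{hi} \mid \theta_{hi})$ and $f_2(\theta_{hi} \mid z_{hi}; \zeta_h)$ via the latent variable $\theta_{hi}$.

◮ Also, the prediction model for the latent variable $\theta_{hi}$ is obtained by using Bayes' theorem:

$$p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h) = \frac{g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)}{\int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)\, d\theta_{hi}}. \tag{2}$$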
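When both $g_1$ and $f_2$ are normal, the integral in (1) and the ratio in (2) have closed forms. The sketch below assumes $\hat\theta_{hi} \mid \theta_{hi} \sim N(\theta_{hi}, V)$ and $\theta_{hi} \sim N(\beta, \Sigma)$; the function name `combine_normal` and its arguments are illustrative, not part of the original slides.

```python
import numpy as np

def combine_normal(theta_hat, V, beta, Sigma):
    """Marginal model m2 and prediction model p2 when g1 and f2 are both normal."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    beta = np.asarray(beta, dtype=float)
    V = np.asarray(V, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    # (1): theta_hat ~ N(beta, Sigma + V)
    marg_mean, marg_cov = beta, Sigma + V
    # (2): theta | theta_hat ~ N(post_mean, post_cov)
    gain = Sigma @ np.linalg.inv(Sigma + V)
    post_mean = beta + gain @ (theta_hat - beta)
    post_cov = Sigma - gain @ Sigma
    return (marg_mean, marg_cov), (post_mean, post_cov)

# One-dimensional example: equal prior and sampling variances.
(marg_mean, marg_cov), (post_mean, post_cov) = combine_normal(
    theta_hat=[2.0], V=[[1.0]], beta=[0.0], Sigma=[[1.0]]
)
```

With equal variances the prediction mean lands halfway between the prior mean and the observed summary, and the prediction variance is half the prior variance.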

SLIDE 20

Combining Step

Figure: diagram linking $\hat\theta_{hi}$, $\theta_{hi}$ and $\zeta_h$ through $g_1$, $f_2$, $m_2$ and $p_2$.

Sampling error model ($g_1$) + Latent variable model ($f_2$) ⇒ Marginal model ($m_2$), Prediction model ($p_2$)

SLIDE 21

Learning

◮ The level two model can be learned by the EM algorithm: at the $t$-th iteration, we update $\zeta_h$ by solving

$$\hat\zeta_h^{(t+1)} \leftarrow \arg\max_{\zeta_h} \sum_{i=1}^{n_h} E_{p_2}\left\{ \log f_2(\theta_{hi} \mid z_{hi}; \zeta_h) \,\middle|\, \hat\theta_{hi}; \hat\zeta_h^{(t)} \right\},$$

where the conditional expectation is taken with respect to the prediction model $p_2$ in (2) evaluated at $\hat\zeta_h^{(t)}$, and $\hat\zeta_h^{(t)}$ denotes the $t$-th iteration of the EM algorithm.
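A minimal sketch of this EM update, assuming a scalar $\theta_{hi}$ with $\theta_{hi} \sim N(\beta, \tau^2)$ and $\hat\theta_{hi} \mid \theta_{hi} \sim N(\theta_{hi}, V_i)$, so that $\zeta_h = (\beta, \tau^2)$ and both steps are available in closed form. The function name `em_level_two`, the fixed iteration count and the simulated summaries are assumptions made for illustration.

```python
import numpy as np

def em_level_two(theta_hat, V, n_iter=200):
    """EM for zeta = (beta, tau2) from unit summaries (theta_hat_i, V_i)."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    V = np.asarray(V, dtype=float)
    beta, tau2 = theta_hat.mean(), theta_hat.var() + 1e-8
    for _ in range(n_iter):
        # E-step: mean and variance of p2(theta_i | theta_hat_i; current zeta).
        shrink = tau2 / (tau2 + V)
        post_mean = beta + shrink * (theta_hat - beta)
        post_var = tau2 * (1.0 - shrink)
        # M-step: maximize the expected log f2 over (beta, tau2).
        beta = post_mean.mean()
        tau2 = np.mean(post_var + (post_mean - beta) ** 2)
    return beta, tau2

# Simulated group: 2000 units, true beta = 1.0, true tau2 = 0.25, V_i = 0.04.
rng = np.random.default_rng(1)
theta = rng.normal(1.0, 0.5, size=2000)
V = np.full(2000, 0.04)
theta_hat = theta + rng.normal(0.0, np.sqrt(V))

beta_est, tau2_est = em_level_two(theta_hat, V)
```

The update touches only the summaries, never the raw $y_{hij}$, so it can run on data that were reduced at each site.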

SLIDE 22

Learning Using EM Algorithm

Figure: EM cycle linking $\hat\theta_{hi}$, $\theta_{hi}$, $z_{hi}$ and $\hat\zeta_h$ through the E-step and the M-step.

SLIDE 23

Bayesian Interpretation

◮ Prediction model (2) can be written as

$$p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h) \propto g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h).$$

◮ Here, $f_2(\theta_{hi} \mid z_{hi}; \zeta_h)$ can be treated as a prior distribution, and $p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h)$ is a posterior distribution that incorporates the observation of $\hat\theta_{hi}$.
◮ Use of $g_1(\hat\theta_{hi} \mid \theta_{hi})$ instead of the full likelihood simplifies the computation (Approximate Bayesian Computation).

SLIDE 24

Extension to Three Level Model

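Model   | Measurement (data summary)                                   | Parameter     | Latent variable
Level 1 | $y_{hi} = (y_{hi1}, \cdots, y_{hin_{hi}})$                   | $\theta_{hi}$ |
Level 2 | $\hat\theta_h = (\hat\theta_{h1}, \cdots, \hat\theta_{hn_h})$ | $\zeta_h$     | $\theta = (\theta_{h1}, \cdots, \theta_{hn_h})$
Level 3 | $\hat\zeta = (\hat\zeta_1, \cdots, \hat\zeta_H)$             | $\alpha$      | $\zeta = (\zeta_1, \cdots, \zeta_H)$

We can apply the same three steps to the level three model.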

SLIDE 25

Bottom-up Estimation

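Level 3
  Latent variable model: $f_3(\zeta_h \mid q_h; \alpha)$
  Parameter estimation: $\hat\alpha = \arg\max_{\alpha} \sum_{h=1}^{H} \log \int g_2(\hat\zeta_h \mid \zeta_h) f_3(\zeta_h \mid q_h; \alpha)\, d\zeta_h$

Level 2
  Latent variable model: $f_2(\theta_{hi} \mid z_{hi}; \zeta_h)$
  Sampling error model: $\hat\zeta_h \sim g_2(\hat\zeta_h \mid \zeta_h)$
  Parameter estimation: $\hat\zeta_h = \arg\max_{\zeta_h} \sum_{i=1}^{n_h} \log \int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)\, d\theta_{hi}$

Level 1
  Latent variable model: $f_1(y_{hij} \mid x_{hij}; \theta_{hi})$
  Sampling error model: $\hat\theta_{hi} \sim g_1(\hat\theta_{hi} \mid \theta_{hi})$
  Parameter estimation: $\hat\theta_{hi} = \arg\max_{\theta_{hi}} \sum_{j=1}^{n_{hi}} \log f_1(y_{hij} \mid x_{hij}; \theta_{hi})$

Figure: An illustration of the bottom-up approach to parameter estimation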

SLIDE 26

Prediction

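◮ Our goal is to predict unobserved $y_{hij}$ values from the above models using the parameter estimates.
◮ The best prediction for $y_{hij}$ is

$$\hat y^*_{hij} = E_{p_3}\left[ E_{p_2}\left\{ E_{f_1}(y_{hij} \mid x_{hij}, \theta_{hi}) \,\middle|\, \hat\theta_{hi}; \zeta_h \right\} \,\middle|\, \hat\zeta_h; \hat\alpha \right],$$

where

$$p_3(\zeta_h \mid \hat\zeta_h; \hat\alpha) = \frac{g_2(\hat\zeta_h \mid \zeta_h) f_3(\zeta_h \mid q_h; \hat\alpha)}{\int g_2(\hat\zeta_h \mid \zeta_h) f_3(\zeta_h \mid q_h; \hat\alpha)\, d\zeta_h}$$

and

$$p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h) = \frac{g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)}{\int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)\, d\theta_{hi}}.$$

◮ The prediction is made in a top-down manner.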

SLIDE 27

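Prediction: Top-down Prediction

Figure: tree diagram. Starting from $\hat\alpha$, the values $\zeta^*_1, \zeta^*_2, \zeta^*_3$ are obtained through $p_3$, and from each $\zeta^*_h$ the values $\theta^*_{hi}$ are obtained through $p_2$.

Predict $y_{hij}$ using $f_1(y_{hij} \mid x_{hij}; \theta^*_{hi})$.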

SLIDE 28

Prediction: Top-down Prediction

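Level | Latent        | Prediction model                                  | Best prediction
3     | $\zeta_h$     | $p_3(\zeta_h \mid \hat\zeta_h; \hat\alpha)$       | $\zeta^*_h \sim p_3(\zeta_h \mid \hat\zeta_h; \hat\alpha)$
2     | $\theta_{hi}$ | $p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h)$  | $\theta^*_{hi} \sim p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta^*_h)$
1     | $y_{hij}$     | $f_1(y_{hij} \mid x_{hij}; \theta_{hi})$          | $y^*_{hij} \sim f_1(y_{hij} \mid x_{hij}; \theta^*_{hi})$

Figure: Top-down approach to prediction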
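The top-down draws can be illustrated by Monte Carlo in a deliberately simplified scalar setting. All of the following are assumptions for the sketch, not the models used in the application: $\zeta_h$ is reduced to a scalar mean $\beta_h$ with $\beta_h \sim N(\mu, \omega^2)$ and $\hat\beta_h \mid \beta_h \sim N(\beta_h, W)$; $\theta_{hi} \sim N(\beta_h, \tau^2)$ with $\hat\theta_{hi} \mid \theta_{hi} \sim N(\theta_{hi}, V)$; and $y = x\theta_{hi} + \epsilon$ with normal errors of variance $\sigma^2$. Function and argument names are hypothetical.

```python
import numpy as np

def normal_posterior(obs, obs_var, prior_mean, prior_var):
    """Posterior mean and variance of a normal mean given one noisy observation."""
    shrink = prior_var / (prior_var + obs_var)
    return prior_mean + shrink * (obs - prior_mean), prior_var * (1.0 - shrink)

def predict_top_down(x_new, beta_hat, W, theta_hat, V, mu, omega2, tau2, sigma2,
                     n_draws=5000, seed=0):
    """Draw zeta*, then theta*, then y*, and summarize the y* draws."""
    rng = np.random.default_rng(seed)
    # Level 3: zeta* ~ p3(zeta_h | zeta_hat_h; alpha_hat), with zeta_h = beta_h.
    m3, v3 = normal_posterior(beta_hat, W, mu, omega2)
    beta_star = rng.normal(m3, np.sqrt(v3), size=n_draws)
    # Level 2: theta* ~ p2(theta_hi | theta_hat_hi; zeta*), one draw per zeta*.
    m2, v2 = normal_posterior(theta_hat, V, beta_star, tau2)
    theta_star = rng.normal(m2, np.sqrt(v2))
    # Level 1: y* ~ f1(y | x, theta*).
    y_star = rng.normal(x_new * theta_star, np.sqrt(sigma2))
    return y_star.mean(), y_star.std()

pred_mean, pred_sd = predict_top_down(
    x_new=2.0, beta_hat=1.0, W=0.01, theta_hat=1.2, V=0.01,
    mu=1.0, omega2=0.25, tau2=0.04, sigma2=0.01,
)
```

The mean of the draws approximates the nested expectation, and their spread gives a rough measure of prediction uncertainty that carries the uncertainty from every level.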

SLIDE 29

Case study: Application to Solar Energy Prediction

◮ We use 15 days of data (12/01/2014 – 12/15/2014) for the analysis.
◮ The states are organized into 12 groups.
◮ The number of sites in each group, $m_h$, varies between 37 and 321.

SLIDE 30

Grouping Scheme

◮ Pooling data from nearby sites.
◮ Can incorporate complex structure such as distribution zone.

SLIDE 31

Application: Site Level

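◮ First assume that

$$y_{hij} = x_{hij}\theta_{hi} + \epsilon_{hij}, \qquad \epsilon_{hij} \sim t(0, \sigma^2_{hi}, \nu_{hi}),$$

where $\sigma^2_{hi}$ is the scale parameter and $\nu_{hi}$ is the degrees of freedom, and

$$\hat\theta_{hi} \mid \theta_{hi} \sim N(\theta_{hi}, V_{hi}), \qquad V_{hi} = V(\hat\theta_{hi}).$$

◮ The degrees of freedom are assumed to be unknown and are estimated by the method of Lange et al. (1989).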
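A regression with $t$-distributed errors can be fitted by an EM iteration that downweights large residuals. The sketch below is a simplification: it holds the degrees of freedom fixed at an assumed value `nu`, whereas the slides estimate them from the data, and the function name `fit_t_regression` and the simulated site are hypothetical.

```python
import numpy as np

def fit_t_regression(x, y, nu=4.0, n_iter=100):
    """EM for y = x @ theta + eps, eps ~ t(0, sigma2, nu), with nu held fixed."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    theta, *_ = np.linalg.lstsq(x, y, rcond=None)   # start from least squares
    sigma2 = np.mean((y - x @ theta) ** 2)
    for _ in range(n_iter):
        resid = y - x @ theta
        # E-step: observations with large residuals receive small weights.
        w = (nu + 1.0) / (nu + resid ** 2 / sigma2)
        # M-step: weighted least squares for theta, then the scale update.
        xw = x * w[:, None]
        theta = np.linalg.solve(x.T @ xw, xw.T @ y)
        sigma2 = np.mean(w * (y - x @ theta) ** 2)
    return theta, sigma2

# Simulated site with heavy-tailed errors.
rng = np.random.default_rng(2)
x = rng.normal(size=(500, 2))
theta_true = np.array([0.6, 0.3])
y = x @ theta_true + 0.1 * rng.standard_t(4, size=500)

theta_fit, sigma2_fit = fit_t_regression(x, y)
```

Compared with least squares, the weights limit the influence of occasional extreme irradiance readings on the site-level estimate.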

SLIDE 32

Three Level Model

◮ Assume the level 2 model

$$\theta_{hi} \sim N(\beta_h, \Sigma_h), \qquad \zeta_h = (\beta_h, \Sigma_h).$$

◮ Similarly, the level 3 model is

$$\zeta_h \sim N(\mu, \Sigma), \qquad \alpha = (\mu, \Sigma).$$

SLIDE 33

Comparison

◮ We compared the performance of the multi-level approach with three other modeling methods:
  ◮ Site-by-site model: fit a different model for each individual site
  ◮ Group-by-group model: fit a different model for each group
  ◮ One global model: fit a single common model for all sites using the aggregate data
◮ To evaluate the prediction accuracy, we randomly selected 70% of the data to fit the model and tested on the remaining 30%.
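A random 70/30 split of this kind can be sketched as below. The slides do not say whether the split was made over observations or over sites, so the helper simply splits row indices; its name, the seed and the example size are assumptions.

```python
import numpy as np

def train_test_split_indices(n, train_frac=0.7, seed=0):
    """Randomly partition the indices 0..n-1 into training and test sets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(round(train_frac * n))
    return perm[:n_train], perm[n_train:]

# Example with 1000 rows: 700 go to training, 300 to testing.
train_idx, test_idx = train_test_split_indices(1000)
```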

SLIDE 34

MSPE Comparison

◮ We compare the accuracy by Mean Squared Prediction Error (MSPE), $N_T^{-1} \sum (y_{hij} - \hat y_{hij})^2$, where the sum is over the test data set, $\hat y_{hij}$ are obtained from the four different methods, and $N_T$ is the size of the test data set.

     | Multi-level | Site model | Group model | Global model
MSPE | 0.297       | 0.298      | 0.406       | 0.383
SD   | 0.601       | 0.609      | 0.803       | 0.791

Table: Accuracy comparison of the different modeling methods
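The MSPE criterion itself is a one-line computation. The helper below is a minimal sketch; the name `mspe` and the three-point example are illustrative and unrelated to the values in the table.

```python
import numpy as np

def mspe(y_true, y_pred):
    """Mean squared prediction error over a test set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Squared errors are 0, 0.25 and 1, so the result is 1.25 / 3.
example_mspe = mspe([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
```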

SLIDE 35

Comparison in Detail ($n_{hi} \le 100$ vs $> 100$)

Figure: Mean squared error by sample size (<100 vs >100) for the four methods: Multilevel, Site Model, Group Model, Global Model.

SLIDE 36

Discussion

◮ Motivated by a real problem: a solar energy forecasting system has been developed.
◮ We used a multi-level model approach to address the practical issues.
◮ There are more issues to be investigated:
  ◮ Spatial modeling
  ◮ Estimation of group structure
  ◮ Preferential sampling of sites
  ◮ ...
◮ The proposed method is promising for handling big data.
