Bottom-up Estimation and Top-down Prediction for Multi-level Models

Bottom-up Estimation and Top-down Prediction for Multi-level Models: Solar Energy Prediction Combining Information from Multiple Sources Jae-Kwang Kim Department of Statistics, Iowa State University Ross-Royall Symposium: Johns Hopkins University Feb 26, 2016 1/37

Collaborators ◮ Youngdeok Hwang (IBM Research) ◮ Siyuan Lu (IBM Research) 2/37

Outline ◮ Introduction ◮ Modeling approach ◮ Application: Solar Energy Prediction ◮ Conclusion Overview 3/37

Mountain Climbing for Problem Solving! [Diagram labels: Math Problem, Math Solution, Stat Problem, Stat Solution, Real Problem, Real Solution] We need a map (abstraction) to move from problem to solution! Overview 4/37

Real Problem: Solar Energy Prediction ◮ Solar electricity is now projected to supply 14% of the total demand of the contiguous U.S. by 2030, and 27% by 2050. Introduction 5/37

IBM Solar Forecasting Figure: Sky camera for short-term forecasting (located at Watson) ◮ Research program funded by the U.S. Department of Energy's SunShot Initiative. Introduction 6/37

Monitoring Network ◮ Global Horizontal Irradiance (GHI) : The total amount of shortwave radiation received from above by a horizontal surface. ◮ GHI Measurements are being collected every 15 minutes from 1,528 sensor units. Introduction 7/37

Weather Models ◮ Predictions of GHI are available from two widely used weather models: the North American Mesoscale Forecast System (NAM) and the Short-Range Ensemble Forecast (SREF). ◮ We want to combine GHI measurements with the weather model outcomes to obtain the solar energy prediction. Introduction 8/37

Statistical Model: Basic setup ◮ Population is divided into $H$ exhaustive and non-overlapping groups, where group $h$ has $n_h$ units, for $h = 1, \ldots, H$. ◮ For group $h$, $n_h$ units are selected for measurement. ◮ From the $i$-th unit of group $h$, the measurements and their associated covariates, $(y_{hij}, x_{hij})$, are available for $j = 1, \ldots, n_{hi}$. Model 9/37

Multi-level Model ◮ Consider the level-one and level-two models, $y_{hi} \sim f_1(y_{hi} \mid x_{hi}; \theta_{hi})$, $\theta_{hi} \sim f_2(\theta_{hi} \mid z_{hi}; \zeta_h)$. ◮ $y_{hi} = (y_{hi1}, \ldots, y_{hin_{hi}})^\top$: observations at unit $(hi)$. ◮ $x_{hi} = (x_{hi1}^\top, \ldots, x_{hin_{hi}}^\top)^\top$: covariates associated with unit $(hi)$ (= two weather model outcomes). ◮ $z_{hi}$: unit-specific covariate. ◮ Note that $\theta_{hi}$ is a parameter in the level-one model, but a random variable (latent variable) in the level-two model. ◮ We can build a level-three model on $\zeta_h$ if necessary: $\zeta_h \sim f_3(\zeta_h \mid q_h; \alpha)$. Model 10/37

Data Structure Under Two-level Model [Diagram: $\zeta_h$ generates $\theta_{h1}, \theta_{h2}, \theta_{h3}$ through $f_2$; each $\theta_{hi}$ generates the observations $y_{hi1}, \ldots, y_{hin_i}$ through $f_1$.] Model 11/37

Why Multi-level Models? 1. To reflect the reality: To allow for structural heterogeneity (=variety in big data) across areas. 2. To borrow strength: we need to predict the locations with no direct measurement. Model 12/37

Real Problems Become Statistical Problems! 1. Parameter estimation 2. Prediction 3. Uncertainty quantification Bayesian method using MCMC computation is a useful tool. Model 13/37

Classical Solutions Do Not Necessarily Work in Reality! 1. No single data file exists, as they are stored in cloud (Hadoop Distributed File System). 2. Micro-level data is not always available to the analyst for confidentiality and security reasons. 3. Classical solution, based on MCMC algorithm, is time consuming and the computational cost can be huge for big data. This is a typical big data problem. Solution 14/37

New Solution: Divide-and-Conquer Approach ◮ Three steps for parameter estimation in each level 1. Summarization: Find a summary (=measurement) for latent variable to obtain the sampling error model. 2. Combine: Combine the sampling error model and the latent variable model. 3. Learning: Estimate the parameters from the summary data. ◮ Apply the three steps in level two model and then do these in level three model. Solution 15/37

Modeling Structure [Diagram: at each of Site 1, Site 2, and Site 3, a sensor feeds individual data into storage, where a level 1 model produces a unit summary; the level 2 model produces the group summary.] Solution 16/37

Summarization ◮ Find a measurement for $\theta_{hi}$. ◮ For each unit, treat $(x_{hi}, y_{hi})$ as a single data set to obtain the best estimator $\hat\theta_{hi}$ of $\theta_{hi}$ by treating $\theta_{hi}$ as a fixed parameter. ◮ Obtain the sampling distribution of $\hat\theta_{hi}$ as a function of $\theta_{hi}$: $\hat\theta_{hi} \sim g_1(\hat\theta_{hi} \mid \theta_{hi})$. Solution 17/37
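A minimal sketch of the summarization step for one unit. It assumes a linear level-one model with i.i.d. errors fitted by least squares, which is an illustrative choice and not necessarily the level-one model used in the talk; the function name `summarize_unit` is hypothetical.

```python
import numpy as np

def summarize_unit(x, y):
    """Summarize one unit's data (x_hi, y_hi) by (theta_hat, v_hat).

    ASSUMPTION: level-one model y = x @ theta + e with i.i.d. errors,
    so theta_hat is the least-squares estimate and v_hat is its
    estimated sampling variance. theta is treated as a fixed parameter.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = x.shape
    xtx_inv = np.linalg.inv(x.T @ x)
    theta_hat = xtx_inv @ x.T @ y            # point estimate of theta_hi
    resid = y - x @ theta_hat
    sigma2_hat = resid @ resid / (n - p)     # residual variance estimate
    v_hat = sigma2_hat * xtx_inv             # estimated variance of theta_hat
    return theta_hat, v_hat
```

Only the pair `(theta_hat, v_hat)` has to leave the site where the unit's data are stored, which is what makes the approach workable when micro-level data cannot be pooled.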

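Summarization Step under Two-Level Model Structure [Diagram: $\zeta_h$ generates $\theta_{h1}, \theta_{h2}, \theta_{h3}$ through $f_2$; each $\theta_{hi}$ is measured by $\hat\theta_{hi}$ through $g_1$.] $g_1(\hat\theta_{hi} \mid \theta_{hi})$: sampling error model, $\hat\theta_{hi} \sim N(\theta_{hi}, \hat V(\hat\theta_{hi}))$. Solution 18/37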

Combining ◮ The marginal distribution of $\hat\theta_{hi}$ is $m_2(\hat\theta_{hi} \mid z_{hi}; \zeta_h) = \int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h) \, d\theta_{hi}$, (1) which combines $g_1(\hat\theta_{hi} \mid \theta_{hi})$ and $f_2(\theta_{hi} \mid z_{hi}; \zeta_h)$ via the latent variable $\theta_{hi}$. ◮ Also, the prediction model for the latent variable $\theta_{hi}$ is obtained by using Bayes' theorem: $p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h) = \dfrac{g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)}{\int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h) \, d\theta_{hi}}$. (2) Solution 19/37
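As a worked special case of (1) and (2), suppose $\theta_{hi}$ is scalar and both models are normal. The normal form of $f_2$, the mean $\mu_{hi}$, the variance $\tau_h^2$, and the shrinkage weight $\gamma_{hi}$ are assumptions introduced here for illustration; only the normal sampling error model appears in the slides.

```latex
% ASSUMPTION: scalar theta_{hi}; f_2 is normal with mean
% \mu_{hi} = z_{hi}^\top \beta_h and variance \tau_h^2,
% so that \zeta_h = (\beta_h, \tau_h^2).
\begin{align*}
g_1:&\quad \hat\theta_{hi} \mid \theta_{hi} \sim N\bigl(\theta_{hi}, \hat V_{hi}\bigr),
\qquad
f_2:\quad \theta_{hi} \mid z_{hi} \sim N\bigl(\mu_{hi}, \tau_h^2\bigr), \\
m_2:&\quad \hat\theta_{hi} \mid z_{hi} \sim N\bigl(\mu_{hi},\; \hat V_{hi} + \tau_h^2\bigr), \\
p_2:&\quad \theta_{hi} \mid \hat\theta_{hi} \sim
N\bigl(\mu_{hi} + \gamma_{hi}(\hat\theta_{hi} - \mu_{hi}),\; \gamma_{hi}\hat V_{hi}\bigr),
\qquad
\gamma_{hi} = \frac{\tau_h^2}{\tau_h^2 + \hat V_{hi}} .
\end{align*}
```

In this case the prediction of $\theta_{hi}$ shrinks the direct estimate $\hat\theta_{hi}$ toward the group-level mean $\mu_{hi}$, more strongly when the sampling variance $\hat V_{hi}$ is large relative to $\tau_h^2$.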

Combining Step [Diagram relating $\zeta_h$, $\theta_{hi}$, and $\hat\theta_{hi}$ through $f_2$, $g_1$, $m_2$, and $p_2$.] Sampling error model ($g_1$) + Latent variable model ($f_2$) ⇒ Marginal model ($m_2$), Prediction model ($p_2$) Solution 20/37

Learning ◮ The level-two model can be learned by the EM algorithm: at the $t$-th iteration, we update $\zeta_h$ by solving $\hat\zeta_h^{(t+1)} \leftarrow \arg\max_{\zeta_h} \sum_{i=1}^{n_h} E_{p_2}\{\log f_2(\theta_{hi} \mid z_{hi}; \zeta_h) \mid \hat\theta_{hi}; \hat\zeta_h^{(t)}\}$, where the conditional expectation is taken with respect to the prediction model $p_2$ in (2) evaluated at $\hat\zeta_h^{(t)}$, and $\hat\zeta_h^{(t)}$ denotes the estimate of $\zeta_h$ at the $t$-th iteration of the EM algorithm. Solution 21/37
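The EM update can be sketched for one group under a simple normal setup. The assumptions are labeled in the code: scalar $\theta_{hi}$, no covariate $z_{hi}$, and $\zeta_h = (\mu, \tau^2)$. The function name `em_level_two` is hypothetical, and this is an illustration rather than the authors' implementation.

```python
import numpy as np

def em_level_two(theta_hat, v_hat, n_iter=200, tol=1e-10):
    """EM for zeta_h = (mu, tau2) from unit summaries (theta_hat_i, v_hat_i).

    ASSUMPTIONS: theta_i ~ N(mu, tau2) plays the role of f_2, and
    theta_hat_i | theta_i ~ N(theta_i, v_hat_i) plays the role of g_1.
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    v_hat = np.asarray(v_hat, dtype=float)
    mu = theta_hat.mean()
    tau2 = max(theta_hat.var(), 1e-8)   # crude positive starting value
    for _ in range(n_iter):
        # E-step: moments of theta_i under p_2 at the current (mu, tau2)
        gamma = tau2 / (tau2 + v_hat)
        post_mean = mu + gamma * (theta_hat - mu)
        post_var = gamma * v_hat
        # M-step: maximize the expected log f_2 over (mu, tau2)
        mu_new = post_mean.mean()
        tau2_new = np.mean((post_mean - mu_new) ** 2 + post_var)
        converged = abs(mu_new - mu) + abs(tau2_new - tau2) < tol
        mu, tau2 = mu_new, tau2_new
        if converged:
            break
    return mu, tau2
```

When all `v_hat` values are equal, the fixed point is the sample mean of `theta_hat` for `mu` and the sample variance minus the common sampling variance for `tau2`, which gives a quick sanity check.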

Learning Using EM Algorithm [Diagram: E-step and M-step linking $\hat\theta_{hi}$, $\theta_{hi}$, $z_{hi}$, and $\zeta_h$.] Solution 22/37

Bayesian Interpretation ◮ Prediction model (2) can be written as $p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h) \propto g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)$. ◮ Here, $f_2(\theta_{hi} \mid z_{hi}; \zeta_h)$ can be treated as a prior distribution and $p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h)$ is a posterior distribution that incorporates the observation of $\hat\theta_{hi}$. ◮ Use of $g_1(\hat\theta_{hi} \mid \theta_{hi})$ instead of the full likelihood simplifies the computation (Approximate Bayesian Computation). Solution 23/37

Extension to Three Level Model
Model | Measurement (Data summary) | Parameter | Latent variable
Level 1 | $y_{hi} = (y_{hi1}, \cdots, y_{hin})$ | $\theta_{hi}$ |
Level 2 | $\hat\theta_h = (\hat\theta_{h1}, \cdots, \hat\theta_{hn_h})$ | $\zeta_h$ | $\theta_h = (\theta_{h1}, \cdots, \theta_{hn_h})$
Level 3 | $\hat\zeta = (\hat\zeta_1, \cdots, \hat\zeta_H)$ | $\alpha$ | $\zeta = (\zeta_1, \cdots, \zeta_H)$
We can apply the same three steps to the level three model. Solution 24/37

Bottom-up Estimation
Level | Latent Variable Model | Sampling Error Model | Parameter Estimation
3 | $f_3(\zeta_h \mid q_h; \alpha)$ | $\hat\zeta_h \sim g_2(\hat\zeta_h \mid \zeta_h)$ | $\hat\alpha = \arg\max_{\alpha} \sum_{h=1}^{H} \log \int g_2(\hat\zeta_h \mid \zeta_h) f_3(\zeta_h \mid q_h; \alpha) \, d\zeta_h$
2 | $f_2(\theta_{hi} \mid z_{hi}; \zeta_h)$ | $\hat\theta_{hi} \sim g_1(\hat\theta_{hi} \mid \theta_{hi})$ | $\hat\zeta_h = \arg\max_{\zeta_h} \sum_{i=1}^{n_h} \log \int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h) \, d\theta_{hi}$
1 | $f_1(y_{hij} \mid x_{hij}; \theta_{hi})$ | | $\hat\theta_{hi} = \arg\max_{\theta_{hi}} \sum_{j=1}^{n_{hi}} \log f_1(y_{hij} \mid x_{hij}; \theta_{hi})$
Figure: An illustration of the bottom-up approach to parameter estimation Solution 25/37

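Prediction: Top-down Prediction [Diagram: starting from $\hat\alpha$, $p_3$ yields $\zeta_1^*, \zeta_2^*, \zeta_3^*$, and $p_2$ then yields $\theta_{1i}^*, \theta_{2i}^*, \theta_{3i}^*$.] Predict $y_{hij}$ using $f_1(y_{hij} \mid x_{hij}; \theta_{hi}^*)$. Solution 27/37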

Prediction: Top-down Prediction
Level | Latent | Prediction Model | Best Prediction
3 | $\zeta_h$ | $p_3(\zeta_h \mid \hat\zeta_h; \hat\alpha)$ | $\zeta_h^* \sim p_3(\zeta_h \mid \hat\zeta_h; \hat\alpha)$
2 | $\theta_{hi}$ | $p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h)$ | $\theta_{hi}^* \sim p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h^*)$
1 | $y_{hij}$ | $f_1(y_{hij} \mid x_{hij}; \theta_{hi})$ | $y_{hij}^* \sim f_1(y_{hij} \mid x_{hij}; \theta_{hi}^*)$
Figure: Top-down approach to prediction Solution 28/37
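A minimal sketch of one top-down draw, assuming every level is scalar and normal so that $p_3$ and $p_2$ have closed forms. All function and argument names (`normal_posterior`, `top_down_draw`, `omega2`, `w_hat`, `tau2`, `sigma2`, and the rest) are hypothetical and chosen for illustration; the slides do not specify these distributions.

```python
import numpy as np

def normal_posterior(prior_mean, prior_var, obs, obs_var):
    """Posterior mean and variance of a normal mean given one normal measurement."""
    shrink = prior_var / (prior_var + obs_var)
    return prior_mean + shrink * (obs - prior_mean), shrink * obs_var

def top_down_draw(rng, alpha_hat, omega2, zeta_hat, w_hat,
                  tau2, theta_hat, v_hat, x_new, sigma2):
    """One draw of y* by sampling level 3, then level 2, then level 1.

    ASSUMPTIONS (toy model, not from the slides):
      f_3: zeta ~ N(alpha_hat, omega2),   g_2: zeta_hat | zeta ~ N(zeta, w_hat)
      f_2: theta ~ N(zeta, tau2),         g_1: theta_hat | theta ~ N(theta, v_hat)
      f_1: y ~ N(x_new * theta, sigma2)
    """
    # Level 3: zeta* ~ p_3(zeta | zeta_hat; alpha_hat)
    m3, s3 = normal_posterior(alpha_hat, omega2, zeta_hat, w_hat)
    zeta_star = rng.normal(m3, np.sqrt(s3))
    # Level 2: theta* ~ p_2(theta | theta_hat; zeta*)
    m2, s2 = normal_posterior(zeta_star, tau2, theta_hat, v_hat)
    theta_star = rng.normal(m2, np.sqrt(s2))
    # Level 1: y* ~ f_1(y | x_new; theta*)
    return rng.normal(x_new * theta_star, np.sqrt(sigma2))
```

Repeating the draw many times with `rng = np.random.default_rng(...)` gives a sample from the predictive distribution of `y*`, whose spread reflects the uncertainty at all three levels.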

Case study: Application to Solar Energy Prediction ◮ We use 15 days of data (12/01/2014 – 12/15/2014) for the analysis. ◮ The states are organized into 12 groups. ◮ The number of sites in each group, $m_h$, varies between 37 and 321. Application 29/37

Grouping Scheme ◮ Pooling data from nearby sites. ◮ Can incorporate complex structure such as distribution zone. Application 30/37
