WDCloud: An End-to-End System for Large-Scale Watershed Delineation on Cloud
*In Kee Kim, *Jacob Steele, +Anthony Castronova, *Jonathan Goodall, and *Marty Humphrey
*University of Virginia +Utah State University
Watershed Delineation
Mississippi Watershed (Consisting of approx. 1.1 million+ catchments)
WDCloud Component Description
Web Portal for WDCloud
target watershed coordinates.
(as well as output files (KML)).
NHD+ Dataset
integrating 21 district NHD DBs.
Automated Catchment Search Module
NHD regions for the target watershed.
Geometric Union Module
create the final watershed.
Execution Time Estimator
watershed delineation via LLR.
Amazon Web Services
and storage resources (e.g., Amazon S3) for WDCloud.
Domain Specific
  Data-Reuse: For the “monster-scale” watersheds (e.g. the Mississippi). Multi-HUC region case. # of catchments: approx. 1.1 million+. # of VMs: 1.
System Specific
  Parallel Union: Maximize the performance of a single VM. # of catchments: < 25K. # of VMs: 1.
  MapReduce: Maximize the performance of watershed delineation via a Hadoop cluster. # of catchments: >= 25K. # of VMs: > 1.
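The selection among the three strategies can be sketched as a small decision function. This is only an illustration under stated assumptions: the 25K threshold comes from the slide, while the function name `choose_strategy`, the `Plan` type, and the use of a multi-HUC flag as the trigger for data reuse are hypothetical and not taken from WDCloud itself.

```python
from dataclasses import dataclass

# Threshold from the slide: Parallel Union below 25K catchments, MapReduce at or above.
PARALLEL_UNION_LIMIT = 25_000


@dataclass(frozen=True)
class Plan:
    strategy: str   # "data-reuse", "parallel-union", or "mapreduce"
    vm_count: str   # "1" or ">1", as listed on the slide


def choose_strategy(num_catchments: int, spans_multiple_huc_regions: bool) -> Plan:
    """Pick a delineation strategy (hypothetical helper, not the WDCloud API)."""
    if num_catchments < 0:
        raise ValueError("num_catchments must be non-negative")
    # Assumption: a multi-HUC-region watershed is treated as the "monster-scale" case.
    if spans_multiple_huc_regions:
        return Plan("data-reuse", "1")
    if num_catchments < PARALLEL_UNION_LIMIT:
        return Plan("parallel-union", "1")
    return Plan("mapreduce", ">1")


if __name__ == "__main__":
    print(choose_strategy(430, False))
    print(choose_strategy(253_000, False))
    print(choose_strategy(1_100_000, True))
```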
[Figure: Data reuse across NHD+ regions. Panels: (1) NHD+ Region “A” and NHD+ Region “B+C” (pre-computed); (2) outlet (user input) and water flow; (3) target watershed, merging only the catchments in Region “A” (green area); (4) delineation result: watershed in Region “A” plus NHD+ Region “B+C” (pre-computed).]
computation.
A collection of catchments for Target Watershed
Split and Assign to Parallel Tasks
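The split-and-assign step can be sketched as follows. This is a minimal illustration, not the WDCloud implementation: each catchment is modeled as a set of grid cells so that plain set union can stand in for a true geometric union, threads stand in for the parallel tasks, and the names `split`, `union_all`, and `parallel_union` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split(items, num_tasks):
    """Split items into at most num_tasks contiguous, near-equal chunks."""
    size, extra = divmod(len(items), num_tasks)
    chunks, start = [], 0
    for i in range(num_tasks):
        end = start + size + (1 if i < extra else 0)
        if end > start:            # skip empty chunks when items < num_tasks
            chunks.append(items[start:end])
        start = end
    return chunks


def union_all(catchments):
    """Merge catchments into one region (set union as a stand-in for geometry)."""
    return reduce(lambda a, b: a | b, catchments, frozenset())


def parallel_union(catchments, num_tasks=4):
    """Union each chunk in its own task, then union the partial results."""
    chunks = split(catchments, num_tasks)
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        partials = list(pool.map(union_all, chunks))
    return union_all(partials)


if __name__ == "__main__":
    # Ten toy catchments, each covering two grid cells.
    catchments = [frozenset({(i, 0), (i, 1)}) for i in range(10)]
    print(len(parallel_union(catchments, num_tasks=4)))
```

A real geometric union is CPU-bound, so a production version would more likely use separate processes or a geometry library that releases the interpreter lock; the structure of split, partial union, and final union stays the same.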
multiple numbers of VM instances.
A collection of catchments for Target Watershed
Split and Assign to Workers (Mapper)
with IaaS/Application (Watershed Delineation Tool) specific parameters (e.g. VM Type, # of Catchments)
Correlation table. Columns: # of Catchments, Type of VM. Rows: Non Geometric Union, Geometric Union. Values: 0.0973 (negligible), 0.6129 (moderate), 0.7089 (strong), 0.3223 (weak).
Simple Linear Model Cannot Produce Reliable Prediction
“Global” Linear Regression vs. “Local” Linear Regression
[Figure (a): Global linear regression on m1.large (using all samples). X-axis: # of catchments.]
[Figure (b): Local linear regression on m1.large (using three samples). X-axis: # of catchments.]
1. Applying kNN to find a proper set 𝑾(𝒚𝟏) for prediction.
2. Creating a local linear regression model based on 𝑾(𝒚𝟏).
3. Predicting the execution time based on the regression model.
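The local linear regression (LLR) procedure can be sketched in a few lines: pick the nearest samples with kNN, fit a line to just those samples, and evaluate the line at the query point. This is a sketch under assumptions: the single feature (# of catchments), the default of three neighbors (taken from the "using three samples" figure caption), the function names, and the sample values in the usage section are all illustrative and not measurements from the study.

```python
def knn(samples, x0, k=3):
    """Return the k (x, y) samples whose x is closest to the query x0."""
    return sorted(samples, key=lambda s: abs(s[0] - x0))[:k]


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:                       # all x equal: fall back to a flat line
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_llr(samples, x0, k=3):
    """Predict y at x0 from a line fitted only to the k nearest samples."""
    intercept, slope = fit_line(knn(samples, x0, k))
    return intercept + slope * x0


if __name__ == "__main__":
    # Hypothetical (# of catchments, execution time in seconds) samples.
    samples = [(100, 10.0), (200, 20.0), (300, 30.0),
               (10_000, 5_000.0), (20_000, 12_000.0), (30_000, 19_000.0)]
    print(predict_llr(samples, 250))   # uses only the three small watersheds
```

Fitting only the neighbors keeps the small-watershed samples from being distorted by the very different slope of the large-watershed samples, which is the motivation for preferring a local model over a global one.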
[Figure: LLR prediction. Labels: samples, prediction model, 𝒚𝟏, 𝒚𝟏′, error.]
Data-Reuse (Monster Watershed)
Parallel Union (# of catchments < 25K)
MapReduce (# of catchments >= 25K)
Mississippi Watershed
Comm. Desktop (4 Core i7 with 8 GB RAM): 10+ hrs
Data Reuse (m1.xlarge instance on AWS, 4 vCPUs with 7.5 GB RAM): 5.5 min.
Speed-up: 111x
[Figure: Parallel Union results by # of parallel tasks (1, 2, 4, 8, 16, 32) for VA (430 catchments), TN (23K catchments), SC (155 catchments), PA (140 catchments), and the average. Annotation: 3.9x speedup (≈ 1200 sec. to ≈ 310 sec.).]
MapReduce: Speed-Up (Baseline: Non-parallel) for Large-Scale Watersheds (# of Catchments)
Cluster                   ME (66K)   KY (107K)   SD (253K)
4 cores (4 * medium)      2.2x       4x          6.5x
8 cores (4 * large)       4x         6.8x        12.5x
16 cores (4 * xlarge)     5.5x       9x          18x
32 cores (4 * 2xlarge)    7x         11x         21.2x
Figure annotation: 11.8 min.
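1) Prediction Accuracy

\[
\text{Prediction Accuracy} =
\begin{cases}
\dfrac{T_{actual}}{T_{predicted}}, & T_{predicted} \ge T_{actual} \\[2ex]
\dfrac{T_{predicted}}{T_{actual}}, & T_{actual} > T_{predicted}
\end{cases}
\]

2) MAPE (Mean Absolute Percentage Error)

\[
\text{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| T_{actual,i} - T_{predicted,i} \right|}{T_{actual,i}}
\]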
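Both metrics are simple to compute; the sketch below follows their definitions on this slide: prediction accuracy is the ratio of the smaller to the larger of the actual and predicted times, and MAPE is the mean of the absolute errors relative to the actual times. The function names and the numbers in the usage section are hypothetical, not results from the study.

```python
def prediction_accuracy(t_actual, t_predicted):
    """Ratio of the smaller to the larger of actual and predicted time (0 to 1)."""
    if t_actual <= 0 or t_predicted <= 0:
        raise ValueError("times must be positive")
    if t_predicted >= t_actual:
        return t_actual / t_predicted
    return t_predicted / t_actual


def mape(actuals, predictions):
    """Mean absolute percentage error, expressed as a fraction (0.19 = 19%)."""
    if len(actuals) != len(predictions) or not actuals:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) / a for a, p in zip(actuals, predictions)) / len(actuals)


if __name__ == "__main__":
    # Hypothetical execution times in seconds.
    print(prediction_accuracy(100.0, 125.0))
    print(mape([100.0, 200.0], [110.0, 180.0]))
```

Note that prediction accuracy is symmetric in over- and under-prediction, whereas MAPE is unbounded above, so a MAPE greater than 1 is possible when predictions overshoot by more than 100%.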
Overall Results for Execution Time Estimation
Metric                 LLR Predictor   kNN     mean
Prediction Accuracy    85.6%           65.7%   42.8%
MAPE                   0.19            0.93    1.97
[Figures: bar charts of Prediction Accuracy and MAPE for the LLR Predictor, kNN, and mean. Chart annotations: 80% and 0.2.]