SLIDE 1

WDCloud: An End-to-End System for Large-Scale Watershed Delineation on Cloud

*In Kee Kim, *Jacob Steele, +Anthony Castronova, *Jonathan Goodall, and *Marty Humphrey

*University of Virginia +Utah State University

SLIDE 2

Watershed Delineation

  • Watershed Delineation:
  • The starting point of many hydrological analyses.
  • Defines a watershed boundary for the area of interest.
  • Why is it important?
  • Defines the scope of the modeling domain.
  • Impacts subsequent analysis and modeling steps in hydrologic research.
SLIDE 3

Approaches for Large-Scale Watershed Delineation

  • Approaches:
  • Commercial desktop software (e.g., GIS tools).
  • Online geo-services (e.g., USGS StreamStats).
  • Algorithms/mechanisms from the research community.
  • Limitations:
  • Steep learning curve.
  • Significant amount of preprocessing required.
  • Limited scalability and performance for nation-scale watersheds.
  • Uncertain execution (watershed delineation) time.
SLIDE 4

Research Goal

  • The goals of this research are to address:
  • 1. The Scalability Problem of the public-dataset (NHD+)-based approach (Castronova and Goodall's approach).
  • 2. The Performance Problem of very large-scale watershed delineations (e.g., the Mississippi) using recent advances in computing technology (e.g., cloud and MapReduce).
  • 3. The Predictability Problem of watershed delineation time using machine learning (e.g., Local Linear Regression).

Figure: the Mississippi watershed, consisting of approx. 1.1 million+ catchments.

SLIDE 5

Our Approach

  • 1. Automated Catchment Search Mechanism using NHD+.
  • 2. Performance Improvement for Computing a Large Number of Geometric Unions:
  • a. Data-Reuse
  • b. Parallel-Union
  • c. MapReduce
  • 3. LLR (Local Linear Regression)-based Execution Time Estimation.

SLIDE 6

Our Approach

  • 1. Automated Catchment Search Mechanism using NHD+ → addresses the Scalability Problem.
  • 2. Performance Improvement for Computing a Large Number of Geometric Unions → addresses the Performance Problem:
  • a. Data-Reuse
  • b. Parallel-Union
  • c. MapReduce
  • 3. LLR (Local Linear Regression)-based Execution Time Estimation → addresses the Predictability Problem.

SLIDE 7

Design of WDCloud

WDCloud Components and Descriptions:

  • Web Portal for WDCloud: Provides a UI (Bing Maps) for selecting target watershed coordinates and displays the final delineation results (as well as output files in KML).
  • NHD+ Dataset: A single NHD+ DB (SQL Server) built by integrating 21 distinct NHD+ DBs.
  • Automated Catchment Search Module: Collects the relevant catchments across multiple NHD+ regions for the target watershed.
  • Geometric Union Module: Performs geometric union operations to create the final watershed.
  • Execution Time Estimator: Estimates the duration of the given watershed delineation via LLR.
  • Amazon Web Services: Provides compute resources (e.g., VMs) and storage resources (e.g., Amazon S3) for WDCloud.

SLIDE 8

Automated Catchment Search Module

  • Automatically searches for and collects all relevant catchments across multiple NHD+ regions via the HydroSeq, TerminalPath, and DnHydroSeq attributes (a traversal sketch follows below).
  • Output: the set of catchments that forms the target watershed.
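To make the upstream search concrete, here is a minimal sketch of such a traversal, assuming flowline records keyed by HydroSeq with DnHydroSeq and CatchmentID fields (hypothetical field layout). The actual module queries the consolidated NHD+ SQL Server database, and the slide also names TerminalPath, which is not modeled here.

```python
from collections import deque

def collect_upstream_catchments(outlet_hydroseq, flowlines):
    """Collect every catchment that drains to the outlet.

    `flowlines` is assumed to map HydroSeq -> record with 'DnHydroSeq'
    (downstream HydroSeq) and 'CatchmentID' fields, loaded from the
    NHD+ value-added attribute tables (hypothetical schema).
    """
    # Index flowlines by the HydroSeq they drain into (DnHydroSeq),
    # so the network can be walked upstream from the outlet.
    upstream_of = {}
    for hydroseq, rec in flowlines.items():
        upstream_of.setdefault(rec["DnHydroSeq"], []).append(hydroseq)

    catchments = set()
    frontier = deque([outlet_hydroseq])
    while frontier:
        current = frontier.popleft()
        rec = flowlines.get(current)
        if rec is not None:
            catchments.add(rec["CatchmentID"])
        # Enqueue every flowline that drains directly into `current`.
        frontier.extend(upstream_of.get(current, []))
    return catchments
```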
SLIDE 9

Performance Improvement Strategies

Strategy, description, # of catchments, and # of VMs:

  • Data-Reuse (domain-specific): for "monster-scale", multi-HUC-region watersheds (e.g., the Mississippi, approx. 1.1M+ catchments); runs on 1 VM.
  • Parallel-Union (system-specific): maximizes the performance of a single VM; used for fewer than 25K catchments; runs on 1 VM.
  • MapReduce (system-specific): maximizes the performance of watershed delineation via a Hadoop cluster; used for 25K catchments or more; runs on more than 1 VM.

A simple selection rule following these thresholds is sketched below.
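As a rough illustration of how these thresholds could drive strategy selection (a sketch only; the function name and the cluster size returned for MapReduce are assumptions, not WDCloud's actual dispatcher):

```python
def choose_union_strategy(num_catchments, is_monster_watershed=False):
    """Pick a geometric-union strategy following the table above."""
    if is_monster_watershed:          # e.g., the Mississippi (~1.1M+ catchments)
        return "data-reuse", 1        # single VM, pre-computed regional unions
    if num_catchments < 25_000:
        return "parallel-union", 1    # multi-core single VM
    return "mapreduce", 4             # Hadoop cluster; VM count here is illustrative (> 1)
```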

SLIDE 10

Performance Improvement – “Data-Reuse”

  • Key Idea:
  • Pre-compute catchment unions for monster-scale watersheds (without assuming a specific outlet point).
  • An offline optimization that guarantees the performance of watershed delineations (a minimal two-phase sketch follows below).

Figure: NHD+ Region "A" and NHD+ Region "B+C" (pre-computed).
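A minimal sketch of the two phases, using shapely's unary_union as a stand-in for whatever geometric-union library WDCloud actually uses (function names and data layout are assumptions):

```python
from shapely.ops import unary_union

# Offline step (done once per "monster-scale" watershed): union whole
# upstream NHD+ regions without assuming a specific outlet point.
def precompute_region_union(region_catchment_polygons):
    """Union all catchment polygons of an upstream NHD+ region."""
    return unary_union(region_catchment_polygons)

# Online step (per user request): only the catchments in the outlet's
# own region are merged, then combined with the pre-computed regions.
def delineate_with_reuse(outlet_region_polygons, precomputed_regions):
    local_part = unary_union(outlet_region_polygons)
    return unary_union([local_part, *precomputed_regions])
```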

SLIDE 11

Performance Improvement – “Data-Reuse”

  • Key Idea: as on the previous slide; the user now supplies an outlet point.

Figure: NHD+ Region "A" and pre-computed NHD+ Region "B+C", with the outlet (user input) and the direction of water flow.

SLIDE 12

Performance Improvement – “Data-Reuse”

  • Key Idea: as on the previous slides; only the catchments in the outlet's own region need to be merged.

Figure: for the target watershed, only the catchments in Region "A" (green area) are merged; Region "B+C" is already pre-computed.

SLIDE 13

Performance Improvement – “Data-Reuse”

  • Key Idea: as on the previous slides; the final result combines the freshly merged part with the pre-computed part.

Figure: the delineation result combines the watershed portion in Region "A" with the pre-computed NHD+ Region "B+C".

SLIDE 14

Performance Improvement – “Parallel-Union”

  • Key Idea:
  • Used for medium-size watersheds (fewer than 25K catchments).
  • Designed to fully utilize a multi-core (up to 32 cores) single VM instance.
  • Watershed delineation can be parallelized via "divide-and-conquer" or "MapReduce-style" computation (see the sketch below).

Figure: the collection of catchments for the target watershed is split and assigned to parallel tasks.
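A minimal sketch of the divide-and-conquer union on a single multi-core VM, assuming shapely geometries and Python's ProcessPoolExecutor (WDCloud's actual implementation and chunking policy may differ):

```python
from concurrent.futures import ProcessPoolExecutor
from shapely.ops import unary_union

def parallel_union(catchment_polygons, num_tasks=4):
    """Divide-and-conquer union across the cores of a single VM.

    The catchment list is split into roughly `num_tasks` chunks, each
    chunk is unioned in its own process, and the partial results are
    merged with one final union.
    """
    chunk_size = max(1, len(catchment_polygons) // num_tasks)
    chunks = [catchment_polygons[i:i + chunk_size]
              for i in range(0, len(catchment_polygons), chunk_size)]
    with ProcessPoolExecutor(max_workers=num_tasks) as pool:
        partial_unions = list(pool.map(unary_union, chunks))
    return unary_union(partial_unions)
```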

SLIDE 15

Performance Improvement – “MapReduce”

  • Key Idea:
  • The "Hadoop version" of Parallel-Union (see the sketch below).
  • Designed to maximize performance (i.e., minimize watershed delineation time) by utilizing multiple VM instances.
  • Used for large watersheds (25K catchments or more).

Figure: the collection of catchments for the target watershed is split and assigned to workers (mappers).
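A sketch of the map/reduce structure only, not the actual Hadoop job: each mapper unions its split of catchments, and a single reducer merges the partial unions. The WKT serialization and function names are assumptions.

```python
from shapely import wkt
from shapely.ops import unary_union

# Mapper: each worker receives a split of catchment polygons (as WKT
# lines) and emits one partial union for that split.
def map_partial_union(wkt_lines):
    polygons = [wkt.loads(line) for line in wkt_lines]
    yield "watershed", unary_union(polygons).wkt

# Reducer: merges the partial unions from all mappers into the final
# watershed boundary. In a real Hadoop Streaming job these functions
# would read from stdin and write to stdout instead.
def reduce_final_union(key, partial_wkts):
    return key, unary_union([wkt.loads(w) for w in partial_wkts]).wkt
```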

SLIDE 16

Execution Time Estimation – LLR (Local Linear Regression)

  • Initial Hypothesis:
  • The execution time of watershed delineation has a roughly linear relationship with IaaS- and application-specific parameters (e.g., VM type, # of catchments).
  • The watershed delineation tool has several pipeline steps, each related to either:
  • Geometric union (polygon processing), or
  • Non-geometric union work.
  • Data Collection and Correlation Analysis:
  • Profiled 26 execution samples on 4 different VM types on AWS (a sketch of the analysis follows below).
  • Correlation coefficients between execution time and the parameters (# of catchments, type of VM), measured separately for the non-geometric-union and geometric-union steps, ranged from 0.0973 (negligible) and 0.3223 (weak) to 0.6129 (moderate) and 0.7089 (strong).

Conclusion: a simple (global) linear model cannot produce reliable predictions.
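A small sketch of the kind of correlation analysis described above, assuming each profiled sample records the feature values and the measured execution time (field names are hypothetical, not WDCloud's schema):

```python
import numpy as np

def feature_correlations(samples):
    """Pearson correlation of each candidate feature with execution time.

    `samples` is assumed to be a list of dicts with 'num_catchments',
    'vm_type_index', and 'exec_time' fields.
    """
    time = np.array([s["exec_time"] for s in samples], dtype=float)
    return {
        feat: float(np.corrcoef([s[feat] for s in samples], time)[0, 1])
        for feat in ("num_catchments", "vm_type_index")
    }
```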

SLIDE 17

Execution Time Estimation – LLR (Local Linear Regression)

"Global" Linear Regression vs. "Local" Linear Regression

Figure: execution time vs. # of catchments on m1.large, for (a) global linear regression (using all samples) and (b) local linear regression (using three samples).

Procedure of Local Linear Regression (sketched below):

  • 1. Apply kNN to the profiled samples to find a proper neighbour set 𝑾(𝒚𝟏) for the prediction, using # of catchments, geographical closeness, and execution environment (VM) as the features.
  • 2. Create a simple regression model based on 𝑾(𝒚𝟏).
  • 3. Make the prediction for job 𝒚𝟏 from that regression model.
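A minimal sketch of the three-step LLR procedure, assuming each profiled sample carries a numeric feature vector and a measured execution time; the plain Euclidean distance and the feature encoding are assumptions, not the paper's exact formulation.

```python
import numpy as np

def llr_predict(query, samples, k=3):
    """Local Linear Regression as outlined on the slide.

    `samples` is assumed to be a list of dicts with a numeric 'features'
    vector and a measured 'exec_time'; `query` is the feature vector of
    the new delineation job.
    """
    X = np.array([s["features"] for s in samples], dtype=float)
    y = np.array([s["exec_time"] for s in samples], dtype=float)
    q = np.asarray(query, dtype=float)

    # Step 1: k nearest neighbours of the query point (the set W(y1)).
    nearest = np.argsort(np.linalg.norm(X - q, axis=1))[:k]

    # Step 2: least-squares linear fit on the neighbourhood only.
    A = np.hstack([X[nearest], np.ones((k, 1))])   # append intercept column
    coef, *_ = np.linalg.lstsq(A, y[nearest], rcond=None)

    # Step 3: prediction for the query job from the local model.
    return float(np.append(q, 1.0) @ coef)
```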

SLIDE 18

Evaluation (1) – Performance Improvement

(1) Data-Reuse (monster watershed):

  • Mississippi watershed on a commercial desktop (4-core i7, 8 GB RAM): 10+ hours.
  • With Data-Reuse on an m1.xlarge AWS instance (4 vCPUs, 7.5 GB RAM): 5.5 minutes.
  • Speedup: 111x.

(2) Parallel-Union (# of catchments < 25K):

  • Figure: normalized execution time vs. # of parallel tasks (1, 2, 4, 8, 16, 32) on an xlarge instance (4 cores) for VA (430 catchments), TN (23K), SC (155), PA (140), and their average.
  • Best case: 3.9x speedup (from approx. 1200 sec. down to approx. 310 sec.).

(3) MapReduce (# of catchments >= 25K):

  • Figure: speedup over the non-parallel baseline for ME (66K catchments), KY (107K), and SD (253K) on Hadoop clusters with 4 cores (4 * medium), 8 cores (4 * large), 16 cores (4 * xlarge), and 32 cores (4 * 2xlarge).
  • Speedups range from 2.2x up to 21.2x (11.8 min. for the largest case).

SLIDE 19

Evaluation – Execution Time Estimation (Overall)

  • Measured 420 random coordinates.
  • (20 random outlet coordinates × 21 HUC regions in NHD+.)
  • Metrics: 1) Prediction Accuracy and 2) MAPE (Mean Absolute Percentage Error), defined as:

$$\text{Prediction Accuracy} = \begin{cases} T_{\text{actual}} / T_{\text{predicted}}, & T_{\text{predicted}} \ge T_{\text{actual}} \\ T_{\text{predicted}} / T_{\text{actual}}, & T_{\text{actual}} > T_{\text{predicted}} \end{cases}$$

$$\text{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| T_{\text{actual},i} - T_{\text{predicted},i} \right|}{T_{\text{actual},i}}$$

Overall Results for Execution Time Estimation:

  • LLR Estimator (Geo): 85.6% prediction accuracy, 0.19 MAPE.
  • kNN: 65.7% prediction accuracy, 0.93 MAPE.
  • Mean: 42.8% prediction accuracy, 1.97 MAPE.
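To make the metric definitions concrete, here is a small helper computing both metrics exactly as defined above (a sketch; the function names are illustrative, not from WDCloud):

```python
def prediction_accuracy(t_actual, t_predicted):
    """Accuracy as defined above: the smaller time divided by the larger one."""
    if t_predicted >= t_actual:
        return t_actual / t_predicted
    return t_predicted / t_actual

def mape(actuals, predictions):
    """Mean Absolute Percentage Error over n delineation runs."""
    return sum(abs(a - p) / a for a, p in zip(actuals, predictions)) / len(actuals)
```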

SLIDE 20

Evaluation – Execution Time Estimation (Regional)

Figure: per-region prediction accuracy (0-100%) and MAPE (0.00-1.00) for the LLR predictor, kNN, and mean baselines. The LLR predictor reaches roughly 80% prediction accuracy and roughly 0.2 MAPE across the regions, outperforming kNN and mean.

SLIDE 21

Conclusions

  • We have designed and implemented WDCloud on top of a public cloud (AWS) to address three limitations of existing approaches:
  • 1) Scalability → Automated Catchment Search Mechanism.
  • 2) Performance → three performance-improvement strategies.
  • 3) Predictability → Local Linear Regression.
  • Evaluation of WDCloud on AWS:
  • Performance improvement: 4x to 111x speedups (Parallel-Union, MapReduce, Data-Reuse).
  • Prediction accuracy: 85.6% prediction accuracy and 0.19 MAPE.
SLIDE 22

Questions?

Thank you!

SLIDE 23

Support Slides (NHD+ Regions)