

Slide 1

A Study of Deadline Scheduling for Client-Server Systems on the Computational Grid

Atsuko Takefusa, JSPS/TITECH
Henri Casanova, UCSD/SDSC
Satoshi Matsuoka, TITECH/JST
Francine Berman, UCSD/SDSC

http://ninf.is.titech.ac.jp/bricks/

Slide 2

The Computational Grid

  • A promising platform for the deployment of HPC applications
  • A crucial issue is scheduling
  • Most scheduling work aims at improving the execution time of a single application
    (e.g., AppLeS, APST, AMWAT, MW, performance surface, stochastic scheduling)

Slide 3

NES: Network-enabled Server

  • Grid software which provides a service on the network (a.k.a. GridRPC)
    e.g., Ninf, NetSolve, Nimrod
  • Client-server architecture, RPC-style programming model
  • Many high-profile applications from science and engineering are amenable:
    molecular biology, genetic information, operations research
  • Scheduling in a multi-client, multi-server scenario?

Slide 4

Scheduling for NES

  • Resource economy model (e.g., [Zhao and Karamcheti '00], [Plank '00], [Buyya '00])
    - Grid currency allows owners to “charge” for usage
    - No actual economic model is implemented
  • Nimrod [Abramson '00] presents a study of a deadline-scheduling algorithm
    - Users specify deadlines for the tasks of their applications and can spend more to get tighter deadlines

Slide 5

Our Approach

  • Our goal is to minimize
    - the overall occurrence of deadline misses
    - the resource cost
  • Each request comes with a deadline requirement
  • Deadline-scheduling algorithm under a simple economy model
  • Simulation on Bricks, a performance evaluation system for Grid scheduling

Slide 6

The Rest of the Talk

  • Overview of Bricks and its improvements
    (more scalable and realistic simulations)
  • A deadline-scheduling algorithm for multi-client/server NES systems
    (Load Correction mechanism, Fallback mechanism)
  • Experiments in multi-client, multi-server scenarios with Bricks
    (resource load, resource cost, conservatism of prediction, efficacy of our deadline-scheduling)

Slide 7

Bricks: A Grid Performance Evaluation System [HPDC ’99]

  • A Grid simulation framework to evaluate
    - scheduling algorithms
    - scheduling framework components (e.g., predictors)
  • Bricks provides
    - reproducible and controlled Grid evaluation environments
    - flexible setup of simulation environments (Grid topology, resource model, client model)
    - an evaluation environment for external Grid components (e.g., the NWS forecaster)

Slide 8

The Bricks Architecture [HPDC ’99]

[Architecture diagram: a Grid Computing Environment (Clients, Networks, Servers) connected to a Scheduling Unit containing the Scheduler, NetworkMonitor, ServerMonitor, ResourceDB, and Predictor (NetworkPredictor, ServerPredictor).]

Slide 9

A Hierarchical Network Topology in the improved Bricks

[Topology diagram: clients and servers attached to Local Domain networks, which connect through LANs to a WAN.]

Slide 10

Deadline-Scheduling

Many NES scheduling strategies exist:

  • Greedy: assigns each request to the server that completes it the earliest
  • Deadline-scheduling: aims at meeting user-supplied job deadline specifications

[Figure: estimated job execution times on Server 1, Server 2, Server 3 relative to the deadline, with a resource cost ($ ... $$$$$) attached to each server.]
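For contrast, here is a minimal Java sketch of the Greedy strategy described above: each request simply goes to the server with the earliest estimated completion time, ignoring deadlines and cost. The Candidate type, its fields, and the numbers are illustrative assumptions, not the actual NES scheduler API.

    import java.util.Comparator;
    import java.util.List;

    public class GreedySelect {
        // One candidate server and the predicted completion time of the request on it.
        record Candidate(String server, double estimatedCompletionTime) {}

        // Greedy: minimize the estimated completion time, nothing else.
        static Candidate greedy(List<Candidate> candidates) {
            return candidates.stream()
                    .min(Comparator.comparingDouble(Candidate::estimatedCompletionTime))
                    .orElseThrow();
        }

        public static void main(String[] args) {
            System.out.println(greedy(List.of(
                    new Candidate("S1", 25.0),
                    new Candidate("S2", 15.0),
                    new Candidate("S3", 40.0))).server());  // S2
        }
    }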

Slide 11

A Deadline-Scheduling Algorithm for multi-client/server NES

1. Estimate the job processing time T_Si on each server Si:

    T_Si = W_send / P_send + W_recv / P_recv + W_s / P_serv    (0 ≤ i < n)

   W_send, W_recv, W_s: send/recv data size and logical computation cost
   P_send, P_recv, P_serv: estimated send/recv throughput and server performance

2. Compute the time remaining until the deadline:

    T_until_deadline = T_deadline - now

[Figure: estimated job execution times (send, comp., recv) on Server 1, Server 2, Server 3, plotted from now against T_until_deadline and the deadline.]

Slide 12

A Deadline-Scheduling Algorithm (cont.)

3. Compute the target processing time:

    T_target = T_until_deadline × Opt    (0 < Opt ≤ 1)

4. Select a suitable server Si:

    Diff_Si = T_target - T_Si
    Choose the server with the minimum Diff_Si among those with Diff_Si ≥ 0;
    otherwise choose the server with the minimum |Diff_Si|.

[Figure: estimated job execution times (send, comp., recv) on Server 1, Server 2, Server 3, plotted from now against T_target and T_until_deadline.]
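Putting steps 1-4 together, a minimal Java sketch of the selection logic might look as follows. The ServerEstimate type, its fields, and the sample values in main are assumptions for illustration; this is not the actual Bricks or Ninf implementation.

    import java.util.List;

    public class DeadlineSelect {
        // Per-server estimates used by the algorithm (assumed names).
        record ServerEstimate(String name, double pSend, double pRecv, double pServ) {}

        // Step 1: T_Si = W_send/P_send + W_recv/P_recv + W_s/P_serv
        static double estimateJobTime(ServerEstimate s, double wSend, double wRecv, double wS) {
            return wSend / s.pSend() + wRecv / s.pRecv() + wS / s.pServ();
        }

        // Steps 3-4: T_target = T_until_deadline * Opt; prefer the server whose estimated
        // time is closest to the target without exceeding it, otherwise the smallest overshoot.
        static ServerEstimate select(List<ServerEstimate> servers, double tUntilDeadline,
                                     double opt, double wSend, double wRecv, double wS) {
            double tTarget = tUntilDeadline * opt;      // 0 < opt <= 1
            ServerEstimate bestFit = null, bestOver = null;
            double minDiff = Double.POSITIVE_INFINITY, minOver = Double.POSITIVE_INFINITY;
            for (ServerEstimate s : servers) {
                double diff = tTarget - estimateJobTime(s, wSend, wRecv, wS);
                if (diff >= 0 && diff < minDiff) { minDiff = diff; bestFit = s; }
                if (diff < 0 && -diff < minOver) { minOver = -diff; bestOver = s; }
            }
            return bestFit != null ? bestFit : bestOver;
        }

        public static void main(String[] args) {
            List<ServerEstimate> servers = List.of(
                new ServerEstimate("S1", 100, 100, 100),   // Mbit/s, Mbit/s, Mops/s
                new ServerEstimate("S2", 100, 100, 300));
            // Job: 500 Mbit send/recv, 1500 Mops of computation, 30 s until deadline, Opt = 0.7.
            System.out.println(select(servers, 30.0, 0.7, 500, 500, 1500).name());  // S2
        }
    }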

Slide 13

Factors in Deadline-Scheduling Failures

  • Accuracy of predictions is not guaranteed
  • Monitoring systems do not perceive load changes instantaneously
  • Tasks might be out-of-order in FCFS queues

Slide 14

Ideas to Improve Scheduling Performance

  • Scheduling decisions will result in an increase in the load of the scheduled nodes
    → Load Correction: use corrected load values
  • The server can estimate whether it will be able to complete the task by the deadline
    → Fallback: push some scheduling functionality to the server
Slide 15

The Load Correction Mechanism

  • Modify the load predictions from the monitoring system, Load_Si, as follows:

    Load_Si(corrected) = Load_Si + N_jobs_Si × p_load

   N_jobs_Si: the number of scheduled and unfinished jobs on server Si
   p_load (= 1): an arbitrary value that determines the magnitude of the correction

[Figure: the Scheduler applies the corrected prediction on top of the NetworkMonitor, ServerMonitor, ResourceDB, and Predictor (NetworkPredictor, ServerPredictor) components.]
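As a minimal sketch of this rule (assumed names; not the actual Bricks Predictor API), the correction amounts to:

    public class LoadCorrection {
        static final double P_LOAD = 1.0;  // magnitude of the correction (the slide uses p_load = 1)

        // Load_Si(corrected) = Load_Si + N_jobs_Si * p_load
        static double correctedLoad(double monitoredLoad, int scheduledUnfinishedJobs) {
            return monitoredLoad + scheduledUnfinishedJobs * P_LOAD;
        }

        public static void main(String[] args) {
            // The monitor reports load 0.3, but 2 scheduled requests have not finished there yet.
            System.out.println(correctedLoad(0.3, 2));  // 2.3
        }
    }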

Slide 16

The Fallback Mechanism

  • The server can estimate whether it will be able to complete the task by the deadline
  • Fallback happens when:

    T_until_deadline < T_send + ET_exec + ET_recv  &&  N_max_fallbacks ≥ N_fallbacks

   T_send: communication duration (send)
   ET_exec, ET_recv: estimated computation and communication (recv) durations
   N_fallbacks, N_max_fallbacks: total / maximum number of fallbacks

[Figure: the server falls back to the scheduler, which re-submits the request to another server.]
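A minimal sketch of this test, under assumed names rather than the actual server-side implementation: after the input data arrives, the server re-checks the deadline and, if it cannot be met and the fallback budget is not exhausted, returns the request to the scheduler for re-submission.

    public class FallbackCheck {
        static boolean shouldFallback(double tUntilDeadline,
                                      double tSend,     // measured send duration
                                      double etExec,    // estimated computation duration
                                      double etRecv,    // estimated recv duration
                                      int nFallbacks, int nMaxFallbacks) {
            boolean cannotMeetDeadline = tUntilDeadline < tSend + etExec + etRecv;
            boolean budgetLeft = nFallbacks < nMaxFallbacks;  // slide: N_max_fallbacks >= N_fallbacks
            return cannotMeetDeadline && budgetLeft;
        }

        public static void main(String[] args) {
            // 20 s left until the deadline, but send took 5 s and exec+recv are estimated at 18 s.
            System.out.println(shouldFallback(20, 5, 15, 3, 1, 3));  // true -> re-submit
        }
    }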

Slide 17

Experiments

  • Experiments in multi-client, multi-server scenarios with Bricks
    (resource load, resource cost, conservatism of prediction, efficacy of our deadline-scheduling)
  • Performance criteria:
    - Failure rate: percentage of requests that missed their deadline
    - Resource cost: average resource cost over all requests, where the cost of a request is the performance of its selected machine
      (e.g., selecting 100 Mops/s and 300 Mops/s servers gives a resource cost of 200)
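For illustration, a minimal sketch of computing both criteria over a set of completed requests; the Request type and the sample values are assumptions, and the 100/300 Mops/s case reproduces the slide's resource-cost example.

    import java.util.List;

    public class Metrics {
        // One completed request: whether it missed its deadline, and the performance of its assigned server.
        record Request(boolean missedDeadline, double serverPerformanceMops) {}

        static double failureRatePercent(List<Request> reqs) {
            return 100.0 * reqs.stream().filter(Request::missedDeadline).count() / reqs.size();
        }

        static double avgResourceCost(List<Request> reqs) {
            return reqs.stream().mapToDouble(Request::serverPerformanceMops).average().orElse(0);
        }

        public static void main(String[] args) {
            List<Request> reqs = List.of(new Request(false, 100), new Request(true, 300));
            System.out.println(failureRatePercent(reqs));  // 50.0
            System.out.println(avgResourceCost(reqs));     // 200.0
        }
    }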

Slide 18

Scheduling Algorithms

  • Greedy: typical NES scheduling strategy
  • Deadline (Opt = 0.5, 0.6, 0.7, 0.8, 0.9)
  • Load Correction (on/off)
  • Fallback (N_max_fallbacks = 0/1/2/3/4/5)

Slide 19

Configurations of the Bricks Simulation

  • Grid Computing Environment (≈75 nodes, 5 Grids)
    - # of local domains: 10; # of local domain nodes: 5-10
    - Avg. LAN bandwidth: 50-100 [Mbits/s]
    - Avg. WAN bandwidth: 500-1000 [Mbits/s]
    - Avg. server performance: 100-500 [Mops/s]
    - Avg. server load: 0.1
  • Characteristics of client jobs
    - Send/recv data size: 100-5000 [Mbits]
    - # of instructions: 1.5-1080 [Gops]
    - Avg. invocation interval: 60 (high load), 90 (medium load), 120 (low load) [min]

Slide 20

Simulation Environment

  • The Presto II cluster: 128 PEs at Matsuoka Lab., Tokyo Institute of Technology
    - Dual Pentium III 800 MHz, Memory: 640 MB, Network: 100Base-TX
  • Use APST [Casanova '00] to deploy Bricks simulations
  • 24-hour simulation × 2,500 runs
    (one simulation takes 30-60 [min] with Sun JVM 1.3.0 + HotSpot)

Slide 21

Comparison of Failure Rates (load: medium)

[Bar chart: failure rate [%] (0-70) for Greedy (the typical NES scheduling baseline) and D-0.5 through D-0.9, for the four combinations x/x, L/x, x/F, L/F (Load Correction / Fallback off or on).]

  • Fallback leads to significant reductions
  • Load Correction is NOT useful

Slide 22

Comparison of Failure Rates (Load: high, medium, low)

  • “Low” load leads to improved failure rates
  • All loads show similar characteristics

[Three bar charts (high, medium, low load): failure rate [%] (0-70) for Greedy and D-0.5 through D-0.9, for the combinations x/x, L/x, x/F, L/F.]

Slide 23

Comparison of Resource Costs

[Bar chart: average resource cost (50-500) for Greedy and D-0.5 through D-0.9, for the combinations x/x, L/x, x/F, L/F.]

  • Greedy leads to higher costs
  • Costs decrease as the algorithm becomes less conservative
  • Even conservative Deadline is decent compared to Greedy
  • Trade-off between failure rate and cost by adjusting the conservatism of Deadline

Slide 24

Comparison of Failure Rates (x/F, Nmax. fallbacks = 0-5)

[Bar chart: failure rate [%] (0-70) for Greedy and D-0.5 through D-0.9 as N_max_fallbacks varies.]

  • Multiple fallbacks cause significant improvement

Slide 25

Comparison of Resource Costs (x/F, Nmax. fallbacks = 0-5)

[Bar chart: average resource cost (50-500) for Greedy and D-0.5 through D-0.9 as N_max_fallbacks varies.]

  • Multiple fallbacks lead to “small” increases in costs
  • NES systems should facilitate multiple fallbacks as a part of their standard mechanisms

Slide 26

Related Work

  • Economy model: Nimrod [Abramson '00]
    - Uses a self-scheduler
    - Targets parameter-sweep applications from a single user
  • Grid performance evaluation systems:
    - MicroGrid [Song '00]: emulates a virtual Globus Grid on an actual cluster; not appropriate for large numbers of experiments
    - SimGrid [Casanova '01]: a trace-based discrete-event simulator; provides primitives for simulation of application scheduling; lacks the network-modeling features Bricks provides

Slide 27

Conclusions

  • Proposed a deadline-scheduling algorithm for multi-client/server NES systems, plus the Load Correction and Fallback mechanisms
  • Investigated performance in multi-client, multi-server scenarios with the improved Bricks
  • The experiments showed:
    - It is possible to make a trade-off between failure rate and resource cost by adjusting conservatism
    - Load Correction may not be useful
    - Future NES systems should use deadline-scheduling with multiple fallbacks

Slide 28

Future Work

  • Make Bricks support more sophisticated economy models
  • Investigate their feasibility and improve our deadline-scheduling algorithms
  • Implement the deadline-scheduling algorithm within actual NES systems
    (starting with Ninf: http://ninf.apgrid.org/)