Avoiding Register Overflow in the Bakery Algorithm: The Bakery++ Algorithm
Amirhossein Sayyadabdi and Mohsen Sharifi
SRMPDS '20, Edmonton, AB, Canada

The Bakery algorithm is the first true solution to the mutual exclusion problem, but it suffers from register overflow. Bakery++ is a slightly modified version of Bakery that avoids overflow without introducing new variables or redefining Bakery's operations or functions. Bakery++ is quite simple. It is specified formally in the PlusCal language and verified correct using the TLC model checker.
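The abstract does not reproduce Bakery++'s overflow-avoiding rule, so as a reference point here is a minimal sketch of the classic Lamport Bakery lock that Bakery++ modifies, with the overflow-prone ticket step marked. The thread count, iteration count and the Python threading harness are our own illustrative choices, not the authors' PlusCal specification.

```python
import sys
import threading

sys.setswitchinterval(0.0005)  # preempt busy-waiting threads more often

N = 3            # number of threads
ITERS = 100      # critical-section entries per thread
choosing = [False] * N   # "I am picking a ticket" flags
number = [0] * N         # tickets; in plain Bakery these grow without bound
counter = 0              # shared state the lock protects

def lock(i):
    # Doorway: take a ticket larger than any currently held. Under
    # sustained contention this value only grows -- the register
    # overflow that Bakery++ is designed to avoid.
    choosing[i] = True
    number[i] = 1 + max(number)
    choosing[i] = False
    for j in range(N):
        if j == i:
            continue
        while choosing[j]:      # wait until j finishes its doorway
            pass
        # Wait while j holds a ticket that comes earlier in the
        # lexicographic (ticket, thread id) order.
        while number[j] != 0 and (number[j], j) < (number[i], i):
            pass

def unlock(i):
    number[i] = 0

def worker(i):
    global counter
    for _ in range(ITERS):
        lock(i)
        counter += 1            # critical section
        unlock(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # N * ITERS == 300 if mutual exclusion held
```

The `(ticket, id)` tie-break is what makes equal tickets safe; removing it is the classic way to break the algorithm.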
Communication-aware Job Scheduling using SLURM
Priya Mishra, Tushar Agrawal, Preeti Malakar
Indian Institute of Technology Kanpur

MOTIVATION
• The performance of communication-intensive jobs is affected by network contention, node spread and job interference.

OBJECTIVE
• Develop node-allocation algorithms that consider the job's behaviour during resource allocation to improve the performance of communication-intensive jobs.

METHODS
• Greedy Allocation: nodes are allocated on switches with a lower communication ratio (lower contention and more free nodes).
• Balanced Allocation: nodes are allocated in powers of two to minimize inter-switch communication.
• Adaptive Allocation: selects the more optimal node-allocation algorithm (greedy or balanced) based on their cost of communication.

RESULTS
• The proposed algorithms reduce execution times by 9% on average and wait times by 31% across three job logs.
• Balanced and adaptive always perform better than default and greedy.
• The proposed algorithms always perform better than the default for the same cluster state (individual runs).
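The three allocation policies can be sketched as follows. The cluster model (free nodes per switch) and the communication-cost proxy (number of switches spanned) are our assumptions for illustration, not the authors' SLURM plugin.

```python
def greedy_alloc(free_per_switch, need):
    """Take nodes from the switches with the most free nodes first."""
    alloc = {}
    for sw, free in sorted(free_per_switch.items(), key=lambda kv: -kv[1]):
        if need == 0:
            break
        take = min(free, need)
        alloc[sw] = take
        need -= take
    return alloc

def balanced_alloc(free_per_switch, need):
    """Take power-of-two groups of nodes to keep allocations aligned."""
    alloc, remaining = {}, dict(free_per_switch)
    while need > 0:
        sw = max(remaining, key=remaining.get)
        if remaining[sw] == 0:
            break               # request cannot be satisfied
        chunk = 1
        while chunk * 2 <= min(remaining[sw], need):
            chunk *= 2
        alloc[sw] = alloc.get(sw, 0) + chunk
        remaining[sw] -= chunk
        need -= chunk
    return alloc

def comm_cost(alloc):
    # Proxy: every extra switch spanned adds inter-switch traffic.
    return max(0, len(alloc) - 1)

def adaptive_alloc(free_per_switch, need):
    """Pick whichever of the two allocations communicates less."""
    g = greedy_alloc(free_per_switch, need)
    b = balanced_alloc(free_per_switch, need)
    return g if comm_cost(g) <= comm_cost(b) else b

free = {"s1": 5, "s2": 3, "s3": 8}
print(adaptive_alloc(free, 6))   # {'s3': 6}: one switch beats two
```

On this state, greedy fits the whole request on switch `s3`, while balanced splits it into a 4-group and a 2-group across two switches, so adaptive keeps the greedy result.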
Characterizing the Cost-Accuracy Performance of Cloud Applications
Sunimal Rathnayake, *Lavanya Ramapantulu, Yong Meng Teo
National University of Singapore, *Nanyang Technological University
SRMPDS Workshop @ ICPP 2020

Motivation
‒ Some cloud applications can produce results of different accuracy, and their resource demand varies with accuracy (e.g. machine learning).
‒ Cloud resources are scalable, offer a large resource pool, and use pay-for-use charging.
‒ This creates an opportunity for trading off accuracy for time and cost.

Approach
‒ Two-stage approach: measurements for characterization; a model and optimization for determining cost, time and configuration.

Contribution
• Measurement-driven model and analysis
• Cost-accuracy "sweet spots"
• Cost-accuracy and time-accuracy Pareto-optimal configurations
• Metrics for cost-accuracy and time-accuracy performance
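The Pareto-optimal-configuration step can be sketched directly: given measured (cost, accuracy) points for candidate configurations, keep those not dominated by a configuration that is both cheaper and at least as accurate. The configuration names, costs and accuracies below are invented for illustration.

```python
def pareto_cost_accuracy(configs):
    """Return cost-accuracy Pareto-optimal configs, cheapest first."""
    front = []
    for c in configs:
        # c is dominated if some other config is no worse on both axes
        # and strictly better on at least one.
        dominated = any(o["cost"] <= c["cost"] and o["acc"] >= c["acc"]
                        and (o["cost"] < c["cost"] or o["acc"] > c["acc"])
                        for o in configs)
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c["cost"])

configs = [
    {"name": "2xSmall", "cost": 1.0, "acc": 0.80},
    {"name": "4xSmall", "cost": 2.0, "acc": 0.88},
    {"name": "2xLarge", "cost": 2.5, "acc": 0.86},  # dominated by 4xSmall
    {"name": "8xSmall", "cost": 4.0, "acc": 0.91},
]
print([c["name"] for c in pareto_cost_accuracy(configs)])
# ['2xSmall', '4xSmall', '8xSmall']
```

The "sweet spots" the poster mentions would be picked from this frontier, e.g. the knee point where extra cost buys the least extra accuracy.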
SRMPDS 2020
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Jing Chen, Pirah Noor Soomro, Mustafa Abduljabbar, Madhavan Manivannan, Miquel Pericàs

Motivations
• Applications sharing resources suffer from interference.
• Runtime scheduling techniques coupled with application knowledge can be used to mitigate interference.
• An online performance model is used to predict task performance.
• We leverage task moldability and knowledge of task criticality to adapt to interference.
• Our scheduler targets to minimize resource usage, execution time and overcommitting of resources.

Method: Performance Trace Table (PTT)
• Goal: performance prediction for future tasks given a set of resources.
• Entries: elastic execution place (leader core, resource width).
• One PTT for each task type.
• Execution-time records are updated dynamically during execution.
• Aware of interference activities.
• Requires only little information.
• The PTT is independent of platforms.
• Low overhead.

Results
[Figures: throughput (tasks/s) vs. DAG parallelism (2-6) for RWS, RWSM-C, FA, FAM-C, DA, DAM-C and DAM-P, under two interference scenarios: a co-running application and DVFS.]
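The PTT as described (one table per task type, keyed by elastic execution place, updated online) can be sketched as below. The update rule (an exponential moving average) and the place-selection heuristic (execution time weighted by resource width, to penalize over-committing cores) are our assumptions; the poster does not specify either.

```python
class PTT:
    """Performance Trace Table sketch: one instance per task type."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha   # weight of the newest sample in the average
        self.t = {}          # (leader_core, width) -> estimated exec time

    def update(self, place, exec_time):
        # Moving average so the estimate tracks interference changes.
        old = self.t.get(place)
        self.t[place] = (exec_time if old is None
                         else self.alpha * exec_time + (1 - self.alpha) * old)

    def best_place(self):
        # Lowest (time * width) cost: a simple proxy for minimizing
        # execution time and resource usage together.
        return min(self.t, key=lambda p: self.t[p] * p[1])

ptt = PTT()
ptt.update((0, 1), 8.0)    # leader core 0, width 1
ptt.update((0, 2), 5.0)    # width 2 is faster per task...
ptt.update((4, 4), 4.5)    # ...but width 4 barely helps under interference
print(ptt.best_place())    # (0, 1): 8.0*1 < 5.0*2 < 4.5*4
```

With this cost proxy the scheduler keeps the narrow place because doubling or quadrupling the width does not speed the task up proportionally.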
Network and Load-Aware Resource Manager for MPI Programs
Ashish Kumar, Naman Jain, Preeti Malakar
Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur

Problem
Node allocation in a shared cluster for parallel jobs to maximize performance, considering both compute and network load on the cluster.

Challenges
• Non-exclusive access to nodes in a shared cluster
• Variation in load/utilization across time and nodes (Figure: variation in (a) N/W bandwidth and (b) CPU load across nodes)
• Topology does not capture the current state of the network
• Contention and congestion in the network due to existing jobs
• Varying computation and communication requirements of different programs

Problem Formulation
Model: represent the cluster as a graph with compute nodes as vertices and network links as edges.
Objective: find a sub-graph satisfying the user's demands such that the overall load of the sub-graph is minimized.

Network Load
− Measure of load on a P2P network link; considers bandwidth and latency; the topology is automatically captured.
− NL(u,v) = w_lt · LT(u,v) + w_bw · BW(u,v)

Compute Load
− Measure of overall load on a node; considers static (core count, clock speed) and dynamic (CPU load, available memory) attributes.
− CL_v = Σ_{a ∈ attributes} w_a · val_{v,a}

For a candidate sub-graph G_v with vertices V_v and edges E_v:
− Compute load: C_{G_v} = Σ_{u ∈ V_v} CL_u
− Network load: N_{G_v} = Σ_{(x,y) ∈ E_v} NL(x,y)
− Total load = α × C_{G_v} + β × N_{G_v}

Core Components (Figure: allocator workflow)
Resource Monitor
− Distributed monitoring system for the cluster
− Uses light-weight daemons to periodically update livehosts, node statistics and network status
− Considers node attributes and network dynamics
Allocator
− Finds candidate sub-graphs
− Calculates the total load for each sub-graph using data collected by the resource monitor
− Picks the best sub-graph according to total load
− Allocates nodes based on the user request

Candidate Selection Algorithm
− Start with a particular node v
− Calculate the addition load for all nodes w.r.t. the start node: A_v(u) = α × CL(u) + β × NL(v,u)
− Keep adding nodes to the sub-graph in increasing order of addition load until the request is satisfied

Results
Table: performance gain using our allocation method
Algorithm    Avg. gain  Max. gain
Random       49.9%      87.8%
Sequential   43.1%      84.5%
Load Aware   32.4%      87.7%

Observations
− Our algorithm performs better than random, sequential and load-aware on average.
− Load-aware performed better than sequential for a small number of nodes but worse for a large number of nodes.

Conclusions and Future Work
− Our algorithm reduces run-times by more than 38% over random, sequential and load-aware allocations.
− Formalization of weight estimation.
− Extension to large-scale systems spanning multiple clusters.
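The candidate-selection and allocation steps follow directly from the formulas: addition load A_v(u) = α·CL(u) + β·NL(v,u) orders the nodes around each start node, and the allocator keeps the candidate sub-graph with the lowest total load. The example cluster data, the weights α = β = 0.5, and the assumption that every node pair has a measured link are illustrative.

```python
from itertools import combinations

ALPHA, BETA = 0.5, 0.5   # illustrative weights for compute vs network load

def nl(x, y, NL):
    """Network load of the link between x and y (stored symmetrically)."""
    return NL[tuple(sorted((x, y)))]

def candidate_subgraph(v, nodes, CL, NL, request):
    """Grow a sub-graph of `request` nodes around start node v,
    adding nodes in increasing order of addition load A_v(u)."""
    others = sorted((u for u in nodes if u != v),
                    key=lambda u: ALPHA * CL[u] + BETA * nl(v, u, NL))
    return [v] + others[:request - 1]

def total_load(sub, CL, NL):
    compute = sum(CL[u] for u in sub)                             # C_Gv
    network = sum(nl(x, y, NL) for x, y in combinations(sub, 2))  # N_Gv
    return ALPHA * compute + BETA * network

def allocate(nodes, CL, NL, request):
    """Try each start node; return the sub-graph with the lowest total load."""
    return min((candidate_subgraph(v, nodes, CL, NL, request) for v in nodes),
               key=lambda sub: total_load(sub, CL, NL))

nodes = ["a", "b", "c", "d"]
CL = {"a": 0.2, "b": 0.9, "c": 0.3, "d": 0.4}
NL = {("a", "b"): 0.5, ("a", "c"): 0.1, ("a", "d"): 0.7,
      ("b", "c"): 0.6, ("b", "d"): 0.2, ("c", "d"): 0.3}
print(allocate(nodes, CL, NL, 2))   # ['a', 'c']: lightly loaded, cheap link
```

Node b is avoided despite a cheap link to d because its compute load dominates the weighted sum, which is exactly the trade-off the total-load formula encodes.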