Optimization Coaching for Fork/Join Applications on the Java Virtual Machine

Optimization Coaching for Fork/Join Applications on the Java Virtual Machine
Eduardo Rosales
Advisor: Prof. Walter Binder
Research area: Parallel applications, performance analysis
PhD stage: Planner
EuroDW 2018, April 23, 2018, Porto, Portugal

Optimization Coaching for Fork/Join Applications on the Java Virtual Machine
§ The problem: despite the complexities associated with developing and tuning fork/join applications, there is little work focused on assisting developers in optimizing such applications on the JVM.
§ Relevance: fork/join parallelism is increasingly popular among developers targeting the JVM. It has been integrated to support parallel processing in the Java library, thread management in JVM languages, and a variety of parallel applications based on Actors, MapReduce, etc.
§ Our proposal: coaching developers towards optimizing fork/join applications by diagnosing performance issues in such applications and suggesting concrete code refactorings to solve them.
§ Expected outcome: in contrast to the manual experimentation often required to tune fork/join applications on the JVM, we devise a tool able to automatically assist developers in optimizing a fork/join application.

Fork/Join Application
§ What is a fork/join application?

    solve(Problem problem) {
      if (problem is small)
        directly solve problem sequentially
      else {
        recursively split problem into independent parts:
          fork new tasks to solve each part
          join all forked tasks
      }
    }

[Figure: tree of tasks in which each level forks subtasks that are later joined]
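The pseudocode above maps fairly directly onto `java.util.concurrent.RecursiveTask`. The following is a minimal sketch, not code from the presentation: the class name `ArraySumTask`, the helper `parallelSum`, and the `THRESHOLD` value are illustrative assumptions. It forks one half and computes the other half in the current thread, a common variation on "fork all parts, then join".

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class; names and the threshold are assumptions.
class ArraySumTask extends RecursiveTask<Long> {
    static final int THRESHOLD = 1_000; // "problem is small" cutoff (arbitrary)

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ArraySumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small problem: solve it sequentially.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        // Large problem: split into two independent halves.
        int mid = (from + to) >>> 1;
        ArraySumTask left = new ArraySumTask(data, from, mid);
        ArraySumTask right = new ArraySumTask(data, mid, to);
        left.fork();                     // schedule the left half asynchronously
        long rightSum = right.compute(); // compute the right half in this thread
        long leftSum = left.join();      // wait for the forked half
        return leftSum + rightSum;
    }

    // Runs the task on the common pool and returns the total.
    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ArraySumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data));
    }
}
```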

The Java Fork/Join Framework
§ The Java fork/join framework [1] is the implementation enabling fork/join applications on the JVM
§ It implements the work-stealing [2] scheduling strategy:
[Figure: work-stealing diagram. Labels: task submission; worker thread 1 with deque 1 and worker thread 2 with deque 2; push, pop and take operations; steal task; CPU cores]
[1] D. Lea. A Java Fork/Join Framework. JAVA 2000.
[2] Burton et al. Executing Functional Programs on a Virtual Tree of Processors. FPCA 1981.
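Work stealing can be observed, roughly, through the monitoring methods of `java.util.concurrent.ForkJoinPool`. The sketch below is an illustration under assumptions, not part of the presented work: `StealCountDemo`, `TreeTask` and `run` are hypothetical names. It builds a binary tree of tasks and reads `getStealCount()`, which the API documents as an estimate, so the exact value varies between runs.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class, not part of the presented work.
class StealCountDemo {

    // Binary task tree: each inner node spawns two children; each leaf bumps a counter.
    static final class TreeTask extends RecursiveAction {
        private final int depth;
        private final AtomicLong leaves;

        TreeTask(int depth, AtomicLong leaves) {
            this.depth = depth;
            this.leaves = leaves;
        }

        @Override
        protected void compute() {
            if (depth == 0) {
                leaves.incrementAndGet();
                return;
            }
            // invokeAll runs both subtasks and returns once both are done.
            invokeAll(new TreeTask(depth - 1, leaves), new TreeTask(depth - 1, leaves));
        }
    }

    // Returns {leaves executed, steal-count estimate} for a pool with the given parallelism.
    static long[] run(int parallelism, int depth) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            AtomicLong leaves = new AtomicLong();
            pool.invoke(new TreeTask(depth, leaves));
            return new long[] { leaves.get(), pool.getStealCount() };
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] result = run(2, 16);
        System.out.println("leaves=" + result[0] + ", steals (estimate)=" + result[1]);
    }
}
```

The leaf count is deterministic (2^depth); the steal count depends on scheduling and is only meaningful as a rough indicator.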

The Java Fork/Join Framework
§ Supports parallel processing in the Java library:
• java.util.Arrays
• java.util.stream (package)
• java.util.concurrent.CompletableFuture<T>
§ Supports thread management for other JVM languages:
• Scala
• Apache Groovy
• Clojure
§ Supports diverse fork/join parallelism, including applications based on Actors and MapReduce
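As a brief illustration of the three library entry points listed above, the sketch below calls each one once. The class and method names (`LibraryForkJoinUses`, `sorted`, `sumTo`, `asyncAnswer`) are assumptions made for this example. By default these calls run their parallel work on the common fork/join pool, although the library may fall back to sequential execution, for instance for small arrays or when the pool's parallelism is too low.

```java
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;
import java.util.stream.IntStream;

// Hypothetical example class; only the JDK calls themselves are real API.
class LibraryForkJoinUses {

    // java.util.Arrays: parallel sort of a copy of the input.
    static int[] sorted(int[] input) {
        int[] copy = input.clone();
        Arrays.parallelSort(copy);
        return copy;
    }

    // java.util.stream: parallel reduction over the range 1..n.
    static int sumTo(int n) {
        return IntStream.rangeClosed(1, n).parallel().sum();
    }

    // CompletableFuture: asynchronous computation without an explicit executor.
    static int asyncAnswer() {
        return CompletableFuture.supplyAsync(() -> 6 * 7).join();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sorted(new int[] { 3, 1, 2 })));
        System.out.println(sumTo(100));
        System.out.println(asyncAnswer());
    }
}
```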

The Java Fork/Join Framework
§ Many of the design forces encountered when implementing fork/join designs surround task granularity at four levels [3]:
• Maximizing locality
• Minimizing contention
• Maximizing parallelism
• Minimizing overheads
[3] D. Lea. Concurrent Programming in Java. Second Edition: Design Principles and Patterns. Addison-Wesley Professional, 2nd edition, 1999.

Example of a common performance issue 1/4
Suboptimal forking
§ Excessive forking: too fine-grained tasks
✗ Parallelization overheads due to excessive:
• Deque accesses
• Object creation/reclaiming
[Figure: worker deques on the CPU cores with many push, pop and take operations]
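One way to limit excessive forking, shown here only as a hedged sketch and not as the technique proposed in this work, is to stop splitting when the range is small or when the current worker already holds more queued tasks than others are likely to steal. `ForkJoinTask.getSurplusQueuedTaskCount()` is a real JDK method; the class name `ThrottledSumTask` and the constants `MIN_SIZE` and `MAX_SURPLUS` are assumptions chosen for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class; cutoff values are arbitrary assumptions.
class ThrottledSumTask extends RecursiveTask<Long> {
    static final int MIN_SIZE = 256;  // assumed sequential cutoff
    static final int MAX_SURPLUS = 3; // assumed bound on locally queued surplus tasks

    private final int from; // inclusive
    private final int to;   // exclusive

    ThrottledSumTask(int from, int to) {
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        int size = to - from;
        // Do not fork when the range is small or enough tasks are already queued locally.
        if (size <= MIN_SIZE || getSurplusQueuedTaskCount() > MAX_SURPLUS) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += i;
            }
            return sum;
        }
        int mid = from + size / 2;
        ThrottledSumTask left = new ThrottledSumTask(from, mid);
        left.fork();
        long right = new ThrottledSumTask(mid, to).compute();
        return left.join() + right;
    }

    // Sums the integers 0 .. n-1 on the common pool.
    static long sumBelow(int n) {
        return ForkJoinPool.commonPool().invoke(new ThrottledSumTask(0, n));
    }

    public static void main(String[] args) {
        System.out.println(sumBelow(1_000_000));
    }
}
```

The result does not depend on how often the surplus check triggers; only the number of tasks created, and therefore the deque traffic and allocation rate, changes.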

Example of a common performance issue 2/4
Suboptimal forking
§ Sparse forking: few coarse-grained tasks
✗ Missed parallelization opportunities:
• Low CPU utilization
• Load imbalance
[Figure: worker deques on the CPU cores with few push, pop, take and steal operations; some cores are idle]
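The trade-off between too many fine-grained tasks and too few coarse-grained ones can be made concrete by counting how many task objects a given sequential threshold produces. The sketch below is an illustration under assumptions: `GranularityDemo`, `CountingSum` and `tasksCreated` are hypothetical names, and counting task objects is only a proxy for the overheads and missed parallelism described on these slides.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; task counting is a simplification for illustration.
class GranularityDemo {

    // Sums the integers in [from, to) and counts every task object created.
    static final class CountingSum extends RecursiveTask<Long> {
        private final int from;
        private final int to;
        private final int threshold;
        private final AtomicInteger created;

        CountingSum(int from, int to, int threshold, AtomicInteger created) {
            this.from = from;
            this.to = to;
            this.threshold = threshold;
            this.created = created;
            created.incrementAndGet(); // one more task object allocated
        }

        @Override
        protected Long compute() {
            if (to - from <= threshold) {
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += i;
                }
                return sum;
            }
            int mid = from + (to - from) / 2;
            CountingSum left = new CountingSum(from, mid, threshold, created);
            CountingSum right = new CountingSum(mid, to, threshold, created);
            left.fork();
            return right.compute() + left.join();
        }
    }

    // Number of task objects created when summing 0 .. n-1 with the given threshold.
    static int tasksCreated(int n, int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        AtomicInteger created = new AtomicInteger();
        ForkJoinPool.commonPool().invoke(new CountingSum(0, n, threshold, created));
        return created.get();
    }

    public static void main(String[] args) {
        for (int threshold : new int[] { 1, 128, 1024 }) {
            System.out.println("threshold=" + threshold + " tasks=" + tasksCreated(1024, threshold));
        }
    }
}
```

For n = 1024, thresholds of 1, 128 and 1024 create 2047, 15 and 1 task objects respectively: the first corresponds to excessive forking, the last to a single coarse task that leaves every other core idle.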

The problem
Despite the complexities associated with developing and tuning fork/join applications, there is little work focused on assisting developers towards optimizing such applications on the JVM.
The scope: fork/join applications running in a single JVM on a single shared-memory multicore.
[Figure: multicore CPUs sharing a single memory]

Our Approach
In contrast to the manual experimentation often used to tune a fork/join application, we propose an approach based on:
• Profiling techniques
• Optimization coaching

Our Approach
Profiling techniques: static and dynamic analysis to automatically diagnose performance issues.

Our Approach
Profiling techniques:
§ Static analysis: to automatically inspect the source code to detect fork/join anti-patterns.
§ Dynamic analysis: to automatically diagnose performance issues noticeable at runtime (e.g., suboptimal forking, excessive garbage collection, low CPU usage, contention).

Our Approach
Optimization coaching [4]: processing the output generated by the compiler's optimizer to suggest concrete code modifications that may enable the compiler to achieve missed optimizations.
[4] St-Amour et al. Optimization Coaching: Optimizers Learn to Communicate with Programmers. OOPSLA 2012.

Our Approach
Inspired by optimization coaching, the goal is automatically suggesting concrete code modifications to solve the detected issues.

Future Work
Methodology for the automatic diagnosing of performance issues:
§ Define a model to characterize fork/join tasks
§ Characterize all tasks spawned by a fork/join application
§ Determine the metrics and entities worth considering to automatically diagnose performance issues
Methodology for the automatic suggestion of optimizations:
§ Automatic recognition of fork/join anti-patterns and matching to concrete suggestions to avoid them
Validation of the results:
§ Discover fork/join workloads suitable for validating both aforementioned methodologies

BACKUP SLIDES

Related Work
§ Analysis of parallel applications on the JVM
§ A number of parallelism profilers focus on the JVM [9][10]:
• YourKit Java Profiler
• JProfiler
• Intel vTune
• Java Mission Control
The goal: characterizing processes or threads over time.
Limitations: none of the existing tools targets fork/join applications.
[9] Adhianto et al. HPCTOOLKIT: Tools for Performance Analysis of Optimized Parallel Programs. Concurr. Comput.: Pract. Exper., 22(6): pp. 685–701, 2010.
[10] Teng et al. THOR: a Performance Analysis Tool for Java Applications Running on Multicore Systems. IBM Journal of Research and Development, 54(5):4:1–4:17, 2010.
