Core Grammatical Evolution Generated Parallel Recursive Programs - PowerPoint PPT Presentation

Performance Optimization of Multi-Core Grammatical Evolution Generated Parallel Recursive Programs
Gopinath Chennupati, R. Muhammad Atif Azad, and Conor Ryan

Programming Multi-Cores
• Multi-cores first appeared in 1995.
• PCs and even smartphones now have multi-core processors.
• Biologically inspired, massively parallel architectures:
  – IBM TrueNorth: 4096 cores.
  – SpiNNaker: in excess of a million processors.
• “If we simply added more than 16 cores, we would get diminishing returns, because the threads and data traffic would not be used properly, so the cores get in the way of each other. It’s like having too many cooks in the kitchen.”
  – Jerry Bautista, director of Intel’s tera-scale research program

Why is parallel programming hard?
• Thread scheduling, synchronization, locking, optimizing the degree of parallelism, etc.
• Efficient parallel programming requires (highly skilled!) human expertise.
• Our answer: automatic generation of native parallel code!

Human-Competitive Tasks
• We automated three tasks that are difficult for humans:
  – Optimal parallelism for recursion [1], [3].
  – Automatic architecture awareness [1].
  – Lock-free programming on multi-cores [2].

Criteria
• D: The result is publishable in its own right as a new scientific result independent of the fact that the result was mechanically created.
  – [1], [2], [3]
• E: The result is equal to or better than the most recent human-created solution to a long-standing problem for which there has been a succession of increasingly better human-created solutions.
• G: The result solves a problem of indisputable difficulty in its field.

Recursive Problems

#  Problem     Input Type        Output Type          Local Variables  Range
1  Sum-of-N    int               int                  3                [1, 1000]
2  Factorial   int               unsigned long long   3                [1, 60]
3  Fibonacci   int               unsigned long long   3                [1, 60]
4  Binary-Sum  int[], int, int   int                  2                [1, 1000]
5  Reverse     int[], int, int   void                 2                [1, 1000]
6  Quicksort   int[], int, int   void                 3                [1, 1000]

Why recursion? It is easy to express but takes longer to execute.

Excessive Parallelism
Human Program [7] (maximizing parallelism: 2^(n+1) threads):

    int fib(int n)
    {
        int i, j;
        if (n <= 2) {
            return n;
        } else {
            #pragma omp parallel sections shared(i, j)
            {
                #pragma omp section
                { i = fib(n - 1); }
                #pragma omp section
                { j = fib(n - 2); }
            }
            return (i + j);
        }
    }

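Optimizing Parallelism
MCGE-II Program (optimal parallelism: 2^(c+1) threads, where c marks the sequential cutoff, n <= 39):

    unsigned long long fib(int n)
    {
        /* declarations and zero initialisation assumed; not shown on the slide */
        unsigned long long temp = 0, res = 0, a;
        if (n <= 2) {
            temp = n;
            res += temp;
        } else if (n <= 39) {        /* cutoff c: recurse sequentially */
            temp = fib(n - 1) + fib(n - 2);
            res += temp;
        } else {                     /* above the cutoff: two parallel sections */
            #pragma omp parallel sections private(a) shared(n, temp, res)
            {
                #pragma omp section
                {
                    a = fib(n - 1);
                    #pragma omp atomic
                    res += temp + a;
                }
                #pragma omp section
                {
                    a = fib(n - 2);
                    #pragma omp atomic
                    res += temp + a;
                }
            }
        }
        return res;
    }

Satisfies D, G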

Human Competitive
[Chart: efficiency of the human-written program vs. MCGE-II, annotated 17.45%]
Satisfies E
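The slides do not define efficiency. Assuming the standard parallel-computing definition (speedup divided by the number of cores p, with T_1 the single-core and T_p the p-core execution time), it reads as follows; the timings in the worked line are hypothetical.

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}

% hypothetical timings: T_1 = 8 s, p = 4, T_p = 4 s
S_4 = \frac{8}{4} = 2, \qquad E_4 = \frac{8}{4 \cdot 4} = 0.5 \;(50\%)
```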

Automatic Architecture Awareness
The program is the same evolved MCGE-II program shown under Optimizing Parallelism, including its sequential cutoff (n <= 39).
Get it done in 8.35 hours rather than waiting forever for humans to figure it out!
Satisfies D, G

Lock-Free Parallel Programs
[Diagram: threads in a #pragma omp parallel region lock the shared resources]
• Locks guarantee mutual exclusion.
• But they degrade performance.
• Even programming gurus often write incorrect lock-free programs [6].
• Automatic lock-free parallel programming [2].
Satisfies D, G

Lock-Free Results
[Chart: efficiency of Human, MCGE-II (Lock-Free) and MCGE-II programs, annotated 9.41% and 25.21%]
Satisfies E

Potential Impact
• Software
  – Faster-executing parallel code
  – Faster generation of parallel code
• Hardware
  – Better able to utilise multi-core processors
  – Hardware progress (increase in number of cores) less hindered by software limitations

Why Are We the Best?
• MCGE-II fulfils the original intention of genetic programming (GP) as a general-purpose programming tool.
• There is an urgent and pressing need in the parallel computing community for precisely this tool.
• The work has been published in a field outside of GP.
• This is the first attempt at synthesising native parallel programs.

References
[1] Gopinath Chennupati, R. Muhammad Atif Azad, and Conor Ryan. (2015) Performance Optimization of Multi-Core Grammatical Evolution Generated Parallel Recursive Programs. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), edited by Anna I Esparcia Alcázar et al., ACM. In press.
[2] Gopinath Chennupati, R. Muhammad Atif Azad, and Conor Ryan. (2015) A Multi-Core Grammatical Evolution Based Automatic Lock-Free Programming in OpenMP. In Proceedings of the International Conference on Parallel Computing (ParCO), edited by Gerhard R. Joubert et al., IOS Press. In press.
[3] Gopinath Chennupati, R. Muhammad Atif Azad, and Conor Ryan. (2015) Automatic Evolution of Parallel Recursive Programs. In Proceedings of EuroGP '15, pages 167-178, Springer.
[4] Gopinath Chennupati, Jeannie Fitzgerald, and Conor Ryan. (2014) On the Efficiency of Multi-core Grammatical Evolution (MCGE) Evolving Multi-Core Parallel Programs. In Proceedings of the Sixth World Congress on Nature and Biologically Inspired Computing (NaBIC), pages 238-243, IEEE.
[5] Gopinath Chennupati, R. Muhammad Atif Azad, and Conor Ryan. (2014) Multi-core GE: Automatic Evolution of CPU Based Multi-core Parallel Programs. In Proceedings of GECCO Comp '14, pages 1041-1044, ACM.
[6] Shane V. Howley and Jeremy Jones. (2012) A Non-Blocking Internal Binary Search Tree. In Proceedings of the 24th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '12), pages 161-171, ACM.
[7] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. (2009) Introduction to Algorithms, 3rd Edition. MIT Press.
