

From Mozart and Freud to .Net Technologies
Jens Knoop, Vienna University of Technology, Austria

The Mystery Keynote!? Disclosing the secret...

Wolfgang Amadeus Mozart: 2006 celebrates the 250th anniversary of the birth of Mozart (1756)!


1. Semantic Code Motion ...allows more powerful optimizations! (Example by B. Steffen, TAPSOFT'87.)
[Side-by-side flowgraphs: the original program with the parallel assignments (x,y,z) := (a,b,a+b) and (a,b,c) := (x,y,y+z), followed by h := a+b; after semantic code motion, h := a+b resp. h := x+y is inserted and the assignments become (x,y,z) := (a,b,h) and (a,b,c) := (x,y,h).]

2. Remember ...CM (PREE) and its optimization goals!
• Speed
• Register Pressure
There might be a third one:
• Code Size

3. A Computationally and Code-Size Optimal Program
[Flowgraph: a := ..., the hoisted initialization h := a+b, and the replaced use z := h ❀ code size.]

4. 1999 World Market for Microprocessors. Going for size makes sense...

  Chip Category        Number Sold
  Embedded 4-bit       2000 million
  Embedded 8-bit       4700 million
  Embedded 16-bit       700 million
  Embedded 32-bit       400 million
  DSP                   600 million
  Desktop 32/64-bit     150 million  (∼ 2%)

David Tennenhouse (Intel Director of Research), keynote speech at the 20th IEEE Real-Time Systems Symposium (RTSS'99), Phoenix, AZ, 1999.

5. Think of... domain-specific processors as used in embedded systems:
• Telecom: cell phones, pagers, ...
• Consumer Electronics: MP3 players, cameras, pocket games, ...
• Automotive: GPS navigation, airbags, ...
• ...

6. For such applications... code size is often more critical than speed!

7. Part II: CM – Classically, but Advanced. Enhancing (L)CM to take a user's priorities into account:
• Computational Quality ...run-time performance
• Lifetime Quality ...register pressure
• Code-Size Quality

8. ...rendering possible this transformation, too:
[Flowgraph: a := ... with h := a+b inserted and z := h; moderate register pressure!]

9. Towards Code-Size Sensitive CM...
• Background: Classical CM ❀ Busy CM (BCM) / Lazy CM (LCM) (Knoop et al., PLDI'92)
  – Received the ACM SIGPLAN Most Influential PLDI Paper Award 2002 (for 1992)
  – Selected for "20 Years of the ACM SIGPLAN PLDI: A Selection" (60 papers out of ca. 600)
• Code-Size Sensitive CM (Knoop et al., POPL'00) ❀ modular extension of BCM/LCM
  – Modelling and solving the problem ...based on graph-theoretical means
  – Main results ...correctness, optimality

10. The Running Example
[Flowgraph: nodes 1–15, with a := ... at node 1 and three computations of a+b.]

11. The Running Example (Cont'd)
[Flowgraphs a) and b): two code-size optimal placements of h := a+b, with the uses of a+b replaced by h.]

12. The Running Example (Cont'd)
[The same two programs, annotated with the priority orders a) SQ > CQ > LQ and b) SQ > LQ > CQ.]

13. The Running Example (Cont'd)
Note, we do not want the following transformation; it's no option!
[Flowgraph: an impairing placement of h := a+b.]

14. Code-Size Sensitive PRE
❀ The Problem: how to get a code-size minimal placement of computations, i.e., a placement which is
  – admissible (semantics and performance preserving)
  – code-size minimal
❀ Solution: a fresh look at PRE, considering PRE a trade-off problem: trading the original computations against newly inserted ones!
❀ The key trick: use graph theory! The trade-off problem reduces to the computation of tight sets in bipartite graphs, based on maximum matchings.

15. Bipartite Graphs and Tight Sets
A tight set of a bipartite graph (S ∪ T, E) is a subset S_ts ⊆ S such that
  ∀ S′ ⊆ S: |S_ts| − |Γ(S_ts)| ≥ |S′| − |Γ(S′)|
where Γ(S_ts) ⊆ T denotes the neighbours of S_ts. Two variants: (1) largest tight sets, (2) smallest tight sets.
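To make the definition concrete, here is a small Python sketch that enumerates all subsets of S to find the maximum deficiency |S′| − |Γ(S′)| together with the largest and smallest tight sets. The brute-force enumeration is exponential and purely illustrative; the algorithm in the talk computes tight sets efficiently via maximum matchings. The toy graph below is an assumption, not the running example.

```python
from itertools import combinations

def neighbours(subset, edges):
    """Gamma(subset): all T-nodes adjacent to some node of the subset."""
    return {t for (s, t) in edges if s in subset}

def deficiency(subset, edges):
    return len(subset) - len(neighbours(subset, edges))

def tight_sets(S, edges):
    """Return (max deficiency, largest tight set, smallest tight set).
    Exponential enumeration -- for illustration only."""
    subsets = [set(c) for r in range(len(S) + 1)
               for c in combinations(sorted(S), r)]
    best = max(deficiency(sub, edges) for sub in subsets)
    tights = [sub for sub in subsets if deficiency(sub, edges) == best]
    return best, max(tights, key=len), min(tights, key=len)

# Toy bipartite graph: S = {1,2,3}, T = {'a','b'};
# nodes 1 and 2 both map only to 'a', node 3 maps to 'b'.
S = {1, 2, 3}
edges = {(1, 'a'), (2, 'a'), (3, 'b')}
best, largest, smallest = tight_sets(S, edges)
```

Here the maximum deficiency is 1: {1, 2} is the smallest tight set, and {1, 2, 3} the largest (adding node 3 adds one node and one neighbour, leaving the deficiency unchanged). Note that the empty set already has deficiency 0, so the maximum deficiency is always non-negative, the property the Optimality Theorem below exploits.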

16. [Bipartite graph and tight-set definition of item 15, repeated.]

17. Apparently, off-the-shelf algorithms from graph theory can be used to compute
• maximum matchings and
• tight sets.
Hence, our PRE problem boils down to constructing the bipartite graph modelling the problem!

18. Modelling the Trade-Off Problem: The Set of Nodes
[Figure: the node sets S_DS (Comp \ UpSafe), T_DS (Insert_BCM), and DownSafe \ (Comp ∪ UpSafe), marked over the running example.]
The set of edges...

19. The Set of Nodes
[Flowgraphs a) and b): the running example before and after BCM, marking the original computations of a+b and the BCM insertion points h := a+b.]

20. Modelling the Trade-Off Problem: The Set of Edges
[Figure: the bipartite graph connecting the T_DS-nodes (2, 3, 4, 5, 6, 7, 8) with the S_DS-nodes (11, 12, 13).]
∀ n ∈ S_DS, ∀ m ∈ T_DS: {n, m} ∈ E_DS ⇐⇒_df m ∈ Closure(pred(n))

21. DownSafety Closures
For n ∈ DownSafe \ UpSafe, the DownSafety Closure Closure(n) is the smallest set of nodes satisfying
1. n ∈ Closure(n)
2. ∀ m ∈ Closure(n) \ Comp: succ(m) ⊆ Closure(n)
3. ∀ m ∈ Closure(n): pred(m) ∩ Closure(n) ≠ ∅ ⇒ pred(m) \ UpSafe ⊆ Closure(n)
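The three conditions translate directly into a fixed-point iteration. The following Python sketch (the dictionary-based CFG encoding and the toy diamond graph are illustrative assumptions, not the talk's notation) starts from {n} and adds nodes until conditions 2 and 3 hold; since it only adds nodes that every valid closure must contain, it yields the smallest such set.

```python
def downsafety_closure(n, succ, pred, comp, upsafe):
    """Smallest set containing n that is closed under conditions 2 and 3."""
    closure = {n}
    changed = True
    while changed:
        changed = False
        for m in list(closure):
            # Condition 2: non-Comp nodes drag in all their successors.
            if m not in comp:
                for s in succ.get(m, ()):
                    if s not in closure:
                        closure.add(s)
                        changed = True
            # Condition 3: if some predecessor is inside the closure, all
            # non-UpSafe predecessors must be inside as well.
            preds = pred.get(m, ())
            if any(p in closure for p in preds):
                for p in preds:
                    if p not in upsafe and p not in closure:
                        closure.add(p)
                        changed = True
    return closure

# Toy diamond CFG: 1 -> {2,3}, {2,3} -> 4; the computation sits at node 2.
succ = {1: [2, 3], 2: [4], 3: [4]}
pred = {2: [1], 3: [1], 4: [2, 3]}
result = downsafety_closure(3, succ, pred, comp={2}, upsafe=set())
```

Starting from node 3, condition 2 pulls in node 4; condition 3 then pulls in node 4's other predecessor, node 2, so Closure(3) = {2, 3, 4}.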

22. DownSafety Closures – The Very Idea 1(4)
[Running example flowgraph with h := a+b and the uses of h.]

23. DownSafety Closures – The Very Idea 2(4)
[Running example flowgraph, next step.]

24. DownSafety Closures – The Very Idea 3(4)
[Running example flowgraph; no initialization!]

25. DownSafety Closures – The Very Idea 4(4)
[Running example flowgraph with both insertions of h := a+b.]

26. DownSafety Closures
[The definition of Closure(n) from item 21, repeated.]

27. DownSafety Regions
Some subsets of nodes are distinguished; we call each of these sets a DownSafety Region.
• A set R ⊆ N of nodes is a DownSafety Region if and only if
  1. Comp \ UpSafe ⊆ R ⊆ DownSafe \ UpSafe
  2. Closure(R) = R

28. Fundamental...
Insertion Theorem: Insertions of admissible PRE-transformations are always at "earliest frontiers" of DownSafety regions.
[Figure: a DownSafety region R inside DownSafe \ UpSafe, with EarliestFrontier_R and the predicates Transp, UpSafe, Comp.]
...characterizes for the first time all semantics-preserving CM-transformations.

29. The Key Questions ...concerning correctness and optimality:
1. Where to insert computations, and why is it correct?
2. What is the impact on the code size?
3. Why is it optimal, i.e., code-size minimal?
...three theorems, each answering one of these questions.

30. Main Results / First Question
1. Where to insert computations, and why is it correct? Intuitively, at the earliestness frontier of the DS-region induced by the tight set...
Theorem 1 [Tight Sets: Insertion Points]. Let TS ⊆ S_DS be a tight set. Then R_TS =_df Γ(TS) ∪ (Comp \ UpSafe) is a DownSafety Region with Body_{R_TS} = TS.
Correctness ...immediate corollary of Theorem 1 and the Insertion Theorem.

31. Main Results / Second Question
2. What is the impact on the code size? Intuitively, the difference between computations inserted and replaced...
Theorem 2 [DownSafety Regions: Space Gain]. Let R be a DownSafety Region with Body_R =_df R \ EarliestFrontier_R. Then the space gain of inserting at EarliestFrontier_R is
  |Comp \ UpSafe| − |EarliestFrontier_R| = |Body_R| − |Γ(Body_R)| =_df defic(Body_R)

32. Main Results / Third Question
3. Why is it optimal, i.e., code-size minimal? Due to an inherent property of tight sets (non-negative deficiency!)...
Optimality Theorem [The Transformation]. Let TS ⊆ S_DS be a tight set.
• Insertion Points: Insert_SpCM =_df EarliestFrontier_{R_TS} = R_TS \ TS
• Space Gain: defic(TS) =_df |TS| − |Γ(TS)| ≥ 0, and maximal.

33. Largest vs. Smallest Tight Sets: The Impact
• Largest tight sets favor Computational Quality ❀ Earliestness Principle (insert at EarliestFrontier_{R_LaTS})
• Smallest tight sets favor Lifetime Quality ❀ Latestness Principle (insert at EarliestFrontier_{R_SmTS})

34. Recall the Running Example
[Flowgraphs: a) Largest Tight Set (SQ > CQ, Earliestness Principle); b) Smallest Tight Set (SQ > LQ, Latestness Principle).]

35. Code-Size Sensitive CM at a Glance
Preprocess (3 GEN/KILL-DFAs): compute the predicates of BCM for G resp. LCM(G); optional: perform LCM (2 GEN/KILL-DFAs).
Main Process:
• Reduction Phase: construct the bipartite graph; compute a maximum matching.
• Optimization Phase: compute the largest/smallest tight set; determine the insertion points.
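The reduction phase's maximum matching can be computed with the classical augmenting-path (Kuhn) algorithm. A minimal Python sketch, on illustrative toy data rather than the running example's graph:

```python
def max_bipartite_matching(S, adj):
    """Kuhn's augmenting-path algorithm.
    adj maps each s in S to its T-neighbours; returns the matching size
    and the T-to-S matching map."""
    match_t = {}  # t -> matched s

    def try_augment(s, visited):
        for t in adj.get(s, ()):
            if t not in visited:
                visited.add(t)
                # t is free, or its current partner can be re-matched elsewhere.
                if t not in match_t or try_augment(match_t[t], visited):
                    match_t[t] = s
                    return True
        return False

    size = sum(1 for s in S if try_augment(s, set()))
    return size, match_t

# Toy graph: nodes 1 and 3 each have one neighbour; node 2 has both.
size, matching = max_bipartite_matching([1, 2, 3],
                                        {1: ['a'], 2: ['a', 'b'], 3: ['b']})
```

By the deficiency form of König's theorem, the maximum deficiency max |S′| − |Γ(S′)| equals |S| minus the matching size, which is how the matching of the reduction phase feeds the tight-set computation of the optimization phase; here 3 − 2 = 1.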

36. A brief overview of the history of CM...
• 1958: first glimpse of PRE ❀ Ershov's work On Programming of Arithmetic Operations
• 1979: origin of contemporary PRE ❀ Morel/Renvoise's seminal work on PRE
• 1992: LCM [Knoop et al., PLDI'92] ❀ first to achieve computational optimality with minimum register pressure; first to be rigorously proven correct and optimal
• 2000: origin of code-size sensitive PRE [Knoop et al., POPL 2000] ❀ first to allow prioritization of goals; rigorously proven correct and optimal; first to bridge the gap between traditional compilation and compilation for embedded systems

37. Overview (Cont'd)
• Since ca. 1997: a new strand of research on PRE ❀ Speculative PRE: Gupta, Horspool, Soffa, Xue, Scholz, Knoop, ...
• 2005: another fresh look at PRE (as a maximum flow problem) ❀ Unifying PRE and Speculative PRE [Jingling Xue and J. Knoop]

38. Part III: CM – Phenomena of its Derivatives
Optimality results are quite sensitive! Three examples provide evidence:
(A) Code motion vs. code placement
(B) Interdependencies of (elementary) transformations
(C) Paradigm dependencies

39. (A) Code Motion vs. Code Placement ...not just synonyms!
[Flowgraphs: the original program computes (x,y) := (a+b,c+b), c := a, and z := c+b resp. z := a+b. After semantic code motion, (h1,h2) := (a+b,c+b) and (x,y) := (h1,h2) are introduced, but motion gets stuck at c := a. After semantic code placement, (c,h2) := (a,h1) makes both uses replaceable: z := h1 resp. z := h2.]

40. Even worse... Optimality is lost!
[Flowgraphs: two placements, h := a+b resp. h := c+b around c := a; the resulting programs are incomparable!]

41. Worse still... Performance may be lost when naively applied!
[Flowgraphs: a naive placement of h := a+b before c := a impairs the program.]

42. (B) Interdependencies of Transformations
[Flowgraphs: a := b+c; x := a+b, with x overwritten by x := z on one branch, out(x,a). Assignment sinking (AS) moves x := a+b down; total dead-code elimination (TDCE) then removes the dead copy.]
...2nd order effects! ❀ Partial Dead-Code Elimination (PDCE)
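The 2nd-order effect, one elimination enabling the next, is why PDCE iterates its elementary transformations to a fixed point. A toy straight-line Python sketch of the dead-code half (the branching and the AS step are elided; the list-of-pairs program encoding, the `a+b`-style expression syntax, and the out-sets are illustrative assumptions):

```python
def expr_vars(expr):
    """Variables occurring in a toy expression like 'a+b'."""
    return set(expr.replace('+', ' ').split())

def dce_step(prog, out):
    """Remove one dead assignment, if any; prog is a list of (var, expr)."""
    for i, (v, _) in enumerate(prog):
        later = prog[i + 1:]
        used_later = any(v in expr_vars(e) for _, e in later)
        redefined = any(w == v for w, _ in later)
        if not used_later and (v not in out or redefined):
            return prog[:i] + prog[i + 1:]
    return prog

def dce_fixpoint(prog, out):
    """Iterate until no further assignment dies (2nd-order effects!)."""
    while True:
        nxt = dce_step(prog, out)
        if nxt == prog:
            return prog
        prog = nxt

prog = [('a', 'b+c'), ('x', 'a+b'), ('x', 'z')]
result_slide = dce_fixpoint(prog, out={'x', 'a'})   # out(x,a), as above
result_cascade = dce_fixpoint(prog, out={'x'})      # hypothetical out(x)
```

With out(x, a), only x := a+b dies (it is overwritten before any use), and a := b+c survives because a is live on exit, matching the rightmost program above. With the hypothetical out(x), removing x := a+b kills the last use of a, so a := b+c dies in the next iteration: the 2nd-order effect.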

43. Interdependencies of Transformations (Cont'd)
[Flowgraphs: a := b+c; b := a+c, out(a,b). Assignment hoisting (AH) and total redundant-assignment elimination (TRAE) interact analogously.]
...2nd order effects! ❀ Partially Redundant Assignment Elimination (PRAE)

44. Conceptually ...we can think of PREE, PRAE, and PDCE in terms of
• PREE = AH ; TREE
• PRAE = (AH + TRAE)*
• PDCE = (AS + TDCE)*

45. PRAE/PDCE – Optimality Results
Derivation relation ⊢:
• PRAE: G ⊢_{AH,TRAE} G′ (ET = {AH, TRAE})
• PDCE: G ⊢_{AS,TDCE} G′ (ET = {AS, TDCE})
We can prove:
Optimality Theorem. For both PRAE and PDCE, ⊢_ET is confluent and terminating.

46. [Diagram: in the universe of programs, all ⊢_ET derivations from G terminate in the same optimum G_ETopt.]

47. Consider now...
• Assignment Placement AP = (AH + TRAE + AS + TDCE)* ...should be even more powerful!
Indeed, but...
[Flowgraphs: starting from a program computing x := a+b with out(x), PDCE and PRAE lead to different results.]

48. Confluence... and hence (global) optimality are lost!
[Diagram: ⊢_ET derivations from G may end in distinct local optima G_locOpt.]

49. Even worse... there are scenarios where we can end up with universes like
[Diagram: a ⊢_ET derivation whose outcome is unclear (marked "?").]

50. (C) Paradigm Dependencies
Original Program: ParBegin x := a+b ∥ z := d+b ∥ y := c+b ParEnd
After Earliestness Transformation: (h1,h2,h3) := (a+b,c+b,d+b); ParBegin x := h1 ∥ z := h3 ∥ y := h2 ParEnd

51. Part IV: CM – Recent Strands of Research
...another strand of research on CM is gaining more and more attention:
• Speculative CM (SCM)

52. SCM – What's it all about?
In contrast to CM,
• SCM takes profile information into account, thereby allowing the performance of hot program paths to be improved at the expense of impairing cold program paths.
Everything else, especially the optimization goals, stays
• the same!

53. [Flowgraph annotated with edge-execution frequencies (e.g. 20, 80, 10/40, 50/20), with a redefined at some nodes and a+b computed at several others.]

54. [The same profile-annotated flowgraph, repeated.]

55. SCM vs. CM
Apparently,
• SCM and CM are two closely related and very similar problems with much in common!
However,
• SCM and CM are tackled by quite different algorithmic means:
  – CM ...based on solving (typically) 4 bitvector analyses: availability, anticipability, ...
  – SCM ...based on solving a maximum flow problem

56. Recent Achievement ...the missing link between
• Classical PRE (CPRE) and Speculative PRE (SPRE)
On the theoretical side, this yields
• a common high-level conceptual basis and understanding of CPRE and SPRE.
On the practical side, we obtain
• a new and simple algorithm for CPRE, which turns out to outperform its competitors.
(Joint work with Jingling Xue, CC 2006.)

57. Major Finding
Like SCM,
• CM is a maximum flow problem, too!
This means:
• Each (S)CM-algorithm, if optimal, must find, in one way or another, the unique minimum cut on a flow network derived from a program's CFG.
Hence, we have
• the missing link between CM and SCM!
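The flavour of the finding can be illustrated with any max-flow routine. A compact Edmonds-Karp sketch that also extracts the source side of the canonical minimum cut (the toy network below is an illustrative assumption, not the flow network the paper derives from a CFG):

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly augment along shortest residual paths."""
    residual = {u: dict(vs) for u, vs in cap.items()}
    for u in list(residual):
        for v in list(residual[u]):
            residual.setdefault(v, {}).setdefault(u, 0)  # reverse edges
    flow = 0
    while True:
        parent, q = {s: None}, deque([s])
        while q and t not in parent:  # BFS for an augmenting path
            u = q.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow, residual
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

def min_cut(cap, s, t):
    """Source side of the canonical min cut: residual-reachable nodes."""
    flow, residual = max_flow(cap, s, t)
    seen, q = {s}, deque([s])
    while q:
        u = q.popleft()
        for v, c in residual[u].items():
            if c > 0 and v not in seen:
                seen.add(v)
                q.append(v)
    return flow, seen

cap = {'s': {'a': 3, 'b': 2}, 'a': {'t': 2}, 'b': {'t': 3}}
flow, cut_side = min_cut(cap, 's', 't')
```

Here the maximum flow is 4, and the cut separating {s, a} from {b, t} has capacity 2 + 2 = 4, as max-flow min-cut duality demands; in the paper's setting such a cut determines the optimal insertion points.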

58. On the Impact of this Finding 1(4)
Practically:
• Possibly none ...at least not in terms of demanding the replacement of optimal state-of-the-art CM implementations by the flow-network based one.
Theoretically:
• Possibly a lot ...a common high-level basis for understanding and reasoning about both SCM and CM.

59. On the Impact of this Finding 2(4)
This is in line with work on CM by other researchers striving for a simple and "motion-free" characterization of CM:
• Bronnikov, D.: A Practical Adaption of Partial Redundancy Elimination. SIGPLAN Notices 39(8), 49–53, 2004.
• Dhamdhere, D. M.: E-path_PRE: Partial Redundancy Elimination Made Easy. SIGPLAN Notices 37(8), 53–65, 2002.
• Paleri, V. K., Srikant, Y. N., Shankar, P.: A Simple Algorithm for Partial Redundancy Elimination. SIGPLAN Notices 33(12), 35–43, 1998.

60. On the Impact of this Finding 3(4)
However, for these approaches,
• either no proofs of correctness and optimality are given,
• or the proofs still rely on low-level, path-based reasoning.
Especially in this respect, the characterization of
• CM as a maximum flow problem
can be considered a major step forward.

61. On the Impact of this Finding 4(4)
A practical impact though... Based on the new understanding, we obtained
• a new and simple CM-algorithm:
  – Like its competitors: relies on 4 bitvector analyses
  – At first sight, thus: yet another CM-algorithm
  – But: it outperforms its competitors

62. Practical Measurements ...of the new algorithm show
• a reduction in the number of bitvector operations ranging from 20% to 60% in comparison to three state-of-the-art CM algorithms (including LCM and E-path).
Experiments were performed
• on Intel Xeon and Sun UltraSPARC-III platforms,
• with the GCC compiler as vehicle,
• using all 22 C/C++/Fortran SPECcpu2000 benchmarks.

63. Conclusions and Perspectives
• Code Motion (CM) ...a hot topic of ongoing research for almost 50 years!
• State of the art in theory and practice:
  – Theory available and widely used in practice: classic CM
  – Theory available, but not yet widely used: derivatives of classic CM (PDCE, PFCE, SR, DAP, ...); speculative CM and some derivatives (SR); semantic CM
  – Theory not yet available: speculative semantic CM; ...

64. Conclusions and Perspectives (Cont'd)
• Our obligation:
  – pushing forward the further development of CM-based optimizations,
  – demanding their application (e.g., in the Phoenix framework)
...in order to help the impatient (.Net) programmer and user!
