From Mozart and Freud to .Net Technologies
Jens Knoop Vienna University of Technology, Austria
1
The Mystery Keynote!?
Disclosing the secret...
2
Wolfgang Amadeus Mozart (born 1756)
2006: Celebrating the 250th Anniversary of the Birth of Mozart!
3
4
5
Really? After some investigation, unfortunately, not.
6
Is there a topic in computer science researchers
7
...as far as possible. Coming up with:
Communications of the ACM did not appear until 1958!
8
...at this very first issue of the CACM, I got quite excited:
CACM 1 (8), 3 - 6, 1958.
9
numbers” for denoting the results of computations and avoid having to recompute them.
value numbering schemes.
The origin of research on Code Motion (CM) can be traced back to Ershov’s CACM article of 1958.
10
Research on CM-based Program Optimizations...
11
The Year of Anniversaries...
Research on Code Motion
12
Indeed, it is an active area of research...
thought-provoking phenomena
Last but not least...
Technology for the Impatient .Net Programmer and User
13
– Achievements
– Phenomena
– Open Problems and Challenges
14
15
CM in the early days essentially meant...
16
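Even if CM can be traced back to...
Ershov, A. P.: On Programming of Arithmetic Operations. CACM 1 (8), 3 - 6, 1958.
...it is fair to say that contemporary CM starts with the seminal work of
Morel, E. and Renvoise, C.: Global Optimization by Suppression of Partial Redundancies. CACM 22 (2), 96 - 103, 1979.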
17
[Figure: before/after example. Before: x := a+b and y := a+b. After: h := a+b is inserted and the computations become x := h and y := h.]
18
[Figure: flow graph with nodes 1-18 containing a := c and the computations x := a+b, y := a+b and z := a+b]
19
[Figure: the flow graph (nodes 1-18) after code motion; h := a+b is inserted and the computations are replaced by x := h, y := h and z := h]
20
[Figure: another transformation of the flow graph (nodes 1-18); h := a+b is inserted, y := h and z := h, while occurrences of x := a+b remain]
21
CM and its two traditional optimization goals...
a:= ... z := a+b
22
...CM can be considered a two-stage process
...hoisting expressions to “earlier” safe computation points
...eliminating computations which became totally redundant
23
Placing computations as early as possible...
...hoisting expressions to their earliest safe computation points yields computationally optimal programs
❀ ...known as Busy Code Motion (PLDI’92, Knoop et al.)
...already known to Morel and Renvoise (though no theorem or proof).
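The "earliest safe computation points" idea can be sketched for a single expression as follows. This is a minimal illustration, not the algorithm from the talk: the function name `busy_code_motion`, the node names, and the two fixpoint equations (a commonly cited node-based formulation of down-safety and earliestness) are assumptions, and the sketch further assumes that every edge into a join node has already been split by a synthetic node.

```python
def busy_code_motion(succ, comp, non_transp, start):
    """Return the nodes where h := <expr> would be inserted (down-safe and earliest).

    succ:       dict node -> list of successor nodes (hypothetical CFG encoding)
    comp:       set of nodes that compute the expression
    non_transp: set of nodes that modify an operand of the expression
    """
    nodes = list(succ)
    pred = {n: [] for n in nodes}
    for n in nodes:
        for m in succ[n]:
            pred[m].append(n)

    # Down-safety (anticipability): greatest fixpoint, start from True everywhere.
    dsafe = dict.fromkeys(nodes, True)
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new = n in comp or (
                n not in non_transp
                and bool(succ[n])  # the exit node has no successors
                and all(dsafe[m] for m in succ[n])
            )
            if new != dsafe[n]:
                dsafe[n], changed = new, True

    # Earliestness: least fixpoint, start from False everywhere except the start node.
    earliest = {n: n == start for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == start:
                continue
            new = any(
                m in non_transp or (not dsafe[m] and earliest[m]) for m in pred[n]
            )
            if new != earliest[n]:
                earliest[n], changed = new, True

    return {n for n in nodes if dsafe[n] and earliest[n]}


# Hypothetical CFG: one branch executes "a := ...", both branches join before "z := a+b".
# j1 and j2 are the synthetic nodes on the edges into the join node.
succ = {
    "s": ["kill", "j2"],
    "kill": ["j1"],
    "j1": ["join"],
    "j2": ["join"],
    "join": ["use"],
    "use": [],
}
insert_points = busy_code_motion(succ, comp={"use"}, non_transp={"kill"}, start="s")
print(sorted(insert_points))  # ['j1', 'j2']
```

In this example the computation is hoisted onto both branches but not above `a := ...`, which matches the "as early as possible, but not earlier" remark on the following slides.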
24
Placing computations as early as possible... ...yields computationally optimal programs.
:= a+b h := a+b h z := h a:= ...
25
...as early as possible, but not earlier!
Incorrect!
h z := h:= a+b a:= x+y
26
...computationally optimal, but maximum register pressure
Maximum
h z := h:= a+b h:= a+b a:= ...
27
Placing computations as late as possible...
...hoisting expressions to their latest safe computation points yields computationally optimal programs with minimum register pressure
❀ ...known as Lazy Code Motion (PLDI’92, Knoop et al.)
28
...computationally optimal, too, with minimum register pressure!
Minimum
h z := h:= a+b h:= a+b h:= a+b a:= ...
29
Lazy Code Motion is...
contemporary state-of-the-art compilers
– GNU compiler family
– Sun Sparc compiler family
– ...
30
Traditionally,
But...
31
...can be assignments, too.
x := a+b x := a+b x := a+b x := a+b x := a+b x := a+b
elimination (PRAE)
32
...might also be sunk.
x := a+b
x := y+z
x := a+b x := y+z x := a+b
33
More generally...
Code / Motion    Hoisting    Sinking
Expressions      EH          ·/·
Assignments      AH          AS
34
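[Figure: classification of code motion by setting, kind of code/motion, and paradigm]
– Setting: Intraprocedural, Interprocedural, Parallelism, Predicated code, ...
– Code / Motion: EH, AH, AS
– Paradigm: Syntactic, Semantic
Introducing semantics...!
x := a+b    c := a    y := a+b    z := c+b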
35
36
allows more powerful optimizations!
[Figure: before: (x,y,z) := (a,b,a+b) and (a,b,c) := (x,y,y+z); after: h := a+b, (x,y,z) := (a,b,h), h := x+y, (a,b,c) := (x,y,h)]
(example by B. Steffen, TAPSOFT’87)
37
CM (PREE) and its optimization goals!
There might be a third one:
38
:= a+b h z := h a:= ...
❀ Code size
39
Going for size makes sense...
Chip Category        Number Sold
Embedded 4-bit       2000 million
Embedded 8-bit       4700 million
Embedded 16-bit       700 million
Embedded 32-bit       400 million
DSP                   600 million
Desktop 32/64-bit     150 million (∼ 2%)
... David Tennenhouse (Intel Director of Research). Keynote Speech at the 20th IEEE Real-Time Systems Symposium (RTSS’99), Phoenix AZ, 1999.
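The "∼ 2%" figure can be checked directly from the numbers on the slide. The snippet below is only this arithmetic, assuming that the percentage refers to the desktop share of all chips sold; the variable names are illustrative.

```python
# Units sold per chip category, in millions, as listed on the slide.
units_sold = {
    "Embedded 4-bit": 2000,
    "Embedded 8-bit": 4700,
    "Embedded 16-bit": 700,
    "Embedded 32-bit": 400,
    "DSP": 600,
    "Desktop 32/64-bit": 150,
}

total = sum(units_sold.values())
desktop_share = units_sold["Desktop 32/64-bit"] / total

print(f"total: {total} million, desktop share: {desktop_share:.1%}")
# total: 8550 million, desktop share: 1.8%
```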
40
... domain-specific processors as used in embedded systems
– Cell phones, pagers, ...
– MP3 player, cameras, pocket games, ...
– GPS navigation, airbags, ...
41
...code size often more critical than speed!
42
...enhancing (L)CM to take a user’s priorities into account!
– Computational Quality ...Run-Time Performance
– Lifetime Quality ...Register Pressure
– Code-Size Quality
43
Register Pressure!
h z := h:= a+b a:= ...
44
❀ Busy CM (BCM) / Lazy CM (LCM) (Knoop et al., PLDI’92)
– Received the ACM SIGPLAN Most Influential PLDI Paper Award 2002 (for 1992)
– Selected for “20 Years of the ACM SIGPLAN PLDI: A Selection” (60 papers out of ca. 600 papers)
❀ ...modular extension of BCM/LCM
∗ Modelling and Solving the Problem ...based on graph-theoretical means
∗ Main Results ...correctness, optimality
45
[Figure: flow graph with nodes 1-15 containing three computations of a+b and the assignment a := ...]
46
Two Code-size Optimal Programs
[Figure: programs a) and b) on the flow graph (nodes 1-15) with a := ...; h := a+b is inserted and the three original computations are replaced by h]
47
[Figure: programs a) and b) on the flow graph (nodes 1-15), compared with respect to SQ, CQ and LQ; h := a+b is inserted and the three original computations are replaced by h]
48
Note, we do not want the following transformation: It’s no
Impairing!
[Figure: flow graph (nodes 1-15) with a := ..., a single insertion h := a+b and three uses of h]
49
❀ The Problem
...how to get a code-size minimal placement of computations, i.e., a placement which is
– admissible (semantics & performance preserving)
– code-size minimal
❀ Solution: A Fresh Look at PRE
...considering PRE a trade-off problem: trading the original computations against newly inserted ones!
❀ The Clou: Use Graph Theory!
...reducing the trade-off problem to the computation of tight sets in bipartite graphs based on maximum matchings!
50
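Bipartite Graph $(S \cup T, E)$
Tight Set ...of a bipartite graph $(S \cup T, E)$ is a subset $S_{ts} \subseteq S$ such that
$\forall S' \subseteq S.\ |S_{ts}| - |\Gamma(S_{ts})| \geq |S'| - |\Gamma(S')|$
[Figure: bipartite graph with node sets S and T, showing a tight set $S_{ts}$ and its neighbourhood $\Gamma(S_{ts})$]
2 Variants: (1) Largest Tight Sets (2) Smallest Tight Sets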
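The tight-set definition can be made concrete with a brute-force sketch that enumerates every subset of S and keeps those with maximal deficiency |X| − |Γ(X)|. This is exponential and only meant to illustrate the definition on a toy graph; the function names and the example graph are assumptions, not taken from the talk.

```python
from itertools import combinations


def neighbours(subset, edges):
    """Gamma(subset): all T-nodes adjacent to some S-node in subset."""
    return {t for (s, t) in edges if s in subset}


def deficiency(subset, edges):
    """|X| - |Gamma(X)| for a subset X of S."""
    return len(subset) - len(neighbours(subset, edges))


def tight_sets(S, edges):
    """Return (maximal deficiency, list of all subsets of S attaining it)."""
    subsets = [
        frozenset(c) for k in range(len(S) + 1) for c in combinations(sorted(S), k)
    ]
    best = max(deficiency(x, edges) for x in subsets)
    return best, [x for x in subsets if deficiency(x, edges) == best]


# Hypothetical bipartite graph: S = {1, 2, 3, 4}, T = {"a", "b", "c"}.
S = {1, 2, 3, 4}
edges = {(1, "a"), (2, "a"), (3, "b"), (4, "b"), (4, "c")}

best, tight = tight_sets(S, edges)
smallest = min(tight, key=len)
largest = max(tight, key=len)
print(best, sorted(smallest), sorted(largest))  # 1 [1, 2] [1, 2, 3, 4]
```

Here three subsets are tight ({1, 2}, {1, 2, 3} and {1, 2, 3, 4}), which shows why the slide distinguishes a smallest and a largest tight set.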
51
52
Off-the-shelf algorithms of graph theory can be used to compute...
Hence, our PRE problem boils down to... ...constructing the bipartite graph modelling the problem!
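One such off-the-shelf building block is a maximum matching in a bipartite graph. The sketch below uses the classic augmenting-path method; the function name and the toy graph are assumptions, and the step from a maximum matching to the largest or smallest tight set is not shown here.

```python
def maximum_matching(S, adj):
    """Maximum bipartite matching by repeated augmenting-path search.

    S:   iterable of left-hand nodes
    adj: dict mapping each left-hand node to its right-hand neighbours
    Returns a dict left node -> matched right node.
    """
    match_of_t = {}  # right node -> left node currently matched to it

    def try_augment(s, visited):
        for t in adj.get(s, ()):
            if t in visited:
                continue
            visited.add(t)
            # t is free, or its current partner can be re-matched elsewhere.
            if t not in match_of_t or try_augment(match_of_t[t], visited):
                match_of_t[t] = s
                return True
        return False

    for s in S:
        try_augment(s, set())
    return {s: t for t, s in match_of_t.items()}


# Hypothetical bipartite graph with S = {1, 2, 3, 4} and T = {"a", "b", "c"}.
adj = {1: ["a"], 2: ["a"], 3: ["b"], 4: ["b", "c"]}
matching = maximum_matching([1, 2, 3, 4], adj)
print(len(matching))  # 3
```

By the deficiency form of Hall's theorem, |S| minus the size of a maximum matching equals the maximal deficiency over all subsets of S (here 4 − 3 = 1), which is the quantity that tight sets maximize.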
53
The Set of Nodes
[Figure: the node sets $T_{DS}$ and $S_{DS}$, built from the predicates Comp, UpSafe, DownSafe and Insert_BCM; nodes 2-8 and 11-13 of the running example are involved]
The Set of Edges...
54
[Figure: a) and b): the flow graph (nodes 1-15) with the original computations of a+b and a := ..., and with insertions h := a+b and uses of h]
55
The Bipartite Graph
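[Figure: the bipartite graph with node sets $S_{DS}$ and $T_{DS}$ for the running example]
The Set of Edges ... $\forall n \in S_{DS}\ \forall m \in T_{DS}.\ \{n, m\} \in E_{DS} \iff_{df} m \in \mathit{Closure}(\mathit{pred}(n))$
56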
DownSafety Closure For n ∈ DownSafe/Upsafe the DownSafety Closure Closure(n) is the smallest set of nodes satisfying
pred(m) \ UpSafe ⊆ Closure(n)
57
[Figure: flow graph (nodes 1-15) with a := ..., one insertion h := a+b and three uses of h]
58
[Figure: flow graph (nodes 1-15) with a := ..., one insertion h := a+b and three uses of h]
59
[Figure: flow graph (nodes 1-15) with a := ..., one insertion h := a+b and three uses of h. No Initialization!]
60
[Figure: flow graph (nodes 1-15) with a := ..., two insertions h := a+b and three uses of h]
61
62
Some subsets of nodes are distinguished. We call each of these sets a DownSafety Region...
63
Insertion Theorem Insertions of admissible PRE-Transformations are always at “earliest-frontiers” of DownSafety regions.
[Figure: a DownSafety region R and its EarliestFrontier_R, drawn relative to the Comp, UpSafe, Transp and DownSafe/UpSafe nodes]
...characterizes for the first time all semantics preserving CM-transformations.
64
...concerning correctness and optimality:
...three theorems answering one of these questions each.
65
Intuitively, at the earliestness frontier of the DS-region induced by the tight set...
Theorem 1 [Tight Sets: Insertion Points] Let $TS \subseteq S_{DS}$ be a tight set. Then
$R_{TS} =_{df} \Gamma(TS) \cup (Comp \setminus UpSafe)$
is a DownSafety Region with $Body_{R_{TS}} = TS$.
Correctness ...immediate corollary of Theorem 1 and Insertion Theorem
66
Intuitively, the difference between computations inserted and replaced...
Theorem 2 [DownSafety Regions: Space Gain] Let R be a DownSafety Region with $Body_R =_{df} R \setminus EarliestFrontier_R$. Then
$|Comp \setminus UpSafe| - |EarliestFrontier_R| = |Body_R| - |\Gamma(Body_R)| =_{df} defic(Body_R)$
67
Due to an inherent property of tight sets (non-negative deficiency!)...
Optimality Theorem [The Transformation] Let $TS \subseteq S_{DS}$ be a tight set.
$Insert_{SpCM} =_{df} EarliestFrontier_{R_{TS}} = R_{TS} \setminus TS$
$defic(TS) =_{df} |TS| - |\Gamma(TS)| \geq 0$ (max.)
68
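Largest tight sets favor Computational Quality (Earliestness Principle)
Smallest tight sets favor Lifetime Quality (Latestness Principle)
[Figure: the regions $R_{LaTS}$ and $R_{SmTS}$ with their earliest frontiers $EarliestFrontier_{R_{LaTS}}$ and $EarliestFrontier_{R_{SmTS}}$, drawn relative to Comp]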
69
a) Largest Tight Set: Earliestness Principle ( SQ > CQ )
b) Smallest Tight Set: Latestness Principle ( SQ > LQ )
[Figure: the two resulting programs on the flow graph (nodes 1-15) with a := ..., each with insertions h := a+b and uses of h]
70
Preprocess
– Optional: Perform LCM (3 GEN/KILL-DFAs)
– Compute Predicates of BCM for G resp. LCM(G) (2 GEN/KILL-DFAs)
Main Process
– Reduction Phase: Construct Bipartite Graph, Compute Maximum Matching
– Optimization Phase: Compute Largest/Smallest Tight Set, Determine Insertion Points
71
❀ Ershov’s work On Programming of Arithmetic Operations.
❀ Morel/Renvoise’s seminal work on PRE
❀ ...first to achieve comp. optimality with minimum register pressure
❀ ...first to be rigorously proven correct and optimal
2000]
❀ ...first to allow prioritization of goals
❀ ...rigorously proven correct and optimal
❀ ...first to bridge the gap between traditional compilation and compilation for embedded systems
72
❀ Speculative PRE: Gupta, Horspool, Soffa, Xue, Scholz, Knoop,...
❀ Unifying PRE and Speculative PRE [Jingling Xue and J. Knoop]
73
Optimality results are quite sensitive! Three examples to provide evidence...
(A) Code motion vs. code placement
(B) Interdependencies of (elementary) transformations
(C) Paradigm dependencies
74
...not just synonyms!
[Figure: Original Program (statements (x,y) := (a+b,c+b), c := a, z := a+b, z := c+b); After Sem. Code Motion (placing a+b, placing c+b): Motion gets stuck!; After Sem. Code Placement (using temporaries h1 and h2)]
75
Optimality is lost!
y := c+b c := a z := a+b z := c+b y := c := a z := a+b z := h := a+b h := c+b h h
76
Performance may be lost when naively applied!
z := c+b z := c+b c := a z := a+b c := a h := a+b z := a+b
77
[Figure: example with x := a+b, a := b+c and x := z, before and after transformation]
AS TDCE ...2nd Order Effects!
❀ ...Partial Dead-Code Elimination (PDCE)
78
[Figure: example with a := b+c and b := a+c, before and after transformation]
AH TRAE ...2nd Order Effects!
❀ ...Partially Redundant Assignment Elimination (PRAE)
79
...we can think of PREE, PRAE and PDCE in terms of
80
Derivation relation ⊢...
G ⊢AH,TRAE G′ ( ET={AH,TRAE} )
G ⊢AS,TDCE G′ ( ET={AS,TDCE} )
We can prove...
Optimality Theorem For both PRAE and PDCE, ⊢ET is confluent and terminating
81
ET ET ET ET ET
G G
Universe
82
AP = (AH + TRAE + AS + TDCE)∗
...should be even more powerful! Indeed, but...
[Figure: example with several occurrences of x := a+b, transformed by PRAE and by PDCE]
83
...and hence (global) optimality are lost!
ET ET ET ET ET
G
locOpt
G Universe
84
...there are scenarios, where we can end up with universes like
G Universe
ET ET ET ET ET
?
85
Original Program
[Figure: x := a+b, y := c+b and z := d+b inside ParBegin ... ParEnd]
After Earliestness Transformation
[Figure: (h1,h2,h3) := (a+b,c+b,d+b) together with x := h1, y := h2 and z := h3, ParBegin ... ParEnd]
86
...another strand of research on CM is gaining more and more attention
87
In contrast to CM
...thereby allowing to improve the performance of hot program paths at the expense of impairing cold program paths. Anything else, especially the optimization goals,
88
[Figure: flow graph with nodes 1-12 containing computations of a+b and assignments to a; nodes and edges are annotated with execution frequencies (e.g. 50 / 80, 10 / 40, 40 / 10)]
89
90
Apparently
problems having much in common! However
means
– CM ...based on solving (typically) 4 bitvector analyses: Availability, Anticipability, ...
– SCM ...based on solving a maximum flow problem
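To make the maximum-flow side tangible, here is a generic max-flow / min-cut sketch using shortest augmenting paths (Edmonds-Karp). It is not the SCM algorithm itself: how the flow network is derived from a CFG is not given in these slides, so the toy network, its capacities (standing in for execution frequencies) and the function name `min_cut` are assumptions.

```python
from collections import deque


def min_cut(capacity, source, sink):
    """Return (max flow value, sorted list of edges crossing a minimum cut)."""
    # Residual graph: every edge gets a reverse edge with capacity 0.
    residual = {}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {}).setdefault(v, 0)
        residual.setdefault(v, {}).setdefault(u, 0)
        residual[u][v] += c

    def bfs_parents():
        # Breadth-first search along edges with remaining capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = bfs_parents()
        if sink not in parent:
            break
        # Recover the augmenting path and push its bottleneck capacity.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
        flow += push

    # Nodes still reachable from the source form the source side of a minimum cut.
    reachable = set(parent)
    cut = sorted((u, v) for (u, v) in capacity if u in reachable and v not in reachable)
    return flow, cut


# Hypothetical network; capacities play the role of execution frequencies.
capacity = {("s", "a"): 10, ("s", "b"): 40, ("a", "t"): 50, ("b", "t"): 20, ("a", "b"): 5}
flow, cut = min_cut(capacity, "s", "t")
print(flow, cut)  # 30 [('b', 't'), ('s', 'a')]
```

In a min-cut formulation of speculative code motion, the cut edges would be the cheapest places (in terms of frequency) to separate the source from the sink; the example only demonstrates that the cut capacity equals the flow value.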
91
...the missing link between
On the theoretical side, this yields...
On the practical side, we obtain...
to outperform its competitors (joint work with Jingling Xue, CC 2006)
92
Like SCM
This means
derived from a program’s CFG. Hence, we have
93
Practically
...at least not in terms of demanding replacement of implementations of optimal state-of-the-art CM algorithms by the flow-network based one.
Theoretically
...a common high-level basis for understanding and reasoning about both SCM and CM.
94
This is in line with work on CM by other researchers striving for a simple and “motion-free” characterization of CM:
Elimination, SIGPLAN Not., 39(8), 49-53, 2004.
Elimination Made Easy, SIGPLAN Not., 37(8), 53-65, 2002.
for Partial Redundancy Elimination, SIGPLAN Not., 33(12), 35-43, 1998.
95
However, for these approaches
reasoning
Especially in this respect, the characterization of
can be considered a major step forward.
96
A practical impact though... Based on the new understanding, we obtained
– Like its competitors: ...relies on 4 bitvector analyses
– At first sight thus: ...yet another CM-algorithm
– But: ...outperforms its competitors
97
...of the new algorithm show
required ranging from 20% to 60% in comparison to three state-of-the-art algorithms for CM (including LCM and E-path)
Experiments were performed
benchmarks
98
...a hot topic of on-going research for almost 50 years!
– Theory available and widely used in practice
  ∗ Classic CM
– Theory available, but not yet widely used
  ∗ Derivatives of Classic CM (PDCE, PFCE, SR, DAP,...)
  ∗ Speculative CM and some derivatives (SR)
  ∗ Semantic CM
– Theory not yet available
  ∗ Speculative Semantic CM
  ∗ ...
99
– Pushing forward the further development of CM-based optimizations
– Demanding their application (e.g. in the Phoenix framework)
...in order to help the impatient (.Net) programmer and user!
100
Predicting the future...
about the future.
101
The future will be bright! In particular, I predict that...
– 300th Anniversary of the Birth of Mozart
– 200th Anniversary of the Birth of Freud
– 100th Anniversary of the beginning of CM-Research
102
...we will all meet again at The 54th Annual .Net Technologies 2056 Conference in Central Europe June 1 - 5, 2006, Plzen, Czech Republic
103
Browsing the programme of .Net Technologies 2056, I foresee...
Keynote Speech: From .Net to .Net/XP: The Role and the Impact of 100 Years of CM-Research
104
Questions?
Acknowledgements: Most of the results reported are joint work with Oliver Rüthing (U. Dortmund), Bernhard Steffen (U. Dortmund), Eduard Mehofer (U. Wien), and more recently Bernhard Scholz (U. Sydney), Nigel Horspool (U. Victoria), and Jingling Xue (Univ. of New South Wales).
105