Static Analysis and Code Optimizations in Glasgow Haskell Compiler
Ilya Sergey
12.12.12
ilya.sergey@gmail.com
1
The Goal
Discuss what happens when we run
ghc -O MyProgram.hs
2
The Plan
Recall how laziness is implemented in GHC and ... might cause;
an optimization technique implemented in GHC;
language;
mathematical basics of some analyses in GHC;
3
and How the Harm Can Be Reduced
4
module Main where

import System.Environment
import Text.Printf

main = do
  [n] <- map read `fmap` getArgs
  printf "%f\n" (mysum n)

mysum :: Double -> Double
mysum n = myfoldl (+) 0 [1..n]

myfoldl :: (a -> b -> a) -> a -> [b] -> a
myfoldl f z0 xs0 = lgo z0 xs0
  where
    lgo z []     = z
    lgo z (x:xs) = lgo (f z x) xs
5
Compile and run
> ghc --make -RTS -rtsopts Sum.hs
> time ./Sum 1e6 +RTS -K100M
500000500000.0
real 0m0.583s
user 0m0.509s
sys  0m0.068s
Compile optimized and run
> ghc --make -fforce-recomp -RTS -rtsopts -O Sum.hs
> time ./Sum 1e6
500000500000.0
real 0m0.153s
user 0m0.101s
sys  0m0.011s
6
Profiling results for the non-optimized program
> ghc --make -RTS -rtsopts -fforce-recomp Sum.hs
> ./Sum 1e6 +RTS -sstderr -K100M
225,137,464 bytes allocated in the heap
195,297,088 bytes copied during GC
107 MB total memory in use
INIT  time 0.00s ( 0.00s elapsed)
MUT   time 0.21s ( 0.24s elapsed)
GC    time 0.36s ( 0.43s elapsed)
EXIT  time 0.00s ( 0.00s elapsed)
Total time 0.58s ( 0.67s elapsed)
%GC time 63.2% (64.0% elapsed)
7
Profiling results for the optimized program
> ghc --make -RTS -rtsopts -fforce-recomp -O Sum.hs
> ./Sum 1e6 +RTS -sstderr -K100M
92,082,480 bytes allocated in the heap
30,160 bytes copied during GC
1 MB total memory in use
INIT  time 0.00s ( 0.00s elapsed)
MUT   time 0.07s ( 0.08s elapsed)
GC    time 0.00s ( 0.00s elapsed)
EXIT  time 0.00s ( 0.00s elapsed)
Total time 0.07s ( 0.08s elapsed)
%GC time 1.1% (1.4% elapsed)
8
Profiling results for the non-optimized program
> ghc --make -RTS -rtsopts -prof -fforce-recomp Sum.hs
> ./Sum 1e6 +RTS -p -K100M
COST CENTRE  MODULE  %time  %alloc
mysum        Main     52.7    74.1
myfoldl.lgo  Main     43.6    25.8
myfoldl      Main      3.7     0.0
9
Profiling results for the optimized program
> ghc --make -RTS -rtsopts -prof -fforce-recomp -O Sum.hs
> ./Sum 1e6 +RTS -p -K100M
COST CENTRE  MODULE  %time  %alloc
mysum        Main     92.1    99.9
myfoldl.lgo  Main      7.9     0.0
10
Profiling results for the non-optimized program
> ghc --make -RTS -rtsopts -prof -fforce-recomp Sum.hs
> ./Sum 1e6 +RTS -hy -p -K100M
> hp2ps -e8in -c Sum.hp
[Heap profile by type: Sum 1e6 +RTS -p -hy -K100M, 3,127,720 bytes x seconds, Wed Dec 12 15:01 2012. Axes: seconds (0.0 to 0.2) vs. bytes (0M to 18M). Legend: BLACKHOLE, *, Double]
11
Profiling results for the optimized program
> ghc --make -RTS -rtsopts -prof -fforce-recomp -O Sum.hs
> ./Sum 1e6 +RTS -hy -p -K100M
> hp2ps -e8in -c Sum.hp
[Heap profile by type: Sum 1e6 +RTS -p -hy -K100M, 2,377 bytes x seconds, Wed Dec 12 15:02 2012. Axes: seconds (0.0 to 0.1) vs. bytes (0k to 30k). Legend: (,), ForeignPtrContents, IO, WEAK, Buffer]
12
Too Many Allocations of Double Objects
The cause: too many thunks are allocated for lazily computed values.
In our example, the computation of Double values is delayed by the calls to lgo.

mysum :: Double -> Double
mysum n = myfoldl (+) 0 [1..n]

myfoldl :: (a -> b -> a) -> a -> [b] -> a
myfoldl f z0 xs0 = lgo z0 xs0
  where
    lgo z []     = z
    lgo z (x:xs) = lgo (f z x) xs
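One manual way to avoid the thunk chain, independent of the optimizer, is to force the accumulator at every step. The sketch below is an illustration only, not what GHC generates; the names myfoldlStrict and mysumStrict are assumptions introduced here.

```haskell
-- Hand-written strict variant of myfoldl (hypothetical name: myfoldlStrict).
-- `seq` forces the new accumulator to WHNF before the recursive call,
-- so no chain of suspended (+) applications builds up on the heap.
myfoldlStrict :: (a -> b -> a) -> a -> [b] -> a
myfoldlStrict f z0 xs0 = lgo z0 xs0
  where
    lgo z []     = z
    lgo z (x:xs) = let z' = f z x in z' `seq` lgo z' xs

mysumStrict :: Double -> Double
mysumStrict n = myfoldlStrict (+) 0 [1..n]

main :: IO ()
main = print (mysumStrict 1000)  -- prints 500500.0
```

This is the same idea as Data.List.foldl'; the rest of the talk is about how GHC can get a similar effect automatically.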
13
Call-by-Value
Arguments of a function call are fully evaluated before the invocation.
Call-by-Need
Arguments of a function call are not evaluated before the invocation. Instead, a pointer (thunk) to the code is created, and, once evaluated, the result is memoized.
Thunk (Urban Dictionary): To sneak up on someone and bean him with a heavy blow to the back of the head. “Jim got thunked going home last night. Serves him right for walking in a dark alley with all his paycheck in his pocket.”
14
case p of (a, b) -> f a b
p will be evaluated to the weak-head normal form, sufficient to examine whether it is a pair. However, its components will remain unevaluated (i.e., thunks). Remark: Only evaluation of boxed values can be delayed via thunks.
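A small sketch of the WHNF remark above: matching on the pair constructor succeeds even when both components are undefined. The helper name isPair is an assumption made for this example.

```haskell
-- Examines only the outermost constructor of its argument.
isPair :: (a, b) -> Bool
isPair p = case p of (_, _) -> True

main :: IO ()
main = do
  -- Both components are undefined thunks; nothing below forces them.
  let p = (undefined, undefined) :: (Int, Int)
  print (isPair p)  -- True
  -- `seq` also evaluates only to WHNF, so this does not diverge either.
  p `seq` putStrLn "p is in WHNF; its components are still thunks"
```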
15
mysum :: Double -> Double
mysum n = myfoldl (+) 0 [1..n]

myfoldl :: (a -> b -> a) -> a -> [b] -> a
myfoldl f z0 xs0 = lgo z0 xs0
  where
    lgo z []     = z
    lgo z (x:xs) = lgo (f z x) xs

mysum 3
⟹ myfoldl (+) 0 (1:2:3:[])
⟹ lgo z1 (1:2:3:[])
⟹ lgo z2 (2:3:[])
⟹ lgo z3 (3:[])
⟹ lgo z4 []
⟹ !z4

z1 -> 0
z2 -> 1 + !z1
z3 -> 2 + !z2
z4 -> 3 + !z3

Now GC can do the job...
16
Obvious Solution: Replace CBN by CBV, so no thunks are needed.
Obvious Problem: The semantics of a “lazy” program can change unpredictably.
f x e = if x > 0 then x + 1 else e
f 5 (error "Urk")
17
Let’s reformulate: Replace CBN by CBV only for strict functions, i.e., those that always evaluate their argument to WHNF.
f x e = if x > 0 then x + 1 else e
f 5 (error "Urk")
18
Definition: A function f of one argument is strict iff
f undefined = undefined
Strictness is formulated similarly for functions of multiple arguments.
f x e = if x > 0 then x + 1 else e
f 5 (error "Urk")
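The definition can be probed by hand. The sketch below treats "raises an exception when forced" as a stand-in for undefined; the helper diverges is an assumption made for this example and is not part of GHC's analysis.

```haskell
import Control.Exception (SomeException, evaluate, try)

f :: Int -> Int -> Int
f x e = if x > 0 then x + 1 else e

-- True if forcing the value to WHNF raises an exception
-- (our observable approximation of "is undefined").
diverges :: a -> IO Bool
diverges v = either isErr (const False) <$> try (evaluate v)
  where
    isErr :: SomeException -> Bool
    isErr _ = True

main :: IO ()
main = do
  diverges (f undefined 0) >>= print  -- True: f is strict in x
  diverges (f 5 undefined) >>= print  -- False: e is not needed when x > 0
```

The second result shows that f is not strict in e, so evaluating e eagerly would change the meaning of f 5 (error "Urk").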
19
Worker/Wrapper Transformation
call site.
Splitting a function into two parts

f :: (Int, Int) -> Int
f p = e

⇓

f :: (Int, Int) -> Int
f p = case p of (a, b) -> $wf a b

$wf :: Int -> Int -> Int
$wf a b = let p = (a, b) in e
20
f :: (Int, Int) -> Int
f p = case p of (a, b) -> $wf a b

$wf :: Int -> Int -> Int
$wf a b = let p = (a, b) in e
21
f :: (Int, Int) -> Int
f p = (case p of (a, b) -> a) + 1

⇓

f :: (Int, Int) -> Int
f p = case p of (a, b) -> $wf a

$wf :: Int -> Int
$wf a = let p = (a, error "Urk") in (case p of (a, b) -> a) + 1

A strict function always examines its parameter. So, we just rely on a smart rewriter of case-expressions.
22
f :: (Int, Int) -> Int
f p = (case p of (a, b) -> a) + 1

⇓

f :: (Int, Int) -> Int
f p = case p of (a, b) -> $wf a

$wf :: Int -> Int
$wf a = a + 1

A strict function always examines its parameter. So, we just rely on a smart rewriter of case-expressions.
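The split can be imitated in source Haskell. In this sketch wf stands in for GHC's $wf, which is not a legal source-level identifier; the INLINE pragma is an assumption about how one would encourage the wrapper to disappear at call sites.

```haskell
-- Wrapper: unpacks the pair and passes on only the component that is used.
f :: (Int, Int) -> Int
f p = case p of (a, _) -> wf a
{-# INLINE f #-}

-- Worker: never sees the pair or its absent second component.
wf :: Int -> Int
wf a = a + 1

main :: IO ()
main = print (f (41, error "Urk"))  -- prints 42; the second component is never forced
```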
23
mysum :: Double -> Double
mysum n = myfoldl (+) 0 [1..n]

myfoldl :: (a -> b -> a) -> a -> [b] -> a
myfoldl f z0 xs0 = lgo z0 xs0
  where
    lgo z []     = z
    lgo z (x:xs) = lgo (f z x) xs
Step 1: Inline myfoldl
24
mysum :: Double -> Double
mysum n = lgo 0 [1..n]
  where
    lgo :: Double -> [Double] -> Double
    lgo z []     = z
    lgo z (x:xs) = lgo (z + x) xs

Step 2: Analyze Strictness and Absence
Result: lgo is strict in both of its arguments
25
mysum :: Double -> Double
mysum n = lgo 0 [1..n]
  where
    lgo :: Double -> [Double] -> Double
    lgo z []     = z
    lgo z (x:xs) = lgo (z + x) xs
Step 3: Worker/Wrapper Split
26
mysum :: Double -> Double
mysum n = lgo 0 [1..n]
  where
    lgo :: Double -> [Double] -> Double
    lgo z xs = case z of D# d -> $wlgo d xs

    $wlgo :: Double# -> [Double] -> Double
    $wlgo d []     = D# d
    $wlgo d (x:xs) = lgo ((D# d) + x) xs

$wlgo takes an unboxed double as its first argument.
Step 3: Worker/Wrapper Split
27
mysum :: Double -> Double
mysum n = lgo 0 [1..n]
  where
    lgo :: Double -> [Double] -> Double
    lgo z xs = case z of D# d -> $wlgo d xs

    $wlgo :: Double# -> [Double] -> Double
    $wlgo d []     = D# d
    $wlgo d (x:xs) = lgo ((D# d) + x) xs
Step 4: Inline lgo in the Worker
28
mysum :: Double -> Double
mysum n = lgo 0 [1..n]
  where
    lgo :: Double -> [Double] -> Double
    lgo z xs = case z of D# d -> $wlgo d xs

    $wlgo :: Double# -> [Double] -> Double
    $wlgo d []     = D# d
    $wlgo d (x:xs) = case ((D# d) + x) of D# d' -> $wlgo d' xs
Step 4: Inline lgo in the Worker
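For comparison, here is a hand-written sketch of where these steps lead, using GHC's MagicHash extension so that the accumulator is an unboxed Double#. The name wlgo replaces $wlgo (not a legal source identifier), and unboxing each list element in the pattern is a simplification assumed here, not the exact Core that GHC emits.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Double (D#), Double#, (+##))

-- Worker: the accumulator is a raw machine double, so no thunk
-- can ever be allocated for it.
wlgo :: Double# -> [Double] -> Double
wlgo d []          = D# d
wlgo d (D# x : xs) = wlgo (d +## x) xs

mysum :: Double -> Double
mysum n = wlgo 0.0## [1..n]

main :: IO ()
main = print (mysum 1000)  -- prints 500500.0
```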
29
30
A number of Intermediate Languages
Most of the interesting optimizations happen here
31
32
coercions;
33
data Expr b
  = Var Id
  | Lit Literal
  | App (Expr b) (Expr b)
  | Lam b (Expr b)
  | Let (Bind b) (Expr b)
  | Case (Expr b) b Type [Alt b]
  | Cast (Expr b) Coercion
  | Tick (Tickish Id) (Expr b)
  | Type Type
  | Coercion Coercion

data Bind b
  = NonRec b (Expr b)
  | Rec [(b, Expr b)]

type Alt b = (AltCon, [b], Expr b)

data AltCon
  = DataAlt DataCon
  | LitAlt Literal
  | DEFAULT
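To get a feel for working with such a syntax tree, here is a toy sketch: a much smaller, untyped expression type (no types, casts, ticks or case) and a traversal that counts let-bindings. In Core, let is commonly read as the place where heap allocation happens. All names below are assumptions made for the example and are unrelated to GHC's own modules.

```haskell
-- A toy, untyped subset of the Expr type above.
data Expr
  = Var String
  | Lit Int
  | App Expr Expr
  | Lam String Expr
  | Let String Expr Expr   -- non-recursive let only
  deriving Show

-- Count let-bindings, i.e. potential thunk allocation sites.
countLets :: Expr -> Int
countLets (Var _)     = 0
countLets (Lit _)     = 0
countLets (App f a)   = countLets f + countLets a
countLets (Lam _ b)   = countLets b
countLets (Let _ r b) = 1 + countLets r + countLets b

-- \x -> let y = f x in let z = 1 in y z
example :: Expr
example =
  Lam "x" (Let "y" (App (Var "f") (Var "x"))
            (Let "z" (Lit 1) (App (Var "y") (Var "z"))))

main :: IO ()
main = print (countLets example)  -- prints 2
```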
34
35
> ghc -ddump-ds Sum.hs
Desugared Core
> ghc -ddump-stranal Sum.hs
Core with Strictness Annotations
> ghc -ddump-worker-wrapper Sum.hs
Core after Worker/Wrapper Split
More at http://www.haskell.org/ghc/docs/2.10/users_guide/user_41.html
36
37
result;
inference.
abstract values of its inputs.
38
A function with multiple parameters
What if there are nested, recursive definitions?
39
If the result of a function applied to some arguments is going to be evaluated to WHNF, what can we say about its parameters?
Backwards analysis provides this contextual information.
40
about imperative programs as state transformers;
program to a mathematical function;
interpretation and analysis;
41
Definition: Domain - a set of meanings for different programs
What is the meaning of undefined?
42
Once evaluated, it terminates the program
Adding bottom to a set of values is called lifting
Example: Z⊥
[Diagram: the flat domain Z⊥, with ..., −2, −1, 0, 1, 2, ... each directly above ⊥]
43
[Diagram: the flat domain Z⊥, with ..., −2, −1, 0, 1, 2, ... each directly above ⊥]
Should be interpreted as ..., ⊥ ⊑ −2, ⊥ ⊑ −1, ⊥ ⊑ 0, ⊥ ⊑ 1, ...
Denotational semantics of a literal is itself
44
Partial order ⊑
∀x. x ⊑ x
if x ⊑ y and y ⊑ z then x ⊑ z
if x ⊑ y and y ⊑ x then x = y
Least upper bound z = x ⊔ y
x ⊑ z
y ⊑ z
x ⊑ z′ and y ⊑ z′ ⟹ z ⊑ z′
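The order and the least upper bound can be made concrete for a flat lifted domain such as Z⊥. This is a minimal sketch; the type Lifted and the functions leq and lub are names assumed for the example.

```haskell
-- A flat lifted domain: Bottom sits below every proper value.
data Lifted a = Bottom | Value a
  deriving (Eq, Show)

-- x ⊑ y in the flat order.
leq :: Eq a => Lifted a -> Lifted a -> Bool
leq Bottom    _         = True
leq (Value x) (Value y) = x == y
leq _         _         = False

-- Least upper bound, when it exists; two distinct proper values have none.
lub :: Eq a => Lifted a -> Lifted a -> Maybe (Lifted a)
lub Bottom y = Just y
lub x Bottom = Just x
lub (Value x) (Value y)
  | x == y    = Just (Value x)
  | otherwise = Nothing

main :: IO ()
main = do
  print (leq Bottom (Value (2 :: Int)))     -- True
  print (leq (Value 1) (Value (2 :: Int)))  -- False
  print (lub Bottom (Value (2 :: Int)))     -- Just (Value 2)
  print (lub (Value 1) (Value (2 :: Int)))  -- Nothing
```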
45
Algebraic Data Types
data Maybe a = Nothing | Just a
Nothing Just ⊥
Just (Just ⊥) Just 2
46
Monotone functions
f is monotone iff x ⊑ y ⟹ f x ⊑ f y
Denotational semantics of first-order Core functions - monotone functions on the lifted domain of values. Complete domain for denotational semantics of Core is defined recursively.
47
Monotone functions as domain elements
f x = 1 if x = 0
      ⊥ otherwise
g x = 1 if x = 0
      2 if x = 1
      ⊥ otherwise
Functions are compared point-wise:
f ⊑ g
Recursive definitions are computed as successive chains of increasingly more defined functions.
48
Definition: A monotone function p is a projection if for every object d
p d ⊑ d         (Shrinking)
p (p d) = p d   (Idempotent)
In point-free style
p ⊑ ID
p ∘ p = p
49
Examples
ID = λx.x
BOT = λx.⊥
F1 = λ(x, y).(⊥, y)
F2 = λg.λp.g(F1 p)
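The two projection laws can be checked exhaustively on a small finite model. The sketch below models pairs of lifted Booleans; the pair's own bottom is not modelled, so BOT is approximated by the pair of bottoms, and monotonicity is not checked. All names are assumptions made for the example.

```haskell
data Lifted a = Bottom | Value a
  deriving (Eq, Show)

-- Pairs of lifted Booleans (the bottom of the pair itself is left out).
type D = (Lifted Bool, Lifted Bool)

leqL :: Eq a => Lifted a -> Lifted a -> Bool
leqL Bottom _ = True
leqL x      y = x == y

leq :: D -> D -> Bool
leq (a, b) (c, d) = leqL a c && leqL b d

domain :: [D]
domain = [ (a, b) | a <- vals, b <- vals ]
  where vals = [Bottom, Value False, Value True]

idP, botP, f1 :: D -> D
idP d     = d
botP _    = (Bottom, Bottom)   -- stands in for BOT
f1 (_, y) = (Bottom, y)        -- F1 from the slide

-- Shrinking and idempotent on every element of the model.
isProjection :: (D -> D) -> Bool
isProjection p = all (\d -> p d `leq` d && p (p d) == p d) domain

main :: IO ()
main = do
  print (map isProjection [idP, botP, f1])           -- [True,True,True]
  print (isProjection (\(x, _) -> (x, Value True)))  -- False: not shrinking
```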
50
Theorem: If P is a set of projections then ⊔P exists and is a projection.
Lemma: Let p1 and p2 be projections. Then p1 ⊑ p2 ⟹ p1 ∘ p2 = p1.
51
Let p, q be projections, then
(p, q) f = (p d1, q d2) if f is a pair and f = (d1, d2)
           ⊥ otherwise
(q → p) f = p ∘ f ∘ q if f is a function
            ⊥ otherwise
These are projections, too.
52
What does it mean “f is not using its argument”?
f z = f ⊥
(ID → ID) f = (BOT → ID) f
In a projection (q → p), q describes what happens to the argument and p what happens to the result.
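The equation can be tried directly in Haskell, reading (q → p) f as p ∘ f ∘ q. This is a sketch with assumed names (arrow, bot); Haskell's own undefined plays the role of ⊥.

```haskell
-- (q → p) f = p ∘ f ∘ q, for the case where f is a function.
arrow :: (a -> a) -> (b -> b) -> (a -> b) -> (a -> b)
arrow q p f = p . f . q

-- BOT = λx.⊥
bot :: a -> a
bot _ = undefined

-- Ignores its argument.
f :: Int -> Int
f _ = 42

-- Uses its argument.
g :: Int -> Int
g z = z + 1

main :: IO ()
main = do
  -- Throwing the argument away makes no difference for f.
  print (arrow id id f 7 == arrow bot id f 7)  -- True
  -- arrow bot id g 7 would diverge: g needs the argument that BOT discards.
  print (arrow id id g 7)                      -- 8
```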
53
(ID → ID) f = (BOT → ID) f
(ID → ID) f = (ID → ID) ((BOT → ID) f)
With p = (ID → ID) and q = (BOT → ID):
p f = p (q f)
q is a safe projection in the context of p
54
p f = p (q f)
A safe projection q defines how much information we can remove from the object so that it won’t change from p’s perspective.
The goal of a backwards absence/strictness analysis is to find a safe projection for a given value and a context
is going to be used;
55
p f = p (q f)
f :: (Int, Int, Int) -> [a] -> (Int, Bool)
f (a, b, c) = case a of
  0 -> error "urk"
  _ -> \y -> case b of
    0 -> (c, null y)
    _ -> (c, False)
p                      q
ID → ID                ID → ID
ID → ID → (BOT, ID)    (ID, ID, BOT) → ID → ID
ID → ID → (ID, BOT)    ID → BOT → ID
56
Usage context is modeled by the identity projection.
The problem: Unfortunately, it is too weak for the strictness property.
A solution:
57
∀f. f ↯ = ↯
58
S ↯ = ↯
S ⊥ = ↯
S x = x, otherwise
Checking if the function f uses its argument strictly
S ∘ f = S ∘ f ∘ S
Indeed,
(S ∘ f) ⊥ = (S ∘ f ∘ S) ⊥
⟹ S (f ⊥) = S (f (S ⊥))
⟹ S (f ⊥) = S (f ↯)
⟹ S (f ⊥) = S ↯
⟹ S (f ⊥) = ↯
⟹ f ⊥ = ⊥
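The check can be replayed on a tiny explicit model. This sketch assumes, following Wadler and Hughes, that the domain is extended with an extra "abort" element ↯ below ⊥ which every function maps to itself. The type V and the functions str, strictF, lazyF and isStrict are names made up for the example.

```haskell
-- Abort models ↯, Bot models ⊥, Val n models a proper value.
data V = Abort | Bot | Val Int
  deriving (Eq, Show)

-- The projection S: sends both ↯ and ⊥ to ↯ and leaves the rest alone.
str :: V -> V
str Abort = Abort
str Bot   = Abort
str v     = v

-- Model of a strict function (\x -> x + 1): ⊥ in, ⊥ out.
strictF :: V -> V
strictF Abort   = Abort
strictF Bot     = Bot
strictF (Val n) = Val (n + 1)

-- Model of a lazy function (\_ -> 1): ignores ⊥, but still maps ↯ to ↯.
lazyF :: V -> V
lazyF Abort = Abort
lazyF _     = Val 1

-- S ∘ f = S ∘ f ∘ S, checked on a few sample points.
isStrict :: (V -> V) -> Bool
isStrict f = all (\v -> str (f v) == str (f (str v))) [Abort, Bot, Val 0, Val 1]

main :: IO ()
main = print (isStrict strictF, isStrict lazyF)  -- prints (True,False)
```

For lazyF the two sides differ at ⊥: S (f ⊥) is the value 1, while S (f (S ⊥)) is ↯.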
59
“projection transformer”: it transforms a result context to a safe projection (not always the best one);
as it requires examining all contexts;
[Diagram: ordering (⊑) among the projections p1, p2, p3, p∗, q1, q2, q∗ and ID]
60
fact :: Int -> Int
fact n = if n == 0 then 1 else n * (fact $ n - 1)

Let’s take a look at the strictness signatures (demo)
Conclusion: Polymorphism and type classes introduce implicit calls to non-strict functions and constructors, which make it harder to infer strictness.
61
Defines if a function can profitably return multiple results in registers.
62
dm :: Int -> Int -> (Int, Int)
dm x y = (x `div` y, x `mod` y)

We would like to express that dm can return its result pair unboxed.
Unboxed tuples are built-in types in GHC.
The calling convention for a function that returns an unboxed tuple arranges to return the components in registers.
63
dm :: Int -> Int -> (Int, Int)
dm x y = (x `div` y, x `mod` y)

dm :: Int -> Int -> (Int, Int)
dm x y = case $wdm x y of (# r1, r2 #) -> (r1, r2)

$wdm :: Int -> Int -> (# Int, Int #)
$wdm x y = (# x `div` y, x `mod` y #)
matcher;
64
If the result of the wrapper is scrutinized immediately...
case dm x y of (p, q) -> e

Inline the wrapper
case (case $wdm x y of (# r1, r2 #) -> (r1, r2)) of (p, q) -> e

The tuple is returned unboxed
case $wdm x y of (# p, q #) -> e

The result pair construction has been moved from the body of dm to its call site.
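The same split can be written by hand with the UnboxedTuples extension. In this sketch wdm stands in for $wdm (not a legal source identifier), and sumDivMod is a hypothetical call site that consumes the unboxed result directly.

```haskell
{-# LANGUAGE UnboxedTuples #-}

-- Worker: returns both results without allocating a pair.
wdm :: Int -> Int -> (# Int, Int #)
wdm x y = (# x `div` y, x `mod` y #)

-- Wrapper: rebuilds the boxed pair for callers that really need one.
dm :: Int -> Int -> (Int, Int)
dm x y = case wdm x y of (# r1, r2 #) -> (r1, r2)
{-# INLINE dm #-}

-- A call site after inlining the wrapper: no pair is ever constructed.
sumDivMod :: Int -> Int -> Int
sumDivMod x y = case wdm x y of (# p, q #) -> p + q

main :: IO ()
main = do
  print (dm 17 5)         -- (3,2)
  print (sumDivMod 17 5)  -- 5
```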
65
An arbitrary function returning a product
f :: Int -> (Int, Int)
f x = e

The wrapper
f :: Int -> (Int, Int)
f x = case $wf x of (# r1, r2 #) -> (r1, r2)

The worker
$wf :: Int -> (# Int, Int #)
$wf x = case e of (r1, r2) -> (# r1, r2 #)
66
f :: Int -> (Int, Int)
f x = case $wf x of (# r1, r2 #) -> (r1, r2)

$wf :: Int -> (# Int, Int #)
$wf x = case e of (r1, r2) -> (# r1, r2 #)

The insight
Things get worse unless the case expression in $wf is certain to cancel with the construction of the pair in e.
67
We should only perform the CPR W/W transformation if the result of the function is allocated by the function itself.
Definition: A function has the CPR (constructed product result) property if it allocates its result product itself.
The goal of the CPR analysis is to infer this property.
68
definition only, but not its uses;
system, which tracks explicit product constructions;
non-explicitly constructed products.
69
f :: Int -> Int -> (Int, Int)
f x y = if x <= y
        then (x, y)             -- is CPR
        else f (x - 1) (y + 1)  -- depends on CPR(f)
Has CPR property

g :: Int -> Int -> (Int, Int)
g x y = if x <= y
        then (x, y)
        else genRange x         -- external function
Does not have CPR property

CPR property in Core metadata: demo
70
tak :: Int -> Int -> Int -> Int
tak x y z = if not (y < x) then z
            else tak (tak (x-1) y z)
                     (tak (y-1) z x)
                     (tak (z-1) x y)
main = do
so both parts of the worker collapse
71
banner          -1.4%  -18.7%   0.00
boyer2          -1.3%  -31.8%   0.00
clausify        -1.3%  -35.0%   0.03
comp_lab_zift   -1.3%   +0.2%  +0.0%
compress2       -1.4%  -32.7%  +1.4%
cse             -1.4%  -15.8%   0.00
mandel2         -1.4%  -28.0%   0.00
puzzle          -1.3%  +16.5%   0.16
rfib            -1.4%  -99.7%   0.02
x2n1            -1.2%  -81.2%   0.01
... and 90 more ...
Max             -0.7%  +16.5%  +3.2%
Geometric Mean  -1.3%  -16.9%  -3.3%
72
it might cause performance problems due to a big chunk of GC work;
unboxing/evaluation;
change a program semantics;
Strictness and Absence are backwards ones, CPR is a forward analysis;
in a backwards analysis.
Thanks
73
Real World Haskell, Chapter 25
Thunk Leak
http://blog.ezyang.com/2011/05/anatomy-of-a-thunk-leak/
The Haskell Heap
http://blog.ezyang.com/2011/04/the-haskell-heap/
P. Wadler, R. J. M. Hughes. Projections for strictness analysis.
74