Static Analysis and Code Optimizations in Glasgow Haskell Compiler



slide-1
SLIDE 1

Static Analysis and Code Optimizations in Glasgow Haskell Compiler

Ilya Sergey

12.12.12

ilya.sergey@gmail.com

1

slide-2
SLIDE 2

The Goal

Discuss what happens when we run

ghc -O MyProgram.hs

2

slide-3
SLIDE 3

The Plan

  • Recall how laziness is implemented in GHC and what drawbacks it might cause;
  • Introduce the worker/wrapper transformation, an optimization technique implemented in GHC;
  • Realize why we need static analysis to do the transformations;
  • Take a brief look at the GHC compilation pipeline and the Core language;
  • Meet two types of static analysis: forward and backwards;
  • Recall some basics of denotational semantics and take a look at the mathematical basics of some analyses in GHC;
  • Introduce and motivate the CPR analysis.

3

slide-4
SLIDE 4

Why Laziness Might be Harmful

and How the Harm Can Be Reduced

4

slide-5
SLIDE 5

    module Main where

    import System.Environment
    import Text.Printf

    main = do
      [n] <- map read `fmap` getArgs
      printf "%f\n" (mysum n)

    mysum :: Double -> Double
    mysum n = myfoldl (+) 0 [1..n]

    myfoldl :: (a -> b -> a) -> a -> [b] -> a
    myfoldl f z0 xs0 = lgo z0 xs0
      where
        lgo z [] = z
        lgo z (x:xs) = lgo (f z x) xs

5

slide-6
SLIDE 6

Compile and run

    > ghc --make -RTS -rtsopts Sum.hs
    > time ./Sum 1e6 +RTS -K100M
    500000500000.0
    real 0m0.583s
    user 0m0.509s
    sys  0m0.068s

Compile optimized and run

    > ghc --make -fforce-recomp -RTS -rtsopts -O Sum.hs
    > time ./Sum 1e6
    500000500000.0
    real 0m0.153s
    user 0m0.101s
    sys  0m0.011s

6

slide-7
SLIDE 7

Collecting Runtime Statistics

Profiling results for the non-optimized program

    > ghc --make -RTS -rtsopts -fforce-recomp Sum.hs
    > ./Sum 1e6 +RTS -sstderr -K100M
    225,137,464 bytes allocated in the heap
    195,297,088 bytes copied during GC
    107 MB total memory in use

    INIT  time  0.00s  ( 0.00s elapsed)
    MUT   time  0.21s  ( 0.24s elapsed)
    GC    time  0.36s  ( 0.43s elapsed)
    EXIT  time  0.00s  ( 0.00s elapsed)
    Total time  0.58s  ( 0.67s elapsed)

    %GC time  63.2%  (64.0% elapsed)

7

slide-8
SLIDE 8

Collecting Runtime Statistics

Profiling results for the optimized program

    > ghc --make -RTS -rtsopts -fforce-recomp -O Sum.hs
    > ./Sum 1e6 +RTS -sstderr -K100M
    92,082,480 bytes allocated in the heap
    30,160 bytes copied during GC
    1 MB total memory in use

    INIT  time  0.00s  ( 0.00s elapsed)
    MUT   time  0.07s  ( 0.08s elapsed)
    GC    time  0.00s  ( 0.00s elapsed)
    EXIT  time  0.00s  ( 0.00s elapsed)
    Total time  0.07s  ( 0.08s elapsed)

    %GC time  1.1%  (1.4% elapsed)

8

slide-9
SLIDE 9

Time Profiling

Profiling results for the non-optimized program

    > ghc --make -RTS -rtsopts -prof -fforce-recomp Sum.hs
    > ./Sum 1e6 +RTS -p -K100M

  • total time = 0.24 secs
  • total alloc = 124,080,472 bytes

    COST CENTRE   MODULE   %time   %alloc
    mysum         Main     52.7    74.1
    myfoldl.lgo   Main     43.6    25.8
    myfoldl       Main      3.7     0.0

9

slide-10
SLIDE 10

Time Profiling

Profiling results for the optimized program

    > ghc --make -RTS -rtsopts -prof -fforce-recomp -O Sum.hs
    > ./Sum 1e6 +RTS -p -K100M

  • total time = 0.14 secs
  • total alloc = 92,080,364 bytes

    COST CENTRE   MODULE   %time   %alloc
    mysum         Main     92.1    99.9
    myfoldl.lgo   Main      7.9     0.0

10

slide-11
SLIDE 11

Memory Profiling

Profiling results for the non-optimized program

    > ghc --make -RTS -rtsopts -prof -fforce-recomp Sum.hs
    > ./Sum 1e6 +RTS -hy -p -K100M
    > hp2ps -e8in -c Sum.hp

[Heap profile plot: Sum 1e6 +RTS -p -hy -K100M, 3,127,720 bytes x seconds, Wed Dec 12 15:01 2012. Horizontal axis: seconds (0.0 to 0.2). Vertical axis: bytes (0M to 18M). Bands: BLACKHOLE, *, Double.]

11

slide-12
SLIDE 12

Memory Profiling

Profiling results for the optimized program

    > ghc --make -RTS -rtsopts -prof -fforce-recomp -O Sum.hs
    > ./Sum 1e6 +RTS -hy -p -K100M
    > hp2ps -e8in -c Sum.hp

[Heap profile plot: Sum 1e6 +RTS -p -hy -K100M, 2,377 bytes x seconds, Wed Dec 12 15:02 2012. Horizontal axis: seconds (0.0 to 0.1). Vertical axis: bytes (0k to 30k). Bands: (,), ForeignPtrContents, IO, WEAK, Buffer, ->[], MUT_VAR_CLEAN, Handle__, MUT_ARR_PTRS_CLEAN, [], ARR_WORDS.]

12

slide-13
SLIDE 13

The Problem

Too many allocations of Double objects.

The cause: too many thunks are allocated for lazily computed values.

In our example the computation of Double values is delayed by the calls to lgo.

    mysum :: Double -> Double
    mysum n = myfoldl (+) 0 [1..n]

    myfoldl :: (a -> b -> a) -> a -> [b] -> a
    myfoldl f z0 xs0 = lgo z0 xs0
      where
        lgo z [] = z
        lgo z (x:xs) = lgo (f z x) xs

13

slide-14
SLIDE 14

Intermezzo

Call-by-Value: Arguments of a function call are fully evaluated before the invocation.

Call-by-Need: Arguments of a function call are not evaluated before the invocation. Instead, a pointer (thunk) to the code is created, and, once evaluated, the value is memoized.

Thunk (Urban Dictionary): To sneak up on someone and bean him with a heavy blow to the back of the head. “Jim got thunked going home last night. Serves him right for walking in a dark alley with all his paycheck in his pocket.”

14

slide-15
SLIDE 15

How to thunk a thunk

  • Apply its delayed value as a function;
  • Examine its value in a case-expression.

case p of (a, b) -> f a b

p will be evaluated to weak-head normal form (WHNF), which is sufficient to examine whether it is a pair. However, its components will remain unevaluated (i.e., thunks).

Remark: Only evaluation of boxed values can be delayed via thunks.
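
A minimal sketch of the WHNF behaviour described on this slide. The names `secondOf` and `p` are illustrative and not from the talk; the first component of the pair is deliberately undefined to show that the case-expression forces only the pair constructor.

```haskell
-- Illustrative names only: `secondOf` is not part of the original slides.
secondOf :: (Int, Int) -> Int
secondOf p = case p of
  (_, b) -> b  -- forces p to WHNF (the pair constructor), nothing more

main :: IO ()
main = do
  let p = (undefined, 2 + 3) :: (Int, Int)
  -- The first component stays an unevaluated thunk, so this does not crash.
  print (secondOf p)
```

Forcing the first component instead (for example with `fst p`) would raise the `undefined` exception, because that thunk would then actually be entered.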

15

slide-16
SLIDE 16

Our Example from CBN’s Perspective

    mysum :: Double -> Double
    mysum n = myfoldl (+) 0 [1..n]

    myfoldl :: (a -> b -> a) -> a -> [b] -> a
    myfoldl f z0 xs0 = lgo z0 xs0
      where
        lgo z [] = z
        lgo z (x:xs) = lgo (f z x) xs

Evaluation of mysum 3:

    mysum 3
    ⇒ myfoldl (+) 0 (1:2:3:[])
    ⇒ lgo z1 (1:2:3:[])
    ⇒ lgo z2 (2:3:[])
    ⇒ lgo z3 (3:[])
    ⇒ lgo z4 []
    ⇒ !z4

Delayed computations (thunks):

    z1 -> 0
    z2 -> 1 + !z1
    z3 -> 2 + !z2
    z4 -> 3 + !z3

Now GC can do the job...

16

slide-17
SLIDE 17

Getting Rid of Redundant Thunks

Obvious Solution: Replace CBN by CBV, so there is no need for thunks.

Obvious Problem: The semantics of a “lazy” program can change unpredictably.

    f x e = if x > 0 then x + 1 else e

    f 5 (error "Urk")

17

slide-18
SLIDE 18

Getting Rid of Redundant Thunks

Let’s reformulate: Replace CBN by CBV only for strict functions, i.e., those that always evaluate their argument to the WHNF.

    f x e = if x > 0 then x + 1 else e

    f 5 (error "Urk")

  • f is strict in x
  • f is non-strict (lazy) in e
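
A small sketch of why CBV is unsafe for the lazy parameter. `fStrict` is a hypothetical call-by-value variant introduced only for this illustration: it forces `e` with `seq` before running the body, which is what blindly switching to CBV would amount to.

```haskell
import Control.Exception (SomeException, evaluate, try)

f :: Int -> Int -> Int
f x e = if x > 0 then x + 1 else e

-- Hypothetical CBV variant (not from the slides): evaluates e up front.
fStrict :: Int -> Int -> Int
fStrict x e = e `seq` f x e

main :: IO ()
main = do
  -- Lazy in e: the error thunk is never entered.
  print (f 5 (error "Urk"))
  -- Forcing e changes the meaning of the program: the call now fails.
  r <- try (evaluate (fStrict 5 (error "Urk"))) :: IO (Either SomeException Int)
  putStrLn (either (const "fStrict diverged") show r)
```

Forcing `x` early, by contrast, would be harmless, since `f` evaluates `x` in the condition on every call.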

18

slide-19
SLIDE 19

A Convenient Definition of Strictness

Definition: A function f of one argument is strict iff

f undefined = undefined

Strictness is formulated similarly for functions of multiple arguments.

    f x e = if x > 0 then x + 1 else e

    f 5 (error "Urk")

19

slide-20
SLIDE 20

Enforcing CBV for Function Calls

Worker/Wrapper Transformation

  • The worker does all the work, but takes unboxed arguments;
  • The wrapper serves as an impedance matcher and is inlined at every call site.

Splitting a function into two parts:

    f :: (Int, Int) -> Int
    f p = e

becomes

    f :: (Int, Int) -> Int
    f p = case p of (a, b) -> $wf a b

    $wf :: Int -> Int -> Int
    $wf a b = let p = (a, b) in e

20

slide-21
SLIDE 21

Some Redundant Job Done?

    f :: (Int, Int) -> Int
    f p = case p of (a, b) -> $wf a b

    $wf :: Int -> Int -> Int
    $wf a b = let p = (a, b) in e

  • f takes the pair apart and passes components to $wf;
  • $wf constructs the pair again.

21

slide-22
SLIDE 22

Strictness to the Rescue

    f :: (Int, Int) -> Int
    f p = (case p of (a, b) -> a) + 1

A strict function always examines its parameter. So, we just rely on a smart rewriter of case-expressions.

    f :: (Int, Int) -> Int
    f p = case p of (a, b) -> $wf a

    $wf :: Int -> Int
    $wf a = let p = (a, error "Urk") in (case p of (a, b) -> a) + 1

22

slide-23
SLIDE 23

Strictness to the Rescue

    f :: (Int, Int) -> Int
    f p = (case p of (a, b) -> a) + 1

A strict function always examines its parameter. So, we just rely on a smart rewriter of case-expressions.

    f :: (Int, Int) -> Int
    f p = case p of (a, b) -> $wf a

    $wf :: Int -> Int
    $wf a = a + 1
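
The split on this slide can be written by hand as ordinary Haskell. This is only a sketch: GHC generates the worker itself and names it `$wf`, which is not valid source syntax, so the hypothetical names `wf` and `fOrig` are used here.

```haskell
-- The original function: strict in the pair, uses only the first component.
fOrig :: (Int, Int) -> Int
fOrig p = (case p of (a, _) -> a) + 1

-- Hand-written worker (GHC would call it $wf): takes just the used component.
wf :: Int -> Int
wf a = a + 1

-- Hand-written wrapper: takes the pair apart and calls the worker.
f :: (Int, Int) -> Int
f p = case p of (a, _) -> wf a
{-# INLINE f #-}

main :: IO ()
main = do
  -- The absent second component is never touched by either version.
  print (fOrig (1, undefined), f (1, undefined))
  print (map f [(41, 0), (9, 9)])
```

The wrapper and the original agree on every input, including pairs whose second component is undefined, which is what makes the split semantics-preserving in this case.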

23

slide-24
SLIDE 24

Our Example

    mysum :: Double -> Double
    mysum n = myfoldl (+) 0 [1..n]

    myfoldl :: (a -> b -> a) -> a -> [b] -> a
    myfoldl f z0 xs0 = lgo z0 xs0
      where
        lgo z [] = z
        lgo z (x:xs) = lgo (f z x) xs

Step 1: Inline myfoldl

24

slide-25
SLIDE 25

Our Example

    mysum :: Double -> Double
    mysum n = lgo 0 [1..n]
      where
        lgo :: Double -> [Double] -> Double
        lgo z [] = z
        lgo z (x:xs) = lgo (z + x) xs

Step 2: Analyze Strictness and Absence

Result: lgo is strict in both its arguments

25

slide-26
SLIDE 26

Our Example

    mysum :: Double -> Double
    mysum n = lgo 0 [1..n]
      where
        lgo :: Double -> [Double] -> Double
        lgo z [] = z
        lgo z (x:xs) = lgo (z + x) xs

Step 3: Worker/Wrapper Split

26

slide-27
SLIDE 27

Our Example

    mysum :: Double -> Double
    mysum n = lgo 0 [1..n]
      where
        lgo :: Double -> [Double] -> Double
        lgo z xs = case z of D# d -> $wlgo d xs

        $wlgo :: Double# -> [Double] -> Double
        $wlgo d [] = D# d
        $wlgo d (x:xs) = lgo ((D# d) + x) xs

Step 3: Worker/Wrapper Split

$wlgo takes an unboxed double as an argument.

27

slide-28
SLIDE 28

Our Example

    mysum :: Double -> Double
    mysum n = lgo 0 [1..n]
      where
        lgo :: Double -> [Double] -> Double
        lgo z xs = case z of D# d -> $wlgo d xs

        $wlgo :: Double# -> [Double] -> Double
        $wlgo d [] = D# d
        $wlgo d (x:xs) = lgo ((D# d) + x) xs

Step 4: Inline lgo in the Worker

28

slide-29
SLIDE 29

Our Example

    mysum :: Double -> Double
    mysum n = lgo 0 [1..n]
      where
        lgo :: Double -> [Double] -> Double
        lgo z xs = case z of D# d -> $wlgo d xs

        $wlgo :: Double# -> [Double] -> Double
        $wlgo d [] = D# d
        $wlgo d (x:xs) = case ((D# d) + x) of D# d' -> $wlgo d' xs

Step 4: Inline lgo in the Worker

  • lgo is invoked just once;
  • No intermediate thunks for d are constructed.
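
For readers who want to run the final form, here is a hand-written approximation using the `MagicHash` extension. It is a sketch, not GHC's actual output: the worker is named `wlgo` because `$wlgo` cannot be written in source code, and the list `[1 .. n]` follows the original definition of `mysum`.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Double (D#), Double#)

mysum :: Double -> Double
mysum n = lgo 0 [1 .. n]
  where
    -- Wrapper: unboxes the accumulator once and hands over to the worker.
    lgo :: Double -> [Double] -> Double
    lgo (D# d) xs = wlgo d xs

    -- Worker: the accumulator is a raw Double#, so it can never be a thunk.
    wlgo :: Double# -> [Double] -> Double
    wlgo d [] = D# d
    wlgo d (x : xs) = case D# d + x of
      D# d' -> wlgo d' xs

main :: IO ()
main = print (mysum 1e6)
```

Because `Double#` is an unlifted type, every recursive call receives an already computed number, so the chain of `z1 ... z4` thunks from the call-by-need trace cannot build up.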

29

slide-30
SLIDE 30

A Brief Look at GHC’s Guts

30

slide-31
SLIDE 31

GHC Compilation Pipeline

  • Haskell Source
  • Core
  • Spineless Tagless G-Machine
  • C--
  • C / Machine Code / LLVM Code

A number of intermediate languages

Most of the interesting optimizations happen on Core

31

slide-32
SLIDE 32

32

slide-33
SLIDE 33

GHC Core

  • A tiny language, to which Haskell sources are de-sugared;
  • Based on explicitly typed System F with type equality coercions;

  • Used as a base platform for analyses and optimizations;
  • All names are fully-qualified;
  • if-then-else is compiled to case-expressions;
  • Variables have additional metadata;
  • Type class constraints are compiled into record parameters.

33

slide-34
SLIDE 34

Core Syntax

    data Expr b
      = Var Id
      | Lit Literal
      | App (Expr b) (Expr b)
      | Lam b (Expr b)
      | Let (Bind b) (Expr b)
      | Case (Expr b) b Type [Alt b]
      | Cast (Expr b) Coercion
      | Tick (Tickish Id) (Expr b)
      | Type Type
      | Coercion Coercion

    data Bind b
      = NonRec b (Expr b)
      | Rec [(b, (Expr b))]

    type Alt b = (AltCon, [b], Expr b)

    data AltCon
      = DataAlt DataCon
      | LitAlt Literal
      | DEFAULT

34

slide-35
SLIDE 35

Core Output (Demo)

  • A factorial function
  • mysum

35

slide-36
SLIDE 36

How to Get Core

> ghc -ddump-ds Sum.hs

Desugared Core

> ghc -ddump-stranal Sum.hs

Core with Strictness Annotations

> ghc -ddump-worker-wrapper Sum.hs

Core after Worker/Wrapper Split

More at http://www.haskell.org/ghc/docs/2.10/users_guide/user_41.html

36

slide-37
SLIDE 37

Strictness and Absence Analyses in a Nutshell

37

slide-38
SLIDE 38

Two Types of Modular Program Analyses

  • Forward analysis
      • “Run” the program with abstract input and infer the abstract result;
      • Examples: sign analysis, interval analysis, type checking/inference.
  • Backwards analysis
      • From the expected abstract result of the program infer the abstract values of its inputs.
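
To make the forward direction concrete, here is a minimal sign analysis. Everything in it (the `Expr` language, the `Sign` domain, the function names) is a hypothetical toy built for this illustration and unrelated to GHC's internals: abstract inputs go in through the environment and an abstract result comes out.

```haskell
-- Abstract domain of signs; Top means "sign unknown".
data Sign = Neg | Zero | Pos | Top deriving (Eq, Show)

-- A tiny expression language, invented for this sketch.
data Expr = Lit Int | Var String | Add Expr Expr | Mul Expr Expr

-- Abstraction of a concrete integer.
alpha :: Int -> Sign
alpha n
  | n < 0 = Neg
  | n == 0 = Zero
  | otherwise = Pos

addS :: Sign -> Sign -> Sign
addS Zero s = s
addS s Zero = s
addS Pos Pos = Pos
addS Neg Neg = Neg
addS _ _ = Top

mulS :: Sign -> Sign -> Sign
mulS Zero _ = Zero
mulS _ Zero = Zero
mulS Top _ = Top
mulS _ Top = Top
mulS a b = if a == b then Pos else Neg

-- Forward analysis: "run" the expression on abstract inputs.
analyze :: [(String, Sign)] -> Expr -> Sign
analyze _ (Lit n) = alpha n
analyze env (Var v) = maybe Top id (lookup v env)
analyze env (Add a b) = addS (analyze env a) (analyze env b)
analyze env (Mul a b) = mulS (analyze env a) (analyze env b)

main :: IO ()
main =
  -- x * x + 1 for a negative x is always positive.
  print (analyze [("x", Neg)] (Add (Mul (Var "x") (Var "x")) (Lit 1)))
```

A backwards analysis would go the other way: starting from a demand on the result, it would work out what is required of `x`.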

38

slide-39
SLIDE 39

Strictness from the definition as a forward analysis

f ⊥ = ⊥

A function with multiple parameters

    f x y z = ...

$(f\ \bot\ \top\ \top)$, $(f\ \top\ \bot\ \top)$, $(f\ \top\ \top\ \bot)$

What if there are nested, recursive definitions?
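
The forward test can be sketched over a two-point abstract domain, with `False` standing for bottom (definitely diverges) and `True` for top (may terminate). The abstract function `fAbs` below is a hand-derived abstraction of the earlier example `f x e = if x > 0 then x + 1 else e`; it is an assumption of this sketch and not how GHC performs the analysis.

```haskell
-- Two-point abstract domain: False = bottom (diverges), True = top (may terminate).
type Abs = Bool

-- Hand-derived abstraction of: f x e = if x > 0 then x + 1 else e
-- The condition needs x; the result is the 'then' branch (needs x)
-- or the 'else' branch (needs e).
fAbs :: Abs -> Abs -> Abs
fAbs x e = x && (x || e)

-- Strict in an argument: bottom there (top elsewhere) gives bottom.
strictInFirst, strictInSecond :: (Abs -> Abs -> Abs) -> Bool
strictInFirst g = not (g False True)
strictInSecond g = not (g True False)

main :: IO ()
main = do
  print (strictInFirst fAbs)   -- f is strict in x
  print (strictInSecond fAbs)  -- f is lazy in e
```

For recursive definitions the abstract function would itself be defined recursively, so the test needs a fixed-point computation, which is the difficulty the slide's closing question points at.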

39

slide-40
SLIDE 40

Strictness as a backwards analysis (Informally)

f x y z = . . .

If the result of f applied to some arguments is going to be evaluated to WHNF, what can we say about its parameters?

Backwards analysis provides this contextual information.

40

slide-41
SLIDE 41

Defining the Contexts (formally)

Denotational Semantics

  • Answers the question of what a program is;
  • Introduced by Dana Scott and Christopher Strachey to reason about imperative programs as state transformers;
  • The effect of program execution is modeled by relating a program to a mathematical function;
  • Main purpose: constructing different domains for program interpretation and analysis;
  • Secondary purpose: introducing ordering on program objects.

41

slide-42
SLIDE 42

Simple Denotational Semantics of Core

Definition: A domain is a set of meanings for different programs.

What is the meaning of undefined or a non-terminating program?

$\llbracket \mathit{undefined} \rrbracket = \bot$

$\llbracket f\ x = f\ x \rrbracket = \bot$

$\bot$ is called “bottom”.

42

slide-43
SLIDE 43

Simple Denotational Semantics of Core

$\bot$ is the least defined element in our domain. Once evaluated, it terminates the program.

Adding bottom to a set of values is called lifting. Example: $\mathbb{Z}_\bot$

[Diagram: the lifted integers $\ldots, -2, -1, 0, 1, 2, \ldots$]

43

slide-44
SLIDE 44

Simple Denotational Semantics of Core

[Diagram: the lifted integers $\ldots, -2, -1, 0, 1, 2, \ldots$]

Should be interpreted as $\ldots,\ \bot \sqsubseteq -2,\ \bot \sqsubseteq -1,\ \bot \sqsubseteq 0,\ \bot \sqsubseteq 1,\ \ldots$

Denotational semantics of a literal is itself: $\llbracket 1 \rrbracket = 1$

44

slide-45
SLIDE 45

Elements of Domain Theory

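Partial order $\sqsubseteq$

$x \sqsubseteq y$: $x$ is “less defined than” $y$

  • reflexive: $\forall x.\ x \sqsubseteq x$
  • transitive: if $x \sqsubseteq y$ and $y \sqsubseteq z$ then $x \sqsubseteq z$
  • antisymmetric: if $x \sqsubseteq y$ and $y \sqsubseteq x$ then $x = y$

Least upper bound $z = x \sqcup y$

  • $x \sqsubseteq z$
  • $y \sqsubseteq z$
  • $x \sqsubseteq z'$ and $y \sqsubseteq z' \implies z \sqsubseteq z'$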
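
A small executable model of these definitions on a flat lifted domain. The type `Lifted` and the functions `leq` and `lub` are names chosen for this sketch; real Haskell bottoms cannot be inspected like this, so bottom is represented by an explicit constructor.

```haskell
-- A flat lifted domain: Bottom sits below every proper value.
data Lifted a = Bottom | Value a deriving (Eq, Show)

-- leq x y: x is less defined than (or equal to) y.
leq :: Eq a => Lifted a -> Lifted a -> Bool
leq Bottom _ = True
leq x y = x == y

-- Least upper bound, when it exists: two distinct proper values have none.
lub :: Eq a => Lifted a -> Lifted a -> Maybe (Lifted a)
lub Bottom y = Just y
lub x Bottom = Just x
lub x y
  | x == y = Just x
  | otherwise = Nothing

main :: IO ()
main = do
  let xs = [Bottom, Value 0, Value 1] :: [Lifted Int]
  -- Reflexivity on the samples.
  print (and [leq x x | x <- xs])
  -- Antisymmetry on the samples.
  print (and [x == y | x <- xs, y <- xs, leq x y, leq y x])
  -- Transitivity on the samples.
  print (and [leq x z | x <- xs, y <- xs, z <- xs, leq x y, leq y z])
  print (lub Bottom (Value (1 :: Int)))
```

The checks only cover the three sample elements, so they illustrate the laws rather than prove them.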

45

slide-46
SLIDE 46

Simple Denotational Semantics of Core

Algebraic Data Types

    data Maybe a = Nothing | Just a

[Diagram: ordering of Maybe values, showing Nothing, Just ⊥, Just (Just ⊥), and Just 2]

46

slide-47
SLIDE 47

Simple Denotational Semantics of Core

Monotone functions: $f$ is monotone iff $x \sqsubseteq y \implies f\ x \sqsubseteq f\ y$

The denotational semantics of first-order Core functions is given by monotone functions on the lifted domain of values. The complete domain for the denotational semantics of Core is defined recursively.

47

slide-48
SLIDE 48

Simple Denotational Semantics of Core

Monotone functions as domain elements

$f\ x = \begin{cases} 1 & \text{if } x = 0 \\ \bot & \text{otherwise} \end{cases}$

$g\ x = \begin{cases} 1 & \text{if } x = 0 \\ 2 & \text{if } x = 1 \\ \bot & \text{otherwise} \end{cases}$

Functions are compared point-wise: $f \sqsubseteq g$

Recursive definitions are computed as successive chains of increasingly more defined functions.
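
The chain of increasingly defined functions can be observed directly if bottom is modelled by `Nothing`. The sketch below uses factorial as the recursive definition; `PFun`, `step`, `bottomFun`, and `approx` are illustrative names, not part of the talk.

```haskell
-- Partial functions on Int, with Nothing playing the role of bottom.
type PFun = Int -> Maybe Int

-- One unfolding of the factorial definition, given an approximation f.
step :: PFun -> PFun
step f n
  | n == 0 = Just 1
  | otherwise = fmap (n *) (f (n - 1))

-- The everywhere-undefined function: least element of the function domain.
bottomFun :: PFun
bottomFun _ = Nothing

-- The k-th approximation: step applied k times to bottomFun.
approx :: Int -> PFun
approx k = iterate step bottomFun !! k

main :: IO ()
main =
  -- Each row is defined on one more argument than the previous one.
  mapM_ (\k -> print (map (approx k) [0 .. 4])) [0 .. 3]
```

Each approximation agrees with the previous one wherever that one is defined, so the rows form a chain in the point-wise ordering, and factorial is its limit.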

48

slide-49
SLIDE 49

Projections: Defining Usage Contexts

Definition: A monotone function $p$ is a projection if for every object $d$

  • Shrinking: $p\ d \sqsubseteq d$
  • Idempotent: $p\ (p\ d) = p\ d$

In point-free style: $p \sqsubseteq \mathit{ID}$ and $p \circ p = p$

49

slide-50
SLIDE 50

Intuition behind Projections

  • Projections remove information from objects;
  • Projections are a way to describe which parts of an object are essential for the computation;
  • Projection will be used as a synonym for context.

Examples

  • $\mathit{ID} = \lambda x.\, x$
  • $\mathit{BOT} = \lambda x.\, \bot$
  • $F_1 = \lambda (x, y).\, (\bot, y)$
  • $F_2 = \lambda g.\, \lambda p.\, g\,(F_1\ p)$, a projection if $g$ is monotone
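
The two projection laws can be checked on sample values. This sketch models pairs whose components come from a flat lifted domain; the bottom pair itself is left out for simplicity, and all names (`Lifted`, `leqPair`, `isProjectionOn`, `swapP`) are assumptions of the illustration.

```haskell
data Lifted a = Bottom | Value a deriving (Eq, Show)

leq :: Eq a => Lifted a -> Lifted a -> Bool
leq Bottom _ = True
leq x y = x == y

type Pair = (Lifted Int, Lifted Int)

-- Pairs are ordered component-wise.
leqPair :: Pair -> Pair -> Bool
leqPair (a, b) (c, d) = leq a c && leq b d

-- ID and F1 from the examples, plus a function that is not a projection.
idP, f1, swapP :: Pair -> Pair
idP p = p
f1 (_, y) = (Bottom, y)
swapP (x, y) = (y, x)

-- Shrinking and idempotence, tested on the given samples only.
isProjectionOn :: [Pair] -> (Pair -> Pair) -> Bool
isProjectionOn ds p = all (\d -> p d `leqPair` d && p (p d) == p d) ds

samples :: [Pair]
samples = [(a, b) | a <- vals, b <- vals]
  where
    vals = [Bottom, Value 0, Value 1]

main :: IO ()
main = mapM_ (print . isProjectionOn samples) [idP, f1, swapP]
```

`swapP` fails the shrinking law: it rearranges information instead of removing it, so its result is not below its input.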

50

slide-51
SLIDE 51

More Facts about Projections

Theorem: If $P$ is a set of projections then $\bigsqcup P$ exists and is a projection.

Lemma: Let $p_1$ and $p_2$ be projections. Then $p_1 \sqsubseteq p_2 \implies p_1 \circ p_2 = p_1$.

51

slide-52
SLIDE 52

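Higher-Order Projections

Let $p$, $q$ be projections, then

$(p, q)\ f = \begin{cases} (p\ d_1,\ q\ d_2) & \text{if } f \text{ is a pair and } f = (d_1, d_2) \\ \bot & \text{otherwise} \end{cases}$

$(q \to p)\ f = \begin{cases} p \circ f \circ q & \text{if } f \text{ is a function} \\ \bot & \text{otherwise} \end{cases}$

These are projections, too.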

52

slide-53
SLIDE 53

Modeling Usage with Projections

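What does it mean “f is not using its argument”?

$f = \lambda x.\ \ldots$

$f\ z = f\ \bot$

or

$(\mathit{ID} \to \mathit{ID})\ f = (\mathit{BOT} \to \mathit{ID})\ f$

In a projection $(q \to p)$, $q$ describes what happens to the argument and $p$ describes what happens to the result.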

53

slide-54
SLIDE 54

Modeling Usage with Projections

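$(\mathit{ID} \to \mathit{ID})\ f = (\mathit{BOT} \to \mathit{ID})\ f$

$\Updownarrow$

$\underbrace{(\mathit{ID} \to \mathit{ID})}_{p}\ f = \underbrace{(\mathit{ID} \to \mathit{ID})}_{p}\ (\underbrace{(\mathit{BOT} \to \mathit{ID})}_{q}\ f)$

$\Updownarrow$

$p\ f = p\ (q\ f)$

$q$ is a safe projection in the context of $p$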

54

slide-55
SLIDE 55

Safety Condition for Projections

p f = p (q f)

  • p defines a context, i.e., how we are going to use a value;
  • q defines how much information we can remove from the object, so it won’t change from p’s perspective.

The goal of a backwards absence/strictness analysis is to find a safe projection for a given value and a context:

  • The context: how the result of the function is going to be used;
  • The output: how arguments can be safely changed.

55

slide-56
SLIDE 56

Safe Usage Projections: Example

p f = p (q f)

    f :: (Int, Int, Int) -> [a] -> (Int, Bool)
    f (a, b, c) = case a of
      0 -> error "urk"
      _ -> \y -> case b of
        0 -> (c, null y)
        _ -> (c, False)

    p                        q
    ID → ID                  ID → ID
    ID → ID → (BOT, ID)      (ID, ID, BOT) → ID → ID
    ID → ID → (ID, BOT)      ID → BOT → ID

56

slide-57
SLIDE 57

What about Strictness?

Usage context is modeled by the identity projection. Unfortunately, it is too weak for the strictness property.

The problem:

  • ID treats ⊥ as any other value;
  • It is not helpful to establish a context for detecting f ⊥ = ⊥.

A solution:

  • Introduce a specific element in the domain for “true divergence”;
  • Devise a specific projection that maps ⊥ to the true divergence.

57

slide-58
SLIDE 58

Extending the Domain for True Divergence

  • $\lightning$: lightning bolt

$\forall f.\ f\ \lightning = \lightning$

58

slide-59
SLIDE 59

Modeling Strictness with Projections

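$S\ \lightning = \lightning$

$S\ \bot = \lightning$

$S\ x = x$, otherwise

Checking if the function $f$ uses its argument strictly

$S \circ f = S \circ f \circ S$

Indeed,

$(S \circ f)\ \bot = (S \circ f \circ S)\ \bot$

$\implies S\ (f\ \bot) = S\ (f\ (S\ \bot))$

$\implies S\ (f\ \bot) = S\ (f\ \lightning)$

$\implies S\ (f\ \bot) = S\ \lightning$

$\implies S\ (f\ \bot) = \lightning$

$\implies f\ \bot = \bot$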
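
The strictness test can be replayed on a three-constructor model of the extended domain. This is a sketch under stated assumptions: `Bolt` stands for the lightning bolt, `Bot` for ordinary bottom, and the sample functions `inc` and `constOne` are invented for the illustration; every function is defined to map `Bolt` to `Bolt`, as the previous slide requires.

```haskell
-- Extended domain: true divergence (Bolt), ordinary bottom (Bot), proper values.
data D = Bolt | Bot | Val Int deriving (Eq, Show)

-- The strictness projection S: maps bottom to the lightning bolt.
s :: D -> D
s Bolt = Bolt
s Bot = Bolt
s x = x

-- A strict function: inc bottom = bottom.
inc :: D -> D
inc Bolt = Bolt
inc Bot = Bot
inc (Val n) = Val (n + 1)

-- A lazy function: ignores its argument (except for the lightning bolt).
constOne :: D -> D
constOne Bolt = Bolt
constOne _ = Val 1

-- Check S . f = S . f . S on sample points.
strictBy :: (D -> D) -> Bool
strictBy f = all (\d -> s (f d) == s (f (s d))) [Bolt, Bot, Val 0, Val 1]

main :: IO ()
main = do
  print (strictBy inc)       -- the equation holds
  print (strictBy constOne)  -- fails at Bot: Val 1 on the left, Bolt on the right
```

The failing case for `constOne` is exactly the point `Bot`, which matches the derivation: the equation can only hold there if `f` maps bottom to bottom.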

59

slide-60
SLIDE 60

Conservative Nature of the Analysis

  • From the backwards perspective each function is a “projection transformer”: it transforms a result context to a safe projection (not always the best one);
  • The set of all safe projections of a function is incomputable, as it requires examining all contexts;
  • Instead, the optimal “threshold” result projection is chosen.

[Diagram: projections ordered by $\sqsubseteq$, with labels $p_1$, $p_2$, $p_3$, $p^*$, $q_1$, $q_2$, $q^*$, and $\mathit{ID}$]

60

slide-61
SLIDE 61

How to Screw Up the Strictness Analysis

    fact :: Int -> Int
    fact n = if n == 0 then n else n * (fact $ n - 1)

Let’s take a look at the strictness signatures (demo)

Conclusion: Polymorphism and type classes introduce implicit calls to non-strict functions and constructors, which make it harder to infer strictness.

61

slide-62
SLIDE 62

Forward Analysis Example

Constructed Product Result Analysis

Determines whether a function can profitably return multiple results in registers.

62

slide-63
SLIDE 63

Example and Motivation

    dm :: Int -> Int -> (Int, Int)
    dm x y = (x `div` y, x `mod` y)

We would like to express that dm can return its result pair unboxed.

Unboxed tuples are built-in types in GHC.

The calling convention for a function that returns an unboxed tuple arranges to return the components in registers.

63

slide-64
SLIDE 64

Worker/Wrapper Split to the Rescue

    dm :: Int -> Int -> (Int, Int)
    dm x y = (x `div` y, x `mod` y)

becomes

    dm :: Int -> Int -> (Int, Int)
    dm x y = case $wdm x y of (# r1, r2 #) -> (r1, r2)

    $wdm :: Int -> Int -> (# Int, Int #)
    $wdm x y = (# x `div` y, x `mod` y #)

  • The worker actually does all the work;
  • The wrapper serves as an impedance matcher.

64

slide-65
SLIDE 65

The Essence of the Transformation

If the result of dm is scrutinized immediately...

    case dm x y of
      (p, q) -> e

Inline the wrapper

    case (case $wdm x y of (# r1, r2 #) -> (r1, r2)) of
      (p, q) -> e

The tuple is returned unboxed

    case $wdm x y of
      (# p, q #) -> e

The result pair construction has been moved from the body of dm to its call site.
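
The CPR split for dm can be written by hand with the `UnboxedTuples` extension. This is a sketch rather than GHC's generated code: the worker is called `wdm` because `$wdm` is not valid source syntax, and `useDm` is a hypothetical call site showing the shape left after the wrapper has been inlined and the case-of-case cancelled.

```haskell
{-# LANGUAGE UnboxedTuples #-}

-- Hand-written worker: returns both results without allocating a pair.
wdm :: Int -> Int -> (# Int, Int #)
wdm x y = (# x `div` y, x `mod` y #)

-- Hand-written wrapper: rebuilds the boxed pair for callers that need one.
dm :: Int -> Int -> (Int, Int)
dm x y = case wdm x y of
  (# r1, r2 #) -> (r1, r2)
{-# INLINE dm #-}

-- A call site that scrutinizes the result immediately: no pair is built at all.
useDm :: Int -> Int -> Int
useDm x y = case wdm x y of
  (# p, q #) -> p + q

main :: IO ()
main = do
  print (dm 17 5)
  print (useDm 17 5)
```

Callers that keep the pair around still pay for its allocation through the wrapper; only callers that take it apart straight away benefit.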

65

slide-66
SLIDE 66

General CPR Worker/Wrapper Split

An arbitrary function returning a product

    f :: Int -> (Int, Int)
    f x = e

The wrapper

    f :: Int -> (Int, Int)
    f x = case $wf x of (# r1, r2 #) -> (r1, r2)

The worker

    $wf :: Int -> (# Int, Int #)
    $wf x = case e of (r1, r2) -> (# r1, r2 #)

66

slide-67
SLIDE 67

When is the W/W Split Beneficial?

    f :: Int -> (Int, Int)
    f x = case $wf x of (# r1, r2 #) -> (r1, r2)

    $wf :: Int -> (# Int, Int #)
    $wf x = case e of (r1, r2) -> (# r1, r2 #)

  • The worker takes the pair apart;
  • The wrapper reconstructs it again.

The insight: things get worse unless the case expression in $wf is certain to cancel with the construction of the pair in e.

67

slide-68
SLIDE 68

When is the W/W Split Beneficial?

We should only perform the CPR W/W transformation if the result of the function is allocated by the function itself.

Definition: A function has the CPR (constructed product result) property if it allocates its result product itself.

The goal of the CPR analysis is to infer this property.

68

slide-69
SLIDE 69

CPR Analysis Informally

  • The analysis is modular: it’s based on the function definition only, but not its uses;
  • Implemented in the form of an augmented type system, which tracks explicit product constructions;
  • Forwards analysis: assumes all arguments are non-explicitly constructed products.

69

slide-70
SLIDE 70

Examples

    f :: Int -> Int -> (Int, Int)
    f x y = if x <= y
            then (x, y)               -- is CPR
            else f (x - 1) (y + 1)    -- depends on CPR(f)

Has CPR property

    g :: Int -> Int -> (Int, Int)
    g x y = if x <= y
            then (x, y)               -- is CPR
            else genRange x           -- external function

Does not have CPR property

CPR property in Core metadata: demo

70

slide-71
SLIDE 71

A program that benefits from CPR

    import System.Environment (getArgs)

    tak :: Int -> Int -> Int -> Int
    tak x y z = if not (y < x)
                then z
                else tak (tak (x-1) y z)
                         (tak (y-1) z x)
                         (tak (z-1) x y)

    main = do
      [xs,ys,zs] <- getArgs
      print (tak (read xs) (read ys) (read zs))

  • Taken from the nofib benchmark suite
  • A result from tak is consumed by itself, so both parts of the worker collapse
  • Memory consumption gain: 99.5%

71

slide-72
SLIDE 72

nofib: Strictness + Absence + CPR

    Program          Size     Allocs    Runtime
    ansi             -1.3%    -12.1%    0.00
    banner           -1.4%    -18.7%    0.00
    boyer2           -1.3%    -31.8%    0.00
    clausify         -1.3%    -35.0%    0.03
    comp_lab_zift    -1.3%    +0.2%     +0.0%
    compress2        -1.4%    -32.7%    +1.4%
    cse              -1.4%    -15.8%    0.00
    mandel2          -1.4%    -28.0%    0.00
    puzzle           -1.3%    +16.5%    0.16
    rfib             -1.4%    -99.7%    0.02
    x2n1             -1.2%    -81.2%    0.01
    ... and 90 more ...
    Min              -1.5%    -95.0%    -16.2%
    Max              -0.7%    +16.5%    +3.2%
    Geometric Mean   -1.3%    -16.9%    -3.3%

72

slide-73
SLIDE 73

Conclusion

  • Lazy programs allocate a lot of thunks; this might cause performance problems due to a large amount of GC work;
  • Allocating thunks can be avoided by changing the call/return contract of a function;
  • The worker/wrapper transformation is a cheap way to enforce argument unboxing/evaluation;
  • We need Strictness and Absence analysis so that the W/W split does not change a program’s semantics;
  • We need CPR analysis so that the CPR W/W split is beneficial;
  • There are two types of analyses: forward and backwards; Strictness and Absence are backwards ones, CPR is a forward analysis;
  • Projections are a convenient way to model contexts in a backwards analysis.

Thanks

73

slide-74
SLIDE 74

References

  • Profiling and optimization
      • B. O’Sullivan et al. Real World Haskell, Chapter 25
      • E. Z. Yang. Anatomy of a Thunk Leak, http://blog.ezyang.com/2011/05/anatomy-of-a-thunk-leak/
      • The Haskell Heap, http://blog.ezyang.com/2011/04/the-haskell-heap/
  • Strictness and CPR Analyses
      • http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/Demand
      • http://www.haskell.org/haskellwiki/Lazy_vs._non-strict
      • C. Baker-Finch et al. Constructed Product Result Analysis for Haskell
  • Denotational Semantics and Projections
      • G. Winskel. Formal Semantics of Programming Languages
      • P. Wadler, R. J. M. Hughes. Projections for strictness analysis.

74