Static Analysis and Code Optimizations in Glasgow Haskell Compiler - PowerPoint PPT Presentation

Static Analysis and Code Optimizations in Glasgow Haskell Compiler
Ilya Sergey
ilya.sergey@gmail.com
12.12.12

The Goal

Discuss what happens when we run ghc -O MyProgram.hs

The Plan

• Recall how laziness is implemented in GHC and what drawbacks it might cause;
• Introduce the worker/wrapper transformation, an optimization technique implemented in GHC;
• Realize why we need static analysis to do the transformations;
• Take a brief look at the GHC compilation pipeline and the Core language;
• Meet two types of static analysis: forward and backwards;
• Recall some basics of denotational semantics and take a look at the mathematical basics of some analyses in GHC;
• Introduce and motivate the CPR analysis.

Why Laziness Might be Harmful and How the Harm Can Be Reduced

    module Main where

    import System.Environment
    import Text.Printf

    -- Read n from the command line and print the sum 1 + 2 + ... + n.
    main = do
      [n] <- map read `fmap` getArgs
      printf "%f\n" (mysum n)

    mysum :: Double -> Double
    mysum n = myfoldl (+) 0 [1..n]

    -- A hand-written left fold; lgo carries the accumulator z.
    myfoldl :: (a -> b -> a) -> a -> [b] -> a
    myfoldl f z0 xs0 = lgo z0 xs0
      where
        lgo z []     = z
        lgo z (x:xs) = lgo (f z x) xs

Compile and run

    > ghc --make -RTS -rtsopts Sum.hs
    > time ./Sum 1e6 +RTS -K100M
    500000500000.0
    real    0m0.583s
    user    0m0.509s
    sys     0m0.068s

Compile optimized and run

    > ghc --make -fforce-recomp -RTS -rtsopts -O Sum.hs
    > time ./Sum 1e6
    500000500000.0
    real    0m0.153s
    user    0m0.101s
    sys     0m0.011s

Collecting Runtime Statistics

Profiling results for the non-optimized program

    > ghc --make -RTS -rtsopts -fforce-recomp Sum.hs
    > ./Sum 1e6 +RTS -sstderr -K100M

    225,137,464 bytes allocated in the heap
    195,297,088 bytes copied during GC
            107 MB total memory in use

    INIT   time   0.00s  ( 0.00s elapsed)
    MUT    time   0.21s  ( 0.24s elapsed)
    GC     time   0.36s  ( 0.43s elapsed)
    EXIT   time   0.00s  ( 0.00s elapsed)
    Total  time   0.58s  ( 0.67s elapsed)

    %GC    time   63.2%  (64.0% elapsed)

Collecting Runtime Statistics

Profiling results for the optimized program

    > ghc --make -RTS -rtsopts -fforce-recomp -O Sum.hs
    > ./Sum 1e6 +RTS -sstderr -K100M

    92,082,480 bytes allocated in the heap
        30,160 bytes copied during GC
             1 MB total memory in use

    INIT   time   0.00s  ( 0.00s elapsed)
    MUT    time   0.07s  ( 0.08s elapsed)
    GC     time   0.00s  ( 0.00s elapsed)
    EXIT   time   0.00s  ( 0.00s elapsed)
    Total  time   0.07s  ( 0.08s elapsed)

    %GC    time   1.1%  (1.4% elapsed)

Time Profiling

Profiling results for the non-optimized program

    > ghc --make -RTS -rtsopts -prof -fforce-recomp Sum.hs
    > ./Sum 1e6 +RTS -p -K100M

    total time  = 0.24 secs
    total alloc = 124,080,472 bytes

    COST CENTRE   MODULE   %time   %alloc
    mysum         Main      52.7     74.1
    myfoldl.lgo   Main      43.6     25.8
    myfoldl       Main       3.7      0.0

Time Profiling

Profiling results for the optimized program

    > ghc --make -RTS -rtsopts -prof -fforce-recomp -O Sum.hs
    > ./Sum 1e6 +RTS -p -K100M

    total time  = 0.14 secs
    total alloc = 92,080,364 bytes

    COST CENTRE   MODULE   %time   %alloc
    mysum         Main      92.1     99.9
    myfoldl.lgo   Main       7.9      0.0

Memory Profiling

Profiling results for the non-optimized program

    > ghc --make -RTS -rtsopts -prof -fforce-recomp Sum.hs
    > ./Sum 1e6 +RTS -hy -p -K100M
    > hp2ps -e8in -c Sum.hp

[Figure: heap profile by type for "Sum 1e6 +RTS -p -hy -K100M" (3,127,720 bytes x seconds, Wed Dec 12 15:01 2012). Axes: bytes (0M to 18M) against seconds (0.0 to 0.2). Bands: Double, *, BLACKHOLE.]

Memory Profiling

Profiling results for the optimized program

    > ghc --make -RTS -rtsopts -prof -fforce-recomp -O Sum.hs
    > ./Sum 1e6 +RTS -hy -p -K100M
    > hp2ps -e8in -c Sum.hp

[Figure: heap profile by type for "Sum 1e6 +RTS -p -hy -K100M" (2,377 bytes x seconds, Wed Dec 12 15:02 2012). Axes: bytes (0k to 30k) against seconds (0.0 to 0.1). Bands: ARR_WORDS, [], MUT_ARR_PTRS_CLEAN, Handle__, MUT_VAR_CLEAN, ->[], Buffer, WEAK, IO, ForeignPtrContents, (,).]

The Problem

Too many allocations of Double objects.

The cause: too many thunks are allocated for lazily computed values.

    mysum :: Double -> Double
    mysum n = myfoldl (+) 0 [1..n]

    myfoldl :: (a -> b -> a) -> a -> [b] -> a
    myfoldl f z0 xs0 = lgo z0 xs0
      where
        lgo z []     = z
        lgo z (x:xs) = lgo (f z x) xs

In our example, the computation of Double values is delayed by the calls to lgo: each call passes the accumulator f z x as an unevaluated thunk.

Intermezzo

Call-by-Value: Arguments of a function call are fully evaluated before the invocation.

Call-by-Need: Arguments of a function call are not evaluated before the invocation. Instead, a pointer (thunk) to the code is created, and, once evaluated, the value is memoized.

Thunk (Urban Dictionary): To sneak up on someone and bean him with a heavy blow to the back of the head. “Jim got thunked going home last night. Serves him right for walking in a dark alley with all his paycheck in his pocket.”
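The difference between the two strategies can be illustrated with a minimal sketch. The names firstOf and twice are hypothetical and not from the talk. Sharing (memoization) is not observable from pure code, so the comment on twice describes the expected behaviour rather than something the snippet proves.

```haskell
-- Under call-by-need the second argument is passed as an unevaluated thunk.
-- firstOf never inspects it, so an error hidden inside is never triggered.
-- Under call-by-value the same call would fail before firstOf even starts.
firstOf :: a -> b -> a
firstOf x _ = x

-- 'expensive' is bound to a single thunk. The first use forces it; the
-- second use is expected to reuse the memoized result.
twice :: Int -> Int
twice n = let expensive = sum [1 .. n] in expensive + expensive
```

In GHCi, firstOf 1 (error "never forced") evaluates to 1 without raising the error.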

How to thunk a thunk

• Apply its delayed value as a function;
• Examine its value in a case-expression.

    case p of (a, b) -> f a b

p will be evaluated to weak-head normal form, which is sufficient to examine whether it is a pair. However, its components will remain unevaluated (i.e., thunks).

Remark: Only evaluation of boxed values can be delayed via thunks.
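A small sketch of evaluation to weak-head normal form. The function names are hypothetical. Matching on the pair constructor forces only the outermost constructor, so undefined components are harmless as long as nobody looks at them.

```haskell
-- Forces p to WHNF (enough to see the pair constructor) and nothing more.
pairShape :: (a, b) -> String
pairShape p = case p of
  (_, _) -> "a pair"

-- Forces p to WHNF and returns the first component, still as a thunk.
-- The second component is never touched.
firstComponent :: (a, b) -> a
firstComponent p = case p of
  (a, _) -> a
```

For instance, pairShape (undefined, undefined) returns "a pair", whereas pairShape undefined would fail, because the pair itself has to be evaluated to WHNF.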

Our Example from CBN’s Perspective

    mysum :: Double -> Double
    mysum n = myfoldl (+) 0 [1..n]

    myfoldl :: (a -> b -> a) -> a -> [b] -> a
    myfoldl f z0 xs0 = lgo z0 xs0
      where
        lgo z []     = z
        lgo z (x:xs) = lgo (f z x) xs

Evaluation trace (z -> e is read here as "z is a heap-allocated thunk for e", and !z as "force z"):

    mysum 3
    ⇒ myfoldl (+) 0 (1:2:3:[])
    ⇒ lgo z1 (1:2:3:[])      z1 -> 0
    ⇒ lgo z2 (2:3:[])        z2 -> 1 + !z1
    ⇒ lgo z3 (3:[])          z3 -> 2 + !z2
    ⇒ lgo z4 []              z4 -> 3 + !z3
    ⇒ !z4

Now GC can do the job...
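For contrast, the chain of thunks z1 ... z4 can be avoided by hand by forcing the accumulator at every step. This is a manual alternative using seq, not the transformation GHC applies automatically. The name myfoldlStrict is hypothetical.

```haskell
-- Same shape as myfoldl, but the new accumulator z' is forced to WHNF
-- before the recursive call, so no chain of suspended additions builds up.
myfoldlStrict :: (a -> b -> a) -> a -> [b] -> a
myfoldlStrict f z0 xs0 = lgo z0 xs0
  where
    lgo z []       = z
    lgo z (x : xs) = let z' = f z x in z' `seq` lgo z' xs
```

For Double, WHNF is the fully evaluated number, so forcing with seq is enough here. For accumulators with lazy fields (pairs, records) it would only evaluate the outermost constructor.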

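Getting Rid of Redundant Thunks

Obvious Solution: Replace CBN by CBV, so there is no need for thunks.

Obvious Problem: The semantics of a “lazy” program can change unpredictably.

    f x e = if x > 0 then x + 1 else e

    f 5 (error "Urk")

Under CBN the call returns 6, because e is never used; under CBV the argument error "Urk" is evaluated first and the call fails.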

Getting Rid of Redundant Thunks

Let’s reformulate: Replace CBN by CBV only for strict functions, i.e., those that always evaluate their argument to WHNF.

    f x e = if x > 0 then x + 1 else e

    f 5 (error "Urk")

• f is strict in x
• f is non-strict (lazy) in e

A Convenient Definition of Strictness

Definition: A function f of one argument is strict iff

    f undefined = undefined

Strictness is formulated similarly for functions of multiple arguments.

    f x e = if x > 0 then x + 1 else e

    f 5 (error "Urk")
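The definition can be probed experimentally with a rough sketch. The helper isBottom is hypothetical: it treats "forcing raises an exception" as a stand-in for undefined, so it cannot detect non-termination and is a test, not a proof of strictness. The type signature on f is also an assumption, since the slide leaves f untyped.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- The function from the slide, specialised to Int for this sketch.
f :: Int -> Int -> Int
f x e = if x > 0 then x + 1 else e

-- Hypothetical helper: True if forcing the value to WHNF throws.
isBottom :: a -> IO Bool
isBottom v = do
  r <- try (evaluate v)
  case r of
    Left e  -> const (pure True) (e :: SomeException)
    Right _ -> pure False
```

With these definitions, isBottom (f undefined 0) yields True (f is strict in x), while isBottom (f 5 undefined) yields False (f is lazy in e).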

Enforcing CBV for Function Calls

Worker/Wrapper Transformation: splitting a function into two parts

    f :: (Int, Int) -> Int
    f p = e

    ⇓

    f :: (Int, Int) -> Int
    f p = case p of (a, b) -> $wf a b

    $wf :: Int -> Int -> Int
    $wf a b = let p = (a, b) in e

• The worker ($wf) does all the work, but takes the components of the argument directly instead of the boxed pair;
• The wrapper (f) serves as an impedance matcher and is inlined at every call site.
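A hand-written instance of the split, as a sketch. The body fst p * 2 + snd p is an arbitrary stand-in for e, and the worker is called wf because $wf is a compiler-generated name that cannot be written in source Haskell.

```haskell
-- Original function; its right-hand side plays the role of e.
fOrig :: (Int, Int) -> Int
fOrig p = fst p * 2 + snd p

-- Wrapper: takes the pair apart and calls the worker.
f :: (Int, Int) -> Int
f p = case p of (a, b) -> wf a b
{-# INLINE f #-}

-- Worker: receives the components and rebuilds p so that e can be
-- reused unchanged.
wf :: Int -> Int -> Int
wf a b = let p = (a, b) in fst p * 2 + snd p
```

Both versions agree on every input, for example f (3, 4) and fOrig (3, 4) both give 10.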

Some Redundant Work Done?

    f :: (Int, Int) -> Int
    f p = case p of (a, b) -> $wf a b

    $wf :: Int -> Int -> Int
    $wf a b = let p = (a, b) in e

• f takes the pair apart and passes its components to $wf;
• $wf constructs the pair again.

Strictness to the Rescue

A strict function always examines its parameter. So, we just rely on a smart rewriter of case-expressions.

    f :: (Int, Int) -> Int
    f p = (case p of (a, b) -> a) + 1

    ⇓

    f :: (Int, Int) -> Int
    f p = case p of (a, b) -> $wf a

    $wf :: Int -> Int
    $wf a = let p = (a, error "Urk") in (case p of (a, b) -> a) + 1

Strictness to the Rescue

A strict function always examines its parameter. So, we just rely on a smart rewriter of case-expressions.

    f :: (Int, Int) -> Int
    f p = (case p of (a, b) -> a) + 1

    ⇓

    f :: (Int, Int) -> Int
    f p = case p of (a, b) -> $wf a

    $wf :: Int -> Int
    $wf a = a + 1
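The same example written out by hand, as a sketch. The names fOrig and wf are hypothetical stand-ins (wf for the compiler-generated $wf). Because the second component of the pair is never used, the worker does not take it at all.

```haskell
-- Original function: strict in p, uses only the first component.
fOrig :: (Int, Int) -> Int
fOrig p = (case p of (a, _) -> a) + 1

-- Wrapper: evaluates the pair and passes on only what the worker needs.
f :: (Int, Int) -> Int
f p = case p of (a, _) -> wf a
{-# INLINE f #-}

-- Worker after the case-of-known-constructor simplification: the pair
-- (a, error "Urk") is gone, and so is the case-expression over it.
wf :: Int -> Int
wf a = a + 1
```

Both f (1, undefined) and fOrig (1, undefined) evaluate to 2: the unused component is never forced in either version.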

Our Example

Step 1: Inline myfoldl

    mysum :: Double -> Double
    mysum n = myfoldl (+) 0 [1..n]

    myfoldl :: (a -> b -> a) -> a -> [b] -> a
    myfoldl f z0 xs0 = lgo z0 xs0
      where
        lgo z []     = z
        lgo z (x:xs) = lgo (f z x) xs

Our Example

Step 2: Analyze Strictness and Absence

    mysum :: Double -> Double
    mysum n = lgo 0 [1..n]
      where
        lgo :: Double -> [Double] -> Double
        lgo z []     = z
        lgo z (x:xs) = lgo (z + x) xs

Result: lgo is strict in both of its arguments.

Our Example

Step 3: Worker/Wrapper Split (before the split)

    mysum :: Double -> Double
    mysum n = lgo 0 [1..n]
      where
        lgo :: Double -> [Double] -> Double
        lgo z []     = z
        lgo z (x:xs) = lgo (z + x) xs

Our Example

Step 3: Worker/Wrapper Split (after the split)

    mysum :: Double -> Double
    mysum n = lgo 0 [1..n]
      where
        lgo :: Double -> [Double] -> Double
        lgo z xs = case z of D# d -> $wlgo d xs

        $wlgo :: Double# -> [Double] -> Double
        $wlgo d []     = D# d
        $wlgo d (x:xs) = lgo ((D# d) + x) xs

$wlgo takes an unboxed double (Double#) as its first argument.

Our Example

Step 4: Inline lgo in the Worker

    mysum :: Double -> Double
    mysum n = lgo 0 [1..n]
      where
        lgo :: Double -> [Double] -> Double
        lgo z xs = case z of D# d -> $wlgo d xs

        $wlgo :: Double# -> [Double] -> Double
        $wlgo d []     = D# d
        $wlgo d (x:xs) = lgo ((D# d) + x) xs
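What the worker might look like once the wrapper lgo has been inlined into it and the boxing has been simplified away can be written by hand with GHC's unboxed primitives. This is a sketch of the expected shape, not the Core that GHC actually emits. The worker is named wlgo because $wlgo is not a valid source identifier, and the use of +## from GHC.Exts is this sketch's choice for unboxed addition.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Double (D#), Double#, (+##))

mysum :: Double -> Double
mysum n = lgo 0 [1 .. n]
  where
    -- Wrapper: unboxes the accumulator once and hands over to the worker.
    lgo :: Double -> [Double] -> Double
    lgo z xs = case z of D# d -> wlgo d xs

-- Worker with lgo inlined: the recursive call goes straight back to the
-- worker, and the accumulator stays an unboxed Double# throughout, so no
-- thunk or box is allocated for it inside the loop.
wlgo :: Double# -> [Double] -> Double
wlgo d []       = D# d
wlgo d (x : xs) = case x of D# y -> wlgo (d +## y) xs
```

The result is boxed only once, in the base case, which is the kind of allocation behaviour the optimized profile above suggests.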
