

SLIDE 1

Fastest Lambda First

Neil Mitchell

www.cs.york.ac.uk/~ndm/

λ

SLIDE 2

The Problem

• Count the number of lines in a file

– “” = 0
– “test” = 1
– “test\n” = 1
– “test\ntest” = 2

• Read from the console

– Using getchar only
– No buffering

SLIDE 3

The Haskell

main = print . length . lines =<< getContents

• getContents :: IO String
• lines :: String → [String]
• length :: [a] → Int
• print :: Show a ⇒ a → IO ()

SLIDE 4

The C

#include <stdio.h>

int main() {
    int count = 0, last_newline = 1, c;
    /* A new line starts at the first character and after every '\n'. */
    while ((c = getchar()) != EOF) {
        if (last_newline) count++;
        last_newline = (c == '\n');
    }
    printf("%i\n", count);
    return 0;
}
/* Is this correct? */

Thanks to Andrew Wilkinson
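
One way to probe the "Is this correct?" question is to compare the loop with the one-line Haskell specification on the four cases from the problem slide. The sketch below is an illustration, not part of the talk: `countLinesSpec` and `countLinesLoop` are hypothetical names, and the fold only mirrors the state of the C loop (a counter and a "last character was a newline" flag) over a pure string rather than reading from the console.

```haskell
import Data.List (foldl')

-- The Haskell one-liner, made pure so it can be applied to test strings.
countLinesSpec :: String -> Int
countLinesSpec = length . lines

-- Mirror of the C loop: the state is (count, lastWasNewline).
-- A line is counted whenever a character follows a newline or the start of input.
countLinesLoop :: String -> Int
countLinesLoop = fst . foldl' step (0, True)
  where
    step (count, lastNewline) c =
      (if lastNewline then count + 1 else count, c == '\n')

main :: IO ()
main = mapM_ check ["", "test", "test\n", "test\ntest"]
  where
    -- Print each input with both counts side by side.
    check s = print (s, countLinesSpec s, countLinesLoop s)
```

On these four inputs both functions give 0, 1, 1 and 2, matching the problem statement.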

SLIDE 5

The Results

[Chart comparing C, Supero and GHC; axis ticks run from 1 to 10]

SLIDE 6

Disclaimer Slide

• Uses GHC as a backend

– GHC does some really cool optimisation
– Inlining, strictness, unboxing

• Only one benchmark presented

– Promising results on others, but not enough yet

SLIDE 7

Other Benchmarks

• Three results

– wc -c: 13% faster than GHC, 3% slower than C
– wc -l: 47% faster than GHC, 2% slower than C
– wc -w: 70% faster than GHC, 20% slower than C

• All very similar programs…

SLIDE 8

Overview

• Different approach
• First order code
• First order code without data
• Termination
• What could be improved
• Conclusion

SLIDE 9

Whole program analysis

• Look at all the code at once
• Done by a few compilers (MLton, JHC)
• Usually compilation is really slow
• Linking is whole-program
• Mine is quite quick

SLIDE 10

Bullets versus a nuclear bomb

• Most (all?) optimising compilers use “bullets”

– Small, targeted transformations
– Hit programs with a hail of bullets

• I use one single optimisation

– No issues of “enabling transformations”
– No optimisation “dials”
– No “swings and roundabouts”

SLIDE 11

Alpha Renaming

• Some optimisers rely on special names

– foldr/build
– stream/unstream

• Achieves good practical results

– Limits what can be optimised well
– Requires functions to be defined unnaturally
– They tend to go wrong (take in GHC 6.6)

SLIDE 12

First Order Haskell

• Remove all lambda abstractions (lambda lift)
• Leaving only partial application/currying

odd = (.) not even
(.) f g x = f (g x)

• Generate templates (specialised fragments)
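
As a small illustration of the lambda-lifting step, here is a sketch that is not taken from the talk. The function names `addAll`, `addN` and `addAllLifted` are hypothetical. The lambda's free variable becomes an extra parameter of a new top-level function, so only a partial application is left behind.

```haskell
-- Before lambda lifting: the lambda captures the free variable n.
addAll :: Int -> [Int] -> [Int]
addAll n xs = map (\x -> x + n) xs

-- After lambda lifting: the lambda is a top-level function that takes
-- its former free variable n as an explicit first argument.
addN :: Int -> Int -> Int
addN n x = x + n

-- Only a partial application (addN n) remains at the use site.
addAllLifted :: Int -> [Int] -> [Int]
addAllLifted n xs = map (addN n) xs

main :: IO ()
main = print (addAll 10 [1, 2, 3] == addAllLifted 10 [1, 2, 3])
```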

SLIDE 13

Oversaturation

f x y z, where arity(f) < 3

main = odd 12
<odd _> x = (.) not even x
main = <odd _> 12

SLIDE 14

Undersaturation

f x (g y) z, where arity(g) > 1

<odd _> x = (.) not even x
<(.) not even _> x = not (even x)
<odd _> x = <(.) not even _> x
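
The two template steps can be written out as ordinary Haskell to check that they preserve behaviour. This is only a sketch: template names such as `<odd _>` are not valid Haskell identifiers, so the names `odd0`, `oddT` and `composeNotEvenT` below are hypothetical stand-ins.

```haskell
-- The source definition; named odd0 to avoid clashing with Prelude.odd.
odd0 :: Int -> Bool
odd0 = (.) not even

-- Stand-in for the template <(.) not even _>: composition specialised to
-- the arguments not and even, with the body of (.) unfolded.
composeNotEvenT :: Int -> Bool
composeNotEvenT x = not (even x)

-- Stand-in for the template <odd _>: odd0 given its missing argument,
-- calling the specialised composition instead of the higher-order (.).
oddT :: Int -> Bool
oddT x = composeNotEvenT x

main :: IO ()
main = print (map odd0 [12, 13] == map oddT [12, 13])
```

After both steps no function is passed as an argument any more, which is the sense in which the fragment has become first order.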

SLIDE 15

Special Rules

let z = f x y, where arity(f) > 2 (let-under)

– inline z, after sharing x and y

d = Ctor (f x) y, where arity(f) > 1 (ctor-under)

– inline d
– The “dictionary” rule

SLIDE 16

Standard Rules

• let x = (let y = z in q) in … (let/let)
• case (let x = y in z) of … (case/let)
• case (case x of …) of … (case/case)
• (case x of …) y z (app/case)
• case C x of … (case/ctor)
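
A few of these rules can be illustrated with before/after pairs in plain Haskell. The examples below are assumptions chosen for illustration, not taken from the talk, and the function names are hypothetical. Each pair should compute the same result.

```haskell
-- case/ctor: the scrutinee is a known constructor, so the case disappears.
caseCtorBefore, caseCtorAfter :: Int -> Int
caseCtorBefore x = case Just x of
  Nothing -> 0
  Just y  -> y + 1
caseCtorAfter x = x + 1

-- case/case: the outer case is pushed into each branch of the inner case;
-- every branch is then a case/ctor redex and simplifies away.
caseCaseBefore, caseCaseAfter :: Bool -> Int
caseCaseBefore b = case (case b of { True -> Just 1; False -> Nothing }) of
  Nothing -> 0
  Just y  -> y + 1
caseCaseAfter b = case b of
  True  -> 2
  False -> 0

-- let/let: the nested let is flattened into a single binding group.
letLetBefore, letLetAfter :: Int -> Int
letLetBefore z = let x = (let y = z * 2 in y + 1) in x + x
letLetAfter z =
  let y = z * 2
      x = y + 1
  in x + x

main :: IO ()
main = print
  ( map caseCtorBefore [0, 5] == map caseCtorAfter [0, 5]
  , map caseCaseBefore [True, False] == map caseCaseAfter [True, False]
  , map letLetBefore [0, 5] == map letLetAfter [0, 5]
  )
```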

SLIDE 17

Removing functions

[Diagram: a closure, \x → head x, meets an application, f x, leaving head x]

SLIDE 18

Removing data

[Diagram: production, x : xs, meets consumption, case x of …, leaving …]

SLIDE 19

Church Encoding

data List a = Nil | Cons a (List a)

len x = case x of
    Nil → 0
    Cons y ys → 1 + len ys

nil = \n c → n
cons x xs = \n c → c x xs

len x = x 0 (\y ys → 1 + len ys)

Efficient Interpretation by Transforming Data Types and Patterns to Functions, TFP 2006
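
The encoding on this slide can be typed in GHC with a rank-2 field. The following is a minimal sketch under assumptions: the `List` newtype, its field `unList`, and the helper `fromList` are hypothetical names introduced here, and the `RankNTypes` extension is needed. Because the cons continuation receives the unprocessed tail, this style is often called Scott encoding in the literature, while the slides use the term Church encoding.

```haskell
{-# LANGUAGE RankNTypes #-}

-- A list is a function that takes one continuation per constructor.
newtype List a = List { unList :: forall r. r -> (a -> List a -> r) -> r }

-- nil selects the first continuation.
nil :: List a
nil = List (\n _ -> n)

-- cons selects the second continuation and passes it the head and tail.
cons :: a -> List a -> List a
cons x xs = List (\_ c -> c x xs)

-- len no longer uses case: it applies the list to its two branches.
len :: List a -> Int
len xs = unList xs 0 (\_ ys -> 1 + len ys)

-- Helper (hypothetical) to build an encoded list from an ordinary one.
fromList :: [a] -> List a
fromList = foldr cons nil

main :: IO ()
main = print (len (fromList "abc"))
```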

SLIDE 20

Optimisation Algorithm

1. Remove higher-order functions
2. Church encode
3. Remove higher-order functions

SLIDE 21

Proof: It doesn’t work

• A program has no data, and no functions
• Implies it’s not Turing complete!
• Linear Bounded Turing Machine
• Therefore, removing HO cannot be perfect

SLIDE 22

Failing Example

import Data.Char (chr, ord)

showPosInt x = f x ""
f 0 acc = acc
f i acc = f (i `div` 10) (c : acc)
    where c = chr (ord '0' + i `mod` 10)

• Requires a buffer of size O(log10 n)
• Cannot be removed automatically

SLIDE 23

Failing pleasantly

• Keep running
• At some point, stop

– 1000 new functions created
– 100 based on a particular function
– Some particular name recurring

• Leaves higher-order functions around

SLIDE 24

Failing Church Encoding

• Church encoding requires rank-2 types

– Cannot be inferred automatically
– Makes some things more complex

• Why not merely “pretend” to Church encode?

– Failure is now left-over data
– Much more pleasant

Thanks to Tom Shackell

Pretend we are Church encoding

SLIDE 25

Summing the Integers

main n = sum (range 0 n)

sum xs = case xs of
    [] → 0
    (y:ys) → y + sum ys

range i n = if i > n then [] else i : range (i+1) n

SLIDE 26

Undersaturation of Data

• A constructor is higher-order

main n = sum (range 0 n)
<sum (range#2)> i n = case range i n of …
main n = <sum (range#2)> 0 n

SLIDE 27

Oversaturation of Data

• A case is an application

case range i n of {[] → 0; (y:ys) → y + sum ys}

<case range#2 {[] → 0; (y:ys) → y+sum ys}> i n =
    if i > n then 0 else i + sum (range (i+1) n)

SLIDE 28

Final Result

main n = sum' 0 n
sum' i n = range' i n
range' i n = if i > n then 0 else i + sum' (i+1) n

• All constructors have disappeared
• First-order with Church encoding
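
To see that the transformed program agrees with the original, both can be written side by side and compared. This is a sketch for illustration: `mainOrig`, `mainOpt` and `sumL` are hypothetical names (`sumL` avoids a clash with the Prelude's `sum`), and the `Int` type signatures are an assumption.

```haskell
-- Original program: range builds a list, sumL consumes it.
mainOrig :: Int -> Int
mainOrig n = sumL (range 0 n)

sumL :: [Int] -> Int
sumL xs = case xs of
  []     -> 0
  (y:ys) -> y + sumL ys

range :: Int -> Int -> [Int]
range i n = if i > n then [] else i : range (i + 1) n

-- Transformed program from the slide: no list cells are constructed.
mainOpt :: Int -> Int
mainOpt n = sum' 0 n

sum' :: Int -> Int -> Int
sum' i n = range' i n

range' :: Int -> Int -> Int
range' i n = if i > n then 0 else i + sum' (i + 1) n

main :: IO ()
main = print (all (\n -> mainOrig n == mainOpt n) [-1 .. 100])
```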

SLIDE 29

Special Cases

let x = C y z

– inline x, after sharing y and z

let x = f y z, where f produces data

– inlining may break sharing
– only if one use of x

SLIDE 30

What isn’t Optimised?

• This optimisation does a lot
• But doesn’t always produce optimal code
• What can we do better?

– Ignore “better algorithms”

SLIDE 31

Call overhead

f1 x y = f2 x y
f2 x y = f3 y x
f3 y x = g x + y

• My optimisation gives loads of these!

GHC is very good at this
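
Inlining the chain by hand shows what is left once the call overhead is gone. This is a sketch: the definition of `g` is a hypothetical stand-in (the slide does not define it), and `f1Inlined` is a name introduced here.

```haskell
-- Hypothetical stand-in for g; the slide leaves it undefined.
g :: Int -> Int
g x = x * 2

-- The chain of wrappers from the slide.
f1, f2, f3 :: Int -> Int -> Int
f1 x y = f2 x y
f2 x y = f3 y x
f3 y x = g x + y

-- After inlining f2 and f3 into f1, the two argument swaps cancel out.
f1Inlined :: Int -> Int -> Int
f1Inlined x y = g x + y

main :: IO ()
main = print (f1 3 4 == f1Inlined 3 4)
```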

SLIDE 32

Strictness/Boxing

• Lazy evaluation requires “thunks”
• Strictness avoids these thunks
• An Int is a box stored on the heap
• An Int# is more like a C int

Again, GHC is good at this
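
The difference can be made visible by writing the strict and unboxed versions by hand. The sketch below assumes GHC with the `BangPatterns` and `MagicHash` extensions; `sumLazy`, `sumStrict` and `plusUnboxed` are hypothetical names. With optimisation enabled, GHC would normally derive something like the strict version from the lazy one on its own.

```haskell
{-# LANGUAGE BangPatterns, MagicHash #-}
import GHC.Exts (Int (I#), (+#))

-- Lazy accumulator: without optimisation, acc may grow into a chain of thunks.
sumLazy :: [Int] -> Int
sumLazy = go 0
  where
    go acc []     = acc
    go acc (x:xs) = go (acc + x) xs

-- Strict accumulator: the bang pattern forces acc at every step.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go !acc []     = acc
    go !acc (x:xs) = go (acc + x) xs

-- Unboxed addition: I# is the box around a raw machine integer (Int#).
plusUnboxed :: Int -> Int -> Int
plusUnboxed (I# a) (I# b) = I# (a +# b)

main :: IO ()
main = print (sumLazy [1 .. 100], sumStrict [1 .. 100], plusUnboxed 2 3)
```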

SLIDE 33

Sharing/lets

g (f x) (f x) ⇒ let y = f x in g y y

• Common subexpression

map (g 100) ys
g x y = f x + y

• Strength reduction

Can cause space leaks
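
The second example can be spelled out as a before/after pair. This is a sketch under assumptions: `f` is a hypothetical stand-in for an expensive function, and `before` and `after` are names introduced here. Sharing `f 100` saves recomputation, but the shared value stays live for as long as the mapped list is being consumed, which is how this kind of transformation can lead to a space leak when the shared value is large.

```haskell
-- Hypothetical stand-in for an expensive function.
f :: Int -> Int
f x = x * x

g :: Int -> Int -> Int
g x y = f x + y

-- As written: each element of ys may trigger a fresh evaluation of f 100.
before :: [Int] -> [Int]
before ys = map (g 100) ys

-- After inlining g and introducing a let: f 100 is evaluated at most once
-- and shared by every element.
after :: [Int] -> [Int]
after ys = let fx = f 100 in map (\y -> fx + y) ys

main :: IO ()
main = print (before [1, 2, 3] == after [1, 2, 3])
```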

SLIDE 34

Constant movement

countLines xs = count '\n' xs

count n [] = 0
count n (x:xs)
    | n == x = 1 + count n xs
    | otherwise = count n xs

• This one remains in the linecount example
• Should make the Haskell faster
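
Moving the constant into the function would give a version specialised to the newline character. The sketch below is an illustration, not the talk's output: it writes `count` with a base case for the empty list (an assumption), and `countNewlines` is a hypothetical name for the specialised function.

```haskell
-- General version: the character n is passed unchanged on every call.
count :: Char -> String -> Int
count _ [] = 0
count n (x:xs)
  | n == x    = 1 + count n xs
  | otherwise = count n xs

countLines :: String -> Int
countLines xs = count '\n' xs

-- Specialised version: the constant '\n' has moved into the body,
-- so the recursion carries one argument fewer.
countNewlines :: String -> Int
countNewlines [] = 0
countNewlines (x:xs)
  | x == '\n' = 1 + countNewlines xs
  | otherwise = countNewlines xs

main :: IO ()
main = print (countLines "a\nb\nc" == countNewlines "a\nb\nc")
```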

SLIDE 35

Can Haskell beat C?

• A question of abstraction

– In C, abstraction is painful
– For linecount, not worth it

• Haskell can remove abstraction better than C

– Won’t win on micro-benchmarks (may draw)
– May win on real programs

SLIDE 36

Faster than C

print . sum . map readInt . lines =<< getContents

readInt :: String → Int

• Haskell can optimise sum/readInt
• C can’t optimise between them
• NB. Not actually tried, yet…

http://shootout.alioth.debian.org/
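
A runnable form of this pipeline might look like the sketch below. The slide gives only the type of `readInt`, so defining it as `read` is an assumption, and `sumLines` is a hypothetical name for the pure part of the program. No performance claim is attached to this version.

```haskell
-- Assumed definition: the slide only states the type of readInt.
readInt :: String -> Int
readInt = read

-- The pure part of the pipeline: one integer per line, summed.
sumLines :: String -> Int
sumLines = sum . map readInt . lines

main :: IO ()
main = print . sumLines =<< getContents
```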

SLIDE 37

More Benchmarks

• Needs refactoring

– Some transformations in Yhc.Core
– Some in the optimiser
– Don’t glue together nicely

• GHC sometimes “over-optimises”

– Turns getchar into a constant!
– Need to integrate with GHC’s IO Monad

SLIDE 38

Conclusion

• Haskell can be made faster

– Nearly the speed of C (sometimes)
– But always more beautiful

• You can’t draw conclusions from small benchmarks