Yhc: The York Haskell Compiler, by Tom Shackell



slide-1
SLIDE 1

Yhc: The York Haskell Compiler

By Tom Shackell

slide-2
SLIDE 2

What?

  • Yhc is a rewrite of the back end of the nhc98 system.

  • The back-end of the compiler is replaced.
  • The runtime system is replaced.
  • The instruction set is different.
  • The Prelude is heavily modified.
slide-3
SLIDE 3

Why?

  • It was written to address some issues with the nhc98 back end.
  • In particular: The high bit problem.
  • Also as an experiment: Can we make nhc98 more portable?

slide-4
SLIDE 4

The High Bit Problem

slide-5
SLIDE 5

Graph Reduction

  • Lazy functional languages are usually implemented using graph reduction.
  • Haskell expressions are represented by graphs.
  • The expression 'sum [1,2]' might be represented by the graph:

[Diagram: sum applied to the list graph 1 : 2 : [ ]]

sum :: [Int] -> Int
sum [] = 0
sum (x:xs) = x + sum xs

slide-6
SLIDE 6

Reduction

sum : 1 : 2 [ ]

slide-7
SLIDE 7

Reduction

sum : 1 : 2 [ ]

slide-8
SLIDE 8

Reduction

sum : 1 : 2 [ ] 3

slide-9
SLIDE 9

Reduction

IND 3
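The reduction steps on the slides above can be modelled in a few lines of Haskell. This is only a minimal sketch: the Node type, the association-list Heap and every helper name are assumptions made for illustration, not the representation used by nhc or Yhc. It blackholes the sum thunk, computes the result, and then overwrites the thunk with an indirection (IND) to that result.

```haskell
-- A toy heap: addresses map to graph nodes (hypothetical model, not the real layout).
type Addr = Int

data Node
  = IntNode Int      -- an evaluated integer
  | Nil              -- the [] constructor
  | Cons Addr Addr   -- the (:) constructor: addresses of head and tail
  | SumThunk Addr    -- an unevaluated 'sum xs'
  | Blackhole        -- a thunk that is currently being evaluated
  | Ind Addr         -- indirection to the node holding the result
  deriving (Eq, Show)

type Heap = [(Addr, Node)]

-- Overwrite (or add) the node stored at an address.
update :: Addr -> Node -> Heap -> Heap
update a n h = (a, n) : filter ((/= a) . fst) h

-- Follow indirections until a non-IND node is reached.
deref :: Heap -> Addr -> Node
deref h a = case lookup a h of
    Just (Ind b) -> deref h b
    Just n       -> n
    Nothing      -> error ("dangling address " ++ show a)

-- Sum the list that starts at the given address.
sumList :: Heap -> Addr -> Int
sumList h a = case deref h a of
    Nil       -> 0
    Cons x xs -> intAt x + sumList h xs
    other     -> error ("not a list: " ++ show other)
  where
    intAt x = case deref h x of
        IntNode n -> n
        other     -> error ("not an Int: " ++ show other)

-- Reduce the thunk at an address: blackhole it, compute, then replace it by an IND.
reduce :: Addr -> Heap -> Heap
reduce a h = case lookup a h of
    Just (SumThunk xs) ->
      let h1     = update a Blackhole h
          result = sumList h1 xs
          fresh  = 1 + maximum (map fst h1)   -- next unused address
          h2     = update fresh (IntNode result) h1
      in  update a (Ind fresh) h2
    _ -> h                                    -- nothing to reduce

-- The graph for 'sum [1,2]'.
example :: Heap
example =
  [ (0, SumThunk 1)
  , (1, Cons 2 3), (2, IntNode 1)
  , (3, Cons 4 5), (4, IntNode 2)
  , (5, Nil)
  ]
```

In GHCi, `deref (reduce 0 example) 0` follows the new indirection and yields `IntNode 3`, matching the final slide of the sequence.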

slide-10
SLIDE 10

Heap Node

We can see there are 4 types of graph node:

  • Constructor (the ':' nodes)
  • Thunk (the 'sum' node)
  • Blackholed Thunk (the 'sum' node while it is being evaluated)
  • Indirection (the IND node)

In nhc and Yhc these graph nodes are represented with 4 types of heap node.

slide-11
SLIDE 11

Heap Nodes in nhc

Constructor:       Constructor Information, tag bits 10
Thunk:             Function Information Pointer, tag bit 1
Blackholed Thunk:  Function Information Pointer, tag bits 1 1
Indirection:       Redirection Pointer, tag bits 00

slide-12
SLIDE 12

The “High Bit” problem

Constructor:       Constructor Information, tag bits 10
Thunk:             Function Information Pointer, tag bit 1
Blackholed Thunk:  Function Information Pointer, tag bits 1 1
Indirection:       Redirection Pointer, tag bits 00

  • nhc assumes that it can use the topmost bit of a pointer to store information.
  • This is not always the case: many modern Linux-x86 kernels allocate memory at addresses too high to fit in 31 bits.
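A small sketch of why the topmost bit is a problem, assuming 32-bit heap words and a hypothetical scheme that stores a flag in bit 31. The function names and the two sample addresses are invented for illustration. A pointer that already has bit 31 set cannot be recovered once the flag bit is cleared.

```haskell
import Data.Bits (clearBit, setBit)
import Data.Word (Word32)

-- Hypothetical scheme: bit 31 of a heap word is used as a flag,
-- so only the low 31 bits are left to hold the pointer itself.
tagHigh :: Word32 -> Word32
tagHigh ptr = setBit ptr 31

-- Recover the pointer by clearing the flag bit.
untagHigh :: Word32 -> Word32
untagHigh w = clearBit w 31

-- A pointer survives the round trip only if it fits in 31 bits.
survivesTagging :: Word32 -> Bool
survivesTagging ptr = untagHigh (tagHigh ptr) == ptr

-- Two made-up addresses: one below 2^31, one above it.
lowAddress, highAddress :: Word32
lowAddress  = 0x08049000
highAddress = 0xB7E00000
```

Here `survivesTagging lowAddress` is `True`, while `survivesTagging highAddress` is `False`, because clearing bit 31 turns the high address into a different one.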

slide-13
SLIDE 13

Heap Nodes in Yhc

Constructor:       Constructor Information Pointer, tag bits 01
Thunk:             Function Information Pointer, tag bit 1
Blackholed Thunk:  Function Information Pointer, tag bits 1 1
Indirection:       Redirection Pointer, tag bits 00

  • Yhc makes sure that all FInfo structures are 4-byte aligned, freeing up a bit at the bottom for Thunk nodes.
  • It also represents constructors by using a pointer to the information about the constructor, rather than encoding the information into the heap word.
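The low-bit approach can be sketched as follows, assuming 32-bit words. The tag values 00 (indirection) and 01 (constructor) follow the diagram; the values used here for Thunk and Blackholed Thunk, and all function names, are assumptions for illustration. Because a 4-byte-aligned pointer always has zeros in its two lowest bits, the tag can be stored there and removed again without losing any address bits, however high the address is.

```haskell
import Data.Bits (complement, (.&.), (.|.))
import Data.Word (Word32)

data NodeKind = Indirection | Constructor | Thunk | BlackholedThunk
  deriving (Eq, Show)

-- Tag values: 0 and 1 follow the slide; 2 and 3 are assumed for this sketch.
tagOf :: NodeKind -> Word32
tagOf Indirection     = 0
tagOf Constructor     = 1
tagOf Thunk           = 2
tagOf BlackholedThunk = 3

-- The two lowest bits hold the tag.
tagMask :: Word32
tagMask = 3

-- Combine a 4-byte-aligned pointer with a tag; reject misaligned pointers.
tagWord :: NodeKind -> Word32 -> Maybe Word32
tagWord kind ptr
  | ptr .&. tagMask /= 0 = Nothing
  | otherwise            = Just (ptr .|. tagOf kind)

-- Split a heap word back into its kind and the original pointer.
untagWord :: Word32 -> (NodeKind, Word32)
untagWord w = (kind, w .&. complement tagMask)
  where
    kind = case w .&. tagMask of
        0 -> Indirection
        1 -> Constructor
        2 -> Thunk
        _ -> BlackholedThunk
```

For example, `tagWord Thunk 0xB7E00000` gives `Just 0xB7E00002`, and `untagWord 0xB7E00002` returns `(Thunk, 0xB7E00000)`, so an address above 2^31 round-trips intact.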

slide-14
SLIDE 14

Instruction Sets

  • The instruction set for Yhc is much simpler than for nhc.
  • Both are based on stack machines.
  • However, nhc has instructions for directly manipulating both the heap and the stack.
  • Yhc, by contrast, only directly manipulates the stack.

slide-15
SLIDE 15

Instructions

main :: IO ()
main = putStrLn (show 42)

nhc instructions

main():
    HEAP_CVAL show
    HEAP_INT 42
    PUSH_HEAP
    HEAP_CVAL putStrLn
    HEAP_OFF -3
    RETURN_EVAL

Yhc instructions

main():
    PUSH_INT 42
    MK_AP show
    MK_AP putStrLn
    RETURN_EVAL

slide-16
SLIDE 16

Constants Stack Heap nhc instructions

main():
    HEAP_CVAL show
    HEAP_INT 42
    PUSH_HEAP
    HEAP_CVAL putStrLn
    HEAP_OFF -3
    RETURN_EVAL

slide-17
SLIDE 17

Stack Heap nhc instructions

main():
    HEAP_CVAL show
    HEAP_INT 42
    PUSH_HEAP
    HEAP_CVAL putStrLn
    HEAP_OFF -3
    RETURN_EVAL

show Constants

slide-18
SLIDE 18

Constants Stack Heap nhc instructions

main():
    HEAP_CVAL show
    HEAP_INT 42
    PUSH_HEAP
    HEAP_CVAL putStrLn
    HEAP_OFF -3
    RETURN_EVAL

show 42

slide-19
SLIDE 19

Constants Stack Heap nhc instructions

main():
    HEAP_CVAL show
    HEAP_INT 42
    PUSH_HEAP
    HEAP_CVAL putStrLn
    HEAP_OFF -3
    RETURN_EVAL

show 42

slide-20
SLIDE 20

Constants Stack Heap nhc instructions

main():
    HEAP_CVAL show
    HEAP_INT 42
    PUSH_HEAP
    HEAP_CVAL putStrLn
    HEAP_OFF -3
    RETURN_EVAL

show 42 putStrLn

slide-21
SLIDE 21

Constants Stack Heap nhc instructions

main():
    HEAP_CVAL show
    HEAP_INT 42
    PUSH_HEAP
    HEAP_CVAL putStrLn
    HEAP_OFF -3
    RETURN_EVAL

show 42 putStrLn

slide-22
SLIDE 22

Constants Stack Heap nhc instructions

main():
    HEAP_CVAL show
    HEAP_INT 42
    PUSH_HEAP
    HEAP_CVAL putStrLn
    HEAP_OFF -3
    RETURN_EVAL

show 42 putStrLn
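The nhc walkthrough above can be approximated with a tiny interpreter. The instruction meanings here are assumptions inferred from the slides, not taken from nhc itself: HEAP_CVAL and HEAP_INT are taken to append a word to the heap, PUSH_HEAP to push the current heap pointer onto the stack, and HEAP_OFF k to append a pointer to the word k cells away from the current heap pointer. RETURN_EVAL is left as a no-op because evaluation is not modelled. The Cell, NhcInstr and Nhc types are hypothetical.

```haskell
-- One heap word in this hypothetical model.
data Cell
  = Fun String   -- reference to a function constant such as show
  | Lit Int      -- an integer literal
  | Ptr Int      -- pointer to another heap word, by index
  deriving (Eq, Show)

data NhcInstr
  = HEAP_CVAL String
  | HEAP_INT Int
  | PUSH_HEAP
  | HEAP_OFF Int
  | RETURN_EVAL
  deriving (Eq, Show)

-- Machine state: heap words in allocation order, and a stack of heap indices.
data Nhc = Nhc { heap :: [Cell], stack :: [Int] }
  deriving (Eq, Show)

-- Execute one instruction; 'hp' is the index of the next free heap word.
stepNhc :: Nhc -> NhcInstr -> Nhc
stepNhc m instr = case instr of
    HEAP_CVAL f -> m { heap = heap m ++ [Fun f] }
    HEAP_INT n  -> m { heap = heap m ++ [Lit n] }
    PUSH_HEAP   -> m { stack = hp : stack m }
    HEAP_OFF k  -> m { heap = heap m ++ [Ptr (hp + k)] }
    RETURN_EVAL -> m   -- evaluation itself is not modelled
  where
    hp = length (heap m)

runNhc :: [NhcInstr] -> Nhc
runNhc = foldl stepNhc (Nhc [] [])

-- The nhc listing for main from the slides.
nhcMain :: [NhcInstr]
nhcMain =
  [ HEAP_CVAL "show", HEAP_INT 42, PUSH_HEAP
  , HEAP_CVAL "putStrLn", HEAP_OFF (-3), RETURN_EVAL ]
```

Under these assumptions `runNhc nhcMain` ends with the heap `[Fun "show", Lit 42, Fun "putStrLn", Ptr 0]` and the stack `[2]`: the explicit offset -3 is what links the putStrLn node back to the show node.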

slide-23
SLIDE 23

Stack Heap Yhc instructions

main():
    PUSH_INT 42
    MK_AP show
    MK_AP putStrLn
    RETURN_EVAL

slide-24
SLIDE 24

Stack Heap Yhc instructions

main():
    PUSH_INT 42
    MK_AP show
    MK_AP putStrLn
    RETURN_EVAL

42

slide-25
SLIDE 25

Stack Heap Yhc instructions

main():
    PUSH_INT 42
    MK_AP show
    MK_AP putStrLn
    RETURN_EVAL

42 show

slide-26
SLIDE 26

Stack Heap Yhc instructions

main():
    PUSH_INT 42
    MK_AP show
    MK_AP putStrLn
    RETURN_EVAL

42 show putStrLn

slide-27
SLIDE 27

Stack Heap Yhc instructions

main():
    PUSH_INT 42
    MK_AP show
    MK_AP putStrLn
    RETURN_EVAL

42 show putStrLn
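The Yhc walkthrough above can be sketched in the same spirit. The semantics are assumptions inferred from the slides: PUSH_INT is taken to allocate an integer node and push its address, and MK_AP f to pop one argument (both functions here take a single argument), allocate an application node for f, and push its address. RETURN_EVAL is a no-op because evaluation is not modelled. The Node, YhcInstr and Yhc types are hypothetical.

```haskell
-- Heap nodes built by this hypothetical model.
data Node
  = IntNode Int       -- an integer literal
  | Ap String [Int]   -- a named function applied to heap addresses
  deriving (Eq, Show)

data YhcInstr
  = PUSH_INT Int
  | MK_AP String
  | RETURN_EVAL
  deriving (Eq, Show)

-- Machine state: heap nodes in allocation order, and a stack of heap addresses.
data Yhc = Yhc { heap :: [Node], stack :: [Int] }
  deriving (Eq, Show)

-- Execute one instruction. Every instruction works through the stack;
-- heap addresses never appear as operands.
stepYhc :: Yhc -> YhcInstr -> Yhc
stepYhc m instr = case instr of
    PUSH_INT n  -> alloc (IntNode n) (stack m)
    MK_AP f     -> case stack m of
        arg : rest -> alloc (Ap f [arg]) rest
        []         -> error "MK_AP: empty stack"
    RETURN_EVAL -> m   -- evaluation itself is not modelled
  where
    -- Append a node to the heap and push its address on the given stack.
    alloc node rest = Yhc (heap m ++ [node]) (length (heap m) : rest)

runYhc :: [YhcInstr] -> Yhc
runYhc = foldl stepYhc (Yhc [] [])

-- The Yhc listing for main from the slides.
yhcMain :: [YhcInstr]
yhcMain = [PUSH_INT 42, MK_AP "show", MK_AP "putStrLn", RETURN_EVAL]
```

Under these assumptions `runYhc yhcMain` ends with the heap `[IntNode 42, Ap "show" [0], Ap "putStrLn" [1]]` and the stack `[2]`, built with four instructions and no explicit heap offsets.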

slide-28
SLIDE 28

Comparison

  • Yhc uses fewer instructions to do the same thing.
  • Because it doesn't need explicit movements between heap and stack.
  • ... and because it can reference other nodes implicitly rather than using explicit heap offsets.
  • Yhc instructions are also smaller.
  • Because it has more 'specializations'.
  • ... and again, because heap references are implicit.
  • These two factors make Yhc about 20% faster than nhc.

slide-29
SLIDE 29

Improving Portability

slide-30
SLIDE 30

Bytecode in nhc

  • nhc compiles Haskell functions into a bytecode for an abstract machine that manipulates graphs: The G-Machine.
  • The bytecode is placed in a C source file, using an array of bytes. The C source file is then compiled and linked with the nhc interpreter to form an executable.

/* Bytecode for Prelude.sum ("_46" encodes the '.') */
unsigned char FN_Prelude_46sum[] = {
    NEEDHEAP_I32, HEAP_CVAL_I3, HEAP_ARG, 1,
    HEAP_CVAL_I4, HEAP_ARG, 1, HEAP_CVAL_I5,
    HEAP_OFF_N1, 3, HEAP_CADR_N1, 1,
    PUSH_HEAP, HEAP_CVAL_P1, 6, HEAP_OFF_N1, 8,
    HEAP_OFF_N1, 5, RETURN, ENDCODE
};

slide-31
SLIDE 31

Portable?

  • The C code is portable, isn't it?
  • Yes, but:
  • It creates a dependency on a C compiler.
  • There are issues with the nuances of various C compilers.

  • The bytecode can't be loaded dynamically.
slide-32
SLIDE 32

Improved Portability.

  • Yhc also compiles Haskell functions into bytecode instructions for a G-Machine.
  • However, Yhc places the bytecodes in a separate file which is then loaded by the interpreter at runtime. Similar to Java's classfile system.
  • More portable, but it means Yhc has to do its own linking.
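The idea of keeping bytecode as plain bytes that the interpreter decodes at runtime can be sketched as below. The opcode numbers, the one-byte operands and the Instr type are all invented for illustration and are not Yhc's actual file format; reading and writing the file itself is left out. The operand of MK_AP stands for an index into a hypothetical constant table.

```haskell
import Data.Word (Word8)

-- A cut-down instruction set with one-byte operands (hypothetical).
data Instr
  = PUSH_INT Word8   -- push a small integer
  | MK_AP Word8      -- apply the function at this constant-table index
  | RETURN_EVAL
  deriving (Eq, Show)

-- Turn instructions into the bytes that would be written to the bytecode file.
encode :: [Instr] -> [Word8]
encode = concatMap enc
  where
    enc (PUSH_INT n) = [0x01, n]
    enc (MK_AP c)    = [0x02, c]
    enc RETURN_EVAL  = [0x03]

-- Decode bytes loaded at runtime; Nothing signals a malformed file.
decode :: [Word8] -> Maybe [Instr]
decode []              = Just []
decode (0x01 : n : bs) = (PUSH_INT n :) <$> decode bs
decode (0x02 : c : bs) = (MK_AP c :) <$> decode bs
decode (0x03 : bs)     = (RETURN_EVAL :) <$> decode bs
decode _               = Nothing
```

With this encoding, `encode [PUSH_INT 42, MK_AP 0, MK_AP 1, RETURN_EVAL]` is `[1,42,2,0,2,1,3]`, and `decode` of those bytes returns the original list. Since nothing here goes through a C compiler, the same bytes could be loaded on any machine that has the interpreter.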

slide-33
SLIDE 33

More Portable Still?

  • Can we extend portability to include portability over a network?
  • Then we could take a closure on one machine and have it run on another machine.

  • Not implemented yet, but some interesting ideas.
slide-34
SLIDE 34

Computer A Computer B calc data

slide-35
SLIDE 35

Computer A Computer B calc data calc data

slide-36
SLIDE 36

Computer A Computer B calc data calc data

slide-37
SLIDE 37

Computer A Computer B calc data

slide-38
SLIDE 38

Computer A Computer B calc data

slide-39
SLIDE 39

Computer A Computer B calc data

slide-40
SLIDE 40

Computer A Computer B calc data Need calc

slide-41
SLIDE 41

Computer A Computer B calc data Need calc

slide-42
SLIDE 42

Computer A Computer B calc data Need calc

slide-43
SLIDE 43

Computer A Computer B calc data Need calc calc

calc(x):
    PUSH_ARG x
    PUSH_CONST subcalc
    MK_AP iter
    RETURN_EVAL

slide-44
SLIDE 44

Computer A Computer B calc data calc

calc(x):
    PUSH_ARG x
    PUSH_CONST subcalc
    MK_AP iter
    RETURN_EVAL

slide-45
SLIDE 45

Computer A Computer B calc data calc

calc(x):
    PUSH_ARG x
    PUSH_CONST subcalc
    MK_AP iter
    RETURN_EVAL

slide-46
SLIDE 46

Computer A Computer B calc data calc

calc(x):
    PUSH_ARG x
    PUSH_CONST subcalc
    MK_AP iter
    RETURN_EVAL

slide-47
SLIDE 47

Computer A Computer B calc data calc

calc(x):
    PUSH_ARG x
    PUSH_CONST subcalc
    MK_AP iter
    RETURN_EVAL

iter subcalc

slide-48
SLIDE 48

Computer A Computer B IND data iter subcalc

slide-49
SLIDE 49

Computer A Computer B IND data iter subcalc Need iter

slide-50
SLIDE 50

Computer A Computer B IND data iter subcalc And so on ...

slide-51
SLIDE 51

Computer A Computer B IND 42 IND

slide-52
SLIDE 52

Computer A Computer B IND 42 IND Result

slide-53
SLIDE 53

Computer A Computer B 42 Result

slide-54
SLIDE 54

Computer A Computer B 42 Result

slide-55
SLIDE 55

Computer A Computer B 42 Result

slide-56
SLIDE 56

Computer A Computer B 42 Result calc data

slide-57
SLIDE 57

Computer A Computer B 42 Result IND
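The "Need calc", "Need iter", "and so on" exchange in the slides above can be sketched as a code table that is filled in from a peer when a function is missing. This is not implemented in Yhc, so everything here is hypothetical: the CodeTable type, the textual instruction format, and the placeholder bodies for iter and subcalc. As a simplification, the sketch fetches every referenced function up front, whereas the slides request each one only when evaluation reaches it.

```haskell
-- Hypothetical: each function name maps to its bytecode listing, one instruction per line.
type Code      = [String]
type CodeTable = [(String, Code)]

-- Functions that a listing refers to: here, the operands of PUSH_CONST and MK_AP.
references :: Code -> [String]
references code =
  [ f | line <- code
      , [op, f] <- [words line]
      , op `elem` ["PUSH_CONST", "MK_AP"] ]

-- Make sure 'name' and everything it refers to is in the local table,
-- copying missing entries from the peer's table.
loadAll :: CodeTable -> CodeTable -> String -> CodeTable
loadAll peer local name
  | name `elem` map fst local = local   -- already loaded
  | otherwise = case lookup name peer of
      Nothing   -> local                -- the peer does not have it either
      Just code -> foldl (loadAll peer) ((name, code) : local) (references code)

-- Computer A's code. Only calc's listing comes from the slides;
-- the other bodies are placeholders.
computerA :: CodeTable
computerA =
  [ ("calc",    ["PUSH_ARG x", "PUSH_CONST subcalc", "MK_AP iter", "RETURN_EVAL"])
  , ("iter",    ["RETURN_EVAL"])
  , ("subcalc", ["RETURN_EVAL"])
  , ("unused",  ["RETURN_EVAL"])
  ]
```

Starting from an empty table on Computer B, `map fst (loadAll computerA [] "calc")` is `["iter","subcalc","calc"]`: only the code that calc needs is transferred, and `unused` stays on Computer A.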

slide-58
SLIDE 58

Challenges

  • Needs concurrency to be useful.
  • Complicates Garbage collection.
  • Level of granularity versus laziness.
  • Possible architecture differences.
slide-59
SLIDE 59

Other Things!

  • Other people have written various interpreters and backends for Yhc bytecode: Java, Python, .NET
  • ... and various related tools such as interactive interpreters.

  • I'm also using Yhc to do my Hat G-Machine work.
slide-60
SLIDE 60

Questions?