Resource Allocation for Hardware Implementations of Map



slide-1
SLIDE 1

Resource Allocation for Hardware Implementations of Map

Richard Townsend Martha A. Kim Stephen A. Edwards

Columbia University

ASBD Workshop, June 15, 2014

slide-2
SLIDE 2

Functional Programs to Functional Hardware

Functional program (Haskell) → Compiler → Circuit (Verilog)

slide-4
SLIDE 4

Functional Programs to Functional Hardware

This Talk: Map f [x1,x2,...,xn] → ?

Map: Ordered List

fold, scan: Order Dependent
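A minimal Python sketch (not from the talk, which targets Haskell) of the distinction on this slide. Each map output depends on a single input, so elements can be evaluated in any order as long as the results come back in list order. Fold and scan thread an accumulator through the list, so their evaluation order is fixed. The function f, the step function, and the inputs are placeholders chosen for illustration.

```python
from functools import reduce
from itertools import accumulate

def f(x):
    # Hypothetical element-wise function; stands in for the talk's f.
    return x * x + 1

def step(acc, x):
    # Hypothetical accumulator update, deliberately not commutative.
    return acc * 2 + x

xs = [3, 1, 4, 1, 5]

# map: output i depends only on input i. Evaluate back to front,
# then reassemble by index; the result matches in-order evaluation.
in_order = [f(x) for x in xs]
by_index = {}
for i in reversed(range(len(xs))):
    by_index[i] = f(xs[i])
reassembled = [by_index[i] for i in range(len(xs))]

# fold and scan: every step needs the previous accumulator value.
folded = reduce(step, xs, 0)
scanned = list(accumulate(xs, step, initial=0))
folded_reversed = reduce(step, reversed(xs), 0)
```

Here reassembled equals in_order, while folding the reversed list gives a different value than folding the original.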

slide-8
SLIDE 8

Functional Map vs. MapReduce

(0,0) (0,1) (0,2) (0,3)
(1,0) (1,1) (1,2) (1,3)
(2,0) (2,1) (2,2) (2,3)
(3,0) (3,1) (3,2) (3,3)

Functional map: Ordered
MapReduce: Unordered
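A small Python sketch of the ordered/unordered contrast, under the assumption that "unordered" means records may be processed in any order and must therefore carry their own key. The function f and the shuffle that stands in for arbitrary processing order are illustrative only.

```python
import random

def f(cell):
    # Hypothetical per-element function on an (i, j) index pair.
    i, j = cell
    return i * 4 + j

grid = [(i, j) for i in range(4) for j in range(4)]

# Functional map: the result list lines up with the input list by position.
ordered = [f(cell) for cell in grid]

# MapReduce-style map phase (sketch): processing order is arbitrary,
# so each result is emitted together with its key.
shuffled = grid[:]
random.Random(0).shuffle(shuffled)
unordered = [(cell, f(cell)) for cell in shuffled]

# Recovering the positional view takes an explicit sort on the key.
recovered = [value for _, value in sorted(unordered)]
```

The hardware question in the rest of the talk comes from the first case: the circuit has to deliver results in list order without being told to sort them afterwards.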

slide-9
SLIDE 9

Structure of a Hardware Implementation

f

slide-10
SLIDE 10

Structure of a Hardware Implementation

f f f f f

slide-12
SLIDE 12

Structure of a Hardware Implementation

f f f f f x1 x2 x3 x4 x5 x6 x7 x8 x9 x10

slide-13
SLIDE 13

Structure of a Hardware Implementation

f f f f f x1 x2 x3 x4 x5 x6 x7

slide-14
SLIDE 14

Structure of a Hardware Implementation

f f f f f x1 x2 x3 x4 x5 x6 x7 x8 x9 x10

slide-15
SLIDE 15

Structure of a Hardware Implementation

f f f f f x6 x7 x8 x9 x10 f (x1) x2 x4 x5 f (x3)

slide-16
SLIDE 16

Structure of a Hardware Implementation

f f f f f f (x1) x2 x4 x5 f (x3) x6 x7 x8 x9 x10

slide-17
SLIDE 17

Structure of a Hardware Implementation

f f f f f x2 x4 x5 f (x3) x6 x7 x8 x9 x10 f (x1)

slide-21
SLIDE 21

Structure of a Hardware Implementation

[Diagram: five functional units f with items x2, x4, x5, f(x3), x6 ... x10, f(x1), f(x7). Callouts: "still processing", "stuck in buffer", "holding and stalling".]
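The three callouts on this slide (still processing, stuck in buffer, holding and stalling) can be reproduced with a toy cycle-level model. The Python sketch below is not the authors' simulator. It assumes inputs are handed out in list order to idle units, results must leave in list order, and a unit with more than `slots` finished results waiting cannot start new work. The names `simulate`, `latencies`, `units`, and `slots` are made up for this example.

```python
def simulate(latencies, units, slots):
    """Return the cycle count needed to emit every result in list order.

    latencies[i] is the (positive) number of cycles f takes on input i.
    slots is how many finished results a unit may park and keep working;
    with one more finished result than that, the unit holds it and stalls.
    """
    n = len(latencies)
    job = [None] * units       # input index each unit is working on
    remaining = [0] * units    # cycles left for that input
    parked = [0] * units       # finished, not yet emitted results per unit
    done = {}                  # finished input index -> unit holding it
    next_in = next_out = cycles = 0
    while True:
        # Emit results strictly in list order.
        while next_out in done:
            parked[done.pop(next_out)] -= 1
            next_out += 1
        if next_out == n:
            return cycles
        # Hand the next inputs, in order, to units that are idle and not stalled.
        for u in range(units):
            if next_in < n and job[u] is None and parked[u] <= slots:
                job[u], remaining[u] = next_in, latencies[next_in]
                next_in += 1
        # Advance every busy unit by one cycle.
        for u in range(units):
            if job[u] is not None:
                remaining[u] -= 1
                if remaining[u] == 0:
                    done[job[u]] = u
                    parked[u] += 1
                    job[u] = None
        cycles += 1
```

With latencies [5, 1, 1, 1] and two units, this model takes 6 cycles with no buffer slots and 5 cycles with two slots per unit: the fast unit can keep working while its results wait for the slow first element.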

slide-23
SLIDE 23

Structure of a Hardware Implementation

f f f f f

More Functional Units (Parallelism)
More Buffers (Utilization)

slide-25
SLIDE 25

Multiple Possible Configurations...Which to Choose?

Area = 15

Buffers are 50% the size of a functional unit

slide-26
SLIDE 26

Multiple Possible Configurations...Which to Choose?

Area = 15

f f f f f f f f f f f f f f f

15 Functional Units

slide-27
SLIDE 27

Multiple Possible Configurations...Which to Choose?

Area = 15

f

28 Buffers

slide-28
SLIDE 28

Multiple Possible Configurations...Which to Choose?

Area = 15

f f f f f
5 functional units + 20 buffers × 1/2 = 5 + 10 = 15

f f f
3 functional units + 24 buffers × 1/2 = 3 + 12 = 15
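The area arithmetic on these slides can be enumerated mechanically. The Python sketch below takes the two figures the slides give (a functional unit costs 1 area unit, a buffer costs half of that) and lists every split of a fixed budget. The function names are made up for this example, and it counts total buffers without requiring them to divide evenly among the units.

```python
from fractions import Fraction

UNIT_AREA = Fraction(1)        # one functional unit = 1 area unit
BUFFER_AREA = Fraction(1, 2)   # from the slides: a buffer is 50% of a unit

def area(units, buffers):
    """Total area of a configuration."""
    return units * UNIT_AREA + buffers * BUFFER_AREA

def configurations(total_area):
    """Return (functional_units, total_buffers) pairs that exactly use total_area."""
    configs = []
    units = 1
    while units * UNIT_AREA <= total_area:
        leftover = total_area - units * UNIT_AREA
        buffers = leftover / BUFFER_AREA
        if buffers.denominator == 1:   # keep whole numbers of buffers only
            configs.append((units, int(buffers)))
        units += 1
    return configs

configs_15 = configurations(15)
```

For a budget of 15 this yields 15 configurations, including the ones shown on the slides: (15, 0), (1, 28), (5, 20), and (3, 24).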

slide-34
SLIDE 34

Workload Structure

[Figure: three functional units f drawn against a Time axis for three workload structures: Best-case, Average-case (marked "?"), and Worst-case]

slide-37
SLIDE 37

Optimal Resource Allocation

[Plot: Completion Time (0% to 100%) and Buffer Slots per Functional Unit (50 to 250); horizontal axis ticks 5 to 30]

Simulator Results: 2x speedup

Maximizing Functional Units

slide-38
SLIDE 38

Optimal Resource Allocation

[Plot: Completion Time (0% to 100%) and Buffer Slots per Functional Unit (50 to 250); horizontal axis ticks 5 to 30]

Simulator Results: 3x speedup

Maximizing Buffers

slide-40
SLIDE 40

Optimal Resource Allocation

[Plot: Completion Time (0% to 100%) and Buffer Slots per Functional Unit (50 to 250); horizontal axis ticks 5 to 30]

f f f f f

Fewer Buffers Slower

slide-43
SLIDE 43

Why Are There Multiple Optima?

[Plot: Completion Time (0% to 100%); horizontal axis ticks 5 to 30. Annotations: 12 5, 7 10, 11 6, 6 11]

slide-44
SLIDE 44

Performance Scales with Area

[Plot: Completion Time (0% to 100%) versus Total Area (10 to 60); annotation: Ideal Minimum]

slide-46
SLIDE 46

Performance Scales with Area

[Plot, three panels against Total Area (10 to 60): Completion Time (0% to 100%); Buffer Slots / Functional Unit (10 to 30); Functional Units (2 to 12)]

f f

slide-51
SLIDE 51

Conclusions

[Plot: Completion Time (0% to 100%) and Buffer Slots per Functional Unit (50 to 250); horizontal axis ticks 5 to 30]

Area trade-off is important...

[Diagram: eight functional units f and a "?"]

...and non-obvious

[Diagram: five functional units f with inputs x1 ... x10]

Model helps explore design space

Synthesize Efficient Hardware Implementation of Map

Map, Fold, Scan

[Plot: Completion Time (0% to 100%) versus Total Area (10 to 60)]

Enhance our abstraction