

SLIDE 1

Mimer and Schedeval: Tools for Comparing Static Schedulers for Streaming Applications on Manycore Architectures

Nicolas Melot, Johan Janzén, Christoph Kessler

Linköping University
Dept. of Computer and Inf. Science
Linköping, Sweden

November 27, 2015

SLIDE 2

Introduction Mimer Schedeval Mimer & Schedeval environment Conclusion References

Outline

1 Introduction
  • Streaming
  • Evaluation
2 Mimer evaluation framework
3 Schedeval streaming framework
  • Description
  • Testing overhead
  • Evaluating computation time
  • Studying energy consumption
4 Mimer & Schedeval environment
  • Data structures
  • Programming
  • Experiment
5 Conclusion
  • Conclusion
  • Future work

Melot, Janzén, Kessler Mimer and Schedeval November 27, 2015 1 / 21

SLIDE 3

Introduction

  • High-performance computing on streams
  • Optimize for time and energy
  • Scheduling problems: numerous publications
      • On-chip pipelining (Keller et al. [2012])
      • Scheduling sequential tasks (Pruhs et al. [2008])
      • Crown scheduling (Melot et al. [2015])

[Figure: processors P1 to P8 and groups G1 to G15]

SLIDE 4

Streaming

  • Straightforward: communication through shared off-chip main memory
      • Easy to implement
      • Main memory bandwidth is the performance bottleneck
  • On-chip pipelining: communication through small on-chip memories (Keller et al. [2012])
      • Trade core-to-core communication for reduced off-chip memory accesses

SLIDE 5

Optimize for time and energy

  • Scheduling for performance
  • Compromise between time and energy
  • Many features to take into account
      • Voltage and frequency
          • Cores grouped in islands
          • Impact dependent on stalls due to memory accesses
      • Off-chip memory accesses
      • On-chip communications
      • Static/dynamic power
  • Behavior differences between execution platforms

$P_{dyn} \approx V \cdot f^2 \approx f^3$
$E_{dyn} \approx P_{dyn} \cdot p \cdot t \approx f^3 \cdot p \cdot t$

[Figure: SCC die, showing tiles, routers (R), memory controllers (MC) and DIMMs]
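The simple model on this slide can be tried out numerically. The sketch below is only an illustration in relative, unit-free terms: the function names are hypothetical, and the assumption that a task's running time is workload / f (perfect frequency scaling, no memory stalls) is ours, not the slide's.

```python
def dynamic_power(f):
    """Relative dynamic power under the cubic approximation P_dyn ~ f^3."""
    return f ** 3


def dynamic_energy(f, p, workload):
    """Relative dynamic energy E_dyn ~ f^3 * p * t.

    Assumption (not from the slides): the task runs for t = workload / f
    on each of its p cores, i.e. time scales perfectly with frequency.
    """
    t = workload / f
    return dynamic_power(f) * p * t


if __name__ == "__main__":
    # Compare a normalized base frequency with twice that frequency.
    for f in (1.0, 2.0):
        print(f, dynamic_power(f), dynamic_energy(f, p=1, workload=8.0))
```

Under these assumptions, doubling the frequency halves the running time but multiplies the dynamic energy by four, which is one way to read the compromise between time and energy.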

SLIDE 6

Testing scheduling techniques

  • Difficulty of testing scheduling techniques
      • Simulator: imperfect, slow
      • Real platform: high development effort
  • Difficulty of comparing techniques
      • Access to experimental datasets
          • Tailor-made for a paper (Xu et al. [2012])
          • Lack of adaptability (Kasahara [2004])
      • Access to raw results
      • Access to result processing and representation tools
  • Mimer: testing and interpreting framework, raw and structured data
  • Schedeval: run a streaming application on real architectures

SLIDE 7

Mimer

  • Flexible formats
      • GraphML (XML) task graphs
      • XML schedules
      • AMPL-based platforms
      • Flexible C++ backend API
  • Open data
      • Keep raw data
      • Publication of data processing scripts (R)
      • Structure data in CSV (Comma-Separated Values)
  • 4 steps
      • Schedule
      • Evaluate
      • Analyze
      • Plot

[Figure: Mimer workflow in four benchmark phases (1 - Schedule, 2 - Assess, 3 - Analyze, 4 - Plot). Elements: platforms, task graphs, schedulers, schedules, scheduler statistics, evaluators, evaluation statistics, analyser & field list, structured data, plotting script, graphs. Legend: input data, output data, intermediate data, user-provided executable or settings, benchmark phase.]
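The first three steps can be pictured as a small pipeline that ends in CSV. The sketch below is not Mimer's API: `toy_scheduler`, `toy_evaluator`, the field names and the placeholder numbers are all hypothetical stand-ins, used only to show raw results being turned into structured data that a plotting script could consume.

```python
import csv
import io
from itertools import product


def toy_scheduler(taskgraph, platform):
    # Hypothetical stand-in for step 1; the "makespan" is a placeholder value.
    return {"taskgraph": taskgraph, "platform": platform,
            "makespan": len(taskgraph) * 1.0}


def toy_evaluator(schedule):
    # Hypothetical stand-in for step 2; the "energy" is a placeholder value.
    return {"energy": schedule["makespan"] * 2.0}


def run_pipeline(taskgraphs, platforms):
    """Steps 1 to 3: schedule, evaluate, then structure the results as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["taskgraph", "platform", "makespan", "energy"],
        lineterminator="\n",
    )
    writer.writeheader()
    for taskgraph, platform in product(taskgraphs, platforms):
        schedule = toy_scheduler(taskgraph, platform)
        stats = toy_evaluator(schedule)
        writer.writerow({**schedule, **stats})
    return out.getvalue()


if __name__ == "__main__":
    print(run_pipeline(["fft", "sort"], ["scc_8"]))
```

Step 4 would hand the resulting CSV to a plotting script; the slides mention R for that purpose.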

SLIDE 8

Schedeval

  • Schedeval: run a streaming application on a real architecture
  • Co-developed with Janzén [2014] as Master's thesis work
  • Integrates as a schedule evaluator in the Mimer workflow

[Figure: platforms and task graphs are given to a scheduler; the resulting schedule is assessed by an analytic evaluator or by Schedeval, producing evaluation statistics. Shown next to the Mimer workflow diagram.]

SLIDE 9

Testing the Schedeval framework

  • Framework overhead: ping-pong application
  • Vary the number of tasks
      • With a single pair
      • With several pairs
  • Vary the mapping
      • Mapped to several cores
      • Mapped to several tiles
      • Mapped to a single core
  • Monitor
      • Average ping round trip time
      • Proportion of non data-ready task instances scheduled

[Figure: Ping and Pong tasks]
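The idea of a ping-pong overhead test can be sketched on an ordinary host. The example below bounces a message between two Python threads through queues and reports the mean round trip time; it is an analogy only, since it uses neither Schedeval tasks nor SCC cores, and `ping_pong` is a hypothetical name.

```python
import queue
import threading
import time


def ping_pong(round_trips):
    """Bounce a message between two threads; return the mean round trip time in seconds."""
    to_pong, to_ping = queue.Queue(), queue.Queue()

    def pong():
        for _ in range(round_trips):
            to_ping.put(to_pong.get())  # echo each message back to the sender

    worker = threading.Thread(target=pong)
    worker.start()
    start = time.perf_counter()
    for i in range(round_trips):
        to_pong.put(i)
        echoed = to_ping.get()  # block until the echo arrives
        if echoed != i:
            raise RuntimeError("unexpected echo")
    elapsed = time.perf_counter() - start
    worker.join()
    return elapsed / round_trips


if __name__ == "__main__":
    print(f"mean round trip: {ping_pong(1000) * 1e6:.1f} us")
```

Running several such pairs at once would be the host-side counterpart of the "several pairs" variation listed on the slide.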

SLIDE 10

Schedeval overhead

[Figure: plot titled "Message round trip delay (ms)", axes "Time [us]" and "Percentage not data ready", for the configurations Local, Tile, Remote and RCCE (tile)]

  • Round trip time penalized by distance and overhead
  • Latency hidden with increasing ping-pong pairs
  • Decreasing rate of non data-ready tasks scheduled

[Figure: Ping and Pong tasks]

SLIDE 11

Schedeval overhead

[Figure: two plots against the number of simultaneous ping-pongs (1 to 8): "Data-ready task proportion" (0% to 50%) and "Messages round trip time [us]" (1 to 9); series: Local, Remote, RCCE for comparison]

  • Round trip time penalized by distance and overhead
  • Latency hidden with increasing ping-pong pairs
  • Decreasing rate of non data-ready tasks scheduled

[Figure: Ping and Pong tasks]

SLIDE 12

Evaluating the Schedeval framework

  • Computation time: mergesort
  • Schedules over 6 cores (Melot et al. [2012])
      • Mapped per level (simple)
      • Mapped per block (reduce communications)
  • Test a schedule with a single core
  • Test a variant with no frequency scaling mechanism

SLIDE 13

Schedeval computation time

  • Results described by Janzén [2014]
  • Vary implementations
      • Depth-first task execution
      • Distribute presort phases
  • Observations
      • Low performance difference in streaming phase
      • High performance difference for initial sort

[Figure: computation time of the mergesort variants; labels: level mapping; level mapping, extra presort; block mapping, extra presort; block mapping, depth-first; simpler variant; single core (1/6)]

SLIDE 14

Studying energy consumption

  • Usefulness for energy consumption studies
  • Use a fixed application: StreamIt implementation of FFT (Thies et al. [2002])
  • Run with 11 different schedules
  • Vary deadline tightness to constrain frequency
  • Monitor
      • Energy consumption
  • Compare with simple energy models

$P_{dyn} \approx V \cdot f^2 \approx f^3$
$E_{dyn} \approx P_{dyn} \cdot p \cdot t \approx f^3 \cdot p \cdot t$

[Figure: platforms and task graphs are given to a scheduler; the resulting schedule is assessed by an analytic evaluator or by Schedeval, producing evaluation statistics]

[Figure: FFT task graph with tasks numbered 1 to 25; task types: Source & split, FFTReorderSimple, CombineDFT, Join & sink]
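How a deadline can constrain frequency is easy to show on a single task. The sketch below picks the lowest frequency level that still meets a deadline; the levels are those listed on the platform data structure slide, while the function name and the assumption that running time equals workload / f (in units consistent with f) are ours and not taken from Mimer or Schedeval.

```python
# Frequency levels as listed on the platform slide (no unit is given there).
FREQUENCIES = [100000, 106000, 114000, 123000, 133000, 145000, 160000, 178000,
               200000, 228000, 266000, 320000, 400000, 533000, 800000]


def lowest_feasible_frequency(workload, deadline, frequencies=FREQUENCIES):
    """Return the lowest frequency f such that workload / f <= deadline, or None.

    Assumption: running time is workload / f, with no stalls or communication.
    """
    for f in sorted(frequencies):
        if workload / f <= deadline:
            return f
    return None  # even the highest level misses the deadline


if __name__ == "__main__":
    for deadline in (8.0, 4.0, 1.0, 0.5):
        print(deadline, lowest_feasible_frequency(800000, deadline))
```

A loose deadline leaves room for a low frequency, and therefore low dynamic energy under the cubic model, while a tight one forces the highest levels.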

SLIDE 15

Measure energy with Schedeval

  • Usefulness for energy consumption studies

[Figure: two plots titled "Energy quality of schedules", showing energy per task class (tight, average, loose). Series: Fast,ILP,ILP simple; Fast,Bal.ILP,ILP; Fast,LTLG,ILP; Fast,Bal.ILP,Height; Fast,LTLG,Height; Bin,LTLG,Height; Bin,LTLG,Height Ann.; Integ.; Pruhs [2008] (NLP,energy); Pruhs [2008] (heur,0); Xu [2012] (ILP)]

  • Analytic evaluation
      • All schedules are equivalent
      • No voltage islands taken into account
  • Observations through Mimer / Schedeval
      • Analytical model fails to capture architectural details
      • More fast tasks mapped to more voltage islands

SLIDE 16

Under the hood

SLIDE 17

Mimer & Schedeval Data structures: Taskgraph

...
<node id="n30">
  <data key="v_name">t31</data>
  <data key="v_module">merge</data>
  <data key="v_workload">1</data>
  <data key="v_max_width">1</data>
  <data key="v_efficiency">exprtk:1/p</data>
</node>
<edge source="n30" target="n14"/>
<edge source="n29" target="n14"/>
...
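A fragment like this can be read with a few lines of standard-library Python. This is a minimal sketch, not Mimer's reader: the fragment is wrapped in a plain `<graph>` root so that it parses on its own, the namespace declarations of a complete GraphML file are left out, and `read_taskgraph` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Fragment modeled on the slide, wrapped in a root element (assumption) so it parses alone.
FRAGMENT = """
<graph>
  <node id="n30">
    <data key="v_name">t31</data>
    <data key="v_module">merge</data>
    <data key="v_workload">1</data>
    <data key="v_max_width">1</data>
  </node>
  <edge source="n30" target="n14"/>
  <edge source="n29" target="n14"/>
</graph>
"""


def read_taskgraph(text):
    """Return ({node id: {data key: value}}, [(source, target), ...])."""
    root = ET.fromstring(text)
    nodes = {
        node.get("id"): {d.get("key"): d.text for d in node.findall("data")}
        for node in root.findall("node")
    }
    edges = [(e.get("source"), e.get("target")) for e in root.findall("edge")]
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = read_taskgraph(FRAGMENT)
    print(nodes["n30"]["v_module"], edges)
```

A real GraphML file declares a default namespace, so tag lookups there would need namespace-qualified names.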

SLIDE 18

Mimer & Schedeval Data structures: Platform

# Number of cores
param p := 4;
# Frequency levels
set F := 100000 106000 114000 123000 133000
         145000 160000 178000 200000 228000 266000
         320000 400000 533000 800000;
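For quick inspection outside AMPL, the two declarations shown here can be extracted with regular expressions. The sketch below handles only this small fragment (one scalar `param p` and one flat `set F`); it is not a general AMPL parser, and `read_platform` is a hypothetical helper.

```python
import re

PLATFORM = """
# Number of cores
param p := 4;
# Frequency levels
set F := 100000 106000 114000 123000 133000
         145000 160000 178000 200000 228000 266000
         320000 400000 533000 800000;
"""


def read_platform(text):
    """Extract `param p` and `set F` from a small AMPL-style data fragment."""
    text = re.sub(r"#.*", "", text)   # drop comments up to the end of each line
    text = text.replace("\\", " ")    # tolerate slide-style line continuation marks
    cores = int(re.search(r"param\s+p\s*:=\s*(\d+)\s*;", text).group(1))
    levels = re.search(r"set\s+F\s*:=\s*([^;]*);", text).group(1)
    frequencies = [int(token) for token in levels.split()]
    return cores, frequencies


if __name__ == "__main__":
    print(read_platform(PLATFORM))
```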

SLIDE 19

Mimer & Schedeval Data structures: Schedule

<?xml version="1.0" encoding="UTF-8"?>
<schedule name="Generated from Algebra" appname="Generated from Algebra">
  <core coreid="1">
    <task name="t1" start="0" frequency="8e5" width="1" workload="32"/>
  </core>
  <core coreid="2">
    <task name="t2" start="0" frequency="8e5" width="1" workload="16"/>
    ...
  </core>
  <core coreid="3">
    ...
  </core>
  <core coreid="4">
    ...
  </core>
</schedule>
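A schedule in this format maps each core to a list of tasks with a start time, frequency, width and workload. As a minimal sketch of reading it, assuming only the elements and attributes visible on the slide, the hypothetical helper below collects the tasks of each core; the elided parts of the slide are simply left out of the sample.

```python
import xml.etree.ElementTree as ET

# Two-core sample restricted to what the slide shows in full.
SCHEDULE = """<?xml version="1.0" encoding="UTF-8"?>
<schedule name="Generated from Algebra" appname="Generated from Algebra">
  <core coreid="1">
    <task name="t1" start="0" frequency="8e5" width="1" workload="32"/>
  </core>
  <core coreid="2">
    <task name="t2" start="0" frequency="8e5" width="1" workload="16"/>
  </core>
</schedule>
"""


def tasks_per_core(text):
    """Return {core id: [(task name, frequency, workload), ...]}."""
    # Parse from bytes so the encoding declaration is honored.
    root = ET.fromstring(text.encode("utf-8"))
    result = {}
    for core in root.findall("core"):
        result[int(core.get("coreid"))] = [
            (task.get("name"), float(task.get("frequency")), float(task.get("workload")))
            for task in core.findall("task")
        ]
    return result


if __name__ == "__main__":
    print(tasks_per_core(SCHEDULE))
```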

SLIDE 20

Schedeval refactored: Drake

#include <drake/node.h>
#include <drake/link.h>
#include <drake/platform.h>

int drake_init(task_t *task, void *aux) { return 1; }
int drake_start(task_t *task) { return 1; }
int drake_run(task_t *task) { return 1; }
int drake_destroy(task_t *task) { return 0; }

SLIDE 21

Mimer experiment

platform="scc_8.dat scc_32.dat"
taskgraph="fft2_streamit_loose \
fft2_streamit_tight"
evaluation="static_dynamic_busy \
drake_eval_scc_emulator"
scheduler="crown_ltlg_height \
crown_binary_ltlg_height"
analysis="parco2015-nmelot"
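Each setting holds a space-separated list of values. One plausible reading, which the slides do not confirm, is that an experiment covers every combination of platform, task graph, scheduler and evaluator. The sketch below enumerates those combinations under that assumption; `combinations` is a hypothetical helper, not part of Mimer.

```python
from itertools import product

# Values copied from the experiment description on the slide.
EXPERIMENT = {
    "platform": "scc_8.dat scc_32.dat",
    "taskgraph": "fft2_streamit_loose fft2_streamit_tight",
    "scheduler": "crown_ltlg_height crown_binary_ltlg_height",
    "evaluation": "static_dynamic_busy drake_eval_scc_emulator",
}


def combinations(experiment):
    """Enumerate one run per combination of the space-separated values.

    Assumption: the settings are combined as a full cross product.
    """
    keys = list(experiment)
    values = [experiment[key].split() for key in keys]
    return [dict(zip(keys, combo)) for combo in product(*values)]


if __name__ == "__main__":
    runs = combinations(EXPERIMENT)
    print(len(runs), runs[0])
```

With two values per setting, this reading gives 16 runs for the experiment above.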

SLIDE 22

Demo

SLIDE 23

Investigate more

  • Scheduling by minimizing the number of cores to use and switching off unused cores
  • All you need is a new scheduler implementation
  • Already done! Presented at PARCO 2015, Edinburgh, on Thursday, September 3rd.

SLIDE 24

Conclusion

  • Mimer & Schedeval facilitate scheduling experimentation and data publication
  • Mimer
      • Simplifies the process through automation
      • Allows the publication of raw and structured results, together with the scripts that turn them into the figures used in articles
  • Schedeval (refactored into Drake)
      • Schedeval performance scales well with the number of tasks
      • Shows schedule differences in time and energy

http://www.ida.liu.se/labs/pelab/mimer/

SLIDE 25

Future work

  • Future work: lots of missing features
      • Task microbenchmarking
      • Improve platform description
      • Implement more Schedeval applications
      • Investigate core-to-core communications
      • Support for SDF
      • Support for stream reconfiguration
      • Port to Xeon, Tilera, MPPA

Questions & answers

http://www.ida.liu.se/labs/pelab/mimer/

SLIDE 26

www.liu.se

SLIDE 27

Bibliography

Johan Janzén. Evaluation of energy-optimizing scheduling algorithms for streaming computations on massively parallel multicore architectures. Master's thesis, Linköping University, 2014. URL http://liu.diva-portal.org/smash/record.jsf?pid=diva2%3A756758.

Kasahara. Standard task graph set, 2004. URL http://www.kasahara.elec.waseda.ac.jp/schedule/index.html.

Jörg Keller, Christoph Kessler, and Rikard Hulten. Optimized on-chip-pipelining for memory-intensive computations on multi-core processors with explicit memory hierarchy. Journal of Universal Computer Science, 18(14):1987–2023, 2012.

Nicolas Melot, Christoph Kessler, Kenan Avdic, Patrick Cichowski, and Jörg Keller. Engineering parallel sorting for the Intel SCC. Procedia Computer Science, 9(0):1890–1899, 2012. doi: http://dx.doi.org/10.1016/j.procs.2012.04.207. Proc. of the Int. Conf. Computational Science, ICCS 2012.

Nicolas Melot, Christoph Kessler, Jörg Keller, and Patrick Eitschberger. Fast Crown Scheduling Heuristics for Energy-Efficient Mapping and Scaling of Moldable Streaming Tasks on Manycore Systems. ACM Trans. Archit. Code Optim., 11(4):62:1–62:24, January 2015. ISSN 1544-3566.

Kirk Pruhs, Rob van Stee, and Patchrawat Uthaisombut. Speed Scaling of Tasks with Precedence Constraints. Theory of Computing Systems, 43(1):67–80, July 2008.

William Thies, Michal Karczmarek, and Saman Amarasinghe. StreamIt: A language for streaming applications. In Compiler Construction, volume 2304 of Lecture Notes in Computer Science, pages 179–196. Springer Berlin Heidelberg, 2002. doi: 10.1007/3-540-45937-5_14.

Huiting Xu, Fanxin Kong, and Qingxu Deng. Energy minimizing for parallel real-time tasks based on level-packing. In 18th Int. Conf. on Emb. and Real-Time Comput. Syst. and Appl. (RTCSA), pages 98–103, Aug 2012.