  1. Libra and the Art of Task Sizing in Big-Data Analytic Systems Rui Li , Peizhen Guo, Bo Hu, Wenjun Hu Yale University

  2. Background

  3. [Figure: an analytics job as a DAG of stages, Stage 0 through Stage 6]

  4-8. [Same figure, built up step by step: each stage reads its stage input data and writes its stage output data.] How to set task size? Current options:
     -- User experience
     -- System default value

  9. The importance of task sizing

  10. Observation 1: different jobs have different optimal task sizes. [Figure: normalized stage completion time vs. task size]

  11. Observation 2: different stages of the same job have different optimal task sizes. [Figure: PageRank stage completion time vs. task size]

  12-13. Takeaways: 1. Proper task sizing is important. 2. Stage completion time vs. task size follows a U-curve pattern.

  14. Analysis of U-curve pattern

  15. Observation 3: tasks incur similar scheduling delay and system overhead regardless of task size. [Figure: per-task overhead for PageRank stage 1]

  16. Observation 4: small tasks cannot benefit from batch processing; large tasks cause memory swapping. [Figure: number of I/O operations for different stages of PageRank]

  17. Small task size => high aggregate overhead and no batch processing. Large task size => memory swapping.
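The U-curve follows from slides 15 to 17. Shrinking the task size multiplies a roughly fixed per-task overhead by a growing number of tasks, and growing it eventually pushes a task's data past its memory budget, where swapping slows processing. The toy model below is a minimal sketch of that reasoning, not Libra's model. `stage_time` and all of its constants (overhead, processing rate, memory budget, swap penalty, slot count) are hypothetical placeholders, and the lost-batching effect for small tasks is simply folded into the fixed per-task overhead.

```python
import math


def stage_time(input_mb: float, task_mb: float, slots: int,
               overhead_s: float = 0.5, rate_mb_s: float = 50.0,
               mem_mb: float = 512.0, swap_penalty: float = 4.0) -> float:
    """Toy stage completion time (seconds) for a given task size.

    Every constant is a made-up placeholder, not a measurement from the talk.
    """
    n_tasks = math.ceil(input_mb / task_mb)
    # Tasks run in waves of `slots` concurrent tasks.
    waves = math.ceil(n_tasks / slots)
    # Fixed per-task cost (scheduling delay + system overhead), roughly
    # size-independent per Observation 3; lost batching is folded in here.
    fixed = overhead_s
    # Data that fits in memory runs at full rate; the excess is slowed by swapping.
    in_memory = min(task_mb, mem_mb)
    spilled = max(task_mb - mem_mb, 0.0)
    work = (in_memory + swap_penalty * spilled) / rate_mb_s
    return waves * (fixed + work)


sizes = [8, 16, 32, 64, 128, 256, 320, 512, 1024]
for s in sizes:
    print(f"task size {s:5d} MB -> {stage_time(10240, s, slots=32):6.2f} s")
```

With these placeholder numbers, both very small and very large tasks come out slow. The minimum falls at 320 MB, where the 32 tasks fill exactly one wave of 32 slots. The curve is jagged rather than smooth because the number of waves is an integer, which is one reason a tuner would see a noisy, not perfectly convex, objective.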

  18. System design • Strawman solution

  19. Refinement 1: Adam optimization
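The slides name Adam as the first refinement but give no details. The sketch below is therefore only one plausible reading, under stated assumptions. It treats task size as a single scalar parameter, estimates the slope of completion time with a central finite difference (two probes per step), and applies the standard Adam update with bias correction. The names `tune_task_size_adam` and `u_curve`, the learning rate, the probe width, and the synthetic cost curve are all hypothetical. In a real system, each call to `cost` would be a measured (noisy) stage or task completion time, not a closed-form function.

```python
import math
from typing import Callable


def tune_task_size_adam(cost: Callable[[float], float], size: float,
                        lr: float = 2.0, steps: int = 3000,
                        beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8,
                        delta: float = 0.5, min_size: float = 1.0) -> float:
    """Minimize cost(size) with Adam, using a finite-difference gradient estimate."""
    m = v = 0.0
    for t in range(1, steps + 1):
        # Central-difference slope estimate from two probes around the current size.
        g = (cost(size + delta) - cost(size - delta)) / (2 * delta)
        m = beta1 * m + (1 - beta1) * g          # first-moment (momentum) estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias corrections for zero init
        v_hat = v / (1 - beta2 ** t)
        size -= lr * m_hat / (math.sqrt(v_hat) + eps)
        size = max(size, min_size)               # task size must stay positive
    return size


def u_curve(size: float) -> float:
    """Hypothetical smooth U-curve: overhead term falls with size, penalty term grows."""
    return 1000.0 / size + 0.1 * size  # minimum at size = 100


best = tune_task_size_adam(u_curve, size=20.0)
print(f"tuned task size: {best:.1f}")
```

Starting from a task size of 20, the tuned value should settle close to the curve's minimum at 100. Compared with a plain fixed-step gradient descent, Adam's per-step normalization by the running gradient magnitude makes the step size insensitive to the absolute scale of the measured times. That is a plausible motivation for using it here, though the slides do not state one.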

  20-21. Refinement 2: noise filtering [Figure: task processing rate fluctuation for stage 1 of PageRank]
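Per-task processing rates fluctuate, so feeding raw samples into a tuner would make it chase noise. The slides do not say which filter Libra uses. As a stand-in, the sketch below applies an exponentially weighted moving average (EWMA), a common choice for smoothing a measurement stream. The function name `ewma`, the smoothing factor `alpha`, and the sample rates are all hypothetical.

```python
from typing import Iterable, List, Optional


def ewma(samples: Iterable[float], alpha: float = 0.2) -> List[float]:
    """Exponentially weighted moving average of a stream of measurements.

    Smaller alpha gives heavier smoothing but slower reaction to real changes.
    """
    smoothed: List[float] = []
    avg: Optional[float] = None
    for x in samples:
        # The first sample seeds the average; later samples are blended in.
        avg = x if avg is None else alpha * x + (1 - alpha) * avg
        smoothed.append(avg)
    return smoothed


# Hypothetical per-task processing rates (MB/s) with one outlier sample.
rates = [10.0, 11.0, 9.0, 30.0, 10.0, 10.5, 9.5]
print([round(r, 2) for r in ewma(rates)])
```

The outlier of 30 MB/s raises the smoothed rate only to about 14 MB/s, and the effect decays over the following samples. The trade-off is lag: a genuine shift in processing rate also takes several samples to show up fully.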

  22-23. Refinement 3: contention avoidance [Figure: PageRank running over two machines]

  24. Evaluation • 8 m4.xlarge VMs from EC2 • Workloads generated from HiBench

  25. Effect of the initial task size [Figure: PageRank completion time for different initial task sizes]

  26. Libra performance [Figure: PageRank completion time for different input data sizes]

  27. Q&A
