
Minimizing Latency in Fault-Tolerant Distributed Stream Processing



  1. Minimizing Latency in Fault-Tolerant Distributed Stream Processing Systems
     Department of Computer Science, Institute for Systems Architecture, Systems Engineering Group
     Andrey Brito (1), Christof Fetzer (1), Pascal Felber (2)
     (1) Technische Universität Dresden, Germany
     (2) Université de Neuchâtel, Switzerland
     ICDCS'09, June 23rd, 2009

  2. Goal
     Minimize the cost of logging/checkpointing in event stream processing systems.
     Contribution: use of a speculation framework based on transactional memory to overlap logging and processing.
     ICDCS'09, 23.06.09 Minimizing latency in fault-tolerant DSMS Slide 2 of 55

  3. Motivation (1)
     • Event stream applications
       – Directed acyclic graph of operators
       – Some operators don't keep state
         • Trivially parallelizable
       – Some do keep state
         • Not trivially parallelizable
       – Sometimes they are order sensitive
         • Need to process events sequentially, maybe even waiting for the order to be restored

  4. Application example
     [Figure: operator graph with Publisher A, Publisher B, Filters, the stateful operators Processor1 and Processor2, and an Output Adapter. Events labeled by stream (A or B) and sequence number are in flight.]

  5. Application example
     Callouts: "Events based on non-deterministic decision" and "Events are out!"
     [Figure: same operator graph, events in flight.]

  6. Application example
     [Figure: same operator graph, events in flight.]

  7. Application example
     Restore checkpoint.
     [Figure: same operator graph, no events between the Filters and the Output Adapter.]

  8. Application example
     Ask upstream node to replay missing ones.
     [Figure: same operator graph, no events between the Filters and the Output Adapter.]

  9. Application example
     Processing some events again.
     [Figure: same operator graph, with events A4, A5, B1, B2, B3, B5, B6, B7 in flight again.]

  10. Application example
      Callouts: "What are you talking about?" and "Events reflect different decisions."
      [Figure: same operator graph as slide 9.]
      → Incomplete log of non-deterministic decisions: no repeatability

  11. Motivation (2)
      • Fault-tolerant event stream applications
        – Precise recovery
        – Even if order does not matter, repeatability does
        – Non-determinism
          • Input order from different streams
          • Non-determinism in processing (multi-threading, time, random numbers)
        – Log or checkpoint before each output
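
The repeatability requirement on slide 11 can be illustrated with a minimal record-and-replay sketch. It is not the authors' implementation: `DecisionLog`, `run_operator`, and the random routing choice are hypothetical names introduced here, and the log lives in memory rather than on stable storage.

```python
import random


class DecisionLog:
    """Records non-deterministic decisions; replays them when given a prior log."""

    def __init__(self, recorded=None):
        # Hypothetical in-memory log; a real system would persist entries.
        self.entries = list(recorded) if recorded else []
        self._cursor = 0

    def decide(self, make_decision):
        if self._cursor < len(self.entries):
            value = self.entries[self._cursor]  # replay a logged decision
        else:
            value = make_decision()             # take a fresh decision and log it
            self.entries.append(value)
        self._cursor += 1
        return value


def run_operator(events, log):
    """Toy operator: pairs each event with a random (non-deterministic) choice."""
    return [(event, log.decide(lambda: random.randint(0, 9))) for event in events]


events = ["A0", "A1", "B0"]

# First run: decisions are taken and logged.
first_log = DecisionLog()
first_output = run_operator(events, first_log)

# Recovery: re-process the same events with the logged decisions.
replayed_output = run_operator(events, DecisionLog(first_log.entries))
```

With a complete log, the replayed outputs equal the original ones; if some decisions had been emitted downstream before being logged, the replay could take different decisions, which is the situation shown on slide 10.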

  12. Logging is expensive

  13. My solution
      • Speculate...
      • … to parallelize stateful components
      • … to not have to wait for events
      • … to not have to wait for logging

  14. Outline
      • How the speculation works
      • Logging algorithm
      • Experiments
      • Final remarks

  15. How the speculation works
      • Base: TinySTM
        – Some extra features added
        – But same basic rule: “it appears to be atomic”
      • Goal: track accesses to shared memory
        – Instrumentation
          • Reads and writes are intercepted
          • Hold back writes, validate reads until all dependencies satisfied
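
The "hold back writes, validate reads" idea on slide 15 can be sketched with a toy, single-threaded transaction model. This is not TinySTM's API (TinySTM is a C library); `VersionedMemory` and `Transaction` are hypothetical names, and the per-key version counter is an assumption made for illustration.

```python
class VersionedMemory:
    """Shared memory where every key carries a version counter."""

    def __init__(self):
        self.values = {}
        self.versions = {}


class Transaction:
    """Buffers writes and remembers the version of everything it read."""

    def __init__(self, memory):
        self.memory = memory
        self.read_set = {}   # key -> version observed at first read
        self.write_set = {}  # key -> value held back until commit

    def read(self, key):
        if key in self.write_set:          # read our own buffered write
            return self.write_set[key]
        self.read_set.setdefault(key, self.memory.versions.get(key, 0))
        return self.memory.values.get(key, 0)

    def write(self, key, value):
        self.write_set[key] = value        # held back, not yet visible

    def commit(self):
        # Validate reads: abort if anything we read has changed since.
        for key, seen in self.read_set.items():
            if self.memory.versions.get(key, 0) != seen:
                return False
        # Publish the buffered writes so the transaction appears atomic.
        for key, value in self.write_set.items():
            self.memory.values[key] = value
            self.memory.versions[key] = self.memory.versions.get(key, 0) + 1
        return True


memory = VersionedMemory()
t1 = Transaction(memory)
t2 = Transaction(memory)
t1.write("count", t1.read("count") + 1)
t2.write("count", t2.read("count") + 1)
first_commit = t1.commit()    # succeeds: nothing changed since t1 read
second_commit = t2.commit()   # fails validation: t1 changed "count"
```

A transaction that fails validation would be re-executed; the sketch omits retry and any real concurrency control.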

  16. Speculative execution: parallelization
      [Figure: NEXT = 9. Labels in extraction order: Processor 1; events 8, 7, 6, 12, 11, 9; Processor 2.]

  17. Speculative execution: parallelization
      [Figure: NEXT = 9. Labels in extraction order: Processor 1; events 11, 8, 7, 6, 14, 13, 12; Processor 2; event 9.]

  18. Speculative execution: parallelization
      [Figure: NEXT = 9. Labels in extraction order: Processor 1; events 11, 8, 7, 6, 14, 13, 12; Processor 2; event 9.]

  19. Speculative execution: parallelization
      [Figure: NEXT = 9. Labels in extraction order: Processor 1; events 11, 8, 7, 6, 14, 13, 12; Processor 2; event 9.]

  20. Speculative execution: parallelization
      [Figure: NEXT = 10. Labels in extraction order: Processor 1; events 11, 9, 8, 7, 14, 13, 12; Processor 2.]

  21. Speculative execution: parallelization
      [Figure: NEXT = 10. Labels in extraction order: Processor 1; events 9, 8, 7, 14, 13, 12; Processor 2; event 11.]
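
One reading of the animation on slides 16 to 21 is that events may be processed speculatively out of order, but an event only commits when its timestamp equals the NEXT counter. The sketch below illustrates that reading only; `commit_in_order` is a hypothetical helper, and contiguous integer timestamps are an assumption.

```python
def commit_in_order(finished, next_ts):
    """Commit speculatively finished events strictly in timestamp order.

    finished: set of timestamps whose speculative processing is done
              (mutated: committed timestamps are removed).
    next_ts:  timestamp of the next event allowed to commit (NEXT).
    Returns (list of committed timestamps, updated next_ts).
    """
    committed = []
    while next_ts in finished:
        finished.remove(next_ts)
        committed.append(next_ts)
        next_ts += 1
    return committed, next_ts


# Event 11 finished early, but NEXT = 9, so nothing can commit yet.
finished = {11}
committed_a, next_a = commit_in_order(finished, 9)

# Event 9 finishes: it commits and NEXT advances to 10.
# Event 11 keeps waiting because event 10 has not finished.
finished.add(9)
committed_b, next_b = commit_in_order(finished, next_a)
```

Conflict detection between speculative executions is left out here; in the talk that role is played by the transactional memory layer.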

  22. Logging algorithm
      • Operator enqueues all events & decisions
      • N+1 threads for N disks
        – One groups requests into buffers
        – The others write their buffers to disk
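
The N+1 thread layout on slide 22 can be sketched as follows, assuming one batching thread and one writer thread per disk. `run_logger`, `batch_size`, and the in-memory lists standing in for disks are hypothetical; a real logger would write to stable storage and acknowledge each request.

```python
import queue
import threading


def run_logger(requests, num_disks=2, batch_size=3):
    """Log all requests using 1 batching thread + num_disks writer threads."""
    request_q = queue.Queue()   # operator -> batcher
    buffer_q = queue.Queue()    # batcher -> writers
    disks = [[] for _ in range(num_disks)]  # in-memory stand-ins for disks

    def batcher():
        batch = []
        while True:
            item = request_q.get()
            if item is None:            # shutdown marker from the operator
                break
            batch.append(item)
            if len(batch) == batch_size:
                buffer_q.put(batch)     # hand a full buffer to the writers
                batch = []
        if batch:
            buffer_q.put(batch)         # flush the last, partial buffer
        for _ in range(num_disks):
            buffer_q.put(None)          # one shutdown marker per writer

    def writer(disk):
        while True:
            buf = buffer_q.get()
            if buf is None:
                break
            disk.append(buf)            # a real writer would write and sync here

    threads = [threading.Thread(target=batcher)]
    threads += [threading.Thread(target=writer, args=(d,)) for d in disks]
    for t in threads:
        t.start()
    for r in requests:                  # the operator enqueues its requests
        request_q.put(r)
    request_q.put(None)
    for t in threads:
        t.join()
    return disks


requests = ["E%d" % i for i in range(7)]
disks = run_logger(requests)
logged = sorted(item for disk in disks for buf in disk for item in buf)
```

Grouping requests into buffers amortizes the cost of each disk write over several events, and which disk receives which buffer depends on thread scheduling.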

  23. Logging algorithm [Figure labels: E, Operator]

  24. Logging algorithm [Figure labels: Operator, E]

  25. Logging algorithm [Figure labels: Operator]

  26. Logging algorithm [Figure labels: Operator, NDDs]

  27. Logging algorithm [Figure labels: Operator, E]

  28. Logging algorithm [Figure labels: Operator; callout: "E is here waiting."]

  29. Logging algorithm [Figure labels: Operator]

  30. Logging algorithm [Figure labels: Operator]

  31. Logging algorithm [Figure labels: Operator, update(E)]

  32. Logging algorithm [Figure labels: E, Operator]

  33. Logging algorithm
      [Figure: operator graph with Publisher A, Publisher B, Filters, the stateful operators Processor1 and Processor2, and an Output Adapter; events in flight.]

  34. Logging algorithm
      Callouts: "Events based on non-deterministic decision" and "Events are out!"
      [Figure: same operator graph, events in flight.]

  35. Logging algorithm
      [Figure labels: Filter 1, Filter n, Processor1 (STATE), Checkpoint/Logging; events A5, A6, B5, B6, B7, B8.]

  36. Logging algorithm
      [Figure labels: Filter 1, Filter n, Processor1 (STATE), Checkpoint/Logging.]

  37. Logging algorithm
      [Same figure, step marker 1.]

  38. Logging algorithm
      [Same figure, step markers 1 and 2.]

  39. Logging algorithm
      [Same figure, step markers 1 to 3.]

  40. Logging algorithm
      [Same figure, step markers 1 to 4.]

  41. Logging algorithm
      [Same figure, step markers 1 to 5.]

  42. Logging algorithm
      [Same figure, step markers 1 to 6.]

  43. Logging algorithm
      [Same figure, step markers 1 to 7.]

  44. Speculative processing + Logging
      • From the original node's viewpoint
        – Emit outputs as speculative
        – When logging requests are acknowledged, emit final
      • The next downstream node
        – If speculative event modifies some state, keep track
          • Outputs that consider that part of the state are speculative
          • Speculative status is contagious
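
The "contagious" speculative status on slide 44 can be sketched for a single downstream operator. The class `DownstreamOperator`, its per-key running totals, and the `on_final` notification are hypothetical choices for illustration; rollback of speculative state after a failure is not modeled.

```python
class DownstreamOperator:
    """Keeps a running total per key and tracks which state is speculative."""

    def __init__(self):
        self.totals = {}
        self.pending = {}  # key -> ids of unconfirmed events that touched it

    def on_event(self, event_id, key, amount, speculative):
        self.totals[key] = self.totals.get(key, 0) + amount
        if speculative:
            # Remember that this part of the state depends on event_id.
            self.pending.setdefault(key, set()).add(event_id)
        # Any output that reads speculative state is itself speculative.
        status = "speculative" if self.pending.get(key) else "final"
        return (key, self.totals[key], status)

    def on_final(self, event_id):
        """Upstream logging was acknowledged: event_id is now final."""
        for ids in self.pending.values():
            ids.discard(event_id)


op = DownstreamOperator()
out1 = op.on_event("e1", "A", 5, speculative=True)   # speculative input
out2 = op.on_event("e2", "A", 2, speculative=False)  # tainted by e1's update
out3 = op.on_event("e3", "B", 1, speculative=False)  # untouched state
op.on_final("e1")                                     # e1's log write acknowledged
out4 = op.on_event("e4", "A", 1, speculative=False)  # state is final again
```

Tracking dependencies per key rather than per operator keeps outputs for unrelated state, such as key "B" above, final from the start.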
