
Principled Schedulability Analysis for Distributed Storage Systems - PowerPoint PPT Presentation

Principled Schedulability Analysis for Distributed Storage Systems Using Thread Architecture Models. Suli Yang*, Jing Liu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau (* work done while at UW-Madison)


  1. Principled Schedulability Analysis for Distributed Storage Systems Using Thread Architecture Models Suli Yang*, Jing Liu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau * work done while at UW-Madison

  2. Scheduling: A Fundamental Primitive
     • Modern storage systems are shared
     • Correct and efficient request scheduling is indispensable
     [Figure: several applications (e.g. Snapchat) issuing R/W requests to shared storage]

  3. Broken Scheduling in Current Systems
     • Popular storage systems have fundamental scheduling deficiencies
       - [MongoDB - #21858]: “A high throughput update workload … could cause starvation on secondary reads”
       - [HBase - #8884]: “…when the read load is high on a specific RS is high, the write throughput also get impacted dramatically, and even write data loss...”
       - [Cassandra - #10989]: “inability to balance writes/reads/compaction/flushing…”
       - etc.

  4. Why Is Scheduling Broken?
     • The complexities in modern storage systems
       - Distributed: >1000 servers
       - Highly concurrent: ~1000 interacting threads in each server
       - Long execution path: requests traverse numerous threads across multiple machines
       - …
     • We introduce the Thread Architecture Model to describe scheduling complexities

  5. Thread Architecture Model (TAM)
     • Encodes scheduling-related info:
       - Request flows
       - Thread interactions
       - Resource consumption patterns
     • Easy to obtain automatically
     • From complicated systems to an understandable and analyzable model
       - HBase
       - Cassandra
       - MongoDB
       - Riak
     [Figure: TAM of HBase across two RegionServer/DataNode machines, with stages RPC Read, RPC Handle, RPC Respond, Mem Flush, LOG Append, LOG Sync, Data Stream, Ack Process, Data Xceive, and Packet Ack]

  6. TAM Exposes Scheduling Problems
     • We discovered five categories of problems that happen in real systems
       - Lack of scheduling points
       - Unknown resource usage
       - Hidden contention between threads
       - Uncontrolled thread blocking
       - Ordering constraints upon requests

  7. Fixing Problems Leads to Effective Scheduling
     • TAM-based simulation finds problem-free thread architectures
       - Provides schedulability: various desired scheduling policies can be realized
       - HBase to Tamed-HBase
     • Implementation transforms the system to be schedulable
       - Muzzled-HBase: approximated implementation
       - Effective scheduling under YCSB and other workloads

  8. Thread Architecture Model enables principled schedulability analysis on general distributed storage systems

  9. Outline
     • Overview
     • Thread Architecture Model
     • Scheduling Problems
     • Achieve Schedulability: A Case Study
     • Conclusion

  10. Thread Architecture Model
      [Figure: HBase TAM across two RegionServer/DataNode machines, annotated with a legend: stage (threads performing similar tasks); resource usage (CPU, I/O, network, lock); request flow; request queue (scheduling point); blocking]

  11. Thread Architecture Model
      • TAM encodes scheduling-related info:
        - Request flows
        - Thread interactions
        - Resource consumption patterns
      • From complex systems to analyzable models
      • TADalyzer: from live system to TAM automatically
        - Only 20-50 lines of user annotation code required
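
The information a TAM encodes (stages, request queues, resource usage, request flows, blocking) can be pictured as a small annotated graph. The sketch below is one hypothetical in-memory encoding, not TADalyzer's actual format: the class and field names (`Stage`, `ThreadArchitecture`, `has_queue`, `blocks_on`, and so on) are assumptions, and the resource and blocking annotations on the example stages are illustrative rather than taken from the real HBase TAM.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """A stage: a group of threads performing similar tasks."""
    name: str
    has_queue: bool = True                            # a request queue is a scheduling point
    resources: set = field(default_factory=set)       # e.g. {"cpu", "io", "network", "lock"}
    unknown_usage: set = field(default_factory=set)   # resources whose use is unknown up front
    blocks_on: set = field(default_factory=set)       # names of stages this stage blocks on


@dataclass
class ThreadArchitecture:
    stages: dict = field(default_factory=dict)        # stage name -> Stage
    flows: list = field(default_factory=list)         # (src, dst) request-flow edges

    def add_stage(self, stage):
        self.stages[stage.name] = stage

    def add_flow(self, src, dst):
        if src not in self.stages or dst not in self.stages:
            raise KeyError("both endpoints must be registered stages")
        self.flows.append((src, dst))

    def downstream(self, name):
        """Stages that requests may flow to directly from `name`."""
        return [dst for src, dst in self.flows if src == name]


# Stage names come from the slide's HBase figure; the annotations are assumed.
tam = ThreadArchitecture()
tam.add_stage(Stage("RPC Read", resources={"network"}))
tam.add_stage(Stage("RPC Handle", resources={"cpu", "io"}, blocks_on={"LOG Append"}))
tam.add_stage(Stage("LOG Append", resources={"cpu"}))
tam.add_stage(Stage("RPC Respond", resources={"network"}))
tam.add_flow("RPC Read", "RPC Handle")
tam.add_flow("RPC Handle", "LOG Append")
tam.add_flow("RPC Handle", "RPC Respond")

print(tam.downstream("RPC Handle"))
```

Keeping flows as plain edges and per-stage annotations as sets keeps the model easy to inspect, which is the property the slide emphasizes.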

  12. Outline
      • Overview
      • Thread Architecture Model
      • Scheduling Problems
      • Achieve Schedulability: A Case Study
      • Conclusion

  13. TAM Exposes Scheduling Problems
      • No scheduling
      • Unknown resource usage
      • Hidden contention
      • Blocking
      • Ordering constraint
      • Common in distributed storage systems
        - HBase, Cassandra, MongoDB, Riak…
      • Directly identifiable from TAM
        - No low-level implementation details required
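
To make "directly identifiable from TAM" concrete, here is a minimal sketch that flags three of the five problem categories from a model alone. It assumes a hypothetical dictionary encoding (keys `has_queue`, `unknown_usage`, `blocks_on`) and simplified rules: a stage without a request queue lacks a scheduling point, and a stage that blocks on a stage with a queue exhibits the blocking problem. Hidden contention and ordering constraints are left out because the slides do not spell out how they appear in the model.

```python
def find_problems(stages):
    """Return (stage, problem) pairs found by inspecting the model only.

    `stages` maps a stage name to a dict with optional keys:
      has_queue     - bool, whether the stage has a request queue (default True)
      unknown_usage - resources whose usage is unknown before processing begins
      blocks_on     - names of stages this stage blocks on
    """
    problems = []
    for name, stage in stages.items():
        if not stage.get("has_queue", True):
            problems.append((name, "no scheduling"))
        if stage.get("unknown_usage"):
            problems.append((name, "unknown resource usage"))
        for target in stage.get("blocks_on", ()):
            # Blocking on a stage that queues requests can stall this stage.
            if stages[target].get("has_queue", True):
                problems.append((name, "blocking"))
    return problems


# Hypothetical three-stage model used only to exercise the checks.
example = {
    "Req Handle": {"unknown_usage": {"io"}, "blocks_on": ["Log Sync"]},
    "Log Sync": {},
    "Respond": {"has_queue": False},
}
print(find_problems(example))
```

No code from the modeled system is consulted, only the annotations on the graph.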


  15. Scheduling Problem: Unknown Resource Usage
      [Figure: Cassandra TAM across two nodes, with stages C-ReqHandle, C-Respond, Read, Mutation, V-Mutation, Msg In, Msg Out, and Respond]

  16. Scheduling Problem: Unknown Resource Usage
      Workload:
        - C1: issues cold requests
        - C2: issues cold and cached requests
      Expectation: C2 has much higher throughput (due to cached requests)
      [Figure annotation: CPU underutilized]

  17. Unknown Resource Usage: Solution
      Workload:
        - C1: issues cold requests
        - C2: issues cold and cached requests
      Expectation: C2 has much higher throughput (due to cached requests)

  18. Scheduling Problem: Unknown Resource Usage
      • Resource usage patterns are unknown to schedulers until after processing begins
      • Forces schedulers to make decisions before information is available
      • Identified as red square brackets around resource symbols in TAM
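
The effect can be illustrated with a toy tick-based simulation. Everything in it is an assumption made for illustration: one handler stage with a fixed thread pool, a single disk that serves one cold request at a time, an alternating stream of cold and cached requests, and the parameter values. The `knows_usage=True` branch is a hypothetical oracle that sees each request's resource needs up front; it is not the remedy presented in the slides.

```python
from collections import deque
from itertools import cycle


def run(ticks, n_threads, io_ticks, knows_usage):
    """Count completed cold and cached requests for one handler stage.

    A cached request holds a thread for one tick (CPU only). A cold request
    holds its thread until the single disk, which needs `io_ticks` ticks per
    request and serves them one at a time, has finished it.
    """
    stream = cycle(["cold", "cached"])   # arrival order when usage is unknown
    disk = deque()                       # remaining disk ticks per in-flight cold request
    free = n_threads
    done = {"cold": 0, "cached": 0}
    for _ in range(ticks):
        cached_running = 0
        while free > 0:
            if knows_usage:
                # Oracle: admit a cold request only when the disk is idle.
                kind = "cold" if not disk else "cached"
            else:
                # No information: take whatever request comes next.
                kind = next(stream)
            free -= 1
            if kind == "cold":
                disk.append(io_ticks)
            else:
                cached_running += 1
        if disk:
            disk[0] -= 1                 # the disk works on the oldest cold request
            if disk[0] == 0:
                disk.popleft()
                free += 1
                done["cold"] += 1
        free += cached_running           # cached requests finish within the tick
        done["cached"] += cached_running
    return done


unknown = run(ticks=100, n_threads=4, io_ticks=10, knows_usage=False)
known = run(ticks=100, n_threads=4, io_ticks=10, knows_usage=True)
print("usage unknown:", unknown)
print("usage known:  ", known)
```

In this toy model both runs complete the same number of cold requests, because the disk is the bottleneck either way. Without usage information, however, every thread ends up holding a cold request that waits for the disk, so cached requests are served only as fast as the disk drains and the CPU sits idle.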

  19. Scheduling Problem: Blocking
      [Figure: MongoDB TAM with a primary node and a secondary node, with stages Worker, Feedback, Fetcher, NetInterface, Batcher, Oplog Writer, and Writer]

  20. Scheduling Problem: Blocking
      Workload:
        - C1: reads from primary (does not go to secondary)
        - C2: writes to primary (replicated to secondary node)
        - At time 10: the secondary node slows down
      Expectation: C1 read throughput remains stable
      [Figure: MongoDB throughput over time (s)]

  21. Blocking: Solution
      Workload:
        - C1: reads
        - C2: writes (replicated to secondary node)
        - At time 10: the secondary node slows down
      Expectation: C1 read throughput remains stable
      [Figure: MongoDB throughput over time]

  22. Scheduling Problem: Blocking
      • Stages with a fixed number of threads block on other stages
      • Requests that could have been completed cannot be scheduled because all threads are blocked
      • Identified as dashed arrows pointing to stages with queues in TAM
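
A toy model of the scenario from the MongoDB experiment shows how a fixed pool drains. The function name, the one-write-per-tick admission rule, and all parameter values are assumptions chosen for illustration: each admitted write pins a worker thread until the secondary acknowledges it, and whatever threads remain each serve one read per tick.

```python
def read_throughput(ticks, n_threads, slow_at, fast_delay=1, slow_delay=50):
    """Reads served per tick by a fixed pool whose writers block on a secondary."""
    blocked = []                    # release tick for each thread blocked on a write
    reads_per_tick = []
    for t in range(ticks):
        # Release writer threads whose acknowledgement has arrived.
        blocked = [release for release in blocked if release > t]
        free = n_threads - len(blocked)
        if free > 0:
            # Admit one write per tick; it blocks until the secondary responds.
            delay = fast_delay if t < slow_at else slow_delay
            blocked.append(t + delay)
            free -= 1
        reads_per_tick.append(free)  # each remaining thread serves one read
    return reads_per_tick


tput = read_throughput(ticks=20, n_threads=4, slow_at=10)
print(tput)
```

Before the slowdown, one thread is busy with a write and the rest serve reads. Once acknowledgements take longer than the time needed to fill the pool, every thread ends up blocked on a write and read throughput falls to zero, even though the reads never touch the secondary.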

  23. Outline
      • Overview
      • Thread Architecture Model
      • Scheduling Problems
      • Achieve Schedulability: A Case Study
      • Conclusion

  24. Fixing Problems Leads to Schedulability
      • TAM-based simulation framework: explore thread architectures
        - Simulates how systems perform under workloads
        - Easily study architecture designs and scheduling policies
      • Implementation: realize schedulable systems
        - Also validates that simulation matches the real world

  25. Simulation: HBase to Tamed-HBase
      [Figure: TAM of the original HBase next to that of Tamed-HBase, each spanning two RegionServer/DataNode machines; labels include RPC Read, RPC Handle, RPC Respond, Mem Flush, LOG Append, LOG Sync, Data Stream, Ack Process, Data Xceive, Packet Ack, Network, CPU, and IO]

  26. Implementation: Tamed-HBase to Muzzled-HBase
      • Some approximations to make implementation easier
      • Supports multiple scheduling policies
      • Proper scheduling under various workloads

  27. Muzzled-HBase: Weighted Fairness
      Workloads: Five clients, each with a different weight, run YCSB (reads mostly)
      Expectation: Each client receives throughput proportional to its weight
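
The slides do not say which algorithm Muzzled-HBase uses to enforce weights. As a minimal sketch of what throughput proportional to weight means at a single scheduling point, the code below uses a generic virtual-time (stride-style) scheduler over always-backlogged clients. The client names and the weights 1 to 5 are assumptions, since the slide only states that the five weights differ.

```python
import heapq
from fractions import Fraction


def weighted_schedule(weights, n_requests):
    """Dispatch `n_requests` from always-backlogged clients by virtual time.

    Each client's next request carries a virtual finish time that advances by
    1/weight per dispatch, so heavier clients are picked more often.
    """
    heap = [(Fraction(1, w), client) for client, w in weights.items()]
    heapq.heapify(heap)
    served = {client: 0 for client in weights}
    for _ in range(n_requests):
        vtime, client = heapq.heappop(heap)      # smallest virtual finish time first
        served[client] += 1
        heapq.heappush(heap, (vtime + Fraction(1, weights[client]), client))
    return served


weights = {"c1": 1, "c2": 2, "c3": 3, "c4": 4, "c5": 5}   # hypothetical weights
served = weighted_schedule(weights, 1500)
print(served)
```

`Fraction` is used instead of floats so that ties in virtual time resolve exactly and the served counts match the weight ratios without rounding drift.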


  29. Muzzled-HBase: Tail Latency Guarantee
      Workloads:
        - Foreground client: runs YCSB (update-heavy)
        - Background client: random Gets or Puts
      Expectation: Foreground latency remains stable

