On the Energy (In)efficiency of Hadoop: Scale-down Efficiency


  1. On the Energy (In)efficiency of Hadoop: Scale-down Efficiency Jacob Leverich and Christos Kozyrakis Stanford University

  2. The current design of Hadoop precludes scale-down of commodity clusters.

  3. Outline
     • Hadoop crash-course
     • Scale-down efficiency
     • How Hadoop precludes scale-down
     • How to fix it
     • Did we fix it?
     • Future work

  4. Hadoop crash-course
     • Hadoop == Distributed Processing Framework
       • 1000s of nodes, PBs of data
     • Hadoop MapReduce ≈ Google MapReduce
       • Tasks are automatically distributed by the framework.
     • Hadoop Distributed File System ≈ Google File System
       • Files divided into large (64MB) blocks; amortizes overheads.
       • Blocks replicated for availability and durability.
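     As a concrete illustration of these numbers: a 1 GB file is split into sixteen 64 MB blocks, and with a replication factor of 3 (the setting used on the later slides) HDFS stores 48 block replicas spread across the cluster.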

  5. Scale-down motivation
     [Figures: typical server utilization, from Barroso and Hölzle, 2007; power vs. utilization for an HP ProLiant DL140 G3]

  6. Scale-down for energy proportionality
     • Four nodes at 40% utilization: 4 × P(40%) = 4 × 325 W = 1300 W
     • Consolidated onto two nodes at 80% utilization (the other two off): 2 × P(80%) = 2 × 365 W = 730 W
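     With the slide's numbers, consolidating the same aggregate load onto half the servers drops cluster power from 1300 W to 730 W, a reduction of roughly 44%.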

  7. The problem: storage consolidation :-(
     • Hadoop Distributed File System…
     • Consolidate computation? Easy.
     • Consolidate storage? Not (as) easy.
       • “All servers must be available, even during low-load periods.” [Barroso and Hölzle, 2007]
     • Hadoop inherited this “feature” from Google File System

  8. HDFS and block replication
     [Figure: “block replication table” — blocks A–H on the rows, nodes 1–9 on the columns, each cell a replica]
     • Replication factor = 3
     • 1st replica local, others remote.
     • Allocate evenly.
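     The placement rule on this slide can be read as the following minimal sketch (illustrative only; the class and method names are assumptions, not actual HDFS code): the first replica stays on the writing node, and the remaining replicas are scattered across other nodes to keep the allocation roughly even.

     import java.util.ArrayList;
     import java.util.List;
     import java.util.Random;

     // Simplified stand-in for HDFS replica placement: 1st replica local, others remote,
     // chosen at random so the allocation stays roughly even across nodes.
     public class SimplePlacement {
         private final Random rng = new Random();

         public List<String> chooseTargets(String localNode, List<String> allNodes, int replication) {
             List<String> targets = new ArrayList<>();
             targets.add(localNode);                                   // 1st replica local
             List<String> remote = new ArrayList<>(allNodes);
             remote.remove(localNode);
             while (targets.size() < replication && !remote.isEmpty()) {
                 targets.add(remote.remove(rng.nextInt(remote.size()))); // others remote
             }
             return targets;
         }
     }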

  9. Attempted scale-down
     [Figure: the same block replication table, with several nodes put to sleep]
     Problems:
     • Scale-down vs. Self-healing
     • Wasted capacity: sleeping replicas != lost replicas
     • Flurry of net & disk activity!
     • Which nodes to disable?
     • Must maintain data availability

  10. How to fix it
     “Self-non-healing”:
     • Scale-down vs. Self-healing
     • Wasted capacity: sleeping replicas != lost replicas
     • Flurry of net & disk activity!
     • Which nodes to disable?
     • Must maintain data availability

  11. Self-non-healing
     [Figure: block replication table with a sleeping node (“Zzzzz…”) whose replicas are not re-replicated]
     • Coordinate with Hadoop when we put a node to sleep
     • Prevent block re-replications

  12. New RPCs in the HDFS primary node
     • sleepNode(String hostname)
       • Similar to node decommissioning, but don’t re-replicate blocks
         % hadoop dfsadmin -sleepNode 10.10.1.80:50020
       • Save blocks to a “sleeping blocks” map for bookkeeping
       • Ignore heartbeats and block reports from this node
     • wakeNode(String hostname)
       • Watch for heartbeats, force node to send block report
       • Execute arbitrary commands (e.g. send wake-on-LAN packet)
     • wakeBlock(Block target)
       • Wake a sleeping node that has a particular block
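     A minimal sketch of the bookkeeping these RPCs imply, assuming a simple in-memory map on the primary node (the class and field names below are illustrative, not the actual Hadoop implementation):

     import java.util.HashMap;
     import java.util.HashSet;
     import java.util.Map;
     import java.util.Set;

     // Illustrative bookkeeping for sleepNode/wakeNode/wakeBlock.
     public class SleepTracker {
         // hostname -> blocks that node holds (filled in from block reports)
         private final Map<String, Set<String>> blocksOnNode = new HashMap<>();
         // "sleeping blocks" map: block -> sleeping hosts that hold a replica
         private final Map<String, Set<String>> sleepingBlocks = new HashMap<>();
         private final Set<String> sleepingNodes = new HashSet<>();

         public void recordReplica(String hostname, String block) {
             blocksOnNode.computeIfAbsent(hostname, h -> new HashSet<>()).add(block);
         }

         // sleepNode: record the node's replicas as sleeping instead of re-replicating them
         public void sleepNode(String hostname) {
             sleepingNodes.add(hostname);
             for (String block : blocksOnNode.getOrDefault(hostname, Set.of())) {
                 sleepingBlocks.computeIfAbsent(block, b -> new HashSet<>()).add(hostname);
             }
         }

         // Heartbeats and block reports from sleeping nodes are ignored
         public boolean shouldIgnoreHeartbeat(String hostname) {
             return sleepingNodes.contains(hostname);
         }

         // wakeBlock: pick a sleeping node that holds a replica of the block
         public String nodeToWakeFor(String block) {
             Set<String> holders = sleepingBlocks.getOrDefault(block, Set.of());
             return holders.isEmpty() ? null : holders.iterator().next();
         }

         // wakeNode: resume normal tracking for the node
         public void wakeNode(String hostname) {
             sleepingNodes.remove(hostname);
             sleepingBlocks.values().forEach(hosts -> hosts.remove(hostname));
         }
     }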

  13. How to fix it
     “Self-non-healing”:
     • Scale-down vs. Self-healing
     • Wasted capacity: sleeping replicas != lost replicas
     • Flurry of net & disk activity!
     “Covering Subset” replication invariant:
     • Which nodes to disable?
     • Must maintain data availability

  14. Replication placement invariants
     • Hadoop uses simple invariants to direct block placement
     • Example: Rack-Aware Block Placement
       • Protects against common-mode failures (e.g. switch failure, power delivery failure)
       • Invariant: Blocks must have replicas on at least 2 racks.
     • Is there some energy-efficient replication invariant?
       • Must inform our decision on which nodes we can disable.

  15. Covering subset replication invariant
     • Goal: Maximize the number of servers that can simultaneously sleep.
     • Strategy: Aggregate live data onto a “covering subset” of nodes. Never turn off a node in the covering subset.
     • Invariant: Every block must have one replica in the covering subset.
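     A minimal sketch of what the invariant buys, assuming a map from block IDs to replica locations (an illustrative helper, not Hadoop code): once every block is verified to have a replica inside the covering subset, every node outside the subset is a candidate for sleep without losing data availability.

     import java.util.HashSet;
     import java.util.List;
     import java.util.Map;
     import java.util.Set;

     // Check the covering-subset invariant and list the nodes that may sleep.
     public class CoveringSubset {
         public static Set<String> nodesSafeToSleep(Map<String, List<String>> blockReplicas,
                                                    Set<String> coveringSubset,
                                                    Set<String> allNodes) {
             for (Map.Entry<String, List<String>> e : blockReplicas.entrySet()) {
                 boolean covered = e.getValue().stream().anyMatch(coveringSubset::contains);
                 if (!covered) {
                     // Invariant violated: this block would become unavailable during scale-down.
                     throw new IllegalStateException("Block " + e.getKey() + " has no covering replica");
                 }
             }
             Set<String> safe = new HashSet<>(allNodes);
             safe.removeAll(coveringSubset);   // never turn off a covering-subset node
             return safe;
         }
     }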

  16. Covering subset replication invariant
     [Figure: block replication table in which every block keeps one replica on a covering-subset node; the remaining nodes sleep (“Zzzzz…”)]

  17. How to fix it
     “Self-non-healing”:
     • Scale-down vs. Self-healing
     • Wasted capacity: sleeping replicas != lost replicas
     • Flurry of net & disk activity!
     “Covering Subset” replication invariant:
     • Which nodes to disable?
     • Must maintain data availability

  18. Evaluation

  19. Methodology
     • Disable n nodes, compare Hadoop job energy & perf.
       • Individual runs of webdata_sort/webdata_scan from GridMix
       • 30 minute job batches (with some idle time!)
     • Cluster
       • 36 nodes, HP ProLiant DL140 G3
       • 2 quad-core Xeon 5335s each, 32GB RAM, 500GB disk
       • 9-node covering subset (1/4 of the cluster)
     • Energy model
       • Validated estimate based on CPU utilization
       • Disabled node = 0 Watts
       • Possible to evaluate hypothetical hardware
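     A minimal sketch of a utilization-based energy model of this kind (the idle and peak wattages below are placeholders, not the measured values used in the study):

     // Rough utilization-based energy model: linear interpolation between idle and
     // peak power, with a disabled node drawing 0 W.
     public class EnergyModel {
         static final double IDLE_WATTS = 250.0;   // assumed idle power of one server
         static final double PEAK_WATTS = 400.0;   // assumed power at 100% CPU utilization

         static double nodePower(double cpuUtilization, boolean disabled) {
             if (disabled) return 0.0;
             return IDLE_WATTS + (PEAK_WATTS - IDLE_WATTS) * cpuUtilization;
         }

         // Energy (joules) for a run: sum node power over sampled intervals.
         static double jobEnergy(double[][] utilSamples, boolean[] disabled, double sampleSeconds) {
             double joules = 0.0;
             for (double[] sample : utilSamples) {
                 for (int n = 0; n < sample.length; n++) {
                     joules += nodePower(sample[n], disabled[n]) * sampleSeconds;
                 }
             }
             return joules;
         }
     }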

  20. Results: Performance
     • It slows down (obviously)
       • Peak performance benchmark
     • Sort (network intensive) worse off than Scan
       • Amdahl’s Law

  21. Results: Energy
     • Less energy consumed for the same amount of work
       • 9% to 51% saved
     • The extra nodes cost more in energy than they contribute in performance
       • Slower systems are usually more efficient; high performance is a trade-off!

  22. Results: Power
     • Excellent knob for cluster-level power capping
     • Much larger dynamic range than tweaking frequency/voltage at the server level

  23. Results: The Bottom Line
     Operational Hadoop clusters can scale down.
     We reduce energy consumption at the expense of single-job latency.

  24. Continuing Work

  25. Covering subset: mechanism vs. policy
     • The replication invariant is a mechanism.
     • Which nodes constitute a subset is policy (open question).
     • Size trade-off
       • Too small: low capacity and a performance bottleneck
       • Too large: wasted energy on idle nodes
       • 1 / (replication factor) is a reasonable starting point
     • How many covering subsets?
       • Invariant: Blocks must have a replica in each covering subset.
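     As a worked example of that starting point: with a replication factor of 3, roughly one third of the nodes would form the covering subset, e.g. 12 nodes of a 36-node cluster.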

  26. Quantify Trade-offs
     [Diagram: trade-offs among availability, energy consumption, performance, and durability]
     • Random Fault Injection experiments
       • What happens when a covering subset node fails?
     • How much do you trust idle disks?

  27. Dynamic Power Management
     • Algorithmically decide which nodes to sleep or wake up
     • What signals to use?
       • CPU utilization?
       • Disk/net utilization?
       • Job queue length?
     • MapReduce and HDFS must cooperate
       • e.g. idle nodes may host transient Map outputs
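     One way such a policy could be wired up, sketched with assumed thresholds and using the job-queue and CPU-utilization signals listed above (illustrative only; a real policy would also need the MapReduce/HDFS coordination noted on the slide):

     // Toy sleep/wake policy driven by queue length and average CPU utilization.
     public class PowerManager {
         static final double LOW_UTIL = 0.25;   // assumed thresholds, for illustration only
         static final double HIGH_UTIL = 0.75;

         enum Action { SLEEP_ONE, WAKE_ONE, NONE }

         static Action decide(int queuedJobs, double avgCpuUtil, int awakeNodes, int coveringSubsetSize) {
             if (queuedJobs > 0 || avgCpuUtil > HIGH_UTIL) {
                 return Action.WAKE_ONE;                  // demand exceeds current capacity
             }
             if (avgCpuUtil < LOW_UTIL && awakeNodes > coveringSubsetSize) {
                 return Action.SLEEP_ONE;                 // never sleep below the covering subset
             }
             return Action.NONE;
         }
     }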

  28. Workloads
     • Benchmarks
       • HBase/BigTable vs. MapReduce
         • Short, unpredictable data access vs. long streaming access
         • Quality of service and throughput are important
       • Pig vs. Sort+Scan
       • Recorded job traces vs. random job traces
       • Peak performance vs. fractional utilization
     • What are typical usage patterns?

  29. Scale
     • 36 nodes to 1000 nodes; emergent behaviors?
       • Network hierarchy
       • Hadoop framework inefficiencies
       • Computational overhead (must process many block reports!)
     • Experiments on Amazon EC2
       • Awarded an Amazon Web Services grant
       • Can’t measure power! Must use a model.
       • Any Amazonians here? Let’s make a validated energy model.
