FAWNdamentally Power-efficient Clusters
David Andersen, for: Vijay Vasudevan, Jason Franklin, Amar Phanishayee, Lawrence Tan, Michael Kaminsky*, Iulian Moraru Carnegie Mellon University, *Intel Research Pittsburgh
Monthly energy and power cost [EPA 2007]
Infrastructure Efficiency: power generation, power distribution, cooling
Dynamic Power Scaling: sleeping when idle, rate adaptation, VM consolidation
Computational Efficiency: FAWN
Goal of computational efficiency: reduce the amount of energy needed to do useful work
Fast Array of Wimpy Nodes: improve computational efficiency using an array of well-balanced low-power systems.
Prototype node: AMD Geode, 256MB DRAM, 4GB CompactFlash
Workloads amenable to a “scale-out” approach
Figure adapted from Tolia et al., HotPower '08
[Figure: Power (W) vs. load, comparing fixed power costs against the ideal energy-proportional line]
How else might systems be re-balanced?
- CPUs clocked down?
- More disks with big CPUs?
[Figure: CPU-to-disk-seek speed ratio by year]
Speed vs. Efficiency
Fast processors mask the memory wall, at the cost of efficiency. Fixed power costs can dominate efficiency for slow processors. FAWN targets the sweet spot in processor efficiency once fixed costs are included.
Metrics:
Performance = work / time
Efficiency = performance / Watt
Cost = performance / $
Density = performance / volume
Performance Efficiency (queries/Joule):
FAWN + CF (4W): 424.25
Traditional + SSD (83W): 69.88
Traditional + HD (87W): 2.03
FAWN is 6-200x more efficient than traditional systems
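The "6-200x" claim follows directly from the queries/Joule figures on this slide; a quick check of the arithmetic:

```python
# Queries/Joule figures from the slide's chart.
fawn_cf = 424.25    # FAWN + CompactFlash (4W)
trad_ssd = 69.8795  # Traditional server + SSD (83W)
trad_hd = 2.03448   # Traditional server + hard disk (87W)

print(round(fawn_cf / trad_ssd, 1))  # ~6.1x vs. traditional + SSD
print(round(fawn_cf / trad_hd))      # ~209x vs. traditional + disk
```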
Encryption Efficiency (MB/J): FAWN 0.73, traditional 0.365
AES encryption/decryption of a 512MB file with a 256-bit key. FAWN is 2x more efficient for CPU-bound operations!
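The MB/J metric is simply data processed divided by energy consumed. A back-of-envelope sketch of how the FAWN figure could arise (the power draw and runtime below are illustrative assumptions, not measurements from the talk):

```python
# Energy efficiency of a CPU-bound task, in MB per Joule.
file_mb = 512        # file size from the slide
power_w = 4.0        # assumed node power draw during encryption
elapsed_s = 175.0    # hypothetical runtime, chosen for illustration

energy_j = power_w * elapsed_s   # Joules = Watts * seconds
print(file_mb / energy_j)        # ~0.73 MB/J, the FAWN figure
```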
How do we serve random-access workloads?
Ratio of query rate to dataset size informs storage technology
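One way to read this: provision enough devices to satisfy both the capacity and the query rate, and see which technology is cheapest at each ratio. The sketch below illustrates the idea only; every price, rate, and capacity in it is a made-up assumption, not a figure from the talk.

```python
# Illustrative model: pick the cheapest storage technology that can
# serve a dataset of a given size at a given query rate.
# All numbers below are invented for illustration.
import math

TECHS = {
    # name: ($/GB, random queries/sec per device, GB per device)
    "DRAM":  (10.0, 1_000_000, 16),
    "Flash": (1.0,  10_000,    64),
    "Disk":  (0.1,  150,       1000),
}

def cheapest(dataset_gb, qps):
    best = None
    for name, (dollars_per_gb, dev_qps, dev_gb) in TECHS.items():
        # Enough devices for capacity AND for the query rate.
        devices = max(math.ceil(dataset_gb / dev_gb),
                      math.ceil(qps / dev_qps))
        cost = devices * dev_gb * dollars_per_gb
        if best is None or cost < best[1]:
            best = (name, cost)
    return best[0]

# High query rate relative to size favors DRAM; large cold data favors disk.
print(cheapest(10, 500_000))   # small, hot dataset
print(cheapest(10_000, 100))   # large, cold dataset
```

The crossover points move with the assumed prices, but the qualitative conclusion on the slide (the query-rate-to-size ratio picks the technology) is what the sketch demonstrates.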
Scaling to more nodes: power cost vs. engineering cost.
"Each decimal order of magnitude increase in parallelism requires a major redesign and rewrite of parallel code" - Kathy Yelick
FAWN-KV: a distributed key-value store. A front-end routes requests to back-end wimpy nodes and collects responses.
Creating a 1.8GB BDB file (B-tree split/merge operations):

Number of files   Insertion time
1                 12 hours 50 min
8                 3 hours 18 min
32                2 hours 26 min
A B-tree issues many small, random writes, which flash handles poorly: each small write can force a rewrite of a 128-256KB erase block.
In-memory index: log-like behavior is free, since the DB already tracks the location of each key-value pair at byte granularity. (A filesystem or device can do so at block granularity, with higher overhead.) But wimpy nodes have little DRAM, which limits index size.
Each in-memory index entry holds only a key fragment (15 bits) and an offset into the on-flash data log (32 bits).
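The idea of a compact in-memory index over an append-only on-flash log can be sketched as follows. This is a simplified illustration of the technique, not the exact FAWN-DS layout: the hash function, bucket count, and collision handling (none here) are assumptions.

```python
# Sketch: in-memory hash index (15-bit key fragment + 32-bit log offset)
# over an append-only log, so all writes to flash are sequential.
import hashlib

N_BUCKETS = 2 ** 16

def _hash(key: bytes) -> int:
    return int.from_bytes(hashlib.sha1(key).digest()[:8], "big")

class LogStore:
    def __init__(self):
        self.log = []                     # stand-in for the on-flash log
        self.index = [None] * N_BUCKETS   # (key_frag, offset) per bucket

    def put(self, key: bytes, value: bytes):
        offset = len(self.log)
        self.log.append((key, value))     # append only: flash-friendly
        h = _hash(key)
        frag = (h >> 16) & 0x7FFF         # 15-bit key fragment
        # Simplification: overwrite on bucket collision, no chaining.
        self.index[h % N_BUCKETS] = (frag, offset)

    def get(self, key: bytes):
        h = _hash(key)
        entry = self.index[h % N_BUCKETS]
        if entry is None:
            return None
        frag, offset = entry
        if frag != (h >> 16) & 0x7FFF:    # fragment mismatch: miss
            return None
        stored_key, value = self.log[offset]
        # Fragment matches can be false positives; verify the full key,
        # which lives on flash next to the value.
        return value if stored_key == key else None

s = LogStore()
s.put(b"foo", b"bar")
print(s.get(b"foo"))  # b'bar'
```

Storing only a fragment plus an offset keeps each entry to a few bytes, which is what makes the index fit in a wimpy node's small DRAM.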
Creating a 1.8GB FAWN-DS file:

Number of files   Insertion time
1                 9.63 min
8                 9.83 min
32                9.93 min

Creating a 1.8GB BDB file:

Number of files   Insertion time
1                 12 hours 50 min
8                 3 hours 18 min
32                2 hours 26 min
“massive multi-grep” (given 1M strings, find whether any of them occur in a massive dataset) with low memory requirements
Ongoing work: FAWN datacenters; running today's software on yesterday's machine; dealing with flash, ...
Thanks to: Google, Intel, NetApp http://www.cs.cmu.edu/~fawnproj/
2 Gb/s of small key-value queries at 90W from an 80GB dataset (on 4-year-old hardware); 891µs median access time
Load balancing across workers (max load vs. low load)
Consistent hashing: nodes (A-H) are placed on a ring, and each node owns the key range between its predecessor and itself; for example, node B owns (H, B]. A hash index maps keys in that range to values.
Requirements: 1) spread data and queries uniformly; 2) handle node joins and departures without affecting many nodes.
When node A joins between H and B, the range (H, B] splits into (H, A] and (A, B]; only data in the split range moves.
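The ring on these slides can be sketched as a minimal consistent-hashing structure: each node owns the range between its predecessor and itself, so a join only splits one range. The hash choice and node names below are illustrative; real FAWN-KV also uses virtual nodes and replication, which this sketch omits.

```python
# Minimal consistent-hashing ring: a key k in (pred, n] is owned by node n.
import bisect
import hashlib

def _h(s: str) -> int:
    return int.from_bytes(hashlib.sha1(s.encode()).digest()[:4], "big")

class Ring:
    def __init__(self):
        self._points = []   # sorted hash positions of nodes
        self._nodes = {}    # position -> node name

    def join(self, node: str):
        pos = _h(node)
        bisect.insort(self._points, pos)
        self._nodes[pos] = node

    def owner(self, key: str) -> str:
        # First node clockwise from the key's position owns it.
        i = bisect.bisect_left(self._points, _h(key))
        return self._nodes[self._points[i % len(self._points)]]

ring = Ring()
for n in "ABCDEFGH":
    ring.join(n)
before = ring.owner("some-key")
ring.join("Z")              # a join splits exactly one range,
after = ring.owner("some-key")  # so ownership moves only to Z, if at all
```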