FAWN - a Fast Array of Wimpy Nodes
Tomasz Dubrownik
University of Warsaw
January 12, 2011

Outline
1. Introduction
2. Design and Architecture
3. FAWN-DS
4. FAWN-KV
5. Evaluation

Key issues
- Growing CPU vs. I/O gap
- Contemporary systems must serve millions of users
- Electricity consumed adds up to significant costs

Is there a way to exploit the CPU vs. I/O gap to the users’ advantage?

Observations
- Many industry problems exhibit massive data parallelism with relatively small computational demands
- A fair number of real-life problems depend heavily on efficient, distributed key-value stores that span several gigabytes
- Such stores often contain millions of small items (on the order of kilobytes)

A motivating example: Twitter
A wonderfully popular service, Twitter has all of the above-mentioned properties.
- Each tweet is limited to 140 characters.
- Fairly little processing is performed on the tweets, yet the search system alone handles an average of 12,000 queries per second.
- A stream of over a thousand tweets per second enters the system.
- A high-performance key-value store is crucial to the operation.
- At the same time, the cost of running a conventional cluster capable of meeting this demand is extremely high.
Disclaimer: To my knowledge, FAWN is not being used at Twitter. But it would probably make a lot of sense if it were. Thank you.

The problem, defined
To engineer a fast, scalable key-value store for small items (hundreds to thousands of bytes).
This store is expected to:
- respond to thousands or more random queries per second (QPS)
- conserve as much power as possible
- meet service level agreements regarding latency
- scale up well as the system grows
- scale down well as demand fluctuates during operating hours

Possible solutions (1)
A cluster of traditional servers with HDDs as storage.
Problems:
- very poor performance for random accesses, unless RAID or a similar disk array is used
- if RAID is used, both the initial price and the total cost of ownership skyrocket
- most of the power consumption is fixed, so not much power is conserved during low-load periods

Possible solutions (2)
A cluster of traditional servers with RAM as storage (think memcached).
Problems:
- very high cost in terms of $/GB
- robustness is lost unless additional systems are employed
- power consumption is just as bad as before

Possible solutions (3)
A cluster of traditional servers with SSDs as storage.
Problems:
- while random reads are great, random writes are terrible (BerkeleyDB running on an SSD averages just 0.07 MB/s)
- power consumption is just as bad as before

Possible solutions (4)
A combination of the above.
Problems:
- a combination of the above :)

Introducing FAWN
A slightly different approach:
- Let’s use energy-efficient, wimpy processors coupled with fast SSD storage.
- Design a custom key-value store exploiting the characteristics of flash storage.
- That way power consumption can be kept to a minimum while retaining high performance and robustness.
- The resulting system has a lower total cost of ownership and good scalability.

Anatomy of a key-value data store
- A request can be a get, a put, or a delete
- Keys are 160-bit integers
- Values are small blobs (typically between 256 B and 1 KB)
- Each request pertains to a single key-value pair; there is no relational overlay at this level
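The request model above can be sketched in a few lines of Python. This is only an illustration: the names `Op`, `Request`, and `make_key` are hypothetical, and deriving the 160-bit key with SHA-1 is an assumption made here because SHA-1 happens to produce 160 bits; the slides do not say how keys are obtained.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Optional

KEY_BITS = 160  # key width stated on the slide


class Op(Enum):
    GET = "get"
    PUT = "put"
    DELETE = "delete"


def make_key(name: bytes) -> int:
    """Map an application-level name to a 160-bit integer key.

    Using SHA-1 here is an assumption made for illustration.
    """
    return int.from_bytes(hashlib.sha1(name).digest(), "big")


@dataclass(frozen=True)
class Request:
    """One request touching exactly one key-value pair."""
    op: Op
    key: int
    value: Optional[bytes] = None  # only PUT requests carry a value

    def __post_init__(self):
        if not 0 <= self.key < 2 ** KEY_BITS:
            raise ValueError("key must fit in 160 bits")
        if (self.op is Op.PUT) != (self.value is not None):
            raise ValueError("a value is required for PUT and forbidden otherwise")
```

For example, `Request(Op.PUT, make_key(b"tweet:42"), b"hello")` is accepted, while a GET that carries a value is rejected.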

Overview
- The cluster is composed of front-ends and back-ends
- Front-ends forward requests to the appropriate back-ends and return responses to clients
- The front-ends are responsible for maintaining order in the cluster
- Back-ends run the FAWN-DS data stores (one per key range)
- Together the machines form a single FAWN-KV key-value store

Front-end
Responsibilities:
- passing requests and responses
- keeping track of back-ends’ Virtual IDs and their mapping to key ranges
- managing joins and leaves
Example configuration used for evaluation:
- Intel Atom CPU (27 W)

Back-end
A back-end runs one FAWN-DS data store per key range. Each data store supports the basic key-value requests, as well as maintenance operations (Split, Merge, Compact).
Example configuration used for evaluation:
- AMD Geode LX CPU (500 MHz)
- 256 MB DDR SDRAM (400 MHz)
- 100 Mbps Ethernet
- Sandisk Extreme IV CompactFlash (4 GB)

Back-ends, cont.
- Back-ends are organized in a logical ring which coincides with the key space (mod 2^160)
- Each back-end is assigned a fixed number of Virtual IDs, in the hope of keeping the load balanced
- A Virtual ID is the lowest key in the range its node handles
- This allows for a well-defined successor relation on keys and virtual nodes
More on this later.
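The ring described above can be sketched as a sorted list of Virtual IDs. The code follows the convention stated on the slide (a Virtual ID is the lowest key of the range it handles, so a key belongs to the largest Virtual ID not exceeding it, wrapping around the ring). The class `Ring` and its method names are hypothetical, and the tiny Virtual ID values in the usage note are chosen only for readability.

```python
import bisect

KEY_SPACE = 2 ** 160  # keys live on a ring modulo 2^160


class Ring:
    """Toy mapping from keys to back-ends via Virtual IDs (VIDs)."""

    def __init__(self):
        self._vids = []    # sorted list of all VIDs on the ring
        self._owner = {}   # VID -> back-end name

    def join(self, node, vids):
        """Add a back-end together with its fixed set of VIDs."""
        for vid in vids:
            vid %= KEY_SPACE
            if vid in self._owner:
                raise ValueError("virtual ID already taken")
            bisect.insort(self._vids, vid)
            self._owner[vid] = node

    def leave(self, node):
        """Remove a back-end; its ranges fall to the preceding VIDs."""
        self._vids = [v for v in self._vids if self._owner[v] != node]
        self._owner = {v: n for v, n in self._owner.items() if n != node}

    def lookup(self, key):
        """Return (vid, node) for the largest VID <= key, wrapping around."""
        if not self._vids:
            raise LookupError("empty ring")
        i = bisect.bisect_right(self._vids, key % KEY_SPACE) - 1
        vid = self._vids[i]  # i == -1 wraps to the highest VID
        return vid, self._owner[vid]

    def successor(self, vid):
        """Return the next VID clockwise on the ring."""
        i = bisect.bisect_right(self._vids, vid % KEY_SPACE)
        return self._vids[i % len(self._vids)]
```

With `ring.join("A", [100, 5000])` and `ring.join("B", [2000])`, the key 150 maps to back-end A, the key 2000 to B, and the key 50 wraps around to the range starting at 5000, which A also owns.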

Peculiarities of flash storage
Flash media differ from traditional HDDs in a number of ways, some of which seriously impact persistent data store designs.
- Random reads are nearly as fast as sequential reads
- Random writes are very inefficient (because an entire erase block must be erased and rewritten)
- Sequential writes perform admirably
- On modern devices, semi-random writes (random appends to a small number of files) are nearly as fast as sequential writes
These features can be exploited by using a log-structured data store.

FAWN-DS
To take advantage of the properties of flash storage, FAWN-DS is structured as follows:
- The key-value mappings are stored in a Data Log on the flash medium. This store is append-only.
- To provide fast random access, a hash index mapping into the data log is kept in RAM. To reduce the memory footprint, only a fragment of each key is stored in the index; the trade-off is a (configurable) chance of needing more than one flash access per lookup.
- To reclaim unused storage space, a Compact operation is introduced. It is designed to be as efficient as possible on flash, using only bulk sequential writes.
- To facilitate reconstruction of the in-memory index, checkpointing is used.
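A minimal sketch of the append-only log plus in-memory index idea follows. It is not the FAWN-DS implementation: the class `LogStore`, its record layout (20-byte key, 1-byte delete flag, 4-byte length, then the value), and the use of an in-memory buffer in place of flash are all assumptions made for illustration. For simplicity the index is a plain dictionary holding full keys, and checkpointing is left out.

```python
import io

HEADER_LEN = 25  # 20-byte key + 1-byte delete flag + 4-byte value length


class LogStore:
    """Toy log-structured store: append-only data log, index kept in RAM."""

    def __init__(self):
        self._log = io.BytesIO()  # stands in for the data log on flash
        self._index = {}          # key -> offset of the latest live record

    def _append(self, key, value, deleted=False):
        """Write one record at the end of the log and return its offset."""
        offset = self._log.seek(0, io.SEEK_END)
        header = (key.to_bytes(20, "big")
                  + bytes([int(deleted)])
                  + len(value).to_bytes(4, "big"))
        self._log.write(header + value)
        return offset

    @staticmethod
    def _read(log, offset):
        """Read the value of the record stored at `offset`."""
        log.seek(offset)
        header = log.read(HEADER_LEN)
        length = int.from_bytes(header[21:25], "big")
        return log.read(length)

    def put(self, key, value):
        self._index[key] = self._append(key, value)

    def delete(self, key):
        # A delete is itself an append; the old record becomes garbage.
        self._append(key, b"", deleted=True)
        self._index.pop(key, None)

    def get(self, key):
        offset = self._index.get(key)
        return None if offset is None else self._read(self._log, offset)

    def log_size(self):
        return len(self._log.getvalue())

    def compact(self):
        """Copy live records sequentially into a fresh log, dropping garbage."""
        old, self._log = self._log, io.BytesIO()
        live = sorted(self._index.items(), key=lambda kv: kv[1])
        self._index = {}
        for key, offset in live:
            self._index[key] = self._append(key, self._read(old, offset))
```

Overwriting or deleting a key never modifies earlier records, so every write is a sequential append; `compact` is the only point at which stale records are discarded, and it too writes sequentially.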

Lookup

Lookup, cont.
Two smaller numbers are extracted from the key:
- the index bits: the lowest i bits
- the key fragment: the next lowest k bits
The index bits select the first bucket to examine in the in-memory hash index.
If that bucket is valid and the key fragments match, the data log entry is retrieved and the full keys are compared.
If the keys match, the record is returned; otherwise the next bucket in the hash chain is examined in the same way.
If nothing is found, an appropriate response is generated.
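The lookup steps above can be sketched as follows. The values of `INDEX_BITS` (i) and `FRAGMENT_BITS` (k) are illustrative, the class `HashIndex` is hypothetical, and walking the hash chain by linear probing is an assumption; the slide only says that the next bucket in the chain is examined. A Python list stands in for the data log, and `flash_reads` counts how often it is consulted during lookups.

```python
INDEX_BITS = 4     # i: illustrative value, keeps the table tiny
FRAGMENT_BITS = 8  # k: illustrative value


def split_key(key):
    """Return (index bits, key fragment) extracted from the low end of key."""
    index = key & ((1 << INDEX_BITS) - 1)
    fragment = (key >> INDEX_BITS) & ((1 << FRAGMENT_BITS) - 1)
    return index, fragment


class HashIndex:
    """Toy in-memory index storing only key fragments and log positions."""

    def __init__(self):
        # Each bucket is None (invalid) or a (fragment, log_position) pair.
        self.buckets = [None] * (1 << INDEX_BITS)
        self.log = []         # stands in for the data log: (full_key, value)
        self.flash_reads = 0  # data-log accesses made by get()

    def _chain(self, index):
        # Assumption: the hash chain is walked by linear probing.
        n = len(self.buckets)
        return ((index + step) % n for step in range(n))

    def put(self, key, value):
        index, fragment = split_key(key)
        self.log.append((key, value))
        position = len(self.log) - 1
        for b in self._chain(index):
            entry = self.buckets[b]
            same_key = (entry is not None and entry[0] == fragment
                        and self.log[entry[1]][0] == key)
            if entry is None or same_key:
                self.buckets[b] = (fragment, position)
                return
        raise RuntimeError("index full")

    def get(self, key):
        index, fragment = split_key(key)
        for b in self._chain(index):
            entry = self.buckets[b]
            if entry is None:
                return None  # invalid bucket: the key is not stored
            if entry[0] == fragment:
                self.flash_reads += 1  # fragment matched, fetch the record
                full_key, value = self.log[entry[1]]
                if full_key == key:
                    return value
        return None
```

Two keys that share both their index bits and their key fragment collide in the index, so looking up the second one costs two data-log reads. This is the configurable trade-off between index size and extra flash accesses: a larger k makes such collisions rarer at the cost of more memory per bucket.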
