Introduction Design and Architecture FAWN-DS FAWN-KV Evaluation

FAWN - a Fast Array of Wimpy Nodes

Tomasz Dubrownik

University of Warsaw

January 12, 2011

Tomasz Dubrownik FAWN - a Fast Array of Wimpy Nodes

Outline

1. Introduction
2. Design and Architecture
3. FAWN-DS
4. FAWN-KV
5. Evaluation

Key issues

  • Growing CPU vs. I/O gap
  • Contemporary systems must serve millions of users
  • Electricity consumed adds up to significant costs

Key issues

Is there a way to exploit the CPU vs. I/O gap to the users’ advantage?

Observations

  • Many industry problems exhibit massive data parallelism with relatively small computational demands
  • Many real-life problems depend heavily on efficient, distributed key-value stores that span several gigabytes
  • Such stores often contain millions of small items (on the order of kilobytes)

A motivating example

Twitter

A wonderfully popular service, Twitter has all the above-mentioned properties. Each tweet is limited to 140 characters. Fairly little processing is performed on the tweets, yet the search system alone handles an average of 12,000 queries per second, and a stream of over a thousand tweets per second enters the system. A high-performance key-value store is crucial to the operation. At the same time, the cost of running a conventional cluster capable of meeting this demand is extremely high.

Disclaimer

To my knowledge, FAWN is not being used at Twitter. But it would probably make a lot of sense if it were. Thank you.

The problem, defined

To engineer a fast, scalable key-value store for small items (hundreds to thousands of bytes). This store is expected to:

  • answer thousands or more random queries per second (QPS)
  • conserve power as much as possible
  • meet service level agreements regarding latency
  • scale up well as the system grows
  • scale down well as demand fluctuates during operating hours

Possible solutions (1)

A cluster of traditional servers with HDD as storage. Problems:

  • very poor performance for random accesses, unless RAID or a similar disk array is used
  • if RAID is used, both the initial price and the total cost of ownership skyrocket
  • most of the power consumption is fixed, so little power is saved during low-load periods

Possible solutions (2)

A cluster of traditional servers with RAM as storage (think memcached). Problems:

  • very high cost in terms of $/GB
  • robustness is lost unless additional systems are employed
  • power consumption is just as bad as before

Possible solutions (3)

A cluster of traditional servers with SSD as storage. Problems:

  • while random reads are great, random writes are terrible (BerkeleyDB running on SSD averages just 0.07 MBps)
  • power consumption is just as bad as before

Possible solutions (4)

A combination of the above. Problems:

  • a combination of the above :)

Introducing FAWN

A slightly different approach:

  • Use energy-efficient, wimpy processors coupled with fast SSD storage.
  • Design a custom key-value store that exploits the characteristics of flash storage.

That way power consumption can be kept to a minimum while retaining high performance and robustness. The resulting system has a lower total cost of ownership and good scalability.

Anatomy of a key-value data store

  • A request can be either a get, put or delete
  • Keys are 160-bit integers
  • Values are small blobs (typically between 256 B and 1 KB)
  • Each request pertains to a single key-value pair; there is no relational overlay at this level
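As a rough illustration of this request model, the sketch below implements the three operations over an in-memory dictionary. The class name `TinyKV`, the helper `make_key`, and the use of SHA-1 to derive a 160-bit key from a string are assumptions made for the example; the slides only state that keys are 160-bit integers.

```python
import hashlib
from typing import Optional

KEY_BITS = 160  # key width stated on the slide


def make_key(name: str) -> int:
    """Derive a 160-bit integer key (SHA-1 is an assumption of this sketch)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


class TinyKV:
    """Hypothetical single-node store: every request touches one key only."""

    def __init__(self) -> None:
        self._items = {}

    def put(self, key: int, value: bytes) -> None:
        if not 0 <= key < 2 ** KEY_BITS:
            raise ValueError("key must fit in 160 bits")
        self._items[key] = value

    def get(self, key: int) -> Optional[bytes]:
        return self._items.get(key)

    def delete(self, key: int) -> None:
        self._items.pop(key, None)
```

A client would call `make_key("tweet:1")` once and then use the resulting integer for all three operations.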

Overview

  • The cluster is composed of front-ends and back-ends
  • Front-ends forward requests to the appropriate back-ends and return responses to clients
  • The front-ends are responsible for maintaining order in the cluster
  • Back-ends run the FAWN-DS data stores (one per key range)
  • Together the machines form a single FAWN-KV key-value store

Front-end

Responsibilities:

  • passing requests and responses
  • keeping track of back-ends’ Virtual IDs and their mapping to key ranges
  • managing joins and leaves

Example configuration used for evaluation:

  • Intel Atom CPU (27 W)

Back-end

A back-end runs one FAWN-DS data store per key range. Each data store supports the basic key-value requests, as well as maintenance operations (Split, Merge, Compact).

Example configuration used for evaluation:

  • AMD Geode LX CPU (500 MHz)
  • 256 MB DDR SDRAM (400 MHz)
  • 100 Mbps Ethernet
  • SanDisk Extreme IV CompactFlash (4 GB)

Back-ends, cont.

  • Back-ends are organized in a logical ring which coincides with the key space (mod 2^160)
  • Each back-end is assigned a fixed number of Virtual IDs to help keep the load balanced
  • Virtual IDs are the lowest keys a node handles
  • This allows for a well-defined successor relation on keys and virtual nodes

More on this later.
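The mapping from keys to virtual nodes can be sketched as follows. The sketch follows the convention stated here, that a Virtual ID is the lowest key its node handles, so a key belongs to the closest Virtual ID at or below it, wrapping around the ring. The function names, node names and tiny ID values are hypothetical and chosen for readability.

```python
import bisect

RING_SIZE = 2 ** 160  # size of the key space


def owner(key, ring):
    """Return the node whose Virtual ID is the largest one <= key.

    `ring` is a list of (virtual_id, node_name) pairs sorted by virtual_id.
    Keys below the smallest Virtual ID wrap around to the last entry.
    """
    ids = [vid for vid, _ in ring]
    pos = bisect.bisect_right(ids, key % RING_SIZE) - 1
    return ring[pos][1]  # pos == -1 selects the last entry (wrap-around)


def successor(vid, ring):
    """Return the Virtual ID that follows `vid` on the ring."""
    ids = [v for v, _ in ring]
    return ids[bisect.bisect_right(ids, vid) % len(ids)]


# Two back-ends with two Virtual IDs each.
ring = sorted([(10, "A"), (40, "B"), (70, "A"), (90, "B")])
```

With this ring, `owner(39, ring)` resolves to back-end `A` and `owner(5, ring)` wraps around to back-end `B`.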

Peculiarities of flash storage

Flash media differ from traditional HDDs in a number of ways, some of which seriously impact persistent data store designs.

  • Random reads are nearly as fast as sequential reads
  • Random writes are very inefficient (a whole block has to be erased and rewritten)
  • Sequential writes perform admirably
  • On modern devices, semi-random writes (random appends to a small number of files) are nearly as fast as sequential writes

These features can be exploited by using a log-structured data store.

FAWN-DS

To take advantage of the properties of flash storage, FAWN-DS is structured as follows:

  • The key-value mappings are stored in a Data Log on the flash medium. This store is append-only.
  • To provide fast random access, a hash index into the data log is kept in RAM. To reduce the memory footprint, only a fragment of each key is kept in the index, at the cost of a (configurable) chance that a lookup needs more than one flash access.
  • To reclaim unused storage space, a Compact operation is introduced. It is designed to be as efficient as possible on flash, using only bulk sequential writes.
  • To facilitate reconstruction of the in-memory index, checkpointing is used.

Lookup

Lookup cont.

Two smaller numbers are extracted from the key:

  • the index bits: the lowest i bits
  • the key fragment: the next lowest k bits

The index bits serve as an index into the in-memory hash index. If the bucket pointed to by the index bits is valid and the key fragments match, the data log entry is retrieved and the full keys are compared. If the keys match, the record is returned; otherwise the next bucket in the hash chain is examined in the same way. If nothing is found, an appropriate response is generated.

Lookup, now in pseudocode!
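A minimal Python sketch of the lookup steps described above; it is not the pseudocode from the FAWN paper. The names `ToyStore` and `split_key`, the toy bit widths, the use of a Python list as the data log, and linear probing as a stand-in for the hash chain are all assumptions of this example.

```python
INDEX_BITS = 4   # i: low bits that select a bucket (toy value)
FRAG_BITS = 8    # k: next bits kept in memory as the key fragment (toy value)
NUM_BUCKETS = 1 << INDEX_BITS


def split_key(key):
    """Return (index bits, key fragment) taken from the low end of the key."""
    index = key & (NUM_BUCKETS - 1)
    fragment = (key >> INDEX_BITS) & ((1 << FRAG_BITS) - 1)
    return index, fragment


class ToyStore:
    def __init__(self):
        self.log = []                        # stands in for the data log on flash
        self.buckets = [None] * NUM_BUCKETS  # each: (valid, fragment, log offset)

    def put(self, key, value):
        index, fragment = split_key(key)
        self.log.append((key, value))
        offset = len(self.log) - 1
        for step in range(NUM_BUCKETS):
            slot = (index + step) % NUM_BUCKETS
            entry = self.buckets[slot]
            # Reuse the bucket that already holds this key, else take a free one.
            if entry is None or (entry[1] == fragment
                                 and self.log[entry[2]][0] == key):
                self.buckets[slot] = (True, fragment, offset)
                return
        raise RuntimeError("index full")

    def lookup(self, key):
        index, fragment = split_key(key)
        for step in range(NUM_BUCKETS):
            slot = (index + step) % NUM_BUCKETS
            entry = self.buckets[slot]
            if entry is None:
                return None                  # end of the chain: key is absent
            valid, frag, offset = entry
            if valid and frag == fragment:
                full_key, value = self.log[offset]  # one read of the data log
                if full_key == key:
                    return value             # full keys match
        return None
```

Two keys that share their low `INDEX_BITS + FRAG_BITS` bits, such as `0x1ABC` and `0x2ABC`, show the trade-off: the fragment matches for both, so a lookup may read the log more than once before the full-key comparison succeeds.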

Store and Delete

When a value is inserted into the store, it is simply appended to the data log and the corresponding bucket is changed to point to the new record. The valid bit is set to true.

When a record is to be deleted, a delete entry is appended to the log (for fault tolerance) and the valid bit in the corresponding bucket is set to false. Actual storage space is not reclaimed until a Compact is performed.
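The store and delete behaviour can be sketched as below. This is a simplified model, not FAWN-DS itself: a Python list plays the role of the data log, a dictionary keyed by the full key replaces the compact hash index, and a value of `None` marks a delete entry. The class name `AppendOnlyStore` is hypothetical.

```python
class AppendOnlyStore:
    """Toy model of an append-only log with an in-memory index."""

    def __init__(self):
        self.log = []    # entries: (key, value), or (key, None) for a delete
        self.index = {}  # key -> [valid bit, offset of the latest log entry]

    def put(self, key, value):
        self.log.append((key, value))              # append, never overwrite
        self.index[key] = [True, len(self.log) - 1]

    def delete(self, key):
        if key in self.index:
            self.log.append((key, None))           # delete entry, kept for recovery
            self.index[key][0] = False             # clear the valid bit

    def get(self, key):
        bucket = self.index.get(key)
        if bucket is None or not bucket[0]:
            return None
        return self.log[bucket[1]][1]
```

Note that the log only grows: after two puts and a delete of the same key it holds three entries, none of which is live.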

Maintenance operations

  • Split is issued when the key range is divided as a new virtual node joins the ring. It scans the data log sequentially and writes the appropriate entries into a new log.
  • Merge is responsible for merging two data stores into one, encompassing the combined key range. It achieves this by copying entries from one log into the other.
  • Compact copies the valid data store entries into a new log, skipping those that have been orphaned by puts and those that were explicitly deleted.

Owing to the append-only design, these operations can run concurrently with normal requests, locking only to switch data stores while finalizing maintenance.
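A rough sketch of Compact and Split over a toy log, assuming the log is a Python list of `(key, value)` pairs in append order and that a value of `None` marks a delete entry. Both conventions, and the function names, belong to this example only; key ranges that wrap around the ring are ignored for brevity.

```python
def compact(log):
    """Return a new log holding only the live entries of `log`."""
    latest = {}
    for offset, (key, _) in enumerate(log):
        latest[key] = offset                     # later entries orphan earlier ones
    new_log = []
    for offset, (key, value) in enumerate(log):  # one sequential pass
        if latest[key] == offset and value is not None:
            new_log.append((key, value))         # written out in bulk, in order
    return new_log


def split(log, low, high):
    """Copy the entries whose key lies in [low, high) into a new log."""
    return [(key, value) for key, value in log if low <= key < high]


log = [(1, b"a"), (2, b"b"), (1, b"c"), (2, None), (3, b"d")]
compacted = compact(log)
```

Here `compacted` keeps only the latest value of key 1 and the single value of key 3; the overwritten entry and the deleted key 2 are dropped.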

In order to provide a robust, scalable service, the back-ends running FAWN-DS instances are joined together and managed by front-end nodes, which in industry applications would in turn be connected to a master node.

  • Fault tolerance is introduced via replication
  • Each front-end is ideally responsible for some 80 back-ends and manages joins and leaves, exposing a simple put, get, delete interface
  • Additionally, front-ends can route requests between themselves and cache responses, so the master node remains an optimization and a convenience rather than a single point of failure

Life-cycle of a request

Life-cycle of a request, elaborated

  • Each front-end is assigned a contiguous portion of the key space
  • Upon receiving a request, it either processes it using its managed back-ends or forwards it if the key belongs to a different front-end
  • Front-ends maintain a list of virtual nodes and their corresponding addresses, and thus can instantly translate the request to the appropriate FAWN-DS calls
  • While the request is processed by back-ends, the front-end ensures replication is maintained

Replication in Chains

Replication in Chains, cont.

  • Each key defines a chain in the virtual node ring
  • A fixed number of nodes maintain copies of the mapping
  • The nodes are obtained by iterating the successor function of the key
  • The first node that contains a replica is the head of the chain
  • The last node is the tail
  • Every put request is issued to the head of the chain and waits for an acknowledgement from the tail
  • Every get is passed to the tail

This ensures consistency and proper ordering of changes throughout the chain.
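Selecting the chain for a key can be sketched as below. The function name `replica_chain` and the default of three replicas are assumptions; the slides only say that the number is fixed. The sketch works on Virtual IDs alone and does not check that the chain members sit on distinct physical machines.

```python
import bisect


def replica_chain(key, virtual_ids, replicas=3):
    """Return the Virtual IDs that hold `key`, head first, tail last.

    `virtual_ids` is a sorted list of distinct Virtual IDs. The head is the
    owner of the key (largest Virtual ID <= key, wrapping around the ring);
    the other members follow by repeatedly taking the successor.
    """
    head = bisect.bisect_right(virtual_ids, key) - 1  # -1 wraps to the last ID
    n = len(virtual_ids)
    return [virtual_ids[(head + step) % n] for step in range(min(replicas, n))]


ids = [10, 40, 70, 90]
chain = replica_chain(45, ids)
```

For key 45 the head is Virtual ID 40 and the tail is 90; for key 95 the chain wraps around the ring and ends at 40.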

Replication of a put

After receiving the put request, the head forwards the put along the chain and waits for an acknowledgement. If all goes well, the tail acknowledges both to the front-end and recursively to its predecessor.
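The flow of a put through the chain can be simulated in a few lines. In this sketch a recursive method call stands in for the network messages, and the `acks` list records the order in which acknowledgements travel back; the class and function names are hypothetical.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.next = None          # successor in the chain, None for the tail

    def put(self, key, value, acks):
        self.data[key] = value    # apply locally, then pass down the chain
        if self.next is not None:
            self.next.put(key, value, acks)
        acks.append(self.name)    # reached only after everything downstream acked


def build_chain(names):
    replicas = [Replica(name) for name in names]
    for node, nxt in zip(replicas, replicas[1:]):
        node.next = nxt
    return replicas


def chain_put(chain, key, value):
    acks = []
    chain[0].put(key, value, acks)   # puts enter at the head
    return acks


def chain_get(chain, key):
    return chain[-1].data.get(key)   # gets are served by the tail
```

Because the tail is the last node to apply a put and the only node that answers gets, a value becomes visible to readers only once every replica in the chain holds it.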

How a join is handled

When a (virtual) node joins the FAWN-KV ring, precisely one key range is split in two. To maintain replication the following happens:

  • The current tail transmits its whole log to the new node (pre-copy)
  • The front-end informs the nodes in the chain of the join via a chain membership message
  • In response to said message, nodes flush updates received during pre-copy down the chain

Please refer to the paper for details on how updates arriving during the flush are handled, as well as the special cases of joining as head or tail.

What happens when a node leaves

When a node leaves the ring, each node that is supposed to take over the replicas in essence joins the replica chain at a different position in the key space, so the protocol is essentially the same as for a join.

At this stage failure detection is achieved by a heartbeat. If a node misses a set number of heartbeat signals, the front-end initiates a leave and appropriate action is taken.
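Heartbeat-based failure detection on the front-end might look like the sketch below. The class name `HeartbeatMonitor`, the interval-based `tick` method and the threshold of three missed heartbeats are assumptions; the slides do not give the actual threshold or timing.

```python
class HeartbeatMonitor:
    def __init__(self, nodes, max_missed=3):
        self.max_missed = max_missed
        self.missed = {node: 0 for node in nodes}

    def beat(self, node):
        """Record a heartbeat received from `node`."""
        if node in self.missed:
            self.missed[node] = 0

    def tick(self):
        """Close one heartbeat interval; return the nodes that should leave."""
        failed = []
        for node in list(self.missed):
            self.missed[node] += 1       # intervals elapsed since the last beat
            if self.missed[node] >= self.max_missed:
                failed.append(node)
                del self.missed[node]    # the front-end would start a leave here
        return failed
```

A node that reports in every interval never reaches the threshold, while a silent node is returned by the third `tick`.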

Procedure description

FAWN’s performance was evaluated under a number of criteria:

  • Single node efficiency (compared to baseline hardware capabilities)
  • Cluster performance (tested on a system with 21 back-ends and 1 front-end)
  • Energy efficiency

The results were then compared with a number of more traditional configurations.

Single node performance

Baseline:

  Seq. read    Rand. read   Seq. write   Rand. write
  28.5 MBps    1424 QPS     24 MBps      110 QPS

FAWN:

  Data size    Rand. read (1 KB)    Rand. read (256 B)
  125 MB       51968 QPS            65412 QPS
  1 GB         1595 QPS             1964 QPS
  3.5 GB       1150 QPS             1298 QPS

Gets vs Puts

Cluster — performance and power consumption

Important points on power consumption

  • The plot displayed does not take into account the front-end (a further 27 W)
  • The networking hardware used takes 20 W to operate (included in the plotted figure)
  • Even factoring in the front-end, the system achieved 330 queries per joule
  • A desktop computer can provide about 50 queries per joule using SSD
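The queries-per-joule metric is simply throughput divided by power draw, since one watt is one joule per second. The short sketch below spells this out and compares the two figures quoted above; the function name is hypothetical, and the 10,000 QPS at 100 W load is an invented illustration, not a measured result.

```python
def queries_per_joule(queries_per_second, watts):
    """Energy efficiency: 1 W = 1 J/s, so QPS / W gives queries per joule."""
    return queries_per_second / watts


FAWN_QPJ = 330     # from the slide, front-end included
DESKTOP_QPJ = 50   # from the slide, desktop computer with SSD

advantage = FAWN_QPJ / DESKTOP_QPJ  # how many times more work per joule

# Hypothetical load, not a measured figure: 10,000 QPS drawn at 100 W.
example = queries_per_joule(10_000, 100)
```

On the quoted figures, the FAWN cluster does between six and seven times as much work per joule as the desktop system.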

CDF of Query Latency

Comparison with alternative approaches (projected)

Important point: the FAWN entries in this table are projections of the performance expected from systems built using state-of-the-art components.

Solution space for system builders (projected)

Conclusions

  • FAWN is demonstrated to be a viable approach to providing cost-efficient data stores
  • Using wimpy processors in an array can reduce power consumption while retaining performance
  • Barring breakthrough discoveries, FAWN-like technologies are expected to deliver the lowest TCO for a large portion of the problem space
  • Larger scale testing is necessary to establish the correctness of these claims and to demonstrate scalability

References

[FAWN] D. G. Andersen, J. Franklin, M. Kaminsky, A. Phanishayee, L. Tan, and V. Vasudevan. FAWN: A Fast Array of Wimpy Nodes. Proceedings of ACM SOSP 2009, Big Sky, MT, USA, October 2009.

All images are taken from the FAWN paper.
