1. FAWN - Fast Array of Wimpy Nodes
David G. Andersen et al.
Presented by: Ravi Kiran Boggavarapu (1001541261)

2. ● A cluster architecture for low-power, data-intensive computing.
● Wimpy nodes = a combination of low-power CPUs and small amounts of flash storage.
  ○ The design centers around log-structured datastores that provide high performance on flash.
● Goal of the architecture?
  ○ Increase performance while minimizing power consumption -- save on the data centers' electricity bills!
● How is performance measured?
  ○ The paper uses queries per Joule as its metric; FAWN handles roughly 350 key-value queries per Joule.

3. [Photo of a FAWN cluster, taken from http://www.cs.cmu.edu/~fawnproj/]

4. Trade-offs of using flash:
● Flash provides a non-volatile memory store with several significant benefits over traditional magnetic disks:
  ○ Fast random reads.
  ○ Efficient power consumption for I/O.
● But it also introduces challenges:
  ○ Small writes on flash are very expensive.
  ○ Updating a single page requires first erasing the entire block of pages and then writing the entire modified block.

5. Log-structured datastore
● An append-only file system.
● Writes are appended to a sequential Data Log.
● Reads require a single random access (see the sketch below).
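To make the idea concrete, here is a minimal Python sketch of an append-only data log. The DataLog class and its interface are illustrative assumptions for this presentation, not FAWN-DS's actual code:

    import os

    class DataLog:
        """Illustrative append-only log: writes go to the tail, reads seek once."""
        def __init__(self, path):
            self.f = open(path, "ab+")             # append-mode binary file

        def append(self, record: bytes) -> int:
            offset = self.f.seek(0, os.SEEK_END)   # sequential write at the tail
            self.f.write(record)
            self.f.flush()
            return offset                          # callers index by this offset

        def read(self, offset: int, length: int) -> bytes:
            self.f.seek(offset)                    # a single random access
            return self.f.read(length)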

6. Q1) "The workloads these systems support share several characteristics, they are:
  - I/O, not computation, intensive,
  - requiring random access over large datasets,
  - and the size of objects stored is typically small."
Why do workloads with these characteristics present a challenge to system design?

7. Ans - Q1)
● The increasing gap between CPU performance and I/O bandwidth.
● "For data-intensive workloads, storage, network, and memory bandwidth bottlenecks often cause low CPU utilization."
● The "small-write problem": multiple random disk writes (very slow).

8. Q2) "The key design choice in FAWN-KV is the use of a log-structured per-node datastore called FAWN-DS that provides high performance reads and writes using flash memory."
"These performance problems motivate log-structured techniques for flash filesystems and data structures."
What key benefit does a log-structured data organization bring to the KV store design?

9. Ans - Q2)
● get() = a random read.
● put() and delete() = appends.
● A log-structured design is an append-only filesystem.
● Hence, a log-structured datastore avoids small random writes on flash.

10. Q3) "To provide this property, FAWN-DS maintains an in-DRAM hash table (Hash Index) that maps keys to an offset in the append-only Data Log on flash."
What are the drawbacks of keeping this Hash Index in DRAM?

11. Ans - Q3)
● Large metadata: storing full keys would require long buckets (nodes) with multiple pointers per node (a linked list).
● RAM is volatile: in case of a failure, the whole Hash Table would be lost!

12. Q4) "It stores only a fragment of the actual key in memory to find a location in the log;"
Is there a concern about the correctness of this design?

13. Ans - Q4)
● What if multiple keys share the same key fragment?
  ○ FAWN-DS reads the full key from the log and verifies it against the requested key.
  ○ Therefore, correctness is preserved (see the sketch below).
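A rough Python sketch of fragment-based lookup with full-key verification; the fragment width, the index layout (fragment -> candidate offsets), and the log.read_entry() interface are all illustrative assumptions:

    import hashlib

    def fragment(key: bytes, bits: int = 15) -> int:
        # Keep only a few low-order bits of the key's hash, so distinct keys
        # can collide on the same fragment.
        h = int.from_bytes(hashlib.sha1(key).digest()[-4:], "big")
        return h & ((1 << bits) - 1)

    def lookup(index: dict, log, key: bytes):
        # index maps a key fragment to candidate offsets in the data log.
        for offset in index.get(fragment(key), []):
            stored_key, value = log.read_entry(offset)  # one read from flash
            if stored_key == key:                       # verify the full key
                return value
        return None                                     # no candidate matched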

14. Q5) Explain the basic functions: Store, Lookup, Delete.

15. Ans - Q5)
● Store:
  ○ Appends an entry to the log and updates the corresponding hash table entry.
● Lookup:
  ○ Gets the offset from the hash entry, indexes into the Data Log, and returns the data blob.
● Delete:
  ○ Invalidates the hash entry by clearing its valid flag.
  ○ Appends a Delete entry to the log.
● Why append a Delete entry? Discussed in the answer to the next question.
(A sketch of the three operations follows below.)
Figure copied from http://vijay.vasu.org/static/talks/fawn-sosp2009-slides.pdf
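A simplified Python sketch of the three operations over an append-only log. A plain dict stands in for the in-DRAM Hash Index, and the log's append_entry()/read_entry() methods are assumed interfaces, not the authors' API:

    class WimpyStore:
        """Illustrative FAWN-DS-like store, not the paper's implementation."""
        def __init__(self, log):
            self.log = log
            self.index = {}                             # key -> offset in the log

        def store(self, key, value):
            offset = self.log.append_entry(key, value)  # append to the Data Log
            self.index[key] = offset                    # update the hash entry

        def lookup(self, key):
            offset = self.index.get(key)
            if offset is None:                          # absent or invalidated
                return None
            _, value = self.log.read_entry(offset)     # index into the Data Log
            return value

        def delete(self, key):
            self.log.append_entry(key, None)            # Delete entry kept on flash
            self.index.pop(key, None)                   # invalidate in-DRAM entry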

16. Q6) "As an optimization, FAWN-DS periodically checkpoints the index by writing the Hash Index and a pointer to the last log entry to flash."
Why does this checkpointing help with recovery efficiency? Why is a Delete entry needed in the log for a correct recovery?

17. Ans - Q6)
● How does checkpointing help recovery efficiency?
  ○ After a failure, only the log contents written after the checkpoint must be replayed to rebuild the Hash Index.
● Why the Delete entry?
  ○ Fault tolerance: during replay, the Delete entry tells recovery that the key was removed, so it is not resurrected.
  ○ Appending the Delete entry (instead of updating the log in place) avoids random writes to flash.
(See the recovery sketch below.)
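A hedged Python sketch of checkpoint-based recovery; checkpoint_index, checkpoint_offset, and the log.scan() iterator are illustrative assumptions about the interfaces involved:

    def recover(checkpoint_index: dict, checkpoint_offset: int, log) -> dict:
        # Start from the Hash Index saved at the checkpoint, then replay only
        # the log suffix written after that point.
        index = dict(checkpoint_index)
        for offset, key, value in log.scan(checkpoint_offset):
            if value is None:                           # Delete entry: drop the key
                index.pop(key, None)
            else:                                       # Store entry: latest wins
                index[key] = offset
        return index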

18. Thank you
