Haystack FACEBOOKS PHOTO STORAGE AKIB ZAMAN MOTIVATION Facebook - PowerPoint PPT Presentation

Finding a needle in Haystack FACEBOOK’S PHOTO STORAGE AKIB ZAMAN

MOTIVATION
- Facebook stores an enormous amount of data:
  - 260 billion images
  - 20 petabytes of data
- Traditional filesystems perform poorly under their workload
  - Several disk operations were necessary to read a single photo
- Address long tail issue
- Design pre-requisites:
  - Data is written once, read often, never modified and rarely deleted

QUESTION  “In our experience, we find that the disadvantages of a traditional POSIX based filesystem are directories and per file metadata.” Explain how this disadvantage becomes the limiting factor for the read throughput.

FOUR MAIN GOALS
- High throughput and low latency
- Fault-tolerant
- Cost-effective
- Simple

QUESTION  “We accomplish this by keeping all metadata in main memory,…”. Why did keeping metadata in memory become a challenge in Facebook’s system? Is it possible just to keep metadata of the most popular files in memory and to achieve the objective (“at most one disk operation per read”) by exploiting access locality?

QUESTION  “That simplicity lets us build and deploy a working system in a few months instead of a few years.” Comment on this statement. Why can Haystack be considered a simple adaptation of UNIX file systems?

QUESTION  “Haystack takes a straight-forward approach: it stores multiple photos in a single file and therefore maintains very large files.” Is there such a need to apply the technique in conventional file systems? If applied, what are its potential issues (give two examples)?

BRIEF OVERVIEW

QUESTION: HAYSTACK vs GFS  Compare serving a photo in Haystack with serving one under the GFS architecture.

QUESTION  “... we explored whether it would be useful to build a system similar to GFS.” Comment on the statement. Why does “Serving photo requests in the long tail represents a problem” on GFS?

THE HAYSTACK ARCHITECTURE
- Haystack Directory
- Haystack Cache
- Haystack Store
- Photo Read
- Photo Write
- Photo Delete

QUESTION  The Cache “… caches a photo only if two conditions are met: (a) the request comes directly from a user and not the CDN and (b) the photo is fetched from a write-enabled Store machine.” Please explain this design choice.
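The two conditions quoted above amount to a simple admission predicate. The following is a minimal sketch of that rule only; the `PhotoRequest` type and its field names are hypothetical and do not come from Haystack itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhotoRequest:
    # Hypothetical fields, named for illustration only.
    from_cdn: bool             # True if the CDN forwarded the request after a miss
    store_write_enabled: bool  # True if the photo was fetched from a write-enabled Store machine


def should_cache(req: PhotoRequest) -> bool:
    """Admit a photo only if (a) the request came directly from a user
    and (b) the photo was fetched from a write-enabled Store machine."""
    return (not req.from_cdn) and req.store_write_enabled
```

Only the combination "direct user request" plus "write-enabled Store machine" is admitted; every other combination bypasses the Cache.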

HAYSTACK STORE

QUESTION  “To retrieve needles quickly, each Store machine maintains an in-memory data structure for each of its volumes.” What is this data structure about?
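As a starting point for the question, the Haystack paper describes this structure as a mapping from (key, alternate key) pairs to a needle's flags, size, and volume offset. The sketch below illustrates that idea under assumptions: the class names, the `DELETED` flag value, and the use of a Python dictionary are illustrative choices, not the actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

DELETED = 0x1  # hypothetical flag bit marking a deleted needle


@dataclass
class NeedleEntry:
    flags: int   # status bits such as DELETED
    size: int    # needle size in bytes
    offset: int  # byte offset of the needle within the volume file


class VolumeMapping:
    """In-memory mapping for one volume: (key, alternate key) -> NeedleEntry."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[int, int], NeedleEntry] = {}

    def put(self, key: int, alt_key: int, entry: NeedleEntry) -> None:
        self._entries[(key, alt_key)] = entry

    def lookup(self, key: int, alt_key: int) -> Optional[NeedleEntry]:
        """Return the entry for a live needle, or None if unknown or deleted."""
        entry = self._entries.get((key, alt_key))
        if entry is None or entry.flags & DELETED:
            return None
        return entry
```

Because the lookup is answered from memory, a read only needs one disk operation: seeking to `offset` in the volume file and reading `size` bytes.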

QUESTION  “As Haystack disallows overwriting needles, photos can only be modified by adding an updated needle with the same key and alternate key.” Can you think of reasons why Haystack disallows overwriting?
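The append-only behaviour in the quote can be sketched as follows. According to the paper, when duplicate needles land in the same volume, the one at the highest offset is the latest version. In this sketch an in-memory `io.BytesIO` stands in for the volume file, and the class and method names are hypothetical.

```python
import io
from typing import Dict, Tuple


class AppendOnlyVolume:
    """Toy volume: data is only ever appended, never overwritten in place."""

    def __init__(self) -> None:
        self._file = io.BytesIO()  # stand-in for the on-disk volume file
        self._latest: Dict[Tuple[int, int], Tuple[int, int]] = {}  # (key, alt_key) -> (offset, size)

    def append(self, key: int, alt_key: int, data: bytes) -> int:
        """Append a needle and return the offset it was written at."""
        offset = self._file.seek(0, io.SEEK_END)
        self._file.write(data)
        # A later needle with the same keys supersedes the earlier one,
        # because it sits at a higher offset.
        self._latest[(key, alt_key)] = (offset, len(data))
        return offset

    def read(self, key: int, alt_key: int) -> bytes:
        offset, size = self._latest[(key, alt_key)]
        self._file.seek(offset)
        return self._file.read(size)

    def size(self) -> int:
        return self._file.seek(0, io.SEEK_END)
```

Note that the superseded bytes remain in the file after an update; the space is wasted until it is reclaimed separately.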

THE INDEX FILE

QUESTION  “Store machines maintain an index file for each of their volumes.” What is this index and why is it needed? Does maintaining the index significantly increase disk load?
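As a hint for the question, the paper describes the index file as a checkpoint of the in-memory mapping, so that a restarting Store machine can rebuild the mapping without scanning the whole volume file. The sketch below shows one way such a checkpoint could be serialized and reloaded. The record fields (key, alternate key, flags, offset, size) follow the paper, but the field widths, byte order, and function names are assumptions made for illustration.

```python
import struct
from typing import Dict, Tuple

# Assumed record layout: key (u64), alternate key (u32), flags (u8),
# offset (u64), size (u32), little-endian with no padding.
RECORD = struct.Struct("<QIBQI")

Mapping = Dict[Tuple[int, int], Tuple[int, int, int]]  # (key, alt_key) -> (flags, offset, size)


def dump_index(mapping: Mapping) -> bytes:
    """Serialize the in-memory mapping into index-file bytes."""
    return b"".join(
        RECORD.pack(key, alt_key, flags, offset, size)
        for (key, alt_key), (flags, offset, size) in mapping.items()
    )


def load_index(blob: bytes) -> Mapping:
    """Rebuild the in-memory mapping from index-file bytes."""
    mapping: Mapping = {}
    for key, alt_key, flags, offset, size in RECORD.iter_unpack(blob):
        mapping[(key, alt_key)] = (flags, offset, size)
    return mapping
```

Each record is a few dozen bytes, far smaller than the photo it describes, which is relevant when judging how much extra disk load the index adds.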

QUESTION  “Store machines maintain an index file for each of their volumes.” What is this index and why is it needed? How is space for deleted photos reclaimed?
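For the space-reclamation part of the question, the paper describes compaction: a Store machine copies needles into a new file while skipping duplicate and deleted entries. The sketch below models a volume as a list of needles in offset order; the `Needle` type and `compact` function are hypothetical names used only for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple


class Needle(NamedTuple):
    key: int
    alt_key: int
    deleted: bool
    data: bytes


def compact(needles: List[Needle]) -> List[Needle]:
    """Return the needles that survive compaction, in their original order.

    List position stands in for volume offset, so for duplicate keys the
    needle at the highest position is treated as the latest version.
    """
    latest: Dict[Tuple[int, int], int] = {}
    for pos, needle in enumerate(needles):
        latest[(needle.key, needle.alt_key)] = pos
    return [
        needle
        for pos, needle in enumerate(needles)
        if latest[(needle.key, needle.alt_key)] == pos and not needle.deleted
    ]
```

In a real system the surviving needles would be written to a new volume file and the old file swapped out afterwards; that step is omitted here.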

THE STORE FILESYSTEM
- Store machines use XFS
- XFS has two main advantages:
  - The block maps for several contiguous large files can be small enough to be stored in main memory
  - XFS provides efficient file preallocation, which mitigates fragmentation
- XFS helps eliminate the disk operations for metadata when reading a photo

RECOVERY FROM FAILURES
- Haystack needs to tolerate a variety of failures: faulty hard drives, misbehaving RAID controllers, bad motherboards
- It uses the following techniques to tolerate failures:
  - Pitchfork: a background task that periodically checks the health of each Store machine
  - Bulk sync: resets the data of a Store machine using the volume files supplied by a replica

CONCLUSION
- Haystack provides a fault-tolerant and simple solution to store pictures.
- Done at dramatically less cost and higher throughput than a traditional approach using NAS appliances.
- Haystack is incrementally scalable.
