Scarlett: Coping with Skewed Content Popularity in MapReduce - - PowerPoint PPT Presentation
Scarlett: Coping with Skewed Content Popularity in MapReduce Clusters
Ganesh Ananthanarayanan, Sameer Agarwal, Srikanth Kandula, Albert Greenberg, Ion Stoica, Duke Harlan, Ed Harris
presented by Paweł Posielny
MapReduce
Why Scarlett?
Scarlett uses:
- historical usage statistics
- online predictors based on the recent past
- information about the jobs that have been submitted for execution
to predict the skew in popularity and its impact.
Effect of Popularity Skew: Hotspots
Logs summary
- The number of concurrent accesses is a sufficient metric to capture the popularity of files.
- Large files contribute most of the accesses in the cluster, so reducing contention for such files improves overall performance.
- Recent logs are a good indicator of future access patterns.
- Hotspots in the cluster can be smoothed out via appropriate placement of files.
Scarlett: System Design
- Scarlett considers replicating content at the smallest granularity at which jobs can address content (a file).
- Scarlett replicates files based on predicted popularity.
File Replication Factor
- Scarlett maintains a count of the maximum number of concurrent accesses (cf) in a learning window of length TL.
- Once every rearrangement period TR, Scarlett computes appropriate replication factors for all the files.
- TL = 24 hours, TR = 12 hours
- replication factor: rf = max(cf + δ, 3)
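The replication-factor rule above can be sketched in a few lines. This is an illustrative sketch, not the paper's code: `accesses` is a hypothetical list of (start, end) times of reads of one file within the learning window TL.

```python
def max_concurrent_accesses(accesses):
    """cf: peak number of overlapping accesses (sweep-line over events)."""
    events = []
    for start, end in accesses:
        events.append((start, 1))   # an access begins
        events.append((end, -1))    # an access ends
    events.sort()                   # ends sort before starts at equal times
    cf = current = 0
    for _, delta in events:
        current += delta
        cf = max(cf, current)
    return cf

def replication_factor(accesses, delta=1, base=3):
    """rf = max(cf + delta, base); base 3 matches HDFS's default."""
    return max(max_concurrent_accesses(accesses) + delta, base)

accesses = [(0, 10), (2, 8), (3, 12), (20, 25)]  # three reads overlap at t=3
print(replication_factor(accesses))              # cf = 3, delta = 1 -> rf = 4
```

The cushion δ absorbs small increases in popularity between rearrangement periods, and the floor of 3 keeps fault tolerance no worse than the default.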
Scarlett employs two approaches.
- the priority approach
- the round-robin approach
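The two approaches can be contrasted with a small sketch, assuming a storage budget expressed in the same units as file sizes; the `files` structure (name -> (size, desired extra replicas)) is illustrative, not from the paper. The priority approach fully satisfies the most popular files first; round-robin adds one replica per file per pass, spreading the budget.

```python
def priority(files, budget):
    """Give the most popular files all their desired replicas first."""
    alloc = {name: 0 for name in files}
    for name, (size, want) in sorted(files.items(),
                                     key=lambda kv: -kv[1][1]):
        while alloc[name] < want and budget >= size:
            alloc[name] += 1
            budget -= size
    return alloc

def round_robin(files, budget):
    """Add one replica per file per pass until the budget runs out."""
    alloc = {name: 0 for name in files}
    progress = True
    while progress:
        progress = False
        for name, (size, want) in files.items():
            if alloc[name] < want and budget >= size:
                alloc[name] += 1
                budget -= size
                progress = True
    return alloc

files = {"a": (4, 3), "b": (4, 2), "c": (4, 1)}
print(priority(files, budget=16))     # {'a': 3, 'b': 1, 'c': 0}
print(round_robin(files, budget=16))  # {'a': 2, 'b': 1, 'c': 1}
```

With the same budget, priority concentrates replicas on the hottest file while round-robin gives every contended file at least some relief.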
Desirable properties of Scarlett's strategy
- Files that are accessed more frequently have more replicas to smooth their load over.
- Together, δ, TR and TL track changes in file popularity while being robust to short-lived effects.
- Choosing appropriate values for the budget on extra storage B and the period at which replication factors change, TR, can limit the impact of Scarlett on the cluster.
Smooth Placement of Replicas
Place the desired number of replicas of a block on as many distinct machines and racks as possible, while ensuring that the expected load is uniform across all machines and racks.
- The load factor for each machine: lm
- The load factor for each rack: lr (the sum of the load factors of the machines in the rack)
- Each replica is placed on the rack with the least load, and on the machine with the least load in that rack.
- Placing a replica increases both these factors by the expected load due to that replica (= cf/rf).
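The greedy placement rule above can be sketched as follows. The rack and machine names are illustrative; each replica goes to the least-loaded rack, then the least-loaded machine in that rack, and placing it adds the replica's expected load cf/rf to both factors.

```python
def place_replicas(racks, cf, rf):
    """racks: {rack: {machine: load factor}}. Returns chosen machines."""
    expected_load = cf / rf
    placements = []
    for _ in range(rf):
        # least-loaded rack = smallest sum of its machines' load factors
        rack = min(racks, key=lambda r: sum(racks[r].values()))
        # least-loaded machine within that rack
        machine = min(racks[rack], key=lambda m: racks[rack][m])
        racks[rack][machine] += expected_load
        placements.append((rack, machine))
    return placements

racks = {"r1": {"m1": 0.0, "m2": 0.0}, "r2": {"m3": 0.5, "m4": 0.0}}
print(place_replicas(racks, cf=3, rf=3))
```

Because every placement raises the chosen machine's load factor, subsequent replicas naturally land on distinct machines and racks, which is how the uniform-load goal is met.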
Creating Replicas Efficiently
- While replicating, read from many sources
- Compress data before replicating
- Lazy deletion
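The compression idea trades CPU for network bytes: data is compressed before it is copied to the new replica's machine. A minimal sketch, using Python's `gzip` module (the codec choice is ours, not the paper's):

```python
import gzip

block = b"record,value\n" * 10000    # repetitive data compresses well
compressed = gzip.compress(block)
# Far fewer bytes cross the network; the receiver decompresses on arrival.
print(len(block), len(compressed))
assert gzip.decompress(compressed) == block
```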
Case Studies of Frameworks
How to deal with a task that cannot run at the machine(s) that it prefers to run at?
- Less preferred tasks can be evicted to make way
- The newly arriving task can be forced to run at a suboptimal location in the cluster
- One of the contending tasks can be paused until contention passes
Evictions in Dryad
- A task is given a 30s notice period before being evicted.
- Of all tasks that began running on the cluster, 21.1% end up being evicted.
Loss of Locality in Hadoop
Small jobs achieve only 5% node locality and 59% rack locality.
(data from Facebook's Hadoop logs)
Evaluation
Methodology:
- using an implementation of Hadoop
- using an extensive simulation of Dryad
- sensitivity analysis: budget size and distribution, compression techniques
Does data locality improve in Hadoop?
- δ = 1, TL ranging from 6 to 24 hours, TR ≥ 10 hours, B = 10%
- measured on completion times of 500 jobs
Is eviction of tasks prevented in Dryad?
- δ = 1, TL ranging from 6 to 24 hours, TR = 12 hours, B = 10%
Sensitivity Analysis
Storage Budget for Replication
Increase in Network Traffic
Benefits from selective replication
Summary
Scarlett uses:
- historical usage statistics
- online predictors based on the recent past
- information about the jobs that have been submitted for execution