
Finally, let us put things into perspective by looking at alternatives to MapReduce. We start with Dryad from Microsoft.

Overview

  • Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. European Conference on Computer Systems (EuroSys), Lisbon, Portugal, March 21-23, 2007.
  • Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Ulfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey. DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language. Symposium on Operating Systems Design and Implementation (OSDI), San Diego, CA, December 8-10, 2008.
  • Presentation based on the authors' slides.


Outline

  • Dryad Design
  • Implementation
  • Policies as Plug-ins
  • Building on Dryad


Design Space

[Figure: design space with axes Throughput vs. Latency and Internet vs. Private data center; Dryad targets throughput-oriented, data-parallel computing in private data centers, in contrast to latency-oriented shared-memory systems.]


2-D Piping

  • Unix Pipes: 1-D

grep | sed | sort | awk | perl

  • Dryad: 2-D

grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50 (exponents give the number of parallel instances of each stage)


Dryad = Execution Layer

[Figure: analogy — a Dryad job (application) runs on a cluster the way a shell pipeline runs on a single machine; Dryad is to the cluster what the shell is to the machine.]


Outline

  • Dryad Design
  • Implementation
  • Policies as Plug-ins
  • Building on Dryad


Virtualized 2-D Pipelines

[Figure: animation — the logical 2-D pipeline is mapped onto a smaller set of physical machines, with vertices scheduled onto them step by step.]

  • 2-D DAG
  • multi-machine
  • virtualized

Dryad Job Structure

grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50

[Figure: the corresponding job graph — input files feed grep vertices, whose channels lead through sed, sort, awk, and perl vertices to output files; each column of identical vertices (processes) forms a stage.]


Channels

[Figure: a channel carries items from vertex X to vertex M.]

Channels are finite streams of items:

  • distributed filesystem files (persistent)
  • SMB/NTFS files (temporary)
  • TCP pipes (inter-machine)
  • memory FIFOs (intra-machine)


Architecture

[Figure: architecture — on the control plane, the job manager (JM) consults a name server (NS) and dispatches the job schedule to process daemons (PD) on cluster machines; on the data plane, vertices (V) exchange data through files, TCP, FIFOs, and the network.]

Staging

  1. Build the job (JM code + vertex code → .exe)
  2. Send the .exe to the cluster
  3. Start the JM
  4. Query cluster services for available resources
  5. Generate the job graph
  6. Initialize vertices
  7. Serialize vertices to the remote machines
  8. Monitor vertex execution
Outline

  • Dryad Design
  • Implementation
  • Policies and Resource Management
  • Building on Dryad


Policy Managers

[Figure: the job manager hosts pluggable policy managers — a stage manager for stage R, a stage manager for stage X, and a connection manager for the R-X connection — each supervising its own vertices.]


Duplicate Execution Manager

[Figure: vertices X[0], X[1], and X[3] have completed; X[2] is slow, so a duplicate vertex X'[2] is scheduled in its place.]

Duplication policy = f(running times, data volumes)
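As a concrete illustration, here is a minimal sketch of what such a pluggable duplication policy could look like. The IStagePolicy interface and VertexStats type are hypothetical, invented for this example rather than taken from Dryad's actual API:

using System;
using System.Collections.Generic;
using System.Linq;

public record VertexStats(TimeSpan Running, long BytesRead);

public interface IStagePolicy
{
    // Called periodically with statistics for the stage's vertices;
    // returns the IDs of running vertices that should be duplicated.
    IEnumerable<int> VerticesToDuplicate(
        IReadOnlyDictionary<int, VertexStats> running,
        IReadOnlyList<VertexStats> completed);
}

public sealed class DuplicateExecutionPolicy : IStagePolicy
{
    public IEnumerable<int> VerticesToDuplicate(
        IReadOnlyDictionary<int, VertexStats> running,
        IReadOnlyList<VertexStats> completed)
    {
        if (completed.Count == 0) yield break;
        // Normalize running time by input volume, then flag vertices
        // that are far slower than the median completed vertex.
        double median = completed
            .Select(s => s.Running.TotalSeconds / Math.Max(1, s.BytesRead))
            .OrderBy(r => r)
            .ElementAt(completed.Count / 2);
        foreach (var (id, s) in running)
        {
            double rate = s.Running.TotalSeconds / Math.Max(1, s.BytesRead);
            if (rate > 3 * median)   // the 3x threshold is an arbitrary illustrative choice
                yield return id;
        }
    }
}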

Aggregation Manager

[Figure: static vs. dynamic aggregation — in the static plan, source vertices (S) feed fixed aggregators (A) before the final vertex (T); the dynamic manager instead groups sources by rack number (#1, #2, #3) and inserts an aggregation vertex per rack before the final combine.]


Data Distribution (Group By)

[Figure: distributing data from m source vertices to n destination vertices requires m × n channels.]

Range-Distribution Manager

[Figure: static vs. dynamic range partitioning — a static plan must fix range boundaries in advance ([0-?), [?-100)); the dynamic manager samples the sources (S) to build a histogram (Hist) over [0-100), picks boundaries such as [0-30) and [30-100), and wires the destination vertices (D) accordingly.]


Goal: Declarative Programming

[Figure: static plan vs. dynamic refinement — the static graph of S, T, and X vertices is rewritten at run time.]

Outline

  • Dryad Design
  • Implementation
  • Policies as Plug-ins
  • Building on Dryad



Software Stack

[Figure: software stack — Windows Server machines run cluster services (job queueing, monitoring) and a distributed filesystem (CIFS/NTFS); Dryad (C++) sits above them, and on top of Dryad run a distributed shell for legacy code (sed, awk, grep, Perl), PSQL, SSIS queries on SQL Server, DryadLINQ (C#), and C# libraries for large vectors and machine learning.]

Example Query: SkyServer

  • Table photoPrimary
    – All identified astronomical objects (354,254,163 records)
    – ID, color magnitude in 5 bands (u, g, r, i, z)
  • Table neighbors
    – For each object, the neighbors within 30 arc seconds (2,803,165,372 records)
  • Query 18: gravitational lens effect
    – Find all objects that have neighbors whose color is similar to that object's color


SkyServer Query 18

[Figure: the Dryad plan for Query 18 — n partitions of the U and N inputs flow through a DAG of D, M (4n), S (4n), Y, H, and X stages.]

select distinct U.ObjID
into results
from photoPrimary U, neighbors N, photoPrimary L
where U.ObjID = N.ObjID
  and U.mode = 1
  and L.ObjID = N.NeighborObjID
  and U.ObjID < L.ObjID
  and abs((U.u-U.g)-(L.u-L.g)) < 0.05
  and abs((U.g-U.r)-(L.g-L.r)) < 0.05
  and abs((U.r-U.i)-(L.r-L.i)) < 0.05
  and abs((U.i-U.z)-(L.i-L.z)) < 0.05

SkyServer DB query

  • Took SQL plan
  • Manually coded in Dryad
  • Manually partitioned data

u: objid, color
n: objid, neighborobjid
[partition by objid]

select u.color, n.neighborobjid
  from u join n where u.objid = n.objid

(u.color, n.neighborobjid)
[re-partition by n.neighborobjid]
[order by n.neighborobjid]
[distinct]
[merge outputs]

select u.objid
  from u join <temp>
  where u.objid = <temp>.neighborobjid
    and |u.color - <temp>.color| < d


Optimization

[Figure: two snapshots of plan optimization — the n-partition graph of D, M, S, Y, H, and X stages over inputs U and N is progressively rewritten into a more compact plan.]


SkyServer Q18 Performance

[Figure: speed-up (times) vs. number of computers (2-10) for three configurations — Dryad in-memory, Dryad two-pass, and SQL Server 2005; Dryad in-memory scales best.]

DryadLINQ

  • Declarative programming
  • Integration with Visual Studio
  • Integration with .NET
  • Type safety
  • Automatic serialization
  • Job graph optimizations
    – static
    – dynamic
  • Conciseness

LINQ

Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };

DryadLINQ = LINQ + Dryad

[Figure: DryadLINQ compiles a C# LINQ query over a collection into a query plan executed as a Dryad job — the vertex code is C#, and partitioned data flows from collection to results.]


Data Model

[Figure: a collection is made up of partitions, and each partition holds C# objects.]

Query Providers

[Figure: on the client machine, DryadLINQ turns a C# query expression into a distributed query plan; invoking the query hands the plan to a job manager (JM) in the data center, Dryad executes it over the input tables, and the output DryadTable's results flow back to the client as C# objects (via ToDryadTable and foreach).]


Example: Histogram

public static IQueryable<Pair> Histogram(
    IQueryable<LineRecord> input, int k)
{
    var words = input.SelectMany(x => x.line.Split(' '));
    var groups = words.GroupBy(x => x);
    var counts = groups.Select(x => new Pair(x.Key, x.Count()));
    var ordered = counts.OrderByDescending(x => x.count);
    var top = ordered.Take(k);
    return top;
}

Example trace (k = 3):

"A line of words of wisdom"
["A", "line", "of", "words", "of", "wisdom"]
[["A"], ["line"], ["of", "of"], ["words"], ["wisdom"]]
[{"A", 1}, {"line", 1}, {"of", 2}, {"words", 1}, {"wisdom", 1}]
[{"of", 2}, {"A", 1}, {"line", 1}, {"words", 1}, {"wisdom", 1}]
[{"of", 2}, {"A", 1}, {"line", 1}]

Histogram Plan

SelectMany → HashDistribute → Merge → GroupBy → Select → OrderByDescending → Take → MergeSort → Take


Map-Reduce in DryadLINQ

public static IQueryable<S> MapReduce<T, M, K, S>(
    this IQueryable<T> input,
    Expression<Func<T, IEnumerable<M>>> mapper,
    Expression<Func<M, K>> keySelector,
    Expression<Func<IGrouping<K, M>, S>> reducer)
{
    var map = input.SelectMany(mapper);
    var group = map.GroupBy(keySelector);
    var result = group.Select(reducer);
    return result;
}
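For instance, word count can be phrased through this operator. A hypothetical call (not from the slides), reusing the LineRecord and Pair types from the Histogram example:

// lines is assumed to be an IQueryable<LineRecord>.
var counts = lines.MapReduce(
    rec => rec.line.Split(' '),                     // mapper: line -> words
    word => word,                                   // key selector: the word itself
    group => new Pair(group.Key, group.Count()));   // reducer: (word, count)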

Map-Reduce Plan

[Figure: the MapReduce operator expands in three steps — (1) map → sort → groupby → reduce → distribute on each input partition, (2) mergesort → groupby → reduce for partial aggregation, (3) a final mergesort → groupby → reduce feeding the consumer.]


Distributed Sorting in DryadLINQ

public static IQueryable<TSource> DSort<TSource, TKey>(
    this IQueryable<TSource> source,
    Expression<Func<TSource, TKey>> keySelector,
    int pcount)
{
    var samples = source.Apply(x => Sampling(x));
    var keys = samples.Apply(x => ComputeKeys(x, pcount));
    var parts = source.RangePartition(keySelector, keys);
    return parts.OrderBy(keySelector);
}
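A hypothetical call (illustration only), sorting records by an integer key into 100 range partitions:

var sorted = records.DSort(r => r.Key, 100);

The sampling step makes the range boundaries data-dependent, which is what guards against the data skew a fixed static partitioning would risk.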

Distributed Sorting Plan

[Figure: the sort plan refines in three steps — (1) deterministic sampling (DS) and a histogram (H) to choose range boundaries, (2) data partitioning (D), (3) per-partition merge (M) and sort (S).]


Outline

  • Introduction
  • Dryad
  • DryadLINQ
  • Building on DryadLINQ


Machine Learning in DryadLINQ

[Figure: stack — data analysis and machine learning sit on a large-vector library, which runs on DryadLINQ, which runs on Dryad.]


Very Large Vector Library

[Figure: a PartitionedVector<T> is a sequence of partitions, each holding elements of type T; a Scalar<T> holds a single value of type T.]

Operations on Large Vectors: Map 1

[Figure: map applies f : T → U element-wise, turning a vector of T into a vector of U; f preserves partitioning.]


Map 2 (Pairwise)

[Figure: pairwise map applies f : T × U → V to corresponding elements of two vectors, yielding a vector of V.]

Map 3 (Vector-Scalar)

[Figure: vector-scalar map applies f : T × U → V to each element of a vector of T paired with a Scalar<U>, yielding a vector of V.]


Reduce (Fold)

[Figure: reduce folds the elements of type U with f, combining per-partition partial results in a tree until a single U remains.]
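To make the tree-shaped combining concrete, here is a minimal local sketch of the fold pattern (plain C#, not the library's API): partial results are combined pairwise with f, halving their number in each pass.

using System;
using System.Collections.Generic;
using System.Linq;

public static class Fold
{
    // f must be associative for the tree order to give the same result
    // as a left-to-right fold; partials must be non-empty.
    public static U Reduce<U>(IReadOnlyList<U> partials, Func<U, U, U> f)
    {
        var level = partials.ToList();
        while (level.Count > 1)
        {
            var next = new List<U>();
            for (int i = 0; i < level.Count; i += 2)
                next.Add(i + 1 < level.Count ? f(level[i], level[i + 1]) : level[i]);
            level = next;   // each pass halves the number of partial results
        }
        return level[0];
    }
}

For example, Fold.Reduce(new[] { 1, 2, 3, 4 }, (a, b) => a + b) returns 10.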

Linear Algebra

The element types T, U, V can themselves be scalars, vectors, or matrices (e.g., $\mathbb{R}$, $\mathbb{R}^n$, $\mathbb{R}^{n \times m}$), so the map and reduce operations above suffice to express linear algebra.


Linear Regression

  • Data: $x_t \in \mathbb{R}^n$, $y_t \in \mathbb{R}^m$, $t \in \{1, \dots, n\}$
  • Find: $A \in \mathbb{R}^{m \times n}$
  • S.t.: $A x_t \approx y_t$

Analytic Solution

$$A = \Big(\sum_t y_t x_t^T\Big)\Big(\sum_t x_t x_t^T\Big)^{-1}$$

[Figure: map-reduce evaluation — each partition pair X[i], Y[i] computes $X \times X^T$ and $Y \times X^T$ locally (map); the partial products are summed with $\Sigma$ (reduce); A is then obtained by inverting the summed $X \times X^T$ and multiplying.]


Linear Regression Code

Vectors x = input(0), y = input(1);
Matrices xx = x.PairwiseOuterProduct(x);
OneMatrix xxs = xx.Sum();
Matrices yx = y.PairwiseOuterProduct(x);
OneMatrix yxs = yx.Sum();
OneMatrix xxinv = xxs.Map(a => a.Inverse());
OneMatrix A = yxs.Map(xxinv, (a, b) => a.Multiply(b));

This computes $A = \big(\sum_t y_t x_t^T\big)\big(\sum_t x_t x_t^T\big)^{-1}$ using the pairwise map, reduce (Sum), and vector-scalar map operations above.

Dryad vs. Map-Reduce: many similarities, different choices.

                  Dryad                        Map-Reduce
  Role            Execution layer              Exe + application model
  Job             Arbitrary DAG                Map + sort + reduce
  Policies        Plug-in policies             Few policies
  Program         Graph generator              Map + reduce
  Complexity      Complex (many features)      Simple
  Maturity        New (< 2 years)              Mature (> 4 years)
  Adoption        Internal, still growing      Widely deployed (Hadoop)


Conclusions

  • Dryad = a distributed execution environment
  • Application-independent (semantics-oblivious)
  • Supports a rich software ecosystem
    – Relational algebra
    – Map-reduce
    – LINQ
    – Etc.
  • DryadLINQ = a Dryad provider for LINQ
  • This is only the beginning!

Finally, let us put things into perspective by looking at alternatives to MapReduce. We started with Dryad from Microsoft; now we move on to parallel and distributed databases.


Parallel Database Systems

  • Data: relations
  • Relational operators process relations and output relations
    – Selection
    – Projection
    – Join
    – Group By and aggregation
  • Query language: SQL

SQL

  • Declarative language
    – Specify what you want, not how to get it
  • The database optimizer chooses the best implementation
    – Query plan: a DAG of operators and their implementations
    – Minimize the cost of the query plan (I/O cost, CPU cost)
    – The optimizer explores the space of query plans and chooses the best one


SQL in Parallel

  • Same query language, just replace the optimizer
    – Take data location and network cost into account
    – Optimize for latency or total cost
  • Add new operators (see the sketch after this list)
    – Exchange operator: behaves like an iterator, but receives input via inter-process communication rather than iterator procedure calls
    – Split and Merge: create and join parallel dataflows
  • Add new operator implementations
    – Semi-join implementation to reduce network communication cost
  • The optimizer is more complex, but SQL does not need to change
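A minimal sketch of the exchange idea, with hypothetical interfaces (Row, IRowIterator) rather than a real DBMS API — the consumer calls Next() exactly as it would on any local operator, but rows arrive from another process:

using System.Collections.Concurrent;
using System.Threading;

public sealed record Row(object[] Columns);

public interface IRowIterator
{
    Row Next();   // returns null when the input is exhausted
}

public sealed class ExchangeOperator : IRowIterator
{
    // Stands in for the IPC endpoint (e.g., a socket fed by a remote producer).
    private readonly BlockingCollection<Row> incoming;

    public ExchangeOperator(BlockingCollection<Row> incoming) => this.incoming = incoming;

    // Same pull-based contract as Scan, Sort, or Join; the operators above
    // the exchange cannot tell they are reading from another process.
    public Row Next() =>
        incoming.TryTake(out var row, Timeout.Infinite) ? row : null;
}

The producer side marks the collection complete when it finishes, at which point TryTake returns false and Next() signals end-of-input with null.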

Distributed Query Optimization

  • Start: a calculus query on global relations
  • Transform it into an algebraic query on global relations
  • Perform data localization, using the fragment schema, to generate an algebraic query on fragments
  • Perform global optimization to create a distributed query execution plan
  • Run on the local sites in parallel


Pipeline Parallelism

  • Computation of one operator proceeds in parallel with another
  • Model: the output pulls from the last operator, which pulls from its inputs, and so on

[Figure: pipeline — Data → Scan → Sort.]

Limited Benefits of Pipeline Parallelism

  • Relational pipelines are usually not very long
    – Chains of ten or more operators are rare
  • Some operators are blocking and cannot be pipelined
    – Aggregates, sorting
  • The execution cost of one operator might be much larger than the others'
    – Limits the speedup obtained by pipelining


Partitioned Parallelism

  • The query performs batch-style computation on many input tuples

[Figure: partitioned parallelism — three Data → Scan → Sort pipelines over partitioned data, combined by a final Merge.]

Data Partitioning

  • Round-robin
    – Simple, but not helpful for associative access
  • Hash partitioning
    – Assign tuples to partitions using a hash function
    – Good for associative (equality-based) access
    – Not good for range queries
  • Range partitioning
    – Partition data into contiguous ranges
    – Good for range queries and parallel sort
    – Risks data skew (uneven partitions) and execution skew (uneven access patterns)

A sketch of the three strategies follows.
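A minimal sketch of the three strategies (illustration only): each function maps a tuple's key to a partition index in [0, p).

using System;

public static class Partitioners
{
    private static int next;   // round-robin cursor (not thread-safe; sketch only)

    // Round-robin: spread tuples evenly, ignoring their values.
    public static int RoundRobin(int p) => (next++ % p + p) % p;

    // Hash: the same key always lands in the same partition,
    // which is what makes equality-based (associative) access cheap.
    public static int Hash<K>(K key, int p) => (key!.GetHashCode() % p + p) % p;

    // Range: choose the partition by position among sorted split points
    // (p partitions need p - 1 split points, typically chosen by sampling).
    public static int Range(int key, int[] splitPoints)
    {
        int i = Array.BinarySearch(splitPoints, key);
        return i >= 0 ? i : ~i;   // ~i is the index of the first split point above key
    }
}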


Distributed Transactions?

  • Transactions were crucial for the success of database systems
  • They enable concurrent processing of multiple queries, while letting programmers write each query as if it executed in isolation

The ACID Properties

  • Atomicity: either all or none of the transaction's actions are executed
    – Even when a crash occurs mid-way
  • Consistency: a transaction run by itself must preserve the consistency of the database
    – The user's responsibility
  • Isolation: transaction semantics do not depend on other concurrently executed transactions
  • Durability: the effects of successfully committed transactions should persist, even when crashes occur


Example

  • T1 transfers $100 from B's account to A's account
  • T2 credits both accounts with a 6% interest payment
  • There is no guarantee that T1 will execute before T2, or vice versa, if both are submitted together
  • However, the net effect must be equivalent to these two transactions running serially in some order

T1: BEGIN A=A+100, B=B-100 END
T2: BEGIN A=1.06*A, B=1.06*B END

Example (Contd.)

  • Consider a possible interleaving (schedule):

T1: A=A+100,           B=B-100
T2:           A=1.06*A,          B=1.06*B

  • This is OK. But what about:

T1: A=A+100,                     B=B-100
T2:           A=1.06*A, B=1.06*B

  • The DBMS's view of the second schedule:

T1: R(A), W(A),                         R(B), W(B)
T2:             R(A), W(A), R(B), W(B)


Scheduling Transactions

  • Serial schedule: a schedule that does not interleave the actions of different transactions
    – Easy for the programmer, easy to achieve consistency
    – Bad for performance
  • Equivalent schedules: for any database state, the effect (on the objects in the database) of executing the first schedule is identical to the effect of executing the second schedule
  • Serializable schedule: a schedule that is equivalent to some serial execution of the transactions
    – Retains the advantages of a serial schedule, but addresses the performance issue

Anomalies with Interleaved Execution

  • Reading uncommitted data (WR conflicts, "dirty reads")
  • Example: T1(A=A-100), T2(A=1.06*A), T2(B=1.06*B), C(T2), T1(B=B+100)
  • T2 reads the value of A written by T1 before T1 has completed its changes
  • If T1 later aborts, T2 has worked with invalid data

T1: R(A), W(A),                   R(B), W(B), Abort
T2:             R(A), W(A), C


More Anomalies

  • Unrepeatable reads (RW conflicts)
  • T1 sees two different values of A, even though it did not change A between the reads
  • Example: online bookstore
    – Only one copy of a book is left
    – Both T1 and T2 see that 1 copy is left, then try to order it
    – T1 gets an error message when trying to order
    – Could not have happened with serial execution

T1: R(A),                R(A), W(A), C
T2:       R(A), W(A), C

Even More Anomalies

  • Overwriting uncommitted data (WW conflicts)
  • T1's B and T2's A persist, which could not happen with any serial execution
  • Example: two people with the same salary
    – T1 sets both salaries to 2000, T2 sets both to 1000
    – The schedule below results in A=1000, B=2000, which is inconsistent

T1: W(A),             W(B), C
T2:       W(A), W(B),          C


Aborted Transactions

  • All actions of aborted transactions have to be undone
  • A dirty read can result in an unrecoverable schedule
    – T1 writes A, then T2 reads A and makes modifications based on A's value
    – T2 commits, and later T1 is aborted
    – T2 worked with invalid data and hence has to be aborted as well; but T2 has already committed…
  • Recoverable schedule: do not allow T2 to commit until T1 has committed
    – Can lead to cascading aborts

Preventing Anomalies through Locking

  • The DBMS can support concurrent transactions while preventing anomalies by using a locking protocol
  • If a transaction wants to read an object, it first requests a shared lock (S-lock) on the object
  • If a transaction wants to modify an object, it first requests an exclusive lock (X-lock) on the object
  • Multiple transactions can hold a shared lock on an object
  • At most one transaction can hold an exclusive lock on an object

A sketch of the resulting compatibility check follows.
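A minimal sketch of the S/X compatibility rule (illustration only, not a real DBMS lock manager): a request is granted only if it is compatible with every lock already held on the object.

using System.Collections.Generic;
using System.Linq;

public enum LockMode { Shared, Exclusive }

public sealed class LockTable
{
    // object id -> (transaction id -> mode held)
    private readonly Dictionary<string, Dictionary<int, LockMode>> held = new();

    public bool TryAcquire(int txn, string obj, LockMode mode)
    {
        var holders = held.TryGetValue(obj, out var h) ? h : held[obj] = new();
        // Compatible iff no other transaction holds the object, or both the
        // holders and the requester only want Shared access.
        bool compatible = holders.All(kv =>
            kv.Key == txn ||
            (kv.Value == LockMode.Shared && mode == LockMode.Shared));
        if (!compatible) return false;   // caller must wait (a deadlock may form)
        holders[txn] = mode;
        return true;
    }

    // Strict 2PL (next slide): release everything only when the transaction completes.
    public void ReleaseAll(int txn)
    {
        foreach (var holders in held.Values) holders.Remove(txn);
    }
}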


Lock-Based Concurrency Control

  • Strict Two-Phase Locking (Strict 2PL) protocol:
    – Each transaction must obtain the appropriate lock before accessing an object
    – All locks held by a transaction are released when the transaction completes
    – All this happens automatically inside the DBMS
  • Strict 2PL allows only serializable schedules
    – Prevents all the anomalies shown earlier

Deadlocks

  • Assume T1 and T2 both want to read and write objects A and B
    – T1 acquires an X-lock on A; T2 acquires an X-lock on B
    – Now T1 wants to update B, but has to wait for T2 to release its lock on B
    – But T2 wants to read A, and also waits for T1 to release its lock on A
    – Strict 2PL does not allow either to release its locks before its transaction completes. Deadlock!
  • The DBMS can detect this
    – It automatically breaks the deadlock by aborting one of the involved transactions


Performance of Locking

  • Locks force transactions to wait
  • Aborting and restarting due to deadlock wastes work
  • Waiting for locks gets worse as more transactions execute concurrently
    – Allowing more concurrent transactions at some point leads to thrashing
    – Need to limit the maximum number of concurrent transactions to prevent thrashing
    – Minimize lock contention by reducing the time a transaction holds locks

Distributed Transactions

  • Transactions take longer when they access remote objects
    – Need to hold locks longer
    – Greater probability of waiting and of deadlocks
  • What if the network partitions?
    – A transaction cannot acquire/release some of its locks
  • Even without partitions, the problem is hard
    – Need to coordinate the commit between multiple nodes
    – What happens if a participating node crashes?
  • Standard protocol: 2PC (two-phase commit)


2PC Basics

  • Commit-request phase
    – The coordinator asks all participants to prepare for commit
    – Participants vote YES or NO on the commit request
  • Commit phase
    – Based on the participants' votes, the coordinator decides to commit (if all voted YES) or abort
    – The coordinator notifies the participants of the decision
    – Participants apply the corresponding action (commit or abort) locally

A sketch of the coordinator's logic follows.
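A minimal sketch of the coordinator side (illustration only; IParticipant stands in for real network messages to the participant nodes):

using System.Linq;

public enum Vote { Yes, No }

public interface IParticipant
{
    Vote Prepare();   // phase 1: on a YES vote, the participant must remain
                      // able to commit later, even across a crash
    void Commit();    // phase 2 outcomes
    void Abort();
}

public static class TwoPhaseCommit
{
    public static bool Run(IParticipant[] participants)
    {
        // Phase 1 (commit-request): collect votes from the participants.
        bool allYes = participants.All(p => p.Prepare() == Vote.Yes);

        // Phase 2 (commit): commit only on a unanimous YES; otherwise
        // every participant (prepared or not) is told to abort.
        foreach (var p in participants)
        {
            if (allYes) p.Commit();
            else p.Abort();
        }
        return allYes;
    }
}

Note what the sketch omits: logging of the decision and retries on lost messages — precisely the machinery whose absence after a coordinator crash causes the blocking problem on the next slide.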

2PC Problems

  • 2PC is a blocking protocol
    – Nodes cannot make a decision without hearing from the coordinator; e.g., a participant that voted YES in the first phase might hold its locks forever if the coordinator goes down
  • Expensive for transactions with many workers
  • Some issues were addressed by later 2PC modifications, but the basic problems remain