An Introduction to DryadLINQ
Christophe Poulain, Microsoft Research
Virtual School of Computational Science and Engineering, Big Data For Science Course, July 28, 2010
The Fourth Paradigm: Data-Intensive Science
http://research.microsoft.com/fourthparadigm
Scientific discovery is increasingly driven by exploration of large amounts of data from many sources. Scientific breakthrough will be powered by advanced computing capabilities that help researchers manipulate and explore massive datasets.
Data-intensive computing is increasingly prevalent
Powered by powerful multi-core workstations, readily available commodity clusters and cloud services platforms
Programming data analyses that scale from desktop to a large number of compute nodes remains challenging
112 containers x 2000 servers = 224000 servers
Research programming models for writing distributed data-parallel applications that scale from a small cluster to a large data-center.
Dryad and DryadLINQ
A DryadLINQ programmer can use thousands of machines, each of them with multiple processors or cores, without prior knowledge in parallel programming.
Availability
Dryad/DryadLINQ on Windows HPC 2008 (SP1) is available as a free download from:
http://research.microsoft.com/collaboration/tools/dryad.aspx
– DryadLINQ (in source) & Dryad (in binary)
– With tutorials, programming guides, sample code, libraries, and a community site: http://connect.microsoft.com/dryad
– Windows HPC Server licenses freely available through your department’s subscription to MSDN Academic Alliance
Outline
- DryadLINQ programming model
- Dryad and DryadLINQ overview
- Applications
DryadLINQ Experience
Use a cluster as if it were a single computer
- Sequential, single machine programming abstraction
- Same program runs on single-core, multi-core, or cluster
- Familiar programming languages
- C#, VB, F#, IronPython…
- Familiar development environment
- .NET, Visual Studio or other IDE
LINQ
- Microsoft’s Language INtegrated Query
– Released with .NET Framework 3.5, Visual Studio optional
- A set of operators to manipulate datasets in .NET
– Support traditional relational operators
- Select, Join, GroupBy, Aggregate, etc.
– Integrated into .NET programming languages
- Programs can call operators
- Operators can invoke arbitrary .NET functions
- Data model
– Data elements are strongly typed .NET objects
– Much more expressive than SQL tables
- Extremely extensible
– Add new custom operators
– Add new execution providers
Example of a LINQ Query
IEnumerable<string> logs = GetLogLines();

// Go through logs and keep only lines that are not comments.
// Parse each line into a LogEntry object.
var logentries =
    from line in logs
    where !line.StartsWith("#")
    select new LogEntry(line);

// Go through logentries and keep only entries that are accesses by ulfar.
var user =
    from access in logentries
    where access.user.EndsWith(@"\ulfar")
    select access;

// Group ulfar’s accesses according to what page they correspond to.
// For each page, count the occurrences.
var accesses =
    from access in user
    group access by access.page into pages
    select new UserPageCount("ulfar", pages.Key, pages.Count());

// Sort the pages ulfar has accessed according to access frequency.
var htmAccesses =
    from access in accesses
    where access.page.EndsWith(".htm")
    orderby access.count descending
    select access;
DryadLINQ Data Model
[Figure: a PartitionedTable<T> made up of partitions of .NET objects]
PartitionedTable<T> implements IQueryable<T> and IEnumerable<T>
PartitionedTable exposes metadata information:
- type, partition, compression scheme, etc.
A complete DryadLINQ program
public class LogEntry
{
    public string user;
    public string ip;
    public string page;

    public LogEntry(string line)
    {
        string[] fields = line.Split(' ');
        this.user = fields[8];
        this.ip = fields[9];
        this.page = fields[5];
    }
}

public class UserPageCount
{
    public string user;
    public string page;
    public int count;

    public UserPageCount(string user, string page, int count)
    {
        this.user = user;
        this.page = page;
        this.count = count;
    }
}

PartitionedTable<string> logs = PartitionedTable.Get<string>(
    @"file:\\MSR-SCR-DRYAD01\DryadData\cpoulain\logfile.pt");

var logentries =
    from line in logs
    where !line.StartsWith("#")
    select new LogEntry(line);

var user =
    from access in logentries
    where access.user.EndsWith(@"\ulfar")
    select access;

var accesses =
    from access in user
    group access by access.page into pages
    select new UserPageCount("ulfar", pages.Key, pages.Count());

var htmAccesses =
    from access in accesses
    where access.page.EndsWith(".htm")
    orderby access.count descending
    select access;

htmAccesses.ToPartitionedTable(
    @"file:\\MSR-SCR-DRYAD01\DryadData\cpoulain\results.pt");
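Because the query is ordinary LINQ, its logic can be tried on a single machine before it is pointed at a cluster. The following is a minimal sketch that uses only standard LINQ to Objects (no DryadLINQ types). The sample log lines, the `LocalLogQuery` class and the `HtmAccesses` helper are assumptions made for illustration; only the field positions (page at index 5, user at 8, ip at 9) come from the slide.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class LogEntry
{
    public string user, ip, page;

    public LogEntry(string line)
    {
        string[] fields = line.Split(' ');
        user = fields[8];
        ip = fields[9];
        page = fields[5];
    }
}

public static class LocalLogQuery
{
    // Hypothetical helper: (page, count) pairs for .htm pages accessed by
    // users whose name ends with userSuffix, most-accessed first.
    public static List<KeyValuePair<string, int>> HtmAccesses(
        IEnumerable<string> logs, string userSuffix)
    {
        var entries = from line in logs
                      where !line.StartsWith("#")
                      select new LogEntry(line);
        var byUser = from e in entries
                     where e.user.EndsWith(userSuffix)
                     select e;
        var counts = from e in byUser
                     group e by e.page into pages
                     select new KeyValuePair<string, int>(pages.Key, pages.Count());
        return (from c in counts
                where c.Key.EndsWith(".htm")
                orderby c.Value descending
                select c).ToList();
    }

    public static void Main()
    {
        // Made-up log lines with 10 space-separated fields each.
        string[] logs =
        {
            "# a comment line",
            "f0 f1 f2 f3 f4 /a.htm f6 f7 DOM\\ulfar 10.0.0.1",
            "f0 f1 f2 f3 f4 /b.htm f6 f7 DOM\\ulfar 10.0.0.1",
            "f0 f1 f2 f3 f4 /a.htm f6 f7 DOM\\ulfar 10.0.0.1",
            "f0 f1 f2 f3 f4 /c.gif f6 f7 DOM\\ulfar 10.0.0.1",
            "f0 f1 f2 f3 f4 /a.htm f6 f7 DOM\\other 10.0.0.2",
        };

        foreach (var kv in HtmAccesses(logs, "\\ulfar"))
            Console.WriteLine($"{kv.Key}: {kv.Value}");
        // prints "/a.htm: 2" then "/b.htm: 1"
    }
}
```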
DryadLINQ for Dryad on Windows Server 2008 HPC Cluster
- Executing the log query
MapReduce in DryadLINQ
MapReduce(source,        // sequence of Ts
          mapper,        // T -> Ms
          keySelector,   // M -> K
          reducer)       // (K, Ms) -> Rs
{
    var map = source.SelectMany(mapper);
    var group = map.GroupBy(keySelector);
    var result = group.SelectMany(reducer);
    return result;       // sequence of Rs
}
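The untyped outline above can be written out with explicit generic types. This is a sketch over in-memory `IEnumerable<T>` with standard LINQ, not the DryadLINQ implementation; the class name `MapReduceSketch` and the word-count usage are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MapReduceSketch
{
    // source: sequence of Ts; mapper: T -> Ms; keySelector: M -> K;
    // reducer: (K, Ms) -> Rs. Same three operators as on the slide.
    public static IEnumerable<R> MapReduce<T, M, K, R>(
        this IEnumerable<T> source,
        Func<T, IEnumerable<M>> mapper,
        Func<M, K> keySelector,
        Func<IGrouping<K, M>, IEnumerable<R>> reducer)
    {
        var map = source.SelectMany(mapper);
        var group = map.GroupBy(keySelector);
        return group.SelectMany(reducer);
    }

    public static void Main()
    {
        string[] docs = { "a b a", "b c" };

        // Word count expressed as map (split), key (the word), reduce (count).
        var counts = docs.MapReduce(
            doc => doc.Split(' '),
            word => word,
            g => new[] { new KeyValuePair<string, int>(g.Key, g.Count()) });

        foreach (var kv in counts)
            Console.WriteLine($"{kv.Key}: {kv.Value}");
        // prints "a: 2", "b: 2", "c: 1"
    }
}
```

The reducer receives an `IGrouping<K, M>`, which carries both the key and its values, so it matches the (K, Ms) -> Rs shape noted in the slide's comments.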
Outline
- DryadLINQ programming model
- Dryad and DryadLINQ overview
- Applications
Software Stack
[Figure: layered software stack. Applications (image processing, machine learning, graph analysis, data mining, other applications) sit on DryadLINQ and other languages, which sit on Dryad. Dryad runs on cluster services (Azure, HPC, or Cosmos) over Windows Server machines, with storage from CIFS/NTFS, Azure Storage, SQL Servers and Cosmos DFS.]
Dryad
- Provides a general, flexible execution layer
– Dataflow graph as the computation model
– Higher language layer supplies graph, vertex code, channel types, hints for data locality, …
- Automatically handles execution
– Distributes code, routes data
– Schedules processes on machines near data
– Masks failures in cluster and network
– Fair scheduling of concurrent jobs
Dryad Job Structure
[Figure: a job graph from input files to output files. Vertices (processes such as grep, sed, sort, awk, perl) are grouped into stages and connected by channels.]
A channel is a finite stream of items:
- NTFS files (temporary)
- TCP pipes (inter-machine)
- Memory FIFOs (intra-machine)
Dryad System Architecture
[Figure: new jobs (Job1: v11, v12, …; Job2: v21, v22, …; Job3: …) enter the scheduler. In the control plane, the job manager talks to the PD processes on the cluster machines. In the data plane, vertices (V) exchange data over files, TCP and FIFOs.]
DryadLINQ for Dryad on Windows Server 2008 HPC Cluster
- Fault tolerance
Fault Tolerance
Consider an embarrassingly parallel problem
public static Pair<int, string> DoWork(int index)
{
    System.Threading.Thread.Sleep(200);
    return new Pair<int, string>(index, System.Environment.MachineName);
}

public static void Main(string[] args)
{
    int count = 50;
    var seeds = Enumerable.Range(1, count);
    var pairs = from seed in seeds
                select DoWork(seed);
    foreach (Pair<int, string> pair in pairs)
    {
        Console.WriteLine("{0} => {1}", pair.Key, pair.Value.ToString());
    }
}
An embarrassingly parallel problem
Many cores, one machine with PLINQ
public static Pair<int, string> DoWork(int index)
{
    System.Threading.Thread.Sleep(200);
    return new Pair<int, string>(index, System.Environment.MachineName);
}

public static void Main(string[] args)
{
    int count = 50;
    var seeds = Enumerable.Range(1, count);
    var pairs = from seed in seeds.AsParallel()
                select DoWork(seed);
    foreach (Pair<int, string> pair in pairs)
    {
        Console.WriteLine("{0} => {1}", pair.Key, pair.Value.ToString());
    }
}
An embarrassingly parallel problem
Many cores, many machines with DryadLINQ (& PLINQ)
public static Pair<int, string> DoWork(int index)
{
    System.Threading.Thread.Sleep(2000);
    return new Pair<int, string>(index, System.Environment.MachineName);
}

public static void Main(string[] args)
{
    int count = 50;
    var seeds = Enumerable.Range(1, count);
    int[] ranges = seeds.Take(count - 1).ToArray();
    var pairs = from seed in seeds.ToPartitionedTable("tmp.pt").RangePartition(i => i, ranges)
                select DoWork(seed);
    foreach (Pair<int, string> pair in pairs)
    {
        Console.WriteLine("{0} => {1}", pair.Key, pair.Value.ToString());
    }
}
An embarrassingly parallel problem
Simulating errors on the cluster
Substitute DoWorkAndSimulateFailure for DoWork:
private static Random RANDOM = new Random();

public static Pair<int, string> DoWorkAndSimulateFailure(int index)
{
    if (RANDOM.NextDouble() < 0.1)
    {
        throw new Exception("My program failed.");
    }
    System.Threading.Thread.Sleep(200);
    return new Pair<int, string>(index, System.Environment.MachineName);
}

Will the program successfully finish when we run it? Let us see…
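The Dryad slide states that the runtime masks failures in the cluster and network, but the deck does not show how. One way to picture the effect locally is to re-run a failed work item, as in the sketch below. It assumes that re-execution is the mechanism; `RunWithRetries`, the attempt limit and the fixed random seed are hypothetical and not part of DryadLINQ. `KeyValuePair` stands in for the slides' `Pair` type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RetrySketch
{
    // Fixed seed (an assumption) so that repeated runs behave the same way.
    private static readonly Random Rng = new Random(12345);

    public static KeyValuePair<int, string> DoWorkAndSimulateFailure(int index)
    {
        if (Rng.NextDouble() < 0.1)
        {
            throw new Exception("My program failed.");
        }
        return new KeyValuePair<int, string>(index, Environment.MachineName);
    }

    // Hypothetical helper: re-run a failed work item up to maxAttempts times.
    // The exception from the final attempt is allowed to propagate.
    public static T RunWithRetries<T>(Func<T> work, int maxAttempts)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return work();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Swallow the failure and try the same item again.
            }
        }
    }

    public static void Main()
    {
        var pairs = Enumerable.Range(1, 50)
            .Select(seed => RunWithRetries(() => DoWorkAndSimulateFailure(seed), 20))
            .ToList();

        Console.WriteLine("completed items: " + pairs.Count);
    }
}
```

With a 10% failure rate per attempt, an item fails 20 times in a row with probability 10^-20, so in practice every item completes even though individual attempts throw.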
DryadLINQ: Friendly programming API for Dryad
DryadLINQ leverages LINQ’s extensibility
[Figure: a .NET program (C#, VB, F#, etc.) passes queries and objects through the LINQ provider interface to execution engines (LINQ-to-SQL, LINQ-to-XML, PLINQ, DryadLINQ), arranged by scalability from a local machine (single-core, multi-core) to a cluster]
Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };

[Figure: the DryadLINQ provider turns the C# query over collection into a query plan (Dryad job) and C# vertex code. The job runs over the data and produces results.]
Example: Word Count
Count word frequency in a set of documents:
var docs = new PartitionedTable<Doc>("dfs://yuan/docs");
var words = docs.SelectMany(doc => doc.words);
var groups = words.GroupBy(word => word);
var counts = groups.Select(g => new WordCount(g.Key, g.Count()));
counts.ToTable("dfs://yuan/counts.txt");

[Figure: operator pipeline IN (metadata) → SM (doc => doc.words) → GB (word => word) → S (g => new …) → OUT (metadata)]
Distributed Execution of Word Count
[Figure: DryadLINQ turns the LINQ expression (IN → SM → GB → S → OUT) into a Dryad execution]
Execution Plan for Word Count
[Figure: execution plan, step (1). The logical operators SM, GB, S expand into two pipelined groups of physical operators:
SM, Q, GB, C = SelectMany, Sort, GroupBy, Count (pipelined)
C, D, MS, GB, Sum = Count, Distribute, Mergesort, GroupBy, Sum (pipelined)]
Execution Plan for Word Count
[Figure: execution plan, steps (1) and (2). The SM, Q, GB, C pipeline and the C, D, MS, GB, Sum pipeline are each replicated several times to run in parallel.]
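The two pipelined groups in the plan can be mimicked on one machine. In this sketch, each input partition first produces partial counts (SelectMany, Sort, GroupBy, Count). The partial counts are then routed by a hash of the word, merged in sorted order, regrouped and summed (Distribute, Mergesort, GroupBy, Sum). It uses only standard LINQ to Objects. The `Bucket` routing rule, the in-memory "partitions" and the class name are assumptions for illustration, not DryadLINQ's generated code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCountPlanSketch
{
    // Hypothetical routing rule: choose a reducer by hashing the word.
    static int Bucket(string word, int reducers) =>
        (word.GetHashCode() & 0x7fffffff) % reducers;

    public static Dictionary<string, int> CountWords(
        IEnumerable<IEnumerable<string>> partitions, int reducers)
    {
        // Stage 1, once per input partition: SelectMany -> Sort -> GroupBy -> Count.
        var partials = partitions
            .Select(docs => docs
                .SelectMany(doc => doc.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
                .OrderBy(w => w, StringComparer.Ordinal)
                .GroupBy(w => w)
                .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
                .ToList())
            .ToList();

        // Stage 2, once per reducer: Distribute -> Mergesort -> GroupBy -> Sum.
        var result = new Dictionary<string, int>();
        for (int r = 0; r < reducers; r++)
        {
            int reducer = r;
            var summed = partials
                .SelectMany(p => p.Where(kv => Bucket(kv.Key, reducers) == reducer))
                .OrderBy(kv => kv.Key, StringComparer.Ordinal)
                .GroupBy(kv => kv.Key)
                .Select(g => new KeyValuePair<string, int>(g.Key, g.Sum(kv => kv.Value)));
            foreach (var kv in summed)
                result[kv.Key] = kv.Value;
        }
        return result;
    }

    public static void Main()
    {
        var partitions = new[]
        {
            new[] { "a b a", "b" },
            new[] { "c a" }
        };

        var counts = CountWords(partitions, 2);
        foreach (var entry in counts.OrderBy(p => p.Key, StringComparer.Ordinal))
            Console.WriteLine($"{entry.Key}: {entry.Value}");
        // prints "a: 3", "b: 2", "c: 1"
    }
}
```

Counting within each partition before distributing means only one (word, count) pair per distinct word leaves a partition, rather than one record per occurrence.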
Distributed Execution Plan
<ClusterName> Arden1 </ClusterName>
<Resources>
  <Resource>C:\DryadLinqDrop\lib\retail\amd64\wrappernativeinfo.dll</Resource>
  <Resource>C:\DryadLinqDrop\lib\Release\LinqToDryad.dll</Resource>
  <Resource>C:\apps\WordCount\bin\Release\DryadLinq.dll</Resource>
  <Resource>C:\apps\WordCount\bin\Release\WordCount.exe</Resource>
</Resources>
<QueryPlan>
  …
  <Vertex>
    <UniqueId> 1 </UniqueId>
    <Partitions> 8 </Partitions>
    <ChannelType> DiskFile </ChannelType>
    <ConnectionOperator> CrossProduct </ConnectionOperator>
    <Entry>
      <AssemblyName> DryadLinq.dll </AssemblyName>
      <ClassName> LinqToDryad.DryadLinq_Vertex</ClassName>
      <MethodName> Super_2 </MethodName>
    </Entry>
    <Children><Child><UniqueId> 0 </UniqueId></Child></Children>
  </Vertex>
  ...
</QueryPlan>
Vertex code (in DryadLinq.dll)
The actual code executed at Dryad vertex
public static int Super_2(string args)
{
    DryadVertexEnv denv = new DryadVertexEnv(args);
    var dwriter_3 = denv.MakeWriter(DryadLinq_Extension.FactoryType_1);
    var dreader_4 = denv.MakeReader(DryadLinq_Extension.FactoryType_0);
    var source_5 = DryadLinqVertex.SelectMany(dreader_4, doc => doc.words);
    var source_6 = DryadLinqVertex.Sort(source_5, word => word);
    var source_7 = DryadLinqVertex.OrderedGroupBy(source_6, word => word);
    var source_8 = DryadLinqVertex.Select(source_7, g => new Pair<String, Int32>(g.Key, g.Count()));
    DryadLinqVertex.HashPartition(source_8, e => e.Key, dwriter_3);
    return 0;
}
DryadLINQ
- Distributed execution plan generation
– Static optimizations: pipelining, eager aggregation, etc.
– Dynamic optimizations: data-dependent partitioning, dynamic aggregation, etc.
- Vertex runtime
– Single machine (multi-core) implementation of LINQ
– Vertex code that runs on vertices
– Data serialization code
– Callback code for runtime dynamic optimizations
– Automatically distributed to cluster machines
DryadLINQ Job Browser
Artemis
Simple and Scalable Distributed File System
- A simple fault-tolerant, distributed file system that provides the abstractions necessary for data parallel computations on HPC clusters
- High performance, reliable, scalable service
- Prototypical workload
- High throughput, sequential IO, write once
- Cluster machines working in parallel
- Configurable number of replicas per dataset
http://research.microsoft.com/events/techfair2010/demos.aspx
Outline
- DryadLINQ programming model
- Dryad and DryadLINQ overview
- Applications
Dryad
- Continuously deployed since 2006
- The execution engine for Bing analytics
- Running on >> 10^4 machines
- Runs on clusters > 3000 machines
- Sifting through > 10Pb data daily
Microsoft Kinect: Learning From Data
Kinect (formerly Project Natal) is using DryadLINQ to train enormous decision trees from millions of images across hundreds of cores.
[Figure: training examples from motion capture (ground truth) are rasterized and fed to machine learning, which produces a classifier]
Recognize players from depth map at frame rate using a fraction of the Xbox CPU.
Large-scale machine learning
- > 10^22 objects
- Sparse, multi-dimensional data structures
- Complex datatypes (images, video, matrices, etc.)
- Complex application logic and dataflow
– >35000 lines of .Net
– 140 CPU days
– > 10^5 processes
– 30 TB data analyzed
– 140 avg parallelism (235 machines)
– 300% CPU utilization (4 cores/machine)
Highly efficient parallelization
SDSS Query Q18
Most time-consuming query from Sloan Digital Sky Survey database
Find all objects within 1' of one another that have very similar colors: that is, where the color ratios u-g, g-r, r-i are less than 0.05m. (http://www.sdss.jhu.edu/SQL/SQLQueries.html)
- Two tables 11.8GB and 41.8 GB
- Under 2 minutes with 40 nodes
- Hand-tuned Dryad implementation is faster (92s vs 113s with 40 nodes)
- DryadLINQ code is 10x smaller
See Dryad (Eurosys’07) and DryadLINQ (OSDI’08) papers.
SDSS Query Q18
PartitionedTable<PhotoObjAll> photoObjAll =
    PartitionedTable.Get<PhotoObjAll>(@"file://\\<...>\ugriz-u9.pt");
PartitionedTable<Neighbor> neighbors =
    PartitionedTable.Get<Neighbor>(@"file://\\<...>\neighbor-u9.pt");

var j1 = from p in photoObjAll
         join n in neighbors on p.objId equals n.objId
         select new PhotoObjNeighbor(p, n);

var w1 = from pn in j1
         where pn.objId < pn.neighborObjId && pn.mode
         select pn;

var j2 = from l in photoObjAll
         join pn in w1 on l.objId equals pn.neighborObjId
         select new PhotoObjNeighborAll(l, pn);

var w2 = from lp in j2
         where lp.l.mode
            && Math.Abs((lp.p.u-lp.p.g)-(lp.l.u-lp.l.g)) < 0.05
            && Math.Abs((lp.p.g-lp.p.r)-(lp.l.g-lp.l.r)) < 0.05
            && Math.Abs((lp.p.r-lp.p.i)-(lp.l.r-lp.l.i)) < 0.05
            && Math.Abs((lp.p.i-lp.p.z)-(lp.l.i-lp.l.z)) < 0.05
         select lp.p.objId;

var q = w2.Distinct();
q.ToDryadPartitionedTable("result.pt");
Scalable clustering algorithm for N-body simulations in a shared-nothing cluster
YongChul Kwon, Dylan Nunley, Jeffrey P. Gardner, Magdalena Balazinska, Bill Howe, and Sarah Loebman. UW Tech Report. UW-CSE-09-06-01. June 2009.
- Large-scale spatial clustering
– 916M particles in 3D clustered under 70 minutes with 8 nodes.
- Re-implemented using DryadLINQ
– Partition > Local cluster > Merge cluster > Relabel
– Faster development and good scalability
– Must ensure near-constant processing time per tuple
[Figure: speed-up (S43, OS43, S92, OS92, ideal) and scale-up (S43, S92, ideal) versus number of nodes (2, 4, 6, 8). S = DryadLINQ; OS = OpenMP; 43 = sparse; 92 = dense.]
Terapixel Sky Image
1791 pairs of red-light and blue-light images acquired from two telescopes, scanned into 23,040x23,040 or 14000x14000 images. ~4TB of uncompressed data.
Processed into 1791 RGB color images.
Stitched into one terapixel spherical image.
Image seams removed with optimization.
Multi-scale resolution image available in WorldWide Telescope and Bing Maps.
Computing Vignetting Corrections
[Figure: creating flat fields, normalization matrix, normalizing corners]
DryadLINQ => concise code
DryadLINQ + Windows HPC => Efficient and robust execution
– Elapsed time to process all flat fields: 8.7 hours
– 28 8-core compute nodes => 1,950 CPU hours
– Total input data: 417 GB compressed, 4 TB uncompressed.

var pixelRows = folders.SelectMany(image => ImageToRows(image, options));
var stackedPixelRows = pixelRows.GroupBy(pixelRow => pixelRow.Position);
var finalRows = stackedPixelRows.Select(x => ReduceStackedRows(x));
var flatField = finalRows.Apply(x => SaveFlatField(x, options));
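The CPU-hours figure on this slide follows from the node count, the cores per node and the elapsed time, assuming every core is counted for the whole run:

```latex
\[
28~\text{nodes} \times 8~\tfrac{\text{cores}}{\text{node}} \times 8.7~\text{h}
  = 224 \times 8.7~\text{core-hours}
  = 1948.8~\text{core-hours}
  \approx 1{,}950~\text{CPU hours}
\]
```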
CAP3 - DNA Sequence Assembly Program [1]
EST (Expressed Sequence Tag) corresponds to messenger RNAs (mRNAs) transcribed from the genes residing on chromosomes. Each individual EST sequence represents a fragment of mRNA, and the EST assembly aims to re-construct full-length mRNA sequences for each expressed gene.
[Figure: input files (FASTA) are processed by vertices (V) into output files]
Input files (FASTA):
\\GCB-K18-N01\DryadData\cap3\cluster34442.fsa
\\GCB-K18-N01\DryadData\cap3\cluster34443.fsa
...
\\GCB-K18-N01\DryadData\cap3\cluster34467.fsa
Cap3data.00000000:
\\GCB-K18-N01\DryadData\cap3\cluster34442.fsa
Cap3data.pf:
\DryadData\cap3\cap3data
10
0,344,CGB-K18-N01
1,344,CGB-K18-N01
…
9,344,CGB-K18-N01

IQueryable<LineRecord> inputFiles = PartitionedTable.Get<LineRecord>(uri);
IQueryable<OutputInfo> outputFiles = inputFiles.Select(x => ExecuteCAP3(x.line));

[1] X. Huang, A. Madan, “CAP3: A DNA Sequence Assembly Program,” Genome Research, vol. 9, no. 9, 1999.
CAP3 - Performance
“DryadLINQ for Scientific Analyses”, Jaliya Ekanayake, Thilina Gunarathne, Geoffrey Fox, Atilla Soner Balkir, Christophe Poulain, Nelson Araujo, Roger Barga (IEEE eScience ‘09)
High Energy Physics Data Analysis
- Histogramming of events from a large (up to 1TB) data set
- Data analysis requires ROOT framework (ROOT Interpreted Scripts)
- Performance depends on disk access speeds
- Hadoop implementation uses a shared parallel file system (Lustre)
– ROOT scripts cannot access data from HDFS
– On demand data movement has significant overhead
- Dryad stores data on local disks, giving better performance than Hadoop
Pairwise Distances – ALU Sequencing
- Calculate pairwise distances for a collection of genes (used for clustering, MDS)
- O(N^2) effect
- Fine grained tasks in MPI
- Coarse grained tasks in DryadLINQ
- Performance close to MPI
- Performed on 768 cores (Tempest Cluster)
[Figure: DryadLINQ compared with MPI, x-axis from 2000 to 50000. Annotation: 125 million distances, 4 hours & 46 minutes.]
Xiaohong Qiu, Jaliya Ekanayake, Scott Beason, Thilina Gunarathne, Geoffrey Fox, Roger Barga, Dennis Gannon. Cloud Technologies for Bioinformatics Applications (SuperComputing09)
Acknowledgements
MSR Silicon Valley Dryad & DryadLINQ teams
Andrew Birrell, Mihai Budiu, Jon Currey, Ulfar Erlingsson, Dennis Fetterly, Michael Isard, Pradeep Kunda, Mark Manasse, Chandu Thekkath and Yuan Yu.
http://research.microsoft.com/en-us/projects/dryad
http://research.microsoft.com/en-us/projects/dryadlinq
MSR External Research
Advanced Research Tools and Services Team
http://research.microsoft.com/en-us/collaboration/tools/dryad.aspx
MS Product Groups: HPC, Parallel Computing Platform.
Academic Collaborators
Jaliya Ekanayake, Geoffrey Fox, Thilina Gunarathne, Scott Beason, Xiaohong Qiu (Indiana University Bloomington).
YongChul Kwon, Magdalena Balazinska (University of Washington).
Atilla Soner Balkir, Ian Foster (University of Chicago).
Dryad/DryadLINQ Papers
1. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks (EuroSys’07)
2. DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language (OSDI’08)
3. Hunting for problems with Artemis (Usenix WASL, San Diego 08)
4. Distributed Data-Parallel Computing Using a High-Level Programming Language (SIGMOD’09)
5. Quincy: Fair scheduling for distributed computing clusters (SOSP’09)
6. Distributed Aggregation for Data-Parallel Computing: Interfaces and Implementations (SOSP’09)
7. DryadInc: Reusing work in large scale computation (HotCloud 09)
Conclusion
DryadLINQ provides a powerful, elegant programming environment for large-scale data-parallel computing.
Still an area of active research…
…download it and get involved!
http://connect.microsoft.com/dryad
Dryad & DryadLINQ in Context
[Figure: comparison of systems by layer (application, language). Language entries shown: DryadLINQ, Scope, Sawzall, Pig, Hive; query styles shown: SQL, ≈SQL, LINQ, Sawzall.]