Stratosphere for Hadoop Users
Potsdam, January 03, 2012
Arvid Heise
Outline
1 Overview of Stratosphere
2 Dataflow Orientation
3 Tuple-based Data Model
4 Other Differences
5 Seminar Organization
Arvid Heise | Scalable Data Analysis Algorithms | January 03, 2012
Stratosphere Stack
[Figure: the Stratosphere stack — a Simple script is parsed by the Simple parser into a SOPREMO plan (higher-level language layer); the SOPREMO compiler emits a PACT program (programming model layer); the PACT optimizer compiles it into an execution graph, which the Nephele scheduler runs (execution engine layer)]
Pact (PArallelization ConTracts)
Parallel programming model
Implementation and generalization of Map/Reduce
Similar interface to Hadoop
Defines the parallelization semantics of tasks
A Pact plan is dataflow-oriented
Pact optimizes plans and compiles them to execution graphs for Nephele
Alexandrov et al. 2010. MapReduce and PACT - Comparing Data Parallel Programming Models.
Hadoop and Stratosphere Job
SELECT * FROM Documents d JOIN Rankings r ON r.url = d.url
WHERE CONTAINS(d.text, [keywords])
  AND r.rank > [rank]
  AND NOT EXISTS (SELECT * FROM Visits v
                  WHERE v.url = d.url AND v.visitDate = CURDATE());
[Figure: the query above as a parallel plan — MAP operators apply the filters (CONTAINS [keywords] to d, rank > [rank] to r, visitDate = CURDATE() to v); d and r are joined on url via a same-key MATCH; a COGROUP/REDUCE keeps records with no matching visit; results flow to the sink. Record schemas: (url, content), (ip_addr, url, date, ad_revenue, ...), (rank, url, avg_duration)]
Battré et al. 2010. Nephele/PACTs: A Programming Model and Execution Framework for Web-Scale Analytical Processing
Nephele
Executes an execution graph
Decides for each task how many instances are appropriate
Assigns task instances to computation units
Manages fault tolerance and adapts to changes
Daniel Warneke and Odej Kao. 2009. Nephele: Efficient Parallel Data Processing in the Cloud
Sopremo and Simple
High-level language layer
Simple = query language
Sopremo = semi-structured data model (JSON) and operators
Extensible operators for several use cases: text mining, data cleansing, data mining
Outline
1 Overview of Stratosphere
2 Dataflow Orientation
3 Tuple-based Data Model
4 Other Differences
5 Seminar Organization
Data Analysis Program
Hadoop:
Driver program + multiple jobs
The driver program manually connects inputs and outputs
Fixed pipeline: 1 job = 1 Map + 1 Reduce
Stratosphere:
Directed acyclic graph of arbitrary Pacts
Explicit data sources and sinks
Pacts also support two inputs (for join-like operations)
Map/Reduce in Stratosphere
Works the same as in Hadoop
But the semantics are interpreted differently: each Pact defines data dependencies
Map: each tuple can be treated separately
Reduce: tuples with the same key are grouped and guaranteed to be processed by the same reducer
[Figure: Map and Reduce inputs (key/value pairs) partitioned into independent subsets]
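These data dependencies can be sketched in plain Java (an illustrative helper class, not the Stratosphere API — the class and method names here are invented for the example):

```java
import java.util.*;

// Plain-Java sketch (not the Stratosphere API) of the two contracts'
// data dependencies: Map's independent subsets are single tuples,
// Reduce's are the groups of tuples that share a key.
public class PactSemantics {

    // The runtime guarantees that all tuples with the same key end up
    // in one group, processed by a single reducer invocation.
    public static Map<String, List<Integer>> independentSubsets(
            List<Map.Entry<String, Integer>> tuples) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> t : tuples) {
            groups.computeIfAbsent(t.getKey(), k -> new ArrayList<>())
                  .add(t.getValue());
        }
        return groups;
    }

    // A reduce UDF then sees each group in isolation, e.g. summing counts.
    public static int sumGroup(List<Integer> group) {
        int sum = 0;
        for (int v : group) sum += v;
        return sum;
    }
}
```

Because the groups are independent, the runtime is free to ship each of them to a different reducer instance.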
Two Input Pacts
Currently three additional Pacts for two inputs:
Cross: build all possible pairs
Match: find all matching pairs
CoGroup: group all matching tuples
All pairs/groups are treated independently
[Figure: inputs A and B of Cross, Match, and CoGroup partitioned into independent subsets]
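The three contracts can be sketched in plain Java (an illustrative helper class, not the Stratosphere API) on small (key, value) inputs:

```java
import java.util.*;

// Plain-Java sketch (not the Stratosphere API) of the parallelization
// semantics of the three two-input Pacts.
public class TwoInputPacts {

    // Cross: every pair from A x B forms an independent subset.
    public static List<String> cross(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        for (String x : a)
            for (String y : b)
                out.add(x + "|" + y);
        return out;
    }

    // Match: one subset per pair of records agreeing on the key (an equi-join).
    public static List<String> match(List<Map.Entry<Integer, String>> a,
                                     List<Map.Entry<Integer, String>> b) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, String> ea : a)
            for (Map.Entry<Integer, String> eb : b)
                if (ea.getKey().equals(eb.getKey()))
                    out.add(ea.getValue() + "|" + eb.getValue());
        return out;
    }

    // CoGroup: one subset per key, holding ALL records of both inputs
    // with that key -- a key present in only one input still yields a
    // group, which is what makes anti-joins (like the NOT EXISTS
    // earlier) expressible.
    public static Map<Integer, List<String>> coGroup(
            List<Map.Entry<Integer, String>> a,
            List<Map.Entry<Integer, String>> b) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (Map.Entry<Integer, String> e : a)
            groups.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                  .add("A:" + e.getValue());
        for (Map.Entry<Integer, String> e : b)
            groups.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                  .add("B:" + e.getValue());
        return groups;
    }
}
```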
Comparison
[Figure: detailed MapReduce execution plan of the example job — data load phase (fetch/store/scan with block and split partitioning), map phase (record read, map, local partitioning, sort, sort-group, combine), shuffle phase (fetch, buffer, merge), and reduce phase (merge, sort-group, reduce, store)]
Comparison
Hadoop:
Often workarounds necessary in a complex M/R program
Error-prone, recurring manual data partitioning
Patterns evolved for joins etc. (data mining book)
Fixed pipeline allows fine-grained tweaks (exploits)
Stratosphere:
More expressiveness for complex data operations
Maintains dataflow semantics to some degree
Allows optimization and different shipping strategies
Fewer hooks than Hadoop
Optimization
Inspired by traditional query optimization
Choose the best join/shipping strategy for a plan
[Figure: two alternative execution plans for an example task over orders and lineitem — Alternative 1: MAPs feed a sort-merge MATCH with Repartition shipping on both inputs, then a combining-sort REDUCE; Alternative 2: the MATCH gets one input via Broadcast, and a combining-sort COMBINE runs before the REDUCE; other edges use Local Forward shipping; key properties (SAME-KEY, SUPER-KEY, UNIQUE-KEY) are tracked along the plan]
Reorder Pacts
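A rough illustration of such a choice (the formulas and numbers are made up for the example and are not the actual PACT optimizer cost model): compare the bytes each shipping strategy moves over the network.

```java
// Illustrative sketch only -- the real PACT optimizer cost model is
// more elaborate. Repartitioning ships (roughly) both inputs once;
// broadcasting replicates the smaller input to every node.
public class ShippingChoice {

    public static long repartitionCost(long bytesA, long bytesB) {
        return bytesA + bytesB;
    }

    public static long broadcastCost(long smallerInputBytes, int nodes) {
        return smallerInputBytes * (long) nodes;
    }

    // Pick whichever strategy ships fewer bytes for a two-input Match.
    public static String choose(long bytesA, long bytesB, int nodes) {
        long smaller = Math.min(bytesA, bytesB);
        return broadcastCost(smaller, nodes) < repartitionCost(bytesA, bytesB)
                ? "broadcast" : "repartition";
    }
}
```

With a tiny second input, broadcasting wins; with two large inputs on many nodes, repartitioning does — which is exactly the difference between the two alternatives above.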
Outline
1 Overview of Stratosphere
2 Dataflow Orientation
3 Tuple-based Data Model
4 Other Differences
5 Seminar Organization
Switch to Tuple-based Model
Bleeding edge! Check the pact-examples subproject
Typical Hadoop job:
Map: transforms/filters data, sets the key
Reduce: combines data, unsets the key
Preceding maps limit reordering
Stratosphere uses tuples instead of k/v-pairs
Users should set keys as soon as possible for ALL Pacts
Each Pact is annotated to define the "key"
Reduce Example
public static class CountWords extends ReduceStub {
    public void reduce(Iterator<PactRecord> records, Collector out) {
        PactRecord element = null;
        int sum = 0;
        while (records.hasNext()) {
            element = records.next();
            PactInteger count = element.getField(1, PactInteger.class);
            sum += count.getValue();
        }
        element.setField(1, new PactInteger(sum));
        out.collect(element);
    }
}

...

ReduceContract reducer = new ReduceContract(
    CountWords.class,   // <-- UDF
    PactString.class,   // <-- key class
    0,                  // <-- key index
    mapper);            // <-- input
PactRecord Semantics

A PactRecord is a list of fields = keys or values
Fields that are used as keys need to implement Key
Maintains the serialized form as long as possible (lazy deserialization):
Fields are only deserialized when read
Serializes only written fields
Needs the type of a field to deserialize it
Very efficient storage of null values
⇒ Use a separate field for each key in your code
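The lazy-deserialization idea can be sketched in plain Java (illustrative only — the real PactRecord internals differ): keep the serialized bytes and decode a field only on its first read.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch of lazy deserialization (not the real PactRecord
// implementation): the serialized bytes are kept, and the field value
// is decoded only when it is first read, then cached.
public class LazyStringField {
    private final byte[] raw;     // serialized form, kept as long as possible
    private String decoded;       // cache, filled on first access
    private int decodeCount = 0;  // instrumentation for the example only

    public LazyStringField(byte[] raw) {
        this.raw = raw;
    }

    public String get() {
        if (decoded == null) {    // decode lazily, exactly once
            decoded = new String(raw, StandardCharsets.UTF_8);
            decodeCount++;
        }
        return decoded;
    }

    public int decodeCount() {
        return decodeCount;
    }
}
```

A record that is only repartitioned and never read thus pays no deserialization cost at all.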
Word Count Example
Wrap Pact stubs in Pact contracts
All Pacts except sources specify their input
String dataInput = ...;
FileDataSource source = new FileDataSource(
    LineInFormat.class, dataInput, "Input Lines");
MapContract mapper = new MapContract(
    TokenizeLine.class, source, "Tokenize Lines");
ReduceContract reducer = new ReduceContract(
    CountWords.class, PactString.class, 0, mapper,
    "Count Words");
FileDataSink out = new FileDataSink(
    WordCountOutFormat.class, output, reducer,
    "Word Counts");
Word Count Example
The input format needs to be parsed in any case
Here we could already split lines and omit the map
In this case, we emit a PactRecord with one field
Reuse objects to minimize garbage collection
public static class LineInFormat extends DelimitedInputFormat {
    private final PactString string = new PactString();

    public boolean readRecord(PactRecord record, byte[] line, int numBytes) {
        this.string.setValueAscii(line, 0, numBytes);
        record.setField(0, this.string);
        return true;
    }
}
Word Count Example
Implicit semantics of fields
public static class TokenizeLine extends MapStub {
    private final PactRecord outputRecord = new PactRecord();
    private final PactString string = new PactString();
    private final PactInteger integer = new PactInteger(1);

    private final AsciiUtils.WhitespaceTokenizer tokenizer =
        new AsciiUtils.WhitespaceTokenizer();

    @Override
    public void map(PactRecord record, Collector collector) {
        // get the first field (as type PactString)
        PactString str = record.getField(0, PactString.class);

        // tokenize the line
        this.tokenizer.setStringToTokenize(str);
        while (tokenizer.next(this.string)) {
            // we emit a (word, 1) pair
            this.outputRecord.setField(0, this.string);
            this.outputRecord.setField(1, this.integer);
            collector.collect(this.outputRecord);
        }
    }
}
Word Count Example
@Combinable
public static class CountWords extends ReduceStub {
    private final PactInteger theInteger = new PactInteger();

    @Override
    public void reduce(Iterator<PactRecord> records, Collector out) throws Exception {
        PactRecord element = null;
        int sum = 0;
        while (records.hasNext()) {
            element = records.next();
            PactInteger i = element.getField(1, PactInteger.class);
            sum += i.getValue();
        }

        this.theInteger.setValue(sum);
        element.setField(1, this.theInteger);
        out.collect(element);
    }
}
Word Count Example
public static class WordCountOutFormat extends FileOutputFormat {
    private final StringBuilder buffer = new StringBuilder();

    @Override
    public void writeRecord(PactRecord record) throws IOException {
        this.buffer.setLength(0);
        this.buffer.append(record.getField(0, PactString.class));
        this.buffer.append(' ');
        this.buffer.append(record.getField(1, PactInteger.class).getValue());
        this.buffer.append('\n');

        byte[] bytes = this.buffer.toString().getBytes();
        this.stream.write(bytes);
    }
}
Composite Keys
Reduce, CoGroup, and Match use keys
Keys may be composed of more than one field
Useful for block identifiers: separate row and column fields
Assuming a block is a three-tuple (row, column, data):
ReduceContract reducer = new ReduceContract(MergeBlocks.class,
    new Class[] { PactInteger.class, PactInteger.class }, // <-- key classes
    new int[] { 0, 1 },                                   // <-- key indices
    mapper);                                              // <-- input
Comparison to Key/Value-Pairs
Key/value pairs are easier to understand
May use generics to check compatibility
Implicit type specification through generics
The tuple-based model is more flexible, especially for optimization
Pays off quickly in more complex tasks
More convention-based; field semantics are less clear
UDFs are more verbose, especially when reusing objects
⇒ Work on class mapping; the HLL uses schema inference
Generalization of k/v-pairs; can simulate k/v
Outline
1 Overview of Stratosphere
2 Dataflow Orientation
3 Tuple-based Data Model
4 Other Differences
5 Seminar Organization
Separate Execution Engine
Nephele and Pact together are equivalent to Hadoop
Nephele may be used by itself for fine-grained data management
However, decoupling may lose some optimization potential
The UDF is wrapped in Pact and Nephele code
[Figure: from PACT program to spanned Nephele data flow — a user function (e.g. a match UDF that concatenates two tuples, projects the result, and extracts the key) is wrapped in PACT code for grouping and Nephele code for communication (e.g. a hash-table-based match strategy); the compiled Nephele DAG is spanned across nodes, with vertices connected by in-memory and network channels]
Dynamic Resource Allocation
Hadoop works best in a cluster environment
Stratosphere also targets a true cloud environment
Computation units may be dynamically booked and released
Aggregate the smallest 20% of the numbers; Nephele (left), Hadoop (right):
[Figure: average instance utilization (USR, SYS, WAIT) and average network traffic among instances [MBit/s] over time for this task on Nephele (left) and Hadoop (right)]
Continuous Reoptimization
Stratosphere wants to optimize plans
However, the environment (cloud or cluster) is not stable
An a-priori optimal plan may become bad after a while
Start with a robust initial plan
Adapt and optimize during runtime
Smart Checkpointing
(Work in progress)
Hadoop materializes all results
Very good for fault tolerance: a crashed computation unit may be immediately replaced
However, often bad for runtime performance
Stratosphere materializes data only when meaningful
Mostly when materialization is cheaper than recalculation:
Will materialize the result of computation-intensive tasks
Will not materialize Cartesian products
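The stated rule boils down to a cost comparison, sketched here purely for illustration (the actual policy is work in progress and its cost model is not public):

```java
// Illustrative sketch: checkpoint a task's output only when writing it
// is estimated to be cheaper than recomputing it after a failure.
public class CheckpointPolicy {

    // costs in arbitrary but comparable units
    public static boolean materialize(long outputBytes, long writeCostPerByte,
                                      long recomputationCost) {
        return outputBytes * writeCostPerByte < recomputationCost;
    }
}
```

A computation-intensive task with small output gets materialized; a Cartesian product, whose output dwarfs its recomputation cost, does not.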
Outline
1 Overview of Stratosphere
2 Dataflow Orientation
3 Tuple-based Data Model
4 Other Differences
5 Seminar Organization
Get Stratosphere Running
https://www.stratosphere.eu/register/register
Write me an email to be unlocked for stage1
Install git and maven2

mkdir stratosphere && cd stratosphere
git clone https://stratosphere.eu/git/stage1.git .
git checkout -b version02 remotes/origin/version02
mvn install

http://www.stratosphere.eu/projects/Stratosphere/wiki/GettingStarted
Start local mode
Get word count running
Port Hadoop to Stratosphere
Meeting on 01/10/2012:
Conceptual port should be completed
Implementation started
Write an email when troubled
Write tickets if a bug was found
Help other teams; write to the sdaa mailing list
Who uses joins, composite keys?
Cluster Times
Proposal: for the next 6 weeks, every group gets a fixed day
Thursday 6pm – Monday 6pm
Time slots are flexibly tradable
Announce start/end on the cluster mailing list
Final presentations
May be rescheduled if everybody agrees
The first part should be similar to the Hadoop talk
The second part should include a detailed benchmark and comparison
State what you will additionally evaluate until the report
Report
Two-column format (sig-alternate.cls)
6–8 pages, max. 2 pages of appendix
How did you change the original algorithms?
Which design decisions did you make?
Focus on important, interesting evaluations
What results did you get? Do they make sense?
How would you improve the algorithm for better runtime/results?
No code, only conceptual