Stratosphere for Hadoop Users. Potsdam, January 03, 2012. Arvid Heise. (PowerPoint presentation)



SLIDE 1

Stratosphere for Hadoop Users

Potsdam, January 03, 2012

Arvid Heise

SLIDE 2

Outline

1. Overview of Stratosphere
2. Dataflow Orientation
3. Tuple-based Data Model
4. Other Differences
5. Seminar Organization

Arvid Heise | Scalable Data Analysis Algorithms | January 03, 2012

2

SLIDE 3

Stratosphere Stack

[Figure: the Stratosphere stack. Higher-level language layer: Simple script -> Simple parser -> SOPREMO compiler -> SOPREMO plan. Programming model: PACT program -> PACT optimizer. Execution engine: Nephele scheduler -> execution graph.]

SLIDE 4

Pact (PArallelization ConTracts)

- Parallel programming model
- Implementation and generalization of Map/Reduce
- Similar interface to Hadoop
- Defines the parallelization semantics of tasks
- A Pact plan is dataflow-oriented
- Pact optimizes plans and compiles them to execution graphs for Nephele

Alexandrov et al. 2010. MapReduce and PACT - Comparing Data Parallel Programming Models.

SLIDE 5

Hadoop and Stratosphere Job

SELECT * FROM Documents d JOIN Rankings r ON r.url = d.url
WHERE CONTAINS(d.text, [keywords])
  AND r.rank > [rank]
  AND NOT EXISTS (SELECT * FROM Visits v
                  WHERE v.url = d.url AND v.visitDate = CURDATE());

[Figure: the query as a Hadoop M/R job chain vs. a Stratosphere PACT plan: MAP filters (CONTAINS [keywords], rank > [rank], visitDate = CURDATE()), MATCH on url, COGROUP/REDUCE on url; record schemas (url, content), (ip_addr, url, date, ad_revenue, ...), (rank, url, avg_duration).]

Battré et al. 2010. Nephele/PACTs: A Programming Model and Execution Framework for Web-Scale Analytical Processing

SLIDE 7

Nephele

- Executes an execution graph
- Decides for each task how many parallel instances are appropriate
- Assigns task instances to computation units
- Manages fault tolerance and adapts to changes

Daniel Warneke and Odej Kao. 2009. Nephele: Efficient Parallel Data Processing in the Cloud.

SLIDE 8

Sopremo and Simple

- High-level language layer
- Simple = query language
- Sopremo = semi-structured data model (JSON) and operators
- Extensible operators for several use cases: text mining, data cleansing, data mining

SLIDE 9

Outline

1. Overview of Stratosphere
2. Dataflow Orientation
3. Tuple-based Data Model
4. Other Differences
5. Seminar Organization

SLIDE 10

Data Analysis Program

Hadoop:
- Driver program + multiple jobs
- Driver program manually connects inputs and outputs
- Fixed pipeline: 1 job = 1 Map + 1 Reduce

Stratosphere:
- Directed acyclic graph of arbitrary Pacts
- Explicit data sources and sinks
- Pacts also support two inputs (for join-like operations)

[Figure: the example job as a Pact DAG with MAP, MATCH, COGROUP, and REDUCE nodes.]

SLIDE 11

Map/Reduce in Stratosphere

- Works the same as in Hadoop, but the semantics are interpreted differently
- Each Pact defines data dependencies
- Map: each tuple can be treated separately
- Reduce: tuples with the same key are grouped and guaranteed to be processed by the same reducer

[Figure: Map and Reduce inputs split into independent subsets of key/value pairs.]
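These data dependencies can be sketched in plain Python (illustrative only; `pact_map` and `pact_reduce` are our names, not Stratosphere API):

```python
from itertools import groupby

def pact_map(records, udf):
    # Map: every record forms its own independent subset
    out = []
    for rec in records:
        out.extend(udf(rec))
    return out

def pact_reduce(records, key, udf):
    # Reduce: records with the same key are grouped and are
    # guaranteed to be handed to the same reducer call
    out = []
    for _, group in groupby(sorted(records, key=key), key=key):
        out.extend(udf(list(group)))
    return out

words = [("a", 1), ("b", 1), ("a", 1)]
counts = pact_reduce(words, key=lambda kv: kv[0],
                     udf=lambda g: [(g[0][0], sum(v for _, v in g))])
assert counts == [("a", 2), ("b", 1)]
```

Because each record (for Map) or each key group (for Reduce) is independent, the subsets can be processed on different machines in parallel.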

SLIDE 12

Two Input Pacts

- Currently three additional Pacts for two inputs:
- Cross: build all possible pairs
- Match: find all matching pairs
- CoGroup: group all matching tuples
- All pairs/groups are treated independently

[Figure: Cross, Match, and CoGroup inputs split into independent subsets.]
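The semantics of the three two-input Pacts can be simulated in plain Python (a sketch of the contracts' pairing/grouping behavior, not the Stratosphere API):

```python
def cross(a, b):
    # Cross: build all possible pairs
    return [(x, y) for x in a for y in b]

def match(a, b, key):
    # Match: all pairs whose keys are equal (an equi-join)
    index = {}
    for y in b:
        index.setdefault(key(y), []).append(y)
    return [(x, y) for x in a for y in index.get(key(x), [])]

def cogroup(a, b, key):
    # CoGroup: one call per key, with the full group from each input
    keys = sorted({key(x) for x in a} | {key(y) for y in b})
    return [(k, [x for x in a if key(x) == k],
                [y for y in b if key(y) == k]) for k in keys]

a = [(1, "d1"), (2, "d2")]
b = [(1, "r1"), (1, "r2"), (3, "r3")]
k = lambda t: t[0]
assert len(cross(a, b)) == 6
assert match(a, b, k) == [((1, "d1"), (1, "r1")), ((1, "d1"), (1, "r2"))]
```

Note that Match only produces output for keys present in both inputs, while CoGroup is also called for keys that occur in only one input (with an empty group on the other side).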

SLIDE 13

Comparison

Hadoop:
- Workarounds are often necessary in a complex M/R program
- Error-prone, recurring manual data partitioning
- Patterns evolved for joins etc. (data mining book)
- Fixed pipeline allows fine-grained tweaks (exploits)

SLIDE 14

Comparison

[Figure: detailed execution plan of a Hadoop job, with operators for the Data Load Phase, Map Phase, Shuffle Phase, and Reduce Phase (fetch, scan, partition, sort, combine, merge, store).]

SLIDE 15

Comparison

Hadoop:
- Workarounds are often necessary in a complex M/R program
- Error-prone, recurring manual data partitioning
- Patterns evolved for joins etc. (data mining book)
- Fixed pipeline allows fine-grained tweaks (exploits)

Stratosphere:
- More expressiveness for complex data operations
- Maintains dataflow semantics to some degree
- Allows optimization and different shipping strategies
- Fewer hooks than Hadoop

SLIDE 16

Optimization

- Inspired by traditional query optimization
- Chooses the best join/shipping strategy for a plan

[Figure: two alternative plans for an example task over orders and lineitem. Alternative 1: repartition both inputs, sort-merge MATCH, combining-sort REDUCE. Alternative 2: broadcast one input and locally forward the other, sort-merge MATCH, COMBINE, combining-sort REDUCE. The optimizer may also reorder Pacts.]
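The choice between repartitioning and broadcasting can be illustrated with a toy cost model (purely hypothetical formulas and numbers, not the actual PACT optimizer):

```python
def repartition_cost(size_a, size_b):
    # Repartition both inputs: every record is shipped once (hashed by key)
    return size_a + size_b

def broadcast_cost(size_a, size_b, nodes):
    # Broadcast the smaller input to every node; the larger one stays local
    return min(size_a, size_b) * nodes

# A tiny build side makes broadcast cheaper ...
assert broadcast_cost(10_000, 10, nodes=8) < repartition_cost(10_000, 10)
# ... while similarly sized inputs favor repartitioning
assert repartition_cost(10_000, 9_000) < broadcast_cost(10_000, 9_000, nodes=8)
```

The real optimizer weighs such shipping costs together with local strategies (e.g. sort-merge vs. hash) when picking among alternatives.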

SLIDE 17

Outline

1. Overview of Stratosphere
2. Dataflow Orientation
3. Tuple-based Data Model
4. Other Differences
5. Seminar Organization

SLIDE 18

Switch to Tuple-based Model

Bleeding edge! Check the pact-examples subproject.
Typical Hadoop job:
- Map: transforms/filters data, sets the key
- Reduce: combines data, unsets the key
- Preceding maps limit reordering

SLIDE 19

Switch to Tuple-based Model

Bleeding edge! Check the pact-examples subproject.
Typical Hadoop job:
- Map: transforms/filters data, sets the key
- Reduce: combines data, unsets the key
- Preceding maps limit reordering

Stratosphere:
- Uses tuples instead of k/v-pairs
- Users should set keys as soon as possible for ALL Pacts
- Each Pact is annotated to define the "key"

SLIDE 20

Reduce Example

public static class CountWords extends ReduceStub {
  public void reduce(Iterator<PactRecord> records, Collector out) {
    PactRecord element = null;
    int sum = 0;
    while (records.hasNext()) {
      element = records.next();
      PactInteger count = element.getField(1, PactInteger.class);
      sum += count.getValue();
    }

    element.setField(1, new PactInteger(sum));
    out.collect(element);
  }
}
...
ReduceContract reducer = new ReduceContract(
    CountWords.class,   // <-- UDF
    PactString.class,   // <-- key class
    0,                  // <-- key index
    mapper);            // <-- input

SLIDE 21

PactRecords Semantics

- List of fields = keys or values
- Fields that are used as keys need to implement Key
- Maintains the serialized form as long as possible (lazy deserialization):
- Fields are only deserialized when read
- Only written fields are serialized
- Needs the type of a field to deserialize it
- Very efficient storage of null values
=> Use a separate field for each key in your code
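The lazy-deserialization idea can be sketched in plain Python (a conceptual model with invented names, not the PactRecord implementation):

```python
class LazyRecord:
    """Sketch of PactRecord semantics: fields stay in serialized form
    until they are read, and the caller must supply the field type."""

    def __init__(self, serialized):
        self._raw = dict(serialized)   # field index -> bytes
        self._cache = {}               # deserialized / written fields

    def get_field(self, index, typ):
        # Deserialize a field only on first access, using the given type
        if index not in self._cache:
            self._cache[index] = typ(self._raw[index].decode())
        return self._cache[index]

    def set_field(self, index, value):
        # Written fields are kept as objects until re-serialization
        self._cache[index] = value

rec = LazyRecord({0: b"word", 1: b"3"})
assert rec.get_field(1, int) == 3      # deserialized on first access
```

Fields that are never read are never deserialized, which is why the record needs the field type at access time rather than storing it itself.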

SLIDE 22

Word Count Example

- Wrap Pact stubs in Pact contracts
- All Pacts except sources specify their input

String dataInput = ...;
FileDataSource source = new FileDataSource(
    LineInFormat.class, dataInput, "Input Lines");
MapContract mapper = new MapContract(
    TokenizeLine.class, source, "Tokenize Lines");
ReduceContract reducer = new ReduceContract(
    CountWords.class, PactString.class, 0, mapper,
    "Count Words");
FileDataSink out = new FileDataSink(
    WordCountOutFormat.class, output, reducer,
    "Word Counts");

SLIDE 23

Word Count Example

- The input format needs to be parsed in any case
- Here we could already split the lines and omit the map
- In this case, we emit a PactRecord with one field
- Reuse objects to minimize garbage collection

public static class LineInFormat extends DelimitedInputFormat {
  private final PactString string = new PactString();

  public boolean readRecord(PactRecord record, byte[] line, int numBytes) {
    this.string.setValueAscii(line, 0, numBytes);
    record.setField(0, this.string);
    return true;
  }
}

SLIDE 24

Word Count Example

Implicit semantics of fields

public static class TokenizeLine extends MapStub {
  private final PactRecord outputRecord = new PactRecord();
  private final PactString string = new PactString();
  private final PactInteger integer = new PactInteger(1);

  private final AsciiUtils.WhitespaceTokenizer tokenizer =
      new AsciiUtils.WhitespaceTokenizer();

  @Override
  public void map(PactRecord record, Collector collector) {
    // get the first field (as type PactString)
    PactString str = record.getField(0, PactString.class);

    // tokenize the line
    this.tokenizer.setStringToTokenize(str);
    while (tokenizer.next(this.string)) {
      // we emit a (word, 1) pair
      this.outputRecord.setField(0, this.string);
      this.outputRecord.setField(1, this.integer);
      collector.collect(this.outputRecord);
    }
  }
}

SLIDE 25

Word Count Example

@Combinable
public static class CountWords extends ReduceStub {
  private final PactInteger theInteger = new PactInteger();

  @Override
  public void reduce(Iterator<PactRecord> records, Collector out) throws Exception {
    PactRecord element = null;
    int sum = 0;
    while (records.hasNext()) {
      element = records.next();
      PactInteger i = element.getField(1, PactInteger.class);
      sum += i.getValue();
    }

    this.theInteger.setValue(sum);
    element.setField(1, this.theInteger);
    out.collect(element);
  }
}

SLIDE 26

Word Count Example

public static class WordCountOutFormat extends FileOutputFormat {
  private final StringBuilder buffer = new StringBuilder();

  @Override
  public void writeRecord(PactRecord record) throws IOException {
    this.buffer.setLength(0);
    this.buffer.append(record.getField(0, PactString.class));
    this.buffer.append(' ');
    this.buffer.append(record.getField(1, PactInteger.class).getValue());
    this.buffer.append('\n');

    byte[] bytes = this.buffer.toString().getBytes();
    this.stream.write(bytes);
  }
}

SLIDE 27

Composite Keys

- Reduce, CoGroup, and Match use keys
- A key may be composed of more than one field
- Useful for block identifiers: separate row and column
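The effect of a composite key can be simulated in plain Python (illustrative only; `groupby` over a `(row, column)` key stands in for the Reduce contract):

```python
from itertools import groupby

# Blocks as (row, column, data) tuples; the composite key is (row, column)
blocks = [(0, 0, 2.0), (0, 1, 3.0), (0, 0, 5.0)]
key = lambda b: (b[0], b[1])

# Merge all blocks that share the same (row, column) identifier
merged = [(k[0], k[1], sum(b[2] for b in g))
          for k, g in groupby(sorted(blocks, key=key), key=key)]
assert merged == [(0, 0, 7.0), (0, 1, 3.0)]
```

Without a composite key, row and column would have to be packed into a single artificial key field by hand.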

SLIDE 28

Composite Keys

- Reduce, CoGroup, and Match use keys
- A key may be composed of more than one field
- Useful for block identifiers: separate row and column

Assuming a block is a three-tuple (row, column, data):

ReduceContract reducer = new ReduceContract(MergeBlocks.class,
    new Class[] { PactInteger.class, PactInteger.class }, // <-- key classes
    new int[] { 0, 1 },                                   // <-- key indices
    mapper);                                              // <-- input

SLIDE 29

Comparison to Key/Value-Pairs

- Key/value-pairs are easier to understand
- Generics may be used to check compatibility
- Implicit type specification through generics

SLIDE 30

Comparison to Key/Value-Pairs

Key/value-pairs:
- Easier to understand
- Generics may be used to check compatibility
- Implicit type specification through generics

Tuple-based model:
- More flexible, especially for optimization
- Pays off quickly in more complex tasks
- More convention-based; field semantics are less clear
- UDFs are more verbose, especially when reusing objects
  => Work on class mapping; the HLL uses schema inference
- A generalization of k/v-pairs; can simulate k/v

SLIDE 31

Outline

1. Overview of Stratosphere
2. Dataflow Orientation
3. Tuple-based Data Model
4. Other Differences
5. Seminar Organization

SLIDE 32

Separate Execution Engine

- Nephele and Pact together are equivalent to Hadoop
- Nephele may be used by itself for fine-grained data management
- However, the decoupling may lose some optimization potential

SLIDE 33

Separate Execution Engine

- Nephele and Pact together are equivalent to Hadoop
- Nephele may be used by itself for fine-grained data management
- However, the decoupling may lose some optimization potential
- The UDF is wrapped in Pact and Nephele code

[Figure: a PACT program with four user functions (map, map, match, reduce) is compiled; each user function is wrapped in PACT code (grouping), e.g. a match UDF inside a hash-join invoke() loop, and in Nephele code (communication), yielding a Nephele DAG that is spanned into a data flow with in-memory and network channels across nodes.]

SLIDE 34

Dynamic Resource Allocation

- Hadoop works best in a cluster environment
- Stratosphere also targets a true cloud environment
- Computation units may be dynamically booked and released

SLIDE 35

Dynamic Resource Allocation

- Hadoop works best in a cluster environment
- Stratosphere also targets a true cloud environment
- Computation units may be dynamically booked and released
- Example task: aggregate the smallest 20% of the numbers

[Figure: average instance utilization (USR/SYS/WAIT) and average network traffic among instances over time, Nephele (left) vs. Hadoop (right).]

SLIDE 36

Continuous Reoptimization

- Stratosphere wants to optimize plans
- However, the environment (cloud or cluster) is not stable
- An a priori optimal plan may become bad after a while

SLIDE 37

Continuous Reoptimization

- Stratosphere wants to optimize plans
- However, the environment (cloud or cluster) is not stable
- An a priori optimal plan may become bad after a while
- Start with a robust initial plan
- Adapt and optimize during runtime

SLIDE 38

Smart Checkpointing

(Work in progress)
- Hadoop materializes all intermediate results
- Very good for fault tolerance
- A crashed computation unit may be replaced immediately

SLIDE 39

Smart Checkpointing

(Work in progress)
- Hadoop materializes all intermediate results: very good for fault tolerance, since a crashed computation unit may be replaced immediately
- However, this is often bad for runtime performance
- Stratosphere materializes data only when it is meaningful, mostly when materialization is cheaper than recalculation:
- Will materialize the result of computation-intensive tasks
- Will not materialize Cartesian products
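The decision rule can be sketched as a one-line cost comparison (a hypothetical model with made-up numbers, not the work-in-progress implementation):

```python
def should_materialize(write_cost, recompute_cost, failure_prob):
    # Materialize only when the certain cost of writing the result is
    # lower than the expected cost of recomputing it after a failure
    return write_cost < failure_prob * recompute_cost

# CPU-heavy task with a small result: worth checkpointing
assert should_materialize(write_cost=5, recompute_cost=1000, failure_prob=0.1)
# Cartesian product: huge result, cheap to redo, so do not materialize
assert not should_materialize(write_cost=10_000, recompute_cost=200, failure_prob=0.1)
```

This captures why Hadoop's materialize-everything policy is safe but slow, while a selective policy trades a little recovery time for faster failure-free runs.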

SLIDE 40

Outline

1. Overview of Stratosphere
2. Dataflow Orientation
3. Tuple-based Data Model
4. Other Differences
5. Seminar Organization

SLIDE 41

Get Stratosphere Running

- Register at https://www.stratosphere.eu/register/register
- Write me an email to be unlocked for stage1
- Install git and Maven 2

mkdir stratosphere && cd stratosphere
git clone https://stratosphere.eu/git/stage1.git .
git checkout -b version02 remotes/origin/version02
mvn install

- See http://www.stratosphere.eu/projects/Stratosphere/wiki/GettingStarted
- Start local mode
- Get word count running

SLIDE 42

Port Hadoop to Stratosphere

- Meeting on 01/10/2012:
- The conceptual port should be completed
- The implementation should have started
- Write an email if you run into trouble
- File tickets for bugs you find
- Help other teams; write to the sdaa mailing list
- Who uses joins or composite keys?

SLIDE 43

Cluster Times

- Proposal: for the next 6 weeks, every group gets a fixed slot (Thursday 6pm to Monday 6pm)
- Time slots are flexibly tradable
- Announce start/end on the cluster mailing list

SLIDE 44

Final presentations

- May be rescheduled if everybody agrees
- The first part should be similar to the Hadoop talk
- The second part should include a detailed benchmark and comparison
- State what you will additionally evaluate until the report

SLIDE 45

Report

- Two-column format (sig-alternate.cls)
- 6-8 pages, max. 2 appendix pages
- How did you change the original algorithms?
- Which design decisions did you make?
- Focus on important, interesting evaluations
- What results did you get? Do they make sense?
- How would you improve the algorithm for better runtime/results?
- No code, conceptual only
