SLIDE 1

Online Aggregation for Large MapReduce Jobs

Niketan Pansare (1), Vinayak Borkar (2), Chris Jermaine (1), Tyson Condie (3)

(1) Rice University, (2) UC Irvine, (3) Yahoo! Research

SLIDE 2

Outline

 Motivation
 Implementation
 Experiments
 Conclusion


SLIDE 4

OLA example

select avg(stock_price) from nasdaq_db where company = 'xyz';
(Note: the final answer for this query is 1000)

SLIDE 5

OLA example

select avg(stock_price) from nasdaq_db where company = 'xyz';

After 1 second,
Conventional Database: (still running, no answer yet)
With OLA extension: output range estimate [0, 2000] with 95% probability

SLIDE 6

OLA example

select avg(stock_price) from nasdaq_db where company = 'xyz';

After 2 minutes,
Conventional Database: (still running, no answer yet)
With OLA extension: output range estimate [900, 1100] with 95% probability

SLIDE 7

OLA example

select avg(stock_price) from nasdaq_db where company = 'xyz';

After 4 minutes,
Conventional Database: (still running, no answer yet)
With OLA extension: output range estimate [950, 1040] with 95% probability

SLIDE 8

OLA example

select avg(stock_price) from nasdaq_db where company = 'xyz';

After 6 minutes,
Conventional Database: (still running, no answer yet)
With OLA extension: output range estimate [990, 1010] with 95% probability

SLIDE 9

OLA example

select avg(stock_price) from nasdaq_db where company = 'xyz';

After 10 minutes,
Conventional Database: (still running, no answer yet)
With OLA extension: output range estimate [995, 1005] with 95% probability

SLIDE 10

OLA example

select avg(stock_price) from nasdaq_db where company = 'xyz';

After 30 minutes,
Conventional Database: (still running, no answer yet)
With OLA extension: output range estimate [999, 1001.5] with 95% probability

SLIDE 11

OLA example

select avg(stock_price) from nasdaq_db where company = 'xyz';

After 2 hours,
Conventional Database: output final answer 1000
With OLA extension: output final answer 1000

SLIDE 12

Benefit of OLA

If an acceptably accurate answer is reached quickly, the query can be aborted.

After 6 minutes,
Conventional Database: (still running, no answer yet)
With OLA extension: output range estimate [990, 1015] with 95% probability

STOP EARLY!!!
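What counts as "acceptably accurate" is up to the user; one simple way to make the early stop concrete is a relative-width check on the running interval. A minimal sketch (the function name and the 2% tolerance are illustrative assumptions, not from the paper):

def should_stop(ci_low, ci_high, rel_tol=0.02):
    """Stop the OLA query once the 95% interval's half-width is within
    rel_tol of the running estimate (the threshold is user-chosen)."""
    estimate = (ci_low + ci_high) / 2.0
    half_width = (ci_high - ci_low) / 2.0
    return estimate != 0 and half_width / abs(estimate) <= rel_tol

# After 6 minutes the interval is [990, 1015]:
print(should_stop(990, 1015))   # True -> abort the job and save the remaining time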


SLIDE 16

Why Stop Early?

Save human time (1 hour 54 minutes)
 'Answer 1000' vs. 'Estimate 1002.5'
  • For exploratory apps
  • Inaccuracies in the ETL process

Save machine time → Cost ↓

Very important when dealing with large data

Online Aggregation
  • Introduced in 1997
  • Significant research impact (606 citations)
  • ACM SIGMOD Test of Time Award

But, limited commercial impact
  • Database market (self-managed)

SLIDE 19

Self-managed DB → Cloud

Cost model
 In a self-managed DB: costs are fixed
 In the cloud: you pay for the amount of hardware used
  • Fewer resources → less cost
  • 10 node cluster: stopping 1 h 54 min early → save $12.92/query on EC2 (see the arithmetic below)
 User needs to justify the cost to the organization

Modifying the engine to support randomization
 Traditional DB: notoriously difficult
 Cloud: much simpler

Therefore, OLA for the cloud is an interesting problem
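For intuition, the quoted saving is consistent with simple node-hour arithmetic; the implied per-node rate of about $0.68/hour is back-computed from the slide's own numbers, not stated on it:

10 nodes × 1.9 h × $0.68/node-hour ≈ $12.92 per query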


SLIDE 21

Extend existing approaches

OLA over single machine
 Confidence interval found using classical sampling theory
 Tuples are bundled into blocks
 Blocks arrive in random order

OLA over multiple machines

Why won't it work?

How do we deal with those issues?

SLIDE 22

OLA over single machine

 Confidence interval found using classical sampling theory
 Tuples are bundled into blocks
 Blocks arrive in random order

Example: Find the SUM of the values below (Note: true answer = 55)

Blocks: {5, 9}   {7, 4, 2}   {8, 3}   {1, 10, 6}

No block processed yet:       Sample = {}                Estimate = not available
Block {7, 4, 2} processed:    Sample = {13}              Estimate = 13 * 4 / 1 = 52
Block {8, 3} processed:       Sample = {13, 11}          Estimate = (13 + 11) * 4 / 2 = 48
Block {5, 9} processed:       Sample = {13, 11, 14}      Estimate = (13 + 11 + 14) * 4 / 3 = 50.67
Block {1, 10, 6} processed:   Sample = {13, 11, 14, 17}  Estimate = (13 + 11 + 14 + 17) * 4 / 4 = 55
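A minimal sketch of this classical block-sampling estimator for a SUM query, with a CLT-style interval attached (the function name and the normal approximation are illustrative assumptions; the slides only show the scale-up estimate itself):

import math, random

def running_estimate(block_sums, n_blocks_total, z=1.96):
    """Block-based estimator for a SUM query (sketch, not the paper's code).

    block_sums     -- per-block sums seen so far, in (random) arrival order
    n_blocks_total -- total number of blocks in the data set
    Returns (estimate, low, high): scale-up estimate plus a ~95% CLT interval.
    """
    n = len(block_sums)
    mean = sum(block_sums) / n                      # mean block sum
    estimate = mean * n_blocks_total                # scale up to all blocks
    if n < 2:
        return estimate, float("-inf"), float("inf")
    var = sum((b - mean) ** 2 for b in block_sums) / (n - 1)
    se = n_blocks_total * math.sqrt(var / n)        # ignores the finite-population correction
    return estimate, estimate - z * se, estimate + z * se

blocks = [[5, 9], [7, 4, 2], [8, 3], [1, 10, 6]]    # true SUM = 55
random.shuffle(blocks)                               # blocks arrive in random order
seen = []
for blk in blocks:
    seen.append(sum(blk))
    print(running_estimate(seen, len(blocks)))       # estimate converges to 55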

SLIDE 40

Extend existing approaches

OLA over single machine
 Confidence interval found using classical sampling theory
 Tuples are bundled into blocks
 Blocks arrive in random order

OLA over multiple machines
 Blocks → non-uniform → size, locality, machine, network
 Processing time for a block can be large and highly variable

Why won't it work?

How do we deal with those issues?

SLIDE 41

OLA over multiple machines

 Blocks → non-uniform → size, locality, machine, network
 Processing time for a block can be large and highly variable

Example: Find the SUM of the values below

Blocks: {7, 4, 2}   {8, 3}   {5, 9}   {1, 10, 6}

(Figure: the blocks laid out along a time line; X axis = processing time.
Blocks that take a long time to process are shown in red, blocks that finish
quickly in green. Arrows mark random time instances at which the blocks
currently being processed are polled.)

Notice that more arrows land on the red regions than on the green regions.

Inspection Paradox: at any random time t, you will (stochastically) be processing
the blocks that take a long time.
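A toy simulation of that effect (all numbers invented for illustration): when block values are correlated with processing time, averaging "whatever is being processed at a random instant" over-weights the slow blocks and biases a naive estimate.

import random

random.seed(0)
# (processing_time, block_sum): bigger blocks take longer AND hold larger sums
blocks = [(t, 10 * t) for t in [1, 1, 1, 1, 10, 10]]
true_mean = sum(v for _, v in blocks) / len(blocks)

# Sample "whichever block is in progress" at a uniformly random point in time:
# each block occupies t time units on the timeline.
timeline = [v for t, v in blocks for _ in range(t)]
naive = sum(random.choice(timeline) for _ in range(100000)) / 100000

print(f"true mean block sum:  {true_mean:.1f}")   # 40.0
print(f"time-biased average:  {naive:.1f}")       # ~85, pulled toward the slow blocks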

SLIDE 56

Extend existing approaches

OLA over single machine
 Confidence interval found using classical sampling theory
 Tuples are bundled into blocks
  • Arrive in random order

OLA over multiple machines
 Blocks → non-uniform → size, locality, machine, network
 Processing time for a block can be large and highly variable

Why won't it work?

How do we deal with those issues?

SLIDE 57

Why won't the previous approach work?

Inspection paradox → at the time of estimation, you are disproportionately processing the longer blocks

Possible: correlation between processing time and value
 E.g., a count query

Biased estimates → current techniques won't work
 This effect was found experimentally in the paper 'MapReduce Online'

Therefore, we need to deal with the inspection paradox in a principled fashion

SLIDE 61

Extend existing approaches

OLA over single machine
 Confidence interval found using classical sampling theory
 Tuples are bundled into blocks
  • Arrive in random order

OLA over multiple machines
 Blocks → non-uniform → size, locality, machine, network
 Processing time for a block can be large and highly variable

Why won't it work?

How do we deal with those issues?

SLIDE 62

How do we deal with the Inspection Paradox?

Capture timing information (i.e., the processing time of each block)
 along with the values

Instead of using classical sampling theory, we output estimates using a Bayesian model that:
 allows for correlation between processing time and values
 and also takes into account the processing time of the block currently being processed

SLIDE 63

Outline

 Motivation
 Implementation
 Experiments
 Conclusion

SLIDE 64

Implementation Overview

Framework for distributed systems: MapReduce
 Hadoop
  • Staged processing → Online
 Hyracks (developed at UC Irvine)
  • Pipelining → "Online"
  • Architecture (and API) similar to Hadoop
  • http://code.google.com/p/hyracks/

For estimates of "Aggregation":
 2 modifications to MapReduce (Hyracks)
 Bayesian Estimator


SLIDE 67

Modifications to MapReduce (Hyracks)

Master
 Maintains a random ordering of the blocks
  • Logical, not physical, queue
 Assigns blocks from the head of the queue
 When a block comes to the head of the queue → a timer starts (processing time)

Two intermediate sets of files
 Data file → values
 Metadata file → timing information
 Consumed in the shuffle phase of the reducer
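A toy sketch of the first modification (illustrative Python, not the Hyracks implementation; class and field names are assumptions): the master keeps a logical, randomized queue of block ids and records when each block reaches the head of the queue, so timing metadata can later be written next to the block's values.

import random, time

class Master:
    """Sketch of the modified master: a logical randomized queue plus per-block timers."""

    def __init__(self, block_ids):
        self.queue = list(block_ids)
        random.shuffle(self.queue)          # random *logical* order; data is not moved
        self.start_time = {}                # block id -> time it reached the head

    def next_block(self):
        """Hand the head-of-queue block to a requesting worker and start its timer.
        (In the real system the timer starts as the block reaches the head, so any
        wait for a worker is captured as well.)"""
        if not self.queue:
            return None
        blk = self.queue.pop(0)
        self.start_time[blk] = time.time()
        return blk

    def record_done(self, blk):
        t_process = time.time() - self.start_time[blk]
        return {"block": blk, "t_process": t_process}   # destined for the metadata file

master = Master(range(1, 8))        # Blk1 .. Blk7
blk = master.next_block()           # a worker asks for work
# ... the worker runs its map task on blk, then:
meta = master.record_done(blk)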

SLIDE 68

Modifications to MapReduce (Hyracks): worked example

Query from the client:
select sum(stock_price) from nasdaq_db group by company;

Input blocks:
Blk1: <MSFT, 2> <AAPL, 4>   Blk2: <ORCL, 3>   Blk3: <AAPL, 4>   Blk4: <MSFT, 2>
Blk5: <ORCL, 3>   Blk6: <MSFT, 2>   Blk7: <AAPL, 4>

Time t = 0   Client submits the query to the Master
Time t = 1   Master builds a logical queue of the blocks (Blk1 ... Blk7) and randomizes it:
             Blk6, Blk5, Blk3, Blk1, Blk4, Blk7, Blk2
Time t = 2   Master forks the workers (Worker 1, Worker 2)
Time t = 3   Workers request blocks
Time t = 4   Master reads the head of the queue and assigns Blk6 to Worker 1
Time t = 5   Worker 1 starts reading Blk6
Time t = 6   Master assigns Blk5 to Worker 2
Time t = 7   Worker 1 runs its map task on Blk6
Time t = 8   Worker 1 emits <MSFT, 2>, which is shuffled to the Reducer together with
             Blk6's timing metadata (tprocess = 4)
Time t = 9   Reducer-MSFT holds <MSFT, 2>; Worker 2 is still processing Blk5

Random time instance at t = 9: do estimation
 Blk6: tprocess = 4, value <MSFT, 2>
 Blk5: tprocess > 3 (still being processed)
 The estimation code combines the values with the timing information and outputs
   the range estimate [5.8, 8]
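The snapshot handed to the estimation code can be pictured as two small record streams, one per intermediate file (a sketch; the field and class names are assumptions — the paper only distinguishes a data file of values and a metadata file of timing information, where a still-running block contributes a lower bound on its processing time):

from dataclasses import dataclass

@dataclass
class DataRecord:            # data file: values emitted by the map task
    block_id: int
    group_key: str           # e.g. "MSFT"
    value: float             # e.g. 2

@dataclass
class MetaRecord:            # metadata file: timing information per block
    block_id: int
    t_process: float         # observed processing time, or a lower bound
    finished: bool           # False -> "has already run at least this long"

snapshot = (
    [DataRecord(6, "MSFT", 2)],
    [MetaRecord(6, t_process=4, finished=True),
     MetaRecord(5, t_process=3, finished=False)],   # Blk5 still running: tprocess > 3
)
# The estimator consumes such a snapshot and returns an interval, e.g. [5.8, 8].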



SLIDE 88

Bayesian Estimator

Why? → To deal with the Inspection Paradox

How?
 Allows for correlation between processing time and values
 And also takes into account the processing time of the block currently being processed

Implementation:
 C++ code using the GNU Scientific Library and Minuit2
 Input: data file and metadata file from the Reducer
 Output: confidence interval → e.g. [995, 1005] with 95% probability

SLIDE 95

Bayesian Estimator (Model)

Parameterized model:
 Timing information: Tprocess, Tscheduling
 Value: X

Underlying distribution
 Classical sampling theory: f(X)
 Our approach: f(X, Tprocess, Tscheduling)
  • Allows correlation between X, Tprocess and Tscheduling
  • f(X | Tprocess > 100000000, Tscheduling = 22) ≠ f(X)

Estimation using Bayesian machinery
 Gibbs sampler
  • Developed probability (or update) equations

Detailed discussion in the paper
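The paper's estimator runs a Gibbs sampler over the joint model f(X, Tprocess, Tscheduling); its actual update equations are in the paper (and the implementation is C++). As a purely illustrative stand-in, here is a minimal conjugate-normal Gibbs sampler over block sums alone — it shows the mechanics (alternately sampling each parameter from its full conditional, then reading a credible interval off the posterior draws) but performs none of the timing correction. All priors and names are assumptions.

import random, statistics

def gibbs_sum_interval(block_sums, n_blocks_total, iters=5000, burn=1000):
    """Toy Gibbs sampler: block sums ~ Normal(mu, sigma^2) with conjugate priors;
    returns an approximate 95% credible interval for the total SUM."""
    n, xbar = len(block_sums), statistics.mean(block_sums)
    mu0, tau0_sq = 0.0, 1e6          # vague prior on the mean
    a0, b0 = 0.01, 0.01              # vague prior on the variance
    mu, sigma_sq = xbar, statistics.pvariance(block_sums) or 1.0
    draws = []
    for it in range(iters):
        # mu | sigma^2, data  (Normal full conditional)
        prec = 1.0 / tau0_sq + n / sigma_sq
        mean = (mu0 / tau0_sq + n * xbar / sigma_sq) / prec
        mu = random.gauss(mean, (1.0 / prec) ** 0.5)
        # sigma^2 | mu, data  (Inverse-Gamma full conditional)
        a_n = a0 + n / 2.0
        b_n = b0 + 0.5 * sum((x - mu) ** 2 for x in block_sums)
        sigma_sq = 1.0 / random.gammavariate(a_n, 1.0 / b_n)
        if it >= burn:
            draws.append(n_blocks_total * mu)        # posterior draw of the SUM
    draws.sort()
    return draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))]

# Three of four blocks seen from the single-machine example:
print(gibbs_sum_interval([13, 11, 14], n_blocks_total=4))   # prints a 95% credible interval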

SLIDE 96

Outline

 Motivation
 Implementation
 Experiments
 Conclusion

SLIDE 97

Experiments

Hypotheses:
 A randomized queue is required
 Allow correlation between processing time and value
 Convergence of the estimates

Experiment 1: (Real dataset)
 select sum(page_count) from wikipedia_log group by language
 6 months of Wikipedia logs (220 GB compressed, 3960 blocks)
 11 node cluster (4 disks, 4 cores, 12 GB RAM)
 Uniform configuration: machines, blocks
 80 mappers and 10 reducers

Experiment 2: (Simulated data set)
 ↑ correlation (non-uniform configuration)

Reading the figures: the X axis is the percentage of data processed; a horizontal line marks the true answer.

Observations from Experiment 1 (uniform configuration, so low correlation):
 At 10% and 20% of the data processed, the non-randomized run gives inaccurate estimates
 Non-randomized → inaccurate estimates
 While a large block is being processed → no correlation detected
 When correlation is detected → the with-correlation estimator is slightly more accurate and unbiased
 As more data is processed, the likelihood takes over → the estimates become similar

SLIDE 112

Outline

 Motivation
 Implementation
 Experiments
 Conclusion

SLIDE 113

Conclusion

OLA over MapReduce
 Statistically robust estimates

A model that accounts for the biases that can arise in a distributed environment

Little modification to the existing MapReduce architecture

SLIDE 114

Thanks for your time and attention. Questions?