SLIDE 1

Karthik Kambatla, Purdue University
Abhinav Pathak, Purdue University
Himabindu Pucha, IBM Research Almaden

SLIDE 2

• Data analytics is important and prevalent

  • MapReduce: a highly scalable solution

• Performing Hadoop-like data analytics in the cloud is particularly synergistic

  • Utility model

    • Request/relinquish resources on demand
    • Billed by machine hours
    • Not limited by the number of machines

Karthik Kambatla - HotCloud, 6/19/2009
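Under the utility model described on this slide, the bill depends on how many machines are held and for how long. The sketch below makes that arithmetic concrete under a hypothetical billing model: a flat per-machine hourly price, with every started hour charged as a full hour. The function name `machine_hours_cost` and its parameters are illustrative, not part of the talk.

```python
import math


def machine_hours_cost(num_machines: int, runtime_hours: float, price_per_hour: float) -> float:
    """Bill for holding `num_machines` machines for `runtime_hours` at a flat hourly price.

    Assumption (hypothetical billing model): every started hour is charged
    as a full machine hour.
    """
    if num_machines < 0 or runtime_hours < 0 or price_per_hour < 0:
        raise ValueError("inputs must be non-negative")
    billed_hours = math.ceil(runtime_hours)  # round partial hours up
    return num_machines * billed_hours * price_per_hour


# 20 machines for 1 hour vs. 10 machines for 2 hours at the same hourly price.
print(machine_hours_cost(20, 1.0, 0.10), machine_hours_cost(10, 2.0, 0.10))
```

Under this model both runs cost the same, yet the first finishes in half the time, which is one reason the number of machines is worth treating as a provisioning decision rather than a fixed input.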

SLIDE 3

• Provisioning

  • Allocate resources
  • Configure for best utilization

• Current tools

  • Hadoop on Demand, Cloudera, etc.
  • Automate deployment, but do not optimize resources!

• Our contribution: optimized provisioning

  • Minimize cost, maximize performance

SLIDE 4

Framework diagram: Hadoop Application, Input Data, RS Maximizer, <Conf, Cluster>, RS Sizer

Config | # Nodes | Cluster | Est. Time
C1     | N1      | Cl x    | T1
C2     | N2      | Cl y    | T2
C3     | N3      | Cl z    | T3
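The table on this slide lists candidate configurations (C1 to C3), each with a node count, a cluster, and an estimated time. One plausible way to turn such a table into a provisioning decision is to pick the cheapest row that still meets a deadline. The slide does not state the selection rule; the `Candidate` record, the per-cluster prices, the deadline rule and the round-up billing below are all assumptions for illustration, and the numbers in the usage lines are made up.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Candidate:
    """One row of the candidate table (field names are hypothetical)."""
    config: str       # configuration label, e.g. "C1"
    nodes: int        # number of nodes, e.g. N1
    cluster: str      # cluster / machine type, e.g. "Cl x"
    est_hours: float  # estimated running time, e.g. T1, in hours


def cost(c: Candidate, price_per_node_hour: Dict[str, float]) -> float:
    # Hypothetical billing model: partial hours are billed as full hours.
    return c.nodes * math.ceil(c.est_hours) * price_per_node_hour[c.cluster]


def pick(candidates: List[Candidate], prices: Dict[str, float],
         deadline_hours: float) -> Optional[Candidate]:
    """Cheapest candidate that meets the deadline; ties go to the faster one."""
    feasible = [c for c in candidates if c.est_hours <= deadline_hours]
    if not feasible:
        return None
    return min(feasible, key=lambda c: (cost(c, prices), c.est_hours))


# Made-up rows and prices, only to exercise the selection rule.
table = [
    Candidate("C1", 4, "small", 3.5),
    Candidate("C2", 8, "small", 1.8),
    Candidate("C3", 2, "large", 2.9),
]
prices = {"small": 0.10, "large": 0.40}
print(pick(table, prices, deadline_hours=3.0))
```

Ranking by cost first and time second is one choice among several; a variant could instead minimize `est_hours` subject to a budget.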

SLIDE 5

• Number of reduces doesn't affect performance
• Optimal: 8 maps
• Significant performance difference (2, 2)

SLIDE 6

• Too low doesn't work!
• Too high doesn't work either!

SLIDE 7

• Best performance at (8, 8)
• Number of reduces also affects performance
• So does the number of maps
• The same configuration would not work across applications
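Slides 5 to 7 show that the numbers of maps and reduces can change running time significantly, that settings both too low and too high hurt, and that the best setting is application-specific. The most direct way to find it for a single application is an exhaustive sweep. The sketch assumes a caller-supplied `run_job(maps, reduces)` that runs the job and returns its running time; that function and the toy cost surface are hypothetical stand-ins, not measured data. If the pairs denote per-node task slots (an assumption; the slides don't say), Hadoop of this era sets them through `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

Config = Tuple[int, int]  # (maps, reduces); the exact meaning is an assumption


def sweep(run_job: Callable[[int, int], float],
          map_values: Iterable[int],
          reduce_values: Iterable[int]) -> Tuple[Config, Dict[Config, float]]:
    """Time the job once per (maps, reduces) pair; return the fastest pair and all timings."""
    timings: Dict[Config, float] = {}
    for maps, reduces in product(map_values, reduce_values):
        timings[(maps, reduces)] = run_job(maps, reduces)
    best = min(timings, key=timings.get)
    return best, timings


def toy_run_job(maps: int, reduces: int) -> float:
    # Made-up U-shaped surface standing in for a real, timed Hadoop run.
    return 100.0 + (maps - 4) ** 2 + 0.5 * (reduces - 2) ** 2


best, timings = sweep(toy_run_job, [1, 2, 4, 8, 16], [1, 2, 4, 8])
print(best)
```

The sweep costs one full job run per grid point, and because the best setting differs across applications it would have to be repeated for each one, which is what makes reusing results across similar applications (slide 9) attractive.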

SLIDE 8

SLIDE 9

• Matrix addition, multifile-wordcount

  • Signature similar to wordcount
  • Optimal configuration is the same
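This slide reports that matrix addition and multifile-wordcount have signatures similar to wordcount and share its optimal configuration. One way such reuse could work is to store each profiled application's signature alongside its optimal configuration, then reuse the configuration of the most similar stored application. The slides don't say how signatures are represented or compared, so the vector form (e.g. normalized CPU, disk and network usage), the cosine similarity, the 0.95 threshold, and the names `SignatureDB` and `lookup` are all assumptions; the vectors and configurations in the usage lines are made up.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Config = Tuple[int, int]  # (maps, reduces); hypothetical representation


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SignatureDB:
    """Maps application names to (signature vector, optimal configuration)."""

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold
        self.entries: Dict[str, Tuple[List[float], Config]] = {}

    def add(self, app: str, signature: Sequence[float], best: Config) -> None:
        self.entries[app] = (list(signature), best)

    def lookup(self, signature: Sequence[float]) -> Optional[Tuple[str, Config]]:
        """Return (closest app, its optimal config) if similarity clears the threshold."""
        best_app, best_sim = None, -1.0
        for app, (sig, _) in self.entries.items():
            sim = cosine(signature, sig)
            if sim > best_sim:
                best_app, best_sim = app, sim
        if best_app is None or best_sim < self.threshold:
            return None  # no close match: fall back to profiling this application
        return best_app, self.entries[best_app][1]


db = SignatureDB()
db.add("wordcount", [0.9, 0.2, 0.1], (6, 3))
db.add("sort", [0.2, 0.9, 0.8], (3, 6))
print(db.lookup([0.85, 0.25, 0.1]))
```

Returning `None` below the threshold keeps an unfamiliar application from inheriting a configuration tuned for something unrelated.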

SLIDE 10

• Add a feedback phase

  • Check if predicted values are optimal
  • Else, predict a new optimal configuration

• RS Sizer
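The feedback phase can be pictured as a local search around the predicted configuration: time the prediction, time a few nearby settings, and move if one of them is faster. The sketch uses plain hill climbing over (maps, reduces) with a double-or-halve neighborhood. The search strategy, the neighborhood, `refine`, `run_job` and the toy cost function are assumptions for illustration, not the method described in the talk.

```python
from typing import Callable, Dict, List, Tuple

Config = Tuple[int, int]  # (maps, reduces); hypothetical representation


def neighbors(cfg: Config) -> List[Config]:
    """Hypothetical neighborhood: double or halve one parameter, keeping both >= 1."""
    maps, reduces = cfg
    cands = [(maps * 2, reduces), (maps // 2, reduces),
             (maps, reduces * 2), (maps, reduces // 2)]
    return [c for c in cands if c[0] >= 1 and c[1] >= 1]


def refine(predicted: Config, run_job: Callable[[int, int], float],
           max_rounds: int = 10) -> Tuple[Config, Dict[Config, float]]:
    """Check the prediction against its neighbors; move while a neighbor is faster."""
    timings: Dict[Config, float] = {}

    def measure(cfg: Config) -> float:
        if cfg not in timings:  # never re-run a configuration already timed
            timings[cfg] = run_job(*cfg)
        return timings[cfg]

    current = predicted
    for _ in range(max_rounds):
        candidate = min(neighbors(current), key=measure, default=None)
        if candidate is None or measure(candidate) >= measure(current):
            return current, timings  # current configuration is locally optimal
        current = candidate  # feedback: adopt the better configuration
    return current, timings


def toy_run_job(maps: int, reduces: int) -> float:
    # Made-up cost surface standing in for a real, timed Hadoop run.
    return 100.0 + abs(maps - 4) + abs(reduces - 2)


print(refine((16, 8), toy_run_job)[0])
```

Caching timings in `measure` matters here because each evaluation stands for a full job run; `max_rounds` bounds how many rounds of feedback are spent before settling.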

SLIDE 11


Questions?