Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications, by Ahmad and Cheung

SLIDE 1

Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications

Presented by: Ishank Jain Department of Computer Science

03/19/2019

By Ahmad and Cheung

SLIDE 2

CONTENT

  • Background
  • Research Question
  • Method
  • Results
  • Conclusion
  • Questions

Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications PAGE 2

SLIDE 3

BACKGROUND

  • Implementations of MapReduce
  • Source-to-Source Compilers
  • Synthesizing Efficient Implementations
  • Query Optimizers and IRs

SLIDE 4

BACKGROUND: Implementations of MapReduce

SLIDE 5

BACKGROUND: Source-to-Source Compilers

SLIDE 6

BACKGROUND: Synthesizing Efficient Implementations

SLIDE 7

BACKGROUND: Query Optimizers and IRs

SLIDE 8

MOTIVATION

SLIDE 9

CASPER

  • Casper is a compiler that can automatically retarget sequential Java programs to Big Data processing frameworks such as Spark, Hadoop, or Flink.

Image credit: https://casper.uwplse.org

SLIDE 10

CASPER

SLIDE 11

MapReduce OPERATORS

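  • Map operator: converts a value of type τ into a multiset of key-value pairs of types κ and ν.
  • Reduce operator: combines two values of type ν into one; applied repeatedly, it produces a final value for each key.
  • Shuffle: groups the emitted key-value pairs by key, so that each reduce step only combines values that share a key.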
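A minimal, single-machine sketch of how the three operators fit together follows. It assumes plain Python functions for the map operator (`fm`) and reduce operator (`fr`); the helper names `map_stage`, `shuffle`, and `reduce_stage` are illustrative, not Casper's API, and a real framework would distribute each stage across machines.

```python
from collections import defaultdict
from functools import reduce
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

T = TypeVar("T")  # input element type (τ)
K = TypeVar("K")  # key type (κ)
V = TypeVar("V")  # value type (ν)


def map_stage(data: Iterable[T], fm: Callable[[T], List[Tuple[K, V]]]) -> List[Tuple[K, V]]:
    """Apply the map operator to every element and collect all emitted pairs."""
    emitted: List[Tuple[K, V]] = []
    for x in data:
        emitted.extend(fm(x))
    return emitted


def shuffle(pairs: Iterable[Tuple[K, V]]) -> Dict[K, List[V]]:
    """Group the emitted values by key."""
    groups: Dict[K, List[V]] = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return dict(groups)


def reduce_stage(groups: Dict[K, List[V]], fr: Callable[[V, V], V]) -> Dict[K, V]:
    """Fold each key's values pairwise with the reduce operator."""
    return {k: reduce(fr, vs) for k, vs in groups.items()}


# Example: word count.
lines = ["a b a", "b c"]
counts = reduce_stage(
    shuffle(map_stage(lines, lambda s: [(w, 1) for w in s.split()])),
    lambda a, b: a + b,
)
print(counts)  # {'a': 2, 'b': 2, 'c': 1}
```

Because a distributed framework may combine a key's values in any order, `fr` should in general be associative and commutative, as `+` is here.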

SLIDE 12

PROGRAM SUMMARY

  • The program summary is a high-level intermediate representation (IR) that describes how the output of the code fragment (i.e., m) can be computed from the input data (i.e., mat) using a series of map and reduce stages.
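The code fragment that defines m and mat is not reproduced in these notes. Assuming, hypothetically, that it computes the mean of each row of an integer matrix mat into an array m, the sketch below contrasts the sequential loop with an equivalent summary: a map stage that emits (row index, value) pairs, a reduce stage that sums per key, and a final per-key division. Both function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rowwise_mean_sequential(mat: List[List[int]]) -> List[float]:
    """Hypothetical sequential fragment: m[i] is the mean of row i (rows assumed non-empty)."""
    m = [0.0] * len(mat)
    for i, row in enumerate(mat):
        total = 0
        for v in row:
            total += v
        m[i] = total / len(row)
    return m


def rowwise_mean_summary(mat: List[List[int]]) -> List[float]:
    """The same output expressed as map, shuffle/reduce, and a final per-key map."""
    # Map: every cell emits (row index, cell value).
    pairs: List[Tuple[int, int]] = [(i, v) for i, row in enumerate(mat) for v in row]
    # Shuffle + reduce with +: one running sum per row index.
    sums: Dict[int, int] = defaultdict(int)
    for i, v in pairs:
        sums[i] += v
    # Final map: divide each row's sum by the row length.
    return [sums[i] / len(mat[i]) for i in range(len(mat))]


mat = [[1, 2, 3], [4, 6, 8]]
m = rowwise_mean_summary(mat)
print(m)  # [2.0, 6.0]
```

The point of the summary is that every stage is an operation a MapReduce framework can run in parallel, while the result still matches the sequential loop.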

SLIDE 13

SYSTEM ARCHITECTURE

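  • Program analyzer: produces a search space description and verification conditions.
  • Summary generator: searches the space for summaries that satisfy the verification conditions.
  • Code generator: translates a verified summary into code for the target framework.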

SLIDE 14

PROGRAM SUMMARIES

  • The program summary is a high-level IR designed to:
  • express summaries that are translatable into the target API;
  • let the synthesizer efficiently search for summaries that are equivalent to the input program.
  • To keep the search tractable, the IR supports only a limited number of operations.
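To make the idea concrete, here is a minimal sketch of a tiny summary IR with only two operations, a local interpreter for checking a summary, and a translator to a target API. The classes `MapStage` and `ReduceStage` and the functions `interpret` and `to_spark` are hypothetical, not Casper's actual IR; `to_spark` emits the text of a PySpark RDD call chain (`flatMap`, `reduceByKey`) purely for illustration, whereas Casper itself works on Java programs.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple, Union


@dataclass
class MapStage:
    src: str                                    # source text, used by the code generator
    fn: Callable[[Any], List[Tuple[Any, Any]]]  # same function, used by the interpreter


@dataclass
class ReduceStage:
    src: str
    fn: Callable[[Any, Any], Any]  # combines two values that share a key


Stage = Union[MapStage, ReduceStage]


def interpret(summary: List[Stage], data: List[Any]) -> List[Tuple[Any, Any]]:
    """Evaluate a summary locally, e.g. to compare it against the original program."""
    current: List[Any] = list(data)
    for stage in summary:
        if isinstance(stage, MapStage):
            current = [pair for x in current for pair in stage.fn(x)]
        else:
            acc: dict = {}
            for k, v in current:
                acc[k] = stage.fn(acc[k], v) if k in acc else v
            current = list(acc.items())
    return current


def to_spark(summary: List[Stage], rdd: str = "rdd") -> str:
    """Emit a PySpark-style call chain as text (illustrative translation only)."""
    calls = [rdd]
    for stage in summary:
        if isinstance(stage, MapStage):
            calls.append(f".flatMap({stage.src})")
        else:
            calls.append(f".reduceByKey({stage.src})")
    return "".join(calls)


summary = [
    MapStage("lambda w: [(w, 1)]", lambda w: [(w, 1)]),
    ReduceStage("lambda a, b: a + b", lambda a, b: a + b),
]
print(interpret(summary, ["x", "y", "x"]))  # [('x', 2), ('y', 1)]
print(to_spark(summary))  # rdd.flatMap(lambda w: [(w, 1)]).reduceByKey(lambda a, b: a + b)
```

Keeping the IR this small is what makes both directions easy: each node maps to exactly one target-API call, and the synthesizer has few node types to choose from.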

SLIDE 15

SEARCH SPACE

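  • To generate the search space grammar, Casper analyzes the input code fragment.
  • Code analyzer:
  • Dataflow analysis: identifies the fragment's input and output variables.
  • Scanning: collects the operators and functions the fragment uses, so they can be added to the grammar.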
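As a rough illustration of the two analyses, the sketch below scans a small Python stand-in for a Java fragment (a hypothetical row-wise mean loop) using Python's standard `ast` module. `FragmentScanner` is a hypothetical helper, not Casper's analyzer: it records names read before being assigned (candidate inputs), names the fragment writes, and the operators and functions it uses (material for the grammar). It is a simple order-based approximation that handles only assignments, augmented assignments, and `for` loops.

```python
import ast
from typing import Set


class FragmentScanner(ast.NodeVisitor):
    """Hypothetical scanner: inputs, written names, and operators of a code fragment."""

    def __init__(self) -> None:
        self.defined: Set[str] = set()
        self.inputs: Set[str] = set()     # names read before the fragment assigns them
        self.written: Set[str] = set()    # names or arrays the fragment assigns
        self.operators: Set[str] = set()  # binary operators and called functions

    def _read(self, name: str) -> None:
        if name not in self.defined:
            self.inputs.add(name)

    def _write(self, target: ast.expr) -> None:
        if isinstance(target, ast.Name):
            self.defined.add(target.id)
            self.written.add(target.id)
        elif isinstance(target, ast.Subscript):
            # Writing m[i] reads the array m and the index, then mutates m.
            self.visit(target.value)
            self.visit(target.slice)
            if isinstance(target.value, ast.Name):
                self.written.add(target.value.id)

    def visit_Name(self, node: ast.Name) -> None:
        if isinstance(node.ctx, ast.Load):
            self._read(node.id)

    def visit_Assign(self, node: ast.Assign) -> None:
        self.visit(node.value)  # the right-hand side is evaluated first
        for target in node.targets:
            self._write(target)

    def visit_AugAssign(self, node: ast.AugAssign) -> None:
        self.operators.add(type(node.op).__name__)
        self.visit(node.value)
        if isinstance(node.target, ast.Name):
            self._read(node.target.id)  # `x += e` reads x before writing it
        self._write(node.target)

    def visit_For(self, node: ast.For) -> None:
        self.visit(node.iter)
        self._write(node.target)
        for stmt in node.body:
            self.visit(stmt)

    def visit_BinOp(self, node: ast.BinOp) -> None:
        self.operators.add(type(node.op).__name__)
        self.generic_visit(node)

    def visit_Call(self, node: ast.Call) -> None:
        if isinstance(node.func, ast.Name):
            self.operators.add(node.func.id)  # a called function, not a data input
        else:
            self.visit(node.func)
        for arg in node.args:
            self.visit(arg)


fragment = """
m = [0] * rows
for i in range(rows):
    total = 0
    for j in range(cols):
        total += mat[i][j]
    m[i] = total // cols
"""
scanner = FragmentScanner()
scanner.visit(ast.parse(fragment))
print(sorted(scanner.inputs))     # ['cols', 'mat', 'rows']
print(sorted(scanner.written))    # ['i', 'j', 'm', 'total']
print(sorted(scanner.operators))  # ['Add', 'FloorDiv', 'Mult', 'range']
```

A real liveness analysis would narrow the outputs to `m`, since `i`, `j`, and `total` are dead once the fragment ends.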

SLIDE 16

SEARCH SPACE

SLIDE 17

VERIFYING SUMMARIES

  • Verification conditions:
  • Hoare logic
  • Predicate logic
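As an illustration of what such verification conditions look like, consider a hypothetical loop `s = 0; i = 0; while (i < n) { s = s + a[i]; i = i + 1; }` and a candidate summary stating that s is obtained by mapping each element to itself and reducing with +. Writing Inv for a candidate loop invariant and Post for the summary as a postcondition, the standard Hoare-logic conditions are the following predicate-logic formulas (taking the reduction of an empty sequence to be 0, the identity of +):

```latex
\begin{aligned}
\mathit{Inv}(i, s) &\equiv 0 \le i \le n \;\wedge\; s = \mathit{reduce}_{+}\big(\mathit{map}_{\lambda x.\,x}(a[0..i))\big) \\
\mathit{Post}(s) &\equiv s = \mathit{reduce}_{+}\big(\mathit{map}_{\lambda x.\,x}(a[0..n))\big) \\[4pt]
\text{(initiation)}\quad & n \ge 0 \wedge i = 0 \wedge s = 0 \;\Rightarrow\; \mathit{Inv}(i, s) \\
\text{(preservation)}\quad & \mathit{Inv}(i, s) \wedge i < n \;\Rightarrow\; \mathit{Inv}(i + 1,\, s + a[i]) \\
\text{(termination)}\quad & \mathit{Inv}(i, s) \wedge \neg(i < n) \;\Rightarrow\; \mathit{Post}(s)
\end{aligned}
```

If all three implications are valid, the summary computes the same s as the loop for every input.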

SLIDE 18

SEARCH STRATEGY

  • Input:
  • a set of candidate summaries and invariants, encoded as a grammar;
  • the correctness specification for the summary, in the form of verification conditions.
  • Search algorithm: counterexample-guided inductive synthesis (CEGIS).
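A toy CEGIS loop, under strong simplifying assumptions: candidates are (map function, reduce operator) pairs from a hypothetical six-element grammar, the specification is a hypothetical sum-of-squares loop, and both the synthesizer and the verifier are brute-force enumerations over small bounded inputs rather than the solver-based tools a real system would use. The synthesizer proposes a candidate consistent with every counterexample seen so far; the verifier either accepts it or returns a new counterexample.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Candidate = Tuple[str, str]  # (map function name, reduce operator name)


def program(xs: Sequence[int]) -> int:
    """Hypothetical sequential fragment used as the specification: sum of squares."""
    s = 0
    for x in xs:
        s += x * x
    return s


MAPS: Dict[str, Callable[[int], int]] = {
    "x": lambda x: x,
    "x*x": lambda x: x * x,
    "1": lambda x: 1,
}
# Each reduce operator is paired with its identity element.
REDUCES: Dict[str, Tuple[Callable[[int, int], int], int]] = {
    "+": (lambda a, b: a + b, 0),
    "*": (lambda a, b: a * b, 1),
}


def run(c: Candidate, xs: Sequence[int]) -> int:
    """Evaluate the candidate summary reduce(map(xs)) sequentially."""
    fm = MAPS[c[0]]
    fr, acc = REDUCES[c[1]]
    for x in xs:
        acc = fr(acc, fm(x))
    return acc


def synthesize(examples: List[Sequence[int]]) -> Optional[Candidate]:
    """Return the first candidate that agrees with the program on every example."""
    for c in product(MAPS, REDUCES):
        if all(run(c, xs) == program(xs) for xs in examples):
            return c
    return None


def verify(c: Candidate, domain: List[Sequence[int]]) -> Optional[List[int]]:
    """Bounded verifier: return a counterexample input, or None if the domain has none."""
    for xs in domain:
        if run(c, xs) != program(xs):
            return list(xs)
    return None


def cegis(domain: List[Sequence[int]]) -> Tuple[Optional[Candidate], List[Sequence[int]]]:
    examples: List[Sequence[int]] = []
    while True:
        c = synthesize(examples)
        if c is None:
            return None, examples  # search space exhausted
        cex = verify(c, domain)
        if cex is None:
            return c, examples     # verified on the whole bounded domain
        examples.append(cex)       # refine the synthesizer with the new counterexample


# Bounded domain: every list of length 0..3 over the values -2..2.
domain = [xs for n in range(4) for xs in product(range(-2, 3), repeat=n)]
c, ex = cegis(domain)
print(c, ex)  # ('x*x', '+') [[-2]]
```

The loop terminates because each counterexample rules out the candidate that produced it and the grammar is finite; a single counterexample is enough here.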

SLIDE 19

IMPROVEMENTS

  • Verifier failures:
  • Casper must prevent summaries that failed the theorem prover from being regenerated by the synthesizer.
  • Incremental grammar generation:
  • Searching grammars in order of increasing syntactic expressiveness helps Casper find summaries more quickly, while still allowing more expressive summaries when simpler ones fail.
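A toy sketch of both ideas, under stated assumptions: the grammar hierarchy `LEVELS`, the specification, and the two checks are hypothetical. A cheap bounded check (standing in for the synthesizer's bounded verification) only tries non-negative inputs, so the wrong candidate `x` slips through; a larger exhaustive check stands in for the theorem prover, and any candidate it refutes is added to a blocklist so it is not proposed again at later, larger levels.

```python
from itertools import product
from typing import Callable, List, Sequence, Set, Tuple

MapFn = Callable[[int], int]


def program(xs: Sequence[int]) -> int:
    """Hypothetical specification: sum of absolute values."""
    return sum(abs(x) for x in xs)


# Hypothetical grammar hierarchy: each level allows more map functions (reduce is fixed to +).
LEVELS: List[List[Tuple[str, MapFn]]] = [
    [("x", lambda x: x)],
    [("x", lambda x: x), ("abs(x)", abs), ("x*x", lambda x: x * x)],
]


def agrees(fm: MapFn, inputs: List[Sequence[int]]) -> bool:
    """True if reduce(+, map(fm)) matches the program on every input."""
    return all(sum(fm(x) for x in xs) == program(xs) for xs in inputs)


def search(levels, bounded, full):
    """Search levels in order; block candidates that pass the bounded check but fail the full one."""
    blocked: Set[str] = set()
    for level, grammar in enumerate(levels):
        found: List[str] = []
        for name, fm in grammar:
            if name in blocked or not agrees(fm, bounded):
                continue
            if agrees(fm, full):
                found.append(name)
            else:
                blocked.add(name)  # never propose this candidate again
        if found:
            return level, found, blocked
    return None, [], blocked


# Stand-ins: a small bounded check (non-negative inputs only) and a larger "full" check.
bounded = [xs for n in range(3) for xs in product(range(0, 3), repeat=n)]
full = [xs for n in range(4) for xs in product(range(-3, 4), repeat=n)]
print(search(LEVELS, bounded, full))  # (1, ['abs(x)'], {'x'})
```

Level 0 finds nothing but refutes `x`; level 1 skips it without re-checking and returns `abs(x)`.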

SLIDE 20

IMPROVEMENTS

  • Search algorithm for summaries:
  • Each synthesized summary (correct or not) is eliminated from the search space, forcing the synthesizer to generate a new summary each time.
  • When the grammar is exhausted, Casper returns the set of correct summaries Δ if it is non-empty.
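A minimal sketch of this loop for a single grammar, assuming a hypothetical stand-in synthesizer that simply proposes any candidate not yet eliminated, and a correctness check over a few test inputs in place of real verification. The grammar entries and helper names are illustrative.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set

Summary = Callable[[Sequence[int]], int]


def synthesize(grammar: Dict[str, Summary], eliminated: Set[str]) -> Optional[str]:
    """Stand-in synthesizer: propose any candidate not yet eliminated."""
    for name in grammar:
        if name not in eliminated:
            return name
    return None


def find_all_summaries(grammar: Dict[str, Summary],
                       is_correct: Callable[[Summary], bool]) -> List[str]:
    eliminated: Set[str] = set()
    delta: List[str] = []  # the set of correct summaries found so far
    while True:
        cand = synthesize(grammar, eliminated)
        if cand is None:      # grammar exhausted
            return delta      # an empty result means a larger grammar is needed
        eliminated.add(cand)  # eliminate it whether or not it is correct
        if is_correct(grammar[cand]):
            delta.append(cand)


# Hypothetical example: candidate summaries for summing a list.
grammar: Dict[str, Summary] = {
    "reduce(+, map(x))": lambda xs: sum(xs),
    "reduce(+, map(2*x)) // 2": lambda xs: sum(2 * x for x in xs) // 2,
    "reduce(max, map(x))": lambda xs: max(xs, default=0),
}
tests = [[], [1], [3, -1, 2]]
delta = find_all_summaries(grammar, lambda f: all(f(xs) == sum(xs) for xs in tests))
print(delta)  # ['reduce(+, map(x))', 'reduce(+, map(2*x)) // 2']
```

Eliminating correct summaries as well is what makes the synthesizer enumerate alternatives, so that several correct summaries can be compared afterwards.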

SLIDE 21

COST MODEL

  • Dynamic cost estimation: counts the number of unique data values that are emitted as keys.
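The slide does not say exactly how the count feeds into the final choice. Assuming, for illustration only, that candidates emitting fewer distinct keys on a sample input are preferred, a minimal sketch looks like this; `unique_emitted_keys`, `rank_candidates`, and both candidate map functions are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

MapFn = Callable[[int], List[Tuple[Hashable, int]]]


def unique_emitted_keys(fm: MapFn, sample: Iterable[int]) -> int:
    """Run the map stage on a sample input and count the distinct keys it emits."""
    keys = set()
    for x in sample:
        for k, _ in fm(x):
            keys.add(k)
    return len(keys)


def rank_candidates(cands: Dict[str, MapFn], sample: List[int]) -> List[str]:
    """Order candidates by emitted-key count, smallest first (assumed policy)."""
    return sorted(cands, key=lambda name: unique_emitted_keys(cands[name], sample))


# Two hypothetical map stages that, followed by summing, both compute a list's total.
candidates: Dict[str, MapFn] = {
    "key_per_value": lambda x: [(x, x)],
    "single_key": lambda x: [("sum", x)],
}
sample = [1, 2, 2, 3, 3, 3]
print({n: unique_emitted_keys(f, sample) for n, f in candidates.items()})
# {'key_per_value': 3, 'single_key': 1}
print(rank_candidates(candidates, sample))  # ['single_key', 'key_per_value']
```

Measuring on a sample is what makes the estimate dynamic: the number of distinct keys depends on the data, not only on the summary's syntax.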

SLIDE 22

IMPORTANT POINTS AND LIMITATION

  • The IR does not currently model the full range of operators across different MapReduce implementations.

  • Biasing the search towards smaller grammars likely produces program summaries that run more efficiently, although this is not sufficient to guarantee the optimality of the generated summaries. It is a tradeoff between the efficiency of the solution and the time spent generating the grammar.

  • Casper currently supports code fragments made of basic Java statements, conditionals, functions, user-defined types, and loops.

  • Recursive methods and methods with side-effects are not currently supported.

SLIDE 23

EVALUATION

SLIDE 24

EVALUATION

SLIDE 25

EVALUATION

SLIDE 26

EVALUATION

SLIDE 27

EVALUATION

SLIDE 28

QUESTIONS

  • Casper covers a limited set of operations and does not perform well on ML-related and scientific image datasets. Does this make it usable only by beginner programmers?

  • “Summaries are restricted to only those expressible using the IR, which lacks many features (e.g., pointers) that a general purpose language would have”. Does this restrict the scope for finding better target code?

  • Certain methods, such as recursive methods, are not supported (reason: they do not gain any speedup). Does the paper leave out issues that are an essential part of general-purpose coding?

  • NOTE: The paper aims to spare users from having to learn multiple DSLs.

SLIDE 29

REFERENCE

Maaz Bin Safeer Ahmad, Alvin Cheung. Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications. Proc. ACM SIGMOD International Conference on Management of Data, pages 1205-1220, 2018.
