Encapsulation of Parallelism in the Volcano Query Processing System - - PowerPoint PPT Presentation

SLIDE 1

Encapsulation of Parallelism in the Volcano Query Processing System

Huawei Wang

SLIDE 2

Overview

  • Architecture
  • Bracket Model
  • Operator Model
  • Pros & Cons
  • Comparison

SLIDE 3

Typical Query Engine Architecture

SLIDE 4

Similar Systems

  • System R
  • Starburst

SLIDE 5

Bracket Model

Problems:

  • Extensibility
  • Large overhead
SLIDE 6

Operator Model

Single process:

  • Each operator is implemented as an iterator (the same model can be applied to big data systems)
  • Streams serve as the abstraction for passing input between operators

Multiple processes:

  • Introduces the exchange operator
  • Vertical parallelism
  • Horizontal parallelism
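The single-process iterator model above can be sketched as follows. This is a minimal illustration, not Volcano's own code (Volcano is written in C): every operator exposes open/next/close, a parent pulls rows from its child by calling next(), and None marks the end of the stream. The class names Scan and Filter and the helper run are hypothetical.

```python
from typing import Callable, List, Optional


class Scan:
    """Leaf iterator over an in-memory list (stands in for a file scan)."""

    def __init__(self, rows: List[dict]):
        self.rows = rows
        self.pos = 0

    def open(self) -> None:
        self.pos = 0

    def next(self) -> Optional[dict]:
        if self.pos >= len(self.rows):
            return None  # end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self) -> None:
        pass


class Filter:
    """Selection: pulls rows from its child and returns only the matching ones."""

    def __init__(self, child, predicate: Callable[[dict], bool]):
        self.child = child
        self.predicate = predicate

    def open(self) -> None:
        self.child.open()

    def next(self) -> Optional[dict]:
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None

    def close(self) -> None:
        self.child.close()


def run(plan) -> List[dict]:
    """Drive a plan by calling open/next/close on its root operator."""
    plan.open()
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    plan.close()
    return out
```

Because each operator only knows its child's open/next/close, operators compose freely and rows flow through the plan one at a time without materializing intermediate results.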
SLIDE 7

Operator Model

SLIDE 8

Vertical Parallelism

  • Fast inter-process communication
  • Shared memory
  • Semaphores
  • open_exchange, next_exchange, close_exchange
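A rough sketch of vertical parallelism with an exchange operator, under loud assumptions: a Python thread stands in for the separate producer process, and a bounded queue.Queue stands in for Volcano's shared memory and semaphores. The consumer still calls open/next/close, so the operators above and below the exchange need no changes. Exchange and ListScan are hypothetical names; open, next and close play the roles of open_exchange, next_exchange and close_exchange.

```python
import queue
import threading
from typing import List, Optional

_END = object()  # sentinel the producer enqueues after its last row


class ListScan:
    """Minimal open/next/close iterator over an in-memory list."""

    def __init__(self, rows: List[int]):
        self.rows = rows
        self.pos = 0

    def open(self) -> None:
        self.pos = 0

    def next(self) -> Optional[int]:
        if self.pos >= len(self.rows):
            return None
        self.pos += 1
        return self.rows[self.pos - 1]

    def close(self) -> None:
        pass


class Exchange:
    """Runs the child subtree in a producer thread; the consumer pulls rows via next()."""

    def __init__(self, child, capacity: int = 4):
        self.child = child
        self.buffer: queue.Queue = queue.Queue(maxsize=capacity)  # stands in for shared memory
        self.producer: Optional[threading.Thread] = None
        self.done = False

    def _produce(self) -> None:
        self.child.open()
        while (row := self.child.next()) is not None:
            self.buffer.put(row)  # blocks while the buffer is full (flow control)
        self.child.close()
        self.buffer.put(_END)

    def open(self) -> None:  # role of open_exchange
        self.done = False
        self.producer = threading.Thread(target=self._produce, daemon=True)
        self.producer.start()

    def next(self) -> Optional[int]:  # role of next_exchange
        if self.done:
            return None
        item = self.buffer.get()  # blocks until the producer supplies a row
        if item is _END:
            self.done = True
            return None
        return item

    def close(self) -> None:  # role of close_exchange
        # Drain unread rows so a producer blocked on put() can finish, then join it.
        while self.producer is not None and self.producer.is_alive():
            try:
                self.buffer.get(timeout=0.01)
            except queue.Empty:
                pass
        if self.producer is not None:
            self.producer.join()
            self.producer = None
```

The bounded buffer is the design point worth noting: a fast producer cannot run arbitrarily far ahead of a slow consumer, which is the job the semaphores do in the slide's description.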

SLIDE 9

Horizontal Parallelism

  • Bushy parallelism
  • Intra-operator parallelism

[Figure: a join over the outputs of sort A and sort B, with producer and consumer processes marked]
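Bushy parallelism, as in the figure, can be sketched like this: sort A and sort B are independent subtrees, so they can run at the same time before a merge join consumes both sorted streams. This is only an illustration; a ThreadPoolExecutor stands in for separate producer processes (in CPython threads do not give real CPU parallelism for pure-Python work), and the names sort_input, merge_join and bushy_join are hypothetical. Rows are assumed to be (key, payload) tuples.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Row = Tuple[int, str]


def sort_input(rows: List[Row]) -> List[Row]:
    """Producer work: sort one join input on its key."""
    return sorted(rows, key=lambda r: r[0])


def merge_join(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    """Consumer work: equi-join two inputs already sorted on the key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, then pair it with each matching left row.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            while i < len(left) and left[i][0] == lk:
                for r in right[j:j_end]:
                    out.append((lk, left[i][1], r[1]))
                i += 1
            j = j_end
    return out


def bushy_join(a: List[Row], b: List[Row]) -> List[Tuple[int, str, str]]:
    # The two sort subtrees do not depend on each other, so both are submitted at once.
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_a = pool.submit(sort_input, a)
        fut_b = pool.submit(sort_input, b)
        sorted_a, sorted_b = fut_a.result(), fut_b.result()
    return merge_join(sorted_a, sorted_b)
```

For example, bushy_join([(2, "a2"), (1, "a1")], [(1, "b1"), (2, "b2")]) pairs the rows with equal keys.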

SLIDE 10

Horizontal Parallelism

How to partition data?

[Figure: one producer writing into a port containing Queue1, Queue2 and Queue3, read by Consumer1, Consumer2 and Consumer3 respectively]
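One way to answer "how to partition data?" is sketched below, mirroring the figure's producer, port and per-consumer queues. Hash partitioning and round-robin are shown as two common choices; the slide does not say which ones Volcano supports, and route_to_port, hash_partition and round_robin are hypothetical names. Rows are reduced to integer keys for brevity.

```python
from queue import Queue
from typing import Callable, List

# (position in the producer's output, key, number of queues) -> target queue index
PartitionFn = Callable[[int, int, int], int]


def hash_partition(pos: int, key: int, n: int) -> int:
    # Equal keys always land in the same queue, as parallel joins and aggregations require.
    return hash(key) % n


def round_robin(pos: int, key: int, n: int) -> int:
    # Spreads rows evenly across consumers regardless of their values.
    return pos % n


def route_to_port(keys: List[int], n_queues: int, part: PartitionFn) -> List[List[int]]:
    """Producer side: place each row in the queue chosen by the partitioning function."""
    queues: List[Queue] = [Queue() for _ in range(n_queues)]
    for pos, key in enumerate(keys):
        queues[part(pos, key, n_queues)].put(key)
    # Drain each queue so the result is easy to inspect; real consumers read concurrently.
    result = []
    for q in queues:
        items = []
        while not q.empty():
            items.append(q.get())
        result.append(items)
    return result
```

For example, route_to_port([10, 11, 12, 10, 13, 10], 3, round_robin) places positions 0 and 3 in the first queue, 1 and 4 in the second, and 2 and 5 in the third.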

SLIDE 11

Horizontal Parallelism

  • Centralized scheme
  • Propagation tree scheme
  • Primed processes
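The three schemes differ mainly in how much process-creation work lies on the critical path. The sketch below is a deliberately simplified cost model, not Volcano's actual logic. It assumes: in the centralized scheme one process forks every other member of the group, one after another; in the propagation tree scheme every existing process forks exactly one new process per round, so the group doubles each round; primed processes are created ahead of time and only need to be handed work. All function names are hypothetical.

```python
def centralized_rounds(n: int) -> int:
    """One coordinating process forks the other n - 1 group members sequentially."""
    return max(n - 1, 0)


def propagation_tree_rounds(n: int) -> int:
    """Assumed doubling model: after r rounds 2**r processes exist, so
    ceil(log2(n)) rounds are needed; (n - 1).bit_length() computes that exactly."""
    return (n - 1).bit_length() if n > 1 else 0


def processes_to_create(n: int, primed: int) -> int:
    """Primed processes already exist, so only the shortfall is created at query time."""
    return max(n - primed, 0)
```

Under these assumptions a group of 8 processes needs 7 sequential forks when centralized but only 3 rounds with a propagation tree, and with 5 primed processes only 3 new ones must be created when the query starts.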
SLIDE 12

Pros & Cons

  • More general
  • Algorithm level
  • System level
  • Easy implementation
  • Heavyweight process creation

SLIDE 13

Comparison

  • Spark can choose whether to persist an RDD
  • Volcano keeps intermediate results only in buffers
  • Volcano is only a query execution engine, built around two key meta-operators (choose-plan and exchange)