SLIDE 1

Freddies: DHT-Based Adaptive Query Processing via Federated Eddies

Ryan Huebsch Shawn Jeffery CS 294-4 Peer-to-Peer Systems 12/9/03

SLIDE 2

Outline

  • Background: PIER
  • Motivation: Adaptive Query Processing (Eddies)
  • Federated Eddies (Freddies)
    • System Model
    • Routing Policies
    • Implementation
  • Experimental Results
  • Conclusions and Continuing Work

SLIDE 3

PIER

Fully decentralized relational query processing engine

Principles:

  • Relaxed consistency
  • Organic scaling
  • Data in its natural habitat
  • Standard schemas via grassroots software

Relational queries can be executed in a number of logically equivalent ways

  • An optimization step chooses the best one performance-wise
  • Currently, PIER has no means to optimize queries

SLIDE 4

Adaptive Query Processing

Traditional query optimization occurs at query time and is based on statistics. This is hard because:

  • The catalog (statistics) must be accurate and maintained
  • The optimizer cannot recover from poor choices

The story gets worse!

  • Long running queries: changing selectivities/costs of operators; assumptions made at query time may no longer hold
  • Federated/autonomous data sources: no control over, or knowledge of, statistics
  • Heterogeneous data sources: different arrival rates

Thus, adaptive query processing systems attempt to change the execution order during the query

  • Query Scrambling, Tukwila, Wisconsin, Eddies

SLIDE 5

Eddies

Eddy: a tuple router that dynamically chooses the order of operators in a query plan

  • Optimizes the query at runtime on a per-tuple basis
  • Monitors selectivities and costs of operators to determine where to send a tuple next

Currently centralized in design and implementation

  • Some other efforts toward distributed Eddies from Wisconsin & Singapore (neither uses a DHT)
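The per-tuple routing idea can be sketched in a few lines of Python. This is a toy, not PIER's or Telegraph's implementation: the `Operator` class, the predicates, and the lowest-selectivity-first heuristic are illustrative assumptions, standing in for the richer policies (lottery scheduling, etc.) an Eddy may use.

```python
class Operator:
    """A filter operator whose selectivity the eddy learns online."""
    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate
        self.seen = 0     # tuples routed here so far
        self.passed = 0   # tuples that survived the predicate

    def process(self, tup):
        self.seen += 1
        ok = self.predicate(tup)
        if ok:
            self.passed += 1
        return ok

    def selectivity(self):
        # Optimistic prior until we have observations.
        return self.passed / self.seen if self.seen else 1.0

def eddy_route(tup, operators):
    """Route one tuple through all operators, visiting the most
    selective (lowest observed pass rate) remaining operator first,
    so doomed tuples are eliminated as early as possible."""
    done = set()  # per-tuple bookkeeping, like Freddies' DoneBits
    while len(done) < len(operators):
        remaining = [op for op in operators if op.name not in done]
        op = min(remaining, key=lambda o: o.selectivity())
        if not op.process(tup):
            return False          # tuple eliminated early
        done.add(op.name)
    return True                   # tuple satisfies all predicates

ops = [Operator("a>10", lambda t: t["a"] > 10),
       Operator("b<5",  lambda t: t["b"] < 5)]
results = [t for t in ({"a": a, "b": a % 7} for a in range(100))
           if eddy_route(t, ops)]
```

Note that the routing order only affects cost, never the answer: every surviving tuple has passed all predicates regardless of the order chosen.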

SLIDE 6

Why use Eddies in P2P? (The easy answers)

Much of the promise of P2P lies in its fully distributed nature

  • No central point of synchronization, so no central catalog
  • A distributed catalog with statistics helps, but does not solve all problems:
    • Possibly stale, hard to maintain
    • Need CAP to do the best optimization
    • No knowledge of available resources or the current state of the system (load, etc.)
  • This is the PIER philosophy!

Eddies were designed for a federated query processor

  • Changing operator selectivities and costs
  • Federated/heterogeneous data sources

SLIDE 7

Why Eddies in P2P? (The not so obvious answers)

Available compute resources in a P2P network are heterogeneous and dynamically changing

  • Where should the query be processed?

In a large P2P system, local data distributions, arrival rates, etc. may differ from the global ones

SLIDE 8

Freddies: Federated Eddies

A Freddy is an adaptive query processing operator within the PIER framework

Goals:

  • Show the feasibility of adaptive query processing in PIER
  • Build a foundation and infrastructure for smarter adaptive query processing
  • Establish a baseline for Freddy performance to improve upon with smarter routing policies

SLIDE 9

An Example Freddy

[Diagram: a Freddy routing tuples among the local join operators (R join S, S join T), the DHT operators Put(Join Value RS), Put(Join Value ST), Get(R), Get(S), Get(T), and the Output operator; source tables R, S, T flow to and from the DHT]

SLIDE 10

System Model

Same functionality as a centralized Eddy

  • Allows easy concept reuse
  • A Freddy uses its routing policy to determine the next operator for a tuple
  • Tuples in a Freddy are tagged with DoneBits indicating which operators have processed them
  • The Freddy does all state management, so existing operators require no modifications

Local processing comes first (in most cases)

  • Conserves network bandwidth
  • Not as simple as it seems

Freddy: decide how to rehash a tuple

  • This determines the join order
  • Challenge: the routing decision is decoupled from the operator, so most Eddy techniques are no longer valid
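The DoneBits bookkeeping can be sketched as a bitmask (a minimal illustrative Python sketch, not PIER's Java implementation; the operator names and the `routing_order` list are hypothetical):

```python
# Hypothetical operators for one query plan; index = bit position.
OPERATORS = ["join_RS", "join_ST"]

def set_done(done_bits, op_index):
    """Mark an operator as having processed this tuple."""
    return done_bits | (1 << op_index)

def is_done(done_bits, op_index):
    return bool(done_bits & (1 << op_index))

def next_operator(done_bits, routing_order):
    """Return the first operator in the policy's order whose bit is
    still clear; None means all bits are set and the tuple should be
    sent to the output operator (back to the query origin)."""
    for i in routing_order:
        if not is_done(done_bits, i):
            return i
    return None

# A fresh source tuple starts with clean DoneBits.
bits = 0
order = [1, 0]                           # policy chose join_ST first
step1 = next_operator(bits, order)       # join_ST
bits = set_done(bits, step1)
step2 = next_operator(bits, order)       # then join_RS
bits = set_done(bits, step2)
assert next_operator(bits, order) is None  # all done: route to output
```

Because the bits travel with the tuple, any node's Freddy can pick up routing where another left off without shared state.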

SLIDE 11

Query Processing in Freddies

The query origin creates a query plan with a Freddy

  • Possible routings are determined at this time, but not their order

Freddy operators on all participating nodes initiate data flow

As tuples arrive, the Freddy determines the next operator for each tuple based on its DoneBits and the routing policy

  • Source tuples are tagged with clean DoneBits and routed appropriately
  • When all DoneBits are set, the tuple is sent to the output operator (returned to the query origin)
SLIDE 12

Tuple Routing Policy

Determines to which operator to send a tuple

Local information

  • Messages are expensive, so monitor local usage and adjust locally

“Processing buddy” information

  • During processing, discover general trends in input/output nodes’ processing capabilities, output rates, etc.
  • For instance, we want to alert the previous Freddy of poor PUT decisions

The design space is huge: a large research area

SLIDE 13

Freddy Routing Policies

Simple (KISS):

  • Static
  • Random: not as bad as you may think
  • Local stat monitoring (sampling)

More complex:

  • Queue lengths: somewhat analogous to the “back-pressure” effect; monitors DHT PUT ACKs
  • Load balancing through “learning” of the global join key distribution
    • Piggyback stats on other messages
    • Don’t need global information, only stats about processing buddies (nodes with which we communicate)
    • A different sample than local – may or may not be better
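The queue-length idea can be sketched as follows. This is a toy Python sketch under stated assumptions: the class name, the destination names, and the `on_put`/`on_ack` callbacks are illustrative, not PIER's API; the signal is simply the count of un-ACKed DHT PUTs per destination.

```python
import collections
import random

class QueueLengthPolicy:
    """Route a tuple to the rehash destination with the fewest
    outstanding (un-ACKed) DHT PUTs -- a back-pressure signal that a
    destination node is overloaded or its link is congested."""
    def __init__(self, destinations):
        self.outstanding = collections.Counter({d: 0 for d in destinations})

    def choose(self):
        # Fewest outstanding PUTs wins; ties broken randomly.
        low = min(self.outstanding.values())
        candidates = [d for d, n in self.outstanding.items() if n == low]
        return random.choice(candidates)

    def on_put(self, dest):
        self.outstanding[dest] += 1   # PUT sent, ACK pending

    def on_ack(self, dest):
        self.outstanding[dest] -= 1   # ACK received

policy = QueueLengthPolicy(["join_RS", "join_ST"])
policy.on_put("join_RS")              # a PUT toward R join S is in flight
chosen = policy.choose()              # back-pressure steers away from it
```

Because PUT ACKs already flow through the DHT layer, this policy needs no extra messages, which is what makes it attractive in a bandwidth-constrained P2P setting.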
SLIDE 14

Implementation & Experimental Setup

Design Decisions:

Design decisions:

  • Simplicity is key: roughly 300 NCSS (PIER is about 5300)
  • Single query processing operator
  • Separate routing policy module loaded at query time
  • Possible routing orders determined by a simple optimizer

Required generalizations to the PIER execution engine to deal with generic operators

  • Allows PIER to run any dataflow operator

Simulator with 256 nodes, 100 tuples/table/node

  • Feasibility, not scalability
  • In the absence of global (or with stale) knowledge, a static optimizer could choose any join ordering, so we compare Freddy performance to all possible static plans

SLIDE 15

3-way join

R join S join T

  • R join S is expensive (multiplies the tuple count by 25)
  • S join T is highly selective (drops 90%)

Possible static join orderings: (R join S) join T and (S join T) join R
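The stated factors make the gap between the two static orderings concrete. A back-of-envelope sketch (the x25 blow-up and 90% drop are from the slide; the per-node tuple count `N` matches the simulator setup, and the numbers are illustrative only):

```python
# Intermediate-result sizes for the two static plans, assuming N
# source tuples per table, a x25 blow-up for R join S, and a 90%
# drop for S join T.
N = 100

# Plan RST: (R join S) join T -- the expensive join runs on full input.
rst_intermediate = N * 25      # 2500 tuples must be rehashed to join with T

# Plan STR: (S join T) join R -- the selective join prunes first.
str_intermediate = N // 10     # 90% dropped, 10 tuples rehashed to join with R

ratio = rst_intermediate // str_intermediate   # 250x more intermediate tuples
```

Since rehashed intermediate tuples translate directly into DHT PUT traffic, the ordering dominates bandwidth cost, which is exactly what the results below measure.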

SLIDE 16

3-Way Join Results

[Chart: Completion Time (s) vs. Bandwidth/Node (KB/s) for the RST plan, the STR plan, and the Eddy]

SLIDE 17

4-way join

R join S join T join U

  • S join T is expensive

Possible static join orderings: RSTU, STRU, STUR, TUSR, and a bushy plan

  • Note: a traditional optimizer can’t make the bushy plan

SLIDE 18

4-Way Join

[Chart: Completion Time (s) vs. Bandwidth/Node (KB/s) for the RSTU, STRU, STUR, TUSR, and bushy static plans and the Eddy]

SLIDE 19

The Promise of Routing Policy

An illustrative example of how routing policy can improve performance

  • This is not meant to be an exhaustive comparison of policies, but rather to show the possibilities

EddyQL considers the number of outstanding PUTs (queue length) to decide where to send a tuple

[Chart: Aggregate Bandwidth (MB/s) for the RST plan, the STR plan, the Eddy, and EddyQL]

SLIDE 20

Conclusions and Continuing Work

Freddies provide adaptive query processing in a P2P system

  • Require no global knowledge
  • Baseline performance shows promise for smarter policies

In the future…

  • Explore Freddy performance in a dynamic environment
  • Explore more complex routing policies

SLIDE 21

Questions? Comments? Snide remarks for Ryan? Glorious praise for Shawn?

Thanks!