Space efficient quantile selection - PDF document



slide-1
SLIDE 1

Space efficient quantile selection

Input: stream $a_1, \dots, a_n \in U$, where $U$ has an order $\le$

  • e.g. numerical data, names w/ alphabetic order, grades
  • allowed multiple passes

Goal: return the median w/ minimum # passes & space

More generally: quantile queries

  • select rank-$k$ element ($k$th largest)

slide-2
SLIDE 2

1 pass

Input

$O(n)$

slide-3
SLIDE 3


slide-4
SLIDE 4

Passes / Space

  • sort and select
  • quickselect (random pivot)
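As a point of reference for the quickselect entry, here is a minimal in-memory Python sketch of selection with a random pivot. The function name `quickselect`, the convention that rank 1 is the smallest element, and the list-based partitioning are assumptions for illustration; in a streaming setting each loop iteration would correspond to another pass over the data.

```python
import random


def quickselect(items, k, rng=random):
    """Return the element of rank k (1 = smallest) using random pivots."""
    items = list(items)
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and len(items)")
    while True:
        pivot = rng.choice(items)
        lows = [x for x in items if x < pivot]
        pivots = [x for x in items if x == pivot]
        highs = [x for x in items if x > pivot]
        if k <= len(lows):
            # The answer is among the elements smaller than the pivot.
            items = lows
        elif k <= len(lows) + len(pivots):
            # The pivot itself has the requested rank.
            return pivot
        else:
            # Skip everything up to and including the pivot.
            k -= len(lows) + len(pivots)
            items = highs
```

The same function works for any totally ordered values, for example `quickselect(["dan", "amy", "cat", "bob"], 2)` for names in alphabetic order.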

slide-5
SLIDE 5

Approximations

Given rank $k \in [n]$ and param $\varepsilon$, return element w/ rank $k \pm \varepsilon n$

Sampling

  • for median: sample $b$ elements, return median of sample
  • for rank $k = \delta n$
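The sampling idea for the median can be sketched in Python as follows. The slide only says to sample $b$ elements; using reservoir sampling to draw them in one pass, the name `sample_median`, and the optional `rng` argument are assumptions made for this illustration.

```python
import random


def sample_median(stream, b, rng=None):
    """One pass: keep a uniform sample of b elements, return its median."""
    rng = rng or random.Random()
    reservoir = []
    for i, x in enumerate(stream):
        if i < b:
            reservoir.append(x)
        else:
            # Reservoir sampling: item i replaces a slot with probability b/(i+1).
            j = rng.randint(0, i)
            if j < b:
                reservoir[j] = x
    if not reservoir:
        raise ValueError("empty stream")
    reservoir.sort()
    # Lower median of the sample.
    return reservoir[(len(reservoir) - 1) // 2]
```

The space used is $b$ elements regardless of the stream length; a larger $b$ makes the returned rank concentrate more tightly around $n/2$.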

Deterministic

slide-6
SLIDE 6

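Quantile summaries

  • space efficient
  • mergeable
  • answer $\varepsilon$-approximate quantile queries

Elements $q_1 \le q_2 \le \dots \le q_\ell \in S$ along w/ intervals $I(q_i) \subseteq [n]$ containing the rank of $q_i$

By tracking min/max specially, assume $\mathrm{rank}(q_1) = 1$, $\mathrm{rank}(q_\ell) = n$

[Figure: the elements $q_i$ with their rank intervals on the number line]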

slide-7
SLIDE 7

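Querying

Given $k \in [n]$: if $I(q_i) \subseteq [k - \varepsilon n, k + \varepsilon n]$ for some $q_i$, then return $q_i$

[Figure: number line with $k - \varepsilon n$, $k$, $k + \varepsilon n$ marked and an interval $I(q_i)$ inside the window]

How to ensure such $q_i$ exists $\forall k$?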
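The query rule can be sketched in Python as below. The representation of a summary as a sorted list of `(value, rmin, rmax)` triples, with `[rmin, rmax]` playing the role of $I(q_i)$, and the name `query_summary` are assumptions for illustration.

```python
def query_summary(summary, k, eps, n):
    """Return some q whose rank interval lies inside [k - eps*n, k + eps*n].

    summary: list of (value, rmin, rmax), meaning rmin <= rank(value) <= rmax.
    Returns None when no stored interval fits inside the window.
    """
    lo, hi = k - eps * n, k + eps * n
    for value, rmin, rmax in summary:
        if lo <= rmin and rmax <= hi:
            return value
    return None


# A hand-made summary of the stream 10, 20, ..., 100 (n = 10).
example = [(10, 1, 1), (30, 2, 4), (50, 4, 6), (70, 6, 8), (100, 10, 10)]
```

With `eps = 0.2`, the call `query_summary(example, 5, 0.2, 10)` returns 50, whose true rank 5 lies inside the window $[3, 7]$. With `eps = 0`, the call for `k = 3` returns `None`, which is exactly the situation the question on the slide asks how to rule out.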

slide-8
SLIDE 8

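Lemma: if $I(q_i)$, $I(q_{i+1})$ have total width $\le 2\varepsilon n$ $\forall i$, then every query window $[k - \varepsilon n, k + \varepsilon n]$ contains an interval $I(q_i)$

Proof: two cases

  • Case 1: $k \in I(q_i)$ for some $q_i$
  • Case 2: $k \notin I(q_i)$ $\forall q_i$

[Figure: intervals $I(q_i)$, $I(q_{i+1})$ on the number line around $k$]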

slide-9
SLIDE 9

Suppose $k \in I(q_i)$ for some $i$

If $I(q_i) \subseteq [k - \varepsilon n, k + \varepsilon n]$ then done

Else look at $q_{i+1}$

slide-10
SLIDE 10

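Suppose $k \notin I(q_i)$ $\forall i$

[Figure: consecutive intervals on the number line, with $k$ in a gap between them]

The combined intervals (of consecutive pairs $q_i$, $q_{i+1}$) cover $[n]$: pick 1 covering $k$

Its total width is $\le 2\varepsilon n$, so one of the two intervals must lie inside $[k - \varepsilon n, k + \varepsilon n]$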

slide-11
SLIDE 11

Key invariant: any two consecutive intervals have width $\le 2\varepsilon n$ $\Rightarrow$ $\varepsilon$-APX quantile summary

slide-12
SLIDE 12

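Merging

Given two $\varepsilon$-APX quantile summaries over 2 streams, want $\varepsilon$-APX summary over combined stream

Streams $S_1$, $S_2$ w/ summaries $Q'$, $Q''$: want to combine $Q'$, $Q''$ to get summary $Q$ of $S = S_1 \cup S_2$

[Figure: the two streams drawn side by side with their summary points and intervals]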

slide-13
SLIDE 13

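Denote

  • $Q' = \{q'_1, \dots, q'_h\}$ w/ intervals $I'(q'_1), \dots, I'(q'_h)$ (summary of $S_1$)
  • $Q'' = \{q''_1, \dots, q''_m\}$ w/ intervals $I''(q''_1), \dots, I''(q''_m)$ (summary of $S_2$)

Let $q''_j \in Q''$. $I''(q''_j)$ bounds the rank of $q''_j$ in $S_2$

Goal: bound the rank of $q''_j$ in $S_1$

Where does $q''_j$ sit in $Q'$? Say $q'_i \le q''_j \le q'_{i+1}$

[Figure: $q''_j$ placed between $q'_i$ and $q'_{i+1}$ on the number line]

Then the rank of $q''_j$ in $S_1$ lies between $\min I'(q'_i)$ and $\max I'(q'_{i+1})$, so set

$I(q''_j) = [\min I'(q'_i) + \min I''(q''_j), \ \max I'(q'_{i+1}) + \max I''(q''_j)]$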
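A minimal Python sketch of this merge rule follows. It assumes all stream values are distinct, that rank 1 is the smallest element, and that a summary is a sorted list of `(value, rmin, rmax)` triples passed together with its stream length. The name `merge_summaries` is hypothetical. The sketch also subtracts 1 from the upper end (the successor itself is larger than the element being ranked), a small tightening that is an assumption here rather than something read off the slide.

```python
from bisect import bisect_left


def merge_summaries(q1, n1, q2, n2):
    """Merge two rank-interval summaries of streams with n1 and n2 items."""
    merged = []
    for own, other, n_other in ((q1, q2, n2), (q2, q1, n1)):
        other_vals = [v for v, _, _ in other]
        for v, lo, hi in own:
            i = bisect_left(other_vals, v)  # other[:i] are smaller than v
            # Predecessor in the other summary: at least rmin(pred) items below v.
            add_lo = other[i - 1][1] if i > 0 else 0
            # Successor in the other summary: at most rmax(succ) - 1 items below v.
            add_hi = other[i][2] - 1 if i < len(other) else n_other
            merged.append((v, lo + add_lo, hi + add_hi))
    merged.sort()
    return merged


# Sparse summaries of S1 = {1, 3, 5, 7, 9} and S2 = {2, 4, 6, 8, 10}.
q_a = [(1, 1, 1), (9, 5, 5)]
q_b = [(2, 1, 1), (6, 3, 3), (10, 5, 5)]
merged_example = merge_summaries(q_a, 5, q_b, 5)
```

In `merged_example` the element 6 gets the interval $[4, 7]$, which contains its true rank 6 in the combined stream $1, \dots, 10$.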

slide-14
SLIDE 14

$Q$: merged elements $q_1 \le q_2 \le \dots$ w/ intervals $I(\cdot)$

To show $Q$ is $\varepsilon$-APX, need to show the $2\varepsilon n$ width property

Take two consecutive intervals in $Q$. Two cases:

  • elements from diff sets
  • elements from same set

slide-15
SLIDE 15

Case: diff sets

[Figure not recoverable]

slide-16
SLIDE 16

Case: same set

slide-17
SLIDE 17

This shows that merging 2 $\varepsilon$-APX QS's gives an $\varepsilon$-APX QS of the combined streams

Size: sum of the two sizes

slide-18
SLIDE 18

Pruning

Input: $\varepsilon$-approximate quantile summary w/ too many points

Goal: sparser summary that's still very good

slide-19
SLIDE 19

[Figure: the summary's points on the number line, with only some of them kept]

Claim: resulting quantile summary is $(\varepsilon + 1/k)$-APX

Proof: suppose we query a rank
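One common way to prune, sketched here in Python, is to keep only the entries that best answer $k + 1$ evenly spaced rank queries. The slide's figure is not recoverable, so the specific target ranks $1 + j(n-1)/k$, the selection rule, and the name `prune_summary` are all assumptions for illustration rather than the method shown on the slide.

```python
def prune_summary(summary, n, k):
    """Keep at most k + 1 entries of a (value, rmin, rmax) summary.

    For each of k + 1 evenly spaced target ranks, keep the entry whose
    interval fits in the smallest window around that rank.
    """
    if len(summary) <= k + 1:
        return list(summary)
    keep = set()
    for j in range(k + 1):
        target = 1 + j * (n - 1) / k
        best = min(
            range(len(summary)),
            key=lambda t: max(abs(summary[t][1] - target),
                              abs(summary[t][2] - target)),
        )
        keep.add(best)
    # Intervals are kept unchanged, so they still bound the true ranks.
    return [summary[t] for t in sorted(keep)]


exact = [(v, v, v) for v in range(1, 101)]   # exact summary of 1..100
pruned = prune_summary(exact, 100, 4)
```

Here `pruned` keeps the values 1, 26, 50, 75, 100, so any rank query is within about $n/(2k)$ of a kept entry, on top of whatever error the input summary already had.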

slide-20
SLIDE 20

Recap: we can

  • combine $\varepsilon$-APX quantile summaries to get an $\varepsilon$-APX quantile summary of the whole thing
  • sparsify an $\varepsilon$-APX quantile summary to an $(\varepsilon + 1/k)$-APX quantile summary w/ $k$ points

Remains to address: how to make one at all

slide-21
SLIDE 21

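What if $n = 1$?

Claim: that's all we need

[Figure: binary tree of summaries over the stream, merged and pruned level by level]

So take $k \approx \log(n)/\varepsilon$: at the root, $\varepsilon$-approximate quantiles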

slide-22
SLIDE 22

Space

[Figure: binary tree of summaries]

  • only keep root summaries

slide-23
SLIDE 23

Theorem: 1 pass, $O(\log^2(n)/\varepsilon)$ space, deterministic $\varepsilon$-APX quantile over stream

Idea: mergeability + dyadic intervals trick
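The mergeability plus dyadic-intervals idea can be sketched as a binary counter of summaries: each new element is a one-point summary, two summaries covering the same number of items are merged and pruned into one covering twice as many, and only the current roots are stored. The Python below is a sketch under assumptions, not the slides' exact procedure: distinct values, rank 1 as the smallest element, `(value, rmin, rmax)` triples, and hypothetical helper names (`merge`, `prune`, `stream_summary`), with compact helpers included so the block runs on its own.

```python
from bisect import bisect_left


def merge(q1, n1, q2, n2):
    """Merge two (value, rmin, rmax) summaries of n1 and n2 distinct items."""
    out = []
    for own, other, n_other in ((q1, q2, n2), (q2, q1, n1)):
        vals = [v for v, _, _ in other]
        for v, lo, hi in own:
            i = bisect_left(vals, v)
            add_lo = other[i - 1][1] if i > 0 else 0
            add_hi = other[i][2] - 1 if i < len(other) else n_other
            out.append((v, lo + add_lo, hi + add_hi))
    out.sort()
    return out


def prune(q, n, k):
    """Keep the entries that best answer k + 1 evenly spaced rank queries."""
    if len(q) <= k + 1:
        return q
    keep = set()
    for j in range(k + 1):
        r = 1 + j * (n - 1) / k
        keep.add(min(range(len(q)),
                     key=lambda t: max(abs(q[t][1] - r), abs(q[t][2] - r))))
    return [q[t] for t in sorted(keep)]


def stream_summary(stream, k):
    """One pass; levels[i] is None or a (summary, count) root over 2**i items."""
    levels = []
    for x in stream:
        cur, cnt = [(x, 1, 1)], 1
        i = 0
        while i < len(levels) and levels[i] is not None:
            q, m = levels[i]
            cur, cnt = prune(merge(q, m, cur, cnt), m + cnt, k), m + cnt
            levels[i] = None
            i += 1
        if i == len(levels):
            levels.append(None)
        levels[i] = (cur, cnt)
    total, n = [], 0
    for root in levels:          # combine the remaining roots at the end
        if root is not None:
            total, n = merge(total, n, root[0], root[1]), n + root[1]
    return total, n
```

At any time there are at most about $\log_2 n$ roots of at most $k + 1$ points each, which is where a bound of the form (number of levels) times (points per summary) comes from.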

slightly better

slide-24
SLIDE 24

[Figure: binary tree of summaries over the stream]

Have first level contain $1/\varepsilon$ points

slide-25
SLIDE 25

Theorem: 1 pass, $O(\log^2(\varepsilon n)/\varepsilon)$ space, deterministic $\varepsilon$-APX quantile over stream

Even better [Greenwald, Khanna 2001]: $O(\log(\varepsilon n)/\varepsilon)$ space

  • more sophisticated quantile summary
  • merging, interval trick

slide-26
SLIDE 26

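Finding the median and other ranks in $p$ passes

Fix $p = 2$ for simplicity

Goal: $O(n^{1/p}\,\mathrm{polylog}(n))$ space

Suppose we are querying rank $k$

1st pass: build $\varepsilon$-APX quantile summary for $\varepsilon = 1/\sqrt{n}$, w/ $\sqrt{n}\log^2 n$ space

Query $k - \sqrt{n}$, $k + \sqrt{n}$ $\to$ $a$, $b$

$k - 2\sqrt{n} \le \mathrm{rank}(a) \le k \le \mathrm{rank}(b) \le k + 2\sqrt{n}$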
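The notes stop at the rank bounds for $a$ and $b$. One natural way to use them in a second pass, sketched here as an assumption rather than as the slides' method, is to count the elements below $a$ and store only the elements between $a$ and $b$, which number $O(\sqrt{n})$ under the bounds above. The name `second_pass_select` and the convention that rank 1 is the smallest element are hypothetical.

```python
def second_pass_select(stream, a, b, k):
    """Exact rank-k element, given pivots with rank(a) <= k <= rank(b)."""
    below = 0      # number of elements strictly smaller than a
    window = []    # elements x with a <= x <= b
    for x in stream:
        if x < a:
            below += 1
        elif x <= b:
            window.append(x)
    idx = k - below - 1
    if not 0 <= idx < len(window):
        raise ValueError("pivots do not bracket rank k")
    window.sort()
    return window[idx]
```

For example, on a stream containing $1, \dots, 100$ in any order, `second_pass_select(stream, 40, 60, 50)` counts 39 smaller elements, stores the 21 values from 40 to 60, and returns 50.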