Measuring the on-lineness of data streams - Manfred K. Warmuth

Measuring the on-lineness of data streams Manfred K. Warmuth Jiazhong Nie University of California - Santa Cruz Dec. 10, 2015 - Nips workshop on Easy Data Includes some earlier work with Corrie Scalisi, Robert Gramacy, Scott Brandt


SLIDE 1

Measuring the “on-lineness” of data streams

Manfred K. Warmuth Jiazhong Nie

University of California - Santa Cruz

Dec. 10, 2015, NIPS workshop on Easy Data

Includes some earlier work with Corrie Scalisi, Robert Gramacy, Scott Brandt and Ismail Ari

Manfred K. Warmuth, Jiazhong Nie (University of California - Santa Cruz) Measuring the “on-lineness” of data streams 1 / 25

SLIDE 2

Goals

  • Design on-line algorithms in domains that are outside the reach of theory
  • Design good comparators that exploit the on-lineness of the data

SLIDE 3
1. Disk spindown problem [HLSS]

When should the disk on your laptop spin down? The best time-out depends on the time, the user, and the usage.

SLIDE 4

Non-convex loss

If idle times are expected to be
  • short, then a long timeout is better
  • long, then a short timeout is better
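A toy cost model can make the non-convexity concrete. The sketch below assumes (this is not stated on the slide) that an idle period costs its own length if the disk keeps spinning, and the timeout plus a fixed spin-down cost otherwise. The names `spindown_loss`, `spin_cost`, and `violates_convexity` are hypothetical.

```python
def spindown_loss(timeout, idle, spin_cost=5.0):
    """Toy energy cost of one idle period (assumed model, not from the slides)."""
    if idle <= timeout:
        # The disk keeps spinning for the whole idle period.
        return idle
    # The disk spins until the timeout, then pays a fixed spin-down cost.
    return timeout + spin_cost


def violates_convexity(f, a, b):
    """True if f at the midpoint lies strictly above the chord between a and b."""
    mid = (a + b) / 2
    return f(mid) > (f(a) + f(b)) / 2


idle = 10.0
loss_in_timeout = lambda timeout: spindown_loss(timeout, idle)
print(violates_convexity(loss_in_timeout, 0.0, 10.0))
```

Under this assumed model, the loss as a function of the timeout rises linearly and then drops at the idle time, so the midpoint lies above the chord.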

SLIDE 5
2. Caching [BWBA]

We want to build a combined caching policy from 12 base policies (our experts): LRU, RAND, FIFO, LIFO, LFU, MFU, SIZE, GDS, GD∗, GDSF, LFUDA

SLIDE 6

Characteristics Vary with Time

SLIDE 7

Best Policy Varies with Time

SLIDE 8

Permuting trick for disk spindown data

on-line :-)

not on-line :-(
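One way to read the permuting trick: run an order-sensitive on-line rule on the original stream and on random permutations of it, then compare the losses. The sketch below is an illustration only. The "predict the previous value" rule, the absolute-error loss, and the names `follow_previous_loss` and `permutation_gap` are assumptions, not the algorithm or data used in the slides.

```python
import random


def follow_previous_loss(stream):
    """Total absolute error of the on-line rule 'predict the previous value'."""
    return sum(abs(cur - prev) for prev, cur in zip(stream, stream[1:]))


def permutation_gap(stream, num_shuffles=20, seed=0):
    """Loss on the original order and mean loss over random permutations."""
    rng = random.Random(seed)
    original = follow_previous_loss(stream)
    shuffled_losses = []
    for _ in range(num_shuffles):
        shuffled = list(stream)
        rng.shuffle(shuffled)
        shuffled_losses.append(follow_previous_loss(shuffled))
    return original, sum(shuffled_losses) / num_shuffles


# A stream with strong temporal structure: one long run of each value.
stream = [0] * 50 + [10] * 50
original, permuted = permutation_gap(stream)
print(original, permuted)
```

If the loss on the original order is much smaller than on permuted copies, the order carries information and the stream behaves as "on-line" in this sense. If the two are similar, permuting changed little.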

SLIDE 9

Permuting caching data

  • highly on-line data
  • some caching policies are already on-line

SLIDE 10

Using comparators to measure on-lineness of data

Properties
  • Should exploit on-lineness of data
  • Might be too expensive to compute in practice, but can serve as a goal to compare against
  • Might rely on information not available to the on-line algorithm

SLIDE 11

Idea 1: Use dynamic programming to compute BestShift(K) curve

  • Partition of the timeline into K segments
  • BestFixed in each segment

[Figure: example partition of the timeline; segment labels 2, 4, 7]

SLIDE 12

BestShift(K)

Dynamic programming: O(K N^2 T) [H], where K is the number of partitions, N the number of discrete idle times, and T the number of trials
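A minimal sketch of such a dynamic program follows, assuming a precomputed loss table `losses[t][i]` for trial t and candidate i (for spindown, a candidate would be one discrete timeout). The function name `best_shift_curve` is hypothetical, and this is not necessarily the procedure cited as [H]: it computes the best loss with at most k segments and hoists the inner minimum, so it runs in O(K N T).

```python
def best_shift_curve(losses, max_segments):
    """Return [BestShift(1), ..., BestShift(K)] for losses[t][i].

    table[t][i] is the smallest total loss on trials 0..t when candidate i is
    used on trial t and the timeline is cut into at most k segments.
    """
    num_trials, num_candidates = len(losses), len(losses[0])
    prev_best = None  # prev_best[t]: best loss on trials 0..t with k-1 segments
    curve = []
    for k in range(1, max_segments + 1):
        table = [[0.0] * num_candidates for _ in range(num_trials)]
        for t in range(num_trials):
            for i in range(num_candidates):
                if t == 0:
                    best_before = 0.0
                else:
                    # Option 1: stay in the current segment with candidate i.
                    best_before = table[t - 1][i]
                    if prev_best is not None:
                        # Option 2: start a new segment at trial t.
                        best_before = min(best_before, prev_best[t - 1])
                table[t][i] = losses[t][i] + best_before
        prev_best = [min(row) for row in table]
        curve.append(min(table[-1]))
    return curve


# Two candidates; the better one changes halfway through.
toy_losses = [[0, 1], [0, 1], [1, 0], [1, 0]]
print(best_shift_curve(toy_losses, 3))
```

On the toy table, one segment cannot avoid two units of loss, while two segments reach zero. A curve that drops quickly as K grows would be read as a sign of on-line data.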

SLIDE 13

BestShift curves

on-line

not on-line

SLIDE 14

Comparators for caching

  • BestFixed: a posteriori best of 12 policies on entire request stream
  • BestRefetching(R): minimum number of misses with at most R refetches in any sequence of switching policies
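The BestFixed comparator can be sketched by replaying the whole request stream under each base policy and keeping the one with the fewest misses. The sketch below covers only two of the policies (LRU and FIFO) and assumes unit-size objects and a fixed capacity counted in objects. `count_misses` and `best_fixed` are hypothetical names.

```python
from collections import OrderedDict


def count_misses(requests, capacity, policy):
    """Replay requests through a cache of unit-size objects and count misses."""
    cache = OrderedDict()  # oldest entry first
    misses = 0
    for item in requests:
        if item in cache:
            if policy == "LRU":
                # A hit refreshes recency under LRU; FIFO ignores hits.
                cache.move_to_end(item)
            continue
        misses += 1
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict the entry at the front
        cache[item] = True
    return misses


def best_fixed(requests, capacity, policies=("LRU", "FIFO")):
    """A posteriori best single policy on the entire request stream."""
    miss_counts = {p: count_misses(requests, capacity, p) for p in policies}
    best = min(miss_counts, key=miss_counts.get)
    return best, miss_counts


best, miss_counts = best_fixed([1, 2, 1, 3, 1, 4, 1, 5], capacity=2)
print(best, miss_counts)
```

BestRefetching(R) is not sketched here, since it also needs to track the cache contents across policy switches.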

SLIDE 15

Refetches & Policy Switches

Comparator: all sequences of the form [sequence diagram not recovered]

We plot miss rate vs. refetches: [plot not recovered]

SLIDE 16

BestRefetching(R)

Dynamic programming: O(R N^2 T) [H]

SLIDE 17

Our theoretically sound algorithms become heuristics

  • Use loss and share updates on non-convex losses
  • Build a merged cache that does not correspond to the mixture
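For reference, one round of a loss update followed by a share update might look like the sketch below. It assumes an exponential loss update with learning rate `eta` and a share update that mixes in the uniform distribution at rate `alpha`. The slide does not say which variants were used, and `loss_and_share_step` is a hypothetical name.

```python
import math


def loss_and_share_step(weights, losses, eta=1.0, alpha=0.1):
    """One round: exponential loss update, then a uniform-mixing share update."""
    # Loss update: down-weight each expert by exp(-eta * loss), then normalize.
    updated = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(updated)
    updated = [w / total for w in updated]
    # Share update (assumed variant): move a fraction alpha of the weight
    # toward the uniform distribution so that no expert's weight collapses.
    n = len(updated)
    return [(1 - alpha) * w + alpha / n for w in updated]


new_weights = loss_and_share_step([0.5, 0.5], [0.0, 1.0])
print(new_weights)
```

The share step keeps every weight at least `alpha / n`, which lets the weights recover quickly when the best expert changes. With non-convex losses the usual guarantees no longer apply, which is the point of the slide.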

SLIDE 18

Spindown results

on-line :-)

not on-line :-(

SLIDE 19

Caching - we “track” the best policy

SLIDE 20

WWk

SLIDE 21

UMo

SLIDE 22

SMoLRU

SLIDE 23

Idea 2: Split into even/odd requests

[Figure: requests R1 ... R10 grouped into Pair1 ... Pair5; within each pair, one request is used for training and the other for testing]

  • Best partition based on training set
  • Performance based on test set
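The split itself is a one-liner. The sketch below assumes that the first request of each pair (R1, R3, ...) goes to the training set and the second (R2, R4, ...) to the test set. The slide text does not say which half plays which role, and `split_even_odd` is a hypothetical name.

```python
def split_even_odd(requests):
    """Split a request stream into interleaved training and testing halves."""
    training = requests[0::2]  # R1, R3, R5, ... (assumed to be the training half)
    testing = requests[1::2]   # R2, R4, R6, ... (assumed to be the testing half)
    return training, testing


requests = ["R%d" % i for i in range(1, 11)]
training, testing = split_even_odd(requests)
print(training)
print(testing)
```

Because the two halves are interleaved, they share the same temporal structure, so a partition fitted on one half can be scored on the other.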

SLIDE 24

Miss Rate of Testing Requests

No overfitting to random data: testing miss rate goes up immediately

[Figure: miss rate vs. refetch rate; curves for randomly permuted data (train and test) and original data (train and test)]

SLIDE 25

Upshot!

Don’t be afraid to use your algorithms as heuristics in domains where the theory breaks down
