
Measuring the on-lineness of data streams

Manfred K. Warmuth and Jiazhong Nie, University of California, Santa Cruz. Dec. 10, 2015, NIPS Workshop on Easy Data. Includes some earlier work with Corrie Scalisi, Robert Gramacy, Scott Brandt and Ismail Ari.


  1. Measuring the “on-lineness” of data streams. Manfred K. Warmuth and Jiazhong Nie, University of California, Santa Cruz. Dec. 10, 2015, NIPS Workshop on Easy Data. Includes some earlier work with Corrie Scalisi, Robert Gramacy, Scott Brandt and Ismail Ari.

  2. Goals: Design on-line algorithms in domains that are outside the reach of theory. Design good comparators that exploit the on-lineness of the data.

  3. 1. Disk spindown problem [HLSS]: When to spin down the disk on your laptop? The best time-out is time-, user-, and usage-dependent.

  4. Non-convex loss: If idle times are expected to be short, then a long timeout is better; if they are expected to be long, then a short timeout is better.
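The slides do not give the loss formula. The following is a minimal sketch, assuming a common energy-style cost model: the disk pays for the time it keeps spinning, plus a fixed cost when the timeout expires before the idle period ends. The names `spindown_loss` and `spin_cost` are hypothetical. Under this assumed model the loss is non-convex in the timeout, because it drops abruptly once the timeout exceeds the idle time.

```python
def spindown_loss(timeout, idle_time, spin_cost=5.0):
    """Assumed cost of one idle period for a given timeout (illustrative model).

    If the idle period ends before the timeout, the disk simply spins
    for the whole idle period. Otherwise it spins until the timeout and
    then pays a fixed spin-down/spin-up cost (spin_cost, an assumption).
    """
    if idle_time <= timeout:
        return idle_time
    return timeout + spin_cost


# Loss as a function of the timeout for one idle period of length 10.
idle = 10.0
loss_low = spindown_loss(4.0, idle)    # timeout expires early
loss_mid = spindown_loss(7.0, idle)    # timeout expires later, costs more
loss_high = spindown_loss(10.0, idle)  # disk never spins down

# A convex function would satisfy f(mid) <= (f(low) + f(high)) / 2.
violates_convexity = loss_mid > (loss_low + loss_high) / 2
```

With short idle periods a long timeout avoids paying `spin_cost` at all, while with long idle periods a short timeout minimizes the spinning time, which matches the trade-off stated on the slide.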

  5. 2. Caching [BWBA]: Want to build a combined caching policy from 12 base policies (our experts): LRU, RAND, FIFO, LIFO, LFU, MFU, SIZE, GDS, GD*, GDSF, LFUDA

  6. Characteristics Vary with Time

  7. Best Policy Varies with Time

  8. Permuting trick for disk spindown data. [Figure panels: on-line :-) and not on-line :-(]
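The slides do not spell out the permuting procedure. One way to read it, sketched here under stated assumptions: evaluate an order-sensitive score on the original stream and on randomly permuted copies, and treat a large gap as evidence that the original order carries exploitable on-line structure. The helper names, the toy score (predict with the expert that was best on the previous trial), and the synthetic stream are all hypothetical illustrations, not the authors' setup.

```python
import random


def follow_previous_best_loss(losses):
    """Total loss when each trial uses the expert that was best on the previous trial."""
    total = 0.0
    choice = 0  # arbitrary choice for the very first trial
    for row in losses:
        total += row[choice]
        choice = min(range(len(row)), key=row.__getitem__)
    return total


def permutation_gap(losses, score, n_shuffles=20, seed=0):
    """Return the score on the original order and the mean score over shuffled copies."""
    rng = random.Random(seed)
    original = score(losses)
    shuffled_scores = []
    for _ in range(n_shuffles):
        shuffled = list(losses)
        rng.shuffle(shuffled)  # destroys any temporal structure
        shuffled_scores.append(score(shuffled))
    return original, sum(shuffled_scores) / n_shuffles


# Synthetic stream: expert 0 is best in the first half, expert 1 in the second.
stream = [[0.0, 1.0]] * 100 + [[1.0, 0.0]] * 100
original, permuted = permutation_gap(stream, follow_previous_best_loss)
```

On this toy stream the original order is cheap to track, while the permuted copies force frequent mistakes, so `original` is far below `permuted`.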

  9. Permuting caching data: highly on-line data; some caching policies already on-line.

  10. Using comparators to measure on-lineness of data. Properties: Should exploit on-lineness of data. Might be too expensive to compute in practice, but can serve as a goal to compare against. Might rely on information not available to the on-line algorithm.

  11. Idea 1: Use dynamic programming to compute the BestShift(K) curve. Partition of the timeline into K segments; BestFixed in each segment. [Figure: example partition with segment labels 2, 4, 7]

  12. BestFixed(K) Dynamic programming: $O(K N^2 T)$ [H], where K is the number of partitions, N the number of discrete idle times, and T the number of trials.
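The partition comparator can be sketched as a small dynamic program. This is an illustrative version, not necessarily the one cited as [H]: it assumes the per-trial losses of all N candidates are given as a T-by-N table (`losses`, a hypothetical input format), and it caches the minimum over the previous layer, so it runs in O(KNT) rather than the bound quoted on the slide.

```python
def best_shift_curve(losses, max_segments):
    """Minimum total loss using at most k segments, for k = 1..max_segments.

    losses[t][i] is the loss of candidate i on trial t. Within each
    segment one fixed candidate is used, so the minimum over candidates
    gives the best fixed choice for that segment.
    """
    n = len(losses[0])
    inf = float("inf")
    # d[k][i]: best loss so far with exactly k+1 segments, current one using i.
    d = [[inf] * n for _ in range(max_segments)]
    for t, row in enumerate(losses):
        if t == 0:
            d[0] = list(row)
            continue
        new = [[inf] * n for _ in range(max_segments)]
        for k in range(max_segments):
            # Cheapest way to have finished k segments before this trial.
            start_new = min(d[k - 1]) if k > 0 else inf
            for i in range(n):
                new[k][i] = row[i] + min(d[k][i], start_new)
        d = new
    curve, best = [], inf
    for k in range(max_segments):
        best = min(best, min(d[k]))  # "at most k+1 segments"
        curve.append(best)
    return curve


# Two candidates; the better one changes halfway through the stream.
curve = best_shift_curve([[0, 1], [0, 1], [1, 0], [1, 0]], max_segments=3)
```

Plotting the returned list against the number of segments gives a curve of the kind the slides call BestShift(K): a steep early drop suggests a few well-placed shifts capture most of the structure.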

  13. BestShift curves. [Figure panels: on-line and not on-line]

  14. Comparators for caching. BestFixed: a posteriori best of 12 policies on the entire request stream. BestRefetching(R): minimum number of misses with at most R refetches in any sequence of switching policies.

  15. Refetches & Policy Switches. Comparator: All sequences of the form [figure missing]. We plot miss rate vs. refetches: [plot missing]

  16. BestRefetching(R) Dynamic programming: $O(R N^2 T)$ [H]

  17. Our theoretically sound algorithms become heuristics: Use loss and share updates on non-convex losses. Build a merged cache that does not correspond to the mixture.
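For readers unfamiliar with loss and share updates, here is a minimal sketch of one common variant: an exponential loss update followed by sharing a fraction of the weight uniformly. The learning rate `eta`, the share rate `alpha`, and the uniform-share form are assumptions chosen for illustration; the slides do not state which variant or parameters were used.

```python
import math


def loss_update(weights, losses, eta):
    """Exponentially down-weight each expert by its loss, then renormalize."""
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def share_update(weights, alpha):
    """Redistribute a fraction alpha of the total weight uniformly.

    This keeps every weight at least alpha / n, so an expert that was
    bad in the past can recover quickly when the data shifts.
    """
    n = len(weights)
    return [(1 - alpha) * w + alpha / n for w in weights]


# One trial with two experts: expert 0 suffers no loss, expert 1 suffers loss 1.
weights = [0.5, 0.5]
weights = loss_update(weights, [0.0, 1.0], eta=1.0)
weights = share_update(weights, alpha=0.1)
```

Nothing in the update requires the loss to be convex in order to run, which is what allows it to be applied as a heuristic in the way the slide describes.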

  18. Spindown results. [Figure panels: on-line :-) and not on-line :-(]

  19. Caching: we “track” the best policy

  20. WWk

  21. UMo

  22. SMoLRU

  23. Idea 2: Split into even/odd requests. Best partition based on training set; performance based on test set. [Figure: requests R1 to R10 grouped into Pair1 to Pair5, with each pair split between training and testing]
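The split itself can be sketched in a few lines. Which half serves as training and which as testing is an assumption here (odd-numbered requests for training, even-numbered for testing), and `split_even_odd` is a hypothetical helper name.

```python
def split_even_odd(requests):
    """Split a request stream into interleaved training and testing halves.

    Assumption: the 1st, 3rd, 5th, ... requests form the training set and
    the 2nd, 4th, 6th, ... requests form the test set, so both halves
    cover the same stretch of the timeline.
    """
    training = requests[0::2]
    testing = requests[1::2]
    return training, testing


requests = [f"R{i}" for i in range(1, 11)]  # R1 .. R10 as on the slide
training, testing = split_even_odd(requests)
```

Because the two halves are interleaved, a partition chosen on the training half lines up in time with the test half, so a partition that only fits noise should not carry over.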

  24. Miss Rate of Testing Requests. No overfitting to random data: testing miss rate goes up immediately. [Plot: miss rate (0.035 to 0.055) vs. refetch rate (0 to 0.08); curves: random permuted data train, random permuted data test, original data train, original data test]

  25. Upshot! Don’t be afraid to use your algorithms as heuristics in domains where the theory breaks down.
