Timescale Stream Statistics for Hierarchical Management

Chen Ding, University of Rochester
March 23, STREAM 2016, Tysons, VA

Non-Volatile Storage: Implications of the datacenter’s shifting center. By Mihir Nanavati, Malte Schwarzkopf, Jake Wires, and Andrew Warfield.

“The arrival of high-speed, non-volatile storage … is likely the most significant architectural change that datacenter and software designers will face in the foreseeable future.”

Hierarchical Cache Memory
• Science
  • nothing travels faster than light
  • the faster the access, the smaller the data capacity
• Engineering
  • speed, size and cost
  • no single technology can satisfy all demands
    • e.g. SCM mentioned in the CACM article
• Programmability
  • automatic, transparent, modular, efficient, portable
  • efficient sharing of fast/local memory
• Uses
  • CPU/GPU caches, virtual memory
  • software cache, e.g. Memcached, Redis

GPU                                         G80                GT200              Fermi
Transistors                                 681 million        1.4 billion        3.0 billion
CUDA Cores                                  128                240                512
Double Precision Floating Point Capability  None               30 FMA ops/clock   256 FMA ops/clock
Single Precision Floating Point Capability  128 MAD ops/clock  240 MAD ops/clock  512 FMA ops/clock
Special Function Units (SFUs) / SM          2                  2                  4
Warp schedulers (per SM)                    1                  1                  2
Shared Memory (per SM)                      16 KB              16 KB              Configurable 48 KB or 16 KB
L1 Cache (per SM)                           None               None               Configurable 16 KB or 48 KB
L2 Cache                                    None               None               768 KB
ECC Memory Support                          No                 No                 Yes
Concurrent Kernels                          No                 No                 Up to 16
Load/Store Address Width                    32-bit             32-bit             64-bit

Source: Whitepaper, NVIDIA’s Next Generation CUDA™ Compute Architecture: Fermi™

What is Locality?

“During any interval of execution, a program favors a subset of its pages, and this set of favored pages changes slowly” -- Peter Denning
• locality analysis is a streaming problem
• too many data points, unusable for optimization

Locality Theory
• Since 1960s
  • working-set theory [Denning 1968]
  • stack simulation [Mattson et al. 1970]
• Since 1999
  • reuse distance (i.e. LRU stack distance)
  • 5 dimensions of locality [TOPLAS’09]
    “The authors were supported by the National Science Foundation (CAREER Award CCR-0238176 and two grants CNS-0720796 and CNS-0509270), the Department of Energy (Young Investigator Award DE-FG02-02ER25525), IBM CAS Faculty Fellowship, and a gift from Microsoft Research.”
  • HPCToolkit by Mellor-Crummey et al. at Rice [CCPE’10]
  • not composable, unable to derive shared-cache performance
• Since 2008
  • footprint: timescale statistics
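As a minimal sketch of reuse distance (LRU stack distance), the Python below walks a trace while maintaining an LRU stack. The function names `reuse_distances` and `lru_miss_ratio` are hypothetical, not from the talk. The sketch assumes the convention that the distance counts the reused item itself, so an access hits in a fully associative LRU cache exactly when its distance is at most the cache size. It uses a simple list-based stack rather than the efficient algorithms in the cited literature.

```python
from math import inf


def reuse_distances(trace):
    """Return the LRU stack distance of each access (inf for a first access).

    Assumed convention: the distance counts the distinct items accessed
    since the previous use of the same item, including the item itself.
    """
    stack = []  # least recently used first, most recently used last
    dists = []
    for item in trace:
        if item in stack:
            pos = stack.index(item)
            dists.append(len(stack) - pos)  # depth from the top of the stack
            stack.pop(pos)
        else:
            dists.append(inf)  # cold (first-time) access
        stack.append(item)  # the item becomes the most recently used
    return dists


def lru_miss_ratio(trace, cache_size):
    """Fraction of accesses whose reuse distance exceeds the cache size."""
    dists = reuse_distances(trace)
    return sum(d > cache_size for d in dists) / len(dists)
```

For the trace a b c a b, the two reuses each have distance 3, so a cache holding 3 items misses only on the three cold accesses, while a cache holding 2 items misses on every access.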

Timescale Stream Statistics
• A stream
  • “a possibly unbounded sequence of events” [Stream workshop 2015]
  • a time window or interval
  • a timescale x is a length of time
  • f(x) is the average behavior of all windows of length x
  • a function for all non-negative x
• Peak temperature variation pv(x)
  • each window has a peak variation
  • pv(x) is the average peak variation of all windows of length x
    • e.g. a week or a month
  • avoid data bias
    • e.g. if we were to measure just calendar weeks/months
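The definition of a timescale statistic can be sketched in Python as an average over every window of a given length, at every start position. The names `timescale_average`, `peak_variation`, and `pv` are hypothetical. The sketch assumes that "peak variation" means the highest reading minus the lowest reading in a window, which the slide does not spell out, and it handles only a finite series with window lengths from 1 to the series length.

```python
def timescale_average(series, x, window_stat):
    """Average window_stat over all windows of length x, at every start position."""
    n = len(series)
    if not 1 <= x <= n:
        raise ValueError("window length must be between 1 and len(series)")
    windows = [series[i:i + x] for i in range(n - x + 1)]
    return sum(window_stat(w) for w in windows) / len(windows)


def peak_variation(window):
    """Assumed definition: highest value minus lowest value in the window."""
    return max(window) - min(window)


def pv(temps, x):
    """pv(x): average peak variation over all windows of length x."""
    return timescale_average(temps, x, peak_variation)
```

Because every start position is included, a window that straddles two calendar weeks counts as much as one aligned to a calendar week, which is one way to read the slide's point about avoiding data bias.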

Timescale Locality
• Footprint fp(x)
  • working-set size (WSS): the amount of data accessed in a window
  • fp(x): average WSS of all length x windows
• Theoretical properties (selected)
  • composable
  • miss ratio is the increase of footprint
  • concavity [ASPLOS’13]
    • (computed) miss ratio is monotone
  • linear time measurement [PACT’11]
  • real-time sampling [CCGrid’15]
• A function is worth a thousand pictures

[Figure: average footprint versus window size for 403.gcc, annotated at cache size c with mr(c) = ∆y/∆x and im(c) = ∆x/∆y.]
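As a rough illustration of the footprint definition and of "miss ratio is the increase of footprint", the Python sketch below computes fp(x) by brute force and takes a one-step finite difference. The names `footprint` and `miss_ratio` are hypothetical, and the enumeration of every window is an assumption made for clarity. It is not the linear-time measurement cited from PACT’11.

```python
def footprint(trace, x):
    """fp(x): average number of distinct items over all windows of length x."""
    n = len(trace)
    if not 1 <= x <= n:
        raise ValueError("window length must be between 1 and len(trace)")
    # Working-set size of each window is the count of distinct items in it.
    total = sum(len(set(trace[i:i + x])) for i in range(n - x + 1))
    return total / (n - x + 1)


def miss_ratio(trace, x):
    """Finite-difference estimate of mr(c) at cache size c = fp(x).

    Uses delta-x = 1, so the slope is fp(x + 1) - fp(x).
    """
    return footprint(trace, x + 1) - footprint(trace, x)
```

For the trace a b a b, fp(1) = 1 and fp(2) = fp(3) = 2, so the estimated miss ratio is 1 at c = 1 and 0 at c = 2: once the cache holds both items, the footprint stops growing.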

Theory is for Optimization
• Key-value store Memcached [USENIX’15]
  • DRAM as cache for database
  • optimization vs. heuristics by Facebook and Twitter
  • faster steady state/convergence on a Facebook test set
  • monotonicity: no Belady anomaly
• Concurrent memory allocation [see white paper]
  • optimization vs. Google’s tcmalloc
  • 26% higher throughput on 64-thread MongoDB
  • consistency: intermediate steps order insensitive
• Storage cache [Wires/Warfield et al. OSDI’14]
  • independent validation of footprint theory
• Other theories
  • optimal data placement [PLDI’04, POPL’06, POPL’16]
  • optimal collaborative caching [LCPC’08, ISMM’11/12/13]

Summary: Locality Theory/Optimization
• Locality theory
  • partly a streaming problem/solution
  • equivalent* definitions of locality
    • reuse distance, footprint, working set, miss ratio curve
• Possible uses in a streaming system
  • Nathan’s IPPD
  • memory resource steering
  • timescale statistics
  • user decision support
• A conjecture
  • memory: hierarchical and shared
  • timescale stream statistics: optimal sharing of a hierarchy
