SLIDE 1

Is Reuse Distance Applicable to Data Locality Analysis

  • n Chip Multiprocessors?

Yunlian Jiang Eddy Z. Zhang Kai Tian Xipeng Shen (presenter)

Department of Computer Science The College of William and Mary, VA, USA

SLIDE 2

The College of William and Mary

Cache Sharing

  • A common feature on modern CMPs

SLIDE 3

Data Locality

  • Extensively studied for uni-core processors
  • Two classes of metrics
  • At hardware level
  • E.g., cache miss rate
  • At program level
  • E.g., reuse distance

SLIDE 4

Reuse Distance (RD)

  • Def: # of distinct data between two adjacent ref. to a data element

  • E.g. b c a a c b rd=2

[Figure: RD histogram]

SLIDE 5

Reuse Distance (RD)

  • Def: # of distinct data between two adjacent ref. to a data element

  • E.g. b c a a c b rd=2
  • Appealing properties
  • Hardware-independence
  • Accurate, point to point
  • Cross-input predictable
  • Bounded value---data size
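The definition above can be made concrete with a minimal sketch. This is a naive O(n²) list-based simulation for illustration only, not the profiler used in the paper; practical tools use tree-based algorithms for large traces:

```python
def reuse_distances(trace):
    """Reuse distance of each access: # of distinct data elements
    referenced between this access and the previous access to the
    same element (None for a first access)."""
    distances = []
    stack = []  # elements ordered by most recent use, last = newest
    for x in trace:
        if x in stack:
            # distinct elements touched since the last access to x
            distances.append(len(stack) - stack.index(x) - 1)
            stack.remove(x)
        else:
            distances.append(None)
        stack.append(x)
    return distances

print(reuse_distances(list("bcaacb")))  # last access to b has rd = 2
```

On the slide's example trace b c a a c b, the second b sees {c, a} in between, giving rd = 2.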

SLIDE 6

Many Uses of Reuse Distance

  • Cross-arch performance prediction [Marin+:SIGMETRICS04, Zhong+:PACT03]

  • Model reference affinity [Zhong+:PLDI04]
  • Guide memory disambiguation [Fang+:PACT05]
  • Detect locality phases [Shen+:ASPLOS04]
  • Software refactoring [Beyls+:HPCC06]
  • Model cache sharing [Chandra+:HPCA05]
  • Study data reuses [Ding+:SC04,Huang+:ASPLOS05]
  • Insert cache hints [Beyls+:JSA05]
  • Manage superpages [Cascaval+:PACT05]

SLIDE 7

Complexity Caused by Cache Sharing

  • Data locality is not solely determined by a process itself
  • Accesses by its co-runners need to be considered

SLIDE 8

Questions to Answer

  • Is reuse distance applicable for locality characterization on CMP?

  • What are the new challenges?
  • Are these challenges addressable?

SLIDE 9

Outline

  • Complexities in extending reuse distance model to CMP
  • Loss of hardware-independence
  • A chicken-egg dilemma for performance prediction
  • Addressing the issues for some multithreading app.
  • A probabilistic model to derive reuse distance in co-runs
  • Evaluation
SLIDE 10

Terms

  • Concurrent reuse distance (CRD)
  • # of distinct data accessed by all co-runners between two adjacent ref. to a data element
  • Standalone reuse distance (SRD)
  • # of distinct data accessed by the current process between two adjacent ref. to a data element
  • Example: P1: a b b c d a; P2: p q p q
  • SRD = 3; CRD = 3 + 2 = 5
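The SRD/CRD distinction can be checked mechanically. The sketch below assumes a hypothetical interleaving of the example traces (P1: a b b c d a, P2: p q p q) in which all of P2's accesses fall inside P1's reuse interval of a:

```python
def srd_and_crd(global_trace, thread, elem):
    """For the first reuse of `elem` by `thread` in an interleaved
    (thread, address) trace: SRD counts distinct data accessed by
    `thread` alone in between; CRD counts distinct data accessed by
    all co-runners (no data sharing assumed here)."""
    refs = [i for i, (t, x) in enumerate(global_trace)
            if t == thread and x == elem]
    start, end = refs[0], refs[1]
    own, everyone = set(), set()
    for t, x in global_trace[start + 1:end]:
        everyone.add(x)
        if t == thread:
            own.add(x)
    return len(own), len(everyone)

# Hypothetical interleaving of P1: a b b c d a with P2: p q p q
trace = [(1, "a"), (2, "p"), (1, "b"), (2, "q"), (1, "b"),
         (2, "p"), (1, "c"), (2, "q"), (1, "d"), (1, "a")]
print(srd_and_crd(trace, 1, "a"))  # (3, 5): SRD = 3, CRD = 3 + 2 = 5
```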

SLIDE 11

Distinctive Property of CRD

  • Example: mem. references by P1 are a b c b a
  • SRD = 2; CRD = 2 + x, where x is the # of distinct data the co-runner accesses in the interval
  • Let r = speed(P2)/speed(P1): the larger r is, the greater x tends to be
  • CRD depends on the relative running speeds of co-runners

SLIDE 12

Two Implications

  • First, CRD is hard to measure in real programs.
  • Instrumentation changes relative speeds

  • Original relative speed: r = IPC_i / IPC_j
  • After instrumentation: r' = IPC'_i / IPC'_j
  • Change of relative speed: |r - r'| / r
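As a worked example of this formula, with made-up IPC numbers:

```python
def relative_speed_change(ipc_i, ipc_j, ipc_i_prime, ipc_j_prime):
    """|r - r'| / r, where r = IPC_i/IPC_j before instrumentation
    and r' = IPC'_i/IPC'_j after."""
    r = ipc_i / ipc_j
    r_prime = ipc_i_prime / ipc_j_prime
    return abs(r - r_prime) / r

# Hypothetical: instrumentation slows thread i by 4x but thread j by
# only 2x, so r = 2/1 = 2 becomes r' = 0.5/0.5 = 1: a 50% change.
print(relative_speed_change(2.0, 1.0, 0.5, 0.5))  # 0.5
```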

SLIDE 13
Two Implications (cont.)

  • Second, CRD loses hardware-independence.
  • Relative speeds change across architectures.

  • Consequence
  • Cross-arch. perf. pred. becomes hard for co-runs

SLIDE 14

Cross-Arch. Performance Prediction

  • For single runs: train an SRD → IPC predictor on the training platform, then apply it to the SRD measured on the testing platform.
  • For co-runs: the analogous CRD → IPC predictor needs CRD on the testing platform, but CRD there depends on the threads' relative speeds, hence on the very IPCs being predicted: a chicken-egg dilemma.

SLIDE 15

Iterative Approach Not Applicable

[Figure: training a CRD → IPC predictor on the training platform and applying it on the testing platform]

SLIDE 16

Iterative Approach Not Applicable

[Figure: the circular dependence on the testing platform: IPC(I), IPC(J) → CRD(I), CRD(J) → CacheMiss(I), CacheMiss(J) → IPC(I), IPC(J)]

SLIDE 17

Outline

  • Complexities in extending reuse distance model to CMP

  • Loss of hardware-independence
  • A chicken-egg dilemma for performance prediction
  • Addressing the issues for some multithreading app.
  • A probabilistic model to derive reuse distance in co-runs

  • Evaluation

SLIDE 18

Favorable Observations

  • From a systematic study [Zhang+:PPoPP'10] on PARSEC non-pipelining multithreading benchmarks:
  • All parallel threads of an app. conduct similar computations
  • Uniform relations among threads
  • These observations hold across arch., inputs, # of threads, thread-core assignments, and program phases.

SLIDE 19

Implication

  • Relative speeds among threads tend to remain the same across arch. and inputs.

SLIDE 20

An Efficient Way to Estimate CRD

SRD_T1, SRD_T2, ..., SRD_Tm  →  [prob. model]  →  CRD_T1, CRD_T2, ..., CRD_Tm

SLIDE 21

Two Steps

(1) From the interval length ∆, estimate d (# of distinct data accessed) for each thread

  • For a reuse interval of length ∆ in T1's trace (a ... a), obtain d_T1, d_T2, ..., d_Tm
  • Assuming no data sharing: CRD_T1 = d_T1 + d_T2 + ... + d_Tm

(2) Handle effects of data sharing
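Step (1) amounts to summing per-thread distinct counts over the same interval. A minimal sketch under the no-sharing assumption, using window contents taken from the earlier SRD/CRD example:

```python
def estimate_crd(windows):
    """CRD_T1 = d_T1 + d_T2 + ... + d_Tm: each window holds the data
    one thread accesses during T1's reuse interval; with no data
    sharing, CRD is the sum of the per-thread distinct counts."""
    return sum(len(set(w)) for w in windows)

# T1 accesses b b c d inside its reuse interval of a; T2 accesses p q p q.
print(estimate_crd([["b", "b", "c", "d"], ["p", "q", "p", "q"]]))  # 3 + 2 = 5
```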

SLIDE 22

Time Distance (TD)

  • Def: the # of accesses (repeats included, unlike RD) between two adjacent references to the same element
  • E.g. b c a a c b: td = 4 (rd = 2) for the reuse of b
  • TD Histogram (TDH): shows the probability for an access to have a certain TD
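A TDH is cheap to collect in one pass over a trace. A minimal sketch (a real profiler would bin the distances):

```python
from collections import Counter

def td_histogram(trace):
    """Time distance of a reuse: # of accesses (repeats included,
    unlike RD) between two adjacent references to the same element.
    Returns {td: fraction of reuses with that td}."""
    last_seen, tds = {}, []
    for i, x in enumerate(trace):
        if x in last_seen:
            tds.append(i - last_seen[x] - 1)
        last_seen[x] = i
    return {td: n / len(tds) for td, n in Counter(tds).items()}

print(td_histogram(list("bcaacb")))  # the reuse of b has td = 4
```

On the slide's trace b c a a c b, the three reuses have time distances 0 (a), 2 (c), and 4 (b), so each TD value holds a third of the probability mass.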

SLIDE 23

  • P_i(∆): probability for an object O_i to be referenced in a ∆-long interval
  • q_i(∆): O_i is accessed at time point ∆, but not at the ∆-1 points ahead

Recurrence:
  P_i(∆) = P_i(∆-1) + q_i(∆)
  P_i(∆-1) = P_i(∆-2) + q_i(∆-1)
  ...
  P_i(1) = P_i(0) + q_i(1)

Summing (with P_i(0) = 0):
  P_i(∆) = Σ_{τ=1..∆} q_i(τ)

SLIDE 24

  • q_i(τ): O_i is accessed at time point τ, but not at the τ-1 points ahead. It is equivalent to:
    1) the object accessed at τ is O_i, and
    2) the time distance of that reference is greater than τ.

  q_i(τ) = (n_i / T) · Σ_{δ=τ+1..T} H_i(δ)

Substituting into P_i(∆) = Σ_{τ=1..∆} q_i(τ):

  P_i(∆) = (n_i / T) · Σ_{τ=1..∆} Σ_{δ=τ+1..T} H_i(δ)

(n_i: # of accesses to O_i; T: trace length; H_i: the TDH of O_i)
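The two formulas translate directly into code. The sketch below feeds a hypothetical per-object TDH H_i (a dict mapping time distance to probability; the values are made up for illustration) into q_i and P_i:

```python
def q_i(tau, n_i, T, H_i):
    """q_i(tau) = (n_i / T) * sum_{delta = tau+1..T} H_i(delta):
    probability that the access at time tau is O_i and that its
    time distance exceeds tau."""
    return (n_i / T) * sum(H_i.get(d, 0.0) for d in range(tau + 1, T + 1))

def p_i(delta_len, n_i, T, H_i):
    """P_i(Delta) = sum_{tau = 1..Delta} q_i(tau): probability that
    O_i is referenced at least once in a Delta-long interval."""
    return sum(q_i(tau, n_i, T, H_i) for tau in range(1, delta_len + 1))

# Hypothetical object: 3 of the trace's 10 accesses touch O_i, and
# its reuses have time distance 2 or 4 with equal probability.
H = {2: 0.5, 4: 0.5}
print(p_i(1, 3, 10, H))  # q_i(1) = 0.3 * 1.0
print(p_i(2, 3, 10, H))  # adds q_i(2) = 0.3 * 0.5
```

Note that P_i(∆) is nondecreasing in ∆, as the recurrence on the previous slide requires.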

SLIDE 25

  • P(k, ∆): prob. for a ∆-long interval to contain k distinct data
  • d: # of distinct data referenced in a ∆-long interval

See paper for details.
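The paper derives P(k, ∆) in full. A common sketch of the same idea, under the simplifying assumption that each object is referenced independently with probability P_i(∆) (an assumption made here for illustration, not the paper's exact derivation), is the Poisson-binomial dynamic program:

```python
def distinct_distribution(p_values):
    """P(k, Delta): probability that a Delta-long interval contains
    exactly k distinct data objects, given p_values[i] = P_i(Delta)
    and assuming independence across objects (Poisson binomial DP)."""
    dist = [1.0]  # dist[k] = prob. of k distinct objects so far
    for p in p_values:
        nxt = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            nxt[k] += prob * (1 - p)   # this object absent from interval
            nxt[k + 1] += prob * p     # this object present in interval
        dist = nxt
    return dist

print(distinct_distribution([0.5, 0.5]))  # [0.25, 0.5, 0.25]
```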

SLIDE 26

Handling Data Sharing

  • Two effects from data sharing on CRD
  • Example: T1's references a b _ _ b _ c d _ a, interleaved with references X by T2
  • Scenario 1: Xs ∉ {a, b, c, d}: a b p q b p c d q a, CRD = 3 + 2 = 5
  • Scenario 2: a ∈ Xs: a b p a b p c d q a, the reuse interval breaks into 2 intervals
  • Scenario 3: {b, c, d} ∩ Xs ≠ ϕ: a b p c b p c d c a, CRD = 3 + 1 = 4; shared data already counted for T1 should not be counted again
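The three scenarios can be replayed on merged traces. In the sketch below, set semantics naturally counts shared data once, and a co-runner's access to the reused element itself splits the interval:

```python
def crd_of_reuses(merged, elem):
    """CRD of each reuse of `elem` in an interleaved trace: # of
    distinct data between adjacent references to `elem`.  Shared
    data is counted once; an intervening access to `elem` by a
    co-runner splits the reuse interval in two."""
    refs = [i for i, x in enumerate(merged) if x == elem]
    return [len(set(merged[s + 1:e])) for s, e in zip(refs, refs[1:])]

print(crd_of_reuses(list("abpqbpcdqa"), "a"))  # scenario 1: [5]
print(crd_of_reuses(list("abpabpcdqa"), "a"))  # scenario 2: [2, 5], interval broken
print(crd_of_reuses(list("abpcbpcdca"), "a"))  # scenario 3: [4]
```

In scenario 3, T2's accesses to c coincide with data T1 already touched, so only T2's p adds to the count: 3 + 1 = 4.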

SLIDE 27

Treating the Effects

  • Probability for a reuse interval to break
  • Probability for |C| = c (derived in the paper)

S: set of all shared data. N1, N2: data sizes of T1 and T2. n1, n2: # of distinct data accessed by T1 and T2 in an interval of length V. C: intersection of the data sets referenced by T1 and T2 in the interval.

See paper for details.

SLIDE 28

Estimation Accuracy of CRD Histograms

  • On synthetic traces
  • s: sharing ratio; n1, n2: data sizes

SLIDE 29

On Traces of Real Programs

  • Using a simulator to record traces
  • SIMICS with GEMS
  • Simulate UltraSPARC with 1MB shared L2 cache.
  • Three PARSEC programs
  • vips (image processing)
  • negligible shared data, 33,000 locks
  • accuracy 76%
  • swaptions (portfolio pricing)
  • 27% shared data, 23 locks
  • accuracy 74%
  • streamcluster (online clustering)
  • 3% shared data, 129,600 barriers
  • accuracy 72%

SLIDE 30

Related Work

  • All-window profiling [Ding and Chilimbi]
  • Predict cache misses of co-runs from circular stack distance histograms [Chandra et al., Chen & Aamodt]

  • Statistical shared cache model [Berg et al.]

SLIDE 31

Conclusions

  • Is reuse distance applicable for locality characterization on CMP?
  • Difficult in general.
  • What are the new challenges?
  • Reliance on relative speeds; loss of hardware-independence; a chicken-egg dilemma for performance prediction.
  • Are these challenges addressable?
  • Yes, for a class of multithreading applications: a probabilistic model facilitates the derivation of CRD.

SLIDE 32

Thanks!


Questions?