SLIDE 1

Program Interaction on Shared Cache: Theory and Applications

Chen Ding, Professor, Department of Computer Science, University of Rochester

Discovery of Locality-Improving Refactorings by Reuse Path Analysis – Kristof Beyls – HPCC06 – 2006-09-13


Illustration: bottlenecks of SPEC2000 on Itanium1

[Bar chart: relative execution time (0% to 100%) of programs from SPEC2000, split into data cache miss vs. other bottlenecks.]

Chen Ding, DragonStar lecture, ICT 2008

Madison Itanium 2, 2002: L3 Cache

Anant Agarwal, MIT 6.975, 2007
http://cse1.net/

“Nothing travels faster than the speed of light ...” (Douglas Adams)

Matthew Hertz’s beer; Trishul Chilimbi’s cliff

Chen’s platform. Key problems: latency/bandwidth, capacity, sharing.

Chen Ding, University of Rochester, PMAM 2014

http://en.wikipedia.org/wiki/File:Cache,missrate.png


Cache Performance for SPEC CPU2000 Benchmarks, Version 3.0, May 2003. Jason F. Cantin, Department of Electrical and Computer Engineering, University of Wisconsin-Madison (jcantin@ece.wisc.edu) and Mark D. Hill, Department of Computer Science, University of Wisconsin-Madison (markhill@cs.wisc.edu). http://www.cs.wisc.edu/multifacet/misc/spec2000cache-data

SLIDE 2


D-cache misses/inst: 1,197,717,058,456 data refs (0.34534/inst); 782,173,506,477 D-cache 64-byte block accesses (0.22949/inst). D-cache miss ratios:

| Size  | Direct    | 2-way LRU | 4-way LRU | 8-way LRU | Full LRU  |
|-------|-----------|-----------|-----------|-----------|-----------|
| 1KB   | 0.0890418 | 0.0762018 | 0.0699370 | 0.0657938 | 0.0652996 |
| 2KB   | 0.0651636 | 0.0533596 | 0.0486152 | 0.0462573 | 0.0453232 |
| 4KB   | 0.0480381 | 0.0386862 | 0.0353534 | 0.0337222 | 0.0325938 |
| 8KB   | 0.0362358 | 0.0290652 | 0.0264135 | 0.0254564 | 0.0245702 |
| 16KB  | 0.0277699 | 0.0227735 | 0.0211365 | 0.0204821 | 0.0196992 |
| 32KB  | 0.0223409 | 0.0190920 | 0.0181803 | 0.0179048 | 0.0175964 |
| 64KB  | 0.0189635 | 0.0166430 | 0.0161909 | 0.0160494 | 0.0159076 |
| 128KB | 0.0158796 | 0.0147737 | 0.0144648 | 0.0143748 | 0.0142985 |
| 256KB | 0.0138840 | 0.0131826 | 0.0130735 | 0.0130274 | 0.0130001 |
| 512KB | 0.0119997 | 0.0115157 | 0.0114489 | 0.0114018 | 0.0113629 |
| 1MB   | 0.0096151 | 0.0094354 | 0.0092640 | 0.0093510 | 0.0093828 |

Compulsory miss ratio: 0.0000150365

Benchmarks: 12. Simulation time: 1463.66 days (4.007 years). File created 5/23/2003.

Program Locality: Reuse Distance


A Metric and A Tool Box

  • Reuse distance
  • independent of coding styles, memory allocation, or hardware
  • possible to correlate between different runs
  • pattern analysis
  • aggregate or temporal
  • cross-program inputs
  • Single basis for analysis/optimization
  • to analyze
  • to compose and decompose reuse distance
  • to optimize
  • to shorten long reuse distance

Example trace:   a b c a a c b
Reuse distances: ∞ ∞ ∞ 2 0 1 2
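The reuse distances above follow directly from the LRU stack definition; a minimal Python sketch (an illustration, not the deck's own tool):

```python
def reuse_distances(trace):
    """LRU stack distance: the number of distinct data accessed between
    an access and the previous access to the same datum; infinity marks
    a first access (a cold miss in a cache of any size)."""
    stack = []                       # LRU stack, most recent at the end
    dists = []
    for x in trace:
        if x in stack:
            # distinct data touched since the last access to x
            dists.append(len(stack) - 1 - stack.index(x))
            stack.remove(x)
        else:
            dists.append(float("inf"))
        stack.append(x)              # x becomes the most recent
    return dists

inf = float("inf")
print(reuse_distances(list("abcaacb")))  # [inf, inf, inf, 2, 0, 1, 2]
```

This naive version costs O(N) per access, which motivates the faster algorithms on the next slide.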

The SLO Tool by Beyls and D’Hollander

  • SLO - Suggestions for Locality Optimizations:

http://slo.sourceforge.net

  • An example: 173.APPLU from SPEC 2K


Measuring Reuse Distance

  • Naive counting: O(N) time per access, O(N) space
  • N is the number of memory accesses
  • M is the number of distinct data elements
  • Too costly: N is up to 120 billion, M up to 25 million

Reuse Distance Measurement

Measurement algorithms since 1970:

| Algorithm | Time | Space |
|---|---|---|
| Naive counting | O(N²) | O(N) |
| Trace as a stack [IBM’70] | O(NM) | O(M) |
| Trace as a vector [IBM’75, Illinois’02] | O(N log N) | O(N) |
| Trace as a tree [LBNL’81], splay tree [Michigan’93], interval tree [Illinois’02] | O(N log M) | O(M) |
| Fixed cache sizes [Wisconsin’91] | O(N) | O(C) |
| Approximation tree [Rochester’03] | O(N log log M) | O(log M) |
| Approx. using time [Rochester’07] | O(N) | O(1) |

N is the length of the trace. M is the size of data. C is the size of cache.
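The "trace as a vector" row can be sketched with a Fenwick (binary indexed) tree that keeps one marker per datum at its last access time, giving the O(N log N) time and O(N) space bound; this is an illustrative reconstruction of the technique, not the cited implementations:

```python
class Fenwick:
    """Binary indexed tree over positions 1..n: point update and
    prefix sum, each in O(log n)."""
    def __init__(self, n):
        self.t = [0] * (n + 1)
    def add(self, i, v):
        while i < len(self.t):
            self.t[i] += v
            i += i & -i
    def prefix(self, i):
        s = 0
        while i > 0:
            s += self.t[i]
            i -= i & -i
        return s

def reuse_distances(trace):
    """One marker per datum at its last access time; the reuse distance
    is the number of markers strictly between two accesses to the datum."""
    bit, last, out = Fenwick(len(trace)), {}, []
    for t, x in enumerate(trace, 1):
        if x in last:
            p = last[x]
            out.append(bit.prefix(t - 1) - bit.prefix(p))
            bit.add(p, -1)           # move x's marker forward to time t
        else:
            out.append(float("inf"))
        bit.add(t, 1)
        last[x] = t
    return out
```

On the earlier example trace `a b c a a c b` it reproduces the distances ∞ ∞ ∞ 2 0 1 2.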

SLIDE 3


Program locality analysis using reuse distance

Yutao Zhong (George Mason University, Fairfax, VA), Xipeng Shen (The College of William and Mary, Williamsburg, VA), Chen Ding (University of Rochester, Rochester, NY). ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 31, Issue 6, August 2009. doi:10.1145/1552309.1552310

Analysis Speed

| benchmark | length (64B lines) | data size (64B lines) | unmodified time (sec) | FP time (sec) | FP cost (X) | RD time (sec) | RD cost (X) | LF time (sec) | LF cost (X) |
|---|---|---|---|---|---|---|---|---|---|
| 176.gcc | 1.10E+10 | 3.99E+06 | 85.1 | 345 | 4.1 | 2,392 | 28.1 | 5,489 | 65 |
| 181.mcf | 1.88E+10 | 2.52E+06 | 398 | 1,126 | 2.8 | 10,523 | 26.4 | 121,818 | 306 |
| 164.gzip | 2.00E+10 | 1.41E+06 | 150 | 501 | 3.3 | 5,823 | 38.8 | 44,379 | 296 |
| 252.eon | 2.51E+10 | 1.54E+04 | 77.4 | 503 | 6.5 | 5,950 | 76.9 | n/a | n/a |
| 256.bzip2 | 3.20E+10 | 1.47E+06 | 173 | 726 | 4.2 | 7,795 | 45.1 | 36,428 | 211 |
| 175.vpr | 3.56E+10 | 5.08E+04 | 210 | 964 | 4.6 | 13,654 | 65.0 | 51,867 | 247 |
| 186.crafty | 5.31E+10 | 3.20E+04 | 75.5 | 1,653 | 21.9 | 18,841 | 249.5 | 117,473 | 1,556 |
| 300.twolf | 1.08E+11 | 9.47E+04 | 368 | 2,979 | 8.1 | 27,765 | 75.4 | 155,793 | 423 |
| 197.parser | 1.22E+11 | 6.52E+05 | 230 | 3,122 | 13.6 | 35,562 | 154.6 | 106,198 | 462 |
| 2K INT avg | 4.73E+10 | 1.14E+06 | 196 | 1,324 | 8 | 14,256 | 84 | 79,931 | 446 |
| 179.art | 1.20E+10 | 5.93E+04 | 591 | 734 | 1.2 | 4,032 | 6.8 | 36,926 | 62 |
| 183.equake | 4.72E+10 | 7.96E+05 | 103 | 960 | 9.3 | 12,251 | 118.9 | 103,931 | 1,009 |

3m16s vs. 3h57m, 47 billion accesses


IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-6, NO. 1, JANUARY 1980

If r(t) is not in the resident set established at time t − 1, a segment (or page) fault occurs at time t. This fault interrupts the program until the missing segment can be loaded in the resident set. Segments made resident by the fault mechanism are “loaded on demand” (others are “preloaded”).

The memory policies of interest here determine the content of the resident set by loading segments on demand and then deciding when to remove them. To save initial segment faults, some memory policies also swap in an initial resident set just prior to starting a program. (Easton and Fagin refer to the case of an empty initial resident set as a “cold start,” and an initially nonempty resident set as a “warm start” [60].)

The memory policy’s control parameter, denoted θ, is used to trade paging load against resident set size. For the working set policy, but not necessarily for others, larger values of θ usually produce larger mean resident set sizes in return for longer mean interfault times. (See [66].) In principle, θ could be generalized to a set of parameters, e.g., a separate parameter for each segment; but no one has found a multiple-parameter policy that improves significantly over all single-parameter policies.

The performance of a memory policy can be expressed through its swapping curve, which is a function f relating the rate of segment faults to the size of the resident set. A fixed-space memory policy, a concept usually restricted to paging, interprets the control parameter θ as the size of the resident set; in this case the swapping curve f(θ) specifies the corresponding rate of page faults. A variable-space memory policy uses the control parameter θ to determine a bound on the residence times of segments. Thus a value of θ implicitly determines a mean resident set size x, and also a rate of segment faults y; the swapping curve, y = f(x), is determined parametrically from the set of (x, y) points generated for the various θ. (See [53].)

One of the parameters needed in a queuing network model

of a multiprogramming system is the paging rate [47]-[49], [52]. This parameter is easily determined from the lifetime curve, which is the function g(x) = 1/f(x) giving the mean number of references between segment faults when the mean resident set size is x. Lifetime curves for individual programs under given memory policies are easy to measure. A knee of the lifetime curve is a point at which g(x)/x is locally maximum, and the primary knee is the global maximum of g(x)/x. (See Fig. 2.)

A memory policy’s resident set at virtual time t for control parameter θ is denoted R(t, θ). A memory policy satisfies the inclusion property if R(t, θ) ⊆ R(t, θ + a) for a > 0. This means that, for increasing θ, the mean resident set size never decreases and the rate of segment faults never increases. In Fig. 2, this means that the lifetime curve increases uniformly as θ increases. (See [52], [53], [66].)

Several empirical models of the lifetime curve have been proposed. One is the Belady model [15]

    g(x) = a · x^k

where x is the mean resident set size, a is a constant, and k is normally between 1.5 and 3 (a and k depend on the program). This model is often a reasonable approximation of the portion

[Fig. 2. A lifetime curve: time per fault g(x) vs. mean resident set size x, with the primary and secondary knees marked.]
of the lifetime curve below the primary knee, but it is otherwise poor ([49], [117]).¹ A second model is the Chamberlin model [28]

    g(x) = (T/2) / (1 + (d/x)²)

where T is the program execution time and d is the resident set size at which lifetime is T/2. Though this function has a knee, it is a poor match for real programs. The recent empirical studies by Burgevin, Lenfant, and Leroudier contain many interesting observations about and refinements of these models ([81], [83]). Since it is quite easy to measure lifetime curves [52], [53], [58], I have greater confidence in results when the model parameters are derived from real data rather than estimated from the models. Since optimal performance is associated with the knees of lifetime functions [51], [73], [74], I am hesitant to use lifetime curve models that have no knees.

It is well to remember that a lifetime (or swapping) curve is an average for an interval of program execution. If the program's behavior during a subinterval can differ significantly from the average, conclusions based on its lifetime function may be inaccurate. For example, a temporary overload of the swapping device may be caused by a burst of segment faults, an event that might not be predicted if the mean lifetime is long.

Space-Time Product

A program's space-time product is the integral of its resi-

dent set size over the time it is running or waiting for a missing 'Easton and Fagin have found that the quality of the Belady model

improves on changing from an assumption of "cold start" (resident set initially empty) to "warm start" [60]; however, the "warm start" merely increases the height of the primary knee without significantly changing the knee's resident set size. (See also [73], [78], [1171.) Parent and Potier observed that the overhead of swapping can cause programs conforming to the Belady model to exhibit lifetime curves, measured while the system is in operation, with flattening beyond the primary knee [95], [971; however, real programs exhibit flattening beyond the primary knee even if all the faults normally caused by

initial references are ignored. (See [73], [78], [115], [117].) 66

IEEE TRANSACTIONS ON COMPUTERS, VOL. 38, NO. 12, DECEMBER 1989

[Fig. 11. Predicted (dashed) and actual (solid) miss ratios for trace “mul2” with caches of associativity 1, 2, 4, and 8. (a) Smaller caches. (b) Larger caches.]

having the same capacity, the same block size, and miss ratios m(A = n) and m(A = 2n). Let the miss ratio spread be the ratio of the miss ratios, less one:

    [ m(A = n) − m(A = 2n) ] / m(A = 2n) = m(A = n) / m(A = 2n) − 1


[Fig. 12. Unified cache miss ratio spreads (solid lines are smoothed data). A line labeled “2n-to-n” displays [m(A = n) − m(A = 2n)]/m(A = 2n), where m(A = n) is the miss ratio of an n-way set-associative cache. (a) Five-trace group. (b) 23-trace group.]

Figs. 12 and 13 and Table IV present data from trace-driven simulation. As discussed in Section III, data for larger caches are subject to more error than data for smaller caches, and measurements for caches larger than 64K should be treated with considerable caution. Fig. 12 shows some miss ratio ...

http://en.wikipedia.org/wiki/Fifth_dimension

  • 1. Input
  • 2. Data
  • 3. Code
  • 4. Time
  • 5. Environment

Locality

  • whole-program locality [PLDI’03, PACT’03, LACSI’03, TOC’07, TOPLAS’09]
  • reference affinity [PLDI’04, ICS’05, POPL’06]
  • program optimization and tuning [JPDC’04, ISMM’09, ISMM’11, ISMM’12]
  • locality phases, dynamic optimization [PLDI’99, ASPLOS’04, ExpCS’07, JPDC’07]
  • data, cache, and memory sharing [ISMM’06, PPOPP’11, PACT’11, CCGrid’12, CGO’13, ASPLOS’13]
  • active sharing (now)

The End of Cache Monopoly

  • Multicore
  • desktop, cloud, and handheld
  • Multicore cache
  • a mixture of private/shared caches
  • Intel Nehalem: 256KB private L2, 4MB to 8MB shared L3
  • IBM Power 7: 256KB private L2, 32MB shared eDRAM L3
  • eDRAM to appear on Intel processors
  • New problems
  • available cache resource is variable
  • not the full size, not a constant size
  • not just performance but also stability
  • not just parallel programs but also sequential programs


The End of Cache Monopoly (by Henry Kautz)


SLIDE 4


results collected by Bin Bao


Old Wine in a New Bottle?

  • Time sharing systems (Multics)
  • memory sharing
  • well studied and solved
  • routine in a modern OS
  • Cache sharing is more complex
  • hardware managed
  • coffee cup analogy
  • levels, private/shared
  • more frequent access
  • content wiped out in 1ms
  • can’t buy more cache
  • asymmetry/circular feedback


program 1:        a b c d e f a
program 2:        k m m m n o n
interleaved 1&2:  a k b c m d m e m f n o n a

For program 1’s reuse of a: solo reuse distance rd = 5; peer footprint in the same window ft = 4; shared-cache reuse distance rd’ = rd + ft = 9.

  • Private cache locality: P( capacity miss by me ) = P( my reuse distance >= cache size )
  • Shared cache locality: P( capacity miss by me ) = P( my reuse distance + peer footprint >= cache size )
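The interleaved example can be checked directly: the shared-cache reuse distance is the solo reuse distance plus the peer's footprint in the same interval. A small sketch (the variable names are illustrative):

```python
def footprint(window):
    """Footprint of an interval: the number of distinct data in it."""
    return len(set(window))

my_window   = "bcdef"    # program 1's data between its two accesses of 'a'
peer_window = "kmmmnon"  # program 2's accesses in the same interval

rd = footprint(my_window)    # solo reuse distance: 5
ft = footprint(peer_window)  # peer footprint {k, m, n, o}: 4
rd_shared = rd + ft          # shared-cache reuse distance: 9
print(rd, ft, rd_shared)     # 5 4 9
```

With rd_shared = 9, the reuse of `a` misses in any shared cache smaller than 9 blocks, although it would hit in a private cache of 6 blocks.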


Footprint Locality

[Ding, Xiang, et al. PPOPP 2008/11, PACT 11, ASPLOS 13]


Footprint

  • Example: “abbb”
  • 3 length-2 windows: “ab”, “bb”, “bb”
  • footprints 2, 1, 1
  • the average fp(2) = (2 + 1 + 1)/3 = 4/3
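The all-window average footprint of the example can be checked by brute-force enumeration; a sketch:

```python
def avg_footprint(trace, w):
    """Average number of distinct data over all length-w windows."""
    windows = [trace[i:i + w] for i in range(len(trace) - w + 1)]
    return sum(len(set(win)) for win in windows) / len(windows)

# The slide's example: "abbb" has length-2 windows "ab", "bb", "bb"
# with footprints 2, 1, 1, so fp(2) = (2 + 1 + 1)/3 = 4/3.
print(avg_footprint("abbb", 2))   # 1.3333333333333333
```

Enumerating all windows is quadratic in the trace length, which is what the measurement results on the following slides improve on.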


[Figure: all-window average footprint vs. window size for the example trace.]

SLIDE 5


Footprint Measurement 1972 - 2007

  • Working set
  • limit value in an infinitely long trace [Denning & Schwartz 1972]
  • Direct counting
  • single window size [Thiebaut & Stone TOCS’87]
  • seminal paper on footprints in shared cache
  • same starting point [Agarwal & Hennessy TOCS’88]
  • Statistical approximation
  • [Denning & Schwartz 1972; Suh et al. ICS’01; Berg & Hagersten PASS’04; Chandra et al. HPCA’05; Shen et al. POPL’07]

  • level of precision couldn’t be directly checked
  • No precise definition/solution for all windows
  • can’t be measured for real
  • can’t know the accuracy of an estimate


Footprint Measurement 2008 - 2013

  • Footprint distribution
  • all-window enumeration [Ding/Chilimbi PPOPP 2008]

  • max/min/median/percentiles
  • trace compression [Xiang+ PPOPP 11]
  • 70X speedup
  • 4 hours per program
  • Average footprint [Xiang+ PACT 11]
  • Xiang formula
  • 22 minutes per program
  • Footprint Sampling [Xiang+ ASPLOS 13]
  • shadow profiling
  • 0.5%


Xiaoya Xiang
  • HUST BS 2005
  • ICT MS 2008
  • Rochester PhD (expected)
  • Twitter 2013


Composability (组合性): solo footprint, solo miss rate → co-run footprint, co-run miss rate. Which metrics are composable and which are not? (X ? ?)


Footprint to Miss Rate Conversion

  • average time between misses: im(c) = Δx/Δy
  • miss rate at size c: mr(c) = Δy/Δx

[Figure: average footprint vs. window size for 403.gcc; a cache size c on the footprint axis fixes the point where a footprint increment Δy corresponds to a window-size increment Δx.]


The Xiang formula for average footprint [PACT’11]

  • rt: reuse time
  • m: data size
  • n: trace length

Conversion Formulas


fp(x) ≈ m − Σ_{k=x+1}^{n−1} (k − x) · P(rt = k)

mr(c) = mr(fp(x)) = [ fp(x + Δx) − fp(x) ] / Δx

P(rd = c) = mr(c − 1) − mr(c)
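The footprint formula can be checked against brute-force window enumeration. The sketch below uses the reuse-time term shown above plus the first- and last-access boundary terms described in the PACT'11 paper; it is written from the published description, not the authors' code:

```python
def fp_xiang(trace, w):
    """Average footprint fp(w) from the reuse-time histogram:
    fp(w) = m - ( sum over reuses of max(0, rt - w)
                + sum over data of max(0, first_i - w)
                + sum over data of max(0, n - w + 1 - last_i) ) / (n - w + 1)
    where rt is the reuse time, m the data size, n the trace length."""
    n, prev, total = len(trace), {}, 0
    for t, x in enumerate(trace, 1):
        if x in prev:
            total += max(0, (t - prev[x]) - w)   # reuse-time term
        else:
            total += max(0, t - w)               # first-access term
        prev[x] = t
    for t in prev.values():
        total += max(0, n - w + 1 - t)           # last-access term
    return len(prev) - total / (n - w + 1)

def fp_brute(trace, w):
    """All-window enumeration, for checking."""
    wins = [trace[i:i + w] for i in range(len(trace) - w + 1)]
    return sum(len(set(win)) for win in wins) / len(wins)

for w in (1, 2, 3, 4):
    assert abs(fp_xiang("abcaacb", w) - fp_brute("abcaacb", w)) < 1e-12
```

The histogram version makes one linear pass over the trace, which is what brings the cost down from all-window enumeration to the reported 22 minutes per program.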


Composition + Conversion

footprint composition: individual footprints → combined footprint
metrics conversion: footprint → miss ratio → reuse distance, for solo runs (solo-run miss ratio, private reuse distance, PRD) and co-runs (co-run miss ratio, concurrent reuse distance, CRD)
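The two steps can be sketched over discretized footprint curves, assuming symmetric progress and unit window steps (the function names are illustrative, not the paper's procedure):

```python
def compose(fp_curves):
    """Composition: the combined footprint of co-running programs is,
    at each window size, the sum of their individual average footprints."""
    return [sum(vals) for vals in zip(*fp_curves)]

def miss_ratio(fp, cache_size):
    """Conversion: the miss ratio at cache size c is the growth rate
    Δy/Δx of the footprint curve where it crosses c (here Δx = 1)."""
    for x in range(len(fp) - 1):
        if fp[x + 1] >= cache_size:
            return fp[x + 1] - fp[x]
    return 0.0            # the footprint never reaches the cache size

solo = [0, 1, 2, 3, 4, 4, 4]   # a toy per-program footprint curve
both = compose([solo, solo])   # [0, 2, 4, 6, 8, 8, 8]
print(miss_ratio(solo, 4), miss_ratio(both, 4))   # 1 2
```

The combined curve reaches the cache size at a shorter window, where it is still growing steeply, so the predicted co-run miss ratio is higher than the solo miss ratio.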

SLIDE 6


Reality Check

  • 20 SPEC 2006 programs
  • 190 different pair runs
  • Modeling
  • per program footprint
  • composition
  • a few hours
  • prediction for all cache sizes
  • Exhaustive parallel testing
  • 190 pair runs
  • 380 hw counter reads (OFFCORE.DATA_IN, 8MB 16-way L3)
  • ~9 days total CPU time


[Scatter plot: co-run miss ratio (%) for the 190 pair tests, hardware counter vs. prediction, log scale from 1e−5 to 10.]


[Scatter plot: co-run miss ratio (%) for the 190 pair tests, hardware counter vs. prediction, linear scale from 5% to 20%.]

half percent time, half percent error

[Bar chart: per-program miss ratio (%) when co-running with libquantum, hardware counter vs. prediction, for the 20 SPEC 2006 programs.]

Co-run interference of libquantum; high miss ratio, zero sensitivity; measured miss ratio 17.82% to 17.89%, predicted 17.94% to 17.94%


[Bar chart: per-program miss ratio (%) when co-running with gamess, hardware counter vs. prediction, for the 20 SPEC 2006 programs.]

Co-run interference of gamess; low miss ratio, high sensitivity; measured miss ratio 0.0002% to 0.04%, predicted 0.000013% to 0.03%


Denning’s Law of Locality

What’s the relation between reuse frequency and footprint?
  • Limit value [Denning and Schwartz, CACM 1972]
  • Time-space [Denning and Slutz, CACM 1978]
  • All program traces [Rochester, ASPLOS 2013]

abc ... abc ... aaa ... bbb ...

SLIDE 7


An Old Open Question

How quickly can we measure the miss rate for all cache sizes?

3000+ Cache Sizes: In the analysis, the footprint and reuse distance numbers are binned using logarithmic ranges as follows. For each power-of-two range, we sub-divide it into 256 equal-size increments. As a result, we can predict the miss ratio not just for power-of-two cache sizes, but for 3073 cache sizes between 16KB and 64MB.

Xiang et al. ASPLOS 13 (Tongxin Bai’s tool)
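The arithmetic checks out: 16KB to 64MB spans 12 power-of-two ranges, 12 × 256 = 3072 increments, plus the upper endpoint gives 3073 sizes. A sketch of the binning (the function name is illustrative):

```python
def log_linear_sizes(lo, hi, subdivisions=256):
    """Cache sizes from lo to hi: each power-of-two range [c, 2c)
    is split into `subdivisions` equal-size increments."""
    sizes, c = [], lo
    while c < hi:
        step = c // subdivisions      # the range [c, 2c) has width c
        sizes.extend(c + i * step for i in range(subdivisions))
        c *= 2
    sizes.append(hi)
    return sizes

sizes = log_linear_sizes(16 * 1024, 64 * 1024 * 1024)
print(len(sizes))                     # 3073, as on the slide
```

The log-linear grid keeps the relative spacing of cache sizes constant, so small caches are sampled as finely (in relative terms) as large ones.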


Vivek Sarkar, Houston Museum of Natural Science: from dinosaur to computer scientist


An Old Open Question

What’s the relation between miss rate and cache pressure? Does a higher miss rate mean higher pressure?

IBM University Days, April 2012 Chen Ding

Miss Ratio vs Pressure, 32KB Cache

[Scatter plot: sensitivity (miss ratio, %) vs. pressure (cache fill rate, % per microsecond) for the 20 SPEC 2006 programs.]

Miss Ratio vs Pressure, 4MB Cache

[Scatter plot: sensitivity (miss ratio, %) vs. pressure (cache fill rate, % per microsecond) for the 20 SPEC 2006 programs.]

SLIDE 8


An Old Open Question

Is there a machine independent way to compare program behavior in shared cache? How do programs in different domains differ?


An Old Open Question

Does LRU cache produce the optimal partition? [Thiébaut and Stone, 1992]

The second type of sharing happens between the instructions and the data of a program. Stone et al. [1992] investigated whether LRU produces the optimal allocation. Assuming that the miss rate functions for instruction and data are continuous and differentiable, the optimal allocation happens at the points “when miss-rate derivatives are equal” [Thiébaut and Stone, 1992]. The miss rate functions, one for instruction and one for data, were modeled instead of measured. The authors showed that LRU is not optimal, but left open the question whether there is a bound on how close the LRU allocation is to the optimal allocation. The pressure model in Chapter 4 can be used to compute the cache allocation and therefore answer the open question for any group of programs.

still open


Thread 1    | a b c a b c a b c
Hint Bit    | 0 1 0 1 0 1 0 1 0
Access Bit  | 1 0 1 0 1 0 1 0 1
Misses      | M M M M M M
------------|------------------
Thread 2    | x y z x y z x y z
Hint Bit    | 0 1 0 1 0 1 0 1 0
Access Bit  | 1 0 1 0 1 0 1 0 1
Misses      | M M M M M M
============|==================
Two threads, each accessing three elements and sharing a two-element cache. Best per-thread and overall cache utilization: 50% miss rate for each program.
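One way to read the hint bit is the LRU-MRU idea from Gu's collaborative caching work: a hinted access inserts its block at the LRU end of the stack, volunteering it for eviction. The minimal simulation below is a sketch of that policy on a single cyclic trace, not the exact two-thread hardware of the table above:

```python
def simulate(trace, hints, capacity):
    """Collaborative LRU-MRU cache: a normal access (hint 0) inserts at
    the MRU end as usual; a hinted access (hint 1) inserts at the LRU
    end, so its block is the next eviction victim."""
    cache, misses = [], 0          # index 0 = LRU end, evicted first
    for x, h in zip(trace, hints):
        if x in cache:
            cache.remove(x)
        else:
            misses += 1
            if len(cache) == capacity:
                cache.pop(0)
        if h:
            cache.insert(0, x)     # hinted: park at the LRU position
        else:
            cache.append(x)        # normal: most recently used
    return misses

# Cyclic reuse of 3 blocks in a 2-block cache: plain LRU always misses;
# alternating hints keep part of the loop resident.
print(simulate("abcabcabc", [0] * 9, 2))                 # 9
print(simulate("abcabcabc", [0, 1, 0, 1, 0, 1, 0, 1, 0], 2))  # 6
```

Plain LRU thrashes the cyclic trace (9 misses out of 9); with alternating hints, every third access hits, illustrating how hints can beat LRU when the reuse distance exceeds the cache size.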

Collaborative Rationing

Jacob Brock and Raj Parihar


Optimal Collaborative Caching: Theory and Applications, by Xiaoming Gu. PhD dissertation, Department of Computer Science, Edmund A. Hajim School of Engineering & Applied Sciences, University of Rochester, 2013. Supervised by Professor Chen Ding.

Maximal cache performance? Miss rate at all cache sizes? Answer: the LRU-MRU (Gu) distance.

[Gu et al. ISMM 2012, Rochester Dissertation 2013]

On-going Studies: Shared Footprint Analysis, with Hao Luo and Pengcheng Li


[Figure: shared footprint (10MB to 50MB) vs. window size (1 to 1e+09) for ferret and dedup; curves for measured and predicted max and min over four-thread groups.]

All thread-group locality prediction: min/max locality in all 70 four-thread groups for two PARSEC programs with 8 asymmetric threads.

SLIDE 9

Peer-Aware Program Optimization

Bin Bao (advisor: Chen Ding)


Recent Developments

  • Competitiveness, politeness, sensitivity
  • Jiang et al. [TPDS’11, HiPEAC’10]
  • Intensity and sensitivity
  • Zhuravlev et al. [ASPLOS’10]
  • Niceness, pressure and sensitivity
  • Mars et al. [CGO’12, Micro’12]
  • Interference of cache
  • composable models [Stone+ TOCS’87/TOC’92; Suh+ ICS’01; Chandra+ HPCA’05; Xiang+ PPOPP’11/PACT’11/ASPLOS’13]
  • threaded code [Ding/Chilimbi MSR’09, Jiang+ CC’10/TPDS’12, Schuff+ PACT’10, Wu/Yeung PACT’11/ISCA’13]

  • Interference model of execution time/speed
  • bubble-up [Mars+ Micro’12, ISCA’13]
  • QoS-aware scheduling [Delimitrou/Kozyrakis ASPLOS’13]


Recent Developments [cont’d]

  • Parallel reuse distance measurement
  • cluster [OSU, IPDPS 2012]
  • GPU [ICT and NCSU, IPDPS 2012]
  • sampling
  • footprint shadow sampling [Rochester, ASPLOS 2013]
  • multicore reuse distance [Purdue, PACT 2010]
  • reuse distance sampling [Chang & Zhong, PACT 2008]
  • Reuse distance in threaded code
  • multicore reuse distance [Purdue, PACT 2010]
  • CRD/PRD scaling [Maryland, ISCA 2013, to appear]


Recent Developments (cont’d)

  • Asymptotic locality effect in parallel algorithms
  • Leslie Valiant, PACT 2011 keynote
  • Guy Blelloch et al., CMU, MIT, Intel Labs Pittsburgh [MSPC 2013]
  • Maurice Herlihy and student [PPOPP 2014]
  • Shared footprint [Rochester, WODA 2013]
  • Static reuse distance analysis in Matlab [Indiana, ICS 2010]
  • Static footprint analysis [Rochester, CGO 2013]
  • peer-aware program optimization [Bao, dissertation’13]
  • Collaborative caching
  • practical uses [UT, Ghent, Google etc]
  • optimal collaborative LRU cache [Gu, ISMM’11/12/13, dissertation’13]


Summary

  • Program interaction in multicore
  • data sharing in threaded code
  • cache and memory bandwidth sharing by all programs
  • Locality theory
  • working set, footprint, shared footprint
  • metrics composition and conversion
  • higher order theory of cache locality (HOTL)
  • Recent research
  • locality in parallel algorithms
  • peer-aware program optimization
  • sharing conscious task scheduling
  • collaborative caching
