slide-1
SLIDE 1

Hardware Modeling 2 Cache Analyses

Peter Puschner

slides credits: P. Puschner, R. Kirner, B. Huber

VU 2.0 182.101 SS 2015

slide-2
SLIDE 2

Recap: Caches in WCET Analysis

Purpose: bridge the gap between fast CPU and memory

  • Essential to analyze caches on many architectures. Example: 40 cycles for a miss on MPC755
  • What: instructions, data, BTB, TLB
  • Design: direct mapped, set/fully associative
  • Replacement policy: LRU, FIFO, PLRU, PRR
  • More characteristics: read-only / write-through / write-back, write (no) allocate, multi-level caches (inclusive/exclusive), ...

2

slide-3
SLIDE 3

Caches in WCET Analysis

For software running on hardware with caches, computing the WCET by IPET alone (CFG + CCG) becomes too complex. Ignoring caches leads to unacceptable overestimations.

→ Decomposition of WCET analysis into 2+ phases

  • 1. Categorization of memory accesses w.r.t. cache behavior (e.g., always hit, always miss, etc.); the low-level analysis uses this cache categorization.
  • 2. WCET computation: IPET with no or a simplified cache model

3

slide-4
SLIDE 4

4

Categories of Cache Behavior

  • ah (always hit): each access to the cache is a hit (MUST analysis)
  • am (always miss): each access to the cache is a miss (MAY analysis ➭ complement)
  • ps(S) (persistent): for each entering of context S, the first access is nc, but all other accesses are hits (PERSISTENCE analysis)
  • nc (not classified): the access is not classified as one of the above categories

slide-5
SLIDE 5

Direct Mapped Cache

5

[Figure: direct-mapped cache with m lines; each line holds a valid bit (v), a tag, and k bytes of data (words w1 … wk). The line is selected by ld(m) address bits, the word within a line by ld(k) offset bits; the remaining address bits form the tag.]
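The address decomposition above can be sketched in Python. This is a hypothetical illustration, not from the slides; `m` (lines) and `k` (bytes per line) are assumed to be powers of two:

```python
def split_address(addr, m=4, k=8):
    """Split an address into (tag, line, offset) for a direct-mapped
    cache with m lines of k bytes each."""
    offset_bits = k.bit_length() - 1   # ld(k) bits select the word
    line_bits = m.bit_length() - 1     # ld(m) bits select the line
    offset = addr & (k - 1)
    line = (addr >> offset_bits) & (m - 1)
    tag = addr >> (offset_bits + line_bits)
    return tag, line, offset

print(split_address(0x2B))  # -> (1, 1, 3)
```

With m = 4 and k = 8, address 0x2B (0b101011) yields offset 0b011, line 0b01, and tag 1, matching the (tag, line, offset) triples used in the following example.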

slide-6
SLIDE 6

DM-$ Analysis Example

6

Compiled from e.g.

 x, y, z = a, b, 0
 while (x > 0 && y > 0) {
   z += x-- + y--
 }
 x, y = 0, 0

Access trace (tag, line, offset) from START to END:
(0,0,0) (0,0,1) (0,1,1) (0,2,0) (0,2,1) (0,3,0) (0,3,1) (0,1,0) (1,0,0)

slide-7
SLIDE 7

DM-$ Analysis Example

7

Compiled from e.g.

 x, y, z = a, b, 0
 while (x > 0 && y > 0) {
   z += x-- + y--
 }
 x, y = 0, 0

Access trace (tag, line, offset) from START to END:
(0,0,0) (0,0,1) (0,2,0) (0,3,0) (0,3,1) (0,1,0) ... (1,0,0)

Classification:
  • (1,0,0): always miss — conflicts with (0,0,x)
  • (0,1,1) (0,2,0) (0,2,1) (0,3,0) (0,1,0): continue with 2nd loop iteration — always hit (2..n loop iteration)
  • (0,1,1) (0,2,1): always hit

slide-8
SLIDE 8

8

Cache Classification (Hit/Miss)

Goal: a mechanized analysis which classifies each cache access in a certain context (e.g., call context) as either

  Ø Always hit: in all possible executions, this access to the cache will be a cache hit (the accessed cache block is guaranteed to be in the cache)

  Ø Always miss: in all possible executions, this access to the cache will be a cache miss (the accessed cache block is guaranteed NOT to be in the cache)

  Ø Not classified: the accessed cache block may or may not be in the cache

slide-9
SLIDE 9

9

Automated Categorization of Memory Accesses

à Based on Abstract Interpretation and fixed-point analysis

  • f cache states in the CFG

à Cache update function: models changes of the cache

state for memory accesses

à Join function: Combines states at control-flow joins à Concrete Semantics: Set of possible cache

configurations (tags only, no data) at each program point

à Abstract Semantics: Efficient approximation in an

abstract, “more efficient” domain

slide-10
SLIDE 10

10

Data-Flow Analysis (DFA)

DFA analysis is based on the data-flow structure of the system behavior of interest (e.g., forward and backward propagation).

  • PRED(n) are the virtual predecessors of CFG node n regarding the data flow of interest (Cache Analysis: usually CFG predecessors)

The data domain L of the analysis forms a lattice, on which the transfer function Fn(): L → L models the semantics of the system behavior of interest. To merge two or more states, a join function ⊔: L × L → L is used to compute the least upper bound.

slide-11
SLIDE 11

11

Data-Flow Analysis (2)

Data-flow equations modeling the data flow between nodes:

IN(n) = ⊔ ( { OUT(j) | j ∈ PRED(n) } )
OUT(n) = Fn ( IN(n) )

[Figure: node n with incoming IN(n), transfer function Fn(), and outgoing OUT(n)]

slide-12
SLIDE 12

12

Data-Flow Analysis (3)

Monotonicity requirements for solving the data-flow equations iteratively:

  • the transfer functions Fn(s) as well as the join function s1 ⊔ s2 must be monotone to ensure termination of the analysis.

Monotonicity: a function f: A → B is monotone iff ∀a,a′ ∈ A. (a ⊆A a′) → ( f(a) ⊆B f(a′) )

slide-13
SLIDE 13

13

Data-Flow Analysis (4)

Iterative Algorithm to find least fixpoint for data-flow equations:

for i ← 1 to N do              /* initialize node i: */
    OUT(i) = ⊥
while (sets are still changing) do
    for i ← 1 to N do          /* recompute sets at node i: */
        IN(i)  = ⊔ ( { OUT(j) | j ∈ PRED(i) } )
        OUT(i) = Fi( IN(i) )
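As an illustration, here is a minimal Python sketch of this iterative fixpoint computation on a small hypothetical diamond-shaped CFG (nodes A–D and the `access` map are invented for the example), using a MAY-style domain: sets of tags, with set union as the join:

```python
def fixpoint(nodes, pred, transfer, bottom):
    """Iterate IN/OUT equations until no OUT set changes (least fixpoint)."""
    out = {n: set(bottom) for n in nodes}   # OUT(i) = bottom for all nodes
    changed = True
    while changed:                          # "sets are still changing"
        changed = False
        for n in nodes:
            inp = set()
            for p in pred.get(n, []):       # IN(n) = join of predecessors
                inp |= out[p]               # join = least upper bound (union)
            new = transfer(n, inp)          # OUT(n) = Fn(IN(n))
            if new != out[n]:
                out[n] = new
                changed = True
    return out

# Example CFG: A branches to B and C, which rejoin at D.
access = {"A": "x", "B": "y", "C": "z", "D": "x"}
pred = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
result = fixpoint(["A", "B", "C", "D"], pred,
                  lambda n, s: s | {access[n]}, set())
print(sorted(result["D"]))  # -> ['x', 'y', 'z']
```

A real MUST analysis would use intersection-with-maximal-age as join instead of union; the termination argument (monotone functions on a finite lattice) is the same.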

slide-14
SLIDE 14

14

Concrete & Abstract Semantics

Concrete Cache Semantics: models the semantics of the relevant aspects of the program (here: cache state & update). The concrete semantics collects the set of all possible cache states for each program point.

Abstract Cache Semantics: semantics in a different, usually finite domain, connected to the concrete semantics by an abstraction/concretization function.

slide-15
SLIDE 15

N-way Set-Associative Cache

15

[Figure: N-way set-associative cache with m sets and n ways (blocks i,1 … i,n per set i); each block (line) holds a valid bit (v), a tag, and k bytes of data (w1 … wk). The set is selected by ld(m) address bits, the word within a block by ld(k) offset bits; the replacement strategy updates blocks within one set.]

slide-16
SLIDE 16

Fully-associative Cache (Associativity N)

16

The cache is updated based on the value of the TAG. The replacement policy determines the update strategy used.

[Figure: fully-associative cache (associativity N) with ways 1 … N; each line holds a valid bit (v), a tag, and k bytes of data (w1 … wk); the address splits into tag and offset. With LRU and FIFO, one end of the order is the youngest entry and the other the oldest, which is evicted on a miss.]
slide-17
SLIDE 17

17

Concrete Cache Semantics (Fully Associative Cache)

Cache Configuration: mapping from cache line to tag S (data is irrelevant)

Domain: for each program point, the set of all possible cache states

State at start node: singleton set with the empty cache, or the set of all possible cache configurations

Update: for a cache configuration C and cache reference S, the new cache configuration C’ after accessing S

slide-18
SLIDE 18

18

Concrete LRU Update (Fully Associative Cache)

Update Function for 4-way cache (1 line per way) with LRU

access c (HIT):  [a b c d] → [c a b d]
access e (MISS): [a b c d] → [e a b c]
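A minimal Python sketch of this concrete LRU update, modeling the fully-associative cache as a list ordered from youngest to oldest (an assumed representation, not from the slides):

```python
def lru_update(cache, tag, ways=4):
    """Concrete LRU update for a fully-associative cache with `ways` lines."""
    cache = list(cache)
    if tag in cache:            # HIT: tag moves to the front (age 1)
        cache.remove(tag)
    elif len(cache) >= ways:    # MISS in a full cache: evict the oldest
        cache.pop()
    return [tag] + cache        # accessed tag is now the youngest

print(lru_update(["a", "b", "c", "d"], "c"))  # HIT  -> ['c', 'a', 'b', 'd']
print(lru_update(["a", "b", "c", "d"], "e"))  # MISS -> ['e', 'a', 'b', 'c']
```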

slide-19
SLIDE 19

19

Abstract Cache Semantics for MUST / MAY Analysis

Abstract Cache Configuration

Compact representation of a set of cache configurations:
  • MUST: for each tag S, the maximum age
  • MAY: for each tag S, the minimum age

Join:
  • MUST: for each tag S, take the maximum age of both states
  • MAY: for each tag S, take the minimum age of both states

Update (LRU):
  • the accessed tag becomes the youngest
  • MUST: for the other tags, increase the age if the tag may be aged
  • MAY: for the other tags, increase the age if the tag must be aged

slide-20
SLIDE 20

20

Abstract Cache Representation

MUST analysis: age sets [ {a} | {} | {b} | {c} ], i.e. a ≤ 1, b ≤ 3, c ≤ 4, d,e ≤ 5+
  (⊤ = ∀x, x ≤ N+1)

or

MAY analysis: age sets [ {d,e} | {} | {a} | {b} ], i.e. a ≥ 2, b ≥ 4, c ≥ 5, d,e ≥ 1
  (⊤ = ∀x, x ≥ 1)

slide-21
SLIDE 21

21

Abstract Cache Semantics (MUST Concretization)

MUST abstract state: age sets [ {a} | {} | {b} | {c} ], i.e. a ≤ 1, b ≤ 3, c ≤ 4, d,e ≤ 5+

Concretization (possible concrete cache states, youngest first):
[a b c d], [a b c e], [a b d c], [a b e c], [a c b d], [a c b e], [a d b c], [a e b c]

slide-22
SLIDE 22

22

Abstract Cache Semantics (MUST Join)

State 1: [ {a} | {} | {b} | {c} ]    (a ≤ 1, b ≤ 3, c ≤ 4, d,e ≤ 5+)
State 2: [ {} | {a} | {} | {c,d} ]   (a ≤ 2, c ≤ 4, d ≤ 4, b,e ≤ 5+)
Join:    [ {} | {a} | {} | {c} ]     (a ≤ 2, c ≤ 4, b,d,e ≤ 5+)

slide-23
SLIDE 23

23

Abstract Cache Update Function: (LRU Cache, MUST analysis)

when accessing block c: max-age’(c) = 1

max-age(d) ≥ max-age(c) → max-age’(d) = max-age(d)
max-age(d) < max-age(c) → max-age’(d) = max-age(d) + 1
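A hedged Python sketch of this MUST update rule, together with the MUST join shown on the previous slide. The representation is assumed, not from the slides: an abstract state is a dict mapping each tag to its maximum-age bound, and tags whose bound exceeds the associativity are dropped (they are not guaranteed to be cached):

```python
ASSOC = 4  # assumed associativity for the example

def must_update(state, tag):
    """LRU MUST update: the accessed tag gets age 1; another tag ages by
    one only if its bound is below the accessed tag's old bound."""
    old = state.get(tag, ASSOC + 1)        # absent tag: age beyond assoc.
    new = {t: (age if age >= old else age + 1)
           for t, age in state.items() if t != tag}
    new[tag] = 1
    return {t: a for t, a in new.items() if a <= ASSOC}

def must_join(s1, s2):
    """MUST join: a tag survives only if it is in both states;
    keep the maximum (most pessimistic) age bound."""
    return {t: max(s1[t], s2[t]) for t in s1.keys() & s2.keys()}

# Join example from the previous slide (ages as bounds):
print(must_join({"a": 1, "b": 3, "c": 4}, {"a": 2, "c": 4, "d": 4}))
# -> {'a': 2, 'c': 4}
```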

slide-24
SLIDE 24

24

Abstract Cache Update Function: (LRU Cache, MUST analysis)

when accessing block c: max-age’(c) = 1

max-age(d) ≥ max-age(c) → max-age’(d) = max-age(d)
  • 1. assume age(d) < age(c) → max-age(d) ≥ age(d) + 1
  • 2. assume age(d) > age(c) → age’(d) = age(d)

max-age(d) < max-age(c) → max-age’(d) = max-age(d) + 1
  • 1. if age(d) < age(c), age’(d) = age(d) + 1 ≤ max-age(d) + 1
  • 2. if age(d) > age(c), age’(d) = age(d) ≤ max-age(d) + 1
slide-25
SLIDE 25

25

Cache Hit/Miss Classification using MUST analysis

If at some program point tag S must be in the cache (i.e., its maximum age is less than or equal to the associativity), then the cache access is classified as ALWAYS HIT.
If at some program point it is not the case that tag S may be in the cache (i.e., its minimum age is greater than the associativity of the cache), then the cache access is classified as ALWAYS MISS.
Otherwise, the cache access is NOT CLASSIFIED.

slide-26
SLIDE 26

Abstract Cache Semantics (MAY Concretization)

MAY abstract state: age sets [ {d,e} | {a} | {} | {b} ], i.e. a ≥ 2, b ≥ 4, c ≥ 5, d,e ≥ 1

Concretization (possible concrete cache states, youngest first):
[d a e b], [e a d b], [d e a b], [e d a b]

slide-27
SLIDE 27

Abstract Cache Semantics (MAY Join)

State 1: [ {d,e} | {a} | {} | {b} ]  (a ≥ 2, b ≥ 4, c ≥ 5, d,e ≥ 1)
State 2: [ {} | {e} | {} | {a} ]     (a ≥ 4, b ≥ 5, c ≥ 5, d ≥ 5, e ≥ 2)
Join:    [ {d,e} | {a} | {} | {b} ]  (a ≥ 2, b ≥ 4, c ≥ 5, d ≥ 1, e ≥ 1)

slide-28
SLIDE 28

28

Abstract Cache Update Function: (LRU Cache, MAY analysis)

when accessing block c: min-age’(c) = 1

min-age(d) ≤ min-age(c) → min-age’(d) = min-age(d) + 1
  • 1. if age(d) > age(c) ≥ min-age(d) → age’(d) = age(d) ≥ min-age(d) + 1
  • 2. assume age(d) < age(c) → age’(d) = age(d) + 1

min-age(d) > min-age(c) → min-age’(d) = min-age(d)
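Mirroring the MUST case, a hedged Python sketch of the MAY update and join. Again the representation is assumed: a dict mapping each tag to its minimum-age bound; a tag absent from the state, or with a bound above the associativity, cannot be in the cache:

```python
ASSOC = 4  # assumed associativity for the example

def may_update(state, tag):
    """LRU MAY update: the accessed tag gets age 1; another tag ages by
    one only if its bound does not exceed the accessed tag's old bound."""
    old = state.get(tag, ASSOC + 1)
    new = {t: (age + 1 if age <= old else age)
           for t, age in state.items() if t != tag}
    new[tag] = 1
    return {t: a for t, a in new.items() if a <= ASSOC}

def may_join(s1, s2):
    """MAY join: union of tags; keep the minimum (most optimistic) age."""
    return {t: min(s1.get(t, ASSOC + 1), s2.get(t, ASSOC + 1))
            for t in s1.keys() | s2.keys()}
```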

slide-29
SLIDE 29

29

Cache Hit/Miss Classification using MUST and MAY analysis

If at some program point tag S must be in the cache (i.e., its maximum age is less than or equal to the associativity), then the cache access is classified as ALWAYS HIT.
If at some program point it is not the case that tag S may be in the cache (i.e., its minimum age is greater than the associativity of the cache), then the cache access is classified as ALWAYS MISS.
Otherwise, the cache access is NOT CLASSIFIED.

What is the benefit of ALWAYS MISS over NOT CLASSIFIED?
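The classification rule above can be sketched directly, assuming MUST and MAY states are dicts of age bounds (maximum and minimum age respectively; a hypothetical representation, with absent tags treated as beyond the associativity):

```python
ASSOC = 4  # assumed associativity

def classify(tag, must_state, may_state):
    """Combine MUST and MAY results into the hit/miss classification."""
    if must_state.get(tag, ASSOC + 1) <= ASSOC:
        return "always hit"      # max age within associativity: must be cached
    if may_state.get(tag, ASSOC + 1) > ASSOC:
        return "always miss"     # even the min age exceeds associativity
    return "not classified"

print(classify("a", {"a": 1}, {"a": 1}))   # -> always hit
print(classify("x", {}, {}))               # -> always miss
print(classify("b", {}, {"b": 2}))         # -> not classified
```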

slide-30
SLIDE 30

30

Discussion

slide-31
SLIDE 31

31

Consider a data cache (1-word line size), with the address of odd_even_counter statically known:

static unsigned odd_even_counter[2];
 ++odd_even_counter[sensor() % 2];
 ++odd_even_counter[sensor() % 2];
 ++odd_even_counter[sensor() % 2];
 ++odd_even_counter[sensor() % 2];
 ++odd_even_counter[sensor() % 2];

Which access will be a cache miss? How many accesses will be cache hits?

Persistence Analysis

slide-32
SLIDE 32

32

Sometimes we do not know whether one particular access will always be a hit or a miss. A cache element is said to be persistent (with respect to a program scope S) if, in every execution of the scope, all but the first access are guaranteed to be cache hits. Data caches benefit from persistence analysis because the address (which implies the tag) is not exactly known (e.g., arrays).

Persistence Analysis

slide-33
SLIDE 33

33

Published persistence analyses until ~2009 were unsound. Only recently have correct persistence analyses (LRU only) been developed, published e.g. in (Ju, Huynh, Roychoudhury). Abstract domain: for each tag, the set of possibly younger tags (YS) accessed in the program scope of interest. If |YS(c)| is less than the associativity of the cache, the element is persistent in the scope (i.e., it is not evicted once loaded).
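A much-simplified Python sketch of the younger-set idea, assuming a straight-line loop body given as an access trace that repeats (a strong simplification of the DFA formulation; the function names are invented):

```python
def younger_set(trace, tag):
    """Tags accessed strictly between two consecutive accesses to `tag`
    over repeated executions of the scope (so gaps wrap around)."""
    idxs = [i for i, t in enumerate(trace) if t == tag]
    ys = set()
    n = len(trace)
    for k, i in enumerate(idxs):
        j = idxs[(k + 1) % len(idxs)]        # next access, wrapping around
        pos = (i + 1) % n
        while pos != j:
            if trace[pos] != tag:
                ys.add(trace[pos])           # may be younger than `tag`
            pos = (pos + 1) % n
    return ys

def persistent(trace, assoc):
    """Tags whose younger set fits in the cache: never evicted once loaded."""
    return {t for t in set(trace) if len(younger_set(trace, t)) < assoc}

print(sorted(persistent(["a", "b", "a", "c"], 4)))  # -> ['a', 'b', 'c']
```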

DFA-based Persistence Analysis

slide-34
SLIDE 34

34

Known DFA persistence analyses only work with LRU caches. Another technique is based on static scopes (LRU, FIFO): if during one execution of a program scope at most N elements are accessed, then all of them are persistent in an N-way cache. Open problem (for all persistence analyses): how to find good program scopes? Functions and loops are obvious candidates. Which heuristics?

Scope-Based Persistence Analysis

slide-35
SLIDE 35

35

Usually assumes that the address of accessed elements is known, or lies within some small interval (e.g., if an array index is unknown). Precision can be further improved by analyzing array indices and access patterns. If the address is unknown, set-associative caches become less effective: the access may affect any set. Modularity? To improve analysis results, cache locking or cache splitting can be used, disabling the cache for “unpredictable accesses”.

Data Cache Analysis Remarks

slide-36
SLIDE 36

36

Applying the Cache Categorizations to ILP

In integer linear programming (ILP) we typically calculate the WCET by maximizing Σ xi · ti

  • ti … execution time of CFG edge i (constant)
  • xi … execution frequency of CFG edge i (to be determined)

The hit and miss counts of the cache are modeled by additional flow variables: xi = xi,h + xi,m

Thus, the updated goal function is Σ xi,h · ti,h + Σ xi,m · ti,m

slide-37
SLIDE 37

37

Applying the Cache Categorizations to ILP (2)

Depending on the cache categorization of a memory reference at edge i, additional flow constraints are added:

  • always hit [ah]: xi,m = 0

  • always miss [am]: xi,h = 0

  • global persistency [gp]: xi,h ≥ xi − 1

  • local persistency [ps(S)]: xi,h ≥ xi − (∑ xk | edge k is an entry to context S)

  • [nc]: no additional constraints are created
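These per-category constraints can be generated mechanically. A hedged sketch that emits them as plain strings (no ILP solver assumed; variable naming `x_i`, `x_i_h`, `x_i_m` is invented for the example):

```python
def cache_constraints(edge, category, entry_edges=None):
    """Return the flow constraints for a memory reference at `edge`
    with the given cache categorization."""
    cons = [f"x_{edge} = x_{edge}_h + x_{edge}_m"]   # split flow into hit/miss
    if category == "ah":            # always hit
        cons.append(f"x_{edge}_m = 0")
    elif category == "am":          # always miss
        cons.append(f"x_{edge}_h = 0")
    elif category == "gp":          # global persistency
        cons.append(f"x_{edge}_h >= x_{edge} - 1")
    elif category == "ps":          # local persistency w.r.t. scope entries
        entries = " + ".join(f"x_{k}" for k in entry_edges)
        cons.append(f"x_{edge}_h >= x_{edge} - ({entries})")
    return cons                     # "nc": only the flow-split constraint

print(cache_constraints(3, "ah"))
# -> ['x_3 = x_3_h + x_3_m', 'x_3_m = 0']
```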

slide-38
SLIDE 38

38

Remarks to DFA-Based Cache Modeling

Persistence analysis is not necessary to distinguish first from subsequent loop iterations. To this end, the CFG is virtually rewritten to separate the first loop iteration from the others (Virtual Loop Unpeeling¹). The separation of cache classification and WCET calculation in DFA-based cache analysis scales well compared to the integrated approach, where the cache classification was modeled as a cache conflict graph within the ILP problem.

¹ Sometimes called “virtual loop unrolling”

slide-39
SLIDE 39

39

Remarks to DFA-Based Cache Modeling (2)

  • The DFA-based cache analysis works quite well for set-associative caches with LRU (least recently used) replacement strategy:
    – LRU has the nice locality property that the content of one cache line is not affected by memory accesses that map to other cache lines.
  • However, to improve hardware performance, often much less predictable replacement strategies are used:
    – ColdFire MCF 5307: pseudo-round-robin replacement
    – PowerPC 750/755: pseudo-LRU replacement

slide-40
SLIDE 40

40

Remarks to DFA-Based Cache Modeling (3)

  • Avg. performance of PRR and PLRU is similar to LRU, but predictability is much worse!

Analysis results with PLRU:
MAY analysis does not yield any information at all! (starting with an unknown cache, no block is found to be removed)
MUST analysis provides some information (but less than for LRU): at most 4 blocks are found in each cache set (out of 8 blocks in practice)
Still ongoing research (WCET’2010)

Pseudo-LRU (PLRU): the cache lines are leaves of a tree where on each node of the tree a path bit is placed. The replacement line is determined by following, from the top, the path indicated by the path bits. On each regular access, the path bits along this access are set to the other direction.

[Figure: PLRU tree with path bits b0 … b6 over cache lines L0 … L7]
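The PLRU path-bit scheme can be sketched for an 8-way set, storing the 7 tree bits b0 … b6 in heap layout (an assumed encoding: bit 0 follows the left child, bit 1 the right):

```python
WAYS = 8
LEVELS = WAYS.bit_length() - 1   # 3 levels of path bits for 8 ways

def plru_victim(bits):
    """Follow the path bits from the root down to the replacement line."""
    node = 0
    for _ in range(LEVELS):
        node = 2 * node + 1 + bits[node]   # go to the indicated child
    return node - (WAYS - 1)               # leaf index = way 0..7

def plru_touch(bits, way):
    """On an access to `way`, point every bit on its path the other way."""
    node = way + (WAYS - 1)                # the leaf node for this way
    while node > 0:
        parent = (node - 1) // 2
        # set the parent's bit away from the child we came from
        bits[parent] = 0 if node == 2 * parent + 2 else 1
        node = parent

bits = [0] * 7
print(plru_victim(bits))   # -> 0
plru_touch(bits, 0)        # accessing way 0 redirects the path bits
print(plru_victim(bits))   # -> 4
```

Note how one access flips bits on shared tree nodes, so the future victim depends on accesses to other lines — the loss of LRU's locality property that makes PLRU hard to analyze.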

slide-41
SLIDE 41

41

Remarks to DFA-Based Cache Modeling (4)

Pseudo-Round-Robin (PRR): on a 4-way set-associative cache, a two-bit replacement counter is used. This counter is shared by all cache lines and is only modified (increased mod 4) on a replacement. Thus, each cache line has an influence on the others!

Analysis results with PRR:
MAY analysis does not yield any information at all! (without counter or age information, one can never know which block is removed from the cache)
MUST analysis provides only little information (much less than for LRU): when a block b is accessed, it goes into the cache, but without counter or age information we do not know which block is removed → all elements currently in the set must be removed (only 1 out of possibly 4 elements can be found to be in the cache)
With PRR, only 1 way is effectively used
FIFO caches: cache hit/miss classification is difficult (ECRTS’10)

slide-42
SLIDE 42

42

Summary & Discussion

Topic of this lecture: cache access classification
Abstract Interpretation: DFA + abstract cache states
Cache hit/miss classification: MUST/MAY analysis, for instruction caches
Replacement policies: most work published on LRU; also applicable to direct-mapped caches. FIFO, PLRU & PRR are less predictable.
Discussion: Preemption? Unpredictable accesses? Alternatives (scratchpad)?

slide-43
SLIDE 43

43

References

  • 1. CMHC: Henrik Theiling, Christian Ferdinand, Reinhard Wilhelm. Fast and Precise WCET Prediction by Separate Cache and Path Analyses. Real-Time Systems 18(2/3), Kluwer, 2000.¹

  • 2. Data-Cache Analysis: Bach Khoa Huynh, Lei Ju, and Abhik Roychoudhury. Scope-aware Data Cache Analysis for WCET Estimation. Proc. IEEE RTAS ’11, 2011.

  • 3. FIFO Cache Analysis: Daniel Grund and Jan Reineke. Precise and Efficient FIFO-Replacement Analysis Based on Static Phase Detection. Proc. 22nd Euromicro Conference on Real-Time Systems (ECRTS ’10), 2010.

¹ For persistence analysis, refer to [2], not [1]

slide-44
SLIDE 44

44

References

  • 1. Preemption: Chang-Gun Lee, Joosun Hahn, Yang-Min Seo, Sang Lyul Min, Rhan Ha, Seongsoo Hong, Chang Yun Park, Minsuk Lee, and Chong Sang Kim. Analysis of Cache-Related Preemption Delay in Fixed-Priority Preemptive Scheduling. IEEE Trans. Comput. 47(6), June 1998.

  • 2. Abstract Interpretation: Julien Bertrane, Patrick Cousot, Radhia Cousot, Jérôme Feret, Laurent Mauborgne, Antoine Miné, and Xavier Rival. Static Analysis and Verification of Aerospace Software by Abstract Interpretation. Paper 2010-3385, American Institute of Aeronautics and Astronautics (AIAA), 2010.

slide-45
SLIDE 45

Extra Material SS 2011

45

slide-46
SLIDE 46

Exercise:

2-way set-assoc cache: MUST, MAY, PS

46

Compiled from e.g.

 x, y, z = a, b, 0
 while (x > 0 && y > 0) {
   z += x-- + y--
 }
 x, y = 0, 0

Access trace (tag, set, offset) from START to END:
(0,0,0) (0,0,1) (0,1,1) (1,0,0) (1,0,1) (1,1,0) (1,1,1) (2,0,0) (0,1,0)