The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors - PowerPoint PPT Presentation

The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors
Manuel E. Acacio, José González, José M. García and José Duato
e-mail: meacacio@ditec.um.es

Introduction
• Scalable shared-memory multiprocessors
  • Based on the use of directories
  • Known as cc-NUMA architectures
• Long L2 miss latencies
  • Mainly caused by the indirection introduced by the access to the directory information
    – Network latency
    – Directory latency
• Upgrade misses
  • Important fraction of the L2 miss rate (> 40%)
  • Store instruction for which a read-only copy of the line is found in the local L2 cache
  • Exclusive ownership is required

Introduction
• Upgrade misses in a conventional cc-NUMA
  – Line L shared by nodes 1, 3 and 4
  – Directory: Node 2
  – Node 1 issues an Upgrade for L
[Figure: message sequence. 1st: Node 1 sends the store miss (UPGR) for line L to the directory at Node 2, which looks up the owner and sharers of L (nodes 1, 3, 4). 2nd: Inv L is sent to the sharers, Node 3 and Node 4. 3rd: the sharers reply with Ack. 4th: Ownership is returned to Node 1.]

Introduction
• Upgrade misses using prediction
  – Line L shared by nodes 1, 3 and 4
  – Directory: Node 2
  – Node 1 issues an Upgrade for L
[Figure: message sequence. 1st: Node 1 predicts nodes 3 and 4 as sharers, sends Inv L to both and, in parallel, sends the store miss (UPGR) to the directory at Node 2. 2nd: Node 3 and Node 4 reply with Ack, and the directory, having checked the sharers of L (nodes 1, 3 (OK), 4 (OK)), returns Ownership to Node 1.]
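The two message flows can be compared by counting the sequential steps on the critical path. The following is a minimal sketch that only transcribes the step labels (1st to 4th) from the two diagrams; the tuple layout and the function name are illustrative assumptions, not part of the original slides.

```python
# Each tuple is (step label from the slide, message). Messages that share a
# step label travel in parallel. The data layout is an assumption made for
# illustration only.
CONVENTIONAL = [
    (1, "UPGR  node 1 -> directory (node 2)"),
    (2, "Inv L directory -> node 3"),
    (2, "Inv L directory -> node 4"),
    (3, "Ack   from node 3"),
    (3, "Ack   from node 4"),
    (4, "Ownership directory -> node 1"),
]

WITH_PREDICTION = [
    (1, "UPGR  node 1 -> directory (node 2)"),
    (1, "Inv L node 1 -> node 3 (predicted)"),
    (1, "Inv L node 1 -> node 4 (predicted)"),
    (2, "Ack   node 3 -> node 1"),
    (2, "Ack   node 4 -> node 1"),
    (2, "Ownership directory -> node 1"),
]


def sequential_steps(messages):
    """Number of dependent steps before the requester owns the line."""
    return max(step for step, _ in messages)


print(sequential_steps(CONVENTIONAL), sequential_steps(WITH_PREDICTION))  # prints "4 2"
```

Both flows exchange six messages in this example; what prediction shortens is the chain of dependent steps, and only when the predicted sharers are correct.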

Introduction
• Two key observations motivate our work:
  • Repetitive behavior found for upgrade misses
  • Small number of invalidations sent on an upgrade miss
• Two main elements must be developed:
  • An effective prediction engine
    – Accessed on an upgrade miss
    – Provides a list of the sharers
  • A coherence protocol
    – Properly extended to support the use of prediction

Outline
• Introduction
• Predictor Design for Upgrade Misses
• Extensions to a MESI Coherence Protocol
• Performance Evaluation
• Conclusions

Predictor Design for Upgrade…
• Predictor characteristics:
  • Address-based predictor
    – Accessed using the effective address of the line
  • 3 pointers per entry
    – Small number of sharers per line
    – Confidence bits added to each pointer
    – (3 × log2 N + 6) bits per entry
  • Implemented as a non-tagged table
    – Initially, all 2-bit counters store 0
    – Predictor is probed on each upgrade miss
      • Miss predicted when confidence
    – Predictor is updated in two situations:
      • On the reply from the directory
      • On a load miss serviced with a cache-to-cache ($-to-$) transfer (Migratory Data)
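The predictor described on this slide can be sketched as a small non-tagged table of three (node, 2-bit counter) pointers per entry. This is a minimal illustration, not the authors' implementation: the slide does not give the confidence condition or the update policy, so the threshold of 2, the 64-byte line size, the increment/decrement rule and the replacement of zero-confidence pointers are all assumptions, as are the class and method names.

```python
class UpgradePredictor:
    """Non-tagged, address-indexed table; 3 pointers per entry (sketch)."""

    POINTERS = 3
    MAX_CONF = 3  # largest value of a 2-bit saturating counter

    def __init__(self, num_entries, line_bytes=64, threshold=2):
        # line_bytes and threshold are hypothetical values, not from the slides.
        self.num_entries = num_entries
        self.line_bytes = line_bytes
        self.threshold = threshold
        # Each pointer is [node, confidence]; all counters start at 0.
        self.table = [[[None, 0] for _ in range(self.POINTERS)]
                      for _ in range(num_entries)]

    def _entry(self, addr):
        # No tags: different lines may alias to the same entry.
        return self.table[(addr // self.line_bytes) % self.num_entries]

    def predict(self, addr):
        """Probe on an upgrade miss; return the set of predicted sharers."""
        return {node for node, conf in self._entry(addr)
                if node is not None and conf >= self.threshold}

    def update(self, addr, sharers):
        """Train with the real sharers (assumed policy, see lead-in)."""
        entry = self._entry(addr)
        for slot in entry:
            if slot[0] in sharers:
                slot[1] = min(self.MAX_CONF, slot[1] + 1)
            else:
                slot[1] = max(0, slot[1] - 1)
        tracked = {slot[0] for slot in entry}
        for node in sorted(set(sharers) - tracked):
            for slot in entry:
                if slot[1] == 0:          # reuse a pointer with no confidence
                    slot[0], slot[1] = node, 1
                    break


p = UpgradePredictor(num_entries=16)
p.update(0x1000, {3, 4})
p.update(0x1000, {3, 4})
print(p.predict(0x1000))
```

Under these assumptions a sharer is predicted only after it has been observed twice for the same entry, which is one way to exploit the repetitive behavior the slides mention.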

Predictor Design for Upgrade…
• Predictor Anatomy
[Figure: predictor anatomy]

Extensions to a MESI Protocol
• Changes to the requesting node, the sharer nodes and the home directory
• Requesting Node Operation
  • On suffering a predicted UPGRADE MISS
    – Create and send invalidation messages to the predicted nodes
      • Set the Predicted bit of each message to 1
    – Send the miss to the directory
      • Set the Predicted bit of the message to 1 and include the list of predicted nodes
    – Collect the directory reply and the ACK / NACK from the predicted nodes:
      • Re-invalidate those real sharers that replied NACK (if any)
    – Gain exclusive ownership
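The requesting-node steps above can be sketched as two small functions: one that builds the messages sent on a predicted upgrade miss, and one that decides which nodes must be re-invalidated. The `Message` record, its field names and both function names are hypothetical; the sketch also assumes that the directory reply tells the requester which nodes were real sharers, which the slide implies but does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    kind: str                                  # "INV" or "UPGR"
    dest: int                                  # destination node
    line: int                                  # line address
    predicted: bool = False                    # the Predicted bit
    predicted_nodes: frozenset = frozenset()   # only used by "UPGR"


def issue_predicted_upgrade(line, home, predicted_nodes):
    """Messages sent by the requester on a predicted upgrade miss."""
    nodes = frozenset(predicted_nodes)
    msgs = [Message("INV", n, line, predicted=True) for n in sorted(nodes)]
    # The miss still goes to the directory, carrying the predicted list.
    msgs.append(Message("UPGR", home, line, predicted=True,
                        predicted_nodes=nodes))
    return msgs


def nodes_to_reinvalidate(real_sharers, replies):
    """Real sharers that answered NACK and must be invalidated again."""
    return {n for n, r in replies.items()
            if r == "NACK" and n in real_sharers}


msgs = issue_predicted_upgrade(line=0x40, home=2, predicted_nodes={3, 4})
print([m.kind for m in msgs])
print(nodes_to_reinvalidate({3, 4}, {3: "ACK", 4: "NACK"}))
```

A NACK from a node that turns out not to be a real sharer needs no further action in this sketch, since there is no copy to invalidate there.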

Extensions to a MESI Protocol
• Sharer Node Operation
  • On receiving a predicted INVALIDATION message and
    – Pending Load Miss: store the invalidation and return NACK
    – Pending UPGR Miss (line in the Shared state):
      • Directory reply not received: return ACK and invalidate the line
      • Directory reply previously received: return NACK
    – No pending UPGR Miss and line in the Shared state:
      • Return ACK and invalidate
      • Insert the tag in the Invalidated Lines Table (ILT)
    – Otherwise, return a NACK message
  • On suffering a Load Miss
    – If an entry is found in the ILT, set the Invalidated bit of the message to 1
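The sharer-node rules above form a small decision table, sketched below as a pure function. The state names, the `pending` encoding (`None`, `"LOAD"` or `"UPGR"`), the use of a Python set for the ILT and a list for stored invalidations are all assumptions made for illustration.

```python
def on_predicted_invalidation(line_state, pending, dir_reply_received,
                              tag, ilt, deferred):
    """Return (reply, new_line_state) for a predicted INVALIDATION.

    ilt      -- set of tags standing in for the Invalidated Lines Table
    deferred -- list where an invalidation is stored during a load miss
    """
    if pending == "LOAD":
        deferred.append(tag)              # store invalidation, answer NACK
        return "NACK", line_state
    if line_state == "Shared":
        if pending == "UPGR":
            if dir_reply_received:
                return "NACK", line_state
            return "ACK", "Invalid"       # reply not yet received: give up copy
        ilt.add(tag)                      # remember the predicted invalidation
        return "ACK", "Invalid"
    return "NACK", line_state             # every other case


def load_miss_invalidated_bit(tag, ilt):
    """Invalidated bit attached to a load miss sent to the directory."""
    return tag in ilt


ilt, deferred = set(), []
print(on_predicted_invalidation("Shared", None, False, 0x40, ilt, deferred))
print(load_miss_invalidated_bit(0x40, ilt))
```

Whether an ILT entry is removed once it has been used is not stated on the slide, so the sketch leaves the entry in place.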

Extensions to a MESI Protocol
• Predictor + ILT added to each node
• Anatomy of the Invalidated Lines Table (ILT)
[Figure: anatomy of the ILT]

Extensions to a MESI Protocol
• Directory Node Operation
  • On receiving a predicted UPGRADE MISS
    – If the line is in the Shared state
      • All sharers predicted: send reply (TOTAL HIT)
      • Some actual sharers not predicted (PARTIAL HIT) or none correctly predicted (TOTAL MISS): invalidate and send reply
    – Otherwise, process as usual (NOT INV)
  • On receiving a Load Miss
    – If the Invalidated bit of the message is set and the requesting node is present in the sharing code: wait until the UPGR completes
    – Otherwise, process as usual
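The directory-side cases above can be sketched as a classification over sets of nodes. The function names, the string state names and the choice to return the set of sharers the directory still has to invalidate are assumptions; the slide only names the four outcomes.

```python
def classify_predicted_upgrade(line_state, sharing_code, requester, predicted):
    """Return (outcome, sharers the directory must still invalidate)."""
    if line_state != "Shared":
        return "NOT INV", set()            # handled as a normal miss
    actual = set(sharing_code) - {requester}
    missed = actual - set(predicted)       # real sharers nobody invalidated yet
    if not missed:
        return "TOTAL HIT", set()
    if actual & set(predicted):
        return "PARTIAL HIT", missed
    return "TOTAL MISS", missed


def load_miss_must_wait(invalidated_bit, requester, sharing_code):
    """A load miss waits while the matching upgrade is still in flight."""
    return invalidated_bit and requester in sharing_code


print(classify_predicted_upgrade("Shared", {1, 3, 4}, 1, {3, 4}))
print(classify_predicted_upgrade("Shared", {1, 3, 4}, 1, {3}))
```

In this sketch a line whose only sharer is the requester is reported as a TOTAL HIT, because no real sharer was left unpredicted; the slides do not say how that corner case is counted.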

Performance Evaluation
• Performance Evaluation
  • RSIM multiprocessor simulator
  • We assume that the predictors do not add any extra cycles
• Benchmarks
  – Applications with more than 25% upgrade misses, covering a variety of patterns
    • EM3D, FFT, MP3D, Ocean and Unstructured

Performance Evaluation
• Experimental Framework
  • Compared systems:
    – Base: traditional cc-NUMA using a bit-vector directory
    – UPT: adds an unlimited Prediction Table and an ILT
    – LPT: adds a "realistic" Prediction Table and an ILT
      • Prediction Table: 16K entries (non-tagged)
      • ILT: 128 entries (fully associative)
      • Total size less than 48 KB (1 MB L2 caches)
  • We study:
    – Predictor accuracy
    – Impact on latency of upgrade misses
    – Impact on latency of load and store misses
    – Impact on execution time
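The storage figure for the LPT configuration can be checked with a short calculation that combines the 16K-entry table with the (3 × log2 N + 6) bits-per-entry formula from the predictor slide. The slides do not state the node count N or the width of an ILT tag, so the sketch sweeps a few values of N and uses a hypothetical 32-bit tag.

```python
from math import ceil, log2

ENTRIES = 16 * 1024   # prediction table entries (from the slide)
ILT_ENTRIES = 128     # ILT entries (from the slide)


def bits_per_entry(num_nodes, pointers=3, conf_bits=2):
    """(3 x log2 N + 6): three node pointers plus a 2-bit counter each."""
    return pointers * ceil(log2(num_nodes)) + pointers * conf_bits


def prediction_table_bytes(num_nodes):
    return ENTRIES * bits_per_entry(num_nodes) // 8


def ilt_bytes(tag_bits=32):
    # tag_bits is a hypothetical value; the slides do not give the tag width.
    return ILT_ENTRIES * tag_bits // 8


for n in (16, 32, 64):
    total_kb = (prediction_table_bytes(n) + ilt_bytes()) / 1024
    print(f"N={n}: {bits_per_entry(n)} bits/entry, {total_kb:.1f} KB total")
```

Under these assumptions, 16 and 32 nodes give 36.5 KB and 42.5 KB, both below the 48 KB stated on the slide, while 64 nodes would give 48.5 KB.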

Performance Evaluation
• Results (1). Predictor Accuracy
[Figure: Predictor Accuracy. Stacked bars for UPT and LPT on EM3D, FFT, MP3D, Ocean and Unstructured. Y-axis: % Inv Misses. Categories: Total Hit, Partial Hit, Total Miss, Not Predict, Not Inv.]

Performance Evaluation
• Results (2). Average Upgrade Miss Latency
[Figure: Average Upgrade Miss Latency. Stacked bars for Base, UPT and LPT on EM3D, FFT, MP3D, Ocean and Unstructured. Y-axis: Normalized Latency. Components: Network, Directory, Misc.]

Performance Evaluation
• Results (3). Average Load/Store Miss Latency
[Figure: Average Load and Store Miss Latencies. Bars for Base, UPT and LPT, shown separately for load and store misses on EM3D, FFT, MP3D, Ocean and Unstructured. Y-axis: Normalized Latency.]

Performance Evaluation
• Results (4). Application Speed-ups
[Figure: Application Speed-ups. Bars for UPT and LPT on EM3D, FFT, MP3D, Ocean and Unstructured. Y-axis: Speed-up, from 0% to 16%.]

Conclusions
• Conclusions (1)
  • Upgrade misses are caused by a store instruction when a read-only copy is found:
    – Message sent to directory
    – Directory lookup
    – Invalidations sent to sharers
    – Replies to the invalidations sent back
    – Ownership message returned
  • They account for an important fraction of the L2 miss rate (> 40%)
  • We propose the use of prediction for accelerating them
    – On an upgrade miss: predict the sharers and invalidate them in parallel with the access to the directory
    – Based on:
      • Repetitive behavior
      • Small number of invalidations per upgrade miss

Conclusions
• Conclusions (2)
  • Results:
    – A large fraction of upgrade misses is successfully predicted
    – Reductions of more than 40% in average upgrade miss latency
    – Load miss latencies are not affected in most cases
    – Speed-ups in application execution time of up to 14%
