GLOBALLY-SYNCHRONIZED FRAMES FOR GUARANTEED QUALITY-OF-SERVICE IN ON-CHIP NETWORKS
Jae W. Lee (MIT), Man Cheuk Ng (MIT), Krste Asanovic (UC Berkeley)
June 23rd, 2008, ISCA-35, Beijing, China

Resource sharing increases performance variation
• Resource sharing
  (+) reduces hardware cost
  (-) increases performance variation
• This performance variation grows as the number of sharers (cores) increases.
[Figure: processors (P) connected through a multi-hop on-chip network to shared L2$ banks and memory controllers]
Jae W. Lee (2 / 33)

Desired quality-of-service from shared resources
• Performance isolation (fairness)
[Figure: 16 processors (P) share a multi-hop on-chip network, L2$ banks and memory controllers, with hotspot resources marked. Bar chart: accepted throughput [MB/s] per processor ID (0 to F), with the minimum guaranteed BW level marked.]

Desired quality-of-service from shared resources
• Performance isolation (fairness)
• Differentiated services (flexibility)
[Figure: processors (P) share a multi-hop on-chip network, L2$ banks and memory controllers, with hotspot resources marked. Two bar charts: accepted throughput [MB/s] per processor ID (0 to F), each with the minimum guaranteed BW level marked; the second shows differentiated allocation.]

Resources w/ centralized arbitration are well investigated
• Resources with centralized arbitration
  - SDRAM controllers
  - L2 cache banks
• They have a single entry point for all requests.
  → QoS is relatively easier and well investigated.
  [MICRO ’06] [HPCA ’02] [PACT ’07] [ICS ’04] [USENIX sec. ’07] [ISCA ’07] [IBM ’07] [MICRO ’07] [ISCA ’08] ...
[Figure: tiles of processors with L1 caches (P + L1$) connected by on-chip routers (R) to L2$ banks and memory controllers]

QoS from on-chip networks is a challenge
• Resources with distributed arbitration
  - multi-hop on-chip networks
• They have distributed arbitration points.
  → QoS is more difficult.
• Off-chip solutions cannot be directly applied because of resource constraints.
[Figure: tiles of processors with L1 caches (P + L1$) connected by on-chip routers (R) to L2$ banks and memory controllers]

We guarantee QoS for flows
• Flow: a sequence of packets between a unique pair of end nodes (src and dest)
  - physical links shared by flows
  - multiple stages of arbitration for each packet
• We provide guaranteed QoS to each flow with:
  - minimum bandwidth guarantees
  - bounded maximum delay
[Figure: 4x4 grid of routers (R) with a hotspot resource; one physical link is shared by 3 flows]

Locally fair ⇏ globally fair
[Figure: sources SRC A, SRC B, SRC C and SRC D feed a chain of three arbitration points (1, 2, 3) leading to DEST; channel rate = C [Gb/s]]
With locally fair round-robin (RR) arbitration:
• Throughput (Flow A) = (0.5) C
• Throughput (Flow B) = (0.5)^2 C
• Throughput (Flow C) = Throughput (Flow D) = (0.5)^3 C
→ Throughput of a flow decreases exponentially as its distance to the destination (hotspot) increases.
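The shares on this slide can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes every source is always backlogged, that each arbitration point splits its output evenly between one local source and the merged upstream traffic, and that the farthest point merges two local sources. The function name and the hop indexing (counted from the destination) are not from the talk.

```python
from fractions import Fraction


def chain_round_robin_shares(num_arbiters):
    """Fraction of the channel rate C each source receives.

    Sources are listed from nearest to farthest from the destination.
    Assumes saturated sources and a fair 50/50 split at every arbiter.
    """
    shares = []
    remaining = Fraction(1)  # bandwidth entering the destination, in units of C
    for _ in range(num_arbiters):
        remaining /= 2            # the local source takes half ...
        shares.append(remaining)  # ... the upstream traffic keeps the other half
    shares.append(remaining)      # the farthest arbiter merges two local sources
    return shares


shares = chain_round_robin_shares(3)
for name, share in zip("ABCD", shares):
    print(f"Flow {name}: {share} C")
```

With three arbitration points this yields 1/2, 1/4, 1/8 and 1/8 of C for flows A, B, C and D, matching the bullets above.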

Motivational simulation
• In 8x8 mesh network with RR arbitration (hotspot at (8, 8))
[Figure: 8x8 2D mesh with the hotspot node marked. Two 3D plots of accepted throughput [flits/cycle/node] (0 to 0.06) vs. node index (X) and node index (Y): one w/ minimal-adaptive routing, one w/ dimension-ordered routing.]
locally-fair round-robin scheduling → globally unfair bandwidth usage

Desired bandwidth allocation: an example
• Taken from simulation results with GSF:
[Figure: two 3D plots of accepted throughput [flits/cycle/node] (0 to 0.06) vs. node index (X) and node index (Y): fair allocation and differentiated allocation]

Globally Synchronized Frames (GSF) provide guaranteed QoS with minimum bandwidth guarantees and maximum delay to each flow in multi-hop on-chip networks:
• with high network utilization comparable to best-effort virtual-channel router
• with minimal area/energy overhead by avoiding per-flow queues/structures in on-chip routers
  → scalable to # of concurrent flows

Outline of this talk
• Motivation
• Globally-Synchronized Frames: a step-by-step development of mechanism
• Implementation of GSF router
• Evaluation
• Related work
• Conclusion

GSF takes a frame-based approach
[Figure: frames 0 to 4 laid out along the time axis for a shared physical link in a 4x4 grid of routers (R)]
• Frame is a coarse quantization of time.
  - The network can transport a finite number of flits during this interval.
  - We constrain each flow source to inject a certain number of flits per frame.
• shorter frames → coarser BW control but lower maximum delay
  - typically 1-100s Kflits / frame (over all flows) in 8x8 mesh network
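The per-frame injection constraint can be pictured as a credit counter at each flow source. The following is a minimal sketch, not the GSF hardware: the class name `FrameRegulator`, its methods, and the helper `smallest_allocatable_share` are hypothetical, and the frame size is assumed to be expressed in flits.

```python
from fractions import Fraction


class FrameRegulator:
    """Hypothetical per-source regulator: at most `flits_per_frame` flits per frame."""

    def __init__(self, flits_per_frame):
        self.quota = flits_per_frame
        self.credits = flits_per_frame

    def start_new_frame(self):
        # The quota is replenished at every frame boundary.
        self.credits = self.quota

    def try_inject(self, num_flits=1):
        # Inject only if the current frame still has enough credits.
        if num_flits <= self.credits:
            self.credits -= num_flits
            return True
        return False


def smallest_allocatable_share(frame_size_flits):
    """One flit per frame is the finest bandwidth step that can be reserved."""
    return Fraction(1, frame_size_flits)


reg = FrameRegulator(flits_per_frame=2)
print(reg.try_inject(), reg.try_inject(), reg.try_inject())  # third attempt is refused
print(smallest_allocatable_share(1000), smallest_allocatable_share(100))
```

The second helper illustrates the trade-off stated on the slide: a shorter frame has fewer flit slots, so each slot is a larger fraction of the bandwidth and control is coarser.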

Admission control of flows
[Figure: frames 0 to 4 laid out along the time axis for a shared physical link in a 4x4 grid of routers (R)]
• Admission control: reject a new flow if it would make the network unable to transport all the injected flits within a frame interval
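One simple way to picture this rule is a per-link budget check. The sketch below rests on a simplifying assumption that is not stated on the slide: a flow is admitted only if, on every link of its route, the sum of reserved flits per frame stays within what the link can carry in one frame. The class name, the route representation and the capacity parameter are all hypothetical.

```python
from collections import defaultdict


class AdmissionController:
    """Hypothetical controller tracking reserved flits per frame on each link."""

    def __init__(self, link_capacity_flits_per_frame):
        self.capacity = link_capacity_flits_per_frame
        self.reserved = defaultdict(int)  # link id -> flits/frame already reserved

    def admit(self, route, flits_per_frame):
        """Reserve bandwidth along `route` (a list of link ids) if every link has room."""
        for link in route:
            if self.reserved[link] + flits_per_frame > self.capacity:
                return False  # this link could not drain within a frame interval
        for link in route:
            self.reserved[link] += flits_per_frame
        return True


ac = AdmissionController(link_capacity_flits_per_frame=10)
print(ac.admit(["a", "b"], 6))  # fits on both links
print(ac.admit(["b", "c"], 5))  # link "b" would exceed its budget
print(ac.admit(["c"], 5))       # link "c" is still free
```

A rejected request leaves all reservations unchanged, so later flows that avoid the congested link can still be admitted.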

Single frame does not service bursty traffic well
[Figure: injections of a regulated source and a bursty source over frames 0 to 5]
• Both traffic sources have the same long-term rate: 2 flits / frame.
• Allocating 2 flits / frame penalizes the bursty source.

Overlapping multiple frames to help bursty traffic
[Figure: frames 0 to 7 over time; at any time a head frame and 2 future frames are active]
• Overlapping multiple frames to multiply injection slots
  - Sources can inject flits into future frames (w/ separate per-frame buffers)
  - Older frames have higher priorities for contended channels.
    - Drain time of head frame does not change.
    - Future frames can use unclaimed BW by older frames.
  - Maximum network delay < 3 * (frame interval)
• Best-effort traffic: always lowest priority (throughput ↑)
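The benefit for bursty sources can be illustrated by counting how many frame-window shifts a source must wait before a whole burst has been injected. This is a toy model under stated assumptions: the function name is hypothetical, the source has a fixed quota in every active frame, and only injection is modeled (flits placed in future frames still travel at lower priority than head-frame flits).

```python
def window_shifts_to_inject(burst_flits, quota_per_frame, window):
    """Number of frame-window shifts before a burst is fully injected.

    `window` is the number of concurrently active frames (1 = single frame).
    """
    if quota_per_frame <= 0 or window <= 0:
        raise ValueError("quota_per_frame and window must be positive")
    head = 0
    credits = {frame: quota_per_frame for frame in range(window)}
    remaining = burst_flits
    shifts = 0
    while True:
        # Fill the oldest active frame first, then spill into future frames.
        for frame in sorted(credits):
            used = min(credits[frame], remaining)
            credits[frame] -= used
            remaining -= used
        if remaining == 0:
            return shifts
        # Window shift: retire the head frame and open a new futuremost frame.
        del credits[head]
        head += 1
        credits[head + window - 1] = quota_per_frame
        shifts += 1


# A 6-flit burst from a source allotted 2 flits / frame:
print(window_shifts_to_inject(6, 2, window=1))  # single frame
print(window_shifts_to_inject(6, 2, window=3))  # head frame + 2 future frames
```

With a single frame the source waits two window shifts; with three overlapping frames the whole burst is injected immediately.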

Reclamation of frame buffers
[Figure: frames 0 to 7 over epochs 0 to 5; the frame window shifts at each epoch boundary, and the active frames map onto VC0, VC1 and VC2]
• Per-frame buffers (at each node) = virtual channels
• At every frame window shift, frame buffers (or VCs) associated with the earliest frame in the previous epoch are reclaimed for the new futuremost frame.
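The bookkeeping behind a window shift can be sketched as follows. The modulo mapping from frame number to virtual channel is an assumption made for illustration (it is one mapping that hands the retired frame's VC to the new futuremost frame); the class and method names are hypothetical.

```python
class FrameWindow:
    """Hypothetical sliding window of active frames, one virtual channel per frame."""

    def __init__(self, num_vcs):
        self.num_vcs = num_vcs
        self.head_frame = 0

    def active_frames(self):
        # Head frame followed by the future frames currently open for injection.
        return list(range(self.head_frame, self.head_frame + self.num_vcs))

    def vc_for_frame(self, frame):
        if frame not in self.active_frames():
            raise ValueError(f"frame {frame} is not in the active window")
        return frame % self.num_vcs  # assumed mapping, not taken from the slides

    def shift(self):
        """Retire the head frame; return (reclaimed VC, new futuremost frame)."""
        reclaimed_vc = self.head_frame % self.num_vcs
        self.head_frame += 1
        new_frame = self.head_frame + self.num_vcs - 1
        return reclaimed_vc, new_frame


win = FrameWindow(num_vcs=3)
print(win.active_frames())                 # frames 0, 1, 2
reclaimed_vc, new_frame = win.shift()
print(reclaimed_vc, new_frame, win.vc_for_frame(new_frame))
```

After the first shift, VC 0 (previously holding frame 0) is reused for frame 3, and the active window becomes frames 1 to 3.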

Early reclamation improves network throughput
[Figure: frames 0 to 7 over time; with early reclamation the frame window shifts more often (epochs e0 to e7) than with fixed-length epochs (epochs 0 to 5)]
• Observation: Head frame usually drains much earlier than frame interval → low buffer utilization
• Terminate head frame early if empty
  - Use a global barrier network to confirm no pending packet in router or source queue belongs to head frame.
• Empty buffers are reclaimed much faster and overall throughput increases. (by >30% for hotspot traffic pattern)
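The effect of terminating the head frame early can be sketched with a toy model. Everything here is an assumption for illustration: the function names are hypothetical, the barrier is modeled as an instantaneous check over per-node counts of pending head-frame packets, and the drain times are made-up numbers, not measurements from the talk.

```python
def head_frame_is_empty(pending_per_node):
    """Global barrier condition: no node holds a packet of the head frame."""
    return all(pending == 0 for pending in pending_per_node)


def time_to_retire_frames(drain_cycles, frame_interval, early_reclamation):
    """Total cycles needed to retire a sequence of head frames.

    `drain_cycles[i]` is the cycle count after which frame i has fully drained.
    Without early reclamation every epoch lasts a full frame interval.
    """
    total = 0
    for drain in drain_cycles:
        if early_reclamation:
            total += min(drain, frame_interval)
        else:
            total += frame_interval
    return total


print(head_frame_is_empty([0, 0, 0]), head_frame_is_empty([0, 2, 0]))

drains = [300, 250, 1000]  # made-up drain times, in cycles
print(time_to_retire_frames(drains, 1000, early_reclamation=False))
print(time_to_retire_frames(drains, 1000, early_reclamation=True))
```

In this example the three frames retire in 1550 cycles instead of 3000, so their buffers become available for new future frames sooner.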

GSF in action
• GSF in action: two-router network example (3 VCs)
[Figure: flows A, B, C and D traverse the two routers; each router port has VC 0 (Fr0), VC 1 (Fr1) and VC 2 (Fr2) holding packets of those flows; an active frame window slides over frames 0 to 5]
