A System-on-a-Chip Lock Cache with Task Preemption Support - PowerPoint PPT Presentation



SLIDE 1

A System-on-a-Chip Lock Cache with Task Preemption Support

By Bilge S. Akgul, Jaehwan Lee and Vincent J. Mooney
Georgia Institute of Technology
School of Electrical and Computer Engineering

SLIDE 2

Outline

  • Introduction
  • Background
  • Lock Synchronization Problems
  • Our Methodology
  • Hardware and Software Designs
  • Experiments and Results
  • Conclusion

SLIDE 3

Introduction

  • Multi-processor shared memory SoC
  • Intertask/interprocess synchronization
  • Lock synchronization overheads
    • Lock delay, lock latency
    • Memory bandwidth consumption
  • Aim:
    • Reduce overheads
    • Improve Real-Time (RT) predictability

SLIDE 4

Background

  • Critical Section
    Code section where shared data between multiple execution units is accessed
    E.g., multiple readers and multiple writers
    A lock is necessary to guarantee the consistency of shared data (e.g., global variables)
  • Lock Delay
    Time between release and acquisition of a lock
  • Lock Latency
    Time to acquire a lock in the absence of contention
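The critical-section idea above can be sketched in C (our illustration, not from the slides): two writer threads update a shared counter, and a POSIX mutex serializes the read-modify-write so no increments are lost.

```c
#include <pthread.h>

#define ITERS 100000

static long shared_counter;                      /* shared data (e.g., a global) */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *writer(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&lock);               /* enter critical section */
        shared_counter++;                        /* shared read-modify-write */
        pthread_mutex_unlock(&lock);             /* leave critical section */
    }
    return 0;
}

/* Run two concurrent writers; with the lock, the final count is exact. */
long run_two_writers(void) {
    pthread_t t1, t2;
    shared_counter = 0;
    pthread_create(&t1, 0, writer, 0);
    pthread_create(&t2, 0, writer, 0);
    pthread_join(t1, 0);
    pthread_join(t2, 0);
    return shared_counter;
}
```

Without the mutex, the two read-modify-write sequences could interleave and drop updates; the lock is what guarantees the consistency the slide refers to.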

SLIDE 5

Problems

  • Ensuring mutual exclusiveness
  • Communication bandwidth consumption
  • Eliminate busy-wait problems
    • Busy-wait: if the lock is busy, processors spin on the memory bus
  • Effective lock hand-off necessary
    • Fair
    • Predictive

SLIDE 6

Previous Work

  • Spin-lock alternatives (Anderson '90)
    Spin-on-read (spin on cache), delays in spin-loops
  • Queue-based software locks
    Array-based queuing (Anderson '90)
    MCS locks (Mellor-Crummey, Scott '91)
    LH and M locks (Landin, Hagersten, Magnusson '94)
  • Queue-based hardware locks
    QOLB (Kägi '99) – makes use of collocation
  • Cache-based locks (Ramachandran '96)
    Memory consistency model
    New cache design, extra cache states for locks
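To make the spin-on-read idea concrete, here is a small sketch using C11 atomics (our code; the cited works target multiprocessor hardware): a plain test-and-set lock issues a bus read-modify-write on every failed attempt, while spin-on-read (test-and-test-and-set) spins on a cached copy and only tries the atomic exchange when the lock looks free.

```c
#include <stdatomic.h>

static atomic_int the_lock = 0;   /* 0 = free, 1 = held */

/* Naive busy-wait: every failed attempt is an atomic bus transaction. */
void tas_lock(void) {
    while (atomic_exchange(&the_lock, 1) == 1)
        ;  /* spins on the memory bus */
}

/* Spin-on-read: spin on a plain load (hits in local cache while the
 * lock is held), and only issue the exchange when the lock looks free. */
void ttas_lock(void) {
    for (;;) {
        while (atomic_load(&the_lock) == 1)
            ;  /* local cached read, no bus traffic */
        if (atomic_exchange(&the_lock, 1) == 0)
            return;  /* won the lock */
    }
}

void spin_unlock(void) {
    atomic_store(&the_lock, 0);
}
```

The bandwidth difference is the point: under contention, `tas_lock` keeps the bus busy, whereas `ttas_lock` generates bus traffic only around releases.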

SLIDE 7

Methodology

  • Custom hardware unit: SoC Lock Cache
  • Utilize advantages of SoC design
  • Short Critical Sections covered in DATE '01
  • Critical Sections may be long or short
  • Support preemption of tasks when necessary
    • Hardware-interrupt triggered notification
  • Lock requests handled on a processor-by-processor basis
  • Separate the lock variables according to the critical section lengths

SLIDE 8

SoC Lock Cache Hardware Mechanism

[Figure: processors P1, P2, ..., PN connect through arbitration logic to the SoC Lock Cache and shared memory]

SLIDE 9

Methodology

  • Multiple application tasks
  • Atalanta-RTOS
  • Multi-processor set-up with MPC750s
  • SoCLC provides lock synchronization among processors

[Figure: system stack: application software (tasks) on the Atalanta-RTOS running on MPC750 processors on the software side, with the SoC Lock Cache as a hardware extension]

SLIDE 10

Hardware Simulation Set-up

  • Seamless CVE from Mentor Graphics
  • 4 MPC750s
  • SoC Lock Cache Unit (SoCLC)
  • Shared Memory
  • Interface Logic

SLIDE 11

Tasks Execution Time Improvement

In the case of long Critical Sections, non-preemptive synchronization causes inefficient CPU utilization among tasks.

[Figure: task timelines on Processor 1 and Processor 2. Without preemption, Task 2 busy-waits while Task 1 accesses the CS. With preemption support, an interrupt lets Task 3 preempt Task 2 and run until the CS is free, at the cost of a context switch and ISR overhead.]

SLIDE 12

Software

  • Assume 64 tasks
  • Each lock keeps a lock-wait table of 64-bit entries (one bit per task)
  • Expandable to > 64
  • Tables accessed by ISR

[Figure: Lock 1, Lock 2, Lock 3, Lock 4, ..., Lock n, each with its own 64-entry lock-wait table (bits 0-63)]
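A minimal sketch of such a per-lock wait table (our code and function names, not the paper's): one 64-bit word per lock, one bit per task; the ISR would set a task's bit when it blocks on the lock and, on release, pick a waiter to wake.

```c
#include <stdint.h>

#define NUM_LOCKS 4   /* illustrative; the slide allows Lock 1..n */

/* Bit t of lock_wait[k] set => task t is waiting on lock k. */
static uint64_t lock_wait[NUM_LOCKS];

void mark_waiting(int lock_id, int task) {
    lock_wait[lock_id] |= (uint64_t)1 << task;
}

void clear_waiting(int lock_id, int task) {
    lock_wait[lock_id] &= ~((uint64_t)1 << task);
}

/* Pick the lowest-numbered waiting task, or -1 if none.
 * (A real RTOS could scan in priority order instead.) */
int pick_waiter(int lock_id) {
    uint64_t w = lock_wait[lock_id];
    if (w == 0) return -1;
    int t = 0;
    while (!(w & 1)) { w >>= 1; t++; }
    return t;
}
```

Expanding beyond 64 tasks, as the slide mentions, would just mean widening each entry to an array of words.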

SLIDE 13

Software

[Figure: long-CS lock flow across task1-task4 on PE1 and PE2. A task calls Lock_longCS, which reads the lock. If free, the task executes the long CS while holding the lock, then calls UnLock; the release triggers an interrupt, and the ISR/interrupt handler wakes a waiting task. If the acquire fails, the task is removed from the ready table and a context switch brings in a new task.]
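The flow above can be sketched roughly in C with stubbed RTOS calls (all names and details are our hypothetical simplifications of the slide's diagram, not the actual Atalanta-RTOS API):

```c
#include <stdbool.h>

static bool lock_free = true;     /* the long-CS lock, simplified */
static int  blocked_count = 0;    /* tasks removed from the ready table */

/* --- stubbed RTOS primitives (hypothetical) --- */
static void remove_from_ready_table(void)   { blocked_count++; }
static void context_switch_to_new_task(void){ /* scheduler runs here */ }
static void raise_release_interrupt(void) {
    /* ISR would consult the lock-wait table and wake one waiter */
    if (blocked_count > 0) blocked_count--;
}

static int cs_runs = 0;
static void sample_cs(void) { cs_runs++; }   /* a stand-in long CS */

/* Returns true if the caller acquired the lock and ran the CS;
 * false if it blocked and yielded the processor instead. */
bool lock_longCS_and_run(void (*critical_section)(void)) {
    if (lock_free) {                     /* Read_lock: is it free? */
        lock_free = false;               /* acquire */
        critical_section();              /* execute long CS */
        lock_free = true;                /* UnLock */
        raise_release_interrupt();       /* notify waiters via ISR */
        return true;
    }
    remove_from_ready_table();           /* block instead of busy-waiting */
    context_switch_to_new_task();
    return false;
}
```

The key contrast with slide 5 is the failure path: a failed acquire costs a context switch rather than a busy-wait on the bus, which is what makes long critical sections tolerable.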

SLIDE 14

Experiments

  • With Atalanta RTOS
  • With 4 MPC750s
  • Database Example application (run with 40 tasks)

Database Application (database object flow)

[Figure: client and server address spaces, each with local memory, exchanging shared data through shared memory]

SLIDE 15

Experiments

Example Database Application Transactions

Observed performance improvement with the Lock Cache Unit:

  • 100% speedup in lock delay
  • 32% speedup in lock latency
  • 27% speedup in total execution time

SLIDE 16

Experiments

Long CS lock results (Atalanta RTOS, 40 tasks, 4 PEs):

                              Without SoCLC   With SoCLC   Speedup
  Lock Delay (clk cycles)            47,264       23,590     2.00x
  Lock Latency (clk cycles)           1,200          908     1.32x
  Exe. Time (clk cycles)              36.9M          29M     1.27x
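As a quick sanity check (ours, not from the slides), the speedup column follows from dividing the cycle count without the SoCLC by the count with it:

```c
/* speedup = cycles without SoCLC / cycles with SoCLC
 * e.g., lock delay: 47,264 / 23,590 is approximately 2.00x */
double speedup(double cycles_without_soclc, double cycles_with_soclc) {
    return cycles_without_soclc / cycles_with_soclc;
}
```

The same formula reproduces the 1.32x latency and 1.27x execution-time figures from the table above.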

SLIDE 17

Experiments

Small CS lock results (Atalanta RTOS, 40 tasks, 4 PEs):

                              Without SoCLC   With SoCLC   Speedup
  Lock Delay (clk cycles)             8,936          102     87.6x
  Lock Latency (clk cycles)             884           32       27x

SLIDE 18

Synthesis of SoCLC

Total area (gates) by number of short-CS locks (S) and long-CS locks (L); T = total number of locks (T = S + L):

    S     L     T   Total Area (gates)
   16    16    32    2,734
   16    32    48    3,586
   16    64    80    5,288
   16   128   144    9,027
   32    16    48    3,454
   32    32    64    4,306
   32    64    96    6,008
   32   128   160    9,747
   64    16    80    4,881
   64    32    96    5,733
   64    64   128    7,435
   64   128   192   11,174
  128    16   144    8,163
  128    32   160    9,015
  128    64   192   10,717
  128   128   256   14,456

  • TSMC 0.25 micron technology (Synopsys Behavioral Compiler)
slide-19
SLIDE 19

Conclusion Conclusion

  • A hardware mechanism for multi

A hardware mechanism for multi-

  • processor SoC

processor SoC Lock Synchronization: SoC Lock Cache Lock Synchronization: SoC Lock Cache

  • Reduction in lock latency, lock delay

Reduction in lock latency, lock delay

  • 27% overall speedup in an example database

27% overall speedup in an example database application application

  • Support

Support both both long Critical Sections and short long Critical Sections and short Critical Sections Critical Sections

  • Allow context

Allow context-

  • switching of tasks instead of busy

switching of tasks instead of busy-

  • waiting

waiting