
Shared Memory Bus for Multiprocessor Systems
Mat Laibowitz and Albert Chiou, Group 6



  1. Shared Memory Architecture
     Shared Memory Bus for Multiprocessor Systems. Mat Laibowitz and Albert Chiou, Group 6.
     • We want multiple processors to share memory
     • Question: How do we connect them together?
     [Diagram: one option connects all CPUs to a single, large memory; the other gives each CPU its own smaller memory]
     • Issues: scalability, access time, cost
     • Application: WLAN vs single-chip multiprocessor
     Cache Coherency Problem
     • Each cache needs to correctly handle memory accesses across multiple processors
     • A value written by one processor is eventually visible to the other processors
     • When multiple writes happen to the same location by multiple processors, all the processors see the writes in the same order
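To make the coherency problem concrete, here is a minimal Python sketch (an illustration added here, not part of the original design): two CPUs each have a private cache with no coherence protocol, so a write by CPU 0 is never seen by CPU 1.

    # Illustrative model only: two private caches with NO coherence protocol.
    memory = {0x22C: 0}              # shared main memory
    cache = [dict(), dict()]         # one private cache per CPU

    def read(cpu, addr):
        if addr not in cache[cpu]:       # miss: fill the line from memory
            cache[cpu][addr] = memory[addr]
        return cache[cpu][addr]

    def write(cpu, addr, value):
        cache[cpu][addr] = value         # write-back: memory is not updated

    read(1, 0x22C)                   # CPU 1 caches the old value 0
    write(0, 0x22C, 42)              # CPU 0 writes 42 into its own cache only
    print(read(1, 0x22C))            # prints 0 -- CPU 1 still sees stale data

An MSI-style protocol repairs this by invalidating CPU 1's copy when CPU 0 writes, which is what the transition chart on the next slide enforces.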

  2. Snooping vs Directory
     [Diagram: snooping vs directory coherence, showing caches A, B, and C in states M, S, or I exchanging RingInv and DataMsg messages with memory]
     MSI State Machine
     [Diagram: MSI state machine with states M, S, and I; edges labeled event/action, e.g. CPUWr/--, CPURd/--, CPURd/RingRd, and RingInv/-- transitions back to I]
     Ring Topology
     [Diagram: ring connecting CPU 1 ... CPU n, each through its cache and cache controller (Controller 1 ... Controller n), plus a memory controller and memory]
     MSI Transition Chart
     Cache State | Pending | Incoming Ring Transaction | Incoming Processor Transaction | Actions
     I & Miss    | 0       | -                         | Read                           | Pending->1; SEND Read
     I & Miss    | 0       | -                         | Write                          | Pending->1; SEND Write
     I & Miss    | 0       | Read                      | -                              | PASS
     I & Miss    | 0       | Write                     | -                              | PASS
     I & Miss    | 0       | WriteBack                 | -                              | PASS
     I & Miss    | 1       | Read                      | -                              | DATA/S->Cache; SEND WriteBack(DATA)
     I & Miss    | 1       | Write (I/S)               | -                              | DATA/M->Cache, Modify Cache; SEND WriteBack(DATA)
     I & Miss    | 1       | Write (M)                 | -                              | DATA/M->Cache, Modify Cache; SEND WriteBack(DATA), SEND WriteBack(data), Pending->2
     S           | 0       | -                         | Read (Hit)                     | -
     S           | 0       | -                         | Write                          | Pending->1; SEND Write
     S           | 0       | Read (Hit)                | -                              | Add DATA; PASS
     S           | 0       | Read (Miss)               | -                              | PASS
     S           | 0       | Write (Hit)               | -                              | Add DATA; Cache->I & PASS
     S           | 0       | Write (Miss)              | -                              | PASS
     S           | 0       | WriteBack                 | -                              | PASS
     S           | 1       | Write                     | -                              | Modify Cache; Cache->M & Pass Token
     S           | 1       | WriteBack                 | -                              | Pending->0, Pass Token
     M           | 0       | -                         | Read (Hit)                     | -
     M           | 0       | -                         | Write (Hit)                    | -
     M           | 0       | Read (Hit)                | -                              | Add DATA; Cache->S & PASS
     M           | 0       | Read (Miss)               | -                              | PASS
     M           | 0       | Write (Hit)               | -                              | Add DATA; Cache->I & PASS
     M           | 0       | Write (Miss)              | -                              | PASS
     M           | 0       | WriteBack                 | -                              | PASS
     M           | 1       | WriteBack                 | -                              | Pending->0 & Pass Token
     M           | 2       | WriteBack                 | -                              | Pending->1
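The chart above can be read as a per-line state machine. The following Python sketch is a simplified software model of it (an assumed reconstruction, not the project's Bluespec code: it collapses the pending/token handshake into single transitions, whereas the real controller waits for the token and ring data before changing state). Event names follow the state-machine diagram: CPURd, CPUWr, RingInv, RingRd.

    # Simplified MSI per-line state machine (assumed model of the slide's chart).
    M, S, I = "M", "S", "I"

    def next_state(state, event):
        """Return (next_state, action) for one cache line."""
        transitions = {
            (I, "CPURd"):   (S, "send Read on ring, wait for data"),
            (I, "CPUWr"):   (M, "send Write on ring, wait for data"),
            (S, "CPURd"):   (S, "hit, no ring traffic"),
            (S, "CPUWr"):   (M, "send Write on ring to invalidate sharers"),
            (S, "RingInv"): (I, "another cache wrote this line"),
            (M, "CPURd"):   (M, "hit"),
            (M, "CPUWr"):   (M, "hit"),
            (M, "RingRd"):  (S, "supply data, downgrade"),
            (M, "RingInv"): (I, "supply data, invalidate"),
        }
        return transitions.get((state, event), (state, "ignore"))

    state = I
    for ev in ["CPURd", "RingInv", "CPUWr", "RingRd"]:
        state, action = next_state(state, ev)
        print(ev, "->", state, ":", action)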

  3. Ring Implementation
     • A ring topology was chosen for speed and its electrical characteristics
       – Only point-to-point links
       – Behaves like a bus
       – Scalable
     • Uses a token to ensure sequential consistency (a software sketch of this discipline follows below)
     [Diagram: mkMSICacheController, with request/response FIFOs to the CPU, ringIn/ringOut FIFOs to the ring, the controller rules, and the mkMSICache state: waitReg, pending, token]
     Test Rig
     • An additional module, mkMultiCacheTH, was implemented that takes a single stream of memory requests and deals them out to the individual CPU data request ports.
     • This module can either send one request at a time, wait for a response, and then go on to the next CPU, or it can deal requests out as fast as the memory ports are ready.
     • This demux allows individual-processor verification prior to multiprocessor verification.
     • It can then be fed set test routines to exercise all the transitions, or be hooked up to the random request generator.
     [Diagram: mkMultiCacheTH, with clients feeding cache controllers 1 ... n (mkMultiCache), and the ring closing through mkDataMemoryController (toDMem/fromDMem rules, dataReqQ/dataRespQ FIFOs) and mkDataMem]
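As a rough software model of the token discipline (an assumed sketch in Python, not the actual Bluespec implementation; the class names Controller and Ring are illustrative): only the controller currently holding the token may issue its pending request, so requests enter the ring one at a time and every node observes the same global order of writes.

    # Assumed model: a single token serializes request issue on the ring.
    from collections import deque

    class Controller:
        def __init__(self, name):
            self.name = name
            self.pending = deque()          # requests from the local CPU

    class Ring:
        def __init__(self, controllers):
            self.nodes = controllers
            self.token = 0                  # index of the token holder

        def step(self):
            holder = self.nodes[self.token]
            if holder.pending:
                req = holder.pending.popleft()
                print(f"{holder.name} issues {req}")     # ordered issue point
            self.token = (self.token + 1) % len(self.nodes)  # pass the token

    ring = Ring([Controller(f"cache{i}") for i in range(3)])
    ring.nodes[0].pending.append("Wr addr=0x230")
    ring.nodes[2].pending.append("Rd addr=0x250")
    for _ in range(6):
        ring.step()

A single token serializes all transactions; the next slide explores adding more tokens to recover some parallelism.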

  4. Design Exploration
     • Scale up the number of cache controllers
     • Add additional tokens to the ring, allowing basic pipelining of memory requests
     • Tokens service disjoint memory addresses (e.g., odd or even)
     • Compare average memory access time versus number of tokens and number of active CPUs
     Example Trace
     => Cache 2: toknMsg op->Tk8
     => Cache 5: toknMsg op->Tk2
     => Cache 3: ringMsg op->WrBk addr->0000022c data->aaaaaaaa valid->1 cache->1
     => Cache 3: getState I
     => Cache 1: newCpuReq St { addr=00000230, data=ba4f0452 }
     => Cache 1: getState I
     => Cycle = 56
     => Cache 2: toknMsg op->Tk7
     => Cache 6: ringMsg op->Rd addr->00000250 data->aaaaaaaa valid->1 cache->6
     => DataMem: ringMsg op->WrBk addr->00000374 data->aaaaaaaa valid->1 cache->5
     => Cache 6: getState I
     => Cache 8: ringReturn op->Wr addr->000003a8 data->aaaaaaaa valid->1 cache->7
     => Cache 8: getState I
     => Cache 8: writeLine state->M addr->000003a8 data->4ac6efe7
     => Cache 3: ringMsg op->WrBk addr->00000360 data->aaaaaaaa valid->1 cache->4
     => Cache 3: getState I
     => Cycle = 57
     => Cache 6: toknMsg op->Tk2
     => Cache 3: toknMsg op->Tk8
     => Cache 4: ringMsg op->WrBk addr->0000022c data->aaaaaaaa valid->1 cache->1
     => Cache 4: getState I
     => Cycle = 58
     => dMemReq: St { addr=00000374, data=aaaaaaaa }
     => Cache 3: toknMsg op->Tk7
     => Cache 7: ringReturn op->Rd addr->00000250 data->aaaaaaaa valid->1 cache->6
     => Cache 7: writeLine state->S addr->00000250 data->aaaaaaaa
     => Cache 7: getState I
     => Cache 1: ringMsg op->WrBk addr->00000374 data->aaaaaaaa valid->1 cache->5
     => Cache 1: getState I
     => Cache 4: ringMsg op->WrBk addr->00000360 data->aaaaaaaa valid->1 cache->4
     => Cache 4: getState I
     => Cache 9: ringMsg op->WrBk addr->000003a8 data->aaaaaaaa valid->1 cache->7
     => Cache 9: getState I
     => Cycle = 59
     => Cache 5: ringMsg op->WrBk addr->0000022c data->aaaaaaaa valid->1 cache->1
     => Cache 5: getState I
     => Cache 7: toknMsg op->Tk2
     => Cache 3: execCpuReq Ld { addr=000002b8, tag=00 }
     => Cache 3: getState I
     => Cache 4: toknMsg op->Tk8
     => Cycle = 60
     => DataMem: ringMsg op->WrBk addr->000003a8 data->aaaaaaaa valid->1 cache->7
     => Cache 2: ringMsg op->WrBk addr->00000374 data->aaaaaaaa valid->1 cache->5
     => Cache 2: getState I
     => Cache 8: ringMsg op->WrBk addr->00000250 data->aaaaaaaa valid->1 cache->6
     => Cache 8: getState I
     => Cache 5: ringReturn op->WrBk addr->00000360 data->aaaaaaaa valid->1 cache->4
     => Cache 5: getState S
     => Cycle = 61
     => Cache 5: toknMsg op->Tk8
     Test Results
     [Chart: Number of Controllers vs. Avg. Access Time with 2 tokens; controllers = 3, 6, 9; average access time axis 0 to 30 clock cycles]
     [Chart: Number of Tokens vs. Avg. Access Time with 9 controllers; tokens = 2, 4, 8; average access time axis 0 to 30 clock cycles]
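A minimal sketch of how requests might be partitioned across tokens (hypothetical parameters; the slide only specifies that tokens service disjoint address classes such as odd or even). Here the cache-line index selects the token, so with two tokens an even-line request and an odd-line request can be in flight on the ring at the same time:

    # Assumed sketch: map each address to the token that services it.
    LINE_BITS = 2    # hypothetical 4-byte cache lines, for illustration only

    def token_for(addr, n_tokens=2):
        line = addr >> LINE_BITS
        return line % n_tokens      # even lines -> token 0, odd lines -> token 1

    for a in [0x22C, 0x230, 0x250, 0x374]:
        print(f"addr {a:#05x} -> token {token_for(a)}")

Because the address classes are disjoint, the two tokens never order conflicting accesses, which is what makes the basic pipelining measured above safe.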

  5. Placed and Routed Stats (9 caches, 8 tokens)
     • Clock speed: 3.71 ns (~270 MHz)
     • Area: 1,296,726 µm² with memory
     • Average memory access time: ~39 ns
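As a sanity check on these figures: a 3.71 ns cycle time corresponds to 1/3.71 ns ≈ 270 MHz, and an average access time of ~39 ns is therefore about 39/3.71 ≈ 10.5 clock cycles per memory access.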
