EE 457 Unit 7b: Main Memory Organization

1 EE 457 Unit 7b Main Memory Organization

2 Motivation
• Organize main memory to
  – Facilitate byte-addressability while maintaining…
  – Efficient fetching of the words in a cache block
• Low order interleaving (L.O.I.) helps us achieve this

3 Interleaving Analogy
• Consider a journal consisting of 1000 pages (000-999) bound in
  – 10 volumes (0-9) of
  – 100 pages each (00-99)
• Method I (consecutive pages in a volume)
  – Volume 0: pages 000, 001, …, 099
  – Volume 1: pages 100, 101, …, 199
  – …
  – Volume 9: pages 900, 901, …, 999
• Method II (consecutive pages in consecutive volumes)
  – Volume 0: pages 000, 010, …, 990
  – Volume 1: pages 001, 011, …, 991
  – …
  – Volume 9: pages 009, 019, …, 999

4 Interleaving Analogy
• Example: Say article 73 runs from page 730 to page 739
  – In Method I: Article 73 is completely in volume 7
  – In Method II: Page 73 of each volume together forms article 73
    • Page 730 is page 73 of volume 0
    • Page 731 is page 73 of volume 1
    • …
    • Page 739 is page 73 of volume 9
• Which do you prefer?
  – If reading the article, you may say Method I
  – If you have to make a copy of the article and you have 10 photocopy machines with 10 friends to help, you might say Method II
• Back to the scenario of reading the article: given those same 10 friends, they could open each volume to page 73 for you so that you can read in a continuous manner
• Method II is low order interleaving
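The two binding methods can be sketched as a pair of page-to-volume mappings. This is a minimal illustration of the analogy only; the function names `method_one` and `method_two` are made up for this sketch and do not come from the slides.

```python
PAGES_PER_VOLUME = 100  # pages 00-99 in each volume
VOLUMES = 10            # volumes 0-9


def method_one(page: int) -> tuple[int, int]:
    """Method I: the high-order digit of the page number picks the volume."""
    volume, page_in_volume = divmod(page, PAGES_PER_VOLUME)
    return volume, page_in_volume


def method_two(page: int) -> tuple[int, int]:
    """Method II (low order interleaving): the low-order digit picks the volume."""
    page_in_volume, volume = divmod(page, VOLUMES)
    return volume, page_in_volume


# Article 73 spans pages 730-739.
for page in (730, 731, 739):
    print(page, "Method I:", method_one(page), "Method II:", method_two(page))
```

Under Method II the ten pages of the article land on page 73 of ten different volumes, which is what lets ten helpers work on them at once.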

5 Byte Addressability
1. Intel 8085: 16-bit addr., 8-bit data, byte addressable processor.
   Memory space: 2^16 = 64KB, A15-A0, D7-D0
   Organization: one 64K x 8 memory addressed by A15-A0
2. Intel 8086: 20-bit addr., 16-bit data, byte addressable, little-endian proc.
   Memory space: 2^20 = 1MB, A19-A0 [A19-A1, BHE (BE1), A0 (BE0)], D15-D0
   Organization: two ½ MB x 8 banks addressed by A19-A1
   BHE=0 selects Byte 41 on D[15:8]; A0=0 selects Byte 40 on D[7:0]; together they form Word 40
3. Intel 80386: 32-bit addr., 32-bit data, byte addressable, little-endian proc.
   Memory space: 2^32 = 4GB, A31-A0 [A31-A2, BE3, BE2, BE1, BE0], D31-D0
   Organization: four 1 GB x 8 banks addressed by A31-A2
   BE3, BE2, BE1, BE0 select Byte 43, Byte 42, Byte 41, Byte 40 on D[31:24] … D[7:0]; together they form Word 40

6 Byte Addressability
4. Intel 80386 bus with big-endian byte ordering (the 80386 itself is little-endian): 32-bit addr., 32-bit data, byte addressable.
   Memory space: 2^32 = 4GB, A31-A0 [A31-A2, BE3, BE2, BE1, BE0], D31-D0
   Organization: four 1 GB x 8 banks addressed by A31-A2
   Byte 40, Byte 41, Byte 42, Byte 43 (enables labeled BE0, BE1, BE2, BE3) sit on D[31:24] … D[7:0]; together they form Word 40
5. Little-endian, 2-way interleaved system: 32-bit addr., 32-bit data, byte addressable (narrow, 32-bit data bus b/w mem. and cache)
   Memory space: 2^32 = 4GB, A31-A0 [A31-A2, BE3, BE2, BE1, BE0], D31-D0
   Organization: two banks selected by A2 (A2=1 and A2=0); each bank is four ½ GB x 8 chips addressed by A31-A3 with enables BE3-BE0 on D[31:24] … D[7:0]; each bank reaches the narrow bus D[31:0] through a transceiver (XCVR)
6. Same as 5 above, but 4-way interleaved
   Organization: four banks selected by A3,A2 = 11, 10, 01, 00; each bank is built from ¼ GB x 8 chips addressed by A31-A4; each bank reaches D[31:0] through its own XCVR

7 2-Way L.O.I.
• System address bus uses
  – A1:A0 and size info to generate /BE3../BE0 (Byte Enables)
    • In a 32-bit data bus, we need 2 address bits to produce the 4 BE's
    • In a 64-bit data bus, we would need 3 address bits to produce 8 BE's
  – Lower order bits to select a "bank"
    • Only 1 address bit, A2, to select one of 2 banks
  – Upper bits connect to each memory chip
    • Each memory chip is just a collection of ½ GB requiring 29 address bits…we can connect appropriate 29 bits
• Figure: Bank 1 (A2=1) and Bank 0 (A2=0), each built from four ½ GB x 8 chips with enables BE3-BE0 on D[31:24] … D[7:0]. System address lines A31-A3 connect to chip pins A28-A0 (a shift of 3 bits in address connections). Each bank reaches the narrow bus D[31:0] through an XCVR.
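The byte-enable generation described above can be sketched as follows. This is a simplified model, not the actual bus logic: the helper name `byte_enables` is hypothetical, enables are modeled as active-low (0 means the lane is selected), and an access that would spill past one bus-wide word is rejected rather than split into two bus cycles.

```python
def byte_enables(addr: int, size: int, bus_bytes: int = 4) -> list[int]:
    """Return active-low enables [/BE0, /BE1, ...] for one bus access.

    With a 4-byte bus the lane offset comes from A1:A0; with an 8-byte
    bus it comes from A2:A0. `size` is the access size in bytes.
    """
    offset = addr % bus_bytes  # low-order address bits pick the first lane
    if size < 1 or offset + size > bus_bytes:
        raise ValueError("access does not fit inside one bus-wide word")
    # Lane is enabled (0) when it falls inside [offset, offset + size).
    return [0 if offset <= lane < offset + size else 1 for lane in range(bus_bytes)]


print(byte_enables(0x40, 4))      # whole word 40: all four lanes enabled
print(byte_enables(0x41, 1))      # single byte 41: only lane 1 enabled
print(byte_enables(0x0C, 4, 8))   # 64-bit bus: upper four lanes enabled
```

Moving from a 32-bit to a 64-bit bus only changes `bus_bytes`, which mirrors the slide's point that two address bits are consumed for 4 enables and three for 8.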

8 4-Way L.O.I.
• System address bus uses
  – A1:A0 and size info to generate /BEi (Byte Enables)
  – Lower order bits to select a "bank"
  – Upper bits connect to each memory chip
• Figure: Bank 3, Bank 2, Bank 1, Bank 0 are selected by A3,A2 = 11, 10, 01, 00. Each bank is built from ¼ GB x 8 chips. System address lines A31-A4 connect to chip pins A27-A0 (a shift of 4 bits in address connections). Each bank reaches D[31:0] through its own XCVR.
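The bank-select and address-shift rule from the 2-way and 4-way slides can be sketched as one function. This assumes 32-bit-wide banks and a power-of-two bank count; `split_loi_address` is a name invented for this sketch.

```python
def split_loi_address(addr: int, banks: int) -> tuple[int, int]:
    """Split a byte address into (bank, chip_address) for low order interleaving.

    A1:A0 are consumed by the byte enables, the next log2(banks) bits
    select the bank, and the remaining upper bits go to the chips.
    """
    if banks < 1 or banks & (banks - 1):
        raise ValueError("bank count must be a power of two")
    bank_bits = banks.bit_length() - 1       # 1 bit for 2-way, 2 bits for 4-way
    bank = (addr >> 2) & (banks - 1)         # A2 (2-way) or A3:A2 (4-way)
    chip_address = addr >> (2 + bank_bits)   # A31-A3 (2-way) or A31-A4 (4-way)
    return bank, chip_address


# Four consecutive words of one 16-byte block in a 4-way system.
for addr in (0x20, 0x24, 0x28, 0x2C):
    print(hex(addr), split_loi_address(addr, banks=4))
```

The four words of the block fall in four different banks at the same chip address, so all banks can be accessed in parallel.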

9 Organization Options
a.) One-word-wide memory organization: CPU, cache, bus, memory
b.) Wide memory organization: CPU, multiplexer, cache, bus, memory
c.) Interleaved memory organization: CPU, cache, bus, memory banks 0-3

10 Organization Comparison
• Assume following latencies
  – Send address to MM: 1 clock
  – MM (DRAM) access time: 15 clocks
  – Transfer time for one word: 1 clock
• Find time to access a cache line of 4 words
  a. Narrow Memory: 1 + 4*15 + 4*1 = 65 clocks (assume mem. controller will auto-increment address)
  b. Wide Memory: 1 + 15 + 1 = 17 clocks
  c. Interleaved Memory: 1 + 15 + 4*1 = 20 clocks
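The three totals can be reproduced with a small calculation. This sketch simply encodes the slide's formulas; the function names are illustrative, and the interleaved case assumes the line has no more words than there are banks.

```python
SEND_ADDRESS = 1   # clocks to send the address to main memory
DRAM_ACCESS = 15   # clocks for one DRAM access
TRANSFER = 1       # clocks to transfer one word over the bus


def narrow_clocks(words: int) -> int:
    """One-word-wide memory: every word pays its own access and transfer."""
    return SEND_ADDRESS + words * DRAM_ACCESS + words * TRANSFER


def wide_clocks() -> int:
    """Wide memory: the whole line is accessed and transferred at once."""
    return SEND_ADDRESS + DRAM_ACCESS + TRANSFER


def interleaved_clocks(words: int, banks: int = 4) -> int:
    """Interleaved memory: accesses overlap, transfers share the narrow bus."""
    if words > banks:
        raise ValueError("this sketch assumes one word per bank")
    return SEND_ADDRESS + DRAM_ACCESS + words * TRANSFER


print(narrow_clocks(4), wide_clocks(), interleaved_clocks(4))
```

Interleaving costs only 3 clocks more than the wide organization while keeping the narrow 32-bit bus.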

11 Example
• Consider a set-associative mapping and physical organization of main memory, cache data RAMs, and cache tag RAMs.
• Specs:
  – 32-bit physical address, byte-addressable system
  – Cache Size = 64KB
  – Block Size = 4 words (16 bytes)
  – Set Size = 4 blocks (64 bytes)
• Derived quantities:
  – # of MM Blocks = 2^32 / 2^4 = 2^28
  – # of Cache Blocks = 2^16 / 2^4 = 2^12
  – # of Sets = 2^12 cache blocks / 2^2 blocks/set = 2^10
  – # of Groups = 2^28 MM blocks / 2^10 sets = 2^18
• Address fields:
  – TAG (Member): A31-A14
  – SET: A13-A4
  – WORD: A3-A2
  – BYTE: A1-A0 (/BE3 - /BE0)
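The field widths in this example follow directly from the specs, as the sketch below works out. The helper names (`log2_exact`, `split_address`) and the sample address 0x12345678 are chosen for illustration only.

```python
ADDRESS_BITS = 32
CACHE_BYTES = 64 * 1024
BLOCK_BYTES = 16
WORD_BYTES = 4
WAYS = 4  # blocks per set


def log2_exact(n: int) -> int:
    """Return log2(n), insisting that n is a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("expected a power of two")
    return n.bit_length() - 1


BYTE_BITS = log2_exact(WORD_BYTES)                           # A1-A0
WORD_BITS = log2_exact(BLOCK_BYTES // WORD_BYTES)            # A3-A2
SET_BITS = log2_exact(CACHE_BYTES // (BLOCK_BYTES * WAYS))   # A13-A4
TAG_BITS = ADDRESS_BITS - SET_BITS - WORD_BITS - BYTE_BITS   # A31-A14


def split_address(addr: int) -> tuple[int, int, int, int]:
    """Return (tag, set_index, word, byte) for a 32-bit physical address."""
    byte = addr & ((1 << BYTE_BITS) - 1)
    word = (addr >> BYTE_BITS) & ((1 << WORD_BITS) - 1)
    set_index = (addr >> (BYTE_BITS + WORD_BITS)) & ((1 << SET_BITS) - 1)
    tag = addr >> (BYTE_BITS + WORD_BITS + SET_BITS)
    return tag, set_index, word, byte


print(TAG_BITS, SET_BITS, WORD_BITS, BYTE_BITS)
print([hex(field) for field in split_address(0x12345678)])
```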

12 Tag RAM Example
• Four tag RAMs, one per way (Way 0 to Way 3), each holding the tags and valid bits for its way
• Address input (A) of every tag RAM: the set field, A13-A4
• Data input (DI) of every tag RAM: the tag A31-A14 plus the V-bit
• Data output (DO) of every tag RAM feeds a comparator (=) against A31-A14, producing that way's Hit/Miss signal
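The tag check can be modeled in software as below. This is only a behavioral sketch under the field layout of the preceding example (tag = A31-A14, set = A13-A4): the hardware compares all four ways in parallel, whereas this loop checks them one after another, and the names `tag_ram`, `fill`, and `lookup` are hypothetical.

```python
WAYS = 4
SETS = 1024  # 2^10 sets, indexed by A13-A4

# tag_ram[way][set_index] holds (valid_bit, tag); everything starts invalid.
tag_ram = [[(False, 0)] * SETS for _ in range(WAYS)]


def fields(addr: int) -> tuple[int, int]:
    """Return (tag, set_index) for a 32-bit physical address."""
    return addr >> 14, (addr >> 4) & 0x3FF


def fill(addr: int, way: int) -> None:
    """Record that the block containing addr now lives in the given way."""
    tag, set_index = fields(addr)
    tag_ram[way][set_index] = (True, tag)


def lookup(addr: int):
    """Return the way that hits, or None on a miss."""
    tag, set_index = fields(addr)
    for way in range(WAYS):
        valid, stored_tag = tag_ram[way][set_index]
        if valid and stored_tag == tag:  # the "=" comparator plus V-bit check
            return way
    return None
```

A typical use is `fill(addr, way)` after a miss brings a block in, followed by `lookup(addr)` on later accesses to the same block.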

13 MM & Data RAM Example
• Main memory: four banks selected by A3,A2 = 11, 10, 01, 00
  – Each bank is four 256MB byte-wide chips addressed by A31-A4
  – Each bank connects to the bus through a 32-bit bidirectional XCVR
• Cache data RAM: four ways (Way 0 to Way 3)
  – Each way is four 4 KB byte-wide RAMs on D[31:24], D[23:16], D[15:8], D[7:0]
  – Each RAM is addressed by A13-A2 (Set + Word) and enabled by /BE3 - /BE0
• Processor side: 80386 + buffers, driving A31-A2 and /BE3 - /BE0

14 Main memory organization DRAM TECHNOLOGIES

15 Memory Module Organization
• Memory module is designed to always access data in chunks the size of the data bus (64-bit data bus = 64-bit accesses)
• Parallelizes memory access by accessing the byte at the same location in all (8) memory chips at once
• Only the desired portion will be forwarded to the registers
• Note the difference between system processor address and local memory chip addresses
• Each chip on the module reads 1 byte and outputs it to form a collectively larger word on the data bus (i.e. 8-bytes = 64-bits)
• Figure: a processor with a 64-bit data bus reads 1 DWord at address 0x000c (A[31:0] = 0000…1100)
  – A[31:3] = 0000..01 goes to every chip, so each chip reads its local byte address 0x1
  – A[2:0] = 100 plus SIZE (DWORD) select the desired byte lanes
  – Byte lanes 7 down to 0 return 50 98 a7 fb c6 13 8A 57 on D[63:56] … D[7:0]
  – The value 0x5098a7fb is forwarded to the processor core / registers
  – Byte addresses are shown from both the individual chip perspective (0x0, 0x1, …) and the system/processor perspective (0x0-0x7 for chip row 0x0, 0x8-0xf for chip row 0x1)
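The read in the figure can be mimicked with a toy model of the module. This is a sketch under stated assumptions: only the chip row at local address 0x1 uses the byte values from the figure, local address 0x0 is filled with placeholder zeros, the lanes are assembled little-endian (lane 0 is the least significant byte), and `read_module` is a name invented here.

```python
LANES = 8  # eight byte-wide chips form the 64-bit data bus

# Bytes stored at local chip address 0x1, listed for lanes 0..7 (from the figure).
ROW_1 = [0x57, 0x8A, 0x13, 0xC6, 0xFB, 0xA7, 0x98, 0x50]

# chips[lane][local_address]; local address 0x0 holds placeholder zeros.
chips = [[0x00, ROW_1[lane]] for lane in range(LANES)]


def read_module(addr: int, size: int) -> int:
    """Read `size` bytes at system byte address `addr` through the module."""
    local_address = addr >> 3        # A[31:3] is sent to every chip
    first_lane = addr & 0x7          # A[2:0] picks the first byte lane
    if size < 1 or first_lane + size > LANES:
        raise ValueError("access does not fit inside one 64-bit bus word")
    bus = [chip[local_address] for chip in chips]   # all 8 chips read at once
    wanted = bus[first_lane:first_lane + size]      # forward only the desired portion
    return int.from_bytes(bytes(wanted), "little")


print(hex(read_module(0x0C, 4)))
```

Every access touches all eight chips at the same local address; the processor-side address bits A[2:0] only decide which lanes are kept.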

16 Memory Chip Organization
• Memory technologies share the same layout but differ in their cell implementation
  – SRAM
  – DRAM
• Memories require that the row bits be sent first; they are used to select one row (aka "word line")
  – Uses a hardware component known as a decoder
• All cells in the selected row access their data bits and output them on their respective "bit line"
• The column address is sent next and used to select the desired 8 bit lines (i.e. 1 byte)
  – Uses a hardware component known as a mux
• Figure: the example address 0x000410 splits into a 10-bit row address 0000000001, which the row address decoder uses to select WL[1] out of WL[0]-WL[1023], and a 7-bit column address 0000010; the low 3 bits are shown as XXX (000). The 1K bit lines (BL[0]-BL[1023]) feed the amplifiers & column mux, which drive Data[7:0] in/out.
• SRAM and DRAM differ in how each cell is made, but the organization is roughly the same
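The row/column split for the example address can be sketched as follows. The field widths (10 row bits, 7 column bits, 3 ignored low bits) are read off the figure's bit patterns and should be treated as an assumption of this sketch; `split_row_column` is a hypothetical helper name.

```python
ROW_BITS = 10     # selects one of 1024 word lines
COLUMN_BITS = 7   # selects one group of 8 bit lines out of 1024
IGNORED_BITS = 3  # low bits shown as XXX in the figure


def split_row_column(addr: int) -> tuple[int, int]:
    """Return (row, column) for a 20-bit address laid out as row | column | XXX."""
    column = (addr >> IGNORED_BITS) & ((1 << COLUMN_BITS) - 1)
    row = (addr >> (IGNORED_BITS + COLUMN_BITS)) & ((1 << ROW_BITS) - 1)
    return row, column


row, column = split_row_column(0x000410)
print(f"row={row:010b} column={column:07b}")

# The column mux picks 8 adjacent bit lines for the selected byte.
first_bit_line = column * 8
print("bit lines", first_bit_line, "to", first_bit_line + 7)
```

The row field is sent first to the decoder and the column field second to the mux, matching the order described in the bullets.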

17 SRAM vs. DRAM
• Dynamic RAM (DRAM) Cells (store 1 bit)
  – Will lose values if not refreshed periodically every few milliseconds [i.e. dynamic]
  – Extremely small (1 transistor & a capacitor)
    • Means we can have very high density (GB of RAM)
  – Small circuits require more time to access the bit
    • SLOW
  – Used for main memory
• Static RAM (SRAM) Cells (store 1 bit)
  – Will retain values as long as power is on [i.e. static]
  – Larger (6 transistors)
  – Larger circuitry can access bit faster
    • FASTER
  – Used for cache memory

18 Memory Controller
• DRAMs require a non-trivial hardware controller (aka memory controller)
  – To split up the address and send the row and column address at the right time
  – To periodically refresh the DRAM cells
  – Plus more…
• Used to require a separate chip from the processor
  – Legacy architectures used separate chipsets for the memory and I/O controller
• But due to scaling (i.e. Moore's Law) most processors integrate the controller on-chip
  – Helps reduce access time since fewer hops
  – Current general-purpose processors usually integrate the memory controller on chip
