MEMORY HIERARCHY: RANDOM ACCESS MEMORY



SLIDE 1

MEMORY HIERARCHY

SLIDE 2

Key features
▸ RAM is traditionally packaged as a chip.
▸ The basic storage unit is normally a cell (one bit per cell).
▸ Multiple RAM chips form a memory.
RAM comes in two varieties:
▸ SRAM (Static RAM): very expensive but fast (~1 ns)
▹ Used for registers and caches (1 – 3 MB)
▸ DRAM (Dynamic RAM): cheap but slow (~12 ns for DDR3)
▹ Used for main memory (4 – 32 GB)

RANDOM ACCESS MEMORY

SLIDE 3

A bus is a collection of parallel wires that carry address, data, and control signals. Buses are typically shared by multiple devices.

OUTSIDE OF THE CPU CHIP

SLIDE 4

CPU places address A on the memory bus.

MEMORY OPERATION

SLIDE 5

Main memory reads A from the memory bus, retrieves word x, and places it on the bus.

MEMORY OPERATION

SLIDE 6

HARD DRIVE STRUCTURE

SLIDE 7

Disks consist of platters, each with two surfaces. Each surface consists of concentric rings called tracks. Each track consists of sectors separated by gaps.

DISK GEOMETRY

SLIDE 8

Aligned tracks form a cylinder.

DISK GEOMETRY

SLIDE 9

Modern disks partition tracks into disjoint subsets called recording zones.
▸ Each track in a zone has the same number of sectors, determined by the circumference of the innermost track.
▸ Each zone has a different number of sectors/track; outer zones have more sectors/track than inner zones.
▸ So we use the average number of sectors/track when computing capacity.

RECORDING ZONES

SLIDE 10

Capacity: maximum number of bits that can be stored.

Capacity = (# bytes/sector) x (avg. # sectors/track) x (# tracks/surface) x (# surfaces/platter) x (# platters/disk)

Example:
▸ 512 bytes/sector
▸ 300 sectors/track (on average)
▸ 20,000 tracks/surface
▸ 2 surfaces/platter
▸ 5 platters/disk

Capacity = 512 x 300 x 20,000 x 2 x 5 = 30,720,000,000 bytes = 30.72 GB

COMPUTING DISK CAPACITY
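The capacity formula can be checked with a short Python sketch (the function name is illustrative; the parameters are those of the example above):

```python
# Capacity = (# bytes/sector) x (avg # sectors/track) x (# tracks/surface)
#            x (# surfaces/platter) x (# platters/disk)
def disk_capacity(bytes_per_sector, avg_sectors_per_track,
                  tracks_per_surface, surfaces_per_platter, platters_per_disk):
    return (bytes_per_sector * avg_sectors_per_track * tracks_per_surface
            * surfaces_per_platter * platters_per_disk)

capacity = disk_capacity(512, 300, 20_000, 2, 5)
print(capacity)           # 30720000000 bytes
print(capacity / 10**9)   # 30.72 GB (disk vendors use powers of 10, not 2)
```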

SLIDE 11

DISK OPERATION

SLIDE 13

▸ Surface organized into tracks
▸ Tracks divided into sectors
▸ A sector is the minimum unit that can be accessed by the disk

DISK ACCESS


Head in position above a track

SLIDE 22

DISK ACCESS TIME


Average time to access some target sector is approximated by:

Taccess = Tavg seek + Tavg rotation + Tavg transfer

Seek time (Tavg seek)
▸ Time to position the heads over the cylinder containing the target sector.
▸ Typical Tavg seek is 3~9 ms.
Rotational latency (Tavg rotation)
▸ Time waiting for the first bit of the target sector to pass under the r/w head.
▸ Tavg rotation = 1/2 x (1/RPM) x 60 secs/1 min
▸ Typical rotational rate is 7,200 RPM.
Transfer time (Tavg transfer)
▸ Time to read the bits in the target sector.
▸ Tavg transfer = (1/RPM) x (1/(avg # sectors/track)) x 60 secs/1 min

SLIDE 23

EXAMPLE


A hard disk has the following parameters:
▸ Rotational rate = 7,200 RPM
▸ Average seek time = 9 ms
▸ Avg # sectors/track = 400
Compute the disk access time to read one sector.

Answer:
▸ Tavg rotation = 1/2 x (60 secs/7,200 RPM) x 1000 ms/sec ≈ 4 ms
▸ Tavg transfer = (60 secs/7,200 RPM) x (1/400) x 1000 ms/sec ≈ 0.02 ms
▸ Taccess = 9 ms + 4 ms + 0.02 ms = 13.02 ms
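The calculation above can be sketched in Python using the formulas from the slides (function names are illustrative):

```python
# Disk access time: Taccess = Tavg seek + Tavg rotation + Tavg transfer.

def avg_rotation_ms(rpm):
    # Half a revolution on average, converted to milliseconds.
    return 0.5 * (60 / rpm) * 1000

def avg_transfer_ms(rpm, avg_sectors_per_track):
    # Time for one sector to pass under the head, in milliseconds.
    return (60 / rpm) * (1 / avg_sectors_per_track) * 1000

t_seek = 9.0                               # ms, given
t_rotation = avg_rotation_ms(7200)         # ≈ 4.17 ms (slide rounds to 4)
t_transfer = avg_transfer_ms(7200, 400)    # ≈ 0.02 ms
print(t_seek + t_rotation + t_transfer)    # ≈ 13.19 ms
```

Note how seek time and rotational latency dominate: transferring the sector itself is essentially free.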

SLIDE 24

DISK ACCESS ISSUES


Access time is dominated by seek time and rotational latency.
▸ The first bit in a sector is the most expensive; the rest are (essentially) free.
SRAM access time is about 4 ns/doubleword, DRAM about 60 ns.
▸ Disk is about 40,000 times slower than SRAM
▸ and about 2,500 times slower than DRAM.

SLIDE 25

LOGICAL BLOCK ADDRESSING


Modern disks present a simpler abstract view of the complex sector geometry:
▸ The set of available sectors is modeled as a sequence of fixed-size logical blocks (0, 1, 2, ...)
Mapping between logical blocks and actual (physical) sectors:
▸ Maintained by a hardware/firmware device called the disk controller
▸ Converts requests for logical blocks into (surface, track, sector) triples
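As a sketch of what the controller does, here is a hypothetical mapping from a logical block number to a (surface, track, sector) triple. It assumes a uniform geometry with no recording zones and fills every surface of a cylinder before moving inward; real controllers use firmware zone tables and also remap bad sectors. All names are illustrative.

```python
def block_to_geometry(block, sectors_per_track, surfaces):
    # Position within a track, then which track overall.
    sector = block % sectors_per_track
    track_index = block // sectors_per_track
    # Use all surfaces of one cylinder before moving to the next cylinder,
    # which avoids a seek between consecutive blocks.
    surface = track_index % surfaces
    track = track_index // surfaces
    return (surface, track, sector)

print(block_to_geometry(0, 400, 2))    # (0, 0, 0)
print(block_to_geometry(801, 400, 2))  # (0, 1, 1)
```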

SLIDE 26

SOLID STATE DISKS (SSD)


▸ Pages: 512 bytes to 4 KB; blocks: 32 to 128 pages
▸ Data is read and written in units of pages
▸ A page can be written only after its block has been erased
▸ A block wears out after about 100,000 repeated writes

SLIDE 27

SSD PERFORMANCE


Sequential access is faster than random access
▸ A common theme in the memory hierarchy
Random writes are somewhat slower
▸ Erasing a block takes a long time (~1 ms)
▸ Modifying a page requires all other pages in its block to be copied to a new block
▸ In earlier SSDs, the read/write gap was much larger

Sequential read throughput: 550 MB/s
Sequential write throughput: 470 MB/s
Random read throughput: 365 MB/s
Random write throughput: 303 MB/s
Average sequential read time: 50 us
Average sequential write time: 60 us

SLIDE 28

SSD TRADEOFFS


Advantages
▸ No moving parts
▸ Faster
▸ Less power
Disadvantages
▸ Have the potential to wear out
▹ Mitigated by "wear leveling logic" in the flash translation layer
▹ E.g., the Intel SSD 730 guarantees 128 petabytes (128 x 10^15 bytes) of writes before it wears out
▸ In 2015, about 30 times more expensive per byte

Due to cost and manufacturing limitations, SSDs will not replace hard disk drives entirely in the foreseeable future!

SLIDE 29

THE CPU-MEMORY GAP

SLIDE 30

LOCALITY PRINCIPLE


Programs tend to use data and instructions with addresses near or equal to those they have used recently.
Temporal locality:
▸ Recently referenced items are likely to be referenced again in the near future.
Spatial locality:
▸ Items with nearby addresses tend to be referenced close together in time.

SLIDE 31

LOCALITY PRINCIPLE


sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];
return sum;

Data references
▸ Reference array elements in succession (spatial locality)
▸ Reference variable sum each iteration (temporal locality)
Instruction references
▸ Reference instructions in sequence (spatial locality)
▸ Cycle through loop repeatedly (temporal locality)

SLIDE 32

CACHE INTRODUCTION


Caches exploit the locality principle.
A cache is a small amount of fast, expensive memory.
▸ Goes between the processor and the slower, dynamic main memory
▸ Keeps a copy of the most frequently used data from the main memory
Memory access speed increases overall, because we've made the common case faster.
▸ Reads and writes to the most frequently used addresses will be serviced by the cache
▸ We only need to access the slower main memory for less frequently used data

SLIDE 33

EXAMPLE MEMORY HIERARCHY

SLIDE 34

EXAMPLE MEMORY HIERARCHY

SLIDE 35

CACHE CONCEPTS

SLIDE 36

CACHE CONCEPTS: HIT

SLIDE 37

CACHE CONCEPTS: MISS

SLIDE 38

MEASURING CACHE PERFORMANCE


The hit time is how long it takes data to be sent from the cache to the processor. This is usually fast, on the order of 1-3 clock cycles.

The miss penalty is the time to copy data from main memory to the cache. This often requires dozens of clock cycles (at least).
▸ Multiple caches organized in levels reduce the miss penalty (memory hierarchy)
The miss rate is the percentage of misses.
▸ Typical caches have a hit rate of 95% or higher

SLIDE 39

AVERAGE MEMORY ACCESS TIME (AMAT)


The average memory access time, or AMAT, can then be computed as follows:

AMAT = Hit time + (Miss rate x Miss penalty)

How can we improve the average memory access time of a system?
▸ Obviously, a lower AMAT is better.
▸ Miss penalties are usually much greater than hit times, so the best way to lower AMAT is to reduce the miss penalty or the miss rate.
However, AMAT should only be used as a general guideline; execution time is still the best performance metric.

SLIDE 40

AMAT EXAMPLE


Computer X has one cache (L1) with a hit time of 1 cycle and a hit ratio of 97%. Its DRAM has an access time of 20 cycles. What is the average memory access time?

AMAT = 1 cycle + (1 - 0.97) x 20 cycles = 1.6 cycles
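A one-line Python sketch of the AMAT formula, applied to Computer X (the function name is illustrative):

```python
# AMAT = hit time + (miss rate x miss penalty)
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

print(amat(1, 0.03, 20))   # ≈ 1.6 cycles
```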