

SLIDE 1

Content Server Caching

[Diagram: a client browser connected to a web server across the network; a second diagram inserts proxy servers between the client and the content server]

  • Cache content locally at the client, to avoid network latency and queuing delays at the server
  • Cache content at proxies between the client and the content server
  • Updates? (how do cached copies stay consistent with the origin? one common approach is sketched below)
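A minimal illustrative C sketch of one common answer to the update question: attach a time-to-live (TTL) to each cached copy and serve it only while fresh. The struct and function names here are hypothetical, not from the slides.

    /* Hypothetical sketch: TTL-based freshness for a cached copy. */
    #include <stdbool.h>
    #include <time.h>

    struct cache_entry {
        const char *url;         /* key: the requested resource           */
        const char *content;     /* cached copy of the server's response  */
        time_t      fetched_at;  /* when this copy was retrieved          */
        double      ttl_seconds; /* how long the copy may be served as-is */
    };

    /* Serve from the cache only while the entry is fresh; once it expires,
     * the client or proxy must revalidate or refetch from the content server. */
    bool is_fresh(const struct cache_entry *e) {
        return difftime(time(NULL), e->fetched_at) < e->ttl_seconds;
    }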

SLIDE 2

Caching in Device and Storage Array

[Diagram: a storage array with two ports and two controllers in front of the disks; a cache inside the array holds copies of disk blocks]

Access times: I/O devices (disk) ~2 ms - 100 ms; main memory (DRAM) ~100 - 200 ns

SLIDE 3

Disk Buffer Caching

[Diagram: the processor accesses main memory (DRAM, ~100 - 200 ns); main memory holds a buffer cache of disk blocks fronting the much slower disk (~2 ms - 100 ms)]

Why not replace the slow disk with fast main memory?
  • Cost vs speed (what about flash SSDs?)
  • Persistence (main memory loses its contents on power loss; disk does not)

SLIDE 4

Processor Caching

  • Main memory (DRAM): dense, cheaper, slower, needs refresh (40 ns - 100 ns)
  • Processor cache (SRAM): expensive, fast (5-20 ns); integrated with the processor, it can operate at processor speed

[Diagram: processor with registers (R) and L1/L2 SRAM caches in front of DRAM main memory; software is aware of the registers, while the caches are transparently handled by hardware]

SLIDE 5

Cache Principle

  • Cache holds only a small fraction of the memory blocks at any time
  • Keep the most valuable blocks in cache: the blocks that hold the memory words being accessed
  • Locality Principle (illustrated in the sketch after this list)
    • Temporal Locality: a memory word that has been touched will be accessed again in the near future
      • e.g. instructions in a loop, local variables in a procedure
    • Spatial Locality: locations that are close by spatially (nearby memory addresses) will be accessed together (in close temporal proximity)
      • e.g. sequential instruction fetches, walking an array
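To make both kinds of locality concrete, here is a small illustrative C sketch (not from the slides). Both loops compute the same sum over the same array, but the first walks memory in layout order and exploits spatial locality, while the second strides across cache blocks and wastes most of each block it fetches. The repeated reuse of sum and the loop indices on every iteration is temporal locality.

    #include <stdio.h>

    #define N 1024
    static int a[N][N];   /* C stores this array row by row in memory */

    int main(void) {
        long sum = 0;

        /* Good spatial locality: the inner loop walks consecutive
           addresses, so each fetched cache block is fully used. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];

        /* Poor spatial locality: the inner loop strides N*sizeof(int)
           bytes, touching a different cache block on almost every access. */
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += a[i][j];

        printf("%ld\n", sum);
        return 0;
    }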

SLIDE 6

Memory Hierarchy Review

Speed (directly related to cost):
  • R: registers, ~1 ns
  • Integrated L1 cache: 1-2 ns
  • L2 (SRAM): ~10 - 20 ns
  • Main memory (DRAM): ~100 - 200 ns
  • I/O devices (disk): ~2 ms - 50 ms

[Diagram: hierarchy R - L1 - L2 - memory - disk; the cache and memory controller handle performance management, while the virtual memory system handles space management]

SLIDE 7

Memory Hierarchy Review

  • Registers
    • Limited number of physical registers
    • Register usage managed by the compiler and renaming hardware
  • Cache
    • Small high-speed memory (SRAM)
    • Cache hierarchy: L1 (smallest/fastest), L2, L3 optimizes the cost/performance trade-off
  • Main Memory (different flavors of DRAM)
    • An order of magnitude slower than cache
    • Accessed by the memory controller using physical addresses
  • Disk
    • Large persistent storage for files
    • Backing store for the virtual memory implementation

SLIDE 8

Two-Level Memory Hierarchy

[Diagram: the CPU accesses the L1 cache (fast, small) through a cache controller; on a miss, the memory controller accesses DRAM (slow, large capacity)]

SLIDE 9

Two-Level Memory Hierarchy

  • The L1 cache holds copies of some subset of the locations of main memory
  • Processor memory requests are intercepted by the cache controller
  • Cache Hit: the cache holds the requested memory word (the request can be satisfied from the cache)
  • Cache Miss:
    • Stall the processor until the request is satisfied from main memory
    • A copy of the requested word is brought from main memory into the cache

A simple analytic model estimates the effect of stalls for memory access (IC is the instruction count; a code sketch follows the list below):

Memory stall cycles
  = Number of Misses x Miss Penalty
  = IC x Misses/Instruction x Miss Penalty
  = IC x (Misses/Memory Access) x (Memory Accesses/Instruction) x Miss Penalty
  = IC x Miss Rate x (Memory Accesses/Instruction) x Miss Penalty

The three factors:

  • Miss Rate: program memory access characteristics
    • Temporal and spatial locality
    • Cache organization
  • Memory Accesses/Instruction: program characteristics
    • Density of LD and SD instructions (for data memory accesses)
  • Miss Penalty: memory subsystem design
    • DRAM speed
    • Cache memory bandwidth
    • Memory parallelism and the controller
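The model above is simple enough to capture directly in code. A minimal C sketch (the function and parameter names are illustrative, not from the slides); the worked example on Slide 10 plugs in concrete numbers:

    /* Sketch of the slide's analytic model. Multiply the result by the
     * instruction count IC to get total memory stall cycles. */
    double stall_cycles_per_instruction(double miss_rate,          /* misses per memory access        */
                                        double accesses_per_instr, /* memory accesses per instruction */
                                        double miss_penalty)       /* stall cycles per miss           */
    {
        /* Miss Rate x Memory Accesses/Instruction x Miss Penalty */
        return miss_rate * accesses_per_instr * miss_penalty;
    }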

SLIDE 10

Processor Cache Example

Assume machine parameters:

  • Clock rate: 1 GHz
  • Miss penalty: 200 cycles (accessing main memory and installing the block in the cache)
  • Miss rate: 1% (misses per memory access)
  • Loads/stores: 20% of the instructions
  • Nominal CPI: 2 cycles (assuming all accesses hit in the cache)

Note: the miss rate is sometimes also specified as misses per instruction, misses per memory read, or misses per memory write.

Memory accesses/instruction = 1 (instruction fetch) + 0.2 (LD or SD) = 1.2
Misses/memory access = miss rate = 1%
Stall cycles/miss = miss penalty = 200 cycles

  • Stall cycles/instruction = 200 x 1% x 1.2 = 2.4 cycles
  • Actual CPI = 2 + 2.4 = 4.4 (more than a 100% increase in CPI due to cache misses)
  • If the nominal CPI were 1.0, the actual CPI would be 3.4, and the relative slowdown due to cache misses would be even greater (checked in the sketch below)
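The same numbers in a self-contained C sketch (the parameter values are the slide's; the variable names are mine):

    #include <stdio.h>

    int main(void) {
        double miss_rate          = 0.01;  /* 1% misses per memory access */
        double accesses_per_instr = 1.2;   /* 1 fetch + 0.2 loads/stores  */
        double miss_penalty       = 200.0; /* cycles per miss             */
        double nominal_cpi        = 2.0;   /* CPI with a perfect cache    */

        double stalls = miss_rate * accesses_per_instr * miss_penalty;
        printf("stall cycles/instruction = %.1f\n", stalls);               /* 2.4 */
        printf("actual CPI               = %.1f\n", nominal_cpi + stalls); /* 4.4 */
        return 0;
    }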

SLIDE 11

Processor Cache

General Cache Organization:

  • A main memory address has n + b bits
  • Main memory is divided into blocks of 2^b consecutive bytes (the block size, or line size)
  • N = 2^n blocks of main memory (2^(n+b) bytes of main memory)
  • The cache has M = 2^m blocks (also called cache lines), with m << n
  • The b LSBs select a byte (or word) within a block after the access
  • The n MSBs are used to access a block of the cache (cache line), as in the sketch below

[Diagram: main memory of N = 2^n blocks, each B = 2^b bytes, mapped into a cache of M = 2^m blocks; the address splits into an n-bit block address and a b-bit byte offset]
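The address split is just a shift and a mask. A small illustrative C sketch (the variable names and the example address are mine, not from the slides):

    /* Splitting an (n + b)-bit address into block address and byte
     * offset, for a block size of B = 2^b bytes. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        const unsigned b    = 6;            /* 64-byte blocks  */
        uint32_t       addr = 0x12345678u;  /* example address */

        uint32_t block_addr = addr >> b;               /* the n MSBs */
        uint32_t byte_off   = addr & ((1u << b) - 1);  /* the b LSBs */

        printf("block address = 0x%x, byte offset = %u\n",
               (unsigned)block_addr, (unsigned)byte_off);
        return 0;
    }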

SLIDE 12

Cache Example

  • Main memory: byte-addressable memory of size 4 GB = 2^32 bytes
  • Cache size: 64 KB = 2^16 bytes
  • Block (line) size: 64 bytes = 2^6 bytes
  • Number of memory blocks = 2^32 / 2^6 = 2^26
  • Number of cache blocks = 2^16 / 2^6 = 2^10 (checked in the sketch below)

[Diagram: as on Slide 11, with N = 2^26 memory blocks, M = 2^10 = 1024 cache blocks, B = 64 bytes, n = 26, b = 6; bytes within a block are numbered 0 to 63]
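A tiny C check of the slide's arithmetic (illustrative only):

    #include <stdio.h>

    int main(void) {
        unsigned long long mem_bytes   = 1ULL << 32; /* 4 GB main memory */
        unsigned long long cache_bytes = 1ULL << 16; /* 64 KB cache      */
        unsigned long long block_bytes = 1ULL << 6;  /* 64-byte blocks   */

        /* 2^32 / 2^6 = 2^26 memory blocks; 2^16 / 2^6 = 2^10 cache blocks */
        printf("memory blocks: %llu\n", mem_bytes / block_bytes);   /* 67108864 */
        printf("cache blocks:  %llu\n", cache_bytes / block_bytes); /* 1024     */
        return 0;
    }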

SLIDE 13

Cache Example

(Same configuration as Slide 12.)

Is the accessed memory byte (word) in the cache? If so, where? If not, where should I put it when I get it from main memory?

SLIDE 14

Fully Associative Cache Organization

  • Fully-Associative
  • Set-Associative
  • Direct-Mapped Cache

Fully associative:
  • A cache line can hold any block of main memory; a block in main memory can be placed in any cache line (a many-to-many mapping)
  • A directory structure is maintained to indicate which block of memory currently occupies each cache block
  • The directory structure is known as the TAG array: the TAG entry for a cache line stores the block number of the memory block currently in that cache location (see the sketch below)
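A minimal C sketch of the lookup this implies. Hardware compares all TAG entries in parallel; the loop below is the sequential software analogue, and all names are illustrative, not from the slides.

    #include <stdint.h>
    #include <stdio.h>

    #define M 1024                 /* number of cache lines (2^m) */

    static uint32_t tag_array[M];  /* block number held by each line   */
    static int      valid[M];     /* 1 if the line holds a live block */

    /* Fully associative lookup: any line may hold any block, so every
     * TAG entry must be checked. Returns the matching line, or -1 on a miss. */
    int fa_lookup(uint32_t block_addr) {
        for (int i = 0; i < M; i++)
            if (valid[i] && tag_array[i] == block_addr)
                return i;          /* hit: the block is in line i */
        return -1;                 /* miss: fetch from main memory and place
                                      the block in any free or victim line */
    }

    int main(void) {
        tag_array[7] = 0x48D159u;  /* pretend block 0x48D159 sits in line 7 */
        valid[7] = 1;
        printf("lookup -> line %d\n", fa_lookup(0x48D159u)); /* prints 7 */
        return 0;
    }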
