[537] TLBs Tyler Harter 9/21/14 Overview Review Paging TLBs - - PowerPoint PPT Presentation



slide-1
SLIDE 1

[537] TLBs

Tyler Harter 9/21/14

slide-2
SLIDE 2

Overview

Review Paging
TLBs (Chapter 18)
TLB measurement demo (if time)

slide-3
SLIDE 3

Review: Paging

slide-4
SLIDE 4

Physical memory (4 KB frames):

   0 KB   PT (page tables)
   4 KB   P1
   8 KB   P2
  12 KB   P2
  16 KB   P1
  20 KB   P1
  24 KB   P2
  28 KB

P1 pagetable: 1 5 4 …
P2 pagetable: 6 2 3 …

load 0x0000: what must you know?

  • PTBR (here it points at P2's pagetable, at 0x0800)

Virtual           Physical
load 0x0000       load 0x0800 (2KB)
                  load 0x6000 (24KB)

load 0x1444: what must you know?

  • assume 8-byte PTEs

Virtual           Physical
load 0x1444       load 0x0808
                  load 0x2444

Same virtual address once PTBR points at P1's pagetable (at 0x0000):

Virtual           Physical
load 0x1444       load 0x0008
                  load 0x5444
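The addresses in this review can be checked with a little arithmetic. The following is a minimal sketch, assuming 4 KB pages (12 offset bits) and a flat page table; the helper names `pte_addr` and `phys_addr` are illustrative and do not come from the slides.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12u      /* assumed: 4 KB pages, as in the diagram */
#define OFFSET_MASK 0xFFFu   /* low 12 bits are the page offset */

/* Physical address of the PTE for virtual address va (hypothetical helper). */
uint32_t pte_addr(uint32_t ptbr, uint32_t va, uint32_t pte_size)
{
    assert(pte_size > 0);
    uint32_t vpn = va >> PAGE_SHIFT;   /* virtual page number */
    return ptbr + vpn * pte_size;      /* index into the flat page table */
}

/* Physical address built from the PFN stored in the PTE. */
uint32_t phys_addr(uint32_t pfn, uint32_t va)
{
    return (pfn << PAGE_SHIFT) | (va & OFFSET_MASK);
}
```

With PTBR = 0x0800 and 8-byte PTEs, `pte_addr(0x0800, 0x1444, 8)` gives 0x0808, and `phys_addr(2, 0x1444)` gives 0x2444, matching the second trace.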

slide-23
SLIDE 23

Chapter 19: TLBs

slide-24
SLIDE 24

Outline

What work can we eliminate? Basic strategy. Workloads, systems, metrics. Context switching and security.

slide-25
SLIDE 25

Paging Advantages

Flexible Addr Space


  • don’t need to find contiguous RAM

  • doesn’t waste whole data pages (valid bit)

Easy to manage


  • fixed size pages

  • simple free list for unused pages

  • no need to coalesce
slide-26
SLIDE 26

Paging Problems

Too big Too slow

slide-27
SLIDE 27

Paging Problems

Too big Too slow [today’s focus]

slide-28
SLIDE 28

Translation Steps

H/W: for each mem reference:
 


  • 1. extract VPN (virt page num) from VA (virt addr)

  • 2. calculate addr of PTE (page table entry)

  • 3. fetch PTE

  • 4. extract PFN (page frame num)

  • 5. build PA (phys addr)

  • 6. fetch PA to register
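The six steps can be sketched in C against a simulated physical memory. This is only an illustration under stated assumptions: 4 KB pages, 4-byte PTEs that hold nothing but the PFN, and a 32 KB `ram` array. The names `ram`, `phys_read32`, `phys_write32`, and `load_virtual` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT  12u               /* assumed 4 KB pages */
#define OFFSET_MASK 0xFFFu
#define MEM_SIZE    (32u * 1024u)     /* assumed 32 KB of simulated RAM */

static uint8_t ram[MEM_SIZE];         /* simulated physical memory */

/* Read one 32-bit word from simulated physical memory. */
uint32_t phys_read32(uint32_t pa)
{
    uint32_t v;
    assert(pa <= MEM_SIZE - sizeof v);
    memcpy(&v, &ram[pa], sizeof v);
    return v;
}

/* Write one 32-bit word to simulated physical memory. */
void phys_write32(uint32_t pa, uint32_t v)
{
    assert(pa <= MEM_SIZE - sizeof v);
    memcpy(&ram[pa], &v, sizeof v);
}

/* Load the word at virtual address va, counting physical memory references. */
uint32_t load_virtual(uint32_t ptbr, uint32_t va, unsigned *mem_refs)
{
    assert(mem_refs != NULL);
    uint32_t vpn   = va >> PAGE_SHIFT;                          /* 1. extract VPN   */
    uint32_t ptea  = ptbr + vpn * (uint32_t)sizeof(uint32_t);   /* 2. addr of PTE   */
    uint32_t pte   = phys_read32(ptea);                         /* 3. fetch PTE     */
    (*mem_refs)++;
    uint32_t pfn   = pte;                                       /* 4. extract PFN   */
    uint32_t pa    = (pfn << PAGE_SHIFT) | (va & OFFSET_MASK);  /* 5. build PA      */
    uint32_t value = phys_read32(pa);                           /* 6. fetch PA      */
    (*mem_refs)++;
    return value;
}
```

Only steps 3 and 6 touch memory, so every virtual load costs two physical references in this sketch.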
slide-29
SLIDE 29

Translation Steps

H/W: for each mem reference:
 


  • 1. extract VPN (virt page num) from VA (virt addr)

  • 2. calculate addr of PTE (page table entry)

  • 3. fetch PTE

  • 4. extract PFN (page frame num)

  • 5. build PA (phys addr)

  • 6. fetch PA to register

Which steps are expensive?

slide-30
SLIDE 30

Translation Steps

H/W: for each mem reference:
 


  • 1. extract VPN (virt page num) from VA (virt addr) (cheap)

  • 2. calculate addr of PTE (page table entry) (cheap)

  • 3. fetch PTE (expensive)

  • 4. extract PFN (page frame num) (cheap)

  • 5. build PA (phys addr) (cheap)

  • 6. fetch PA to register (expensive)

Which steps are expensive?

slide-31
SLIDE 31

Translation Steps

H/W: for each mem reference:
 


  • 1. extract VPN (virt page num) from VA (virt addr) (cheap)

  • 2. calculate addr of PTE (page table entry) (cheap)

  • 3. fetch PTE (expensive)

  • 4. extract PFN (page frame num) (cheap)

  • 5. build PA (phys addr) (cheap)

  • 6. fetch PA to register (expensive)

Which expensive step can we avoid?

slide-32
SLIDE 32

Array Iterator

int sum = 0;
for (int i = 0; i < N; i++) {
    sum += a[i];
}

slide-33
SLIDE 33

Array Iterator

load 0x3000
 
 load 0x3004
 
 load 0x3008
 
 load 0x300C
 …

Virt

slide-34
SLIDE 34

Array Iterator

Virt              Phys
load 0x3000       load 0x100C
                  load 0x7000
load 0x3004       load 0x100C
                  load 0x7004
load 0x3008       load 0x100C
                  load 0x7008
load 0x300C       load 0x100C
                  load 0x700C
…

slide-38
SLIDE 38

Outline

What work can we eliminate? Basic strategy. Workloads, systems, metrics. Context switching and security.

slide-39
SLIDE 39

Strategy

Take advantage of repetition.
 Use a CPU cache.

slide-40
SLIDE 40

Strategy

Take advantage of repetition.
 Use a CPU cache.

CPU RAM

memory interconnect

slide-41
SLIDE 41

Strategy

Take advantage of repetition.
 Use a CPU cache.

CPU RAM

memory interconnect

PT

slide-43
SLIDE 43

Strategy

Take advantage of repetition.
 Use a CPU cache.

CPU RAM

memory interconnect

PT

popular PTEs often transferred

slide-44
SLIDE 44

Strategy

CPU RAM

memory interconnect

PT

Take advantage of repetition.
 Use a CPU cache.

slide-45
SLIDE 45

Strategy

Name? ATC: Address Translation Cache? [OSTEP]


CPU RAM

memory interconnect

PT ATC

slide-46
SLIDE 46

Strategy

Name? ATC: Address Translation Cache? [OSTEP]


  • Nope. TLB: Translation Lookaside Buffer

CPU RAM

memory interconnect

PT TLB

slide-47
SLIDE 47

Name? ATC: Address Translation Cache? [OSTEP]


  • Nope. TLB: Translation Lookaside Buffer

CPU RAM ATC

Strategy

slide-48
SLIDE 48

Name? ATC: Address Translation Cache? [OSTEP]


  • Nope. TLB: Translation Lookaside Buffer

CPU RAM Air Traffic Controller

Strategy

slide-50
SLIDE 50

Cache Types (more in CS 552)

Direct-Mapped: only one place to put entries
Four-Way Set Associative: 4 options
Fully-Associative: entries can go anywhere

slide-51
SLIDE 51

Cache Types (more in CS 552)

Direct-Mapped: only one place to put entries
Four-Way Set Associative: 4 options
Fully-Associative: entries can go anywhere

  • most common for TLBs

  • must store whole key/value in cache

  • search all in parallel
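A fully-associative lookup can be sketched in software as a search over every entry. This is a minimal illustration, assuming a 4-entry TLB; the struct layout and the names `tlb_lookup` and `tlb_insert` are hypothetical. Real hardware compares all entries in parallel, whereas the loop below does it one entry at a time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 4   /* assumed size, for illustration only */

struct tlb_entry {
    bool     valid;
    uint32_t vpn;   /* key: the whole VPN must be stored */
    uint32_t pfn;   /* value */
};

/* Returns true on a hit and stores the PFN in *pfn. */
bool tlb_lookup(const struct tlb_entry tlb[], size_t n, uint32_t vpn, uint32_t *pfn)
{
    assert(pfn != NULL);
    for (size_t i = 0; i < n; i++) {   /* hardware checks all slots at once */
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *pfn = tlb[i].pfn;
            return true;
        }
    }
    return false;
}

/* Any slot may hold any translation; the caller picks the slot. */
void tlb_insert(struct tlb_entry tlb[], size_t slot, uint32_t vpn, uint32_t pfn)
{
    tlb[slot].valid = true;
    tlb[slot].vpn   = vpn;
    tlb[slot].pfn   = pfn;
}
```

Because an entry can sit in any slot, the slot index says nothing about the VPN, which is why the full key has to be stored alongside the value.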
slide-52
SLIDE 52

Array Iterator (w/ TLB)

int sum = 0;
for (int i = 0; i < 2048; i++) {
    sum += a[i];
}

slide-53
SLIDE 53

Array Iterator

Physical memory (4 KB frames):

   0 KB   PT (PTBR points here)
   4 KB   P1
   8 KB   P2
  12 KB   P2
  16 KB   P1
  20 KB   P1
  24 KB   P2
  28 KB

P1 pagetable: 1 5 4 …

Virt              Phys
load 0x1000       load 0x0004
                  load 0x5000
load 0x1004       (TLB)
                  load 0x5004
load 0x1008       (TLB)
                  load 0x5008
load 0x100C       (TLB)
                  load 0x500C
…
load 0x2000       load 0x0008
                  load 0x4000
load 0x2004       (TLB)
                  load 0x4004

CPU’s TLB

Valid   Virt   Phys
1       1      5
1       2      4

The first access to each page misses and reads the PTE from the pagetable; later accesses to the same page are translated by the TLB.

slide-69
SLIDE 69

How many TLB lookups?

int sum = 0;
for (int i = 0; i < 2048; i++) {
    sum += a[i];
}

(assume 1KB pages)

slide-70
SLIDE 70

How many TLB lookups?

int sum = 0;
for (int i = 0; i < 2048; i++) {
    sum += a[i];
}

(assume 1KB pages, and that a holds 2048 bytes of 4-byte ints) 2048/sizeof(int) = 512

slide-71
SLIDE 71

How many TLB “misses”?

int sum = 0;
for (int i = 0; i < 2048; i++) {
    sum += a[i];
}

(assume 1KB pages)

slide-72
SLIDE 72

How many TLB “misses”?

int sum = 0;
for (int i = 0; i < 2048; i++) {
    sum += a[i];
}

(assume 1KB pages) if a%1024 is 0, then 2 else 3

slide-73
SLIDE 73

Miss rate?

int sum = 0;
for (int i = 0; i < 2048; i++) {
    sum += a[i];
}

(assume 1KB pages) 2/512 = 0.4% or 3/512 = 0.6%

slide-74
SLIDE 74

Hit rate?

int sum = 0;
for (int i = 0; i < 2048; i++) {
    sum += a[i];
}

(assume 1KB pages) 510/512 = 99.6% or 509/512 = 99.4%
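The miss counts follow from how many pages the array spans. Here is a minimal sketch, assuming a cold TLB with room for every page touched, so that a sequential scan misses exactly once per page; the function name `tlb_cold_misses` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of pages (and so cold TLB misses) touched by a sequential scan
 * of `bytes` bytes starting at virtual address `base`. */
unsigned tlb_cold_misses(uint32_t base, uint32_t bytes, uint32_t page_size)
{
    assert(bytes > 0 && page_size > 0);
    uint32_t first_page = base / page_size;
    uint32_t last_page  = (base + bytes - 1) / page_size;
    return (unsigned)(last_page - first_page + 1);
}
```

For a 2048-byte array and 1 KB pages, a page-aligned base such as 0x1000 touches 2 pages, while an unaligned base such as 0x1004 touches 3. Dividing by the 512 lookups gives the miss rates of 2/512 and 3/512.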

slide-75
SLIDE 75

Outline

What work can we eliminate? Basic strategy. Workloads, systems, metrics. Context switching and security.

slide-76
SLIDE 76

Reasoning about TLB

Workload: series of load/store accesses
TLB: chooses entries to store in CPU
Metric: performance (i.e., hit rate)
TLB “algebra”: given 2 variables, find the 3rd:

f(W, T) = M

slide-78
SLIDE 78

TLB Workloads

Sequential array accesses can almost always hit in the TLB, and so are very fast!
What pattern would be slow?

slide-79
SLIDE 79

TLB Workloads

Sequential array accesses can almost always hit in the TLB, and so are very fast!
What pattern would be slow?


  • highly random, with no repeat accesses
slide-80
SLIDE 80

Workload Characteristics

Workload A

int sum = 0;
for (int i = 0; i < 2048; i++) {
    sum += a[i];
}

Workload B

int sum = 0;
srand(1234);
for (int i = 0; i < 1000; i++) {
    sum += a[rand() % N];
}
srand(1234);
for (int i = 0; i < 1000; i++) {
    sum += a[rand() % N];
}

slide-81
SLIDE 81

[Figure: two plots of address vs. time, each labeled "?"]

slide-82
SLIDE 82

[Figure: plots of address vs. time for Workload A and Workload B]

slide-83
SLIDE 83

[Figure: plots of address vs. time. Workload A: Spatial Locality. Workload B: Temporal Locality.]

slide-84
SLIDE 84

Workload Locality

Spatial Locality: future access will be to nearby addresses
Temporal Locality: future access will be repeats to the same data

slide-85
SLIDE 85

Workload Locality

Spatial Locality: future access will be to nearby addresses
Temporal Locality: future access will be repeats to the same data
What TLB characteristics are best for each type?

slide-86
SLIDE 86

A couple policies

LRU: evict the least-recently used entry when a TLB slot is needed
Random: randomly choose entries to evict
When is each better?

slide-87
SLIDE 87

LRU Troubles

4-entry TLB with LRU replacement (Phys shown as ? on the slides)

virtual addresses: 0 1 2 3 4 0 1 2 …

Access   Result   Virt entries in TLB after the access
0        miss!    0
1        miss!    0 1
2        miss!    0 1 2
3        miss!    0 1 2 3
4        miss!    4 1 2 3
0        miss!    4 0 2 3
1        miss!    4 0 1 3
2        miss!    4 0 1 2

Every access misses: LRU always evicts the page that is needed next.

slide-104
SLIDE 104

A couple policies

LRU: evict the least-recently used entry when a TLB slot is needed
Random: randomly choose entries to evict
When is each better?
Sometimes random is better than a “smart” policy!
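The LRU trouble can be reproduced with a small simulation. This is a sketch under stated assumptions: a 4-slot fully-associative TLB, a workload that cycles over a fixed number of virtual pages, and a simple linear congruential generator standing in for the random policy so that runs are repeatable. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS 4   /* assumed TLB size */

enum policy { POLICY_LRU, POLICY_RANDOM };

/* Small LCG so results are reproducible; not part of the slides. */
static uint32_t lcg_state = 1234u;
static uint32_t lcg_next(void)
{
    lcg_state = lcg_state * 1103515245u + 12345u;
    return (lcg_state >> 16) & 0x7FFFu;
}

/* Count TLB misses for a workload cycling over pages 0..npages-1. */
unsigned count_misses(enum policy p, unsigned npages, unsigned accesses)
{
    int      vpn[SLOTS];        /* -1 marks an empty slot */
    unsigned last_use[SLOTS];   /* time of most recent use, for LRU */
    unsigned misses = 0;

    assert(npages > 0);
    for (int s = 0; s < SLOTS; s++) { vpn[s] = -1; last_use[s] = 0; }

    for (unsigned t = 1; t <= accesses; t++) {
        int page = (int)((t - 1) % npages);
        int slot = -1;

        for (int s = 0; s < SLOTS; s++)
            if (vpn[s] == page) slot = s;            /* hit */

        if (slot < 0) {                              /* miss: pick a victim */
            misses++;
            for (int s = 0; s < SLOTS && slot < 0; s++)
                if (vpn[s] == -1) slot = s;          /* use an empty slot first */
            if (slot < 0 && p == POLICY_LRU) {
                slot = 0;
                for (int s = 1; s < SLOTS; s++)
                    if (last_use[s] < last_use[slot]) slot = s;
            } else if (slot < 0) {
                slot = (int)(lcg_next() % SLOTS);
            }
            vpn[slot] = page;
        }
        last_use[slot] = t;
    }
    return misses;
}
```

Cycling over 5 pages with 4 slots, LRU misses on every access, while random eviction keeps some useful entries by chance. When the working set fits (4 pages), both policies miss only on the first touch of each page.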

slide-105
SLIDE 105

Outline

What work can we eliminate? Basic strategy. Workloads, systems, metrics. Context switching and security.

slide-106
SLIDE 106

Context Switches

What happens if a process uses the cached TLB entries from another process?

slide-107
SLIDE 107

Context Switches

What happens if a process uses the cached TLB entries from another process? Solutions?

slide-108
SLIDE 108

Context Switches

What happens if a process uses the cached TLB entries from another process? Solutions?


  • flush TLB on each switch

  • remember which entries are for each process
slide-109
SLIDE 109

Address Space Identifier

Tag each TLB entry with an 8-bit ASID


  • how many ASIDs do we get?

  • why not use PIDs?

  • what if there are more PIDs than ASIDs?
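Tagging can be sketched as one extra comparison in the TLB lookup. This is a minimal illustration, assuming 4 KB pages and an 8-bit ASID field; the struct and the function `asid_tlb_translate` are hypothetical names, not a real hardware interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12u      /* assumed 4 KB pages */
#define OFFSET_MASK 0xFFFu

struct asid_tlb_entry {
    bool     valid;
    uint32_t vpn;
    uint32_t pfn;
    uint8_t  asid;   /* 8-bit tag identifying the address space */
};

/* Translate va for the address space `asid`; returns false on a TLB miss. */
bool asid_tlb_translate(const struct asid_tlb_entry tlb[], size_t n,
                        uint8_t asid, uint32_t va, uint32_t *pa)
{
    uint32_t vpn = va >> PAGE_SHIFT;
    assert(pa != NULL);
    for (size_t i = 0; i < n; i++) {
        /* An entry matches only if both the VPN and the ASID agree. */
        if (tlb[i].valid && tlb[i].asid == asid && tlb[i].vpn == vpn) {
            *pa = (tlb[i].pfn << PAGE_SHIFT) | (va & OFFSET_MASK);
            return true;
        }
    }
    return false;
}
```

With entries (VPN 1, PFN 5, ASID 11) and (VPN 1, PFN 2, ASID 12) both cached, a load of 0x1444 resolves to 0x5444 under ASID 11 and to 0x2444 under ASID 12, so no flush is needed on a switch between the two processes.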
slide-110
SLIDE 110

P1 P2 P2 P1

PT

P1

16 KB 20 KB 24 KB 8 KB 12 KB 4 KB 0 KB

Virtual Physical

PT

P1 pagetable (ASID 11)

1 5 4 …

P2 pagetable (ASID 12)

6 2 3 … P2

28 KB

PTBR

Valid Virt Phys ASID 1 9 11 1 1 5 11 1 1 2 12 1 1 11

TLB:

slide-111
SLIDE 111

P1 P2 P2 P1

PT

P1

16 KB 20 KB 24 KB 8 KB 12 KB 4 KB 0 KB

Virtual Physical

PT

P1 pagetable (ASID 11)

1 5 4 …

P2 pagetable (ASID 12)

6 2 3 … P2

28 KB

PTBR load 0x1444

Valid Virt Phys ASID 1 9 11 1 1 5 11 1 1 2 12 1 1 11

TLB:

slide-112
SLIDE 112

P1 P2 P2 P1

PT

P1

16 KB 20 KB 24 KB 8 KB 12 KB 4 KB 0 KB

Virtual Physical

PT

P2

28 KB

PTBR load 0x1444 load 0x2444

P1 pagetable (ASID 11)

1 5 4 …

P2 pagetable (ASID 12)

6 2 3 …

Valid Virt Phys ASID 1 9 11 1 1 5 11 1 1 2 12 1 1 11

TLB:

slide-115
SLIDE 115

P1 P2 P2 P1

PT

P1

16 KB 20 KB 24 KB 8 KB 12 KB 4 KB 0 KB

Virtual Physical

PT

P2

28 KB

PTBR load 0x1444 load 0x2444 load 0x1444

P1 pagetable (ASID 11)

1 5 4 …

P2 pagetable (ASID 12)

6 2 3 …

Valid Virt Phys ASID 1 9 11 1 1 5 11 1 1 2 12 1 1 11

TLB:

slide-116
SLIDE 116

P1 P2 P2 P1

PT

P1

16 KB 20 KB 24 KB 8 KB 12 KB 4 KB 0 KB

Virtual Physical

PT

P2

28 KB

PTBR load 0x1444 load 0x2444 load 0x1444 load 0x5444

P1 pagetable (ASID 11)

1 5 4 …

P2 pagetable (ASID 12)

6 2 3 …

Valid Virt Phys ASID 1 9 11 1 1 5 11 1 1 2 12 1 1 11

TLB:

slide-117
SLIDE 117

Address Space Identifier

Context switches are expensive. Even with ASID, other processes “pollute” the TLB.

slide-118
SLIDE 118

Who changes the TLB?

H/W or OS?

slide-119
SLIDE 119

Who changes the TLB?

H/W or OS?

H/W: CPU must know where pagetables are


  • CR3 on x86

  • pagetable structure not flexible

  • “walk” the pagetable

OS: CPU traps into OS upon TLB miss


  • how to avoid double traps?

  • more modern
slide-120
SLIDE 120

Security

Modifying TLB entries is privileged


  • otherwise what could you do?

Need same protection bits in TLB as pagetable


  • rwx
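A protection check against cached rwx bits might look like the sketch below. The bit encoding and the name `access_allowed` are assumptions for illustration; real TLB entry formats differ by architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding of the rwx protection bits in a TLB entry. */
enum { TLB_PROT_R = 4, TLB_PROT_W = 2, TLB_PROT_X = 1 };

/* True if every permission in `wanted` is granted by `entry_prot`. */
bool access_allowed(uint8_t entry_prot, uint8_t wanted)
{
    assert((wanted & ~(TLB_PROT_R | TLB_PROT_W | TLB_PROT_X)) == 0);
    return (entry_prot & wanted) == wanted;
}
```

Keeping the same bits in the TLB as in the pagetable lets the check run on a TLB hit without a second memory access; a store to a page cached as read and execute only would be refused.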
slide-121
SLIDE 121

Measurement Demo


(if enough time)