
Slides for Lecture 12

ENCM 501: Principles of Computer Architecture
Winter 2014 Term

Steve Norman, PhD, PEng

Electrical & Computer Engineering
Schulich School of Engineering
University of Calgary

25 February, 2014

ENCM 501 W14 Slides for Lecture 12

slide 2/19

Previous Lecture

◮ more about multi-level caches
◮ classifying cache misses: the 3 C’s
◮ introduction to virtual memory


slide 3/19

Today’s Lecture

◮ Continued explanation of virtual memory.

Related reading in Hennessy & Patterson: Sections B.4–B.5


slide 4/19

Quick review of address translation

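[Figure: a virtual address is made up of a virtual page number and a page offset; a physical address is made up of a physical page number and a page offset. The virtual page number goes through translation to produce the physical page number. The page offset is a straight copy (no translation!).]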

The master list of VPN-to-PPN translations for a single process is maintained by the O/S kernel in a data structure called a page table. TLBs are circuits capable of doing some of these translations very quickly.


slide 5/19

A couple of questions about address translation (1)

Processes 98 and 99 are running at the same time. Suppose that 0x7fffff567 is the VPN for a page for process 98’s stack, and the corresponding PPN is 0x13579bd. Suppose that 0x7fffff567 is also the VPN for a page for process 99’s stack. What can we conclude about the VPN-to-PPN translation for VPN 0x7fffff567 in process 99?


slide 6/19

A couple of questions about address translation (2)

As on the previous slide, processes 98 and 99 are running at the same time. Suppose that 0x000000400 is the VPN for a page for process 98’s instructions, and the corresponding PPN is 0x1234567. Suppose that 0x000000400 is also the VPN for a page for process 99’s instructions. What can we conclude about the VPN-to-PPN translation for VPN 0x000000400 in process 99?


slide 7/19

Linux / Mac OS X virtual address spaces on x86-64

Pointers are 64 bits wide, but only the least significant 48 bits are used in a virtual address.

byte address              region
0xffff ffff ffff ffff
0xffff ffff ffff fffe
. . .                     virtual address space for O/S kernel
0xffff 8000 0000 0000
. . .                     HUGE range of invalid addresses
0x0000 7fff ffff ffff
0x0000 7fff ffff fffe
. . .                     virtual address space for user processes
0x0000 0000 0000 0000

(For 64-bit Microsoft Windows, the picture is either identical, or not quite the same but very similar.)
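The three regions in the address-space picture can be checked with a few comparisons. The following is a minimal Python sketch, assuming only the boundary addresses shown on this slide; the function name `classify_address` and the region labels are hypothetical and chosen for illustration.

```python
# Boundary addresses taken from the address-space picture on this slide.
USER_MAX = 0x0000_7FFF_FFFF_FFFF
KERNEL_MIN = 0xFFFF_8000_0000_0000
KERNEL_MAX = 0xFFFF_FFFF_FFFF_FFFF


def classify_address(addr: int) -> str:
    """Return 'user', 'kernel' or 'invalid' for a 64-bit byte address."""
    if not 0 <= addr <= KERNEL_MAX:
        raise ValueError("address does not fit in 64 bits")
    if addr <= USER_MAX:
        return "user"
    if addr >= KERNEL_MIN:
        return "kernel"
    # Everything between the two valid ranges is the huge invalid range.
    return "invalid"


if __name__ == "__main__":
    for a in (0x0000_7FFF_FFFF_FFFF, 0x0000_8000_0000_0000, 0xFFFF_8000_0000_0000):
        print(hex(a), classify_address(a))
```

Note that in both valid ranges, bits 63–47 of the address are all equal: all 0 for user addresses, all 1 for kernel addresses.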


slide 8/19

A page table for an x86-64 Linux process

The normal page size is 4 KB. So bits 11–0 of an address are page offset, and bits 46–12 of a virtual address are VPN (virtual page number). Conceptually, a page table is just an array of PTEs (page table entries), where the indexes are VPNs:

VPN              entry
0x7fff ffff f    64-bit PTE
0x7fff ffff e    64-bit PTE
. . .            . . .
0x0000 0000 1    64-bit PTE
0x0000 0000 0    64-bit PTE
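To make the bit fields concrete, here is a minimal Python sketch of translation under the simple model on this slide: 4 KB pages, offset in bits 11–0, VPN in bits 46–12. The dictionary `page_table` is a hypothetical stand-in for the conceptual array of PTEs and maps a VPN directly to a PPN; the function names are also assumptions for illustration.

```python
PAGE_OFFSET_BITS = 12                 # 4 KB pages: offset is bits 11-0
PAGE_SIZE = 1 << PAGE_OFFSET_BITS
VPN_BITS = 35                         # VPN is bits 46-12 of a user address


def split_virtual_address(va: int) -> tuple[int, int]:
    """Split a user virtual address into (VPN, page offset)."""
    offset = va & (PAGE_SIZE - 1)
    vpn = (va >> PAGE_OFFSET_BITS) & ((1 << VPN_BITS) - 1)
    return vpn, offset


def translate(va: int, page_table: dict[int, int]) -> int:
    """Translate the VPN through the table; copy the offset unchanged."""
    vpn, offset = split_virtual_address(va)
    ppn = page_table[vpn]             # KeyError if there is no translation
    return (ppn << PAGE_OFFSET_BITS) | offset


if __name__ == "__main__":
    # VPN and PPN values borrowed from the process 98 stack example on slide 5.
    table = {0x7FFFFF567: 0x13579BD}
    print(hex(translate(0x7FFFFF567ABC, table)))
```

The offset 0xabc passes through untouched, which is the "straight copy" from the review of address translation.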


slide 9/19

Suppose that a page table really is just a big array, as shown on the previous slide.

How much space would such a page table occupy?

The answer to the above question is a totally unreasonable number, so we’ll need to use more complex and much more space-efficient data structures for page tables. Let’s worry about the data structures later, and continue for a while with the simple model that a page table is just a big array of PTEs.
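The size question can be worked out directly from the numbers on slide 8. This is a small Python sketch of the arithmetic, assuming one 64-bit PTE for every possible VPN from 0x0000 0000 0 to 0x7fff ffff f; the variable names are illustrative only.

```python
VPN_BITS = 35        # VPN occupies bits 46-12 of a virtual address
PTE_BYTES = 8        # each PTE is 64 bits

num_ptes = 1 << VPN_BITS              # one entry per possible VPN
table_bytes = num_ptes * PTE_BYTES    # 2**35 * 2**3 = 2**38 bytes
table_gib = table_bytes / 2**30       # convert bytes to GiB

if __name__ == "__main__":
    print(f"{num_ptes} PTEs, {table_bytes} bytes, {table_gib:.0f} GiB per process")
```

Under these assumptions the flat array needs 2^38 bytes, that is, 256 GiB for each process.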


slide 10/19

What information is in a PTE?

A PTE answers several different questions about a virtual page. Here is an incomplete list:

◮ First, does the virtual page even exist? (For a typical x86-64 Linux process, the vast majority of VPNs in the range from 0x0000 0000 0 to 0x7fff ffff f correspond to non-existent virtual pages.)
◮ If the page exists, is it present in physical memory?
◮ If the page is present, what is the PPN (physical page number)?
◮ What are the permissions for the page: can the process write to the page, and can it fetch instructions from the page?


slide 11/19

PTE formats in x86-64 Linux (1)

First, let’s look at a PTE for a page that does not exist. I haven’t found documentation to confirm this, but I’m pretty sure that 64 zeros indicate that there is no page corresponding to a VPN:

[Figure: a 64-bit PTE, with bit numbers within the PTE running from 63 down to 0, in which every bit is 0.]


slide 12/19

PTE formats in x86-64 Linux (2)

Now let’s look at a PTE for a page that does exist, and is present in physical memory.

How can a page exist but NOT be present in physical memory?

Okay, back to the PTE format for a page that is present . . .

bit numbers within PTE    contents
63                        XD
51–12                     up to 40 bits for PPN
8–2                       more page status bits
1                         R/W
0                         P (value is 1)
all other bits            unused

Let’s make some notes about the P, R/W and XD bits.
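As a companion to those notes, here is a minimal Python sketch that pulls the P, R/W, XD and PPN fields out of a 64-bit PTE, assuming the layout on this slide (P in bit 0, R/W in bit 1, PPN in bits 51–12, XD in bit 63). The function name `decode_present_pte` and the dictionary keys are hypothetical. XD stands for "execute disable", so XD = 1 means instruction fetches from the page are not allowed.

```python
P_BIT = 1 << 0           # page is present in physical memory
RW_BIT = 1 << 1          # 1 = writes allowed, 0 = read-only
XD_BIT = 1 << 63         # 1 = instruction fetch disabled
PPN_SHIFT = 12
PPN_MASK = (1 << 40) - 1 # up to 40 bits of PPN in bits 51-12


def decode_present_pte(pte: int) -> dict:
    """Decode a PTE for a present page; reject PTEs with P = 0."""
    if not pte & P_BIT:
        raise ValueError("P bit is 0: page is not present")
    return {
        "ppn": (pte >> PPN_SHIFT) & PPN_MASK,
        "writable": bool(pte & RW_BIT),
        "executable": not (pte & XD_BIT),
    }


if __name__ == "__main__":
    # A made-up PTE: PPN 0x13579bd, writable, instruction fetch disabled.
    pte = XD_BIT | (0x13579BD << PPN_SHIFT) | RW_BIT | P_BIT
    print(decode_present_pte(pte))
```

A stack page would typically look like the example: writable, but with XD set so that data on the stack cannot be fetched as instructions.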


slide 13/19

PTE formats in x86-64 Linux (3)

And here is a PTE for a page that exists, but is not present in physical memory.

bit numbers within PTE    contents
63–1                      page location on disk, other info about page
0                         P (value is 0)

We won’t go into detail about bits 63–1, but if the assumption on slide 11 is correct, they must not all be zero.

Source for information on this slide and slide 12: Bryant, R. E. and O’Hallaron, D. R., Computer Systems: A Programmer’s Perspective, 2nd ed., published by Prentice Hall.


slide 14/19

Review of P3/P4 memory system structure

[Figure: block diagram with these components: CORE, L1 I-CACHE with I-TLB, L1 D-CACHE with D-TLB, UNIFIED L2 CACHE, DRAM CONTROLLER, DRAM MODULES.]

On every instruction fetch, the I-TLB must attempt to translate a virtual instruction address into a physical instruction address. On every data read or write, the D-TLB must attempt to translate a virtual data address into a physical data address.


slide 15/19

TLB structure

A TLB is essentially a cache for page table information. A page table is a complete list of the statuses of all of the virtual pages belonging to a process. A TLB contains some of the most recently accessed information in a page table.


slide 16/19

TLB hits

Let’s outline:

◮ how a TLB hit is detected;
◮ what happens as a result of a TLB hit.


slide 17/19

Simple TLB misses

The simplest form of a TLB miss occurs when there is a valid VPN-to-PPN translation, which is in the page table, but not in the TLB. Let’s describe how such a TLB miss is handled.
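One way to picture hits and simple misses is a tiny software model of a TLB as a cache of page table information. This is only a sketch under stated assumptions, not a description of any real TLB circuit: the class name `TinyTLB`, the fully-associative organization, the LRU replacement policy and the dictionary `page_table` (VPN to PPN) are all assumptions made for illustration.

```python
from collections import OrderedDict


class TinyTLB:
    """Toy fully-associative TLB with LRU replacement (assumed policy)."""

    def __init__(self, capacity: int, page_table: dict[int, int]):
        self.capacity = capacity
        self.page_table = page_table   # stand-in for the real page table
        self.entries = OrderedDict()   # VPN -> PPN, least recent first
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn: int) -> int:
        if vpn in self.entries:
            # Hit: the VPN matches a stored entry, so the PPN is ready.
            self.hits += 1
            self.entries.move_to_end(vpn)
            return self.entries[vpn]
        # Simple miss: the translation is valid but only in the page table.
        self.misses += 1
        ppn = self.page_table[vpn]
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
        self.entries[vpn] = ppn
        return ppn


if __name__ == "__main__":
    tlb = TinyTLB(capacity=2, page_table={1: 10, 2: 20, 3: 30})
    for vpn in (1, 1, 2, 3, 1):
        tlb.lookup(vpn)
    print(tlb.hits, tlb.misses)
```

The model covers only the case on this slide, where the page table holds a valid translation; a VPN with no translation at all raises `KeyError` here, which stands in for the more complicated cases.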


slide 18/19

DRAM, disk storage and flash memory

Here’s a story that is simple, easy to understand, but not actually true . . .

◮ Instructions and data belonging to the kernel and to processes are in DRAM.
◮ I-caches and D-caches allow processor cores to access instructions and data much faster than if all such accesses really had to go to DRAM.
◮ Non-volatile storage, such as magnetic disks and flash memory arrays, is used for file storage.

That’s actually a good model to start with, but it’s wrong! What is a more accurate model?


slide 19/19

Upcoming Topics

Short-term:

◮ Completion of material on virtual memory.
◮ Simple pipelining.

Related reading in Hennessy & Patterson: Sections B.4–B.5, Appendix C.

Big topics for the second half of the course:

◮ Instruction-level parallelism.
◮ Thread-level parallelism.

Related reading in Hennessy & Patterson: Appendix C, Chapters 3 and 5.