


Slides for Lecture 15

ENCM 501: Principles of Computer Architecture
Winter 2014 Term
Steve Norman, PhD, PEng

Electrical & Computer Engineering
Schulich School of Engineering
University of Calgary

6 March, 2014

ENCM 501 W14 Slides for Lecture 15

slide 2/20

Previous Lecture

◮ Virtual memory, page tables, and TLBs.


Today’s Lecture

◮ context switches and effects on memory latency
◮ memory system summary
◮ introduction to ILP (instruction-level parallelism)
◮ review of simple pipelining

Related reading in Hennessy & Patterson: Sections C.1–C.3


Context switch: a definition

Consider a computer with multiple cores. For (relative) simplicity, assume that every process on this computer has a single thread; if that's true, then a process is either running in one core only, or suspended, waiting for the OS kernel to give it some running time in one of the cores.

With this assumption, a context switch is an event in which a running process (say, Process A) is suspended, and some other process (say, Process B) gets to continue (or start) in the core where Process A was running.

In reality, in 2014, some processes will be single-threaded and others will be multi-threaded. We'll look at that in detail later in the course.


Causes of context switches

What might cause an OS kernel to suspend Process A, and give Process B some time to run? Here is an incomplete list of possible reasons:

◮ the kernel receives a timer interrupt, indicating that Process A has used up a time slice;
◮ the kernel notices that Process A is blocked, waiting for input from user, disk, network, or some other source;
◮ Process A asks to be suspended, with a system call such as nanosleep;
◮ page fault in Process A: Process A tries to access a page that is not present in physical memory.


Saving process state in a context switch (1)

Suppose the kernel is suspending Process A and allowing Process B to resume. The kernel will

◮ save Process A's register values (GPRs, floating-point registers, PC, other special-purpose registers) by copying them to some safe location in memory;
◮ restore Process B's register values by copying them from memory into the appropriate registers.

What else does the kernel have to do regarding the states of Processes A and B?



Saving process state in a context switch (2)

Again, suppose the kernel is suspending Process A and allowing Process B to resume.

Q1: What must the kernel do with all the physical pages of memory (also known as page frames) that Process A was using?

Q2: What must the kernel do with the page tables for Processes A and B?

Q3: What must the kernel do with the TLBs in the core where Process A was running?

Q4: What is the impact of the context switch on TLB miss rates, and I-cache and D-cache miss rates?


Memory system summary

Think about this simple C function:

    int add_em(const int *a, int n)
    {
        int i, sum = 0;
        for (i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }

What is the cost of the statement sum += a[i]; ? That's not an easy question! The answer depends both on the effects of memory accesses in the recent past, and also on the impact of this particular data memory access on memory accesses in the near future.
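One standard way to reason about the average cost of such an access is the textbook's average memory access time formula, AMAT = hit time + miss rate x miss penalty. A small C sketch, with illustrative (made-up) numbers:

```c
/* Average memory access time for one cache level:
   AMAT = hit_time + miss_rate * miss_penalty (all times in cycles). */
double amat(double hit_time, double miss_rate, double miss_penalty)
{
    return hit_time + miss_rate * miss_penalty;
}
```

For example, with a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty, the data access in sum += a[i]; would average about 6 cycles under this simple model; the real cost of any one execution still depends on recent access history.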


ILP: Instruction-Level Parallelism

ILP is a general term for enhancing instruction throughput within a single processor core by having multiple instructions "in flight" at any given time. Two important forms of ILP are

◮ pipelining: each instruction takes several clock cycles to complete, but instructions are started one per clock cycle
◮ multiple issue: two or more instructions are started in the same clock cycle

Modern processors use both pipelining and multiple issue, and use complex sets of related features to try to maximize instruction throughput.
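The throughput benefit of pipelining can be made concrete with the usual idealized cycle counts (no stalls, k-stage pipeline):

```c
/* Idealized cycle counts for n instructions on a k-stage machine.
   Unpipelined: each instruction takes k cycles, one after another.
   Pipelined: one instruction completes per cycle once the pipeline
   is full, so the total is k + (n - 1). Assumes no stalls. */
long unpipelined_cycles(long n, long k) { return n * k; }
long pipelined_cycles(long n, long k)   { return k + (n - 1); }
```

For n = 1000 instructions and k = 5 stages, that is 5000 versus 1004 cycles, so the speedup approaches k for long instruction streams.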


Review of simple pipelining

Before diving into microarchitectures with multiple pipelines, let's review the design challenges of getting a single pipeline to work fast and correctly. The basic organization of a pipeline involves

◮ pipeline stages: A stage performs some small, simple step as part of handling an instruction. For example, one stage might be responsible for reading GPR values used in an instruction, and another stage might compute memory addresses to be used in loads and stores.
◮ pipeline registers: At the end of each clock cycle, a pipeline register captures the results produced by a stage, making those results available for the next stage in the next cycle.


First stage of a simple pipeline: IF (instruction fetch)

We'll look at an example pipeline that can handle a few different kinds of MIPS instructions. The IF stage is responsible for

◮ updating the PC register as appropriate
◮ reading an instruction from memory and copying the instruction into a pipeline register, so the instruction is available to the next stage, called the ID stage.

Despite what we've just learned about memory, we'll pretend that "instruction memory" is a simple functional unit that can be read within a single clock cycle!
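Those two jobs can be sketched in C for the straight-line (no-branch) case. The struct layout and names here are illustrative only, and branch handling is ignored:

```c
#include <stdint.h>

/* Contents clocked into the IF/ID pipeline register (sketch). */
struct if_id {
    uint32_t instruction;
    uint32_t pc_plus_4;
};

/* One IF step: fetch the instruction at PC, then do the usual PC
   update (PC + 4, since MIPS instructions are 4 bytes wide). */
struct if_id if_stage(const uint32_t *imem, uint32_t *pc)
{
    struct if_id reg;
    reg.instruction = imem[*pc / 4];  /* word-indexed instruction memory */
    *pc += 4;                         /* usual PC update */
    reg.pc_plus_4 = *pc;
    return reg;
}
```

The pretend "instruction memory" here is just an array indexed in one step, matching the slide's single-cycle simplification.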


[Figure: IF-stage datapath. A clocked 32-bit PC feeds the address input of the instruction memory and an adder that computes PC + 0x00000004 for the usual PC update; a branch decision can select a branch target address instead. The fetched instruction and address information are clocked into the IF/ID pipeline register at the boundary between the IF and ID stages.]

In every single clock cycle, the IF stage will dump a new instruction into the IF/ID pipeline register.



More stages

This lecture will follow the 5-stage design presented in Section C.3 of the course textbook. The stages are:

◮ IF, which we've just seen
◮ ID: instruction decode and GPR read
◮ EX: execute; perform computation in the ALU (arithmetic/logic unit)
◮ MEM: access to data memory for load or store
◮ WB: writeback; write the result of a load, or of an instruction like DADD, to a GPR

Let's sketch out what each of these stages does . . .


IF ID EX MEM WB

Attention: This slide and others like it will not attempt to describe every detail of a pipeline stage. Instead it will just explain the general role of a stage.

The ID stage:

◮ decodes the instruction: finds out what kind of instruction it is, and what its operands are
◮ copies two GPR values into the ID/EX register
◮ copies an offset into the ID/EX register, in case the offset is needed for a load, store, or branch
◮ copies some instruction address information into the ID/EX register, in case that is needed to generate a branch target address


“R-type” instructions

R-type is MIPS jargon for instructions such as DADDU, DSUBU, OR, AND, etc. An R-type instruction involves performing some simple ALU computation involving two GPR values, and writing the result to a GPR.
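The fields of an R-type instruction follow the standard MIPS encoding: opcode in bits 31-26, then rs, rt, rd, shamt, and funct. They can be pulled apart with shifts and masks:

```c
#include <stdint.h>

/* Standard MIPS R-type field layout:
   opcode[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0] */
struct rtype {
    unsigned op, rs, rt, rd, shamt, funct;
};

struct rtype decode_rtype(uint32_t instr)
{
    struct rtype f;
    f.op    = (instr >> 26) & 0x3f;
    f.rs    = (instr >> 21) & 0x1f;
    f.rt    = (instr >> 16) & 0x1f;
    f.rd    = (instr >> 11) & 0x1f;
    f.shamt = (instr >>  6) & 0x1f;
    f.funct =  instr        & 0x3f;
    return f;
}
```

For example, OR $3, $1, $2 has opcode 0, rs = 1, rt = 2, rd = 3, shamt = 0, and funct = 0x25; the result is written to GPR rd.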


IF ID EX MEM WB

The EX stage performs a computation in the ALU.

For an R-type instruction, the ALU performs whatever operation is appropriate (add, subtract, AND, OR, etc.), and writes the result into the EX/MEM register.

For a load or store, the ALU computes a memory address, and writes the address into the EX/MEM register.

For a branch, the ALU computes a branch target address and makes a branch decision. Both of those results get written into the EX/MEM register.

Attention: The branch instruction handling described on this slide is specific to textbook Figure C.22! We'll look at problems related to that design in the next lecture.


IF ID EX MEM WB

The MEM stage is mostly for data memory access by loads and stores. Again we pretend that memory is really simple!

For an R-type instruction, not much happens: results are copied from the EX/MEM register to the MEM/WB register.

For a load, data read from memory gets copied into the MEM/WB register.

For a store, data memory is updated using an address and data found in the EX/MEM register.

For a branch, if the decision in EX was to take the branch, the PC gets updated with the branch target address.

Attention, again: The branch instruction handling described on this slide is specific to textbook Figure C.22!


IF ID EX MEM WB

The WB stage is used to update a GPR with the result of an R-type or load instruction.

For an R-type or load instruction, a GPR is updated, using the appropriate result from the MEM/WB register. It wasn't mentioned before, but the 5-bit number specifying the destination register had to be passed from ID through EX and MEM to get to WB at the same time as the ALU or load result.

For a store or a branch, nothing happens in WB. Those instructions finish in MEM.
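The point about the destination-register number riding along with the data can be sketched in C (the struct layout here is invented for illustration):

```c
#include <stdint.h>

/* MEM/WB pipeline register (sketch): the 5-bit destination-register
   number travels with the result so WB knows which GPR to write. */
struct mem_wb {
    uint64_t result;     /* ALU result or loaded data         */
    unsigned dest;       /* destination GPR number, 0..31     */
    int      writes_gpr; /* 1 for R-type and loads, else 0    */
};

void wb_stage(const struct mem_wb *r, uint64_t gpr[32])
{
    /* Stores and branches do nothing here: they finished in MEM.
       GPR 0 is hard-wired to zero in MIPS, so it is never written. */
    if (r->writes_gpr && r->dest != 0)
        gpr[r->dest] = r->result;
}
```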



A rough sketch of the 5-stage pipeline

[Figure: rough sketch of the 5-stage pipeline. A clocked PC feeds I-mem in the IF stage; the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers, each clocked by CLK, separate the IF, ID, EX, MEM, and WB stages. The ID stage holds the instruction decoder and the GPRs, the EX stage holds the ALU and an adder, and the MEM stage holds D-mem.]

A lot of detail has been left out, but there’s enough here for us to trace processing of LW followed by DSUBU followed by SW.
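Assuming no stalls, that trace follows from one rule: instruction i (numbered from 0, entering IF in cycle i) occupies stage c - i during cycle c. A sketch:

```c
#include <string.h>

static const char *stage_names[] = { "IF", "ID", "EX", "MEM", "WB" };

/* Stage occupied in cycle c by instruction i (both 0-indexed), assuming
   an ideal 5-stage pipeline with no stalls. Returns "" when the
   instruction is not in the pipeline during that cycle. */
const char *stage_of(int i, int c)
{
    int s = c - i;
    return (s >= 0 && s <= 4) ? stage_names[s] : "";
}
```

For LW (instruction 0), DSUBU (1), SW (2): in cycle 3, LW is in MEM, DSUBU is in EX, and SW is in ID.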


Upcoming Topics

◮ Pipeline hazards and solutions to pipeline hazards.

Related reading in Hennessy & Patterson: Sections C.1–C.3