Slides for Lecture 3 ENCM 501: Principles of Computer Architecture

SLIDE 1

Slides for Lecture 3

ENCM 501: Principles of Computer Architecture
Winter 2014 Term

Steve Norman, PhD, PEng
Electrical & Computer Engineering
Schulich School of Engineering
University of Calgary

16 January, 2014

SLIDE 2

ENCM 501 W14 Slides for Lecture 3

slide 2/19

Previous Lecture

◮ brief list of ENCM 501 topics
◮ what does “computer architecture” mean?
◮ trends in computer system performance
◮ classes of computers
◮ trends in computer technology
◮ preliminaries for energy and power use

SLIDE 3

Today’s Lecture

◮ completion of yesterday’s tutorial
◮ energy and power use in processors
◮ brief coverage of trends in cost

Related material in Hennessy & Patterson (our course textbook): Sections 1.5–1.6.

SLIDE 4

About the Wed Jan 15 tutorial

We didn’t quite finish, and it’s useful to review how if statements and loops work at the assembly language level. So the next two slides come from yesterday’s tutorial . . .

SLIDE 5

Comparisons, branches and jumps

Compare: dest gets 1 if src1 < src2, 0 otherwise . . .

    SLT dest, src1, src2

MIPS is unusual: the result does not go into a “condition code register”. Note also that there are variations for unsigned comparison and for comparison of a GPR to a constant.

Branch: if GPR1 == GPR2 goto label . . .

    BEQ GPR1, GPR2, label

There are variations. The most common is BNE: branch if not equal.

Jump: goto label . . .

    J label
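As a rough illustration of how a compare-then-branch pair replaces a condition code register, here is a small C sketch. The helper `slt` and the function `max_via_slt` are hypothetical names made up for this example; they only imitate the semantics of SLT and BEQ described above.

```c
#include <stdint.h>

/* Hypothetical model of SLT: result is 1 if src1 < src2 (signed), else 0. */
int64_t slt(int64_t src1, int64_t src2)
{
    return (src1 < src2) ? 1 : 0;
}

/* Maximum of a and b, structured the way a MIPS translation would be:
   the comparison result lands in an ordinary register (t), and a
   separate branch then tests that register against zero. */
int64_t max_via_slt(int64_t a, int64_t b)
{
    int64_t result = a;
    int64_t t = slt(a, b);      /* SLT t, a, b        */
    if (t == 0) goto done;      /* BEQ t, R0, done    */
    result = b;                 /* reached only if a < b */
done:
    return result;
}
```

Calling `max_via_slt(3, 7)` and `max_via_slt(7, 3)` should both give 7; the only difference is whether the branch to `done` is taken.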

SLIDE 6

Example loop and if statement

Translate this fragment:

    long int *p, *q;   // R16, R17
    long int sum;      // R18

    sum = 0;
    q = p + 100;
    while (p != q) {
        if (*p > 0)
            sum += *p;
        p++;
    }

Repeat, changing the types of p and q to int *.

SLIDE 7

Below is a very straightforward translation of the C code. A good C compiler with optimization turned on would produce faster but less straightforward code.

        OR     R18, R0, R0       # sum = 0
        DADDIU R17, R16, 800     # q = p + 100
    L1: BEQ    R16, R17, L2      # if (p == q) goto L2
        NOP
        LD     R8, 0(R16)        # R8 = *p
        SLT    R9, R0, R8        # R9 = (0 < *p)
        BEQ    R9, R0, L3        # if (!R9) goto L3
        NOP
        DADDU  R18, R18, R8      # sum += *p
    L3: DADDIU R16, R16, 8       # p++
        J      L1                # goto L1
        NOP
    L2:                          # [next instruction after while loop]
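One way to check the control flow of the translation is to write the same loop in C using only `goto`, with one statement per instruction. The sketch below does that. The function name `sum_positive` and the element-count parameter `n` are assumptions added for illustration (the slide uses a fixed count of 100), and `int64_t` stands in for a 64-bit `long int`.

```c
#include <assert.h>
#include <stdint.h>

/* Sum the positive elements of p[0..n-1], using the same labels and
   branch structure as the assembly listing (delay-slot NOPs omitted). */
int64_t sum_positive(const int64_t *p, int n)
{
    assert(n >= 0);
    int64_t sum = 0;                /* OR     R18, R0, R0           */
    const int64_t *q = p + n;       /* DADDIU R17, R16, 8 * n       */
    int64_t value;
L1:
    if (p == q) goto L2;            /* BEQ    R16, R17, L2          */
    value = *p;                     /* LD     R8, 0(R16)            */
    if (!(0 < value)) goto L3;      /* SLT R9, R0, R8; BEQ R9, R0, L3 */
    sum += value;                   /* DADDU  R18, R18, R8          */
L3:
    p++;                            /* DADDIU R16, R16, 8           */
    goto L1;                        /* J      L1                    */
L2:
    return sum;
}
```

For the array `{1, -2, 3, 0, 5}` with `n` equal to 5, the function should return 9, since only 1, 3 and 5 pass the `0 < value` test.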

SLIDE 8

Remarks about the previous slide

Why the NOPs? In MIPS there is a delay slot following every jump or branch. After a jump, the delay slot instruction is executed before the jump target instruction is executed. After a branch, the delay slot instruction gets executed regardless of whether the branch is taken.

Okay, but why do delay slots exist? They made sense in the 1980s: simple pipelining was feasible, but getting the jump/branch target instruction started one clock cycle after a jump/branch was not.

It’s up to compiler writers to try to find safe and useful work to do in delay slots. (Filling a delay slot with a NOP is pure waste.)

SLIDE 9

“Repeat, changing the types of p and q to int *.”

◮ Change 800 to 400 for q = p + 100
◮ Change 8 to 4 for p++
◮ Change LD to LW. Subtle detail: LW will sign-extend the 32-bit number it gets from memory to make an equivalent 64-bit number in the destination GPR, so the 64-bit SLT that follows will “do the right thing”.

You will not be tested on the weird little detail about LW!
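The constants 800, 400, 8 and 4 come from C pointer arithmetic being scaled by element size, and the LW detail comes down to sign extension versus zero extension. The sketch below illustrates both points. The function names are hypothetical, the fixed-width types `int64_t` and `int32_t` stand in for 64-bit `long int` and 32-bit `int`, and the zero-extending variant is included only for contrast (it imitates what the MIPS64 LWU instruction would do).

```c
#include <stddef.h>
#include <stdint.h>

/* Byte distance between p and p + count for elements of the given size. */
size_t byte_offset(size_t count, size_t element_size)
{
    return count * element_size;
}

/* Model of LW on a 64-bit MIPS: a 32-bit value is sign-extended to 64 bits. */
int64_t load_word_sign_extended(int32_t word)
{
    return (int64_t)word;            /* value-preserving conversion */
}

/* Contrast: zero extension, which would make negative values look positive. */
int64_t load_word_zero_extended(int32_t word)
{
    return (int64_t)(uint32_t)word;  /* upper 32 bits become zero */
}
```

With sign extension, a stored value of -5 is still -5 in the 64-bit register, so the test `0 < *p` is false as intended. With zero extension the same bits would read as 4294967291, and the 64-bit SLT would wrongly treat the element as positive.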

SLIDE 10

Preliminaries for energy and power use (3)

[Figure: two copies of a gate output circuit, each labelled with VDD, RPU, RPD, C, and “gate output”.]

What are the energy flows when the gate output goes from logic 0 to logic 1? What are they when the gate output goes from logic 1 to logic 0?

SLIDE 11

Energy and power

Power is the time rate of energy use. (That should not be a new idea for 4th-year engineering students!)

$$\text{instantaneous power} = \frac{d}{dt}\,(\text{energy use})$$

$$\text{average power} = \frac{\text{energy use over time interval}}{\text{duration of time interval}}$$

SLIDE 12

Energy and power use of a single logic gate

The energy spent per clock cycle of a gate with an output that makes a 0 → 1 or 1 → 0 transition every single clock cycle is

$$\frac{1}{2}\, C\, V_{DD}^{2}.$$

If the clock period is T, the frequency is f = 1/T, so the power use by the gate is

$$\frac{1}{2}\, C\, V_{DD}^{2} \,/\, T = \frac{1}{2}\, C\, V_{DD}^{2}\, f.$$

The equations are correct but an assumption here is incorrect. Why is this not a good model for power use by a logic gate in a processor circuit?
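To get a feel for the magnitudes, the two formulas can be evaluated numerically. The sketch below is a minimal illustration: the function names are made up for this example, and any numbers plugged in (for instance 1 fF, 1 V and 1 GHz) are illustrative assumptions, not values from the lecture.

```c
#include <assert.h>

/* Energy per output transition, (1/2) C VDD^2, in joules. */
double gate_energy_per_transition(double c_farads, double vdd_volts)
{
    assert(c_farads >= 0.0);
    return 0.5 * c_farads * vdd_volts * vdd_volts;
}

/* Power in watts if the output makes one transition in every clock
   cycle: (1/2) C VDD^2 f. This is the every-cycle switching
   assumption that the slide calls into question. */
double gate_power_every_cycle(double c_farads, double vdd_volts, double f_hertz)
{
    assert(f_hertz >= 0.0);
    return gate_energy_per_transition(c_farads, vdd_volts) * f_hertz;
}
```

With the assumed values C = 1 fF, VDD = 1 V and f = 1 GHz, the energy per transition works out to 0.5 fJ and the power to 0.5 µW.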

SLIDE 13

Energy and power use of a processor chip (1)

A useful concept, unfortunately not mentioned in Section 1.5 of your textbook, is a, the activity factor.

Let Ctotal be the sum of all of the capacitive loads for all of the logic gates in an IC. Then aCtotal is the average capacitive load that actually does a 0 → 1 or 1 → 0 transition in a clock cycle.

Why is a much less than 1 for a modern processor chip? How could a be greater than 1 for certain small regions within a modern processor chip?

Which is a better way to think?

◮ a is not hard for engineers to estimate, and is pretty much determined by the design of a processor chip.
◮ a is scarily unpredictable.

SLIDE 14

Energy and power use of a processor chip (2)

Two formulas, assuming that a varies over time, but doesn’t change very much over a single processor clock cycle . . .

Energy used and heat that must be dissipated in a single clock cycle, due to switching:

$$E_{\text{dynamic}} = \frac{1}{2}\, a(t)\, C_{\text{total}}\, V_{DD}^{2}$$

Power consumption:

$$P_{\text{dynamic}} = \frac{1}{2}\, a(t)\, C_{\text{total}}\, V_{DD}^{2}\, f$$
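A small numeric sketch may help show how strongly the activity factor affects the result. The function names below are hypothetical, and the sample values mentioned afterwards (a = 0.1, Ctotal = 100 nF, VDD = 1 V, f = 2 GHz) are assumptions chosen only to give round numbers, not data for any real chip.

```c
#include <assert.h>

/* Dynamic energy per clock cycle in joules: (1/2) a Ctotal VDD^2,
   where a is the activity factor during that cycle. */
double dynamic_energy_per_cycle(double a, double c_total_farads, double vdd_volts)
{
    assert(a >= 0.0 && c_total_farads >= 0.0);
    return 0.5 * a * c_total_farads * vdd_volts * vdd_volts;
}

/* Dynamic power in watts: (1/2) a Ctotal VDD^2 f. */
double dynamic_power(double a, double c_total_farads, double vdd_volts, double f_hertz)
{
    assert(f_hertz >= 0.0);
    return dynamic_energy_per_cycle(a, c_total_farads, vdd_volts) * f_hertz;
}
```

With the assumed values, the energy per cycle is 5 nJ and the dynamic power is 10 W. Doubling a to 0.2 doubles both, which is one reason the predictability of a matters.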

SLIDE 15

Energy and power use of a processor chip (3)

An ideal CMOS logic gate does not consume any power when its output is not switching, because either its pull-up network or its pull-down network is completely turned off.

In real CMOS ICs, however, there are various paths for current to leak from VDD to ground:

$$P_{\text{static}} = V_{DD}\, I_{\text{leakage}}$$

This is a major concern at both ends of the computing spectrum:

◮ It gradually drains batteries in battery-powered embedded systems.
◮ It wastes power in servers that spend significant time idle, waiting for tasks to arrive.
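The static power formula is simple enough to evaluate directly; the interesting part is how it accumulates over idle time. The sketch below uses hypothetical function names, and the sample values mentioned afterwards (1 V, 10 mA of leakage, one hour idle) are illustrative assumptions only.

```c
#include <assert.h>

/* Static power in watts: VDD times leakage current. */
double static_power(double vdd_volts, double i_leakage_amps)
{
    return vdd_volts * i_leakage_amps;
}

/* Energy in joules lost to leakage over an idle interval, assuming
   VDD and the leakage current stay constant during the interval. */
double leakage_energy(double vdd_volts, double i_leakage_amps, double idle_seconds)
{
    assert(idle_seconds >= 0.0);
    return static_power(vdd_volts, i_leakage_amps) * idle_seconds;
}
```

With the assumed values, the static power is 10 mW, and one hour (3600 s) of idling costs 36 J even though no useful switching takes place.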

SLIDE 16

Both energy and power matter in processor design

Because most processors are idle much of the time, energy spent on a typical task is a good measure of the efficiency of a processor. However, power at maximum load is critical as well . . .

◮ The power supply must be able to supply the needed current without dropping VDD.
◮ The cooling system must be capable of removing heat at a rate equal to average power during sustained heavy load.

SLIDE 17

Energy and power management in processor chips

A simple processor chip is either on or off. When it’s on, the whole chip is on, and VDD and f are fixed. More complex processor chips . . .

◮ turn off idle regions within the chip;
◮ use DVFS (dynamic voltage-frequency scaling): VDD and f go up and down with the processor load.

DVFS relies on the fact that a CMOS circuit can operate correctly over a wide range of VDD values. Lower VDD is more energy-efficient but results in slower switching times, so when VDD is reduced, f must be reduced as well.
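The payoff from DVFS can be estimated from the dynamic power formula, since dynamic power is proportional to VDD squared times f. The sketch below compares two operating points. It is a simplified model under stated assumptions: the function names are hypothetical, the activity factor and Ctotal are taken to be the same at both operating points, and static (leakage) power is ignored.

```c
#include <assert.h>

/* Ratio of new to old dynamic power when moving between two
   (VDD, f) operating points, with a and Ctotal unchanged. */
double dvfs_power_ratio(double vdd_old, double f_old, double vdd_new, double f_new)
{
    assert(vdd_old > 0.0 && f_old > 0.0);
    return (vdd_new * vdd_new * f_new) / (vdd_old * vdd_old * f_old);
}

/* Ratio of new to old dynamic energy for a task that takes a fixed
   number of clock cycles. Energy per cycle does not depend on f,
   so only the voltage change matters. */
double dvfs_energy_ratio(double vdd_old, double vdd_new)
{
    assert(vdd_old > 0.0);
    return (vdd_new * vdd_new) / (vdd_old * vdd_old);
}
```

As an assumed example, dropping from 1.0 V at 2 GHz to 0.8 V at 1 GHz gives a power ratio of 0.32 and an energy ratio of 0.64: the task takes twice as long, but its dynamic energy falls by about a third. Lowering f alone, with VDD fixed, would cut power but leave the dynamic energy per task unchanged.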

SLIDE 18

Trends in Cost

This is a massively complex topic; we’ll look at it in the same brief and superficial way that the textbook does. First, let’s be clear about what a chip die is and what a wafer is.

SLIDE 19

Upcoming Topics

◮ measuring and reporting computer performance
◮ quantitative principles of computer design
◮ a survey of ISA design ideas

Related reading in Hennessy & Patterson: Sections 1.8–1.9, A.1–A.7