Slides for Lecture 3 ENCM 501: Principles of Computer Architecture

Slides for Lecture 3 ENCM 501: Principles of Computer Architecture Winter 2014 Term Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary 16 January, 2014

ENCM 501 W14 Slides for Lecture 3

Previous Lecture
◮ brief list of ENCM 501 topics
◮ what does “computer architecture” mean?
◮ trends in computer system performance
◮ classes of computers
◮ trends in computer technology
◮ preliminaries for energy and power use

Today’s Lecture
◮ completion of yesterday’s tutorial
◮ energy and power use in processors
◮ brief coverage of trends in cost

Related material in Hennessy & Patterson (our course textbook): Sections 1.5–1.6.

About the Wed Jan 15 tutorial

We didn’t quite finish, and it’s useful to review how if statements and loops work at the assembly language level. So the next two slides come from yesterday’s tutorial . . .

Comparisons, branches and jumps

Compare: dest gets 1 if src1 < src2, 0 otherwise . . .

    SLT dest, src1, src2

MIPS is unusual: the result does not go into a “condition code register”. Note also that there are variations for unsigned comparison and for comparison of a GPR to a constant.

Branch: if GPR1 == GPR2 goto label . . .

    BEQ GPR1, GPR2, label

There are variations. The most common is BNE: branch if not equal.

Jump: goto label . . .

    J label

Example loop and if statement

Translate this fragment:

    long int *p, *q;   // R16, R17
    long int sum;      // R18

    sum = 0;
    q = p + 100;
    while (p != q) {
        if (*p > 0)
            sum += *p;
        p++;
    }

Repeat, changing the types of p and q to int *.

Below is a very straightforward translation of the C code. A good C compiler with optimization turned on would produce faster but less straightforward code.

        OR     R18, R0, R0      # sum = 0
        DADDIU R17, R16, 800    # q = p + 100
    L1: BEQ    R16, R17, L2     # if (p == q) goto L2
        NOP
        LD     R8, 0(R16)       # R8 = *p
        SLT    R9, R0, R8       # R9 = 0 < *p
        BEQ    R9, R0, L3       # if (!R9) goto L3
        NOP
        DADDU  R18, R18, R8     # sum += *p
    L3: DADDIU R16, R16, 8      # p++
        J      L1               # goto L1
        NOP
    L2:                         # [next instruction after while loop]
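
One way to check a translation like this is to mirror its control flow, instruction by instruction, in a higher-level sketch and compare the result with the obvious answer. The Python below is only an illustration: the function name `sum_positive`, the dictionary used as byte-addressed memory, and the base address `0x1000` are assumptions made for this example, not part of the tutorial.

```python
def sum_positive(memory, p, count, elem_size=8):
    """Follow the same branch structure as the assembly listing."""
    total = 0                      # OR R18, R0, R0
    q = p + count * elem_size      # DADDIU R17, R16, 800 (count=100, elem_size=8)
    while True:
        if p == q:                 # L1: BEQ R16, R17, L2
            break
        value = memory[p]          # LD R8, 0(R16)
        if 0 < value:              # SLT R9, R0, R8 then BEQ R9, R0, L3
            total += value         # DADDU R18, R18, R8
        p += elem_size             # L3: DADDIU R16, R16, 8
    return total                   # L2: falls out of the loop


# Hypothetical test data: 100 values from -50 to 49 stored 8 bytes apart.
base = 0x1000
values = [i - 50 for i in range(100)]
memory = {base + 8 * i: v for i, v in enumerate(values)}

result = sum_positive(memory, base, 100)
print(result == sum(v for v in values if v > 0))
```

Note how the pointer arithmetic that C hides (adding 100 to a `long int *`) shows up as an explicit multiplication by the element size.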

Remarks about the previous slide

Why the NOPs? In MIPS there is a delay slot following every jump or branch. After a jump, the delay slot instruction is executed before the jump target instruction is executed. After a branch, the delay slot instruction gets executed regardless of whether the branch is taken.

Okay, but why do delay slots exist? It made sense in the 1980s: simple pipelining was feasible, but getting the jump/branch target instruction started one clock cycle after a jump/branch was not.

It’s up to compiler writers to try to find safe and useful work to do in delay slots. (Filling a delay slot with a NOP is pure waste.)
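
The delay slot rule can be made concrete with a toy interpreter. This is a minimal sketch, not a MIPS simulator: the tuple-based instruction format, the lowercase opcode names, and the use of list indices as branch targets are all assumptions made for illustration, and a branch placed inside a delay slot is not handled.

```python
def run(program, regs, max_steps=100):
    """Execute a toy program in which every branch or jump has one delay slot.

    Returns the list of instruction indices in the order they executed.
    """
    pc = 0
    redirect = None   # target to go to once the delay slot instruction is done
    trace = []
    while pc < len(program) and len(trace) < max_steps:
        op, *args = program[pc]
        trace.append(pc)
        after_slot = redirect   # not None if this instruction sits in a delay slot
        redirect = None
        if op == "addi":        # addi dest, src, immediate
            dest, src, imm = args
            regs[dest] = regs[src] + imm
        elif op == "beq":       # beq r1, r2, target index
            r1, r2, target = args
            if regs[r1] == regs[r2]:
                redirect = target
        elif op == "j":         # j target index
            (target,) = args
            redirect = target
        # "nop" does nothing
        pc = after_slot if after_slot is not None else pc + 1
    return trace


demo_program = [
    ("beq", "r1", "r1", 3),     # 0: always taken
    ("addi", "r2", "r2", 1),    # 1: delay slot, executes anyway
    ("addi", "r2", "r2", 10),   # 2: skipped by the branch
    ("addi", "r2", "r2", 100),  # 3: branch target
]
demo_regs = {"r1": 0, "r2": 0}
demo_trace = run(demo_program, demo_regs)
print(demo_trace, demo_regs["r2"])
```

Instruction 1 runs even though the branch at instruction 0 is taken, which is why the listing on the previous slide needs something safe (there, a NOP) after each branch and jump.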

“Repeat, changing the types of p and q to int *.”
◮ Change 800 to 400 for q = p + 100
◮ Change 8 to 4 for p++
◮ Change LD to LW.

Subtle detail: LW will sign-extend the 32-bit number it gets from memory to make an equivalent 64-bit number in the destination GPR, so the 64-bit SLT that follows will “do the right thing”.

You will not be tested on the weird little detail about LW!
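
To see why sign extension matters for the comparison, here is a small sketch that works on raw bit patterns. The helper names `sign_extend_32_to_64` and `as_signed64` are made up for this example; they model the behaviour described above rather than any real library.

```python
def sign_extend_32_to_64(word):
    """Return the 64-bit pattern a sign-extending load would produce."""
    word &= 0xFFFFFFFF
    if word & 0x80000000:                  # bit 31 set: negative as a 32-bit int
        return word | 0xFFFFFFFF00000000   # copy the sign bit into bits 63..32
    return word


def as_signed64(bits):
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    return bits - (1 << 64) if bits & (1 << 63) else bits


word = 0xFFFFFFFE                      # 32-bit pattern for -2
extended = sign_extend_32_to_64(word)  # what a sign-extending load gives
zero_extended = word                   # what a zero-extending load would give

print(as_signed64(extended) > 0)       # False: -2 is not positive
print(as_signed64(zero_extended) > 0)  # True: looks like 4294967294
```

With zero extension, a 64-bit signed comparison would treat the 32-bit value -2 as a large positive number, and the `if (*p > 0)` test would go the wrong way.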

Preliminaries for energy and power use (3)

[Figure: two gate-output circuit diagrams; labels: V_DD, R_PU, gate output, C, R_PD]

What are the energy flows when the gate output goes from logic 0 to logic 1? What are they when the gate output goes from logic 1 to logic 0?
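
A numerical experiment can help with these questions. The sketch below assumes the usual reading of such a figure: a load capacitance C that charges from V_DD through a linear pull-up resistance R_PU. The component values (1 fF, 1 V, 10 kΩ) and the function name `charge_energies` are arbitrary choices for illustration, not numbers from the lecture.

```python
def charge_energies(c, v_dd, r_pu, steps=200_000, tau_count=20.0):
    """Integrate an RC charge from 0 V toward v_dd with forward Euler steps.

    Returns (energy drawn from the supply, energy dissipated in r_pu,
    energy stored in c), all in joules.
    """
    dt = tau_count * r_pu * c / steps   # simulate tau_count time constants
    v = 0.0
    from_supply = 0.0
    in_resistor = 0.0
    for _ in range(steps):
        i = (v_dd - v) / r_pu            # current through the pull-up
        from_supply += v_dd * i * dt     # energy delivered by V_DD
        in_resistor += i * i * r_pu * dt # heat in the pull-up
        v += i * dt / c                  # capacitor voltage rises
    stored = 0.5 * c * v * v
    return from_supply, in_resistor, stored


supply, heat, stored = charge_energies(c=1e-15, v_dd=1.0, r_pu=1e4)
print(supply, heat, stored)
```

Under these assumptions the supply delivers about C V_DD² on a 0 → 1 transition, with roughly half ending up as heat in the pull-up and half stored on the capacitor, and the split does not depend on the value of R_PU. On a 1 → 0 transition the stored half is dissipated in the pull-down path and nothing new is drawn from the supply.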

Energy and power

Power is the time rate of energy use. (That should not be a new idea for 4th-year engineering students!)

$$\text{instantaneous power} = \frac{d}{dt}\,(\text{energy use})$$

$$\text{average power} = \frac{\text{energy use over time interval}}{\text{duration of time interval}}$$

Energy and power use of a single logic gate

The energy spent per clock cycle of a gate with an output that makes a 0 → 1 or 1 → 0 transition every single clock cycle is

$$\frac{1}{2} C V_{DD}^2 .$$

If the clock period is $T$, the frequency is $f = 1/T$, so the power use by the gate is

$$\frac{1}{2} C V_{DD}^2 / T = \frac{1}{2} f C V_{DD}^2 .$$

The equations are correct but an assumption here is incorrect. Why is this not a good model for power use by a logic gate in a processor circuit?
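
To get a feel for the magnitudes, the two formulas can be evaluated directly. The numbers below (1 fF load, 1.0 V supply, 2 GHz clock) are assumed round values chosen for easy arithmetic, not measurements of any real gate, and the function names are hypothetical.

```python
def gate_energy_per_cycle(c_load, v_dd):
    """Energy per clock cycle, in joules, for a gate that switches every cycle."""
    return 0.5 * c_load * v_dd ** 2


def gate_power(c_load, v_dd, f_clock):
    """Power in watts: energy per cycle divided by T, which equals energy times f."""
    return gate_energy_per_cycle(c_load, v_dd) * f_clock


energy = gate_energy_per_cycle(c_load=1e-15, v_dd=1.0)    # 0.5 fJ per cycle
power = gate_power(c_load=1e-15, v_dd=1.0, f_clock=2e9)   # about 1 microwatt
print(energy, power)
```

Because the formula is quadratic in V_DD, halving the supply voltage in this model cuts the energy per cycle to a quarter.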

Energy and power use of a processor chip (1)

A useful concept, unfortunately not mentioned in Section 1.5 of your textbook, is $a$, the activity factor.

Let $C_{\text{total}}$ be the sum of all of the capacitive loads for all of the logic gates in an IC. Then $a C_{\text{total}}$ is the average capacitive load that actually does a 0 → 1 or 1 → 0 transition in a clock cycle.

Why is $a$ much less than 1 for a modern processor chip? How could $a$ be greater than 1 for certain small regions within a modern processor chip?

Which is a better way to think?
◮ $a$ is not hard for engineers to estimate, and is pretty much determined by the design of a processor chip.
◮ $a$ is scarily unpredictable.

Energy and power use of a processor chip (2)

Two formulas, assuming that $a$ varies over time, but doesn’t change very much over a single processor clock cycle . . .

Energy used and heat that must be dissipated in a single clock cycle, due to switching:

$$E_{\text{dynamic}} = \frac{1}{2}\, a(t)\, C_{\text{total}}\, V_{DD}^2$$

Power consumption:

$$P_{\text{dynamic}} = \frac{1}{2}\, f\, a(t)\, C_{\text{total}}\, V_{DD}^2$$
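
Since $a$ changes over time, average power over an interval comes from adding up the per-cycle energies and dividing by the duration. The sketch below does that for a made-up activity trace; the trace, the 100 nF total capacitance, the 1.0 V supply and the 2 GHz clock are all assumed values for illustration.

```python
def dynamic_energy_per_cycle(a, c_total, v_dd):
    """Switching energy in joules for one clock cycle with activity factor a."""
    return 0.5 * a * c_total * v_dd ** 2


def average_dynamic_power(activity_trace, c_total, v_dd, f_clock):
    """Average power in watts over a trace holding one activity factor per cycle."""
    total_energy = sum(
        dynamic_energy_per_cycle(a, c_total, v_dd) for a in activity_trace
    )
    duration = len(activity_trace) / f_clock   # seconds covered by the trace
    return total_energy / duration


# Hypothetical workload: 500 busy cycles followed by 500 quiet ones.
activity_trace = [0.10] * 500 + [0.02] * 500
p_avg = average_dynamic_power(activity_trace, c_total=1e-7, v_dd=1.0, f_clock=2e9)
print(p_avg)   # close to 6 W, the same as using the mean activity factor 0.06
```

The result matches plugging the mean of $a(t)$ into the power formula, which is what you would expect when $V_{DD}$ and $f$ stay fixed over the interval.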

Energy and power use of a processor chip (3)

An ideal CMOS logic gate does not consume any power when its output is not switching, because either its pull-up network or its pull-down network is completely turned off.

In real CMOS ICs, however, there are various paths for current to leak from $V_{DD}$ to ground:

$$P_{\text{static}} = V_{DD}\, I_{\text{leakage}}$$

This is a major concern at both ends of the computing spectrum:
◮ It gradually drains batteries in battery-powered embedded systems.
◮ It wastes power in servers that spend significant time idle, waiting for tasks to arrive.
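
A rough calculation shows why leakage matters for a battery-powered system. This sketch makes strong simplifying assumptions: leakage is the only load, the battery supplies the leakage current directly with no regulator losses, and the values (1.0 V, 2 mA leakage, 0.2 Ah battery) are invented for the example.

```python
def static_power(v_dd, i_leakage):
    """Static power in watts from supply voltage (V) and leakage current (A)."""
    return v_dd * i_leakage


def hours_to_drain(capacity_ah, current_a):
    """Hours until a battery of the given amp-hour capacity is empty."""
    return capacity_ah / current_a


p_static = static_power(v_dd=1.0, i_leakage=2e-3)              # 2 mW
idle_hours = hours_to_drain(capacity_ah=0.2, current_a=2e-3)   # about 100 hours
print(p_static, idle_hours)
```

Even with no switching at all, the battery in this example is flat in about four days.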

Both energy and power matter in processor design

Because most processors are idle much of the time, energy spent on a typical task is a good measure of the efficiency of a processor.

However, power at maximum load is critical as well . . .
◮ The power supply must be able to supply the needed current without dropping $V_{DD}$.
◮ The cooling system must be capable of removing heat at a rate equal to average power during sustained heavy load.

Energy and power management in processor chips

A simple processor chip is either on or off. When it’s on, the whole chip is on, and $V_{DD}$ and $f$ are fixed.

More complex processor chips . . .
◮ turn off idle regions within the chip;
◮ use DVFS (dynamic voltage-frequency scaling): $V_{DD}$ and $f$ go up and down with the processor load.

DVFS relies on the fact that a CMOS circuit can operate correctly over a wide range of $V_{DD}$ values. Lower $V_{DD}$ is more energy-efficient but results in slower switching times, so when $V_{DD}$ is reduced, $f$ must be reduced as well.
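
The DVFS trade-off can be sketched with the dynamic energy formula from earlier in the lecture. The example assumes a task that needs a fixed number of clock cycles, the same activity factor at both operating points, and no static power; the two operating points (1.0 V at 2 GHz, 0.8 V at 1 GHz) and the other values are invented for illustration.

```python
def task_energy_and_time(cycles, a, c_total, v_dd, f_clock):
    """Dynamic energy (J) and run time (s) for a task of a fixed cycle count."""
    energy = cycles * 0.5 * a * c_total * v_dd ** 2
    run_time = cycles / f_clock
    return energy, run_time


cycles = 2_000_000_000   # hypothetical task length in clock cycles
fast_energy, fast_time = task_energy_and_time(cycles, 0.05, 1e-7, v_dd=1.0, f_clock=2e9)
slow_energy, slow_time = task_energy_and_time(cycles, 0.05, 1e-7, v_dd=0.8, f_clock=1e9)

energy_ratio = slow_energy / fast_energy   # about 0.64: depends only on V_DD squared
time_ratio = slow_time / fast_time         # 2.0: half the clock rate, twice the time
power_ratio = energy_ratio / time_ratio    # about 0.32
print(energy_ratio, time_ratio, power_ratio)
```

In this model the frequency drops out of the energy per task, so the energy saving comes entirely from the lower $V_{DD}$. A fuller model would add leakage, which accumulates for longer when the task runs more slowly.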

Trends in Cost

This is a massively complex topic; we’ll look at it in the same brief and superficial way that the textbook does.

First, let’s be clear about what a chip die is and what a wafer is.

Upcoming Topics
◮ measuring and reporting computer performance
◮ quantitative principles of computer design
◮ a survey of ISA design ideas

Related reading in Hennessy & Patterson: Sections 1.8–1.9, A.1–A.7
