



ENCM 501 W14 Slides for Lecture 4

Slides for Lecture 4
ENCM 501: Principles of Computer Architecture
Winter 2014 Term
Steve Norman, PhD, PEng
Electrical & Computer Engineering
Schulich School of Engineering
University of Calgary
21 January, 2014

slide 2/30: Previous Lecture
◮ completion of Wed Jan 15 tutorial
◮ energy and power use in processors
◮ brief coverage of trends in cost

slide 3/30: Today’s Lecture
◮ a little more about die yield
◮ measuring and reporting computer performance
◮ quantitative principles of computer design
Related reading in Hennessy & Patterson: Sections 1.8–1.9

slide 4/30: More about die yields
Here is the formula presented last lecture:

$$\text{die yield} = \frac{\text{wafer yield}}{(1 + \text{defects per unit area} \times \text{die area})^{N}}$$

The formula is derived from many years of IC process data. N is called the process-complexity factor. 2010 numbers are 11.5 to 15.5 for N and 0.016 to 0.057 defects per cm².
Examples in the textbook with wafer yield = 100%, N = 13.5, and 0.031 defects per cm² give yields of
◮ 66% for 1.0 cm × 1.0 cm dies;
◮ 40% for 1.5 cm × 1.5 cm dies.

slide 5/30
Let’s think about that 66% yield for a minute.
The defect density is about 3 per 100 cm². With a 1 cm² die size, that suggests about 3 defects spread over every 100 dies. So why is the yield not approximately 97%?
With a couple of hours of Google search I found the yield formula (poorly explained) in multiple technical documents, often along with competing formulas.
Here is my best guess as to what is correct: N represents a number of process layers, and the defect density is specified per process layer. N is adjusted up or down from the real number of process layers to reflect the fact that some layers are more defect-prone than others.
Regardless, it is true that, for a given IC fabrication process, die yield gets worse as die size increases.

slide 6/30: Textbook Section 1.7: Dependability
We’re not going to cover this material in ENCM 501.
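
The die-yield formula from slide 4 is easy to check numerically. The following is a minimal Python sketch, not part of the original slides; the function name `die_yield` and its parameter names are made up for illustration. It reproduces the two textbook examples and also evaluates the formula with N = 1, which gives roughly the 97% that the "3 defects per 100 dies" intuition suggests.

```python
def die_yield(wafer_yield, defects_per_cm2, die_area_cm2, n):
    """Die yield = wafer yield / (1 + defect density * die area)^N."""
    return wafer_yield / (1.0 + defects_per_cm2 * die_area_cm2) ** n


# Textbook examples: wafer yield 100%, N = 13.5, 0.031 defects per cm^2.
small = die_yield(1.0, 0.031, 1.0 * 1.0, 13.5)   # 1.0 cm x 1.0 cm die
large = die_yield(1.0, 0.031, 1.5 * 1.5, 13.5)   # 1.5 cm x 1.5 cm die

# Same defect density and 1 cm^2 die, but with the exponent N set to 1.
single_layer = die_yield(1.0, 0.031, 1.0, 1.0)

print(f"1.0 cm x 1.0 cm, N = 13.5: {small:.2f}")
print(f"1.5 cm x 1.5 cm, N = 13.5: {large:.2f}")
print(f"1.0 cm x 1.0 cm, N = 1:    {single_layer:.2f}")
```

The gap between the N = 1 result and the N = 13.5 result shows how strongly the exponent drives the yield, which fits the guess on slide 5 that the defect density applies per process layer.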

slide 7/30: How to evaluate performance (1)
Given two different computer designs, how do you decide which is “better”?
Think about comparing other kinds of machines.
For example, which is “better”, (a) a “3/4 ton” pickup truck, or (b) a midsize luxury AWD sedan? Do you want to
◮ . . . move construction supplies?
◮ . . . pull a large trailer?
◮ . . . commute comfortably to an office job?

slide 8/30: How to evaluate performance (2)
The analogy to vehicle selection can be used to make two key points . . .
◮ Obviously, making the best choice of machine, or at least a reasonably good choice, depends on what the machine is going to be used for.
◮ No single narrow-scope measurement of performance is very useful. It doesn’t make sense to use fastest acceleration from 0 to 60 mph, or fastest time to sort an array of 10 million doubles, as a sole criterion.

slide 9/30: Often this makes sense: performance ∝ 1/time
Think about these examples:
◮ Software developer builds an executable from a large body of C or C++ code.
◮ Digital designer runs a detailed simulation of a complex circuit.
◮ Meteorologist runs 5-day weather forecast program using current atmospheric data as input.
These tasks can take minutes or hours to run. There are obvious incentives to find hardware that will help minimize running time.

slide 10/30: Use ratios of running time to compare time-based performance
For a given task run on Systems A and B,

$$\frac{\text{performance}_A}{\text{performance}_B} = \frac{\text{time}_B}{\text{time}_A}$$

Example: For some task, $\text{time}_A = 1000$ s and $\text{time}_B = 750$ s. Then, for this task, System B is 1000/750 = 1.33 times as fast as System A. Equivalently, System B provides a speedup of 1.33 relative to System A.
Ratios are easier to work with and harder to misinterpret than other ways to compare speed.
For example, avoid saying things like, “System B gives a 25% decrease in running time,” or, “System B gives a 33% increase in speed.”

slide 11/30: What might System A and System B be?
There are lots of different kinds of interesting practical comparisons. Some of the many possibilities:
◮ same source code, different ISAs, different hardware, different compilers
◮ same source code, same ISA, same compiler, different hardware
◮ same source code, same ISA, same hardware, different compiler
◮ same source code, same ISA, same hardware, same compiler, different compiler options
◮ different source codes for the same task, same everything else
Don’t forget about the last one! Choice of data structures and algorithms can be a huge factor!

slide 12/30: What programs should be used for performance evaluation?
This is a hard question, because every user is different.
SPEC (Standard Performance Evaluation Corporation, www.spec.org) takes the position that complete runs of “suites” of carefully-chosen real-world programs are the best way to get general performance indexes for computer systems.
Alternatives, such as runs of much smaller programs that are supposedly representative of practical code, are problematic:
◮ the small programs will more likely fail to test some important features that real-world programs depend on;
◮ hardware designers and compiler and library writers can sometimes “game” synthetic benchmarks.
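
Returning to the running-time ratios of slide 10, here is a short Python sketch of the System A / System B example. It is an illustration only; the helper name `speedup` and the variable names are assumptions, not something from the slides. It computes the speedup ratio and then the two percentage figures the slide warns against, to show that both percentages describe the same pair of measurements.

```python
def speedup(time_old, time_new):
    """Speedup of the new system relative to the old one: time_old / time_new."""
    return time_old / time_new


time_a = 1000.0  # seconds on System A
time_b = 750.0   # seconds on System B

s = speedup(time_a, time_b)

# The two percentage statements the slide advises against:
pct_time_decrease = (time_a - time_b) / time_a * 100.0
pct_speed_increase = (s - 1.0) * 100.0

print(f"speedup of B relative to A: {s:.2f}")
print(f"decrease in running time:   {pct_time_decrease:.0f}%")
print(f"increase in speed:          {pct_speed_increase:.0f}%")
```

The 25% and 33% figures are both correct, yet they differ because each uses a different baseline. The single ratio avoids that ambiguity.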

slide 13/30: SPEC CPU benchmark suites
Quote from www.spec.org/cpu2006/Docs/readme1st.html:
“SPEC CPU2006 focuses on compute intensive performance, which means these benchmarks emphasize the performance of
◮ the computer processor (CPU),
◮ the memory architecture, and
◮ the compilers.
“It is important to remember the contribution of the latter two components. SPEC CPU performance intentionally depends on more than just the processor.”

slide 14/30: More quotes from the same source . . .
“SPEC CPU2006 contains two suites that focus on two different types of compute intensive performance:
◮ The CINT2006 suite measures compute-intensive integer performance, and
◮ The CFP2006 suite measures compute-intensive floating point performance.”
“SPEC CPU2006 is not intended to stress other computer components such as networking, the operating system, graphics, or the I/O system. For single-CPU tests, the effects from such components on SPEC CPU2006 performance are usually minor.”

slide 15/30: “compute-intensive integer performance”
Programs suitable for this suite would tend to
◮ have a lot of integer arithmetic instructions, especially add, subtract, and compare, and logical operations such as shifts, bitwise AND, OR, NOR or XOR, etc.;
◮ do a lot of load and store operations between general-purpose registers and the memory hierarchy;
◮ frequently encounter (conditional) branches and (unconditional) jumps;
◮ have very few floating-point instructions or none at all.

slide 16/30: “compute-intensive floating-point performance”
Programs suitable for this suite would tend to have some of the same properties as “compute-intensive integer” programs, but would also have
◮ relatively heavy concentrations of floating-point instructions for operations such as +, -, *, /, sqrt, etc.
◮ a lot of load and store operations between floating-point registers and the memory hierarchy.
Why would a “compute-intensive floating-point” program have a lot of integer arithmetic instructions?

slide 17/30: An example, reflecting the structure of SPEC CPU benchmark reporting:
◮ Ref is an older, slower “reference” machine.
◮ Foo and Bar are newer, faster machines.
◮ All times, arithmetic means, and geometric means are in seconds.

              program run time
machine       A       B       C      AM      GM
Ref        1000    2000   10000    4333    2714
Foo         500    1000    8000    3166    1587
Bar         750    1600    6000    2783    1931

Let’s check the geometric mean calculation for Foo.
Let’s make an argument that we should ignore arithmetic mean, and use geometric mean to conclude that Foo is faster overall than Bar.

slide 18/30: Arithmetic means and geometric means
Notation for a sum of N times:
$$\text{Time}_1 + \text{Time}_2 + \cdots + \text{Time}_N = \sum_{k=1}^{N} \text{Time}_k$$
Notation for a product of N times:
$$\text{Time}_1 \times \text{Time}_2 \times \cdots \times \text{Time}_N = \prod_{k=1}^{N} \text{Time}_k$$
Arithmetic mean (average) of times:
$$\frac{1}{N} \sum_{k=1}^{N} \text{Time}_k$$
Geometric mean of times:
$$\left( \prod_{k=1}^{N} \text{Time}_k \right)^{1/N}$$
It turns out that the geometric mean is a better way to combine program run times than is the arithmetic mean . . .
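
The means in the Ref / Foo / Bar example can be recomputed from the three run times per machine. The Python sketch below is an illustration, not the lecturer's own calculation; the names `run_times`, `arithmetic_mean`, `geometric_mean`, and `gm_of_speedups` are made up here. It also checks one commonly cited property of the geometric mean, offered as an assumption about the kind of argument the slide has in mind: the geometric mean of per-program speedups over Ref equals the ratio of the geometric means of the run times.

```python
import math

# Run times in seconds for programs A, B, C, copied from the example table.
run_times = {
    "Ref": [1000, 2000, 10000],
    "Foo": [500, 1000, 8000],
    "Bar": [750, 1600, 6000],
}


def arithmetic_mean(times):
    """(1/N) * sum of the times."""
    return sum(times) / len(times)


def geometric_mean(times):
    """(product of the times) ** (1/N)."""
    return math.prod(times) ** (1.0 / len(times))


def gm_of_speedups(machine, reference="Ref"):
    """Geometric mean of per-program speedups relative to the reference."""
    ratios = [r / t for r, t in zip(run_times[reference], run_times[machine])]
    return geometric_mean(ratios)


for name, times in run_times.items():
    am = arithmetic_mean(times)
    gm = geometric_mean(times)
    print(f"{name}: AM = {am:.1f} s, GM = {gm:.1f} s")

for name in ("Foo", "Bar"):
    ratio_of_gms = geometric_mean(run_times["Ref"]) / geometric_mean(run_times[name])
    print(f"{name}: GM of speedups = {gm_of_speedups(name):.3f}, "
          f"ratio of GMs = {ratio_of_gms:.3f}")
```

With these inputs the geometric mean ranks Foo ahead of Bar, while the arithmetic mean ranks Bar ahead of Foo because the long-running program C dominates the sum.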
