
EE 457 Unit 0: Class Introduction, Basic Hardware Organization



  1. Slides 0.1 to 0.4

     EE 457 Unit 0
     Class Introduction
     Basic Hardware Organization

     EE 457
     • Focus on CPU Design
       – Microarchitecture
       – General Digital System Design
     • Focus on Memory Hierarchy
       – Cache
       – Virtual Memory
     • Focus on Computer Arithmetic
       – Fast Adders
       – Fast Multipliers

     Course Info
     • Lecture:
       – Prof. Redekopp (redekopp@usc.edu)
     • Discussion:
       – TA: See website
     • Website:
       http://bytes.usc.edu/ee457
       https://courses.uscden.net/d2l/home
     • Midterm (30%)
     • Final (36%)
     • Homework Assignments (14%): Individual
     • Lab Assignments (20%): Individual and Teams of 2
       – Contact TA

     Prerequisites
     • EE 354L “Introduction to Digital Circuits”
       – Logic design
       – State machine implementation
       – Datapath/control unit implementation
       – Verilog HDL
     • EE 109/354 “Basic Computer Organization”
       – Assembly language programming
       – Basic hardware organization and structures
     • C or similar high-level programming knowledge
     • Familiarity with Verilog HDL

  2. Slides 0.5 to 0.8

     EE 109/354 Required Knowledge
     • You must know and understand the following terms and concepts; please review them as necessary
       – Bit, Nibble (four bit word), Byte, Word (16- or 32-bit value)
       – CPU, ALU, CU (Control Unit), ROM, RAM (RWM), Word length of a computer, System Bus (Address, Data, Control)
       – General Purpose Registers, Instruction Register (IR), Program Counter (PC), Stack, Stack Pointer (SP), Subroutine calls, Flag register (or Condition Code Register or Processor Status Word), Microprogramming
       – Instruction Set, Addressing Modes, Machine Language, Assembly Language, Assembler, High Level Language, Compiler, Linker, Object code, Loader
       – Interrupts, Exceptions, Interrupt Vector, Vectored Interrupts, Traps

     EE 354L Requisite Knowledge
     • You must know and understand the following terms and concepts; please review them as necessary
       – Combinational design of functions specified by truth tables and function tables
       – Design of adders, comparators, multiplexers, decoders, demultiplexers
       – Tri-state outputs and buses
       – Sequential Logic components: D-Latches and D-Flip-Flops, counters, registers
       – State Machine Design: State diagrams, Mealy vs. Moore-style outputs, Input Function Logic, Next State Logic, State Memory, Output Function Logic, power-on reset state
       – State Machine Design using encoded state assignments vs. one-hot state assignment
       – Drawing, interpretation, and analysis of waveform diagrams

     Computer Arithmetic Requisite Knowledge
     • You must know and understand the following terms and concepts; please review them as necessary
       – Unsigned and Signed (2’s complement representation) Numbers
       – Unsigned and signed addition and subtraction
       – Overflow in addition and subtraction
       – Multiplication
       – Booth’s algorithm for multiplications of signed numbers
       – Restoring or Non-Restoring Division for unsigned numbers
       – Hardware implementations for adders and multipliers

     Levels of Architecture
     • System architecture
       – High-level HW org.
     • Instruction Set Architecture
       – A contract or agreement about what the HW will support and how the programmer can write SW for the HW
       – Vocabulary that the HW understands and SW is composed of
     • Microarchitecture
       – HW implementation for executing instructions
       – Usually transparent to SW programs but not program performance
       – Example: Intel and AMD have different microarchitectures but support essentially the same instruction set
     [Figure: abstraction layers from SW to HW. SW: Applications, OS, Libraries (C / C++ / Java); Assembly / Machine Code. Programmer’s Model (Instruction Set Architecture), Virtualization Layer. HW: Processor / Memory / I/O; Microarchitecture; Functional Units (Registers, Adders, Muxes); Logic Gates; Transistors; Voltage / Currents]
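The "overflow in addition" requisite from the computer arithmetic list can be illustrated with a small sketch. It is not taken from the slides: the function name `add_twos_complement` and the 8-bit default width are assumptions chosen for illustration, and the inputs are assumed to already fit in the chosen width. It models an n-bit adder, reports the carry out of the most significant bit (the unsigned overflow indicator), and flags signed overflow when both operands share a sign that the result does not.

```python
def add_twos_complement(a: int, b: int, bits: int = 8):
    """Add two signed integers the way a `bits`-wide 2's complement adder would.

    Hypothetical helper for illustration. Assumes a and b fit in `bits` bits.
    Returns (signed_result, carry_out, signed_overflow).
    """
    mask = (1 << bits) - 1
    raw = (a & mask) + (b & mask)        # add the raw bit patterns
    carry_out = raw >> bits              # carry out of the MSB (unsigned overflow)
    pattern = raw & mask                 # keep only the low `bits` bits
    sign_bit = 1 << (bits - 1)
    # Reinterpret the bit pattern as a signed 2's complement value.
    result = pattern - (1 << bits) if pattern & sign_bit else pattern
    # Signed overflow: operands have the same sign, result has the other sign.
    overflow = ((a < 0) == (b < 0)) and ((result < 0) != (a < 0))
    return result, carry_out, overflow


if __name__ == "__main__":
    print(add_twos_complement(100, 50))    # positive + positive wraps negative
    print(add_twos_complement(-1, 1))      # carry out, but no signed overflow
    print(add_twos_complement(-128, -1))   # negative + negative wraps positive
```

Note that carry out and signed overflow are independent conditions: adding -1 and 1 produces a carry out with a correct signed result, while 100 + 50 in 8 bits produces no carry out but an incorrect signed result.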

  3. Slides 0.9 to 0.12

     Why is Architecture Important
     • Enabling ever more capable computers
     • Different systems require different architectures
       – PC’s
       – Servers
       – Embedded Systems
         • Simple control devices like ATM’s, toys, appliances
         • Media systems like game consoles and MP3 players
         • Robotics

     Digital System Spectrum
     • Key idea: Any “algorithm” can be implemented in HW or SW or some mixture of both
     • A digital system can be located anywhere in a spectrum of:
       – ALL HW: (a.k.a. Application-Specific IC’s)
       – ALL SW: An embedded computer system
     • Advantages of application specific HW
       – Faster, less power
     • Advantages of an embedded computer system (i.e. general purpose HW for executing SW)
       – Reprogrammable (i.e. make a mistake, fix it)
       – Less expensive than a dedicated hardware system (single computer system can be used for multiple designs)
     • MP3 Player: System-on-Chip (SoC) approach
       – Some dedicated HW for intensive MP3 decoding operations
       – Programmable processor for UI & other simple tasks
     [Figure: Computing System Spectrum, from Application Specific Hardware (no software) to General Purpose HW w/ Software, compared on Flexibility, Design Time, Performance, Cost]
     Photo: http://d2rormqr1qwzpz.cloudfront.net/photos/2014/01/01/56914-moto_x.jpg

     Computer Components
     • Processor
       – Executes the program and performs all the operations
     • Main Memory
       – Stores data and program (instructions)
       – Different forms:
         • RAM = read and write but volatile (lose values when power off)
         • ROM = read-only but non-volatile (maintains values when power off)
       – Significantly slower than the processor speeds
     • Input / Output Devices
       – Generate and consume data from the system
       – MUCH, MUCH slower than the processor
     [Figure: recipe analogy (“Combine 2c. Flour”, “Mix in 3 eggs”) for Instructions and Data. Processor (Arithmetic + Logic + Control Circuitry) reads instructions and operates on data; Software Program; Memory (RAM) holds Program (Instructions) and Data (Operands); Input Devices, Output Devices, Disk Drive]

     ARCHITECTURE OVERVIEW
     Drivers and Trends

  4. Slides 0.13 to 0.16

     Architecture Issues
     • Fundamentally, architecture is all about the different ways of answering the question:
       “What do we do with the ever-increasing number of transistors available to us?”
     • Goal of a computer architect is to take increasing transistor budgets of a chip (i.e. Moore’s Law) and produce an equivalent increase in computational ability

     Moore’s Law, Computer Architecture & Real-Estate Planning
     • Moore’s Law = Number of transistors able to be fabricated on a chip grows exponentially with time
     • Computer architects decide, “What should we do with all of this capability?”
     • Similarly real-estate developers ask, “How do we make best use of the land area given to us?”
     [Figure: USC University Park Development Master Plan]
     http://re.usc.edu/docs/University%20Park%20Development%20Project.pdf

     Transistor Physics
     • Cross-section of transistors on an IC
     • Moore’s Law is founded on our ability to keep shrinking transistor sizes
       – Gate/channel width shrinks
       – Gate oxide shrinks
     • Transistor feature size is referred to as the implementation “technology node”

     Technology Nodes
     [Figure only; no slide text survives]
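The statement that transistor counts grow exponentially with time can be made concrete with a short sketch. The slides give no growth rate, so the two-year doubling period used as a default below is an assumption (a commonly quoted figure), and the function names `projected_transistors` and `doubling_period_years` are hypothetical helpers introduced only for this illustration.

```python
import math


def projected_transistors(initial_count: float, years_elapsed: float,
                          doubling_period: float = 2.0) -> float:
    """Exponential growth model: the count doubles every `doubling_period` years.

    The 2.0-year default is an assumption, not a value from the slides.
    """
    return initial_count * 2 ** (years_elapsed / doubling_period)


def doubling_period_years(count_start: float, count_end: float,
                          years_elapsed: float) -> float:
    """Infer the doubling period implied by two transistor counts."""
    return years_elapsed * math.log(2) / math.log(count_end / count_start)


if __name__ == "__main__":
    # Illustrative numbers only: 1,000 transistors growing for 10 years.
    print(projected_transistors(1_000, 10))
    print(doubling_period_years(1_000, 32_000, 10))
```

Running the second helper on any two points read off a transistor-count chart gives a rough check of how closely a product line tracked the assumed doubling period.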

  5. Slides 0.17 to 0.20

     Growth of Transistors on Chip
     [Figure: Transistor Count (Thousands), log scale from 1 to 1,000,000, vs. Year, 1975 to 2010. Labeled points: Intel 8086 (29K), Intel '286 (134K), Intel '386 (275K), Intel '486 (1.2M), Pentium (3.1M), Pentium Pro (5.5M), Pentium 2 (7M), Pentium 3 (28M), Pentium 4 Northwood (42M), Pentium 4 Prescott (125M), Pentium D (230M), Core 2 Duo (291M)]

     Implications of Moore’s Law
     • What should we do with all these transistors?
       – Put additional simple cores on a chip
       – Use transistors to make cores execute instructions faster
       – Use transistors for more on-chip cache memory
         • Cache is an on-chip memory used to store data the processor is likely to need
         • Faster than main-memory which is on a separate chip and much larger (thus slower)

     Pentium 4
     [Figure: die photo with labeled regions L2 Cache, L1 Data, L1 Instruc.]

     Increase in Clock Frequency
     [Figure: Frequency (MHz), log scale from 1 to 10000, vs. Year, 1975 to 2010. Labeled points: Intel 8086 (8), Intel '286 (12.5), Intel '386 (20), Intel '486 (25), Pentium (60), Pentium Pro (200), Pentium 2 (266), Pentium 3 (700), Pentium 4 Willamette (1500), Pentium D (2800), Pentium 4 Prescott (3600), Core 2 Duo (2400)]

  6. Slides 0.21 to 0.24

     Intel Nehalem Quad Core
     [Figure only; no slide text survives]

     Progression to Parallel Systems
     • If power begins to limit clock frequency, how can we continue to achieve more and more operations per second?
       – By running several processor cores in parallel at lower frequencies
       – Two cores @ 2 GHz vs. 1 core @ 4 GHz yield the same theoretical maximum ops./sec.
     • We’ll end our semester by examining (briefly) a few parallel architectures
       – Chip multiprocessors (multicore)
       – Graphics Processor Units (SIMT)

     Flynn’s Taxonomy
     • Categorize architectures based on relationship between program (instructions) and data
     • SISD: Single-Instruction, Single-Data
       – Typical, single-threaded processor
     • SIMD / SIMT: Single Instruction, Multiple Data (Single Instruction, Multiple Thread)
       – Vector Units (e.g. Intel MMX, SSE, SSE2)
       – GPU’s
     • MISD: Multiple Instruction, Single-Data
       – Less commonly used (some streaming architectures may be considered in this category)
     • MIMD: Multiple Instruction, Multiple-Data
       – Multi-threaded processors
       – Typical CMP/Multicore system (Task parallelism with different threads executing)

     GPU Chip Layout
     • 2560 Small Cores
     • Upwards of 7.2 billion transistors
     • 8.2 TFLOPS
     • 320 Gbytes/sec
     Source: NVIDIA
     Photo: http://www.theregister.co.uk/2010/01/19/nvidia_gf100/
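The claim that two cores at 2 GHz and one core at 4 GHz have the same theoretical maximum throughput is simple arithmetic, sketched below. The function name `peak_ops_per_second` and the one-operation-per-cycle default are assumptions made for illustration; the model ignores memory stalls, synchronization, and any part of a program that cannot be parallelized, so it is only an upper bound.

```python
def peak_ops_per_second(cores: int, clock_hz: int, ops_per_cycle: int = 1) -> int:
    """Theoretical peak throughput: cores x clock rate x operations per cycle.

    Hypothetical helper; ops_per_cycle=1 is an illustrative assumption.
    """
    return cores * clock_hz * ops_per_cycle


if __name__ == "__main__":
    dual_core = peak_ops_per_second(cores=2, clock_hz=2_000_000_000)
    single_core = peak_ops_per_second(cores=1, clock_hz=4_000_000_000)
    print(dual_core, single_core, dual_core == single_core)
```

The two configurations tie only on this peak figure; whether a real workload reaches it on the dual-core system depends on how much of the work can actually run in parallel.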
