SLIDE 1

Lecture 2: Processor Design, Single-Processor Performance

G63.2011.002/G22.2945.001 · September 14, 2010

Intro Basics Assembly Memory Pipelines

SLIDE 2

Outline

  • Intro
  • The Basic Subsystems
  • Machine Language
  • The Memory Hierarchy
  • Pipelines

SLIDE 3

Admin Bits

  • Lec. 1 slides posted
  • New here? Welcome! Please send in survey info (see lec. 1 slides) via email.
  • PASI
  • Please subscribe to mailing list
  • Near end of class: 5-min, 3-question ‘concept check’

SLIDE 4

Outline

  • Intro
  • The Basic Subsystems
  • Machine Language
  • The Memory Hierarchy
  • Pipelines

SLIDE 5

Introduction

Goal for Today

High Performance Computing: Discuss the actual computer end of this . . . and its influence on performance.

SLIDE 6

What’s in a computer?

Processor: Intel Q6600 Core2 Quad, 2.4 GHz
Die (2×): 143 mm², 2 × 2 cores, 582,000,000 transistors, ∼100 W

SLIDE 9
What’s in a computer?

Memory

SLIDE 11

Outline

  • Intro
  • The Basic Subsystems
  • Machine Language
  • The Memory Hierarchy
  • Pipelines

SLIDE 12

A Basic Processor

[Block diagram: Internal Bus connecting Register File, Flags, Data ALU, Address ALU, Control Unit (with PC), and Memory Interface; instruction fetch path; external Data Bus and Address Bus]

(loosely based on Intel 8086)


Bonus Question: What’s a bus?

SLIDE 14

How all of this fits together

Everything synchronizes to the Clock. Control Unit (“CU”): The brains of the operation. Everything connects to it.

Bus entries/exits are gated and (potentially) buffered. CU controls gates, tells other units about ‘what’ and ‘how’:

  • What operation?
  • Which register?
  • Which addressing mode?


SLIDE 15

What is. . . an ALU?

Arithmetic Logic Unit. One or two operands A, B. Operation selector (Op):

  • (Integer) Addition, Subtraction
  • (Logical) And, Or, Not
  • (Bitwise) Shifts (equivalent to multiplication by a power of two)
  • (Integer) Multiplication, Division

Specialized ALUs:

  • Floating Point Unit (FPU)
  • Address ALU

Operates on binary representations of numbers. Negative numbers are represented by two’s complement.

[ALU diagram: operand inputs A and B, operation selector Op, result output R]

SLIDE 16

What is. . . a Register File?

Registers are On-Chip Memory

  • Directly usable as operands in Machine Language
  • Often “general-purpose”
  • Sometimes special-purpose: Floating point, Indexing, Accumulator
  • Small: x86-64 has 16 × 64-bit GPRs
  • Very fast (near-zero latency)

[Register file diagram: %r0 %r1 %r2 %r3 %r4 %r5 %r6 %r7]

SLIDE 17

How does computer memory work?

One (reading) memory transaction (simplified), between Processor and Memory:

[Timing diagram: clock CLK, read/write select R/W̄, address lines A0..15, data lines D0..15]


Observation: Access (and addressing) happens in bus-width-size “chunks”.

SLIDE 24

What is. . . a Memory Interface?

The Memory Interface gets and stores binary words in off-chip memory. Smallest granularity: the bus width. It tells outside memory

  • “where” through address bus
  • “what” through data bus

Computer main memory is “Dynamic RAM” (DRAM): Slow, but small and cheap.

SLIDE 25

Outline

  • Intro
  • The Basic Subsystems
  • Machine Language
  • The Memory Hierarchy
  • Pipelines

SLIDE 26

A Very Simple Program

    int a = 5;
    int b = 17;
    int z = a * b;

compiles to:

     4:  c7 45 f4 05 00 00 00    movl $0x5,-0xc(%rbp)
     b:  c7 45 f8 11 00 00 00    movl $0x11,-0x8(%rbp)
    12:  8b 45 f4                mov  -0xc(%rbp),%eax
    15:  0f af 45 f8             imul -0x8(%rbp),%eax
    19:  89 45 fc                mov  %eax,-0x4(%rbp)
    1c:  8b 45 fc                mov  -0x4(%rbp),%eax

Things to know:

  • Addressing modes (Immediate, Register, Base plus Offset)
  • Hexadecimal notation (0x. . . )
  • “AT&T Form” (we’ll use this): <opcode><size> <source>, <dest>

SLIDE 27

Another Look

[Block diagram as before: Internal Bus connecting Register File, Flags, Data ALU, Address ALU, Control Unit (with PC), and Memory Interface; instruction fetch path; external Data Bus and Address Bus]


     4:  c7 45 f4 05 00 00 00    movl $0x5,-0xc(%rbp)
     b:  c7 45 f8 11 00 00 00    movl $0x11,-0x8(%rbp)
    12:  8b 45 f4                mov  -0xc(%rbp),%eax
    15:  0f af 45 f8             imul -0x8(%rbp),%eax
    19:  89 45 fc                mov  %eax,-0x4(%rbp)
    1c:  8b 45 fc                mov  -0x4(%rbp),%eax

SLIDE 29

A Very Simple Program: Intel Form

     4:  c7 45 f4 05 00 00 00    mov  DWORD PTR [rbp-0xc],0x5
     b:  c7 45 f8 11 00 00 00    mov  DWORD PTR [rbp-0x8],0x11
    12:  8b 45 f4                mov  eax,DWORD PTR [rbp-0xc]
    15:  0f af 45 f8             imul eax,DWORD PTR [rbp-0x8]
    19:  89 45 fc                mov  DWORD PTR [rbp-0x4],eax
    1c:  8b 45 fc                mov  eax,DWORD PTR [rbp-0x4]

  • “Intel Form” (you might see this on the net): <opcode> <sized dest>, <sized source>
  • Goal: Reading comprehension.
  • Don’t understand an opcode? Google “<opcode> intel instruction”.

SLIDE 30

Machine Language Loops

    int main()
    {
      int y = 0, i;
      for (i = 0; y < 10; ++i)
        y += i;
      return y;
    }

     0:  55                      push   %rbp
     1:  48 89 e5                mov    %rsp,%rbp
     4:  c7 45 f8 00 00 00 00    movl   $0x0,-0x8(%rbp)
     b:  c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
    12:  eb 0a                   jmp    1e <main+0x1e>
    14:  8b 45 fc                mov    -0x4(%rbp),%eax
    17:  01 45 f8                add    %eax,-0x8(%rbp)
    1a:  83 45 fc 01             addl   $0x1,-0x4(%rbp)
    1e:  83 7d f8 09             cmpl   $0x9,-0x8(%rbp)
    22:  7e f0                   jle    14 <main+0x14>
    24:  8b 45 f8                mov    -0x8(%rbp),%eax
    27:  c9                      leaveq
    28:  c3                      retq

Things to know:

  • Condition Codes (Flags): Zero, Sign, Carry, etc.
  • Call Stack: Stack frame, stack pointer, base pointer
  • ABI: Calling conventions


Want to make those yourself? Write myprogram.c, then:

    $ cc -c myprogram.c
    $ objdump --disassemble myprogram.o

SLIDE 32

We know how a computer works!


All of this can be built in about 4000 transistors (e.g. MOS 6502 in Apple II, Commodore 64, Atari 2600).

So what exactly is Intel doing with the other 581,996,000 transistors?

Answer: Make things go faster!

Goal now: Understand sources of slowness, and how they get addressed. Remember: High Performance Computing.

SLIDE 35

The High-Performance Mindset

Writing High-Performance Codes

Mindset: What is going to be the limiting factor?

  • ALU?
  • Memory?
  • Communication? (if multi-machine)

Benchmark the assumed limiting factor right away.

Evaluate

  • Know your peak throughputs (roughly)
  • Are you getting close?
  • Are you tracking the right limiting factor?

SLIDE 36

Outline

  • Intro
  • The Basic Subsystems
  • Machine Language
  • The Memory Hierarchy
  • Pipelines

SLIDE 37

Source of Slowness: Memory


→ Memory has long latency, but can have large bandwidth. Size of die vs. distance to memory: big! Dynamic RAM: long intrinsic latency! Idea: Put a look-up table of recently-used data onto the chip. → “Cache”

SLIDE 39

The Memory Hierarchy


Hierarchy of increasingly bigger, slower memories:

    Registers                      1 kB,     1 cycle
    L1 Cache                      10 kB,    10 cycles
    L2 Cache                       1 MB,   100 cycles
    DRAM                           1 GB,  1000 cycles
    Virtual Memory (hard drive)    1 TB,   1 M cycles

How might data locality factor into this? What is a working set?

SLIDE 41

Cache: Actual Implementation

Demands on cache implementation:

  • Fast, small, cheap, low-power
  • Fine-grained
  • High “hit”-rate (few “misses”)

[Diagram: main memory (index → data: 0 → xyz, 1 → pdq, 2 → abc, 3 → rgf) next to a cache holding (data, tag) pairs: (abc, 2), (xyz, 1)]

Problem: Goals at odds with each other: access matching logic is expensive!
Solution 1: More data per unit of access matching logic → larger “cache lines”.
Solution 2: Simpler/less access matching logic → less than full “associativity”.
Other choices: Eviction strategy, size.

SLIDE 42

Cache: Associativity

[Diagrams: direct-mapped — each memory location maps to exactly one cache location; 2-way set associative — each memory location maps to either of two cache locations]

[Plot: miss rate (10^-6 to 0.1) vs. cache size (1K to 1M, and Inf) for direct-mapped, 2-way, 4-way, 8-way, and fully associative caches]


Miss rate versus cache size on the Integer portion of SPEC CPU2000 [Cantin, Hill 2003]

SLIDE 44

Cache Example: Intel Q6600/Core2 Quad

    --- L1 data cache ---
    fully associative cache     = false
    threads sharing this cache  = 0x0 (0)
    processor cores on this die = 0x3 (3)
    system coherency line size  = 0x3f (63)
    ways of associativity       = 0x7 (7)
    number of sets - 1 (s)      = 63

    --- L1 instruction cache ---
    fully associative cache     = false
    threads sharing this cache  = 0x0 (0)
    processor cores on this die = 0x3 (3)
    system coherency line size  = 0x3f (63)
    ways of associativity       = 0x7 (7)
    number of sets - 1 (s)      = 63

    --- L2 unified cache ---
    fully associative cache     = false
    threads sharing this cache  = 0x1 (1)
    processor cores on this die = 0x3 (3)
    system coherency line size  = 0x3f (63)
    ways of associativity       = 0xf (15)
    number of sets - 1 (s)      = 4095

More than you care to know about your CPU: http://www.etallen.com/cpuid.html

SLIDE 45

Measuring the Cache I

    void go(unsigned count, unsigned stride)
    {
      const unsigned arr_size = 64 * 1024 * 1024;
      int *ary = (int *) malloc(sizeof(int) * arr_size);
      for (unsigned it = 0; it < count; ++it)
      {
        for (unsigned i = 0; i < arr_size; i += stride)
          ary[i] *= 17;
      }
      free(ary);
    }

[Plot: time (0.02–0.16 s) vs. stride (2^0 to 2^10)]


SLIDE 47

Measuring the Cache II

    void go(unsigned array_size, unsigned steps)
    {
      int *ary = (int *) malloc(sizeof(int) * array_size);
      unsigned asm1 = array_size - 1;
      for (unsigned i = 0; i < steps; ++i)
        ary[(i*16) & asm1]++;
      free(ary);
    }

[Plot: effective bandwidth (1–6 GB/s) vs. array size (2^12 to 2^26 bytes)]


SLIDE 49

Measuring the Cache III

    void go(unsigned array_size, unsigned stride, unsigned steps)
    {
      char *ary = (char *) malloc(sizeof(int) * array_size);
      unsigned p = 0;
      for (unsigned i = 0; i < steps; ++i)
      {
        ary[p]++;
        p += stride;
        if (p >= array_size)
          p = 0;
      }
      free(ary);
    }

[Heat map: stride (100–600 bytes) vs. array size (5–20 MB)]


SLIDE 51

Programming for the Cache


How can we rearrange programs to be cache-friendly? Examples:

  • Large vectors x, a, b: compute x ← x + 3a − 5b.
  • Matrix-Matrix Multiplication (→ Homework 1, posted)

SLIDE 53

Outline

  • Intro
  • The Basic Subsystems
  • Machine Language
  • The Memory Hierarchy
  • Pipelines

SLIDE 54

Source of Slowness: Sequential Operation

    IF   Instruction Fetch
    ID   Instruction Decode
    EX   Execution
    MEM  Memory Read/Write
    WB   Result Writeback

SLIDE 55

Solution: Pipelining

SLIDE 56

Pipelining

[Pipelined datapath diagram] (MIPS, 110,000 transistors)

SLIDE 57

Issues with Pipelines

Pipelines generally help performance, but not always. Possible issues:

  • Stalls
  • Dependent Instructions
  • Branches (+Prediction)
  • Self-Modifying Code

“Solution”: Bubbling, extra circuitry

[Diagram: waiting instructions entering a four-stage pipeline (Stage 1: Fetch, Stage 2: Decode, Stage 3: Execute, Stage 4: Write-back), with completed instructions emerging over clock cycles 1–9]

SLIDE 58

Intel Q6600 Pipeline

[Diagram: Q6600 pipeline — Instruction Fetch; 32-byte pre-decode/fetch buffer; 18-entry instruction queue; microcode plus one complex and three simple decoders (4 µops/cycle each path); register alias table and allocator; 96-entry reorder buffer (ROB); 32-entry reservation station; ports 0–5 feeding ALUs, SSE units, 128-bit FMUL/FDIV and FADD, branch unit, and store-address/store-data/load-address units; memory ordering buffer (MOB) and memory interface with 128-bit loads/stores; retirement register file (program-visible state); up to 6 instructions in, 4 µops retired per cycle]


New concept: Instruction-level parallelism (“Superscalar”)

SLIDE 60

Programming for the Pipeline

How to upset a processor pipeline:

    for (int i = 0; i < 1000; ++i)
      for (int j = 0; j < 1000; ++j)
      {
        if (j % 2 == 0)
          do_something(i, j);
      }

. . . why is this bad?

SLIDE 61

A Puzzle

    int steps = 256 * 1024 * 1024;
    int[] a = new int[2];

    // Loop 1
    for (int i = 0; i < steps; i++) { a[0]++; a[0]++; }

    // Loop 2
    for (int i = 0; i < steps; i++) { a[0]++; a[1]++; }

Which is faster? . . . and why?

SLIDE 62

Two useful Strategies

Loop unrolling:

    for (int i = 0; i < 1000; ++i)
      do_something(i);

    // unrolled by two (trip count must be even):
    for (int i = 0; i < 1000; i += 2)
    {
      do_something(i);
      do_something(i+1);
    }

Software pipelining:

    for (int i = 0; i < 1000; ++i)
    {
      do_a(i);
      do_b(i);
    }

    // software-pipelined:
    for (int i = 0; i < 1000; i += 2)
    {
      do_a(i); do_a(i+1);
      do_b(i); do_b(i+1);
    }

SLIDE 63

SIMD

Control Units are large and expensive. Functional Units are simple and cheap. → Increase the Function/Control ratio: control several functional units with one control unit. All execute the same operation.

[SIMD concept diagram: a single instruction pool driving several processing units (PU) operating on a shared data pool]

GCC vector extensions:

    typedef int v4si __attribute__((vector_size(16)));
    v4si a, b, c;
    c = a + b;  // +, -, *, /, unary minus, ^, |, &, ~, %

Will revisit for OpenCL, GPUs.

SLIDE 64

About HW1

  • Open-ended! Want: coherent thought about caches, memory access ordering, latencies, bandwidths, etc. See Berkeley CS267 (lec. 3, . . . ) for more hints.
  • Also: Introduction to the machinery. (Clusters, running on them, SSH, git, forge) Why so much machinery?
  • Linux lab machines: WWH 229/230

SLIDE 65

Rearranging Matrix-Matrix Multiplication

Matrix Multiplication: C_ij = Σ_k A_ik B_kj

[Diagram: matrices A, B, C]


SLIDE 71

Questions?

?

SLIDE 72

Image Credits

  • Q6600 Wikimedia Commons
  • Mainboard: Wikimedia Commons
  • DIMM: sxc.hu/gobran11
  • Q6600 back: Wikimedia Commons
  • Core 2 die: Intel Corp. / lesliewong.us
  • Basic cache: Wikipedia
  • Cache associativity: based on Wikipedia
  • Cache associativity vs miss rate: Wikipedia

  • Cache Measurements: Igor Ostrovsky
  • Pipeline stuff: Wikipedia
  • Bubbly Pipeline: Wikipedia
  • Q6600 Pipeline: Wikipedia
  • SIMD concept picture: Wikipedia
