

slide-1
SLIDE 1

Computer Organization & Assembly Language Programming (CSE 2312)

Lecture 3 Taylor Johnson

slide-2
SLIDE 2

Summary from Last Time

  • Binary to decimal, decimal to binary, ASCII
  • Structured computers
  • Multilevel computers and architectures
  • Abstraction layers

August 28, 2014 CSE2312, Fall 2014 2

slide-3
SLIDE 3

Announcements and Outline

  • Quiz 1 on Blackboard site (due 11:59PM Friday)
  • Review binary arithmetic, Boolean operations, and representing numbers in binary

  • Homework 1 on course website
  • Read chapter 1
  • Review from last time
  • Structured computers
  • Performance metrics


slide-4
SLIDE 4

Review: Multilevel Architectures

  • Level 0: Physical Device Level (Electronics): n/a / Physics
  • Level 1: Digital Logic Level: VHDL / Verilog
  • Level 2: Microarchitecture Level: n/a / Microcode
  • Level 3: Instruction Set Architecture (ISA) Level: Assembly / Machine Language
  • Level 4: Operating System Level: C / …

slide-5
SLIDE 5

Review: Levels of Program Code

  • High-level language
  • Level of abstraction closer to problem domain
  • Provides for productivity and portability
  • Assembly language
  • Textual representation of instructions
  • Hardware representation
  • Binary digits (bits)
  • Encoded instructions and data

slide-6
SLIDE 6

Review: Computer Organization Overview

  • CPU
  • Executes instructions
  • Memory
  • Stores programs and data
  • Buses
  • Transfers data
  • Storage
  • Permanent
  • I/O devices
  • Input: keypad, mouse, touch
  • Output: printer, screen
  • Both (input and output), such as: USB, network, Wi-Fi, touch screen, hard drive

slide-7
SLIDE 7

Review: Von Neumann Architecture

  • Both data and program stored in memory
  • Allows the computer to be “re-programmed”
  • Input/output (I/O) goes through CPU
  • I/O part is not representative of modern systems (direct memory access [DMA])
  • Memory layout is representative of modern systems

[Figure: block diagram with Memory (Data + Program [Instructions]), CPU, I/O, and DMA]

slide-8
SLIDE 8

Review: Abstract Processor Execution Cycle

[Flowchart]

  • FETCH[PC] (Get instruction from memory)
  • EXECUTE (Execute instruction fetched from memory)
  • Interrupt?
  • No: PC++ (Increment the Program Counter)
  • Yes: Handle Interrupt (Input/Output Event)
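The cycle above can be sketched as a small loop. This is only a toy model: the opcodes (ADD, SUB, HALT), the single accumulator, and the interrupt queue are hypothetical names chosen for illustration, and the sketch assumes execution continues with the next instruction once an interrupt has been handled.

```python
# Toy model of the abstract execution cycle. Opcodes, the accumulator,
# and the interrupt queue are hypothetical, for illustration only.

def run(program, pending_interrupts):
    """Run until HALT; return (accumulator, list of handled interrupts)."""
    pc = 0          # program counter
    acc = 0         # single accumulator register
    handled = []
    while True:
        opcode, operand = program[pc]        # FETCH[PC]
        if opcode == "HALT":
            return acc, handled
        if opcode == "ADD":                  # EXECUTE
            acc += operand
        elif opcode == "SUB":
            acc -= operand
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
        if pending_interrupts:               # Interrupt? Yes: handle it
            handled.append(pending_interrupts.pop(0))
        pc += 1                              # PC++ (assumed in both branches)


result = run([("ADD", 5), ("SUB", 2), ("HALT", 0)], ["keypress"])
print(result)  # (3, ['keypress'])
```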

slide-9
SLIDE 9

Demonstration

  • VMWare, QEMU, and ARM ISA and gdb
  • We will use QEMU and ARM later in this course
  • Particularly for programming assignments
  • ARM versus x86
  • ARM is prevalent in embedded systems and handheld devices, many of which have more limited resources than your x86/x86-64 PC
  • Limited resources sometimes require being very efficient (in space/memory or time/processing complexity)
  • Potentially greater need to interface with hardware

slide-10
SLIDE 10

slide-11
SLIDE 11

slide-12
SLIDE 12

slide-13
SLIDE 13

slide-14
SLIDE 14

Announcements and Outline

  • Quiz 1 on Blackboard site (due 11:59PM Friday)
  • Review binary arithmetic, Boolean operations, and representing numbers in binary

  • Homework 1 on course website
  • Read chapter 1
  • Review from last time
  • Structured computers
  • Performance metrics


slide-15
SLIDE 15

Performance Metrics

  • Performance is important in computer systems
  • How to quantitatively compare different computer systems?
  • How to do this in general?
  • Cars: MPG, speed, acceleration, towing capability, passengers, …
  • Computer processors
  • Execution time of a program (seconds)
  • Instruction count (instructions executed in a program)
  • CPI: clock cycles per instruction (average number of clock cycles per instruction)
  • Clock cycle time (seconds per clock cycle)

slide-16
SLIDE 16

Defining Performance

  • Which airplane has the best performance?

[Figure: four bar charts comparing the Douglas DC-8-50, BAC/Sud Concorde, Boeing 747, and Boeing 777 by passenger capacity, cruising range (miles), cruising speed, and passengers x mph]

slide-17
SLIDE 17

Some Units You Must Know

  • Hertz (Hz): unit of frequency
  • 1 Hz: once per second
  • 1 Megahertz (1 MHz): one million times per second
  • 1 Gigahertz (1 GHz): one billion times per second
  • second: unit of time
    – 1 millisecond (1 ms): a thousandth of a second
    – 1 microsecond (1 μs): a millionth of a second
    – 1 nanosecond (1 ns): a billionth of a second
  • Similarly for meters:
    – millimeter: a thousandth
    – micrometer: a millionth
    – nanometer: a billionth

slide-18
SLIDE 18

Units of Memory

  • One bit (binary digit): the smallest amount of information that we can store:
  • Either a 1 or a 0
  • Sometimes refer to 1 as high/on/true, 0 as low/off/false
  • One byte = 8 bits
  • Can store a number from 0 to 255
  • Kilobyte (KB): 10^3 = 1000 bytes
  • Kibibyte (KiB): 2^10 = 1024 bytes
  • Kilobit (Kb): 10^3 = 1000 bits (125 bytes)
  • Kibibit (Kib): 2^10 = 1024 bits (128 bytes)
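The unit definitions above can be checked with a few lines of arithmetic. The constant and function names below are illustrative choices, not part of the lecture.

```python
# Unit constants as defined on the slide (decimal vs. binary prefixes).
BITS_PER_BYTE = 8
KILOBYTE = 10**3   # KB, in bytes
KIBIBYTE = 2**10   # KiB, in bytes
KILOBIT = 10**3    # Kb, in bits
KIBIBIT = 2**10    # Kib, in bits


def max_unsigned_value(bits):
    """Largest unsigned integer representable in the given number of bits."""
    return 2**bits - 1


def bits_to_bytes(bits):
    """Convert a bit count to bytes (may be fractional)."""
    return bits / BITS_PER_BYTE


print(max_unsigned_value(BITS_PER_BYTE))                 # 255
print(KIBIBYTE - KILOBYTE)                               # 24
print(bits_to_bytes(KILOBIT), bits_to_bytes(KIBIBIT))    # 125.0 128.0
```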


slide-19
SLIDE 19

Metric Units


slide-20
SLIDE 20

Moore's Law for the Intel Family


slide-21
SLIDE 21

Moore's Law

  • Not a real "law" of nature, just a practical observation that has remained surprisingly accurate for decades
  • Predicts a 60% annual increase in the number of transistors per chip
  • Number of transistors on a chip doubles every 18 months
  • Memory capacity doubles every 2 years
  • Disk capacity doubles every year
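The 60% annual increase and the 18-month doubling period are two ways of stating roughly the same growth rate. A minimal sketch, assuming steady compound growth (the function name is illustrative):

```python
import math


def doubling_time_years(annual_growth):
    """Years needed to double at a fixed compound annual growth rate.

    annual_growth is a fraction, e.g. 0.60 for a 60% annual increase.
    """
    return math.log(2) / math.log(1 + annual_growth)


months = 12 * doubling_time_years(0.60)
print(round(months, 1))  # 17.7, i.e. roughly 18 months
```

A growth rate of 100% per year (doubling every year, as stated for disk capacity) gives a doubling time of exactly one year under the same formula.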


slide-22
SLIDE 22

The Power Wall


slide-23
SLIDE 23

Moore's Law

  • These observations are more like rules of thumb
  • However, they have been good predictors since the 1960s, about half a century!
  • Moore's law was originally stated in 1965
  • How long will this exponential growth in hardware capabilities continue?
  • Nobody really knows
  • Expected to continue for the next few years
  • When transistors get to be the size of an atom, hard to predict if and how this growth can continue

slide-24
SLIDE 24

Moore's Law Example 1

  • Suppose average disk capacity right now is 1TB
  • Suppose disk capacity doubles each year
  • What will average disk capacity be in 5 years?

slide-25
SLIDE 25

Moore's Law Example 1

  • Suppose average disk capacity right now is 1TB.
  • Suppose disk capacity doubles each year.
  • What will average disk capacity be in 5 years?
  • Answer: 32 TB
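The answer follows from doubling once per year for five years, assuming the doubling rate stays constant:

$$\text{capacity after } n \text{ years} = \text{capacity now} \times 2^{n} \quad\Rightarrow\quad 1\,\text{TB} \times 2^{5} = 32\,\text{TB}$$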


slide-26
SLIDE 26

Moore's Law Example 2

  • Suppose average number of instructions per second in 1960 was 100,000 (this number is made up)
  • Suppose average number of instructions per second in 1970 was 10,000,000 (this number is made up)
  • What would be Moore's law for the average number of instructions per second? How often does it double?

slide-27
SLIDE 27

Moore's Law Example 2

  • Suppose average number of instructions per second in 1960 was 100,000 (this number is made up)
  • Suppose average number of instructions per second in 1970 was 10,000,000 (this number is made up)
  • What would be Moore's law for the average number of instructions per second? How often does it double?
  • Answer:
  • In 10 years, this number increased by 100 times.
  • 100 = 2^6.64
  • Thus, this number doubles every 10/6.64 years = about 18 months
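The exponent 6.64 is the number of doublings needed to grow by a factor of 100. Written out, under the assumption of steady exponential growth over the decade:

$$2^{k} = 100 \;\Rightarrow\; k = \log_2 100 = \frac{\ln 100}{\ln 2} \approx 6.64, \qquad \frac{10\ \text{years}}{6.64} \approx 1.51\ \text{years} \approx 18\ \text{months}$$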


slide-28
SLIDE 28

Silicon Integrated Circuit Manufacturing Process


slide-29
SLIDE 29

A 12-inch (300mm) wafer of AMD Opteron X2 chips, the predecessor of Opteron X4 chips (Courtesy AMD). The number of dies per wafer at 100% yield is 117. The several dozen partially rounded chips at the boundaries of the wafer are useless; they are included because it’s easier to create the masks used to pattern the silicon. This die uses a 90-nanometer technology, which means that the smallest transistors are approximately 90 nm in size, although they are typically somewhat smaller than the actual feature size, which refers to the size of the transistors as “drawn” versus the final manufactured size.

slide-30
SLIDE 30

Integrated Circuit Cost

  • Nonlinear relation to area and defect rate
  • Wafer cost and area are fixed
  • Defect rate determined by manufacturing process
  • Die area determined by architecture and circuit design

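$$\text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}}$$

$$\text{Dies per wafer} \approx \frac{\text{Wafer area}}{\text{Die area}}$$

$$\text{Yield} = \frac{1}{\left(1 + \text{Defects per area} \times \text{Die area}/2\right)^{2}}$$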
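The cost model above (cost per die from wafer cost, dies per wafer, and yield) can be sketched as follows. The function names and all numeric inputs are made up for illustration; they are not figures from the lecture.

```python
def dies_per_wafer(wafer_area, die_area):
    """Approximate die count: wafer area divided by die area."""
    return wafer_area / die_area


def yield_fraction(defects_per_area, die_area):
    """Fraction of good dies: 1 / (1 + defects_per_area * die_area / 2)^2."""
    return 1.0 / (1.0 + defects_per_area * die_area / 2.0) ** 2


def cost_per_die(wafer_cost, wafer_area, die_area, defects_per_area):
    """Cost per wafer divided by the number of good dies per wafer."""
    good_dies = dies_per_wafer(wafer_area, die_area) * yield_fraction(
        defects_per_area, die_area
    )
    return wafer_cost / good_dies


# Hypothetical inputs: areas in mm^2, defects per mm^2, cost in dollars.
small = cost_per_die(1000.0, 10000.0, 100.0, 0.02)
large = cost_per_die(1000.0, 10000.0, 200.0, 0.02)
print(small, large)  # doubling the die area more than doubles the cost
```

With these inputs the 100 mm^2 die costs 40 and the 200 mm^2 die costs 180, which illustrates the nonlinear relation to area.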


slide-31
SLIDE 31

Moore's Law

Moore’s law predicts a 60 percent annual increase in the number of transistors that can be put on a chip. The data points given above and below the line are memory sizes, in bits.


slide-32
SLIDE 32

Growth in processor performance since the mid-1980s (relative to VAX 11/780 on SPECint benchmarks)

slide-33
SLIDE 33

Response Time and Throughput

  • Response time
  • How long it takes to do a task
  • Throughput
  • Total work done per unit time
  • e.g., tasks/transactions/… per hour
  • How are response time and throughput affected by
  • Replacing the processor with a faster version?
  • Adding more processors?
  • We’ll focus on response time for now…


slide-34
SLIDE 34

Relative Performance

  • Define Performance = 1/Execution Time
  • “X is n times faster than Y”

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

  • Example: time taken to run a program
  • 10s on A, 15s on B
  • Execution Time_B / Execution Time_A = 15s / 10s = 1.5
  • So A is 1.5 times faster than B

slide-35
SLIDE 35

Measuring Execution Time

  • Elapsed time
  • Total response time, including all aspects
  • Processing, I/O, OS overhead, idle time
  • Determines system performance
  • CPU time
  • Time spent processing a given job
  • Discounts I/O time, other jobs’ shares
  • Comprises user CPU time and system CPU time
  • Different programs are affected differently by CPU and system performance

slide-36
SLIDE 36

CPU Clocking

  • Operation of digital hardware governed by a constant-rate clock

[Figure: clock timing diagram showing clock cycles, the clock period, data transfer and computation, and state update]

  • Clock period: duration of a clock cycle
  • e.g., 250 ps = 0.25 ns = 250×10^-12 s
  • Clock frequency (rate): cycles per second
  • e.g., 4.0 GHz = 4000 MHz = 4.0×10^9 Hz

slide-37
SLIDE 37

CPU Time

  • Performance improved by
  • Reducing number of clock cycles
  • Increasing clock rate
  • Hardware designer must often trade off clock rate against cycle count

$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$

slide-38
SLIDE 38

CPU Time Example

  • Computer A: 2GHz clock, 10s CPU time
  • Designing Computer B
  • Aim for 6s CPU time
  • Can do faster clock, but causes 1.2 × clock cycles
  • How fast must Computer B clock be?

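$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \frac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}$$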
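The Computer B calculation can be reproduced numerically. This is a minimal sketch using the numbers from the example; the function names are illustrative.

```python
def clock_cycles(cpu_time_s, clock_rate_hz):
    """Clock cycles = CPU time x clock rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles, target_cpu_time_s):
    """Clock rate needed to finish the given cycles in the target time."""
    return cycles / target_cpu_time_s


cycles_a = clock_cycles(10.0, 2e9)          # Computer A: 10 s at 2 GHz
cycles_b = 1.2 * cycles_a                   # B needs 1.2x as many cycles
rate_b = required_clock_rate(cycles_b, 6.0)  # B must finish in 6 s
print(f"{rate_b / 1e9:.1f} GHz")            # 4.0 GHz
```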

slide-39
SLIDE 39

Instruction Count and CPI

  • Instruction Count for a program = number of instructions executed by the program
  • Determined by program, ISA and compiler
  • Average cycles per instruction (CPI) = number of cycles to execute an instruction (on average)
  • Determined by CPU hardware
  • If different instructions have different CPI
  • Average CPI affected by instruction mix

$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$

slide-40
SLIDE 40

CPI Example

  • Computer A: Cycle Time = 250ps, CPI = 2.0
  • Computer B: Cycle Time = 500ps, CPI = 1.2
  • Same ISA
  • Which is faster, and by how much?

$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$

  • A is faster, by a factor of 1.2
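Because both computers run the same ISA, the instruction count cancels and only CPI × cycle time matters. A minimal sketch of that comparison (the function name is illustrative):

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time per instruction = CPI x clock cycle time (picoseconds)."""
    return cpi * cycle_time_ps


time_a = time_per_instruction_ps(2.0, 250.0)   # Computer A
time_b = time_per_instruction_ps(1.2, 500.0)   # Computer B
print(time_a, time_b, time_b / time_a)         # A: 500 ps, B: 600 ps, ratio 1.2
```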

slide-41
SLIDE 41

CPI in More Detail

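  • If different instruction classes take different numbers of cycles

$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$

  • Weighted average CPI

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$

  • Instruction Count_i / Instruction Count is the relative frequency of class i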

slide-42
SLIDE 42

CPI Example

  • Alternative compiled code sequences using instructions in classes A, B, C

Class               A   B   C
CPI for class       1   2   3
IC in sequence 1    2   1   2
IC in sequence 2    4   1   1

  • Sequence 1: IC = 5
  • Clock Cycles = 2×1 + 1×2 + 2×3 = 10
  • Avg. CPI = 10/5 = 2.0
  • Sequence 2: IC = 6
  • Clock Cycles = 4×1 + 1×2 + 1×3 = 9
  • Avg. CPI = 9/6 = 1.5
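The two code sequences can be compared with a short weighted-average calculation. The dictionary layout and function name below are illustrative choices; the per-class CPI values and instruction counts are the ones from the example.

```python
# CPI of each instruction class, from the example.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def summarize(counts):
    """Return (instruction count, clock cycles, average CPI) for a sequence.

    counts maps an instruction class to how many times it occurs.
    """
    instruction_count = sum(counts.values())
    cycles = sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())
    return instruction_count, cycles, cycles / instruction_count


print(summarize({"A": 2, "B": 1, "C": 2}))  # (5, 10, 2.0)
print(summarize({"A": 4, "B": 1, "C": 1}))  # (6, 9, 1.5)
```

Sequence 2 executes more instructions but needs fewer clock cycles, so instruction count alone does not determine which sequence is faster.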

slide-43
SLIDE 43

Performance Summary

  • Performance depends on
  • Algorithm: affects IC, possibly CPI
  • Programming language: affects IC, CPI
  • Compiler: affects IC, CPI
  • Instruction set architecture: affects IC, CPI, Tc (clock cycle time)

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

slide-44
SLIDE 44

Reducing Power

  • Suppose a new CPU has
  • 85% of capacitive load of old CPU
  • 15% voltage and 15% frequency reduction

$$\frac{P_{new}}{P_{old}} = \frac{C_{old} \times 0.85 \times (V_{old} \times 0.85)^{2} \times F_{old} \times 0.85}{C_{old} \times V_{old}^{2} \times F_{old}} = 0.85^{4} = 0.52$$

  • The power wall
  • We can’t reduce voltage further
  • We can’t remove more heat
  • How else can we improve performance?
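The power ratio in this example can be checked numerically. The sketch assumes, as the ratio on the slide implies, that dynamic power is proportional to capacitive load × voltage² × frequency; the function name is illustrative.

```python
def relative_dynamic_power(load_scale, voltage_scale, frequency_scale):
    """New power / old power, assuming power ~ C x V^2 x F."""
    return load_scale * voltage_scale**2 * frequency_scale


# 85% capacitive load, 15% lower voltage, 15% lower frequency.
ratio = relative_dynamic_power(0.85, 0.85, 0.85)
print(round(ratio, 2))  # 0.52
```

Voltage enters squared, which is why lowering it was such an effective way to cut power until it could not be reduced further.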

slide-45
SLIDE 45

SPEC CPU Benchmark

  • Programs used to measure performance
  • Supposedly typical of actual workload
  • Standard Performance Evaluation Corp (SPEC)
  • Develops benchmarks for CPU, I/O, Web, …
  • SPEC CPU2006
  • Elapsed time to execute a selection of programs
  • Negligible I/O, so focuses on CPU performance
  • Normalize relative to reference machine
  • Summarize as geometric mean of performance ratios
  • CINT2006 (integer) and CFP2006 (floating-point)

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
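The geometric mean used to summarize the benchmark ratios can be sketched as below. The ratios in the example are made-up values, not SPEC results, and the function name is illustrative.

```python
import math


def geometric_mean(ratios):
    """n-th root of the product of n performance ratios."""
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical execution time ratios relative to a reference machine.
example_ratios = [2.0, 8.0]
print(geometric_mean(example_ratios))  # 4.0
```

Unlike the arithmetic mean, the geometric mean gives the same relative ranking of two machines no matter which machine is chosen as the reference.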


slide-46
SLIDE 46

SPECint2006 / CINT2006 for Intel Core i7 920


slide-47
SLIDE 47

Pitfall: MIPS as a Performance Metric

  • MIPS: Millions of Instructions Per Second
  • Doesn’t account for
  • Differences in ISAs between computers
  • Differences in complexity between instructions

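$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^{6}} = \frac{\text{Clock rate}}{\text{CPI} \times 10^{6}}$$

  • CPI varies between programs on a given CPU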
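A short sketch of why MIPS is a misleading metric: the same CPU yields different MIPS ratings for programs with different CPI. The clock rate and CPI values here are hypothetical.

```python
def mips(clock_rate_hz, cpi):
    """MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


# Same hypothetical 4 GHz CPU, two programs with different average CPI.
print(mips(4e9, 1.0))  # 4000.0
print(mips(4e9, 2.0))  # 2000.0
```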

slide-48
SLIDE 48

Technological and Economic Forces

  • Improvements in hardware create opportunities for new applications
  • New applications attract new businesses
  • New businesses drive competition
  • Competition drives improvements in hardware

slide-49
SLIDE 49

Technological and Economic Forces

  • "Software is a gas. It expands to fill the container holding it."
  • Nathan Myhrvold, former Microsoft executive.
  • Software expands with additional features, to exploit new hardware capabilities.
  • Software expansion creates need for better hardware.

slide-50
SLIDE 50

Chapter 1 Summary

  • Cost/performance is improving
  • Due to underlying technology development
  • Hierarchical layers of abstraction
  • In both hardware and software
  • Instruction set architecture
  • The hardware/software interface
  • Execution time: the best performance measure
  • Power is a limiting factor
  • Use parallelism to improve performance


slide-51
SLIDE 51

Summary

  • Reviewed structured computers
  • Basic performance metrics, units
  • Quiz 1 on Blackboard
  • Homework 1


slide-52
SLIDE 52

Questions?


slide-53
SLIDE 53

SPEC Power Benchmark

  • Power consumption of server at different workload levels
  • Performance: ssj_ops/sec
  • Power: Watts (Joules/sec)

$$\text{Overall ssj\_ops per Watt} = \frac{\sum_{i=0}^{10} \text{ssj\_ops}_i}{\sum_{i=0}^{10} \text{power}_i}$$
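The overall metric is the total of the ssj_ops measurements divided by the total of the power measurements across the workload levels. A minimal sketch, with made-up measurements (not SPEC data) and an illustrative function name:

```python
def overall_ssj_ops_per_watt(ssj_ops, power_watts):
    """Sum of ssj_ops over all load levels divided by sum of power."""
    if len(ssj_ops) != len(power_watts):
        raise ValueError("need one power reading per load level")
    return sum(ssj_ops) / sum(power_watts)


# Hypothetical readings at three load levels (full, half, idle).
ops = [100.0, 50.0, 0.0]
watts = [10.0, 8.0, 7.0]
print(overall_ssj_ops_per_watt(ops, watts))  # 6.0
```

The idle level contributes no operations but still adds power to the denominator, so high idle power lowers the overall score.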


slide-54
SLIDE 54

SPECpower_ssj2008 for Xeon X5650


slide-55
SLIDE 55

Pitfall: Amdahl’s Law

  • Improving an aspect of a computer and expecting a proportional improvement in overall performance

$$T_{improved} = \frac{T_{affected}}{\text{improvement factor}} + T_{unaffected}$$

  • Example: multiply accounts for 80s/100s
  • How much improvement in multiply performance to get 5× overall?

$$20 = \frac{80}{n} + 20$$

  • Can’t be done!
  • Corollary: make the common case fast
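The multiply example can be explored with a small sketch of Amdahl's Law. The function names are illustrative, and returning None for an unreachable target is a design choice of this sketch.

```python
def improved_time(t_affected, t_unaffected, factor):
    """Execution time after speeding up the affected part by `factor`."""
    return t_affected / factor + t_unaffected


def required_factor(t_affected, t_unaffected, target_speedup):
    """Improvement factor needed for the target overall speedup.

    Returns None when the target cannot be reached, i.e. when the
    unaffected time alone already uses up the whole time budget.
    """
    target_time = (t_affected + t_unaffected) / target_speedup
    budget_for_affected = target_time - t_unaffected
    if budget_for_affected <= 0:
        return None
    return t_affected / budget_for_affected


print(required_factor(80.0, 20.0, 5.0))  # None: 5x overall can't be done
print(required_factor(80.0, 20.0, 4.0))  # 16.0
```

Even a 4× overall speedup requires making multiply 16 times faster, because the 20 s of unaffected time sets a floor on total execution time.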

slide-56
SLIDE 56

Fallacy: Low Power at Idle

  • Look back at i7 power benchmark
  • At 100% load: 258W
  • At 50% load: 170W (66%)
  • At 10% load: 121W (47%)
  • Google data center
  • Mostly operates at 10% – 50% load
  • At 100% load less than 1% of the time
  • Consider designing processors to make power proportional to load