Computer Organization & Assembly Language Programming (CSE 2312)
Lecture 3 Taylor Johnson
Summary from Last Time: binary to decimal, decimal to binary, ASCII; structured computers; multilevel computers and architectures
August 28, 2014 CSE2312, Fall 2014 2
Levels and their languages: Microcode (n/a); Assembly / Machine Language; C / …
Higher levels: closer to the problem domain, with better portability
Lower levels: instructions and data
… as: hard drive
Both data and program (instructions) are stored in memory
The computer can therefore be “re-programmed”
All input/output (I/O) passes through the CPU
This is not representative of modern systems, which use direct memory access [DMA]
[Figure: Memory (Data + Program [Instructions]), CPU, and I/O, with a DMA path between I/O and memory]
Fetch-execute cycle:
FETCH[PC] (get instruction from memory)
EXECUTE (execute the instruction fetched from memory)
Interrupt? If so, handle interrupt (input/output event)
PC++ (increment the program counter), then repeat
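To make the cycle concrete, here is a minimal Python sketch of a toy stored-program machine. The instruction set (LOAD, ADD, STORE, HALT), the single accumulator, the function name run, and the way an interrupt is triggered are all hypothetical, chosen only for illustration; instructions and data deliberately share one memory list.

```python
# Toy fetch-execute loop (illustration only).
# Hypothetical ISA: LOAD/ADD/STORE/HALT on a single accumulator register.
# Instructions and data live in the same memory list (stored-program idea).

def run(memory, interrupt_at=None):
    """Run the program in `memory`; return (accumulator, PCs where interrupts were handled)."""
    pc = 0          # program counter
    acc = 0         # accumulator register
    steps = 0       # instructions executed so far
    handled = []    # PCs at which a (simulated) interrupt was handled
    while True:
        opcode, operand = memory[pc]   # FETCH[PC]: get instruction from memory
        # EXECUTE: carry out the fetched instruction
        if opcode == "HALT":
            break
        elif opcode == "LOAD":
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        else:
            raise ValueError(f"unknown opcode {opcode!r} at PC={pc}")
        steps += 1
        # Interrupt? Simulated here as an I/O event after a chosen step count
        if interrupt_at is not None and steps == interrupt_at:
            handled.append(pc)         # Handle Interrupt (stand-in: just record it)
        pc += 1                        # PC++: increment the program counter
    return acc, handled


# Program at addresses 0-3, data at addresses 4-6
memory = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", 0), 2, 3, 0]
acc, handled = run(memory, interrupt_at=2)
print(acc, memory[6], handled)  # 5 5 [1]
```

Because STORE writes into the same list that holds the program, a real program could overwrite its own instructions, which is what makes such a machine "re-programmable".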
… many of which have more limited resources than your x86/x86-64 PC
(… space/memory or time/processing complexity)
instruction)
[Figure: Passenger Capacity, Cruising Range (miles), Cruising Speed…, and Passengers x mph for the Douglas DC-8-50, BAC/Sud Concorde, Boeing 747, and Boeing 777]
A 12-inch (300 mm) wafer of AMD Opteron X2 chips, the predecessor … AMD). The number of dies per wafer at 100% yield is 117. The several dozen partially rounded chips at the boundaries of the wafer are useless; they are included because it’s easier to create the masks used to pattern the silicon. This die uses a 90-nanometer technology, which means that the smallest transistors are approximately 90 nm in size, although they are typically somewhat smaller than the actual feature size, which refers to the size … the final manufactured size.
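“X is n times faster than Y”: $\mathrm{Performance}_X / \mathrm{Performance}_Y = \mathrm{Execution\ Time}_Y / \mathrm{Execution\ Time}_X = n$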
Example: time taken to run a program: 10 s on A, 15 s on B
$\mathrm{Execution\ Time}_B / \mathrm{Execution\ Time}_A = 15\,\mathrm{s} / 10\,\mathrm{s} = 1.5$
So A is 1.5 times faster than B
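The same comparison can be sketched in a few lines of Python, assuming the usual definition that performance is the reciprocal of execution time. The function names performance and times_faster are hypothetical helpers, not from the lecture.

```python
# "X is n times faster than Y":
#   Performance_X / Performance_Y = ExecutionTime_Y / ExecutionTime_X = n

def performance(execution_time_s):
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s

def times_faster(time_x_s, time_y_s):
    """Return n such that X is n times faster than Y."""
    return time_y_s / time_x_s

# Lecture example: 10 s on A, 15 s on B
n = times_faster(10.0, 15.0)
print(f"A is {n} times faster than B")  # A is 1.5 times faster than B
```

Dividing the execution times directly avoids computing the two reciprocals, which gives the same ratio up to floating-point rounding.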
[Figure: Clock (cycles): in each clock period, data transfer and computation take place, then state is updated]
Clock period: duration of a clock cycle
e.g., 250 ps = 0.25 ns = $250 \times 10^{-12}$ s
Clock frequency (rate): cycles per second
e.g., 4.0 GHz = 4000 MHz = $4.0 \times 10^{9}$ Hz
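Clock period and clock rate are reciprocals, so the two examples above describe the same clock: 1 / (4.0 × 10⁹ Hz) = 250 ps. A minimal Python sketch, with hypothetical helper names:

```python
# Clock period (seconds) and clock rate (Hz) are reciprocals.

def clock_period_s(rate_hz):
    return 1.0 / rate_hz

def clock_rate_hz(period_s):
    return 1.0 / period_s

period = clock_period_s(4.0e9)                    # 4.0 GHz clock
print(f"{period * 1e12:.0f} ps")                  # 250 ps
print(f"{clock_rate_hz(250e-12) / 1e9:.1f} GHz")  # 4.0 GHz
```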
A is faster… …by this much
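Clock Cycles $= \sum_{i=1}^{n} (\mathrm{CPI}_i \times \mathrm{IC}_i)$, where $\mathrm{IC}_i$ is the instruction count for class $i$
Weighted average CPI:
$\mathrm{CPI} = \frac{\mathrm{Clock\ Cycles}}{\mathrm{IC}} = \sum_{i=1}^{n} \left( \mathrm{CPI}_i \times \frac{\mathrm{IC}_i}{\mathrm{IC}} \right)$
where $\mathrm{IC}_i / \mathrm{IC}$ is the relative frequency of class $i$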
Class               A  B  C
CPI for class       1  2  3
IC in sequence 1    2  1  2
IC in sequence 2    4  1  1
Sequence 1: IC = 2 + 1 + 2 = 5
Clock Cycles = 2×1 + 1×2 + 2×3 = 10
Avg. CPI = 10/5 = 2.0
Sequence 2: IC = 4 + 1 + 1 = 6
Clock Cycles = 4×1 + 1×2 + 1×3 = 9
Avg. CPI = 9/6 = 1.5
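The weighted-average CPI calculation for the two sequences can be checked with a short Python sketch. The CPI values and instruction counts come from the class table in this example; the function names clock_cycles and average_cpi are hypothetical.

```python
# Weighted-average CPI for instruction classes A, B, C.
CPI = {"A": 1, "B": 2, "C": 3}   # CPI for each class

def clock_cycles(counts, cpi=CPI):
    """Sum of CPI_i x IC_i over all classes."""
    return sum(cpi[c] * n for c, n in counts.items())

def average_cpi(counts, cpi=CPI):
    """Clock cycles divided by total instruction count."""
    return clock_cycles(counts, cpi) / sum(counts.values())

seq1 = {"A": 2, "B": 1, "C": 2}  # IC in sequence 1
seq2 = {"A": 4, "B": 1, "C": 1}  # IC in sequence 2
for name, seq in (("Sequence 1", seq1), ("Sequence 2", seq2)):
    print(name, clock_cycles(seq), average_cpi(seq))
# Sequence 1 10 2.0
# Sequence 2 9 1.5
```

Sequence 2 executes more instructions but fewer clock cycles, so instruction count alone is not a reliable measure of performance.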
The power wall
We can’t reduce voltage further
We can’t remove more heat
How else can we improve performance?
CPI varies between programs on a given CPU
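Amdahl’s Law: $T_{\mathrm{improved}} = \frac{T_{\mathrm{affected}}}{\text{improvement factor}} + T_{\mathrm{unaffected}}$
Example: multiply accounts for 80 s of a 100 s total execution time
How much improvement in multiply performance to …
Can’t be done! Even an infinitely fast multiply leaves the 20 s of unaffected time, so the overall speedup is bounded by 100 s / 20 s = 5×.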
Corollary: make the common case fast
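A minimal Python sketch of Amdahl's Law applied to the multiply example (80 s affected, 20 s unaffected). The helper names and the 25 s target time are hypothetical, chosen only to illustrate how the required improvement factor grows and eventually becomes impossible to reach.

```python
# Amdahl's Law: T_improved = T_affected / improvement_factor + T_unaffected

def improved_time(t_affected, t_unaffected, factor):
    return t_affected / factor + t_unaffected

def required_factor(t_affected, t_unaffected, target_time):
    """Improvement factor needed to reach target_time, or None if impossible."""
    if target_time <= t_unaffected:
        # Even an infinitely fast multiply still leaves t_unaffected
        return None
    return t_affected / (target_time - t_unaffected)

# Multiply takes 80 s of a 100 s run
print(improved_time(80.0, 20.0, 4.0))     # 40.0
print(required_factor(80.0, 20.0, 25.0))  # 16.0 (hypothetical 25 s target)
print(required_factor(80.0, 20.0, 20.0))  # None (a 20 s target would be a 5x overall speedup)
```

Making the common case (here, multiply) faster helps most, but the unaffected portion sets a hard floor on execution time.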