SLIDE 1

Trends and evaluation

Trends and evaluation

Computer Architecture

  • J. Daniel García Sánchez (coordinator)
  • David Expósito Singh
  • Francisco Javier García Blas

ARCOS Group Computer Science and Engineering Department University Carlos III of Madrid

– Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 1/46
SLIDE 2

Trends and evaluation Technology trends

1. Technology trends
2. Power and energy trends
3. Trends in cost
4. Performance evaluation
5. Conclusion

SLIDE 3

Trends and evaluation Technology trends

Technology impact

Technology changes affect the mechanisms used to implement an ISA. The relevant technologies are:

  • Integrated circuit logic.
  • DRAM.
  • Flash memory.
  • Magnetic disks.
  • Networks.

SLIDE 4

Trends and evaluation Technology trends

Trends

Integrated circuit technology:

  • Transistor density: ↑ about 35% per year.
  • Die size: ↑ 10%-20% per year.
  • Combined effect on transistors per chip: ↑ 40%-55% per year (Moore’s Law).

DRAM capacity:

  • ↑ 25%-40% per year, and the growth rate is slowing down.

Flash capacity:

  • ↑ 50%-60% per year.
  • 15-20 times cheaper per bit than DRAM.

Magnetic disk capacity:

  • ↑ about 40% per year.
  • 15-25 times cheaper per bit than Flash.
  • 300-500 times cheaper per bit than DRAM.
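An annual growth rate is easier to compare across technologies when turned into a doubling time. The sketch below assumes steady compound growth at the rates listed above, which is a simplification of real technology curves; `doubling_time_years` and `TRENDS` are illustrative names, not part of the slides.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double a quantity growing by `annual_growth` per year (0.40 means 40%)."""
    # Compound growth: (1 + r) ** t = 2  =>  t = ln 2 / ln(1 + r)
    return math.log(2) / math.log(1 + annual_growth)


# Annual growth ranges (low, high) copied from the list above.
TRENDS = {
    "transistors per chip": (0.40, 0.55),
    "DRAM capacity": (0.25, 0.40),
    "Flash capacity": (0.50, 0.60),
    "magnetic disk capacity": (0.40, 0.40),
}

for name, (low, high) in TRENDS.items():
    # Faster growth gives a shorter doubling time, so `high` produces the lower bound.
    fastest = doubling_time_years(high)
    slowest = doubling_time_years(low)
    print(f"{name}: doubles every {fastest:.1f} to {slowest:.1f} years")
```

For transistors per chip this gives roughly 1.6 to 2.1 years, that is, about 19 to 25 months per doubling.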

SLIDE 5

Trends and evaluation Technology trends

Bandwidth and latency

Bandwidth or throughput:

  • Amount of work performed per unit of time.
  • Processors: improved by a factor of 10,000 to 25,000.
  • Memory and disks: improved by a factor of 300 to 1,200.

Latency or response time:

  • Time between the start and the completion of an event.
  • Processors: improved by a factor of 30 to 80.
  • Memory and disks: improved by a factor of 6 to 8.
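To see how wide the gap between the two metrics is, one can divide the bandwidth improvement by the latency improvement for each component. This is only arithmetic on the ranges quoted above; `gain_ratio_range` is an illustrative helper name.

```python
# Improvement factors (low, high) from the slide.
BANDWIDTH_GAIN = {"processors": (10_000, 25_000), "memory and disks": (300, 1_200)}
LATENCY_GAIN = {"processors": (30, 80), "memory and disks": (6, 8)}


def gain_ratio_range(component: str) -> tuple[float, float]:
    """How many times more bandwidth improved than latency, as a (min, max) range."""
    bw_low, bw_high = BANDWIDTH_GAIN[component]
    lat_low, lat_high = LATENCY_GAIN[component]
    # Pair the weakest bandwidth gain with the strongest latency gain for the minimum.
    return bw_low / lat_high, bw_high / lat_low


for component in BANDWIDTH_GAIN:
    lo, hi = gain_ratio_range(component)
    print(f"{component}: bandwidth improved {lo:.0f}x to {hi:.0f}x more than latency")
```

Even in the least favorable pairing, bandwidth improved more than an order of magnitude faster than latency for both components.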

SLIDE 6

Trends and evaluation Power and energy trends

SLIDE 7

Trends and evaluation Power and energy trends

Power and Energy: Example

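Two different systems, A and B:

  • A consumes 20% more power than B.
  • A runs a task in 70% of the time B needs.

Which one has the lower energy cost for the task?

The appropriate metric for this comparison is energy, not power:

$E(B) = P(B) \cdot t(B)$

$E(A) = 1.2 \cdot P(B) \cdot 0.7 \cdot t(B) = 0.84 \cdot E(B)$

System A uses 84% of the energy of B, so A is the more efficient choice despite drawing more power.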
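The same comparison as a small function, assuming (as the example does) that energy is simply average power times execution time. `relative_energy` is an illustrative name.

```python
def relative_energy(power_ratio: float, time_ratio: float) -> float:
    """Energy of system A relative to system B, given P(A)/P(B) and t(A)/t(B)."""
    # E = P * t, so E(A)/E(B) = (P(A)/P(B)) * (t(A)/t(B)).
    return power_ratio * time_ratio


ratio = relative_energy(power_ratio=1.20, time_ratio=0.70)
print(f"E(A) = {ratio:.2f} * E(B)")  # E(A) = 0.84 * E(B)
```

Any ratio below 1 means A needs less energy for the task, even when `power_ratio` is above 1.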

SLIDE 8

Trends and evaluation Power and energy trends

Energy and power in microprocessors

In CMOS technology, energy consumption comes mainly from switching transistors.

Dynamic energy: the energy needed for a single transition (0 → 1 or 1 → 0):

$E_d \approx \frac{1}{2} \cdot X_c \cdot V^2$

Dynamic power: depends on the switching frequency:

$P_d \approx \frac{1}{2} \cdot X_c \cdot V^2 \cdot f$

Note: $X_c$: capacitive load; $V$: voltage; $f$: switching frequency.
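The two formulas can be written directly as functions to show how power reacts to voltage and frequency. The numeric inputs below (1 nF, 1 V, 1 GHz) are made up purely for illustration and do not describe any real chip.

```python
def dynamic_energy(capacitive_load: float, voltage: float) -> float:
    """Energy of one transition (0->1 or 1->0): E_d ~ 1/2 * Xc * V^2, in joules."""
    return 0.5 * capacitive_load * voltage ** 2


def dynamic_power(capacitive_load: float, voltage: float, frequency: float) -> float:
    """Dynamic power: P_d ~ 1/2 * Xc * V^2 * f, in watts."""
    return dynamic_energy(capacitive_load, voltage) * frequency


# Hypothetical values: 1 nF switched load, 1 V supply, 1 GHz switching frequency.
xc, v, f = 1e-9, 1.0, 1e9
print(dynamic_power(xc, v, f))  # about 0.5 W with these made-up values
print(dynamic_power(xc, 2 * v, f) / dynamic_power(xc, v, f))  # doubling V: 4x power
print(dynamic_power(xc, v, 2 * f) / dynamic_power(xc, v, f))  # doubling f: 2x power
```

Voltage enters squared while frequency enters linearly, which is why voltage reduction is the more powerful lever.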

SLIDE 9

Trends and evaluation Power and energy trends

Example

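If a 15% voltage reduction implies a 15% frequency reduction, what is the effect on dynamic power?

Solution:

$\frac{P_{new}}{P_{old}} = \frac{(V \cdot 0.85)^2 \cdot (f \cdot 0.85)}{V^2 \cdot f} = 0.85^3 \approx 0.61$

Dynamic power drops to about 61% of its original value.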
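A generic version of the calculation, assuming the capacitive load stays the same so that it cancels together with the factor 1/2. `power_ratio` is an illustrative name.

```python
def power_ratio(voltage_scale: float, frequency_scale: float) -> float:
    """P_new / P_old when V and f are multiplied by the given factors.

    The 1/2 and the capacitive load Xc cancel out because they do not change.
    """
    return voltage_scale ** 2 * frequency_scale


r = power_ratio(0.85, 0.85)  # 15% lower voltage and 15% lower frequency
print(f"{r:.4f}")  # 0.6141 (= 0.85 ** 3)
```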

SLIDE 10

Trends and evaluation Power and energy trends

Consequences

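Reduction:

  • Lowering the voltage reduces both dynamic power and dynamic energy.
  • Over 20 years, supply voltage has dropped from 5 V to 1 V.
  • Capacitive load depends on transistor fan-out (the number of transistors connected to an output).

Lowering voltage (together with frequency) is therefore a mechanism to control power and energy.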
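Applying $E_d \approx \frac{1}{2} \cdot X_c \cdot V^2$ to the 5 V to 1 V change gives a feel for its impact. This sketch assumes the same capacitive load before and after, which does not hold across real technology generations, so it isolates only the voltage effect.

```python
def energy_per_transition_ratio(v_new: float, v_old: float) -> float:
    """E_new / E_old for one transition, from E_d ~ 1/2 * Xc * V^2 with Xc unchanged."""
    return (v_new / v_old) ** 2


r = energy_per_transition_ratio(1.0, 5.0)
print(f"{r:.2f} -> {1 / r:.0f}x less energy per transition")  # 0.04 -> 25x less energy per transition
```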

SLIDE 11

Trends and evaluation Power and energy trends

Evolution

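The evolution has been dominated by the growth in the number of transistors and in clock frequency.

  • As a result, power and energy have increased: from 2 W for the Intel 80386 to 130 W for an Intel Core i7 at 3.3 GHz.
  • That power must be dissipated from a chip of about 1.5 cm × 1.5 cm, which is at the limit of what air cooling (fans) can handle.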
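Dividing the Core i7 power by the chip area gives the power density the cooling system has to remove. The figures come from the slide; treating the whole die area as uniformly heated is a simplifying assumption.

```python
power_386_w = 2.0   # Intel 80386, from the slide
power_i7_w = 130.0  # Intel Core i7 at 3.3 GHz, from the slide
chip_side_cm = 1.5  # approximate chip side, from the slide

area_cm2 = chip_side_cm ** 2
power_density = power_i7_w / area_cm2  # W per cm^2 the cooling must remove

print(f"power growth: {power_i7_w / power_386_w:.0f}x")  # 65x
print(f"power density: {power_density:.1f} W/cm^2 over {area_cm2:.2f} cm^2")  # 57.8 W/cm^2 over 2.25 cm^2
```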

SLIDE 12

Trends and evaluation Power and energy trends

Energy efficiency

Techniques:

  • Turn off the clock of inactive modules.
  • Dynamic Voltage-Frequency Scaling (DVFS).
  • Low-power modes for memory and disks (they require a reactivation mechanism).
  • Automatic overclocking: enabled only when it is safe. Example: a Core i7 at 3.3 GHz may run in short bursts at 3.6 GHz.
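As a rough illustration of how a DVFS policy might use the dynamic-energy model: for a task of fixed cycle count, execution time is cycles / f, while dynamic energy is about 1/2 · Xc · V² per cycle and does not depend on f. The operating points, the effective switched capacitance `xc_per_cycle`, and the selection rule below are all hypothetical; real policies must also consider static power and transition costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    voltage: float    # volts
    frequency: float  # hertz


# Hypothetical voltage-frequency pairs, for illustration only.
POINTS = [
    OperatingPoint(0.8, 1.6e9),
    OperatingPoint(0.9, 2.4e9),
    OperatingPoint(1.0, 3.3e9),
]


def pick_point(cycles: float, deadline_s: float, xc_per_cycle: float = 1e-9):
    """Return (point, time, energy) with the lowest dynamic energy that meets the deadline.

    Dynamic energy per task ~ 1/2 * Xc * V^2 * cycles (depends on V, not on f),
    while execution time = cycles / f. Returns None if no point meets the deadline.
    """
    best = None
    for p in POINTS:
        time_s = cycles / p.frequency
        if time_s > deadline_s:
            continue  # too slow for this deadline
        energy_j = 0.5 * xc_per_cycle * p.voltage ** 2 * cycles
        if best is None or energy_j < best[2]:
            best = (p, time_s, energy_j)
    return best


choice = pick_point(cycles=3e9, deadline_s=1.5)
print(choice[0], f"{choice[1]:.2f} s", f"{choice[2]:.3f} J")
```

With a 1.5 s deadline the 0.8 V point is too slow, so the 0.9 V point wins: it meets the deadline with less energy than running at full speed.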

SLIDE 13

Trends and evaluation Trends in cost

SLIDE 14

Trends and evaluation Trends in cost

Cost

The manufacturing cost of a computer decreases over time.

Learning curve:

  • Measured by the yield of the manufacturing process (percentage of devices that survive manufacturing).
  • When the yield doubles, cost is halved.
  • DRAM: average yearly decrease of around 40% in cost and price (except in periods of shortage or oversupply).

Volume:

  • Cost decreases about 10% each time volume doubles.
  • Fixed costs are amortized over more units.
  • Manufacturing process efficiency increases.

Multiple vendors selling the same product (commodities):

  • Highly competitive market.
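The two rules of thumb above compound over time. This sketch applies them literally, assuming a constant rate; the starting value of 100 is hypothetical and the function names are illustrative.

```python
def cost_after_volume_doublings(initial_cost: float, doublings: int,
                                drop_per_doubling: float = 0.10) -> float:
    """Rule of thumb: each doubling of volume cuts cost by about 10%."""
    return initial_cost * (1 - drop_per_doubling) ** doublings


def dram_price_after_years(initial_price: float, years: int,
                           yearly_drop: float = 0.40) -> float:
    """Average DRAM trend: cost and price fall about 40% per year."""
    return initial_price * (1 - yearly_drop) ** years


# Hypothetical starting value of 100 (any currency unit).
print(round(cost_after_volume_doublings(100, 3), 1))  # 72.9 after 8x the volume
print(round(dram_price_after_years(100, 2), 1))       # 36.0 after two years
```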

SLIDE 15

Trends and evaluation Trends in cost

Cost of integrated circuits

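Manufacturing process: the wafer is cut into dies.

Cost:

$\text{Cost}_{IC} = \frac{\text{Cost}_{die} + \text{Cost}_{testing} + \text{Cost}_{packaging}}{\text{yield}_{final\ test}}$

$\text{Cost}_{die} = \frac{\text{Cost}_{wafer}}{\text{Dies}_{wafer} \times \text{yield}_{die}}$

$\text{Dies}_{wafer} = \frac{\pi \times (\text{diameter}/2)^2}{\text{area}} - \frac{\pi \times \text{diameter}}{\sqrt{2 \times \text{area}}}$

where area is the die area; the second term estimates the partial dies lost along the edge of the wafer.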
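The three cost formulas translate directly into functions. The wafer cost, yields, testing and packaging costs in the usage lines are invented for illustration; only the formulas come from the slide, and the dies-per-wafer result is an estimate rather than an exact count.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Wafer area divided by die area, minus the partial dies lost along the edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss


def die_cost(wafer_cost: float, dies: float, die_yield: float) -> float:
    """Cost_die = Cost_wafer / (Dies_wafer * die yield)."""
    return wafer_cost / (dies * die_yield)


def ic_cost(die: float, testing: float, packaging: float, final_test_yield: float) -> float:
    """Cost_IC = (Cost_die + Cost_testing + Cost_packaging) / final test yield."""
    return (die + testing + packaging) / final_test_yield


# Hypothetical inputs, for illustration only.
dies = dies_per_wafer(30, 1.5 * 1.5)  # 30 cm wafer, 1.5 cm x 1.5 cm dies
per_die = die_cost(wafer_cost=5000, dies=dies, die_yield=0.4)
per_ic = ic_cost(per_die, testing=10, packaging=5, final_test_yield=0.95)
print(f"dies: {dies:.0f}, die cost: {per_die:.2f}, IC cost: {per_ic:.2f}")
```

Keeping die yield and final test yield as separate parameters mirrors the two different yields that appear in the formulas.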

SLIDE 16

Trends and evaluation Trends in cost

Example

Wafer with a 30 cm diameter:

  • Square dies of 1.5 cm × 1.5 cm: 270 dies per wafer.
  • Square dies of 1 cm × 1 cm: 640 dies per wafer.
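Plugging the example into the dies-per-wafer formula (die areas of 2.25 cm² and 1 cm², intermediate values rounded) reproduces both results:

```latex
\begin{aligned}
\text{Dies}_{wafer}(1.5 \times 1.5) &= \frac{\pi \times (30/2)^2}{2.25} - \frac{\pi \times 30}{\sqrt{2 \times 2.25}}
  = \frac{706.9}{2.25} - \frac{94.2}{2.12} \approx 314.2 - 44.4 \approx 270 \\
\text{Dies}_{wafer}(1 \times 1) &= \frac{\pi \times (30/2)^2}{1} - \frac{\pi \times 30}{\sqrt{2 \times 1}}
  = 706.9 - \frac{94.2}{1.414} \approx 706.9 - 66.6 \approx 640
\end{aligned}
```

Shrinking the die side from 1.5 cm to 1 cm more than doubles the number of dies, because both the area term and the edge-loss term improve.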

SLIDE 17

Trends and evaluation Performance evaluation

SLIDE 18

Trends and evaluation Performance evaluation Performance metrics

4. Performance evaluation
  • Performance metrics
  • Benchmarks
  • Amdahl’s Law
  • Processor performance

SLIDE 19

Trends and evaluation Performance evaluation Performance metrics

Execution speed

What does it mean to say that computer A is faster than computer B?

  • Desktop user: my program runs in less time. I want to reduce execution time.
  • Website administrator: I can process more transactions per hour. I want to increase throughput.

SLIDE 20

Trends and evaluation Performance evaluation Performance metrics

Performance and execution time

Performance $P(x)$ is a metric inversely related to execution time $T(x)$:

$P(x) = \frac{1}{T(x)}$

High performance means low execution time. If $x$ runs $n$ times faster than $y$, the speedup is:

$n = \frac{T(y)}{T(x)} = \frac{1/P(y)}{1/P(x)} = \frac{P(x)}{P(y)}$
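The definitions as two small functions, with hypothetical timings of 10 s for $x$ and 15 s for $y$. The function names are illustrative.

```python
def performance(execution_time_s: float) -> float:
    """P(x) = 1 / T(x)."""
    return 1.0 / execution_time_s


def times_faster(t_x: float, t_y: float) -> float:
    """n such that x is n times faster than y: n = T(y) / T(x) = P(x) / P(y)."""
    return t_y / t_x


# Hypothetical timings: x runs the program in 10 s, y in 15 s.
n = times_faster(10.0, 15.0)
print(n)  # 1.5
print(performance(10.0) / performance(15.0))  # same ratio, computed from performance
```

Note the order of the arguments: the slower machine's time goes in the numerator, so a faster $x$ always gives $n > 1$.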

SLIDE 21

Trends and evaluation Performance evaluation Performance metrics

Metrics

The only reliable way to compare computer performance is to run real programs. Any alternative to real programs is error-prone.

Execution time:

  • Response time: total elapsed time, as perceived by the user.
  • CPU time: time the CPU has been busy.
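The difference between the two times shows up as soon as a program waits. This Python sketch uses `time.perf_counter` for elapsed (response) time and `time.process_time` for the CPU time of the process; `time.sleep` stands in for an I/O wait, and the workload size is arbitrary.

```python
import time


def busy_work(n: int) -> int:
    """CPU-bound loop: keeps the processor busy."""
    return sum(i * i for i in range(n))


def task() -> None:
    busy_work(200_000)  # computation: counts as CPU time
    time.sleep(0.2)     # simulated I/O wait: elapsed time passes, CPU is idle


def measure(fn) -> tuple[float, float]:
    """Return (response time, CPU time) in seconds for one call of fn."""
    wall_start = time.perf_counter()  # wall-clock (elapsed) time
    cpu_start = time.process_time()   # CPU time consumed by this process
    fn()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


response, cpu = measure(task)
print(f"response time: {response:.3f} s, CPU time: {cpu:.3f} s")
```

The response time includes the 0.2 s wait, while the CPU time counts only the computation.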

SLIDE 22

Trends and evaluation Performance evaluation Benchmarks

SLIDE 23

Trends and evaluation Performance evaluation Benchmarks

Workload

Computer performance depends on the workload being evaluated. Computers are often tailored to specific workloads:

  • Web servers.
  • Database servers.
  • File servers.
  • Personal computers.
  • Multiprocessors.
  • Multicomputers.
  • ...

SLIDE 24

Trends and evaluation Performance evaluation Benchmarks

Benchmarks

Benchmark: an application or set of applications used to evaluate performance. Approaches that replace real applications:

  • Kernels: small parts of real applications. Example: FFT.
  • Toy programs: short programs. Example: Quicksort.
  • Synthetic benchmarks: programs invented to represent real applications. Example: Dhrystone.

All of these are poor approaches: architects and compiler writers might cheat by tuning for them, making a computer look faster than it is on real applications.

SLIDE 25

Trends and evaluation Performance evaluation Benchmarks

Benchmarks

Embedded:

  • Dhrystone (of debatable relevance).
  • EEMBC (kernels).

Desktop:

  • SPEC2006 (mix of integer and floating-point programs).

Servers:

  • SPECWeb, SPECSFS, SPECjbb, SPECvirt_sc2010.
  • TPC (transaction processing benchmarks).

SLIDE 26

Trends and evaluation Performance evaluation Benchmarks

Example: SPEC2006

CINT2006: integer programs (no floating point).

  • 12 programs (9 in C, 3 in C++).
  • Several application domains: languages and compilers, compression, video, combinatorial optimization, artificial intelligence, protein sequencing, quantum physics, ...

CFP2006: floating-point programs.

  • 17 programs: 6 in Fortran, 3 in C, 4 in Fortran and C, 4 in C++.
  • Several application domains: physics, chemistry, biology, algebra, image rendering, speech recognition, ...

SLIDE 27

Trends and evaluation Performance evaluation Amdahl’s Law

SLIDE 28

Trends and evaluation Performance evaluation Amdahl’s Law

Amdahl’s Law

The performance improvement obtained by using a faster execution mode is limited by the fraction of time that mode can be used.

Speedup: ratio between the improved performance $P(I)$ and the original performance $P(O)$:

$S = \frac{P(I)}{P(O)} = \frac{T(O)}{T(I)}$

SLIDE 29

Trends and evaluation Performance evaluation Amdahl’s Law

Execution time

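Figure: the original execution time is split into $T(A)$, the part that can be improved, and $T(B)$, the part that cannot; after the improvement, $T(A)$ becomes $T(A')$ while $T(B)$ is unchanged.

With $T = T(A) + T(B)$:

$F = \frac{T(A)}{T(A) + T(B)}$, $S(i) = \frac{T(A)}{T(A')}$

$T' = T(A') + T(B) = \frac{T(A)}{S(i)} + (1 - F) \times T = \frac{F \times T}{S(i)} + (1 - F) \times T$

$T' = T \times \left( (1 - F) + \frac{F}{S(i)} \right)$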

SLIDE 30

Trends and evaluation Performance evaluation Amdahl’s Law

Example

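Figure: $T(A) = 20$, $T(B) = 5$; after the improvement, $T(A') = 10$ and $T(B) = 5$.

$F = \frac{20}{20 + 5} = 0.8$, $S(i) = \frac{20}{10} = 2$

$T' = T \times \left( (1 - F) + \frac{F}{S(i)} \right) = 25 \times \left( (1 - 0.8) + \frac{0.8}{2} \right) = 15$

We already knew this: $T(A') + T(B) = 10 + 5 = 15$.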

SLIDE 31

Trends and evaluation Performance evaluation Amdahl’s Law

Amdahl’s Law

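$T' = T \times \left( (1 - F) + \frac{F}{S(i)} \right)$

$S = \frac{T}{T'} = \frac{T}{T \times \left( (1 - F) + \frac{F}{S(i)} \right)} = \frac{1}{(1 - F) + \frac{F}{S(i)}}$

The global speedup depends only on the fraction of time that can use the improvement ($F$) and on the speedup of the improved part ($S(i)$).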
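Amdahl’s Law as a function, together with its limit when the improved part becomes infinitely fast. The usage lines reproduce the numbers of the worked examples in this section; the function names are illustrative.

```python
def amdahl_speedup(fraction: float, improvement_speedup: float) -> float:
    """Global speedup S = 1 / ((1 - F) + F / S(i))."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("F must be between 0 and 1")
    if improvement_speedup <= 0:
        raise ValueError("S(i) must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / improvement_speedup)


def amdahl_limit(fraction: float) -> float:
    """Bound on S when S(i) grows without limit: 1 / (1 - F)."""
    return float("inf") if fraction == 1.0 else 1.0 / (1.0 - fraction)


print(f"{amdahl_speedup(0.8, 2):.4f}")   # 1.6667: the T = 25 example (25 / 15)
print(f"{amdahl_speedup(0.4, 10):.4f}")  # 1.5625: Web server, computing 10x faster
print(f"{amdahl_speedup(0.5, 32):.4f}")  # 1.9394: 50% parallel part on 32 processors
print(f"{amdahl_limit(0.5):.1f}")        # 2.0: no processor count can beat this for F = 0.5
```

The limit makes the law's message explicit: the untouched fraction $1 - F$ caps the speedup no matter how large $S(i)$ becomes.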

SLIDE 32

Trends and evaluation Performance evaluation Amdahl’s Law

Case 1

A Web server distributes its time as follows:

  • Computing: 40%.
  • I/O: 60%.

If it is replaced by another machine that performs computing 10 times faster, what is the global speedup?

Solution:

$S = \frac{1}{0.6 + \frac{0.4}{10}} = \frac{1}{0.64} = 1.5625$

SLIDE 33

Trends and evaluation Performance evaluation Amdahl’s Law

Case 2

An application has a parallelizable part that takes 50% of the execution time.

Assuming that this part can be fully parallelized across 32 processors, what is the maximum speedup?

Solution:

$S = \frac{1}{0.5 + \frac{0.5}{32}} = \frac{1}{0.515625} \approx 1.9394$

SLIDE 34

Trends and evaluation Performance evaluation Amdahl’s Law

Speedup evolution

Figure: speedup as a function of the number of processors, for F = 0.5, 0.6, 0.7, 0.8 and 0.9.
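The curves in the figure can be regenerated as a table. This sketch assumes ideal parallelization, that is, the parallel fraction $F$ runs exactly $n$ times faster on $n$ processors ($S(i) = n$); the chosen processor counts are arbitrary.

```python
def amdahl_speedup(fraction: float, processors: int) -> float:
    """Speedup when the fraction F runs `processors` times faster (ideal parallelization)."""
    return 1.0 / ((1.0 - fraction) + fraction / processors)


FRACTIONS = [0.5, 0.6, 0.7, 0.8, 0.9]
PROCESSOR_COUNTS = [1, 2, 4, 8, 16, 32, 64, 128]

print("procs " + " ".join(f"F={f:.1f}" for f in FRACTIONS))
for n in PROCESSOR_COUNTS:
    row = " ".join(f"{amdahl_speedup(f, n):5.2f}" for f in FRACTIONS)
    print(f"{n:5d} {row}")
```

Each column flattens as processors are added: the F = 0.9 column approaches $1/(1 - 0.9) = 10$ but never reaches it.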

SLIDE 35

Trends and evaluation Performance evaluation Amdahl’s Law

Amdahl’s Law consequences

The greater the fraction of improvement (F), the more effective the improvement. To improve a complex system, you must optimize the elements that are used most of the time (the common case). Where this principle is applied:

  • Within the processor: in the data path.
  • In the instruction set: in the execution of the most frequent instructions.
  • In memory hierarchy design, programming and compilation: by exploiting locality of reference.

Rule of thumb: 10% of the code accounts for 90% of the execution time.
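Using the 90/10 rule of thumb with Amdahl’s Law shows why the common case matters. The two improvement factors (2x and 10x) are hypothetical, chosen only to contrast a modest gain on the hot code with a large gain on the cold code.

```python
def amdahl_speedup(fraction: float, improvement_speedup: float) -> float:
    """S = 1 / ((1 - F) + F / S(i))."""
    return 1.0 / ((1.0 - fraction) + fraction / improvement_speedup)


# Rule of thumb: 10% of the code takes 90% of the time.
hot_code_2x = amdahl_speedup(0.9, 2)     # modest 2x gain on the common case
cold_code_10x = amdahl_speedup(0.1, 10)  # large 10x gain on the rare case
print(f"2x on the 90% part:  {hot_code_2x:.2f}")
print(f"10x on the 10% part: {cold_code_10x:.2f}")
```

A 2x improvement on the common case (about 1.82 overall) beats a 10x improvement on the rare case (about 1.10 overall).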

SLIDE 36

Trends and evaluation Performance evaluation Processor performance

SLIDE 37

Trends and evaluation Performance evaluation Processor performance

Execution time

A processor executes each instruction over several clock cycles. The time consumed by the CPU is:

$\text{time}_{CPU} = \frac{\text{cycles}_{CPU}}{\text{clock frequency}}$

SLIDE 38

Trends and evaluation Performance evaluation Processor performance

CPI: Cycles per instruction

The average speed can be expressed as cycles per instruction (CPI), using:

  • the total number of consumed cycles, and
  • the number of executed instructions, or instruction count (IC).

$CPI = \frac{\text{cycles}_{CPU}}{IC}$

SLIDE 39

Trends and evaluation Performance evaluation Processor performance

Factors in execution time

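CPI and CPU time:

$CPI = \frac{\text{cycles}_{CPU}}{IC}$

$\text{time}_{CPU} = \frac{\text{cycles}_{CPU}}{f} = \frac{CPI \times IC}{f} = CPI \times IC \times T$

where $f$ is the clock frequency and $T = 1/f$ is the clock cycle time. If any of the three factors (CPI, IC, T) is reduced by 10%, the total execution time is reduced by 10%.

However, the three factors are interrelated.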
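The CPU time equation as a function, with a hypothetical program of 2·10⁹ instructions, CPI = 1.5 and a 3 GHz clock. Each line reduces one factor by 10% while keeping the others fixed, which the slide notes is an idealization since the factors are interrelated.

```python
def cpu_time(instruction_count: float, cpi: float, clock_hz: float) -> float:
    """time_CPU = IC * CPI / f, which equals IC * CPI * T with T = 1 / f."""
    return instruction_count * cpi / clock_hz


# Hypothetical program: 2e9 instructions, CPI = 1.5, 3 GHz clock.
IC, CPI, F = 2e9, 1.5, 3e9
base = cpu_time(IC, CPI, F)
fewer_instructions = cpu_time(0.9 * IC, CPI, F)  # IC reduced by 10%
lower_cpi = cpu_time(IC, 0.9 * CPI, F)           # CPI reduced by 10%
shorter_cycle = cpu_time(IC, CPI, F / 0.9)       # clock cycle time T reduced by 10%

for label, t in [("base", base), ("IC -10%", fewer_instructions),
                 ("CPI -10%", lower_cpi), ("T -10%", shorter_cycle)]:
    print(f"{label:9s} {t:.3f} s")
```

All three reductions yield the same 0.9 s, illustrating that the factors contribute symmetrically to CPU time.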

SLIDE 40

Trends and evaluation Performance evaluation Processor performance

Instructions classes

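Different instruction classes have different IC and CPI.

Global CPI:

$\text{cycles}_{CPU} = \sum_{i=1}^{n} IC_i \times CPI_i$

$\text{time}_{CPU} = \left( \sum_{i=1}^{n} IC_i \times CPI_i \right) \times T$

$CPI_{global} = \frac{\sum_{i=1}^{n} IC_i \times CPI_i}{IC} = \sum_{i=1}^{n} \frac{IC_i}{IC} \times CPI_i$

This shows the impact of the relative frequency of each instruction class on program execution.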

SLIDE 41

Trends and evaluation Performance evaluation Processor performance

Example

In the execution of a program, we have observed that:

  • Floating-point operations: 25% of the instructions (average CPI 4.0).
  • FPSQR operation (square root): 2% of the instructions (CPI 20.0). These are included in the floating-point operations.
  • Rest of the instructions (75%): CPI 1.33.

Choose between two design alternatives:

  a. Decrease the CPI of FPSQR to 2.
  b. Decrease the CPI of all floating-point instructions to 2.5.

SLIDE 42

Trends and evaluation Performance evaluation Processor performance

Solution

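Original CPI:

$CPI = 0.25 \times 4 + 0.75 \times 1.33 = 1.9975$

CPI of the floating-point instructions other than FPSQR:

$0.25 \times CPI_{FP} = 0.23 \times CPI_{otherFP} + 0.02 \times CPI_{FPSQR}$

$0.25 \times 4 = 0.23 \times CPI_{otherFP} + 0.02 \times 20$

$CPI_{otherFP} = \frac{0.25 \times 4 - 0.02 \times 20}{0.23} = 2.6087$

Alternative (a):

$CPI_{newFPSQR} = 0.23 \times 2.6087 + 0.02 \times 2 + 0.75 \times 1.33 = 1.6375$

Alternative (b):

$CPI_{newFP} = 0.25 \times 2.5 + 0.75 \times 1.33 = 1.6225$

Alternative (b) gives the lower global CPI, so with the same instruction count and clock frequency it is the better choice.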
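The solution can be checked with a weighted-sum implementation of the global CPI formula, where each instruction class contributes its frequency times its CPI. `global_cpi` and the variable names are illustrative; the data are those of the example.

```python
def global_cpi(mix):
    """CPI_global = sum_i (IC_i / IC) * CPI_i, with mix = [(frequency_i, CPI_i), ...]."""
    return sum(freq * cpi for freq, cpi in mix)


# Data from the example.
freq_fp, cpi_fp = 0.25, 4.0       # all floating-point instructions
freq_sqr, cpi_sqr = 0.02, 20.0    # FPSQR, included in the FP instructions
freq_rest, cpi_rest = 0.75, 1.33  # everything else

# CPI of the FP instructions that are not FPSQR, derived from the FP average.
cpi_other_fp = (freq_fp * cpi_fp - freq_sqr * cpi_sqr) / (freq_fp - freq_sqr)

original = global_cpi([(freq_fp, cpi_fp), (freq_rest, cpi_rest)])
alt_a = global_cpi([(freq_fp - freq_sqr, cpi_other_fp), (freq_sqr, 2.0), (freq_rest, cpi_rest)])
alt_b = global_cpi([(freq_fp, 2.5), (freq_rest, cpi_rest)])

print(f"original CPI = {original:.4f}")        # 1.9975
print(f"(a) FPSQR CPI = 2    -> {alt_a:.4f}")  # 1.6375
print(f"(b) all FP CPI = 2.5 -> {alt_b:.4f}")  # 1.6225
print(f"speedup of (b): {original / alt_b:.2f}")  # 1.23
```

Since IC and clock frequency are unchanged, the ratio of CPIs is also the speedup of alternative (b) over the original design.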

SLIDE 43

Trends and evaluation Conclusion

SLIDE 44

Trends and evaluation Conclusion

Summary

  • Bandwidth has improved much more than latency over the last 20 years.
  • Increasing power consumption limits clock frequency.
  • Manufacturing cost decreases over time.
  • The only reliable way to compare computer performance is to run real programs.
  • Amdahl’s Law bounds the performance improvement that an enhancement can deliver, and it applies in many contexts.
  • The relative frequency of instructions has a large impact on program execution speed.

SLIDE 45

Trends and evaluation Conclusion

References

Hennessy and Patterson. Computer Architecture: A Quantitative Approach, 5th ed. Sections 1.4 to 1.9.
