Performance Characterization of SPEC CPU Benchmarks on Intel’s Core Microarchitecture Based Processor


SLIDE 1

Laboratory for Computer Architecture

January 21, 2007 Performance Characterization of SPEC CPU Benchmarks on Intel’s Core Microarchitecture based processor

Performance Characterization of SPEC CPU Benchmarks on Intel’s Core Microarchitecture Based Processor

Sarah Bird, Aashish Phansalkar, Lizy K. John, Alex Mericas, Rajeev Indukuru

Laboratory for Computer Architecture The University of Texas at Austin and IBM

SLIDE 2

Outline

  • Motivation
  • Objectives
  • Methodology
  • System Design and Details
  • Performance Characterization Results of SPEC CPU Benchmarks
  • Fusion Description and Results
  • Conclusion
  • Questions
SLIDE 3

Motivation

  • Study the design of the Core Microarchitecture and its new features to learn how they work
  • Study the behavior of the SPEC CPU benchmark suites on the Core Microarchitecture
  • Study the effect of the new features on the behavior of the SPEC CPU benchmarks

SLIDE 4

Objectives

  • To analyze the behavior of the SPEC CPU2006 suite in comparison to the behavior of the SPEC CPU2000 suite on a Core Microarchitecture processor.
  • To determine whether fusion (macro and micro-op) contributed noticeably to the improved performance of the Core Microarchitecture processor as compared to its predecessors.

SLIDE 5

Methodology

  • Run SPEC CPU2006 and CPU2000 benchmark suites on a Core Microarchitecture based processor*
  • Use performance counters to collect information about the behavior of the benchmarks*
  • Use data provided by performance counters to compare CPU2006 and CPU2000
  • Use runtimes from the SPEC website for Core predecessors to determine the performance improvement for each benchmark on the Core Microarchitecture
  • Compare the amount of fusion (macro and micro-op) measured by the performance counters to the calculated performance improvement

*Steps performed by IBM

SLIDE 6

System

  • Woodcrest System
    – Tyan S5380 Motherboard
    – 2 Xeon 5160 CPUs running at 3.0 GHz
    – 4 x 1 GB memory DIMMs at 667 MHz
  • Benchmark Compilers
    – Intel C Compiler for 32-bit applications, Version 9.1
    – Intel Fortran Compiler for 32-bit applications, Version 9.1

*Image taken from Real World Technologies “Intel’s Next Generation Microarchitecture Unveiled”

SLIDE 7

System Details

  • L1 Cache
    – 2 x 32 KB caches
    – 8-way associativity
  • L2 Cache
    – 1 unified 4 MB cache
    – 16-way associativity
  • Macro-Fusion
    – Fuses 2 x86 instructions
    – Compare and jump instructions
  • Micro-op Fusion
    – Fuses 2 micro-ops
    – Store address and data micro-ops

*Image taken from Real World Technologies “Intel’s Next Generation Microarchitecture Unveiled”

SLIDE 8

Performance Characterization Results

SLIDE 9

Instruction Mix for CPU2006

SPEC CPU2006 Integer

Benchmark         % Branches   % Loads   % Stores
400.perlbench     23.3%        23.9%     11.5%
401.bzip2         15.3%        26.4%     8.9%
403.gcc           21.9%        25.6%     13.1%
429.mcf           19.2%        30.6%     8.6%
445.gobmk         20.7%        27.9%     14.2%
456.hmmer         8.4%         40.8%     16.2%
458.sjeng         21.4%        21.1%     8.0%
462.libquantum    27.3%        14.4%     5.0%
464.h264ref       7.5%         35.0%     12.1%
471.omnetpp       20.7%        34.2%     17.7%
473.astar         17.1%        26.9%     4.6%
483.xalancbmk     25.7%        32.1%     9.0%

SPEC CPU2006 Floating Point

Benchmark         % Branches   % Loads   % Stores
416.gamess        7.9%         34.6%     9.2%
433.milc          1.5%         37.3%     10.7%
434.zeusmp        4.0%         28.7%     8.1%
436.cactusADM     0.2%         46.5%     13.2%
437.leslie3d      3.2%         45.4%     10.6%
444.namd          4.9%         23.3%     6.0%
447.dealII        17.2%        34.6%     7.3%
450.soplex        16.4%        38.9%     7.5%
453.povray        14.3%        30.0%     8.8%
454.calculix      4.6%         31.9%     3.1%
459.GemsFDTD      1.5%         45.1%     10.0%
465.tonto         5.9%         34.8%     10.8%
470.lbm           0.9%         26.3%     8.5%
435.gromacs       3.4%         29.4%     14.5%
410.bwaves        0.7%         46.5%     8.5%
481.wrf           5.7%         30.7%     7.5%
482.sphinx3       10.2%        30.4%     3.0%

SLIDE 10

L1 data cache misses per 1000 Instructions

[Figure: two bar charts of L1 data cache misses per 1000 instructions (axis: Misses per Kinst, ticks from 25 to 150). SPEC CPU2006 panel: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk. SPEC CPU2000 panel: 164.gzip, 175.vpr, 176.gcc, 181.mcf, 186.crafty, 197.parser, 252.eon, 253.perlbmk, 254.gap, 255.vortex, 256.bzip2, 300.twolf.]
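
The "misses per 1000 instructions" metric used in these charts is a simple normalization of two counter values. The sketch below shows the arithmetic only; the function name and the sample counter values are hypothetical and are not taken from the study.

```python
def misses_per_kilo_instruction(miss_count: int, instruction_count: int) -> float:
    """Normalize a raw miss count to misses per 1000 retired instructions."""
    if instruction_count <= 0:
        raise ValueError("instruction_count must be positive")
    return 1000.0 * miss_count / instruction_count


# Hypothetical counter readings, for illustration only.
sample_misses = 2_500_000
sample_instructions = 100_000_000
print(misses_per_kilo_instruction(sample_misses, sample_instructions))  # 25.0
```

Normalizing by instruction count lets benchmarks of very different lengths share one axis, which is why the same metric is applied to both suites.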

SLIDE 11

L2 cache misses per 1000 Instructions

[Figure: two bar charts of L2 cache misses per 1000 instructions (axis: Misses per Kinst, ticks from 5 to 20; one off-scale bar is labeled 36.73). SPEC CPU2006 panel: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk. SPEC CPU2000 panel: 164.gzip, 175.vpr, 176.gcc, 181.mcf, 186.crafty, 197.parser, 252.eon, 253.perlbmk, 254.gap, 255.vortex, 256.bzip2, 300.twolf.]

SLIDE 12

Branch mispredictions per 1000 Instructions

[Figure: two bar charts of branch mispredictions per 1000 instructions (axis: Mispredictions per Kinst, ticks from 5 to 25). SPEC CPU2006 panel: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk. SPEC CPU2000 panel: 164.gzip, 175.vpr, 176.gcc, 181.mcf, 186.crafty, 197.parser, 252.eon, 253.perlbmk, 254.gap, 255.vortex, 256.bzip2, 300.twolf.]

SLIDE 13

Performance Characteristics Correlation with CPI

  • L2 misses have the strongest correlation with CPI
  • L1-D cache misses also show significant correlation with CPI
  • Branch mispredictions do not appear to directly impact CPI

Characteristics                           Correlation Coefficient
Branch mispredictions per KI and CPI      0.150
L1-D cache misses per KI and CPI          0.918
L2 misses per KI and CPI                  0.964
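
The slides do not say which correlation measure was used. Assuming it is the usual Pearson coefficient, a value like those in the table could be computed from per-benchmark pairs of (events per KI, CPI) as sketched below. The function name and the sample data are hypothetical and do not reproduce the study's measurements.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Made-up per-benchmark values, for illustration only.
l2_misses_per_ki = [0.5, 2.0, 8.0, 15.0]
cpi = [0.6, 0.9, 1.8, 3.0]
print(round(pearson(l2_misses_per_ki, cpi), 3))
```

A coefficient near 1 means the metric rises and falls with CPI across benchmarks. It does not by itself show that the metric causes the CPI change.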

SLIDE 14

Fusion

SLIDE 15

Macro-Fusion

  • New feature for Core
  • Decreases number of micro-ops
  • One macro-fusion per cycle
  • Fused in pre-decode phase
  • Fuses branch and compare instructions

*Image taken from Real World Technologies “Intel’s Next Generation Microarchitecture Unveiled”
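
As a rough illustration of why macro-fusion decreases the number of micro-ops, the toy model below counts micro-ops for a stream of mnemonics with and without fusing adjacent compare and conditional-jump pairs. It is a deliberate simplification: it assumes every instruction decodes to exactly one micro-op and ignores the hardware's actual eligibility rules and the one-fusion-per-cycle limit. The mnemonic stream and function names are hypothetical.

```python
from typing import Sequence

# Toy assumption: these mnemonics count as "compare" instructions.
COMPARE_OPS = {"cmp", "test"}


def is_conditional_jump(mnemonic: str) -> bool:
    """Treat any j* mnemonic other than the unconditional jmp as conditional."""
    return mnemonic.startswith("j") and mnemonic != "jmp"


def count_micro_ops(mnemonics: Sequence[str], macro_fusion: bool) -> int:
    """Count micro-ops, assuming one micro-op per instruction or fused pair."""
    micro_ops = 0
    i = 0
    while i < len(mnemonics):
        has_next = i + 1 < len(mnemonics)
        if (macro_fusion and has_next
                and mnemonics[i] in COMPARE_OPS
                and is_conditional_jump(mnemonics[i + 1])):
            micro_ops += 1  # the compare+jump pair becomes one micro-op
            i += 2
        else:
            micro_ops += 1
            i += 1
    return micro_ops


stream = ["mov", "add", "cmp", "jne", "mov", "test", "je", "jmp"]
print(count_micro_ops(stream, macro_fusion=False))  # 8
print(count_micro_ops(stream, macro_fusion=True))   # 6
```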

SLIDE 16

Micro-op Fusion

  • Enhanced version of Pentium M feature
  • Occurs in the decode phase
  • Fused pair is issued/executed separately, but tracked by the reorder buffer as one micro-op
  • Typically fuses store address micro-op and a data micro-op
  • Increases space in reorder buffer

*Image taken from Real World Technologies “Intel’s Next Generation Microarchitecture Unveiled”

SLIDE 17

Performance Improvement Calculation Details

\[
\text{Increase in Performance} = \frac{(\text{Predecessor Runtime}) \times (\text{Predecessor Frequency}) - (\text{Core Cycles})}{(\text{Predecessor Runtime}) \times (\text{Predecessor Frequency})}
\]

Core and Predecessor Comparison Chart

Architecture      Core                       Yonah                           NetBurst
Processor         Xeon 5160                  Core Duo T2500                  Pentium Extreme Edition 965
Data Source       Performance Counter Data   SPEC Website Reported Results   SPEC Website Reported Results
Macro-Fusion      Yes                        No                              No
Micro-op Fusion   Yes                        Yes                             No
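
The performance-increase formula on this slide converts the predecessor's runtime into an estimated cycle count (runtime times clock frequency) and compares it with the cycle count measured on the Core processor. A minimal sketch of that arithmetic follows. The function name, the units (seconds and Hz), and the sample numbers are assumptions for illustration, not values from the study.

```python
def performance_increase(predecessor_runtime_s: float,
                         predecessor_frequency_hz: float,
                         core_cycles: float) -> float:
    """Fractional reduction in cycles relative to the predecessor.

    Predecessor cycles are estimated as runtime * frequency, since only
    runtimes are available for the predecessor processors.
    """
    predecessor_cycles = predecessor_runtime_s * predecessor_frequency_hz
    if predecessor_cycles <= 0:
        raise ValueError("predecessor runtime and frequency must be positive")
    return (predecessor_cycles - core_cycles) / predecessor_cycles


# Hypothetical inputs: 1000 s at 2.0 GHz versus 1.5e12 measured cycles.
print(performance_increase(1000.0, 2.0e9, 1.5e12))  # 0.25
```

Working in cycles rather than seconds removes the clock-frequency difference between processors, so the result reflects per-cycle efficiency rather than raw speed.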

SLIDE 18

Percentage of fused operations for SPEC CPU2006

SPEC CPU2006 Integer

Benchmark         Macro     Micro     Total
400.perlbench     12.18%    19.68%    31.86%
401.bzip2         11.84%    18.95%    30.79%
403.gcc           16.18%    18.39%    34.57%
429.mcf           13.93%    21.40%    35.33%
445.gobmk         11.33%    20.19%    31.52%
456.hmmer         0.13%     23.30%    23.43%
458.sjeng         14.33%    18.32%    32.65%
462.libquantum    1.59%     13.92%    15.15%
464.h264ref       1.51%     23.45%    24.96%
471.omnetpp       8.19%     23.87%    32.06%
473.astar         12.86%    14.40%    27.06%
483.xalancbmk     15.98%    21.12%    37.10%

SPEC CPU2006 Floating Point

Benchmark         Macro     Micro     Total
416.gamess        2.18%     23.58%    25.76%
433.milc          0.35%     17.82%    18.17%
434.zeusmp        0.09%     16.32%    16.41%
436.cactusADM     0.00%     24.14%    24.14%
437.leslie3d      0.70%     24.12%    24.82%
444.namd          0.36%     10.02%    10.38%
447.dealII        8.03%     22.98%    31.01%
450.soplex        5.10%     13.59%    18.69%
453.povray        4.40%     21.88%    26.28%
454.calculix      0.44%     19.67%    20.11%
459.GemsFDTD      0.38%     18.66%    19.04%
465.tonto         1.69%     24.35%    26.04%
470.lbm           0.22%     19.56%    19.78%
435.gromacs       0.71%     15.58%    16.29%

SLIDE 19

Percentage of increase in performance and percentage of macro-fused ops for Woodcrest over Yonah

[Figure: two scatter plots, one for CPU2006 integer and one for CPU2006 fp, relating % increase in performance to % of macro-fused ops.]

SLIDE 20

Percentage of increase in performance and percentage of macro-fused ops for Woodcrest over NetBurst

[Figure: two scatter plots, one for CPU2006 integer and one for CPU2006 fp, relating % increase in performance to % of macro-fused ops.]

SLIDE 21

Percentage increase in performance compared with measured micro-op fusion for NetBurst to Core

[Figure: two scatter plots, one for CPU2006 integer and one for CPU2006 fp, relating % increase in performance to % of fused micro-ops.]

SLIDE 22

Conclusion

  • SPEC CPU2006 stresses the L2 cache more than SPEC CPU2000 does
  • The measured amount of macro-fusion in floating point benchmarks is very low and does not correlate well with the increase in performance
  • The measured amount of macro-fusion in integer benchmarks shows significant correlation with the increase in performance
  • Micro-op fusion does not correlate well with the increase in performance
SLIDE 23

Questions?

For more information please see the Laboratory for Computer Architecture website http://lca.ece.utexas.edu