SLIDE 1

STATS AND CONFIGURATION

ZSim Tutorial – MICRO 2015

Core Models

Po-An Tsai

SLIDE 5

Outline

• ZSim core simulation techniques
• ZSim core structure
  • Simple IPC1 core
  • Timing core
  • OOO core
• Coding examples with demo
  • Branch predictor
  • Westmere to Silvermont

SLIDE 8

Core Simulation Techniques

• ZSim simulates the system using Pin
  • Leverages dynamic binary translation
• ZSim mainly uses 4 types of analysis routines to cover the simulated program
  • Basic block
  • Load
  • Store
  • Branch

SLIDE 17

Core Simulation Technique

• A basic block (BBL) from Pin

    mov (%rbp),%rcx
    add %rax,%rbx
    mov %rdx,(%rbp)
    ja 40530a

• 1. Simulate core activities with a BBL descriptor that contains most of the static information
  • Decode the BBL into a BBL descriptor

    BblDescriptor:
      numInstructions = 4
      numBytes = 4
      uop[]

  • The instrumented block then calls BasicBlock(BblDescriptor)

SLIDE 18

Core Simulation Technique

• Decode x86 instructions into uops
  • Each uop has its own latency, src/dst register pair, and functional unit ports
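
A minimal sketch of what such a descriptor could hold, using the four-instruction block shown earlier. All type names, fields, register ids, latencies, and port masks here are assumptions made for illustration, not zsim's actual definitions; the store is kept as a single uop for brevity.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical register ids, used only for this illustration.
enum Reg : uint16_t { NONE = 0, RAX, RBX, RCX, RDX, RBP, RFLAGS };

// Hypothetical, simplified uop record.
struct Uop {
    uint16_t src[2];    // source registers (NONE = unused)
    uint16_t dst[2];    // destination registers (NONE = unused)
    uint8_t latency;    // execution latency in cycles (made-up values below)
    uint8_t portMask;   // bit i set => the uop may use functional unit port i
};

// Static per-BBL information, built once when the block is instrumented.
struct BblDescriptor {
    uint32_t numInstructions;
    uint32_t numBytes;
    std::vector<Uop> uops;
};

// Hand-decoded version of the example block. numBytes is passed in by the
// caller because this sketch does not compute instruction encodings.
inline BblDescriptor decodeExampleBbl(uint32_t numBytes) {
    BblDescriptor d{4, numBytes, {}};
    d.uops.push_back(Uop{{RBP, NONE}, {RCX, NONE}, 1, 0x04});      // mov (%rbp),%rcx (load)
    d.uops.push_back(Uop{{RAX, RBX}, {RBX, RFLAGS}, 1, 0x23});     // add %rax,%rbx
    d.uops.push_back(Uop{{RDX, RBP}, {NONE, NONE}, 1, 0x18});      // mov %rdx,(%rbp) (store)
    d.uops.push_back(Uop{{RFLAGS, NONE}, {NONE, NONE}, 1, 0x20});  // ja 40530a
    return d;
}

// Sum of uop latencies: what a fully serialized execution would cost.
inline uint32_t serialLatency(const BblDescriptor& bbl) {
    uint32_t total = 0;
    for (const Uop& u : bbl.uops) total += u.latency;
    return total;
}
```

Because everything in the descriptor is static, it can be computed once per block and reused on every execution of that block.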

SLIDE 23

Core Simulation Technique

• 2. Simulate memory system operations with addresses

    mov (%rbp),%rcx     ->  Load(%rbp)
    add %rax,%rbx
    mov %rdx,(%rbp)     ->  Store(%rbp)
    ja 40530a

  • The block as a whole is still covered by BasicBlock(BblDescriptor)

    Load(Address addr)  { L1D->load(addr); }
    Store(Address addr) { L1D->store(addr); }

SLIDE 27

Core Simulation Technique

• Instruction-driven core activity (basic block) simulation
  • Simulates multiple stages for a single instruction at once
  • Each stage maintains a separate clock

    BasicBlock(BblDescriptor) {
      foreach uop {
        simulateFetch(uop);
        simulateDecode(uop);
        simulateIssue(uop);
        simulateExecute(uop);
        simulateCommit(uop);
      }
    }

    simulateIssue(uop) {
      addUopToRob(curRobCycle, uop);
      if (rob.isFull()) {
        nextRobAvailCycle = rob.advance();
      }
    }
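
The pseudocode above can be made concrete with a toy model in which each stage owns a clock and one uop is pushed through every stage before the next uop is looked at. This is only a sketch under assumed numbers: the class name, the stage delays, and the two-entry ROB are all hypothetical and chosen to keep the arithmetic easy to follow.

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>

// Hypothetical toy pipeline with one clock per stage.
struct ToyPipeline {
    static constexpr uint32_t kRobSize = 2;      // assumed, tiny on purpose
    static constexpr uint64_t kDecodeDelay = 3;  // fetch -> decode, assumed
    static constexpr uint64_t kIssueDelay = 2;   // decode -> issue, assumed

    uint64_t fetchCycle = 0, decodeCycle = 0, issueCycle = 0, commitCycle = 0;
    std::deque<uint64_t> rob;  // commit cycles of in-flight uops, oldest first

    // Simulates one uop through all stages and returns its commit cycle.
    uint64_t simulateUop(uint64_t execLatency) {
        // Fetch: one uop per cycle.
        fetchCycle += 1;
        // Decode: at most one per cycle, and never before fetch plus the front-end delay.
        decodeCycle = std::max(decodeCycle + 1, fetchCycle + kDecodeDelay);
        // Issue: at most one per cycle, after decode.
        issueCycle = std::max(issueCycle + 1, decodeCycle + kIssueDelay);
        // A full ROB stalls issue until the oldest uop has committed.
        if (rob.size() == kRobSize) {
            issueCycle = std::max(issueCycle, rob.front());
            rob.pop_front();
        }
        // Execute, then commit in order (at most one commit per cycle).
        const uint64_t doneCycle = issueCycle + execLatency;
        commitCycle = std::max(commitCycle + 1, doneCycle);
        rob.push_back(commitCycle);
        return commitCycle;
    }
};
```

With latencies 1, 10, 1, 1, the fourth uop's issue clock jumps forward because the ROB is full, which is the effect the `rob.isFull()` branch in the pseudocode stands for.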

SLIDE 35

Core Simulation Technique

• Event-driven uncore activity simulation (weave phase)

    Request from core
      -> Cache tag access @50
      -> Cache miss writeback @60
      -> Mem data read @60
      -> Cache data write @160
      -> Response to core @200
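
As a rough sketch of the event-driven style, the chain above can be replayed with a small priority queue of timestamped callbacks. The `EventQueue` class and the delays (10, 100, and 40 cycles, read off the differences between the timestamps on the slide) are assumptions for illustration; zsim's own event machinery is more elaborate.

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical minimal event queue: events run in cycle order, and an event
// may schedule follow-up events.
class EventQueue {
  public:
    using Action = std::function<void(EventQueue&, uint64_t)>;

    void schedule(uint64_t cycle, Action action) {
        events.push(Event{cycle, nextSeq++, std::move(action)});
    }

    // Runs until no events remain; returns the cycle of the last event.
    uint64_t run() {
        uint64_t now = 0;
        while (!events.empty()) {
            Event e = events.top();  // copy first: the action may push new events
            events.pop();
            now = e.cycle;
            e.action(*this, now);
        }
        return now;
    }

  private:
    struct Event {
        uint64_t cycle;
        uint64_t seq;  // tie-breaker so same-cycle events run in FIFO order
        Action action;
    };
    struct Later {
        bool operator()(const Event& a, const Event& b) const {
            return a.cycle != b.cycle ? a.cycle > b.cycle : a.seq > b.seq;
        }
    };
    std::priority_queue<Event, std::vector<Event>, Later> events;
    uint64_t nextSeq = 0;
};

// Replays the miss on the slide: tag access, writeback and memory read,
// cache fill, then the response to the core.
inline uint64_t simulateMiss(uint64_t tagAccessCycle) {
    EventQueue q;
    uint64_t responseCycle = 0;
    q.schedule(tagAccessCycle, [&responseCycle](EventQueue& eq, uint64_t tagDone) {
        eq.schedule(tagDone + 10, [](EventQueue&, uint64_t) { /* victim writeback */ });
        eq.schedule(tagDone + 10, [&responseCycle](EventQueue& eq2, uint64_t readStart) {
            eq2.schedule(readStart + 100, [&responseCycle](EventQueue& eq3, uint64_t fill) {
                eq3.schedule(fill + 40, [&responseCycle](EventQueue&, uint64_t done) {
                    responseCycle = done;  // response reaches the core
                });
            });
        });
    });
    q.run();
    return responseCycle;
}
```

Starting the tag access at cycle 50 reproduces the @60, @160, and @200 timestamps of the slide.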

SLIDE 38

General Core Structure

• ZSim simulates a core with 4 functions using Pin's APIs
  • BblFunc
  • LoadFunc
  • StoreFunc
  • BranchFunc
• Currently supported core types
  • Simple IPC1 core
  • Timing core
  • OOO core (Westmere-like)

SLIDE 45

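Simple IPC1 Core

[Figure: IPC1 core with L1I and L1D caches]

• Instruction stream

    mov (%rbp),%rcx
    add %rax,%rbx
    mov %rdx,(%rbp)
    ja 40530a

• Analysis calls, in order, and their effect on the core clock (Current cycle starts at 0)
    1. Load(%rbp):                  Current cycle = l1d->load(curCycle)
    2. Store(%rbp):                 Current cycle = l1d->store(curCycle)
    3. BasicBlock(BblDescriptor):   Current cycle += 4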
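
The walk-through above amounts to a core whose only state is a cycle counter. A minimal sketch follows, assuming a stand-in cache with a fixed latency; the names `FixedLatencyCache`, `BblInfo`, and `Ipc1Core` are hypothetical, and the real simple_core.h differs in detail.

```cpp
#include <cstdint>

using Address = uint64_t;

// Hypothetical stand-in for an L1 cache: every access takes a fixed number of
// cycles, and the call returns the cycle at which the access completes.
struct FixedLatencyCache {
    uint64_t latency;
    uint64_t load(Address, uint64_t curCycle) { return curCycle + latency; }
    uint64_t store(Address, uint64_t curCycle) { return curCycle + latency; }
};

// Only the static field this sketch needs.
struct BblInfo {
    uint32_t numInstructions;
};

// IPC1 core: one instruction per cycle, plus blocking memory accesses.
class Ipc1Core {
  public:
    explicit Ipc1Core(FixedLatencyCache* l1d) : l1d(l1d), curCycle(0), instrs(0) {}

    void load(Address addr) { curCycle = l1d->load(addr, curCycle); }
    void store(Address addr) { curCycle = l1d->store(addr, curCycle); }
    void bbl(const BblInfo& info) {
        curCycle += info.numInstructions;  // IPC = 1
        instrs += info.numInstructions;
    }
    void branch(Address, bool) {}  // branches cost nothing extra in this model

    uint64_t cycle() const { return curCycle; }
    uint64_t instructions() const { return instrs; }

  private:
    FixedLatencyCache* l1d;
    uint64_t curCycle;
    uint64_t instrs;
};
```

With an assumed 4-cycle cache, the example block costs 4 (load) + 4 (store) + 4 (instructions) = 12 cycles.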

SLIDE 55

Timing Core

[Figure: timing core with L1I and L1D caches]

• Same instruction stream and analysis calls as the IPC1 core (Current cycle starts at 0); each memory access also produces a chain of uncore events
    1. Load(%rbp):                  Current cycle = l1d->load(curCycle)
       Events: Request from core -> Tag Acc -> Miss -> Write back, Mem Data Read -> Data Write -> Response to core
    2. Store(%rbp):                 Current cycle = l1d->store(curCycle)
       Events: Request from core -> Tag Acc -> Data Write
    3. BasicBlock(BblDescriptor):   Current cycle += 4

SLIDE 75

OOO Core - BBL

• Simulate all stages at once (example uop: Load A)
  • Fetch
    • Ins Fetch: fetch the whole BBL
    • On a misprediction, fetch wrong-path instructions
    • Adjust fetch clock
  • Decode
    • uop queue: check next available cycle
    • Adjust decode clock
  • Issue
    • Reg scoreboard: check src available cycle
    • Honor issue width and RegFile width
    • ROB: check next available cycle
    • Adjust issue clock
  • OOO Execute
    • Ins window: schedule the uop in the next cycle in which the needed ports are available
    • LS unit*: issue load/store
    • Adjust issue clock
  • Commit
    • Reg scoreboard: set dst available cycle
    • ROB: retire the uop considering ROB width
    • Adjust retire clock

*Only for load/store
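
The register scoreboard steps ("check src available cycle", "set dst available cycle") can be sketched as a table of per-register ready cycles. The class below is a hypothetical simplification with an assumed 16-register file and two sources per uop; it ignores ports and widths.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical register scoreboard: one "ready at cycle" entry per register.
class Scoreboard {
  public:
    Scoreboard() { regAvail.fill(0); }

    // A uop cannot start before it is issued or before both sources are
    // ready. Its destination becomes ready `latency` cycles after it starts.
    // Returns the cycle at which the uop starts executing.
    uint64_t dispatch(uint64_t issueCycle, int src0, int src1, int dst, uint64_t latency) {
        const uint64_t start = std::max({issueCycle, regAvail[src0], regAvail[src1]});
        regAvail[dst] = start + latency;
        return start;
    }

    uint64_t availCycle(int reg) const { return regAvail[reg]; }

  private:
    std::array<uint64_t, 16> regAvail;  // assumed register count
};
```

A dependent uop issued at cycle 11 still waits until cycle 15 if its source is produced by a 5-cycle uop that started at cycle 10, while an independent uop is free to start at its own issue cycle.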

SLIDE 79

OOO Core – Load/Store

• Simulate MLP (memory-level parallelism) with two loads, A and B

    Load A: Issue @30 -> Dispatch @40 -> Cache Hit @50 -> Response @70
    Load B: Issue @50 -> Dispatch @60 -> Cache Miss @70 -> Mem Read @90 -> Cache Write @100 -> Response @110
            (Mem WB @110)

• In the weave phase, request B will not be delayed by contention for A
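
To see why overlapping matters, the sketch below compares two loads handled in parallel with the same loads handled one at a time. It uses the issue and response cycles from the slide (A: 30 to 70, B: 50 to 110) as fixed latencies; the `LoadReq` struct and both helper functions are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical load request: when it is issued and how long it takes.
struct LoadReq {
    uint64_t issueCycle;
    uint64_t latency;
};

// With MLP, every load proceeds independently: finish = latest response.
inline uint64_t overlappedFinish(const std::vector<LoadReq>& reqs) {
    uint64_t finish = 0;
    for (const LoadReq& r : reqs) finish = std::max(finish, r.issueCycle + r.latency);
    return finish;
}

// Without MLP (blocking core), a load cannot start until the previous one returns.
inline uint64_t serializedFinish(const std::vector<LoadReq>& reqs) {
    uint64_t finish = 0;
    for (const LoadReq& r : reqs) finish = std::max(finish, r.issueCycle) + r.latency;
    return finish;
}
```

For the two loads on the slide, the overlapped case finishes at cycle 110, whereas serializing them would push B's response to cycle 130.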

SLIDE 81

Simulation Speed for Different Core Types

• SPEC CPU2006 suite
• ~3x difference between IPC1 and OOO-C in harmonic mean (Hmean)

SLIDE 85

Not Modeled Core Behaviors

• Wrong-path execution
  • Hard to simulate with Pin
  • Okay to skip for Westmere
• Fine-grained message passing
  • Needs significant changes
• TLBs and SMT
  • Not supported yet

SLIDE 88

Coding Examples

• Implement a branch predictor for the OOO core
• Change the OOO core type
  • From Westmere to Silvermont

SLIDE 99

Implement Branch Predictors

• Have a new branch predictor class

    class GShareBranchPredictor {
      private:
        bool lastSeen;
        ……
    }

• Implement the predict method

      public:
        // Predicts and updates; returns false if mispredicted
        inline bool predict(Address branchPc, bool taken) {
          bool prediction = (taken == lastSeen);
          lastSeen = taken;
          return prediction;  // true if the outcome matches the last one seen
        }

• Replace the branch predictor in ooo_core.h

    //BranchPredictorPAg<11, 18, 14> branchPred;
    GShareBranchPredictor branchPred;
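
The demo class above only remembers the last outcome. For comparison, here is a sketch of a textbook gshare predictor that keeps the same predict(branchPc, taken) interface. It is not the code from the tutorial demo: the class name, the history length parameter, and the initial counter value are assumptions.

```cpp
#include <cstdint>
#include <vector>

using Address = uint64_t;

// Textbook gshare: XOR the global branch history with the low PC bits to
// index a table of 2-bit saturating counters.
template <uint32_t HISTORY_BITS>
class GShareSketch {
  public:
    GShareSketch() : counters(1u << HISTORY_BITS, 1), history(0) {}

    // Predicts and updates; returns false if mispredicted.
    bool predict(Address branchPc, bool taken) {
        const uint32_t mask = (1u << HISTORY_BITS) - 1;
        const uint32_t idx = (static_cast<uint32_t>(branchPc) ^ history) & mask;
        const bool predictedTaken = counters[idx] >= 2;
        // Train the counter toward the actual outcome.
        if (taken && counters[idx] < 3) counters[idx]++;
        if (!taken && counters[idx] > 0) counters[idx]--;
        // Shift the outcome into the global history register.
        history = ((history << 1) | (taken ? 1u : 0u)) & mask;
        return predictedTaken == taken;
    }

  private:
    std::vector<uint8_t> counters;  // start at 1 (weakly not-taken)
    uint32_t history;
};
```

On an always-taken branch with 4 history bits, this sketch mispredicts only while the history register fills up and the final counter warms, then predicts correctly from that point on.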

SLIDE 100

Demo

SLIDE 105

Different OOO Micro-architecture

• The original zsim assumes a Westmere OOO core, but what if I want to simulate a Silvermont/Haswell OOO core?
  • Step 1: obtain the important OOO core parameters
  • Step 2: change the core parameters in ooo_core.h/cpp
  • Step 3: verify it against a real system

SLIDE 106

Obtain Important Core Parameters

                      Westmere [1]    Silvermont [2]
    Issue width       4               2
    F/D/I/E stages    1/4/7/13        1/3/5/8
    Fetch width       16B             8B
    RF read width     3               2
    ROB size          128             32
    Ins window        1K * 36         1K * 16
    Issue queue       28              8

[1] http://www.realworldtech.com/nehalem/
[2] http://www.realworldtech.com/silvermont/

SLIDE 111

Change OOO Core Parameters

• Change sizes of hardware structures in ooo_core.h
  • CycleQueue<28> uopQueue  ->  <8>
  • ReorderBuffer<128, 4> rob  ->  <32, 2>
• Change the OOO core parameters in ooo_core.cpp
  • #define FETCH_STAGE 1  ->  1
  • #define DECODE_STAGE 4  ->  3
  • #define ISSUE_STAGE 7  ->  5
  • #define DISPATCH_STAGE 13  ->  8
  • #define FETCH_BYTES_PER_CYCLE 16  ->  8
  • #define ISSUES_PER_CYCLE 4  ->  2
  • #define RF_READS_PER_CYCLE 3  ->  2
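
The template arguments above fix each structure's size at compile time. A sketch of how a reorder buffer parameterized this way might track retirement follows; `ToyRob` and its methods are hypothetical and much simpler than zsim's ReorderBuffer, and only the <128, 4> and <32, 2> arguments come from the slide.

```cpp
#include <array>
#include <cstdint>

// Hypothetical reorder buffer with SZ entries that retires at most W uops
// per cycle. Both parameters are compile-time constants, so the storage is a
// fixed-size array and no heap allocation is needed.
template <uint32_t SZ, uint32_t W>
class ToyRob {
  public:
    ToyRob() : head(0), curRetireCycle(0), retiredThisCycle(0) { retireCycle.fill(0); }

    // Earliest cycle at which the next uop can take an entry: the slot about
    // to be reused must already have retired.
    uint64_t minAllocCycle() const { return retireCycle[head]; }

    // Records an in-order retirement no earlier than minRetireCycle while
    // honoring the retire width. Returns the actual retire cycle.
    uint64_t markRetire(uint64_t minRetireCycle) {
        if (minRetireCycle > curRetireCycle) {
            curRetireCycle = minRetireCycle;
            retiredThisCycle = 0;
        } else if (retiredThisCycle == W) {
            curRetireCycle++;  // width exhausted, spill into the next cycle
            retiredThisCycle = 0;
        }
        retiredThisCycle++;
        retireCycle[head] = curRetireCycle;
        head = (head + 1) % SZ;
        return curRetireCycle;
    }

  private:
    std::array<uint64_t, SZ> retireCycle;
    uint32_t head;
    uint64_t curRetireCycle;
    uint32_t retiredThisCycle;
};

// The two configurations from the parameter table.
using WestmereLikeRob = ToyRob<128, 4>;
using SilvermontLikeRob = ToyRob<32, 2>;
```

Switching micro-architectures then comes down to changing the template arguments and recompiling, which is the workflow the bullets above describe.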

SLIDE 112

Demo

SLIDE 113

Verify It Against Real System

• IPC traces for Westmere and Silvermont
  • Westmere: 6% performance difference
  • Silvermont: 9% performance difference

SLIDE 117

Summary

• ZSim uses instruction-driven simulation for core activities and event-driven simulation for uncore activities
• ZSim currently supports 3 types of core
  • Simple IPC1 core (simple_core.h)
  • Timing core (timing_core.h)
  • Westmere-like OOO core (ooo_core.h)
• Extending the zsim core model is straightforward
  • Modify the 4 basic analysis routines
  • Substitute the hardware structures with your implementation
  • Change the parameters in the OOO core

SLIDE 118

Hacking Advice

• As is common in Pin programming, functions in the core are called very frequently in zsim
  • You should be aware of performance when coding
  • This is the main reason why zsim statically allocates hardware structures and statically sets the OOO parameters

SLIDE 119

Thank you!

Any questions?

SLIDE 120

Break / Q&A

Try zsim now! https://zsim.csail.mit.edu
