Gem5 in a nutshell Christophe Huriaux, Post-doc Inria, IRISA CAIRN - - PowerPoint PPT Presentation

SLIDE 1

Gem5 in a nutshell

Christophe Huriaux, Post-doc

Inria, IRISA — CAIRN Project-Team

CAIRN project-team — SAV 2016 June 30th-July 1st 2016 - 1

SLIDE 2

Spoiler alert

§ Not a research report…
§ … but a (quick) overview of how Gem5 works and what it can offer

SLIDE 3

Outline

§ Introduction

§ What is Gem5 useful for?

(or rather: what you should not use it for)

§ Overview of the system simulator

§ Simulation modes
§ Behind the scenes of a simulation
§ What’s under the hood?
§ Memory system

§ Running example
§ Conclusion

SLIDE 4

Introduction

§ Gem5 is the fusion of two projects

§ GEMS: simulation of multi-processor systems
§ M5: simulation of networked systems

§ System simulator

§ Accurate simulation of complex component interactions (OS / CPU / Caches / Devices / …)

§ Accuracy depends on how complete the models are

§ All-in-one simulation framework

§ Does not rely on other software

§ … but other tools can easily be plugged in

§ Lots of components available out-of-the-box

§ (CPUs, memories, I/Os, …)

SLIDE 5

What is Gem5 useful for?

Architectural exploration? Yes! 👍

Gem5 provides a fast and easy framework to interconnect hardware components and evaluate them!

Hardware/software performance evaluation? Yes! 👍

Gem5 has good support for various ISAs and allows realistic HW/SW performance evaluation.

SLIDE 6

What is Gem5 useful for?

Hardware/software verification? No… 👎

RTL functional verification is much more mature and accurate!

Software development and verification? Ugh... Please stop! 👎 👎 👎

Faster technologies based on binary translation are available (e.g. QEMU, OVP).

SLIDE 7

Simulation modes

§ Full-system (FS)

§ Models bare-metal hardware

§ Includes the various specified devices, caches, …

§ Boots an entire OS from scratch

§ Gem5 can boot Linux (several variants) or Android out-of-the-box

§ Syscall Emulation (SE)

§ Runs a single statically linked application
§ System calls are emulated or forwarded to the host OS
§ Many simplifications (address translation, scheduling, no pthreads…)

SLIDE 8

Behind the scenes of a simulation

Diagram: the collection of components (C++) and the simulator internals (C++ / Python) are compiled into the Gem5 binary.

SLIDE 9

Behind the scenes of a simulation

Diagram: a Python script instantiates the component hierarchy and defines the simulation parameters. The Gem5 binary runs this script in its embedded Python interpreter, which uses the collection of component interfaces to assemble the C++ objects. The simulation then runs and produces its output (statistics, traces…).
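The two-step flow above (a Python script describes the hierarchy, then the simulator turns that description into simulation objects) can be mimicked in plain Python. This is a toy sketch with hypothetical names (`ConfigNode`, `SimObjectInstance`, `instantiate`); a real gem5 script instead builds objects from `m5.objects` and then calls `m5.instantiate()` and `m5.simulate()`.

```python
class ConfigNode:
    """Hypothetical config-side node: parameters plus named children."""

    def __init__(self, kind, **params):
        self.kind = kind
        self.params = params
        self.children = {}

    def add(self, name, child):
        """Attach `child` under `name` and return it, so calls can be chained."""
        self.children[name] = child
        return child


class SimObjectInstance:
    """Hypothetical simulation-side object built from one config node."""

    def __init__(self, path, kind, params):
        self.path = path
        self.kind = kind
        self.params = dict(params)


def instantiate(node, path="root"):
    """Walk the config tree depth-first and build one instance per node."""
    objects = [SimObjectInstance(path, node.kind, node.params)]
    for name, child in node.children.items():
        objects.extend(instantiate(child, f"{path}.{name}"))
    return objects


# "Config script" part: only describes the hierarchy and its parameters.
root = ConfigNode("Root", full_system=False)
system = root.add("system", ConfigNode("System", mem_mode="timing"))
system.add("cpu", ConfigNode("CPU", clock="1GHz"))
system.add("membus", ConfigNode("Crossbar"))

# "Instantiation" part: the description is turned into simulation objects.
objects = instantiate(root)
for obj in objects:
    print(obj.path, obj.kind, obj.params)
```

Keeping the description separate from the instances is what lets the same compiled models be recombined from different scripts without rebuilding the binary.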

SLIDE 10

What’s under the hood?

Collection of components (C++ / Python)

§ 1 component = 1 simulation object, made of a C++ functional model (.cc, for simulation) and a Python interface (.py, for instantiation)

§ SimObjects follow a strict C++ class hierarchy, which makes extension with code reuse easier

Diagram: class tree of simulation objects rooted at SimObject, with ClockedObject, BaseCPU, BaseTimingCPU, BaseO3CPU, … below it.
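The code-reuse point can be illustrated with a toy Python mirror of that class tree. The classes below are hypothetical stand-ins (the real ones are C++), and the per-model cycle counts are made up; the point is that clock handling lives in `ClockedObject` and is inherited, so a new CPU model only has to override `cycles_for`.

```python
class SimObject:
    """Root of the toy hierarchy: every simulation object has a name."""

    def __init__(self, name):
        self.name = name


class ClockedObject(SimObject):
    """Adds a clock period (in ticks) and cycle-to-tick conversion."""

    def __init__(self, name, period_ticks):
        super().__init__(name)
        self.period_ticks = period_ticks

    def cycles_to_ticks(self, cycles):
        return cycles * self.period_ticks


class BaseCPU(ClockedObject):
    """Shared CPU logic; concrete models only say how many cycles n instructions take."""

    def execute(self, n_insts):
        return self.cycles_to_ticks(self.cycles_for(n_insts))

    def cycles_for(self, n_insts):
        raise NotImplementedError


class ToyTimingCPU(BaseCPU):
    """Hypothetical in-order model: one instruction per cycle."""

    def cycles_for(self, n_insts):
        return n_insts


class ToyO3CPU(BaseCPU):
    """Hypothetical out-of-order model: up to `width` instructions per cycle."""

    width = 2  # made-up issue width

    def cycles_for(self, n_insts):
        return -(-n_insts // self.width)  # ceiling division


cpus = [ToyTimingCPU("cpu0", period_ticks=1000), ToyO3CPU("cpu1", period_ticks=1000)]
for cpu in cpus:
    print(cpu.name, cpu.execute(10))  # cpu0 10000, then cpu1 5000
```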

SLIDE 11

What’s under the hood?

§ Gem5 is event-driven

§ Discrete-event timing model
§ Not related to real (wall-clock) time whatsoever
§ The amount of simulated time that 1 tick represents can be user-defined

§ Simulation objects schedule events for the next cycle or after a specific amount of time has elapsed

§ The Gem5 simulation scheduler takes care of the rest!

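A minimal sketch of the event-driven idea: a priority queue of (tick, callback) entries, where simulated time jumps directly from one event to the next instead of following real time. `EventQueue`, `Ticker` and the 1 tick = 1 ps mapping are assumptions made for this example, not gem5's classes or a fixed setting.

```python
import heapq
import itertools

TICKS_PER_SECOND = 10**12  # assumption for this example: 1 tick = 1 ps


class EventQueue:
    """Minimal discrete-event scheduler: always runs the earliest event next."""

    def __init__(self):
        self.now = 0
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: FIFO order for equal ticks

    def schedule(self, when, callback):
        if when < self.now:
            raise ValueError("cannot schedule an event in the past")
        heapq.heappush(self._heap, (when, next(self._seq), callback))

    def run(self):
        while self._heap:
            when, _, callback = heapq.heappop(self._heap)
            self.now = when  # simulated time jumps straight to the next event
            callback()


class Ticker:
    """Toy clocked object that does some work on each of its first `cycles` cycles."""

    def __init__(self, eq, freq_hz, cycles, log):
        self.eq = eq
        self.period = TICKS_PER_SECOND // freq_hz  # 1 GHz -> 1000 ticks per cycle
        self.remaining = cycles
        self.log = log
        eq.schedule(eq.now, self.tick)

    def tick(self):
        self.log.append(self.eq.now)
        self.remaining -= 1
        if self.remaining > 0:
            # Schedule the next cycle relative to the current tick.
            self.eq.schedule(self.eq.now + self.period, self.tick)


eq = EventQueue()
log = []
Ticker(eq, freq_hz=10**9, cycles=3, log=log)
# A one-off event after a fixed delay, independent of any clock.
eq.schedule(1500, lambda: log.append(("timer", eq.now)))
eq.run()
print(log)  # [0, 1000, ('timer', 1500), 2000]
```

Nothing happens between 1000 and 1500 or between 1500 and 2000: the scheduler skips idle time, which is why simulation speed depends on the number of events rather than on the simulated duration.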

SLIDE 12

What’s under the hood?

§ Memory ports are present on every MemObject

§ They model physical memory connections
§ They are interconnected during the hierarchy instantiation

§ E.g. a CPU data bus to an L1 cache

§ They work in pairs: 1 master port always connects to exactly 1 slave port
§ Data is exchanged atomically as packets

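Diagram (memory ports): the CPU connects to an instruction L1$ and a data L1$, both connected to a crossbar (X bar), which connects to DDR3 and Flash memories. Each of the 6 links pairs a master port (M) with a slave port (S).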
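The pairing rule can be sketched as follows. Everything here is hypothetical (`Packet`, `MasterPort`, `SlavePort`, `ToyMemory`, `ToyCPU`); the sketch only shows the two rules from the slide: a master port binds to exactly one slave port, and a request travels as a packet object handed across that link.

```python
class Packet:
    """A memory request/response travelling over one port link."""

    def __init__(self, cmd, addr, data=None):
        self.cmd = cmd    # "read" or "write"
        self.addr = addr
        self.data = data


class SlavePort:
    """Receiving end of a link; forwards packets to its owner."""

    def __init__(self, owner):
        self.owner = owner
        self.peer = None

    def recv(self, pkt):
        return self.owner.handle(pkt)


class MasterPort:
    """Initiating end of a link; can be bound to exactly one slave port."""

    def __init__(self, owner):
        self.owner = owner
        self.peer = None

    def bind(self, slave):
        if self.peer is not None or slave.peer is not None:
            raise ValueError("each port can only be part of one master/slave pair")
        self.peer, slave.peer = slave, self

    def send(self, pkt):
        if self.peer is None:
            raise RuntimeError("master port is not connected")
        return self.peer.recv(pkt)


class ToyMemory:
    """Toy memory behind a single slave port."""

    def __init__(self):
        self.cells = {}
        self.port = SlavePort(self)

    def handle(self, pkt):
        if pkt.cmd == "write":
            self.cells[pkt.addr] = pkt.data
        else:
            pkt.data = self.cells.get(pkt.addr, 0)  # unwritten cells read as 0
        return pkt


class ToyCPU:
    """Toy CPU exposing one master port for data accesses."""

    def __init__(self):
        self.data_port = MasterPort(self)


# Wiring happens while building the hierarchy, as on the slide.
cpu, mem = ToyCPU(), ToyMemory()
cpu.data_port.bind(mem.port)

cpu.data_port.send(Packet("write", 0x40, 7))
print(cpu.data_port.send(Packet("read", 0x40)).data)  # 7
```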

SLIDE 13

What’s under the hood?

§ 3 types of transport interfaces for packets

§ Functional

§ Instantaneous, in a single function call
§ Caches and memories are updated automagically at once

§ Atomic

§ Instantaneous
§ Memory model updated (caches, coherence, …)
§ Approximate latency, but no contention or delay

§ Timing

§ Transaction split into multiple phases
§ Models all timing in the memory system

§ The transport interface depends on the SimObject implementation

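To contrast the atomic and timing interfaces, the toy memory below answers an atomic access with a fixed latency right away, whereas in timing mode a request waits until the memory is free and its response comes back later as a separate event. The method names loosely echo gem5's `recvAtomic` / `recvTimingReq` port callbacks, but the classes, the single-pipeline contention model and the 30-tick latency are assumptions for illustration.

```python
import heapq
import itertools


class ToyMemory:
    """Toy memory with a fixed access latency and a single access pipeline."""

    def __init__(self, latency):
        self.latency = latency
        self.busy_until = 0  # tick at which the memory becomes free again

    def recv_atomic(self, now):
        # Atomic: the caller gets a latency estimate immediately;
        # concurrent accesses never wait for each other.
        return self.latency

    def recv_timing_req(self, now, schedule, on_response):
        # Timing: a request starts only when the memory is free (contention),
        # and the response is a separate event scheduled in the future.
        start = max(now, self.busy_until)
        self.busy_until = start + self.latency
        schedule(self.busy_until, on_response)


def run_atomic(mem, issue_ticks):
    """Completion tick of each access in atomic mode."""
    return [t + mem.recv_atomic(t) for t in issue_ticks]


def run_timing(mem, issue_ticks):
    """Completion tick of each access in timing mode, via a small event queue."""
    heap, seq, responses = [], itertools.count(), []
    now = 0

    def schedule(when, callback):
        heapq.heappush(heap, (when, next(seq), callback))

    def issue_request():
        # Request phase; the response phase is the lambda run at a later tick.
        mem.recv_timing_req(now, schedule, lambda: responses.append(now))

    for t in issue_ticks:
        schedule(t, issue_request)

    while heap:
        now, _, callback = heapq.heappop(heap)
        callback()
    return responses


requests = [0, 0, 10]  # issue ticks of three back-to-back accesses
print(run_atomic(ToyMemory(latency=30), requests))  # [30, 30, 40]
print(run_timing(ToyMemory(latency=30), requests))  # [30, 60, 90]
```

The gap between the two outputs is exactly the contention that atomic mode ignores.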

SLIDE 14

Memory system

§ Models a system running heterogeneous applications…
§ … running on heterogeneous processing tiles
§ … using heterogeneous memories and interconnects

One memory system to rule them all…

Diagram: several CPUs, a GPU and accelerators connected through an interconnect to DDR3, SRAM and Flash memories.

SLIDE 15

Memory system

§ Two memory systems in Gem5

§ Classic

§ All components are instantiated in the hierarchy along with the CPUs, etc.

§ MOESI coherence protocol only

§ Ruby

§ Detailed simulation model of various cache hierarchies
§ Various cache coherence protocols (MESI, MOESI, …)
§ Interconnection networks

§ Classic is faster but less detailed

…and in the simulation, interconnect them
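For readers unfamiliar with MOESI, here is a textbook snooping version of its per-line state machine as a small Python function. It is a simplification written for this summary (the event names and the `others_have_copy` flag are hypothetical), not the code of gem5's Classic caches.

```python
def moesi_next(state, event, others_have_copy=False):
    """Next MOESI state of one line in one cache (textbook snooping version)."""
    if state not in ("M", "O", "E", "S", "I"):
        raise ValueError(f"unknown state {state!r}")
    if event == "local_read":
        if state == "I":
            # Miss: exclusive if no other cache holds the line, shared otherwise.
            return "S" if others_have_copy else "E"
        return state  # read hit: no change
    if event == "local_write":
        # Every write ends in M; from I, S or O the other copies are
        # invalidated first, from E the upgrade is silent.
        return "M"
    if event == "snoop_read":
        # Another cache reads the line: a dirty owner keeps supplying it (O).
        return {"M": "O", "O": "O", "E": "S", "S": "S", "I": "I"}[state]
    if event == "snoop_write":
        return "I"  # another cache takes ownership to write
    raise ValueError(f"unknown event {event!r}")


# Cache A's view of one line: A writes, B reads, A reads, B writes.
state = "I"
for event in ["local_write", "snoop_read", "local_read", "snoop_write"]:
    state = moesi_next(state, event)
    print(event, "->", state)  # M, then O, then O, then I
```

The O state is what distinguishes MOESI from MESI: a dirty line can be shared without first being written back to memory.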

SLIDE 16

Running example

§ FFT kernel from the SPLASH-2 benchmark suite
§ ARM Instruction Set Architecture
§ 1 or 4 detailed out-of-order CPUs
§ Caches

§ L1: 64 kB data, 32 kB instruction
§ L2: 2 MB, shared

Is my FFT faster with more processors? Hardware/software performance evaluation! 👍

Let’s evaluate!
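Answering "Is my FFT faster with more processors?" comes down to comparing the simulated time of the 1-CPU and 4-CPU runs: speedup S = T1 / T4 and parallel efficiency S / 4. The sketch below assumes the stats.txt layout of gem5 releases from that period (one `name value # description` line per statistic, with total simulated time in `sim_seconds`; newer releases renamed it, e.g. `simSeconds`). The helper names are hypothetical and the numbers are made up, not results from the talk.

```python
def read_stat(stats_text, name):
    """Return the first value of stat `name` in gem5-style stats text."""
    for line in stats_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == name:
            return float(fields[1])
    raise KeyError(name)


def speedup_and_efficiency(t_1cpu, t_ncpu, n_cpus):
    """Speedup S = T1 / Tn and parallel efficiency S / n."""
    speedup = t_1cpu / t_ncpu
    return speedup, speedup / n_cpus


# Made-up stats excerpts, in the "name value # description" layout.
STATS_1CPU = """---------- Begin Simulation Statistics ----------
sim_seconds    0.008000    # Number of seconds simulated
"""
STATS_4CPU = """---------- Begin Simulation Statistics ----------
sim_seconds    0.002500    # Number of seconds simulated
"""

t1 = read_stat(STATS_1CPU, "sim_seconds")
t4 = read_stat(STATS_4CPU, "sim_seconds")
speedup, efficiency = speedup_and_efficiency(t1, t4, 4)
print(f"speedup {speedup:.2f}, efficiency {efficiency:.2f}")  # speedup 3.20, efficiency 0.80
```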

SLIDE 17

Conclusion

§ Quick introduction to Gem5
§ Many more things to explore in the framework!

§ Integration of power models (in development)
§ Memory / CPU trace generation
§ Statistics output for performance evaluation
§ SystemC co-simulation
§ Automatic benchmark runs
§ Checkpointing, fast-forwarding

SLIDE 18

Thank you for your attention ☺
