A Monitoring System for NoCs

SLIDE 1

ALaRI – Advanced Learning and Research Institute

A Monitoring System for NoCs

ALaRI - University of Lugano 1/24 11/04/10

NoCArc'10 – December 4, 2010, Atlanta, Georgia, USA

Leandro Fiorin
ALaRI, Faculty of Informatics - University of Lugano, Lugano, Switzerland

Gianluca Palermo, Cristina Silvano
Politecnico di Milano - DEI, Milano, Italy

SLIDE 2

Outline

• Motivations
• Monitoring in NoCs
• Our contributions
• Event categories
• Monitoring architecture
  – Programmable probes
  – Data management & collection
• Experimental results
• Conclusions and future work

SLIDE 3

Motivations

• Next-generation MPSoC platforms will integrate a large number of processing cores, storage elements, and I/O peripherals, interconnected by NoCs
• A large number of complex concurrent applications will share the available resources, providing users with new services and functionalities
• Platform-based design will reduce the cost per unit by letting the system adapt easily to different application requirements

SLIDE 4

Motivations

How can we exploit the available resources efficiently? How can we understand system behaviour?

New tools are needed to help designers in these tasks, exploiting information derived from measurements taken on the running system

SLIDE 5

Motivations

Taken from: Philip J. Mucci, "Hardware Performance Analysis on the Opteron with PAPI", ClusterWorld 2004, San Jose, CA

• Modern high-performance processors use dedicated on-chip hardware event detectors and counters
• Performance counters are hardware registers dedicated to counting events within the processor or system
• Each counter register has an associated control register that tells it what to count and how to count it
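
The pairing of a counter register with a control register can be illustrated with a minimal sketch. The field names and event ids below are hypothetical and do not reflect PAPI or any real processor's register layout.

```python
from dataclasses import dataclass


@dataclass
class PerfCounter:
    """One counter register plus its control register (hypothetical layout)."""
    event_select: int = 0   # control field: which event id to count
    enabled: bool = False   # control field: counting on/off
    value: int = 0          # the counter register itself

    def observe(self, event_id: int) -> None:
        """Increment the counter only for the selected event, and only while enabled."""
        if self.enabled and event_id == self.event_select:
            self.value += 1


# Hypothetical event ids, chosen only for illustration.
CACHE_MISS, BRANCH = 1, 2

counter = PerfCounter(event_select=CACHE_MISS, enabled=True)
for event in [CACHE_MISS, BRANCH, CACHE_MISS, CACHE_MISS]:
    counter.observe(event)

print(counter.value)  # 3: the BRANCH event is ignored
```

Reprogramming `event_select` is all it takes to make the same register count a different event, which is the idea the programmable probe later applies to NoC events.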

SLIDE 6

Monitoring in NoCs

• NoC monitoring has been proposed for:
  – Debugging
    [4] C. Ciordas, et al. An Event-Based Monitoring Service for Networks on Chip. ACM Trans. on Design Automation of Electronic Systems, 10(4):702–723, Oct. 2005.
    [5] C. Ciordas, et al. NoC Monitoring: Impact on the Design Flow. In Proc. of ISCAS '06, 2006.
  – Testing
    [15] S. Tang and Q. Xu. A Multi-Core Debug Platform for NoC-Based Systems. In Proc. of DATE '07, 2007.
  – Detecting congestion
    [16] J. van den Brand, et al. Congestion-Controlled Best-Effort Communication for Networks-on-Chip. In Proc. of DATE '07, 2007.
  – Platform run-time management
    [10] V. Nollet, et al. Run-Time Management of a MPSoC Containing FPGA Fabric Tiles. IEEE Trans. on VLSI Systems, 16(1):24–33, January 2008.
  – Security
    [7] L. Fiorin, et al. Security Aspects in Networks-on-Chips: Overview and Proposals for Secure Implementations. In Proc. of DSD '07, 2007.

SLIDE 7

Our contributions

• Perform a comprehensive study of the most common events in NoCs
• Propose the use of a multipurpose programmable monitoring probe
• Propose and discuss efficient, automatic collection and storage of information about the detected events, and evaluate the intrusiveness of the monitoring system's components and activities
• We propose an architecture for a monitoring system for NoCs
• While mainly focused on performance tuning, the system could easily be adapted to provide information useful for debugging, run-time management of system resources, and security

SLIDE 8

Event categories

• We focus on events of cores and NoC resources related to the communication system
  – Throughput characterization
  – Timing and latency
  – Resource utilization
  – NoC events and message characteristics

SLIDE 9

Monitoring architecture

• Probes
• Probe Management Unit (PMU)
• Data collection and storage

SLIDE 10

Programmable probe

• Event detector
• Accumulator
• Preprocessing modules
• Configuration registers
• Message generator
• Output queue

SLIDE 11

Event detectors

• The event detector observes OCP/IP, NI, and router signals and monitors the events selected by the configuration registers
• We use a programmable multipurpose probe, able to monitor all the events of the system
• Depending on the area budget, several multipurpose probes can be deployed for each NI
• Event detectors operate in parallel with the NI kernel without interfering with its operations (non-intrusiveness)

SLIDE 12

Event detectors

• Throughput detector
  – Keeps track of incoming/outgoing traffic
  – Choice of connections
  – Choice of collection period
• Timing/Latency detector
  – Measures timing properties of transactions
  – Different types of measurements: I2I, I2T, EXEC, T2I
  – Collaboration between probes at initiator and target
  – Collection for different transactions and connections
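
The four measurement types can be related to each other with a small sketch. The slide does not expand the acronyms, so the reading used here is an assumption: I2T is request travel time from initiator to target, EXEC is the time spent at the target, T2I is response travel time back to the initiator, and I2I is the round trip seen at the initiator. The timestamp names are hypothetical, and the sketch assumes the initiator-side and target-side probes share a common cycle counter.

```python
from dataclasses import dataclass


@dataclass
class TransactionTimes:
    """Cycle timestamps of one transaction (hypothetical field names)."""
    req_sent: int    # initiator issues the request
    req_recv: int    # target receives the request
    resp_sent: int   # target issues the response
    resp_recv: int   # initiator receives the response


def latencies(t: TransactionTimes) -> dict:
    """Derive the four latency figures under the assumed acronym meanings."""
    return {
        "I2T": t.req_recv - t.req_sent,     # needs both initiator and target probes
        "EXEC": t.resp_sent - t.req_recv,   # target-side probe only
        "T2I": t.resp_recv - t.resp_sent,   # needs both probes
        "I2I": t.resp_recv - t.req_sent,    # initiator-side probe only
    }


times = TransactionTimes(req_sent=100, req_recv=130, resp_sent=150, resp_recv=185)
result = latencies(times)
print(result)
```

Under this reading, I2I equals the sum of the other three figures. Only I2T and T2I require timestamps from both ends, which fits the slide's remark about collaboration between probes at initiator and target.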

SLIDE 13

Event detectors

• Resource utilization detector
  – Monitors the status of the internal queues of the NI and router
• Message characteristics detector
  – Detects user configuration events
  – Detects NoC configuration events

SLIDE 14

Data Preprocessing

• We implement the possibility of pre-processing data to reduce traffic
• Time windows
  – Messages are sent at the end of the time window
  – Windows are generated using a 32-bit counter
• Threshold
  – Messages are generated only if the value is >, <, =, >=, or <= the threshold value
  – Only critical information is sent
• Average calculation
  – Sample values are collected during execution, together with the number of occurrences
  – Values are sent at the end of collection
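
The three preprocessing modes above can be sketched in software. This is a behavioural illustration only, not the probe hardware: the function names, the input formats, and the choice to drop an unfinished window are assumptions.

```python
import operator

MASK32 = 0xFFFFFFFF  # the slide mentions a 32-bit counter for time windows

# Comparison modes listed on the slide, mapped to Python operators.
COMPARE = {">": operator.gt, "<": operator.lt, "=": operator.eq,
           ">=": operator.ge, "<=": operator.le}


def window_messages(events, window):
    """Accumulate values and emit one (end_cycle, total) message per full window.

    `events` maps cycle -> value observed in that cycle (hypothetical format).
    A window still open when the input ends is not emitted.
    """
    messages, total, counter = [], 0, 0
    for cycle in range(max(events) + 1):
        total += events.get(cycle, 0)
        counter = (counter + 1) & MASK32
        if counter == window:              # end of the time window
            messages.append((cycle, total))
            total, counter = 0, 0
    return messages


def threshold_messages(values, threshold, mode=">"):
    """Keep only the values that satisfy the configured comparison."""
    return [v for v in values if COMPARE[mode](v, threshold)]


def average_message(values):
    """Collect the running sum and the number of occurrences, sent once at the end."""
    total, occurrences = 0, 0
    for v in values:
        total += v
        occurrences += 1
    return total, occurrences


print(window_messages({0: 4, 3: 2, 5: 1, 9: 8}, window=4))
print(threshold_messages([90, 120, 95, 101], threshold=95, mode=">"))
print(average_message([90, 120, 95, 101]))
```

In each mode the probe sends fewer messages than it observes events, which is how preprocessing reduces monitoring traffic. Sending the sum and the occurrence count instead of the average itself is an assumption here; it would leave the division to the receiver.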

SLIDE 15

Message generator and probe configuration

• The message generator creates packets to be sent to the PMU
• Data collection is triggered at the end of the time frame or on the occurrence of events
• It acts as an initiator, writing to the memory address associated with it during configuration
• Data can be aggregated, reducing the generated traffic by up to 92%
• Configuration registers are memory-mapped to the PMU
• The PMU keeps track of all the configurations

SLIDE 16

Data Management

• The intrusiveness of the monitoring system should be limited during collection and storage
• We performed an analysis of the bandwidth needed by each probe

SLIDE 17

Data collection

• PMU local memory
  – Used for events that generate a limited number of messages per execution
  – Fast access to information
  – Local storage is important for analysing run-time system behaviour and for adaptive systems
• Streaming memory
  – For data exceeding the space allocated in PMU local memory, and for data of unknown size
  – The whole message packet is stored, and retrieved when it is processed
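
The split between the two storage areas can be sketched as a simple placement rule. The class and parameter names are hypothetical, and capacity is counted in messages only to keep the example short.

```python
class MonitorStorage:
    """Toy model of the two storage areas described above (hypothetical API)."""

    def __init__(self, local_capacity):
        self.local_capacity = local_capacity  # space allocated in PMU local memory
        self.local = []    # PMU local memory: fast access
        self.stream = []   # streaming memory: overflow and data of unknown size

    def store(self, message, known_size=True):
        """Place a message and report where it went."""
        if known_size and len(self.local) < self.local_capacity:
            self.local.append(message)
            return "local"
        # Either the local allocation is full or the data size is unknown.
        self.stream.append(message)
        return "stream"


storage = MonitorStorage(local_capacity=2)
placements = [storage.store(m) for m in ("m0", "m1", "m2")]
placements.append(storage.store("m3", known_size=False))
print(placements)
```

The third message overflows the local allocation and the fourth is sent to streaming memory because its size is not known in advance.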

SLIDE 18

Probe Management Unit

• Programs the configuration registers (before execution)
• Retrieves and processes collected data (after execution)
• These tasks can be implemented as software routines (no associated overhead)
• For run-time management, a third task should be active during execution to implement adaptivity based on the detected information

SLIDE 19

Experimental results

• We implemented the monitoring system and synthesized it with Synopsys, using a 0.13 µm technology library and targeting 500 MHz
• Adding reconfigurability (multipurpose) costs around 13%
• With 4 multipurpose probes, we save around 73% with respect to the complete monitoring system
• 4 probes account for around 35% of the area of NI (buffers of length 8) + router (buffers of length 4) (generated for an architecture with 10 initiators and 1 target)
• The overhead of 4 probes with respect to the NoC elements is around 55%, while if we also consider a typical embedded processor (ARM920T), it is 3%

SLIDE 20

Experiments

• We implemented a cycle-accurate simulator in SystemC, where the generation of transactions is driven by memory requests generated by the application
• As a use case, we considered a ray tracing application
• We consider an architecture with 10 initiators and one shared L2 memory, mapped on a 4 x 3 mesh (L2 in [1,1])
• 4 probes for each tile
• We measure:
  – Total traffic from each initiator
  – Average I2I latency
  – Number of times the I2I latency exceeds the threshold (95 cycles)
  – Throughput generated in each time window

SLIDE 21

Experiments

SLIDE 22

Experiments

• By using the threshold functionality, the number of messages is reduced by 91% on average
• The traffic generated by the probes is around 5% of the traffic generated by the application, and 0.2% of the link bandwidth
• The assumption of non-intrusiveness was verified in the experiment
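
The two traffic percentages can be cross-checked against each other. This is a minimal sketch, assuming both figures describe the same probe traffic measured over the same interval, which the slide does not state explicitly.

```python
from fractions import Fraction

# Figures quoted on the slide (both are approximate).
probe_over_app = Fraction(5, 100)     # probe traffic / application traffic
probe_over_link = Fraction(2, 1000)   # probe traffic / link bandwidth

# Dividing the two ratios cancels the probe traffic and leaves
# application traffic / link bandwidth.
app_over_link = probe_over_link / probe_over_app

print(f"application traffic is about {float(app_over_link):.0%} of link bandwidth")
```

Under that assumption, the application itself would use roughly 4% of the link bandwidth, so the links are lightly loaded in this experiment.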

SLIDE 23

Conclusions and future work

• We approached the problem of monitoring NoC-based systems
• We performed a comprehensive study of the types of events
• We propose a programmable multipurpose probe able to detect a large number of events with a relatively small overhead
• Future work will focus on the analysis of other possible monitoring functionalities, the exploration of alternatives for data collection, and the implementation of tools for managing configuration registers and analysing collected data

SLIDE 24

Thanks for your attention!

fiorin@alari.ch
