Debugging Distributed-Shared-Memory Communication at Multiple Granularities in Networks on Chip



SLIDE 1

Debugging Distributed-Shared-Memory Communication at Multiple Granularities in Networks on Chip

Bart Vermeulen (1), Kees Goossens (1, 2), Siddharth Umrani (2)

(1) Research, NXP Semiconductors
(2) Computer Engineering, Delft University of Technology

SLIDE 2

2008-04-08 NOCS

overview

transaction-based communication-centric debug
traditional debug architecture & flow and NOC architecture
  – distributed shared memory (DSM)
  – communication model
new debug architecture & flow and NOC architecture
  – debug granularity, DCI, TPR, EDI, FSM, TAP, API
example
conclusions

SLIDE 3

debug is…

error localisation when a chip does not work in its intended application
difficult due to limited visibility of the internal behaviour
debugging first silicon uses >50% of project time
unpredictable negative impact on
  – time to market
  – brand image

SLIDE 4

communication-centric debug

processor debug is mature
system debug complexity resides in the interactions between IP blocks
  – multi-processor debug is a challenge
older interconnects serialised all transactions
  – a unique global communication trace
latest interconnects allow split, pipelined, concurrent transactions
  – no unique communication trace

[figure: CPUs and memories (mem) connected through ML-AHB, AXI, and NOC interconnects]

SLIDE 5

communication-centric debug

traditional processor-centric debug focusses on control of the IP (computation)
interconnect is the locus of all IP interactions
we propose to focus debug on the interactions between IPs through control of the interconnect (communication)

[figure: IPs connected through an interconnect, with monitors and debug control]

SLIDE 6

transactions

transaction: request & response
valid/accept handshake
  – signal groups
  – data words (elements)
communication types
  – peer-to-peer streaming
  – distributed shared memory

[figure: a master with one slave; a master with two slaves at address ranges 0x00-0x1F and 0x20-0xFF]

[figure: signal groups between initiator and target]
command: cmd_valid, cmd_accept, cmd_read, cmd_addr, cmd_block_size
write data: wr_valid, wr_accept, wr_data, wr_last
read data: rd_valid, rd_accept, rd_data, rd_last
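The valid/accept handshake on this slide can be illustrated with a small cycle-based sketch. It assumes (the slide does not spell this out) that a data word transfers in exactly those clock cycles where both valid and accept are high. The function name `handshake_transfers` and the example traces are hypothetical.

```python
from typing import List, Sequence


def handshake_transfers(valid: Sequence[bool],
                        accept: Sequence[bool],
                        data: Sequence[int]) -> List[int]:
    """Return the data words transferred over one signal group.

    Assumption: a word moves from initiator to target in exactly those
    clock cycles where both valid and accept are high.
    """
    if not (len(valid) == len(accept) == len(data)):
        raise ValueError("traces must cover the same number of cycles")
    return [d for v, a, d in zip(valid, accept, data) if v and a]


# Hypothetical 5-cycle trace of the write-data group
# (wr_valid, wr_accept, wr_data), one list entry per clock cycle.
wr_valid = [True, True, False, True, True]
wr_accept = [False, True, True, True, False]
wr_data = [10, 10, 0, 11, 12]
transferred = handshake_transfers(wr_valid, wr_accept, wr_data)
```

Under this reading, each transferred word is one data element, which is the finest unit a handshake-based debugger can stop on.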

SLIDE 7

communication & debug granularities

(listed from finer to coarser granularity)
clock cycle
message element (write or read data element) (flit)
message (request or response)
transaction (request and response)
channel (request or response between a master and 1 slave)
connection (requests and response channels between a master and all its slaves)

figure annotations:
  – finest grain that is based on handshake
  – finest grain that is based on transactions
  – required for distributed shared memory
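The ordering of granularities can be captured in a small enumeration. This is only a sketch: the class name `Granularity`, the member names, and the numeric values are illustrative choices, not part of the presented architecture.

```python
from enum import IntEnum


class Granularity(IntEnum):
    """Granularities from the slide, ordered from finest to coarsest."""
    CLOCK_CYCLE = 0
    ELEMENT = 1      # write or read data element (flit)
    MESSAGE = 2      # request or response
    TRANSACTION = 3  # request and response
    CHANNEL = 4      # request or response between a master and 1 slave
    CONNECTION = 5   # channels between a master and all its slaves


def is_coarser(a: Granularity, b: Granularity) -> bool:
    """True if granularity a is coarser than granularity b."""
    return a > b
```

For example, `is_coarser(Granularity.TRANSACTION, Granularity.MESSAGE)` holds because a transaction comprises a request and a response message.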

SLIDE 8

debug flow

[flowchart]
steps: Start; Program Breakpoint(s); Optional Functional Reset; Switch to Debug Mode; Inspect System State; Switch to Functional Mode; Distribute event; Finish
decisions (Y/N): Monitor(s) hit breakpoint?; New run or continue?; Quiescent State?; Force Stop?
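One plausible reading of the flowchart is sketched below. The extracted text lists only the boxes and decisions, so the order of the edges here is an assumption, as are the function name `debug_session` and its callback parameters.

```python
from typing import Callable, List


def debug_session(hit_breakpoint: Callable[[], bool],
                  force_stop: Callable[[], bool],
                  quiescent: Callable[[], bool],
                  run_again: Callable[[], bool]) -> List[str]:
    """Walk the debug flow and return the steps visited, in order.

    The edge order is assumed; only the box names come from the slide.
    """
    steps = ["Start"]
    while True:
        steps += ["Program Breakpoint(s)", "Optional Functional Reset",
                  "Switch to Functional Mode"]
        # Poll until a monitor hits a breakpoint or a stop is forced.
        while not (hit_breakpoint() or force_stop()):
            pass
        steps.append("Distribute event")
        # Poll until the communication has reached a quiescent state.
        while not quiescent():
            pass
        steps += ["Switch to Debug Mode", "Inspect System State"]
        if not run_again():
            steps.append("Finish")
            return steps


# Scripted example: the monitor triggers on the third poll and the
# system is quiescent on the second check.
hits = iter([False, False, True])
drained = iter([False, True])
visited = debug_session(lambda: next(hits), lambda: False,
                        lambda: next(drained), lambda: False)
```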

SLIDE 9

conventional master network interface

NI shell FSM implements
  – protocol (de)serialisation (s)
  – distributed address map (d)
  – request/response ordering (i)
  – width conversion (not shown)
NI kernel FSM implements
  – per-channel QoS
  – (de)packetisation

[figure: narrowcast (multi-slave master) NI: master IP; cmd, wdata and rdata ports; NI shell with blocks s, d, i and FSM; req1/resp1 ports; NI kernel with per-channel QoS. Data levels: transactions, messages (peer-to-peer streaming data), packets]
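The distributed address map (d) in the master NI shell can be sketched as a range lookup. The two address ranges are the ones shown on the transactions slide (0x00-0x1F and 0x20-0xFF); the table layout, the channel numbering, and the name `decode_channel` are illustrative assumptions.

```python
from typing import List, Tuple

# (first address, last address, request channel index). The ranges come
# from the transactions slide; the channel numbering is assumed.
ADDRESS_MAP: List[Tuple[int, int, int]] = [
    (0x00, 0x1F, 0),
    (0x20, 0xFF, 1),
]


def decode_channel(cmd_addr: int) -> int:
    """Select the request channel (and hence the slave) for an address."""
    for first, last, channel in ADDRESS_MAP:
        if first <= cmd_addr <= last:
            return channel
    raise ValueError(f"address {cmd_addr:#x} is not mapped to any slave")
```

Because each address range maps to its own channel, stopping one channel for debug leaves the master's traffic to the other slave untouched.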

SLIDE 10

conventional slave network interface

converse for slave shell

[figure: multi-master slave NI: slave IP; cmd, wdata and rdata ports; NI shell with blocks s, i, d and FSM; req1/resp1 ports; NI kernel with per-channel QoS. Data levels: transactions, messages (peer-to-peer streaming data), packets]

SLIDE 11

SOC architecture

[figure: router R00 with network interfaces NI 1 to NI 4, each containing an FSM; Master IP ports 1 and 2 and Slave IP ports 1 and 2 attach to the NIs through request valid/accept handshakes]

SLIDE 12

debug architecture: monitors

EDI distributes events from monitors to NI shells (and IP)

[figure: the SOC architecture (router R00, NI 1 to NI 4, master and slave IP ports) with a monitor, an EDI node and the EDI added]

SLIDE 13

EDI node FSM

[figure: state diagram with states idle, send and wait, and a decision "more?"; transitions are labelled input / output: reset / 0, event / 1, • / 0 (twice), event / 1, event / 0]

SLIDE 14

debug architecture: test point registers (TPR)

debug behaviour is controlled by TPRs

[figure: the SOC architecture with monitor, EDI node and EDI, extended with TPRs]

SLIDE 15

test point registers (TPR)

control debug behaviour
  – link monitors: which conditions to monitor
  – NI shells: how to react to incoming events per channel
operate on test clock

monitor TPR (W+2 bits; W = width of data (and control) on monitored link)
  – fields: Enable, Condition, Triggered?

NI FSM TPR (10N+1 bits; N = number of request channels = number of response channels)
  – IP_stop
  – for the request channels and the response channels: Enable, Granularity, Quiescent?, Continue, Condition
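The two register widths follow from the field lists. The sketch below reproduces the arithmetic under one reading of the figure: the monitor's Condition field is W bits wide, and every other field is a single bit (per channel where applicable). The function names are hypothetical.

```python
# Per-channel fields of the NI FSM TPR, as listed on the slide.
NI_FIELDS_PER_CHANNEL = ("Enable", "Granularity", "Quiescent?",
                         "Continue", "Condition")


def monitor_tpr_bits(w: int) -> int:
    """Condition (W bits) + Enable (1 bit) + Triggered? (1 bit) = W + 2."""
    return w + 1 + 1


def ni_tpr_bits(n: int) -> int:
    """IP_stop (1 bit) + 5 fields over N request and N response channels.

    Assumes one bit per field per channel, which yields 10N + 1.
    """
    channels = 2 * n  # N request channels plus N response channels
    return 1 + len(NI_FIELDS_PER_CHANNEL) * channels
```

For instance, a monitor on a 32-bit link would need a 34-bit TPR, and an NI with four request and four response channels a 41-bit TPR.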

SLIDE 16

NI shell FSM

stop conditions (s2, s6)
  – original_condition and stop_enable and (stop or stop_condition)
modified transitions (f2’, f6’, f7’)
  – original_condition and not (stop_enable and (stop or stop_condition))
continue conditions (c2, c6, c7)
  – original_condition and continue
protocol serialisation can now be stopped & resumed
general recipe for different protocols

[figure: NI shell FSM state diagram, controlled by the NI FSM TPR; states idle, cmd dec, cmd accpt, wdata accpt and read, plus primed states cmd dec’ and wdata accpt’; transitions reset, f1, f2’, f3, f4, f5, f6’, f7’, s2, s6, c2, c6, c7]
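The three guard expressions can be written out as Boolean functions, which makes it easy to check that the recipe splits each original transition cleanly: whenever the original condition holds, exactly one of the stop transition and the modified transition fires. The function names and the exhaustive check are illustrative; `cont` stands in for `continue`, which is a reserved word in Python.

```python
from itertools import product


def stop_transition(original_condition: bool, stop_enable: bool,
                    stop: bool, stop_condition: bool) -> bool:
    """Guard of an added stop transition (s2, s6)."""
    return original_condition and stop_enable and (stop or stop_condition)


def modified_transition(original_condition: bool, stop_enable: bool,
                        stop: bool, stop_condition: bool) -> bool:
    """Guard of a modified functional transition (primed f transitions)."""
    return original_condition and not (stop_enable and (stop or stop_condition))


def continue_transition(original_condition: bool, cont: bool) -> bool:
    """Guard of a continue transition (c2, c6, c7)."""
    return original_condition and cont


def recipe_is_consistent() -> bool:
    """Check all 16 input combinations of the stop/modified guards."""
    for args in product([False, True], repeat=4):
        s = stop_transition(*args)
        m = modified_transition(*args)
        original_condition = args[0]
        if (s or m) != original_condition or (s and m):
            return False
    return True
```

With `stop_enable` low, the modified guard reduces to the original condition, so the FSM behaves exactly as the functional design.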

SLIDE 17

debug architecture: debug control interconnect

TPRs are controlled by DCI (dedicated asynchronous scan chain)

[figure: a PC running debugger SW connects through the IEEE 1149.1 TAP to the TAP controller of the device under debug (SOC); the Debug Control Interconnect (DCI) is added to the architecture with monitor, EDI node, EDI and TPRs]

SLIDE 18

debug architecture: scan chains, clock control, etc.

down/upload functional state using DDI (scan chains for structural test)

[figure: the debug architecture with PC, debugger SW, IEEE 1149.1 TAP, TAP controller and Debug Control Interconnect (DCI), extended with the Debug Data Interconnect (DDI)]

SLIDE 19

debug architecture: software control API

the debug architecture is controlled through an IEEE 1149.1 test access port (TAP) from a PC running debug software
can basically down/upload system state, on the test clock
separate scan chains for debug control/status and functional state
  – can modify debug state independently from functional state, and during functional mode
“high-level” functions to get/set debug state
  – reset
  – set_bp_monitor <condition>
  – set_bp_action <channel> <granularity> <condition>
  – get_mon_status <monitor>
  – get_ni_status <ni>
  – continue: set continue bits in NI TPRs
  – synchronise: down/upload entire SOC state
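As a rough illustration of how a host-side script might drive these functions, the sketch below records commands in the textual form listed on the slide. Everything beyond the command names is hypothetical: the class `DebugCommandLog`, the method `continue_` (renamed because `continue` is reserved in Python), and the argument spellings are invented, and a real debugger would translate each command into scan operations through the TAP rather than log it.

```python
from typing import List


class DebugCommandLog:
    """Records high-level debug commands as text (illustration only)."""

    def __init__(self) -> None:
        self.commands: List[str] = []

    def reset(self) -> None:
        self.commands.append("reset")

    def set_bp_monitor(self, condition: str) -> None:
        self.commands.append(f"set_bp_monitor {condition}")

    def set_bp_action(self, channel: str, granularity: str,
                      condition: str) -> None:
        self.commands.append(
            f"set_bp_action {channel} {granularity} {condition}")

    def continue_(self) -> None:
        # On the slide: set continue bits in NI TPRs.
        self.commands.append("continue")

    def synchronise(self) -> None:
        # On the slide: down/upload entire SOC state.
        self.commands.append("synchronise")


# Hypothetical session; the argument values are made up.
dbg = DebugCommandLog()
dbg.set_bp_monitor("data==378")
dbg.set_bp_action("M1-S2", "message", "edi")
dbg.continue_()
```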

SLIDE 20

example

while the system is running in functional mode
  – set breakpoint on value 378 in link monitor
  – make channel between master 1 & slave 2 sensitive to events (A)

[waveform: NI_stop_enable, with marker A; diagram labels M1, S1, M2, S2]

SLIDE 21

example

while polling the monitor
  – after a number of transactions (B) it triggers and the NI receives a stop event (C)
  – NI completes ongoing message & ignores next request (D)

[waveform: NI_stop_in and M1_cmd_valid, with markers B, C, D]

SLIDE 22

example

after checking that there are no transactions in flight
  – program NI to single-step mode with message granularity (E) and continue (F)
  – the NI accepts a single write request (G)
  – and continue again (read request, H)

[waveform: NI_stop_condition, NI_continue and M1_cmd_accept, with markers E, F, G, H]

SLIDE 23

example

change debug granularity to word (data element) (I)
and continue 5 times
  – one command and four data handshakes (J, K)

[waveform: NI_stop_granularity, M1_cmd_accept and M1_data_accept, with markers I, J, K]
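The effect of the granularity setting on single stepping can be sketched by grouping the handshakes of one request into the batches released by successive continue pulses. The model is an assumption: a write request is taken to be one command handshake followed by its data handshakes, the NI stops after every handshake at element granularity and after the whole request at message granularity, and the name `release_batches` is hypothetical.

```python
from typing import List


def release_batches(request: List[str], granularity: str) -> List[List[str]]:
    """Group a request's handshakes by the continue pulse that releases them.

    Assumed model: "element" stops after every handshake, "message"
    stops only after the complete request.
    """
    if granularity == "element":
        return [[handshake] for handshake in request]
    if granularity == "message":
        return [list(request)]
    raise ValueError(f"unsupported granularity: {granularity!r}")


# A write request with four data words: one command, four data handshakes.
write_request = ["cmd"] + ["data"] * 4
element_steps = release_batches(write_request, "element")
message_steps = release_batches(write_request, "message")
```

Here `element_steps` has five entries, matching the five continues on the slide, while `message_steps` has one, matching the single write request accepted per continue at message granularity.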

SLIDE 24

example

change debug sensitivity to EDI only (i.e. no single stepping) (L)
communication resumes at full speed after continue pulse (M)
all this time, the rest of the system could have been in functional mode

[waveform: NI_continue and NI_stop_condition, with markers L, M]

SLIDE 25

conclusions

debug scope
  – per channel (master-slave pair)
  – per connection (master with all its slaves)
debug granularity
  – data words (equivalently: valid/accept handshake)
  – request/response
  – transaction
all channels can be debugged or not, at any granularity, independently
  – required for distributed-shared memory debugging
debug architecture
  – re-uses existing functional & test infrastructures (e.g. scan chains)
  – simple programmable building blocks (monitors, TPRs)
  – general recipe to modify functional NI shell FSM for debug
  – very basic software API

SLIDE 26