SLIDE 1 Debugging Distributed-Shared-Memory Communication at Multiple Granularities in Networks on Chip
Bart Vermeulen 1 Kees Goossens 1,2 Siddharth Umrani 2
1 Research, NXP Semiconductors 2 Computer Engineering, Delft University of Technology
SLIDE 2 2 2008-04-08 NOCS
transaction-based communication-centric debug
traditional debug architecture & flow and NOC architecture – distributed shared memory (DSM) – communication model
new debug architecture & flow and NOC architecture – debug granularity, DCI, TPR, EDI, FSM, TAP, API
example
conclusions
SLIDE 3
debug is…
error localisation when a chip does not work in its intended application
difficult due to limited visibility
debugging first silicon uses >50% of project time
unpredictable negative impact on – time to market – brand image
SLIDE 4
communication-centric debug
processor debug is mature
system debug complexity resides in the interactions between IP blocks – multi-processor debug is a challenge
older interconnects serialised all transactions – a unique global communication trace
latest interconnects allow split, pipelined, concurrent transactions – no unique communication trace
[Figure: example interconnects (ML-AHB, AXI, NOC) connecting CPUs and memories]
SLIDE 5
communication-centric debug
traditional processor-centric debug focusses on control of the IP (computation)
interconnect is the locus of all IP interactions
we propose to focus debug on the interactions between IPs through control of the interconnect (communication)
[Figure: IPs connected through an interconnect, with monitors and debug control]
SLIDE 6
transactions
transaction request & response
valid/accept handshake – signal groups – data words (elements)
communication types – peer-to-peer streaming – distributed shared memory
[Figure: a master with one slave, and a master with two slaves mapped at 0x00-0x1F and 0x20-0xFF]
[Figure: signal groups between initiator and target. command: cmd_valid, cmd_accept, cmd_read, cmd_addr, cmd_block_size. write data: wr_valid, wr_accept, wr_data, wr_last. read data: rd_valid, rd_accept, rd_data, rd_last]
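The valid/accept handshake named on this slide can be illustrated with a small sketch. It assumes the usual convention that a data word (element) is transferred in every clock cycle in which the sender asserts valid and the receiver asserts accept. The function name and the sample trace are illustrative and do not come from the slides.

```python
def transferred_words(valid, accept, data):
    """Return the words transferred over one signal group.

    valid, accept: per-cycle flags; data: per-cycle data values.
    Assumption: a word moves only in cycles where valid and accept are both high.
    """
    return [d for v, a, d in zip(valid, accept, data) if v and a]


# Hypothetical 6-cycle trace of a write-data group (wr_valid, wr_accept, wr_data).
wr_valid = [1, 1, 0, 1, 1, 1]
wr_accept = [0, 1, 1, 1, 0, 1]
wr_data = [10, 10, 0, 11, 12, 12]

words = transferred_words(wr_valid, wr_accept, wr_data)
print(words)
```

Each entry of `words` corresponds to one handshake, which is the finest handshake-based unit at which communication can be observed or stopped.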
SLIDE 7
communication & debug granularities
granularities, from finest to coarsest:
– clock cycle
– message element (write or read data element) (flit)
– message (request or response)
– transaction (request and response)
– channel (request or response between a master and 1 slave)
– connection (requests and response channels between a master and all its slaves)
figure annotations: finest grain that is based on handshake; finest grain that is based on transactions; required for distributed shared memory
SLIDE 8 debug flow
[Flowchart. Steps: Start, Program Breakpoint(s), Optional Functional Reset, Switch to Debug Mode, Inspect System State, Switch to Functional Mode, Distribute event, Finish. Decisions (Y/N): Monitor(s) hit breakpoint?, New run or continue?, Quiescent State?, Force Stop?]
SLIDE 9
conventional master network interface
NI shell FSM implements – protocol (de)serialisation (s) – distributed address map (d) – request/response ordering (i) – width conversion (not shown)
[Figure: master IP with cmd, wdata and rdata ports, connected to the NI shell (FSM with blocks s, d, i; narrowcast for a multi-slave master) and, through req1/resp1 ports, to the NI kernel (FSM, per-channel QoS). Labels: transactions, messages (peer-to-peer streaming data), packets]
NI kernel FSM implements – per-channel QoS – (de)packetisation
SLIDE 10
conventional slave network interface
converse for slave shell
[Figure: slave IP with cmd, wdata and rdata ports, connected to the NI shell (FSM with blocks s, i, d; multi-master slave) and, through req1/resp1 ports, to the NI kernel (FSM, per-channel QoS). Labels: transactions, messages (peer-to-peer streaming data), packets]
SLIDE 11
SOC architecture
[Figure: master IP ports 1 and 2 and slave IP ports 1 and 2, each with a valid/accept/request interface, connected through network interfaces NI 1 to NI 4 (each with an FSM) and router R00]
SLIDE 12
debug architecture: monitors
[Figure: SOC architecture (master and slave IP ports, NI 1 to NI 4 with FSMs, router R00) extended with a monitor, an EDI node, and the EDI reaching the NI FSMs]
EDI distributes events from monitors to NI shells (and IP)
SLIDE 13
EDI node FSM
[State diagram: states idle, wait and send, with a "more?" decision; transition labels: reset / 0, event / 1, event / 1, event / 0]
SLIDE 14
debug architecture: test point registers (TPR)
[Figure: SOC architecture with monitor, EDI node and EDI, extended with TPRs at the monitor and the NI FSMs]
debug behaviour is controlled by TPRs
SLIDE 15 test point registers (TPR)
control debug behaviour – link monitors: which conditions to monitor – NI shells: how to react to incoming events per channel
monitor TPR (W+2 bits): Condition, Triggered?, Enable
W = width of data (and control) on monitored link.
NI FSM TPR (10N+1 bits): IP_stop, plus Condition, Continue, Quiescent?, Granularity and Enable for the request channels and for the response channels
N = Number of Request channels = Number of Response channels.
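The W+2 and 10N+1 register widths on this slide can be checked with a short calculation. The sketch below assumes a particular reading of the figure: the monitor TPR holds a W-bit condition plus one Triggered? bit and one Enable bit, and the NI FSM TPR holds one IP_stop bit plus five one-bit fields (Condition, Continue, Quiescent?, Granularity, Enable) for each of the N request and N response channels. The field widths are an assumption chosen because they reproduce the totals on the slide; the function names are illustrative.

```python
def monitor_tpr_bits(w):
    """Width of a monitor TPR for a monitored link of W bits.

    Assumed split: W condition bits + Triggered? + Enable.
    """
    return w + 2


def ni_tpr_bits(n):
    """Width of an NI FSM TPR for N request and N response channels.

    Assumed split: five one-bit fields per channel, two channel
    directions (request, response), plus a single IP_stop bit.
    """
    fields_per_channel = 5  # Condition, Continue, Quiescent?, Granularity, Enable
    directions = 2          # request and response
    return fields_per_channel * directions * n + 1


print(monitor_tpr_bits(32))  # hypothetical 32-bit link
print(ni_tpr_bits(2))        # hypothetical NI with 2 request/response channel pairs
```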
SLIDE 16
NI shell FSM
stop conditions (s2, s6) – original_condition and stop_enable and (stop or stop_condition)
modified transitions (f2’, f6’, d7’) – original_condition and not (stop_enable and (stop or stop_condition))
continue conditions (c2, c6, c7) – original_condition and continue
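protocol serialisation can now be stopped & resumed
general recipe for different protocols
[State diagram of the NI shell FSM, controlled by the NI FSM TPR: states idle, cmd dec, cmd accpt, wdata accpt and read, plus cmd dec’ and wdata accpt’; reset; transitions f1, f2’, f3, f4, f5, f6’, f7’, s2, s6, c2, c6, c7]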
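The three transition conditions on this slide can be written out as Boolean functions. The sketch below transcribes them directly and then checks, over all input combinations, that the stop transition and the modified functional transition are mutually exclusive and together cover the original condition, so the recipe never adds or loses a transition. The function and parameter names are illustrative; only the Boolean expressions come from the slide.

```python
from itertools import product


def stop_transition(original_condition, stop_enable, stop, stop_condition):
    # s-transitions: leave the functional path and enter a stop state.
    return original_condition and stop_enable and (stop or stop_condition)


def modified_transition(original_condition, stop_enable, stop, stop_condition):
    # f'-transitions: the original transition, taken only when not stopping.
    return original_condition and not (stop_enable and (stop or stop_condition))


def continue_transition(original_condition, cont):
    # c-transitions: leave a stop state once the continue bit is set.
    return original_condition and cont


# Exhaustive check over all 16 input combinations.
for orig, enable, stop, cond in product([False, True], repeat=4):
    s = stop_transition(orig, enable, stop, cond)
    f = modified_transition(orig, enable, stop, cond)
    assert not (s and f)       # never both at once
    assert (s or f) == orig    # exactly one fires whenever the original would
```

With `stop_enable` cleared, `modified_transition` reduces to `original_condition`, so a channel that is not being debugged behaves exactly like the unmodified FSM.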
SLIDE 17
debug architecture: debug control interconnect
[Figure: device under debug (SOC) with monitor, EDI node, EDI and TPRs, extended with a Debug Control Interconnect (DCI) and a TAP controller; a PC running debugger SW connects through the IEEE 1149.1 TAP]
TPRs are controlled by DCI (dedicated asynchronous scan chain)
SLIDE 18
debug architecture: scan chains, clock control, etc.
[Figure: device under debug (SOC) with monitor, EDI node, EDI, TPRs, Debug Control Interconnect (DCI) and TAP controller, extended with a Debug Data Interconnect (DDI); a PC running debugger SW connects through the IEEE 1149.1 TAP]
down/upload functional state using DDI (scan chains for structural test)
SLIDE 19
debug architecture: software control API
the debug architecture is controlled using the IEEE 1149.1 test access port from a PC running debug software
basically can down/upload system state, on the test clock
separate scan chains for debug control/status and functional state – can modify debug state independently from functional state, and during functional mode
“high-level” functions to get/set debug state
– reset
– set_bp_monitor <condition>
– set_bp_action <channel> <granularity> <condition>
– get_mon_status <monitor>
– get_ni_status <ni>
– continue: set continue bits in NI TPRs
– synchronise: down/upload entire SOC state
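The call sequence of a debug session can be sketched with a mock of these functions. Only the function names and their parameter lists come from the slide. The `MockDebugger` class, the argument values ("M1-S2", "message", "edi") and the logging behaviour are hypothetical stand-ins, and the mock models no hardware. Because `continue` is a Python keyword, the mock names that call `continue_`.

```python
class MockDebugger:
    """Hypothetical in-memory stand-in for the debugger software."""

    def __init__(self):
        self.log = []               # commands issued, in order
        self.monitor_condition = None
        self.actions = {}           # channel -> (granularity, condition)

    def reset(self):
        self.log.append("reset")

    def set_bp_monitor(self, condition):
        # Program the condition a link monitor should watch for.
        self.monitor_condition = condition
        self.log.append(f"set_bp_monitor {condition}")

    def set_bp_action(self, channel, granularity, condition):
        # Program how the NI shell reacts on one channel.
        self.actions[channel] = (granularity, condition)
        self.log.append(f"set_bp_action {channel} {granularity} {condition}")

    def continue_(self):
        # Stands for setting the continue bits in the NI TPRs.
        self.log.append("continue")


dbg = MockDebugger()
dbg.reset()
dbg.set_bp_monitor(378)                       # breakpoint on a data value
dbg.set_bp_action("M1-S2", "message", "edi")  # hypothetical argument encoding
dbg.continue_()
print(dbg.log)
```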
SLIDE 20
example
while the system is running in functional mode
set breakpoint on value 378 in link monitor
make channel between master 1 & slave 2 sensitive to events (A)
[Figure: masters M1, M2 and slaves S1, S2; waveform of NI_stop_enable with marker A]
SLIDE 21
example
while polling the monitor
after a number of transactions (B) it triggers and the NI receives a stop event (C)
NI completes ongoing message & ignores next request (D)
[Waveform: NI_stop_in, M1_cmd_valid; markers B, C, D]
SLIDE 22
example
after checking that there are no transactions in flight
program NI to single-step mode with message granularity (E) and continue (F)
the NI accepts a single write request (G)
and continue again (read request, H)
[Waveform: NI_stop_condition, NI_continue, M1_cmd_accept; markers E, F, G, H]
SLIDE 23
example
change debug granularity to word (data element) (I)
and continue 5 times – one command and four data handshakes (J, K)
[Waveform: M1_cmd_accept, M1_data_accept, NI_stop_granularity; markers I, J, K]
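The difference between message and element granularity in this example can be sketched as a grouping of handshakes per continue pulse. The sketch assumes a write request consisting of one command handshake followed by four data handshakes, as on this slide. The function name, the handshake labels and the message identifier are illustrative.

```python
def single_step(handshakes, granularity):
    """Group handshakes by the continue pulse that releases them.

    handshakes: list of (message_id, name) pairs in protocol order.
    granularity: "element" stops after every handshake,
                 "message" stops after the last handshake of each message.
    """
    if granularity not in ("element", "message"):
        raise ValueError(f"unsupported granularity: {granularity}")
    steps, group = [], []
    for i, (msg, name) in enumerate(handshakes):
        group.append(name)
        is_last = i + 1 == len(handshakes) or handshakes[i + 1][0] != msg
        if granularity == "element" or is_last:
            steps.append(group)
            group = []
    return steps


# One write request: a command followed by four data words (assumed labels).
write_request = [("req0", "cmd")] + [("req0", "wdata")] * 4

element_steps = single_step(write_request, "element")
message_steps = single_step(write_request, "message")
print(len(element_steps), len(message_steps))
```

At element granularity the request needs five continue pulses, matching the "continue 5 times" on the slide, while at message granularity a single continue releases the whole request.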
SLIDE 24
example
change debug sensitivity to EDI only (i.e. no single stepping) (L)
communication resumes at full speed after continue pulse (M)
all this time, the rest of the system could have been in functional mode
[Waveform: NI_continue, NI_stop_condition; markers L, M]
SLIDE 25
conclusions
debug scope – per channel (master-slave pair) – per connection (master with all its slaves)
debug granularity – data words (equivalently: valid/accept handshake) – request/response – transaction
all channels can be debugged or not, at any granularity, independently
required for distributed-shared memory debugging
debug architecture – re-uses existing functional & test infrastructures (e.g. scan chains) – simple programmable building blocks (monitors, TPRs) – general recipe to modify functional NI shell FSM for debug – very basic software API
SLIDE 26