Embedded analytics delivers system-wide visibility for debug, safety, security and more...
Design and Reuse IP-SoC Days Shanghai 2017
Agenda
- Some obvious statements
- Some problems with existing approaches
- Key requirements
- The UltraSoC approach
- Some examples of performance analysis and debug
- Use cases
- Summary
21 September 2017
Some obvious statements
- SoCs have become increasingly complicated & are not going to get simpler
- Contain several processors, from different vendors
- Verified in isolation and come with a test suite
- Contain 100s of IP blocks
- Each verified in isolation
- Contain complex interconnects
- Verified for certain, identified conditions
- Software created by large, disparate teams
- If lucky, modules and subsystems are verified for certain, identified conditions
- All this has to successfully work together
- Understanding real world system behaviour is HARD!
Some problems with existing approaches
- Processor-centric, not system-centric
- Processors are a very small part of the overall system
- It’s very difficult to monitor:
- Bus behaviour, memory controllers, interac4ons between blocks
- There is very little analytics
- Just extracting raw data
- Intrusive
- Ad hoc
- Developing, but still essentially signal-based
- Hard to close timing
- In-field monitoring is not easy
Key requirements
- A system-centric, vendor-neutral debug and monitoring infrastructure
- One that enables access to different proprietary debug schemes
- Enables monitoring of interconnect, interfaces and custom logic
- Run-time configurable
- Re-use the hardware to provide visibility for different scenarios
- Run-time configuration of cross-triggering
- Support 10s if not 100s of cross-triggering events
- These can be interrogated after a problem to determine actual status
- Need to be power aware
- Built-in security
- Can be used during the whole development flow and in the field
UltraSoC embedded analytics architecture
[Figure: embedded analytics architecture. Analytic modules for the CPU, system interconnect and custom logic; a message infrastructure (upstream/downstream); and communicators linking across the SoC boundary to an external debugger and an external software API.]
1. Protocol-aware analysis modules with “smart” filters and trace, for system modules and interconnect
2. Optimized message-passing infrastructure
3. Communicators, e.g. USB, JTAG, streaming, on-chip
4. Visualization software
How does it work?
- Protocol-aware analysis modules
- Processors: ARM, MIPS, Ceva, RISC-V, + more
- Buses: AXI, CHI, NetSpeed, + more
- Filter, match, trigger, store, output
- Analysis done in hardware, on-chip
- Reduces need for high-speed off-chip transport
- Can be used in-system and in-field
- A choice of communicators
- To suit system requirements
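To make the filter, match, trigger, store, output chain concrete, here is a minimal software model of one analysis module. It is a sketch under stated assumptions: the `BusTransaction` record, the `BusMonitor` class and the address window are hypothetical, invented for illustration, and are not UltraSoC's hardware or API. The point it shows is why on-chip analysis reduces the need for off-chip bandwidth: non-matching traffic is discarded before anything is stored or output.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class BusTransaction:
    """Hypothetical bus transaction record (illustrative, not an UltraSoC type)."""
    time_ns: int
    master: str
    address: int
    is_write: bool
    size_bytes: int


@dataclass
class BusMonitor:
    """Toy software model of filter -> match -> trigger -> store -> output."""
    match: Callable[[BusTransaction], bool]
    trigger_after: int = 1
    stored: List[BusTransaction] = field(default_factory=list)
    match_count: int = 0
    triggered: bool = False

    def observe(self, txn: BusTransaction) -> None:
        # Filter/match: non-matching traffic is dropped "on-chip" and never stored.
        if not self.match(txn):
            return
        self.match_count += 1
        # Store: keep only the transactions of interest.
        self.stored.append(txn)
        # Trigger: flag once the match threshold is reached.
        if self.match_count >= self.trigger_after:
            self.triggered = True

    def output(self) -> List[BusTransaction]:
        # Output: hand over the stored records and clear the local store.
        out, self.stored = self.stored, []
        return out


# Example: watch DMA-1 writes into a hypothetical 64 KB address window.
WINDOW_LO, WINDOW_HI = 0x8000_0000, 0x8001_0000
mon = BusMonitor(
    match=lambda t: t.is_write and t.master == "DMA-1"
    and WINDOW_LO <= t.address < WINDOW_HI,
    trigger_after=2,
)
traffic = [
    BusTransaction(100, "CPU1", 0x8000_0040, True, 64),    # wrong master
    BusTransaction(200, "DMA-1", 0x8000_1000, True, 64),   # match
    BusTransaction(300, "DMA-1", 0x9000_0000, True, 64),   # outside window
    BusTransaction(400, "DMA-1", 0x8000_2000, False, 64),  # read, not write
    BusTransaction(500, "DMA-1", 0x8000_3000, True, 64),   # match -> trigger
]
for txn in traffic:
    mon.observe(txn)
captured = mon.output()
```

Only two of the five transactions leave the monitor; in hardware the same idea applies to millions of transactions per second.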
Example problems UltraSoC analytics solves
[Figure: example SoC block diagram. DDR3, DFI-PHY and DRAM controller; interconnects; RAM; DMA-1 and DMA-2; peripheral interconnect with USB MAC; two processors (I$, D$, I TCM, D TCM); DSPs with Turbo, FFT, timer, radio interfaces and DSP PHY; bus monitors and status monitors connected through the UltraSoC infrastructure to a debug hub and security block.]
- Why do some DMA transfers take too long?
- Why is the CPU not performing as fast as expected?
- What is going on with my memory controller?
- Why does the system hang or deadlock on rare occasions?
- What is the mismatch between the host & the DSP?
Example 1: “Where have my MIPS gone?”
[Figure: the example SoC block diagram, with a status monitor (SM) on the UltraSoC infrastructure highlighted.]
Why is the CPU not performing as fast as expected?
Example 1: “Where have my MIPS gone?”
[Chart: CPU spent cycles. Compute 80%, Stall (1 outstanding) 12%, Stall (2 outstanding) 8%.]
Why is the CPU not performing as fast as expected?
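A breakdown like the one in the chart can be derived from per-cycle status samples. This is a minimal sketch, assuming a status monitor reports for each CPU cycle whether the core is stalled and how many memory requests are outstanding; reading "Stall 1/2 outstanding" as "stalled with one/two outstanding requests" is an interpretation, and the `(stalled, outstanding)` sample format is hypothetical. The synthetic samples are constructed to reproduce the chart's split, not taken from real measurements.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical per-cycle sample: (stalled, outstanding_memory_requests).
# Interpreting the chart's "Stall N outstanding" this way is an assumption.
Sample = Tuple[bool, int]


def classify(sample: Sample) -> str:
    """Label one cycle as compute time or a stall with N requests outstanding."""
    stalled, outstanding = sample
    if not stalled:
        return "Compute"
    return f"Stall ({outstanding} outstanding)"


def cycle_breakdown(samples: Iterable[Sample]) -> Dict[str, float]:
    """Percentage of cycles spent in each category."""
    counts = Counter(classify(s) for s in samples)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


# Synthetic 100-cycle window built to mirror the 80/12/8 split in the chart.
samples = [(False, 0)] * 80 + [(True, 1)] * 12 + [(True, 2)] * 8
breakdown = cycle_breakdown(samples)
for label, pct in sorted(breakdown.items(), key=lambda kv: -kv[1]):
    print(f"{label:<24}{pct:5.1f}%")
```

Counting in hardware and exporting only the totals is what keeps this kind of analysis cheap compared with streaming a full instruction trace.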
Example 2: DDR bandwidth
[Figure: the example SoC block diagram, with two status monitors (SM) highlighted.]
Why do some DMA transfers take too long? What is going on with my memory controller?
Example 2: DDR bandwidth
Why do some DMA transfers take too long? What is going on with my memory controller?
- Look at I$ (instruction cache) read traffic from the compute engines
- The aggregate bandwidth from each is within spec
- But at time 2300 ns, the combined peak I$ read request rate is >2 GB/s, compared with an average of ~570 MB/s
[Chart: Windowed DDR traffic. Effective B/s vs. time in ns, for DSP1, DSP2, CPU1 and CPU2.]
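The windowed view matters because per-master averages can hide a short combined peak. A minimal sketch of the calculation, assuming read requests are available as hypothetical `(time_ns, master, bytes)` records: it bins traffic into fixed windows, converts each bin to bytes per second, and compares the combined peak with the combined average. The traffic is synthetic and only illustrates the method; it is not the data behind the chart.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record format: (time_ns, master, bytes_requested).
Request = Tuple[int, str, int]


def windowed_bandwidth(requests: Iterable[Request], window_ns: int) -> Dict[int, Dict[str, float]]:
    """Effective bytes/s per master, keyed by window start time (ns)."""
    bins: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for t, master, nbytes in requests:
        bins[(t // window_ns) * window_ns][master] += nbytes
    scale = 1e9 / window_ns  # bytes per window -> bytes per second
    return {start: {m: b * scale for m, b in per.items()}
            for start, per in sorted(bins.items())}


def peak_vs_average(requests: List[Request], window_ns: int, duration_ns: int) -> Tuple[int, float, float]:
    """Window with the highest combined B/s, its rate, and the combined average B/s."""
    per_window = windowed_bandwidth(requests, window_ns)
    combined = {start: sum(per.values()) for start, per in per_window.items()}
    peak_start = max(combined, key=combined.get)
    average = sum(nbytes for _, _, nbytes in requests) * 1e9 / duration_ns
    return peak_start, combined[peak_start], average


# Synthetic traffic (not measured data): each master issues a steady
# 64-byte read every 500 ns, then all four burst together at 2000-2450 ns.
MASTERS = ["DSP1", "DSP2", "CPU1", "CPU2"]
traffic = [(t, m, 64) for m in MASTERS for t in range(0, 10_000, 500)]
traffic += [(2000 + 50 * i, m, 64) for m in MASTERS for i in range(10)]

peak_start, peak_bps, avg_bps = peak_vs_average(traffic, window_ns=1000, duration_ns=10_000)
print(f"peak {peak_bps / 1e9:.2f} GB/s in window starting {peak_start} ns; "
      f"average {avg_bps / 1e6:.0f} MB/s")
```

Each master on its own averages under 200 MB/s here, yet the aligned burst pushes the combined rate to roughly four times the overall average in one window, the same shape of problem described above.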
Cross-triggering
[Figure: the embedded analytics architecture (analytic modules, message infrastructure, communicators), annotated with the cross-triggering sequence.]
1. If a write to an address range occurs, send event A
2. Configure to store trace in a ring buffer; stop and output on event A
3. Write detected: event A is broadcast to all modules in the chip
4. Event A received: output trace
5. Trace data displayed
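The five-step sequence can be modelled in a few lines. This is a minimal sketch, assuming hypothetical `EventBus`, `AddressWatch` and `RingBufferTrace` classes invented for illustration (not UltraSoC's API): one module broadcasts event A when a write hits the watched range, and a trace module that has been filling a fixed-size ring buffer stops and outputs only its recent history. The event-to-action route is set up at run time, which is the run-time configurability listed under the key requirements.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

# Hypothetical trace record: (time_ns, address, is_write).
Txn = Tuple[int, int, bool]


class EventBus:
    """Step 3: broadcasts a named event to every subscribed module."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[], None]]] = {}

    def subscribe(self, event: str, handler: Callable[[], None]) -> None:
        # Run-time configuration: routes can be added without new hardware.
        self._subscribers.setdefault(event, []).append(handler)

    def broadcast(self, event: str) -> None:
        for handler in self._subscribers.get(event, []):
            handler()


class AddressWatch:
    """Step 1: send `event` if a write lands in [lo, hi)."""

    def __init__(self, bus: EventBus, lo: int, hi: int, event: str) -> None:
        self.bus, self.lo, self.hi, self.event = bus, lo, hi, event

    def observe(self, txn: Txn) -> None:
        _, addr, is_write = txn
        if is_write and self.lo <= addr < self.hi:
            self.bus.broadcast(self.event)


class RingBufferTrace:
    """Step 2: keep the last `depth` records; step 4: stop and output on the event."""

    def __init__(self, depth: int) -> None:
        self.buffer: Deque[Txn] = deque(maxlen=depth)  # oldest entries drop off
        self.running = True
        self.output: List[Txn] = []

    def record(self, txn: Txn) -> None:
        if self.running:
            self.buffer.append(txn)

    def on_event(self) -> None:
        if self.running:
            self.running = False
            self.output = list(self.buffer)


bus = EventBus()
trace = RingBufferTrace(depth=4)
watch = AddressWatch(bus, lo=0x2000_0000, hi=0x2000_0100, event="A")
bus.subscribe("A", trace.on_event)

for t in range(10):
    addr = 0x2000_0010 if t == 7 else 0x1000_0000 + 4 * t
    txn = (t * 10, addr, True)
    trace.record(txn)   # the trace module sees every transaction...
    watch.observe(txn)  # ...and the watcher may fire event A

print(trace.output)  # step 5: the history leading up to the triggering write
```

Recording before observing means the triggering write is the last entry in the captured buffer, and nothing after the trigger is kept.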
The importance of cross-triggering
[Figure: ATB samples (ADATA, AID) without cross-triggering versus with cross-triggering, where only the data of interest is captured.]
- Gigabytes of trace data can be reduced to kilobytes
- In-field, events that only occur once a week can be captured and uploaded
- Cross-trigger events can be sourced from anywhere, even hardware signals
- Run-time selection is essential
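A back-of-the-envelope calculation shows where a gigabytes-to-kilobytes reduction can come from. All parameters below (bytes per trace beat, beat rate, capture time, ring-buffer depth) are hypothetical values chosen only to illustrate the scale; they are not UltraSoC figures.

```python
def continuous_trace_bytes(bytes_per_beat: int, beats_per_s: float, seconds: float) -> float:
    """Trace volume if every beat is streamed off-chip for the whole capture."""
    return bytes_per_beat * beats_per_s * seconds


def cross_triggered_bytes(bytes_per_beat: int, buffer_beats: int, triggers: int) -> int:
    """Trace volume if only one ring buffer's worth of history is output per trigger."""
    return bytes_per_beat * buffer_beats * triggers


# Hypothetical parameters: 4-byte beats at 100 million beats/s for 60 s,
# versus a single trigger that dumps a 1,024-entry ring buffer.
cont = continuous_trace_bytes(4, 100e6, 60)
xtrig = cross_triggered_bytes(4, 1024, 1)
print(f"continuous: {cont / 1e9:.1f} GB, cross-triggered: {xtrig / 1e3:.1f} kB")
```

The continuous figure grows with capture time, while the cross-triggered figure depends only on buffer depth and trigger count, which is why the approach suits rare in-field events.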
Use case 1: classic debug
Use case 2: in-field debugging and analysis
- Find the cause of rare problems
- Monitor ongoing performance
- Fix problems through upgrades
- Input to next-generation SoCs
Use case 3: bare-metal security and safety
Summary
- In complex SoCs
- Embedded analytics is essential
- A unified approach can save months of effort and a lot of money
- Embedded analytics hardware can be used for
- Classic lab debug
- In-field problem solving
- Lifetime analysis
- A separate domain to enhance security and safety