SLIDE 1

Embedded analytics delivers system-wide visibility for debug, safety, security and more...

Design and Reuse IP-SoC Days – Shanghai 2017

SLIDE 2

Agenda

  • Some obvious statements
  • Some problems with existing approaches
  • Key requirements
  • The UltraSoC approach
  • Some examples of performance analysis and debug
  • Use cases
  • Summary

21 September 2017

SLIDE 3

Some obvious statements

  • SoCs have become increasingly complicated & are not going to get simpler
  • Contain several processors, from different vendors
  • Verified in isolation and come with test suite
  • Contain 100s of IP blocks
  • Each verified in isolation
  • Contain complex interconnects
  • Verified for certain, identified conditions
  • Software created by large disparate teams
  • If lucky, modules and subsystems verified for certain, identified conditions.
  • All this has to successfully work together
  • Understanding real world system behaviour is HARD!

SLIDE 4

Some problems with existing approaches

  • Processor-centric, not system-centric
  • Processors are a very small part of the overall system
  • It’s very difficult to monitor bus behaviour, memory controllers and interactions between blocks
  • There is very little analytics
  • Just extracting raw data
  • Intrusive
  • Ad hoc
  • Developing, but still essentially signal-based
  • Hard to close timing
  • In-field monitoring is not easy

SLIDE 5

Key requirements

  • A system-centric vendor-neutral debug and monitoring infrastructure

  • One that enables access to different proprietary debug schemes
  • Enables monitoring of interconnect, interfaces and custom logic
  • Run-time configurable
  • Re-use the hardware to provide visibility for different scenarios
  • Run-time configuration of cross-triggering
  • Support 10s if not 100s of cross-triggering events
  • These can be interrogated after a problem to determine actual status
  • Need to be power aware
  • Built-in security
  • Can be used during the whole development flow and in the field

SLIDE 6

UltraSoC embedded analytics architecture

[Figure: architecture block diagram. Labels: CPU, System interconnect, Custom logic; Analytic module (x3); Message infrastructure (Upstream, Downstream); Communicator (x2); SoC boundary; External software API; External debugger]

  • 1. Protocol-aware analysis modules with “smart” filters and trace (system modules and interconnect)
  • 2. Optimized message-passing infrastructure
  • 3. Communicators. Eg: USB, JTAG, streaming, on-chip
  • 4. Visualization software

SLIDE 7

How does it work?

  • Protocol-aware analysis modules
  • Processors: ARM, MIPS, Ceva, RISC-V, + more
  • Buses: AXI, CHI, NetSpeed, + more
  • Filter, match, trigger, store, output
  • Analysis done in hardware, on-chip
  • Reduces need for high-speed off-chip transport
  • Can be used in-system and in-field
  • A choice of communicators
  • To suit system requirements
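The "filter, match, trigger, store, output" sequence can be pictured with a small software model. This is only an illustrative sketch: the real analysis is done in hardware, and the class names, fields and address range below (`BusTransaction`, `AnalyticModule`, `match_lo`, `match_hi`) are hypothetical and are not UltraSoC's API.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class BusTransaction:
    """One observed bus transfer (hypothetical fields, for illustration only)."""
    time_ns: int
    master: str
    address: int
    is_write: bool
    size_bytes: int


class AnalyticModule:
    """Toy model of one monitor: filter -> store -> match -> trigger -> output."""

    def __init__(self, masters, match_lo, match_hi, depth=4):
        self.masters = set(masters)        # filter: only these masters are watched
        self.match_lo = match_lo           # match: writes inside [match_lo, match_hi)
        self.match_hi = match_hi
        self.store = deque(maxlen=depth)   # store: bounded history of filtered traffic
        self.output = []                   # output: messages that would leave the module

    def observe(self, txn):
        # Filter: drop traffic from masters we are not interested in.
        if txn.master not in self.masters:
            return False
        # Store: remember the transaction in the bounded history.
        self.store.append(txn)
        # Match: a write that falls inside the watched address range.
        matched = txn.is_write and self.match_lo <= txn.address < self.match_hi
        if matched:
            # Trigger and output: emit the stored history as one message.
            self.output.append(list(self.store))
            self.store.clear()
        return matched


module = AnalyticModule(masters={"DMA-1"}, match_lo=0x8000_0000, match_hi=0x8000_1000)
traffic = [
    BusTransaction(10, "CPU", 0x8000_0010, True, 4),      # filtered out (wrong master)
    BusTransaction(20, "DMA-1", 0x0000_1000, False, 64),  # stored, no match
    BusTransaction(30, "DMA-1", 0x8000_0040, True, 64),   # stored, match, triggers output
]
fired = [module.observe(t) for t in traffic]
```

Because only the matched history is emitted, the volume leaving the module is far smaller than the raw traffic, which is the point behind "reduces need for high-speed off-chip transport".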

SLIDE 8

Example problems UltraSoC analytics solves

[Figure: example SoC block diagram with DDR3, DFI-PHY, DRAM controller, interconnects, RAM, DMA-1, DMA-2, USB, MAC, Turbo, FFT, timer, radio interfaces, DSPs and two processors (each with I$, D$, I TCM, D TCM), instrumented with UltraSoC bus monitors, status monitors, debug hub and security IP]

  • Why do some DMA transfers take too long?
  • Why is the CPU not performing as fast as expected?
  • What is going on with my memory controller?
  • Why does the system hang or deadlock on rare occasions?
  • What is the mismatch between the host & the DSP?

SLIDE 9

Example 1: “Where have my MIPS gone?”

[Figure: the example SoC block diagram again, with one block labelled SM highlighted in the UltraSoC infrastructure]

Why is the CPU not performing as fast as expected?

SLIDE 10

Example 1: “Where have my MIPS gone?”

[Figure: the example SoC block diagram again, with one block labelled SM highlighted in the UltraSoC infrastructure]

[Chart: CPU spent cycles. Categories: Compute; Stall, 1 outstanding; Stall, 2 outstanding. Values shown: 80%, 12%, 8%]

Why is the CPU not performing as fast as expected?
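A breakdown like the one on this slide can be derived from raw cycle counters. The sketch below is illustrative only: the counter names and counts are invented, chosen so that the shares come out as the 80% / 12% / 8% split on the slide, on the assumption that the values are listed in the same order as the categories.

```python
def cycle_breakdown(counters):
    """Turn raw cycle counts per category into percentage shares of the total."""
    total = sum(counters.values())
    if total == 0:
        raise ValueError("no cycles counted")
    return {name: 100.0 * count / total for name, count in counters.items()}


# Hypothetical counter readout over one measurement interval (invented numbers).
counters = {
    "compute": 8_000,
    "stall_1_outstanding": 1_200,
    "stall_2_outstanding": 800,
}
shares = cycle_breakdown(counters)
```

Here `shares` attributes one fifth of the interval to stalls, which answers "where have my MIPS gone?" without tracing every instruction.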

SLIDE 11

Example 2: DDR bandwidth

[Figure: the example SoC block diagram again, with two blocks labelled SM highlighted]

Why do some DMA transfers take too long? What is going on with my memory controller?

SLIDE 12

Example 2: DDR bandwidth

Why do some DMA transfers take too long? What is going on with my memory controller?

  • Look at I$ from compute engines
  • Aggregate bandwidth from each is within spec
  • But at time 2300 ns, combined peak I$ read request of >2 GB/s, cf. average of ~570 MB/s

[Chart: Windowed DDR traffic. Y axis: effective B/s (0.00E+00 to 1.00E+09); X axis: time in ns (1000 to 49000); series: DSP1, DSP2, CPU1, CPU2]
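The windowed view behind this chart can be sketched in a few lines: bucket each observed transfer into a fixed time window, convert bytes per window into bytes per second, and compare per-source rates with the combined rate. The function names, the 1000 ns window and the transfer list are assumptions made for illustration; they are not measurements from the slide.

```python
from collections import defaultdict


def windowed_bandwidth(transfers, window_ns):
    """Bucket (time_ns, source, num_bytes) records into windows.

    Returns {window_start_ns: {source: bytes_per_second}}.
    """
    bytes_in_window = defaultdict(lambda: defaultdict(int))
    for time_ns, source, num_bytes in transfers:
        start = (time_ns // window_ns) * window_ns
        bytes_in_window[start][source] += num_bytes
    scale = 1e9 / window_ns  # bytes per window -> bytes per second
    return {
        start: {src: count * scale for src, count in per_source.items()}
        for start, per_source in sorted(bytes_in_window.items())
    }


def combined_peak(bandwidth):
    """Return (window_start_ns, total_bytes_per_second) for the busiest window."""
    totals = {start: sum(per_source.values()) for start, per_source in bandwidth.items()}
    peak_start = max(totals, key=totals.get)
    return peak_start, totals[peak_start]


# Invented traffic: four sources that each look modest on their own
# but happen to burst inside the same 1000 ns window.
transfers = [
    (2100, "DSP1", 512),
    (2300, "DSP2", 512),
    (2500, "CPU1", 512),
    (2700, "CPU2", 512),
    (5200, "DSP1", 256),
]
bw = windowed_bandwidth(transfers, window_ns=1000)
peak_start, peak_rate = combined_peak(bw)
```

In this made-up data each source stays at 512 MB/s in the busy window while the combined rate is four times that, which mirrors the slide's observation that per-engine bandwidth can be within spec while the combined peak is not.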

SLIDE 13

Cross-triggering

[Figure: the architecture block diagram (analytic modules, message infrastructure, communicators, external software and debugger), annotated with the numbered steps below]

  • 1. If write to an address range occurs, send event A
  • 2. Configure to store trace in ring buffer. Stop & output on event A
  • 3. Write detected. Event A broadcast to all modules in chip
  • 4. Event A received. Output trace
  • 5. Trace data displayed
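The five steps can be modelled in software as an address watch that raises an event and a trace module that keeps a ring buffer until that event arrives. This is a hedged sketch, not the hardware mechanism or any UltraSoC interface: `EventBus`, `AddressWatch`, `TraceModule`, the address range and the sample stream are all hypothetical.

```python
from collections import deque


class EventBus:
    """Stands in for the chip-wide event broadcast (step 3)."""

    def __init__(self):
        self.listeners = []

    def subscribe(self, callback):
        self.listeners.append(callback)

    def broadcast(self, event):
        for callback in self.listeners:
            callback(event)


class AddressWatch:
    """Step 1: raise an event when a write lands in [lo, hi)."""

    def __init__(self, bus, lo, hi, event="A"):
        self.bus, self.lo, self.hi, self.event = bus, lo, hi, event

    def on_write(self, address):
        if self.lo <= address < self.hi:
            self.bus.broadcast(self.event)


class TraceModule:
    """Step 2: keep trace in a ring buffer; stop and output on the event (step 4)."""

    def __init__(self, bus, depth, stop_event="A"):
        self.ring = deque(maxlen=depth)  # oldest samples are overwritten
        self.stop_event = stop_event
        self.stopped = False
        self.output = None
        bus.subscribe(self.on_event)

    def record(self, sample):
        if not self.stopped:
            self.ring.append(sample)

    def on_event(self, event):
        if event == self.stop_event and not self.stopped:
            self.stopped = True
            self.output = list(self.ring)  # step 5 would display this


bus = EventBus()
trace = TraceModule(bus, depth=3)
watch = AddressWatch(bus, lo=0x4000, hi=0x5000)

# Invented (program counter, write address) pairs observed in order.
stream = [(0x100, 0x1000), (0x104, 0x2000), (0x108, 0x3000),
          (0x10C, 0x4800), (0x110, 0x6000)]
for pc, write_address in stream:
    trace.record(pc)
    watch.on_write(write_address)
```

Only the last three samples leading up to the offending write are output, and nothing recorded after the event overwrites them.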
SLIDE 14

The importance of cross-triggering

[Figure: ATB samples (ADATA, AID) without cross-triggering and with cross-triggering; with cross-triggering, only data of interest is captured]

  • Gigabytes of trace data can be reduced to kilobytes
  • In-field, events that only occur once a week can be captured and uploaded
  • Cross-trigger events can be sourced from anywhere, even hardware signals
  • Run-time selection is essential

SLIDE 15

Use case 1: classic debug

SLIDE 16

Use case 2: in-field debugging and analysis

  • Find the cause of rate problems
  • Monitor ongoing performance
  • Fix problems through upgrades
  • Input to next-generation SoC

SLIDE 17

Use case 3: bare metal security and safety

SLIDE 18

Summary

  • In complex SoCs
  • Embedded analytics is essential
  • A unified approach can save months of effort and a lot of money
  • Embedded analytics hardware can be used for
  • Classic lab debug
  • In-field problem solving
  • Lifetime analysis
  • A separate domain to enhance security and safety
