Building Blocks for PRU Development Overview Embedded Processing

Agenda
• PRU Hardware Overview
• PRU Firmware Development
• Linux Drivers Introduction

PRU Hardware Overview

ARM SoC Architecture
• L1 D/I caches: Single-cycle access
• L2 cache: Minimum latency of 8 cycles
• Access to on-chip SRAM: 20 cycles
• Access to shared memory over L3 Interconnect: 40 cycles
[Block diagram: ARM Subsystem (Cortex-A, L1 Instruction Cache, L1 Data Cache, L2 Data Cache, On-chip SRAM); L3 Interconnect; Shared Memory; Peripherals; L4 Interconnect; Peripherals; GP I/O]

ARM + PRU SoC Architecture
Access Times:
• Instruction RAM = 1 cycle
• DRAM = 3 cycles
• Shared DRAM = 3 cycles
[Block diagram: ARM Subsystem (Cortex-A, L1 Instruction Cache, L1 Data Cache, L2 Data Cache, On-chip SRAM); Programmable Real-Time Unit (PRU) Subsystem (PRU0 and PRU1 at 200MHz, each with Inst. RAM and Data RAM; Shared RAM; Interconnect; INTC; Peripherals; PRU0 I/O; PRU1 I/O); L3 Interconnect; Shared Memory; Peripherals; L4 Interconnect; Peripherals; GP I/O]

Programmable Real-Time Unit (PRU) Subsystem
• Programmable Real-Time Unit (PRU) is a low-latency microcontroller subsystem.
• Two independent PRU execution units:
– 32-Bit RISC architecture
– 200MHz; 5ns per instruction
– Single cycle execution; No pipeline
– Dedicated instruction and data RAM per core
– Shared RAM
• Includes Interrupt Controller for system event handling
• Fast I/O interface: Up to 30 inputs and 32 outputs on external pins per PRU unit.
[PRU Subsystem Block Diagram: PRU0 Core (IRAM0) and PRU1 Core (IRAM1), each with 32 GPO and 30 GPI; Data RAM0; Data RAM1; Shared RAM; Scratchpad; MPY/MAC; 32-bit Interconnect bus; Master I/F (to SoC interconnect); Slave I/F (from SoC interconnect); Industrial Ethernet MII0 RX/TX and MII1 RX/TX; MDIO; IEP (Timer); UART; eCAP; Interrupt Controller (INTC) with Events to ARM INTC and Events from Peripherals + PRUs]
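The "200MHz; 5ns per instruction" figure can be turned into a small timing helper. The following is a minimal sketch that compiles on a host, not PRU firmware; the names pru_cycles_to_ns and pru_instructions_in_ns are hypothetical, and it assumes every instruction takes exactly one cycle, as the slide states.

```c
#include <assert.h>
#include <stdint.h>

/* Core clock quoted on the slide. */
#define PRU_CLOCK_MHZ 200u

/* Convert a PRU cycle count to nanoseconds (1 cycle = 1000 / MHz ns).
 * A 64-bit intermediate avoids overflow for large cycle counts. */
static inline uint32_t pru_cycles_to_ns(uint32_t cycles, uint32_t clock_mhz)
{
    assert(clock_mhz > 0);
    return (uint32_t)(((uint64_t)cycles * 1000u) / clock_mhz);
}

/* Number of single-cycle instructions that fit in a time budget (in ns). */
static inline uint32_t pru_instructions_in_ns(uint32_t budget_ns, uint32_t clock_mhz)
{
    assert(clock_mhz > 0);
    return (uint32_t)(((uint64_t)budget_ns * clock_mhz) / 1000u);
}
```

With these helpers, pru_cycles_to_ns(3, PRU_CLOCK_MHZ) gives the 15 ns cost of a 3-cycle access, and pru_instructions_in_ns(1000, PRU_CLOCK_MHZ) gives the 200 instructions available in a 1 µs control loop.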

Now let’s go a little deeper…

PRU Functional Block Diagram

Constant Table
• Eases software development by providing frequently used constants
• Peripheral base addresses
• Few entries programmable

General Purpose Registers
• All instructions are performed on registers and complete in a single cycle.
• Register file appears as linear block for all register-to-memory operations.

Execution Unit
• Logical, arithmetic, and flow control instructions
• Scalar, no Pipeline, Little Endian
• Register-to-register data flow
• Addressing modes: Ld Immediate & Ld/St to Mem

Special Registers (R30 and R31)
• R30
– Write: 32 GPO
• R31
– Read: 30 GPI + 2 Host Int status
– Write: Generate INTC Event

Instruction RAM
• Typical size is a multiple of 4KB (or 1K Instructions)
• Can be updated with PRU reset

[Block diagram: PRU Execution Unit with CONST TABLE; registers R0, R1, R2 ... R29, R30 (32 GPO), R31 (30 GPI); Instruction RAM; INTC]
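The "4KB (or 1K Instructions)" sizing implies 4 bytes per instruction, which gives a quick way to estimate how much code fits in a given Instruction RAM. This is a host-side sketch under that assumption; iram_instruction_capacity is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Derived from the slide: 4 KB of IRAM holds 1K instructions. */
#define PRU_INSTRUCTION_BYTES 4u

/* Number of instructions that fit in an IRAM of the given size in KB. */
static inline uint32_t iram_instruction_capacity(uint32_t iram_kb)
{
    assert(iram_kb <= 1024u); /* keeps iram_kb * 1024 well inside 32 bits */
    return (iram_kb * 1024u) / PRU_INSTRUCTION_BYTES;
}
```

For example, an 8 KB Instruction RAM holds 2048 instructions and a 12 KB one holds 3072.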

Fast I/O Interface
[Diagram: ARM path to a device pin: Cortex A8, L3F, L3S, L4 PER, Peripherals (GPIO1, GPIO2, GPIO3 ...), GPIO 3.19, Pinmux, Device pin]

Fast I/O Interface
• Reduced latency through direct access to pins:
– Read or toggle I/O within a single PRU cycle
– Detect and react to I/O event within two PRU cycles
• Independent general purpose inputs (GPIs) and general purpose outputs (GPOs):
– PRU R31 directly reads from up to 30 GPI pins.
– PRU R30 directly writes up to 32 PRU GPOs.
• Configurable I/O modes per PRU core:
– GP input modes:
  • Direct input
  • 16-bit parallel capture
  • 28-bit shift
– GP output modes:
  • Direct output
  • Shift out
[Diagram: ARM path (Cortex A8, L3F, L3S, L4 PER, Peripherals GPIO1, GPIO2, GPIO3 ..., GPIO 3.19) and PRU Subsystem path (PRU output 5), both reaching the Pinmux and Device pin]
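The R30/R31 access pattern comes down to plain bit operations on two 32-bit values. The sketch below models the registers as ordinary integers so it can run on a host; gpo_toggle and gpi_is_high are hypothetical helper names, and the bit numbers in the usage note are arbitrary. In firmware built with TI's PRU compiler, the registers themselves are declared as `volatile register uint32_t __R30;` and `volatile register uint32_t __R31;`.

```c
#include <assert.h>
#include <stdint.h>

/* Return the R30 (GPO) value with output `bit` (0..31) toggled. */
static inline uint32_t gpo_toggle(uint32_t r30, unsigned bit)
{
    assert(bit < 32);
    return r30 ^ (UINT32_C(1) << bit);
}

/* Return 1 if input `bit` (0..29) is high in an R31 (GPI) snapshot, else 0. */
static inline int gpi_is_high(uint32_t r31, unsigned bit)
{
    assert(bit < 30); /* only 30 GPI bits are available per the slide */
    return (int)((r31 >> bit) & 1u);
}
```

In firmware the same logic would read something like `if (gpi_is_high(__R31, 3)) __R30 = gpo_toggle(__R30, 5);`, which is the read-then-write sequence behind the "detect and react within two PRU cycles" claim.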

GPIO Toggle: Bench Measurements
• ARM GPIO Toggle: ~200ns
• PRU IO Toggle: ~5ns
= ~40x Faster

Integrated Peripherals
• Provide reduced PRU read/write access latency compared to external peripherals
• No need for local peripherals to go through external L3 or L4 interconnects
• Can be used by PRU or by the ARM as additional hardware peripherals on the device
• Integrated peripherals:
– PRU UART
– PRU eCAP
– PRU IEP (Timer)
[Block diagram: Programmable Real-Time Unit (PRU) Subsystem with PRU0 and PRU1 (200MHz), each with Inst. RAM and Data RAM; Shared RAM; Interconnect; INTC; UART; eCAP; IEP (Timer)]

PRU Read Latencies: Local vs Global Memory Map
The PRU directly accessing internal MMRs (Local MMR Access) is faster than going through the L3 interconnects (Global MMR Access).

Latencies in PRU cycles @ 200MHz:

                   Local MMR Access   Global MMR Access
PRU R31 (GPI)      1                  N/A
PRU CTRL           4                  36
PRU CFG            3                  35
PRU INTC           3                  35
PRU DRAM           3                  35
PRU Shared DRAM    3                  35
PRU ECAP           4                  36
PRU UART           14                 46
PRU IEP            12                 44

Note: Latency values listed are “best-case” values.
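Encoding the table as data makes the cost of the global path easy to quantify. The following host-side sketch uses only the values listed above; the struct and function names are hypothetical, and R31 is omitted because it has no global entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mmr_latency {
    const char *name;
    uint32_t local_cycles;   /* Local MMR Access, PRU cycles @ 200MHz */
    uint32_t global_cycles;  /* Global MMR Access, PRU cycles @ 200MHz */
};

/* Best-case read latencies copied from the table. */
static const struct mmr_latency pru_read_latency[] = {
    {"PRU CTRL", 4, 36},  {"PRU CFG", 3, 35},         {"PRU INTC", 3, 35},
    {"PRU DRAM", 3, 35},  {"PRU Shared DRAM", 3, 35}, {"PRU ECAP", 4, 36},
    {"PRU UART", 14, 46}, {"PRU IEP", 12, 44},
};

#define PRU_READ_LATENCY_COUNT \
    (sizeof(pru_read_latency) / sizeof(pru_read_latency[0]))

/* Extra cycles paid by entry i when read through the global map. */
static inline uint32_t global_penalty_cycles(size_t i)
{
    assert(i < PRU_READ_LATENCY_COUNT);
    return pru_read_latency[i].global_cycles - pru_read_latency[i].local_cycles;
}

/* 200MHz means 5 ns per cycle. */
static inline uint32_t cycles_to_ns_200mhz(uint32_t cycles)
{
    return cycles * 5u;
}
```

Every listed entry pays the same 32-cycle penalty (160 ns at 200MHz) on the global path, so firmware on the PRU generally wants the local addresses.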

PRU “Interrupts”
• The PRU does not support asynchronous interrupts:
– However, specialized hardware and instructions facilitate efficient polling of system events.
– The PRU-ICSS can also generate interrupts for the ARM, other PRU-ICSS, and sync events for EDMA.
• From UofT CSC469 lecture notes: “Polling is like picking up your phone every few seconds to see if you have a call. Interrupts are like waiting for the phone to ring.
– Interrupts win if processor has other work to do and event response time is not critical
– Polling can be better if processor has to respond to an event ASAP”
• Asynchronous interrupts can introduce jitter in execution time and generally reduce determinism. The PRU is optimized for highly deterministic operation.
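Polling for a system event amounts to testing a status bit in R31 in a tight loop. The sketch below is a host-runnable model, not firmware: it assumes the two host interrupt status flags sit directly above the 30 GPI bits (bits 30 and 31), and the names HOST_INT_BIT, host_event_pending, and poll_host_event are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: host interrupt 0 maps to R31 bit 30, host interrupt 1 to bit 31. */
#define HOST_INT_BIT(host) (30u + (host))

/* Test one host interrupt status flag in an R31 snapshot. */
static inline bool host_event_pending(uint32_t r31, unsigned host)
{
    assert(host < 2);
    return (r31 & (UINT32_C(1) << HOST_INT_BIT(host))) != 0;
}

/* Poll a register until the flag is set or max_polls reads have been made.
 * Returns true if the event was seen. */
static inline bool poll_host_event(const volatile uint32_t *r31, unsigned host,
                                   uint32_t max_polls)
{
    assert(r31 != 0);
    for (uint32_t i = 0; i < max_polls; ++i) {
        if (host_event_pending(*r31, host))
            return true;
    }
    return false;
}
```

Because each pass through the loop is a fixed number of single-cycle instructions, the worst-case response time is bounded and repeatable, which is the determinism argument made above. Real firmware would typically loop without the max_polls bound and then clear the event in the INTC.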

Sitara Device Comparison

Features                              AM18x/OMAPL138  AM335x       AM437x       AM437x       AM571x        AM572x (PG1.1)
Subsystem                             PRUSS           PRU-ICSS1    PRU-ICSS1    PRU-ICSS0    2 x PRU-ICSS  2 x PRU-ICSS
PRU core version                      1               3            3            3            3             3
Number of PRU cores (per subsystem)   2               2            2            2            2             2
Max frequency                         CPU freq / 2    200 MHz      200 MHz      200 MHz      200 MHz       200 MHz
IRAM size (per PRU core)              4 KB            8 KB         12 KB        4 KB         12 KB         12 KB
DRAM size (per PRU core)              512 B           8 KB         8 KB         4 KB         8 KB          8 KB
Shared DRAM size (per subsystem)      --              12 KB        32 KB        --           32 KB         32 KB
GPI Pins (PRU0, PRU1)                 30, 30          17, 17       13, 0        20, 20       21*, 21       21, 21
GPO Pins (PRU0, PRU1)                 32, 32          16, 16       12, 0        20, 20       21*, 21       21, 21
MPY/MAC                               N               Y            Y            Y            Y             Y
Scratchpad                            N               Y (3 banks)  Y (3 banks)  N            Y (3 banks)   Y (3 banks)
CRC16/32                              0               0            2            2            2             0
INTC                                  1               1            1            1            1             1
Peripherals                           n/a             Y            Y            Y            Y             Y
UART                                  0               1            1            1            1             1
eCAP                                  0               1            1            no connect   1             1
IEP                                   0               1            1            no connect   1             1
MII_RT                                0               2            2            no connect   2             2
MDIO                                  0               1            1            no connect   1             1

General purpose input (per PRU core):
– AM18x/OMAPL138: Direct
– AM335x, AM437x PRU-ICSS0: Direct; or 16-bit parallel capture; or 28-bit shift
– AM437x PRU-ICSS1, AM571x, AM572x: Direct; or 16-bit parallel capture; or 28-bit shift; or 3ch EnDat 2.2; or 9ch Sigma Delta

General purpose output (per PRU core):
– AM18x/OMAPL138: Direct
– All other devices: Direct; or Shift out

Simultaneous protocols (values as listed): 1 1 2** 2

* PRU-ICSS2 only. PRU-ICSS1 does not pin out the PRU0 core GPIs/GPOs.
** 2nd protocol limited to EnDat/Profibus/BiSS/Hiperface DSL or serial based protocol

Examples of how people have used the PRU…

Use Case Examples
• Industrial Protocols
• ASRC
• 10/100 Switch
• Smart Card
• DSP-like functions
• Filtering
• FSK Modulation
• LCD I/F
• Camera I/F
• RS-485
• UART
• SPI
• Monitor Sensors
• I2C
• Bit banging
• Custom/Complex PWM
• Stepper motor control
[Axis label: Development Complexity]

Not all use cases are feasible on PRU:
- Development complexity
- Technical constraints (e.g. running Linux on PRU)

PRU Firmware Development

TI PRU Code Generation Tools (CGT): C Compiler

C Compiler
• Developed and maintained by TI CGT team; Remains very similar to other TI compilers
• Full support of C/C++
• Adds PRU-specific functionality:
– Can take advantage of PRU architectural features automatically
– Contains several intrinsics: A list can be found in Compiler documentation
• Full instruction-set assembler for hand-tuned routines
For more information, refer to the PRU Optimizing C/C++ Compiler User’s Guide: http://www.ti.com/lit/spruhv7.

TI PRU CGT Assembly vs C
• Advantages of coding in Assembly over C:
– Code can be tweaked to save every last cycle and byte of RAM
– No need to rely on the compiler to make code deterministic
– Easily make use of scratchpad
• Advantages of coding in C over Assembly:
– More code reusability
– Can directly leverage kernel headers for interaction with kernel drivers
– Optimizer is extremely intelligent at optimizing routines
  • “Accelerating” math via MAC unit, implementing LOOP instruction, etc.
– Not mutually exclusive; Inline Assembly can be easily added to a C project

PRU Register Header Files
