Faraydon Karim ST Microelectronics La Jolla, CA 92121 Faraydon.karim@st.com
Faraydon Karim MPSoC02
Outline
Motivation
Network Processor Complexity
Methodology and Architecture

Motivation
Speed Requirement
[Figure: RISC processor, ASIC, and network processor compared on performance, configurability (evolving standards), and complexity of network functions, over line rates from OC-12 to OC-768.]
[Chart: MIPS required (scale 5,000 to 50,000) at OC3, OC12, OC48, and OC192 for L2 switching, L3 routing, QoS/CoS, monitoring, load balancing, firewall, VPN, intrusion detection, and virus scanning. Today's processors: 1-3K MIPS.]
Media            Cell/packet size (bytes)   Packets/sec      Time/packet
10 Mb Ethernet   64 - 1518                  14.88k - 0.8k    67.2 - 1,240 µs
100 Mb Ethernet  64 - 1518                  148k - 8k        6.72 - 124 µs
Gb Ethernet      64 - 1518                  1.48M - 80k      672 ns - 12.4 µs
OC-3             53                         ~300k            ~3.3 µs
OC-12            53                         ~1.2M            ~833 ns
OC-48            53                         ~4.8M            208 ns
OC-192           53                         ~19.2M           52 ns
OC-768           53                         ~76.8M           13 ns
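The Ethernet rows of the table can be reproduced with a short calculation. This is a minimal sketch, assuming each frame carries 20 extra bytes on the wire (8 bytes of preamble and start delimiter plus a 12-byte inter-frame gap); the function names are illustrative and do not come from the slides.

```python
# Per-frame wire overhead not counted in the frame size (assumption):
# 8 bytes preamble/start delimiter + 12 bytes inter-frame gap.
ETHERNET_OVERHEAD_BYTES = 20


def packets_per_second(link_bps: float, frame_bytes: int,
                       overhead_bytes: int = ETHERNET_OVERHEAD_BYTES) -> float:
    """Maximum back-to-back frame rate for a link speed and frame size."""
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    return link_bps / bits_on_wire


def time_per_packet_ns(link_bps: float, frame_bytes: int,
                       overhead_bytes: int = ETHERNET_OVERHEAD_BYTES) -> float:
    """Time available to process one frame, in nanoseconds."""
    return 1e9 / packets_per_second(link_bps, frame_bytes, overhead_bytes)


if __name__ == "__main__":
    links = [("10 Mb Ethernet", 10e6), ("100 Mb Ethernet", 100e6),
             ("Gb Ethernet", 1e9)]
    for name, bps in links:
        for size in (64, 1518):
            pps = packets_per_second(bps, size)
            ns = time_per_packet_ns(bps, size)
            print(f"{name:16s} {size:5d} B  {pps:12.0f} pkt/s  {ns:12.1f} ns")
```

With these assumptions a 64-byte frame on 10 Mb Ethernet gives about 14.88k packets/sec and 67.2 µs per packet, matching the table; the 1518-byte figures come out slightly below the rounded values on the slide.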
114 million packets/sec (44 bytes/packet)
Processing time < 9 ns/packet
Assumption: forwarding + classification = ~500 instructions
Requirement: 57 GIPS
Need for multiple GHz processors
Packet classification (Lakshman and Stiliadis, Proceedings of ACM SIGCOMM, Sept. 98): 50 memory accesses/packet
Requirement: 5.7 x 10^9 memory accesses/sec
Need for multiple memory components
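The arithmetic behind these requirements is simple enough to check directly. The sketch below uses only the numbers stated on the slide; the variable names are illustrative.

```python
# Figures taken from the slide.
PACKETS_PER_SEC = 114e6          # 114 million packets/sec
PACKET_BYTES = 44                # bytes per packet
INSTRUCTIONS_PER_PACKET = 500    # forwarding + classification (assumed on the slide)
MEM_ACCESSES_PER_PACKET = 50     # packet classification

# Time budget per packet, in nanoseconds.
time_budget_ns = 1e9 / PACKETS_PER_SEC

# Required instruction and memory-access rates.
instructions_per_sec = PACKETS_PER_SEC * INSTRUCTIONS_PER_PACKET
mem_accesses_per_sec = PACKETS_PER_SEC * MEM_ACCESSES_PER_PACKET

# Implied line rate for back-to-back 44-byte packets.
line_rate_gbps = PACKETS_PER_SEC * PACKET_BYTES * 8 / 1e9

if __name__ == "__main__":
    print(f"time budget : {time_budget_ns:.2f} ns/packet")
    print(f"instructions: {instructions_per_sec / 1e9:.1f} GIPS")
    print(f"memory      : {mem_accesses_per_sec:.2e} accesses/sec")
    print(f"line rate   : {line_rate_gbps:.1f} Gbps")
```

The packet rate works out to roughly 40 Gbps of 44-byte packets, which leaves under 9 ns per packet and gives the 57 GIPS and 5.7 x 10^9 accesses/sec figures.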
~5.7GIPS for OC-192 . . . and getting worse
data comes in at 10Gbps (OC-192) and 40Gbps (OC-768)
frame doesn’t depend on previous or next one
driven by data (operand) availability asynchrony
Well known properties
Existing processors are well defined
Simulation with established benchmarks
Application space known ... However, very complex set of functions:
packet classification, forwarding, scheduling
Properties to verify not all known
Evolving standards
Can test suites be developed?
Identify frames based on information such as protocol, destination/source address, etc.
Queue frames awaiting further processing (prioritization)
Meet delay/jitter requirements
Tag frames for processing in subsequent devices
Source: Agere, Inc
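The identify, queue, and tag steps listed above can be sketched as follows. This is an illustration only: the `Frame` fields, the rule table, and the priority values are hypothetical and are not taken from the slides or from any Agere product.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Frame:
    protocol: str
    src: str
    dst: str
    payload: bytes = b""


# Hypothetical rule table: (predicate, priority, tag). Lower priority is served first.
RULES = [
    (lambda f: f.protocol == "udp" and f.dst.startswith("10.0.1."), 0, "voice"),
    (lambda f: f.protocol == "tcp", 1, "data"),
]
DEFAULT_CLASS = (2, "best-effort")


def classify(frame: Frame):
    """Identify a frame from its protocol and addresses; return (priority, tag)."""
    for predicate, priority, tag in RULES:
        if predicate(frame):
            return priority, tag
    return DEFAULT_CLASS


class PriorityFrameQueue:
    """Queue frames awaiting further processing, highest priority first."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # keeps arrival order within one priority level

    def enqueue(self, frame: Frame) -> str:
        priority, tag = classify(frame)
        heapq.heappush(self._heap, (priority, next(self._seq), tag, frame))
        return tag

    def dequeue(self):
        """Return (tag, frame); the tag travels with the frame to later devices."""
        _, _, tag, frame = heapq.heappop(self._heap)
        return tag, frame

    def __len__(self):
        return len(self._heap)
```

A strict-priority queue is the simplest choice for the sketch; meeting real delay and jitter requirements would normally need a scheduler that also bounds how long lower classes wait.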
Inter-packet locality is poor, so the uP cache does not help.
A lot of pointer-chasing which requires
Cache thrashing
uP stalls during these indirections
Branches: > 90% taken for DSP
50/50 for some network applications
where the traditional microprocessors can't.
additions and tying them together the same old-fashioned way.
differentiation purpose.
difficult to program.
[Block diagram: micro processor, eight nano processors, host bus interface unit, network interface unit, circular buffer, Octagon connection, IPA-TLC, memory controller & buffers, two 10Mb/100Mb/1Gb Ethernet MACs, ATM, SONET, and PHYs, on a 128-bit CPIX bus (166 MHz).]
[Block diagram: register file, system registers, decode unit, branch processor, multithread buffers, special hardware units, and circular buffer addressing.]
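The slide names circular buffer addressing without describing it. As a generic illustration of the idea, and not the nano processor's actual mechanism, the sketch below wraps free-running read and write counters onto a power-of-two buffer with a bit mask; the class and method names are hypothetical.

```python
class CircularBuffer:
    """Fixed-size FIFO whose slot index wraps with a mask instead of a divide."""

    def __init__(self, size: int):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self._mask = size - 1
        self._slots = [None] * size
        self._head = 0  # free-running count of items read
        self._tail = 0  # free-running count of items written

    def __len__(self) -> int:
        return self._tail - self._head

    def push(self, item) -> None:
        if len(self) == len(self._slots):
            raise OverflowError("buffer full")
        # Wrap the write position onto the physical slot.
        self._slots[self._tail & self._mask] = item
        self._tail += 1

    def pop(self):
        if self._head == self._tail:
            raise IndexError("buffer empty")
        item = self._slots[self._head & self._mask]
        self._head += 1
        return item
```

Masking is what makes this cheap in hardware: the address generator only drops the upper bits of a counter, so no comparison or modulo is needed on the access path.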
Network Processor using Octagon
[Figure: eight-node Octagon; each node pairs a processor with a memory (P0/M0 through P7/M7).]
Octagon Node Model
[Figure: node containing request generator, processor, memory, arbiter, scheduler, and ingress/egress MUX/DEMUX.]
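The slides show the Octagon topology but not its routing rule. The sketch below assumes the commonly described scheme: each of the eight nodes links to its two ring neighbours and to the node directly across, and a packet is steered by the relative address (destination minus current node, modulo 8). Treat both the link set and the rule as assumptions; the function names are illustrative.

```python
NODES = 8


def next_hop(current: int, dest: int) -> int:
    """Pick the neighbour to forward to, based on the relative address."""
    rel = (dest - current) % NODES
    if rel == 0:
        return current                    # already at the destination
    if rel in (1, 2):
        return (current + 1) % NODES      # clockwise ring link
    if rel in (6, 7):
        return (current - 1) % NODES      # counter-clockwise ring link
    return (current + 4) % NODES          # link to the node across


def route(src: int, dest: int) -> list:
    """Full path from src to dest, including both endpoints."""
    path = [src]
    while path[-1] != dest:
        path.append(next_hop(path[-1], dest))
    return path


if __name__ == "__main__":
    for dest in range(NODES):
        print(0, "->", dest, ":", route(0, dest))
```

Under these assumptions no pair of nodes is more than two hops apart, for example node 0 reaches node 3 by crossing to node 4 and stepping back one position.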
[Figure: system design flow. Labels: System function; Domain-specific modelling tools; System H/W Architecture Evaluation/Partitioning; System S/W Architecture; S/W design; H/W design; Interface design; Logic, MCU, DSP, DRAM, ADC, DAC, Analog; Device Drivers; C compiler; Source-level debug; RTOS; Instruction-set sim (Function -> cycle); H/W-S/W cosim; Transaction -> Cycle HW/SW Performance eval.; Cycle-based spec signoff; H/W emulation; PLD board; RTL-to-layout Tools; System integration.]
Property checking of well-established properties
Validation test suites of known processor functionalities
Simulation with established benchmarks
Application space known ... However, very complex set of functions:
packet classification, forwarding, scheduling
Properties to verify not all known
Evolving standards
Can test suites be developed?
processor: formal and simulation-based techniques for a single processor
hw/sw co-designs: co-simulation of single processor-based co-designs
Multiple embedded processors
Multi-threading, parallel processing, pipelining
Mix of homogeneous and non-homogeneous processors: nano-processors and control processor
Multiple co-processors/hardware accelerators for packet forwarding, packet classification, queue management
Complex set of application, firmware, and development software
Need for comprehensive set of software debugging tools
Need for real-time verification through hardware prototyping environments
[Figure: software and tools environment. Labels: Third party Routing Applications; API Library; Optimized Firmware Library; Embedded RTOS; NPU Programmer's Model; NanoPU Compiler, Assembler, Linker; NanoPU debugger; Instruction-set Simulator; Cycle-accurate ISS/Network Simulator; Network Models; Architecture Performance Analysis; H/W Prototyping Environment.]
Octagon versus crossbar wiring (each entry: number of links / max length in mm)

                              8 nodes    15 nodes    22 nodes (shown)
Octagon   Horizontal links    12/8       24/8        36/8
          Vertical links      12/0.156   24/0.156    36/0.156
Crossbar  Horizontal links    8/8        15/16       22/22
          Vertical links      32/0.108   120/0.192   242/0.276
Long interconnects in high-speed/low-voltage DSM SoCs
Use of GHz, very long, nanometer interconnects
[Plot: glitch height (V) versus interconnect length (mm) for 0.1 µm, 0.18 µm, 0.25 µm, and 0.35 µm processes.]
[Plot: delay time (ns) versus interconnect length (mm) for 0.1 µm, 0.13 µm, 0.18 µm, 0.25 µm, and 0.35 µm processes.]
Abstraction Layer    Modeling Type
Silicon              Circuit
Gate-Level           Boolean
RTL / u-Arch         Cycle-based
Architecture         Processes
Functional           Functional
Analyze all its components
Application, Scheduling, Architecture
Find parallelizable functions
Select the components for desired cost/performance
Edge Router IP Packet processing steps
Egress Module: Traffic manager block
Status memory for WRED [Thr1, Thr2, AvgQsize, …]
Out queue lookup table
Status memory for MDRR
Output queuing memory [packbuff ptr]
Output packet buffer memory
Queues servicing (and shaping): DRR, MDRR
Congestion control: WRED / drop tail
Packet classifying: [CoS, OutPrt] → out queue
Switch fabric interface
Switch fabric scheduler protocol
Packet re-assembly
From the switch fabric (CSIX-L1 interface): header + payload, packetbuff ptr
Packet to L2 output port interface
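The WRED status memory above holds two thresholds and an average queue size. A minimal sketch of how such state is typically used follows, assuming Thr1 and Thr2 are the minimum and maximum thresholds of standard RED; the averaging weight, `max_p`, and function names are illustrative values, not taken from the slides.

```python
def update_avg_qsize(avg_qsize: float, current_qsize: float,
                     weight: float = 0.002) -> float:
    """Exponentially weighted moving average of the queue length."""
    return (1.0 - weight) * avg_qsize + weight * current_qsize


def wred_drop_probability(avg_qsize: float, thr1: float, thr2: float,
                          max_p: float = 0.1) -> float:
    """Drop probability for one class, given its two thresholds.

    Below thr1 nothing is dropped, at or above thr2 everything is dropped,
    and in between the probability rises linearly up to max_p.
    """
    if thr1 >= thr2:
        raise ValueError("thr1 must be smaller than thr2")
    if avg_qsize < thr1:
        return 0.0
    if avg_qsize >= thr2:
        return 1.0
    return max_p * (avg_qsize - thr1) / (thr2 - thr1)


# The "weighted" part: each drop precedence gets its own thresholds
# (hypothetical values, in packets).
PROFILES = {
    "low_drop_precedence": (40, 80),
    "high_drop_precedence": (20, 60),
}

if __name__ == "__main__":
    avg = 50.0
    for name, (thr1, thr2) in PROFILES.items():
        print(name, wred_drop_probability(avg, thr1, thr2))
```

At the same average queue size, packets marked with higher drop precedence see a higher drop probability, which is how policing upstream influences congestion control here.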
Edge Router IP Packet processing steps
Ingress Module: Packet Processor
Packet transfer to traffic manager engine
Forwarding packet preparation (FWH + updated header + payload)
Packet modification
  Header updating: CoS field, TTL decrementing, checksum calculation
  Forwarding header (FWH) preparation (CoS, drop prec.,
IP traffic conditioning and statistics
  Policing/metering (token bucket, marking non-conforming packets, drop precedence value assigned)
IP packet classification/filtering (multifield, MF)
Packet parsing and lookup key preparation
IP packet/header validation:
  Header length field check
  Packet length & min. length check
  Protocol version number
  Header checksum
  IP packet lifetime control
Statistics memory
Metering
Packet flows status memory [flow parameters, token counters status]
Lookup engine
Lookup tables and ACLs
Packet buffer memory [packet header + payload]
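The header validation and header updating steps listed above can be sketched for IPv4 as follows. This is an illustration in software of checks the slides assign to the ingress packet processor; the function names are hypothetical, and dropping a packet is represented by returning `None`.

```python
import struct
from typing import Optional


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, complemented."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def validate_and_update(packet: bytes) -> Optional[bytes]:
    """Validate an IPv4 packet, decrement TTL, and recompute the checksum."""
    if len(packet) < 20:                    # minimum length check
        return None
    version, ihl = packet[0] >> 4, packet[0] & 0x0F
    if version != 4 or ihl < 5:             # protocol version, header length field
        return None
    header_len = ihl * 4
    total_len = struct.unpack("!H", packet[2:4])[0]
    if total_len < header_len or total_len > len(packet):   # packet length check
        return None
    header = bytearray(packet[:header_len])
    if ipv4_checksum(bytes(header)) != 0:   # a valid header sums to zero
        return None
    if header[8] <= 1:                      # lifetime control: TTL would expire
        return None
    header[8] -= 1                          # TTL decrementing
    header[10:12] = b"\x00\x00"             # clear the old checksum
    struct.pack_into("!H", header, 10, ipv4_checksum(bytes(header)))
    return bytes(header) + packet[header_len:]
```

The checksum is recomputed in full for clarity; hardware or optimized firmware would more likely patch it incrementally, since only the TTL byte changes.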
IP packet flow within the ingress packet processor/classifier block
Packet from PHY (SPI-n interface): sequence of data chunks (64 bytes)
Packet header + payload
Packet header
Local delivery
Table lookup and ACLs update (PCI)
Entry: [Source/Dest addr., ToS field, Protocol Type, TCP/UDP Source/Dest port…]
Flow identified: (flow info record: QoS tag, egress port/line card, PTRs to metering and statistics mems)
Pkt non conforming
Packet to traffic manager: SPI-n modified or streaming interface (NP Forum), sequence of data chunks (64 bytes)
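The non-conforming packet path in this flow comes from the token bucket policer named in the ingress steps. A minimal sketch of a single-rate token bucket follows; the rate, burst size, and class name are hypothetical, and time is passed in explicitly so the behaviour is deterministic.

```python
class TokenBucket:
    """Single-rate token bucket meter; tokens are counted in bytes."""

    def __init__(self, rate_bytes_per_sec: float, burst_bytes: float):
        self.rate = rate_bytes_per_sec
        self.burst = burst_bytes
        self.tokens = float(burst_bytes)   # bucket starts full
        self.last_time = 0.0

    def conforms(self, packet_len: int, now: float) -> bool:
        """Refill for the elapsed time, then try to pay for the packet."""
        elapsed = now - self.last_time
        self.last_time = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if packet_len <= self.tokens:
            self.tokens -= packet_len
            return True
        return False


def police(bucket: TokenBucket, packet_len: int, now: float) -> str:
    """Mark a packet instead of dropping it; later stages act on the mark."""
    if bucket.conforms(packet_len, now):
        return "conforming"
    return "non-conforming"   # e.g. assigned a higher drop precedence


if __name__ == "__main__":
    meter = TokenBucket(rate_bytes_per_sec=1000, burst_bytes=1500)
    for t, size in [(0.0, 1500), (0.0, 64), (0.1, 64)]:
        print(t, size, police(meter, size, t))
```

Per-flow token counters like `tokens` and `last_time` are what the packet flows status memory would hold for each metered flow.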
[Block diagram: control processor, nano-processor bank, special purpose processors, Voyagers, system registers, Octagon connection, circular buffer, pointer logic & lookup table, checksum & policy key, IP-TLC, RAMs, boot ROM, DMAs, MACs, interface control, network interface, PCI interface, and CSIX.]