A Comparison of Five Different Multiprocessor SoC Bus Architectures



SLIDE 1

A Comparison of Five Different Multiprocessor SoC Bus Architectures

Kyeong Keol Ryu, Eung Shin and Vincent J. Mooney III
School of Electrical and Computer Engineering
Georgia Institute of Technology, Atlanta, USA
{kkryu, eung, mooney}@ece.gatech.edu

SLIDE 2

Outline

  • Introduction
  • Motivation and Previous Work
  • Five Bus Architectures for SoC: BFBA, GBIA, GBIIA, CSBA, and CCBA
  • Application Examples: OFDM transmitter and MPEG2 decoder
  • Experiment Environment
  • Comparison in View of Algorithm and Architecture
  • Comparison of Throughput of the Bus Architectures
  • Conclusion

SLIDE 3

Introduction

[Figure: block diagram of a four-processor system on a PCB. Labels: MPC750_A to MPC750_D, each with SRAM (SRAM_A to SRAM_D), REGISTERS, and a Bi-FIFO (BI-FIFO_A to BI-FIFO_D), on CPU Bus A to CPU Bus D]

SLIDE 4

Motivation and Previous Work (I)

CoreConnect (IBM):

  • Processor Local Bus (PLB)
  • On-chip Peripheral Bus (OPB)

AMBA (ARM):

  • Advanced High-performance Bus (AHB)
  • Advanced Peripheral Bus (APB)

[Figure: Intellectual Property (IP) blocks IP1, IP2, and IP3 attached to a PLB, and IP1, IP2, and IP3 attached to an AHB]

SLIDE 5

Motivation and Previous Work (II)

Sonics μNetwork

  • TDMA arbitration
  • IP reuse and integration

Wishbone architecture (Silicore)

  • One bus for all
  • Supports multiple masters

In terms of bus topology, μNetwork and Wishbone are similar to AMBA and CoreConnect

SLIDE 6

Five Bus Architectures for 4 processor System (I)

(A) Bi-FIFO Bus Architecture (BFBA)
(B) Global Bus I Architecture (GBIA)

[Figure: block diagrams of the two architectures. Labels: MPC750_A to MPC750_D, each with SRAM (SRAM_A to SRAM_D), REGISTERS, and a Bi-FIFO (BI-FIFO_A to BI-FIFO_D), on CPU Bus A to CPU Bus D]

SLIDE 7

Five Bus Architectures for 4 processor System (II)

  • Global Bus II Architecture (GBIIA)
  • Crossbar Switch Bus Architecture (CSBA)
  • IBM CoreConnect Bus Architecture (CCBA)

SLIDE 8

Application Examples (I)

OFDM Transmitter

  • Block Diagram
  • Data Format: 32 guard samples and 128 data samples
  • Function Assignment

Reference: D. Kim and G. L. Stüber, "Performance of Multiresolution OFDM on Frequency-selective Fading Channels," IEEE Transactions on Vehicular Technology, vol. 48, no. 5, pp. 1740-1746, September 1999.

[Figure: function assignment timeline. Compute nodes Pro_A, Pro_B, Pro_C, and Pro_D against time, with blocks A1 to A4, B1 to B4, C1 to C4, and D1 to D4 staggered across the processors]
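The staggered block labels in the function-assignment timeline (A1, then B1 and A2, then B2, A3, and C1, and so on) are consistent with a classic four-stage pipeline. The sketch below is a minimal illustration under that reading, not the authors' scheduler: it assumes each letter names a processor stage, each number names a packet, and every stage takes one time slot. The function name `pipeline_schedule` is hypothetical.

```python
def pipeline_schedule(stages, packets):
    """Return one list per time slot with the stage/packet labels active in it.

    ASSUMPTION: stage s works on packet p during slot s + p (zero-based),
    i.e. every stage takes exactly one slot. The slides do not state this.
    """
    slots = []
    for t in range(len(stages) + packets - 1):
        active = []
        for s, name in enumerate(stages):
            p = t - s  # zero-based packet index handled by stage s in slot t
            if 0 <= p < packets:
                active.append(f"{name}{p + 1}")
        slots.append(active)
    return slots


if __name__ == "__main__":
    for t, active in enumerate(pipeline_schedule("ABCD", 4)):
        print(t, " ".join(active))
```

Under these assumptions, four packets pass through four stages in seven slots, and all four processors are busy only in the middle slot.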

SLIDE 9

Application Examples (II)

MPEG2 Decoder

Video Processing Example

16 x 16 pixel resolution, M = 1, N = 2

[Figure: two timelines of compute nodes Pro_A, Pro_B, Pro_C, and Pro_D against time, one for BFBA and GBIA and one for GBIIA, CSBA, and CCBA. Each timeline shows repeated SH, I, P sequences. SH: Sequence header, I: Intra decoding frame, P: Predictive decoding frame]

SLIDE 10

Experiment Environment

Co-simulation Environment

Seamless CVE

  • A co-simulator from Mentor Graphics

VCS

  • A Verilog HDL simulator from Synopsys

XRAY

  • A High-level debugger from Mentor Graphics

PowerPC C cross compiler

  • GCC

External Clock of PowerPC 750

  • 83.33 MHz (the internal clock speed can be much faster, e.g., 400 MHz)

SLIDE 11

Comparison in View of Algorithm and Architecture

Algorithm

OFDM Transmitter

  • Strong output-data dependency between functions using many local variables
  • Many short loops
  • Few global variables

MPEG2 Decoder

  • Many global variables for header information
  • Hierarchical data structure which has a long loop with many nested loops

Architecture

BFBA and GBIA

  • No method to access global data
  • Fast data transfer between processor blocks

GBIIA, CSBA, and CCBA

  • Efficient access of global data
SLIDE 12

Comparison of Throughput of the Bus Architectures (I)

OFDM Transmitter

[Bar chart: throughput in Mbps for BFBA, GBIA, GBIIA, CSBA, and CCBA]

Bus Architecture   Exe. Cycles/Packet   Exe. Time/Packet   Throughput
BFBA               378,348              4.5402 ms          1.1277 Mbps
GBIA               403,000              4.8360 ms          1.0588 Mbps
GBIIA              381,061              4.5727 ms          1.1197 Mbps
CSBA               380,199              4.5624 ms          1.1222 Mbps
CCBA               380,686              4.5682 ms          1.1208 Mbps

Reference: 128 data samples and 32 guard samples per packet
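The three columns reported for each bus architecture on this slide are related by the external clock rate. The sketch below shows one way to derive execution time and throughput from the cycle counts. Two values are assumptions rather than facts from the slides: the clock is taken as exactly 250/3 MHz (the slides say 83.33 MHz), and a packet is taken as 160 samples of 32 bits each. With these choices the results agree with the listed figures to within rounding of the last digit.

```python
# External PowerPC 750 clock. The slides give 83.33 MHz; 250/3 MHz (a 12 ns
# period) is ASSUMED here because it matches the listed execution times.
CLOCK_HZ = 250e6 / 3

# ASSUMPTION: 160 samples per packet (128 data + 32 guard) at 32 bits each.
# The sample width is not stated in the slides.
BITS_PER_PACKET = (128 + 32) * 32

# Execution cycles per packet for the OFDM transmitter, from the slides.
OFDM_CYCLES = {
    "BFBA": 378_348,
    "GBIA": 403_000,
    "GBIIA": 381_061,
    "CSBA": 380_199,
    "CCBA": 380_686,
}


def exec_time_ms(cycles, clock_hz=CLOCK_HZ):
    """Execution time per packet in milliseconds."""
    return cycles / clock_hz * 1e3


def throughput_mbps(cycles, bits=BITS_PER_PACKET, clock_hz=CLOCK_HZ):
    """Throughput in Mbps (10^6 bits per second)."""
    return bits * clock_hz / cycles / 1e6


if __name__ == "__main__":
    for name, cycles in OFDM_CYCLES.items():
        print(f"{name:6s} {exec_time_ms(cycles):.4f} ms  "
              f"{throughput_mbps(cycles):.4f} Mbps")
```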

SLIDE 13

Comparison of Throughput of the Bus Architectures (II)

MPEG2 Decoder

[Bar chart: throughput in Mbps for BFBA, GBIA, GBIIA, CSBA, and CCBA]

Bus Architecture   Exe. Cycles/Packet   Exe. Time/Packet   Throughput
BFBA               507,853              6.0942 ms          0.5041 Mbps
GBIA               527,545              6.3305 ms          0.4852 Mbps
GBIIA              377,562              4.5307 ms          0.6780 Mbps
CSBA               377,548              4.5306 ms          0.6781 Mbps
CCBA               378,181              4.5382 ms          0.6769 Mbps

Reference: 128 data samples and 32 guard samples per packet
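To make the gap between the two groups of architectures concrete, the MPEG2 cycle counts on this slide can be turned into speedups relative to the slowest design. The sketch below is a small illustration using only the cycle counts from the slides; the helper names `relative_speedup` and `fastest` are hypothetical.

```python
# Execution cycles per packet for the MPEG2 decoder, from the slides.
MPEG2_CYCLES = {
    "BFBA": 507_853,
    "GBIA": 527_545,
    "GBIIA": 377_562,
    "CSBA": 377_548,
    "CCBA": 378_181,
}


def relative_speedup(cycles, baseline):
    """Speedup of each architecture over `baseline` (baseline cycles / own cycles)."""
    base = cycles[baseline]
    return {name: base / count for name, count in cycles.items()}


def fastest(cycles):
    """Name of the architecture with the fewest cycles per packet."""
    return min(cycles, key=cycles.get)


if __name__ == "__main__":
    speedups = relative_speedup(MPEG2_CYCLES, baseline="GBIA")
    for name, value in sorted(speedups.items(), key=lambda kv: -kv[1]):
        print(f"{name:6s} {value:.3f}x")
    print("fastest:", fastest(MPEG2_CYCLES))
```

Because every design runs at the same clock, ratios of cycle counts equal ratios of throughput, so no packet size has to be assumed here.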

SLIDE 14

Conclusion

Five bus architectures evaluated

  • BFBA, GBIA, GBIIA, CSBA, and CCBA

Two application programs

  • OFDM transmitter and MPEG2 decoder

Pipeline or parallel operation improves performance

BFBA best for OFDM

  • pipelined applications

CSBA best for MPEG2

  • parallel applications

Bus architecture performance depends heavily on

  • distribution of computation load
  • algorithm style

Future work: combine the bus architectures with switching logic to maximize performance according to application characteristics