SLIDE 1

Development of a Functional Prototype of the Quad-Core NGMP Space Processor

DASIA 2012

May 14th, 2012

www.aeroflex.com/gaisler

SLIDE 2

Contents

  • Contents:
    – NGMP project overview
    – NGMP architecture
    – NGFP functional prototype overview
    – NGFP usage
    – Current status
    – Conclusions and remaining work

SLIDE 3

NGMP Project Overview

  • NGMP is an ESA activity developing a multiprocessor system with higher performance than earlier generations of European space processors.

  • Part of the ESA roadmap for standard microprocessor components
  • Aeroflex Gaisler's assignment consists of the specification, the architectural (VHDL) design, and verification by simulation and on FPGA. The goal of this work is to produce a verified gate-level netlist for a suitable technology.

  • As an additional step in the development of the NGMP, a functional prototype ASIC, the “NGFP”, is being developed, also under ESA contract. It is presented here.

SLIDE 4

NGMP Architecture Overview (1/2)

[Block diagram: four LEON4FT cores, each with a GRFPC and an L1 cache, and one GRFPU shared by each pair of cores. DMA masters connect through an IOMMU. A 128-bit CPU bus leads to the Level 2 cache, and a 128-bit memory bus leads to the memory controller with EDAC and scrubber, which drives either DDR2 or PC133 SDRAM (selected by mem_ifsel). A separate connection goes to the low-speed slaves.]

SLIDE 5

NGMP Architecture Overview (2/2)

[Block diagram as on slide 4: quad-core LEON4FT with pairwise-shared GRFPUs, IOMMU, 128-bit CPU bus, Level 2 cache, scrubber and memory controller for DDR2 or PC133 SDRAM]

  • Quad-core LEON4
  • GRFPU, shared pairwise
  • L1 and L2 caches
  • Memory controller
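Pairwise FPU sharing can be shown with a small C model. It is a hypothetical sketch. The slides do not describe the GRFPU arbitration algorithm, so round-robin under contention is an assumption. `fpu_arbiter`, `fpu_arbitrate` and `stalls_for_fpc0` are illustrative names, not a Gaisler API. The model shows why a CPU that uses the FPU alone sees no slowdown, while two CPUs issuing FP operations at the same time must share it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one GRFPU shared by two FPCs (CPU 0 and CPU 1).
 * Round-robin arbitration under contention is an assumption. */
typedef struct {
    int last_grant; /* FPC granted most recently: 0 or 1 */
} fpu_arbiter;

/* Returns the FPC (0 or 1) granted the FPU this cycle, or -1 if idle. */
int fpu_arbitrate(fpu_arbiter *a, bool req0, bool req1)
{
    assert(a->last_grant == 0 || a->last_grant == 1);
    int grant;
    if (req0 && req1)
        grant = 1 - a->last_grant; /* both request: alternate */
    else if (req0)
        grant = 0;                 /* lone requester is served at once */
    else if (req1)
        grant = 1;
    else
        return -1;                 /* no FP operation pending */
    a->last_grant = grant;
    return grant;
}

/* Counts cycles in which FPC 0 requests the FPU but is not granted it. */
int stalls_for_fpc0(fpu_arbiter *a, const bool *req0, const bool *req1, int cycles)
{
    int stalls = 0;
    for (int c = 0; c < cycles; c++) {
        int g = fpu_arbitrate(a, req0[c], req1[c]);
        if (req0[c] && g != 0)
            stalls++;
    }
    return stalls;
}
```

This is a worst case in which both CPUs request the FPU every cycle. Real code interleaves FP instructions with integer work, and the FPC lets the CPU continue in parallel, so actual contention is much lower than this model suggests.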

SLIDE 6

NGMP Overview – LEON4FT and GRFPU

  • LEON4FT
    – IEEE-1754 SPARC V8 compliant 32-bit processor
    – 7-stage pipeline, multiprocessor support
    – 64- or 128-bit AHB bus interface
    – Compare-and-swap (CASA) instruction support
    – 1.7 DMIPS/MHz, 0.6 Whetstone MFLOPS/MHz
    – Estimated 0.35 SPECint/MHz, 0.25 SPECfp/MHz
    – 2.1 CoreMark/MHz (comparable to ARM11)
  • GRFPU
    – High-performance FPU integrated into the LEON4 pipeline
    – Hardware DIV and SQRT
    – The floating-point controller (FPC) decouples FP operations from the pipeline, allowing the FPU and CPU to work in parallel
    – Each FPU is arbitrated between two FPCs, saving significant area at the cost of a few percent in performance (no reduction at all if only one CPU uses the FPU)
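Compare-and-swap is the usual building block for spinlocks and lock-free updates in SMP software. The sketch below uses portable C11 `<stdatomic.h>`. Whether the compiler turns `atomic_compare_exchange_*` into the LEON4 CASA instruction depends on the toolchain and target options; that is an assumption, not something the slides state. `cas_lock` and `cas_fetch_add` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock built on compare-and-swap. */
typedef struct {
    atomic_int word; /* 0 = free, 1 = taken */
} cas_lock;

void cas_lock_init(cas_lock *l) { atomic_init(&l->word, 0); }

/* One attempt: succeeds only if the word is 0, atomically setting it to 1. */
bool cas_lock_try(cas_lock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&l->word, &expected, 1);
}

/* Spin until the compare-and-swap succeeds. */
void cas_lock_acquire(cas_lock *l)
{
    while (!cas_lock_try(l)) {
        /* busy-wait: another core holds the lock */
    }
}

void cas_lock_release(cas_lock *l)
{
    assert(atomic_load(&l->word) == 1); /* must be held by the caller */
    atomic_store(&l->word, 0);
}

/* Lock-free add: retry until no other core changed *v in between.
 * On failure, the CAS reloads 'old' with the current value. */
int cas_fetch_add(atomic_int *v, int delta)
{
    int old = atomic_load(v);
    while (!atomic_compare_exchange_weak(v, &old, old + delta)) {
        /* retry with the refreshed 'old' */
    }
    return old;
}
```

The retry loop in `cas_fetch_add` is the general pattern. Read the value, compute the new value, and let the compare-and-swap commit it only if memory still holds what was read.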

SLIDE 7

NGMP Overview – Caches (1/2)

  • Level 1 cache
    – Separate L1 cache integrated into each LEON4 core
    – Multi-way set-associative, with configurable LRU/LRR/RND replacement policy
    – Write-through operation
    – Bus snooping and physical tags to maintain coherency
  • Level 2 cache
    – Designed as a bridge in the bus topology
    – Highly configurable caching behavior
    – Supports copy-back operation
    – Ways can be locked, allowing part or all of the cache to be used as on-chip RAM
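The replacement policies differ only in how the victim way is chosen on a miss. LRU (least recently used) updates its state on every hit. LRR (least recently replaced) evicts the line that was filled earliest. RND (random) is omitted here. The C model below is a hypothetical 2-way set, not the LEON4 cache implementation, and all names in it are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define WAYS 2 /* hypothetical 2-way set; real L1 geometry is configurable */

typedef enum { POLICY_LRU, POLICY_LRR } policy_t;

typedef struct {
    uint32_t tag[WAYS];
    int valid[WAYS];
    unsigned last_use[WAYS];  /* time of last access (LRU state) */
    unsigned fill_time[WAYS]; /* time the line was filled (LRR state) */
    unsigned clock;
    uint32_t evicted;         /* tag of the most recently evicted line */
    int has_evicted;
} cache_set;

/* Looks up tag in the set; returns 1 on hit, 0 on miss (line is then filled). */
int cache_access(cache_set *s, uint32_t tag, policy_t policy)
{
    assert(policy == POLICY_LRU || policy == POLICY_LRR);
    s->clock++;
    for (int w = 0; w < WAYS; w++) {
        if (s->valid[w] && s->tag[w] == tag) {
            s->last_use[w] = s->clock; /* hits update LRU state only */
            return 1;
        }
    }
    int victim = -1;
    for (int w = 0; w < WAYS && victim < 0; w++)
        if (!s->valid[w])
            victim = w; /* fill an empty way first */
    if (victim < 0) {
        /* Set is full: evict the way with the oldest timestamp for this policy. */
        const unsigned *age = (policy == POLICY_LRU) ? s->last_use : s->fill_time;
        victim = 0;
        for (int w = 1; w < WAYS; w++)
            if (age[w] < age[victim])
                victim = w;
        s->evicted = s->tag[victim];
        s->has_evicted = 1;
    }
    s->tag[victim] = tag;
    s->valid[victim] = 1;
    s->last_use[victim] = s->clock;
    s->fill_time[victim] = s->clock;
    return 0;
}
```

The access sequence A, B, A, C shows the difference. On the final miss, LRU evicts B because A was just reused. LRR evicts A because A was filled first.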

SLIDE 8

NGMP Overview – Caches (2/2)

[Block diagram as on slide 4: four LEON4FT cores with L1 caches, 128-bit CPU bus, Level 2 cache, scrubber and memory controller]

L1 is write-through, with bus snooping to maintain coherency. L2 can be copy-back, since no bus masters are located behind it. This prevents the repeated writes caused by L1 write-through from generating unnecessary memory accesses. Burst lengths are matched between memory and caches.
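The benefit of a copy-back L2 behind a write-through L1 can be counted with a toy model. It assumes that all stores hit a single L2 line, which stays resident until one final eviction. Real traffic depends on the access pattern. The names are hypothetical.

```c
#include <assert.h>

/* Hypothetical model: a CPU repeatedly storing to one cache line.
 * L1 is write-through, so every store is forwarded to L2. */
typedef struct {
    unsigned long l2_writes;  /* stores forwarded from L1 to L2 */
    unsigned long mem_writes; /* writes reaching external memory */
    int dirty;                /* L2 line modified but not yet written back */
} wt_model;

void model_store(wt_model *m, int l2_copy_back)
{
    m->l2_writes++;      /* L1 write-through: every store reaches L2 */
    if (l2_copy_back)
        m->dirty = 1;    /* copy-back L2: only mark the line dirty */
    else
        m->mem_writes++; /* write-through L2: store goes on to memory */
}

void model_evict(wt_model *m)
{
    if (m->dirty) {
        m->mem_writes++; /* a single write-back covers all earlier stores */
        m->dirty = 0;
    }
}

/* Memory writes caused by 'stores' stores to one line plus one eviction. */
unsigned long memory_writes(unsigned long stores, int l2_copy_back)
{
    wt_model m = {0, 0, 0};
    for (unsigned long i = 0; i < stores; i++)
        model_store(&m, l2_copy_back);
    model_evict(&m);
    assert(m.l2_writes == stores);
    return m.mem_writes;
}
```

With 100 stores to the same line, a write-through L2 passes all 100 to memory. A copy-back L2 absorbs them and writes the line back once.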

SLIDE 9

NGMP Overview – Memory Controller

  • Memory controller
    – DDR2 or SDRAM, selected with a bootstrap signal
    – The same package pins are used for the DDR2 and SDRAM interfaces
    – Full-width 64-bit or half-width 32-bit external data bus, selected with a bootstrap signal
    – Powerful interleaved 16/32+8-bit ECC giving 32 or 16 checkbits (selected by software, can be switched on the fly)

  • Scrubber
    – Fast initialization of memory and checkbits at boot
    – Background scrubbing
    – Error reporting to the CPU and statistics collection

  • Memory error handling (memory controller, scrubber and CPU together)
    – Rapid regeneration of contents after a SEFI
    – Graceful degradation of a failed byte lane, regaining SEU tolerance
    – Some example code already available in our RTEMS repository
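Background scrubbing works as follows. Each pass reads every word, checks it against its check bits, and writes back any word with a correctable error. It also counts corrections for statistics. The NGMP's actual interleaved ECC is much stronger and is not reproduced here. This toy model uses a Hamming(7,4) single-error-correcting code purely for illustration, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy SEC code: Hamming(7,4). Bit (pos - 1) of a codeword holds
 * Hamming position pos = 1..7; positions 1, 2 and 4 are check bits. */
static const unsigned data_pos[4] = {3, 5, 6, 7};

unsigned hamming_syndrome(uint8_t cw)
{
    unsigned s = 0;
    for (unsigned pos = 1; pos <= 7; pos++)
        if (cw & (1u << (pos - 1)))
            s ^= pos; /* XOR of the positions of all set bits */
    return s;         /* 0 = valid, otherwise the position of a flipped bit */
}

uint8_t hamming_encode(uint8_t nibble)
{
    assert(nibble < 16);
    uint8_t cw = 0;
    for (unsigned i = 0; i < 4; i++)
        if (nibble & (1u << i))
            cw |= (uint8_t)(1u << (data_pos[i] - 1));
    unsigned s = hamming_syndrome(cw);
    /* Setting the check bit at position p XORs p into the syndrome,
     * so copying s into positions 1, 2 and 4 makes the syndrome zero. */
    if (s & 1u) cw |= (uint8_t)(1u << 0);
    if (s & 2u) cw |= (uint8_t)(1u << 1);
    if (s & 4u) cw |= (uint8_t)(1u << 3);
    return cw;
}

uint8_t hamming_data(uint8_t cw)
{
    uint8_t nibble = 0;
    for (unsigned i = 0; i < 4; i++)
        if (cw & (1u << (data_pos[i] - 1)))
            nibble |= (uint8_t)(1u << i);
    return nibble;
}

/* One background scrub pass: checks every word, corrects single-bit
 * errors in place and returns how many words were corrected. */
size_t scrub_pass(uint8_t *mem, size_t n)
{
    size_t corrected = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned s = hamming_syndrome(mem[i]);
        if (s != 0) {
            mem[i] ^= (uint8_t)(1u << (s - 1));
            corrected++;
        }
    }
    return corrected;
}
```

A single-error-correcting code can only repair a word if a second upset has not hit that same word first. Scrubbing periodically clears isolated upsets before they can accumulate.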

SLIDE 10

NGMP Overview – I/O Interfaces

  • Large number of I/O interfaces:
    – SpaceWire router
    – PCI master/target with DMA
    – Gigabit Ethernet
    – MIL-STD-1553B
    – UART, SPI, GPIO

  • Debug interfaces:
    – Ethernet
    – USB
    – SpaceWire (RMAP)
    – JTAG, serial

SLIDE 11

NGMP Overview – Fault Tolerance

  • LEON4FT
    – 4-bit parity on L1 cache
    – Protected register files (both CPU and FPU)

  • L2 cache
    – BCH-protected memories
    – Built-in scrubber

  • General
    – Block RAM contents in IP cores protected by ECC
    – Rad-hard flip-flops and logic, by process, library or TMR on the netlist
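Two of these mechanisms can be sketched in a few lines of C. Parity only detects an error. Because L1 is write-through, however, a clean copy of the data exists further down the hierarchy, so an affected line can be refetched. A TMR voter takes a bitwise 2-of-3 majority. The one-parity-bit-per-byte layout for the "4-bit parity" is an assumption, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Parity of one byte: 1 if it contains an odd number of set bits. */
unsigned byte_parity(uint8_t b)
{
    b ^= (uint8_t)(b >> 4);
    b ^= (uint8_t)(b >> 2);
    b ^= (uint8_t)(b >> 1);
    return b & 1u;
}

/* 4-bit parity for a 32-bit word: check bit i covers byte i
 * (the one-bit-per-byte layout is an assumption). */
unsigned parity4(uint32_t word)
{
    unsigned p = 0;
    for (unsigned i = 0; i < 4; i++)
        p |= byte_parity((uint8_t)(word >> (8 * i))) << i;
    return p;
}

/* Returns nonzero if the stored parity no longer matches the data. */
int parity_error(uint32_t word, unsigned stored_parity)
{
    assert(stored_parity < 16);
    return parity4(word) != stored_parity;
}

/* Bitwise 2-of-3 majority vote, as performed by a TMR voter. */
uint32_t tmr_vote(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (a & c) | (b & c);
}
```

Any single bit flip changes the parity of exactly one byte, so it is detected. A voter masks a fault in any one of its three copies.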

SLIDE 12

NGMP Overview - Interfaces

  • Resource partitioning
    – The architecture has been designed to support SMP, AMP and mixtures of both (example: three CPUs running Linux SMP and one running RTEMS)
    – The L2 cache can be set to a one-way-per-CPU mode
    – IRQs can be masked and routed separately to each CPU, allowing different schemes
    – The register interfaces of the I/O IP cores are located on separate 4 KB pages, so that the MMU can prevent user-level software from accessing the wrong IP core in case of a software malfunction
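Page-granular placement matters because the MMU can only grant or deny access per page. A register block can be isolated only if it fits within one 4 KB page and shares that page with no other IP core. The check below is a sketch. The base addresses used with it are hypothetical, not the NGMP memory map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u /* 4 KB MMU pages */

/* Page frame number containing a physical address. */
uint32_t page_of(uint32_t addr) { return addr >> PAGE_SHIFT; }

/* True if every register block lies within a single page and no two
 * blocks share a page, so each can be mapped (or not) independently. */
bool mmu_separable(const uint32_t *base, const uint32_t *size, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(size[i] > 0);
        if (page_of(base[i]) != page_of(base[i] + size[i] - 1))
            return false; /* block straddles a page boundary */
        for (size_t j = 0; j < i; j++)
            if (page_of(base[i]) == page_of(base[j]))
                return false; /* two IP cores on one page */
    }
    return true;
}
```

When this holds, a partition's user-level tasks can be given mappings only for the pages of the IP cores they own.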

SLIDE 13

NGMP Block Diagram

[Block diagram of the full NGMP.

Buses: 128-bit AHB processor bus and 128-bit AHB memory bus at 400 MHz; 32-bit AHB slave I/O bus, master I/O bus and debug bus at 400 MHz; 32-bit APB at 400 MHz.

Blocks:
– Four LEON4FT cores, each with FPU, caches, MMU, timers and IRQ controller
– L2 cache, memory scrubber, and DDR2 and SDRAM controllers for 64-bit DDR2-800 / SDR-PC100 SDRAM
– PROM & IO controller (8/16-bit PROM and I/O) and IOMMU
– PCI master, target, DMA and arbiter
– Ethernet, SpaceWire (SPW) links with RMAP, HSSL, USB DCL
– UART, timers, GPIO
– IRQMP with IRQCTRL1–4 and IRQSTAMP
– DSU, JTAG, AHB status, CLKGATE, AHBTRACE, PCITRACE, LEON4 statistics unit
– AHB/APB and AHB/AHB bridges

Legend: M = master interface(s), S = slave interface(s), X = snoop interface.]
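The bus widths and clock rates in the diagram give theoretical peak bandwidths. These back-of-the-envelope figures assume one transfer per cycle and ignore AHB arbitration, wait states and DRAM overheads, so sustained throughput will be lower:

```latex
\begin{aligned}
B_{\text{AHB},128} &= \frac{128\ \text{bit}}{8\ \text{bit/B}} \times 400 \times 10^{6}\ \text{s}^{-1} = 16\ \text{B} \times 400 \times 10^{6}\ \text{s}^{-1} = 6.4\ \text{GB/s} \\
B_{\text{DDR2-800},64} &= \frac{64\ \text{bit}}{8\ \text{bit/B}} \times 800 \times 10^{6}\ \text{transfers/s} = 8\ \text{B} \times 800 \times 10^{6}\ \text{s}^{-1} = 6.4\ \text{GB/s}
\end{aligned}
```

At peak, the 128-bit memory bus and a full-width DDR2-800 interface are therefore matched.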

SLIDE 14

NGFP Overview (1/4)

  • NGFP is a functional (not rad-hard) prototype of NGMP in a commercial technology.

  • Purposes

    – Keep NGMP development going even though the target technology library is not yet available
    – Create a more representative and complete prototype than the earlier FPGA prototypes, for user evaluation and benchmarking
    – Learning and gaining experience
    – Reduce technical risk for the final ASIC design

SLIDE 15

NGFP Overview (2/4)

  • The technology chosen was eASIC's Nextreme2 structured ASIC:
    – Fixed sea of LUTs and RAM blocks, similar to an FPGA but programmed by customizing a few metal layers
    – ASIC-like tool flow (DC synthesis, custom back-end tools)
    – Competitive cost
    – Short lead time (8 weeks)
    – Devices are factory tested (unlike MPW)

SLIDE 16

NGFP Overview (3/4)

  • The full NGMP architecture is included in the prototype, with the following exceptions:
    – Non-FT versions of LEON4 and the L2 cache, to save resources
      • The L2 cache is configured with the same timing as the FT version, to give representative benchmark results.
    – DDR2 and SDRAM on separate pins (multi-IO pads not available)
      • Still, only one of them can be used at a time.
    – Reduced L2 cache sizes
    – Lower clock rates

SLIDE 17

NGFP Overview (4/4)

  • Clock speed limitations
    – Main CPU clock limited to 150 MHz in the worst case (slow process corner)
    – DDR2 clock limited to 300 MHz

  • One contributing factor is that the RAM blocks used for the L2 cache are distributed all over the chip. This will not happen in a regular ASIC process, where RAMs can be placed freely.

  • Some potential for improvement on the prototype:
    – Timing is significantly better in the typical process corner
    – We will also use overclocking techniques, such as raising the core voltage, to reach as high a frequency as possible
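Combining the worst-case 150 MHz CPU clock with the per-MHz figures quoted for LEON4FT gives a rough, unmeasured estimate. It assumes the per-MHz figures hold at this clock and that the four cores never contend for the shared L2 cache and memory. The multi-core figure is therefore an upper bound, not a benchmark result:

```latex
\begin{aligned}
\text{DMIPS}_{\text{core}} &= 1.7\ \tfrac{\text{DMIPS}}{\text{MHz}} \times 150\ \text{MHz} = 255\ \text{DMIPS} \\
\text{DMIPS}_{\text{4 cores}} &\le 4 \times 255 = 1020\ \text{DMIPS} \\
\text{CoreMark}_{\text{core}} &= 2.1\ \tfrac{\text{CoreMark}}{\text{MHz}} \times 150\ \text{MHz} = 315
\end{aligned}
```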

SLIDE 18

NGFP Status

  • Currently finishing the tape-out process (May 18th)
  • Expected lead time on devices: 8 weeks
  • PCB design finalized, to be manufactured
  • After the devices are received, PCB assembly and board bring-up will follow
  • ...then verification and benchmarking will continue

SLIDE 19

NGFP Evaluation Board

Evaluation board providing the interfaces of the NGFP device, in 6U CompactPCI form factor

SLIDE 20

NGFP Conclusions

  • A functional prototype of the NGMP architecture has been designed and taped out.

  • Worst-case process timing did not turn out as good as we had hoped, but mitigation by overclocking should be possible.

  • Some parts of this development have been very time-consuming. One issue in particular has been interfacing the native DDR2 PHY resources in the technology.

  • We have gained valuable experience that will definitely speed up further development once the target technology has been fixed.

  • Thank you for listening!