Hardware-accelerated CCD readout smear correction for Fast Solar Polarimeter

SLIDE 1

Hardware-accelerated CCD readout smear correction for Fast Solar Polarimeter

IEEE ASAP 2017

Monday, July 10th, Session 3: Image Processing

IEEE ASAP 2017 Stefan Tabel, MPG Semiconductor Laboratory

Welcome

Stefan Tabel and Korbinian Weikl, Semiconductor Laboratory of the Max Planck Society, Munich, Germany
Walter Stechele, Chair for Integrated Systems, Technical University of Munich, Munich, Germany

SLIDE 2

Related projects

Fast Solar Polarimeter (FSP)
  • Full-custom camera
  • Solar ground-based observations
SUNRISE
  • 1 m solar telescope on a stratospheric balloon
  • Same image quality as satellites at lower cost
Can we install FSP on SUNRISE? No: readout smear would hinder the post-facto correction of image jitter. An online correction can solve this problem.

SLIDE 3

Readout smear models for the FSP camera

  • Ground-based observations
  • Constant scene
  • 4 polarization states
  • Cyclically recurring images
  • Accumulation and inversion

S: smeared column; Y: unsmeared column; k: time index

  • General solution for the corrected image column
  • Non-constant scene, as on a jittered balloon flight
  • 2 × 1024 half-columns, 512 pixels per half-column
  • 400 images/second, 1-hour burst length
  • How to compute?

δ: relative transfer time; α: relative switching time
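The sizing figures on this slide fix the data rates the hardware has to sustain. The short calculation below is a sketch that assumes uint16 pixels, two hemispheres sharing the pixels equally, and raw payload only (no Ethernet framing); under those assumptions it reproduces the 209 M samples/s per hemisphere and the 6.7 Gbit/s stream that appear on the later slides.

```python
# Sizing figures from the slide. uint16 pixels and an equal split into two
# hemispheres are assumptions consistent with the later slides.
HALF_COLUMNS = 2 * 1024
PIXELS_PER_HALF_COLUMN = 512
FRAME_RATE_HZ = 400
BURST_SECONDS = 3600
BITS_PER_PIXEL = 16

pixels_per_image = HALF_COLUMNS * PIXELS_PER_HALF_COLUMN
pixel_rate = pixels_per_image * FRAME_RATE_HZ                  # pixels/s, whole sensor
pixel_rate_per_hemisphere = pixel_rate // 2                     # pixels/s per hemisphere
bit_rate = pixel_rate * BITS_PER_PIXEL                          # raw payload, bit/s
burst_bytes = pixel_rate * BURST_SECONDS * BITS_PER_PIXEL // 8  # one 1-hour burst

print(f"pixels per image:          {pixels_per_image}")
print(f"pixels/s (sensor):         {pixel_rate}")
print(f"pixels/s (per hemisphere): {pixel_rate_per_hemisphere}")
print(f"raw bit rate:              {bit_rate / 1e9:.2f} Gbit/s")
print(f"1-hour burst:              {burst_bytes / 1e12:.2f} TB")
```

A single 1-hour burst is therefore on the order of 3 TB of raw pixel data, which is why the correction has to keep pace with the stream rather than run afterwards on stored frames.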

SLIDE 4

Optimization of the algorithm

  • Quadratic complexity
  • Undefined length of the series
  • The series is convergent
  • Correction via n successors only
  • Approximation under an artificial assumption of periodicity
  • The matrix becomes circulant
  • The inverse of a circulant matrix is circulant
  • Matrix-vector multiplication with a circulant matrix is a convolution
  • A block of a circulant matrix is of Toeplitz type
  • Each Toeplitz matrix can be extended to a circulant matrix
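The circulant and Toeplitz identities listed on this slide can be checked numerically. The sketch below uses a naive O(N²) DFT in pure Python as a stand-in for an FFT core (chosen only to keep the example dependency-free) and a hypothetical 8-tap smear kernel. It shows that a circulant matrix-vector product equals a circular convolution, that it can be computed as a pointwise product of spectra, that the inverse circulant is applied by dividing the spectra, and how an n × n Toeplitz matrix is embedded in a 2n × 2n circulant with zero-padding. The kernel values and all function names are illustrative, not taken from the FSP design.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT core)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out

def circulant(c):
    """Circulant matrix with first column c: C[i][j] = c[(i - j) mod n]."""
    n = len(c)
    return [[c[(i - j) % n] for j in range(n)] for i in range(n)]

def matvec(a, x):
    """Plain matrix-vector product, O(n^2)."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def circular_convolution(c, x):
    """(c * x)[i] = sum_j c[(i - j) mod n] x[j], the same sum as C x."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

def spectral_convolve(c, x):
    """C x computed as IDFT(DFT(c) * DFT(x)): the DFT diagonalizes C."""
    spectrum = [a * b for a, b in zip(dft(c), dft(x))]
    return [v.real for v in dft(spectrum, inverse=True)]

def spectral_deconvolve(c, y):
    """C^{-1} y: the inverse circulant has eigenvalues 1 / DFT(c)[k]."""
    spectrum = [b / a for a, b in zip(dft(c), dft(y))]
    return [v.real for v in dft(spectrum, inverse=True)]

def toeplitz_matvec(col, row, x):
    """T x for an n x n Toeplitz T (first column col, first row row),
    computed by embedding T in a 2n x 2n circulant and zero-padding x."""
    n = len(x)
    c = list(col) + [0.0] + list(reversed(row[1:]))  # circulant first column, length 2n
    return spectral_convolve(c, list(x) + [0.0] * n)[:n]

kernel = [1.0, 0.3, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical smear kernel
column = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # arbitrary test column

smeared = matvec(circulant(kernel), column)      # C x, O(n^2)
via_dft = spectral_convolve(kernel, column)      # same result via spectra
restored = spectral_deconvolve(kernel, smeared)  # C^{-1} (C x) recovers x
print(max(abs(a - b) for a, b in zip(smeared, via_dft)))
print(max(abs(a - b) for a, b in zip(restored, column)))
```

The deconvolution step only works when no DFT coefficient of the kernel is zero; with the dominant center tap chosen here, every eigenvalue has magnitude of at least 0.6.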

SLIDE 5

Design space exploration 1

  • Feasibility study and a single unit => no ASIC
  • FPGA instead of CPU / GPGPU:
      • Power dissipation in the stratosphere
      • 10G Ethernet peripherals on-chip
      • No host computer needed
=> Focus on Xilinx FFT cores:
  • Uint16 image data should be transformed using a 31-bit fixed-point transform
  • The correction needs to be done in single-precision floating point
  • Choose a mixed model with n ∈ [4, 6]
  • Twiddle factor width is 24 bit

SLIDE 6

Design space exploration 2

  • NetFPGA SUME offers QDR II+ SRAM
  • 6.7 Gbit/s Ethernet stream
  • 209 M samples/s per hemisphere
Requirements:
  • Rotation of the image
  • Parallelization
  • FFT, multiplication, IFFT
Degrees of freedom:
  • DDR3 vs. QDR II+ => QDR II+, a simple design for a feasibility study targeting a single-unit camera
  • Sequential vs. parallel algorithm => parallel: always fast, slightly more expensive in logic, can be built in before the RAM, and can easily be configured for different correction depths
  • Order of RAM and FFT => FFT before RAM would increase memory costs
Tasks:
  • Use one RAM module per hemisphere; rotate the image during write access
  • Read out image data in parallel
  • Parallel fixed-point FFT
  • Cast to single-precision floating point, multiply by constants, cast back to fixed point, IFFT
  • 10 Gigabit Ethernet interface
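Rotating the image during write access means that each pixel is written individually to an address whose least-significant bits hold the row index. A whole column then sits at consecutive addresses and can be fetched in one read burst, as the next slide states. The sketch below assumes the sensor delivers a hemisphere row by row and uses a flat Python list in place of the QDR II+ SRAM; `write_address`, `store_transposed` and `read_column_burst` are hypothetical names.

```python
ROWS = 512       # pixels per half-column (from the slides)
COLUMNS = 1024   # half-columns per hemisphere (from the slides)

def write_address(row, col, rows=ROWS):
    """Row index in the LSBs; for rows = 512 this equals (col << 9) | row."""
    return col * rows + row

def store_transposed(frame, rows=ROWS, columns=COLUMNS):
    """Write a row-major frame pixel by pixel so that columns become contiguous."""
    memory = [0] * (rows * columns)
    for row in range(rows):            # assumed readout order: row by row
        for col in range(columns):
            memory[write_address(row, col, rows)] = frame[row][col]
    return memory

def read_column_burst(memory, col, rows=ROWS):
    """A column occupies one run of consecutive addresses, i.e. one read burst."""
    start = write_address(0, col, rows)
    return memory[start:start + rows]

# Small example: 3 rows x 4 columns, pixel value = 10 * row + col.
frame = [[10 * r + c for c in range(4)] for r in range(3)]
memory = store_transposed(frame, rows=3, columns=4)
print(read_column_burst(memory, 2, rows=3))  # [2, 12, 22]
```

Paying for the transpose on the write side keeps the read side simple: reads are plain sequential bursts, which suits the higher read throughput the design needs.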

SLIDE 7

Memory and logic design

  • 209 M pixels/s written at 225 MHz
  • Single-pixel writes implement the image rotation
  • Row index in the LSBs for column access during a read burst
  • Each word serves as a ring buffer for image bursts
  • A crossbar is necessary on the read side
  • n times higher throughput on read
  • n parallel and synchronous inputs
  • Correction values are constant (ROM)
  • Synchronous calculations
  • Higher throughput than in the stream
  • FFT modules are extended with typecasts
  • One FFT module transforms 2 signals
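A standard way to let one complex FFT transform two real signals is to pack them as real and imaginary parts, z = x + j·y, and then separate the spectra using conjugate symmetry: X[k] = (Z[k] + Z*[N−k]) / 2 and Y[k] = (Z[k] − Z*[N−k]) / (2j). The slide does not say which packing the design uses, so treat this as an illustration of the technique; the naive DFT again stands in for the FFT core, and the test signals are arbitrary.

```python
import cmath

def dft(x):
    """Naive O(N^2) forward DFT (stand-in for an FFT core)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def two_real_dfts(x, y):
    """Transform two real sequences of equal length with one complex DFT."""
    n = len(x)
    z = dft([complex(a, b) for a, b in zip(x, y)])  # pack: z = x + j*y
    x_spec, y_spec = [], []
    for k in range(n):
        mirror = z[(n - k) % n].conjugate()          # Z*[N - k]
        x_spec.append((z[k] + mirror) / 2)           # spectrum of the real part
        y_spec.append((z[k] - mirror) / 2j)          # spectrum of the imaginary part
    return x_spec, y_spec

x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 0.0, 2.0]
y = [0.0, 1.0, 1.0, 4.0, -2.0, 0.0, 1.5, 1.0]
x_spec, y_spec = two_real_dfts(x, y)
print(max(abs(a - b) for a, b in zip(x_spec, dft(x))))
print(max(abs(a - b) for a, b in zip(y_spec, dft(y))))
```

The separation costs only one addition and one subtraction per bin, so one transform pass serves two real inputs.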

SLIDE 8

Parallelization

1 stream  Throughput and capacity require one RAM per hemisphere  Parallel algorithm forces temporal multiplexing on two logic pipelines per RAM (zero insertion)  Sequential variant can be built with lower logic resources at the costs of RAM  Twice the clock-rate at 2 pipelines did not meet timing constraints  No buffers at the memory interfaces, straight forward stream 2 RAMs 4 pipelines 1 sensor

SLIDE 9

Results and tests

  • Implementation: n = 4
  • SRAM not included in the table
  • Correction for n = 4 => max. error = 2 (Uint16)
  • Correction for n = 6 => max. error = 1 (Uint16)
  • Cutoff due to noise => 3 bits in Uint16
  • Model-based co-design with the camera
  • Separate throughput test; further testing later
  • Readout smear is a convolution
  • Stepwise correction removes copies of the image
  • The FPGA module allows the FSP camera to be used on the SUNRISE balloon mission
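The sketch below illustrates how stepwise correction removes copies of the image. It assumes a deliberately simple, hypothetical smear model, not the FSP model: each column picks up one copy of itself, attenuated by a and circularly shifted by d pixels, so S = (I + aP)Y. The inverse is the convergent series Y = Σ (−aP)^j S. Truncating it after n correction terms leaves a residual of at most a^(n+1)·max|Y|, so each additional step removes one more order of copies. The values of a and d and the test column are arbitrary.

```python
def shift(values, d):
    """Circular shift by d pixels (periodic approximation): P applied to values."""
    n = len(values)
    return [values[(i - d) % n] for i in range(n)]

def smear(column, a, d):
    """Hypothetical smear: the column plus one attenuated, shifted copy."""
    return [y + a * s for y, s in zip(column, shift(column, d))]

def correct(smeared, a, d, n_terms):
    """Truncated series: Y ~ sum_{j=0..n_terms} (-a P)^j S."""
    result = list(smeared)
    term = list(smeared)
    for _ in range(n_terms):
        term = [-a * v for v in shift(term, d)]  # next power of (-a P) applied to S
        result = [r + t for r, t in zip(result, term)]
    return result

def max_error(column, a, d, n_terms):
    """Largest absolute deviation of the corrected column from the original."""
    restored = correct(smear(column, a, d), a, d, n_terms)
    return max(abs(r - y) for r, y in zip(restored, column))

column = [(i * 7919) % 65536 for i in range(512)]  # arbitrary uint16-range data
for n in (2, 4, 6):
    print(n, max_error(column, 0.25, 1, n))
```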

SLIDE 10

That's it!

Thank you very much for your interest! Your questions, please.