
Hardware-accelerated CCD readout smear correction for Fast Solar Polarimeter



  1. Welcome
     - Hardware-accelerated CCD readout smear correction for Fast Solar Polarimeter
     - Stefan Tabel, Walter Stechele and Korbinian Weikl
     - Chair for Integrated Systems, Technical University of Munich, Munich, Germany
     - Semiconductor Laboratory of the Max Planck Society, Munich, Germany
     - IEEE ASAP 2017, Monday, July 10th, Session 3: Image Processing 1
     - IEEE ASAP 2017, Stefan Tabel, MPG Semiconductor Laboratory

  2. Related projects
     - Fast Solar Polarimeter (FSP): full custom camera; solar ground-based observations
     - SUNRISE: 1 m solar telescope on a stratospheric balloon; same image quality as satellites; lower costs
     - Can we install FSP on SUNRISE? No, readout smear will hinder the post-facto correction of image jitter. An online correction can solve this problem.

  3. Readout smear models for the FSP camera
     - Symbols: δ: relative transfer time; α: relative switching time; S: smeared column; Y: unsmeared column; k: time index
     - 2 x 1024 half-columns
     - 512 pixels per half-column
     - 400 images/second
     - 1 hour burst length
     - 4 polarization states
     - Circularly appearing images
     - Ground-based observations: constant scene
     - Jittered balloon flight: scene not constant
     - General solution for the corrected image column: how to compute? Accumulation and inversion

  4. Optimization of the algorithm
     - Quadratic complexity
     - Undefined length of series
     - Convergent: correction via n successors only
     - Approximation with a fake assumption of periodicity: the matrix becomes circulant
     - The inverse of a circulant matrix is circulant
     - Matrix-vector multiplication with a circulant matrix is a convolution
     - A block of a circulant matrix is of Toeplitz type
     - Each Toeplitz matrix can be extended to a circulant matrix
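
The circulant and Toeplitz properties listed on this slide can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the kernel `c` is a hypothetical placeholder (the real smear coefficients depend on δ and α, whose formulas are not reproduced in this transcript), and all function names are illustrative.

```python
import numpy as np

def circulant(c):
    """Circulant matrix with first column c: C[i, j] = c[(i - j) mod N]."""
    n = len(c)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return np.asarray(c)[idx]

def circ_matvec_fft(c, x):
    """C @ x computed as a circular convolution via FFT."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def circ_solve_fft(c, s):
    """Solve C y = s by dividing spectra (all FFT bins of c must be nonzero)."""
    return np.real(np.fft.ifft(np.fft.fft(s) / np.fft.fft(c)))

def toeplitz_matvec_fft(col, row, x):
    """T @ x for a Toeplitz T (first column col, first row row),
    by embedding T in a circulant matrix of twice the size."""
    n = len(x)
    # First column of the 2N circulant: col, one free padding entry, reversed row[1:].
    c = np.concatenate([col, [0.0], row[:0:-1]])
    x_padded = np.concatenate([x, np.zeros(n)])
    return circ_matvec_fft(c, x_padded)[:n]

# Hypothetical kernel: identity plus three weak "copies" (placeholder values).
c = np.array([1.0, 0.05, 0.03, 0.01, 0.0, 0.0, 0.0, 0.0])
rng = np.random.default_rng(0)
y = rng.uniform(0.0, 65535.0, size=c.size)    # toy unsmeared column

C = circulant(c)
s = C @ y                                     # toy smeared column
C_inv = np.linalg.inv(C)
T = C[:4, 2:6]                                # a block of C, which is Toeplitz

assert np.allclose(s, circ_matvec_fft(c, y))           # matvec is a convolution
assert np.allclose(C_inv, circulant(C_inv[:, 0]))      # inverse is circulant
assert np.allclose(circ_solve_fft(c, s), y)            # correction via FFT
assert np.allclose(T @ y[:4], toeplitz_matvec_fft(T[:, 0], T[0, :], y[:4]))
```

The spectral division in `circ_solve_fft` is what makes the FFT, multiplication, IFFT chain attractive: the reciprocal spectrum of the kernel can be precomputed once and stored as constants.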

  5. Design space exploration 1
     - Study and single unit => no ASIC
     - FPGA instead of CPU / GPGPU:
       - Power dissipation in the stratosphere
       - 10G Ethernet peripherals on-chip
       - No need for hosts
     - => Focus on Xilinx FFT cores:
       - The correction needs to be done in single precision floating point
       - Choose a mixed model with n ∈ [4:6]
       - Uint16 image data should be transformed using a 31 bit fixed-point transform
       - Twiddle factor width is 24 bit

  6. Design space exploration 2
     - NetFPGA SUME offers QDR II+ SRAM
     - 6.7 Gbps Ethernet stream
     - 209 M samples per second per hemisphere
     - Requirements:
       - Rotation of the image
       - Parallelization
       - FFT, multiplication, IFFT
     - Degrees of freedom:
       - DDR3 vs. QDR II+ => simple design for a feasibility study targeting a single unit camera
       - Sequential vs. parallel algorithm => the parallel version is always fast, slightly more expensive in logic, can be built in before the RAM, and can be easily configured to different depths of correction
       - Order of RAM and FFT => FFT before RAM would increase memory costs
     - Tasks:
       - Use one RAM module per hemisphere, rotate the image during write access
       - Readout of parallel image data
       - Parallel fixed-point FFT
       - Cast to single precision floating point, multiply with constants, cast to fixed-point, IFFT
       - Interface 10Gig Ethernet
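
The throughput figures on this slide follow from the sensor parameters given on slide 3. The short calculation below is a sanity check only; it assumes that a "hemisphere" is one half of the sensor (1024 half-columns of 512 pixels), that each pixel is a 16-bit Uint16 sample, and that one pixel is written per 225 MHz clock cycle. The variable names are illustrative.

```python
# Figures quoted on the slides.
HEMISPHERES = 2
HALF_COLUMNS_PER_HEMISPHERE = 1024
PIXELS_PER_HALF_COLUMN = 512
FRAMES_PER_SECOND = 400
WRITE_CLOCK_HZ = 225_000_000

# Assumption: Uint16 image data means 16 bits per pixel on the stream.
BITS_PER_PIXEL = 16

pixels_per_hemisphere = HALF_COLUMNS_PER_HEMISPHERE * PIXELS_PER_HALF_COLUMN
samples_per_second_per_hemisphere = pixels_per_hemisphere * FRAMES_PER_SECOND
stream_bits_per_second = (
    HEMISPHERES * samples_per_second_per_hemisphere * BITS_PER_PIXEL
)
# Assumption: one pixel is written per write-clock cycle.
write_clock_utilization = samples_per_second_per_hemisphere / WRITE_CLOCK_HZ

print(f"{samples_per_second_per_hemisphere / 1e6:.1f} M samples/s per hemisphere")
# prints "209.7 M samples/s per hemisphere"
print(f"{stream_bits_per_second / 1e9:.2f} Gbit/s raw pixel stream")
# prints "6.71 Gbit/s raw pixel stream"
print(f"{write_clock_utilization:.1%} of the write clock cycles used")
# prints "93.2% of the write clock cycles used"
```

Under these assumptions the raw pixel stream fits a 10G Ethernet link, and a single-pixel write at 225 MHz keeps up with one hemisphere.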

  7. Memory and logic design
     - 209 M pix./sec. @ 225 MHz write
     - Write single pixel for image rotation
     - Row index @ LSB for column access during read burst
     - Each word serves as ring buffer for image bursts
     - A crossbar is necessary at the read side
     - n times higher throughput @ read
     - n parallel and synchronous inputs
     - Correction values are constant (ROM)
     - Synchronous calculations
     - Higher throughput than in stream
     - FFT modules are extended with typecasts
     - One FFT module transforms 2 signals
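
One common way for a single complex FFT to transform two signals is to pack two real-valued inputs into the real and imaginary parts and separate the spectra afterwards using conjugate symmetry. The slide does not say which scheme the design uses, so the sketch below only illustrates that standard trick in floating point; `fft_two_real` and the test columns are hypothetical.

```python
import numpy as np

def fft_two_real(a, b):
    """Return (FFT(a), FFT(b)) for two real signals using one complex FFT."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    z = np.fft.fft(a + 1j * b)                 # one complex transform
    # conj(Z[(-k) mod N]) equals FFT(a)[k] - 1j * FFT(b)[k] for real a, b.
    z_mirror = np.conj(np.roll(z[::-1], 1))
    fa = 0.5 * (z + z_mirror)
    fb = -0.5j * (z - z_mirror)
    return fa, fb

# Two hypothetical half-columns of 512 Uint16 pixel values.
rng = np.random.default_rng(1)
col_a = rng.integers(0, 65536, size=512)
col_b = rng.integers(0, 65536, size=512)

fa, fb = fft_two_real(col_a, col_b)
assert np.allclose(fa, np.fft.fft(col_a))
assert np.allclose(fb, np.fft.fft(col_b))
```

Since the image columns are real-valued, this packing halves the number of FFT modules needed for a given number of parallel inputs.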

  8. Parallelization
     - 1 sensor, 2 RAMs, 4 pipelines, 1 stream
     - Throughput and capacity require one RAM per hemisphere
     - The parallel algorithm forces temporal multiplexing on two logic pipelines per RAM (zero insertion)
     - The sequential variant can be built with lower logic resources at the cost of RAM
     - Twice the clock rate with 2 pipelines did not meet timing constraints
     - No buffers at the memory interfaces, straightforward stream

  9. Results and tests
     - Implementation: n = 4
     - SRAM not included in table
     - Correction for n = 4 => max. error = 2 (Uint16)
     - Correction for n = 6 => max. error = 1 (Uint16)
     - Cutoff due to noise => 3 bit in Uint16
     - Model-based, co-design with camera
     - Separate throughput test, later testing
     - Readout smear is a convolution
     - Stepwise correction removes copies of the image
     - The FPGA module allows the FSP camera to be used on the SUNRISE balloon mission

  10. That's it! Thank you very much for your interest! Your questions, please.
