
TYRION: A Hardware Accelerator for SVD (Chae Jubb, Ruchir Khaitan)



  1. TYRION: A Hardware Accelerator for SVD (Chae Jubb, Ruchir Khaitan)

  2. A Singularly Valuable Decomposition ● Do you remember linear algebra? Neither do we. ● SVD decomposes a matrix A into its singular values and its left and right singular vectors
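In symbols, for a real m × n matrix A (standard notation, not taken from the slides): U and V are orthogonal, their columns are the left and right singular vectors, and Σ is an m × n matrix that is zero except for the singular values on its diagonal.

```latex
A = U \Sigma V^{\mathsf{T}}, \qquad
U^{\mathsf{T}} U = I_m, \quad V^{\mathsf{T}} V = I_n, \quad
\Sigma_{ii} = \sigma_i, \quad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_p \ge 0, \quad p = \min(m, n)
```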

  3. What is it good for? ● You can build a low-rank approximation A′ using only the first (largest) k singular values ● It has uses in machine learning, natural language processing, image compression, seismic tomography analysis, etc.
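A minimal NumPy sketch of the rank-k approximation from slide 3 (the function name `low_rank_approx` is hypothetical). It keeps the k largest singular values and their vectors; by the Eckart-Young theorem this is the best rank-k approximation, and its spectral-norm error equals the first discarded singular value.

```python
import numpy as np

def low_rank_approx(a: np.ndarray, k: int) -> np.ndarray:
    """Best rank-k approximation of `a`, built from its k largest singular values."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    # NumPy returns s in descending order, so the first k terms are the largest.
    # (u[:, :k] * s[:k]) scales column i of U_k by sigma_i, i.e. U_k @ diag(s_k).
    return (u[:, :k] * s[:k]) @ vt[:k, :]

rng = np.random.default_rng(0)
a = rng.standard_normal((6, 6))
a2 = low_rank_approx(a, 2)
s = np.linalg.svd(a, compute_uv=False)
# Eckart-Young: the spectral-norm error equals the first discarded singular value.
print(np.linalg.norm(a - a2, 2), s[2])
```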

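  4. Image Compression ● An original (square) n × n image requires n² storage ● Keeping only k singular values requires (2n + 1)k storage: n·k each for the left and right singular vectors plus the k singular values ((2n + k)k if Σ is stored as a dense k × k matrix) ● A relatively small k already gives a good approximation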
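The storage arithmetic as a small Python helper (hypothetical name `svd_storage`). It counts stored numbers for a rank-k SVD of an n × n image, with Σ kept either as its k diagonal entries or as a dense k × k block. The image size n = 1024 is an assumption for illustration; the slides do not state one.

```python
def svd_storage(n: int, k: int, dense_sigma: bool = False) -> int:
    """Numbers stored for a rank-k SVD of an n x n image.

    U_k and V_k are n x k each; Sigma_k holds k singular values, or k*k
    entries if it is stored as a dense k x k matrix.
    """
    sigma_entries = k * k if dense_sigma else k
    return 2 * n * k + sigma_entries

n = 1024  # hypothetical image size; the slides don't give one
for k in (64, 128, 512):
    print(k, svd_storage(n, k), n * n)
```

With these numbers, k = 512 already needs slightly more storage (1,049,088 numbers) than the raw 1,048,576 pixels. Storage only shrinks while k(2n + 1) < n², i.e. for k somewhat below n/2.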

  5. Example [figure: approximations with k = 64, k = 128, k = 512]

  6. 2-Sided Jacobi Algorithm ● Basic idea: we want a diagonal matrix, so we want all of the off-diagonal elements to be zero ● Multiply A on the left and on the right by 2×2 rotation matrices (embedded in an identity matrix) to make the off-diagonal elements at (i, j) and (j, i) go away ● Keep doing that over all index pairs, and accumulate the left and right rotations into the left and right singular vectors
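A floating-point sketch of one common two-sided Jacobi variant (Kogbetliantz style), assuming a square real matrix. It is not the TYRION fixed-point hardware, and all function names are hypothetical. For each pair (p, q), a left rotation by ρ first makes the 2×2 block symmetric (tan ρ = (b_qp − b_pq)/(b_pp + b_qq)). A symmetric Jacobi rotation by ψ then zeroes its off-diagonal (tan 2ψ = 2y/(x − z)). The left angle is ρ + ψ and the right angle is ψ. Both angles are kept small, which is the usual choice for convergence.

```python
import numpy as np

def _rotate_rows(m, p, q, c, s):
    """m <- G^T m for a rotation G in the (p, q) plane with cos c, sin s."""
    rp, rq = m[p, :].copy(), m[q, :].copy()
    m[p, :] = c * rp + s * rq
    m[q, :] = -s * rp + c * rq

def _rotate_cols(m, p, q, c, s):
    """m <- m G for the same kind of rotation."""
    cp, cq = m[:, p].copy(), m[:, q].copy()
    m[:, p] = c * cp + s * cq
    m[:, q] = -s * cp + c * cq

def jacobi_svd(a, tol=1e-12, max_sweeps=50):
    """Two-sided Jacobi SVD of a square real matrix: a ≈ u @ diag(sigma) @ v.T."""
    b = np.array(a, dtype=float)
    n = b.shape[0]
    u, v = np.eye(n), np.eye(n)
    for _ in range(max_sweeps):
        off = np.linalg.norm(b - np.diag(np.diag(b)))
        if off <= tol * np.linalg.norm(b):
            break                                  # off-diagonal part negligible
        for p in range(n - 1):
            for q in range(p + 1, n):
                b_pp, b_pq, b_qp, b_qq = b[p, p], b[p, q], b[q, p], b[q, q]
                # Step 1: left rotation by rho that symmetrizes the 2x2 block.
                rho = np.arctan2(b_qp - b_pq, b_pp + b_qq)
                if rho > np.pi / 2:                # rho +/- pi also works; keep it small
                    rho -= np.pi
                elif rho < -np.pi / 2:
                    rho += np.pi
                cr, sr = np.cos(rho), np.sin(rho)
                x = cr * b_pp + sr * b_qp          # symmetrized block [[x, y], [y, z]]
                y = cr * b_pq + sr * b_qq
                z = -sr * b_pq + cr * b_qq
                # Step 2: symmetric Jacobi rotation by psi that zeroes y.
                psi = 0.5 * np.arctan2(2.0 * y, x - z)
                if psi > np.pi / 4:                # psi +/- pi/2 also works; keep it inner
                    psi -= np.pi / 2
                elif psi < -np.pi / 4:
                    psi += np.pi / 2
                ct, st = np.cos(rho + psi), np.sin(rho + psi)
                cphi, sphi = np.cos(psi), np.sin(psi)
                _rotate_rows(b, p, q, ct, st)      # b <- J^T b
                _rotate_cols(b, p, q, cphi, sphi)  # b <- b K
                _rotate_cols(u, p, q, ct, st)      # u <- u J
                _rotate_cols(v, p, q, cphi, sphi)  # v <- v K
    sigma = np.diag(b).copy()
    signs = np.where(sigma < 0, -1.0, 1.0)
    sigma *= signs
    u *= signs                       # flip columns of u whose diagonal entry was negative
    order = np.argsort(sigma)[::-1]  # descending order, as NumPy returns them
    return u[:, order], sigma[order], v[:, order]

a = np.array([[4.0, 1.0, 2.0], [0.5, 3.0, -1.0], [2.0, -2.0, 1.0]])
u, sigma, v = jacobi_svd(a)
print(sigma)
print(np.linalg.svd(a, compute_uv=False))  # NumPy reference for comparison
```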

  7. Algorithm Pros and Cons
  Pros: ● Easily parallelizable, since each “elimination” depends only on that row and column ● Converges quadratically ● An implementation existed online
  Cons: ● Rotation matrices require trig functions ● Trig functions mean we can’t use integer data types ● Requires conversion to fixed point ● The online implementation wasn’t super great
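The parallelism claim can be made concrete. A rotation for pair (p, q) reads only the 2×2 block at rows and columns p, q, and rotations on pairs with no index in common leave each other's blocks untouched. Left and right multiplications also commute, so all rotations in a round of disjoint pairs can be computed and applied at once. Below is a sketch of one common way to build such rounds, the round-robin "circle method". The function name is hypothetical, and the slides do not say which ordering TYRION used.

```python
def round_robin_pairs(n):
    """Split all (p, q) index pairs with p < q into rounds of disjoint pairs.

    Circle method: slot 0 stays put and the other slots rotate one position
    per round. For odd n a dummy slot (None) is added and its pairs skipped.
    """
    slots = list(range(n)) + ([None] if n % 2 else [])
    m = len(slots)
    rounds = []
    for _ in range(m - 1):
        pairs = []
        for i in range(m // 2):
            a, b = slots[i], slots[m - 1 - i]
            if a is not None and b is not None:
                pairs.append((min(a, b), max(a, b)))
        rounds.append(pairs)
        slots = [slots[0], slots[-1]] + slots[1:-1]  # rotate all but slot 0
    return rounds

for r in round_robin_pairs(4):
    print(r)  # each round: pairs that can be rotated in parallel
```

For n = 4 this yields three rounds, [(0, 3), (1, 2)], [(0, 2), (1, 3)] and [(0, 1), (2, 3)], which together cover every pair exactly once.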

  8. SystemC ● System-level modeling provides a higher level of abstraction (think in terms of threads and logical transactions, not digital circuits) ● High-level synthesis from the SystemC model generates correct and fast Verilog ● Novel toolchain

  9. Architecture ● Defined a high-level wrapper over the hardware (sitting between the driver and the actual hardware) ● Data is sent to and from the hardware through buffered FIFOs, one 32-bit chunk at a time ● Communication with the device uses a 4-way (four-phase) handshake
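A cycle-level Python sketch of a four-phase (return-to-zero) req/ack handshake, assuming that is what "4-way handshake" refers to. The signal names `req`, `ack` and `data` and the function name are hypothetical. This is not the Avalon protocol the project actually connected to. The sender-side deque stands in for the transmit FIFO, and each side reacts to the other's value from the previous cycle.

```python
from collections import deque

MASK32 = 0xFFFF_FFFF

def simulate_four_phase(words):
    """Move 32-bit words across a four-phase req/ack handshake.

    Phase 1: sender drives data, raises req.  Phase 2: receiver latches, raises ack.
    Phase 3: sender drops req.                Phase 4: receiver drops ack (idle).
    Signals update at the end of each cycle. Returns (received_words, cycle_count).
    """
    tx_fifo = deque(w & MASK32 for w in words)  # sender-side buffer, 32 bits per entry
    received = []
    req = ack = False
    data = 0
    cycles = 0
    while tx_fifo or req or ack:
        cycles += 1
        next_req, next_ack = req, ack
        # Sender side
        if not req and not ack and tx_fifo:
            data = tx_fifo.popleft()   # phase 1: put data on the bus, raise req
            next_req = True
        elif req and ack:
            next_req = False           # phase 3: ack seen, drop req
        # Receiver side
        if req and not ack:
            received.append(data)      # phase 2: latch data, raise ack
            next_ack = True
        elif not req and ack:
            next_ack = False           # phase 4: req dropped, drop ack
        req, ack = next_req, next_ack
    return received, cycles

print(simulate_four_phase([0x0000_0001, 0xDEAD_BEEF, 0x1234_5678]))
```

In this model every word costs four cycles, which is the usual price of a four-phase protocol compared with a two-phase (transition-signalling) one.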

  10. Interface ● You put a matrix in (either one 32-bit integer at a time or memory-mapped) and you get three matrices out ● Doubles are converted to 64-bit fixed-point numbers (40-bit fractional part)
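A Python sketch of the double-to-fixed-point conversion on the Interface slide. It assumes a signed two's-complement format with 24 integer bits (sign included) and 40 fractional bits, i.e. Q24.40, with a resolution of 2⁻⁴⁰ and a range of roughly ±8.4 million. It also assumes each 64-bit value crosses the 32-bit interface as a high word and a low word. The function names and the word order are hypothetical.

```python
FRAC_BITS = 40
TOTAL_BITS = 64
MASK64 = (1 << TOTAL_BITS) - 1
MASK32 = (1 << 32) - 1

def to_fixed(x: float) -> int:
    """Double -> signed Q24.40, returned as its 64-bit two's-complement bit pattern."""
    q = round(x * (1 << FRAC_BITS))  # nearest representable multiple of 2**-40
    if not -(1 << (TOTAL_BITS - 1)) <= q < (1 << (TOTAL_BITS - 1)):
        raise OverflowError(f"{x} is outside the Q24.40 range")
    return q & MASK64

def from_fixed(raw: int) -> float:
    """Interpret a 64-bit pattern as signed Q24.40 and convert back to a double."""
    if raw & (1 << (TOTAL_BITS - 1)):  # sign bit set: undo two's complement
        raw -= 1 << TOTAL_BITS
    return raw / (1 << FRAC_BITS)

def split_words(raw: int) -> tuple[int, int]:
    """Split a 64-bit pattern into (high, low) 32-bit words for a 32-bit data path."""
    return (raw >> 32) & MASK32, raw & MASK32

def join_words(high: int, low: int) -> int:
    """Reassemble a 64-bit pattern from its (high, low) 32-bit words."""
    return ((high & MASK32) << 32) | (low & MASK32)

raw = to_fixed(-2.75)
print(hex(raw), split_words(raw), from_fixed(join_words(*split_words(raw))))
```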

  11. Testing ● Fully randomized testbench ● SystemC provides a full simulation environment ● Cargo CAD tools
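A Python sketch of what a fully randomized testbench can check. It draws random sizes and entries, then verifies properties that hold for any correct SVD: non-negative sorted singular values, orthogonal U and V, and UΣVᵀ ≈ A. The device under test is replaced by a hypothetical stand-in, `dut_svd`, built on NumPy's SVD with inputs and outputs rounded to the 2⁻⁴⁰ grid. The real testbench drove the SystemC model instead, and all names here are assumptions.

```python
import numpy as np

FRAC_BITS = 40  # fractional bits of the fixed-point format on the Interface slide

def quantize(x):
    """Round to the nearest multiple of 2**-FRAC_BITS, mimicking fixed-point I/O."""
    scale = 2.0 ** FRAC_BITS
    return np.round(np.asarray(x, dtype=float) * scale) / scale

def dut_svd(a):
    """Hypothetical stand-in for the device under test (NumPy SVD, quantized I/O)."""
    u, s, vt = np.linalg.svd(quantize(a))
    return quantize(u), quantize(s), quantize(vt.T)

def check_svd(a, u, s, v, tol=1e-8):
    """Property checks that hold for any correct SVD, however it was computed."""
    n = a.shape[0]
    assert np.all(s >= 0), "negative singular value"
    assert np.all(s[:-1] >= s[1:]), "singular values not in descending order"
    assert np.allclose(u.T @ u, np.eye(n), atol=tol), "U not orthogonal"
    assert np.allclose(v.T @ v, np.eye(n), atol=tol), "V not orthogonal"
    scale = max(1.0, float(np.abs(a).max()))
    assert np.allclose(u @ np.diag(s) @ v.T, a, atol=tol * scale), "bad reconstruction"

def run_random_tests(trials=100, max_n=8, seed=0):
    """Random sizes and random entries, reproducible via the seed."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        n = int(rng.integers(1, max_n + 1))
        a = rng.uniform(-100.0, 100.0, size=(n, n))
        check_svd(a, *dut_svd(a))
    return trials

print(run_random_tests())
```

Checking properties rather than comparing against a reference bit for bit avoids false failures from sign flips of singular vectors, which any SVD is free to choose.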

  12. DEMO!

  13. Challenges ● Setting up toolchain ● Dealing with communication between different Avalon protocols ● Bus woes

  14. Lessons Learned ● Ruchir: Hardware is hard whereas software is fun. Also, having good partners makes everything much better. Also, 620 CEPSR is a great room. ● Chae: Mixing toolchains is hard. Mixing IPs is hard. Mixing semantics is hard. Writing code is easy.
