

SLIDE 1

S5455, GTC 2015

High Capability Multidimensional Data Compression on GPUs

Sergio E. Zarantonello

szarantonello@scu.edu

Ed Karrels

ed.karrels@gmail.com

School of Engineering, Santa Clara University

SLIDE 2

Part 1: Theory and Applications

Challenge

  • Massive amounts of multidimensional data are being generated by scientific simulations, monitoring devices, and high-end imaging applications.
  • Current networks and conventional computer hardware and software are increasingly unable to transmit, store, and analyze this data.

Solution

  • Fast and effective lossy data compression.
  • Compression ratios optimized subject to a priori error bounds, requiring several iterations of the compress/decompress cycle.

  • GPUs to make the above feasible.
SLIDE 3

Part 1: Theory and Applications

Our goal

  • A multidimensional wavelet-based CODEC for large data.
  • A discrete optimization procedure to provide the best compression ratios subject to error tolerances and error metrics specified by the user.
  • A high-performance CUDA implementation for large data, exploiting parallelism at various levels.
  • A flexible design allowing for future enhancements (redundant bases, adaptive dictionaries, compressive sensing, sparse representations, etc.).
  • An initial focus on Medical Computed Tomography, Seismic Imaging, and Non-Destructive Testing of Materials.

SLIDE 4

Part 1: Theory and Applications

Why wavelets?

  • Wavelets are “short” waves, “localized” in both the spatial and frequency domains.
  • They can be used as basis functions for sparse representations of data.
  • They give compact representations of well-behaved data and point singularities.
  • Multidimensional wavelets take advantage of data correlation along all coordinate axes.
  • Wavelet encoding/decoding can be implemented with fast algorithms.

SLIDE 5

Part 1: Theory and Applications

Conventional 2d procedure

SLIDE 6

Part 1: Theory and Applications

Design

  • Data is decomposed into overlapping cubelets.
  • Cubelets are encoded via biorthogonal wavelet filters along each coordinate axis.
  • Wavelet coefficients are thresholded, then quantized.
  • Quantized cubelets are Huffman encoded.
  • The process is “reversed” to reconstruct the data.
  • A “Hill Climbing” algorithm is implemented to deliver the highest compression possible subject to the error constraint(s).
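The pipeline above can be sketched end to end in Python. This is a minimal illustrative model, not the talk's CUDA implementation: a one-level Haar transform and uniform quantization stand in for the biorthogonal filters and the log/Lloyd bins, and all function names are hypothetical.

```python
import math

def haar_forward(x):
    # One level of the Haar transform: lowpass averages, then highpass details.
    a = [(x[2*i] + x[2*i+1]) / math.sqrt(2) for i in range(len(x) // 2)]
    d = [(x[2*i] - x[2*i+1]) / math.sqrt(2) for i in range(len(x) // 2)]
    return a + d

def haar_inverse(c):
    h = len(c) // 2
    out = []
    for a, d in zip(c[:h], c[h:]):
        out += [(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)]
    return out

def compress(x, cutoff_frac, nbins):
    c = haar_forward(x)
    # Threshold: zero the smallest cutoff_frac of coefficients by magnitude.
    k = int(cutoff_frac * len(c))
    cut = sorted(abs(v) for v in c)[k - 1] if k else -1.0
    c = [0.0 if abs(v) <= cut else v for v in c]
    # Quantize: map coefficients onto nbins integer bins (uniform here).
    m = max(abs(v) for v in c) or 1.0
    return [round(v / m * (nbins // 2)) for v in c], m

def decompress(q, m, nbins):
    # "Reverse" the process: dequantize, then inverse transform.
    return haar_inverse([v * m / (nbins // 2) for v in q])

x = [4.0, 4.1, 4.2, 8.0, 8.1, 8.2, 1.0, 1.1]
q, m = compress(x, cutoff_frac=0.5, nbins=256)
y = decompress(q, m, nbins=256)
```

In the real CODEC the quantized coefficients would then be Huffman encoded; here they are left as integers.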

SLIDE 7

2d (frame‐by‐frame) versus 3d procedure

Part 1: Theory and Applications

X-Ray CT scan: 10 steps of the cardiac cycle, 512 x 512 x 96 cube (http://www.osirix-viewer.com)

Same error rate (PSNR = 46):
2d procedure: Compression Ratio = 6.6, Cutoff = 88%, Bins = 1106, Max Error = 9.45
3d procedure: Compression Ratio = 10.2, Cutoff = 92%, Bins = 850, Max Error = 5.68

SLIDE 8

Part 1: Theory and Applications

Outline of 3d procedure

SLIDE 9

Part 1: Theory and Applications

Optimized compression for given error tolerance

  • 1. Calculate wavelet coefficients
  • 2. Find starting compression parameters
  • 3. Calculate reconstruction error
  • 4. Calculate compression ratio
  • 5. “Hill Climbing” iterations to find the maximum compression ratio subject to the error tolerance
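Step 5 can be sketched in Python. The `toy_evaluate` function below is a hypothetical analytic model standing in for a real compress/decompress cycle that measures PSNR and compression ratio; the step sizes and starting point are illustrative.

```python
def hill_climb(evaluate, start, tol_psnr):
    """Greedy "hill climbing" over (cutoff %, bins): repeatedly move to a
    neighboring parameter setting that improves the compression ratio
    while keeping PSNR at or above the tolerance."""
    steps = ((+2, 0), (-2, 0), (0, +50), (0, -50))
    best = start
    best_ratio, _ = evaluate(*best)
    improved = True
    while improved:
        improved = False
        for dc, db in steps:
            cand = (best[0] + dc, best[1] + db)
            if not (0 < cand[0] < 100 and cand[1] > 0):
                continue
            ratio, psnr = evaluate(*cand)
            if psnr >= tol_psnr and ratio > best_ratio:
                best, best_ratio = cand, ratio
                improved = True
    return best, best_ratio

# Hypothetical stand-in for a full compress/decompress measurement:
# a higher cutoff and fewer bins raise the ratio but lower the PSNR.
def toy_evaluate(cutoff, bins):
    ratio = 100.0 / ((100 - cutoff) + bins / 100.0)
    psnr = 60.0 - 0.15 * cutoff - 1000.0 / bins
    return ratio, psnr

(best_cutoff, best_bins), best_ratio = hill_climb(toy_evaluate, (76, 1850), tol_psnr=46.0)
```

Each move strictly increases the ratio over a finite grid, so the greedy loop terminates at a local maximum that still satisfies the error tolerance.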

SLIDE 10

Optimized compression for given error tolerance

Part 1: Theory and Applications

Search over % cutoff and number of quantization bins:

Begin: Cutoff = 76%, Bins = 1850, PSNR = 52.4, Ratio = 3.6×
End: Cutoff = 92%, Bins = 850, PSNR = 46.2, Ratio = 10.2×

SLIDE 11

Applications: Optical Coherence Tomography

Part 1: Theory and Applications

Objective: efficient transfer over the internet of high-resolution 3d images of the retina for diagnosis.

Dataset courtesy of Quinglin Wang, Carl Zeiss Meditec Inc.

SLIDE 12

Applications: Reverse Time Migration

Part 1: Theory and Applications

SLIDE 13

Part 2: Implementation

  • Stages of compression
  – Wavelet transform
  – Threshold
  – Quantization
  – Huffman coding
  • Overall speedup

3D CT X-Ray inspection of a carburetor. Dataset courtesy of North Star Imaging.

SLIDE 14

Wavelet Transform on GPU

Part 2: Implementation

  • Apply convolution
  • Each row is independent
  • Within each row, multiple read/write passes
  • 1 row == 1 thread block
  • Evens and odds are separated (before → after)
  • Synchronize between read & write
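The per-row structure can be sketched in Python. The filter taps below are illustrative (unnormalized Haar, not the talk's biorthogonal filters); on the GPU each call to `transform_row` would correspond to one thread block, with a synchronization barrier between the reads and the writes.

```python
LOW  = [0.5, 0.5]    # illustrative lowpass taps (not the actual filter)
HIGH = [0.5, -0.5]   # illustrative highpass taps

def transform_row(row):
    # Convolve at even offsets (periodic extension); write the lowpass
    # (evens/approximation) results to the first half of the output and
    # the highpass (odds/detail) results to the second half.
    n = len(row)
    low  = [sum(t * row[(2*i + j) % n] for j, t in enumerate(LOW))  for i in range(n // 2)]
    high = [sum(t * row[(2*i + j) % n] for j, t in enumerate(HIGH)) for i in range(n // 2)]
    return low + high

def transform_rows(matrix):
    # Rows are fully independent, so they can all run in parallel.
    return [transform_row(r) for r in matrix]
```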
SLIDE 15

3d Wavelet Transform

Part 2: Implementation

Height × depth rows, each one is an independent thread block.

SLIDE 16

Transform Along Each Axis

Part 2: Implementation

Transform along X; transpose XYZ → YZX; transform along Y; transpose YZX → ZXY; transform along Z.
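The three passes can be sketched in Python: a cyclic transpose after each 1D pass keeps the next transform running along contiguous rows. `transform_row` is any row-wise transform; the function names here are illustrative.

```python
def rotate_axes(vol):
    # Cyclic transpose out[i][j][k] = vol[j][k][i] (e.g. XYZ -> YZX);
    # three applications restore the original layout.
    d0, d1, d2 = len(vol), len(vol[0]), len(vol[0][0])
    return [[[vol[j][k][i] for k in range(d1)]
             for j in range(d0)]
            for i in range(d2)]

def transform_3d(vol, transform_row):
    # Transform along X, rotate, along Y, rotate, along Z, rotate:
    # the data ends up back in its original orientation.
    for _ in range(3):
        vol = [[transform_row(row) for row in plane] for plane in vol]
        vol = rotate_axes(vol)
    return vol

# A small 2 x 3 x 4 example volume, vol[z][y][x]
vol = [[[x + 10 * y + 100 * z for x in range(4)] for y in range(3)] for z in range(2)]
```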

SLIDE 17

GPU Transpose

Part 2: Implementation

  • Access global memory in contiguous order
  – Contiguous read from global memory into shared memory
  – Noncontiguous read within shared memory
  – Contiguous write to global memory
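The access pattern can be sketched in Python as a functional model of the tiled kernel (this is not GPU code): each tile is read with contiguous accesses, staged in a buffer playing the role of shared memory, and written back contiguously, so the noncontiguous indexing is confined to the fast on-chip buffer.

```python
TILE = 4  # hypothetical tile size; real kernels often use 32 x 32

def transpose_tiled(src, rows, cols):
    """Transpose a rows x cols matrix stored as a flat list, tile by tile."""
    dst = [0] * (rows * cols)
    for tr in range(0, rows, TILE):
        for tc in range(0, cols, TILE):
            tile = [[0] * TILE for _ in range(TILE)]   # "shared memory"
            for r in range(TILE):                      # contiguous reads from src
                for c in range(TILE):
                    if tr + r < rows and tc + c < cols:
                        tile[r][c] = src[(tr + r) * cols + (tc + c)]
            for r in range(TILE):                      # contiguous writes to dst;
                for c in range(TILE):                  # tile[c][r] is the noncontiguous read
                    if tc + r < cols and tr + c < rows:
                        dst[(tc + r) * rows + (tr + c)] = tile[c][r]
    return dst
```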

SLIDE 18

Optimizations

Part 2: Implementation

Version 1: Global memory
Version 2: Shared memory
  • 2.5× speedup

Version 3: Constant factors double → float
  • 1.6× speedup

Before:
#define FILTER_0 .852
#define FILTER_1 .377
#define FILTER_2 -.110

After:
#define FILTER_0 .852f
#define FILTER_1 .377f
#define FILTER_2 -.110f

Speedup over CPU version: 105× (860ms → 8.2ms for a 256x256x256 cubelet, 8 levels of transforms along each axis)

SLIDE 19

Threshold

Part 2: Implementation

  • Trim the smallest x% of values – round them to 0
  • Just sort the absolute values using the Thrust library
  • Speedup over CPU sort: 112× (the 7.0 toolkit is 35% faster than 6.5)
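In Python terms the threshold step looks like this (an illustrative CPU model; the talk performs the sort with Thrust on the GPU):

```python
def threshold(values, cutoff_frac):
    # Find the magnitude cutoff by sorting absolute values,
    # then round everything at or below it to zero.
    k = int(cutoff_frac * len(values))
    if k == 0:
        return list(values)
    cut = sorted(abs(v) for v in values)[k - 1]
    return [0.0 if abs(v) <= cut else v for v in values]
```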

SLIDE 20

Quantization

Part 2: Implementation

Map floating-point values to a small set of integers.

  • Log: bin size near x is proportional to x
  – Matches the data distribution well
  – Simple function; fast
  • Lloyd's algorithm
  – Given starting bins, fine-tune them to minimize overall error
  – Start with the log quantization bins
  – Multiple passes over the full data set; time-consuming
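A sketch of the log quantizer in Python. The exact bin mapping below is illustrative (the real CODEC would refine these starting bins with Lloyd's algorithm), but it shows the key property: bin width near x grows roughly in proportion to x.

```python
import math

def log_quantize(x, max_abs, nbins):
    # Map |x| onto bins 1..nbins//2 on a log scale (bin 0 is reserved for
    # zero); the sign of x becomes the sign of the bin index.
    if x == 0 or max_abs == 0:
        return 0
    levels = nbins // 2
    t = math.log1p(abs(x)) / math.log1p(max_abs)   # in (0, 1]
    b = 1 + min(levels - 1, int((levels - 1) * t))
    return b if x > 0 else -b

def log_dequantize(b, max_abs, nbins):
    # Reconstruct at the (log-scale) center of the bin.
    if b == 0:
        return 0.0
    levels = nbins // 2
    t = min(1.0, (abs(b) - 0.5) / (levels - 1))
    mag = math.expm1(t * math.log1p(max_abs))
    return mag if b > 0 else -mag
```

With 512 bins and values up to 1000, the quantize/dequantize round trip stays within a few percent relative error across several orders of magnitude.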

SLIDE 21

Log / Lloyd Quantization

Part 2: Implementation

Log quantization: pSNR 38.269, GPU speedup 97× (thrust::transform())
Lloyd quantization: pSNR 45.974, GPU speedup: create 13×, apply 48×

SLIDE 22

Huffman Encoding

Part 2: Implementation

  • Optimal bit encoding based on value frequencies
  • Compute histogram on CPU
  – Copy data GPU → CPU: 17ms
  – Compute on CPU: 27ms
  • Compute histogram on GPU
  – No copy needed
  – Compute: 0.61ms
  – Optimization: per-thread counter for the most common value

Value  Count     Encoding
9      16609445  1
8      46198     011
10     42896     001
11     32594     000
7      30831     0101
12     6942      01000
6      5388      010011
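The code construction can be sketched with a heap in Python, using the counts from the table above. Exact codewords depend on tie-breaking (and the slide's table likely omits rarer values), but the dominant value always receives a one-bit code.

```python
import heapq
import itertools

def huffman_codes(freqs):
    # Standard Huffman construction: repeatedly merge the two least
    # frequent groups, prepending a bit to every value in each group.
    tie = itertools.count()                      # unique tie-breaker for the heap
    heap = [(count, next(tie), [v]) for v, count in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {v: "0" for v in freqs}
    code = {v: "" for v in freqs}
    while len(heap) > 1:
        c0, _, vs0 = heapq.heappop(heap)
        c1, _, vs1 = heapq.heappop(heap)
        for v in vs0:
            code[v] = "0" + code[v]
        for v in vs1:
            code[v] = "1" + code[v]
        heapq.heappush(heap, (c0 + c1, next(tie), vs0 + vs1))
    return code

# Counts from the slide's histogram
counts = {9: 16609445, 8: 46198, 10: 42896, 11: 32594,
          7: 30831, 12: 6942, 6: 5388}
codes = huffman_codes(counts)
```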

SLIDE 23

Overall CPU → GPU speedup

Part 2: Implementation

Per-stage times (wavelet transform, sort, quantize, histogram): CPU on a 0–2500 ms scale, GPU on a 0–25 ms scale.

GPU: GTX 980. CPU: Intel Core i7 3.5GHz.

               MATLAB   CPU    GPU
Compress       43000    2300   21
Error control  39000    1400   18

Time to process a 256³ cubelet, in milliseconds.

SLIDE 24

Future Directions

Part 2: Implementation

  • Improve performance
  – Use a subsample for training Lloyd's algorithm
  – Use Quickselect to find the threshold value
  – Multiple GPUs
  • Improve accuracy
  – Weighted values in Lloyd's algorithm
  – Normalize values in each quadrant
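The Quickselect idea can be sketched in Python: find the k-th smallest magnitude in expected linear time instead of sorting the full coefficient array (an illustrative recursive version, not the proposed GPU implementation):

```python
def quickselect(vals, k):
    # k-th smallest (0-based) without a full sort: partition around a pivot
    # and recurse only into the side that contains index k.
    pivot = vals[len(vals) // 2]
    lows   = [v for v in vals if v < pivot]
    pivots = [v for v in vals if v == pivot]
    highs  = [v for v in vals if v > pivot]
    if k < len(lows):
        return quickselect(lows, k)
    if k < len(lows) + len(pivots):
        return pivot
    return quickselect(highs, k - len(lows) - len(pivots))
```

For thresholding, the cutoff would be `quickselect([abs(v) for v in coeffs], k)` rather than `sorted(...)[k]`.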

SLIDE 25

Our Team

Sergio Zarantonello, Santa Clara University, szarantonello@scu.edu
Ed Karrels, Santa Clara University, ed.karrels@gmail.com
Drazen Fabris, Santa Clara University, dfabris@scu.edu
David Concha, Universidad Rey Juan Carlos, Spain, david.concha@urjc.es
Anupam Goyal, Algorithmica LLC, anupam@rithmica.com
Bonnie Smithson, Santa Clara University, Bonnie@DejaThoris.com