
GRADUATE FELLOW FAST FORWARD: Bill Dally, Chief Scientist and SVP Research, NVIDIA



  1. GRADUATE FELLOW FAST FORWARD Bill Dally, Chief Scientist and SVP Research, NVIDIA Thursday, May 11, 2017

  2. GRADUATE FELLOWSHIP PROGRAM Funding for Ph.D. students revolutionizing disciplines with the GPU • Engage: build mindshare; facilitate recruiting • Learn: keep a finger on the pulse of leading academic research; keep up with all the applications that are powered by GPUs • Leverage: track relevant research; help to guide researchers working on relevant problems

  3. GRADUATE FELLOWSHIP PROGRAM 144 Graduate Fellowships awarded ($3.8M) since program inception in 2002. Eligibility/application process: • Ph.D. candidates in at least their 2nd year • Nomination(s) by professor(s)/advisor • 1-2 page research proposal. Selection process: • A committee of NVIDIA scientists and engineers reviews applications • Applications are evaluated for originality, potential, and relevance

  4. CURRENT 2016-2017 GRAD FELLOWS • Saman Ashkiani, UC Davis • Yong He, CMU • Yatish Turakhia, Stanford • Gang Wu, University of Sussex • Minjie Wang, NYU • Jiajun Wu, MIT (NVIDIA Foundation Fellow)

  5. CURRENT 2016-2017 GRAD FELLOW FINALISTS • Ahmed Elkholy, University of Illinois at Urbana-Champaign • Achuta Kadambi, Massachusetts Institute of Technology • Caroline Trippel, Princeton • Yu-Hang Tang, Brown University • Ling-Qi Yan, University of California at Berkeley

  6. AGENDA 6 talks, 3 minutes each

  7. JIAJUN WU, MIT

  8. SINGLE IMAGE 3D INTERPRETER NETWORK Jiajun Wu, May 11, 2017

  9. GOAL

  10. 3D INTERPRETER NETWORK (3D-INN) Pipeline: 2D Keypoint Estimation → 3D Interpreter → 3D-to-2D Projection, supervised by 2D keypoint labels. Three-step training paradigm

  11. 3D INTERPRETER NETWORK (3D-INN) Three-step training paradigm, I: 2D Keypoint Estimation

  12. 3D INTERPRETER NETWORK (3D-INN) Three-step training paradigm, II: 3D Interpreter

  13. 3D INTERPRETER NETWORK (3D-INN) Three-step training paradigm, III: End-to-end Finetuning of the full 2D Keypoint Estimation → 3D Interpreter → 3D-to-2D Projection pipeline

  14. 3D ESTIMATION: QUALITATIVE RESULTS Training: our Keypoint-5 dataset, 2K images per category

  15. 3D ESTIMATION: QUALITATIVE RESULTS Training: our Keypoint-5 dataset, 2K images per category. Results on the Keypoint-5 dataset

  16. 3D ESTIMATION: QUALITATIVE RESULTS Training: our Keypoint-5 dataset, 2K images per category. Results on the IKEA Dataset [Lim et al., '13]

  17. 3D ESTIMATION: QUALITATIVE RESULTS Training: our Keypoint-5 dataset, 2K images per category. Results on the SUN Database [Xiao et al., '11]: input vs. after finetuning

  18. 3D ESTIMATION: QUALITATIVE RESULTS Training: our Keypoint-5 dataset, 2K images per category. Results on the SUN Database [Xiao et al., '11]

  19. CHAIR EMBEDDING Manifold of chairs based on their inferred viewpoint

  20. CONTRIBUTIONS OF 3D-INN • Single-image 3D perception • Real 2D labels + synthetic 3D models, connected via keypoints • A 3D-to-2D projection layer for end-to-end training
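The 3D-to-2D projection layer renders predicted 3D keypoints back into the image plane so a loss on 2D keypoint labels can train the whole network. A minimal pinhole-projection sketch, not the paper's actual layer: the focal length `f` and principal point `(cx, cy)` are illustrative assumptions, and the real layer also handles object rotation, translation, and backpropagation through a deep-learning framework.

```python
def project(points3d, f=500.0, cx=0.0, cy=0.0):
    """Pinhole projection of 3D keypoints (X, Y, Z) to image points
    (f*X/Z + cx, f*Y/Z + cy).  Every operation here is differentiable
    in the 3D inputs, which is what lets a 2D keypoint loss drive
    end-to-end finetuning of the 3D interpreter."""
    return [(f * x / z + cx, f * y / z + cy) for x, y, z in points3d]
```

Because the projection is a fixed, parameter-free function of the 3D prediction, gradients of the 2D keypoint error flow straight back into the 3D estimates.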

  21. YATISH TURAKHIA, STANFORD

  22. DARWIN: A GENOMICS CO-PROCESSOR Yatish Turakhia, 05/11/2017

  23. GENOME ANALYSIS PIPELINE (1) A DNA sequencer produces patient reads from the genome (3 billion base pairs) (2) Read assembly (sequence alignment) against the reference reveals mutations (figure: patient vs. reference alignment with substitutions and an indel) (3) Find the disease-causing mutation • Long reads (>10 Kbp) offer a better resolution of the mutation spectrum but have a high error rate (15-40%) • >1,300 CPU hours for reference-guided assembly of noisy long reads • >15,600 CPU hours for de novo assembly of noisy long reads

  24. DARWIN: SEQUENCE ALIGNMENT FRAMEWORK Darwin couples D-SOFT (seed) and GACT (extend), taking a query Q and reference R, on a 40nm ASIC (300mm², 9W), with D-SOFT and GACT APIs exposed to software: high speed and programmability. 1. D-SOFT: tunable speed/sensitivity to match different error profiles 2. GACT: first algorithm with O(1) memory for the compute-intensive step of alignment, allowing arbitrarily long alignments in hardware (well-suited to long reads) 3. First framework shown to accelerate reference-guided as well as de novo assembly of reads in hardware
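D-SOFT's seed-and-filter stage can be illustrated as counting the bases covered by k-mer seed hits in each diagonal bin and keeping bins that cross a threshold. This is a toy sketch, not Darwin's implementation: the parameters `k`, `bin_size`, and `threshold` are illustrative, and the real D-SOFT counts non-overlapping bases with hardware-friendly data layouts.

```python
from collections import defaultdict

def dsoft_candidates(query, ref, k=4, bin_size=16, threshold=8):
    """Toy seed-and-filter: bins above `threshold` become candidate
    alignment start regions for the extension (GACT) stage."""
    index = defaultdict(list)  # k-mer -> positions in the reference
    for j in range(len(ref) - k + 1):
        index[ref[j:j + k]].append(j)
    counts = defaultdict(int)  # diagonal bin -> seed-hit base count
    for i in range(len(query) - k + 1):
        for j in index[query[i:i + k]]:
            counts[(j - i) // bin_size] += k  # crude: ignores seed overlap
    return sorted(b for b, c in counts.items() if c >= threshold)
```

Raising `threshold` or `k` trades sensitivity for speed, which is the tunability the slide refers to.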

  25. DARWIN: REFERENCE-GUIDED ASSEMBLY Reads (~10 Kbp) are seeded against the reference genome (~3 Gbp); D-SOFT emits ~10^6 candidate alignment start locations, and GACT extends each candidate tile by tile (1st GACT tile, then extended GACT tiles) with trace-back (figure: tile-by-tile extension of a read against the reference, with alignment scores)
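Each GACT processing element evaluates the classic local-alignment dynamic-programming recurrence. A plain-Python Smith-Waterman scoring sketch for one tile; the scoring values (`match=2, mismatch=-1, gap=-1`) are illustrative assumptions, and GACT's key addition, tiling with per-tile trace-back so memory stays constant in read length, is omitted here.

```python
def smith_waterman(q, r, match=2, mismatch=-1, gap=-1):
    """Local-alignment DP: H[i][j] = max(0, diagonal + substitution,
    up + gap, left + gap); returns the best local score."""
    rows, cols = len(q) + 1, len(r) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if q[i - 1] == r[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

In hardware, the 512 PEs compute anti-diagonals of this matrix in parallel, one wavefront per cycle.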

  26. DARWIN: DE NOVO ASSEMBLY Reads are seeded against each other with D-SOFT, and GACT extends candidate start locations tile by tile (1st GACT tile, then extended GACT tiles) with trace-back to infer read overlaps. D-SOFT: 40-100X speedup (1) sequential accesses to multiple DRAM channels (2) random accesses using large on-chip memory (64MB). GACT: 6000X speedup (1) 512 Processing Elements (PEs) solving dynamic programming equations every cycle (2) trace-back pointers maintained in on-chip memory (2KB/PE)

  27. DARWIN PERFORMANCE
  Reference-guided assembly:
  READ TYPE            ERROR RATE   SENSITIVITY (SOFTWARE)   SENSITIVITY (DARWIN)   DARWIN SPEEDUP
  Pacific Biosciences  15%          95.95%                   99.91%                 4,110X
  Oxford Nanopore 2D   30%          98.11%                   98.40%                 4,080X
  Oxford Nanopore 1D   40%          97.10%                   97.40%                 128X
  De novo assembly:
  READ TYPE            ERROR RATE   SENSITIVITY (SOFTWARE)   SENSITIVITY (DARWIN)   DARWIN SPEEDUP
  Pacific Biosciences  15%          99.80%                   99.89%                 250X

  28. THANK YOU!

  29. SAMAN ASHKIANI, UC DAVIS

  30. DYNAMIC DATA STRUCTURES FOR THE GPU Saman Ashkiani, 05/11/2017

  31. DYNAMIC DATA STRUCTURES FOR THE GPU Objective: a general-purpose data structure that can be updated at runtime • Supports updates (insert/delete), batched or individual • Supports efficient queries (lookup, count, range, etc.), batched or individual. Motivation: more types of GPU data structures in the programmer's toolbox. This is a challenging task because GPUs have thousands of parallel threads: • An efficient non-blocking data structure is needed • Most classic non-blocking ideas are hard to implement efficiently in a SIMD fashion • Efficient dynamic memory allocation is hard on GPUs • Safe memory reclamation: no dynamic memory management on GPUs

  32. OUR IMPLEMENTATIONS
  GPU LSM (a dictionary data structure: multiple sorted arrays with different sizes) • Updates: batch insertion/deletion (average: 225 M updates/s on K40) • Queries: lookup, count, and range (133, 60, and 30 M queries/s) • Based on radix sort, merge, and binary search • Paper draft: http://ece.ucdavis.edu/~ashkiani/gpu_lsm.pdf
  CONCURRENT HASH TABLE (hash table with chaining) • Updates: concurrent insertion/deletion (497 M updates/s on K40) • Queries: lookup (860 M queries/s on K40) • Each bucket: a warp-friendly linked list • Warp-synchronous programming to better fit the SIMD model • Our own dynamic memory allocator for nodes • Memory reclamation: safe removal of deleted nodes for future reuse
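The GPU LSM's "multiple sorted arrays with different sizes" can be illustrated with a CPU toy: each batch insert fills level 0, and a full level cascades upward by merging into the next one, so level i is either empty or holds batch_size * 2^i sorted keys. This is a deliberately simplified sketch (no deletions, sequential code); the actual GPU structure uses radix sort and parallel merge.

```python
from bisect import bisect_left

class TinyLSM:
    """CPU sketch of an LSM-style dictionary built from levels of
    sorted arrays; level i is None or holds b * 2**i sorted keys."""

    def __init__(self, batch_size):
        self.b = batch_size
        self.levels = []  # levels[i] is None or a sorted list

    def insert_batch(self, keys):
        # Sort the incoming batch, then cascade merges up full levels.
        assert len(keys) == self.b
        carry = sorted(keys)
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append(None)
            if self.levels[i] is None:
                self.levels[i] = carry
                return
            carry = sorted(carry + self.levels[i])  # merge two sorted runs
            self.levels[i] = None
            i += 1

    def lookup(self, key):
        # Binary-search each non-empty level, newest (smallest) first.
        for level in self.levels:
            if level is not None:
                j = bisect_left(level, key)
                if j < len(level) and level[j] == key:
                    return True
        return False
```

A lookup touches at most O(log n) levels, each answered by one binary search, which is why lookups are cheaper than count or range queries in the slide's throughput numbers.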

  33. GANG WU, UNIVERSITY OF SUSSEX

  34. HIGH-SPEED FLUORESCENCE LIFETIME IMAGING BASED ON ANN AND GPU Gang Wu, 11th May 2017

  35. CONTENTS What is FLIM • Project Aims • FLIM Theories • ANN-GPU-FLIM • Results

  36. WHAT IS FLIM Fluorescence-lifetime imaging microscopy is a technique for producing an image based on the differences in the exponential decay rate of the fluorescence from a fluorescent sample (e.g., gold nanorods). Applications: surgery guidance, disease therapies, disease diagnosis

  37. PROJECT AIMS Current systems: CPU-based traditional FLIM analysis is very slow (tens of minutes for one image). Aims: high-speed FLIM analysis via a fast algorithm (artificial neural network) and highly parallel hardware (GPU)

  38. FLIM THEORIES System overview: a laser excites the sample; a detector with TCSPC (time-correlated single-photon counting) records photon arrivals; lifetime analysis runs on the GPU (this work), with a CPU hosting the GUI

  39. FLIM THEORIES (figure: fluorescence decay histogram of photon counts per time bin over a few ns, characterized by the parameters of an exponential-decay model with background)
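For a mono-exponential decay, the lifetime can be recovered from the histogram with a log-linear least-squares fit, representative of the classic per-pixel curve fitting that the ANN-GPU pipeline is built to replace. A minimal sketch under noise-free, background-free, single-exponential assumptions (real TCSPC data needs background subtraction and noise weighting).

```python
import math

def fit_lifetime(hist, bin_width):
    """Estimate the lifetime tau of a mono-exponential decay histogram
    via ordinary least squares on log(counts): log N(t) = log A - t/tau,
    so tau = -1 / slope of the fitted line."""
    pts = [(i * bin_width, math.log(c)) for i, c in enumerate(hist) if c > 0]
    n = len(pts)
    sx = sum(t for t, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(t * t for t, _ in pts)
    sxy = sum(t * y for t, y in pts)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return -1.0 / slope
```

Doing even this simple fit for every pixel of a 256×256 image is what makes the CPU path slow; the ANN replaces iterative fitting with one forward pass per pixel.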

  40. ANN-GPU-FLIM Principle: per-pixel photon-count histograms (FLIM data) are fed to an artificial neural network, which is trained only once, to produce FLIM images

  41. RESULTS
  Accuracy performance: different optimized areas, comparable performance.
  Time performance (image size 256×256):
  ALGORITHM   TIME-CPU (S)   TIME-GPU (S)   SPEEDUP (GPU VS CPU)
  ANN         0.89           0.1            8.9
  LSM         41.5           3.8            10.8
  Overall speedup: 415X

  42. YONG HE, CMU

  43. EVOLVING SHADER COMPILATION FOR PERFORMANCE AND MAINTAINABILITY Yong He, May 2017

  44. EVOLVING SHADER COMPILERS Meeting performance goals under productivity constraints • Modern games feature increasingly realistic graphics • A game's shader library has grown 100x more complex • The shading language is still the same as ten years ago and lacks functionality for achieving high performance without compromising code modularity and extensibility

  45. AUTOMATIC APPROXIMATE SHADER COMPILATION Performance and productivity: fast code and fast shader compilation. A System for Rapid, Automatic Shader Level-of-Detail. Yong He, Tim Foley, Natalya Tatarchuk, Kayvon Fatahalian. SIGGRAPH Asia 2015
