

  1. Parallel Computing: Opportunities and Challenges. Victor Lee, Parallel Computing Lab (PCL), Intel.

  2. Who We Are: Parallel Computing Lab. Parallel Computing, Research to Realization.
     • Worldwide leadership in throughput/parallel computing; an industry role model for application-driven architecture research, ensuring Intel leadership for this application segment.
     • Dual charter: application-driven architecture research, and multicore/manycore product-intercept opportunities.
     • Workload focus: multimodal real-time physical simulation, behavioral simulation, interventional medical imaging, large-scale optimization (FSI), massive data computing, non-numeric computing.
     • Industry and academic co-travelers: Mayo, HPI, CERN, Stanford (Prof. Fedkiw), UNC (Prof. Manocha), Columbia (Prof. Broadie).
     • Architectural focus: the "feeding the beast" (memory) challenge, unstructured accesses, domain-specific support, massively threaded machines.
     • Recent accomplishments:
       – First TFLOP SGEMM and the highest-performing SparseMVM, demoed on KNF silicon at SC'09.
       – Fastest LU/Linpack demo on KNF at ISC'10.
       – Fastest search, sort, and relational join; Best Paper Award for tree search at SIGMOD 2010.

  3. Motivations
     • Exponential growth of digital devices
       – Explosion of the amount of digital data

  4. Motivations
     • Exponential growth of digital devices
       – Explosion of the amount of digital data
     • Popularity of the World Wide Web
       – Changing the demographics of computer users

  5. Motivations
     • Exponential growth of digital devices
       – Explosion of the amount of digital data
     • Popularity of the World Wide Web
       – Changing the demographics of computer users
     • Limited frequency scaling for a single core
       – Performance improvement via increasing core count

  6. What These Lead To
     Massive data needs massive computing to process → birth of multi-/many-core architecture → parallel computing.

  7. The Opportunities: What can parallel computing do for us?

  8. Semantic Barrier
     [Diagram: Norman's Gulf. The execution gap and the evaluation gap separate the human's conceptual model from the computer's simulated model.]
     • Lowering the semantic barrier means making computers solve problems the human way, which makes it easier for humans to use computers.

  9. Model-Driven Analytics
     • Data-driven models are now tractable and usable
       – We are not limited to analytical models any more
       – No need to rely on heuristics alone for unknown models
       – Massive data offers new algorithmic opportunities
     • Many traditional compute problems are worth revisiting
     • Web connectivity significantly speeds up model training
     • Real-time connectivity enables continuous model refinement
       – A poor model is an acceptable starting point
       – Classification accuracy improves over time

  10. Interactive RMS Loop
     • Recognition: "What is …?" (the model)
     • Mining: "Is it …?" (find an existing model instance)
     • Synthesis: "What if …?" (create a new model instance)
     Most RMS apps are about enabling the interactive (real-time) RMS loop (iRMS).
     (Slide credit: Pradeep K. Dubey, pradeep.dubey@intel.com, Feb 7, 2007)

  11. RMS Example: Future Medicine
     • Recognition: What is a tumor?
     • Mining: Is there a tumor here?
     • Synthesis: What if the tumor progresses?
     Images courtesy: http://splweb.bwh.harvard.edu:8000/pages/images_movies.html
     It is all about dealing efficiently with complex multimodal datasets.

  12. RMS Example: Future Entertainment
     • Recognition: Who are Shrek, Fiona, and Prince Charming?
     • Mining: When does Shrek first meet Fiona's parents? What is the story-net?
     • Synthesis: What if Shrek were to arrive late? What if Fiona didn't believe Prince Charming?
     Tomorrow's interactions and collaborations: interactive story-nets, and multi-party real-time collaboration in movies, games, and strategy simulations.

  13. Opportunities (Summary)
     • More data
       – Model-driven analytics
     • More computing
       – Interactive RMS loops
     • Lower computing barrier
       – Computers easier to use for the masses

  14. The Challenges: Why is parallel computing hard?

  15. Multi-Core / Many-Core Era
     Single Core → Multi-Core → Many-Core
     Multi-core / many-core provides more compute capability with the same area / power.

  16. Architecture Trends
     • Rapidly increasing compute
       – Core scaling: Nhm (4 cores) → Wsm (6 cores) → … → Intel Knights Ferry (32 cores) → …
       – Data-level parallelism (SIMD) scaling: SSE (128 bits) → AVX (256 bits) → … → LRBNI (512 bits) → …
     • Increasing memory bandwidth, but…
       – Not keeping pace with the compute increase
       – Used to be 1 byte/flop
       – Current: Wsm (0.21 bytes/flop); AMD Magny-Cours (0.20 bytes/flop); NVIDIA GTX 480 (0.13 bytes/flop); see the sketch below
       – Future: 0.05 bytes/flop (GPUs, 2017) (ref: Bill Dally, SC'09)
     One clear trend: more cores in processors.
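
The bytes/flop figures above are just sustained memory bandwidth divided by peak compute. A minimal sanity-check sketch in C, assuming a nominal ~32 GB/s of DDR3 bandwidth per Westmere socket (an assumed figure; the slide gives only the resulting ratio):

    #include <stdio.h>

    /* bytes/flop = memory bandwidth (GB/s) / peak compute (GFLOPS) */
    static double bytes_per_flop(double gb_per_s, double gflops) {
        return gb_per_s / gflops;
    }

    int main(void) {
        /* 2-socket Westmere: 316 GFLOPS peak (next slide); ~32 GB/s per
           socket is an assumed DDR3-1333 number, not from the slide. */
        printf("Wsm: %.2f bytes/flop\n", bytes_per_flop(2 * 32.0, 316.0));
        /* Prints ~0.20, consistent with the slide's 0.21. */
        return 0;
    }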

  17. Architecture Trend

                              Intel Core i7 990X       Intel KNF
                              (a.k.a. Westmere)
     Sockets                  2                        1
     Cores/socket             6                        32
     Core frequency (GHz)     3.3                      1.2
     SIMD width               4                        16
     Peak compute             316 GFLOPS               1,228 GFLOPS

     The increase in compute comes from more cores and wider SIMD (see the sketch below).
     Implication: we need to start programming for parallel architectures.
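
The peak-compute row follows from the others: sockets × cores/socket × frequency × SIMD width × 2. The factor of 2 (one SIMD multiply plus one SIMD add per cycle) is inferred, since it reproduces the table's numbers:

    #include <stdio.h>

    /* Peak GFLOPS, assuming one SIMD multiply + one SIMD add per cycle. */
    static double peak_gflops(int sockets, int cores, double ghz, int simd) {
        return sockets * cores * ghz * simd * 2.0;
    }

    int main(void) {
        printf("Core i7 990X: %.1f GFLOPS\n", peak_gflops(2, 6, 3.3, 4));
        /* 316.8, which the slide rounds to 316 */
        printf("KNF:          %.1f GFLOPS\n", peak_gflops(1, 32, 1.2, 16));
        /* 1228.8, which the slide rounds to 1,228 */
        return 0;
    }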

  18. Parallel Programming
     • What's hard about it?
       – We don't think in parallel
       – Parallel algorithms are afterthoughts

  19. Parallel Programming
     • The best serial code doesn't always scale well to a large number of processors (see the sketch below).
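
A minimal OpenMP illustration of the point (mine, not from the talk): the natural serial running-sum loop carries a dependency from one iteration to the next, so making it scale means restructuring it as a reduction.

    #include <stdio.h>

    #define N 10000000

    int main(void) {
        static double a[N];
        for (int i = 0; i < N; i++) a[i] = 0.5;

        /* Natural serial form: each iteration depends on the previous
           sum, so the loop cannot simply be split across cores. */
        double sum = 0.0;
        for (int i = 0; i < N; i++) sum += a[i];

        /* Parallel form (compile with -fopenmp): each thread keeps a
           private partial sum, combined at the end. */
        double psum = 0.0;
        #pragma omp parallel for reduction(+ : psum)
        for (int i = 0; i < N; i++) psum += a[i];

        printf("serial=%.0f parallel=%.0f\n", sum, psum);
        return 0;
    }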

  20. Scalability for Multi-Core
     • Amdahl's law for multi-core architecture: with serial fraction s and N cores,
       Speedup(N) = 1 / ( s + (1 - s)/N )
       where s is the serial component and (1 - s)/N is the parallel component (see the sketch below).
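
A quick numeric sketch of the formula above, showing how even a small serial component caps multi-core speedup:

    #include <stdio.h>

    /* Amdahl's law: serial fraction s, N equal cores. */
    static double amdahl(double s, int n) {
        return 1.0 / (s + (1.0 - s) / n);
    }

    int main(void) {
        /* Even a 5% serial component limits 64 cores to ~15x. */
        for (int n = 1; n <= 64; n *= 4)
            printf("s=0.05, N=%2d -> %.1fx\n", n, amdahl(0.05, n));
        return 0;
    }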

  21. Scalability of Many-Core
     • Amdahl's law for many-core architecture: with serial fraction s, N cores, and r the performance ratio between one core of the many-core processor and the single-core processor,
       Speedup(N) = 1 / ( s/r + (1 - s)/(N · r) )
       where s/r is the serial component and (1 - s)/(N · r) is the parallel component.
     A significant portion of the application must be parallelized to achieve good scaling (see the sketch below).
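
Extending the previous sketch with the per-core performance ratio (this is my reconstruction of the slide's formula): because each small core is slower, the serial component hurts even more, so a much larger fraction of the application must be parallel.

    #include <stdio.h>

    /* Many-core Amdahl: serial fraction s, N small cores, each running
       at ratio r of the single-core processor's per-core performance. */
    static double amdahl_manycore(double s, int n, double r) {
        return 1.0 / (s / r + (1.0 - s) / (n * r));
    }

    int main(void) {
        /* 32 small cores at 40% per-core performance (illustrative numbers):
           90% parallel code gives only ~3.1x; 99% parallel gives ~9.8x. */
        printf("s=0.10: %.1fx\n", amdahl_manycore(0.10, 32, 0.4));
        printf("s=0.01: %.1fx\n", amdahl_manycore(0.01, 32, 0.4));
        return 0;
    }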

  22. Challenges (Summary)
     • Architecture changes for many-core
       – Compute density vs. compute efficiency
       – Data management: feeding the beast
     • Algorithms
       – Is the best scalar algorithm suitable for parallel computing?
     • Programming model
       – Humans tend to think in sequential steps; parallel computing is not natural
       – Non-ninja parallel programming

  23. Our Approach: Application-Specific HW/SW Co-Design

  24. Our Approach: App-Arch Co-Design
     • Architecture-aware analysis of the computational needs of parallel applications.
     • Focus on specific co-travelers and domains: HPC, imaging, finance, physical simulations, medical, …
     • Workload requirements are used to drive design decisions; workloads are used to validate designs.
     [Diagram: hardware/software stack from workloads, programming environments, and execution environments down through platform firmware/ucode, I/O, network, storage, memory, on-die fabric, cache, and cores.]
     • Multi-/many-core features that accelerate applications in a power-efficient manner (bonus point: simplify programming).
