Performance Visualization of Hybrid Cell Applications (ScicomP 15)


  1. Computer Science » Computer Engineering » Computer Architecture
     Performance Visualization of Hybrid Cell Applications
     ScicomP 15, May 19th, Barcelona
     {holger.brunst, daniel.hackenberg}@tu-dresden.de

  2. Outline
     • Introduction
     • Software Tracing on Cell Systems
     • Implementation and Functionality
     • Examples and Overhead
     • Summary
     Holger Brunst, Daniel Hackenberg Slide 2

  3. Cell Broadband Engine
     [Block diagram: eight SPEs, each an SPU with a Local Store (LS), attached to the Element Interconnect Bus together with the PowerPC Processor Element (PPE: PowerPC core with L1 and L2), the Memory Interface Controller (MIC, dual XDR), and the Bus Interface Controller (BIC, FlexIO).]
     SPE: Synergistic Processor Element
     LS: Local Store

  4. Cell Broadband Engine
     Vast Resources
     • SPEs: SIMD cores for fast calculations, 256 KB local store (LS, software controlled), dedicated DMA engine (MFC)
     • PPE: very simple PowerPC core for OS (Linux) and control tasks
     Sophisticated Architecture
     • Complex software development process
     • Different compilers and programs for PPE and SPEs
     • SPEs use DMA commands to access main memory or the LS of other SPEs; the MFC executes them asynchronously
     • Mailbox communication between PPE and SPEs
     Tool Support

  5. Trace-based Analysis
     Why do we still need to analyze?
     • HPC: system complexity increases constantly
     • Parallelism is entering the mainstream market, and not many people know how to deal with it
     Approaches
     • Profilers do not give detailed insight into the timing behavior of an application
     • Detailed online analysis is practically impossible because of intrusion and data volume
     Tracing
     • Records application behavior step by step
     • Captures the dynamic behavior of parallel applications
     • Performance analysis is done post mortem

  6. Background
     What is Vampir?
     • Performance monitoring and analysis tool
     • Targets the visualization of dynamic processes on massively parallel (compute) systems
     History
     • Development started more than 15 years ago at Research Centre Jülich, ZAM
     • Since 1997, developed at TU Dresden (first in collaboration with Pallas GmbH; 2003-2005: Intel Software & Solutions Group; since January 2006: TU Dresden, ZIH / GWT-TUD)
     Availability
     • Unix, Windows, and Mac OS
     • Visualization components (Vampir) are commercial
     • Monitor components (VampirTrace) are Open Source

  7. Components
     [Diagram: VampirTrace records trace data (OTF) from the application CPU, and Vampir visualizes it over time. For large runs (the figure labels 10,000 CPUs), VampirTrace writes OTF trace parts 1 to m, one per application CPU, and VampirServer analyzes them with tasks 1 to n, where n << m.]

  8. Flavors
     Vampir
     • Sequential event analysis
     • Rich set of graphical performance views
     • For desktops and small parallel production environments
     • Less scalable
     VampirServer
     • Distributed client/server approach
     • Parallel analysis
     • New features
     Vampir for Windows
     • Modern Qt-based GUI
     • Released at ISC 2009, Hamburg
     • Currently: beta release

  9. Outline
     • Introduction
     • Software Tracing on Cell Systems
     • Implementation and Functionality
     • Examples and Overhead
     • Summary

  10. Software Tracing on Cell Systems
      PPE
      • Conventional tools with PowerPC support run unmodified
      • Modifications are necessary to support SPE threads
      SPE
      • A new concept, suited to this architecture, needs to be designed
      • A new monitor is necessary to generate events
      • Local store is too small; it can only hold events temporarily
      • Synchronization of PPE and SPE timers is necessary

  11. Trace Monitor Concept
      [Diagram: SPE 0 to SPE n, the Element Interconnect Bus, the PPE, main memory, and the I/O system.]
      SPE 0 ... SPE n (SPU with Local Store)
      • The instrumented SPE program writes trace events into a small trace buffer (Buf 1, Buf 2) in the Local Store
      • The buffers switch each time the current trace buffer is full
      • The full trace buffer is transferred to main memory by DMA in the background; the SPE program keeps running
      Main Memory
      • Buf 1/0 ... Buf m/0 for SPE 0, up to Buf 1/n ... Buf m/n for SPE n
      PPE
      • Instrumented PPE program
      • Conventional monitoring tool with enhancements to cover, e.g., mailbox communication with SPEs
      • The PPE processes the SPE trace buffers (post mortem) and writes the trace files to disk
      I/O System
      • Trace file PPE, trace file SPE 0 ... trace file SPE n
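
The buffer switching in this concept can be sketched in portable C. This is only an illustration under stated assumptions: `memcpy` stands in for the background DMA transfer, the buffer sizes are tiny and arbitrary, and all names (`trace_monitor`, `monitor_record`, ...) are hypothetical rather than taken from CellTrace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EVENTS_PER_BUF 4  /* hypothetical: tiny so the switch is easy to see   */
#define MAIN_MEM_BUFS  8  /* hypothetical: buffers reserved in main memory     */

typedef struct { uint32_t time; uint32_t id; } trace_event;

typedef struct {
    trace_event local[2][EVENTS_PER_BUF];  /* stand-in for Buf 1 / Buf 2 in the local store */
    int         active;                    /* index of the buffer currently being filled    */
    size_t      fill;                      /* events stored in the active buffer            */
    trace_event main_mem[MAIN_MEM_BUFS][EVENTS_PER_BUF];  /* stand-in for main memory       */
    size_t      flushed;                   /* number of full buffers copied out so far      */
} trace_monitor;

static void monitor_init(trace_monitor *m)
{
    assert(m != NULL);
    memset(m, 0, sizeof *m);
}

/* Store one event. Returns 0 on success, -1 if the event had to be dropped
 * because the main-memory area is exhausted and the active buffer is full. */
static int monitor_record(trace_monitor *m, uint32_t time, uint32_t id)
{
    assert(m != NULL);
    if (m->fill == EVENTS_PER_BUF)
        return -1;

    m->local[m->active][m->fill].time = time;
    m->local[m->active][m->fill].id   = id;
    m->fill++;

    if (m->fill == EVENTS_PER_BUF && m->flushed < MAIN_MEM_BUFS) {
        /* On real hardware this would be an asynchronous DMA put, and the
         * monitor would have to wait for the previous transfer of the other
         * buffer before reusing it. Here a plain copy stands in for it. */
        memcpy(m->main_mem[m->flushed], m->local[m->active], sizeof m->local[0]);
        m->flushed++;
        m->active ^= 1;  /* switch to the other local buffer */
        m->fill = 0;
    }
    return 0;
}
```

Recording nine events with these sizes fills and flushes both local buffers once and leaves one event in the first buffer.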

  12. Trace Visualization for Cell (1)
      [Timeline: Process 1 to Process 4 on the Location axis, each showing Region 1 and Region 2 along the Time axis.]
      Illustration of parallel processes in a typical timeline display

  13. Trace Visualization for Cell (2)
      [Timeline: PPE Process 1 with SPE Thread 1 to SPE Thread 3 below it, each thread showing Region 1 and Region 2 along the Time axis.]
      Illustration of SPE threads as children of the PPE process

  14. Trace Visualization for Cell (3)
      [Timeline: PPE Process 1 and SPE Thread 1 to SPE Thread 3, each thread showing Region 1 along the Time axis.]
      Illustration of mailbox messages
      • Classic two-sided communication (send/receive)
      • Illustrated by lines similar to MPI messages

  15. Trace Visualization for Cell (4)
      [Timeline: PPE Process 1, a Main Memory bar marked read, read, write, and SPE Thread 1 and SPE Thread 2, each showing Region 1 along the Time axis.]
      Illustration of DMA transfers between SPEs and main memory
      • PPE is not involved
      • Main memory is represented as an independent bar
      • Allows graphical representation of memory states (read/write)

  16. Trace Visualization for Cell (5)
      [Timeline: PPE Process 1, Main Memory, SPE Thread 1, and SPE Thread 2 along the Time axis, with a DMA put and a DMA get marked.]
      DMA transfers between SPEs
      • Challenge: communication is one-sided
      • Peer-to-peer send/receive representation is unsuitable
      Distinction of active and passive partner?
      • Additional lines
      • Additional bullets (active partner)
      • Even more bullets? (passive partner)

  17. Trace Visualization for Cell (6)
      [Timeline: PPE Process 1, Main Memory, SPE Thread 1, and SPE Thread 2 along the Time axis, with a DMA wait and the timestamps t0, t1, t2 marked.]
      • DMA wait operation creates two events (at t1 and t2)
      • Allows illustration of DMA wait time
      • Similar for mailbox messages
          t_0 = get_timestamp();
          mfc_get();
          [...]
          t_1 = get_timestamp();
          wait_for_dma_tag();
          t_2 = get_timestamp();
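
A wrapper that produces the two wait events might look like the sketch below. It is portable C, not SPE code: the clock and the wait are passed in as function pointers, and `fake_clock` / `fake_wait` are stand-ins so the example runs anywhere. All names are hypothetical and not taken from CellTrace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { EV_DMA_WAIT_BEGIN, EV_DMA_WAIT_END } event_kind;
typedef struct { event_kind kind; uint32_t tag; uint64_t time; } wait_event;

#define MAX_EVENTS 16
typedef struct { wait_event ev[MAX_EVENTS]; size_t count; } event_list;

typedef uint64_t (*clock_fn)(void);          /* returns the current timestamp  */
typedef void     (*wait_fn)(uint32_t tag);   /* blocks until the tag completes */

static void add_event(event_list *el, event_kind kind, uint32_t tag, uint64_t time)
{
    assert(el != NULL && el->count < MAX_EVENTS);
    el->ev[el->count].kind = kind;
    el->ev[el->count].tag  = tag;
    el->ev[el->count].time = time;
    el->count++;
}

/* Wrap a DMA wait so that it leaves two events in the trace: t1 and t2. */
static void traced_dma_wait(event_list *el, uint32_t tag,
                            clock_fn now, wait_fn wait_for_tag)
{
    add_event(el, EV_DMA_WAIT_BEGIN, tag, now());  /* t1 */
    wait_for_tag(tag);
    add_event(el, EV_DMA_WAIT_END, tag, now());    /* t2 */
}

/* The wait time shown in the timeline is t2 - t1. */
static uint64_t wait_duration(const wait_event *begin, const wait_event *end)
{
    return end->time - begin->time;
}

/* Stand-ins for the hardware timer and the blocking wait (hypothetical). */
static uint64_t fake_time;
static uint64_t fake_clock(void) { return fake_time; }
static void fake_wait(uint32_t tag) { (void)tag; fake_time += 40; }
```

Calling `traced_dma_wait(&list, tag, fake_clock, fake_wait)` appends a begin/end pair whose difference is the time spent waiting; the same pattern would apply to a blocking mailbox read.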

  18. Outline
      • Introduction
      • Software Tracing on Cell Systems
      • Implementation and Functionality
      • Examples and Overhead
      • Summary

  19. Implementation
      Prototype implementation based on VampirTrace (VT)
      • Open Source
      • http://www.tu-dresden.de/zih/vampirtrace
      Additional tool: CellTrace
      • Header files for PPE and SPE programs: instrumentation of inline functions provided by the Cell SDK
      • Library for PPE programs + library for SPE programs
      [Build flow diagram: spu_code_1.c ... spu_code_n.c (with celltrace_spu.h) are compiled by the SPU compiler with -DCTRACE into spu_code_1.o ... spu_code_n.o and combined with celltrace_spu.a into spu_object.o; the embedder and archiver produce ppu_object.o and spu_lib.a. ppu_code_1.c ... ppu_code_m.c (with celltrace_ppu.h) are compiled by vtcc with -DCTRACE into ppu_code_1.o ... ppu_code_m.o and combined with celltrace_ppu.a and spu_lib.a into cell_binary (trace enabled).]
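
One way a header could instrument an inline function when `-DCTRACE` is set is a macro that redirects calls to a traced wrapper. The slide does not show how CellTrace does this, so the sketch below is an assumption; `sdk_read_mailbox` is a hypothetical stand-in for an SDK inline function, and the counter stands in for the monitor library.

```c
#include <assert.h>
#include <stdint.h>

/* Normally passed on the compiler command line as -DCTRACE;
 * defined here only so the sketch is self-contained. */
#ifndef CTRACE
#define CTRACE 1
#endif

/* Stand-in for the monitor library: counts generated events (hypothetical). */
static unsigned trace_event_count;

/* Stand-in for an inline function provided by the SDK (hypothetical). */
static inline uint32_t sdk_read_mailbox(void) { return 42u; }

/* Traced wrapper: one event before and one after the original call. */
static inline uint32_t traced_read_mailbox(void)
{
    assert(trace_event_count % 2u == 0u);  /* events are produced in pairs */
    trace_event_count++;                   /* "enter" event */
    uint32_t value = sdk_read_mailbox();   /* macro not defined yet: calls the original */
    trace_event_count++;                   /* "leave" event */
    return value;
}

#ifdef CTRACE
/* From here on, application code that calls the SDK function is redirected. */
#define sdk_read_mailbox() traced_read_mailbox()
#endif

/* Application code stays unchanged; only the build flag differs. */
static uint32_t application_code(void)
{
    return sdk_read_mailbox();
}
```

Because the macro is defined after the wrapper, the wrapper still reaches the original function, while later application code is traced without source changes.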

  20. Trace Visualization with Vampir (1)
      Visualization of a Cell trace using Vampir
      • Simple demo program
      • 4 SPEs only

  21. Trace Visualization with Vampir (2)

  22. Trace Visualization with Vampir (3)
      Complex DMA transfers of SPE 3

  23. Outline
      • Introduction
      • Software Tracing on Cell Systems
      • Implementation and Functionality
      • Examples and Overhead
      • Summary

  24. Example Cell Applications: FFT (1)
      FFT at a synchronization point
      8 SPEs, 64 KByte page size, 11.9 GFLOPS

  25. Example Cell Applications: FFT (2)
      FFT at a synchronization point
      8 SPEs, 16 MByte page size, 42.9 GFLOPS

  26. Example Applications: Cholesky (1)
      Cholesky transformation with 8 SPEs
      Overview with DMA communication of SPE 3

  27. Example Applications: Cholesky (2)
      Cholesky transformation with 8 SPEs
      Enlargement with DMA communication of SPE 3

  28. Example Cell Applications: RAxML (1)
      RAxML (Randomized Accelerated Maximum Likelihood) with 8 SPEs, ramp-up phase
