
Programming for Parallelism and Locality with HTAs

Paper published at PPoPP, March 2006. Presentation by Roman Frigg. Paper written at UIUC (1), Universidade da Coruña (2) and IBM T.J. Watson Research Center (3) by Ganesh Bikshandi (1), Jia Guo, Daniel


  1. Section 3, HTA operations & applications: Cannon's algorithm

         function C = cannon(A,B,C)
           % Initialization
           for i=2:m
             A{i,:} = circshift(A{i,:}, [0, -(i-1)]);
             B{:,i} = circshift(B{:,i}, [-(i-1), 0]);
           end
           % Iteration
           for k=1:m
             C = C + A * B;
             A = circshift(A, [0, -1]);
             B = circshift(B, [-1, 0]);
           end

  4. Initialization (shown for m = 3)

         for i=2:m
           A{i,:} = circshift(A{i,:}, [0, -(i-1)]);
           B{:,i} = circshift(B{:,i}, [-(i-1), 0]);
         end

     Start:

         A11 A12 A13    B11 B12 B13
         A21 A22 A23    B21 B22 B23
         A31 A32 A33    B31 B32 B33

     After i=2 (row 2 of A shifted left by 1 tile, column 2 of B shifted up by 1 tile):

         A11 A12 A13    B11 B22 B13
         A22 A23 A21    B21 B32 B23
         A31 A32 A33    B31 B12 B33

     After i=3 (row 3 of A shifted left by 2 tiles, column 3 of B shifted up by 2 tiles):

         A11 A12 A13    B11 B22 B33
         A22 A23 A21    B21 B32 B13
         A33 A31 A32    B31 B12 B23
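
The initialization walkthrough above can be reproduced with a minimal sketch in plain Python. This is not the HTA toolbox: the tile grids are ordinary nested lists of labels, and the helper names `rotate_left` and `skew` are hypothetical, introduced only for illustration. Indices are 0-based here, so row `i` is shifted by `i` rather than by `i-1`.

```python
def rotate_left(seq, n):
    """Circularly shift a list n positions towards index 0."""
    n %= len(seq)
    return seq[n:] + seq[:n]


def skew(A, B):
    """Initial alignment: shift row i of A left by i and column j of B up by j (0-based)."""
    m = len(A)
    A_skewed = [rotate_left(A[i], i) for i in range(m)]
    # Rotate each column of B, then rebuild the row-major grid.
    cols = [rotate_left([B[r][j] for r in range(m)], j) for j in range(m)]
    B_skewed = [[cols[j][r] for j in range(m)] for r in range(m)]
    return A_skewed, B_skewed


m = 3
A = [[f"A{i}{j}" for j in range(1, m + 1)] for i in range(1, m + 1)]
B = [[f"B{i}{j}" for j in range(1, m + 1)] for i in range(1, m + 1)]
A_s, B_s = skew(A, B)
for row_a, row_b in zip(A_s, B_s):
    print(" ".join(row_a), "  ", " ".join(row_b))
```

The printed grids should match the layout reached on the last initialization frame (after i=3).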

  15. Iteration

          for k=1:m
            C = C + A * B;
            A = circshift(A, [0, -1]);
            B = circshift(B, [-1, 0]);
          end

      Layout at the start of k=1 (the result of the initialization):

          A11 A12 A13    C11 C12 C13    B11 B22 B33
          A22 A23 A21    C21 C22 C23    B21 B32 B13
          A33 A31 A32    C31 C32 C33    B31 B12 B23

      Layout at the end of k=1, after C has been updated, every row of A has shifted left by one tile and every column of B has shifted up by one tile:

          A12 A13 A11    C11 C12 C13    B21 B32 B13
          A23 A21 A22    C21 C22 C23    B31 B12 B23
          A31 A32 A33    C31 C32 C33    B11 B22 B33

      k=2 starts from this layout and repeats the same three steps.
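
To check that the skew followed by m multiply-and-shift steps really yields the full product, here is a sequential sketch in plain Python. It assumes that `A * B` on the slides multiplies each tile of A with the tile of B in the same grid position. Tiles are small nested lists, and all helper names (`rotate_left`, `matmul`, `matadd`, `assemble`, `make_tiles`) are hypothetical. It runs on one process and only mirrors the data movement, not the parallel execution.

```python
def rotate_left(seq, n):
    """Circularly shift a list n positions towards index 0."""
    n %= len(seq)
    return seq[n:] + seq[:n]


def matmul(X, Y):
    """Dense matrix product of two nested-list matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def matadd(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def cannon(A, B, C):
    """Cannon's algorithm on m x m grids of tiles (sequential stand-in)."""
    m = len(A)
    # Initialization: row i of A left by i, column j of B up by j (0-based).
    A = [rotate_left(A[i], i) for i in range(m)]
    cols = [rotate_left([B[r][j] for r in range(m)], j) for j in range(m)]
    B = [[cols[j][r] for j in range(m)] for r in range(m)]
    for _ in range(m):
        # Tile-by-tile multiply-accumulate, then shift A left and B up by one tile.
        C = [[matadd(C[i][j], matmul(A[i][j], B[i][j])) for j in range(m)]
             for i in range(m)]
        A = [rotate_left(row, 1) for row in A]
        B = rotate_left(B, 1)
    return C


def assemble(T):
    """Flatten a grid of b x b tiles into one dense matrix."""
    b = len(T[0][0])
    return [sum((tile[r] for tile in tile_row), [])
            for tile_row in T for r in range(b)]


m, b = 3, 2  # 3 x 3 tiles, each tile 2 x 2 (arbitrary illustrative sizes)


def make_tiles(seed):
    """Deterministic integer test data, so the comparison below is exact."""
    return [[[[(seed + 7 * i + 5 * j + 3 * r + c) % 11 for c in range(b)]
              for r in range(b)] for j in range(m)] for i in range(m)]


A = make_tiles(1)
B = make_tiles(4)
C0 = [[[[0] * b for _ in range(b)] for _ in range(m)] for _ in range(m)]
C = cannon(A, B, C0)
print(assemble(C) == matmul(assemble(A), assemble(B)))
```

The final comparison checks the tiled result against an ordinary dense product of the assembled matrices. It works because, after the skew, tile position (i, j) always holds a pair of tiles with matching inner index, and each shift advances that index by one.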

  24. Talk overview: 1 Intro; 2 How HTAs work; 3 HTA operations & applications; 4 Evaluation; 5 Conclusions

  25. Section 4, Evaluation: NASA Advanced Supercomputing benchmark

      Table 1. Execution times in seconds for some of the applications in the NAS benchmarks for Fortran+MPI versus MATLAB+HTA. (Image source: paper.) F+MPI = Fortran+MPI, M+HTA = Matlab+HTA.

                  EP (class C)      FT (class B)      CG (class C)      MG (class B)      LU (class B)
      Nprocs    F+MPI    M+HTA    F+MPI    M+HTA    F+MPI    M+HTA    F+MPI    M+HTA    F+MPI    M+HTA
           1    901.6   3556.9    136.8    657.4   3606.9   3812.0     26.9    828.0     15.7    245.1
           4    273.1    888.8    109.1    274.0    362.0   1750.9     17.0    273.8      6.3     60.5
           8    136.3    447.0     65.5    159.3    123.4    823.6      9.6    151.3      2.9     29.9
          16     68.6    224.8     37.2     87.2     89.5    375.2      4.8     87.0      1.2     16.0
          32     34.7    112.0     20.7     42.9     48.4    250.3      3.3     54.9      1.1      9.8
          64     17.1     56.7     10.4     24.0     44.5    148.0      1.6     50.4      1.3      7.1
         128      8.5     29.1      5.9     15.6     30.8    123.0      1.4     38.5      1.6      N/A

      Too many numbers!
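
One way to make the table easier to read is to convert times into speedups. The sketch below does this for the EP columns only, under the assumption that speedup is measured against each system's own one-processor time, T(1) / T(p). The slides do not state which baseline their plots use, so treat this as an illustration rather than a reproduction of the plotted curves. The names `speedup` and `efficiency` are hypothetical.

```python
procs = [1, 4, 8, 16, 32, 64, 128]

# EP (class C) execution times in seconds, copied from Table 1.
ep_times = {
    "Fortran+MPI": [901.6, 273.1, 136.3, 68.6, 34.7, 17.1, 8.5],
    "Matlab+HTA": [3556.9, 888.8, 447.0, 224.8, 112.0, 56.7, 29.1],
}


def speedup(times):
    """Speedup relative to the one-processor run: T(1) / T(p)."""
    return [times[0] / t for t in times]


def efficiency(times, procs):
    """Parallel efficiency: speedup divided by processor count."""
    return [s / p for s, p in zip(speedup(times), procs)]


for name, times in ep_times.items():
    for p, s, e in zip(procs, speedup(times), efficiency(times, procs)):
        print(f"{name:12s} p={p:3d} speedup={s:6.1f} efficiency={e:5.2f}")
```

Swapping in another benchmark's columns from the table only requires replacing the two lists of times.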

  27. EP (embarrassingly parallel): speedup plot

      Setup: 128 3.2 GHz Intel Xeons, Gigabit Ethernet.
      Axes: speedup factor (0 to 128) against # processors (0 to 128).
      Series: Matlab+HTA and Fortran+MPI, with a linear-speedup reference line.
      Sequential speed: Fortran+MPI 100 %, Matlab+HTA 25 %.

  29. FFT (fast Fourier transform): speedup plot

      Setup: 128 3.2 GHz Intel Xeons, Gigabit Ethernet.
      Axes: speedup factor (0 to 128) against # processors (0 to 128).
      Series: Matlab+HTA and Fortran+MPI.
      Sequential speed: Fortran+MPI 100 %, Matlab+HTA 21 %.
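
The 25 % and 21 % sequential-speed figures on the EP and FFT slides appear to be consistent with the one-processor rows of Table 1. The short sketch below checks this, assuming "sequential speed" means the Fortran+MPI one-processor time divided by the Matlab+HTA one-processor time. That definition is an assumption (the slides do not spell it out), and `relative_sequential_speed` is a hypothetical helper name.

```python
# One-processor execution times in seconds, copied from Table 1.
t1 = {
    "EP": {"Fortran+MPI": 901.6, "Matlab+HTA": 3556.9},
    "FT": {"Fortran+MPI": 136.8, "Matlab+HTA": 657.4},
}


def relative_sequential_speed(fortran_time, matlab_time):
    """Matlab+HTA speed as a percentage of Fortran+MPI speed on one processor."""
    return 100.0 * fortran_time / matlab_time


for bench, times in t1.items():
    pct = relative_sequential_speed(times["Fortran+MPI"], times["Matlab+HTA"])
    print(f"{bench}: Matlab+HTA runs at about {pct:.0f} % of Fortran+MPI sequential speed")
```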
