High Performance Computing @ AUB - GradEx Workshop - Mher Kazandjian


SLIDE 1

High Performance Computing @ AUB

GradEx Workshop

Mher Kazandjian

American University of Beirut

November 2018

SLIDE 2

How is this talk structured?

  • History of computing
  • Scientific computing workflows
  • Computer architecture overview
  • Do's and Don'ts
  • Demos and walkthroughs
SLIDE 3

Goals

  • Demonstrate how you (as users) can benefit from AUB's HPC facilities
  • Attract users, because:
  • we want to boost scientific computing research
  • we want to help you
  • we have capacity

This presentation is based on actual feedback and use cases collected from users over the past year

SLIDE 4

History of computing

Alan Turing 1912-1954

SLIDE 5

Growth over time

12 orders of magnitude since 1960

SLIDE 6

Growth over time

~12 orders of magnitude since 1960

For the $1000 you could spend on hardware in 1970, the same money today buys hardware that can do ~10^12 times more calculations

SLIDE 7

What is HPC used for today?

  • Solving scientific problems
  • Data mining and deep learning
  • Military research and security
  • Cloud computing
  • Blockchain (cryptocurrency)
SLIDE 8

What is HPC used for today?

  • https://blog.openai.com/ai-and-compute/
SLIDE 9

Multicores hit the markets in ~2005


Growth over time

Users at home started benefiting from parallelism

Prior to that, applications that scaled well were restricted to mainframes / datacenters and HPC clusters

SLIDE 10

HPC @ AUB in 2006

8 compute nodes, specs per node:

  • 4 cores
  • 8 GB RAM

~ 80 GFlops

SLIDE 11

HPC is all about scalability

  • The high-speed network is the "most" important component

SLIDE 12

But what is scalability?

Performance improvements as the number of cores (resources) increases for the same problem size

  • this is known as strong scalability
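A common way to quantify this (standard terminology, not spelled out on the slide): speedup on p cores = T_serial / T_parallel(p), and ideal (linear) scaling means the speedup equals p.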
SLIDE 13

But what is scalability?

This is a CPU under a microscope

SLIDE 14

But what is scalability?

Prog.exe on 1 processor: 2 sec (serial runtime = T_serial)

SLIDE 15

But what is scalability?

Prog.exe on 2 processors: 1 sec (parallel runtime = T_parallel)

SLIDE 16

But what is scalability?

Prog.exe on 4 processors: 0.5 sec (parallel runtime = T_parallel)

SLIDE 17

But what is scalability?

Prog.exe on 4 processors: 0.5 sec. Very nice!! But this is usually never the case.
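In this idealized example the speedup is T_serial / T_parallel = 2 sec / 0.5 sec = 4 on 4 processors, i.e. perfect linear scaling; real applications almost always fall short of this.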

SLIDE 18

First demo – First scalability diagram

SLIDE 19

But what is scalability?

Repeat the same process across multiple processors

[diagram: many independent copies of Prog.exe, one per processor]

SLIDE 20

But what is scalability?

Wait!

  • how do these processors talk to each other?
  • how much data needs to be transferred for a certain task?
  • how fast do the processes communicate with each other?
  • how often should the processes communicate with each other?

[diagram: many independent copies of Prog.exe, one per processor]
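A common back-of-the-envelope answer to these questions (a standard model, not from the slides): the time to move a message is roughly latency + message size / bandwidth, so many small messages are dominated by latency while a few large ones are dominated by bandwidth.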

SLIDE 21

At the single chip level

Through the cache memory of the CPU

  • Typical latency: ~ ns (or less)
  • Typical bandwidth: > 150 GB/s

SLIDE 22

At the single chip level

Through the RAM (Random Access Memory)

  • Typical latency: ~ a few to tens of ns
  • Typical bandwidth: ~ 10 to 50 GB/s (sometimes more)

https://ark.intel.com/#@Processors

SLIDE 23

Second demo: bandwidth and some lingo

  • An array is just a bunch of bytes
  • Bandwidth is the speed with which information is transferred
  • A floating-point number (double precision) is 8 bytes
  • an array of one million elements is 1000 x 1000 x 8 bytes = 8 MB
  • if I measure the time to initialize this array I can measure how fast the CPU can access the RAM (since initializing the array implies visiting each memory address and setting it to zero)
  • bandwidth = size of array / time to initialize it (see the sketch below)
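A minimal C sketch of this idea (an illustration written for this transcript, not the actual demo code): it times the first-touch initialization of a one-million-element array of doubles and reports the implied bandwidth.

    /* sketch: estimate memory bandwidth by timing array initialization */
    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void)
    {
        const size_t n = 1000 * 1000;               /* one million doubles ~ 8 MB */
        double *a = malloc(n * sizeof(double));
        if (a == NULL)
            return 1;

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < n; i++)              /* visit every element once */
            a[i] = 0.0;
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double dt = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
        double gb = n * sizeof(double) / 1e9;
        printf("initialized %.4f GB in %.6f s -> %.2f GB/s (a[0]=%g)\n",
               gb, dt, gb / dt, a[0]);

        free(a);
        return 0;
    }

Compiled with something like gcc -O2; note that the first touch also pays page-fault overhead, so timing a second pass over the same array usually gives a number closer to the hardware limit.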
SLIDE 24

Second demo: bandwidth and some lingo

  • An array is just a bunch of bytes
  • Bandwidth is the speed with which information is transferred
  • A floating-point number (double precision) is 8 bytes
  • an array of one million elements is 1000 x 1000 x 8 bytes = 8 MB
  • if I measure the time to initialize this array I can measure how fast the CPU can access the RAM (since initializing the array implies visiting each memory address and setting it to zero)
  • bandwidth = size of array / time to initialize it

Intel i7-6700HQ

  • https://ark.intel.com/products/88967/Intel-Core-i7-6700HQ-Processor-6M-Cache-up-to-3-50-GHz-
  • Advertised bandwidth = 34 GB/s
  • measured bandwidth (quick single-thread test) = 22.8 GB/s
SLIDE 25

At the single motherboard level

Through the RAM (Random Access Memory)

  • Typical latency: ~ a few to tens of ns
  • Typical bandwidth: ~ 10 to 100 GB/s (sometimes more)

Through QPI (QuickPath Interconnect)

  • typical latency for small data ~ ns
  • typical bandwidth 100 GB/s

TIP: server = node = compute node = NUMA node

SLIDE 26

Second demo: bandwidth multi-threaded

  • https://github.com/jeffhammond/STREAM

https://ark.intel.com/products/64597/Intel-Xeon-Processor-E5-2665-20M-Cache-2_40-GHz-8_00-GTs-Intel-QPI

  • 2 x sockets, expected bandwidth ~102 GB/s
  • measured ~ 75 GB/s
  • on a completely idle node ~95 GB/s is possible

Another benchmark: 2-socket Intel Xeon server
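A rough multi-threaded version of the same measurement (a sketch in the spirit of STREAM, written for this transcript; it is not the STREAM benchmark itself):

    /* sketch: OpenMP-threaded copy kernel to estimate aggregate memory bandwidth */
    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    int main(void)
    {
        const long n = 50L * 1000 * 1000;           /* 50M doubles ~ 400 MB per array */
        double *a = malloc(n * sizeof(double));
        double *b = malloc(n * sizeof(double));
        if (a == NULL || b == NULL)
            return 1;

        #pragma omp parallel for
        for (long i = 0; i < n; i++) {              /* parallel first touch */
            a[i] = 1.0;
            b[i] = 0.0;
        }

        double t0 = omp_get_wtime();
        #pragma omp parallel for
        for (long i = 0; i < n; i++)                /* copy: one read + one write per element */
            b[i] = a[i];
        double dt = omp_get_wtime() - t0;

        double gb = 2.0 * n * sizeof(double) / 1e9; /* bytes moved: read a + write b */
        printf("threads=%d copy bandwidth=%.1f GB/s (b[0]=%g)\n",
               omp_get_max_threads(), gb / dt, b[0]);

        free(a);
        free(b);
        return 0;
    }

Built with an OpenMP-capable compiler (e.g. gcc -O2 -fopenmp) and run with OMP_NUM_THREADS set to the core count; the 75-95 GB/s figures quoted above are the kind of numbers such a kernel reports on a 2-socket node of that generation.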

SLIDE 27

At the cluster level (multiple nodes)

Through the network (Ethernet)

  • Typical latency: ~ 10 to 100 microseconds
  • Typical bandwidth: ~ 100 MB/s to a few hundred MB/s

SLIDE 28

At the cluster level (multiple nodes)

Through the network (InfiniBand, a high-speed network)

  • Typical latency: a few microseconds down to < 1 microsecond
  • Typical bandwidth: > 3 GB/s

Benefits over Ethernet:

  • Remote direct memory access
  • higher bandwidth
  • much lower latency

https://en.wikipedia.org/wiki/InfiniBand

SLIDE 29

What hardware do we have at AUB?

  • Arza:
  • 256 core, 1 TB RAM IBM cluster
  • production simulations, benchmarking
  • http://website.aub.edu.lb/it/hpc/Pages/home.aspx
  • vLabs
  • see Vassili’s slide
  • very flexible, easy to manage, windows support
  • public cloud
  • infinite resources – limited by $$$
  • two pilot projects being tested – will be open soon for testing
SLIDE 30

Parallelization libraries / software

SMP parallelism

  • OpenMP
  • CUDA
  • Matlab
  • Spark (recently deployed and tested)

distributed parallelism (cluster wide)

  • MPI
  • Spark
  • MPI + OpenMP (hybrid)
  • MPI + CUDA
  • MPI + CUDA + OpenMP
  • Spark + CUDA (not tested – any volunteers?)
SLIDE 31

Linux/Unix culture

> 99% of HPC clusters worldwide use some kind of Linux / Unix

  • Clicking your way to install software is easy for you (on Windows or Mac), but a nightmare for power users.

  • Linux is:
  • open-source
  • free
  • secure (at least much more secure than Windows et al.)
  • no need for an antivirus that slows down your system
  • respects your privacy
  • huge community support in scientific computing
  • 99.8% of all HPC systems worldwide since 1996 are non-Windows machines

https://github.com/mherkazandjian/top500parser

SLIDE 32

Software stack on the HPC cluster

  • Matlab
  • C, Java, C++, Fortran
  • Python 2 and Python 3
  • Jupyter notebooks
  • TensorFlow (deep learning)
  • Scala
  • Spark
  • R
  • RStudio, R server (new)
SLIDE 33

Cluster usage: Demo

  • The scheduler: resource manager
  • bjobs
  • bqueues
  • bhosts
  • lsload
  • important places
  • /gpfs1/my_username
  • /gpfs1/apps/sw
  • basic Linux knowledge
  • sample job script
SLIDE 34

Cluster usage: Documentation

https://hpc-aub-users-guide.readthedocs.io/en/latest/

https://github.com/hpcaubuserguide/hpcaub_userguide

The guide is for you:

  • we want you to contribute to it directly
  • please send us pull requests
SLIDE 35

Cluster usage: Job scripts

https://hpc-aub-users-guide.readthedocs.io/en/latest/jobs.html

SLIDE 36

Cluster usage: Job scripts

https://hpc-aub-users-guide.readthedocs.io/en/latest/jobs.html

In the user guide, there are samples and templates for many use cases:

  • we will help you write your own if your use case is not covered
  • this is 90% of the getting started task
  • recent success story:
  • Spark server job template
SLIDE 37

Cluster usage: Job scripts

https://hpc-aub-users-guide.readthedocs.io/en/latest/jobs.html

SLIDE 38

How to benefit from the HPC hardware?

  • run many serial jobs that do not need to communicate
  • aka embarrassingly parallel jobs

(nothing embarrassing about it though, as long as you get your job done)

  • e.g.
  • train several neural networks with different layer numbers
  • do a parameter sweep for a certain model

./my_prog.exe --param 1 &
./my_prog.exe --param 2 &
./my_prog.exe --param 3 &

These would execute simultaneously

  • difficulty: very easy
SLIDE 39

How to benefit from the HPC hardware?

  • run many serial jobs that do not need to communicate

Demo

SLIDE 40

How to benefit from the HPC hardware?

  • run an SMP parallel program (i.e. on one node, using threads)
  • e.g.
  • Matlab
  • C/C++/Python/Java

Difficulty: very easy to medium (problem dependent)

SLIDE 41

How to benefit from the HPC hardware?

  • run an SMP parallel program (i.e. on one node, using threads)
  • C (see the OpenMP sketch below)
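A minimal OpenMP example in C (an illustration written for this transcript; the slide's original code was not captured): the same loop runs on one or many threads depending on OMP_NUM_THREADS.

    /* sketch: sum an array with OpenMP threads on a single node */
    #include <stdio.h>
    #include <omp.h>

    #define N (10 * 1000 * 1000)                    /* 10M doubles ~ 80 MB */

    static double a[N];

    int main(void)
    {
        #pragma omp parallel for
        for (long i = 0; i < N; i++)                /* initialize in parallel */
            a[i] = 1.0;

        double t0 = omp_get_wtime();
        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (long i = 0; i < N; i++)                /* each thread sums a chunk */
            sum += a[i];
        double dt = omp_get_wtime() - t0;

        printf("threads=%d sum=%.1f time=%.5f s\n",
               omp_get_max_threads(), sum, dt);
        return 0;
    }

Compiled with e.g. gcc -O2 -fopenmp; running it with OMP_NUM_THREADS=1, 2, 4, ... is exactly the kind of measurement the scalability diagrams earlier in the talk are built from.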
SLIDE 42

How to benefit from the HPC hardware?

  • run an SMP parallel program (i.e. on one node, using threads)
  • C
SLIDE 43

How to benefit from the HPC hardware?

  • run an SMP parallel program (i.e. on one node, using threads)
  • Demo:

Matlab parfor

SLIDE 44

How to benefit from the HPC hardware?

  • run an SMP parallel program (i.e. on one node, using threads)
  • Demo:

Matlab parfor

SLIDE 45

How to benefit from the HPC hardware?

  • run an SMP parallel program (i.e. on one node, using threads)
  • Demo:

Matlab parfor

SLIDE 46

How to benefit from the HPC hardware?

  • run a hybrid MPI + OpenMP parallel job
  • Demo:

Gauß (astrophysics N-Body code) scalability diagram

  • single node

[diagram: one MPI process per node with several OpenMP threads inside it]

SLIDE 47

How to benefit from the HPC hardware?

  • hybrid MPI + OpenMP parallel job

Gauß (astrophysics N-Body code) scalability diagram

  • single node

[diagram: one MPI process per node with several OpenMP threads inside it; see the sketch below]
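A minimal hybrid MPI + OpenMP skeleton in C (an illustration written for this transcript, not the Gauß code): MPI splits the work across nodes and OpenMP threads split each rank's share across the cores of its node.

    /* sketch: hybrid MPI + OpenMP program, one MPI rank per node, threads within it */
    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv)
    {
        int provided, rank, nranks;

        /* request an MPI library that tolerates OpenMP threads */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        const long n = 1000000;
        long local = 0, total = 0;

        /* each rank handles a strided slice of the work, threaded with OpenMP */
        #pragma omp parallel for reduction(+:local)
        for (long i = rank; i < n; i += nranks)
            local += 1;

        /* combine the per-rank results across the cluster */
        MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("ranks=%d threads/rank=%d total=%ld (expected %ld)\n",
                   nranks, omp_get_max_threads(), total, n);

        MPI_Finalize();
        return 0;
    }

Built with the MPI compiler wrapper plus OpenMP flags (e.g. mpicc -O2 -fopenmp) and launched through the scheduler with one rank per node; OMP_NUM_THREADS then controls how many threads each rank uses.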

SLIDE 48

How to benefit from the HPC hardware?

  • run a deep learning job
  • Demo:

TensorFlow

SLIDE 49

How to benefit from the HPC hardware?

  • run a deep learning job
  • Demo:

TensorFlow

SLIDE 50

How to benefit from the HPC hardware?

  • Jupyter notebooks (connect through web interface)
  • R-server (connect through web interface)
  • Spark (full cluster configuration – up to 1 TB RAM usage)
  • Map-Reduce
SLIDE 51

How to benefit from the HPC hardware?

  • Jupyter notebooks (connect through web interface)
  • R-server (connect through web interface)
  • Spark (full cluster configuration – up to 1 TB RAM usage)
  • Map-Reduce
SLIDE 52

How to benefit from the HPC hardware?

  • Optimal performance benefits
  • Go low level
  • C/Fortran/C++
  • need to have good design
  • good understanding of architecture
  • Currently, the only customers running such codes are:
  • Chemistry department research group
  • Physics department
  • Computer science research group getting involved too
SLIDE 53

Workflows: best practices

  • prototype on your machine:

laptop, terminal/workstation at your department

  • when you think your job could benefit from HPC resources:
  • talk to us (we can help you assess your program better)
  • prepare a clean prototype
  • we will provide you with a pilot project access
  • tune / parallelize your application [ we can help you with that if needed ]
  • run production jobs
  • if you need specific hardware that is not available on campus:
  • go to the cloud
  • ideal for testing / benchmarking your code/app on the latest and the greatest hardware
SLIDE 54

Containers

  • Run on top of the kernel
  • zero overhead
  • portable
  • currently:
    + deep learning containers are used on the cluster
    + R Studio server (new)
  • we can help you produce a custom container tailored to your problem
    + you can create your own container too (no need for admin rights)
  • pros:
    + reproducibility and portability
  • cons:
    + you must be a bit of a geek to set up a container and willing to put in the effort (lots of help available online though)

SLIDE 55

Thank you for attending!