/INFOMOV/ Optimization & Vectorization J. Bikker - Sep-Nov 2018 - Lecture 12: “Multithreading” Welcome!

Today’s Agenda:
▪ Introduction
▪ Hardware
▪ Trust No One / An Efficient Pattern
▪ Experiments
▪ Final Assignment

Introduction: A Brief History of Many Cores
Once upon a time...
Then, in 2005: Intel’s Core 2 Duo (April 22). (Also 2005: AMD Athlon 64 X2. April 21.)
2007: Intel Core 2 Quad
2010: AMD Phenom II X6
Today...
2017: Threadripper 1920X
2018: Threadripper 2950X

Introduction: Threads / Scalability
...

Introduction: Optimizing for Multiple Cores
What we did before:
1. Profile.
2. Understand the hardware.
3. Trust No One.
Goal:
▪ It’s fast enough when it scales linearly with the number of cores.
▪ It’s fast enough when the parallelizable code scales linearly with the number of cores.
▪ It’s fast enough if there is no sequential code.
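The three goals above can be read through Amdahl's law. The slide does not name it; it is brought in here as an assumption about what the goals allude to. If a fraction p of the run time is parallelizable and the rest is sequential, the best-case speedup on n cores is 1 / ((1 - p) + p / n). A minimal sketch, with the hypothetical function names `amdahl_speedup` and `print_scaling_table`:

```cpp
#include <cstdio>

// Best-case speedup according to Amdahl's law (an assumption of this sketch,
// not something stated on the slide).
// p: fraction of the run time that can be parallelized, in [0, 1]
// n: number of cores, at least 1
double amdahl_speedup(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

// Prints the predicted speedup for a few parallel fractions and core counts.
void print_scaling_table()
{
    const double fractions[] = { 0.5, 0.9, 1.0 };
    for (double p : fractions)
        for (int n = 1; n <= 16; n *= 2)
            std::printf("p = %.2f  cores = %2d  speedup = %.2f\n",
                        p, n, amdahl_speedup(p, n));
}
```

Only p = 1.0 (no sequential code) gives linear scaling; for any smaller p the sequential part caps the speedup at 1 / (1 - p), however many cores are added.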

Hardware: Hardware Review
We have:
▪ Four physical cores
▪ Each running two threads
▪ L1 cache: 32 KB, 4 cycles latency
▪ L2 cache: 256 KB, 10 cycles latency
▪ A large shared L3 cache.
[Diagram: four cores, each with two hardware threads (T0, T1), an L1 I-$, an L1 D-$ and an L2 $; all four cores share one L3 $.]

Hardware: Simultaneous Multi-Threading (SMT)
(Also known as hyperthreading)
Pipelines grow wider and deeper:
▪ Wider, to execute multiple instructions in parallel in a single cycle.
▪ Deeper, to reduce the complexity of each pipeline stage, which allows for a higher frequency.
However, parallel instructions must be independent, otherwise we get bubbles.
Observation: two independent threads provide twice as many independent instructions.
[Diagram: execution slots (E) filled over time (t).]
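The independence requirement can also be illustrated within a single thread. The sketch below is not from the slides; the function names are hypothetical. It sums an array once with a single accumulator, where every addition has to wait for the previous one, and once with four accumulators, which gives the pipeline four independent chains to work on. Whether the second version is actually faster depends on the compiler and the CPU, so it should be measured rather than assumed.

```cpp
#include <cstddef>
#include <vector>

// One accumulator: each addition depends on the result of the previous one.
float sum_single_chain(const std::vector<float>& v)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i) sum += v[i];
    return sum;
}

// Four accumulators: four dependency chains that do not depend on each other.
float sum_four_chains(const std::vector<float>& v)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= v.size(); i += 4)
    {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < v.size(); ++i) s0 += v[i]; // leftover elements
    return (s0 + s1) + (s2 + s3);
}
```

Floating-point addition is not associative, so for general data the two functions may differ in the last bits; for small integer-valued inputs the results are identical.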

Hardware: Simultaneous Multi-Threading (SMT)
Nehalem (i7): six wide*.
▪ Three memory operations
▪ Three calculations (float, int, vector)
SMT: feeding the pipe from two threads.
All it really takes is an extra set of registers.
Instruction stream (shown on the slide once per thread):
    fldz
    xor ecx, ecx
    fld dword ptr [4520h]
    mov edx, 28929227h
    fld dword ptr [452Ch]
    push esi
    mov esi, 0C350h
    add ecx, edx
    mov eax, [91D2h]
    xor edx, 17737352h
    shr ecx, 1
    mul eax, edx
    fld st(1)
    faddp st(3), st
    mov eax, 91D2A969h
    shr edx, 0Eh
    add ecx, edx
    fmul st(1),st
    xor edx, 17737352h
    shr ecx, 1
    mul eax, edx
    shr edx, 0Eh
    dec esi
    jne tobetimed+1Fh
Instructions per execution unit, over time (t):
    execution unit 1  MEM   fld   mov
    execution unit 2  MEM   mov   mov
    execution unit 3  MEM   fld
    execution unit 4  CALC  fldz  add    xor  mul
    execution unit 5  CALC  xor   fld    shr  fmul
    execution unit 6  CALC  push  faddp
*: Details: The Architecture of the Nehalem Processor and Nehalem-EP SMP Platforms, Thomadakis, 2011.

Hardware: Simultaneous Multi-Threading (SMT)
Hyperthreading does mean that now two threads are using the same L1 and L2 cache.
▪ For the average case, this will reduce data locality.
▪ If both threads use the same data, data locality remains the same.
▪ One thread can also be used to fetch data that the other thread will need*.
[Diagram: one core; threads T0 and T1 share the L1 I-$, L1 D-$ and L2 $.]
*: Tolerating Memory Latency through Software-Controlled Pre-Execution in Simultaneous Multithreading Processors, Luk, 2001.

Hardware: Multiple Processors: NUMA
Two physical processors on a single mainboard:
▪ Each CPU has its own memory
▪ Each CPU can access the memory of the other CPU.
The penalty for accessing ‘foreign’ memory is ~50%.

Hardware: Multiple Processors: NUMA
Do we care?
▪ Most boards host 1 CPU.
▪ A quadcore still talks to memory via a single interface.
However: Threadripper is a NUMA device.
Threadripper = 2x Zeppelin; each Zeppelin has:
▪ L1, L2, L3 cache
▪ A link to memory
This CPU behaves as two CPUs in a single socket.

Hardware: Multiple Processors: NUMA
Threadripper & Windows:
▪ Threadripper hides NUMA from the OS
▪ Most software is not NUMA-aware.

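Trust No One: Windows
    #include <windows.h>
    #include <atomic>
    #include <cstdio>
    #include <iostream>

    // Thread entry point: keeps incrementing the shared counter.
    DWORD WINAPI myThread(LPVOID lpParameter)
    {
        // std::atomic avoids a data race with the reader in main()
        std::atomic<unsigned int>& myCounter = *static_cast<std::atomic<unsigned int>*>(lpParameter);
        while (myCounter.load() < 0xFFFFFFFF) ++myCounter;
        return 0;
    }

    int main(int argc, char* argv[])
    {
        using namespace std;
        atomic<unsigned int> myCounter{0};
        DWORD myThreadID;
        HANDLE myHandle = CreateThread(0, 0, myThread, &myCounter, 0, &myThreadID);
        char myChar = ' ';
        while (myChar != 'q')
        {
            cout << myCounter.load() << endl; // snapshot of the counter
            myChar = getchar();               // 'q' followed by Enter quits
        }
        CloseHandle(myHandle); // releases the handle only; the thread ends with the process
        return 0;
    }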
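For comparison, the same idea can be sketched with the standard C++11 thread library, which is not one of the options shown on the slides. The function name `count_on_thread` is hypothetical, and the loop runs to a caller-supplied limit so that the example terminates quickly.

```cpp
#include <atomic>
#include <thread>

// Starts one thread that counts up to 'limit', waits for it, and returns
// the final counter value. std::atomic makes concurrent reads well-defined.
unsigned int count_on_thread(unsigned int limit)
{
    std::atomic<unsigned int> counter{0};
    std::thread worker([&counter, limit] {
        while (counter.load(std::memory_order_relaxed) < limit)
            counter.fetch_add(1, std::memory_order_relaxed);
    });
    // While the worker runs, this thread could sample counter.load() at any time.
    worker.join(); // wait for the worker to finish
    return counter.load();
}
```

Calling `count_on_thread(100000)` returns 100000 once the worker has finished. On some toolchains the program has to be linked against the platform thread library (for example with `-pthread`).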

Trust No One: Boost
    #include <boost/thread.hpp>
    #include <boost/chrono.hpp>
    #include <iostream>

    // Block the calling thread for the given number of seconds.
    void wait(int seconds)
    {
        boost::this_thread::sleep_for(boost::chrono::seconds{seconds});
    }

    // Thread function: prints 0..4, one number per second.
    void thread()
    {
        for (int i = 0; i < 5; ++i)
        {
            wait(1);
            std::cout << i << '\n';
        }
    }

    int main()
    {
        boost::thread t{thread}; // start the thread
        t.join();                // wait for it to finish
    }

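Trust No One: OpenMP
    // Fragment 1: distribute loop iterations over threads (output order is not fixed)
    #pragma omp parallel for
    for( int n = 0; n < 10; ++n ) printf( " %d", n );
    printf( ".\n" );

    // Fragment 2: ask the compiler to vectorize the loop
    float a[8], b[8];
    #pragma omp simd
    for( int n = 0; n < 8; ++n ) a[n] += b[n];

    // Fragment 3: task-based tree traversal; call it from inside a parallel
    // region so that the tasks can run on other threads
    struct node { node *left, *right; };
    extern void process( node* );
    void postorder_traverse( node* p )
    {
        if (p->left)
            #pragma omp task
            postorder_traverse( p->left );
        if (p->right)
            #pragma omp task
            postorder_traverse( p->right );
        #pragma omp taskwait // wait for both child tasks
        process( p );
    }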

Trust No One: Intel TBB
    #include "tbb/task_group.h"
    using namespace tbb;

    int Fib( int n )
    {
        if (n < 2)
        {
            return n;
        }
        else
        {
            int x, y;
            task_group g;
            g.run( [&]{ x = Fib( n - 1 ); } ); // spawn a task
            g.run( [&]{ y = Fib( n - 2 ); } ); // spawn another task
            g.wait();                           // wait for both tasks to complete
            return x + y;
        }
    }

Trust No One: Considerations
When using external tools to manage your threads, ask yourself:
▪ What is the overhead of creating / destroying a thread?
▪ Do I even know when threads are created?
▪ Do I know on which cores threads execute?
What if... we handled everything ourselves?
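The first question in the list above can be answered by measuring. The sketch below is one possible measurement, assuming standard C++11 threads rather than any of the libraries on the previous slides; `average_spawn_join_us` is a hypothetical name. It times how long it takes, on average, to create and join a thread that does no work.

```cpp
#include <chrono>
#include <thread>

// Average wall-clock time, in microseconds, to create and join one
// std::thread that executes an empty function.
double average_spawn_join_us(int iterations)
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        std::thread t([] {}); // thread creation
        t.join();             // wait for it to end, then clean it up
    }
    const std::chrono::duration<double, std::micro> elapsed = clock::now() - start;
    return elapsed.count() / iterations;
}
```

The result varies per operating system and machine, so no expected value is given here. Comparing it against the duration of a typical task shows whether creating a thread per task is affordable.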

Trust No One
[Diagram: eight worker threads (worker thread 0 to worker thread 7) next to a shared list of tasks.]
▪ Worker threads never die
▪ Tasks are claimed by worker threads
▪ Execution of a task may depend on completion of other tasks
▪ Tasks can produce new tasks
