Performance (finish) / Exceptions

Changelog
Changes made in this version not seen in first lecture:
9 November 2017: an infinite loop: correct infinite loop code
9 November 2017: move sync versus async slide earlier

alternate vector interfaces
intrinsic functions and assembly aren't the only way to write vector code
e.g. GCC vector extensions: more like normal C code; there are types for each kind of vector, and you write + instead of _mm_add_epi32
e.g. CUDA (GPUs): looks like writing multithreaded code, but each thread is a vector "lane"
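A minimal sketch of the GCC vector extension style, assuming GCC or Clang (the `vector_size` attribute is a compiler extension, not standard C). The type `v4si` and the helper `add_arrays` are hypothetical names for this example, and it assumes `n` is a multiple of 4.

```c
#include <string.h>

/* A 16-byte (128-bit) vector of four ints: the same width as an SSE register. */
typedef int v4si __attribute__((vector_size(16)));

/* out[i] = a[i] + b[i], four elements at a time, using + on vector
   types instead of calling _mm_add_epi32. Assumes n % 4 == 0. */
void add_arrays(const int *a, const int *b, int *out, int n) {
    for (int i = 0; i < n; i += 4) {
        v4si va, vb;
        /* memcpy avoids assuming a, b, out are 16-byte aligned */
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        v4si sum = va + vb;              /* adds all four lanes at once */
        memcpy(out + i, &sum, sizeof sum);
    }
}
```

With optimization enabled, the compiler typically turns the `+` on `v4si` into a single SSE add instruction.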

other vector instructions
multiple extensions to the x86 instruction set for vector instructions
this class: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2; supported on lab machines; 128-bit vectors
latest x86 processors: AVX, AVX2, AVX-512; 256-bit and 512-bit vectors
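Which of these extensions a given machine has can be checked at run time. A sketch assuming GCC or Clang on x86, using the `__builtin_cpu_init` and `__builtin_cpu_supports` builtins; the function name `widest_vector_bits` is hypothetical.

```c
/* Return the widest vector width (in bits) the running CPU supports
   among the extensions above, or 0 if none are found or if this was
   not compiled for x86. */
int widest_vector_bits(void) {
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();   /* harmless here; required if called very early */
    if (__builtin_cpu_supports("avx512f"))
        return 512;
    if (__builtin_cpu_supports("avx2") || __builtin_cpu_supports("avx"))
        return 256;
    if (__builtin_cpu_supports("sse2"))
        return 128;
#endif
    return 0;
}
```

Code compiled with, say, `-mavx2` may crash on a machine without AVX2, so a check like this is one way to pick between a vector version and a fallback.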

other vector instruction features
AVX2/AVX/SSE are pretty limiting
other vector instruction sets are often more featureful (and require more sophisticated HW support):
better conditional handling
better variable-length vectors
ability to load/store non-contiguous values

addressing efficiency
    // (inside an outer loop over kk)
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            float Bij = B[i * N + j];
            for (int k = kk; k < kk + 2; ++k) {
                Bij += A[i * N + k] * A[k * N + j];
            }
            B[i * N + j] = Bij;
        }
    }
tons of multiplies by N?? isn't that slow?

addressing transformation
    for (int kk = 0; kk < N; kk += 2) {
        for (int i = 0; i < N; ++i) {
            for (int j = 0; j < N; ++j) {
                float Bij = B[i * N + j];
                float *Akj_pointer = &A[kk * N + j];
                for (int k = kk; k < kk + 2; ++k) {
                    // Bij += A[i * N + k] * A[k * N + j];
                    Bij += A[i * N + k] * *Akj_pointer;
                    Akj_pointer += N;
                }
                B[i * N + j] = Bij;
            }
        }
    }
the compiler will usually do this!
it transforms the loop to iterate with a pointer increment/decrement by N (× sizeof(float) bytes)
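To check that the pointer form really computes the same thing as the index form, here is a sketch with both versions as functions. The names `step_indexed` and `step_pointer` are hypothetical, and `N` is passed as a parameter rather than being a constant as on the slides.

```c
/* Index form: add terms k = kk and k = kk + 1 of the product A*A into B. */
void step_indexed(int N, const float *A, float *B, int kk) {
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            float Bij = B[i * N + j];
            for (int k = kk; k < kk + 2; ++k)
                Bij += A[i * N + k] * A[k * N + j];
            B[i * N + j] = Bij;
        }
    }
}

/* Pointer form: reach A[k * N + j] by stepping a pointer down one row
   (N floats) per k iteration instead of computing k * N. */
void step_pointer(int N, const float *A, float *B, int kk) {
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            float Bij = B[i * N + j];
            const float *Akj = &A[kk * N + j];
            for (int k = kk; k < kk + 2; ++k) {
                Bij += A[i * N + k] * *Akj;
                Akj += N;   /* advances N * sizeof(float) bytes */
            }
            B[i * N + j] = Bij;
        }
    }
}
```

Calling either function for every even `kk` from 0 up to `N` accumulates the full matrix product A*A into a zeroed B.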

addressing efficiency
the compiler will usually eliminate slow multiplies; if so, doing the transformation yourself is often slower
way to check: see if the assembly uses lots of multiplies in the loop
if it does, do it yourself: e.g. turn i * N; ++i into i_times_N; i_times_N += N
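A small sketch of the `i_times_N` rewrite on a hypothetical row-sum routine (the function names are illustrative). One way to run the check above is to compile with `gcc -O2 -S` and look for `imul` instructions inside the loop.

```c
/* Straightforward version: computes i * N for every row. */
void row_sums_multiply(int N, const int *M, long *sums) {
    for (int i = 0; i < N; ++i) {
        long s = 0;
        for (int j = 0; j < N; ++j)
            s += M[i * N + j];
        sums[i] = s;
    }
}

/* Hand-transformed version: keep a running i_times_N and add N per
   row, replacing the multiply with an addition. */
void row_sums_running(int N, const int *M, long *sums) {
    int i_times_N = 0;
    for (int i = 0; i < N; ++i, i_times_N += N) {
        long s = 0;
        for (int j = 0; j < N; ++j)
            s += M[i_times_N + j];
        sums[i] = s;
    }
}
```

At `-O2`, GCC usually produces similar code for both, which is why the slide suggests checking the assembly before doing this by hand.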

optimizing real programs
spend effort where it matters
e.g. 90% of program time is spent reading files, but you optimize computation?
e.g. 90% of program time is spent in routine A, but you optimize B?
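A worked version of the first case (this is Amdahl's law), assuming for illustration that the other 10% of run time $T$ is computation and you make the computation twice as fast:

```latex
T_{\text{new}} = 0.9\,T + \frac{0.1\,T}{2} = 0.95\,T,
\qquad
\lim_{\text{computation} \to 0} T_{\text{new}} = 0.9\,T
```

So doubling the speed of the computation saves only 5% overall, and even making it infinitely fast saves at most 10%.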

profilers
first step: a tool to determine where you spend time
tools exist to do this for programs
example on Linux: perf

perf usage
sampling profiler: stops periodically and takes a look at what's running
perf record OPTIONS program
example OPTIONS:
-F 200: record 200 samples per second
--call-graph=dwarf: record stack traces
then perf report or perf annotate

children/self
“children”: samples in the function or things it called
“self”: samples in the function alone
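A sketch of a program shaped so that “children” and “self” differ, assuming GCC or Clang (for the `noinline` attribute). The functions `inner` and `outer` are hypothetical. With a `main` that calls `outer` with a large n (say, one billion) and prints the result, saved as a hypothetical `demo.c`, one could run `gcc -O2 -g demo.c -o demo`, then `perf record -F 200 --call-graph=dwarf ./demo` and `perf report`.

```c
/* noinline keeps inner and outer as separate functions, so each
   shows up on its own in the profile. */
__attribute__((noinline)) long inner(long n) {
    long total = 0;
    for (long i = 0; i < n; ++i)
        total += i % 7;          /* busy work: nearly all samples land here */
    return total;
}

__attribute__((noinline)) long outer(long n) {
    return inner(n) + 1;         /* outer itself does almost nothing */
}
```

In the report, `outer` should show a high “children” percentage (it is on the stack for almost every sample) but a very low “self” percentage, while `inner` is high in both.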

demo

other profiling techniques
count the number of times each function is called
not sampling: exact counts, but higher overhead
might give less insight into the amount of time spent
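Counting calls by hand shows the idea: every call pays for an extra increment, and the count says how often, not how long. A sketch with a hypothetical `fib_calls` counter; tools such as gprof (enabled with `gcc -pg`) insert similar counting code automatically.

```c
static long fib_calls = 0;   /* hypothetical instrumentation counter */

long fib(int n) {
    ++fib_calls;             /* exact count, paid on every call */
    if (n < 2)
        return n;
    return fib(n - 1) + fib(n - 2);
}
```

For example, computing `fib(10)` from a zeroed counter makes 177 calls, an exact number a sampling profiler would only estimate.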

tuning optimizations
biggest factor: how fast is it actually?
set up a benchmark
make sure it's realistic (right size? uses the answer? etc.)
compare the alternatives
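A minimal benchmark harness sketch, assuming a POSIX system with `clock_gettime` and `CLOCK_MONOTONIC`. The workload `sum_array` and the names `benchmark_sum` and `sink` are hypothetical; the point is timing repeated runs on realistic input and actually using the answer.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime */
#endif
#include <time.h>

/* Storing each result here "uses the answer", so the compiler
   cannot delete the work being timed. */
volatile long sink;

/* Current time in nanoseconds from a monotonic clock. */
static long long now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical code being tuned: sum an array. */
static long sum_array(const int *a, long n) {
    long s = 0;
    for (long i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Time sum_array `reps` times and return the fastest run in ns
   (or -1 if reps <= 0). */
long long benchmark_sum(const int *a, long n, int reps) {
    long long best = -1;
    for (int r = 0; r < reps; ++r) {
        long long start = now_ns();
        sink = sum_array(a, n);
        long long elapsed = now_ns() - start;
        if (best < 0 || elapsed < best)
            best = elapsed;
    }
    return best;
}
```

Reporting the minimum is one choice that filters out runs slowed by other programs; the median is a common alternative when you care about typical behavior. To compare alternatives, run the same harness on each candidate with the same input.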

an infinite loop
    int main(void) {
        while (1) {
            /* waste CPU time */
        }
    }
If I run this on a lab machine, can you still use it?
…if the machine only has one core?

timing nothing
    long times[NUM_TIMINGS];
    int main(void) {
        for (int i = 0; i < NUM_TIMINGS; ++i) {
            long start, end;
            start = get_time();
            /* do nothing */
            end = get_time();
            times[i] = end - start;
        }
        output_timings(times);
    }
same instructions: same difference each time?
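The slide leaves `get_time` and `output_timings` unspecified. A runnable sketch, assuming a POSIX system: `get_time` here returns nanoseconds from `CLOCK_MONOTONIC` (one possible choice, assuming a 64-bit `long` as on Linux x86-64), the array is renamed `timings` to avoid clashing with the POSIX `times()` function, and `time_nothing` is a hypothetical name for the loop.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime */
#endif
#include <stdio.h>
#include <time.h>

#define NUM_TIMINGS 1000

long timings[NUM_TIMINGS];

/* One possible get_time: nanoseconds from a monotonic clock. */
long get_time(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000000L + ts.tv_nsec;
}

/* Time an empty region NUM_TIMINGS times. */
void time_nothing(void) {
    for (int i = 0; i < NUM_TIMINGS; ++i) {
        long start = get_time();
        /* do nothing */
        long end = get_time();
        timings[i] = end - start;
    }
}

/* Print one timing per line, e.g. for plotting. */
void output_timings(const long *t) {
    for (int i = 0; i < NUM_TIMINGS; ++i)
        printf("%ld\n", t[i]);
}
```

A `main` that calls `time_nothing()` and then `output_timings(timings)` reproduces the experiment; on a busy machine, occasional values far above the typical tens of nanoseconds are expected.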

doing nothing on a busy system
[Figure: time for empty loop body; y-axis: time (ns), log scale from 10^1 to 10^8; x-axis: sample #, 0 to 1,000,000]

time multiplexing
loop.exe:
    call get_time
        // whatever get_time does
    movq %rax, %rbp
    // million cycle delay
    call get_time
        // whatever get_time does
    subq %rbp, %rax
    ...
over time, the CPU switches among ssh.exe, loop.exe, firefox.exe, …; during loop.exe's million cycle delay, other programs have the CPU

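time multiplexing really
loop.exe, ssh.exe, firefox.exe, loop.exe, ssh.exe take turns, with the operating system running between them:
an exception happens, the operating system runs, then it returns from the exception into the next program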

OS and time multiplexing
the OS starts running instead of the normal program
mechanism for this: exceptions (later)
saves the old program counter and registers somewhere
sets new registers, jumps to the new program counter
called a context switch
the saved information is called the context

context
all register values: %rax, %rbx, …, %rsp, …
condition codes
program counter
i.e. all visible state in your CPU except memory
address space: map from program to real addresses

context switch pseudocode
    context_switch(last, next):
        copy_preexception_pc last->pc
        mov rax, last->rax
        mov rcx, last->rcx
        mov rdx, last->rdx
        ...
        mov next->rdx, rdx
        mov next->rcx, rcx
        mov next->rax, rax
        jmp next->pc
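The same save-then-restore pattern can be simulated in C. This is a toy model, not how an OS is written (a real context switch is assembly and saves every register): `struct context`, the global `cpu`, and this `context_switch` signature are all hypothetical.

```c
/* A toy context: a few registers, condition codes, and the PC. */
struct context {
    long rax, rcx, rdx;   /* subset of general-purpose registers */
    int zf, sf;           /* condition codes */
    long pc;              /* program counter */
};

/* The simulated CPU's current register state. */
struct context cpu;

/* Save the running program's state into *last, then load *next's.
   Loading cpu.pc from next plays the role of "jmp next->pc". */
void context_switch(struct context *last, const struct context *next) {
    *last = cpu;   /* like mov rax, last->rax; mov rcx, last->rcx; ... */
    cpu = *next;   /* like mov next->rax, rax; ...; jmp next->pc */
}
```

Switching back later with the arguments swapped restores the first program exactly where it left off, since everything it could observe (registers, condition codes, PC) was saved.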

contexts (A running)
in CPU: Process A's %rax, %rbx, %rcx, %rsp, …, PC, ZF, SF, …
in memory:
OS memory: Process B's saved %rax, %rbx, %rcx, …, PC, ZF, SF, …
Process A memory: code, stack, etc.
Process B memory: code, stack, etc.

contexts (B running)
in CPU: Process B's %rax, %rbx, %rcx, %rsp, …, PC, ZF, SF, …
in memory:
OS memory: Process A's saved %rax, %rbx, %rcx, …, PC, ZF, SF, …
Process A memory: code, stack, etc.
Process B memory: code, stack, etc.

memory protection
reading from another program's memory?
Program A:
    0x10000: .word 42
        // ...
        // do work
        // ...
        movq 0x10000, %rax
result: %rax is 42 (always)
Program B:
        // while A is working:
        movq $99, %rax
        movq %rax, 0x10000
        ...
result: might crash

program memory (highest address at top)
0xFFFF FFFF FFFF FFFF to 0xFFFF 8000 0000 0000: used by OS
0x7F…: stack
heap / other dynamic
writable data
code + constants (starting at 0x0000 0000 0040 0000)
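One way to see this layout from a program is to take the address of one object from each region. A sketch with hypothetical names (`global_data`, `region_addresses`); the ordering it relies on is what typical Linux x86-64 systems do, not something C guarantees, and the exact addresses change from run to run because of address space layout randomization.

```c
#include <stdint.h>
#include <stdlib.h>

int global_data = 42;   /* writable data */

/* Report one address from each region: code, writable data, heap, stack. */
void region_addresses(uintptr_t *code, uintptr_t *data,
                      uintptr_t *heap, uintptr_t *stack) {
    int local = 0;                            /* on the stack */
    int *dynamic = malloc(sizeof *dynamic);   /* on the heap (0 if malloc fails) */
    *code  = (uintptr_t)&region_addresses;    /* a function: code + constants */
    *data  = (uintptr_t)&global_data;
    *heap  = (uintptr_t)dynamic;
    *stack = (uintptr_t)&local;
    free(dynamic);
}
```

Printing the four values (e.g. with `printf("%#lx\n", (unsigned long)stack)`) typically shows the stack near 0x7F…, far above the code, data, and heap.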

program memory (two programs)
Program A: used by OS, stack, heap / other dynamic, writable data, code + constants
Program B: used by OS, stack, heap / other dynamic, writable data, code + constants

address space
Program A addresses → mapping (set by OS) → real memory
Program B addresses → mapping (set by OS) → real memory
real memory holds Program A code, Program B code, Program A data, Program B data, OS data, …
OS data is kernel-mode only: a program accessing it triggers an error
programs have the illusion of their own memory
called a program's address space
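A sketch of the "illusion of own memory" on a POSIX system: after `fork`, parent and child use the same address for a global, yet a write by the child does not change what the parent reads there, because each process has its own address space. The names `value_in_both` and `value_after_child_writes` are hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int value_in_both = 42;   /* same virtual address in parent and child */

/* Fork a child that stores 99 at &value_in_both (like Program B's
   movq %rax, 0x10000); return what the parent reads afterwards,
   or -1 if fork fails. */
int value_after_child_writes(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {              /* child: writes to its own copy */
        value_in_both = 99;
        _exit(0);                /* _exit skips flushing the parent's stdio buffers twice */
    }
    waitpid(pid, NULL, 0);       /* parent: wait until the child's write is done */
    return value_in_both;        /* still 42 in the parent */
}
```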

address space mechanisms
next week's topic
called virtual memory
the mapping is called page tables
the mapping is part of what is changed in a context switch
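A toy preview of the mapping, assuming a single-level table with 4 KiB pages and only four pages per program; real x86-64 page tables are multi-level and also hold permission bits. The name `translate` and the table contents are hypothetical. Switching which table is used (on x86-64, the OS loads the table's location into the `%cr3` register) is what makes the same address mean different memory for different programs.

```c
#define PAGE_SIZE 4096L
#define NUM_PAGES 4      /* toy address space: 4 pages = 16 KiB */

/* table[v] is the physical page ("frame") number for virtual page v,
   or -1 if page v is unmapped. Returns the physical address, or -1
   where real hardware would trigger an error (a page fault). */
long translate(const long table[NUM_PAGES], long vaddr) {
    if (vaddr < 0)
        return -1;
    long vpn = vaddr / PAGE_SIZE;      /* virtual page number */
    long offset = vaddr % PAGE_SIZE;   /* kept unchanged by the mapping */
    if (vpn >= NUM_PAGES || table[vpn] < 0)
        return -1;
    return table[vpn] * PAGE_SIZE + offset;
}
```

For example, with table A = {5, 2, 7, -1} and table B = {3, 6, 0, 4}, virtual address 0x1234 translates to 0x2234 for program A but 0x6234 for program B.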

The Process
illusion of a dedicated machine:
process = thread(s) + address space
thread = illusion of own CPU
address space = illusion of own memory
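Because the threads of one process share its address space, a write by one thread is visible to the others. A sketch assuming POSIX threads (compile with `-pthread`); `shared_value`, `set_to_99`, and `value_after_thread_writes` are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

int shared_value = 0;   /* one copy, seen by every thread in the process */

/* Thread body: write to the global. */
static void *set_to_99(void *arg) {
    (void)arg;
    shared_value = 99;
    return NULL;
}

/* Run one thread, wait for it, and return what the original thread
   sees afterwards, or -1 if the thread could not be created. */
int value_after_thread_writes(void) {
    pthread_t t;
    shared_value = 42;
    if (pthread_create(&t, NULL, set_to_99, NULL) != 0)
        return -1;
    pthread_join(t, NULL);   /* join also makes the thread's write visible */
    return shared_value;     /* 99: both threads use the same memory */
}
```

Each thread still gets its own registers and stack (its own "CPU"), which is why local variables are not shared the same way.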

synchronous versus asynchronous
synchronous: triggered by a particular instruction
traps and faults
asynchronous: comes from outside the program
interrupts and aborts
timer event
keypress, other input event
