
Topic Outline: Performance Evaluation (IA32)



  1. AJProença, Arquitectura de Computadores, LMCC, UMinho, 2003/04

     Performance Evaluation on IA32 (6)
     Topic outline: Performance Evaluation (IA32)
       1. Evaluating computing systems
       2. Code optimization techniques (IM)
       3. Hardware optimization techniques
       4. Code optimization techniques (DM)
       5. Other optimization techniques
       6. Measuring time

     The passage of time from a computer's perspective
     [Figure: time scale for a 1 GHz machine, from microscopic to macroscopic. Events shown: integer add, FP multiply, FP divide, keyboard interrupt routine, disk access, monitor refresh, keystroke. Axis: Time (sec), 1.E-09 (1 ns), 1.E-06 (1 µs), 1.E-03 (1 ms), 1.E+00 (1 s).]
     The following slides were adapted from Prof. Bryant's 2002 lecture.
     • Fundamental time scales:
       – Processor: ~10^-9 sec.
       – External events: ~10^-2 sec.
         • Keyboard input
         • Disk seek
         • Screen refresh
     • Implications
       – The processor can execute many instructions while waiting for external events to occur
       – It can alternate execution among the code of several processes without this being noticed

     Measurement Challenge
     • How much time does program X require?
       – CPU time
         • How many total seconds are used when executing X?
         • Measure used for most applications
         • Small dependence on other system activities
       – Actual ("wall") time
         • How many seconds elapse between the start and the completion of X?
         • Depends on system load, I/O times, etc.
     • Confounding factors
       – How does time get measured?
       – Many processes share computing resources
         • Transient effects when switching from one process to another
         • Suddenly, the effects of alternating among processes become noticeable

     "Time" on a Computer System
     [Figure: timeline of real (wall clock) time divided into three kinds of segments, with cumulative user time marked.]
     • user time: time executing instructions in the user process
     • system time: time executing instructions in the kernel on behalf of the user process
     • some other user's time: time executing instructions in a different user's process
     • user time + system time + other users' time = real (wall clock) time
     We will use the word "time" to refer to user time.
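The split between user, system, and wall clock time can be observed from inside a program. The following is a minimal sketch, assuming a POSIX system; the helper names `user_seconds`, `system_seconds`, and `wall_seconds` are hypothetical and do not come from the slides. It reads the per-process counts with `getrusage` and elapsed time with `clock_gettime`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/* Convert a struct timeval to seconds. */
static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* User time: time spent executing instructions in this process. */
double user_seconds(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void)rc;
    return tv_seconds(ru.ru_utime);
}

/* System time: time spent in the kernel on behalf of this process. */
double system_seconds(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void)rc;
    return tv_seconds(ru.ru_stime);
}

/* Wall clock time, measured from an arbitrary fixed starting point. */
double wall_seconds(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}
```

Calling the three helpers before and after a region of code and subtracting gives the user, system, and elapsed time for that region. On a loaded machine the elapsed time would be expected to exceed the sum of the other two, since it also covers time spent in other processes.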

  2. Activity Periods: Light Load
     [Figure: Activity Periods, Load = 1. Active/Inactive versus Time (ms), 0 to 80.]
     – Most of the time is spent executing one process
     – Periodic interrupts every 10 ms
       • Interval timer
       • Keep the system from executing one process to the exclusion of others
     – Other interrupts
       • Due to I/O activity
     – Inactivity periods
       • System time spent processing interrupts
       • ~250,000 clock cycles

     Activity Periods: Heavy Load
     [Figure: Activity Periods, Load = 2. Active/Inactive versus Time (ms), 0 to 80.]
     – Sharing the processor with one other active process
     – From the perspective of this process, the system appears to be "inactive" for ~50% of the time
       • The other process is executing

     Interval Counting
     • OS measures runtimes using an interval timer
       – Maintain 2 counts per process
         • User time
         • System time
       – Each time: (i) get timer interrupt, (ii) increment counter for executing process
         • User time if running in user mode
         • System time if running in kernel mode
     Example
     [Figure: two processes A and B on a time axis from 0 to 160.
      (a) Interval timings: A = 110u + 40s, B = 70u + 30s. Tick sequence: Au Au Au As Bu Bs Bu Bu Bu Bu As Au Au Au Au Au Bs Bu Bu Bs Au Au Au As As.
      (b) Actual times: A = 120.0u + 33.3s, B = 73.3u + 23.3s.]

     Unix time Command
         time make osevent
         gcc -O2 -Wall -g -march=i486 -c clock.c
         gcc -O2 -Wall -g -march=i486 -c options.c
         gcc -O2 -Wall -g -march=i486 -c load.c
         gcc -O2 -Wall -g -march=i486 -o osevent osevent.c . . .
         0.820u 0.300s 0:01.32 84.8% 0+0k 0+0io 4049pf+0w
     – 0.82 seconds user time
       • 82 timer intervals
     – 0.30 seconds system time
       • 30 timer intervals
     – 1.32 seconds wall time
     – 84.8% of the total was used running these processes
       • (0.82 + 0.30) / 1.32 = 0.848
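The bookkeeping behind interval counting can be sketched as a small simulation. This is an illustration only, under the assumption that timer interrupts fire at exact multiples of the timer interval; the function name `interval_estimate_ms` is hypothetical, and a real kernel's accounting is more involved.

```c
#include <assert.h>

/* Run time (ms) that interval counting would charge to one activity
 * segment [start_ms, end_ms): one tick of delta_ms is added for every
 * timer interrupt that lands inside the segment. Interrupts are
 * assumed to fire at 0, delta_ms, 2*delta_ms, ... */
long interval_estimate_ms(double start_ms, double end_ms, long delta_ms)
{
    assert(delta_ms > 0 && start_ms >= 0.0 && end_ms >= start_ms);

    long ticks = 0;
    for (long t = 0; (double)t < end_ms; t += delta_ms) {
        if ((double)t >= start_ms)
            ticks++;            /* process was running at this interrupt */
    }
    return ticks * delta_ms;
}
```

With a 10 ms timer, a segment running from 9.5 ms to 70.5 ms (61 ms of actual time) and a segment running from 0.5 ms to 79.5 ms (79 ms of actual time) are both charged 70 ms, which is why the interval timings and the actual times in the example figure differ.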

  3. Accuracy of Interval Counting (1)
     [Figure: one segment of process A on a time axis from 0 to 80 ms, showing the minimum and maximum actual duration.]
     • Computed time = 70 ms
     • Min actual = 60 + ε
     • Max actual = 80 - ε
     • Worst case analysis
       – Timer interval = δ
       – A single process segment measurement can be off by ±δ
       – No bound on the error for multiple segments
         • Could consistently underestimate, or consistently overestimate

     Accuracy of Interval Counting (2)
     [Figure: same segment as in (1).]
     • Average case analysis
       – Over- and underestimates tend to balance out
       – As long as the total run time is sufficiently large
         • Min run time ~1 second
         • 100 timer intervals
       – Consistently miss 4% overhead due to timer interrupts

     Cycle Counters
     – Most modern systems have built-in registers that are incremented every clock cycle
       • Very fine grained
       • Maintained as part of process state
     – In Linux, counts elapsed global time
     – Special assembly code instruction to access
     – On (recent model) Intel machines:
       • 64-bit counter
       • RDTSC instruction sets %edx to the high-order 32 bits, %eax to the low-order 32 bits

     Cycle Counter Period
     • Wrap-around times for a 550 MHz machine
       – Low-order 32 bits wrap around every 2^32 / (550 * 10^6) = 7.8 seconds
       – Full 64 bits wrap around every 2^64 / (550 * 10^6) = 33539534679 seconds
         • About 1063 years
     • For a 2 GHz machine
       – Low-order 32 bits wrap around every 2.1 seconds
       – Full 64 bits wrap around every 292 years
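The wrap-around figures follow from dividing the number of distinct counter values by the clock rate. A minimal sketch of that arithmetic follows; the function names `wrap_seconds` and `wrap_years` are hypothetical, and the conversion to years assumes 365-day years.

```c
#include <assert.h>

/* Seconds until a counter of `bits` bits, incremented once per clock
 * cycle at `hz` cycles per second, wraps around to zero. */
double wrap_seconds(unsigned bits, double hz)
{
    assert(hz > 0.0);

    double counts = 1.0;
    for (unsigned i = 0; i < bits; i++)
        counts *= 2.0;          /* counts = 2^bits */
    return counts / hz;
}

/* The same period in years, assuming 365-day years. */
double wrap_years(unsigned bits, double hz)
{
    return wrap_seconds(bits, hz) / (365.0 * 24.0 * 3600.0);
}
```

For example, `wrap_seconds(32, 550e6)` is about 7.8 and `wrap_seconds(32, 2e9)` is about 2.1, matching the low-order figures on the slide. The practical consequence is that the low-order word alone is only safe for measurements much shorter than a few seconds.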

  4. Measuring with Cycle Counter
     • Idea
       – Get current value of cycle counter
         • Store as a pair of unsigned's, cyc_hi and cyc_lo
       – Compute something
       – Get new value of cycle counter
       – Perform double precision subtraction to get elapsed cycles

         /* Keep track of most recent reading of cycle counter */
         static unsigned cyc_hi = 0;
         static unsigned cyc_lo = 0;

         void start_counter()
         {
             /* Get current value of cycle counter */
             access_counter(&cyc_hi, &cyc_lo);
         }

     Accessing the Cycle Counter (1)
     – GCC allows inline assembly code, with a mechanism for matching registers with program variables
     – Code only works on an x86 machine compiling with GCC

         void access_counter(unsigned *hi, unsigned *lo)
         {
             /* Get cycle counter */
             asm("rdtsc; movl %%edx,%0; movl %%eax,%1"
                 : "=r" (*hi), "=r" (*lo)
                 : /* No input */
                 : "%edx", "%eax");
         }

     – Emits assembly with rdtsc and two movl instructions

     Closer Look at Extended ASM (1)

         asm("Instruction String"
             : Output List
             : Input List
             : Clobbers List);

     (Shown next to the access_counter code above.)
     Instruction String
     – Series of assembly commands
       • Separated by ";" or "\n"
       • Use "%%" where normally would use "%"

     Closer Look at Extended ASM (2)
     (Same asm template and access_counter code as above.)
     Output List
     – Expressions indicating destinations for values %0, %1, ..., %j
       • Enclosed in parentheses
       • Must be an lvalue
         – Value that can appear on the LHS of an assignment
     – Tag "=r" indicates that the symbolic value (%0, etc.) should be replaced by a register
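The "double precision subtraction" step from the Idea slide is not shown in this excerpt. The following is a sketch of how it might look, assuming each reading is held as a pair of 32-bit words like `cyc_hi` and `cyc_lo`; the function name `elapsed_cycles` and its parameters are hypothetical. It deliberately contains no inline assembly, so the two readings would have to come from calls such as `access_counter`.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed cycles between two readings of a 64-bit cycle counter.
 * (hi0, lo0) is the start reading and (hi1, lo1) the end reading,
 * each split into high-order and low-order 32-bit words. */
double elapsed_cycles(uint32_t hi0, uint32_t lo0, uint32_t hi1, uint32_t lo1)
{
    /* The end reading must not precede the start reading. */
    assert(hi1 > hi0 || (hi1 == hi0 && lo1 >= lo0));

    uint32_t lo = lo1 - lo0;            /* wraps modulo 2^32 */
    uint32_t borrow = (lo1 < lo0);      /* 1 if the low word had to borrow */
    uint32_t hi = hi1 - hi0 - borrow;

    /* Recombine as hi * 2^32 + lo. */
    return (double)hi * 4294967296.0 + (double)lo;
}
```

Returning a `double` keeps the result usable on compilers without a 64-bit integer type, at the cost of exactness once the difference exceeds 2^53 cycles. For instance, a start reading of (0, 0xFFFFFFF0) and an end reading of (1, 0x10) are 32 cycles apart even though the low word wrapped in between.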
