
Real-Time Rendering (Echtzeitgraphik)
Performance Analysis and Characterization
Dr. Michael Wimmer, Vienna University of Technology
wimmer@cg.tuwien.ac.at

What for?
- If you want to improve performance…
- ... you have to be able to analyze it!
- Peek at what other people are doing!
- Understand influence of scene design
- Understand influence of hardware

Overview
- Performance Analysis
  - Which tools to measure performance?
- Performance Characterization 1
  - Characterize general properties of scenes and hardware architectures
- Performance Characterization 2
  - Characterize and find bottlenecks
  - Will include some optimization tips…
- Optimization
  - Will mostly be result of the above

Analysis Tools
- Framerate logging
  - DIY (do it yourself), FRAPS
- Call tracing/logging
  - GLTrace
- External profilers
  - VTune, Quantify
- Internal profiling (fine-grained)
  - RDTSC
- Driver profiling
  - Only available in Direct3D for now…

Frame Rate Calculation
- Running average
  - Great for a quick look
  - Obscures spikes over a few frames
- Per frame FPS calculation
  - "Instantaneous FPS"
  - High accuracy
  - Lots of data
  - Graph it out on top of your app
  - Log it to a file
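The difference between the two frame rate calculations can be sketched in a few lines of C. This is a minimal illustration, not code from the lecture: the function names `instantaneous_fps` and `running_average_fps` and the array of per-frame durations are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Instantaneous FPS for one frame: the reciprocal of that frame's duration. */
double instantaneous_fps(double frame_seconds)
{
    assert(frame_seconds > 0.0);
    return 1.0 / frame_seconds;
}

/* Running-average FPS over the last `window` frames ending at index `last`:
 * number of frames divided by the total time they took.
 * If fewer than `window` frames exist, all frames up to `last` are used. */
double running_average_fps(const double *frame_seconds, size_t last, size_t window)
{
    assert(window > 0);
    size_t first = (last + 1 >= window) ? last + 1 - window : 0;
    double total = 0.0;
    for (size_t i = first; i <= last; ++i)
        total += frame_seconds[i];
    assert(total > 0.0);
    return (double)(last - first + 1) / total;
}
```

With nine frames of 20 ms and a single 100 ms frame in between, the instantaneous value for the slow frame is 10 FPS, while the 10-frame running average still reads about 36 FPS. The spike is visible in the first measure and smoothed away in the second.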

Original Frame Rate
[Plot: FPS (0 to 60) over frames 1 to n. Annotation: "Average looks pretty good"]

Instantaneous Frame Rate
[Plot: FPS (0 to 60) over frames 1 to n. Annotation: "In reality, a little noisy"]

FRAPS
- Displays frame rate for any OpenGL app
- Average over last few frames
- Has file logging
- Small performance hit
- Good for quick comparisons
- www.fraps.com

GLTrace
- Can log all OpenGL calls for any app
  - by intercepting calls to opengl32.dll
- Gives call counts
- Allows reverse engineering (also of models!)
  - Cheating… (wireframe)
- See VU-page for gltrace.txt
- Can use trace for simulation!
[Diagram: Application, GLTrace opengl32.dll, link…, original opengl32.dll]

Example Trace (1338 Frames)
  Calls   Function
 738541   glVertex3fv
 728673   glTexCoord2fv
 224682   glColor4fv
 206474   glNormal3fv
 201074   glCallList
 180574   glBegin
 180574   glEnd
 168356   glBindTextureEXT
  22659   glEnable
  21150   glMaterialfv
  20557   glDisable
   9622   glShadeModel
   5706   glPopMatrix
   5706   glPushMatrix
   4216   glBlendFunc
   3478   glMatrixMode
   3164   glLoadIdentity
   3010   glDepthMask
   2546   glAlphaFunc
   2546   glMultMatrixf
   2105   glTexEnvf
   1676   glEndList
   1676   glNewList

  Vertices         4326.8
  Triangles (3D)   2535.3
  Triangles (2D)   939.0
  Fragments        1353892
  Image            1024×768

External Profiling – Sampling
- Based on sampling at regular intervals
- Example: Intel VTune
  - Expensive, only Intel processors
- How much time is spent in…
  - OS
  - Other applications
  - Driver (kernel- and user-mode)
  - Application (which function, which line of code)
- Pros
  - works with any program, no rebuild necessary
  - no slowdowns
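The call counts in the example trace are totals over all 1338 frames, so dividing by the frame count gives a per-frame figure that is easier to compare between scenes. A minimal sketch in C, where the helper name `calls_per_frame` is hypothetical and not part of GLTrace:

```c
#include <assert.h>

/* Average number of calls per frame for one entry of a call-count trace. */
double calls_per_frame(unsigned long total_calls, unsigned long frames)
{
    assert(frames > 0);
    return (double)total_calls / (double)frames;
}
```

For the trace above, `calls_per_frame(738541, 1338)` gives roughly 552 `glVertex3fv` calls per frame, and `calls_per_frame(180574, 1338)` roughly 135 `glBegin` calls per frame.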

VTune
[Screenshot]

External Profiling – Instrumentation
- Inserts logging directly into code
- Example: Rational Quantify
- Pros
  - Very accurate
  - True call list and call graph
- Cons
  - Need to rebuild code
  - Really slows down execution
  - So slow, it invalidates all off-CPU interaction
    - Example: main memory, GPU

Quantify
[Screenshot]

Internal Profiling – RDTSC
- Current clock cycle counter
- Fine-grained timing (microseconds)
- Calibrate using GetTickCount()
- Take into account overhead of rdtsc itself!
- Warm up caches (for tight loops)

Profiling – Multitasking effects
- Be aware of multitasking! Win2K examples:
  - Clock tick every 10 ms, scheduler called
  - Thread quantum ~60 ms for foreground apps
  - > 1000 interrupts per clock tick!
  - Accuracy not better than 1 ms for longer runs
- Consider using higher priority for timing
    SetPriorityClass(hProcess, REALTIME_PRIORITY_CLASS);
    SetThreadPriority(hThread, THREAD_PRIORITY_TIME_CRITICAL);
  - Beware thread starvation!

Profiling: Seeing Half the Picture
- Profiler runs on the CPU
- GPU is a black box
[Diagram: Application, CPU, Chipset / Memory Controller, GPU, Main Memory (MM), VMM]
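The RDTSC advice (calibrate, subtract the overhead of the read itself, warm up caches) can be sketched as follows. This is an illustration under stated assumptions, not the lecture's code: it uses the `__rdtsc()` intrinsic from `<x86intrin.h>` as provided by GCC and Clang, calibrates against the POSIX `clock_gettime` rather than the Win32 `GetTickCount()` named on the slide, and falls back to the monotonic clock on non-x86 targets. All function names are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the CPU time-stamp counter (the RDTSC instruction). */
static uint64_t read_counter(void) { return __rdtsc(); }
#else
/* Assumed fallback for non-x86 targets: monotonic clock in nanoseconds. */
static uint64_t read_counter(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

/* Wall-clock time in seconds, used only for calibration. */
static double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Smallest observed cost of two back-to-back counter reads. */
uint64_t counter_overhead(void)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < 1000; ++i) {
        uint64_t a = read_counter();
        uint64_t b = read_counter();
        if (b - a < best)
            best = b - a;
    }
    return best;
}

/* Counter ticks per second, calibrated against the wall clock. */
double counter_ticks_per_second(double calibrate_seconds)
{
    assert(calibrate_seconds > 0.0);
    double t0 = wall_seconds();
    uint64_t c0 = read_counter();
    while (wall_seconds() - t0 < calibrate_seconds) {
        /* busy-wait for the calibration interval */
    }
    uint64_t c1 = read_counter();
    double t1 = wall_seconds();
    return (double)(c1 - c0) / (t1 - t0);
}

/* Time one call of fn in counter ticks, after a warm-up call. */
uint64_t time_ticks(void (*fn)(void))
{
    fn(); /* warm up caches before the measured run */
    uint64_t overhead = counter_overhead();
    uint64_t start = read_counter();
    fn();
    uint64_t elapsed = read_counter() - start;
    return elapsed > overhead ? elapsed - overhead : 0;
}

/* Hypothetical workload used only to demonstrate the timer. */
void sample_work(void)
{
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 100000; ++i)
        sum += i;
}
```

Dividing `time_ticks(sample_work)` by `counter_ticks_per_second(0.01)` converts the tick count to seconds. On a multitasking system a single measurement can still include a context switch, so repeating the measurement and taking the minimum is a common precaution.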
