Sync Points in the Intel Gfx Driver, Jesse Barnes, Intel Open Source Technology Center (slide presentation)

SLIDE 1

Sync Points in the Intel Gfx Driver

Jesse Barnes Intel Open Source Technology Center

SLIDE 2

Agenda

  • History and other implementations

○ Other I/O layers - block device ordering
○ NV_fence, ARB_sync
○ EGL_native_fence_sync, Android Sync Framework
○ DMA fence

  • Current i915 state of affairs
  • Motivation and requirements
  • Explicit sync in i915
SLIDE 3

Questions to keep in mind

  • What if…

○ you don’t have buffer handles or explicit buffer allocation?
○ you just pass the driver a pointer to a command stream with no additional info?
○ you’re using direct command submission from GL, CL, or media without kernel driver involvement?
○ you want to allow some user space scheduling in your display server (e.g. Wayland, SurfaceFlinger)?

  • How do I…

○ debug performance problems or lockups?
○ synchronize execution between different hardware blocks?

SLIDE 4

Block devices

  • I/O barriers on storage used for things like journaling filesystems

○ Write metadata, barrier, write data, or similar
○ Tough to implement on some storage systems due to lack of a physical medium flush
○ Not exported as a separate object for IPC or inter-driver sync
○ Exists only in the I/O stream for the targeted block device
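The ordering constraint above can be sketched as a toy model (not real block-layer code; the function and request names here are invented for illustration): an elevator-style scheduler may reorder requests for throughput, but nothing crosses a barrier in either direction.

```python
# Toy model of I/O barrier ordering: requests within a segment may be
# reordered (here, sorted by sector to mimic an elevator), but segments
# separated by a barrier stay in submission order.

def schedule(requests):
    """Split the stream at barriers; reorder freely within each segment."""
    segments, current = [], []
    for req in requests:
        if req == "BARRIER":
            segments.append(sorted(current, key=lambda r: r[1]))
            current = []
        else:
            current.append(req)
    segments.append(sorted(current, key=lambda r: r[1]))
    ordered = []
    for seg in segments:
        ordered.extend(seg)
    return ordered

# Journaling-style pattern: metadata writes, barrier, then data writes.
stream = [("W", 900), ("R", 10), "BARRIER", ("W", 5), ("W", 400)]
print(schedule(stream))  # [('R', 10), ('W', 900), ('W', 5), ('W', 400)]
```

Note that ("W", 5) stays after ("W", 900) despite its lower sector number: the barrier forbids hoisting it across.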

SLIDE 5

Block devices (cont)

[Diagram: an r/w stream from an application: r, w, barrier (B), w, w. The earlier read and write must complete prior to the later writes due to the barrier.]

SLIDE 6

NV_fence

  • Ancient history - added to nVidia’s GL 1.2.1 circa 2000
  • Extended GL with a “partial finish” mechanism
  • Useful for coordinating access to buffers shared between CPU and GPU without doing a glFinish() on a whole bunch of commands
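The "partial finish" idea can be sketched as a toy model (class and method names are invented; the real extension uses glSetFenceNV/glFinishFenceNV on a GL command stream): waiting on a fence retires only the commands issued before it, unlike a full finish.

```python
# Toy sketch of NV_fence-style "partial finish" on a command stream.

class CommandStream:
    def __init__(self):
        self.commands = []     # pending commands, in submission order
        self.fences = {}       # fence id -> position in the stream
        self.retired = 0       # how many commands have completed

    def emit(self, cmd):
        self.commands.append(cmd)

    def set_fence(self, fid):
        self.fences[fid] = len(self.commands)

    def finish_fence(self, fid):
        """Wait only for commands issued before the fence."""
        self.retired = max(self.retired, self.fences[fid])

    def finish(self):
        """glFinish() equivalent: wait for everything."""
        self.retired = len(self.commands)

s = CommandStream()
s.emit("draw A"); s.set_fence(1); s.emit("draw B"); s.emit("draw C")
s.finish_fence(1)
print(s.retired, len(s.commands))   # 1 3 -> draws B and C still in flight
```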

SLIDE 7

NV_fence (cont)

[Diagram: contexts A, B, and C each with a GPU command buffer; fences (1, 1, 5) are embedded in the streams alongside GPU code and buffer r/w accesses, feeding GPU execution.] Once fence 1 from context A has passed, the CPU can access the buffer contents or use it in another batch.

SLIDE 8

ARB_sync

  • Slightly less ancient - added to GL 3.2 circa 2009
  • Similar to NV_fence with some changes
  • Adds client/server distinction, allowing client to continue running while server blocks for completion

  • Namespace shared across contexts
  • Again, useful for CPU/GPU memory sharing situations
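The client-side wait semantics can be sketched as a toy model (not GL code; the real call is glClientWaitSync, which distinguishes a fence that was already signaled from one that signaled during the wait, or a timeout):

```python
# Toy model of ARB_sync client-wait return values. The simulated
# "gpu_time_to_signal_ns" parameter is invented for illustration.

ALREADY_SIGNALED, CONDITION_SATISFIED, TIMEOUT_EXPIRED = range(3)

class Sync:
    def __init__(self):
        self.signaled = False

def client_wait_sync(sync, timeout_ns, gpu_time_to_signal_ns):
    if sync.signaled:
        return ALREADY_SIGNALED          # no wait was needed
    if gpu_time_to_signal_ns <= timeout_ns:
        sync.signaled = True             # GPU reached the fence in time
        return CONDITION_SATISFIED
    return TIMEOUT_EXPIRED               # caller may retry or do other work

s = Sync()
print(client_wait_sync(s, 1000, 5000))   # 2 (TIMEOUT_EXPIRED)
print(client_wait_sync(s, 10000, 5000))  # 1 (CONDITION_SATISFIED)
print(client_wait_sync(s, 0, 0))         # 0 (ALREADY_SIGNALED)
```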
SLIDE 9

ARB_sync (cont)

[Diagram: contexts A, B, and C each with a GPU command buffer; fences 1, 2, and 5 are embedded alongside GPU code and buffer r/w accesses, feeding GPU execution.] A process could issue a blocking wait on any fence, or ask the display server to block instead, allowing the process to continue building and queuing commands without waiting.

SLIDE 10

EGL_native_fence_sync

  • Added by Android folks at Google circa 2012
  • Designed to sit on top of underlying OS sync object support
  • Extends EGL_fence_sync with underlying FDs
  • Uses Android Sync Framework underneath on Android
SLIDE 11

Android Sync Framework

  • Added to Android, currently in staging branch
  • Designed to support multiple kernel drivers
  • Allows inter-process and inter-device synchronization
  • Exposes userland ABI for waiting on and merging sync fences, as well as for debugging
  • Actual sync fences created and exported by individual drivers
  • Internals use one “timeline” per command streamer or logical engine in each device/driver (e.g. render engine batches, display flips/vblanks, camera frames)
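The timeline/fence relationship can be sketched as a toy model (class names are invented; the real framework lives in the kernel): a fence signals once its timeline's counter reaches the fence's sequence number, and a merged fence signals only when all of its component fences have.

```python
# Toy model of Android Sync Framework semantics: one timeline per engine,
# seqno-based fences, and fence merging across timelines.

class Timeline:
    def __init__(self, name):
        self.name, self.value = name, 0
    def signal(self, seqno):
        """Engine progress: advance the timeline to seqno."""
        self.value = max(self.value, seqno)
    def fence(self, seqno):
        return Fence([(self, seqno)])

class Fence:
    def __init__(self, points):
        self.points = points           # list of (timeline, seqno)
    def is_signaled(self):
        return all(tl.value >= s for tl, s in self.points)
    def merge(self, other):
        """Merged fence waits on every component fence."""
        return Fence(self.points + other.points)

gpu, camera = Timeline("render"), Timeline("camera")
f = gpu.fence(3).merge(camera.fence(1))   # wait for both engines
gpu.signal(3)
print(f.is_signaled())                    # False: camera frame not ready
camera.signal(1)
print(f.is_signaled())                    # True
```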

SLIDE 12

Android Sync Framework (cont)

[Diagram: GPU render pipeline, camera pipeline, and video decode pipeline, each signaling completion events: GPU command complete, camera frame ready, video frame ready.] Each engine has an associated timeline, and maybe one per logical context as well. Tracked with sequence numbers or some other hardware status indicator.

SLIDE 13

DMA fences

  • Upstream solution (thanks Rob & Maarten!)
  • Comparable to Android Sync Framework internals
  • Simplified to a single fence struct with signaling and other callbacks
  • Used in nouveau, radeon, and other drivers for internal command tracking
  • Replaces a lot of similar code across drivers for seqno & batch tracking
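The signaling-callback shape can be sketched as a toy model (the real struct dma_fence lives in the kernel, with dma_fence_signal and dma_fence_add_callback; this only mirrors the interface's shape, and the names below are illustrative):

```python
# Toy sketch of a DMA-fence-like object: a one-shot fence that runs
# registered callbacks when it signals.

class DmaFence:
    def __init__(self):
        self.signaled = False
        self.callbacks = []

    def add_callback(self, cb):
        """Return False if the fence already signaled, so the caller can
        act immediately (mirroring dma_fence_add_callback's -ENOENT)."""
        if self.signaled:
            return False
        self.callbacks.append(cb)
        return True

    def signal(self):
        self.signaled = True
        for cb in self.callbacks:
            cb(self)
        self.callbacks.clear()

log = []
f = DmaFence()
f.add_callback(lambda fence: log.append("flip complete"))
f.signal()
print(log)                                  # ['flip complete']
print(f.add_callback(lambda fence: None))   # False: already signaled
```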
SLIDE 14

Current i915 status

  • Doesn’t use DMA fences
  • All synchronization is implicit

○ Except in Android devices, which have sync framework support

  • Sync is done using buffers
  • Submission is also ordered, no scheduling (yet)
  • Easy for userspace to use, but on the other hand easy to add bubbles to the pipeline
  • Buffers can be used for explicit sync using buffer busy queries (see SNA) and buffer sharing

○ Downside is extra complexity for shared buffers, as you don’t want to fully synchronize on those
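The buffer-busy approach can be sketched as a toy model (names invented; SNA's real query goes through the kernel's buffer-busy ioctl): the CPU checks whether a buffer is still referenced by pending GPU work before touching it.

```python
# Toy contrast of implicit, buffer-based sync: per-buffer busy tracking.

class Buffer:
    def __init__(self):
        self.pending = 0    # GPU batches still using this buffer
    def busy(self):
        return self.pending > 0

def submit(batch_buffers):
    """Queue a batch referencing these buffers."""
    for b in batch_buffers:
        b.pending += 1

def retire(batch_buffers):
    """A batch completed; drop its buffer references."""
    for b in batch_buffers:
        b.pending -= 1

scanout, texture = Buffer(), Buffer()
submit([scanout, texture])
print(scanout.busy())     # True: a CPU write here would have to stall
retire([scanout, texture])
print(scanout.busy())     # False: safe to map and write
```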

SLIDE 15

Explicit synchronization

  • Buffer-independent sync allows for the items above
  • i915 plans (currently underway by Tvrtko)

○ add flag to execbuf to allow the return of a sync fence
○ sync fence will support the Android Sync Framework ABI
○ internals will use DMA fence objects
○ other entry points (page flip, mode set) will optionally return sync fences as well
○ allows for asynchronous mode sets and flips with contingent completion
○ execbuf and other entry points will take sync fences to allow for internal sync and good pipeline utilization
○ GPU scheduler will be added as well, further re-ordering requests relative to current behavior

  • DRI/i965

○ ARB_sync could be implemented in terms of sync fences
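The in-fence/out-fence flow at submission can be sketched as a toy model (the real i915 interface works on sync fence fds passed through execbuf; the function names below are invented for illustration): execbuf optionally takes an in-fence to wait on and returns an out-fence for the new batch.

```python
# Toy model of explicit sync at command submission.

class Fence:
    def __init__(self):
        self.signaled = False

def execbuf(engine_queue, batch, in_fence=None):
    """Queue a batch; return an out-fence that signals on completion."""
    out = Fence()
    engine_queue.append((batch, in_fence, out))
    return out

def run_ready(engine_queue, done):
    """Run and retire batches whose in-fence (if any) has signaled."""
    still_pending = []
    for batch, in_f, out in engine_queue:
        if in_f is None or in_f.signaled:
            done.append(batch)
            out.signaled = True
        else:
            still_pending.append((batch, in_f, out))
    engine_queue[:] = still_pending

render, blit, done = [], [], []
f = execbuf(render, "render scene")           # out-fence for render batch
execbuf(blit, "blit to scanout", in_fence=f)  # waits on render completion
run_ready(blit, done)
print(done)          # []: blit is blocked on the render fence
run_ready(render, done)
run_ready(blit, done)
print(done)          # ['render scene', 'blit to scanout']
```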

SLIDE 16

Questions answered

  • What if…

○ you don’t have buffer handles? Add a sync fence to your command stream.
○ you just pass the driver a pointer to a command stream with no additional info? Get a sync fence back from the command submission.
○ you’re using ring3 direct submission without kernel driver involvement? Request a sync fence from the kernel driver when needed.
○ you want to allow some user space scheduling in your display server (e.g. Wayland, SurfaceFlinger)? Send your sync fences to the display server, allowing it to intelligently pick buffers to use and schedule work.

  • How do I…

○ debug performance problems or lockups? Track sync fences between processes and in the kernel.
○ synchronize execution between different hw blocks? Use sync fences in userspace and/or in the kernel.

SLIDE 17

Q & A