Porting Nouveau to Tegra K1: How NVIDIA became a Nouveau contributor

Porting Nouveau to Tegra K1: How NVIDIA became a Nouveau contributor. Alexandre Courbot, NVIDIA. FOSDEM 2015

The Story So Far... In 2014, NVIDIA released the Tegra K1 SoC ● 32-bit quad-core or 64-bit dual-core ARM CPU ● 192-core low-power Kepler GPU (OpenGL 4.3, CUDA) ● Desktop Kepler already supported by Nouveau. 2014/02/01: NVIDIA to contribute GK20A support to Nouveau

(Incomplete) Credits. NVIDIANs: Thierry Reding, Ken Adams, Terje Bergström, Lauri Peltonen, Gregory Roth, Stephen Warren, Vince Hsu, Mark Zhang, … and the whole Nouveau community!

Outline ● GK20A/Nouveau overview ● Nouveau bring-up on Tegra K1 ● Challenges with memory management ● Engines layout on Tegra ● User space (Mesa)

GK20A Overview A fully featured Kepler part with unified shaders and per-process virtualization of the GPU ● Each process gets its own GPU context ● Memory is virtualized per context ● Graphics jobs are submitted from user space using pushbuffers

Nouveau Architecture Supports GPUs from the Riva TNT (1998) to Maxwell (2014) ● Extremely modular ● A GPU is literally an assembly of engines and sub-devices. Supporting GK20A means: ● Finding or writing the engines/subdevs for the chip ● Allowing Nouveau to run on Tegra
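To make the "assembly of engines and sub-devices" idea concrete, here is a minimal C sketch in which a chip is described only by the units it instantiates. The enum, struct, and function names are hypothetical and do not mirror Nouveau's actual data structures; the GK104 and GK20A unit lists follow the Engines Layout slide, with GK104's three copy engines folded into a single flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical engine/subdev identifiers (not Nouveau's real enum). */
enum unit {
    UNIT_GR,    /* graphics engine */
    UNIT_DISP,  /* display controllers */
    UNIT_CE,    /* copy engines (GK104 has three; one flag here) */
    UNIT_VDEC,  /* video decoder */
    UNIT_VENC,  /* video encoder */
    UNIT_VRAM,  /* dedicated video memory */
    UNIT_COUNT
};

/* A chip is nothing more than the set of units it instantiates. */
struct chip_desc {
    const char *name;
    bool has[UNIT_COUNT];
};

static const struct chip_desc chips[] = {
    { "GK104", { [UNIT_GR] = true, [UNIT_DISP] = true, [UNIT_CE] = true,
                 [UNIT_VDEC] = true, [UNIT_VENC] = true, [UNIT_VRAM] = true } },
    /* GK20A: graphics only; the other functions come from other Tegra IPs. */
    { "GK20A", { [UNIT_GR] = true } },
};

/* Look up a chip descriptor by name; NULL if the chip is unknown. */
static const struct chip_desc *find_chip(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof chips / sizeof chips[0]; i++)
        if (strcmp(chips[i].name, name) == 0)
            return &chips[i];
    return NULL;
}

/* Number of units the driver must bring up for this chip. */
static int unit_count(const struct chip_desc *chip)
{
    int n = 0;
    for (int u = 0; u < UNIT_COUNT; u++)
        n += chip->has[u];
    return n;
}
```

In this model, supporting a new chip reduces to adding a descriptor plus implementations for any units not already written.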

Platform Bus Support Nouveau expects the GPU to be on a PCI bus ● The bus provides the I/O addresses of the GPU registers and BARs ● pci_map_page() is used to map system RAM for the GPU. Abstract the bus and add platform bus support ● I/O addresses provided by the Device Tree ● Replace the deprecated pci_map_page() with the DMA API → Nouveau can be instantiated from either PCI or the Device Tree
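A minimal sketch of the bus abstraction, assuming a small ops table that each bus backend fills in. The PCI backend takes the register window from BAR0 and the platform backend from a Device Tree `reg` property (simplified here to one 64-bit value per entry); both buses are simulated with plain structs, and all names (`bus_ops`, `gpu_probe`, `fake_pci_dev`, `fake_dt_node`) are hypothetical rather than Nouveau's or the kernel's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register window discovered by the bus. */
struct mmio_region {
    uint64_t base;
    uint64_t size;
};

/* Hypothetical bus abstraction: the driver core only calls through these ops. */
struct bus_ops {
    const char *name;
    int (*get_regs)(const void *bus_data, struct mmio_region *out);
};

/* Simulated PCI device: the GPU registers live behind BAR0. */
struct fake_pci_dev {
    uint64_t bar[6];
    uint64_t bar_len[6];
};

static int pci_get_regs(const void *bus_data, struct mmio_region *out)
{
    const struct fake_pci_dev *pdev = bus_data;
    if (pdev->bar_len[0] == 0)
        return -1;
    out->base = pdev->bar[0];
    out->size = pdev->bar_len[0];
    return 0;
}

/* Simulated Device Tree node: a simplified "reg" property holding <base size>. */
struct fake_dt_node {
    const uint64_t *reg;
    size_t reg_cells;
};

static int platform_get_regs(const void *bus_data, struct mmio_region *out)
{
    const struct fake_dt_node *np = bus_data;
    if (np->reg_cells < 2)
        return -1;
    out->base = np->reg[0];
    out->size = np->reg[1];
    return 0;
}

static const struct bus_ops pci_bus = { "pci", pci_get_regs };
static const struct bus_ops platform_bus = { "platform", platform_get_regs };

/* Bus-agnostic probe: identical for a PCI card and a Device Tree SoC GPU. */
static int gpu_probe(const struct bus_ops *ops, const void *bus_data,
                     struct mmio_region *regs)
{
    assert(ops != NULL && bus_data != NULL && regs != NULL);
    return ops->get_regs(bus_data, regs);
}
```

The point of the design is that everything after `gpu_probe()` sees only a `mmio_region`, so the rest of the driver never needs to know which bus it came from.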

No VBIOS The video BIOS provides useful information (e.g. voltage tables for DVFS) and also performs critical initialization ● Alternative way to provide power information via per-chip static tables ● Perform the necessary GK20A initialization in the driver

No VRAM GK20A has no video memory of its own ● GPU is a direct client of Tegra’s Memory Controller ● Free and direct access to system memory ● Huge consequences for the driver

Address Translation on Desktop Kepler [Diagram: CPU VA, CPU PA, BAR1, GPU VA, video RAM, and system RAM reached over the PCIe bus]

Nouveau Memory Model ● Two allocation targets: VRAM, and TT (system memory mapped to the GPU) ● Target specified at buffer creation time ● Coherency maintained thanks to BAR1 (for VRAM) and PCIe (for TT)

Address Translation on Mobile Kepler [Diagram: CPU VA, CPU PA, BAR1, GPU VA, IOMMU VA, and system RAM]

Mobile Kepler Memory Model ● No dedicated video memory anymore ● All allocations in system memory ● Not a carve-out! ● No coherency between CPU and GPU ● We must flush/invalidate the CPU cache ourselves

Living Without VRAM How to handle VRAM allocations? ● Emulate VRAM? ● Sub-optimal memory management ● Dismiss VRAM allocations altogether? ● Requires more changes in the kernel and Mesa. Decision: do not use a RAM device for GK20A ● Better reflects reality and simplifies memory management ● User space needs to be aware of devices without VRAM

Using IOMMU The IOMMU introduces a second level of address translation ● Useful to “flatten” context objects ● Instance blocks, PGTs, etc. ● Also allows maximizing large-page usage on the GPU ● IOMMU more efficient than the GPU MMU [Diagram: GPU VA → IOMMU VA → RAM]
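The flattening effect can be illustrated with a toy IOMMU: physically scattered 4 KiB pages are mapped to one contiguous IOMMU VA range, so the GPU sees a contiguous object and can cover it with large pages. This is a sketch only; the 128 KiB large-page size, the linear page table, and every name below are illustrative assumptions, not the actual Tegra IOMMU or Kepler MMU formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SMALL_PAGE 4096u
#define LARGE_PAGE (128u * 1024u)  /* assumed GPU large-page size, for illustration */

/* One IOMMU mapping entry: IOMMU VA page -> physical page. */
struct iommu_pte {
    uint64_t iova;
    uint64_t phys;
};

/* Map n scattered physical pages to a contiguous IOVA range starting at
   iova_base. Returns the number of entries written to out. */
static size_t iommu_map_contig(uint64_t iova_base, const uint64_t *phys_pages,
                               size_t n, struct iommu_pte *out)
{
    assert(iova_base % SMALL_PAGE == 0);
    for (size_t i = 0; i < n; i++) {
        assert(phys_pages[i] % SMALL_PAGE == 0);
        out[i].iova = iova_base + (uint64_t)i * SMALL_PAGE;
        out[i].phys = phys_pages[i];
    }
    return n;
}

/* Translate an IOVA to a physical address (linear search for clarity). */
static int iommu_translate(const struct iommu_pte *tbl, size_t n,
                           uint64_t iova, uint64_t *phys)
{
    uint64_t page = iova & ~(uint64_t)(SMALL_PAGE - 1);
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].iova == page) {
            *phys = tbl[i].phys + (iova - page);
            return 0;
        }
    }
    return -1;
}

/* Count the full, aligned large pages inside [iova_base, iova_base + n pages).
   Only IOVA contiguity matters here, not the physical layout. */
static size_t large_pages_usable(uint64_t iova_base, size_t n_small_pages)
{
    uint64_t end = iova_base + (uint64_t)n_small_pages * SMALL_PAGE;
    uint64_t first = (iova_base + LARGE_PAGE - 1) / LARGE_PAGE * LARGE_PAGE;
    if (first >= end)
        return 0;
    return (size_t)((end - first) / LARGE_PAGE);
}
```

For example, 32 scattered pages mapped at an IOVA aligned to 128 KiB form exactly one large page from the GPU's point of view, even though no two of them need to be physically adjacent.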

CPU/GPU Coherency ● Handled transparently by PCIe on desktop ● No such thing on Tegra: buffer objects must be explicitly flushed/invalidated (DMA API) ● New flag for objects that must always be coherent ● Fences, GPFIFOs ● ARM makes things more difficult ● A memory page cannot be mapped twice with different attributes ● The kernel already maps lowmem (first 760 MB) as cached ● This memory cannot be remapped with the uncached attribute
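The flush/invalidate discipline can be modeled in a few lines of C. The toy below gives the CPU a write-back cache and the GPU direct RAM access, then shows why CPU writes must be flushed before the GPU reads them, and why cached lines must be invalidated before the CPU reads GPU results. In the kernel this role is played by DMA API calls such as `dma_sync_single_for_device()` and `dma_sync_single_for_cpu()`; the helpers below are hypothetical stand-ins, not those functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 16

/* Toy non-coherent SoC: the GPU accesses RAM directly, while the CPU goes
   through a write-back cache that no hardware keeps in sync. */
struct buffer {
    unsigned char ram[BUF_SIZE];
    unsigned char cache[BUF_SIZE];
    bool valid[BUF_SIZE];  /* CPU cache holds a copy of this byte */
    bool dirty[BUF_SIZE];  /* cached copy modified, not yet written back */
};

static unsigned char cpu_read(struct buffer *b, size_t off)
{
    assert(off < BUF_SIZE);
    if (!b->valid[off]) {  /* cache miss: fetch from RAM */
        b->cache[off] = b->ram[off];
        b->valid[off] = true;
    }
    return b->cache[off];
}

static void cpu_write(struct buffer *b, size_t off, unsigned char v)
{
    assert(off < BUF_SIZE);
    b->cache[off] = v;  /* write-back: RAM is not updated yet */
    b->valid[off] = b->dirty[off] = true;
}

static unsigned char gpu_read(const struct buffer *b, size_t off)
{
    assert(off < BUF_SIZE);
    return b->ram[off];
}

static void gpu_write(struct buffer *b, size_t off, unsigned char v)
{
    assert(off < BUF_SIZE);
    b->ram[off] = v;
}

/* Before the GPU reads: write dirty lines back so it sees CPU writes. */
static void flush_for_device(struct buffer *b)
{
    for (size_t i = 0; i < BUF_SIZE; i++)
        if (b->dirty[i]) {
            b->ram[i] = b->cache[i];
            b->dirty[i] = false;
        }
}

/* Before the CPU reads: drop cached copies so the next read sees GPU writes. */
static void invalidate_for_cpu(struct buffer *b)
{
    for (size_t i = 0; i < BUF_SIZE; i++) {
        assert(!b->dirty[i]);  /* flush first, or CPU writes would be lost */
        b->valid[i] = false;
    }
}
```

Objects that both sides touch constantly, such as fences and GPFIFOs, would need a sync around every access in this model, which is one way to read why the slide gives them a separate always-coherent flag.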

Coherency With Multiple CPU Mappings How to address the coherency issue? ● Use the GPU path when writing coherent buffers ● PRAMIN window (slow) ● BAR1 (relatively scarce resource) ● Allocate coherent buffers using the DMA API ● dma_alloc_coherent() can fix the lowmem mapping ● We end up with a permanent kernel mapping

Engines Layout ● The GeForce GTX 680 (GK104) provides a graphics engine (GR), display controllers, 3 copy engines, a video decoder, a video encoder, VRAM, ... ● GK20A only includes a graphics engine ● The other functions are already provided by separate Tegra IP blocks [Diagram: on a discrete GPU, Nouveau drives GR, DISP, ENC, ... and VRAM; on Tegra, Nouveau drives GR, TegraDRM drives DISP, and V4L2 drives ENC, all sharing system RAM through tegra-mc]

Engines Layout PRIME support is critical for this setup ● Buffer export is required to display GPU buffers. Tegra K1 is a perfect fit for render nodes ● card0 (tegradrm) is the display device ● renderD128 (nouveau) is the render device → Requires support at the application or Mesa level
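The render-node split can be sketched from the application side: pick the display device and the render device by the driver bound to each node. The node list and driver strings used with this helper are assumptions for illustration (on a real system, the driver name of an opened node can be read with libdrm's `drmGetVersion()`), and `find_node` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of enumerated DRM device nodes. */
struct drm_node {
    const char *path;    /* e.g. /dev/dri/card0 or /dev/dri/renderD128 */
    const char *driver;  /* kernel driver bound to the node */
    bool is_render;      /* render node (no modesetting) vs. primary node */
};

/* Return the first node of the requested kind bound to `driver`, or NULL. */
static const struct drm_node *find_node(const struct drm_node *nodes, size_t n,
                                        const char *driver, bool want_render)
{
    assert(nodes != NULL && driver != NULL);
    for (size_t i = 0; i < n; i++)
        if (nodes[i].is_render == want_render &&
            strcmp(nodes[i].driver, driver) == 0)
            return &nodes[i];
    return NULL;
}
```

On the Tegra K1 layout described above, an application would look up the tegradrm primary node for display and the nouveau render node for rendering, then share finished buffers through PRIME, for example with libdrm's `drmPrimeHandleToFD()` on the render side and `drmPrimeFDToHandle()` on the display side.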

Who Should Provide Memory? ● The first driver in the chain? ● A neutral allocator (e.g. ION)? ● Why should each driver have its own allocator? ● How to handle the different engines' capabilities? [Diagram: V4L2 (CAM), Nouveau (GR), V4L2 (ENC), and TegraDRM (DISP), all sharing system RAM through tegra-mc]

User-space (Mesa) Changes ● ~25 LoC changed to recognize GK20A… and Mesa fully works ● Some work required to avoid VRAM allocations ● Some more work needed to integrate seamlessly with tegradrm?

Conclusion GK20A is close to working out of the box with Nouveau. Remaining tasks: ● Firmware distribution ● A few more kernel and Mesa patches pending. Great experience working with the Nouveau community ● Plans to keep contributing support for future Tegra SoCs

Thank you! https://github.com/NVIDIA/tegra-nouveau-rootfs
