April 4-7, 2016 | Silicon Valley
Markus Tavenrath Senior Developer Technology Engineer mtavenrath@nvidia.com 4/4/2016
VKCPP

INTRODUCTION
Who am I?
- Senior Dev Tech Software Engineer, Professional Visualization
- Joined NVIDIA 8 years ago to work on middleware
- Goal: make GPU programming easy and efficient
- Working with CAD ISVs to optimize their graphics pipelines
- Working with the driver team on Vulkan and OpenGL performance
- Clean, modern and consistent C API without cruft
- Low CPU overhead
- Scales well with multiple threads
- Provides 'low level' control over the GPU
- Moves a lot of responsibility from driver to developer
- Developer essentially writes part of the driver
- A simple HelloVulkan is ~750 lines of code
- So much to do, hard to start
- Easy to make errors
- Even harder to find those errors
- Is there a way to simplify Vulkan usage?
- Two projects started
    - VKCPP (low level C++ API)
    - NVK* (high level C++ API)
- VKCPP is a port of the Vulkan API to C++11
- Simplifies logic where possible without changing concepts/behaviour
- Generated from the official Vulkan spec file, vk.xml
- Header only, with inline functions to minimize additional cost
- The open source project contains the generated header and the generator
Initialization 'vertical'

    VkApplicationInfo appInfo = {};
    appInfo.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    appInfo.pNext = NULL;
    appInfo.pApplicationName = appName;
    appInfo.applicationVersion = 1;
    appInfo.pEngineName = engineName;
    appInfo.engineVersion = 1;
    appInfo.apiVersion = VK_MAKE_VERSION(1, 0, 5);

    VkInstanceCreateInfo instInfo = {};
    instInfo.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    instInfo.pNext = NULL;
    instInfo.flags = 0;
    instInfo.pApplicationInfo = &appInfo;
    instInfo.enabledLayerCount = layerNames.size();
    instInfo.ppEnabledLayerNames = layerNames.data();
    instInfo.enabledExtensionCount = extensionNames.size();
    instInfo.ppEnabledExtensionNames = extensionNames.data();

    VkResult res = vkCreateInstance(&instInfo, NULL, &info.inst);
    assert(res == VK_SUCCESS);

Potential issues
- Struct/sType enum mismatch
- No type safety for enums and flags
- Risk of uninitialized fields
    // Strip Vk prefix of all functions and structs
    VkResult vkCreateInstance(const VkInstanceCreateInfo* pCreateInfo,
                              const VkAllocationCallbacks* pAllocator,
                              VkInstance* pInstance);

    // Introduce new vk namespace for all vkcpp symbols to avoid symbol collisions
    namespace vk {
        Result createInstance(const InstanceCreateInfo* pCreateInfo,
                              const AllocationCallbacks* pAllocator,
                              Instance* pInstance);
    }
    namespace vk {
        // Use scoped enums for type safety
        enum class ImageType {
            e1D = VK_IMAGE_TYPE_1D,
            e2D = VK_IMAGE_TYPE_2D,
            e3D = VK_IMAGE_TYPE_3D
        };
    }

- Strip the VK_ prefix and the enum name from each value
- Use upper camel case of the enum type as name
- The 'e' prefix is strictly required only for names that start with a digit; it is used everywhere for consistency
    // Introduce class for typesafe flags
    template <typename BitType, typename MaskType = VkFlags>
    class Flags { ... };

    // BitType is scoped enum
    enum class QueueFlagBits {
        eGraphics      = VK_QUEUE_GRAPHICS_BIT,
        eCompute       = VK_QUEUE_COMPUTE_BIT,
        eTransfer      = VK_QUEUE_TRANSFER_BIT,
        eSparseBinding = VK_QUEUE_SPARSE_BINDING_BIT
    };

    // Define typesafe flags
    typedef Flags<QueueFlagBits, VkQueueFlags> QueueFlags;
    template <typename BitType, typename MaskType = VkFlags>
    class Flags {
    public:
        Flags();                // No flags set. Use QueueFlags() or {} as value
        Flags(BitType bit);     // QueueFlags qf(QueueFlagBits::eGraphics)

        Flags<BitType> & operator|=(Flags<BitType> const& rhs);      // qf |= QueueFlagBits::eCompute;
        Flags<BitType> & operator&=(Flags<BitType> const& rhs);      // qf &= QueueFlagBits::eGraphics;
        Flags<BitType> & operator^=(Flags<BitType> const& rhs);      // qf ^= QueueFlagBits::eGraphics;
        Flags<BitType> operator|(Flags<BitType> const& rhs) const;   // qf = QueueFlagBits::eCompute | QueueFlagBits::eGraphics
        Flags<BitType> operator&(Flags<BitType> const& rhs) const;   // qf = qf & QueueFlagBits::eGraphics
        Flags<BitType> operator^(Flags<BitType> const& rhs) const;   // qf = qf ^ QueueFlagBits::eGraphics

        explicit operator bool() const;                              // if (qf)  Is any bit set?
        bool operator!() const;                                      // if (!qf) Is no bit set?
        bool operator==(Flags<BitType> const& rhs) const;            // if (qf == QueueFlagBits::eCompute)
        bool operator!=(Flags<BitType> const& rhs) const;            // if (qf != QueueFlagBits::eCompute)
        explicit operator MaskType() const;                          // VkFlags flags = static_cast<VkFlags>(qf);
    };
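To make the idea concrete, here is a minimal, self-contained sketch of such a flags wrapper. It is not the VKCPP implementation: `DemoFlags`, `DemoQueueFlagBits`, `DemoQueueFlags` and `DemoMask` are hypothetical stand-ins so that the example compiles without the Vulkan headers, and only a subset of the operators listed above is shown.

```cpp
#include <cstdint>

// Hypothetical stand-ins for VkFlags and a Vulkan bit enum (assumption, not the real headers).
using DemoMask = std::uint32_t;
enum class DemoQueueFlagBits : DemoMask {
    eGraphics = 0x1,
    eCompute  = 0x2,
    eTransfer = 0x4
};

template <typename BitType, typename MaskType = std::uint32_t>
class DemoFlags {
public:
    DemoFlags() : m_mask(0) {}                                    // no flags set
    DemoFlags(BitType bit) : m_mask(static_cast<MaskType>(bit)) {} // implicit on purpose

    DemoFlags& operator|=(DemoFlags const& rhs) { m_mask |= rhs.m_mask; return *this; }
    DemoFlags& operator&=(DemoFlags const& rhs) { m_mask &= rhs.m_mask; return *this; }
    DemoFlags operator|(DemoFlags const& rhs) const { DemoFlags r(*this); r |= rhs; return r; }
    DemoFlags operator&(DemoFlags const& rhs) const { DemoFlags r(*this); r &= rhs; return r; }

    explicit operator bool() const { return m_mask != 0; }        // is any bit set?
    bool operator==(DemoFlags const& rhs) const { return m_mask == rhs.m_mask; }
    explicit operator MaskType() const { return m_mask; }         // back to the raw mask

private:
    MaskType m_mask;
};

using DemoQueueFlags = DemoFlags<DemoQueueFlagBits>;

// Combining two bits yields the flags type, never a plain integer.
inline DemoQueueFlags operator|(DemoQueueFlagBits a, DemoQueueFlagBits b) {
    return DemoQueueFlags(a) | b;
}
```

Because the enum is scoped and the conversion to the mask type is explicit, mixing bits from unrelated enums or passing a raw integer where flags are expected fails at compile time.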
    class EventCreateInfo {
    public:
        // All constructors initialize sType/pNext
        EventCreateInfo();                               // Initialize all fields with default values (0 currently)
        EventCreateInfo(EventCreateFlags flags);         // Create with all parameters specified
        EventCreateInfo(VkEventCreateInfo const & rhs);  // Construct from native Vulkan type

        // get parameter
        const EventCreateFlags& flags() const;
        EventCreateFlags& flags();                       // get parameter from non-const object

        // set parameter
        EventCreateInfo& flags(EventCreateFlags flags);
        ...
    };
VKCPP:

    vk::ApplicationInfo appInfo(appName, 1, engineName, 1, VK_MAKE_VERSION(1,0,5));
    vk::InstanceCreateInfo i({}, &appInfo, layerNames.size(), layerNames.data(), extNames.size(), extNames.data());
    vk::Instance instance;
    vk::Result res = vk::createInstance(&i, nullptr, &instance);
    assert(res == vk::Result::eSuccess);

Vulkan C API:

    VkApplicationInfo appInfo = {};
    appInfo.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    appInfo.pNext = NULL;
    appInfo.pApplicationName = appName;
    appInfo.applicationVersion = 1;
    appInfo.pEngineName = engineName;
    appInfo.engineVersion = 1;
    appInfo.apiVersion = VK_MAKE_VERSION(1, 0, 5);

    VkInstanceCreateInfo i = {};
    i.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    i.pNext = NULL;
    i.flags = 0;
    i.pApplicationInfo = &appInfo;
    i.enabledLayerCount = layerNames.size();
    i.ppEnabledLayerNames = layerNames.data();
    i.enabledExtensionCount = extNames.size();
    i.ppEnabledExtensionNames = extNames.data();

    VkInstance instance;
    VkResult res = vkCreateInstance(&i, NULL, &instance);
    assert(res == VK_SUCCESS);
    vk::ApplicationInfo appInfo(appName, 1, engineName, 1, VK_MAKE_VERSION(1,0,5));
    vk::InstanceCreateInfo i({}, &appInfo, layerNames.size(), layerNames.data(), extNames.size(), extNames.data());
    vk::Instance instance;
    vk::Result res = vk::createInstance(&i, nullptr, &instance);
    assert(res == vk::Result::eSuccess);

std::vector<std::string>? Bad idea

    vk::InstanceCreateInfo i({}, &appInfo, {"layer_xyz"}, extNames.size(), extNames.data());

- Lifetime of struct != lifetime of temporary
- The temporary array will be destroyed after the constructor returns
- CreateInfos would have to copy the data
    VkApplicationInfo appInfo = {
        .sType = VK_STRUCTURE_TYPE_APPLICATION_INFO,
        .pNext = NULL,
        .pApplicationName = appName,
        .applicationVersion = 1,
        .pEngineName = engineName,
        .engineVersion = 1,
        .apiVersion = VK_MAKE_VERSION(1,0,5)
    };

    vk::ApplicationInfo appInfo = vk::ApplicationInfo()
        .pApplicationName(appName)
        .applicationVersion(1)
        .pEngineName(engineName)
        .engineVersion(1)
        .apiVersion(VK_MAKE_VERSION(1,0,5));

- Designated initializer lists are not part of C++11
- Names are explicit, but there is no guarantee that all fields are set
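The chained-setter style can be sketched in a few lines. The following is only an illustration under assumptions: `DemoCAppInfo`, `DemoAppInfo` and `DEMO_STRUCTURE_TYPE_APP_INFO` are hypothetical names standing in for a C create-info struct and its C++ wrapper, not the generated VKCPP code.

```cpp
#include <cstdint>

// Hypothetical stand-in for a C create-info struct (assumption).
struct DemoCAppInfo {
    int           sType;
    const void*   pNext;
    const char*   pApplicationName;
    std::uint32_t applicationVersion;
};
const int DEMO_STRUCTURE_TYPE_APP_INFO = 7;  // arbitrary demo value

class DemoAppInfo {
public:
    // The constructor always sets sType/pNext, so they cannot be forgotten or mismatched.
    DemoAppInfo() : m_info{DEMO_STRUCTURE_TYPE_APP_INFO, nullptr, nullptr, 0} {}

    // Setters return *this so that calls can be chained like named parameters.
    DemoAppInfo& pApplicationName(const char* name) { m_info.pApplicationName = name; return *this; }
    DemoAppInfo& applicationVersion(std::uint32_t v) { m_info.applicationVersion = v; return *this; }

    // Getters share the field name with the setters.
    const char*   pApplicationName() const { return m_info.pApplicationName; }
    std::uint32_t applicationVersion() const { return m_info.applicationVersion; }
    int           sType() const { return m_info.sType; }

private:
    DemoCAppInfo m_info;
};
```

A call site then reads `DemoAppInfo().pApplicationName("demo").applicationVersion(3)`; fields that are not named keep their default value, which is exactly the "no guarantee to set all fields" trade-off noted above.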
Convert C-style OO to C++-style OO

    VkCommandBuffer cmd = ...
    VkRect2D scissor;
    scissor.offset.x = 0;
    scissor.offset.y = 0;
    scissor.extent.width = width;
    scissor.extent.height = height;
    vkCmdSetScissor(cmd, 0, 1, &scissor);

    class CommandBuffer {
    public:
        // conversion from/to native handle
        CommandBuffer(VkCommandBuffer commandBuffer);
        CommandBuffer& operator=(VkCommandBuffer commandBuffer);

        // boolean tests if handle is valid
        explicit operator bool() const;
        bool operator!() const;

        // functions
        void setScissor(uint32_t firstScissor, uint32_t scissorCount, const Rect2D* pScissors) const;
    };

    vk::CommandBuffer cmd;
    cmd.setScissor(0, 1, &scissor);
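As a rough illustration of this conversion, the sketch below wraps a made-up C-style handle in a small class. `DemoCounter_T`, `DemoCounterHandle`, `demoCounterAdd` and `DemoCounter` are hypothetical names chosen so the example is self-contained; the pattern (store the native handle, forward member calls to the free function) is what matters.

```cpp
// Hypothetical C-style API: an opaque handle plus a free function (assumption).
struct DemoCounter_T { int value; };
typedef DemoCounter_T* DemoCounterHandle;
inline void demoCounterAdd(DemoCounterHandle h, int amount) { h->value += amount; }

// C++-style wrapper: same size as the handle, member functions forward to the C API.
class DemoCounter {
public:
    DemoCounter() : m_handle(nullptr) {}

    // conversion from/to native handle
    DemoCounter(DemoCounterHandle h) : m_handle(h) {}
    DemoCounter& operator=(DemoCounterHandle h) { m_handle = h; return *this; }
    explicit operator DemoCounterHandle() const { return m_handle; }

    // boolean tests if handle is valid
    explicit operator bool() const { return m_handle != nullptr; }
    bool operator!() const { return m_handle == nullptr; }

    // the handle becomes the implicit 'this' argument
    void add(int amount) const { demoCounterAdd(m_handle, amount); }

private:
    DemoCounterHandle m_handle;
};
```

The wrapper adds no state beyond the handle itself, so it can be passed by value as cheaply as the native handle.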
    class Device {
    public:
        Result createFence(const FenceCreateInfo * createInfo,
                           AllocationCallbacks const * allocator,
                           Fence * fence) const;
    };

Change from pointer to reference allows passing temporaries

    class Device {
    public:
        Fence createFence(const FenceCreateInfo & createInfo,
                          Optional<AllocationCallbacks const> const & allocator) const;
    };
    class Device {
    public:
        Result createFence(const FenceCreateInfo * createInfo,
                           AllocationCallbacks const * allocator,
                           Fence * fence) const;
    };

- AllocationCallbacks are optional and might be null
- Optional<AllocationCallbacks> accepts nullptr as input

    class Device {
    public:
        Fence createFence(const FenceCreateInfo & createInfo,
                          Optional<AllocationCallbacks const> const & allocator) const;
    };
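One way such an optional-reference parameter could work is sketched below. `DemoOptional`, `DemoAllocator` and `allocatorId` are hypothetical names for illustration; the sketch only assumes what the slide states, namely that the parameter accepts either an object or nullptr.

```cpp
#include <cstddef>

// Hypothetical optional-reference wrapper: built from a reference or from nullptr.
template <typename RefType>
class DemoOptional {
public:
    DemoOptional(RefType& reference) : m_ptr(&reference) {}
    DemoOptional(std::nullptr_t) : m_ptr(nullptr) {}

    operator RefType*() const { return m_ptr; }              // hand the pointer to the C API
    explicit operator bool() const { return m_ptr != nullptr; }

private:
    RefType* m_ptr;
};

struct DemoAllocator { int id; };

// Returns the allocator id, or -1 when the caller passed nullptr.
inline int allocatorId(DemoOptional<DemoAllocator const> const& allocator) {
    DemoAllocator const* p = allocator;
    return p ? p->id : -1;
}
```

Callers can write `allocatorId(myAllocator)` or `allocatorId(nullptr)`; a plain reference parameter would allow only the first form, and a plain pointer would force `&myAllocator` at every call site.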
    class Device {
    public:
        Result createFence(const FenceCreateInfo * createInfo,
                           const AllocationCallbacks * allocator,
                           Fence * fence) const;
    };

- Fence is now a return value
- The C++ version of the function throws a std::system_error if the result is not a success code

    class Device {
    public:
        Fence createFence(const FenceCreateInfo & createInfo,
                          Optional<AllocationCallbacks const> const & allocator) const;
    };
    vk::Fence fence;
    vk::FenceCreateInfo ci;
    vk::Result result = device.createFence(&ci, nullptr, &fence);
    assert(result == vk::Result::eSuccess);

    try {
        vk::Fence fence = device.createFence({}, nullptr);
    } catch (std::system_error const & e) {...}
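The mapping from result codes to exceptions can be sketched with the standard `std::error_category` mechanism. Everything named `Demo...`/`demo...` below is a hypothetical stand-in (an `int` plays the role of the fence handle), and the category shown is an assumption about how such a wrapper might be built, not the VKCPP source.

```cpp
#include <string>
#include <system_error>

enum class DemoResult { eSuccess = 0, eErrorOutOfHostMemory = -1 };

// Error category that turns result codes into readable messages.
class DemoErrorCategory : public std::error_category {
public:
    const char* name() const noexcept override { return "demo::Result"; }
    std::string message(int ev) const override {
        switch (static_cast<DemoResult>(ev)) {
            case DemoResult::eSuccess:              return "Success";
            case DemoResult::eErrorOutOfHostMemory: return "ErrorOutOfHostMemory";
        }
        return "Unknown";
    }
};

inline const std::error_category& demoCategory() {
    static DemoErrorCategory instance;
    return instance;
}

// Hypothetical C-style entry point: reports failure through its return value.
inline DemoResult demoCreateFenceRaw(bool fail, int* pFence) {
    if (fail) return DemoResult::eErrorOutOfHostMemory;
    *pFence = 1;
    return DemoResult::eSuccess;
}

// Wrapper in the style described above: handle returned by value, failure thrown.
inline int demoCreateFence(bool fail) {
    int fence = 0;
    DemoResult result = demoCreateFenceRaw(fail, &fence);
    if (result != DemoResult::eSuccess) {
        throw std::system_error(static_cast<int>(result), demoCategory(), "demoCreateFence");
    }
    return fence;
}
```

Catching by `const&` and inspecting `e.code().value()` recovers the original result code when a caller needs it.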
    vk::Fence fence;
    vk::FenceCreateInfo ci;
    vk::Result result = device.createFence(&ci, nullptr, &fence);
    assert(result == vk::Result::eSuccess);

    vk::Fence fence = device.createFence({}, nullptr);
    vk::CommandBuffer cmd = ...
    VkRect2D scissor;
    scissor.offset.x = 0;
    scissor.offset.y = 0;
    scissor.extent.width = width;
    scissor.extent.height = height;
    cmd.setScissor(0, 1, &scissor);

    class CommandBuffer {
    public:
        // additional functions for arrays
        void setScissor(uint32_t firstScissor, std::vector<Rect2D> const & scissors) const;
    };

    vk::CommandBuffer cmd = ...
    cmd.setScissor(0, { { {0u,0u}, {width, height} } });

- Nested braces in the call: {0u,0u} is an Offset2D, {width, height} is an Extent2D, the braces around both form a Rect2D, and the outermost braces form the std::vector<Rect2D>
- Count and real data size are always correct
- Lifetime of the temporary is the function call
    vk::PhysicalDevice pd = ...;
    std::vector<vk::QueueFamilyProperties> properties;            // Define variables
    uint32_t queueCount;
    pd.getQueueFamilyProperties(&queueCount, nullptr);            // Query size
    properties.resize(queueCount);                                // Resize std::vector
    pd.getQueueFamilyProperties(&queueCount, properties.data());  // Query data

    vk::PhysicalDevice pd = ...;
    std::vector<vk::QueueFamilyProperties> properties = pd.getQueueFamilyProperties();
    std::vector<vk::ExtensionProperties> properties = vk::enumerateInstanceExtensionProperties(layer);

    vk::Result res;
    std::vector<vk::ExtensionProperties> properties;
    uint32_t count;
    char *layer = ...;
    do {
        res = vk::enumerateInstanceExtensionProperties(layer, &count, nullptr);
        if (res != vk::Result::eSuccess) throw error;
        properties.resize(count);
        res = vk::enumerateInstanceExtensionProperties(layer, &count, properties.data());
    } while (res == vk::Result::eIncomplete);
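The query-size, resize, query-data loop can be wrapped once and reused. The sketch below shows the pattern against a fake enumeration function; `EnumResult`, `demoEnumerate` and `demoEnumerateAll` are hypothetical names, and the fake function only imitates the two-call convention so that the example runs without a Vulkan driver.

```cpp
#include <cstdint>
#include <vector>

enum class EnumResult { eSuccess, eIncomplete };

// Hypothetical C-style enumeration following the two-call convention:
// with pItems == nullptr it only reports the count; otherwise it writes up to
// *pCount items and returns eIncomplete if more were available.
inline EnumResult demoEnumerate(std::uint32_t* pCount, int* pItems) {
    static const int available[] = {10, 20, 30};
    const std::uint32_t total = 3;
    if (pItems == nullptr) {
        *pCount = total;
        return EnumResult::eSuccess;
    }
    const std::uint32_t n = (*pCount < total) ? *pCount : total;
    for (std::uint32_t i = 0; i < n; ++i) pItems[i] = available[i];
    *pCount = n;
    return (n < total) ? EnumResult::eIncomplete : EnumResult::eSuccess;
}

// Wrapper that hides the loop and returns the data by value.
inline std::vector<int> demoEnumerateAll() {
    std::vector<int> items;
    std::uint32_t count = 0;
    EnumResult result = EnumResult::eSuccess;
    do {
        demoEnumerate(&count, nullptr);                 // query size (cannot fail in this demo)
        items.resize(count);                            // resize std::vector
        result = demoEnumerate(&count, items.data());   // query data
    } while (result == EnumResult::eIncomplete);        // retry if the set grew in between
    items.resize(count);                                // trim if fewer items were written
    return items;
}
```

A real wrapper would additionally check the result of the size query and throw on error codes, as in the loop shown above.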
    class CommandBuffer {
    public:
        void updateBuffer(Buffer dstBuffer, DeviceSize dstOffset, DeviceSize dataSize, const uint32_t* pData) const;
    };

    cmd.updateBuffer(buffer, 0,
                     static_cast<DeviceSize>(data.size() * sizeof(*data.data())),
                     reinterpret_cast<const uint32_t*>(data.data()));

- dataSize is in bytes
- The data type is uint32_t, so dataSize must be a multiple of 4
- The call might or might not work if sizeof(*data.data()) is not a multiple of 4
- The call is not that beautiful
    class CommandBuffer {
    public:
        template <typename T>
        void updateBuffer(Buffer dstBuffer, DeviceSize dstOffset, std::vector<T> const & data) const;
    };

    cmd.updateBuffer(buffer, 0, data);

- static_assert ensures that sizeof(T) is a multiple of 4
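A sketch of how the compile-time check might look is given below. `demoUpdateBufferRaw`, `demoUpdateBuffer` and `DemoVertex` are hypothetical names; the raw function merely reports how many 32-bit words it was handed, so the example runs without a device.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical raw call: size in bytes, data as 32-bit words.
// Returns the number of words it would copy.
inline std::size_t demoUpdateBufferRaw(std::size_t dataSize, const std::uint32_t* pData) {
    (void)pData;  // a real implementation would record a copy command here
    return dataSize / sizeof(std::uint32_t);
}

// Typed overload: size and pointer are derived from the vector, and element
// types whose size is not a multiple of 4 are rejected at compile time.
template <typename T>
std::size_t demoUpdateBuffer(std::vector<T> const& data) {
    static_assert(sizeof(T) % sizeof(std::uint32_t) == 0,
                  "sizeof(T) must be a multiple of 4");
    return demoUpdateBufferRaw(data.size() * sizeof(T),
                               reinterpret_cast<const std::uint32_t*>(data.data()));
}

struct DemoVertex { std::uint32_t a, b, c; };  // three 32-bit words, accepted
```

Instantiating the template with, for example, `std::vector<char>` fails to compile with the message from the `static_assert`, which moves the "might or might not work" case from run time to build time.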
For debugging purposes it's useful to convert enums or flags to strings
- std::string to_string(FooEnum value) for enums
    to_string(vk::Result::eSuccess)
- std::string to_string(BarFlags value) for flags
    to_string(vk::QueueFlagBits::eGraphics | vk::QueueFlagBits::eCompute)
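For illustration, such conversion functions might look like the sketch below. `DemoBits` and `flags_to_string` are hypothetical, the flags are passed as a raw mask to keep the example short, and the output format (names joined by " | ", "{}" for no bits) is an assumption rather than the format VKCPP produces.

```cpp
#include <cstdint>
#include <string>

enum class DemoBits : std::uint32_t { eGraphics = 0x1, eCompute = 0x2, eTransfer = 0x4 };

// Enum to string: one case per enumerator.
inline std::string to_string(DemoBits value) {
    switch (value) {
        case DemoBits::eGraphics: return "Graphics";
        case DemoBits::eCompute:  return "Compute";
        case DemoBits::eTransfer: return "Transfer";
    }
    return "invalid";
}

// Flags to string: test each known bit and join the names of the set ones.
inline std::string flags_to_string(std::uint32_t mask) {
    if (mask == 0) return "{}";
    std::string result;
    const DemoBits all[] = {DemoBits::eGraphics, DemoBits::eCompute, DemoBits::eTransfer};
    for (DemoBits bit : all) {
        if (mask & static_cast<std::uint32_t>(bit)) {
            if (!result.empty()) result += " | ";
            result += to_string(bit);
        }
    }
    return result;
}
```

Since the enums are generated from vk.xml, functions of this shape can be generated alongside them and never go out of sync with the enumerators.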
- VKCPP is 'raw Vulkan' using C++11 features
- Increases type safety and construction safety at compile time
- Also provides syntactic sugar
    - Classes for handles
    - Temporaries for parameters
    - std::vector for arrays
    - Utility functions (to_string)
- Good for experts, doesn't simplify the Vulkan workflow
- Vulkan starters just want to render a triangle
- NVK will provide utility functions for routine work
    - Resource tracking on CPU and GPU; lifetime can be hard to handle
    - Memory suballocation; everyone will have to do it
    - Utility functions, e.g. framebuffer setup
- The project is at an early stage; Hello Vulkan is already ported ;-)
[Diagram: object hierarchy with Instance, PhysicalDevice, Device, Queue and Resources (DeviceMemory, …, Image); each object holds a std::shared_ptr to its parent. Labels: Destroy Device, Destroy Image, vkDestroyImage(m_device, …);]
[Diagram: the same hierarchy (Instance, PhysicalDevice, Device, Queue, Resources: DeviceMemory, …, Image), with links marked std::weak_ptr and std::shared_ptr]
- PhysicalDevice and Queue exist, no need for destruction
- Reference to parent required
- weak_ptr allows identity check
Implemented RAII layer with shared_ptrs
- Lifetime tracking
- Exception safety
- CreateInfos replaced by create functions with parameters

    std::shared_ptr<nvk::Instance> instance = nvk::Instance::create("nvk", 0, ...);
- It's strongly recommended to do suballocations in Vulkan
- Make nvk::DeviceMemory suballocation aware

    class DeviceMemory {
        vk::DeviceMemory deviceMemory;
        vk::DeviceSize
        ...
    };

- Made the API that uses DeviceMemory suballocation aware:
    nvk::bindImageMemory(deviceMemory, image, offset)
  is forwarded as
    vk::bindImageMemory(deviceMemory, image, offset + deviceMemory.offset)
Introduce Heap interface

    class Heap {
    public:
        virtual std::shared_ptr<nvk::DeviceMemory> allocate(vk::MemoryRequirements const & requirements) = 0;
    };

- The interface allows multiple strategies for small/large or fixed size allocations
There's no free in the interface

    class Heap {
    public:
        virtual std::shared_ptr<nvk::DeviceMemory> allocate(vk::MemoryRequirements const & requirements) = 0;
    };

- The Heap implementation will return a class derived from nvk::DeviceMemory
- The derived class keeps a reference to the Heap
- Its destructor knows how to free the memory
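The "no free in the interface" design can be sketched as follows. `DemoMemory`, `DemoHeap` and `CountingHeap` are hypothetical stand-ins for nvk::DeviceMemory, the Heap interface and one possible implementation; the heap here only counts bytes instead of managing real device memory.

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// Stand-in for the memory object handed out to users.
class DemoMemory {
public:
    explicit DemoMemory(std::size_t size) : m_size(size) {}
    virtual ~DemoMemory() {}
    std::size_t size() const { return m_size; }
private:
    std::size_t m_size;
};

// Heap interface: allocate only, no free.
class DemoHeap {
public:
    virtual ~DemoHeap() {}
    virtual std::shared_ptr<DemoMemory> allocate(std::size_t size) = 0;
};

// A trivial heap that only counts the bytes currently in use.
class CountingHeap : public DemoHeap,
                     public std::enable_shared_from_this<CountingHeap> {
public:
    std::shared_ptr<DemoMemory> allocate(std::size_t size) override {
        m_used += size;
        return std::make_shared<Block>(shared_from_this(), size);
    }
    std::size_t used() const { return m_used; }

private:
    // Derived memory class: keeps the heap alive and returns the bytes on destruction.
    class Block : public DemoMemory {
    public:
        Block(std::shared_ptr<CountingHeap> heap, std::size_t size)
            : DemoMemory(size), m_heap(std::move(heap)) {}
        ~Block() override { m_heap->m_used -= size(); }
    private:
        std::shared_ptr<CountingHeap> m_heap;
    };

    std::size_t m_used = 0;
};
```

Releasing the last `std::shared_ptr<DemoMemory>` is the free operation, so callers cannot free a block twice or return it to the wrong heap. Note that this sketch requires the heap itself to be owned by a `std::shared_ptr`, because it uses `shared_from_this()`.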
[Diagram: a CommandBuffer containing Bind Buffer, Bind Buffer, Draw, Bind Desc, …; followed by Queue Submit and GPU Process]
- The set of resources must be kept alive as long as it is in use by the CmdBuffer or the GPU
Introduce ResourceTracker interface

    class ResourceTracker {
    public:
        // track resources, increase refcount
        void track(std::shared_ptr<vk::Buffer> const & buffer);

        // give a Fence to the tracker, which can be checked to see if resources are still being used
        void addFence(std::shared_ptr<vk::Fence> const & fence);

        // check if any tracked resource is used on the GPU
        bool isUsed() const;

        // decrease refcount of resources if not used anymore
        void releaseUnused();
    };
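A minimal sketch of a tracker with this interface is shown below. `DemoFence`, `DemoBuffer` and `DemoResourceTracker` are hypothetical; in particular, the fence is a plain flag that the test sets by hand, and treating "all fences signaled" as "no longer used" is an assumption about the intended semantics.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-ins: a fence the "GPU" signals, and a tracked resource.
struct DemoFence  { bool signaled = false; };
struct DemoBuffer { int id = 0; };

class DemoResourceTracker {
public:
    // track a resource: holding the shared_ptr increases its refcount
    void track(std::shared_ptr<DemoBuffer> const& buffer) { m_resources.push_back(buffer); }

    // remember a fence that signals when the GPU is done with the tracked resources
    void addFence(std::shared_ptr<DemoFence> const& fence) { m_fences.push_back(fence); }

    // resources count as used while any registered fence is still unsignaled
    bool isUsed() const {
        for (auto const& fence : m_fences) {
            if (!fence->signaled) return true;
        }
        return false;
    }

    // drop the references (decreasing refcounts) once nothing is in flight
    void releaseUnused() {
        if (!isUsed()) {
            m_resources.clear();
            m_fences.clear();
        }
    }

private:
    std::vector<std::shared_ptr<DemoBuffer>> m_resources;
    std::vector<std::shared_ptr<DemoFence>>  m_fences;
};
```

This variant appends on every `track` call, which corresponds to the "large list, fast insertion" option; deduplicating on insertion would give the smaller list at a higher insertion cost.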
- Tracking all used resources might be costly
    - Putting each resource in the list once results in a small list, but slow insertion
    - Putting each resource in the list on each usage results in a large list, but fast insertion
- Use ResourceTracker as template interface for CommandBuffer
- Build up the list once and keep it over multiple frames
- In debug mode, check if all resources are tracked
- Should be nearly free in this mode
- VKCPP for the ninja provides a modern C++11 API for
    - Reduced code complexity
    - Improved compile time error checking
- NVK provides an RAII style that gives exception safety
    - Refcounting provides resource tracking on CPU
    - Resource tracker provides resource tracking on GPU
- Hello Vulkan in NVK needs only about 200 lines of code
Get it now: https://github.com/nvpro-pipeline/vkcpp