$I(x, x') = g(x, x')\left[\epsilon(x, x') + \int_S \rho(x, x', x'')\, I(x', x'')\, dx''\right]$
INFOMAGR – Advanced Graphics
Jacco Bikker - November 2018 - February 2019
Lecture 4 - Real-time Ray Tracing
Welcome!
Advanced Graphics – Real-time Ray Tracing
Cost Breakdown for Ray Tracing:
▪ Pixels
▪ Primitives
▪ Light sources
▪ Path segments
Mind scalability as well as constant cost.
Example: scene consisting of 1k spheres and 4 light sources, diffuse materials, rendered to 1M pixels (one primary ray plus four shadow rays per pixel):
$10^6 \times 5 \times 10^3 = 5 \cdot 10^9$ ray/prim intersections.
(multiply by desired framerate for realtime)
Using the BVH:
▪ Pixels
▪ N primitives → log N deep tree
▪ Light sources
▪ Path segments
Example: scene consisting of 1k spheres and 4 light sources, diffuse materials, rendered to 1M pixels ($\log_2 1000 \approx 10$):
$10^6 \times 5 \times 10 = 5 \cdot 10^7$ ray/(prim or node) intersections.
(multiply by desired framerate for realtime)
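The arithmetic of both examples can be reproduced with a few lines of C++. This is a sketch: the function names are made up for illustration, and the BVH estimate assumes every ray visits about $\lceil \log_2 N \rceil$ nodes, which is the idealized figure used above rather than a measured cost.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// One primary ray plus one shadow ray per light (diffuse materials: no further bounces).
std::uint64_t RaysPerPixel(std::uint64_t lights) { return 1 + lights; }

// Brute force: every ray is tested against every primitive.
std::uint64_t BruteForceTests(std::uint64_t pixels, std::uint64_t lights, std::uint64_t prims) {
    return pixels * RaysPerPixel(lights) * prims;
}

// With a BVH: roughly log2(N) node or primitive tests per ray (idealized estimate).
std::uint64_t BVHTests(std::uint64_t pixels, std::uint64_t lights, std::uint64_t prims) {
    const auto depth = static_cast<std::uint64_t>(std::ceil(std::log2(static_cast<double>(prims))));
    return pixels * RaysPerPixel(lights) * depth;
}
```

At 60 frames per second the brute-force figure grows to $3 \cdot 10^{11}$ tests per second, which is why the logarithmic factor of the BVH matters.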
Reality Check
Performance is now OK, but we're not quite ready to render a game world.
Cost of ray tracing: dominated by memory access cost.
▪ Ray data
▪ Node data
▪ Triangle data
▪ Material data (incl. textures)
Primary rays: for a tile of pixels, these are organized in a narrow frustum. All rays share a common origin.
Shadow rays: for point lights, shadow rays also tend to travel close together. When traced from the light source, they too have a common origin.
Secondary rays: reflected and refracted rays tend to diverge significantly.
Coherence
Primary rays and shadow rays for point lights are coherent:
▪ they tend to intersect the same primitives;
▪ they tend to traverse the same BVH nodes.
Our problem: ray tracing cost is dominated by memory latency.
Solution: amortize the cost of fetching data over multiple rays.
Coherent Ray Tracing*
SIMD: four rays for the price of one.
    void BVHNode::Traverse( Ray r )
    {
        if (!r.Intersects( bounds )) return;  // ray misses the node AABB
        if (isleaf()) IntersectPrimitives( r );
        else
        {
            pool[left].Traverse( r );      // the two children are adjacent in the pool
            pool[left + 1].Traverse( r );
        }
    }
*: Interactive Rendering with Coherent Ray Tracing, Wald et al., 2001
Coherent Ray Tracing*
    void BVHNode::Traverse( Ray4 r4 )
    {
        if (!r4.Intersects( bounds )) return;  // true if any of the four rays hits
        if (isleaf()) IntersectPrimitives( r4 );
        else
        {
            pool[left].Traverse( r4 );
            pool[left + 1].Traverse( r4 );
        }
    }
Coherent Ray Tracing
Ray packet traversal:
▪ intersect four rays with a single BVH node;
▪ if any ray in the packet intersects the node, we traverse it;
▪ if the node is a leaf node, we intersect the four rays with each primitive in the leaf.
Masking:
▪ we maintain an "active" mask for disabling rays that do not intersect a node.
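The packet test with an active mask can be sketched in C++ as follows. This is an illustration under assumptions: `Ray4`, `Mask4`, `Intersect4` and `Any` are hypothetical names, the rays are stored structure-of-arrays with precomputed reciprocal directions, and the four lanes are written as a plain loop instead of SSE intrinsics (a compiler may vectorize it, but that is not guaranteed).

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Four rays in structure-of-arrays layout: lane i of each array belongs to ray i.
struct Ray4 {
    std::array<float, 4> ox, oy, oz;     // origins
    std::array<float, 4> rdx, rdy, rdz;  // reciprocal directions (1 / D)
    std::array<float, 4> t;              // distance to the nearest hit found so far
};

struct AABB { float bmin[3], bmax[3]; };
using Mask4 = std::array<bool, 4>;

// Slab test of four rays against one node; inactive rays never report a hit.
Mask4 Intersect4(const Ray4& r, const AABB& b, const Mask4& active) {
    Mask4 hit{};
    for (int i = 0; i < 4; ++i) {
        const float o[3]  = { r.ox[i], r.oy[i], r.oz[i] };
        const float rd[3] = { r.rdx[i], r.rdy[i], r.rdz[i] };
        float tmin = -1e30f, tmax = 1e30f;
        for (int a = 0; a < 3; ++a) {
            const float t1 = (b.bmin[a] - o[a]) * rd[a], t2 = (b.bmax[a] - o[a]) * rd[a];
            tmin = std::max(tmin, std::min(t1, t2));
            tmax = std::min(tmax, std::max(t1, t2));
        }
        hit[i] = active[i] && tmax >= tmin && tmax > 0.0f && tmin < r.t[i];
    }
    return hit;
}

// The node is traversed if at least one active ray hits it.
bool Any(const Mask4& m) { return m[0] || m[1] || m[2] || m[3]; }
```

The returned mask then serves as the active mask for the node's children, so a ray that missed a node stays disabled for its whole subtree.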
Coherent Ray Tracing*
    void BVHNode::Traverse( Ray4 r4, bool4 mask4 )
    {
        bool4 hit4 = r4.Intersects( bounds ) & mask4;  // only active rays may hit
        if (none( hit4 )) return;                       // no active ray hits this node
        if (isleaf()) IntersectPrimitives( r4, hit4 );
        else
        {
            pool[left].Traverse( r4, hit4 );            // children inherit the narrowed mask
            pool[left + 1].Traverse( r4, hit4 );
        }
    }
Coherent Ray Tracing
Results:
▪ for coherent packets, memory traffic is reduced;
▪ overall performance is improved by ~2.3x.
Overhead:
▪ if only a single ray requires traversal or intersection, all four rays perform this operation.
Large Packets*
Cost of memory access can be amortized over more rays by using larger packets. Note that a naïve approach (testing every ray in the packet against every node) will lead to significant overhead.
We therefore add a frustum test to rapidly reject BVH nodes: if the packet frustum does not intersect the node AABB, we discard the node. The cost of this operation is independent of the number of rays in the packet.
Likewise, a node is traversed as soon as we find that a ray intersects it. This is also independent of packet size.
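A conservative frustum/AABB rejection test could look like this. The sketch assumes four side planes with outward-facing normals (a point p is outside plane i when $N_i \cdot p > d_i$); all names are illustrative. Per plane only one box corner needs checking: the one furthest against the normal.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };
float Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Plane with an outward-facing normal: points with Dot(N, p) > d lie outside the frustum.
struct Plane { Vec3 N; float d; };
struct Frustum { Plane planes[4]; };
struct AABB { Vec3 bmin, bmax; };

// Returns false only when the box lies completely outside one of the four side planes.
bool FrustumMayHitAABB(const Frustum& f, const AABB& b) {
    for (const Plane& p : f.planes) {
        // The box corner furthest against the normal, i.e. the 'most inside' corner.
        const Vec3 v{ p.N.x >= 0 ? b.bmin.x : b.bmax.x,
                      p.N.y >= 0 ? b.bmin.y : b.bmax.y,
                      p.N.z >= 0 ? b.bmin.z : b.bmax.z };
        if (Dot(p.N, v) > p.d) return false;  // even that corner is outside: reject
    }
    return true;  // four plane tests, independent of the number of rays in the packet
}
```

The test may keep a few boxes near the frustum edges that no ray actually hits; that only costs an extra node visit, never a missed intersection.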
*: Large Ray Packets for Real-time Whitted Ray Tracing, Overbeck et al., 2008
Large Packets
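Algorithm:
1. Test the first active ray against the node AABB. If it intersects, traverse the node.
2. Otherwise, test the packet frustum against the node AABB. If the frustum misses, skip the node.
3. Otherwise, test the remaining rays one by one until an intersecting ray is found. This step yields a new first active ray index.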
    void BVHNode::Traverse( RayPacket& rp, int first )
    {
        if (!Intersects( rp[first] ))                    // 1: first active ray misses
        {
            if (!Intersects( rp.frustum )) return;       // 2: whole packet misses
            first = FindFirstActive( rp, first + 1 );    // 3: next hitting ray, or rp.rayCount
        }
        if (first < rp.rayCount)
        {
            if (isleaf()) IntersectPrimitives( rp, first );
            else
            {
                left.Traverse( rp, first );
                right.Traverse( rp, first );
            }
        }
    }
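The control flow of the three steps can be sketched with the actual intersection tests abstracted away. This is not the paper's code: `rayHits`, `frustumHits` and `intersectLeaf` are hypothetical callbacks, and leaves are marked with `left == -1`.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical node layout: leaves have left == -1.
struct Node { int left, right; };

// Tests supplied by the caller (assumptions of this sketch, not a real API).
struct PacketTests {
    std::function<bool(int node, int ray)> rayHits;         // single ray vs. node AABB
    std::function<bool(int node)> frustumHits;               // packet frustum vs. node AABB
    std::function<void(int node, int first)> intersectLeaf;  // rays [first, count) vs. leaf
};

// Step 3: index of the first ray in [first, count) that hits the node, or count if none does.
int FindFirstActive(int node, int first, int count, const PacketTests& t) {
    while (first < count && !t.rayHits(node, first)) ++first;
    return first;
}

void Traverse(const std::vector<Node>& bvh, int node, int first, int count, const PacketTests& t) {
    assert(first < count);
    if (!t.rayHits(node, first)) {                          // 1: first active ray misses
        if (!t.frustumHits(node)) return;                   // 2: the whole packet misses
        first = FindFirstActive(node, first + 1, count, t); // 3: search for a new first ray
        if (first == count) return;                         // no ray hits after all
    }
    const Node& n = bvh[node];
    if (n.left < 0) { t.intersectLeaf(node, first); return; }
    Traverse(bvh, n.left, first, count, t);
    Traverse(bvh, n.right, first, count, t);
}
```

Only the rays from `first` onward are handed to a subtree; rays before it are known to miss and are skipped without any test.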
Large Packets
Details:
▪ constructing the frustum
▪ ray order & overhead
▪ first / last
▪ optimizations: recursion, SIMD
Frustum Construction
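Method 1, for primary rays: planes are easily defined using the corner rays:
$N_1 = (P_0 - F) \times (P_1 - P_0)$, $d_1 = N_1 \cdot F$
$N_2 = (P_1 - F) \times (P_2 - P_1)$, $d_2 = N_2 \cdot F$
$N_3 = (P_2 - F) \times (P_3 - P_2)$, $d_3 = N_3 \cdot F$
$N_4 = (P_3 - F) \times (P_0 - P_3)$, $d_4 = N_4 \cdot F$
Here $F$ is the common origin of the rays (the apex of the frustum) and $P_0 \ldots P_3$ are points on the four corner rays.
Note: for secondary rays, we will not have a common origin, nor corner rays.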
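The plane equations translate directly into code. A sketch with illustrative names; whether the normals point out of or into the frustum depends on the corner order. For the corners $(-1,1,1)$, $(1,1,1)$, $(1,-1,1)$, $(-1,-1,1)$ seen from $F$ at the origin they point outward, and reversing the order flips all four.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };
Vec3 Sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
Vec3 Cross(Vec3 a, Vec3 b) { return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x }; }
float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Plane { Vec3 N; float d; };

// N_i = (P_i - F) x (P_{i+1} - P_i), d_i = N_i . F, with corner indices taken modulo 4.
void BuildFrustum(Vec3 F, const Vec3 P[4], Plane planes[4]) {
    for (int i = 0; i < 4; ++i) {
        const Vec3 N = Cross(Sub(P[i], F), Sub(P[(i + 1) % 4], P[i]));
        planes[i] = { N, Dot(N, F) };
    }
}

// With outward normals, q is inside when N_i . q <= d_i holds for all four planes.
bool Inside(const Plane planes[4], Vec3 q) {
    for (int i = 0; i < 4; ++i)
        if (Dot(planes[i].N, q) > planes[i].d) return false;
    return true;
}
```

Because every plane passes through $F$, $d_i = N_i \cdot F$ needs no normalization of $N_i$; the sign test alone decides the side.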
Frustum Construction
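Method 2, for shadow rays:
$E_t = \sum_{i=0}^{N} E_i$; choose the axis $\hat{u}$ along which $E_t$ has its largest component; $\hat{v}$ and $\hat{w}$ are the other axes.
Project the rays onto the plane at distance 1 along $\hat{u}$ and determine the extents; the frustum corners are then $(v_{min}, w_{min})$, $(v_{max}, w_{min})$, $(v_{max}, w_{max})$, $(v_{min}, w_{max})$.
Note: this still requires a common origin.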
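A sketch of method 2 under stated assumptions: all rays share one origin and every direction points to the same side along the dominant axis as the summed direction. The names are illustrative; each direction is divided by the magnitude of its dominant component, which places it on the plane at distance 1 along that axis.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float c[3]; };  // components indexed by axis (0 = x, 1 = y, 2 = z)

// Dominant axis u, the other axes v and w, the side of the projection plane along u
// (+1 or -1), and the extents of the projected directions on that plane.
struct ShadowFrustum { int u, v, w; float side, vmin, vmax, wmin, wmax; };

ShadowFrustum BuildShadowFrustum(const std::vector<Vec3>& dirs) {
    Vec3 sum{{0.0f, 0.0f, 0.0f}};
    for (const Vec3& d : dirs)
        for (int a = 0; a < 3; ++a) sum.c[a] += d.c[a];
    int u = 0;  // axis with the largest absolute component of the summed direction
    for (int a = 1; a < 3; ++a)
        if (std::fabs(sum.c[a]) > std::fabs(sum.c[u])) u = a;
    ShadowFrustum f{ u, (u + 1) % 3, (u + 2) % 3, sum.c[u] < 0 ? -1.0f : 1.0f,
                     1e30f, -1e30f, 1e30f, -1e30f };
    for (const Vec3& d : dirs) {
        assert(d.c[u] * f.side > 0.0f);  // assumption: all rays point to the same side
        const float pv = d.c[f.v] / std::fabs(d.c[u]), pw = d.c[f.w] / std::fabs(d.c[u]);
        f.vmin = std::min(f.vmin, pv); f.vmax = std::max(f.vmax, pv);
        f.wmin = std::min(f.wmin, pw); f.wmax = std::max(f.wmax, pw);
    }
    return f;
}
```

The four frustum edges then run from the common origin along $side \cdot \hat{u} + v\,\hat{v} + w\,\hat{w}$ for the four $(v, w)$ corner combinations.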
Frustum Construction
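Method 3, for generic rays:
▪ define plane $P_{far}$, orthogonal to $\hat{u}$, at location $u_{far}$, which is …;
▪ define plane $P_{near}$, orthogonal to $\hat{u}$, at location $u_{near}$, which is obtained from the AABB over the ray origins;
▪ calculate the intersections $(v^{far}, w^{far})$ and $(v^{near}, w^{near})$ of the rays with $P_{far}$ and $P_{near}$;
▪ determine $v_{min}^{near}, v_{max}^{near}, w_{min}^{near}, w_{max}^{near}$ and $v_{min}^{far}, v_{max}^{far}, w_{min}^{far}, w_{max}^{far}$.
The four frustum edges connect corresponding corners on the two planes:
$(v_{min}^{far}, w_{min}^{far}) \to (v_{min}^{near}, w_{min}^{near})$,
$(v_{max}^{far}, w_{min}^{far}) \to (v_{max}^{near}, w_{min}^{near})$,
$(v_{max}^{far}, w_{max}^{far}) \to (v_{max}^{near}, w_{max}^{near})$,
$(v_{min}^{far}, w_{max}^{far}) \to (v_{min}^{near}, w_{max}^{near})$.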
Ray Order
The order of the rays in a packet is important. We keep track of the first active ray (the green dot in the figure). We thus enter the node with 61 rays, while only 12 rays actually intersect the node. Keeping track of the last active ray as well helps somewhat.
Overhead can be reduced by numbering rays in each quadrant sequentially.
For the general case, Morton order is optimal.
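Morton (Z-order) numbering can be computed by interleaving the bits of the pixel coordinates within the tile. A sketch with illustrative names, assuming tile coordinates below 65536: x supplies the even bits of the index, y the odd bits.

```cpp
#include <cassert>
#include <cstdint>

// Spreads the lower 16 bits of v apart so that bit i moves to bit 2i.
std::uint32_t Part1By1(std::uint32_t v) {
    v &= 0x0000ffffu;
    v = (v | (v << 8)) & 0x00ff00ffu;
    v = (v | (v << 4)) & 0x0f0f0f0fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Morton index of the ray for pixel (x, y) inside a packet tile.
std::uint32_t MortonIndex(std::uint32_t x, std::uint32_t y) {
    return Part1By1(x) | (Part1By1(y) << 1);
}
```

In an 8×8 packet this numbers each 4×4 quadrant sequentially (0 to 15, 16 to 31, 32 to 47, 48 to 63), and the same holds recursively for every aligned 2×2 block, so rays that are close on screen stay close in the packet.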
Divergent Rays: Partition Traversal
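    int PartitionRays( int activeCount )
    {
        int newCount = 0;
        for (int i = 0; i < activeCount; i++)
            if (ray[idx[i]].IntersectsAABB()) swap( idx[newCount++], idx[i] );
        return newCount;  // idx[0 .. newCount) now holds the rays that hit the node
    }
[Figure: 15 rays and their list of ray indices, initially 1 to 15. Each time a ray intersects the node, its index is swapped to the front of the list; after ray 2 is found, the list reads 2 1 3 4 5 ... 15.]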
Partition traversal gathers active rays in a contiguous list. This comes at the price of some overhead, including the swapping of indices. In practice, this method is suitable for ray distributions where large gaps in the ray set are to be expected.
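The partition step can be written as a small standalone function. A sketch: the intersection test is passed in as a callable (an assumption that replaces `ray[idx[i]].IntersectsAABB()`), and the names are illustrative.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Moves the indices of all rays that hit the node to the front of idx.
// Only the first activeCount entries (the rays active in the parent) are examined.
template <class HitTest>
int PartitionRays(std::vector<int>& idx, int activeCount, HitTest hits) {
    int newCount = 0;
    for (int i = 0; i < activeCount; ++i)
        if (hits(idx[i])) std::swap(idx[newCount++], idx[i]);
    return newCount;  // idx[0, newCount) are the rays active for this node
}
```

Because swaps stay within the first `activeCount` entries, the parent's active rays still occupy `idx[0, activeCount)` afterwards, only reordered, so the sibling node can be partitioned from the same prefix.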
Optimization: Recursion
The recursion can be replaced by a local stack:
    struct Stack { BVHNode* node; int first; };
    Stack stack[STACKSIZE];
    stack[0].node = GetBVHRoot();
    stack[0].first = 0;
    int stackPtr = 1;
    while (stackPtr > 0)
    {
        BVHNode* node = stack[--stackPtr].node;  // pop the top entry
        int first = stack[stackPtr].first;
        ...
    }
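The loop can be fleshed out into a complete iterative ranged traversal. In this sketch the tests are hypothetical callbacks, leaves are marked with `left == -1`, and a `std::vector` replaces the fixed-size array, so no STACKSIZE has to be chosen.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical node layout: leaves have left == -1.
struct Node { int left, right; };

// Each stack entry stores a node and the first active ray index valid when it was pushed.
void TraverseIterative(const std::vector<Node>& bvh, int rayCount,
                       const std::function<bool(int, int)>& rayHits,       // (node, ray)
                       const std::function<bool(int)>& frustumHits,        // (node)
                       const std::function<void(int, int)>& intersectLeaf) // (node, first)
{
    assert(rayCount > 0);
    struct StackEntry { int node; int first; };
    std::vector<StackEntry> stack{ { 0, 0 } };  // start at the root with ray 0
    while (!stack.empty()) {
        const StackEntry e = stack.back();
        stack.pop_back();
        int first = e.first;
        if (!rayHits(e.node, first)) {                                           // 1
            if (!frustumHits(e.node)) continue;                                  // 2
            do { ++first; } while (first < rayCount && !rayHits(e.node, first)); // 3
            if (first == rayCount) continue;
        }
        const Node& n = bvh[e.node];
        if (n.left < 0) { intersectLeaf(e.node, first); continue; }
        stack.push_back({ n.right, first });  // pushed first, so the left child is visited first
        stack.push_back({ n.left, first });
    }
}
```

Storing `first` in each entry is what keeps the recursion's semantics: a sibling must start from the first ray that was active for the parent, not from the value reached inside the other subtree.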
Optimization: SIMD
We can still use SIMD to test four rays at once:
▪ the smallest unit thus becomes the "QuadRay";
▪ a packet of N rays consists of N/4 QuadRays;
▪ "first" points to the first QuadRay that has at least one active ray;
▪ FindFirst processes the remaining rays four at a time.
Note: for AVX, replace "four" by "eight".
Results
Compared to 2x2 SIMD packet traversal, ranged traversal improves primary and shadow rays by ~3.5x. Note that ray divergence has a large impact on performance:

    1    16.85 (100%)    25.11 (100%)    18.44 (100%)
    2    11.61 (69%)     18.83 (75%)     12.93 (70%)
    3     6.98 (41%)     12.56 (50%)      7.48 (41%)
    4     3.85 (23%)      7.71 (31%)      3.87 (21%)
Ray Tracing Animated Scenes
Covered so far:
▪ static geometry: high-quality construction with spatial splits;
▪ deformations: BVH refitting (water, waving trees, …);
▪ structural changes: binned BVH construction.
Not covered:
▪ rigid motion.
Combining BVHs
Two BVHs can be combined into a single BVH by simply adding a new root node pointing to the two BVHs.
▪ This works regardless of the method used to build each BVH.
▪ This can be applied repeatedly to combine many BVHs.
Scene Graph
[Figure: example scene graph with a world root node and nodes for cars, wheels, turrets, planes, a buggy and dudes.]
If our application uses a scene graph, we can construct a BVH for each scene graph node. The BVH for each node is built using an appropriate construction algorithm:
▪ high-quality SBVH for static scenery (offline);
▪ fast binned SAH BVHs for dynamic scenery.
The extra nodes used to combine these BVHs into a single BVH are known as the Top-level BVH.
Rigid Motion
Applying rigid motion to a BVH:
Rigid motion is achieved by transforming the rays by the inverse transform upon entering the sub-BVH. This works for any rigid transform, not only for translation.
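Transforming a ray into the space of a sub-BVH can be sketched as follows, assuming a rigid transform stored as a 3×3 rotation plus a translation (all names are illustrative). The inverse of a rotation is its transpose, so the inverse transform is cheap to compute and can be stored with the sub-BVH root.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Rigid transform p' = R p + t, with R a 3x3 rotation matrix (row-major) and no scaling.
struct Rigid { float R[3][3]; Vec3 t; };

Vec3 TransformVector(const Rigid& m, Vec3 v) {  // rotation only: directions ignore translation
    return { m.R[0][0] * v.x + m.R[0][1] * v.y + m.R[0][2] * v.z,
             m.R[1][0] * v.x + m.R[1][1] * v.y + m.R[1][2] * v.z,
             m.R[2][0] * v.x + m.R[2][1] * v.y + m.R[2][2] * v.z };
}

Vec3 TransformPoint(const Rigid& m, Vec3 p) {  // rotation followed by translation
    const Vec3 r = TransformVector(m, p);
    return { r.x + m.t.x, r.y + m.t.y, r.z + m.t.z };
}

// For a rotation, R^-1 = R^T; the translation becomes -R^T t.
Rigid Inverse(const Rigid& m) {
    Rigid inv{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) inv.R[i][j] = m.R[j][i];
    const Vec3 rt = TransformVector(inv, m.t);
    inv.t = { -rt.x, -rt.y, -rt.z };
    return inv;
}

struct Ray { Vec3 O, D; float t; };

// Moves a world-space ray into the local space of a sub-BVH using the stored inverse.
Ray ToObjectSpace(const Ray& r, const Rigid& inverse) {
    return { TransformPoint(inverse, r.O), TransformVector(inverse, r.D), r.t };
}
```

A rigid transform preserves lengths, so a hit distance found in object space is valid for the world-space ray as well; copying `t` back has the same effect as transforming the ray back.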
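The Top-level BVH - Construction
Input: list of axis aligned bounding boxes for transformed scene graph nodes.
Algorithm:
1. Find the two elements in the list for which the AABB enclosing both has the smallest surface area.
2. Create a parent node with these two elements as its children.
3. Replace the two elements in the list by the parent node.
4. Repeat until a single element (the root) remains.
Note: algorithmic complexity is $O(N^3)$.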
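The brute-force construction could be implemented like this. A sketch with illustrative names: each entry in `list` is a node that has no parent yet, and every merge scans all pairs, so the N − 1 merges cost $O(N^3)$ in total.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct AABB { float bmin[3], bmax[3]; };

AABB Union(const AABB& a, const AABB& b) {
    AABB r;
    for (int i = 0; i < 3; ++i) {
        r.bmin[i] = std::min(a.bmin[i], b.bmin[i]);
        r.bmax[i] = std::max(a.bmax[i], b.bmax[i]);
    }
    return r;
}

float SurfaceArea(const AABB& b) {
    const float dx = b.bmax[0] - b.bmin[0], dy = b.bmax[1] - b.bmin[1], dz = b.bmax[2] - b.bmin[2];
    return 2.0f * (dx * dy + dy * dz + dz * dx);
}

// Leaves (sub-BVH instances) have left == right == -1; the root is the last node added.
struct TLASNode { AABB bounds; int left, right; };

std::vector<TLASNode> BuildTLASNaive(const std::vector<AABB>& instances) {
    std::vector<TLASNode> nodes;
    std::vector<int> list;  // nodes without a parent
    for (const AABB& b : instances) {
        list.push_back(static_cast<int>(nodes.size()));
        nodes.push_back({ b, -1, -1 });
    }
    while (list.size() > 1) {
        std::size_t bestA = 0, bestB = 1;
        float bestArea = 1e30f;
        for (std::size_t a = 0; a < list.size(); ++a)  // O(N^2) pairs per merge
            for (std::size_t b = a + 1; b < list.size(); ++b) {
                const float area = SurfaceArea(Union(nodes[list[a]].bounds, nodes[list[b]].bounds));
                if (area < bestArea) { bestArea = area; bestA = a; bestB = b; }
            }
        const TLASNode parent{ Union(nodes[list[bestA]].bounds, nodes[list[bestB]].bounds),
                               list[bestA], list[bestB] };
        nodes.push_back(parent);
        list[bestA] = static_cast<int>(nodes.size()) - 1;  // parent replaces A
        list.erase(list.begin() + bestB);                 // bestB > bestA, so bestA stays valid
    }
    return nodes;
}
```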
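The Top-level BVH - Faster Construction*
Algorithm:
    // FindBestMatch( X ): the element whose AABB, combined with that of X, has the smallest surface area
    Node A = list.GetFirst();
    Node B = list.FindBestMatch( A );
    while (list.size() > 1)
    {
        Node C = list.FindBestMatch( B );
        if (A == C)  // A and B are each other's best match: merge them
        {
            list.Remove( A );
            list.Remove( B );
            A = new Node( A, B );
            list.Add( A );
            if (list.size() > 1) B = list.FindBestMatch( A );
        }
        else A = B, B = C;  // follow the chain of best matches
    }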
*: Fast Agglomerative Clustering for Rendering, Walter et al., 2008
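A runnable version of the clustering loop, as a sketch: the node layout and names are illustrative, and `FindBestMatch` is a plain linear search over the list (Walter et al. accelerate this search with a spatial data structure, which is omitted here). Ties are broken by the lower node index, so the chain of best matches cannot cycle.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct AABB { float bmin[3], bmax[3]; };

AABB Union(const AABB& a, const AABB& b) {
    AABB r;
    for (int i = 0; i < 3; ++i) {
        r.bmin[i] = std::min(a.bmin[i], b.bmin[i]);
        r.bmax[i] = std::max(a.bmax[i], b.bmax[i]);
    }
    return r;
}

float SurfaceArea(const AABB& b) {
    const float dx = b.bmax[0] - b.bmin[0], dy = b.bmax[1] - b.bmin[1], dz = b.bmax[2] - b.bmin[2];
    return 2.0f * (dx * dy + dy * dz + dz * dx);
}

struct TLASNode { AABB bounds; int left, right; };  // leaves: left == right == -1

// Node in 'list' (other than a) whose union with a has the smallest surface area.
int FindBestMatch(const std::vector<TLASNode>& nodes, const std::vector<int>& list, int a) {
    float bestArea = 1e30f;
    int best = -1;
    for (int i : list) {
        if (i == a) continue;
        const float area = SurfaceArea(Union(nodes[a].bounds, nodes[i].bounds));
        if (area < bestArea || (area == bestArea && i < best)) { bestArea = area; best = i; }
    }
    return best;
}

std::vector<TLASNode> BuildTLASAgglomerative(const std::vector<AABB>& instances) {
    std::vector<TLASNode> nodes;
    std::vector<int> list;  // nodes without a parent
    for (const AABB& b : instances) {
        list.push_back(static_cast<int>(nodes.size()));
        nodes.push_back({ b, -1, -1 });
    }
    if (list.size() < 2) return nodes;
    int A = list.front();
    int B = FindBestMatch(nodes, list, A);
    while (list.size() > 1) {
        const int C = FindBestMatch(nodes, list, B);
        if (A == C) {  // A and B are each other's best match: replace both by a parent
            list.erase(std::remove(list.begin(), list.end(), A), list.end());
            list.erase(std::remove(list.begin(), list.end(), B), list.end());
            nodes.push_back({ Union(nodes[A].bounds, nodes[B].bounds), A, B });
            A = static_cast<int>(nodes.size()) - 1;
            list.push_back(A);
            if (list.size() > 1) B = FindBestMatch(nodes, list, A);
        } else {
            A = B;  // follow the chain of best matches
            B = C;
        }
    }
    return nodes;  // the root is the last node added
}
```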
The Top-level BVH - Traversal
The leaves of the top-level BVH contain the sub-BVHs. When a ray intersects such a leaf, it is transformed by the inverse transform matrix of the sub-BVH. After this, it traverses the sub-BVH. Once the sub-BVH has been traversed, we transform the ray again, this time by the transform matrix of the sub-BVH. For efficiency, we store the inverted matrix with the sub-BVH root.
https://sketchfab.com/SEED.EA/collections/pica-pica
https://www.models-resource.com/wii_u/mariokart8/model/10718