CS6958: HARDWARE RAY TRACING Spring 2014 What is Ray Tracing? A - - PowerPoint PPT Presentation



slide-1
SLIDE 1

CS6958: HARDWARE RAY TRACING

Spring 2014

slide-2
SLIDE 2

What is Ray Tracing?

¨ A computer graphics rendering technique that simulates optics
  ¤ Can generate very realistic-looking images
  ¤ Can take a long time to create those images

slide-3
SLIDE 3

A Tale of Two Rendering Algorithms

It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness…

Z-Buffer Rasterization vs. Ray Tracing

slide-4
SLIDE 4

A Little History

¨ In the beginning, there was Sketchpad…

If we point the light pen at the display system and press a button called “draw,” the computer will construct a straight line segment which stretches like a rubber band from the initial to the present location of the pen […] A sudden flick of the pen terminates drawing […].

1962…

slide-5
SLIDE 5

No Right Answer…

slide-6
SLIDE 6

Z-buffering: Ed Catmull, 1974

slide-7
SLIDE 7

Z-Buffer

¨ Take triangles as input
¨ Project on image plane
¨ Store closeness in Z-buffer
¨ Save color if closer
¨ Independent triangles processed in parallel
  ¤ This assumption makes highly parallel processing possible
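The depth test in the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: projection and scan conversion are assumed to have happened already, and the name `rasterize_fragments` and the `(x, y, depth, color)` tuple layout are hypothetical, not taken from the course.

```python
import math

def rasterize_fragments(width, height, fragments):
    """Resolve visibility with a Z-buffer.

    `fragments` is an iterable of (x, y, depth, color) tuples: pixels covered
    by triangles that were already projected onto the image plane (assumed).
    """
    depth_buffer = [[math.inf] * width for _ in range(height)]
    color_buffer = [[None] * width for _ in range(height)]
    for x, y, depth, color in fragments:
        # Save the color only if this fragment is closer than the stored one.
        if depth < depth_buffer[y][x]:
            depth_buffer[y][x] = depth
            color_buffer[y][x] = color
    return color_buffer
```

Because each fragment is compared only against the stored depth, the result does not depend on the order in which triangles arrive, which is the independence the slide refers to.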

slide-8
SLIDE 8

Eventually, all GPUs use Z-buffering

¨ Because of parallel nature of algorithm, huge advantage over CPUs
  ¤ Wide Single Instruction Multiple Data (SIMD)
    ▪ Fetch one instruction, execute on 32 different pieces of data
  ¤ Works brilliantly because of the assumption that all triangles are independent
    ▪ Essentially a SIMD streaming throughput computation
¨ Result is HUGE, power-hungry chips…
  ¤ With huge graphics performance…

slide-9
SLIDE 9

Graphics Chip Architecture

slide-10
SLIDE 10

Graphics Chip Architecture

slide-11
SLIDE 11

Graphics Chips Performance

slide-12
SLIDE 12

Graphics Chips Performance

GK110 Kepler

slide-13
SLIDE 13

Graphics Chips Transistors

10-core Xeon Westmere Nvidia GK110 Kepler

slide-14
SLIDE 14

Graphics Chips Transistors

Nvidia GK110 Kepler

slide-15
SLIDE 15

Graphics Cards

slide-16
SLIDE 16

Why do we need more?

slide-17
SLIDE 17

Why do we need more? Lighting!

¨ Z-buffer rasterization is great at rendering lots and lots of triangles
  ¤ But, the parallelism that enables the speed makes the assumption that all triangles are independent
  ¤ This makes optical effects tricky
¨ Shadows, reflections, refractions etc. all need to know about other triangles in the scene
¨ Going further, “global illumination” is even trickier

slide-18
SLIDE 18

What are our choices?

¨ A Tale of Two Rendering Algorithms…
  ¤ Z-buffer Rasterizing
  ¤ Ray Tracing

slide-19
SLIDE 19

Ray Tracing vs. Rasterization

Ray Tracing (parallel on pixels) Rasterizing (parallel on triangles)

slide-20
SLIDE 20

How do they compare?

¨ David Luebke (NVIDIA):

“Rasterization is fast, but needs cleverness to support complex visual effects. Ray tracing supports complex visual effects, but needs cleverness to be fast.”

slide-21
SLIDE 21

Optical Effects

Turner Whitted 1979

slide-22
SLIDE 22

Optical Effects

Turner Whitted 1979 Animated film: The Compleat Angler

slide-23
SLIDE 23

RAY TRACING

¨ Color each pixel based on the radiance from each visible surface

Tom Funkhouser, Princeton

slide-24
SLIDE 24

RAY TRACING

¨ Color each pixel based on the radiance from each visible surface
  ¤ Note that these arrows point in the direction of the radiance. We normally trace rays in the other direction.

Tom Funkhouser, Princeton

slide-25
SLIDE 25

Ray Tracing

¨ For each sample
  ¤ Construct a ray from the eye position through view plane
  ¤ Find the first surface hit
  ¤ Compute color of that surface

Tom Funkhouser, Princeton

slide-26
SLIDE 26

Ray Tracing

¨ For each sample
  ¤ Construct a ray from the eye position through view plane
  ¤ Find the first surface hit
  ¤ Compute color of that surface

Tom Funkhouser, Princeton

Processing the “Primary Rays” is sometimes called “Ray Casting”
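A minimal ray-casting sketch of the first two steps, assuming a pinhole camera at the origin looking down the negative z axis and a scene made only of spheres. The function names, the 60-degree field of view and the `(center, radius)` sphere layout are illustrative assumptions, not part of the slides.

```python
import math

def primary_ray(px, py, width, height, fov_deg=60.0):
    """Unit direction from the eye (origin) through the center of pixel (px, py)."""
    half = math.tan(math.radians(fov_deg) / 2.0)
    aspect = width / height
    # Map the pixel center onto the view plane at z = -1.
    x = (2.0 * (px + 0.5) / width - 1.0) * half * aspect
    y = (1.0 - 2.0 * (py + 0.5) / height) * half
    length = math.sqrt(x * x + y * y + 1.0)
    return (x / length, y / length, -1.0 / length)

def hit_sphere(origin, direction, center, radius):
    """Smallest positive t where the unit-direction ray meets the sphere, or None."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(d * v for d, v in zip(direction, oc))
    disc = b * b - (sum(v * v for v in oc) - radius * radius)
    if disc < 0.0:
        return None  # the ray misses the sphere entirely
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > 1e-6:
            return t
    return None

def first_hit(origin, direction, spheres):
    """Return (t, index) of the nearest sphere hit along the ray, or None."""
    best = None
    for index, (center, radius) in enumerate(spheres):
        t = hit_sphere(origin, direction, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, index)
    return best
```

Computing a color for the returned hit is the third step; it is left out here because it depends on the material and lighting model.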

slide-27
SLIDE 27

Simple Ray Casting

Tom Funkhouser, Princeton

slide-28
SLIDE 28

Construct a ray through a pixel

Tom Funkhouser, Princeton

slide-29
SLIDE 29

Construct a ray through a pixel

Tom Funkhouser, Princeton

slide-30
SLIDE 30

Recursive Ray Tracing

Create scene (objects, materials, lights, camera, background)
Preprocess scene
foreach frame
  foreach pixel
    foreach sample
      generate ray
      intersect ray with objects
      find normal of closest object
      shade intersection point

Mutually recursive: shading can generate new rays…
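The mutual recursion between tracing and shading might look like the sketch below. It assumes spheres stored as dictionaries with `center`, `radius`, `color` and `mirror` keys, a recursion limit `MAX_DEPTH`, and a shading rule that only blends in a mirror reflection; all of these are hypothetical simplifications rather than the course's design.

```python
import math

MAX_DEPTH = 3  # assumed recursion limit; the slides do not give one

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def scale(a, s):
    return tuple(x * s for x in a)

def unit(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))

def intersect(origin, direction, spheres):
    """Nearest front-face hit as (t, sphere), or None. Rays starting inside a sphere are ignored."""
    best = None
    for sphere in spheres:
        oc = sub(origin, sphere["center"])
        b = dot(direction, oc)
        disc = b * b - (dot(oc, oc) - sphere["radius"] ** 2)
        if disc < 0.0:
            continue
        t = -b - math.sqrt(disc)
        if t > 1e-4 and (best is None or t < best[0]):
            best = (t, sphere)
    return best

def trace(origin, direction, spheres, background, depth=0):
    """Generate-intersect-shade for one ray; shade() may call back into trace()."""
    hit = intersect(origin, direction, spheres)
    if hit is None:
        return background
    t, sphere = hit
    point = add(origin, scale(direction, t))
    normal = unit(sub(point, sphere["center"]))
    return shade(point, normal, direction, sphere, spheres, background, depth)

def shade(point, normal, direction, sphere, spheres, background, depth):
    """Blend the surface color with whatever a reflected ray sees."""
    color = sphere["color"]
    k = sphere["mirror"]
    if k > 0.0 and depth < MAX_DEPTH:
        reflected = sub(direction, scale(normal, 2.0 * dot(direction, normal)))
        bounce = trace(point, reflected, spheres, background, depth + 1)
        color = add(scale(color, 1.0 - k), scale(bounce, k))
    return color
```

The depth counter is what stops two facing mirrors from recursing forever.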

slide-31
SLIDE 31

Recursive Ray Tracing

Create scene (objects, materials, lights, camera, background)
Preprocess scene
foreach frame
  foreach pixel
    foreach sample
      generate ray
      intersect ray with objects
      find normal of closest object
      shade intersection point

slide-32
SLIDE 32

Major Components of a Ray Tracer

¨ Camera (Pixels to Rays)
¨ Objects (Rays to Intersection info)
¨ Materials (Intersection info and Light to Color)
¨ Lights
¨ Background (Rays to Color)
¨ All together: a Scene
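One way to read the component list is as a set of function signatures. The sketch below is an assumption-laden illustration: the `Ray`, `Hit` and `Scene` types, the single scene-wide material, and `render_pixel` are hypothetical names chosen here, not an API from the course.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]
Color = Tuple[float, float, float]

@dataclass
class Ray:
    origin: Vec
    direction: Vec

@dataclass
class Hit:
    t: float          # distance along the ray
    point: Vec
    normal: Vec

@dataclass
class Scene:
    camera: Callable[[int, int], Ray]                   # pixels -> rays
    objects: Sequence[Callable[[Ray], Optional[Hit]]]   # rays -> intersection info
    material: Callable[[Hit, Sequence[Vec]], Color]     # intersection info + lights -> color
    lights: Sequence[Vec]
    background: Callable[[Ray], Color]                  # rays -> color

def render_pixel(scene, px, py):
    """Wire the components together for one pixel."""
    ray = scene.camera(px, py)
    hits = [h for h in (obj(ray) for obj in scene.objects) if h is not None]
    if not hits:
        return scene.background(ray)
    nearest = min(hits, key=lambda h: h.t)
    return scene.material(nearest, scene.lights)
```

A real scene would attach a material to each object; a single shared material keeps the sketch short.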

slide-33
SLIDE 33

Details

Steve Parker, UofU and NVIDIA

slide-34
SLIDE 34

Optical Effects

Turner Whitted 1979

slide-35
SLIDE 35

Turner Whitted

Phong Model Improved Model

slide-36
SLIDE 36

Turner Whitted

Improved Model

slide-37
SLIDE 37

Turner Whitted

slide-38
SLIDE 38

Optical Effects

Turner Whitted 1980

slide-39
SLIDE 39

Optical Effects

Steve Parker, UofU and NVIDIA

slide-40
SLIDE 40

Ray Tracers Love Glass…

slide-41
SLIDE 41

Global Illumination

slide-42
SLIDE 42

Global Illumination

slide-43
SLIDE 43

How real can you get?

Cornell University

slide-44
SLIDE 44

Car Makers Love Ray Tracing…

slide-45
SLIDE 45

Car Makers Love Ray Tracing

slide-46
SLIDE 46

Car/Film Rendering

slide-47
SLIDE 47

Architects Love Ray Tracing…

slide-48
SLIDE 48

Architectural Modeling

slide-49
SLIDE 49

Movie Makers Love Ray Tracing…

slide-50
SLIDE 50

Movie Makers Love Ray Tracing…

slide-51
SLIDE 51

Volume Renderers Love Ray Tracing…

Steve Parker, UofU and NVIDIA

slide-52
SLIDE 52

Scientific Visualization Loves RT

slide-53
SLIDE 53

Scientific Visualization with RT

slide-54
SLIDE 54

Ray Tracing is complex?

typedef struct{double x,y,z}vec;vec U,black,amb={.02,.02,.02};struct sphere{
vec cen,colour;double rad,kd,ks,kt,kl,ir}*s,*best,sph[]={0.,6.,.5,1.,1.,1.,.9,
.05,.2,.85,0.,1.7,-1.,8.,-.5,1.,.5,.2,1.,.7,.3,0.,.05,1.2,1.,8.,-.5,.1,.8,.8,
1.,.3,.7,0.,0.,1.2,3.,-6.,15.,1.,.8,1.,7.,0.,0.,0.,.6,1.5,-3.,-3.,12.,.8,1.,
1.,5.,0.,0.,0.,.5,1.5,};yx;double u,b,tmin,sqrt(),tan();double vdot(A,B)vec A
,B;{return A.x*B.x+A.y*B.y+A.z*B.z;}vec vcomb(a,A,B)double a;vec A,B;{B.x+=a*
A.x;B.y+=a*A.y;B.z+=a*A.z;return B;}vec vunit(A)vec A;{return vcomb(1./sqrt(
vdot(A,A)),A,black);}struct sphere*intersect(P,D)vec P,D;{best=0;tmin=1e30;s=
sph+5;while(s-->sph)b=vdot(D,U=vcomb(-1.,P,s->cen)),u=b*b-vdot(U,U)+s->rad*s
->rad,u=u>0?sqrt(u):1e31,u=b-u>1e-7?b-u:b+u,tmin=u>=1e-7&&u<tmin?best=s,u:
tmin;return best;}vec trace(level,P,D)vec P,D;{double d,eta,e;vec N,colour;
struct sphere*s,*l;if(!level--)return black;if(s=intersect(P,D));else return
amb;colour=amb;eta=s->ir;d= -vdot(D,N=vunit(vcomb(-1.,P=vcomb(tmin,D,P),s->cen
)));if(d<0)N=vcomb(-1.,N,black),eta=1/eta,d= -d;l=sph+5;while(l-->sph)if((e=l
->kl*vdot(N,U=vunit(vcomb(-1.,P,l->cen))))>0&&intersect(P,U)==l)colour=vcomb(e
,l->colour,colour);U=s->colour;colour.x*=U.x;colour.y*=U.y;colour.z*=U.z;e=1-eta*
eta*(1-d*d);return vcomb(s->kt,e>0?trace(level,P,vcomb(eta,D,vcomb(eta*d-sqrt
(e),N,black))):black,vcomb(s->ks,trace(level,P,vcomb(2*d,N,D)),vcomb(s->kd,
colour,vcomb(s->kl,U,black))));}main(){puts("P3\n32 32\n255");while(yx<32*32)
U.x=yx%32-32/2,U.z=32/2-yx++/32,U.y=32/2/tan(25/114.5915590261),U=vcomb(255.,
trace(3,black,vunit(U)),black),printf("%.0f %.0f %.0f\n",U);}/*minray!*/

Paul Heckbert’s complete ray tracer on the back of his business card (c. 1989)
Does Whitted-style recursive ray tracing with reflections, refraction, two lights…

slide-56
SLIDE 56

Andrew Kensler’s business-card C++ RT

#include <stdlib.h>   // card > aek.ppm
#include <stdio.h>
#include <math.h>
typedef int i;typedef float f;struct v{
f x,y,z;v operator+(v r){return v(x+r.x
,y+r.y,z+r.z);}v operator*(f r){return
v(x*r,y*r,z*r);}f operator%(v r){return
x*r.x+y*r.y+z*r.z;}v(){}v operator^(v r
){return v(y*r.z-z*r.y,z*r.x-x*r.z,x*r.
y-y*r.x);}v(f a,f b,f c){x=a;y=b;z=c;}v
operator!(){return*this*(1/sqrt(*this%*
this));}};i G[]={247570,280596,280600,
249748,18578,18577,231184,16,16};f R(){
return(f)rand()/RAND_MAX;}i T(v o,v d,f
&t,v&n){t=1e9;i m=0;f p=-o.z/d.z;if(.01
<p)t=p,n=v(0,0,1),m=1;for(i k=19;k--;)
for(i j=9;j--;)if(G[j]&1<<k){v p=o+v(-k
,0,-j-4);f b=p%d,c=p%p-1,q=b*b-c;if(q>0
){f s=-b-sqrt(q);if(s<t&&s>.01)t=s,n=!(
p+d*t),m=2;}}return m;}v S(v o,v d){f t
;v n;i m=T(o,d,t,n);if(!m)return v(.7,
.6,1)*pow(1-d.z,4);v h=o+d*t,l=!(v(9+R(
),9+R(),16)+h*-1),r=d+n*(n%d*-2);f b=l%
n;if(b<0||T(h,l,t,n))b=0;f p=pow(l%r*(b
>0),99);if(m&1){h=h*.2;return((i)(ceil(
h.x)+ceil(h.y))&1?v(3,1,1):v(3,3,3))*(b
*.2+.1);}return v(p,p,p)+S(h,r)*.5;}i
main(){printf("P6 512 512 255 ");v g=!v
(-6,-16,0),a=!(v(0,0,1)^g)*.002,b=!(g^a
)*.002,c=(a+b)*-256+g;for(i y=512;y--;)
for(i x=512;x--;){v p(13,13,13);for(i r
=64;r--;){v t=a*(R()-.5)*99+b*(R()-.5)*
99;p=S(v(17,16,8)+t,!(t*-1+(a*(R()+x)+b
*(y+R())+c)*16))*3.5+p;}printf("%c%c%c"
,(i)p.x,(i)p.y,(i)p.z);}}

slide-58
SLIDE 58

A Hierarchy of Ray Tracers

1. Ray casting
2. Ray casting with shadows
3. Whitted-style recursive ray tracing
4. Cook-style distribution ray tracing
5. Path tracing for indirect illumination (global illumination)
6. … even more advanced techniques…

slide-59
SLIDE 59

1: Ray Casting

¨ A 3D line query to determine visibility
  ¤ Rays are cast from the eye point through each pixel into the scene
  ¤ Intersection point of nearest object is returned

slide-60
SLIDE 60

2: Ray Casting with Shadows

¨ At each intersection point, cast another ray in the direction of the light source
  ¤ Checks whether the point is in shadow
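A shadow ray is an ordinary visibility query with one extra condition: only occluders between the surface point and the light matter. The sketch below assumes a point light and sphere occluders given as `(center, radius)` pairs; `in_shadow` and the `eps` offset are illustrative choices.

```python
import math

def in_shadow(point, light_pos, spheres, eps=1e-4):
    """True if any sphere blocks the segment from `point` to the point light."""
    to_light = tuple(l - p for l, p in zip(light_pos, point))
    dist = math.sqrt(sum(v * v for v in to_light))
    direction = tuple(v / dist for v in to_light)
    for center, radius in spheres:
        oc = tuple(p - c for p, c in zip(point, center))
        b = sum(d * v for d, v in zip(direction, oc))
        disc = b * b - (sum(v * v for v in oc) - radius * radius)
        if disc < 0.0:
            continue  # the shadow ray misses this sphere
        root = math.sqrt(disc)
        for t in (-b - root, -b + root):
            # Hits behind the point or beyond the light do not cast a shadow here.
            if eps < t < dist:
                return True
    return False
```

The small `eps` keeps a surface from shadowing itself because of floating-point error at the ray origin.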

slide-61
SLIDE 61

3: Whitted-Style Ray Tracing

¨ Recursively cast rays to account for reflections and refractions
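The directions of the two secondary rays follow from the law of reflection and Snell's law. The helpers below are a sketch under the usual conventions (unit vectors, normal facing the incoming ray, `eta` being the ratio of the incident to the transmitted index of refraction); the names `reflect` and `refract` are chosen here for illustration.

```python
import math

def reflect(d, n):
    """Mirror the unit direction d about the unit normal n."""
    k = 2.0 * sum(a * b for a, b in zip(d, n))
    return tuple(a - k * b for a, b in zip(d, n))

def refract(d, n, eta):
    """Refracted direction by Snell's law, or None on total internal reflection.

    eta = n_incident / n_transmitted; n must point against the incoming ray.
    """
    cos_i = -sum(a * b for a, b in zip(d, n))
    k = 1.0 - eta * eta * (1.0 - cos_i * cos_i)
    if k < 0.0:
        return None  # no transmitted ray exists
    return tuple(eta * a + (eta * cos_i - math.sqrt(k)) * b for a, b in zip(d, n))
```

A Whitted-style tracer would call its trace routine again with each of these directions and weight the results by the material's reflective and transmissive coefficients.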

slide-62
SLIDE 62

3: Whitted-Style Ray Tracing

Ray casting with shadows Whitted-style ray tracing

slide-63
SLIDE 63

Classic Whitted Examples

slide-64
SLIDE 64

4: Distribution Ray Tracing

¨ AKA Cook-Style Ray Tracing
  ¤ Rays can be cast through a lens with area (i.e. not just a pinhole)
    ▪ Depth of field
  ¤ Secondary ray directions can be perturbed
    ▪ Glossy reflections
  ¤ Shadow rays can be aimed at area light sources
    ▪ Soft shadows
  ¤ Can also add time to the ray
    ▪ Motion blur

slide-65
SLIDE 65

4: Distribution Ray Tracing

slide-66
SLIDE 66

4: Distribution Ray Tracing

slide-67
SLIDE 67

4: Distribution Ray Tracing

slide-68
SLIDE 68

5: Path Tracing

¨ At each intersection point, cast a ray in a random direction to see if any light comes from there
  ¤ With enough oversampling, this results in solving the “rendering equation”
  ¤ Fills in the “ambient” shadowed spaces with indirect lighting
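The "random direction plus oversampling" idea is Monte Carlo integration. The sketch below estimates only the light arriving at one surface point (irradiance) by uniform hemisphere sampling, which is one term of the rendering equation rather than a full path tracer. The local frame with the normal along +z and the function names are assumptions for illustration.

```python
import math
import random

def sample_hemisphere(rng):
    """Uniform random unit direction with z >= 0 (the surface normal is +z)."""
    z = rng.random()                       # cos(theta), uniform in [0, 1)
    phi = 2.0 * math.pi * rng.random()
    r = math.sqrt(max(0.0, 1.0 - z * z))
    return (r * math.cos(phi), r * math.sin(phi), z)

def estimate_irradiance(incoming_radiance, samples, rng):
    """Monte Carlo estimate of the integral of L(w) * cos(theta) over the hemisphere.

    Uniform sampling has density 1 / (2*pi), so each sample is weighted by 2*pi.
    """
    total = 0.0
    for _ in range(samples):
        w = sample_hemisphere(rng)
        total += incoming_radiance(w) * w[2]   # w[2] is cos(theta)
    return 2.0 * math.pi * total / samples
```

For a sky of constant radiance 1 the exact answer is pi, so the estimate can be checked directly; the noise shrinks as the sample count grows, which is the oversampling the slide mentions. In a path tracer `incoming_radiance` would itself be a recursive trace call.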

slide-69
SLIDE 69

5: Path Tracing

slide-70
SLIDE 70

5: Path Tracing

slide-71
SLIDE 71

5: Path Tracing

Whitted ray tracing Path Tracing

slide-72
SLIDE 72

Lots more to it…

¨ But this hierarchy helps me keep things straight
  ¤ Ambient occlusion, ray bundles, beam tracing, photon mapping, Metropolis light transport, etc. etc. etc.
¨ Material properties involve another huge set of issues that can impact realism
  ¤ BRDF: Bidirectional Reflectance Distribution Function
  ¤ BSDF: Bidirectional Scattering Distribution Function
  ¤ BTDF: Bidirectional Transmittance Distribution Function
  ¤ BSSRDF: Bidirectional Scattering Surface Reflectance Distribution Function

slide-73
SLIDE 73

So – use GPUs to ray trace…

¨ … Problem solved?
¨ Unfortunately no – Ray Tracing isn’t as friendly to SIMD parallelism as Z-buffer rasterization
¨ Cast rays into scene
¨ Intersect with all objects, return first hit
¨ Independent rays processed in parallel
  ¤ Additional rays can handle optical effects

slide-75
SLIDE 75

Acceleration Structures

¨ Hierarchical partitions that help eliminate large numbers of primitives from that intersection step
  ¤ Surround scene objects with partitions that are easy to test for intersection
  ¤ If you miss the partition, you don’t need to test anything inside that partition
  ¤ Changes that linear search step into logarithmic search
  ¤ BUT – adds data-dependent branching…
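An axis-aligned box is a common choice for a partition that is "easy to test". The slab test below is a sketch of such a test, assuming a ray that starts at `origin` and extends forward; the name `hit_aabb` is illustrative.

```python
import math

def hit_aabb(origin, direction, box_min, box_max):
    """Slab test: does the ray (t >= 0) intersect the axis-aligned box?"""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Shrink the interval of t values that lie inside every slab so far.
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True
```

The early returns are the data-dependent branches the last bullet warns about: different rays leave the loop at different points.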

slide-76
SLIDE 76

Acceleration Structures

¨ Partition the scene into easy to intersect units
  ¤ Tree-Based
    ▪ Bounding Volume Hierarchy (BVH)
      ▪ Axis-aligned or Object-aligned
    ▪ KD-Tree
    ▪ Binary Space Partitioning Tree (BSP Tree)
  ¤ Grid-Based
    ▪ Oct-tree
    ▪ Uniform Grids
    ▪ Multi-Grids

slide-77
SLIDE 77

Bounding Volume Hierarchy

Tom Funkhouser, Princeton

slide-78
SLIDE 78

Bounding Volume Hierarchy

Tom Funkhouser, Princeton

slide-79
SLIDE 79

Ray Tracing Algorithm Phases

¨ Traversal
  ¤ Intersect the ray with bounding objects to eliminate as much as you can
¨ Intersection
  ¤ At the leaf nodes, intersect the ray with actual geometry (triangles, spheres, patches, etc.)
¨ Shading
  ¤ Figure out what color/light contribution that intersected point adds to the scene
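The first two phases can be sketched as a stack-based walk over a bounding volume hierarchy. The node layout (dictionaries with `lo`, `hi` and either `children` or `prims`), the sphere primitives and the returned test counter are assumptions made for this illustration; shading is left out.

```python
import math

def hit_aabb(origin, direction, lo, hi):
    """Slab test for a ray with t >= 0 against an axis-aligned box."""
    t_near, t_far = 0.0, math.inf
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            if o < a or o > b:
                return False
            continue
        t0, t1 = sorted(((a - o) / d, (b - o) / d))
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True

def hit_sphere(origin, direction, center, radius):
    """Smallest positive t where the unit-direction ray meets the sphere, or None."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * v for d, v in zip(direction, oc))
    disc = b * b - (sum(v * v for v in oc) - radius * radius)
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > 1e-6:
            return t
    return None

def traverse(root, origin, direction):
    """Return (nearest_t, primitive, primitive_tests) for a BVH of spheres."""
    best_t, best_prim, tests = math.inf, None, 0
    stack = [root]
    while stack:
        node = stack.pop()
        # Traversal phase: skip the whole subtree if the ray misses its bounds.
        if not hit_aabb(origin, direction, node["lo"], node["hi"]):
            continue
        if "prims" in node:
            # Intersection phase: test the actual geometry stored in the leaf.
            for center, radius in node["prims"]:
                tests += 1
                t = hit_sphere(origin, direction, center, radius)
                if t is not None and t < best_t:
                    best_t, best_prim = t, (center, radius)
        else:
            stack.extend(node["children"])
    return best_t, best_prim, tests
```

The `tests` counter shows the pruning at work: a ray that misses a node's box never touches the primitives below it.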

slide-80
SLIDE 80

Ray Tracing Algorithm Phases

¨ Traversal

¤ Tree traversal – does NOT map well to SIMD parallelism

¨ Intersection

¤ FP operations – maps fine to SIMD

¨ Shading

¤ Some trig, some FP – maps fine to SIMD

slide-81
SLIDE 81

Ray Tracing Algorithm Phases

¨ Traversal
  ¤ Tree traversal – does NOT map well to SIMD parallelism
  ¤ 64%–84% of run time
¨ Intersection
  ¤ FP operations – maps fine to SIMD
  ¤ 8%–30% of run time
¨ Shading
  ¤ Some trig, some FP – maps fine to SIMD
  ¤ 1%–8% of run time

slide-82
SLIDE 82

Gaming Possibilities

slide-83
SLIDE 83

iRay – NVIDIA’s GPU ray tracer

1 minute

slide-84
SLIDE 84

iRay – NVIDIA’s GPU ray tracer

30 minutes

slide-85
SLIDE 85

iRay – NVIDIA’s GPU ray tracer

4 hours

slide-86
SLIDE 86

iRay GPU ray tracing - 2011

slide-87
SLIDE 87

Ray Tracing Hardware?

¨ There have been a few academic projects
  ¤ Saarland University – SaarCOR and RPU
  ¤ University of Illinois at Urbana-Champaign – Rigel
  ¤ University of Wisconsin, Madison – Copernicus
  ¤ KAIST, Korea – MRTP mobile RT
  ¤ University of Utah – TRaX

slide-88
SLIDE 88

TRaX: Threaded Ray eXecution

¨ If you could build a GPU that was customized for ray tracing, what would it look like?
  ¤ Probably have lots of floating point units
  ¤ NVIDIA/ATI GPUs organize them as wide SIMD
    ▪ For example, 32 threads in a “warp”
    ▪ Great if all 32 threads truly do the exact same thing
    ▪ Not so great if they branch…
  ¤ TRaX takes a more MIMD/SPMD approach
    ▪ Let the multiple threads each have their own PC
    ▪ Letting the threads be out of sync has benefits…

slide-89
SLIDE 89

SIMD Execution

…
SWI r6,r1,232
SWI r6,r1,236
LWI r3,r1,240
ORI r5,r0,114
ORI r6,r0,106
FPINVSQRT r5,r5
Bleid r23,$0BB0
FPDIV r5,r6,r5
ORI r7,r0,-107
FPDIV r5,r6,r5
ORI r8,r0,110
ORI r9,r0,107
FPMUL r7,r5,r7
SWI r7,r1,400
…

slide-90
SLIDE 90

SIMD Execution

[Figure: eight threads, each shown with the same instruction stream (… SWI, SWI, LWI, ORI, ORI, FPINVSQRT, Bleid, FPDIV, ORI, FPDIV, ORI, ORI, FPMUL, SWI …); axis: Thread Number]

slide-96
SLIDE 96

SIMD Execution – Resource Replication

[Figure: eight threads, each shown with the same instruction stream (… SWI, SWI, LWI, ORI, ORI, FPINVSQRT, Bleid, FPDIV, ORI, FPDIV, ORI, ORI, FPMUL, SWI …); axis: Thread Number]

slide-101
SLIDE 101

SIMD Execution – SIMD Efficiency

[Figure: eight threads, each shown with the same instruction stream (… SWI, SWI, LWI, ORI, ORI, FPINVSQRT, Bleid, FPDIV, ORI, FPDIV, ORI, ORI, FPMUL, SWI …); axis: Thread Number]

slide-102
SLIDE 102

SPMD Execution

[Figure: eight threads, each shown with the same instruction stream (… SWI, SWI, LWI, ORI, ORI, FPINVSQRT, Bleid, FPDIV, ORI, FPDIV, ORI, ORI, FPMUL, SWI …); axis: Thread Number]

slide-103
SLIDE 103

SPMD Execution – Issue Rate

[Figure: eight threads, each shown with the same instruction stream (… SWI, SWI, LWI, ORI, ORI, FPINVSQRT, Bleid, FPDIV, ORI, FPDIV, ORI, ORI, FPMUL, SWI …); axis: Thread Number]

slide-104
SLIDE 104

SPMD Execution – Resource Sharing

[Figure: eight threads, each shown with the same instruction stream (… SWI, SWI, LWI, ORI, ORI, FPINVSQRT, Bleid, FPDIV, ORI, FPDIV, ORI, ORI, FPMUL, SWI …); axis: Thread Number]

slide-105
SLIDE 105

Ray Tracing Domain Features

¨ Ray tracing is “wonderfully parallel”

¤ Minimal communication and synchronization required

¨ All memory writes are to frame buffer only

¤ We can enforce write-around policy to keep caches clean

¤ Use local scratchpad memory for temporary variables

¨ Small program size makes for small fast Icaches

¨ Threads all at different places in program (out of sync)

¤ We can share various resources since all threads won’t be using them at the same time

slide-106
SLIDE 106

Ray Tracing Domain Features

¨ Ray tracing is by nature divergent in control flow

¤ Pointer-chasing / tree search

¨ Consider lots of light-weight MIMD threads capable of handling divergence

¤ Designed from ground up specifically for ray tracing

¨ But what about area overhead for MIMD?

¤ SPMD is better than full MIMD…

slide-107
SLIDE 107

TRaX

¨ If you could build a GPU that was customized for ray tracing, what would it look like?

¤ Probably have lots of floating point units

¤ NVIDIA/ATI GPUs organize them as wide SIMD

▪ For example, 32 threads in a “warp”

▪ Great if all 32 threads truly do the exact same thing

▪ Not so great if they branch…

¤ TRaX takes a more MIMD/SPMD approach

▪ Let the multiple threads each have their own PC

▪ Letting the threads be out of sync has benefits…

slide-108
SLIDE 108

TRaX Architecture

[Figure: TRaX architecture. Labels: Basic tile; 32 core thread multiprocessor (TM); Chip]

slide-109
SLIDE 109

TRaX Software Model

¨ Write a single-threaded ray tracer

¤ Copy this code to all thread processors

¨ Now make each thread use atomic increment to help it decide which rays are its responsibility

¤ Let ‘em loose on the scene
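
The software model above can be sketched on an ordinary CPU. This is a minimal sketch, not TRaX code: `std::atomic` stands in for the hardware atomic increment, `std::thread` stands in for the thread processors, and `tracePixel`, `renderWorker` and `render` are hypothetical names introduced for illustration.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for tracing the ray through one pixel.
int tracePixel(int pixel) { return pixel % 256; }

// Every thread runs this same single-threaded loop (SPMD). The shared
// counter hands out pixel indices; fetch_add is the atomic increment,
// so each index is claimed by exactly one thread.
void renderWorker(std::atomic<int>& nextPixel, int numPixels,
                  std::vector<int>& frameBuffer) {
    for (int p = nextPixel.fetch_add(1); p < numPixels;
         p = nextPixel.fetch_add(1)) {
        frameBuffer[p] = tracePixel(p);  // the only write goes to the frame buffer
    }
}

// Copies the same worker to numThreads threads and lets them loose.
std::vector<int> render(int numPixels, int numThreads) {
    std::vector<int> frameBuffer(numPixels, -1);
    std::atomic<int> nextPixel{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t) {
        threads.emplace_back(renderWorker, std::ref(nextPixel), numPixels,
                             std::ref(frameBuffer));
    }
    for (auto& th : threads) th.join();
    return frameBuffer;
}
```

Because work is claimed one index at a time, a thread that lands on cheap rays simply comes back for more, and no thread needs to know how many other threads exist.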

slide-110
SLIDE 110

How well does it work?

¨ 2 int add, 8 FP mul and FP add, 1 invsqrt, and 2 16-banked instruction caches per TM

¨ 256KB, 16-bank L2 data cache x 4

¤ Total off-chip bandwidth: 52 GB/sec

¨ 20 TMs per L2 (80 total)

¨ Total of 2560 cores
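
The totals follow from the per-level counts, taking the 32 cores per TM from the architecture slide:

```latex
\[
4~\text{L2 caches} \times 20~\frac{\text{TMs}}{\text{L2}} = 80~\text{TMs},
\qquad
80~\text{TMs} \times 32~\frac{\text{cores}}{\text{TM}} = 2560~\text{cores}
\]
```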

slide-111
SLIDE 111

Comparison

  • We compare our architecture against the best known wide-SIMD GPU ray tracer
  • Timo Aila (NVIDIA) et al, HPG09
  • Running on NVIDIA GTX 285
  • Used same scenes and rendering techniques (shaders)
  • Compare performance/area and overall performance


slide-112
SLIDE 112

Benchmark Scenes

Conference Room: 282K triangles
Fairy Forest: 174K triangles
Sibenik Cathedral: 80K triangles

  • Primary rays only: ~1M rays per frame
  • Shading (w/secondary rays): ~34M rays per frame


slide-113
SLIDE 113

Results

MIMD/SPMD total area: 175mm^2
GTX285 total area: ~300mm^2
Both areas estimated at 65nm process


slide-114
SLIDE 114

Resource Area

SM = streaming multiprocessor, NVIDIA’s analogue to our TM


slide-115
SLIDE 115

Analysis

  • SPMD and resource sharing benefit from each other
  • Threads get out of sync, so resource requests become evenly staggered
  • This results in high performance and small area
  • Small multi-banked icaches diminish area requirement of SPMD instruction fetch
  • Without constraint of synchronized threads
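
The staggering argument can be made concrete with a toy conflict counter. This is a sketch under simplifying assumptions, not the simtrax model: every thread repeats the same loop, issues one instruction per cycle, and uses the shared unit at one fixed loop position. Stalls are only counted and do not shift the threads. The name `countConflicts` and all parameters are hypothetical.

```cpp
#include <vector>

// Counts conflicts at one shared functional unit over `cycles` cycles.
// offset[t] is how far thread t is shifted within the loop; a thread uses
// the shared unit whenever it is at loop position 0. The unit accepts
// `capacity` requests per cycle; every request beyond that is a conflict.
int countConflicts(const std::vector<int>& offset, int loopLength,
                   int capacity, int cycles) {
    int conflicts = 0;
    for (int c = 0; c < cycles; ++c) {
        int requests = 0;
        for (int off : offset) {
            if ((c + off) % loopLength == 0) ++requests;
        }
        if (requests > capacity) conflicts += requests - capacity;
    }
    return conflicts;
}
```

For 8 threads, a loop of 8 instructions, one shared unit and 80 cycles, threads in lockstep (all offsets 0) produce 70 conflicts, while threads staggered at offsets 0 through 7 produce none: one unit is enough once the requests are spread out.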


slide-116
SLIDE 116

Comparison Conclusion

  • Wide SIMD and general purpose multi-core CPUs seem over-provisioned for ray tracing
  • Lightweight SPMD architecture with shared resources out-performs a highly tuned ray tracer on highly evolved GPU hardware on realistic benchmark scenes


slide-117
SLIDE 117

Architecture Research

¨ How do people study new architectures?

¤ Build them and measure?

¤ Nope… Too expensive and time consuming

¨ Instead, we build simulators

¤ Functional simulators

¤ Trace-based simulators

¤ Cycle-accurate simulators

¤ Circuit simulators

¤ Analog circuit simulators

slide-118
SLIDE 118

Architecture Research

¨ How do people study new architectures?

¤ Build them and measure?

¤ Nope… Too expensive and time consuming

¨ Instead, we build simulators

¤ Functional simulators – simtrax x86 mode

¤ Trace-based simulators

¤ Cycle-accurate simulators – simtrax arch-sim mode

¤ Circuit simulators

¤ Analog circuit simulators
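
To illustrate the difference between the first and third kinds of simulator, here is a deliberately tiny sketch. It is not how simtrax works: the two-operation instruction set, the register count, the latencies and the names `runFunctional` and `runTimed` are all assumptions made up for this example. The timed version is a much-simplified stand-in for the bookkeeping that a cycle-accurate simulator does.

```cpp
#include <array>
#include <vector>

enum class Op { Add, Mul };
struct Instr { Op op; int dst, a, b; };   // dst = a (op) b, register indices
using Regs = std::array<float, 4>;

// Functional simulation: computes only the architectural result.
Regs runFunctional(const std::vector<Instr>& prog, Regs r) {
    for (const Instr& i : prog) {
        r[i.dst] = (i.op == Op::Add) ? r[i.a] + r[i.b] : r[i.a] * r[i.b];
    }
    return r;
}

// Timing layered on top: the same results plus a cycle count, using an
// assumed latency per operation (in order, one instruction at a time).
struct Timed { Regs regs; long cycles; };
Timed runTimed(const std::vector<Instr>& prog, Regs r,
               int addLatency, int mulLatency) {
    long cycles = 0;
    for (const Instr& i : prog) {
        r[i.dst] = (i.op == Op::Add) ? r[i.a] + r[i.b] : r[i.a] * r[i.b];
        cycles += (i.op == Op::Add) ? addLatency : mulLatency;
    }
    return {r, cycles};
}
```

Both functions produce the same register values; only the timed one can answer questions about how long the program takes, which is why it needs a model of the hardware and runs more slowly.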

slide-119
SLIDE 119

Conclusion

¨ RT is not well supported by rasterizing GPUs or parallel CPUs

¤ Wide SIMD and general purpose multi-core CPUs seem over-provisioned for ray tracing

¨ Leveraging RT behavior can improve HW performance

¤ Lightweight SPMD architectures with shared resources out-perform a highly tuned ray tracer on highly evolved GPU hardware and realistic benchmark scenes

slide-120
SLIDE 120

What’s the plan?

¨ Design and implement Ray Tracing variations on a (simulated) parallel machine designed to be good at things like RT

¤ Start with everyone getting up to speed on a basic ray tracer

¤ Then do projects on more advanced variations

slide-121
SLIDE 121

What’s the Plan?

¨ Analysis!

¤ All assignments will include an analysis of how the application is behaving on the (simulated) hardware

¨ Simtrax has lots of analysis options

¤ Many of which are hard (or impossible) to see on real HW

▪ Energy, area, stalls (number and type), cache hit rates, instruction counts, conflict counts, etc.

¤ Also, you can tweak various parameters to see their effect

▪ Number and type of functional units, cache size, organization, and configuration, memory interfaces, processor organization, etc.

▪ Essentially anything you can write code to simulate…

slide-122
SLIDE 122

What’s the plan?

¨ Hardware Platform

¤ TRaX – a many-core architecture designed for ray tracing

¤ Quite different from a commercial GPU

¤ Exists as a detailed simulator

¨ Software Platform

¤ Compiler based on llvm-clang

¤ Generates x86 code (for running on your machine)

¤ Also generates TRaX assembly (for the simulator)

slide-123
SLIDE 123

Projects

¨ Extend our understanding of HW support for advanced RT techniques…

¤ Beam tracing

¤ Ray bundles / vector operations

¤ Global illumination / Photon mapping / etc.

¤ Motion blur

¤ Ambient occlusion

¤ Animated scenes (acceleration structure rebuilds?)

¤ Participating media (fog, smoke, etc.)

¤ Alternative acceleration structures (grid, KD tree, etc.)

¤ Procedural texturing or geometry (mesh colors?)

¤ Power saving techniques / hardware

¤ Rasterizing (!)

¤ Tessellations / procedural geometry

slide-124
SLIDE 124

Projects

¨ Extend our architecture that supports advanced RT techniques… (i.e. enhance the simulator)

¤ Memory system enhancements

¤ Address generation units

¤ New function units

¤ Communication between thread processors

¤ Function unit chaining – configurable macro instructions

¤ Support for streaming data access

slide-125
SLIDE 125

Questions?

¨ Danny and Konstantin are the instructors

¤ I’m just the troublemaker

¨ Web site: www.eng.utah.edu/~cs6958

¨ Mailing list: https://sympa.eng.utah.edu/sympa/info/cs6958

slide-126
SLIDE 126

Extra Pretty Pictures

slide-127
SLIDE 127

CS6620 Fall 2013

Christensen

slide-128
SLIDE 128

CS6620 Fall 2013

Romero

slide-129
SLIDE 129

CS6620 Fall 2013

McKenna

slide-130
SLIDE 130

CS6620 Fall 2013

Kumar K.

slide-131
SLIDE 131

CS6620 Fall 2013

Gifford

slide-132
SLIDE 132

CS6620 Spring 08

Pegoraro

slide-133
SLIDE 133

CS6620 Spring 08

Gallup

slide-134
SLIDE 134

CS6620 Spring 08

Kensler

slide-135
SLIDE 135

CS6620 Spring 08

Kensler variant B

slide-136
SLIDE 136

CS6620 Spring 08

Bavoil

slide-137
SLIDE 137

CS6620 Spring 08

Steffen

slide-138
SLIDE 138

CS6620 Spring 08

slide-139
SLIDE 139

CS6620 Spring 08

Luitjens

slide-140
SLIDE 140

CS6620 Spring 08

Ownby

slide-141
SLIDE 141

CS6620 Spring 08

Stratton