Energy Efficiency in Graphics Rendering - PowerPoint PPT Presentation


SLIDE 1

Energy Efficiency in Graphics Rendering

Preeti Ranjan Panda
Department of Computer Science and Engineering
Indian Institute of Technology Delhi

Presentation at TU Dortmund, June 2011

SLIDE 2

Graphics Power Consumption

[Pie charts: system power breakdown for a desktop computer (Monitor 56%, Chipset 13%, CPU 7%, Graphics 6%, Power Supply Loss 22%, ...) and a mobile computer (14.1" LCD 33%, Graphics 14%, CPU 9%, HDD/DVD 7%, ...)]

[Ref: PC Energy-Efficiency Trends and Technology, source: intel.com]

B. V. N. Silpa and P. R. Panda, 2011

SLIDE 3

Observation

GPU/graphics rendering power is significant (greater than CPU), yet there has been very little research on GPU energy efficiency.

  • GPU performance was/is the primary design goal
  • Proprietary GPU architectures
SLIDE 4

Graphics Pipeline

From CPU: Command processor, Vertex processor, Clipping, Setup and Rasterize, Fragment processor, Image composition, Display. Texture memory feeds the fragment stage.

  • Command processor: receives vertices and commands from the CPU
  • Vertex processor: transforms vertices to screen space and lights the scene
  • Clipping: deletes unseen parts of the scene
  • Setup and Rasterize: generates pixels
  • Fragment processor: pixel coloring and Z-test
  • Image composition: blends with the frame buffer
SLIDE 5

Adding Energy Efficiency

IIT Delhi – Intel Collaboration

[Same graphics pipeline diagram as Slide 4, annotated with the two optimization points]

  • Component level: TEXTURE MAPPING (texture memory in the fragment stage)
  • System level: DVFS (across the whole pipeline)
SLIDE 6

LOW POWER TEXTURE MAPPING

[ICCAD'08]
SLIDE 7

Power Profile of the Pipeline

[Bar chart: normalized energy breakdown (transform and lighting, setup and rasterize, texture memory, fragment processing, frame buffer write) for the City, Fire, Teapot, and Tunnel benchmarks]

Texture memory consumes 30-40% of total power.

SLIDE 8

Texture Mapping

Adds detail and surface texture to an object, reducing the modeling effort for the programmer.

[Figure: Object + Texture = Texture-Mapped Object]
SLIDE 9

Texture Filtering

Texture space and object space can be at arbitrary angles to each other.

  • Nearest neighbor
  • Bilinear interpolation: weighted average of the four texels nearest to the pixel center
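The bilinear weighting above can be sketched as follows. This is a minimal illustration, not the hardware datapath; `texels` is a hypothetical 2D array indexed [row][column] in texel coordinates.

```python
import math

def bilerp(texels, u, v):
    """Bilinear filtering: weighted average of the four texels
    nearest the sample point (u, v), given in texel coordinates."""
    tx, ty = int(math.floor(u)), int(math.floor(v))  # top-left texel of the 2x2 footprint
    fx, fy = u - tx, v - ty                          # fractional position inside it
    top = texels[ty][tx] * (1 - fx) + texels[ty][tx + 1] * fx
    bot = texels[ty + 1][tx] * (1 - fx) + texels[ty + 1][tx + 1] * fx
    return top * (1 - fy) + bot * fy
```

With `texels = [[0, 1], [2, 3]]`, sampling at the exact center (0.5, 0.5) averages all four values to 1.5.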

SLIDE 10

Texture Access Pattern

Texture mapping exhibits high spatial and temporal locality:

  • Bilinear filtering requires the 4 neighbouring texels (tx,ty), (tx+1,ty), (tx,ty+1), (tx+1,ty+1) around the pixel center
  • Neighbouring pixels map to spatially local texels
  • Repetitive textures
SLIDE 11

Blocking and Texture Cache

Blocked representation:

  • Texels stored as 4x4 blocks
  • Reduces dependency on texture orientation, and exploits spatial locality

Texture memory is accessed through a cache hierarchy ("TEXTURE CACHE"). A familiar architectural space, BUT application knowledge could help improve the HW over a "standard cache".
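As an illustration of the blocked layout, a texel coordinate can be mapped to a block index plus an offset inside its 4x4 block. This is a sketch with invented names, assuming row-major ordering of blocks and of texels within a block.

```python
BLOCK = 4  # texels are stored as 4x4 blocks

def block_addr(tx, ty, blocks_per_row):
    """Map texel (tx, ty) to (linear block index, offset within block)."""
    bx, by = tx // BLOCK, ty // BLOCK             # block coordinates
    offset = (ty % BLOCK) * BLOCK + (tx % BLOCK)  # row-major position inside the block
    return by * blocks_per_row + bx, offset
```

Any texel orientation walks through nearby blocks, which is what makes the layout insensitive to texture orientation.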
SLIDE 12

Predictability in Texture Accesses

  • Access to the first texel gives information about accesses to the next 3 texels
  • The four texels (tx,ty) .. (tx+1,ty+1) could be mapped to either one, two, or four neighbouring blocks (Cases 1-4)

[Figure: pixel center and its four texels (tx,ty), (tx+1,ty) in blocks (bx,by), (bx1,by); (tx,ty+1), (tx+1,ty+1) in blocks (bx,by1), (bx1,by1); Cases 1-4]
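The number of blocks a bilinear footprint touches follows from whether (tx,ty)..(tx+1,ty+1) crosses a block boundary in each direction. A sketch; which split direction the deck labels Case 2 versus Case 3 is our assumption, so only the block count is returned.

```python
def blocks_touched(tx, ty, block=4):
    """How many 4x4 blocks the bilinear footprint at (tx, ty) spans:
    1 (case 1), 2 (cases 2 and 3), or 4 (case 4)."""
    split_x = (tx // block) != ((tx + 1) // block)  # crosses a vertical block boundary
    split_y = (ty // block) != ((ty + 1) // block)  # crosses a horizontal block boundary
    return (1 + split_x) * (1 + split_y)
```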
SLIDE 13

Low Power Texture Memory Architecture

A lower-power memory architecture than a cache for texturing:

  • Use a few registers to filter accesses to blocks expected to be reused
  • Access stream has predictability: a controlled access mechanism reduces tag lookups
SLIDE 14

How many blocks to buffer?

Need to buffer up to 4 blocks.

  • A buffer is a set of 4x4 registers, each 32 bits
  • A Texture Buffer Array is a group of 4 such buffers
SLIDE 15

Texture Lookup

Case 1:
  • Lookup (block0)
  • Get the 4 texels from the block using offsets
  • SAVING: 3 LOOKUPS

Cases 2 & 3:
  • Lookup (block 0)
  • Get texel0 and texel1 from this block
  • Lookup (block 2)
  • Get texel2 and texel3 from this block
  • SAVING: 2 LOOKUPS
SLIDE 16

Contd..

Case 4:
  • Lookup all 4 blocks and get the texels from the respective blocks using offsets

Power savings from:
  • Reduced tag lookups
  • Smaller buffer than a cache
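Putting the four cases together: the scheme does one tag lookup per distinct block, versus the four a texel-per-lookup cache would need. An illustrative sketch:

```python
def tag_lookups(case):
    """Tag lookups done and saved per bilinear access, by case:
    one lookup per distinct block instead of one per texel."""
    blocks = {1: 1, 2: 2, 3: 2, 4: 4}[case]
    return blocks, 4 - blocks  # (lookups done, lookups saved)
```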
SLIDE 17

Distribution of accesses among the four cases

[Bar chart: fraction of texture accesses falling into case 1 through case 4]

The average number of comparisons per access is 1.38 instead of 4.
SLIDE 18

Architecture of Texture Filter Memory

[Block diagram: a controller (Load Hit, Enable, Cur Level, Bank Sel, R/W signals) in front of two 256-byte banks (Bank-I, Bank-II) with 512-bit block rows; address comparators (Addr Comp-I, Addr Comp-II) with registers and an encoder match the incoming Block Addr against the Texture Buffer Array; a Texel Fetch Unit selects the 32-bit texel via the offset; misses are served from the L1 cache]

SLIDE 19

Hit Rate into TFM

[Bar chart: hit rates for Fire, Teapot, Tunnel, Gloss, Gearbox, Sphere under a 16KB 2-way associative cache, a 512B direct mapped cache, a 512B fully associative cache, and TFM]

TFM gives a 4.5% better hit rate than a direct mapped filter of the same size.

SLIDE 20

Energy per Access

[Bar chart: energy per access (nJ) for Fire, Teapot, Tunnel, Gloss, Gearbox, Sphere under a 16KB 2-way associative L1, 512B direct mapped L1, 512B direct mapped filter, 512B fully associative filter, and TFM]

TFM consumes 75% less energy than the conventional texture cache.

SLIDE 21

Texture Filter Memory Summary

In addition to high spatial locality, the texture mapping access pattern also has predictability.

Replaced high-energy cache lookups with low-energy register buffer reads.

TFM consumes ~75% less energy than a conventional texture mapping system.

Overheads:
  • TFM access 4x faster than cache access
  • 0.48% area overhead over the texture cache subsystem
SLIDE 22

DYNAMIC VOLTAGE AND FREQUENCY SCALING (DVFS)

[CODES+ISSS'10]
SLIDE 23

Tiled Graphics Rendering

[Figure: a frame divided into Tiles 1-4, with primitives A, B, C sorted into per-tile Bins 1-4 (e.g. Bin 1: A,B; Bin 2: C)]

Geometry Pipeline: Vertex Shader, Primitive Assembly, Clipping (Geometry Processing), then Tiling. Pixel Pipeline: Setup & Rasterize, Pixel Shading, Raster Operations (Pixel Processing).
SLIDE 24

Workload of games

Different games have significant but gradual workload variation within a game.

[Line plots: cycles per frame over frame numbers 1-99 for two games, showing gradual variation]
SLIDE 25

Spatial Correlation in frames

Continuity of motion leads to frame-level spatial correlation, resulting in slow workload variation.
SLIDE 26

Temporal Correlation of Tile Workload

Many tiles are correlated, even if the workloads of consecutive frames differ: 80% of tiles are within a 10% difference.
SLIDE 27

Dynamic Voltage and Frequency Scaling

[Figure: predicted workload runs Tiles 1 and 2 at V/2 over time 2T; if tile #1 is over-predicted (finishes at T/2), slow tile #2 down to V/3; if tile #1 is under-predicted (finishes at 3T/2), speed tile #2 up to V]

Continuously track and take corrective action after rendering each tile.
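The per-tile corrective step can be sketched as picking the lowest frequency that still fits the remaining predicted work into the remaining time budget. Names are illustrative; the slide only gives the V/2, V/3, V example.

```python
def next_frequency(remaining_cycles, remaining_time, f_max):
    """Frequency for the next tile: cycles still predicted, divided by
    time left in the frame budget, clamped to the maximum frequency."""
    if remaining_time <= 0:
        return f_max  # behind schedule: run at full speed
    return min(remaining_cycles / remaining_time, f_max)
```

Over-prediction leaves extra time (frequency drops, as in V/2 to V/3); under-prediction leaves too little (frequency rises toward V).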
SLIDE 28

Frame Rank (R_G, R_p, R_t, R_r)

Vertex processing workload of a primitive of V vertices using a shader N_v instructions long:

  • Shader workload ~ V * N_v
  • Clipping and binning ~ V

R_G = Σ_Batches (VertexCount × VertexShaderLength + PrimitiveCount)

Pixel shading workload:

  • Number of pixels per primitive ~ area of the bounding box of the primitive

R_p = Σ_Batches Σ_Primitives (PrimitiveArea × PixelShaderLength)
SLIDE 29

Frame Rank (R_G, R_p, R_t, R_r)

Texture mapping workload:

  • Texture footprint: number of texels to be filtered per pixel

R_t = Σ_Batches Σ_Primitives (PrimitiveArea × TextureCount × TextureFootPrint)

Raster operations workload:

  • Each raster operation results in a read and a write to the frame buffer

R_r = Σ_Batches Σ_Primitives (PrimitiveArea × RasterOps × 2)
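The four rank estimates can be sketched directly from the formulas above. Field names are ours, not the deck's: each batch carries its vertex data and a list of primitives.

```python
def frame_ranks(batches):
    """Compute (R_G, R_p, R_t, R_r). Each batch: vertex_count,
    vertex_shader_len, primitives; each primitive: bbox_area,
    pixel_shader_len, texture_count, texture_footprint, raster_ops."""
    r_g = r_p = r_t = r_r = 0
    for b in batches:
        # geometry: shader work per vertex, plus clipping/binning per primitive
        r_g += b["vertex_count"] * b["vertex_shader_len"] + len(b["primitives"])
        for p in b["primitives"]:
            r_p += p["bbox_area"] * p["pixel_shader_len"]
            r_t += p["bbox_area"] * p["texture_count"] * p["texture_footprint"]
            r_r += p["bbox_area"] * p["raster_ops"] * 2  # read + write per raster op
    return r_g, r_p, r_t, r_r
```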

SLIDE 30

Tile Rank (T_p, T_t, T_r)

Tile rank computation is similar to frame rank computation.

Pixel count is computed as the overlap area of the primitive's bounding box and the tile (approximate number of pixels).
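The overlap computation is a standard rectangle intersection. A sketch, with rectangles given as (x0, y0, x1, y1):

```python
def overlap_area(bbox, tile):
    """Approximate a primitive's pixel count within a tile as the
    intersection area of its bounding box with the tile rectangle."""
    w = min(bbox[2], tile[2]) - max(bbox[0], tile[0])
    h = min(bbox[3], tile[3]) - max(bbox[1], tile[1])
    return max(w, 0) * max(h, 0)  # zero when the rectangles do not overlap
```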
SLIDE 31

Rank Based DVFS Scheme

Divide the tiles into a set of Heavy tiles (tile rank in the current frame greater than its rank in the previous frame) and Light tiles.

Frame_Rank(current) > Frame_Rank(previous)?
  • Yes: process Heavy tiles at frequency F = F_Max
  • No: process Heavy tiles at a frequency determined by frame history
  • Use the tile-history-based scheme for light tiles
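The flowchart's decision logic, as a sketch; the history-derived frequencies are assumed to be computed elsewhere, and the names are illustrative.

```python
def tile_frequency(frame_rank_cur, frame_rank_prev, is_heavy,
                   f_max, f_frame_hist, f_tile_hist):
    """Pick a tile's frequency per the rank-based DVFS flowchart."""
    if not is_heavy:
        return f_tile_hist  # light tiles: tile-history-based scheme
    if frame_rank_cur > frame_rank_prev:
        return f_max        # heavy tiles in a heavier frame: max frequency
    return f_frame_hist     # otherwise: frequency from frame history
```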

SLIDE 32

Tile Rank Based DVFS Summary

Tile Rank Based DVFS gives 75% better performance than the history-based scheme.

Energy/FrameRate is minimum for the Tile Rank based DVFS scheme.

Overheads:
  • < 0.01% computation
  • < 0.01% storage
SLIDE 33

Future Work

  • Extension to multi-core GPUs
  • Other stages of the graphics pipeline
SLIDE 34

Thank You!