Adding Slow-Silent Virtual Channels for Low-Power On-Chip Networks

Adding Slow-Silent Virtual Channels for Low-Power On-Chip Networks. Hiroki Matsutani (Keio Univ, Japan), Michihiro Koibuchi (NII, Japan), Daihan Wang (Keio Univ, Japan), Hideharu Amano (Keio Univ, Japan)


  1. Adding Slow-Silent Virtual Channels for Low-Power On-Chip Networks
     Hiroki Matsutani (Keio Univ, Japan), Michihiro Koibuchi (NII, Japan), Daihan Wang (Keio Univ, Japan), Hideharu Amano (Keio Univ, Japan)

  2. I am very sorry…
     • My flight was canceled on April 6.
     • I waited at the airport for seven hours for rebooking, but I couldn't get a ticket. I got a fever.
     • I arrived at Newcastle on April 7.
     • I couldn't find my baggage; I wore only a shirt.
     • My hotel reservation was canceled without asking; I didn't have a place to sleep…
     • I went to another hotel to book a room in my shirt sleeves in the rain. My fever went up.
     • Ms. Jerder kindly gave her presentation on Apr 8.
     • I would like to thank her and the ASYNC/NOCS program committee.

  3. Adding Slow-Silent Virtual Channels for Low-Power On-Chip Networks
     [Title annotations: "Voltage and frequency scaling", "Power gating"]
     Hiroki Matsutani (Keio Univ, Japan), Michihiro Koibuchi (NII, Japan), Daihan Wang (Keio Univ, Japan), Hideharu Amano (Keio Univ, Japan)

  4. Introduction: Area and power
     • Due to finer process technology,
       – Area constraint is relaxed
       – But power density becomes more serious
     • Adding extra hardware resources (e.g., VCs)
       – We can get a performance margin; so
       – We can reduce voltage and frequency to reduce power
     [Figure: Routers (a), (b), (c), each with VC#0, VC#1, VC#2]
     Issues to be tackled in this presentation:
     • Adding extra hardware increases the leakage power
     • How much resource is required to minimize total power

  5. Outline: Slow-silent virtual channels
     • Network-on-Chip (NoC)
     • On-Chip Router
       – Architecture and its power consumption
     • Slow-silent virtual channels
       – Voltage and frequency scaling
       – Run-time power gating of virtual channels
       – Adaptive VC activation
     • Evaluations (1VC, 2VC, 3VC, and 4VC)
       – Throughput
       – Power consumption (with PG & voltage freq scaling)
       – How many VCs are required to minimize power

  6. Network-on-Chip (NoC)
     • Processor core
       – Largest component
       – Various low-power techniques are used, e.g., standby current 11uA [Ishikawa,IEICE’05]
     • On-chip router
       – Area is not so large
       – Always active, ready for packet injection
     [Figure: An example tile architecture (processor core and router)]
     The next slides show "Router architecture" and "Its power" (ASPLA 90nm CMOS)

  7. On-Chip Router: Architecture
     • 5-input 5-output router (data width is 64-bit)
     • Each physical channel has 2 VCs
     • Each VC has a FIFO buffer (4 x 64 bits)
     [Figure: router block diagram with ports X+, X-, Y+, Y-, and CORE, a FIFO per VC, an ARBITER, and a 5x5 XBAR]
     HW amount is 34 kilo gates, and 64% of the area is used for FIFOs

  8. On-Chip Router: Pipeline
     • A header flit goes through a router in 3 cycles
       – RC (Routing computation)
       – VSA (Virtual channel / Switch allocation)
       – ST (Switch traversal)
     • A packet consists of a header and 3 data flits
     • E.g., packet transfer from router A to C
     [Figure: pipeline timing chart for HEAD, DATA 1, DATA 2, and DATA 3 through routers A, B, and C; x-axis: elapsed time [cycle], 1 to 12]
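
The timing in this example can be reproduced with a small model. The following Python is a minimal sketch, not the authors' simulator: it assumes zero link delay between routers and that each data flit follows the header's ST stage one cycle apart. The function names `flit_timeline` and `packet_latency` are hypothetical.

```python
def flit_timeline(num_routers=3, data_flits=3, stages=("RC", "VSA", "ST")):
    """Map each flit name to a list of (cycle, router_index, stage) events."""
    depth = len(stages)  # cycles a header flit spends in one router
    timeline = {"HEAD": []}
    # The header flit passes through every pipeline stage at each router.
    for r in range(num_routers):
        for s, stage in enumerate(stages):
            timeline["HEAD"].append((r * depth + s + 1, r, stage))
    # Assumption: data flit i only needs ST and trails the header's ST by i cycles.
    for i in range(1, data_flits + 1):
        timeline[f"DATA {i}"] = [
            ((r + 1) * depth + i, r, "ST") for r in range(num_routers)
        ]
    return timeline


def packet_latency(num_routers=3, data_flits=3):
    """Cycle in which the last flit leaves the last router."""
    timeline = flit_timeline(num_routers, data_flits)
    return max(cycle for events in timeline.values() for cycle, _, _ in events)
```

With the defaults (3 routers, 1 header plus 3 data flits), `packet_latency()` returns 12, which agrees with the 12-cycle time axis of the slide's chart.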

  9. On-Chip Router: Power consumption
     • Place-and-routed with 90nm CMOS
     • Post-layout simulation at 200MHz
     [Figure: Power consumption of a router when n ports are used [mW]]
     A router consumes more power as it processes more packets
     Packet switching power is large → Voltage and frequency scaling

  10. On-Chip Router: Power consumption
     Packet switching power is large → Voltage and frequency scaling
     Power consumption when no port is used → standby power
     [Figure: Standby power of the on-chip router: Leakage (55.0%), Dynamic (45.0%)]
     Channels (49.4%) are the largest component (leakage of channel buffers) → Run-time power gating

  11. Outline: Slow-silent virtual channels
     • Network-on-Chip (NoC)
     • On-Chip Router
       – Architecture and its power consumption
     • Slow-silent virtual channels
       – Voltage and frequency scaling
       – Run-time power gating of virtual channels
       – Adaptive VC activation
     • Evaluations (1VC, 2VC, 3VC, and 4VC)
       – Throughput
       – Power consumption (with PG & voltage freq scaling)
       – How many VCs are required to minimize power

  12. Slow-Silent Virtual Channels
     • Adding extra VCs
       – Performance improves
       – We get a performance margin
       – We can reduce voltage and frequency
     [Figure: Latency vs. accepted traffic for 1-VC, 2-VC, 3-VC, and 4-VC]
     • Voltage & frequency scaling (VFS)
       – Set the reduced voltage and frequency
       – In response to the performance margin
       $P_{\text{switching}} = a \cdot C \cdot f \cdot V^{2}$
       $f \propto \dfrac{(V - V_{th})^{\alpha}}{C V}$
     • Problem
       – Adding extra VCs increases leakage power
       – It may overwhelm VFS
     We focus on run-time power gating of VCs to reduce leakage
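
To see how the two VFS relations interact, the sketch below evaluates them relative to a baseline supply voltage. It is an illustration only: the 1.00 V baseline comes from the evaluation slide, while the threshold voltage (0.3 V), the exponent α (1.3), and all function names are assumptions that do not appear in the slides.

```python
def relative_frequency(v, v0=1.0, v_th=0.3, alpha=1.3):
    """Maximum frequency at voltage v relative to v0, using f ∝ (V - Vth)^alpha / (C*V).

    v_th and alpha are illustrative assumptions; C cancels out in the ratio.
    """
    if v <= v_th:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return ((v - v_th) ** alpha / v) / ((v0 - v_th) ** alpha / v0)


def relative_switching_power(v, v0=1.0, v_th=0.3, alpha=1.3):
    """P = a*C*f*V^2 at the maximum frequency for v, relative to the baseline."""
    return relative_frequency(v, v0, v_th, alpha) * (v / v0) ** 2


def min_voltage_for(freq_ratio, v0=1.0, v_th=0.3, alpha=1.3, tol=1e-6):
    """Lowest voltage whose maximum frequency is at least freq_ratio of the baseline."""
    if not 0.0 < freq_ratio <= 1.0:
        raise ValueError("freq_ratio must be in (0, 1]")
    lo, hi = v_th + tol, v0
    # Bisection works because relative_frequency increases with v when alpha >= 1.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if relative_frequency(mid, v0, v_th, alpha) >= freq_ratio:
            hi = mid
        else:
            lo = mid
    return hi
```

For example, if extra VCs give enough margin to run at half the baseline frequency, `min_voltage_for(0.5)` gives the reduced voltage under these assumed parameters, and `relative_switching_power` of that voltage gives the corresponding switching-power ratio.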

  13. Power Gating of virtual channels
     • Run-time power gating of virtual channels
       – No packets in a VC → Sleep (turn off the power supply)
       – Packet arrives at the VC → Wakeup (turn on the power)
     [Figure: router block diagram (ARBITER, 5x5 XBAR, ports X+, X-, Y+, Y-, CORE) with idle VC FIFOs marked "sleep"]

  14. Power Gating of virtual channels
     • Run-time power gating of virtual channels
       – No packets in a VC → Sleep (turn off the power supply)
       – Packet arrives at the VC → Wakeup (turn on the power)
     [Figure: router block diagram (ARBITER, 5x5 XBAR, ports X+, X-, Y+, Y-, CORE) with idle VC FIFOs marked "sleep"]
     Link shutdown has been studied for on- & off-chip networks, but prior work uses SRAM buffers [Chen,ISLPED’03] [Soteriou,TPDS’07]
     → We use small register-based FIFOs for light-weight NoC routers

  15. Power Gating: Various overheads
     • Area overhead
       – Power switches
     • Performance overhead
       – Wakeup delay
       – Pipeline stall is caused
       → Frequent on/off should be avoided
     • Power overhead
       – Driving power switches
       – Short sleeps adversely increase dynamic power
       → Frequent on/off should be avoided
     [Figure: a sleeping FIFO and an active FIFO; a pipeline stall of the router occurs while waiting for channel wakeup]

  16. Power Gating: Various overheads
     • Area overhead
       – Power switches
     • Performance overhead
       – Wakeup delay
       – Pipeline stall is caused
       → Frequent on/off should be avoided
     • Power overhead
       – Driving power switches
       – Short sleeps adversely increase dynamic power
       → Frequent on/off should be avoided
     [Figure: a sleeping FIFO and an active FIFO; a pipeline stall of the router occurs while waiting for channel wakeup]
     [Figure: power switch driven by the sleep signal between Vdd and virtual Vdd; the circuit block sits between virtual Vdd and GND]
     We need a control that gradually activates VCs in response to workload
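
The power overhead above implies a breakeven sleep length: a sleep only pays off if the leakage energy it saves exceeds the energy spent driving the power switches. The sketch below illustrates that arithmetic. The energy units and values are hypothetical, and the function names are not taken from the slides.

```python
import math


def breakeven_cycles(overhead_energy, leak_energy_per_cycle):
    """Minimum number of sleep cycles needed to recover one sleep/wakeup overhead.

    Both arguments use the same (arbitrary) energy unit; values are hypothetical.
    """
    if overhead_energy < 0 or leak_energy_per_cycle <= 0:
        raise ValueError("energies must be positive")
    return math.ceil(overhead_energy / leak_energy_per_cycle)


def net_energy_saved(idle_cycles, overhead_energy, leak_energy_per_cycle):
    """Leakage saved during the sleep minus the switching overhead (negative = loss)."""
    return idle_cycles * leak_energy_per_cycle - overhead_energy
```

With an assumed overhead of 10 units and a leakage saving of 2 units per cycle, sleeps shorter than 5 cycles lose energy, which is why frequent on/off transitions should be avoided.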

  17. Power Gating: VC activation policy
     • Virtual channel (VC) level power gating
     • Virtual-channel selection:
       – All packets use VC#0 when they are injected into the NoC
       – The VC number is increased when the packet conflicts
     Only VC#0 is used if the workload is low
     [Figure: Routers (a), (b), (c), each with VC#0, VC#1, VC#2]

  18. Power Gating: VC activation policy
     • Virtual channel (VC) level power gating
     • Virtual-channel selection:
       – All packets use VC#0 when they are injected into the NoC
       – The VC number is increased when the packet conflicts
     All VCs are activated if the workload is high
     [Figure: Routers (a), (b), (c), each with VC#0, VC#1, VC#2]
     High peak performance of VCs with the least leakage power
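
The selection rule can be sketched as a lowest-free-VC search. This is one possible reading of the policy, not the authors' implementation: the `busy` list, the `current_vc` argument, and the function name `select_vc` are hypothetical.

```python
def select_vc(busy, current_vc=0):
    """Pick the lowest-numbered free VC at or above current_vc.

    busy[i] is True when VC#i of the requested channel is occupied.
    Returns the chosen VC number, or None if the packet has to wait.
    """
    for vc in range(current_vc, len(busy)):
        if not busy[vc]:
            return vc
    return None


# Low workload: VC#0 is free, so higher VCs are never touched and can stay asleep.
low_load_choice = select_vc([False, False, False])
# Conflict on VC#0: the packet moves up to VC#1, which then has to be woken.
conflict_choice = select_vc([True, False, False])
```

Because packets always try the lowest VC first, higher-numbered VCs receive traffic only under contention, so they remain power-gated at low load.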

  19. Power Gating: Routing design
     • A virtual-channel layer
       – A virtual network consisting of VCs with the same VC#
     • Deadlock-freedom [Duato,TPDS’93] [Koibuchi,ICPP’03]
       – Packets move from upper to lower layers: VC#0 → VC#1 → VC#2 → VC#3
       – Only the bottom layer must guarantee deadlock-freedom
     All VC layers except the bottom can employ any routing, as long as the bottom layer guarantees deadlock-freedom by itself
     [Figure: Routers (a), (b), (c), each with VC#0 to VC#3, forming VC Layers #0 to #3]
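
The layer-transition rule can be expressed as a simple check on the VC numbers a packet uses hop by hop. The sketch below assumes, from the arrow order on the slide, that the bottom layer is the highest VC number. The function names and the returned labels are hypothetical.

```python
def is_legal_vc_path(vc_per_hop, num_vcs=4):
    """True if every VC number is in range and the sequence never decreases."""
    in_range = all(0 <= vc < num_vcs for vc in vc_per_hop)
    non_decreasing = all(a <= b for a, b in zip(vc_per_hop, vc_per_hop[1:]))
    return in_range and non_decreasing


def routing_constraint(vc, num_vcs=4):
    """Routing requirement for a VC layer (assumes the bottom layer is VC#(num_vcs - 1))."""
    if not 0 <= vc < num_vcs:
        raise ValueError("VC number out of range")
    return "deadlock-free" if vc == num_vcs - 1 else "any"
```

A path such as `[0, 0, 1, 3]` is legal because it only moves toward the bottom layer, while `[0, 2, 1]` is rejected because it moves back up.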

  20. Outline: Slow-silent virtual channels
     • Network-on-Chip (NoC)
     • On-Chip Router
       – Architecture and its power consumption
     • Slow-silent virtual channels
       – Voltage and frequency scaling
       – Run-time power gating of virtual channels
       – Adaptive VC activation
     • Evaluations (1VC, 2VC, 3VC, and 4VC)
       – Throughput
       – Power consumption (with PG & voltage freq scaling)
       – How many VCs are required to minimize power

  21. Evaluations of slow-silent VCs
     • Preliminary
       – Leakage modeling of PG
       – Breakeven point of PG
     • Process technology
       – ASPLA 90nm CMOS
       – 1.00V (baseline)
     • Simulation parameters
       – Topology: 2-D Mesh (8x8)
       – Routing: DOR (XY routing)
       – Buffer size: 4-flit (WH switching)
       – # of VCs: 1VC, 2VC, 3VC, 4VC
       – Latency: 3-cycle per 1-hop
     • Evaluation items
       – Original throughput
       – Power consumption w/o PG and VFS
       – Power consumption w/ PG and VFS
     • Traffic patterns
       – Uniform + NPB traces (BT, SP, CG, MG, IS)
     • Which is the best?
       – 1VC, 2VC, 3VC, and 4VC
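
For reference, dimension-order (XY) routing on the 8x8 mesh can be sketched as follows. This is the textbook form of the algorithm, not code from the authors' simulator. The coordinate convention and the function name `xy_route` are assumptions.

```python
def xy_route(src, dst, size=8):
    """Return the list of (x, y) routers visited from src to dst under XY routing."""
    for x, y in (src, dst):
        if not (0 <= x < size and 0 <= y < size):
            raise ValueError("coordinates must lie inside the mesh")
    (x, y), (dst_x, dst_y) = src, dst
    path = [(x, y)]
    # Route along the X dimension first...
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    # ...then along the Y dimension.
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path
```

The number of links traversed is `len(xy_route(src, dst)) - 1`; combined with the 3-cycle per-hop latency it gives a rough zero-load header latency, depending on how a hop is counted.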
