INTERCONNECTION NETWORKS
Mahdi Nazm Bojnordi, Assistant Professor


SLIDE 1

INTERCONNECTION NETWORKS

CS/ECE 7810: Advanced Computer Architecture

Mahdi Nazm Bojnordi

Assistant Professor, School of Computing, University of Utah

SLIDE 2

Overview

• Upcoming deadline
  - Feb. 3rd: project group formation
  - No groups have sent me emails!

• This lecture
  - Cache interconnects
  - Basics of interconnection networks
  - Network topologies
  - Flow control

SLIDE 3

Where Are Interconnects Used?

• About 60% of the dynamic power in modern microprocessors is dissipated in on-chip interconnects [Magen’04]

[Figure: Intel Core i7; six processor cores, 8MB last-level cache]
SLIDE 4

Cache Interconnect Optimizations

SLIDE 5

Large Cache Organization

• Fewer subarrays give increased area efficiency, but larger delay due to longer wordlines/bitlines [Aniruddha’09]

[Figure: H-tree and subarray organization with NDWL = 4, NDBL = 4; interconnect linking the cache and processor cores]

SLIDE 6

Large Cache Energy Consumption

• The H-tree is clearly the dominant component of energy consumption [Aniruddha’09]

[Figure: energy breakdown; H-tree ≈ 90%, remainder split among the decoder, wordlines, bitline mux & drivers, senseamp mux & drivers, bitlines, sense amplifiers, and sub-array output drivers]

SLIDE 7

Heterogeneous Interconnects

• Global wire management at the microarchitecture level
• A heterogeneous interconnect comprised of wires with varying latency, bandwidth, and energy characteristics

[Balasubramonian’05]

SLIDE 8

Heterogeneous Interconnects

• Better energy efficiency for a dynamically scheduled partitioned architecture
  - ED² is reduced by 11%
• A low-latency, low-bandwidth network can be effectively used to hide wire latencies and improve performance
• A high-bandwidth, low-energy network and an instruction assignment heuristic are effective at reducing contention cycles and total processor energy

[Balasubramonian’05]

SLIDE 9

Non-Uniform Cache Architecture

• NUCA optimizes energy and time based on the proximity of cache blocks to the cache controller [Kim’04]
  - 2MB @ 130nm: bank access time = 3 cycles, interconnect delay = 8 cycles
  - 16MB @ 50nm: bank access time = 3 cycles, interconnect delay = 44 cycles

SLIDE 10

Non-Uniform Cache Architecture

• S-NUCA-1
  - Uses a private channel per bank
  - Each bank has its own distinct access latency
  - Data location is decided statically from the address
  - Average access latency = 34.2 cycles
  - Wire overhead = 20.9% → an issue

[Figure: S-NUCA-1 organization; tag array, per-bank data and address buses, banks divided into sub-banks, predecoder, sense amplifiers, wordline drivers and decoders]

[Kim’04]
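The static mapping above can be sketched as a toy model in Python; the function names, bank count, line size, and per-hop delay below are illustrative assumptions, not the [Kim’04] configuration:

```python
def snuca_bank(addr, n_banks=32, line_bytes=64):
    """S-NUCA-style static mapping sketch: the bank that holds a block
    is a fixed function of its address, so no search is needed."""
    return (addr // line_bytes) % n_banks

def snuca_latency(bank, bank_access=3, cycles_per_hop=2):
    """Access latency grows with distance; banks are assumed to be
    numbered by their hop count from the cache controller."""
    return bank_access + cycles_per_hop * bank
```

Because every address always maps to the same bank, each bank's access latency is fixed and known in advance, which is what makes the scheme "static".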

SLIDE 11

Non-Uniform Cache Architecture

• S-NUCA-2
  - Uses a 2D switched network to alleviate the wire area overhead
  - Average access latency = 24.2 cycles
  - Wire overhead = 5.9%

[Figure: S-NUCA-2 organization; banks connected by switches and data buses, tag array, wordline drivers and decoders, predecoder]

[Kim’04]

SLIDE 12

Non-Uniform Cache Architecture

• Dynamic NUCA
  - Data can dynamically migrate between banks
  - Frequently used cache lines move closer to the CPU

[Kim’04]

[Figure: 8 bank sets × 4 ways; one set spans one bank in each way]
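Gradual migration can be sketched as a one-step promotion on every hit; this is an illustrative model only ([Kim’04] evaluates several migration policies), with `ways[0]` assumed to be the bank closest to the controller:

```python
def dnuca_promote(ways, hit_way):
    """D-NUCA gradual-migration sketch: on a hit, swap the accessed
    line with the line one bank closer to the controller, so hot
    lines drift toward the fast banks over repeated accesses."""
    if hit_way > 0:
        ways[hit_way - 1], ways[hit_way] = ways[hit_way], ways[hit_way - 1]
    return ways
```

Promoting one step at a time (rather than jumping straight to the closest bank) keeps a single unlucky access from evicting a genuinely hot line from the fast banks.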

SLIDE 13

Non-Uniform Cache Architecture

• Fair mapping
  - The average access time is equal across all bank sets

[Figure: fair mapping across 8 bank sets × 4 ways]

SLIDE 14

Non-Uniform Cache Architecture

• Shared mapping
  - The closest banks are shared among the farther bank sets

[Figure: shared mapping across 8 bank sets × 4 ways]

SLIDE 15

Encoding Based Optimizations

SLIDE 16

Cache Interconnect Optimizations

• Bus-invert coding transfers either the data or its complement to minimize the number of bit flips on the bus [Stan’95]

[Table: example old/new data patterns showing how inversion reduces the number of flipped bits]

Dynamic switching power: P = α · C · V_DD² · f
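The scheme can be sketched in a few lines of Python; the function names and the 8-bit bus width are illustrative assumptions, not the [Stan’95] implementation:

```python
def bus_invert_encode(prev_bus, new_data, width=8):
    """Bus-invert coding sketch: if sending new_data would flip more
    than half of the bus wires, send its complement instead and raise
    an extra 'invert' control line."""
    mask = (1 << width) - 1
    flips = bin((prev_bus ^ new_data) & mask).count("1")
    if flips > width // 2:
        return (~new_data) & mask, 1   # send complement, invert = 1
    return new_data & mask, 0          # send as-is, invert = 0

def bus_invert_decode(bus_value, invert, width=8):
    mask = (1 << width) - 1
    return (~bus_value) & mask if invert else bus_value
```

Capping the flips at width/2 (plus the invert line) lowers the activity factor α in the switching-power expression above.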

SLIDE 17

Time-Based Data Transfer

• The percentage of processor energy expended on an 8MB cache when running a set of parallel applications on a Sun Niagara-like multicore processor [Bojnordi’13]

[Figure: relative CPU energy per application]

SLIDE 18

Time-Based Data Transfer

• Communication over the long, capacitive H-tree interconnect is the dominant source of energy consumption (80% on average) in the L2 cache [Bojnordi’13]

[Figure: relative cache energy breakdown]

SLIDE 19

Time-Based Data Transfer

• Key idea: represent information by the number of clock cycles between two consecutive pulses to reduce the interconnect activity factor [Bojnordi’13]

[Figure: example transmitting the value 5 over cycles 1-5, comparing parallel, serial, and time-based data transfer; annotations: fixed dynamic energy, fixed transfer time]
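A minimal behavioral model of the key idea; this is a sketch, not the actual DESC circuitry, and the waveform representation and function names are assumptions:

```python
def time_encode(value):
    """Time-based transfer sketch: represent 'value' as the number of
    clock cycles between two pulses on a single wire. Returns the wire
    waveform as a list of per-cycle logic levels."""
    return [1] + [0] * value + [1]

def time_decode(waveform):
    """Recover the value as the gap between the two pulses."""
    first = waveform.index(1)
    second = waveform.index(1, first + 1)
    return second - first - 1

def transitions(waveform, start_level=0):
    """Count wire transitions, the quantity that sets dynamic energy."""
    prev, count = start_level, 0
    for level in waveform:
        count += (level != prev)
        prev = level
    return count
```

The point of the model: the transition count is fixed per symbol regardless of the value sent, whereas a parallel binary bus may flip up to one wire per bit; the cost is that transfer time now grows with the value.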

SLIDE 20

Time-Based Data Transfer

• Cache blocks are partitioned into small, contiguous chunks.

[Bojnordi’13]

SLIDE 21

Time-Based Data Transfer

[Bojnordi’13]

SLIDE 22

Time-Based Data Transfer

• L2 cache energy is reduced by 1.8x at the cost of less than a 2% increase in execution time [Bojnordi’13]

[Figure: execution time and L2 cache energy, normalized to the binary encoding; DESC compared with Dynamic Zero Compression and Bus Invert Coding; 30% and 40% annotations]

SLIDE 23

Interconnection Networks

SLIDE 24

Interconnection Networks

• Goal: transfer the maximum amount of information with minimum time and power
• Connects processors, memories, caches, and I/O devices

[Figure: interconnection network linking multiple CPU/memory nodes]

SLIDE 25

Types of Interconnection Networks

• Four domains based on the number and proximity of devices
  - On-chip networks (OCN or NoC)
    - Microarchitectural elements: cores, caches, register files, etc.
  - System/storage area networks (SAN)
    - Computer subsystems: storage, processors, I/O devices, etc.
  - Local area networks (LAN)
    - Autonomous computer systems: desktop computers, etc.
  - Wide area networks (WAN)
    - Interconnected computers distributed across the globe

SLIDE 26

Basics of Interconnection Networks

• Network topology
  - How switches and nodes are wired together in the network
• Routing algorithm
  - How a message is transferred from source to destination
• Flow control
  - How the flow of messages within the network is controlled
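As one concrete instance of a routing algorithm (the slide only names the component), dimension-order XY routing on a 2D mesh can be sketched as follows; the coordinate convention and function name are assumptions for illustration:

```python
def xy_route(src, dst):
    """Dimension-order (XY) routing sketch on a 2D mesh: route fully
    along the X dimension first, then along Y. Returns the sequence of
    (x, y) router coordinates visited after the source."""
    (sx, sy), (dx, dy) = src, dst
    hops = []
    step = 1 if dx > sx else -1
    for x in range(sx, dx, step):      # walk the X dimension
        hops.append((x + step, sy))
    step = 1 if dy > sy else -1
    for y in range(sy, dy, step):      # then walk the Y dimension
        hops.append((dx, y + step))
    return hops
```

XY routing is deterministic and deadlock-free on a mesh because packets never turn from the Y dimension back into X, which rules out cyclic channel dependencies.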