How to Design Fast Asynchronous Routers for Asynchronous On-chip Networks


SLIDE 1

2009-10-28 Advanced Processor Technology Group The School of Computer Science

How to Design Fast Asynchronous Routers for Asynchronous On-chip Networks

Wei Song
Supervisor: Doug Edwards
Advanced Processor Technologies Group

SLIDE 2

Index

  • What is an asynchronous circuit? (current section)
  • Why use an on-chip network?
  • Why is an asynchronous on-chip network slow?
  • How can we improve it?
  • So, what’s next?

SLIDE 3

Synchronous Circuit

  • Pipeline style
  • Strict timing assumption
  • A global clock driven by a balanced tree

SLIDE 4

Asynchronous Circuits – C-element

Truth table (Q is the current output, Q’ the next output):

A  B  Q’
0  0  0
0  1  Q
1  0  Q
1  1  1
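The behaviour of a two-input C-element can be sketched in a few lines of Python. This is only a behavioural model for illustration, not a circuit description; the function name `c_element` and the input sequence are made up for the example.

```python
def c_element(a: int, b: int, q: int) -> int:
    """Next output of a two-input Muller C-element.

    The output follows the inputs when they agree and keeps its
    previous value q when they differ.
    """
    return a if a == b else q


# Apply a short sequence of input changes and record the output.
q = 0
trace = []
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    q = c_element(a, b, q)
    trace.append(q)
print(trace)  # [0, 1, 1, 0]
```

The output only rises once both inputs are high and only falls once both are low, which is what lets a C-element wait for several events before passing one on.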

SLIDE 5

Asynchronous Pipeline

  • Handshake
  • Nearly delay insensitive (no timing assumptions)
  • Power efficient (no global clock)
  • Complicated (larger area)
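As a rough illustration of handshaking without a clock, the sketch below models a textbook Muller pipeline: a chain of C-elements in which each stage is driven by its predecessor and by the inverted output of its successor. It is an assumption that the pipeline on the slide has this structure; `settle`, `req_in` and `ack_in` are names invented for the example.

```python
def c_element(a, b, q):
    """Two-input C-element: follow the inputs when equal, else hold q."""
    return a if a == b else q


def settle(stages, req_in, ack_in):
    """Re-evaluate every stage until no output changes; return the new state.

    stages : list of stage outputs (0 or 1)
    req_in : request from the sender on the left
    ack_in : acknowledge from the receiver on the right
    """
    stages = list(stages)
    changed = True
    while changed:
        changed = False
        for i in range(len(stages)):
            left = req_in if i == 0 else stages[i - 1]
            right = ack_in if i == len(stages) - 1 else stages[i + 1]
            new = c_element(left, 1 - right, stages[i])
            if new != stages[i]:
                stages[i] = new
                changed = True
    return stages


s = settle([0, 0, 0], req_in=1, ack_in=0)  # sender raises its request
print(s)  # [1, 1, 1]
s = settle(s, req_in=0, ack_in=0)          # sender withdraws the request
print(s)  # [0, 0, 1]
s = settle(s, req_in=0, ack_in=1)          # receiver acknowledges
print(s)  # [0, 0, 0]
```

No stage relies on a delay value: the last stage simply holds its request until the receiver acknowledges it, however long that takes.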

SLIDE 6

Index

  • What is an asynchronous circuit?
  • Why use an on-chip network? (current section)
  • Why is an asynchronous on-chip network slow?
  • How can we improve it?
  • So, what’s next?

SLIDE 7

Bus Based Multiprocessor System

  • A shared communication fabric
  • One master at one time
  • Bandwidth constrained
  • Fixed communication latency

SLIDE 8

A Mesh Network-on-Chip (NoC)

[Figure: a 3 × 3 mesh of nine processors, each attached to its own router]

  • Distributed communication resource
  • Scalable bandwidth
  • Multiple master and slave pairs at a time
  • Variable communication latency
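To see why latency varies in a mesh, the sketch below counts router-to-router hops in a 3 × 3 mesh. The slides do not name a routing algorithm; dimension-order (X then Y) routing is assumed here purely for illustration, and `xy_route` is a hypothetical helper.

```python
from itertools import product


def xy_route(src, dst):
    """Return the routers visited from src to dst, moving in X first, then Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


nodes = list(product(range(3), repeat=2))
hops = {(s, d): len(xy_route(s, d)) - 1
        for s in nodes for d in nodes if s != d}
print(xy_route((0, 0), (2, 1)))                 # [(0, 0), (1, 0), (2, 0), (2, 1)]
print(min(hops.values()), max(hops.values()))   # 1 4
```

Neighbouring nodes are one hop apart while opposite corners are four hops apart, so the latency depends on which pair is communicating, unlike on a shared bus.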

SLIDE 9

The Router for NoC

[Figure: router block diagram with arbiters (Arb); legible port labels: North, South, Local]

  • 5 ports
  • Duplex channels
  • Input buffer
  • Arbiter
  • Crossbar (Muxes)
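The arbitration step can be sketched as follows: one arbiter per output port picks a single requesting input, and the crossbar then connects that input to the output. The round-robin policy is an assumption chosen to keep the example short (the slides do not say how the arbiters decide), and the port names East and West are assumed, since only North, South and Local are legible in the figure.

```python
PORTS = ["north", "south", "east", "west", "local"]


class RoundRobinArbiter:
    """Behavioural stand-in for the arbiter guarding one output port."""

    def __init__(self, n):
        self.n = n
        self.last = n - 1  # so that input 0 is considered first

    def grant(self, requests):
        """Return the index of the granted input, or None if nobody requests."""
        for offset in range(1, self.n + 1):
            i = (self.last + offset) % self.n
            if requests[i]:
                self.last = i
                return i
        return None


# North and East inputs both want the same output port.
arb = RoundRobinArbiter(len(PORTS))
requests = [1, 0, 1, 0, 0]
winners = [PORTS[arb.grant(requests)] for _ in range(3)]
print(winners)  # ['north', 'east', 'north']
```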

SLIDE 10

Data Path of a NoC

[Figure: the 3 × 3 mesh of processors and routers, with the internals of one router (arbiters; North, South and Local ports) expanded]

SLIDE 11

Index

  • What is an asynchronous circuit?
  • Why use an on-chip network?
  • Why is an asynchronous on-chip network slow? (current section)
  • How can we improve it?
  • So, what’s next?

SLIDE 12

A 4-bit Synchronous Pipeline

  • Data are synchronised by the global clock
  • No significant speed difference with the 1-bit pipeline

SLIDE 13

A 4-bit Asynchronous Pipeline

[Figure: 4-bit asynchronous pipeline; signals d0i to d3i (data in), d0o to d3o (data out), acki, acko]

SLIDE 14

Reasons for the Low Speed

  • Asynchronous pipelines must explicitly detect the arrival of data
  • A big C-element tree sits in the handshake loop!
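A back-of-the-envelope way to see the cost: if one completion signal per bit is merged by a tree of C-elements, the number of tree levels in the handshake loop grows with the channel width. The sketch below assumes two-input C-elements by default; `c_tree_depth` is a hypothetical helper and the widths are example values, not figures from the slides.

```python
import math


def c_tree_depth(width: int, fan_in: int = 2) -> int:
    """Levels of C-elements needed to merge `width` per-bit signals into one."""
    depth = 0
    while width > 1:
        width = math.ceil(width / fan_in)
        depth += 1
    return depth


for bits in (1, 4, 32):
    print(bits, c_tree_depth(bits))
# 1 0
# 4 2
# 32 5
```

A 1-bit pipeline needs no tree at all, whereas every extra level in a wider channel adds gate delay to each handshake, which a clocked pipeline of the same width does not pay.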

SLIDE 15

Index

  • What is an asynchronous circuit?
  • Why use an on-chip network?
  • Why is an asynchronous on-chip network slow?
  • How can we improve it? (current section)
  • So, what’s next?

SLIDE 16

Channel Slicing

[Figure: the 4-bit pipeline (d0i to d3i, d0o to d3o, acki, acko) shown next to its channel-sliced version]
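A simple timing model hints at why slicing helps. When the whole channel is synchronised, every word waits for its slowest slice; when the slices run independently, each slice streams at its own pace and only the slowest total matters. The delay values below are invented for illustration, and the model ignores the cost of re-synchronising the slices at the end.

```python
def synchronised_time(delays):
    """Every word waits for its slowest slice before the next word starts."""
    return sum(max(word) for word in zip(*delays))


def sliced_time(delays):
    """Each slice streams independently; done when the slowest slice finishes."""
    return max(sum(slice_delays) for slice_delays in delays)


# delays[s][w] = time slice s needs for word w (arbitrary units, made up)
delays = [
    [3, 1, 2],
    [1, 3, 1],
]
print(synchronised_time(delays))  # 8
print(sliced_time(delays))        # 6
```

The sliced time can never exceed the synchronised time in this model, because a sum of per-word maxima is at least as large as the maximum of the per-slice sums.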

SLIDE 17

Re-Synchronisation (1)

SLIDE 18

Re-Synchronisation (2)

SLIDE 19

Re-Synchronisation (3)

SLIDE 20

Hardware Implementation

  • Verilog HDL + STG (Petrify)
  • Layout implementation
  • Faraday 130 nm technology
  • 12.6K gates (50,000 µm²)
  • 0.3 × 0.3 mm²
  • Channel sliced: 450 MHz
  • Synchronised: 360 MHz
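The figures above can be related with a little arithmetic. The sketch assumes that the two frequencies are equivalent cycle rates of the same router in its two modes, that 50,000 µm² is the cell area, and that 0.3 × 0.3 mm² is the layout footprint; the slides do not state these interpretations explicitly.

```python
# Figures quoted on the slide.
channel_sliced_mhz = 450
synchronised_mhz = 360
cell_area_um2 = 50_000   # assumed to be the standard-cell area
side_um = 300            # 0.3 mm, assumed to be the layout edge length

speedup = channel_sliced_mhz / synchronised_mhz
cycle_sliced_ns = 1000 / channel_sliced_mhz
cycle_sync_ns = 1000 / synchronised_mhz
layout_area_um2 = side_um * side_um
utilisation = cell_area_um2 / layout_area_um2

print(f"speed ratio: {speedup:.2f}x")
print(f"cycle time: {cycle_sliced_ns:.2f} ns vs {cycle_sync_ns:.2f} ns")
print(f"layout area: {layout_area_um2} um^2, utilisation {utilisation:.0%}")
```

Under these assumptions channel slicing is 1.25 times faster than the synchronised mode, and the cells occupy a little over half of the layout footprint.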

SLIDE 21

Performance

SLIDE 22

Index

  • What is an asynchronous circuit?
  • Why use an on-chip network?
  • Why is an asynchronous on-chip network slow?
  • How can we improve it?
  • So, what’s next? (current section)

SLIDE 23

Spatial Division Multiplex

  • Frequent re-synchronisation will compromise the speed
  • Sub-channels should run independently
  • Sub-channels could transmit different messages
  • Multiple messages could be transmitted by the same channel but on different sub-channels
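The idea of several messages sharing one channel on different sub-channels can be sketched as a small allocation table. This is a hypothetical bookkeeping model only; the class `SdmChannel`, its methods and the first-free allocation policy are assumptions made for the example and are not taken from the slides.

```python
class SdmChannel:
    """One physical channel divided into independently usable sub-channels."""

    def __init__(self, num_sub_channels):
        # owner[i] is the id of the message using sub-channel i, or None
        self.owner = [None] * num_sub_channels

    def allocate(self, message_id):
        """Give the message the first free sub-channel, or None if all are busy."""
        for i, owner in enumerate(self.owner):
            if owner is None:
                self.owner[i] = message_id
                return i
        return None

    def release(self, index):
        """Free a sub-channel once its message has passed."""
        self.owner[index] = None


channel = SdmChannel(num_sub_channels=2)
print(channel.allocate("A"))  # 0
print(channel.allocate("B"))  # 1
print(channel.allocate("C"))  # None
channel.release(0)
print(channel.allocate("C"))  # 0
```

Messages A and B travel over the same channel at the same time; C only has to wait for one sub-channel to become free rather than for the whole channel.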

SLIDE 24

Spatial Division Multiplex (cont.)

SLIDE 25

Conclusion

  • Asynchronous Circuits
    – Delay insensitive, low power
  • On-chip Network
    – Distributed communication fabric, scalable bandwidth
  • Asynchronous On-chip Network
    – The C-element tree in synchronisation compromises speed
  • Channel Slicing
    – Let sub-channels run independently, fast
  • SDM
    – Let more messages share the fabric simultaneously

SLIDE 26

Thanks!