SLIDE 1

Improving Controller Synthesis from Esterel

Cristian Soviani, Jia Zeng, Stephen A. Edwards
Department of Computer Science, Columbia University
www.cs.columbia.edu/~{soviani,jia,sedwards}
{soviani,jia,sedwards}@cs.columbia.edu

SLIDE 2

Why controllers?

  • Several state machines drive the bulk logic (data paths)
  • Small area; delay is critical
  • Even a simple “behaviour” leads to infernal RTL
  • Most bugs are here; verification is critical

Typical applications:

  • various device controllers (e.g. Ethernet MAC)
  • bus interfaces & arbiters
  • scheduling pipelined units

Is it correct?

[Figure: a circuit with a button input, an led output and two D flip-flops (init=1), meant to implement OFF -> BLINKING -> ON]

SLIDE 3

A delicate compromise to avoid

Correctness vs. Performance

[Figure: schematic of an OPB slave controller: address decoder (addr, select), D flip-flops (cs, cs1, cs2), xfer_ack, rst, outputs to the data paths]

Sources quoted (Xilinx, EDK 3.1 docs, Designing Custom OPB Slave Peripherals for MicroBlaze; IBM, On-Chip Peripheral Bus, Architecture Specifications, v2.1):

“A master who assumed control of the bus may terminate, or abort, the transfer at any time by deasserting select. All slaves are required to terminate the transfer in progress and reset their state machines if the select signal is deactivated ...”

“... if the select is deactivated in the cycle in which the slave would have activated xferAck, then the slave must deactivate the xferAck signal in this cycle.”
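The following is a minimal cycle-level sketch in Python of the rule quoted above. It assumes a hypothetical slave that acknowledges a fixed number of cycles after select is asserted. Because xfer_ack is gated by select in the same cycle, deasserting select both resets the state and suppresses the acknowledge. The class name, the `latency` parameter and the Boolean signal encoding are our own assumptions and are not taken from the Xilinx or IBM documents.

```python
class OpbSlaveModel:
    """Hypothetical OPB slave: asserts xfer_ack `latency` cycles after select rises."""

    def __init__(self, latency: int) -> None:
        self.latency = latency
        self.count = 0  # cycles elapsed in the current transfer

    def step(self, select: bool) -> bool:
        """Evaluate one clock cycle; return the value of xfer_ack in this cycle."""
        if not select:
            # Abort: reset the state machine and never acknowledge in this cycle.
            self.count = 0
            return False
        self.count += 1
        ack = self.count == self.latency
        if ack:
            self.count = 0  # transfer complete; ready for the next one
        return ack


slave = OpbSlaveModel(latency=3)
print([slave.step(s) for s in [True, True, True, False]])  # [False, False, True, False]
slave = OpbSlaveModel(latency=3)
print([slave.step(s) for s in [True, True, False, True]])  # [False, False, False, False]
```

In the second run, select drops in exactly the cycle where the acknowledge would have appeared, so xfer_ack stays low.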

SLIDE 4

Simplified OPB SSRAM controller

```esterel
 1 module opb_ram_ctrl:
 2   input SEL, RNW, A3, A2, A1, A0;
 3   output XFER_ACK;
 4   output OREG_CE, OREG_RES;
 5   output MEM_RD, MEM_WR;
 6   loop
 7     await [ SEL and A3 and ... ];
 8     abort
 9       pause;
10       present RNW then
11         emit MEM_RD; pause;
12         emit OREG_CE; pause;
13         emit XFER_ACK
14       else
15         emit MEM_WR; emit XFER_ACK
16       end
17     when [ not SEL ];
18     emit OREG_RES;
19   end loop
20 end module
```

You’ve already seen:

  • reads require more cycles than writes
  • deasserting select aborts the operation

  • I included no comments on purpose

Easy to modify. Try:

  • removing 8 and 17
  • removing 9
  • adding “pauses” between 11 and 12

I wrote the sample in 3 minutes.
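To make the timing concrete, here is a hand-written Python model of the instants of opb_ram_ctrl. It assumes standard Esterel semantics, where both `await` and `abort` are non-immediate. It is our own sketch, not CEC output. The `addr_ok` flag is a hypothetical stand-in for the elided address condition on line 7, and each state name records where control rests between instants.

```python
def opb_ram_ctrl_step(state, sel, rnw, addr_ok):
    """One instant of opb_ram_ctrl. Returns (next_state, set_of_emitted_outputs).

    States: None = first instant, "AWAIT" = waiting on line 7,
    "P9" / "P11" / "P12" = paused at the pause on line 9, 11 or 12.
    """
    if state is None:
        return "AWAIT", set()             # enter the loop; await is not immediate
    if state == "AWAIT":
        return ("P9" if sel and addr_ok else "AWAIT"), set()
    if not sel:                           # strong abort (lines 8 and 17) preempts the body
        return "AWAIT", {"OREG_RES"}      # line 18, then the loop restarts
    if state == "P9":
        if rnw:
            return "P11", {"MEM_RD"}
        return "AWAIT", {"MEM_WR", "XFER_ACK", "OREG_RES"}
    if state == "P11":
        return "P12", {"OREG_CE"}
    return "AWAIT", {"XFER_ACK", "OREG_RES"}  # state == "P12"


def run_ctrl(inputs):
    """Feed a list of (sel, rnw) pairs; the address is assumed to match."""
    state, trace = None, []
    for sel, rnw in inputs:
        state, out = opb_ram_ctrl_step(state, sel, rnw, addr_ok=True)
        trace.append(sorted(out))
    return trace


print(run_ctrl([(False, False), (True, True), (True, True), (True, True), (True, True)]))
# [[], [], ['MEM_RD'], ['OREG_CE'], ['OREG_RES', 'XFER_ACK']]
print(run_ctrl([(False, False), (True, False), (True, False)]))
# [[], [], ['MEM_RD' if False else 'MEM_WR', 'OREG_RES', 'XFER_ACK']] -> see note below
```

In this model a read acknowledges three instants after select is detected and a write acknowledges one instant after. The second call prints `[[], [], ['MEM_WR', 'OREG_RES', 'XFER_ACK']]`. If SEL drops in the acknowledge instant, the strong abort suppresses XFER_ACK, which is what the OPB rule quoted earlier requires.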

SLIDE 5

Why Controllers in Esterel?

What language do we want?

High level:
  • simple to write / modify / understand: powerful constructs with both a sequential and a concurrent flavour
  • deterministic: we can’t avoid mathematics
  • high-level verification: this is different from simulation

EFFICIENT: keep the abstraction near the technology
  • synchronous: intuitive translation
  • no “synthesis subset” jokes

Esterel is a good candidate.

SLIDE 6

Previous Work. Our Results

Previous work:
  • Esterel Technologies: Esterel v5, IC, one-hot encoding
  • D. Potop: GRC, hierarchical encoding
  • Primary target: software. Hardware synthesis relies on generic sequential optimization (SIS / blifopt)

CEC: Columbia Esterel Compiler
  • Challenge: use high-level information
  • CEC generates correct and efficient circuits
  • To do: improve the circuit delay

SLIDE 7

Surface & Depth - Termination levels

```esterel
input A, B, C, D;
output X, Y, Z;
                                                     % termination levels: surface / depth
trap T in
  trap U, V in
      present A then pause end                       % 0,1 / 0
    ||
      present B then exit T end                      % 0,3 / 0
    ||
      pause; present C then exit U end               % 1 / 0,2
    ||
      present D then exit V else pause; pause end    % 1,2 / 0,1
  handle U do emit X
  handle V do emit Y
  end trap
handle T do emit Z
end trap
```

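Surface: hard start (the instant in which a statement is started). Depth: continue (later instants, when it resumes from a pause).
Termination levels:
  • 0: terminated
  • 1: still running
  • 2, 3, ...: exceptions (trap exits)
The biggest level wins.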
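Because the biggest level wins, the possible termination levels of a parallel statement are the maxima over all combinations of its threads' levels. The small Python sketch below applies this to the surface and depth annotations of the four threads in the code above. The function name is our own.

```python
from itertools import product


def parallel_levels(thread_levels):
    """Possible termination levels of `p1 || p2 || ...`: the max over every combination."""
    return sorted({max(combo) for combo in product(*thread_levels)})


# Surface and depth levels of the four threads, read from the annotations above.
surface = [{0, 1}, {0, 3}, {1}, {1, 2}]
depth = [{0}, {0}, {0, 2}, {0, 1}]

print(parallel_levels(surface))  # [1, 2, 3]
print(parallel_levels(depth))    # [0, 1, 2]
```

The surface of the parallel never reports 0, because the third thread always pauses in its first instant.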

SLIDE 8

Sample Esterel code

```esterel
module example:
input R;
output A, B, C, D;

every R do
    loop
      emit A; pause;
      emit B; pause
    end loop
  ||
    emit C; pause; pause; emit D
end every
end module
```

Sample timing diagram

[Figure: waveforms of R, A, B, C and D over successive clock cycles]

Note the “strong” priority of “every R”: when R is present, it aborts the current instructions and immediately restarts its body.
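Below is a hand-written Python model of the example module, useful for checking the clock-by-clock values. It assumes the usual reading of `every R do p end every`: wait for the next R (not in the first instant), then start p, aborting and restarting it whenever R occurs again. It is a sketch, not CEC's generated code, and the state tuple is our own encoding.

```python
def example_step(state, r):
    """One instant of module `example`. Returns (next_state, emitted_signals).

    state is None before the first instant, "WAIT" while waiting for R, or a
    tuple (t1, t2): t1 = 0 if thread 1 paused after `emit A`, 1 if after `emit B`;
    t2 = instants left before thread 2 terminates (0 = terminated).
    """
    if state is None:
        return "WAIT", set()              # `every R` ignores R in the first instant
    if r:
        return (0, 2), {"A", "C"}         # (re)start the body: strong preemption
    if state == "WAIT":
        return "WAIT", set()
    t1, t2 = state
    out = {"B"} if t1 == 0 else {"A"}     # thread 1 alternates A and B
    if t2 == 1:
        out.add("D")                      # thread 2 finishes its second pause
    return (1 - t1, max(t2 - 1, 0)), out


def run_example(rs):
    """Run the module on a list of R values; return the sorted emitted signals."""
    state, trace = None, []
    for r in rs:
        state, out = example_step(state, r)
        trace.append(sorted(out))
    return trace


print(run_example([False, False, True, False, False]))
# [[], [], ['A', 'C'], ['B'], ['A', 'D']]
```

With R = 0, 0, 1, 0, 0 the model reproduces the values listed for clocks 0 to 4 on the following slides.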

SLIDE 9

The CFG and ST

```esterel
module example:
input R;
output A, B, C, D;

every R do
    loop
      emit A; pause;
      emit B; pause
    end loop
  ||
    emit C; pause; pause; emit D
end every
end module
```

[Figure: the Control Flow Graph (left) and the Selection Tree (above) of the example, with numbered nodes]

SLIDE 10

Clock 0: R=0 A=0 B=0 C=0 D=0

[Animation frame: the timing diagram, the program text and the CFG/ST of the example at this clock cycle]

SLIDE 11

Clock 1: R=0 A=0 B=0 C=0 D=0

SLIDE 12

Clock 2: R=1 A=1 B=0 C=1 D=0

SLIDE 13

Clock 3: R=0 A=0 B=1 C=0 D=0

SLIDE 14

Clock 4: R=0 A=1 B=0 C=0 D=1

SLIDE 15

The CFG and PDG

[Figure: the CFG (left) and the PDG (above) of the example, with numbered nodes]

The PDG (above) is a more concurrent representation of the CFG (left). Note that nodes 4, A, 7, C and 11 (on the right side of the picture) have the same control dependence.

SLIDE 16

Reincarnation. Schizophrenia

```esterel
module reincarnation:
input A;
output X, Y;

loop
  signal S in
    trap T in
        present A then pause; emit S end;
        present S then emit X else emit Y end
      ||
        pause; exit T
    end trap
  end signal
end loop
end module
```

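Can X and Y both be emitted in the same cycle?

  • signal S declares a local signal with default value 0 (absent)
  • Consequence: because of the loop, S can have 2 values in the same cycle
  • Some instructions can execute several times in the same cycle: different inputs, different results
  • Dementia praecox

YES, they can: if A is present in one cycle and absent in the next, the old incarnation of S makes the module emit X while the new one makes it emit Y.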

SLIDE 17

Token Ring. Causality

```esterel
module token_ring:
output HELLOA, HELLOB, HELLOC;
signal TKAB, TKBC, TKCA in
    emit TKCA
  ||
    run host [ signal HELLOA/HELLO, TKCA/TKIN, TKAB/TKOUT; constant 2/N ]
  ||
    run host [ signal HELLOB/HELLO, TKAB/TKIN, TKBC/TKOUT; constant 3/N ]
  ||
    run host [ signal HELLOC/HELLO, TKBC/TKIN, TKCA/TKOUT; constant 4/N ]
end signal
end module

module host:
constant N : integer;
input TKIN;
output TKOUT, HELLO;
signal REQ, ACK, GIFT in
    loop
      present [ TKIN and not GIFT ] then
        pause
      else
        await TKIN; emit GIFT
      end;
      present REQ then emit ACK; pause end;
      emit TKOUT
    end loop
  ||
    loop
      await N tick;
      weak abort
        sustain REQ
      when immediate ACK;
      emit HELLO
    end loop
end signal
end module
```

SLIDE 18

Blackbox State Machines

For each exclusive node in the ST, we build a state machine.

[Figure: exclusive node 9 of the selection tree with its children 10, 11 and 12]

Blackbox state machine for node 9, with signals SM9_hold, SM9_goto_0, SM9_goto_1, SM9_goto_2, SM9_chk_0, SM9_chk_1 and SM9_chk_2

The “scaffolding” combinational logic is synthesised by translating the PDG. The blackbox state machines are synthesised using a given encoding for each one (determined heuristically; it can be overridden manually).

SLIDE 19

Different encodings

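[Figure: the state machine of node 9 with signals SM9_hold, SM9_goto_0, SM9_goto_1, SM9_goto_2, SM9_chk_0, SM9_chk_1, SM9_chk_2 and three D flip-flops]

One-hot encoding (flip-flops Q2 Q1 Q0):
  • S0: - - 1
  • S1: - 1 -
  • S2: 1 - -

[Figure: the same state machine with signals SM9_hold, SM9_goto_1, SM9_goto_2, SM9_chk_0, SM9_chk_1, SM9_chk_2 and two D flip-flops]

Encoding with two flip-flops (Q1 Q0):
  • S0: 0 0
  • S1: - 1
  • S2: 1 -

("-" marks a bit that is not examined when checking for that state.)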

The inputs SMgoto and SMhold are assumed to be mutually exclusive. SMhold is used for the Suspend instruction (a powerful construct similar to UNIX Ctrl-Z). The generation of these signals is not trivial and requires some kind of priority arbitration (see the Enter and Suspend translations in the next slides).
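The Python sketch below checks both encodings of state machine 9 against the intended behaviour, where hold keeps the current state and goto_k moves to Sk. The next-state equations D_k = goto_k OR (hold AND Q_k) and the state decoding are our own reading of the encodings on this slide, not CEC's generated logic. In the two-flip-flop version, asserting neither goto nor hold lands in S0 = 00, which is consistent with that version having no SM9_goto_0 signal.

```python
# One-hot: flip-flops (Q0, Q1, Q2); Sk is recognised by Qk = 1 alone.
def onehot_encode(s):
    return tuple(k == s for k in range(3))


def onehot_next(q, hold, goto):
    # D_k = goto_k OR (hold AND Q_k)
    return tuple(goto[k] or (hold and q[k]) for k in range(3))


def onehot_chk(q):
    return tuple(q)  # SM9_chk_k = Q_k


# Two flip-flops (Q0, Q1): S0 = 00, S1 = Q0 set, S2 = Q1 set.
def binary_encode(s):
    return (s == 1, s == 2)


def binary_next(q, hold, goto):
    q0, q1 = q
    # goto[0] is deliberately ignored: "neither goto nor hold" already yields S0 = 00
    return (goto[1] or (hold and q0), goto[2] or (hold and q1))


def binary_chk(q):
    q0, q1 = q
    return (not q0 and not q1, q0, q1)


def check(encode, next_fn, chk):
    """Try every state with every command (hold or goto_k, mutually exclusive)."""
    for s in range(3):
        for cmd in ("hold", 0, 1, 2):
            hold = cmd == "hold"
            goto = tuple(cmd == k for k in range(3))
            want = s if hold else cmd
            got = tuple(bool(x) for x in chk(next_fn(encode(s), hold, goto)))
            if got != tuple(k == want for k in range(3)):
                return False
    return True


print(check(onehot_encode, onehot_next, onehot_chk))  # True
print(check(binary_encode, binary_next, binary_chk))  # True
```

In this reading, the two-flip-flop encoding saves one register at the cost of a wider decoder for SM9_chk_0.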

SLIDE 20

Translation of Emit, Test and Fork

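Emit: [figure: three emit A nodes reached by control wires w1, w2, w3, and the gate that drives A from w1, w2, w3]

Test: [figure: a test of R with incoming wire w1 and outgoing wires w2, w3 (one branch labelled 1), and its gate-level translation using w1 and R]

Fork: [figure: a fork with incoming wire w1 and outgoing wires w2, w3, and its translation]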

SLIDE 21

Translation of Sync

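[Figure: a Sync node with incoming wires w1 to w8 (w3, w4 labelled level 1; w5, w6 level 2; w7, w8 level 3) and outgoing wires w9 to w12 (w10, w11, w12 labelled 1, 2, 3), and its gate-level translation]

The Sync node computes the maximum termination level of all its threads. The translation is mainly a priority decoder that uses w8, w7, w6, w5, w2, w1 and produces w9, w11, w12. Note: w3, w4 and w10 are not used; termination level 1 is handled specially.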
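The following Python sketch models such a priority decoder for two threads. It assumes that each thread reports exactly one level, as a one-hot tuple, in the cycle where the Sync node is evaluated. The names are hypothetical. As on the slide, the level-1 inputs and the level-1 output are not needed. The result is 3 if any thread exits at level 3, 2 if some thread reaches 2 and none reaches 3, and 0 only if both threads report 0.

```python
def sync2(a, b):
    """a, b: one-hot (level0, level1, level2, level3) tuples of two threads.
    Returns the (level0, level2, level3) output wires; level 1 needs no wire."""
    out3 = a[3] or b[3]
    out2 = (a[2] or b[2]) and not out3
    out0 = a[0] and b[0]
    return out0, out2, out3


def one_hot(level):
    return tuple(k == level for k in range(4))


# Exhaustive check against "the biggest level wins".
for la in range(4):
    for lb in range(4):
        top = max(la, lb)
        assert sync2(one_hot(la), one_hot(lb)) == (top == 0, top == 2, top == 3)
print("ok")
```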

SLIDE 22

Translation of Switch and Enter

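[Figure: exclusive node 9 of the selection tree with children 10, 11 and 12]

Switch: [figure: a switch on node 9 with incoming wire w1 and outgoing wires w2, w3, w4 (w3, w4 labelled 1, 2), translated into gates using SM9_chk_0, SM9_chk_1, SM9_chk_2]

Enter: [figure: enter nodes for 10, 11 and 12 with incoming wires w1, w2, w3, translated into SM9_goto_0, SM9_goto_1, SM9_goto_2]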
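The Switch translation can be sketched in Python as below. The sketch assumes, from our reading of the figure, that each outgoing wire is the incoming wire ANDed with the corresponding SM9_chk_k output, with exactly one chk output true at a time. The Enter side is not modelled, because of the priority arbitration that the slides mention for Enter.

```python
def switch_node(w1, chk):
    """Switch on state machine 9: outgoing wire k = w1 AND SM9_chk_k."""
    return tuple(w1 and c for c in chk)


# State machine 9 is in state 1: only the second branch (w3) is activated.
print(switch_node(True, (False, True, False)))   # (False, True, False)
print(switch_node(False, (False, True, False)))  # (False, False, False)
```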

SLIDE 23

Translation of Suspend

```esterel
suspend
  loop
    emit VECT_ADD; pause;
    emit VECT_MUL; pause
  end loop
when [ not RDY ]
```

Intuitively, suspend works like UNIX Ctrl-Z. If RDY is not present, the Suspend body is “frozen”, and execution resumes when RDY is asserted. Suspend instructions can be nested. We build a “suspend” OR net on the ST structure, and this net drives the SMhold signals.

[Figure: nested suspend statements labelled suspend A, suspend B and suspend C]
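The idea can be sketched in Python on a hypothetical selection tree, given as a child-to-parent map. A state machine must hold if any suspend statement enclosing it is currently suspending. Its SMhold signal is therefore the OR of the suspend conditions along its path to the root. The node names and the tree below are illustrative only.

```python
def hold_signals(parent, suspending):
    """parent: node -> parent node (None for the root).
    suspending: node -> True if a suspend attached at that node is active this cycle.
    Returns node -> hold, the OR of suspend conditions from the node up to the root."""
    hold = {}

    def visit(node):
        if node not in hold:
            up = parent[node]
            hold[node] = suspending.get(node, False) or (up is not None and visit(up))
        return hold[node]

    for node in parent:
        visit(node)
    return hold


# Hypothetical tree: suspend C encloses suspend A and suspend B.
tree = {"C": None, "A": "C", "B": "C", "sm1": "A", "sm2": "B"}
h = hold_signals(tree, {"A": True})
print(h["sm1"], h["sm2"])  # True False
h = hold_signals(tree, {"C": True})
print(h["sm1"], h["sm2"])  # True True
```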

SLIDE 24

Translation of Counters

```esterel
abort
  run Handshake
when case [ 20 tick ] do
  emit TIMEOUT
end abort
```

This sample runs the Handshake module. If the handshaking is not finished within 20 clock cycles, it is aborted and the TIMEOUT signal is asserted. Counted predicates are a very useful Esterel construct.

[Figure: counter implementation with signals start, CE, LD, CY and alarm, and a D flip-flop]

The last state before alarm is one-hot encoded.
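Here is a cycle-level Python sketch of one way to build such a counted delay. It assumes a down-counter loaded on start and a separate flip-flop for the last state, so that alarm comes straight from a register rather than from the counter's carry logic. The class, and its timing convention (alarm exactly n cycles after start, with n ≥ 2), are our assumptions, not CEC's generated circuit.

```python
class CountedAlarm:
    """alarm fires exactly n cycles after the cycle in which start is asserted."""

    def __init__(self, n: int) -> None:
        assert n >= 2, "this sketch assumes n >= 2"
        self.n = n
        self.count = None   # down-counter register; None = idle
        self.last = False   # one-hot flip-flop for the last state before alarm

    def step(self, start: bool) -> bool:
        alarm = self.last                     # alarm is a register output
        next_last, next_count = False, None
        if self.count is not None:
            if self.count == 0:
                next_last = True              # next cycle is the alarm cycle
            else:
                next_count = self.count - 1
        if start:                             # (re)start overrides a count in progress
            next_last, next_count = False, self.n - 2
        self.last, self.count = next_last, next_count
        return alarm


timer = CountedAlarm(20)
alarms = [timer.step(start=(cycle == 0)) for cycle in range(30)]
print(alarms.index(True), sum(alarms))  # 20 1
```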

SLIDE 25

The generated scaffolding circuit

[Figure: the scaffolding circuit generated for the example, with signals R, A, B, C, D and the state-machine interface signals sm0_chk_0..1, sm2_chk_0..1, sm5_chk_0..1, sm8_chk_0..2, sm0_goto_0, sm2_goto_0..1, sm5_goto_0..1 and sm8_goto_0..2]

SLIDE 26

The final SIS-optimized circuit

[Figure: the optimized circuit for the example, with signals D, A, B, C and R]

SIS: script.rugged

The Xilinx XC2V2000-ff896-4 FPGA has 4-input LUTs. The circuit will be a single LUT level deep: 2 ns period. For comparison, a 16-bit adder (registered I/O) has a 5.3 ns period.
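Any Boolean function of at most four inputs fits in a single 4-input LUT. A circuit is therefore one LUT level deep when every output and next-state function depends on at most four signals. The small Python sketch below computes a function's support (the inputs it actually depends on) by exhaustive evaluation. The two example functions are hypothetical.

```python
from itertools import product


def support(f, n):
    """Indices of the inputs that f (a function of n booleans) actually depends on."""
    deps = []
    for i in range(n):
        for bits in product((False, True), repeat=n):
            flipped = list(bits)
            flipped[i] = not bits[i]
            if f(*bits) != f(*flipped):
                deps.append(i)
                break
    return deps


def fits_one_lut(f, n, k=4):
    """True if f depends on at most k inputs, so one k-input LUT can implement it."""
    return len(support(f, n)) <= k


def f(a, b, c, d, e):
    """Hypothetical output function that ignores e."""
    return (a and b) or (c and not d)


def g(a, b, c, d, e):
    """Hypothetical 5-input parity."""
    return a ^ b ^ c ^ d ^ e


print(support(f, 5), fits_one_lut(f, 5))  # [0, 1, 2, 3] True
print(fits_one_lut(g, 5))                 # False
```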

SLIDE 27

Last slide

Questions? Suggestions?

CEC-0.2 can be downloaded at landc.cs.columbia.edu/projects.html

Feel free to play. We are waiting for feedback.