Why formal verification remains on the fringes of commercial development
Arvind
Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology
WG2.8, Park City, Utah, June 16, 2008


SLIDE 1

May 27, 2008 L1-1 http://csg.csail.mit.edu/arvind

Why formal verification remains on the fringes of commercial development

Arvind

Computer Science & Artificial Intelligence Laboratory
Massachusetts Institute of Technology
WG2.8, Park City, Utah, June 16, 2008

SLIDE 2

A designer’s perspective

The goal is to design systems that meet some criteria such as cost, performance, power, compatibility, robustness, …

The design effort and the time-to-market matter ($$$)

Can formal methods help?

SLIDE 3

Examples

  • IP Lookup in a router
  • 802.11a Transmitter
  • H.264 Video Codec
  • OOO Processors
  • Cache Coherence Protocols

Increasingly challenging

SLIDE 4

Example 1: Simple deterministic functionality

[Figure: Internet router. Several Line Cards (LC) connected through a Switch with Arbitration. Blocks shown: Queue Manager, Packet Processor, Exit functions, Control Processor, IP Lookup, SRAM (lookup table)]

A packet is routed based on the “Longest Prefix Match” (LPM) of its IP address with entries in a routing table

Line rate and the order of arrival must be maintained

Line rate ⇒ 15 Mpps for 10GE
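To make the matching rule concrete, here is a minimal sketch of longest prefix match as a linear scan over a routing table. The `Route` layout, the function names, and the use of small integers as next hops are assumptions made for illustration; they are not from the talk, and a real router would use a trie or TCAM rather than a scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routing-table entry (layout assumed for illustration). */
typedef struct {
    uint32_t prefix;   /* network prefix, host byte order */
    uint8_t  len;      /* number of significant leading bits, 0..32 */
    int      next_hop; /* illustrative port number */
} Route;

/* Mask with the top `len` bits set; len == 0 gives 0 (default route). */
static uint32_t prefix_mask(uint8_t len)
{
    assert(len <= 32);
    return len == 0 ? 0u : (uint32_t)(0xFFFFFFFFu << (32 - len));
}

/* Linear-scan LPM: among all entries whose prefix matches `addr`,
 * return the next hop of the longest one, or -1 if nothing matches. */
static int lpm_naive(const Route *table, size_t n, uint32_t addr)
{
    int best_hop = -1;
    int best_len = -1;
    for (size_t i = 0; i < n; i++) {
        uint32_t m = prefix_mask(table[i].len);
        if ((addr & m) == (table[i].prefix & m) &&
            (int)table[i].len > best_len) {
            best_len = table[i].len;
            best_hop = table[i].next_hop;
        }
    }
    return best_hop;
}
```

With entries 10.0.0.0/8 and 10.1.0.0/16, address 10.1.2.3 matches both and the /16 entry wins. A scan like this is useful as a reference model but is far too slow for line rate.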

SLIDE 5

“C” version of LPM

/* ipa[hi:lo] denotes a bit slice of the address (pseudocode, not valid C) */
int lpm (IPA ipa)   /* up to 4 dependent memory lookups */
{
    int p;

    /* Level 0: 8 bits */
    p = RAM[ipa[31:24]];
    if (isLeaf(p)) return value(p);

    /* Level 1: 8 bits */
    p = RAM[ptr(p) + ipa[23:16]];
    if (isLeaf(p)) return value(p);

    /* Level 2: 8 bits */
    p = RAM[ptr(p) + ipa[15:8]];
    if (isLeaf(p)) return value(p);

    /* Level 3: 8 bits */
    p = RAM[ptr(p) + ipa[7:0]];
    return value(p);   /* must be a leaf */
}

Not obvious from the C code how to deal with

  • memory latency
  • pipelining

Must process a packet every 1/15 µs or 67 ns

Must sustain 4 memory dependent lookups in 67 ns

Memory latency: ~30 ns to 40 ns

Real LPM algorithms are more complex …
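The timing figures above imply that a single packet cannot finish before the next one arrives: 4 dependent lookups at 30 to 40 ns each take 120 to 160 ns, against a 67 ns budget. The sketch below works through that arithmetic in integer picoseconds. The function names are hypothetical, and the in-flight estimate (total lookup latency divided by the packet period, rounded up) is a back-of-the-envelope assumption, not a figure from the talk.

```c
#include <assert.h>
#include <stdint.h>

/* Time available per packet, in picoseconds, at a rate given in Mpps. */
static uint64_t packet_period_ps(uint64_t mpps)
{
    assert(mpps > 0);
    return 1000000u / mpps;            /* 1 us = 1,000,000 ps */
}

/* Rough lower bound on the number of packets that must be in flight
 * at once to sustain the rate when each packet needs `lookups`
 * dependent memory accesses of `latency_ps` each. */
static uint64_t min_in_flight(uint64_t lookups, uint64_t latency_ps,
                              uint64_t period_ps)
{
    assert(period_ps > 0);
    uint64_t total = lookups * latency_ps;
    return (total + period_ps - 1) / period_ps;   /* ceiling division */
}
```

At 15 Mpps the period is about 66.7 ns, so this estimate gives 2 packets in flight at 30 ns latency and 3 at 40 ns. That is the reason the design has to be pipelined, which the C code does not express.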

SLIDE 6

An implementation: Circular pipeline

[Figure: circular pipeline with blocks inQ, enter?, RAM, fifo, done? (yes/no), and outQ]

Does the lookup produce the right answer?

  • Easy: check it against the C program

Performance concern: Are there any “dead cycles”?

  • Has direct impact on memory cost

Do answers come out in the right order?

  • Is it even possible to express in a given logic?
  • Alternative: The designer tags input messages and checks that the tags are produced in order
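The tagging alternative can be sketched as a small monitor that hands out sequence numbers on entry and checks them on exit. `OrderMonitor` and its functions are hypothetical names chosen for this illustration; the talk does not show how the designer's check was written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-order monitor: requests get tags 0, 1, 2, ... as they
 * enter the pipeline, and results must leave in the same tag order. */
typedef struct {
    uint32_t next_in;    /* tag for the next entering request */
    uint32_t next_out;   /* tag expected on the next result */
    bool     violated;   /* set once any out-of-order result is seen */
} OrderMonitor;

static void monitor_init(OrderMonitor *m)
{
    assert(m != NULL);
    m->next_in = 0;
    m->next_out = 0;
    m->violated = false;
}

/* Called when a request enters; returns the tag to carry with it. */
static uint32_t monitor_enter(OrderMonitor *m)
{
    assert(m != NULL);
    return m->next_in++;
}

/* Called when a result leaves; returns false on an ordering error. */
static bool monitor_exit(OrderMonitor *m, uint32_t tag)
{
    assert(m != NULL);
    bool outstanding = (m->next_out != m->next_in);
    if (!outstanding || tag != m->next_out) {
        m->violated = true;
        return false;
    }
    m->next_out++;
    return true;
}
```

The check is a runtime assertion embedded in the testbench rather than a property stated in a logic, which sidesteps the question of whether the logic can express ordering at all.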

SLIDE 7

Example 2: Dealing with Noise

802.11a Transmitter

[Figure: 802.11a transmitter pipeline. Blocks: Controller, Scrambler, Encoder, Interleaver, Mapper, IFFT (annotated as accounting for 85% of the area), Cyclic Extend; inputs labeled headers and data (24 uncoded bits)]

Must produce one OFDM symbol (64 complex numbers) every 4 µs

SLIDE 8

Verification Issues

Control is straightforward

  • A small amount of testing against the C code is sufficient, provided the arithmetic is implemented correctly
  • C code may have to be instrumented to capture the intermediate values in the FIFOs
  • No corner cases in the computation in various blocks
  • High confidence with a few correct packets

Still may be worthwhile proving that the (non-standard) arithmetic library is implemented correctly

SLIDE 9

802.11a transceiver: Higher-level correctness

Does the receiver actually recover the full class of corrupted packets as defined in the standard?

  • Designers totally ignore this issue
  • This incorrectness is likely to have no impact on sales

Who would know?

If we really wanted to test for this, we could do it by generating the maximally-correctable corrupted traffic

All these are purely academic questions!

SLIDE 10

Example 3: Lossy encodings

H.264 Video Decoder

  • The standard is 400+ pages of English; the standard implementation is 80K lines of convoluted C. Each is incomplete!
  • The only viable correctness criterion is bit-level matching against the reference implementation on sample videos
  • Parallelization is more complicated than one might guess from the dataflow diagram, because of data dependencies and feedback

[Figure: H.264 decoder dataflow from Compressed Bits to Frames. Blocks: NAL unwrap, Parse + CAVLC, Inverse Quant Transformation, Deblock Filter, Intra Prediction, Inter Prediction, Ref Frames]

Errors don’t matter much
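Bit-level matching against the reference implementation amounts to a byte-for-byte comparison of decoded frames. The sketch below reports the first frame and offset that differ. The `FrameDiff` report format and the assumption that frames are stored back to back with a fixed size are illustrative choices, not details from the talk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Where the first difference was found (hypothetical report format). */
typedef struct {
    bool   match;    /* true if every byte of every frame agrees */
    size_t frame;    /* index of the first mismatching frame */
    size_t offset;   /* byte offset inside that frame */
} FrameDiff;

/* Compare `frames` decoded frames of `frame_bytes` bytes each, stored
 * back to back, against the reference decoder's output. */
static FrameDiff compare_frames(const uint8_t *ref, const uint8_t *dut,
                                size_t frame_bytes, size_t frames)
{
    assert(ref != NULL && dut != NULL && frame_bytes > 0);
    FrameDiff d = { true, 0, 0 };
    for (size_t i = 0; i < frame_bytes * frames; i++) {
        if (ref[i] != dut[i]) {
            d.match  = false;
            d.frame  = i / frame_bytes;
            d.offset = i % frame_bytes;
            return d;
        }
    }
    return d;
}
```

Knowing the first mismatching frame matters because of the feedback through the reference frames: one wrong frame can corrupt every frame predicted from it.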

SLIDE 11

H.264 Decoder: Implementation

Different requirements for different environments

  • QVGA 320x240p (30 fps)
  • DVD 720x480p
  • HD DVD 1280x720p (60-75 fps)

Each context requires a different amount of parallelism in different blocks

  • Modular refinement is necessary
  • Verifying the correctness of refinements requires traditional formal techniques (pipeline abstraction, etc.)

[Figure: H.264 decoder dataflow diagram, repeated from the previous slide]

SLIDE 12

Example 4: Absolute Correctness is required

Microprocessor design

[Figure: out-of-order processor built around a Re-Order Buffer (ROB) with Head and Tail pointers.
Entry fields: State, Instruction, Operand 1, Operand 2, Result.
Entry states: Empty (E), Waiting (W), Dispatched (Di), Killed (K), Done (Do).
Units: Decode Unit, Register File, ALU Unit(s), MEM Unit(s).
Operations on the ROB: insert an instr into ROB; get operands for instr; writeback results; get a ready ALU instr; get a ready MEM instr; put ALU instr results in ROB; put MEM instr results in ROB; resolve branches]

SLIDE 13

“Automated” Processor Verification

Models are abstracted from (real) designs

  • UCLID – Bryant (CMU): OOO processor hand-translated into CLU logic (synthetic)
  • Cadence SMV – McMillan: Tomasulo algorithm (hand-written model, synthetic)
  • ACL2 – J Moore: (translate into Lisp)
  • …

Some property of the manually abstracted model is verified

  • Great emphasis (and progress) on automated decision procedures

Since abstraction is not automated, it is not clear what is being verified!

BAT [Manolios et al.] is a move in the right direction

SLIDE 14

Automatic extraction of abstract models from designs expressed in Verilog or C or SystemC is a lost cause

SLIDE 15

It took Joe Stoy more than 6 months to learn PVS and show that some of the proofs in Xiaowei Shen’s thesis were correct

This technology is not ready for design engineers

Example 5: nondeterministic specifications

Cache Coherence

SLIDE 16

Model Checking

Cache coherence (CC) is one of the most popular applications of model checking

The abstract protocol needs to be abstracted further to avoid state explosion

  • For example, only 3 CPUs, 2 addresses

There is a separate burden of proof as to why the abstraction is correct

Nevertheless, model checking is a very useful debugging aid for the verification of abstract CC protocols
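To show what checking such a cut-down protocol looks like, here is a minimal explicit-state sketch: 3 caches, one address, and a breadth-first search over every reachable global state. The MSI-style states, the atomic read/write/evict transitions, and the `buggy` switch are all assumptions made for this illustration; the talk does not specify a protocol, and real model checkers work on much richer models.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { NCACHES = 3, NSTATES = 27 };               /* 3^3 global states */
enum { INVALID = 0, SHARED = 1, MODIFIED = 2 };
enum { READ = 0, WRITE = 1, EVICT = 2, NACTIONS = 3 };

/* Global state <-> base-3 integer, one digit per cache. */
static void decode(int code, int c[NCACHES])
{
    for (int i = 0; i < NCACHES; i++) { c[i] = code % 3; code /= 3; }
}

static int encode(const int c[NCACHES])
{
    int code = 0;
    for (int i = NCACHES - 1; i >= 0; i--) code = code * 3 + c[i];
    return code;
}

/* Single writer or multiple readers: at most one MODIFIED copy, and
 * no SHARED copies while a MODIFIED copy exists. */
static bool invariant(const int c[NCACHES])
{
    int m = 0, s = 0;
    for (int i = 0; i < NCACHES; i++) {
        if (c[i] == MODIFIED) m++;
        if (c[i] == SHARED) s++;
    }
    return m == 0 || (m == 1 && s == 0);
}

/* One atomic protocol step by cache `who`. With `buggy` set, a write
 * forgets to invalidate the other copies. */
static void step(const int in[NCACHES], int who, int action, bool buggy,
                 int out[NCACHES])
{
    memcpy(out, in, sizeof(int) * NCACHES);
    if (action == READ && in[who] == INVALID) {
        for (int j = 0; j < NCACHES; j++)
            if (out[j] == MODIFIED) out[j] = SHARED;   /* downgrade owner */
        out[who] = SHARED;
    } else if (action == WRITE && in[who] != MODIFIED) {
        if (!buggy)
            for (int j = 0; j < NCACHES; j++)
                if (j != who) out[j] = INVALID;        /* invalidate others */
        out[who] = MODIFIED;
    } else if (action == EVICT) {
        out[who] = INVALID;
    }
}

typedef struct { int reachable; bool safe; } CheckResult;

/* Breadth-first search from the all-INVALID state. */
static CheckResult check_protocol(bool buggy)
{
    bool seen[NSTATES] = { false };
    int queue[NSTATES];
    int head = 0, tail = 0;
    CheckResult r = { 0, true };

    seen[0] = true;
    queue[tail++] = 0;
    while (head < tail) {
        int cur[NCACHES], nxt[NCACHES];
        decode(queue[head++], cur);
        r.reachable++;
        if (!invariant(cur)) r.safe = false;
        for (int who = 0; who < NCACHES; who++) {
            for (int a = 0; a < NACTIONS; a++) {
                step(cur, who, a, buggy, nxt);
                int code = encode(nxt);
                assert(code >= 0 && code < NSTATES);
                if (!seen[code]) { seen[code] = true; queue[tail++] = code; }
            }
        }
    }
    return r;
}
```

`check_protocol(false)` visits 11 states and finds the invariant intact; `check_protocol(true)` reaches a state with a shared and a modified copy at once. The state count grows as 3 to the power of the number of caches per address, which is why the protocol has to be abstracted so aggressively before checking.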

SLIDE 17

Implementation

Design is expressed in some notation which is NOT used directly to generate an implementation

  • The problem of verification of the actual protocol remains formidable
  • Testing cannot uncover all bugs because of the huge non-deterministic space

Proving the correctness of cache coherence protocol implementations remains a challenging problem

SLIDE 18

Summary

The degree of correctness required depends upon the application

  • Different applications require vastly different formal and informal techniques

Formal tools must be tied directly to high-level design languages

Formal techniques should be presented as debugging aids during the design process

  • A designer is unlikely to do anything for the sake of helping the post-design verification

The real success of a formal technique is when it is used ubiquitously without the designer being aware of it, e.g., type systems