Verifying a Commercial Microprocessor Design at the RTL level Ken - - PowerPoint PPT Presentation
Verifying a Commercial Microprocessor Design at the RTL level
Ken McMillan
Cadence Berkeley Labs
mcmillan@cadence.com
We will consider some of the problems involved in verifying the actual RTL code of a commercial processor design, as opposed to an architectural model. This is a work in progress...
Outline
- Methodology
- The PicoJava design
- Verification Strategy
- Problems
Proof Methodology
[Figure: property → decomposition → parameterization → abstraction → model checking]
- “Circular” assume/guarantee proof
– divide into “units of work”
- Temporal “case splitting”
– identify resources used
- Abstract interpretation
– reduce to finite state
“Circular” assume/guarantee
- Let p →+ q stand for “if p up to time t-1, then q at t”
- Equivalent in LTL to ¬(p U ¬q)
- Now we can reason as follows:
$$\frac{q \rightarrow^{+} p \qquad p \rightarrow^{+} q}{G\,p \wedge G\,q}$$
That is, if neither p nor q is the first to be false, then both are always true.
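The rule can be checked by brute force on short finite traces, which makes the role of the one-step delay in →+ concrete. This is a minimal sketch, assuming finite boolean traces stand in for infinite ones; the function names are hypothetical and not part of any tool. It also shows why an ordinary same-time implication would make the circular argument unsound.

```python
from itertools import product


def implies_plus(p, q):
    """p ->+ q on a finite trace: at every time t, if p held at all
    times 0..t-1, then q holds at t (finite analogue of not (p U not q))."""
    return all(q[t] or not all(p[:t]) for t in range(len(q)))


def implies_now(p, q):
    """Ordinary G(p => q), with no time shift."""
    return all((not a) or b for a, b in zip(p, q))


def circular_rule_holds(n):
    """Exhaustively check the circular rule on all boolean traces of length n:
    whenever q ->+ p and p ->+ q hold, p and q hold at every step."""
    for p in product([False, True], repeat=n):
        for q in product([False, True], repeat=n):
            if implies_plus(q, p) and implies_plus(p, q):
                if not (all(p) and all(q)):
                    return False
    return True


print(circular_rule_holds(5))           # True

# Without the one-step delay the argument is unsound: all-false traces
# satisfy G(q => p) and G(p => q) but not G p.
bad = (False,) * 5
print(implies_now(bad, bad), all(bad))  # True False
```

The exhaustive check works because, in any pair of traces that violates the conclusion, one of p or q is the first to fail, and the premise about the other signal already forbids that failure.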
Using a reference model
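[Figure: reference model (e.g., programmer’s model) and implementation units A and B, related by p and q]
- A and B each perform a “unit of work”
- p and q are refinement relations (temporal properties)
- “Circular” proof:
$$\frac{q \rightarrow^{+} p \qquad p \rightarrow^{+} q}{G\,p \wedge G\,q}$$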
Temporal case splitting
[Figure: processes p1 … p5 writing variable v1]
- Idea: parameterize on the most recent writer w at time t.
- φ: “I'm O.K. at time t.”
$$\frac{\forall i:\ G\,((w=i) \Rightarrow \varphi)}{G\,\varphi}$$
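A small sketch of the case split on a single recorded trace, assuming w[t] names the most recent writer of v1 at time t and φ is given as a boolean per time step; the trace and function names are hypothetical. Each obligation G((w=i) ⇒ φ) looks only at the times when process i is responsible, and together they cover every time step.

```python
def holds_for_case(i, w, phi):
    """G((w = i) => phi): whenever process i is the most recent writer, phi holds."""
    return all(ok for writer, ok in zip(w, phi) if writer == i)


def case_split(writers, w, phi):
    """Conclude G phi from one obligation per writer i. This is only valid
    if w always names some writer in `writers`."""
    if any(writer not in writers for writer in w):
        raise ValueError("w must always name a known writer")
    return all(holds_for_case(i, w, phi) for i in writers)


# Hypothetical trace: w[t] is the most recent writer of v1 at time t,
# among processes p1..p5 (indices 1..5).
writers = range(1, 6)
w = [1, 1, 3, 3, 5, 2, 2, 4]
phi = [True] * len(w)
print(case_split(writers, w, phi), all(phi))    # True True

# A failure shows up only in the case of the writer responsible for it.
phi_bad = phi.copy()
phi_bad[4] = False
print([holds_for_case(i, w, phi_bad) for i in writers])
# [True, True, True, True, False]
```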
Abstract interpretation
- Problem: variables range over an unbounded set U
- Solution: reduce U to a finite set Û by a parameterized abstraction, e.g.,
Û = {{i}, U\i}
where U\i represents all the values in U except i.
- Need a sound abstract interpretation, such that:
if φ is valid in the abstraction, then, for all parameter valuations, φ is valid in the original.
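As an illustration of the soundness requirement, here is a minimal sketch of the abstraction Û = {{i}, U\i} together with a three-valued abstract equality, checked against concrete equality over a small finite sample of U. The encoding of abstract values as strings and of ⊥ as None is an assumption made for this example only.

```python
from itertools import product

BOTTOM = None  # stands for the "no information" truth value


def alpha(x, i):
    """Abstract a concrete value x to {i} or U\\i for parameter i."""
    return "{i}" if x == i else "U\\i"


def abs_eq(a, b):
    """Abstract equality: definite only when at least one side is {i}."""
    if a == "{i}" and b == "{i}":
        return True
    if a == "{i}" or b == "{i}":
        return False
    return BOTTOM  # two values from U\i may or may not be equal


def sound(universe):
    """For every parameter i and concrete x, y: a definite abstract answer
    must agree with concrete equality."""
    for i, x, y in product(universe, repeat=3):
        r = abs_eq(alpha(x, i), alpha(y, i))
        if r is not BOTTOM and r != (x == y):
            return False
    return True


print(sound(range(6)))  # True
```

The abstraction gives up precision (two values from U\i compare to ⊥) in exchange for a finite domain, but it never gives a definite answer that the concrete system contradicts.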
Data type abstractions in SMV
- Examples:
– Equality
    =    | {i} | U\i
    {i}  |  1  |  0
    U\i  |  0  |  ⊥
– Function symbol application
    x    | f(x)
    {i}  | f(i)
    U\i  | ⊥
Unbounded array reduced to one fixed element!
Note: a truth value under abstraction may be ⊥, which represents “no information”.
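The slogan “unbounded array reduced to one fixed element” can be sketched as follows, assuming array indices are abstracted to {i} or U\i. The class and function names are hypothetical: only element i is tracked, writes at other indices cannot affect it, and reads at other indices return ⊥ (None here). Function application follows the table above.

```python
BOTTOM = None  # stands for ⊥, "no information"


def abs_apply(f_of_i, x):
    """Abstract function application: f(x) is known (it is f(i)) only when x is {i}."""
    return f_of_i if x == "{i}" else BOTTOM


class AbstractArray:
    """An unbounded array reduced to the one element at parameter index i.
    Indices are abstract values "{i}" or "U\\i"."""

    def __init__(self):
        self.cell = BOTTOM  # contents of element i, initially unknown

    def write(self, index, value):
        if index == "{i}":
            self.cell = value  # the write hits the tracked element
        # a write at an index in U\i cannot change element i

    def read(self, index):
        # only element i is tracked; any other read gives no information
        return self.cell if index == "{i}" else BOTTOM


mem = AbstractArray()
mem.write("{i}", 42)
mem.write("U\\i", 7)
print(mem.read("{i}"), mem.read("U\\i"))                    # 42 None
print(abs_apply("f(i)", "{i}"), abs_apply("f(i)", "U\\i"))  # f(i) None
```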
Applying abstraction
[Figure: process p_i writing v1; all other elements abstracted]
- φ: “I'm O.K. at time t.”
- Must verify by model checking:
$$\varphi \rightarrow^{+} ((w=i) \Rightarrow \varphi)$$
i.e., if p_i is the most recent to modify v1, then v1 is correct.
Review
- By a sequence of three steps:
– “circular” assume/guarantee reasoning
(restricts to one “unit of work”)
– case splitting (adding parameters)
(identifies resources used in that unit of work)
– abstract interpretation
(abstracts away everything else)
...we reduce the verification of an unbounded system of processes to a finite state problem.
PicoJava
- Stack machine architecture
- Implements Java bytecode interpreter in hardware
[Figure: PicoJava block diagram: I$, D$, Fold unit, Integer pipe, Stack $, Bus Intf, Mem, u-Code]
Instruction path
- We will concentrate on I$ and Fold units.
[Figure: instruction path: Mem, Bus Intf, I$, Align, Queue (15), Decode/Fold, with PCs; byte-level and instruction-level portions marked]
Specification strategy
- Since the implementation is very large and complex, we need a specification strategy that allows a fine-grained decomposition of the proof.
- Topics:
– Reference Model
– Histories
– Tags and Refinement Relations
– Dealing with Exceptions
Reference Model
- Programmer’s view of Java machine (ISA)
– contains only programmer-visible state: Mem, PC, SP, PSR
Relating Impl to Ref Model
- Specify Impl w.r.t. reference model history
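[Figure: reference model (Mem, PC, SP, PSR) recording a history of complete states; the implementation is related to the history by a refinement relation, with Ref Model and Impl steps interleaved]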
Correctness criterion
- Correctness is defined as follows:
– There exists some interleaving of Impl and Ref, such that the given relation holds between Impl and history.
- Must choose a witness interleaving
– any interleaving that ensures the reference model “stays ahead of” the implementation will do
We use this approach because one step of the implementation may correspond to many steps of the reference model.
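One way to build such a witness is to step the reference model, before each implementation step, until its history covers everything the implementation has completed. This is a sketch under stated assumptions: impl_progress (instructions completed after each implementation step) and the toy PC-only reference model are hypothetical inputs, and the refinement relation itself is not checked here.

```python
def witness_interleaving(impl_progress, ref_step, ref_init):
    """impl_progress[k]: total instructions the implementation has completed
    after its k-th step (hypothetical input, non-decreasing).
    Before each implementation step, step the reference model until its
    history covers that progress, so Ref always "stays ahead" of Impl."""
    history = [ref_init]
    schedule = []
    for done in impl_progress:
        while len(history) - 1 < done:  # Ref has executed len(history) - 1 insts
            history.append(ref_step(history[-1]))
            schedule.append("ref")
        schedule.append("impl")
    return schedule, history


# Toy reference model: the state is just the PC, advancing by one.
sched, hist = witness_interleaving([1, 1, 3, 4], lambda pc: pc + 1, 0)
print(sched)  # ['ref', 'impl', 'impl', 'ref', 'ref', 'impl', 'ref', 'impl']
print(hist)   # [0, 1, 2, 3, 4]
```

The third implementation step completes two instructions at once, so the reference model takes two steps before it: one implementation step corresponding to several reference steps.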
Multiple histories
- Instructions are a variable number of bytes
- Some parts of Impl deal with bytes, some with instructions.
- Keep two histories:
– Byte-level history (stream of instruction bytes)
– Inst-level history (stream of instructions)
We could also record history at coarser granularity if needed...
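A minimal sketch of keeping both histories for a stream of variable-length instructions, assuming each instruction is given as a bytes object; the function name and sample program are hypothetical. The opcodes used (iconst_1 = 0x04, bipush = 0x10, iadd = 0x60) are standard JVM bytecodes.

```python
def build_histories(instructions):
    """instructions: one bytes object per instruction (variable length).
    Returns the byte-level history and the inst-level history."""
    byte_history = []  # stream of instruction bytes
    inst_history = []  # stream of whole instructions
    for inst in instructions:
        inst_history.append(inst)
        byte_history.extend(inst)  # extending a list with bytes appends ints
    return byte_history, inst_history


# iconst_1; bipush 5; iadd
program = [bytes([0x04]), bytes([0x10, 0x05]), bytes([0x60])]
bh, ih = build_histories(program)
print(bh)       # [4, 16, 5, 96]
print(len(ih))  # 3
```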
Tags and refinement relations
- Tags are auxiliary state information
- Tags are pointers into a history (byte or inst)
- Tags flow with data
- Refinement relations
– Are temporal specifications of data correctness
– Use tags to locate the correct value of data in the history
Note: we sometimes have to prove equality of tags to show correct data flow.
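The idea of a tag flowing with data and a refinement relation that uses it can be sketched as below. This is a single-snapshot illustration, assuming a tag is an integer index into the byte history; the real refinement relations are temporal properties checked at every cycle, and the Tagged type is hypothetical proof bookkeeping, not hardware state.

```python
from dataclasses import dataclass


@dataclass
class Tagged:
    """A datum in the implementation, carrying an auxiliary tag that points
    into a history (proof bookkeeping only, not real hardware state)."""
    value: int
    tag: int


def refines(datum, history):
    """Refinement relation: the datum is correct if it equals the history
    entry its tag points to."""
    return 0 <= datum.tag < len(history) and datum.value == history[datum.tag]


byte_history = [0x04, 0x10, 0x05, 0x60]

# Tags flow with data: a pipeline stage copies value and tag together.
fetched = Tagged(value=0x10, tag=1)
latched = Tagged(fetched.value, fetched.tag)
print(refines(latched, byte_history))              # True
print(refines(Tagged(0x60, tag=1), byte_history))  # False
```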
Tags for instruction path
[Figure: instruction path (I$, Align, Queue, Decode/Fold, PCs) annotated with byte-history tags and inst-history tags; tags are incremented (+) or derived as data moves along, and equality proofs (=) relate tags]
Alignment between histories
- Comparing tags into byte and inst histories
– record the byte-history position of each inst
[Figure: each entry of the inst history pointing to its position in the byte history]
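Recording the byte-history position of each instruction lets a proof compare a tag into the inst history with a tag into the byte history. A minimal sketch, assuming instructions are bytes objects; the names align and next_byte_tag are hypothetical, and the tag equality shown is an instance of the kind of equality proof mentioned for the instruction path.

```python
def align(instructions):
    """Record the byte-history position of each instruction:
    pos[k] is the byte tag of the first byte of instruction k."""
    pos, offset = [], 0
    for inst in instructions:
        pos.append(offset)
        offset += len(inst)
    return pos


def next_byte_tag(pos, instructions, inst_tag):
    """Byte tag at which the instruction following inst_tag starts."""
    return pos[inst_tag] + len(instructions[inst_tag])


program = [bytes([0x04]), bytes([0x10, 0x05]), bytes([0x60])]  # iconst_1; bipush 5; iadd
pos = align(program)
print(pos)  # [0, 1, 3]

# Tag equality: the byte tag derived from inst 1 must equal the recorded
# byte-history position of inst 2.
print(next_byte_tag(pos, program, 1) == pos[2])  # True
```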
Dealing with Exceptions
- Exceptions (e.g., branch mispredictions)
– pipeline may be executing incorrect instructions
– incorrect instructions must be flushed
- Specification strategy
– Define tag “max”: the latest instruction correctly fetched
– Data with tag after “max” is unspecified
[Figure: history split at “max”: data correct up to max, unspecified after]
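The effect of “max” on the specification can be sketched as a check that only constrains data whose tag is at or before max. This assumes pipeline contents are (value, tag) pairs and the history is a list; the function name and sample data are hypothetical.

```python
def check_with_max(pipeline, history, max_tag):
    """pipeline: (value, tag) pairs held in the implementation.
    Data whose tag is <= max_tag (correctly fetched) must match the history;
    data with a tag after max_tag is unspecified and is not checked."""
    return all(value == history[tag]
               for value, tag in pipeline if tag <= max_tag)


history = [10, 11, 12, 13]
# After a misprediction, the entries tagged 2 and 3 hold wrong-path data.
pipeline = [(10, 0), (11, 1), (99, 2), (98, 3)]
print(check_with_max(pipeline, history, max_tag=1))  # True
print(check_with_max(pipeline, history, max_tag=2))  # False
```

Because wrong-path data lies after max, the specification says nothing about it, so the implementation is free to hold garbage there until it is flushed.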
Summary of approach
- Strategy
– Reference model/ Histories/ Tags
- Localization of verification
– Model checking can be localized to a very small scale.
– State explosion is not a problem.
Problems
Accidents happen to words
- Verification depends strongly on abstraction of data types.
– Use uninterpreted types and functions.
– A 32-bit word might be abstracted to:
{ a, b, ~ }
where a and b are parameters of a property, and ~ stands for all other values.
- Problem:
– In RTL descriptions, words are often arbitrarily broken into bits and reassembled.
Example accident
- 8-bit register implemented in cells:
    module reg8(clk, inp, out);
      input clk;
      input [7:0] inp;
      output [7:0] out;
      reg1 cell0(clk, inp[0], out[0]);
      ...
      reg1 cell7(clk, inp[7], out[7]);
    endmodule
The state is actually held in bits. How do we abstract the state?
Example Accident
- Verilog can’t make 2-D arrays!
    module foo(bits, ...);
      input [63:0] bits;
      wire [7:0] byte0 = bits[7:0];
      ...
      wire [7:0] byte7 = bits[63:56];
      ...
Instead of an array of bytes, we get 64 bits!
A pragmatic approach
- If possible, verify property at bit level
– Words must not index large arrays
– Can use “bit slicing”
- Else, use a two-level approach
– Make intermediate model at word level
– Verify properties using abstractions
– Verify intermediate model at bit level
This avoids re-modeling the entire design using uninterpreted types and functions.
Bit-field abstractions
- Words are often divided into fields
- Typical abstraction
– property has parameters t (of type $Tag) and a (of type $Addr)
    field:       $Tag        $Addr      $Off
    bits:        31..14      13..4      3..0
    abstracted:  {t,~}       {a,~}      {0..15}
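The bit-field abstraction amounts to slicing the word into fields and abstracting each field separately. A minimal sketch, assuming the layout $Tag = bits 31..14, $Addr = bits 13..4, $Off = bits 3..0 (read off the slide's bit labels, so treat it as an assumption); the function names are hypothetical.

```python
def fields(word):
    """Split a 32-bit address into (tag, addr, off), assuming the layout
    tag = bits 31..14, addr = bits 13..4, off = bits 3..0."""
    tag = (word >> 14) & 0x3FFFF  # 18 bits
    addr = (word >> 4) & 0x3FF    # 10 bits
    off = word & 0xF              # 4 bits
    return tag, addr, off


def abstract(word, t, a):
    """Abstract the tag to {t, ~} and the addr to {a, ~}; keep off exactly
    (0..15). '~' stands for any other value of that field."""
    tag, addr, off = fields(word)
    return ("t" if tag == t else "~",
            "a" if addr == a else "~",
            off)


t, a = 5, 0x2A
word = (t << 14) | (a << 4) | 0x7
print(fields(word))                      # (5, 42, 7)
print(abstract(word, t, a))              # ('t', 'a', 7)
print(abstract(word ^ (1 << 20), t, a))  # ('~', 'a', 7)
```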
But accidents happen...
- Addresses of many different bit lengths occur
[Figure: address bit vectors of different lengths: cache line, half cache line, word, byte, and cache location ($Addr only)]
Since types are not structured, how does a tool know how to divide and abstract these bit vectors?
Manual approach
- Re-model using structured types
– i.e., instead of a bit vector, use:
    struct {
      tag : $TAG;
      addr : $ADDR;
      offset : array 3..0 of boolean;
    }
- Prove model correct at bit level
- Prove property using type-based abstractions
– examples: cache contents correctness, aligner output, etc.
Mapping between representations
- Sometimes need to translate between representations with uninterpreted functions
– example: a 32-bit $Address is split into the fields $Tag, $Addr, $Off by ft, fa, fo, and finv maps the fields back to the $Address
(Must manually instantiate injectiveness axiom)
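In the proof ft, fa, fo and finv are uninterpreted; the sketch below gives them one concrete interpretation purely for illustration, assuming the field layout tag = bits 31..14, addr = bits 13..4, off = bits 3..0. It checks on random samples that finv(ft(x), fa(x), fo(x)) = x, which implies the injectiveness property (the field triple determines the address) that must be instantiated by hand.

```python
import random


# Concrete stand-ins for the uninterpreted functions (assumed layout).
def ft(x):
    return (x >> 14) & 0x3FFFF


def fa(x):
    return (x >> 4) & 0x3FF


def fo(x):
    return x & 0xF


def finv(tag, addr, off):
    """Reassemble a 32-bit $Address from its fields."""
    return (tag << 14) | (addr << 4) | off


# finv(ft(x), fa(x), fo(x)) == x implies that addresses with equal
# fields are equal, i.e. the field mapping is injective.
rng = random.Random(0)
samples = [rng.getrandbits(32) for _ in range(1000)] + [0, 2**32 - 1]
print(all(finv(ft(x), fa(x), fo(x)) == x for x in samples))  # True
```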
What’s needed?
- Ability to abstract any bit-field of a word
– conceptually straightforward
- Some heuristic method of grouping bits together and assigning them types?
– less obvious
Essentially, we need to be able to reverse-engineer a bit-level design into a structured design.
Incoherence
- Few processors implement the ISA precisely
– makes writing a specification difficult
- Example: three incoherent caches in PicoJava
– Instruction (I)
– Data (D)
– Stack (S)
- How to handle the mismatch between ISA and Impl?
Solution (?)
- Mark every address as valid/invalid for I,D,S
- Example:
– I becomes valid when I$ line explicitly flushed
– I becomes invalid when location written as data
- Assume program never reads invalid addresses
[Figure: reference model state (PC, SP, PSR, Mem) with I, D, S valid bits per address]
Problem: pipe delay means an address becomes readable an unknown number of clock cycles after the flush instruction (???)
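The I-validity rules can be sketched as a reference-model memory that carries an I-valid bit per address. Assumptions, all hypothetical: every address starts I-invalid, only the I rules from the example are modeled (D and S bits are omitted), and the pipe-delay problem just noted is not captured, since a flush takes effect immediately here.

```python
class ValidityMemory:
    """Reference-model memory with an I-valid bit per address.
    Assumption: every address starts I-invalid; D and S bits are omitted."""

    def __init__(self):
        self.mem = {}
        self.i_valid = set()  # addresses currently valid for instruction fetch

    def write_data(self, addr, value):
        self.mem[addr] = value
        self.i_valid.discard(addr)  # written as data: I becomes invalid

    def flush_i_line(self, line_addrs):
        self.i_valid.update(line_addrs)  # I$ line explicitly flushed: I becomes valid

    def fetch(self, addr):
        # The specification assumes the program never reads an invalid address.
        if addr not in self.i_valid:
            raise ValueError(f"instruction fetch from I-invalid address {addr:#x}")
        return self.mem.get(addr)


m = ValidityMemory()
m.write_data(0x100, 0x60)
print(0x100 in m.i_valid)  # False
m.flush_i_line(range(0x100, 0x110))
print(m.fetch(0x100))      # 96
```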
Accidental correctness
- Example:
– decode not one-hot until first queue load (!)
– but, in PSR, Fold unit not enabled at reset
– one instruction required to enable Fold unit
– hence one-hot when Fold unit enabled!
[Figure: instruction queue (15) feeding the Fold unit, annotated “Decode must be one-hot here”]
Note: the local property (one-hotness) depends on faraway logic (PSR, integer unit, etc.). This is not written anywhere, because no one actually knows why the circuit works!
Conclusions (?)
- Compositional verification of real processors at the RTL level is possible.
– Reference model/ Histories/ Tags
- Several aspects of typical RTL descriptions