Memory Consistency Verification of Hardware Yatin A. Manerkar - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Yatin A. Manerkar

Automated Formal Memory Consistency Verification of Hardware

http://www.cs.princeton.edu/~manerkar

Princeton University June 23rd, 2019

1
slide-2
SLIDE 2

The Rise of Parallelism…

[Image: K. Rupp, M. Horowitz et al.] 2
slide-3
SLIDE 3

The Rise of Parallelism…

[Image: K. Rupp, M. Horowitz et al.]

Moore’s Law and end of Dennard Scaling: stagnation of single-threaded performance

2
slide-4
SLIDE 4

The Rise of Parallelism…

[Image: K. Rupp, M. Horowitz et al.]

Parallelism fuels modern performance improvements

2
slide-5
SLIDE 5

…and Heterogeneity (Example: Apple A12)

3
slide-6
SLIDE 6

…and Heterogeneity (Example: Apple A12)

3

“Big” CPUs

slide-7
SLIDE 7

…and Heterogeneity (Example: Apple A12)

3

“Big” CPUs “Little” CPUs

slide-8
SLIDE 8

…and Heterogeneity (Example: Apple A12)

3

“Big” CPUs “Little” CPUs GPUs

slide-9
SLIDE 9

…and Heterogeneity (Example: Apple A12)

3

“Big” CPUs “Little” CPUs GPUs ML Accelerator

slide-11
SLIDE 11

…and Heterogeneity (Example: Apple A12)

3

“Big” CPUs “Little” CPUs GPUs ML Accelerator

Parallel processors are hard to get right! How can we formally verify parallel hardware?

slide-12
SLIDE 12

Building a Formally Verified Processor

4

Formal Methods Expert

  • Build proven-correct processor (e.g. Kami) or…
slide-13
SLIDE 13

Building a Formally Verified Processor

4

Formal Methods Expert

  • Build proven-correct processor (e.g. Kami) or…
  • …construct formal model of implementation and verify that (REMS)

slide-14
SLIDE 14

Building a Formally Verified Processor

4

Formal Methods Expert

  • Build proven-correct processor (e.g. Kami) or…
  • …construct formal model of implementation and verify that (REMS)
  • Formal methods expert carries most of the verification burden

slide-15
SLIDE 15

Building a Formally Verified Processor

4

Formal Methods Expert

  • Build proven-correct processor (e.g. Kami) or…
  • …construct formal model of implementation and verify that (REMS)
  • Formal methods expert carries most of the verification burden

Computer Architect

  • Experts on building processors
  • Generally not much formal methods expertise
  • Can they share more of the verification burden?

slide-16
SLIDE 16

Building a Formally Verified Processor

4

Formal Methods Expert

  • Build proven-correct processor (e.g. Kami) or…
  • …construct formal model of implementation and verify that (REMS)
  • Formal methods expert carries most of the verification burden

Computer Architect

  • Experts on building processors
  • Generally not much formal methods expertise
  • Can they share more of the verification burden?

My work: Automated tools that enable engineers to formally verify their systems by themselves!

Case Study: Memory Consistency Verification

slide-17
SLIDE 17

Talk Outline

▪Overview
▪Memory Consistency Background
▪PipeProof: All-Program Microarchitectural MCM Verification
▪RTLCheck: MCM Verification of Verilog RTL
▪Expanding to other domains
▪Conclusion
slide-18
SLIDE 18

Processors Communicate via Shared Memory

6

“Big” CPUs “Little” CPUs GPUs ML Accelerator

slide-19
SLIDE 19

What does this program print?

Thread 0        Thread 1
❶ x = 1;        ❸ if (y == 1) print("Answer is:");
❷ y = 1;        ❹ if (x == 1) print("42");

slide-20
SLIDE 20

What does this program print?

Thread 0        Thread 1
❶ x = 1;        ❸ if (y == 1) print("Answer is:");
❷ y = 1;        ❹ if (x == 1) print("42");

Can it print “Answer is: 42”? Yes, eg: ❶❷❸❹

slide-21
SLIDE 21

What does this program print?

Thread 0        Thread 1
❶ x = 1;        ❸ if (y == 1) print("Answer is:");
❷ y = 1;        ❹ if (x == 1) print("42");

Can it print “Answer is: 42”? Yes, eg: ❶❷❸❹
How about just “42”? Yes, eg: ❶❸❹❷

slide-22
SLIDE 22

What does this program print?

Thread 0        Thread 1
❶ x = 1;        ❸ if (y == 1) print("Answer is:");
❷ y = 1;        ❹ if (x == 1) print("42");

Can it print “Answer is: 42”? Yes, eg: ❶❷❸❹
How about just “42”? Yes, eg: ❶❸❹❷
Could it print nothing? Yes, eg: ❸❹❶❷

slide-23
SLIDE 23

What does this program print?

Thread 0        Thread 1
❶ x = 1;        ❸ if (y == 1) print("Answer is:");
❷ y = 1;        ❹ if (x == 1) print("42");

Can it print “Answer is: 42”? Yes, eg: ❶❷❸❹
How about just “42”? Yes, eg: ❶❸❹❷
Could it print nothing? Yes, eg: ❸❹❶❷

These executions obey Sequential Consistency (SC) [Lamport79], which requires that the results of the overall program correspond to some in-order interleaving of the statements from each individual thread.
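The SC definition above can be checked by brute force on this two-thread program. The following is a minimal Python sketch, not part of the original talk: it enumerates every in-order interleaving of the four statements and collects what each one prints. The helper names (`interleavings`, `run`, `sc_outputs`) and the labels "1" to "4" standing for ❶ to ❹ are assumptions made for illustration.

```python
from itertools import combinations

# Statement labels stand for the circled numbers on the slide (assumed encoding):
# "1": x = 1        "2": y = 1
# "3": if (y == 1) print("Answer is:")   "4": if (x == 1) print("42")
THREAD0 = ["1", "2"]
THREAD1 = ["3", "4"]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each thread's own order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if k in positions else next(ib) for k in range(n)]


def run(order):
    """Execute one interleaving against a single shared memory."""
    mem = {"x": 0, "y": 0}
    out = []
    for step in order:
        if step == "1":
            mem["x"] = 1
        elif step == "2":
            mem["y"] = 1
        elif step == "3" and mem["y"] == 1:
            out.append("Answer is:")
        elif step == "4" and mem["x"] == 1:
            out.append("42")
    return " ".join(out)


def sc_outputs():
    """All outputs reachable under sequential consistency."""
    return {run(order) for order in interleavings(THREAD0, THREAD1)}


if __name__ == "__main__":
    print(sorted(sc_outputs()))
```

Under this model the output "Answer is:" on its own never appears, because it would require the store to y to become visible before the store to x.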

slide-24
SLIDE 24

What does this program print?

Thread 0        Thread 1
❶ x = 1;        ❸ if (y == 1) print("Answer is:");
❷ y = 1;        ❹ if (x == 1) print("42");

How about “Answer is:”? ❷❶❸❹

slide-25
SLIDE 25

What does this program print?

Thread 0        Thread 1
❶ x = 1;        ❸ if (y == 1) print("Answer is:");
❷ y = 1;        ❹ if (x == 1) print("42");

How about “Answer is:”? ❷❶❸❹ It depends!

slide-26
SLIDE 26

What does this program print?

Thread 0        Thread 1
❶ x = 1;        ❸ if (y == 1) print("Answer is:");
❷ y = 1;        ❹ if (x == 1) print("42");

How about “Answer is:”? ❷❶❸❹ It depends!

NO!

slide-27
SLIDE 27

What does this program print?

Thread 0        Thread 1
❶ x = 1;        ❸ if (y == 1) print("Answer is:");
❷ y = 1;        ❹ if (x == 1) print("42");

How about “Answer is:”? ❷❶❸❹ It depends!

NO! YES!

slide-28
SLIDE 28

What does this program print?

Thread 0        Thread 1
❶ x = 1;        ❸ if (y == 1) print("Answer is:");
❷ y = 1;        ❹ if (x == 1) print("42");

How about “Answer is:”? ❷❶❸❹ It depends!

NO! YES!

Most processors today implement “weak memory models” that relax orderings required by SC!

slide-29
SLIDE 29

Why reorder memory operations?

Answer: Performance!

Message Passing (mp):

Core 0        Core 1
x = 1;        r1 = y;
y = 1;        r2 = x;

Can r1=1 and r2=0?

[Figure: Core 0, Core 1, Memory (x: 0, y: 0), Cache (y: 0)]

slide-30
SLIDE 30

Why reorder memory operations?

Answer: Performance!

Message Passing (mp):

Core 0        Core 1
x = 1;        r1 = y;
y = 1;        r2 = x;

Can r1=1 and r2=0?

[Figure: Core 0, Core 1, Memory (x: 0, y: 0), Cache (y: 0)]

Can improve performance by sending both stores to memory in parallel

slide-31
SLIDE 31

Why reorder memory operations?

Answer: Performance!

Message Passing (mp):

Core 0        Core 1
x = 1;        r1 = y;
y = 1;        r2 = x;

Can r1=1 and r2=0?

[Figure: Memory, Core 0, Core 1, Cache; values shown: y: 1, x: 0]

Store to y finishes quickly in cache

slide-32
SLIDE 32

Why reorder memory operations?

Answer: Performance!

Message Passing (mp):

Core 0        Core 1
x = 1;        r1 = y;
y = 1;        r2 = x;

Can r1=1 and r2=0?

[Figure: Memory, Core 0, Core 1, Cache; values shown: y: 1, x: 0; Core 1 has executed r1 = y = 1]

slide-33
SLIDE 33

Why reorder memory operations?

Answer: Performance!

Message Passing (mp):

Core 0        Core 1
x = 1;        r1 = y;
y = 1;        r2 = x;

Can r1=1 and r2=0?

[Figure: Memory, Core 0, Core 1, Cache; values shown: y: 1, x: 0, y: 1; Core 1 has executed r1 = y = 1; r2 = x = 0]

slide-34
SLIDE 34

Why reorder memory operations?

Answer: Performance!

Message Passing (mp):

Core 0        Core 1
x = 1;        r1 = y;
y = 1;        r2 = x;

Can r1=1 and r2=0?

[Figure: Memory, Core 0, Core 1, Cache; values shown: y: 1, x: 1, y: 1; Core 1 has executed r1 = y = 1; r2 = x = 0]

By the time store of x is complete, Core 1 has observed reordering!
slide-35
SLIDE 35

Why reorder memory operations?

Answer: Performance!

Message Passing (mp):

Core 0        Core 1
x = 1;        r1 = y;
y = 1;        r2 = x;

Can r1=1 and r2=0?

Without a fence, Core 1 observes: r1 = y = 1; r2 = x = 0;

With a fence:

Core 0        Core 1
x = 1;        r1 = y = 1;
FENCE         r2 = x = 1;
y = 1;

[Figure: Memory, Core 0, Core 1, Cache; values shown: y: 1, x: 1, y: 1]

Fence/synchronization instructions can enforce order between memory operations where needed
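The effect of store reordering and of the fence on the mp test can be sketched in a few lines of Python. This is a deliberately simplified model and not the talk's tooling: it assumes Core 1's loads execute in program order, and it models the fence only as forcing Core 0's stores to reach memory in program order. The names `merges`, `mp_outcomes`, `IN_ORDER`, and `REORDERED` are hypothetical.

```python
from itertools import combinations


def merges(a, b):
    """Yield every interleaving of a and b that keeps each list's own order."""
    n = len(a) + len(b)
    for pos in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if k in pos else next(ib) for k in range(n)]


def mp_outcomes(store_orders):
    """Return the set of (r1, r2) results for the mp litmus test.

    store_orders lists the orders in which Core 0's two stores are
    allowed to become visible in memory (an assumption of this sketch).
    """
    loads = [("ld", "y"), ("ld", "x")]  # Core 1: r1 = y; r2 = x
    results = set()
    for stores in store_orders:
        store_events = [("st", var) for var in stores]
        for trace in merges(store_events, loads):
            mem = {"x": 0, "y": 0}
            regs = []
            for kind, var in trace:
                if kind == "st":
                    mem[var] = 1
                else:
                    regs.append(mem[var])
            results.add(tuple(regs))
    return results


# With a fence (or under SC) the stores reach memory in program order.
IN_ORDER = [("x", "y")]
# Without a fence, this sketch also lets y become visible before x.
REORDERED = [("x", "y"), ("y", "x")]

if __name__ == "__main__":
    print("fenced:  ", sorted(mp_outcomes(IN_ORDER)))
    print("unfenced:", sorted(mp_outcomes(REORDERED)))
```

In this model the outcome r1=1, r2=0 shows up only when the stores may be reordered. On real weakly ordered hardware the load side may need ordering as well; that is outside this sketch.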
slide-36
SLIDE 36

Memory Consistency Models (MCMs)

▪Instruction sets (ISAs) represent hardware operations (add, ld, st, …)
▪MCMs similarly represent the orderings among hardware memory ops

[Figure: Compiler, Hardware]

slide-37
SLIDE 37

Memory Consistency Models (MCMs)

▪Instruction sets (ISAs) represent hardware operations (add, ld, st, …)
▪MCMs similarly represent the orderings among hardware memory ops

Compiler: Where do I need to add fences?

[Figure: Compiler, Hardware]

slide-38
SLIDE 38

Memory Consistency Models (MCMs)

▪Instruction sets (ISAs) represent hardware operations (add, ld, st, …)
▪MCMs similarly represent the orderings among hardware memory ops

Compiler: Where do I need to add fences?
Hardware: How much can I buffer and reorder memory operations?

slide-39
SLIDE 39

Memory Consistency Models (MCMs)

▪Instruction sets (ISAs) represent hardware operations (add, ld, st, …)
▪MCMs similarly represent the orderings among hardware memory ops

Compiler: Where do I need to add fences?
ISA-Level MCM (x86, ARMv8, RISC-V, etc)
Hardware: How much can I buffer and reorder memory operations?

slide-40
SLIDE 40

Memory Consistency Models (MCMs)

▪Instruction sets (ISAs) represent hardware operations (add, ld, st, …)
▪MCMs similarly represent the orderings among hardware memory ops

Compiler: Where do I need to add fences?
ISA-Level MCM (x86, ARMv8, RISC-V, etc)
Hardware: How much can I buffer and reorder memory operations?

In a nutshell: MCMs specify what value will be returned when your program does a load!

slide-41
SLIDE 41

Memory Consistency Models (MCMs)

[Figure: layers of the computing stack: Java Bytecode, C11/C++11, Cuda, OpenCL; JVM, LLVM IR, PTX, SPIR; x86 CPU, ARM CPU, Power CPU, Nvidia GPU, AMD GPU, …; Shared Virtual Memory]

Memory Consistency Models (MCMs) specify rules and guarantees about the ordering and visibility of accesses to shared memory [Sorin et al., 2011].

slide-42
SLIDE 42

Memory Consistency Models (MCMs)

[Figure: layers of the computing stack: Java Bytecode, C11/C++11, Cuda, OpenCL; JVM, LLVM IR, PTX, SPIR; x86 CPU, ARM CPU, Power CPU, Nvidia GPU, AMD GPU, …; Shared Virtual Memory. Highlighted: SW MCMs]

Memory Consistency Models (MCMs) specify rules and guarantees about the ordering and visibility of accesses to shared memory [Sorin et al., 2011].

slide-43
SLIDE 43

Memory Consistency Models (MCMs)

[Figure: layers of the computing stack: Java Bytecode, C11/C++11, Cuda, OpenCL; JVM, LLVM IR, PTX, SPIR; x86 CPU, ARM CPU, Power CPU, Nvidia GPU, AMD GPU, …; Shared Virtual Memory. Highlighted: HW MCMs]

Memory Consistency Models (MCMs) specify rules and guarantees about the ordering and visibility of accesses to shared memory [Sorin et al., 2011].

slide-44
SLIDE 44

Memory Consistency Models (MCMs)

[Figure: layers of the computing stack: Java Bytecode, C11/C++11, Cuda, OpenCL; JVM, LLVM IR, PTX, SPIR; x86 CPU, ARM CPU, Power CPU, Nvidia GPU, AMD GPU, …; Shared Virtual Memory. Highlighted: IR MCMs]

Memory Consistency Models (MCMs) specify rules and guarantees about the ordering and visibility of accesses to shared memory [Sorin et al., 2011].

slide-45
SLIDE 45

The Need for MCM Verification

▪MCMs are specified at interfaces between layers of the stack

  • Upper layers target MCM; lower layers must maintain it for all programs!

[Figure: Upper layer (e.g. Compiler) | Interface MCM (e.g. ISA-level MCM) | Lower layer (e.g. Microarchitecture¹)]

¹ Microarchitecture is a component-level (e.g. caches, pipeline stages, store buffers) model of the hardware.
slide-46
SLIDE 46

The Need for MCM Verification

▪MCMs are specified at interfaces between layers of the stack

  • Upper layers target MCM; lower layers must maintain it for all programs!

[Figure: Upper layer (e.g. Compiler): targets MCM of lower layer | Interface MCM (e.g. ISA-level MCM) | Lower layer (e.g. Microarchitecture¹)]

¹ Microarchitecture is a component-level (e.g. caches, pipeline stages, store buffers) model of the hardware.
slide-47
SLIDE 47

The Need for MCM Verification

▪MCMs are specified at interfaces between layers of the stack

  • Upper layers target MCM; lower layers must maintain it for all programs!

[Figure: Upper layer (e.g. Compiler): targets MCM of lower layer | Interface (e.g. ISA-Level MCM) | Lower layer (e.g. Microarchitecture¹): must maintain MCM of interface!]

¹ Microarchitecture is a component-level (e.g. caches, pipeline stages, store buffers) model of the hardware.
slide-48
SLIDE 48

The Need for MCM Verification

▪MCMs are specified at interfaces between layers of the stack

  • Upper layers target MCM; lower layers must maintain it for all programs!

[Figure: Upper layer (e.g. Compiler): targets MCM of lower layer | ??? | Lower layer (e.g. Microarchitecture¹): must maintain MCM of interface!]

¹ Microarchitecture is a component-level (e.g. caches, pipeline stages, store buffers) model of the hardware.
slide-49
SLIDE 49

The Check Suite: Automated Tools For Verifying Memory Orderings and their Security Implications

[Figure: layers of the stack: High-Level Languages (HLL), Compiler, OS, Architecture (ISA), Microarchitecture, RTL (e.g. Verilog), with the tools below placed across them]

  • PipeCheck [MICRO ‘14] [IEEE MICRO Top Picks]
  • TriCheck [ASPLOS ‘17] [IEEE MICRO Top Picks]
  • CCICheck [MICRO ‘15] [Nominated for Best Paper Award]
  • COATCheck [ASPLOS ‘16] [IEEE MICRO Top Picks]
  • RTLCheck [MICRO ‘17] [IEEE MICRO Top Picks Honorable Mention]
  • CheckMate [MICRO ‘18] [IEEE MICRO Top Picks]
  • PipeProof [MICRO ‘18] [Best Paper Nominee. IEEE MICRO Top Picks Honorable Mention]

  • Axiomatic specifications -> Happens-before graphs
  • Cyclic => Impossible, Acyclic => Possible
  • Model Checking space of graphs using SMT solvers
  • Most tools written in Gallina => can be proven correct

http://check.cs.princeton.edu

slide-50
SLIDE 50

The Check Suite: Automated Tools For Verifying Memory Orderings and their Security Implications

[Figure: layers of the stack: High-Level Languages (HLL), Compiler, OS, Architecture (ISA), Microarchitecture, RTL (e.g. Verilog), with the tools below placed across them]

  • PipeCheck [MICRO ‘14] [IEEE MICRO Top Picks]
  • TriCheck [ASPLOS ‘17] [IEEE MICRO Top Picks]
  • CCICheck [MICRO ‘15] [Nominated for Best Paper Award]
  • COATCheck [ASPLOS ‘16] [IEEE MICRO Top Picks]
  • RTLCheck [MICRO ‘17] [IEEE MICRO Top Picks Honorable Mention]
  • CheckMate [MICRO ‘18] [IEEE MICRO Top Picks]
  • PipeProof [MICRO ‘18] [Best Paper Nominee. IEEE MICRO Top Picks Honorable Mention]

  • Axiomatic specifications -> Happens-before graphs
  • Cyclic => Impossible, Acyclic => Possible
  • Model Checking space of graphs using SMT solvers
  • Most tools written in Gallina => can be proven correct

http://check.cs.princeton.edu

So far, tools have found bugs in:
  • Widely-used Research simulator
  • Cache coherence paper
  • IBM XL C++ compiler (fixed in v13.1.5)
  • In-design commercial processors
  • RISC-V ISA specification
  • Open-source RTL (Verilog)
  • C++ 11 mem model
  • SpectrePrime, MeltdownPrime
slide-53
SLIDE 53

Talk Outline

▪Overview and Motivation
▪Memory Consistency Background
▪PipeProof: All-Program Microarchitectural MCM Verification
▪RTLCheck: MCM Verification of Verilog RTL
▪Expanding to other domains
▪Conclusion
slide-54
SLIDE 54

Microarchitectural MCM Verification

[Figure: Microarchitecture (Core 0 … Core n, each with IF, EX, WB pipeline stages, over a Memory Hierarchy) checked against an ISA-level MCM: SC/TSO/RISC-V MCM?]

▪PipeProof proves that a microarchitecture respects its ISA MCM

  • For all possible programs!

▪How do we formally specify

  • ISA-level MCMs?
  • Microarchitectural orderings?
slide-55
SLIDE 55

ISA-Level MCM Specifications

▪MCMs often defined using relational patterns

  • [Shasha and Snir TOPLAS 1988] [Alglave et al. TOPLAS 2014]

▪ISA-level executions are graphs

  • nodes: instructions, edges: ISA-level relations

▪Eg: SC is acyclic(po ∪ co ∪ rf ∪ fr)
▪Formal specifications of ISA + HLL MCMs in recent years

  • x86 [Owens et al. TPHOLS2009], ARM [Pulte et al. POPL2018], C11 [Batty et al. POPL 2011], …

▪Automated formal tools e.g. herd [Alglave et al. TOPLAS 2014]

  • Can formally analyse small test programs against these models

Message passing (mp) litmus test:

Core 0          Core 1
(i1) x = 1;     (i3) r1 = y;
(i2) y = 1;     (i4) r2 = x;

SC Forbids: r1 = 1, r2 = 0

[Figure: ISA-level execution graph for the forbidden outcome: (i1) -po-> (i2) -rf-> (i3) -po-> (i4) -fr-> (i1)]

Legend: po = program order, co = coherence order, rf = reads-from, fr = from-reads
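The rule "SC is acyclic(po ∪ co ∪ rf ∪ fr)" can be tried out on the mp litmus test with a small cycle check. The Python below is an illustrative sketch, not herd or any Check-suite tool; the edge lists are written by hand for two candidate executions, and the names `has_cycle`, `FORBIDDEN`, and `ALLOWED` are assumptions.

```python
def has_cycle(nodes, edges):
    """Depth-first search for a directed cycle in the union of all edges."""
    adj = {n: [] for n in nodes}
    for src, dst, _label in edges:
        adj[src].append(dst)

    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def visit(n):
        color[n] = GREY
        for m in adj[n]:
            if color[m] == GREY:
                return True  # back edge: cycle found
            if color[m] == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)


NODES = ["i1", "i2", "i3", "i4"]
PO = [("i1", "i2", "po"), ("i3", "i4", "po")]

# Outcome r1 = 1, r2 = 0: i3 reads i2's store (rf); i4 reads the initial
# value of x, so it is ordered before i1's store (fr).
FORBIDDEN = PO + [("i2", "i3", "rf"), ("i4", "i1", "fr")]

# Outcome r1 = 1, r2 = 1: both loads read from the stores.
ALLOWED = PO + [("i2", "i3", "rf"), ("i1", "i4", "rf")]

if __name__ == "__main__":
    print("r1=1, r2=0 allowed under SC?", not has_cycle(NODES, FORBIDDEN))
    print("r1=1, r2=1 allowed under SC?", not has_cycle(NODES, ALLOWED))
```

The first execution contains the cycle i1, i2, i3, i4, i1 and is therefore forbidden under SC; the second is acyclic and allowed.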
slide-59
SLIDE 59

Microarchitectural Happens-Before (µhb) Graphs

▪Developed by PipeCheck [Lustig et al. MICRO 2014]
▪Microarchitecture performs instrs. in stages
▪Microarchitectural executions are µhb graphs

  • Nodes: instr. sub-events, edges: happens-before relationships

▪Cyclic µhb graph → unobservable, Acyclic → observable

Message passing (mp) litmus test:

Core 0          Core 1
(i1) x = 1;     (i3) r1 = y;
(i2) y = 1;     (i4) r2 = x;

SC Forbids: r1 = 1, r2 = 0

[Figure: simpleSC microarchitecture (Core 0 … Core n, each with IF, EX, WB stages, over a Memory Hierarchy); ISA-level graph over (i1) to (i4) with po, po, rf, fr edges]

Legend: IF = Fetch, EX = Execute, WB = Writeback

slide-60
SLIDE 60

Microarchitectural Happens-Before (µhb) Graphs

▪Developed by PipeCheck [Lustig et al. MICRO 2014]
▪Microarchitecture performs instrs. in stages
▪Microarchitectural executions are µhb graphs

  • Nodes: instr. sub-events, edges: happens-before relationships

▪Cyclic µhb graph → unobservable, Acyclic → observable

Message passing (mp) litmus test:

Core 0          Core 1
(i1) x = 1;     (i3) r1 = y;
(i2) y = 1;     (i4) r2 = x;

SC Forbids: r1 = 1, r2 = 0

[Figure: simpleSC microarchitecture (Core 0 … Core n, each with IF, EX, WB stages, over a Memory Hierarchy); µhb graph with IF, EX, WB rows for instructions (i1) to (i4), shown alongside the ISA-level po, po, rf, fr edges]

Legend: IF = Fetch, EX = Execute, WB = Writeback

slide-63
SLIDE 63

Microarchitectural MCM Verification

[Figure: Microarchitecture (Core 0 … Core n, each with IF, EX, WB pipeline stages, over a Memory Hierarchy) checked against an ISA-level MCM: SC/TSO/RISC-V MCM?]

slide-64
SLIDE 64

Microarchitectural MCM Verification

[Figure: Microarchitecture Specification in μSpec DSL (Core 0 … Core n, each with IF, EX, WB pipeline stages, over a Memory Hierarchy) checked against an ISA-level MCM: SC/TSO/RISC-V MCM?]

Axiom "PO_Fetch":
  forall microops "i1",
  forall microops "i2",
  SameCore i1 i2 /\ ProgramOrder i1 i2 =>
    AddEdge ((i1, Fetch), (i2, Fetch), "PO").

Axiom "Execute_stage_is_in_order":
  forall microops "i1",
  ...

▪µSpec DSL [Lustig et al. ASPLOS 2016] is similar to first-order logic (FOL)

  • forall, exists, AND (/\), OR (\/), NOT (~), implication (=>)
  • Has built-in predicates which take memory operations as input

− e.g. ProgramOrder i j where i and j are loads/stores

  • Predicates can reference nodes and edges (µhb edges closed under transitivity)

− e.g. EdgeExists ((i1, Fetch), (i2, Fetch))
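To make the shape of a µSpec axiom concrete, here is a rough Python analogue of "PO_Fetch". It is only a sketch of what the axiom says, not how PipeProof or the µSpec toolchain evaluates axioms. The `Microop` class, its fields, and the helper functions are hypothetical, and `program_order` is assumed to mean "earlier on the same core".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Microop:
    name: str   # e.g. "i1"
    core: int   # which core issues the instruction
    index: int  # position in that core's program order (assumed encoding)


def same_core(a, b):
    return a.core == b.core


def program_order(a, b):
    # Assumption of this sketch: a precedes b on the same core.
    return a.core == b.core and a.index < b.index


def po_fetch(microops):
    """For every pair satisfying the axiom's premise, add a Fetch -> Fetch edge."""
    edges = set()
    for i1 in microops:          # forall microops "i1"
        for i2 in microops:      # forall microops "i2"
            if same_core(i1, i2) and program_order(i1, i2):
                edges.add(((i1.name, "Fetch"), (i2.name, "Fetch"), "PO"))
    return edges


# The four instructions of the mp litmus test.
MP = [
    Microop("i1", core=0, index=0),
    Microop("i2", core=0, index=1),
    Microop("i3", core=1, index=0),
    Microop("i4", core=1, index=1),
]

if __name__ == "__main__":
    for edge in sorted(po_fetch(MP)):
        print(edge)
```

For mp this yields two µhb edges, one per core, each ordering the Fetch nodes of consecutive instructions.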

slide-65
SLIDE 65

PipeProof: Automated All-Program MCM Verif.

▪PipeProof verifies that a microarchitecture correctly respects its ISA MCM across all possible programs

  • Early-stage design-time verification (i.e. before RTL)

[Figure: Microarch. and ISA MCM Specs + Aux. Inputs (e.g. Mappings) → PipeProof → All-Program MCM Correctness Proof!]

[Figure: layers of the stack: High-Level Languages (HLL), Compiler, Instruction Set (ISA), Microarchitecture, Processor RTL (Verilog)]

[Yatin A. Manerkar, Daniel Lustig, Margaret Martonosi, and Aarti Gupta. PipeProof: Automated Memory Consistency Proofs for Microarchitectural Specifications. The 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2018.]

slide-66
SLIDE 66

Verifying Across All Possible Programs

▪Are all forbidden programs microarchitecturally unobservable?

  • If so, then microarchitecture is correct

▪Infinite number of forbidden programs

  • E.g.: For SC, must check all possibilities of cyclic(po ∪ co ∪ rf ∪ fr)

▪Prove using abstractions and induction

  • Based on Counterexample-guided abstraction refinement [Clarke et al. CAV 2000]
20
slide-67
SLIDE 67

Verifying Across All Possible Programs

▪Are all forbidden programs microarchitecturally unobservable?

  • If so, then microarchitecture is correct

▪Infinite number of forbidden programs

  • E.g.: For SC, must check all possibilities of cyclic(po ∪ co ∪ rf ∪ fr)

▪Prove using abstractions and induction

  • Based on Counterexample-guided abstraction refinement [Clarke et al. CAV 2000]

[Figure: example forbidden ISA-level cycles over instructions i1 … i4, built from po, co, rf, and fr edges]
slide-68
SLIDE 68

The Transitive Chain (TC) Abstraction

All non-unary cycles containing fr (Infinite set)

[Figure: example ISA-level cycles containing fr over instructions i1 … i4, with po, co, and rf edges, …]
slide-69
SLIDE 69

The Transitive Chain (TC) Abstraction

All non-unary cycles containing fr (Infinite set)

[Figure: example ISA-level cycles containing fr over instructions i1 … i4, with po, co, and rf edges, …]

Cycle = Transitive Chain (sequence) + Loopback edge (fr)
slide-70
SLIDE 70

The Transitive Chain (TC) Abstraction

All non-unary cycles containing fr (Infinite set)

[Figure: example ISA-level cycles containing fr over instructions i1 … i4, with po, co, and rf edges, …]

Cycle = Transitive Chain (sequence) + Loopback edge (fr)

[Figure: abstract cycle: transitive chain (sequence) of ISA-level edges r1…n-1 from i1 to in, closed by the loopback edge fr]
slide-71
SLIDE 71

The Transitive Chain (TC) Abstraction

All non-unary cycles containing fr (Infinite set)

[Figure: example ISA-level cycles containing fr over instructions i1 … i4, with po, co, and rf edges, …]

Cycle = Transitive Chain (sequence) + Loopback edge (fr)

ISA-level transitive chain => Microarch. level transitive connection

[Figure: abstract cycle: transitive chain r1…n-1 from i1 to in, closed by fr; in the µhb graph (IF, EX, WB rows), some µhb edge from i1 to in (transitive connection)]
slide-72
SLIDE 72

The Transitive Chain (TC) Abstraction

[Figure: example ISA-level cycles containing fr over instructions i1 … i4, with po, co, and rf edges]

Infinite!

slide-73
SLIDE 73

The Transitive Chain (TC) Abstraction

[Figure: example ISA-level cycles containing fr over instructions i1 … i4, with po, co, and rf edges]

Infinite!

Using TC Abstraction: Finite!

[Figure: abstract cycle: transitive chain r1…n-1 from i1 to in, closed by fr; in the µhb graph (IF, EX, WB rows), some µhb edge from i1 to in (transitive connection)]

3 x 3 = 9 possible transitive connections from i1 to in
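The count of 9 follows from pairing each of the three pipeline stages of i1 with each of the three stages of in. A minimal Python sketch of that enumeration is shown here, assuming the three-stage IF/EX/WB pipeline from the slides; the function name `transitive_connections` is hypothetical.

```python
from itertools import product

STAGES = ["IF", "EX", "WB"]  # Fetch, Execute, Writeback


def transitive_connections(stages=STAGES):
    """Enumerate every candidate µhb edge from a node of i1 to a node of in."""
    return [(("i1", src), ("in", dst)) for src, dst in product(stages, repeat=2)]


if __name__ == "__main__":
    for src, dst in transitive_connections():
        print(src, "->", dst)
    print(len(transitive_connections()), "possible transitive connections")
```

A pipeline with more stages would give a larger but still finite set, which is what makes the abstract case analysis tractable.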

slide-74
SLIDE 74

The Transitive Chain (TC) Abstraction

[Figure: example ISA-level cycles containing fr over instructions i1 … i4, with po, co, and rf edges]

Infinite!

Using TC Abstraction: Finite!

[Figure: the 9 abstract µhb graphs (IF, EX, WB rows), one per transitive connection from i1 to in, each closed by fr]

slide-75
SLIDE 75

The Transitive Chain (TC) Abstraction

[Figure: example ISA-level cycles containing fr over instructions i1 … i4, with po, co, and rf edges]

Infinite!

Using TC Abstraction: Finite!

[Figure: the 9 abstract µhb graphs (IF, EX, WB rows), one per transitive connection from i1 to in, each closed by fr]

Abstraction soundness automatically verified as a supporting proof!

slide-76
SLIDE 76

Microarchitectural Correctness Proof

[Figure: proof structure]
  • Cycles containing fr: i1 to in with some µhb edge from i1 to in (transitive connection), closed by fr
    − All possible transitive connections
  • Cycles containing po: i1 to in with some µhb edge from i1 to in (transitive connection), closed by po
  • Other ISA-level cycles…
slide-77
SLIDE 77

Microarchitectural Correctness Proof

[Figure: proof structure]
  • Cycles containing fr: i1 to in with some µhb edge from i1 to in (transitive connection), closed by fr
    − All possible transitive connections
    − One transitive connection (µhb graph with IF, EX, WB rows): ✓ NoDecomp
    − Other transitive connections…
  • Cycles containing po: i1 to in with some µhb edge from i1 to in (transitive connection), closed by po
  • Other ISA-level cycles…
slide-78
SLIDE 78

Microarchitectural Correctness Proof

[Figure: proof structure]
  • Cycles containing fr: i1 to in with some µhb edge from i1 to in (transitive connection), closed by fr
    − All possible transitive connections
    − One transitive connection (µhb graph with IF, EX, WB rows): ✓ NoDecomp
    − Another transitive connection: ? AbsCounterX
    − Other transitive connections…
  • Cycles containing po: i1 to in with some µhb edge from i1 to in (transitive connection), closed by po
  • Other ISA-level cycles…

Acyclic graph with transitive connection => Abstract Counterexample (i.e. possible bug)
slide-79
SLIDE 79

Microarchitectural Correctness Proof

[Figure: proof structure]
  • Cycles containing fr: i1 to in with some µhb edge from i1 to in (transitive connection), closed by fr
    − All possible transitive connections
    − One transitive connection (µhb graph with IF, EX, WB rows): ✓ NoDecomp
    − Another transitive connection: ? AbsCounterX
    − Other transitive connections…
  • Cycles containing po: i1 to in with some µhb edge from i1 to in (transitive connection), closed by po
  • Other ISA-level cycles…

Transitive connection (green edge) may represent one or multiple ISA-level edges
slide-80
SLIDE 80

Microarchitectural Correctness Proof

[Figure: proof structure]
  • Cycles containing fr: i1 to in with some µhb edge from i1 to in (transitive connection), closed by fr
    − All possible transitive connections
    − One transitive connection (µhb graph with IF, EX, WB rows): ✓ NoDecomp
    − Another transitive connection: ? AbsCounterX
      → Try to Concretize (Replace transitive connection with one ISA-level edge)
      → Observable: Microarch Buggy, Return Counterexample
    − Other transitive connections…
  • Cycles containing po: i1 to in with some µhb edge from i1 to in (transitive connection), closed by po
  • Other ISA-level cycles…

Transitive connection (green edge) may represent one or multiple ISA-level edges
slide-81
SLIDE 81

Microarchitectural Correctness Proof

[Figure: proof structure]
  • Cycles containing fr: i1 to in with some µhb edge from i1 to in (transitive connection), closed by fr
    − All possible transitive connections
    − One transitive connection (µhb graph with IF, EX, WB rows): ✓ NoDecomp
    − Another transitive connection: ? AbsCounterX
      → Try to Concretize (Replace transitive connection with one ISA-level edge)
      → Observable: Microarch Buggy, Return Counterexample
      → Unobs.: Consider all Decompositions (Inductively break down Transitive Chain)
    − Other transitive connections…
  • Cycles containing po: i1 to in with some µhb edge from i1 to in (transitive connection), closed by po
  • Other ISA-level cycles…

Transitive connection (green edge) may represent one or multiple ISA-level edges
slide-82
SLIDE 82

Microarchitectural Correctness Proof

[Figure: proof structure]
  • Cycles containing fr: i1 to in with some µhb edge from i1 to in (transitive connection), closed by fr
    − All possible transitive connections
    − One transitive connection (µhb graph with IF, EX, WB rows): ✓ NoDecomp
    − Another transitive connection: ? AbsCounterX
      → Try to Concretize (Replace transitive connection with one ISA-level edge)
      → Observable: Microarch Buggy, Return Counterexample
      → Unobs.: Consider all Decompositions (Inductively break down Transitive Chain)
      → “Refinement Loop”
    − Other transitive connections…
  • Cycles containing po: i1 to in with some µhb edge from i1 to in (transitive connection), closed by po
  • Other ISA-level cycles…

Transitive connection (green edge) may represent one or multiple ISA-level edges
slide-83
SLIDE 83

Refinement Loop: Concretization

▪Replaces transitive connection with a single ISA-level edge

  • All concretizations must be unobservable
  • Observable concretizations are counterexamples (bugs)

[Figure: ? AbsCounterX: µhb graph (IF, EX, WB rows) with a transitive connection from i1 to in, closed by fr]
slide-84
SLIDE 84 i1 in IF EX WB fr

Refinement Loop: Concretization

▪Replaces transitive connection with a single ISA-level edge

  • All concretizations must be unobservable
  • Observable concretizations are counterexamples (bugs)
rf 24
slide-85
SLIDE 85 i1 in IF EX WB fr po … i1 in IF EX WB fr

Refinement Loop: Concretization

▪Replaces transitive connection with a single ISA-level edge

  • All concretizations must be unobservable
  • Observable concretizations are counterexamples (bugs)
rf 24
slide-86
SLIDE 86 p i1 IF EX WB r q in fr ?

AbsCounterX

Refinement Loop: Decomposition

▪Inductively break down transitive chain

  • Additional constraints may be enough to make execution unobservable
25

factorial(n) = n * factorial(n-1)

slide-87
SLIDE 87 p i1 IF EX WB r q in fr ?

AbsCounterX

Refinement Loop: Decomposition

▪Inductively break down transitive chain

  • Additional constraints may be enough to make execution unobservable
25

factorial(n) = n * factorial(n-1)
Chain of length n = Chain of length n-1 + “Peeled-off” edge

slide-88
SLIDE 88 p i1 IF EX WB r q in fr

Refinement Loop: Decomposition

▪Inductively break down transitive chain

  • Additional constraints may be enough to make execution unobservable
p i1

s

in-1 IF EX WB rf r q in fr 25

factorial(n) = n * factorial(n-1)
Chain of length n = Chain of length n-1 + “Peeled-off” edge

slide-89
SLIDE 89

p i1 IF EX WB r q in fr p i1 t i2 IF EX WB co r q in fr

Refinement Loop: Decomposition

▪Inductively break down transitive chain

  • Additional constraints may be enough to make execution unobservable
p i1

s

in-1 IF EX WB rf r q in fr 25

factorial(n) = n * factorial(n-1)
Chain of length n = Chain of length n-1 + “Peeled-off” edge

slide-90
SLIDE 90

p i1 IF EX WB r q in fr p i1 t i2 IF EX WB co r q in fr

✓ ?

Refinement Loop: Decomposition

▪Inductively break down transitive chain

  • Additional constraints may be enough to make execution unobservable
p i1

s

in-1 IF EX WB rf r q in fr

If decomposition is abstract counterexample, repeat concretization and decomposition!

25

factorial(n) = n * factorial(n-1)
Chain of length n = Chain of length n-1 + “Peeled-off” edge
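The refinement loop described on these slides (concretize, then decompose, then repeat) can be sketched in a few lines of Python. This is a toy illustration and not PipeProof's implementation: the two oracle callbacks (`abstract_observable`, `concrete_observable`), the `max_depth` cut-off, and the tuple encoding of a cycle are all assumptions introduced here. In the actual tool, observability is decided by solving µspec constraints.

```python
from typing import Callable, Optional, Tuple

ISA_EDGES = ("po", "rf", "co", "fr")  # ISA-level edge kinds named on the slides
Chain = Tuple[str, ...]               # concrete ISA-level edges fixed so far


def refine(fixed: Chain,
           abstract_observable: Callable[[Chain], bool],
           concrete_observable: Callable[[Chain], bool],
           max_depth: int = 6) -> Optional[Chain]:
    """Return an observable concrete cycle (a bug), or None if every case is ruled out.

    `fixed` plus one transitive connection forms the abstract cycle.
    Both oracles are hypothetical stand-ins for solver queries.
    """
    # An unobservable abstract cycle covers every chain it represents.
    if not abstract_observable(fixed):
        return None
    # Concretization: replace the transitive connection with one ISA-level edge.
    for edge in ISA_EDGES:
        cycle = fixed + (edge,)
        if concrete_observable(cycle):
            return cycle  # observable concretization = counterexample
    if len(fixed) >= max_depth:
        raise RuntimeError("refinement did not converge within max_depth")
    # Decomposition: peel one edge off the chain and recurse on the shorter chain.
    for edge in ISA_EDGES:
        cex = refine(fixed + (edge,), abstract_observable,
                     concrete_observable, max_depth)
        if cex is not None:
            return cex
    return None


if __name__ == "__main__":
    # Toy oracle: two fixed edges already make any abstract cycle unobservable.
    print(refine(("fr",), lambda c: len(c) < 2, lambda c: False))
    # Toy oracle with one planted observable cycle.
    print(refine(("fr",), lambda c: len(c) < 3,
                 lambda c: c == ("fr", "po", "rf")))
```

The recursion mirrors the factorial analogy: each decomposition fixes one more "peeled-off" edge and leaves a shorter transitive connection to reason about.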

slide-91
SLIDE 91

Total Time:
  simpleSC: 225.9 sec; simpleSC (w/ Covering Sets + Memoization): 19.1 sec
  simpleTSO: Timeout; simpleTSO (w/ Covering Sets + Memoization): 2449.7 sec (≈ 41 mins)

Results

▪Ran PipeProof on simpleSC (SC) and simpleTSO (TSO¹) µarches

  • 3-stage in-order pipelines

▪TSO verification made feasible by optimizations

  • Explicitly checking all decompositions => case explosion
  • Covering Sets Optimization (eliminate redundant transitive connections)
  • Memoization (eliminate previously checked ISA-level cycles)
¹TSO (Total Store Order) is the MCM of Intel x86 processors. It relaxes Store->Load ordering.
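To make the footnote concrete, the sketch below enumerates the outcomes of the sb (store buffering) litmus test on a simple operational model of TSO: one FIFO store buffer per core, with loads forwarding from the local buffer. The model, the op encoding, and the function name `tso_outcomes` are assumptions made for illustration; they are not taken from the talk or from PipeProof.

```python
from typing import List, Set, Tuple

# Each op is ("st", address, value) or ("ld", address, register_name).
Op = Tuple[str, str, object]

# sb (store buffering) litmus test.
SB: List[List[Op]] = [
    [("st", "x", 1), ("ld", "y", "r1")],  # Core 0
    [("st", "y", 1), ("ld", "x", "r2")],  # Core 1
]


def tso_outcomes(prog: List[List[Op]], regs=("r1", "r2")) -> Set[Tuple[int, ...]]:
    """Enumerate final register values with one FIFO store buffer per core."""
    n = len(prog)
    outcomes: Set[Tuple[int, ...]] = set()
    seen = set()

    def explore(pcs, bufs, mem, regvals):
        state = (pcs, bufs, mem, regvals)
        if state in seen:
            return
        seen.add(state)
        if all(pcs[t] == len(prog[t]) for t in range(n)):
            final = dict(regvals)
            outcomes.add(tuple(final.get(r, 0) for r in regs))
            return
        for t in range(n):
            if bufs[t]:
                # Drain the oldest buffered store of core t to memory.
                (addr, val), rest = bufs[t][0], bufs[t][1:]
                new_mem = dict(mem)
                new_mem[addr] = val
                explore(pcs, bufs[:t] + (rest,) + bufs[t + 1:],
                        tuple(sorted(new_mem.items())), regvals)
            if pcs[t] < len(prog[t]):
                kind, addr, arg = prog[t][pcs[t]]
                new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if kind == "st":
                    # Stores enter the local buffer; memory is not updated yet.
                    new_buf = bufs[t] + ((addr, arg),)
                    explore(new_pcs, bufs[:t] + (new_buf,) + bufs[t + 1:],
                            mem, regvals)
                else:
                    # Loads forward from the youngest matching buffered store,
                    # otherwise they read memory (initial value 0).
                    val = dict(mem).get(addr, 0)
                    for a, v in bufs[t]:
                        if a == addr:
                            val = v
                    explore(new_pcs, bufs, mem, regvals + ((arg, val),))

    explore((0,) * n, ((),) * n, (), ())
    return outcomes


if __name__ == "__main__":
    print(sorted(tso_outcomes(SB)))
```

In this model the outcome r1 = 0, r2 = 0 is reachable because each load can execute while the other core's store is still sitting in its buffer, which is the Store->Load relaxation the footnote refers to.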
slide-92
SLIDE 92

PipeProof Takeaways

▪First Ever Automated All-Program Microarchitectural MCM Verification

  • Designers get both completeness and automation of verification
  • Engineers can verify microarchitectures themselves, before RTL is written!

▪Based on techniques from formal methods (CEGAR) [Clarke et al. CAV 2000]
▪Transitive Chain (TC) Abstraction models infinite set of executions
▪Accolades:

  • Nominated for Best Paper at MICRO 2018
  • “Honorable Mention” in 2018 IEEE Micro Top Picks of Comp. Arch. Conferences
27
slide-93
SLIDE 93

Talk Outline

▪Overview and Motivation
▪Memory Consistency Background
▪PipeProof: All-Program Microarchitectural MCM Verification
▪RTLCheck: MCM Verification of Verilog RTL
▪Expanding to other domains
▪Conclusion

28
slide-94
SLIDE 94 29

Microarchitectural Orderings

Verified with PipeProof

What if I want to verify RTL (Verilog)?

po rf i1 i3 fr i2 i4 po (i2) (i1) IF EX WB (i3) (i4)

ISA-Level MCM

Axiom "PO_Fetch": forall microop "i1", "i2", SameCore i1 i2 /\ ProgramOrder i1 i2 => AddEdge ((i1, IF), (i2, IF)). ... acyclic (po U co U rf U fr)
slide-95
SLIDE 95 29

RTL implementation (Verilog)

[RTL Image: Christopher Batten]

Microarchitectural Orderings

Verified with PipeProof

What if I want to verify RTL (Verilog)?

po rf i1 i3 fr i2 i4 po (i2) (i1) IF EX WB (i3) (i4)

ISA-Level MCM

Axiom "PO_Fetch": forall microop "i1", "i2", SameCore i1 i2 /\ ProgramOrder i1 i2 => AddEdge ((i1, IF), (i2, IF)). ... acyclic (po U co U rf U fr)

?

slide-96
SLIDE 96 29

RTL implementation (Verilog)

[RTL Image: Christopher Batten]

Microarchitectural Orderings

Verified with PipeProof

What if I want to verify RTL (Verilog)?

po rf i1 i3 fr i2 i4 po (i2) (i1) IF EX WB (i3) (i4)

ISA-Level MCM

Axiom "PO_Fetch": forall microop "i1", "i2", SameCore i1 i2 /\ ProgramOrder i1 i2 => AddEdge ((i1, IF), (i2, IF)). ... acyclic (po U co U rf U fr)

?

✓ 

slide-97
SLIDE 97 [Yatin A. Manerkar, Daniel Lustig, Margaret Martonosi, and Michael Pellauer. RTLCheck: Verifying the Memory Consistency of RTL Designs. The 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2017.]

▪RTLCheck enables automated checking of Verilog RTL against µspec axioms for litmus test suites

RTLCheck: Checking RTL Consistency Orderings

High-Level Languages (HLL) Compiler Instruction Set (ISA) Microarchitecture Processor RTL (Verilog) 30

Mapping Functions

RTLCheck

Axiom "PO_Fetch": forall microop "i1", "i2", SameCore i1 i2 /\ ProgramOrder i1 i2 => AddEdge ((i1, IF), (i2, IF)). assert property @(posedge clk) (...) ... Core 0 Core 1 x = 1; y = 1; r1 = y; r2 = x; Litmus Test µspec axioms Test-specific Temporal RTL Properties
slide-99
SLIDE 99

SystemVerilog Assertions (SVA)

▪SVA: Industry standard for RTL verification, e.g.: ARM [Reid et al. CAV 2016]

  • Based on Linear Temporal Logic (LTL) with regular operators

▪Commercial tools (e.g. JasperGold) can formally verify SVA assertions
▪Translating µspec to SVA => RTL MCM verification using industry flows
▪But it’s not that simple!

31 assert property @(posedge clk) (...) ... SVA Assertions RTL Impl.

Cadence JasperGold

Assertion Proven? Counterexample found?
slide-100
SLIDE 100

Meaning can be Lost in Translation!

小心地滑

(Caution: Slippery Floor)

slide-101
SLIDE 101

Meaning can be Lost in Translation!

[Image: Barbara Younger] [Inspiration: Tae Jun Ham]

小心地滑

(Caution: Slippery Floor)

slide-102
SLIDE 102

The µspec/SVA Mismatch

▪Tricky to translate µspec to SVA while maintaining µspec semantics
▪SVA Verifiers (JasperGold) don’t implement full SVA spec!

  • Causes further complications

▪Example: Outcome Filtering

  • Filtering litmus test executions to those that have particular values for loads
slide-103
SLIDE 103

Outcome Filtering with Execution as a Single Unit

▪In this case, outcome filtering is easy and efficient
▪Know load values, so can draw (red) edges based on these values

  • Example: i4 reads 0 => i4 must read mem before write i1
mp litmus test
Core 0: (i1) x = 1; (i2) y = 1;
Core 1: (i3) r1 = y; (i4) r2 = x;

IF EX WB (i1) (i2) (i3) (i4)

slide-104
SLIDE 104

Outcome Filtering with Execution as a Single Unit

▪In this case, outcome filtering is easy and efficient
▪Know load values, so can draw (red) edges based on these values

  • Example: i4 reads 0 => i4 must read mem before write i1
mp litmus test
Core 0: (i1) x = 1; (i2) y = 1;
Core 1: (i3) r1 = y; (i4) r2 = x;
SC Forbids: r1 = 1, r2 = 0

IF EX WB (i1) (i2) (i3) (i4)
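As a sanity check on the "SC Forbids" claim, the sketch below enumerates every sequentially consistent interleaving of the mp litmus test and collects the resulting (r1, r2) values. The op encoding and the helper name `sc_outcomes` are illustrative assumptions, not part of RTLCheck.

```python
from itertools import combinations

# mp (message passing) litmus test from the slide.
MP = (
    [("st", "x", 1), ("st", "y", 1)],        # Core 0: (i1), (i2)
    [("ld", "y", "r1"), ("ld", "x", "r2")],  # Core 1: (i3), (i4)
)


def sc_outcomes(core0, core1, regs=("r1", "r2")):
    """Return all register outcomes over interleavings that keep program order."""
    total = len(core0) + len(core1)
    outcomes = set()
    # Choose which positions of the global order belong to Core 0.
    for slots in combinations(range(total), len(core0)):
        it0, it1 = iter(core0), iter(core1)
        mem, regvals = {}, {}
        for pos in range(total):
            kind, addr, arg = next(it0) if pos in slots else next(it1)
            if kind == "st":
                mem[addr] = arg
            else:
                regvals[arg] = mem.get(addr, 0)  # memory starts at 0
        outcomes.add(tuple(regvals[r] for r in regs))
    return outcomes


if __name__ == "__main__":
    print(sorted(sc_outcomes(*MP)))
```

The outcome r1 = 1, r2 = 0 never appears: once Core 1 has seen y = 1, the earlier store to x must already be in memory.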

slide-107
SLIDE 107

Outcome Filtering with Temporal Logic

assume property (a); // e.g. Load i4 returns 0
assert property (b); // e.g. i4 reads mem before write i1
// The above is equivalent to...
assert property ((always a) implies (always b));

▪In temporal logic syntax (G = always, F = eventually), this becomes:

G a -> G b = (~(G a)) \/ G b = (F ~a) \/ G b

▪Assumptions introduce liveness: expensive to check! [Cerny et al. 2010]
▪SVA verifiers approximate: only check assumptions until current state

  • This results in a property which is easier to check…
  • …but makes outcome filtering impossible with such verifiers!

▪RTLCheck Solution: Generate properties that handle all test outcomes
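The rewriting and the effect of the approximation can be checked on finite traces. The sketch below is a simplified model: traces are finite tuples of booleans, and `prefix_approximation` encodes one plausible reading of "only check assumptions until current state". Both are assumptions for illustration and do not describe how any particular SVA verifier is implemented.

```python
from itertools import product


def always(trace):      # G: holds at every step of a finite trace
    return all(trace)


def eventually(trace):  # F: holds at some step of a finite trace
    return any(trace)


def filtered_property(a, b):
    """(G a) -> (G b): executions where a ever fails are filtered out."""
    return (not always(a)) or always(b)


def rewritten_property(a, b):
    """(F not a) or (G b), the rewritten form of the same property."""
    return eventually([not x for x in a]) or always(b)


def prefix_approximation(a, b):
    """Assumed verifier behaviour: flag b failing once a has held up to that step."""
    return not any(all(a[: t + 1]) and not b[t] for t in range(len(b)))


if __name__ == "__main__":
    traces = list(product([False, True], repeat=3))
    # The two forms agree on every pair of length-3 traces.
    assert all(filtered_property(a, b) == rewritten_property(a, b)
               for a in traces for b in traces)
    a = (True, True, False)  # the load-value assumption fails only at the last step
    b = (True, False, True)  # the ordering property fails at step 1
    print(filtered_property(a, b), prefix_approximation(a, b))
```

On the example trace the full property holds, because the execution is filtered out by the later failure of `a`, while the prefix-based check reports a violation at step 1. That gap is why outcome filtering cannot be expressed with such verifiers.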

slide-108
SLIDE 108

▪First automated RTL MCM verification for litmus test suites

  • Engineers can check MCM properties of their RTL themselves
  • Compatible with existing industry flows and tools

▪Novel algorithms to translate µspec axioms to temporal SVA properties

  • Ongoing work: Formalise mismatch between µspec and SVA

▪Discovered bug in memory implementation of RISC-V V-scale processor
▪Accolades:

  • “Honorable Mention” in 2017 IEEE Micro Top Picks of Comp. Arch. Conferences

RTLCheck Takeaways

36
slide-109
SLIDE 109

Talk Outline

▪Overview and Motivation
▪Background on MCM Specification and Verification
▪PipeProof: All-Program Microarchitectural MCM Verification
▪RTLCheck: MCM Verification of Verilog RTL
▪Expanding to other domains
▪Conclusion

37
slide-110
SLIDE 110

Security Analysis with CheckMate [Trippel et al. MICRO 2018]

38

▪Work by another member of our research group (Caroline Trippel) ▪Her key insight: µhb graphs can be used for reasoning about security!

CheckMate

Hardware Exploit

  • Prog. Synthesis
Microarchitecture + OS Specification in Alloy Exploit Pattern Specification

prime probe ViCL Create ViCL Expire

Attacker T0 on C0 Attacker T1 on C1 R [VA1]→0 R [VA1]→0 R [VA0] → r1 W [f(r1)=VA1] → 0 R [VA1]→0 VA to PA Mapping: VA1:(PA1:A), VA0:(PA0:V) VA to Cache Index Mapping: VA1:IDX0, VA0:IDX1
Attacker T0 on C0 Attacker T1 on C1 R [VA1]→0 W [VA1]→0 R [VA0] → r1 W [f(r1)=VA1] → 0 R [VA1]→0 VA to PA Mapping: VA1:(PA1:A), VA0:(PA0:V) VA to Cache Index Mapping: VA1:IDX0, VA0:IDX1
Attacker T0 on C0 CLFLUSH [VA2]→0 R [VA1] → r1 R [f(r1)=VA2] → 0 R [VA2]→0 VA to PA Mapping: VA2:(PA1:A), VA1:(PA0:V) VA to Cache Index Mapping: VA2:IDX0, VA1:IDX1
Victim T0 on C0 Attacker T1 on C1 R [VA1]→0 W [VA0] → r1 R [VA1]→0 VA to PA Mapping: VA1:(PA1:A), VA0:(PA0:V) VA to Cache Index Mapping: VA1:IDX0, VA0:IDX0
Victim T0 on C0 Attacker T1 on C1 W [VA1]→0 R [VA0] → r1 R [VA1]→0 VA to PA Mapping: VA1:(PA1:A), VA0:(PA0:V) VA to Cache Index Mapping: VA1:IDX0, VA0:IDX0
Victim T0 on C0 Attacker T1 on C1 R [VA1]→0 R [VA0] → r1 R [VA1]→0 VA to PA Mapping: VA1:(PA1:A), VA0:(PA0:V) VA to Cache Index Mapping: VA1:IDX0, VA0:IDX0

Exploits synthesized from µhb analysis

fact Program_Order_Fetch { all disj e0, e1 : Event | ProgramOrder[e0, e1] => EdgeExists[e0, Fetch, e1, Fetch, uhb_inter] }
fact In_Order_Decode { all disj e0, e1 : Event | EdgeExists[e0, Fetch, e1, Fetch, uhb_inter] => EdgeExists[e0, Decode, e1, Decode, uhb_inter] }

[CheckMate: Automated Exploit Program Generation for Hardware Security Verification. Caroline Trippel, Daniel Lustig, and Margaret Martonosi. In Proceedings of the 51st International Symposium on Microarchitecture (MICRO), October 2018.]
slide-111
SLIDE 111

Security Analysis with CheckMate [Trippel et al. MICRO 2018]

38

▪Work by another member of our research group (Caroline Trippel) ▪Her key insight: µhb graphs can be used for reasoning about security!

CheckMate

Hardware Exploit

  • Prog. Synthesis
Microarchitecture + OS Specification in Alloy Exploit Pattern Specification

prime probe ViCL Create ViCL Expire

Attacker T0 on C0 Attacker T1 on C1 R [VA1]→0 R [VA1]→0 R [VA0] → r1 W [f(r1)=VA1] → 0 R [VA1]→0 VA to PA Mapping: VA1:(PA1:A), VA0:(PA0:V) VA to Cache Index Mapping: VA1:IDX0, VA0:IDX1
Attacker T0 on C0 Attacker T1 on C1 R [VA1]→0 W [VA1]→0 R [VA0] → r1 W [f(r1)=VA1] → 0 R [VA1]→0 VA to PA Mapping: VA1:(PA1:A), VA0:(PA0:V) VA to Cache Index Mapping: VA1:IDX0, VA0:IDX1
Attacker T0 on C0 CLFLUSH [VA2]→0 R [VA1] → r1 R [f(r1)=VA2] → 0 R [VA2]→0 VA to PA Mapping: VA2:(PA1:A), VA1:(PA0:V) VA to Cache Index Mapping: VA2:IDX0, VA1:IDX1
Victim T0 on C0 Attacker T1 on C1 R [VA1]→0 W [VA0] → r1 R [VA1]→0 VA to PA Mapping: VA1:(PA1:A), VA0:(PA0:V) VA to Cache Index Mapping: VA1:IDX0, VA0:IDX0
Victim T0 on C0 Attacker T1 on C1 W [VA1]→0 R [VA0] → r1 R [VA1]→0 VA to PA Mapping: VA1:(PA1:A), VA0:(PA0:V) VA to Cache Index Mapping: VA1:IDX0, VA0:IDX0
Victim T0 on C0 Attacker T1 on C1 R [VA1]→0 R [VA0] → r1 R [VA1]→0 VA to PA Mapping: VA1:(PA1:A), VA0:(PA0:V) VA to Cache Index Mapping: VA1:IDX0, VA0:IDX0

Exploits synthesized from µhb analysis

fact Program_Order_Fetch { all disj e0, e1 : Event | ProgramOrder[e0, e1] => EdgeExists[e0, Fetch, e1, Fetch, uhb_inter] }
fact In_Order_Decode { all disj e0, e1 : Event | EdgeExists[e0, Fetch, e1, Fetch, uhb_inter] => EdgeExists[e0, Decode, e1, Decode, uhb_inter] }

Includes new exploits! (SpectrePrime, MeltdownPrime)

[CheckMate: Automated Exploit Program Generation for Hardware Security Verification. Caroline Trippel, Daniel Lustig, and Margaret Martonosi. In Proceedings of the 51st International Symposium on Microarchitecture (MICRO), October 2018.]
slide-112
SLIDE 112

Security Analysis with CheckMate [Trippel et al. MICRO 2018]

38

▪Work by another member of our research group (Caroline Trippel) ▪Her key insight: µhb graphs can be used for reasoning about security!

CheckMate

Hardware Exploit

  • Prog. Synthesis
Microarchitecture + OS Specification in Alloy Exploit Pattern Specification

prime probe ViCL Create ViCL Expire

Attacker T0 on C0 Attacker T1 on C1 R [VA1]→0 R [VA1]→0 R [VA0] → r1 W [f(r1)=VA1] → 0 R [VA1]→0 VA to PA Mapping: VA1:(PA1:A), VA0:(PA0:V) VA to Cache Index Mapping: VA1:IDX0, VA0:IDX1
Attacker T0 on C0 Attacker T1 on C1 R [VA1]→0 W [VA1]→0 R [VA0] → r1 W [f(r1)=VA1] → 0 R [VA1]→0 VA to PA Mapping: VA1:(PA1:A), VA0:(PA0:V) VA to Cache Index Mapping: VA1:IDX0, VA0:IDX1
Attacker T0 on C0 CLFLUSH [VA2]→0 R [VA1] → r1 R [f(r1)=VA2] → 0 R [VA2]→0 VA to PA Mapping: VA2:(PA1:A), VA1:(PA0:V) VA to Cache Index Mapping: VA2:IDX0, VA1:IDX1
Victim T0 on C0 Attacker T1 on C1 R [VA1]→0 W [VA0] → r1 R [VA1]→0 VA to PA Mapping: VA1:(PA1:A), VA0:(PA0:V) VA to Cache Index Mapping: VA1:IDX0, VA0:IDX0
Victim T0 on C0 Attacker T1 on C1 W [VA1]→0 R [VA0] → r1 R [VA1]→0 VA to PA Mapping: VA1:(PA1:A), VA0:(PA0:V) VA to Cache Index Mapping: VA1:IDX0, VA0:IDX0
Victim T0 on C0 Attacker T1 on C1 R [VA1]→0 R [VA0] → r1 R [VA1]→0 VA to PA Mapping: VA1:(PA1:A), VA0:(PA0:V) VA to Cache Index Mapping: VA1:IDX0, VA0:IDX0

Exploits synthesized from µhb analysis

fact Program_Order_Fetch { all disj e0, e1 : Event | ProgramOrder[e0, e1] => EdgeExists[e0, Fetch, e1, Fetch, uhb_inter] }
fact In_Order_Decode { all disj e0, e1 : Event | EdgeExists[e0, Fetch, e1, Fetch, uhb_inter] => EdgeExists[e0, Decode, e1, Decode, uhb_inter] }

ViCL abstraction [Manerkar et al. MICRO 2015] used to model cache behaviour

Includes new exploits! (SpectrePrime, MeltdownPrime)

[CheckMate: Automated Exploit Program Generation for Hardware Security Verification. Caroline Trippel, Daniel Lustig, and Margaret Martonosi. In Proceedings of the 51st International Symposium on Microarchitecture (MICRO), October 2018.]
slide-113
SLIDE 113

Ongoing Work: Verifying Distributed Systems

39

▪Joint work with Themis Melissaris ▪Distributed systems have some similarities to shared-memory systems

  • Distributed protocols (e.g. Paxos) similar to cache coherence protocols
  • Replicated data store consistency models similar to MCMs
slide-114
SLIDE 114

Ongoing Work: Verifying Distributed Systems

39

▪Joint work with Themis Melissaris ▪Distributed systems have some similarities to shared-memory systems

  • Distributed protocols (e.g. Paxos) similar to cache coherence protocols
  • Replicated data store consistency models similar to MCMs
Tran 1 Tran 2 Tran_start Op 1 Op 2 Tran_end W x 1 W y 1 R y 0 R x 0
slide-115
SLIDE 115

Ongoing Work: Verifying Distributed Systems

39 [Cartoon by Julia Evans]

▪Joint work with Themis Melissaris ▪Distributed systems have some similarities to shared-memory systems

  • Distributed protocols (e.g. Paxos) similar to cache coherence protocols
  • Replicated data store consistency models similar to MCMs

▪Also have features with no shared-memory analogue!

  • Correctness in the presence of node failures
  • Eventual consistency [Vogels CACM 2009]
Tran 1 Tran 2 Tran_start Op 1 Op 2 Tran_end W x 1 W y 1 R y 0 R x 0
slide-116
SLIDE 116

Talk Outline

▪Overview and Motivation
▪Background on MCM Specification and Verification
▪PipeProof: All-Program Microarchitectural MCM Verification
▪RTLCheck: MCM Verification of Verilog RTL
▪Expanding to other domains
▪Conclusion

40
slide-117
SLIDE 117

▪Complexity of computing hardware is increasing

  • Ubiquitous parallelism and increased heterogeneity

▪Automated formal verification helps engineers handle this complexity

  • Give engineers the ability to formally verify their systems themselves
  • PipeProof: Automated All-Program Microarchitectural MCM Verification
  • RTLCheck: Per-Program MCM Verification of RTL Designs

▪Techniques for MCM analysis applicable to other domains

  • e.g. Security [Trippel et al. MICRO 2018] and distributed systems

Conclusions

41
slide-118
SLIDE 118

Collaborators

Margaret Martonosi, Daniel Lustig (NVIDIA), Aarti Gupta, Michael Pellauer (NVIDIA), Caroline Trippel, Sharad Malik, Hongce Zhang
slide-119
SLIDE 119

Yatin A. Manerkar

Automated Formal Memory Consistency Verification of Hardware

http://www.cs.princeton.edu/~manerkar

Princeton University June 23rd, 2019

43
slide-120
SLIDE 120

Backup Slides

44
slide-121
SLIDE 121

Chain Invariants

▪Abstractly represent repeated ISA-level patterns ▪Sometimes needed for refinement loop to terminate ▪Inductively proven by PipeProof before their use in proof algorithms ▪Example: checking for edge from i1 to i5 (TC abstraction support proof)

Abstract Counterexample

i1 i3 i4 fr i5 po

45
slide-122
SLIDE 122

Chain Invariants

▪Abstractly represent repeated ISA-level patterns ▪Sometimes needed for refinement loop to terminate ▪Inductively proven by PipeProof before their use in proof algorithms ▪Example: checking for edge from i1 to i5 (TC abstraction support proof)

Repeating ISA-Level Pattern

i1 i3 i4 fr i5 po i1 i3 i4 fr i2 po i5 po

45
slide-123
SLIDE 123

Chain Invariants

▪Abstractly represent repeated ISA-level patterns ▪Sometimes needed for refinement loop to terminate ▪Inductively proven by PipeProof before their use in proof algorithms ▪Example: checking for edge from i1 to i5 (TC abstraction support proof)

Repeating ISA-Level Pattern

i1 i3 i4 fr i5 po i1 i3 i4 fr i2 po i5 po

Can continue decomposing in this way forever!

45
slide-124
SLIDE 124

Chain Invariants

▪Abstractly represent repeated ISA-level patterns ▪Sometimes needed for refinement loop to terminate ▪Inductively proven by PipeProof before their use in proof algorithms ▪Example: checking for edge from i1 to i5 (TC abstraction support proof)

Chain Invariant Applied

i1 i3 i4 fr i5 po i1 i3 i4 fr i2 po i5 po i1 i4 fr i2 po_plus i5

  • po_plus = arbitrary number of repetitions of po
  • Next edge peeled off will be something other than po

45
slide-125
SLIDE 125

Covering Sets Optimization

▪ Must verify across all possible transitive connections ▪ Each decomposition creates a new set of transitive connections

  • Can quickly lead to a case explosion

▪ The Covering Sets Optimization eliminates redundant transitive connections

x y i1 z in IF EX WB fr x y i1 z in IF EX WB fr

B A

slide-126
SLIDE 126

Covering Sets Optimization

▪ Must verify across all possible transitive connections ▪ Each decomposition creates a new set of transitive connections

  • Can quickly lead to a case explosion

▪ The Covering Sets Optimization eliminates redundant transitive connections

x y i1 z in IF EX WB fr x y i1 z in IF EX WB fr

B A

Graph A has an edge from x→z (tran conn.)

slide-127
SLIDE 127

Covering Sets Optimization

▪ Must verify across all possible transitive connections ▪ Each decomposition creates a new set of transitive connections

  • Can quickly lead to a case explosion

▪ The Covering Sets Optimization eliminates redundant transitive connections

x y i1 z in IF EX WB fr x y i1 z in IF EX WB fr

B A

Graph B has edges from y→z (tran conn.) and x→z (by transitivity) Graph A has an edge from x→z (tran conn.)

slide-128
SLIDE 128

Covering Sets Optimization

▪ Must verify across all possible transitive connections ▪ Each decomposition creates a new set of transitive connections

  • Can quickly lead to a case explosion

▪ The Covering Sets Optimization eliminates redundant transitive connections

x y i1 z in IF EX WB fr x y i1 z in IF EX WB fr

B A

Graph B has edges from y→z (tran conn.) and x→z (by transitivity) Graph A has an edge from x→z (tran conn.) Correctness of A => Correctness of B (since B contains A’s tran conn.) Checking B explicitly is redundant!
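The redundancy argument can be sketched as a subset check: if case B carries every transitive-connection edge of case A plus more, then verifying A already covers B. The encoding of a case as a set of (source, destination) pairs and the helper name `covering_set` are assumptions for this illustration; the sketch also assumes that adding edges to a graph can never turn an unobservable (cyclic) execution into an observable one.

```python
def covering_set(cases):
    """Keep only cases that are not strict supersets of another case."""
    frozen = [frozenset(c) for c in cases]
    kept = []
    for case in frozen:
        # A strict subset elsewhere means that weaker case already covers this one.
        if any(other < case for other in frozen):
            continue
        if case not in kept:
            kept.append(case)
    return kept


if __name__ == "__main__":
    graph_a = {("x", "z")}              # transitive connection x -> z
    graph_b = {("y", "z"), ("x", "z")}  # y -> z, plus x -> z by transitivity
    print(covering_set([graph_a, graph_b]))
```

Only Graph A survives the filter, matching the slide's conclusion that checking B explicitly is redundant.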

slide-129
SLIDE 129

Memoization Optimization

▪Base PipeProof algorithm examines some cycles multiple times ▪Memoization eliminates redundant checks of cycles that have already been verified

i1 fr i2 i3 i4 rf po po

slide-130
SLIDE 130

Memoization Optimization

▪Base PipeProof algorithm examines some cycles multiple times ▪Memoization eliminates redundant checks of cycles that have already been verified

i1 in IF EX WB fr Some Tran. Conn.

i1 fr i2 i3 i4 rf po po fr

slide-131
SLIDE 131

Memoization Optimization

▪Base PipeProof algorithm examines some cycles multiple times ▪Memoization eliminates redundant checks of cycles that have already been verified

i1 in IF EX WB fr Some Tran. Conn.

i1 fr i2 i3 i4 rf po po

i1 in IF EX WB po Some Tran. Conn.

po po

slide-132
SLIDE 132

Memoization Optimization

▪Base PipeProof algorithm examines some cycles multiple times ▪Memoization eliminates redundant checks of cycles that have already been verified

i1 in IF EX WB fr Some Tran. Conn. i1 in IF EX WB rf Some Tran. Conn.

i1 fr i2 i3 i4 rf po po

i1 in IF EX WB po Some Tran. Conn.

rf Same cycle is checked 3 times!

slide-133
SLIDE 133

Memoization Optimization

▪Base PipeProof algorithm examines some cycles multiple times ▪Memoization eliminates redundant checks of cycles that have already been verified

i1 in IF EX WB fr Some Tran. Conn. i1 in IF EX WB rf Some Tran. Conn.

i1 fr i2 i3 i4 rf po po

i1 in IF EX WB po Some Tran. Conn.

rf Procedure: If all ISA-level cycles containing edge ri have been checked, do not peel off ri edges when checking subsequent cycles Same cycle is checked 3 times!
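The saving from this procedure can be illustrated on a bounded toy problem: enumerate ISA-level cycles of a fixed length, once naively and once skipping edge kinds whose cycles have all been checked already. The fixed length, the edge ordering, and the function name `cases_checked` are assumptions made here; PipeProof itself works on abstract cycles of unbounded length.

```python
from itertools import product

ISA_EDGES = ("fr", "po", "rf", "co")  # order in which edge kinds are processed


def cases_checked(length, memoize):
    """List the edge sequences examined for cycles of the given length."""
    checked = []
    for i, first in enumerate(ISA_EDGES):
        # With memoization, edge kinds handled earlier are not peeled off again.
        allowed = ISA_EDGES[i:] if memoize else ISA_EDGES
        for rest in product(allowed, repeat=length - 1):
            checked.append((first,) + rest)
    return checked


if __name__ == "__main__":
    print(len(cases_checked(3, memoize=False)), len(cases_checked(3, memoize=True)))
```

Nothing is lost in this toy setting: every skipped sequence is a rotation of a cycle that was examined when its earliest edge kind was processed.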

slide-134
SLIDE 134

Filtering Invalid Decompositions

▪When decomposing a transitive connection, the decomposition should guarantee the transitive connections of its parent abstract counterexamples.
▪Decompositions that do not do this are invalid and filtered out

p i1 r q in IF EX WB fr

?

AbsCounterX rX p i1 in-1 IF EX WB rf r q in fr Invalid Decomposition

slide-135
SLIDE 135

The Adequate Model Over-Approximation

▪Addition of an instruction can make unobservable execution observable!
▪Need to work with over-approximation of microarchitectural constraints
▪PipeProof sets all exists clauses to true as its over-approximation

t i1 i2 IF EX WB fr v i3 co SubsetExec u t i1 i2 IF EX WB fr v i3 SubsetWithExternal u i4 rf co

slide-136
SLIDE 136

PipeProof Block Diagram

Microarchitecture Ordering Spec. ISA-Level MCM Spec. PipeProof ISA Edge ->

  • Microarch. Mapping
Result: All-Program MCM Correctness Proof? Counterexample found?

Chain Invariants Transitive Chain Abstraction Support Proof Microarch. Correctness Proof

  • Cex. Generation

Proof of Chain Invariants

Fail Fail Pass Pass
slide-138
SLIDE 138

PipeProof Block Diagram

Microarchitecture Ordering Spec. ISA-Level MCM Spec. PipeProof ISA Edge ->

  • Microarch. Mapping
Result: All-Program MCM Correctness Proof? Counterexample found?

Chain Invariants Transitive Chain Abstraction Support Proof Microarch. Correctness Proof

  • Cex. Generation

Proof of Chain Invariants

Fail Fail Pass Pass

Links ISA-level and µarch executions

slide-139
SLIDE 139

PipeProof Block Diagram

Microarchitecture Ordering Spec. ISA-Level MCM Spec. PipeProof ISA Edge ->

  • Microarch. Mapping
Result: All-Program MCM Correctness Proof? Counterexample found?

Chain Invariants Transitive Chain Abstraction Support Proof Microarch. Correctness Proof

  • Cex. Generation

Proof of Chain Invariants

Fail Fail Pass Pass

Represent repeated ISA-level patterns

slide-140
SLIDE 140

PipeProof Block Diagram

Microarchitecture Ordering Spec. ISA-Level MCM Spec. PipeProof ISA Edge ->

  • Microarch. Mapping
Result: All-Program MCM Correctness Proof? Counterexample found?

Chain Invariants Transitive Chain Abstraction Support Proof Microarch. Correctness Proof

  • Cex. Generation

Proof of Chain Invariants

Fail Fail Pass Pass

If design can’t be verified, a counterexample (a forbidden execution that is observable) is often returned

slide-141
SLIDE 141

PipeProof Block Diagram

Microarchitecture Ordering Spec. ISA-Level MCM Spec. PipeProof ISA Edge ->

  • Microarch. Mapping
Result: All-Program MCM Correctness Proof? Counterexample found?

Chain Invariants Transitive Chain Abstraction Support Proof Microarch. Correctness Proof

  • Cex. Generation

Proof of Chain Invariants

Fail Fail Pass Pass

Supporting proofs provide foundation for correctness proof

slide-142
SLIDE 142

Mapping ISA-Level Edges to Microarchitecture

▪Translate each edge in ISA-level cycle to microarchitectural constraints
▪Do so with user-provided Mapping Axioms
▪Example: Mapping of po edges

Axiom "Mapping_po": forall microop "i", forall microop "j", (HasDependency po i j => AddEdge ((i, Fetch), (j, Fetch), "po_arch", "blue")).

i1 i2 IF EX WB po
slide-144
SLIDE 144

Mapping ISA-Level Edges to Microarchitecture

▪Translate each edge in ISA-level cycle to microarchitectural constraints
▪Do so with user-provided Mapping Axioms
▪Example: Mapping of po edges

Axiom "Mapping_po": forall microop "i", forall microop "j", (HasDependency po i j => AddEdge ((i, Fetch), (j, Fetch), "po_arch", "blue")).

i1 i2 IF EX WB po

Blue edges between EX and WB stages added by other FIFO axioms (refer to µspec file)
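The effect of the mapping axiom can be sketched outside µspec. In the Python below, `map_po` mirrors "Mapping_po" by turning each ISA-level po edge into a µhb edge between Fetch nodes, and `propagate_fifo` is a hypothetical stand-in for the FIFO axioms that carry the ordering to later stages. The stage names, the tuple encoding, and both function names are assumptions for illustration only.

```python
STAGES = ("Fetch", "Execute", "Writeback")  # assumed 3-stage in-order pipeline


def map_po(isa_edges):
    """Each ISA-level po edge i -> j adds a µhb edge (i, Fetch) -> (j, Fetch)."""
    return [((i, "Fetch"), (j, "Fetch"))
            for (kind, i, j) in isa_edges if kind == "po"]


def propagate_fifo(uhb_edges):
    """Hypothetical FIFO rule: order at Fetch implies the same order downstream."""
    out = list(uhb_edges)
    for (i, s), (j, t) in uhb_edges:
        if s == "Fetch" and t == "Fetch":
            for stage in STAGES[1:]:
                out.append(((i, stage), (j, stage)))
    return out


if __name__ == "__main__":
    isa_cycle = [("po", "i1", "i2"), ("fr", "i2", "i3")]
    for edge in propagate_fifo(map_po(isa_cycle)):
        print(edge)
```

Only the po edge is mapped here; other ISA-level edge kinds such as fr would need their own mapping axioms.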
slide-145
SLIDE 145

▪Open question as to whether a set of litmus tests is complete

(i1) (i2) IF EX WB (i3) (i4) (i1) (i2) IF EX WB (i3) (i4) Cyclic => Still unobservable Acyclic => BUG! Core 0 Core 1 x = 1; y = 1; r1 = y; r2 = x; Forbid: r1 = 1, r2 = 0 mp Litmus Test Core 0 Core 1 x = 1; r1 = y; y = 1; r2 = x; Forbid: r1 = 0, r2 = 0 sb Litmus Test po po rf fr po po fr fr 57

Can “litmus tests” provide complete coverage?

slide-146
SLIDE 146

▪Open question as to whether a set of litmus tests is complete

(i1) (i2) IF EX WB (i3) (i4) (i1) (i2) IF EX WB (i3) (i4) Cyclic => Still unobservable Acyclic => BUG! Core 0 Core 1 x = 1; y = 1; r1 = y; r2 = x; Forbid: r1 = 1, r2 = 0 mp Litmus Test Core 0 Core 1 x = 1; r1 = y; y = 1; r2 = x; Forbid: r1 = 0, r2 = 0 sb Litmus Test po po rf fr po po fr fr 57

Can “litmus tests” provide complete coverage? Different tests catch different bugs! To catch all bugs, must verify across all programs!
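The "Cyclic => unobservable, Acyclic => BUG" rule reduces to a cycle check on the µhb graph of a forbidden outcome. The sketch below uses a depth-first search; the node naming and the example edges are illustrative assumptions and are not the exact graphs drawn on the slide.

```python
def has_cycle(edges):
    """Detect a directed cycle with a three-colour depth-first search."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                return True  # back edge closes a cycle
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)


def observable(edges):
    """An execution is observable exactly when its µhb graph is acyclic."""
    return not has_cycle(edges)


if __name__ == "__main__":
    # Toy µhb edges for the forbidden mp outcome (r1 = 1, r2 = 0).
    in_order = [
        (("i1", "WB"), ("i2", "WB")),  # stores write back in program order
        (("i2", "WB"), ("i3", "EX")),  # rf: i3 reads the value written by i2
        (("i3", "EX"), ("i4", "EX")),  # loads execute in program order
        (("i4", "EX"), ("i1", "WB")),  # fr: i4 reads before i1's write
    ]
    reordered_loads = [e for e in in_order if e != (("i3", "EX"), ("i4", "EX"))]
    print(observable(in_order), observable(reordered_loads))
```

Dropping the load-ordering edge makes the graph acyclic, so the forbidden outcome becomes observable on that hypothetical design. A test such as sb would not exercise this edge, which is why different tests catch different bugs.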

slide-147
SLIDE 147

Property to check: mapNode(Ld x → St x, Ld x == 0) or mapNode(St x → Ld x, Ld x == 1);

▪Don’t filter based on outcome

  • Translate all possible outcomes

▪Tag each case with appropriate load value constraints

  • reflect the data constraints required for edge(s)

▪Ongoing work: Precisely formalise the µspec/SVA mismatch

  • How much is fundamental? How much is due to SVA verifier approximation?

Solution: Load Value Constraints

Axiom "Read_Values": Every load either reads BeforeAllWrites OR reads FromLatestWrite

mp litmus test
Core 0: (i1) x = 1; (i2) y = 1;
Core 1: (i3) r1 = y; (i4) r2 = x;
SC Forbids: r1 = 1, r2 = 0
Note: Axioms and properties abstracted for brevity
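The shape of the generated property can be sketched as a disjunction in which each ordering case is tagged with the load value it requires. The function below is a simplified stand-in for the abstracted `mapNode(...)` property: event times are plain integers and the name `read_values_holds` is hypothetical.

```python
def read_values_holds(load_time, store_time, load_value):
    """Either the load precedes the store and reads 0, or follows it and reads 1."""
    before_all_writes = load_time < store_time and load_value == 0
    from_latest_write = store_time < load_time and load_value == 1
    return before_all_writes or from_latest_write


if __name__ == "__main__":
    print(read_values_holds(load_time=1, store_time=2, load_value=0))
    print(read_values_holds(load_time=1, store_time=2, load_value=1))
```

Because every outcome is covered by one disjunct, no assumption about the load value is needed, which sidesteps the outcome-filtering problem.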
slide-151
SLIDE 151

Core 0

Memory WB DX IF

Multi-V-scale: a Multicore Case Study

59
slide-152
SLIDE 152

Core 0

Memory WB DX IF

3-stage in-order RISC-V pipeline

Multi-V-scale: a Multicore Case Study

59
slide-153
SLIDE 153

Core 0 Core 1 Core 2 Core 3

Arbiter Memory WB DX IF WB DX IF WB DX IF WB DX IF

Arbiter enforces that only one core can access memory at any time

Multi-V-scale: a Multicore Case Study

59
slide-154
SLIDE 154

▪When two stores are sent to memory in successive cycles, first of two stores is dropped by memory!
▪Bug would occur even in single-core V-scale
▪Fixed bug by eliminating intermediate wdata reg

Core 0 Core 1 Core 2 Core 3

Arbiter WB DX IF WB DX IF WB DX IF WB DX IF

Memory

wdata

Mem array Stores

x = 1 y = 1

Bug Discovered in V-scale Mem. Implementation

60