A Consistency Checker for Memory Subsystem Traces
Matthew Naylor, Simon Moore, Alan Mujumdar
Email: matthew.naylor@cl.cam.ac.uk

Problem
Verify that the memory subsystem in a shared-memory multiprocessor implements a well-defined consistency model. This is a prerequisite for the correct execution of concurrent programs on such architectures.

Our approach
Black-box specification-based testing:
1. Feed auto-generated requests to the memory subsystem.
2. Record a trace of all requests and responses.
3. Check that the trace satisfies the consistency model.

Attractions of black-box approach
Generic: can be applied to a wide range of implementations and coherence protocols.
Easy to apply: no modifications are required to the design under test.

Drawback of black-box approach
Checking traces is an NP-complete problem [Gibbons and Korach, 1994].
Corollary: larger traces involving more cores are more likely to contain bugs yet less likely to be checkable in reasonable time.

State of the art
TSOtool [Manovit, 2006] is a conformance checker for the TSO consistency model. It can handle large traces, on the order of millions of memory operations and hundreds of cores. Achieved through powerful inference rules and careful algorithm design.

BUT...
Many modern memory subsystem implementations are more relaxed than TSO.
And TSOtool is a “proprietary product of Sun Microsystems”.

Example: Limitations of TSO
Thread 0        Thread 1
*data := 1      *flag == 1
*flag := 1      *data == 0
Forbidden under TSO, but observable if:
■ L1 cache is non-blocking, e.g. Rocket Chip, where first load is a miss & second is a hit.
■ Or, coherence protocol is lazy, e.g. BERI, where second load is a stale hit.

Our main contributions
■ Generalisation of TSOtool’s algorithm to support a wider range of consistency models.
■ An open-source checker for memory subsystem traces called Axe.
■ Experiences of applying Axe to open-source SoCs BERI and Rocket Chip.

Part II: Axe Consistency Checker

What is Axe?
Input: a memory subsystem trace
0: M[0] := 1                     # Store (the leading number is the thread id)
0: sync                          # Barrier
0: { M[1] == 0; M[1] := 1 }      # Atomic RMW
1: M[1] == 1 @ 100 : 110         # Load with timestamps
1: M[0] == 0 @ 115 :             # Load
Output: Does the trace satisfy the SC, TSO, PSO, WMO or POW model? If not, emit the smallest subset of the trace that fails.
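The trace syntax above can be read mechanically. The following is a minimal sketch of a line parser, assuming only the forms visible in the example (store, load, `sync`, atomic RMW in braces, optional `@ begin : end` timestamps). The function names and the returned dictionary layout are hypothetical and are not taken from Axe itself.

```python
import re

# Patterns inferred from the example trace on the slide, not from Axe's source.
_TIME = r"(?:\s*@\s*(?P<begin>\d+)\s*:\s*(?P<end>\d*))?"
_LINE = re.compile(r"^(?P<tid>\d+):\s*(?P<body>.*?)" + _TIME + r"\s*$")
_OP = r"M\[(\d+)\]\s*(:=|==)\s*(\d+)"


def parse_op(text):
    """Parse 'M[a] := v' (store) or 'M[a] == v' (load)."""
    m = re.fullmatch(_OP, text.strip())
    if m is None:
        raise ValueError(f"unrecognised operation: {text!r}")
    addr, sym, val = m.groups()
    return ("store" if sym == ":=" else "load", int(addr), int(val))


def parse_line(line):
    """Parse one trace line into a dict (hypothetical layout)."""
    m = _LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    body = m.group("body").strip()
    if body == "sync":
        op = ("sync",)
    elif body.startswith("{") and body.endswith("}"):
        # Atomic read-modify-write: a load and a store separated by ';'.
        op = ("rmw",) + tuple(parse_op(p) for p in body[1:-1].split(";"))
    else:
        op = parse_op(body)
    begin = int(m.group("begin")) if m.group("begin") else None
    end = int(m.group("end")) if m.group("end") else None
    return {"thread": int(m.group("tid")), "op": op, "begin": begin, "end": end}
```

For instance, `parse_line("1: M[1] == 1 @ 100 : 110")` yields thread 1, a load of value 1 from address 1, and timestamps 100 and 110; a missing second timestamp is returned as `None`.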

SPARC models
[Figure: Thread 0 ... Thread n each feed a Reorder unit; a Non-deterministic Switch connects the Reorder units to Shared Memory]
■ SC prohibits reordering.
■ TSO can reorder S → L, simulating store buffer.
■ WMO can additionally reorder S → S, L → L, and L → S (provided addresses differ).
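The reorder-then-switch picture can be turned into a small exhaustive enumerator. The sketch below is an illustration only: it assumes every reordering requires differing addresses, ignores store forwarding, barriers and atomics, and initialises memory to 0. The names `may_swap`, `local_orders`, `interleavings` and `outcomes` are hypothetical.

```python
from itertools import permutations, product

# An operation is ("S", addr, value) for a store or ("L", addr, name) for a
# load whose result is recorded under `name`.


def may_swap(model, first, second):
    """May `second` overtake `first`, which precedes it in program order?"""
    if first[1] == second[1]:
        return False  # assumption: same-address operations never reorder
    if model == "SC":
        return False
    if model == "TSO":
        return first[0] == "S" and second[0] == "L"
    if model == "WMO":
        return True
    raise ValueError(model)


def local_orders(model, ops):
    """All per-thread orders the Reorder unit may emit."""
    n = len(ops)
    for perm in permutations(range(n)):
        pos = {idx: p for p, idx in enumerate(perm)}
        if all(pos[i] < pos[j] or may_swap(model, ops[i], ops[j])
               for i in range(n) for j in range(i + 1, n)):
            yield [ops[i] for i in perm]


def interleavings(seqs):
    """All merges of the sequences (the non-deterministic switch)."""
    if all(not s for s in seqs):
        yield []
        return
    for k, s in enumerate(seqs):
        if s:
            rest = seqs[:k] + [s[1:]] + seqs[k + 1:]
            for tail in interleavings(rest):
                yield [s[0]] + tail


def outcomes(model, threads):
    """Set of possible load results, each a sorted tuple of (name, value)."""
    results = set()
    for choice in product(*[list(local_orders(model, t)) for t in threads]):
        for run in interleavings(list(choice)):
            mem, seen = {}, {}
            for kind, addr, x in run:
                if kind == "S":
                    mem[addr] = x
                else:
                    seen[x] = mem.get(addr, 0)
            results.add(tuple(sorted(seen.items())))
    return results


# Message passing: writer sets data then flag, reader checks flag then data.
mp = [[("S", "data", 1), ("S", "flag", 1)],
      [("L", "flag", "r_flag"), ("L", "data", "r_data")]]
stale = (("r_data", 0), ("r_flag", 1))
print({m: stale in outcomes(m, mp) for m in ("SC", "TSO", "WMO")})
```

Under these assumptions the stale-data outcome is reachable only under WMO, because only WMO lets the two loads swap.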

Algorithm demo
[Figure: operations M[0] := 1, M[0] == 1, M[0] := 2, M[0] == 2, M[0] := 3 spread over Thread 0, Thread 1 and Thread 2; each step below is one frame of the same graph]
1. Add thread-local edges.
2. Add reads-from edges.
3. Delete a root, add reads-before edges.
4. Violation: cycle detected!
5. Backtrack, delete a root, add reads-before edges.
6. Delete a root.
7. Delete root, add reads-before edges.
8. Delete root.
9. Delete root. Empty graph -- trace is valid!
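The search behind the demo can be illustrated with a plain backtracking checker for SC. This sketch does not build the reads-from and reads-before edges of the slides; instead it simulates memory while picking the next operation from the head of some thread, which explores the same space of total orders. The assignment of the five operations to threads below is hypothetical, memory is assumed to start at 0, and `sc_witness` is an illustrative name.

```python
def sc_witness(threads):
    """Return one SC-consistent total order of the trace, or None.

    `threads` is a list of per-thread operation lists; an operation is
    ("S", addr, value) for a store or ("L", addr, value) for a load that
    observed `value`.
    """
    def search(heads, mem, order):
        if all(h == len(t) for h, t in zip(heads, threads)):
            return order  # every operation has been placed
        for tid, thread in enumerate(threads):
            h = heads[tid]
            if h == len(thread):
                continue
            kind, addr, val = thread[h]
            if kind == "L" and mem.get(addr, 0) != val:
                continue  # this load cannot come next; try another thread
            new_mem = {**mem, addr: val} if kind == "S" else mem
            new_heads = heads[:tid] + (h + 1,) + heads[tid + 1:]
            found = search(new_heads, new_mem, order + [(tid, kind, addr, val)])
            if found is not None:
                return found
            # otherwise fall through: backtrack and try the next thread
        return None

    return search((0,) * len(threads), {}, [])


# Hypothetical placement of the demo's operations on three threads.
demo = [[("S", 0, 1)],
        [("L", 0, 1), ("L", 0, 2)],
        [("S", 0, 2), ("S", 0, 3)]]
print(sc_witness(demo))
```

A trace such as thread 0 storing 1 then 2 while thread 1 loads 2 then 1 makes `sc_witness` return `None`, since no interleaving explains the second load.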

Lesson
■ Easy to encounter backtracking behaviour during topological sort.
■ Routine backtracking is catastrophic for checking even small traces.
■ In response, TSOtool uses inference rules.

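TSOtool’s inference rules
Rule 1: if store M[x] := v precedes store M[x] := w, then any load M[x] == v (reading from the first store) must precede M[x] := w.
Rule 2: if store M[x] := v precedes a load M[x] == w that reads from a different store M[x] := w, then M[x] := v must precede M[x] := w.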
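The two rules can be applied repeatedly until no new edge appears. The sketch below is one way to do that over an explicit edge set. It assumes each (address, value) pair is written by at most one store, skips loads of the initial value, and uses a naive reachability search, so it is far from the efficient representation a real checker needs. The names `reachable` and `infer` are hypothetical.

```python
def reachable(edges, src, dst):
    """True if dst can be reached from src along one or more edges."""
    stack, seen = [src], set()
    while stack:
        n = stack.pop()
        for a, b in edges:
            if a == n and b not in seen:
                if b == dst:
                    return True
                seen.add(b)
                stack.append(b)
    return False


def infer(ops, edges):
    """Close `edges` under Rule 1 and Rule 2.

    ops:   dict mapping an id to ("S" or "L", addr, value)
    edges: set of (id, id) pairs: thread-local and reads-from edges
    """
    edges = set(edges)
    store_of = {(a, v): i for i, (k, a, v) in ops.items() if k == "S"}
    changed = True
    while changed:
        changed = False
        for l, (k, addr, val) in ops.items():
            if k != "L" or (addr, val) not in store_of:
                continue
            s = store_of[(addr, val)]  # the store this load reads from
            for s2, (k2, addr2, _) in ops.items():
                if k2 != "S" or addr2 != addr or s2 == s:
                    continue
                new = set()
                if reachable(edges, s, s2):
                    new.add((l, s2))   # Rule 1: load precedes the later store
                if reachable(edges, s2, l):
                    new.add((s2, s))   # Rule 2: other store precedes the source
                if not new <= edges:
                    edges |= new
                    changed = True
    return edges


ops = {"s1": ("S", 0, 1), "s2": ("S", 0, 2), "l1": ("L", 0, 1), "l2": ("L", 0, 2)}
start = {("s1", "l2"),   # thread-local: s1 then l2 on the same thread
         ("s2", "l2"),   # reads-from
         ("s1", "l1")}   # reads-from
print(sorted(infer(ops, start) - start))
```

Here Rule 2 first orders `s1` before `s2`, which then lets Rule 1 order `l1` before `s2`.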

Apply Rule 2
[Figure: the same three-thread trace as in the algorithm demo, with Rule 2 applied]
Picking a root is now deterministic.

Efficient graph representation
■ During checking, adding an edge to the graph is a very common operation.
■ Problem: need to quickly determine whether any added edge introduces a cycle.
■ Sounds like maintenance of an O(N^3) transitive closure, disastrous for large N.

SC graph representation
■ Under SC, operations on the same thread are totally ordered.
■ For each node, we need only maintain the nearest successor on each thread.
■ Complexity: O(N*T)
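A minimal sketch of the nearest-successor idea for SC follows. Each node is a hypothetical `(thread, index)` pair and stores, per thread, the smallest index it can reach there; a path query is then a single comparison. The class and method names are illustrative, and the update loop scans all nodes for simplicity rather than exploiting the per-thread ordering.

```python
import math


class SCGraph:
    """Per-node 'nearest successor on each thread' table (N x T entries)."""

    def __init__(self, thread_lengths):
        self.T = len(thread_lengths)
        self.succ = {}
        for t, n in enumerate(thread_lengths):
            for i in range(n):
                row = [math.inf] * self.T
                # Program order: node i reaches itself and every later node
                # on its own thread, so its nearest successor there is i.
                row[t] = i
                self.succ[(t, i)] = row

    def has_path(self, a, b):
        """True if b is reachable from a (a node reaches itself)."""
        return self.succ[a][b[0]] <= b[1]

    def add_edge(self, a, b):
        """Add a -> b; return False (graph unchanged) if it closes a cycle."""
        if self.has_path(b, a):
            return False
        target = list(self.succ[b])
        for row in self.succ.values():
            if row[a[0]] <= a[1]:  # this node reaches a, hence now reaches b
                for u in range(self.T):
                    row[u] = min(row[u], target[u])
        return True


g = SCGraph([2, 2])                      # two threads, two operations each
print(g.add_edge((0, 1), (1, 0)))        # edge from thread 0 to thread 1
print(g.has_path((0, 0), (1, 1)))        # follows from program order
print(g.add_edge((1, 1), (0, 0)))        # would close a cycle
```

For TSO or WMO the same table would presumably be indexed by (thread, load-or-store) or (thread, load-or-store, address) respectively, matching the larger bounds on the following slides.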

TSO graph representation
■ Under TSO, loads on the same thread are totally ordered. Likewise for stores.
■ For each node, we need only maintain the nearest load & store successor on each thread.
■ Complexity: O(2*N*T)

WMO graph representation
■ Under WMO, loads from same address on same thread are totally ordered. Likewise for stores.
■ For each node, maintain the nearest load & store successor on each thread for each address.
■ Complexity: O(2*N*T*A)
■ Still much better than O(N^3): T and A are small.

Axe performance evaluation (WMO)
Averaged over a range of traces (576 in total): checking time grows linearly with trace size.

Trace shrinking
Problem: It’s hard to determine why a large trace is invalid just by staring at it.
Solution: A trace shrinking procedure. Given a trace that violates a model, it searches for the smallest subset of the trace that still violates the model.
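One simple way to shrink a failing trace is greedy deletion: drop one operation at a time and keep the deletion whenever the remainder still fails. The sketch below assumes a caller-supplied predicate `violates` (hypothetical) and yields a subset from which no single operation can be removed, which is not necessarily the globally smallest one. It is not a description of Axe's actual procedure.

```python
def shrink(trace, violates):
    """Greedily remove operations while the trace still violates the model."""
    if not violates(trace):
        raise ValueError("trace does not violate the model")
    current = list(trace)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if violates(candidate):
                current = candidate   # deletion kept; rescan from the start
                changed = True
                break
    return current


# Toy stand-in for a checker: the "violation" needs elements 3 and 7.
def toy_violates(t):
    return 3 in t and 7 in t


print(shrink(range(10), toy_violates))
```

With a real checker as the predicate, each call costs a full check, so the number of checks grows roughly quadratically with trace length in this naive form.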

Part III: Applications

Trace generation
[Figure: several BERI or Rocket cores, each with I$ and D$, together with L2 and Bus]
We replaced the core with a random traffic generator that logs all requests & responses, yielding a random trace.

Rocket Chip coherence bug
260-element counterexample, after shrinking:
0: M[2] := 46 @ 497:         # Only write of 46 in trace
1: M[2] == 46 @ 280:513
1: M[2] := 61 @ 729:         # Write of 61 dropped
1: M[2] == 46 @ 854:979
Identified as “race condition” by Rocket Chip devs.

Rocket Chip atomics bug
After shrinking:
1: M[3] := 31
0: { M[3] == 31; M[3] := 178 }
0: { M[3] == 178; M[3] := 198 }
1: { M[3] == 178; M[3] := 59 }    # Not atomic
Bug occurs when a store-conditional is issued before a load-reserve response is received.

BERI barrier bug
After shrinking:
1: M[39028] := 76
1: M[39028] := 79     # Set data
1: sync
1: M[2761] := 83      # Set flag
0: M[2761] == 83      # See flag
0: sync
0: M[39028] == 76     # See stale data
This bug is only observable after generating cancelled loads and stores in the traffic generator.

Summary & conclusions
■ We have generalised a state-of-the-art checker to a wider range of consistency models through our open-source tool Axe.
■ This enabled us to test BERI & Rocket Chip, uncovering several serious bugs, concisely reported using our trace shrinking procedure.
■ Time complexity is now dependent on the number of distinct addresses in the trace, but the checker still performs well.
