An integrated concurrency and core- ISA architectural envelope - PowerPoint PPT Presentation

An integrated concurrency and core-ISA architectural envelope definition, and test oracle, for IBM POWER multiprocessors
Kathryn E. Gray¹, Gabriel Kerneis¹⁺, Dominic Mulligan¹, Christopher Pulte¹, Susmit Sarkar², Peter Sewell¹
¹ University of Cambridge   ¹⁺ University of Cambridge during the work   ² University of St Andrews

What is an architecture spec?
Typically prose, sometimes pseudocode. For example, from the Power ISA Version 2.06:

Branch I-form
  b    target_addr  (AA=0 LK=0)
  ba   target_addr  (AA=1 LK=0)
  bl   target_addr  (AA=0 LK=1)
  bla  target_addr  (AA=1 LK=1)
  Encoding: 18 (bits 0-5) | LI (6-29) | AA (30) | LK (31)

  if AA then NIA ←iea EXTS(LI || 0b00)
  else NIA ←iea CIA + EXTS(LI || 0b00)
  if LK then LR ←iea CIA + 4

  target_addr specifies the branch target address.
  If AA=0 then the branch target address is the sum of LI || 0b00 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.
  If AA=1 then the branch target address is the value LI || 0b00 sign-extended, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.
  If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.
  Special Registers Altered: LR (if LK=1)

Branch Conditional B-form
  bc    BO,BI,target_addr  (AA=0 LK=0)
  bca   BO,BI,target_addr  (AA=1 LK=0)
  bcl   BO,BI,target_addr  (AA=0 LK=1)
  bcla  BO,BI,target_addr  (AA=1 LK=1)
  Encoding: 16 (bits 0-5) | BO (6-10) | BI (11-15) | BD (16-29) | AA (30) | LK (31)

  if (64-bit mode)
    then M ← 0
    else M ← 32
  if ¬BO_2 then CTR ← CTR - 1
  ctr_ok ← BO_2 | ((CTR_{M:63} ≠ 0) ⊕ BO_3)
  cond_ok ← BO_0 | (CR_{BI+32} ≡ BO_1)
  if ctr_ok & cond_ok then
    if AA then NIA ←iea EXTS(BD || 0b00)
    else NIA ←iea CIA + EXTS(BD || 0b00)
  if LK then LR ←iea CIA + 4

  BI+32 specifies the Condition Register bit to be tested. The BO field is used to resolve the branch as described in Figure 42. target_addr specifies the branch target address.
  If AA=0 then the branch target address is the sum of BD || 0b00 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.
  If AA=1 then the branch target address is the value BD || 0b00 sign-extended, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.
  If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.
  Special Registers Altered: CTR (if BO_2=0), LR (if LK=1)

  Extended Mnemonics: examples of extended mnemonics for Branch Conditional:
    Extended          Equivalent to
    blt target        bc 12,0,target
    bne cr2,target    bc 4,10,target
    bdnz target       bc 16,0,target
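To show how directly the Branch Conditional pseudocode can be made executable, here is a minimal Python sketch that follows it line by line. It is not the authors' Sail model: the dict-based register state, the function names and the parameters are hypothetical, and the CR is assumed to be held as a 32-bit integer, so CR_{BI+32} (a 64-bit register bit index) is bit BI of that value, counting from the most significant bit.

```python
MASK64 = (1 << 64) - 1


def bit(value, i, width):
    """Bit i of a width-bit field, numbered as in the Power ISA (bit 0 = most significant)."""
    return (value >> (width - 1 - i)) & 1


def exts(value, width):
    """EXTS: sign-extend a width-bit value to 64 bits (kept as an unsigned int)."""
    if value & (1 << (width - 1)):
        value -= 1 << width
    return value & MASK64


def iea(addr, mode64):
    """Effective-address assignment: wrap to 64 bits; in 32-bit mode clear the high 32 bits."""
    addr &= MASK64
    return addr if mode64 else addr & 0xFFFF_FFFF


def bc(state, bo, bi, bd, aa=0, lk=0, mode64=True):
    """Branch Conditional (bc/bca/bcl/bcla) on a hypothetical dict with CIA, CTR, CR, LR."""
    cia = state["CIA"]
    m = 0 if mode64 else 32
    if not bit(bo, 2, 5):
        state["CTR"] = (state["CTR"] - 1) & MASK64
    ctr_m63 = state["CTR"] & ((1 << (64 - m)) - 1)        # CTR_{M:63}
    ctr_ok = bit(bo, 2, 5) | (int(ctr_m63 != 0) ^ bit(bo, 3, 5))
    cond_ok = bit(bo, 0, 5) | int(bit(state["CR"], bi, 32) == bit(bo, 1, 5))
    offset = exts(bd << 2, 16)                             # EXTS(BD || 0b00)
    if ctr_ok and cond_ok:
        nia = iea(offset if aa else cia + offset, mode64)
    else:
        nia = iea(cia + 4, mode64)   # not taken: next sequential instruction
    if lk:
        state["LR"] = iea(cia + 4, mode64)
    state["CIA"] = nia
    return state
```

For example, `blt target` is `bc 12,0,target` (BO=0b01100): BO_2=1 leaves CTR alone, and BO_0=0 with BO_1=1 branches only when CR bit 0 (CR0 LT) is set. `bdnz target` is `bc 16,0,target`: BO_0=1 ignores the CR, while BO_2=0 and BO_3=0 decrement CTR and branch while it is non-zero.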

But …
• Not executable test oracles
  • You can’t test h/w or s/w against prose
• Not a clear guide to concurrent behaviour
  • Especially for weakly consistent IBM Power and ARM
• A mass of instruction set detail

Specification as Artefact
We (show how to) make architecture specs that are real technical artefacts:
• Executable as test oracle
• Mathematically precise
• Related to vendor pseudocode and intuition
• Clarifying the interface between ISA and concurrency
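To make "executable as test oracle" concrete, here is a minimal sketch, assuming sequential consistency (SC) as a stand-in for the real POWER concurrency model: enumerate every outcome the model allows for a litmus test, then flag any observed outcome outside that set. The message-passing (MP) test and the names `interleavings`, `sc_outcomes` and `oracle` are illustrative assumptions, not part of the authors' tool.

```python
# Message-passing (MP) litmus test; x and y both start at 0.
# Thread 0: store x = 1; store y = 1.   Thread 1: r1 = load y; r2 = load x.
T0 = [("W", "x", 1), ("W", "y", 1)]
T1 = [("R", "y", "r1"), ("R", "x", "r2")]


def interleavings(a, b):
    """Yield every merge of two instruction lists that preserves each thread's program order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def sc_outcomes(t0, t1):
    """Final (r1, r2) values allowed under sequential consistency."""
    outcomes = set()
    for trace in interleavings(t0, t1):
        mem, regs = {"x": 0, "y": 0}, {}
        for kind, loc, arg in trace:
            if kind == "W":
                mem[loc] = arg
            else:
                regs[arg] = mem[loc]
        outcomes.add((regs["r1"], regs["r2"]))
    return outcomes


def oracle(allowed, observed):
    """Return the observed outcomes the model forbids (an empty set means consistent)."""
    return set(observed) - set(allowed)


allowed = sc_outcomes(T0, T1)
print(sorted(allowed))                      # [(0, 0), (0, 1), (1, 1)]
print(oracle(allowed, [(1, 1), (1, 0)]))    # {(1, 0)}
```

POWER architecturally allows the MP outcome r1=1, r2=0 in the absence of barriers or dependencies, so this SC-based oracle would wrongly reject correct POWER hardware. The allowed set has to come from the actual architectural concurrency model.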

Specifically: IBM POWER, covering all of the non-FP, non-vector "user" ISA (153 instructions) and the concurrency model.
Applicable to ARM as well: see "Modelling the ARMv8 Architecture, Operationally: Concurrency and ISA", POPL16.

Not just an emulator
Emulator                                       | PPCMEM2
Written in C etc.                              | Written in Lem & Sail
A language with many faults                    | Languages for logic, mathematics, and ISAs
Intermingling of emulation detail & semantics  | Only spec detail; emulation separated

Not just an emulator: running concurrent code
Consider a lock: T1 Lock → T1 Set critical section → T1 Unlock → T2 Lock → … → repeat
[Figure: PPCMEM2 execution view of test SPINLOCK_UNROLL. Two threads each take an ARMv8 spinlock (LDAXR/STXR), write and read back crit, branch to error on a mismatch, and release the lock with STLRH; the tool lists the numbered transitions (memory read requests, flow events, reorder events) available at each point.]
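The point of this slide is that the model explores all behaviours of a concurrent test, not the single run an emulator would produce. Here is a minimal sketch of that idea in miniature, assuming sequential consistency and an abstract atomic lock (far simpler than the ARM and POWER models the tool implements). The names `thread` and `explore` are hypothetical. The two threads loosely mirror SPINLOCK_UNROLL: one writes 0 to crit and checks that it reads back 0, the other writes 1 and checks for 1.

```python
def thread(value, use_lock=True):
    """Hypothetical thread: (lock), store value to crit, load crit, check it, (unlock)."""
    body = [("store", value), ("load",), ("check", value)]
    if use_lock:
        return [("acquire",)] + body + [("release",)]
    return body


def explore(programs):
    """Collect per-thread error flags over every SC interleaving (exhaustive DFS)."""
    outcomes = set()

    def run(pcs, lock, crit, regs, errors):
        moved = False
        for t, prog in enumerate(programs):
            if pcs[t] == len(prog):
                continue                      # thread t has finished
            op = prog[pcs[t]]
            if op[0] == "acquire" and lock:
                continue                      # lock held: t waits (spinning elided)
            new_lock, new_crit = lock, crit
            new_regs, new_errors = list(regs), list(errors)
            if op[0] == "acquire":
                new_lock = True               # atomic test-and-set succeeds
            elif op[0] == "release":
                new_lock = False
            elif op[0] == "store":
                new_crit = op[1]
            elif op[0] == "load":
                new_regs[t] = crit
            elif op[0] == "check":
                new_errors[t] = new_regs[t] != op[1]
            new_pcs = list(pcs)
            new_pcs[t] += 1
            moved = True
            run(tuple(new_pcs), new_lock, new_crit, tuple(new_regs), tuple(new_errors))
        if not moved:
            outcomes.add(errors)              # no thread can step: record the final flags

    n = len(programs)
    run((0,) * n, False, 0, (None,) * n, (False,) * n)
    return outcomes


print(explore([thread(0), thread(1)]))                                 # {(False, False)}
print(sorted(explore([thread(0, use_lock=False), thread(1, use_lock=False)])))
# [(False, False), (False, True), (True, False)]
```

With the lock, every interleaving is error-free. Without it, exhaustive search finds the interleavings in which one thread reads the other's value, which a single emulator run could easily miss.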

Beneficiaries
• Compiler writers
• Concurrency primitive implementors
• Security developers
• Hardware developers

[Figure: tool architecture]
ISA model: Power 2.06B Framemaker → Framemaker export → Power 2.06B XML → parse, analyse, patch → Power 2.06B Sail → Sail typecheck → Power 2.06B semantics (Lem, Sail AST) → Sail interpreter (Lem)
Litmus frontend: test.litmus → Litmus parser (OCaml)
Binary frontend: a.out → ELF model (Lem)
Concurrency model: Storage semantics (Lem), Thread semantics (Lem), System semantics (Lem)
Harness: executions; Text UI; Web UI (OCaml, CSS, JS)

Sample Instruction

Vendor pseudocode and prose:
  Store Word with Update D-form
  stwu RS,D(RA)
  Encoding: 37 (bits 0-5) | RS (6-10) | RA (11-15) | D (16-31)

  EA ← (RA) + EXTS(D)
  MEM(EA, 4) ← (RS)_{32:63}
  RA ← EA

  Let the effective address (EA) be the sum (RA)+ D. (RS)_{32:63} are stored into the word in storage addressed by EA.
  EA is placed into register RA.
  If RA=0, the instruction form is invalid.
  Special Registers Altered: None

Sail:
  union ast member (bit[5], bit[5], bit[16]) Stwu

  function clause decode (0b100101 :
                          (bit[5]) RS :
                          (bit[5]) RA :
                          (bit[16]) D as instr) =
    Stwu (RS,RA,D)

  function clause execute (Stwu (RS, RA, D)) = {
    (bit[64]) EA := 0;
    EA := GPR[RA] + EXTS(D);
    GPR[RA] := EA;
    MEMw(EA,4) := (GPR[RS])[32 .. 63]
  }
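For comparison, here is a minimal Python sketch of the same stwu behaviour, following the vendor pseudocode line by line. It is only an illustration, not the Sail definition: GPRs are assumed to be a list of 32 64-bit integers, storage a dict from byte address to byte, and memory is assumed big-endian; `exts16` and `stwu` are hypothetical names.

```python
MASK64 = (1 << 64) - 1


def exts16(d):
    """EXTS(D): sign-extend the 16-bit D field to 64 bits (unsigned representation)."""
    return (d - (1 << 16) if d & 0x8000 else d) & MASK64


def stwu(gpr, mem, rs, ra, d):
    """stwu RS,D(RA) on a list of 32 GPRs and a dict mapping byte address -> byte."""
    if ra == 0:
        raise ValueError("invalid instruction form: stwu with RA=0")
    ea = (gpr[ra] + exts16(d)) & MASK64           # EA <- (RA) + EXTS(D)
    word = gpr[rs] & 0xFFFF_FFFF                  # (RS)_{32:63}: the low-order word
    for i, byte in enumerate(word.to_bytes(4, "big")):
        mem[(ea + i) & MASK64] = byte             # MEM(EA, 4), big-endian assumed
    gpr[ra] = ea                                  # RA <- EA
```

A typical use is pushing a stack frame: with `gpr[1] = 0x1000` and `D = 0xFFF0` (that is, -16), the call stores the low word of RS at 0xFF0 and leaves 0xFF0 in r1.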
