Instruction Parsers Nathan Jay Paradyn Project Scalable Tools - PowerPoint PPT Presentation

Random and Exhaustive Testing of Instruction Parsers Nathan Jay Paradyn Project Scalable Tools Workshop Granlibakken, California August 2016

Motivation Lots of tools parse binaries GNU 2 Instruction Parser Testing

Motivation
Parsers rely on a disassembly step: converting object code into a higher-level language with semantic information.
Hex                Assembly
00: 55             push %rbp
01: 48 89 e5       mov %rsp, %rbp
04: 89 7d fc       mov %edi, -0x4(%rbp)
07: 8b 45 fc       mov -0x4(%rbp), %eax
0a: 83 c0 0a       add $0xa, %eax
0d: 0f af 45 fc    imul -0x4(%rbp), %eax
11: 5d             pop %rbp
12: c3             retq

Motivation
Converting object code to assembly is easy for a single format, like this from ARMv8:
[Figure: bit-field layout of the "Compare and branch (immediate)" format. Legend: Size field, Operation, Immediate, Source Register, Dest. Register, Condition, Fixed Value]
No single format is difficult to decode. Just extract the fields and translate binary to assembly for each field.
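The field extraction described on this slide can be sketched in a few lines. This is a minimal illustration, not the presenter's code: it assumes the A64 "compare and branch (immediate)" layout (sf in bit 31, fixed value 011010 in bits 30:25, op in bit 24, imm19 in bits 23:5, Rt in bits 4:0), and the function names and the output syntax are made up for this example.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def decode_compare_and_branch(word: int) -> str:
    """Decode one 32-bit 'compare and branch (immediate)' word (hypothetical helper)."""
    if (word >> 25) & 0x3F != 0b011010:      # fixed value, bits 30:25
        raise ValueError("not a compare-and-branch (immediate) encoding")
    sf = (word >> 31) & 1                    # size field: 1 = 64-bit register
    op = (word >> 24) & 1                    # operation: 0 = cbz, 1 = cbnz
    imm19 = (word >> 5) & 0x7FFFF            # immediate, in units of 4 bytes
    rt = word & 0x1F                         # source register number
    mnemonic = "cbnz" if op else "cbz"
    if rt == 31:                             # register 31 is the zero register here
        reg = "xzr" if sf else "wzr"
    else:
        reg = f"{'x' if sf else 'w'}{rt}"
    offset = sign_extend(imm19, 19) * 4      # byte offset relative to this instruction
    return f"{mnemonic} {reg}, #{offset:+d}"


print(decode_compare_and_branch(0xB4000040))
print(decode_compare_and_branch(0x35FFFFE1))
```

Each field is a shift and a mask, which is why a single format is easy; the difficulty the following slides describe comes from having to pick the right layout first.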

Motivation
Unfortunately, the format varies between instructions.
[Figure: bit-field layouts of three formats: Compare and branch (immediate), Test and branch (immediate), Conditional branch (immediate). Legend: Size field, Operation, Immediate, Source Register, Dest. Register, Condition, Fixed Value]

Motivation
And there are a lot of formats:
[Figure: bit-field layouts of many instruction formats. Legend: Size field, Operation, Immediate, Source Register, Dest. Register, Condition, Fixed Value]

Motivation
These formats only partially cover:
o load/store
o branching
The manual specifies more than 5 times as many different, general formats.
ARM can vary between implementations: Apple, Samsung, AMD, Nvidia, Broadcom, Applied Micro, Huawei, Cavium…

Motivation
x86 has other challenges with variable length instructions. This format works for some 1 or 2 byte opcodes:
  Prefixes        Seg, Rep, Lock, 66, 67, REX
  opcode          0F XX
  mod R/M         *
  SIB             *
  displacement    0, 1, 2 or 4 byte value
  immediate       0, 1, 2 or 4 byte value
There is another format for some 3 byte opcodes:
  Prefixes        Seg, Rep, Lock, 66, 67, REX
  opcode          0F * XX
  mod R/M         *
  SIB             *
  displacement    0, 1, 2 or 4 byte value
  imm             byte
This is less than a third of the byte-level maps, and there are bit-level maps as well.

Motivation
Moreover, instruction sets change over time:
x86 Extensions (timeline): NPX (x87) 1977, MMX 1997, SSE 1999, SSE2 2000, SSE3 2004, SSSE3 2006, SSE4 2007, AVX 2008, AVX2 2011, AVX512 2013, MPX 2013
o 1977 – 1996: Additions made in 80186, 80286, 80386, 80387. AMD releases first x86 processor, K-5.
o 1997 – 1999: Additions made in Pentium MMX, Pentium Pro, AMD MMX+ and Intel EMMX
o 1999: AMD adds 3DNow! and 2 separate additions to 3DNow!+
o 2005: Intel adds virtualization
o 2006: AMD adds virtualization
o 2007-2008: AMD adds SSE4a in Phenom. Intel adds SSE4.2 in Nehalem
o 2008-2010: Intel adds SHA. AMD deprecates 3DNow!
o 2013: Intel and AMD both support BMI1, disagree on what’s included. Intel supports BMI 2
o 2015: AMD supports BMI 2, Intel adds AES support

Goals
o Find disassembler errors
o Test enormous instruction space quickly
o Consolidate duplicate reports of an error
o Avoid instruction set specifics
o Work for multiple instruction sets
o Don’t rely on specific instruction set versions
o Work with any disassembler

Previous Work
Some past efforts:
o Comparison of disassembly and execution results, Ormandy 2008
  o Generate instructions randomly or by brute force
  o Disassemble instructions, execute instructions and compare results
o Generation of known valid or invalid x86 prefixes and opcodes, Seidel 2014
  o Start with empty string of bytes
  o Use lookup tables for next valid byte to build instruction, byte-by-byte
  o Arbitrary values can be appended after opcode
o N-version differential disassembly, Paleari et al. 2010

Previous Work – Paleari et al. 2010
Input:
o Randomized bytes (40,000 sequences used)
o CPU-tested instructions (20,000 sequences picked at random)
  o Enumerate all possible 1, 2 and 3 byte sequences
  o Execute each byte sequence with a few operands
  o Prepend a few prefixes to each sequence
Test:
o Compare 8 disassemblers’ outputs and execution results
o Remove disassembly output that conflicts with execution in:
  o Instruction length
  o Operand type
o Declare the most common output to be correct
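The last step, declaring the most common output correct, amounts to a majority vote. A minimal sketch of that idea follows; the function name and the disassembler labels are hypothetical, and this is not code from the cited work. Ties are not treated specially here: the output seen first among equally common ones wins.

```python
from collections import Counter


def majority_output(outputs: dict) -> tuple:
    """Return the most common disassembly and the names of disassemblers that disagree.

    `outputs` maps a disassembler name to its (already normalized) output string.
    """
    counts = Counter(outputs.values())
    winner, _votes = counts.most_common(1)[0]
    dissenters = sorted(name for name, text in outputs.items() if text != winner)
    return winner, dissenters


# Hypothetical outputs from three disassemblers for the same bytes.
sample = {
    "dis_a": "mov $0xdf, %ah",
    "dis_b": "mov $0xdf, %ah",
    "dis_c": "mov $0xdf, %dh",
}
print(majority_output(sample))
```

A vote like this cannot tell when the majority is wrong, which is one reason the workflow later in the deck adds reassembly and a separate analysis step.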

Previous Work - Limitations
o Naïve input generation
  o Randomly choosing instructions tests the whole space inefficiently
  o A brute force approach would require 2^120 instructions
o Required expert knowledge of x86
  o Semantic specification for decoding to compare to execution
  o List of all valid bytes, prefixes, knowledge of operand position
o Relied on details of the ISA
  o Opcode length and position
  o Byte boundaries
o No means to coalesce similar error reports

Approach
o Generate instructions more effectively
  o Avoid repetitions of similar instructions
  o Cover instruction space more thoroughly than purely random within a reasonable timeframe
  o Test all functional parts of instructions
o Avoid ISA dependencies and expert knowledge

Workflow
o Input Generation: Create object code to disassemble
o Differential Disassembly (Disassembler 1 … Disassembler n, Normalize 1 … Normalize n): Disassemble object code with each disassembler and normalize results to uniform representation
o Comparison & Filtering: Compare disassembled code and suppress duplicate differences
o Reassembly: Reassemble output, looking for differences with object code
o Analysis: Determine which disassembly is correct
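The differential disassembly and comparison stages can be sketched as below. This is an illustration under stated assumptions: the normalization rules (lowercasing, spacing, leading zeros in hex) are examples rather than the rules the tool applies, the duplicate filter is a crude one that only suppresses exact repeats, and every name is hypothetical. Each disassembler is modeled as a plain function from bytes to text.

```python
import re


def normalize(text: str) -> str:
    """Map one disassembler's output to a uniform representation (example rules only)."""
    text = text.strip().lower()
    text = re.sub(r"\s*,\s*", ", ", text)            # one space after each comma
    text = re.sub(r"\s+", " ", text)                 # collapse runs of whitespace
    text = re.sub(r"0x0+(?=[0-9a-f])", "0x", text)   # drop leading zeros in hex values
    return text


def compare(insn: bytes, disassemblers: dict, seen_signatures: set):
    """Return the normalized outputs if they disagree in a way not reported before."""
    results = {name: normalize(decode(insn)) for name, decode in disassemblers.items()}
    if len(set(results.values())) == 1:
        return None                                  # all disassemblers agree
    signature = tuple(sorted(results.items()))
    if signature in seen_signatures:
        return None                                  # duplicate difference, suppress it
    seen_signatures.add(signature)
    return results


# Stand-in disassemblers that ignore their input, for demonstration only.
stubs = {
    "dis_a": lambda insn: "MOV  $0x0DF,%AH",
    "dis_b": lambda insn: "mov $0xdf, %ah",
    "dis_c": lambda insn: "mov $0xdf, %dh",
}
print(compare(b"\xb4\xdf", stubs, set()))
```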

Workflow – Current State
o Input Generation: Generalized, works for x86 and ARMv8. PPC64 lacks some register info.
o Differential Disassembly (Disassembler 1 … Disassembler n, Normalize 1 … Normalize n): Differential disassembly tested on all “in-progress” decoders. Normalization ongoing in each.
o Comparison & Filtering: Generalized, works for x86, PPC64 and ARMv8. PPC64 lacks register info.
o Reassembly: Primitive support for x86 and ARMv8
o Analysis: Preliminary results on x86 and ARMv8 outputs

Input Generation – Observations
o Naïve brute force is too slow
  o x86 instructions are up to 15 bytes long
  o There are far fewer than 2^120 significantly different instructions
o Many instructions differ only slightly
  o Immediate values do not change meaning or decoding of instructions
  o Register names (usually) do not change meaning or decoding of instructions
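The 2^120 figure follows from the 15-byte maximum: 15 bytes × 8 bits = 120 bits. The short calculation below puts that in time terms. The rate of one billion decodes per second is an assumption picked for illustration, not a measurement from the talk.

```python
MAX_INSN_BYTES = 15                      # maximum x86 instruction length, from the slide
bits = MAX_INSN_BYTES * 8                # 120 bits
total = 2 ** bits                        # number of distinct 15-byte strings

RATE = 10 ** 9                           # assumed decodes per second (illustrative)
SECONDS_PER_YEAR = 365 * 24 * 3600
years = total / (RATE * SECONDS_PER_YEAR)

print(f"{bits} bits -> {total:.3e} byte strings")
print(f"about {years:.1e} years at {RATE:.0e} decodes per second")
```

Even at that optimistic rate the enumeration would take on the order of 10^19 years, which is why the approach looks for bits that are not worth varying.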

Input Generation – Observations
Disassemblers are likely to decode similar instructions all correctly or all incorrectly.
Binary Code            Decoded Instruction
1011 0100 1101 1111    mov $0xdf, %ah
1011 0100 0101 1111    mov $0x5f, %ah
1011 0110 1101 1111    mov $0xdf, %dh
1010 0100 1101 1111    movsbb (%rsi), (%rdi)
Not all bit flips are equally interesting, so can we find those that are most interesting?

Input Generation – Observations
Goal: Find and ignore bits that encode only register names or immediate values.
mov $0xdf, %ah: 1011 0100 1101 1111
We can identify 11 of 16 bits that will not be interesting to vary.

Input Generation
o Seed Work Queue: Add some random byte strings to the queue
o Queue Empty?: Check if there are more instructions to evaluate (if empty: Done!)
o Map Instruction (each decoder): Find interesting bits to vary for new instructions
o Generate Insns (each decoder): Flip interesting bits to create instructions
o Queue New Insns: Add new instructions to the queue
o Differential Disassembly
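The loop in this flowchart can be sketched as a work queue. This is a minimal illustration, not the tool's implementation: `map_instruction` is a hypothetical callback that returns the interesting bit positions for an instruction under one decoder, bit 0 is taken to be the most significant bit of the first byte, and the `limit` parameter is added here only to keep the sketch bounded.

```python
from collections import deque


def flip_bit(insn: bytes, bit: int) -> bytes:
    """Flip one bit; bit 0 is the most significant bit of the first byte."""
    out = bytearray(insn)
    out[bit // 8] ^= 0x80 >> (bit % 8)
    return bytes(out)


def generate_inputs(seeds, decoders, map_instruction, limit=10_000):
    """Grow a set of test instructions from seed byte strings."""
    queue = deque(seeds)                 # seed work queue
    seen = set(seeds)                    # avoid queuing the same bytes twice
    evaluated = []
    while queue and len(evaluated) < limit:          # queue empty? then done
        insn = queue.popleft()
        evaluated.append(insn)
        for decode in decoders:
            # map instruction: which bits are worth varying for this decoder
            for bit in map_instruction(insn, decode):
                new_insn = flip_bit(insn, bit)       # generate a new instruction
                if new_insn not in seen:
                    seen.add(new_insn)
                    queue.append(new_insn)           # queue new instructions
    return evaluated                     # handed on to differential disassembly
```

In the flowchart the seeds are random byte strings; here they are passed in so that a run is repeatable.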

Producing a Map of Interesting Instruction Bits
Map: *
Base Bits: 1011 0100 1101 1111    Base Insn: mov $0xdf, %ah
New Bits:  0011 0100 1101 1111    New Insn:  xor $0xdf, %al

Producing a Map of Interesting Instruction Bits
Map: **
Base Bits: 1011 0100 1101 1111    Base Insn: mov $0xdf, %ah
New Bits:  1111 0100 1101 1111    New Insn:  hlt

Producing a Map of Interesting Instruction Bits
Map: ***
Base Bits: 1011 0100 1101 1111    Base Insn: mov $0xdf, %ah
New Bits:  1001 0100 1101 1111    New Insn:  xchg %eax, %esp

Producing a Map of Interesting Instruction Bits
Map: ****
Base Bits: 1011 0100 1101 1111    Base Insn: mov $0xdf, %ah
New Bits:  1010 0100 1101 1111    New Insn:  movsbb (%rsi), (%rdi)

Producing a Map of Interesting Instruction Bits
Map: **** *
Base Bits: 1011 0100 1101 1111    Base Insn: mov $0xdf, %ah
New Bits:  1011 1100 1101 1111    New Insn:  mov $0x6d5f5…, %esp
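The bit-by-bit mapping shown on these slides can be sketched as follows. Two things here are assumptions made for illustration. First, `toy_decode` is a hypothetical stand-in that knows only the handful of opcodes appearing on the slides; a real run would call an actual disassembler. Second, the rule for "interesting" (the instruction length or the assembly text changes once register names and immediates are masked out) is one plausible reading of the slides, not the tool's confirmed criterion.

```python
import re

REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def toy_decode(insn: bytes):
    """Hypothetical decoder returning (length, assembly) for a few one-byte opcodes."""
    op = insn[0]
    if 0xB0 <= op <= 0xB7:                           # mov imm8, r8
        return 2, f"mov $0x{insn[1]:x}, %{REG8[op & 7]}"
    if 0xB8 <= op <= 0xBF:                           # mov imm32, r32
        # Missing immediate bytes are treated as zero in this toy.
        imm = int.from_bytes(insn[1:5].ljust(4, b"\0"), "little")
        return 5, f"mov $0x{imm:x}, %{REG32[op & 7]}"
    if op == 0x34:                                   # xor imm8, %al
        return 2, f"xor $0x{insn[1]:x}, %al"
    fixed = {0xF4: "hlt", 0x94: "xchg %eax, %esp", 0xA4: "movsbb (%rsi), (%rdi)"}
    return 1, fixed.get(op, "(unknown)")


def shape(decoded):
    """Mask out register names and immediate values, keeping length and structure."""
    length, asm = decoded
    asm = re.sub(r"%[a-z0-9]+", "%REG", asm)
    asm = re.sub(r"\$0x[0-9a-f]+", "$IMM", asm)
    return length, asm


def bit_map(insn: bytes, decode) -> str:
    """Mark each bit '*' if flipping it changes the instruction's shape, else '.'."""
    base = shape(decode(insn))
    marks = []
    for bit in range(len(insn) * 8):
        flipped = bytearray(insn)
        flipped[bit // 8] ^= 0x80 >> (bit % 8)       # bit 0 = MSB of the first byte
        marks.append("*" if shape(decode(bytes(flipped))) != base else ".")
    return "".join(marks)


print(bit_map(b"\xb4\xdf", toy_decode))
```

For the base instruction mov $0xdf, %ah this toy marks the first five bits and leaves the other eleven unmarked, in line with the "11 of 16 bits" observation earlier in the deck.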
