GPU-Disasm: A GPU-based x86 Disassembler
ISC 2015
Evangelos Ladakis, Giorgos Vasiliadis, Michalis Polychronakis, Sotiris Ioannidis, George Portokalidis

First Impressions Evangelos Ladakis - FORTH 2

Outline
• Background
• Architecture
• Optimization
• Evaluation
• Conclusion

Disassembly
Software Reverse Engineering
• Mandatory when source code is not available
  o Bad guys
    • Find vulnerabilities
    • Bypass protection mechanisms
  o Good guys
    • Find malicious code
    • Debugging and patching
    • Apply protection mechanisms
• Techniques
  o Linear
  o Recursive
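To make the two techniques concrete, here is a minimal sketch that contrasts a linear sweep with a recursive traversal. It is not the authors' code. The length table covers only three real one-byte x86 opcodes (nop, ret, jmp rel8) and the names `LENGTHS`, `linear`, and `recursive` are hypothetical, chosen for illustration.

```python
# Toy subset of real x86 opcodes: nop (0x90), ret (0xC3), jmp rel8 (0xEB).
# A real decoder also handles prefixes, ModRM, SIB, and much more.
LENGTHS = {0x90: 1, 0xC3: 1, 0xEB: 2}

def linear(code):
    """Linear sweep: decode back to back, stop at the first unknown byte."""
    starts, off = [], 0
    while off < len(code):
        n = LENGTHS.get(code[off])
        if n is None or off + n > len(code):
            break
        starts.append(off)
        off += n
    return starts

def recursive(code, entry=0):
    """Recursive traversal: follow control flow from the entry point."""
    seen, work = set(), [entry]
    while work:
        off = work.pop()
        while 0 <= off < len(code) and off not in seen:
            op = code[off]
            n = LENGTHS.get(op)
            if n is None or off + n > len(code):
                break
            seen.add(off)
            if op == 0xC3:  # ret ends this path
                break
            if op == 0xEB:  # unconditional jump: continue at the target only
                rel = int.from_bytes(code[off + 1:off + 2], "little", signed=True)
                work.append(off + 2 + rel)
                break
            off += n
    return sorted(seen)

# jmp +1 skips a data byte (0xFF) placed between instructions.
code = bytes([0xEB, 0x01, 0xFF, 0x90, 0xC3])
linear_starts = linear(code)
recursive_starts = recursive(code)
print(linear_starts, recursive_starts)
```

In this example the linear sweep stops at the embedded data byte, while the recursive traversal jumps over it and reaches the nop and ret that follow.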

Binary Stores
• Large number of binaries
• 1.6 million on Google Play
• 1.5 million on the App Store
• Updated occasionally
From a security aspect:
• Analysis time and cost are essential

Motivation
• How can we build a fast and cheap disassembler for large-scale analysis?
• Can we use GPUs to accelerate the decoding process?
• Why GPUs?

General-Purpose Programming on GPUs (GPGPU)
• Powerful co-processors for general-purpose programming
• Commodity hardware, relatively cheap
• Compute capabilities increasing
• Familiar APIs: CUDA and OpenCL

GPU memory model (figure)

x86 ISA
• CISC architecture
• Instructions are 1 to 15 bytes long
Why x86?
• Widely used
• More challenges to address
• Applying the approach to RISC is easier

GPU-Disasm Architecture
GPU-based disassembler for the x86 architecture
Two modes:
• Linear disassembly
  o Each thread is assigned a binary
• Exhaustive disassembly
  o Each thread decodes one instruction of the same binary, but from a different offset
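The difference between the two modes can be sketched on the CPU as follows. This is an illustrative sketch under stated assumptions, not the GPU-Disasm implementation: the length table covers only a few real one-byte x86 opcodes in 32-bit mode, and `LENGTHS`, `insn_length`, `linear`, and `exhaustive` are hypothetical names.

```python
# Toy length table for a few real x86 opcodes (32-bit mode).
LENGTHS = {0x90: 1, 0xC3: 1, 0xCC: 1, 0xEB: 2, 0xE9: 5}
LENGTHS.update({op: 5 for op in range(0xB8, 0xC0)})  # mov r32, imm32
LENGTHS.update({op: 1 for op in range(0x50, 0x58)})  # push r32

def insn_length(code, offset):
    """Length of the instruction at offset, or None if unknown or truncated."""
    n = LENGTHS.get(code[offset])
    if n is None or offset + n > len(code):
        return None
    return n

def linear(code):
    """One worker walks one binary, instruction after instruction."""
    starts, off = [], 0
    while off < len(code):
        n = insn_length(code, off)
        if n is None:
            break
        starts.append(off)
        off += n
    return starts

def exhaustive(code):
    """Decode one instruction at every byte offset.
    Each offset is independent, so each could go to its own GPU thread."""
    return {off: insn_length(code, off) for off in range(len(code))}

code = bytes([0xB8, 0x90, 0x90, 0x90, 0x90, 0xC3])  # mov eax, 0x90909090; ret
linear_starts = linear(code)
exhaustive_map = exhaustive(code)
print(linear_starts)
print(exhaustive_map)
```

Linear mode sees two instructions here, while exhaustive mode also reports the nop bytes hidden inside the immediate operand of the mov.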

Challenges
• Arbitrary accesses to global memory
  o Inherent to x86 (variable-length instructions)
• Load balancing and correctness
  o Utilize threads fairly with same-size buffers
  o Resume disassembling where we left off
• Large number of static and constant values
  o Fast memory interfaces are small in capacity
  o Store the most frequently used values there

GPU-Disasm Architecture
GPU-Disasm Components:
How to achieve high performance:
• Optimize transfers
• Optimize the disassembly process
• Pipeline the operations
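The pipelining idea can be illustrated with a small host-side sketch in which each stage runs in its own thread and hands results to the next stage through a queue. The three stages used here (copy in, decode, copy out) are assumptions for illustration, as are the names `stage` and `run_pipeline`; the slides do not list the actual stages. Python threads only show the structure, not real transfer and kernel overlap.

```python
import queue
import threading

def stage(fn, inbox, outbox):
    """Apply fn to each item from inbox; None marks the end of the stream."""
    while True:
        item = inbox.get()
        if item is None:
            outbox.put(None)
            return
        outbox.put(fn(item))

def run_pipeline(items, fns):
    """Run fns as consecutive pipeline stages, one thread per stage.
    Items must not be None and stages must not return None."""
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(fns)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(None)
    results = []
    while True:
        out = queues[-1].get()
        if out is None:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results

# Hypothetical stand-ins for the real stages.
copy_in = bytes                           # pretend host-to-device copy
decode = lambda buf: buf.count(0x90)      # pretend kernel: count nop bytes
copy_out = lambda n: n                    # pretend device-to-host copy

pipeline_out = run_pipeline([b"\x90\x90\xc3", b"\xc3"], [copy_in, decode, copy_out])
print(pipeline_out)
```

While one batch is being decoded, the next one can already be copied in, so the slowest stage sets the overall throughput.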

PCIe Throughput
• PCIe 3.0 throughput evaluation

PCIe Throughput
• Maximum throughput is reached at 16 MB of data

Optimize Transfers
1. Pre-allocate page-locked I/O buffers on the host (cudaMallocHost)
2. Pack I/O into single buffers
   o Larger than 16 MB, to reach maximum PCIe throughput
3. Minimize the number of PCIe transfer API calls
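Steps 2 and 3 amount to batching: many small inputs are packed into one large buffer so that a single transfer call moves all of them. A minimal host-side sketch, assuming a hypothetical helper `make_batches` and an (offset, length) table that tells the decoder where each binary starts:

```python
def make_batches(binaries, min_bytes=16 * 1024 * 1024):
    """Pack inputs into buffers of at least min_bytes (the last may be smaller).
    Yields (buffer, table), where table holds one (offset, length) per input."""
    buf, table = bytearray(), []
    for blob in binaries:
        table.append((len(buf), len(blob)))
        buf += blob
        if len(buf) >= min_bytes:
            yield bytes(buf), table
            buf, table = bytearray(), []
    if table:
        yield bytes(buf), table

# Tiny threshold so the behaviour is visible; the slides use 16 MB.
batches = list(make_batches([b"abc", b"defgh", b"ij"], min_bytes=8))
for data, table in batches:
    print(len(data), table)
```

In a CUDA program each yielded buffer would live in page-locked host memory and be sent with one copy call, instead of one call per binary.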

Optimize Disassembly
• Store lookup tables in constant and shared memory
• Pre-fetch input data into registers
• Improve cache hits in L2
  o Divide the input into small buffers
  o Move threads as groups inside memory

Correctness
• We keep a copy of the old decoded bytes and the upcoming bytes
• So that we can continue decoding where we left off
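One way to read this is as a carry-over scheme: when the input is cut into fixed-size buffers, an instruction may straddle a buffer boundary, so the undecoded tail is kept and prepended to the next buffer. The sketch below shows that idea only; it is an assumption about the mechanism, and `LENGTHS` and `decode_chunked` are hypothetical names over a toy three-opcode table.

```python
# Toy subset of real x86 opcodes: nop, ret, mov eax, imm32.
LENGTHS = {0x90: 1, 0xC3: 1, 0xB8: 5}

def decode_chunked(code, chunk_size):
    """Decode fixed-size chunks, carrying a split instruction into the next one.
    Returns the absolute start offset of every decoded instruction."""
    starts = []
    carry = b""  # undecoded tail of the previous chunk
    base = 0     # absolute offset of the first byte in the current window
    for pos in range(0, len(code), chunk_size):
        window = carry + code[pos:pos + chunk_size]
        off = 0
        while off < len(window):
            n = LENGTHS.get(window[off])
            if n is None:
                return starts  # unknown opcode: stop decoding
            if off + n > len(window):
                break          # instruction continues in the next chunk
            starts.append(base + off)
            off += n
        carry = window[off:]
        base += off
    return starts

code = bytes([0x90, 0xB8, 0x01, 0x02, 0x03, 0x04, 0xC3])  # nop; mov eax, imm32; ret
chunked = decode_chunked(code, 4)
whole = decode_chunked(code, len(code))
print(chunked, whole)
```

The mov here is split across the first and second chunk, yet the chunked run reports the same instruction starts as decoding the whole buffer at once.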

Evaluation
• Implementation in CUDA
• System:
  o GPU: NVIDIA GTX 770, $396
  o CPU: Intel i7, $305
  o Total cost: $1120
• Dataset: binaries from /usr of Ubuntu 12.04
• Performance measured in lines/sec

Disassemblers Evaluation
• Single-threaded, disk I/O discarded
• Performance divergence due to output construction

GPU-Disasm on crafted binaries

Buffer Size (Bytes)   Average Hit Rate % (L1 to L2)
16                    58.7
32                    53.65
64                    45.26

• Decoding 2-byte instructions
• Impact of the L2 optimization
  o 25.85% higher performance

GPU-Disasm on Binaries
Comparing only the disassembly process
• Linear disassembly is 2 times faster
• Exhaustive disassembly is 4.4 times faster on average

Pipeline Components
• Beyond a batch size of 1024, disassembly becomes the bottleneck

Hybrid (CPU & GPU)
• Hybrid uses 7 CPU threads plus the GPU
  o 1 thread is needed as the GPU controller

Power evaluation
• Metrics include CPU, RAM, and peripherals power consumption
  o Measured internally with sensors

Conclusion
• Presented a GPU-based implementation of an x86 disassembler
• 2 times faster in linear disassembly and 4.4 times faster in exhaustive disassembly
• Power consumption similar to that of the CPU implementation

Thank you
