"Systemized" Static Analysis Harry Xu University of - PowerPoint PPT Presentation

"Systemized" Static Analysis Harry Xu University of California, Los Angeles

Overview of My Work
[Figure labels: Systems, PL]

Static Analysis: Has the Problem Been Solved?
Academia (more elegant)
• Hundreds of papers published in the past decade
• Algorithms become increasingly sophisticated
Industry (more practical)
• Less than a dozen commercial analysis tools
• Use very simple algorithms
• Software becomes increasingly large and dynamic

The Ever-increasing Gap
• Scalability
• Difficulty in implementation
• Lost in multiple languages

Attempts from the PL Community
• Poor scalability
+ Trading off precision for scalability
+ Minimizing generated information
- Further complicates the implementation
• Complicated implementations
+ Using declarative models such as Datalog
- Fundamentally limited by a Datalog engine
• Lost in multiple languages
- Nothing has been done

The Outside World
• The FB graph had 721M vertices (users) and 68.7B edges (friendships) in May 2011
• Google Maps had 20 petabytes of data in 2015

Our “Large” Programs
• The Linux kernel: 16M lines of code; a fully inlined version has about 1B edges
• HBase: 1.37M lines of code; 128M edges in a fully inlined version
• Hadoop: 546K lines of code; 44M edges in a fully inlined version
For comparison, the FB graph: 68.7B edges

Time for a Mindset Shift?
Static analysis fails to scale not because our programs are too large, but because we haven’t thought about how to develop scalable systems

“Big Data” Thinking
Solution = (1) Large Dataset + (2) Simple Computation + System Design
• Don’t complicate the algorithm
• Leave the algorithm simple
• Leverage modern computing resources
• Don’t worry about too much (intermediate) data
• Design and implement customized systems
• Don’t stop at the interface between app and system

What We Did
Built single-machine, disk-based systems specifically for the static analysis workload
• Graspan: a graph system for CFL-reachability computation [ASPLOS’17]
• Grapple: a graph system for finite-state property checking [In Submission]

Why Systemized Static Analyses Work
• Poor scalability: no longer worry about memory blowup, as we have disk support
• Complicated implementations: analysis developers only implement a few interfaces and no longer worry about performance
• Lost in multiple languages: components in different languages are turned into graphs of the same format and analyzed together

Graspan: Context-Free Language (CFL) Reachability
• A program graph P: a -l1-> b -l2-> c
• A context-free grammar G with balanced-parentheses properties: K → l1 l2
• c is K-reachable from a
Reference: Reps, Program analysis via graph reachability, IST, 1998
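The reachability rule on this slide can be illustrated with a minimal sketch. It is not Graspan's implementation: the function name `cfl_closure`, the tuple encoding of edges, and the dictionary encoding of binary grammar rules are all assumptions made for illustration. It repeatedly joins pairs of adjacent edges until no new edge can be derived.

```python
def cfl_closure(edges, rules):
    """Naive fixpoint for CFL-reachability with binary rules.

    edges: set of (src, label, dst) tuples.
    rules: dict mapping (label1, label2) to the derived label,
           e.g. {("l1", "l2"): "K"} encodes K -> l1 l2.
    (Both encodings are assumptions of this sketch.)
    """
    result = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, x, b) in list(result):
            for (b2, y, c) in list(result):
                # Join a -x-> b with b -y-> c when a rule covers (x, y).
                if b == b2 and (x, y) in rules:
                    new_edge = (a, rules[(x, y)], c)
                    if new_edge not in result:
                        result.add(new_edge)
                        changed = True
    return result


graph = {("a", "l1", "b"), ("b", "l2", "c")}
closed = cfl_closure(graph, {("l1", "l2"): "K"})
print(("a", "K", "c") in closed)  # True: c is K-reachable from a
```

The quadratic scan keeps the sketch short; a practical solver would index edges by their endpoints.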

A Wide Range of Applications
• Pointer/alias analysis
    b = a;
    c = b;
  Graph: a -Assign-> b -Assign-> c, giving a -Alias-> c
  Grammar: Alias → Assign+
• Dataflow analysis, pushdown systems, and set-constraint problems can all be converted to context-free-language reachability problems
References:
Sridharan and Bodik, Refinement-based context-sensitive points-to analysis for Java, PLDI, 2006
Zheng and Rugina, Demand-driven alias analysis for C, POPL, 2008

A Wide Range of Applications (Cont.)
• Pointer/alias analysis
    b = &a;   // Address-of
    c = b;
    d = *c;   // Dereference
  Graph: a -&-> b -Alias-> c -*-> d, giving a -Alias-> d
  Grammar: Alias → Assign+ | & Alias *
• Address-of (&) and dereference (*) are the open/close parentheses
References:
Sridharan and Bodik, Refinement-based context-sensitive points-to analysis for Java, PLDI, 2006
Zheng and Rugina, Demand-driven alias analysis for C, POPL, 2008
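As a rough illustration of how the alias grammar on this slide could be solved, here is a worklist sketch. It is an assumption-laden example, not the authors' code: the helper nonterminals `Chain` and `RefAlias`, the edge directions (`a -&-> b` for `b = &a`, `c -*-> d` for `d = *c`), and the function name `solve` are all introduced here for illustration.

```python
from collections import defaultdict

# Normalized form of: Alias -> Assign+ | & Alias *
# "Chain" and "RefAlias" are hypothetical helper nonterminals.
UNIT_RULES = {"Assign": ["Chain"], "Chain": ["Alias"]}
BINARY_RULES = {
    ("Chain", "Assign"): ["Chain"],   # Chain -> Chain Assign (Assign+)
    ("&", "Alias"): ["RefAlias"],     # RefAlias -> & Alias
    ("RefAlias", "*"): ["Alias"],     # Alias -> RefAlias *
}


def solve(edges):
    """Worklist closure over (src, label, dst) edges."""
    facts = set()
    out_edges = defaultdict(set)  # node -> {(label, dst)}
    in_edges = defaultdict(set)   # node -> {(label, src)}
    worklist = list(edges)
    while worklist:
        edge = worklist.pop()
        if edge in facts:
            continue
        facts.add(edge)
        src, label, dst = edge
        out_edges[src].add((label, dst))
        in_edges[dst].add((label, src))
        for head in UNIT_RULES.get(label, []):
            worklist.append((src, head, dst))
        # New edge as the left operand: src -label-> dst -label2-> nxt
        for label2, nxt in list(out_edges[dst]):
            for head in BINARY_RULES.get((label, label2), []):
                worklist.append((src, head, nxt))
        # New edge as the right operand: prev -label0-> src -label-> dst
        for label0, prev in list(in_edges[src]):
            for head in BINARY_RULES.get((label0, label), []):
                worklist.append((prev, head, dst))
    return facts


# b = &a; c = b; d = *c;
program = [("a", "&", "b"), ("b", "Assign", "c"), ("c", "*", "d")]
aliases = {(s, d) for s, lab, d in solve(program) if lab == "Alias"}
print(sorted(aliases))  # [('a', 'd'), ('b', 'c')]
```

Checking each new edge as both the left and the right operand makes the result independent of the order in which edges leave the worklist.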

“Big Data” Thinking
Solution = (1) Large Dataset + (2) Simple Computation + System Design

Turning Code Analysis into Data Analytics
• Key insights:
– The input is a fully inlined program graph
– Transitive edges are added explicitly, satisfying (1)
– The core computation is adding edges, satisfying (2)
– Disk support is leveraged to handle memory blowup
• Can existing graph systems be directly used?
– No, none of them supports dynamic addition of a large number of edges, which needs (1) an online edge duplicate check and (2) dynamic graph repartitioning

Graspan [Wang-ASPLOS’17]
• Scalable
– Disk-based processing on the developer's work machine
• Parallel
– Edge-pair centric computation
• Easy to implement a static analysis
– Implement a few interfaces
4 students + 1 postdoc, 1.5 years of development; implemented in both Java and C++
https://github.com/Graspan/

How Does It Work?
[Figure: a graph G and a set of grammar rules]

Graspan Design
Preprocessing -> Edge-Pair Centric Computation -> Postprocessing

Computation Occurs in Supersteps

Each Superstep Loads Two Partitions
[Figure: partitions 0 to 4; edges labeled A, B, and C]

Each Superstep Loads Two Partitions (Cont.)
[Figure: partitions 0 to 4]
We keep iterating until delta (the number of newly added edges) is 0
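The superstep loop can be sketched in memory as follows. This is a simplified model under stated assumptions, not Graspan's code: partitions are plain Python sets rather than disk files, edges are assigned to partitions by `src % num_parts`, every pair of partitions is visited in a fixed order (the real scheduler is more selective), and the names `join_until_fixpoint` and `run_supersteps` are invented for this example.

```python
from itertools import combinations_with_replacement


def join_until_fixpoint(loaded, rules):
    """Join edge pairs among the loaded edges; return only the new edges."""
    known = set(loaded)
    added = set()
    while True:
        new = set()
        for (a, x, b) in known:
            for (b2, y, c) in known:
                if b == b2 and (x, y) in rules:
                    edge = (a, rules[(x, y)], c)
                    if edge not in known:
                        new.add(edge)
        if not new:
            return added
        known |= new
        added |= new


def run_supersteps(edges, rules, num_parts):
    """Iterate over pairs of partitions until a full round adds no edge."""
    # Assumption: an edge lives in the partition of its source vertex.
    parts = [set() for _ in range(num_parts)]
    for e in edges:
        parts[e[0] % num_parts].add(e)
    supersteps = 0
    while True:
        delta = 0
        for i, j in combinations_with_replacement(range(num_parts), 2):
            supersteps += 1
            loaded = parts[i] | parts[j]          # "load" two partitions
            added = join_until_fixpoint(loaded, rules)
            for e in added:                        # write new edges back
                parts[e[0] % num_parts].add(e)
            delta += len(added)
        if delta == 0:                             # nothing new: done
            return set().union(*parts), supersteps


chain = {(i, "E", i + 1) for i in range(4)}        # 0 -> 1 -> 2 -> 3 -> 4
closure, steps = run_supersteps(chain, {("E", "E"): "E"}, num_parts=2)
print(len(closure))  # 10 edges in the transitive closure
```

Because a derived edge keeps the source vertex of its left operand, every new edge belongs to one of the two loaded partitions, which is what lets each superstep work on only two partitions at a time.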

Post-Processing
• Repartition oversized partitions to maintain balanced load on memory
• Save partitions to disk
• Scheduler favors in-memory partitions and those with higher matching degrees

What We Have Analyzed

Program                #LOC    #Inlines
Linux 4.4.0-rc5        16M     31.7M
PostgreSQL 8.3.9       700K    290K
Apache httpd 2.2.18    300K    58K

• With
– A fully context-sensitive pointer/alias analysis
– A fully context-sensitive dataflow analysis
• On a Dell desktop computer with 8GB memory and 1TB SSD

Evaluation
• Can the interprocedural analyses improve D. Engler's checkers?
– Found 85 new NULL pointer bugs and 1127 unnecessary NULL tests in Linux 4.4.0-rc5
• How well does Graspan perform?
– Computations took from 11 minutes to 12 hours
• How does Graspan compare to other systems?
– GraphChi crashed in 133 seconds
– Traditional implementations of these algorithms ran out of memory in most cases
– A Datalog-based (SociaLite) implementation ran out of memory in most cases
• Will try a differential dataflow system like Naiad

Grapple: A Finite-State Property Checker
• Many bugs in large-scale systems have finite-state properties
– Many OS bugs studied by Chou et al. in 2001 are finite-state property bugs: misplaced locks, use-after-free, etc.
– Most distributed system bugs studied by Gunawi et al. in 2014 are finite-state property bugs: socket leaks, task state problems, mishandled exceptions, etc.
References:
Gunawi et al., What bugs live in the cloud? A study of 3000+ issues in cloud systems, SoCC, 2014
Chou et al., An empirical study of operating systems errors, SOSP, 2001

Analyses Under the Hood
• What we need for the checker
– Extract sequences of method calls on each object of interest
– Check them against the FSM specification
• What analyses we need
– Alias analysis
– Dataflow analysis
– Context sensitivity and path sensitivity
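Checking an extracted call sequence against an FSM specification could look like the sketch below. The specification itself (a file-like object with `open`, `read`, and `close` events), the state names, and the function `check` are hypothetical; the slides do not give a concrete specification format.

```python
# Hypothetical FSM specification for a file-like object.
TRANSITIONS = {
    ("init", "open"): "opened",
    ("opened", "read"): "opened",
    ("opened", "close"): "closed",
}
ACCEPTING = {"init", "closed"}  # ending anywhere else is a leak


def check(events):
    """Run one object's event sequence through the FSM."""
    state = "init"
    for index, event in enumerate(events):
        next_state = TRANSITIONS.get((state, event))
        if next_state is None:
            return f"error: '{event}' in state '{state}' at event {index}"
        state = next_state
    if state not in ACCEPTING:
        return f"error: sequence ends in state '{state}'"
    return "ok"


print(check(["open", "read", "close"]))   # ok
print(check(["open", "close", "read"]))   # error: 'read' in state 'closed' at event 2
print(check(["open", "read"]))            # error: sequence ends in state 'opened'
```

The hard part, which this sketch leaves out, is producing the per-object event sequences; that is what the alias and dataflow analyses are for.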

Grapple
• Phases
– A fully path-sensitive, context-sensitive alias analysis
– A fully path-sensitive, context-sensitive dataflow analysis
– Extract event sequences
• Computation Model
– Edge-pair-centric model
– Challenge: how to represent and solve path constraints during graph processing

Grapple Computation Model
• A program graph P: a -(l1, c1)-> b -(l2, c2)-> c
• A context-free grammar G with balanced-parentheses properties: K → l1 l2
• c is K-reachable from a (via an edge labeled K, C) if C = c1 ∧ c2 is satisfiable
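A toy version of this constraint-carrying edge join is sketched below. It makes strong simplifying assumptions: a constraint is a set of (condition, truth value) literals, and "satisfiable" only means that no condition appears with both truth values. A real checker would hand the conjunction to a constraint solver; the names `satisfiable` and `constrained_closure` are invented for this example.

```python
def satisfiable(constraint):
    """Stand-in for a solver: reject only direct contradictions."""
    return not any((cond, not value) in constraint
                   for cond, value in constraint)


def constrained_closure(edges, rules):
    """edges: set of (src, label, dst, constraint) with frozenset constraints.

    A derived edge carries the conjunction (set union) of the two
    constraints and is kept only if that conjunction is satisfiable.
    """
    facts = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, x, b, c1) in list(facts):
            for (b2, y, d, c2) in list(facts):
                if b != b2 or (x, y) not in rules:
                    continue
                combined = c1 | c2
                if not satisfiable(combined):
                    continue  # infeasible path: no edge is added
                new_edge = (a, rules[(x, y)], d, combined)
                if new_edge not in facts:
                    facts.add(new_edge)
                    changed = True
    return facts


edges = {
    ("a", "l1", "b", frozenset({("x>0", True)})),
    ("b", "l2", "c", frozenset({("x>0", False)})),  # contradicts the l1 edge
    ("b", "l2", "d", frozenset({("y<0", True)})),
}
result = constrained_closure(edges, {("l1", "l2"): "K"})
k_targets = sorted(dst for _, label, dst, _ in result if label == "K")
print(k_targets)  # ['d']
```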

Path Constraint Representation
• Challenges
– Each edge carries only fixed-size data
– The size is often smaller than 4 bytes
• Using the interprocedural control flow execution tree (ICFET) as an index engine
• Each edge contains a path encoding, which is used to query for a path constraint based upon the ICFET

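Control Flow Execution Tree (CFET)
Example code (bracketed numbers are the CFET nodes each statement belongs to):
    x = parse(args[0]);      [node 0]
    y = x;                   [node 0]
    if(x > 0) { y--; }       [node 2]
    else { y++; }            [node 1]
    if(y > 0) { … }          [nodes 4, 6]
    else { … }               [nodes 3, 5]
    return;
Tree:
    node 0: x>=0     F -> node 1, T -> node 2
    node 1: x+1>0    F -> node 3, T -> node 4
    node 2: x-1>0    F -> node 5, T -> node 6
A simple numbering algorithm, with the root numbered 0 as in the tree: F child -> ID * 2 + 1; T child -> ID * 2 + 2
The tree is built before the graph computation starts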

Path Representation
• An intraprocedural CFET path can be uniquely encoded as a pair [ID_start, ID_end]
• Decoding can be done efficiently online
• Loops are unrolled a certain number of times
Example: [0, 6] uniquely identifies the rightmost path, 0 -T-> 2 -T-> 6
• Decoding can be done by right shifts
• Symbolic execution is used to compute the conditions
[Figure: CFET with node 0 (x>=0), node 1 (x+1>0), node 2 (x-1>0), and leaves 3 to 6]
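Decoding a [start, end] pair can be sketched as below. The sketch assumes the numbering that the figure's node IDs imply (root 0, F child = 2 * ID + 1, T child = 2 * ID + 2), so the parent of a node is (ID - 1) >> 1 and even IDs are T children. The `CONDITIONS` table copies the three conditions from the example tree; the function names and the textual constraint format are illustrative only.

```python
# Branch conditions of the example CFET, keyed by node ID.
CONDITIONS = {0: "x>=0", 1: "x+1>0", 2: "x-1>0"}


def decode(start, end):
    """Recover the node IDs and branch directions on the path start..end."""
    nodes = [end]
    branches = []
    node = end
    while node != start:
        if node == 0:
            raise ValueError("start is not an ancestor of end")
        branches.append("T" if node % 2 == 0 else "F")
        node = (node - 1) >> 1  # parent via a right shift
        nodes.append(node)
    nodes.reverse()
    branches.reverse()
    return nodes, branches


def path_constraint(start, end):
    """Conjoin each branch condition, negated on F branches."""
    nodes, branches = decode(start, end)
    terms = []
    for node, branch in zip(nodes[:-1], branches):
        cond = CONDITIONS[node]
        terms.append(cond if branch == "T" else f"!({cond})")
    return " && ".join(terms)


print(decode(0, 6))           # ([0, 2, 6], ['T', 'T'])
print(path_constraint(0, 6))  # x>=0 && x-1>0
print(path_constraint(0, 3))  # !(x>=0) && !(x+1>0)
```

Since only the two endpoint IDs are stored, the encoding stays fixed-size per edge no matter how long the path is.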

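Interprocedural CFET
    void foo (int x) {
        int y = x + 1;
        if (x > 0) {
            y = bar (2 * x);   // f2
        }
        if (y < 0) { … }
        return;
    }

    int bar (int a) {
        if (a < 0) { return a + 1; }
        return a - 1;
    }
[Figure: CFET of foo(x): node 0 tests x>0 (F -> node 1, T -> node 2); node 1 tests x+1<0 (F -> node 3, T -> node 4); node 2 tests y<0 (F -> node 5, T -> node 6). CFET of bar(a): node 0 tests a<0 (F -> node 1, T -> node 2). A call edge "(f2" annotated with a=2*x leads from foo into bar; return edges ")f2" annotated with y=a-1 and y=a+1 lead from bar's nodes 1 and 2 back to foo.]
Callers are connected with callees using call and return edges, annotated with call site IDs and symbolic equations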

Interprocedural Path Representation
• A sequence of intervals
– [2, 0], 25, [2, 0]
– Bounded by the call stack depth
• A constraint can be computed by extracting constraints for path fragments and combining them into a conjunctive form
[Figure: the interprocedural CFET of foo(x) and bar(a), with call edge "(f2" (a=2*x) and return edges ")f2" (y=a-1, y=a+1)]