Learning a Static Analyzer from Data
by Pavol Bielik, Veselin Raychev, and Martin Vechev

Daniel Perez
The University of Tokyo
January 29, 2018

Static analyzers
Writing a static analyzer is hard

JavaScript points-to sample

    global.length = 4;
    var dat = [5, 3, 9, 1];
    function isBig(value) {
      return value >= this.length;
    }
    dat.filter(isBig);
    dat.filter(isBig, 42);
    dat.filter(isBig, dat);

Many corner cases → many handcrafted rules
FlowJS type checking core is ~12,000 lines of ML
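To make the corner cases concrete, the sketch below records what `this` is bound to inside the callback for the three kinds of `filter` call in the sample. The names `seen` and `record` are made up for this illustration, and the callback is built with the `Function` constructor only so that it runs as non-strict code, which is the mode the slide's sample relies on.

```javascript
// Hypothetical helper: a non-strict callback that records its `this` value.
// Code created by the Function constructor is non-strict, even inside a module.
const seen = [];
const record = new Function(
  "seen",
  "return function () { seen.push(this); return true; };"
)(seen);

const dat = [5];          // one element, so the callback runs once per call
dat.filter(record);       // no thisArg: `this` is the global object
dat.filter(record, 42);   // primitive thisArg: `this` is a boxed Number
dat.filter(record, dat);  // object thisArg: `this` is dat itself

console.log(seen[0] === globalThis);     // global object
console.log(seen[1] instanceof Number);  // boxed 42
console.log(seen[2] === dat);            // the dat array
```

A points-to analysis for `this` has to encode each of these cases as a separate rule, which is what makes the handwritten version tedious.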

Sample learned analyzer
We would like to automatically learn such rules while avoiding overfitting the training data

    dat.filter(isBig);       // points to global
    dat.filter(isBig, 42);   // points to boxed 42
    dat.filter(isBig, dat);  // points to dat object

    Array.prototype.filter ::=
      if caller has one argument then
        points-to global object
      else if 2nd argument is Identifier then
        if 2nd argument is undefined then
          points-to global object
        else
          points-to 2nd argument
      else // 2nd arg is a primitive value
        points-to new allocation site
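The learned rule for `Array.prototype.filter` can be read as an ordinary decision procedure. The following is a minimal sketch of it as a function, assuming a simplified call-site encoding (a list of argument nodes with `type`, `name`, and `value` fields) that is invented for this example and is not the representation used by the authors.

```javascript
// Hypothetical, simplified call site: only the argument list matters here.
// Each argument is { type: "Identifier" | "Literal", name?, value? }.
function filterThisPointsTo(args) {
  if (args.length === 1) {
    return "global object";        // caller has one argument
  }
  const second = args[1];
  if (second.type === "Identifier") {
    if (second.name === "undefined") {
      return "global object";      // 2nd argument is undefined
    }
    return "2nd argument";         // whatever the identifier points to
  }
  return "new allocation site";    // 2nd argument is a primitive value
}

const isBig = { type: "Identifier", name: "isBig" };
const lit42 = { type: "Literal", value: 42 };
const datId = { type: "Identifier", name: "dat" };

console.log(filterThisPointsTo([isBig]));         // global object
console.log(filterThisPointsTo([isBig, lit42]));  // new allocation site
console.log(filterThisPointsTo([isBig, datId]));  // 2nd argument
```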

Overview
The system takes a dataset and rules as input and outputs an analysis

Model input

Dataset
System takes a dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$ where
• $x$ is an input program
• $y$ is the analysis result

Example sample
Sample input $x$:

    var b = {}; // object s0
    a = b;

Analysis result: $y = \{(a \rightarrow \{s_0\})\}$

Rules description language
Language template, which determines what the analyzer can model, is

    ⟨Prog⟩   ::= ⟨Action⟩ | 'if' ⟨Guard⟩ 'then' ⟨Prog⟩ 'else' ⟨Prog⟩
    ⟨Action⟩ ::= action on AST
    ⟨Guard⟩  ::= condition
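One way to picture the language template is as a small tree of guards with actions at the leaves. The sketch below encodes that shape and an interpreter for it; the constructors `action` and `ifThenElse`, the function `run`, and the string-based guard are all placeholders chosen for this example, not learned rules or the authors' implementation.

```javascript
// A program is either an action (a function from input to result) or an
// if-then-else node whose guard is a predicate on the input.
const action = (fn) => ({ kind: "action", fn });
const ifThenElse = (guard, thenProg, elseProg) =>
  ({ kind: "if", guard, thenProg, elseProg });

// Evaluate a program of the template on an input x.
function run(prog, x) {
  while (prog.kind === "if") {
    prog = prog.guard(x) ? prog.thenProg : prog.elseProg;
  }
  return prog.fn(x);
}

// One dataset sample (x, y): x is the program text, y maps each variable
// to the abstract objects it may point to.
const sample = {
  x: "var b = {}; // object s0\na = b;",
  y: { a: ["s0"] },
};

// Toy program of the template; guard and actions are placeholders.
const pa = ifThenElse(
  (x) => x.includes("a = b"),
  action(() => ({ a: ["s0"] })),
  action(() => ({}))
);

console.log(JSON.stringify(run(pa, sample.x))); // {"a":["s0"]}
```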

Analyzer properties
We want the analyzer $pa$ to be sound and precise

Sound
Analyzer $pa$ is sound if
$$\forall p \in \mathcal{T}_L,\quad \alpha(\llbracket p \rrbracket) \sqsubseteq pa(p)$$
but this is too hard to prove; instead we require
$$\forall i \in 1 \ldots N,\quad y_i \sqsubseteq pa(x_i)$$
i.e. sound on the dataset

Precise
Given
$$r(x, y, pa) = \begin{cases} 1 & \text{if } y \neq pa(x) \\ 0 & \text{otherwise} \end{cases}$$
the goal is to minimize
$$cost(\mathcal{D}, pa) = \sum_{(x, y) \in \mathcal{D}} r(x, y, pa)$$
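The precision objective is just a count of mismatches, which a few lines of code make explicit. In this sketch the dataset is a hypothetical list of `[x, y]` pairs and results are compared through their JSON text; that comparison is an assumption that suits this toy encoding, not something taken from the slides.

```javascript
// r(x, y, pa): 1 if the analyzer's output differs from the expected result.
// Comparing JSON text is enough for plain values with a fixed key order.
function r(x, y, pa) {
  return JSON.stringify(pa(x)) === JSON.stringify(y) ? 0 : 1;
}

// cost(D, pa): number of samples in D that pa gets wrong.
function cost(D, pa) {
  return D.reduce((sum, [x, y]) => sum + r(x, y, pa), 0);
}

// Hypothetical dataset: inputs are numbers, expected results their parity.
const D = [[1, "odd"], [2, "even"], [3, "odd"], [4, "even"]];
const alwaysOdd = () => "odd";
const parity = (x) => (x % 2 === 0 ? "even" : "odd");

console.log(cost(D, alwaysOdd)); // 2
console.log(cost(D, parity));    // 0
```

Note that the cost uses exact equality, while soundness only asks for $y_i \sqsubseteq pa(x_i)$; an analyzer can therefore be sound on the dataset and still have non-zero cost.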

Learning algorithm
Learning procedure based on ID3

    procedure Synthesize(D)
      Input: Dataset D = {(x_i, y_i)}, i = 1 … N
      Output: Program pa ∈ L
      a_best ← arg min_{a ∈ Actions} cost(D, a)
      if cost(D, a_best) = 0 then
        return a_best
      g_best ← arg max_{g ∈ Guards} IG^{a_best}(D, g)
      if g_best = ⊥ then
        return Approximate(D)
      p1 ← Synthesize({(x, y) ∈ D | g_best(x)})
      p2 ← Synthesize({(x, y) ∈ D | ¬g_best(x)})
      return (if g_best then p1 else p2)

Information gain
IG is information gain: difference of entropy
$$IG^{a_{best}}(\mathcal{D}, g) = H\left(w_{\mathcal{D}}^{a_{best}}\right) - \frac{|\mathcal{D}_g|}{|\mathcal{D}|} H\left(w_{\mathcal{D}_g}^{a_{best}}\right) - \frac{|\mathcal{D}_{\neg g}|}{|\mathcal{D}|} H\left(w_{\mathcal{D}_{\neg g}}^{a_{best}}\right)$$
$$w_d^{a_{best}} = \langle r(x_i, y_i, a_{best}) \mid i \in 1 \ldots |d| \rangle$$

Algorithm properties
• Greedy, locally optimal
• Sound on $\mathcal{D}$ iff Approximate is sound
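The procedure above can be sketched in runnable form as follows. This is a toy version under stated assumptions: actions and guards are plain functions, results are compared with `===`, `null` stands in for ⊥, and a guard is only accepted when its information gain is strictly positive. The function names and the example data are invented for the illustration.

```javascript
// r is 1 when action/program `a` gives the wrong result on sample (x, y).
const r = (x, y, a) => (a(x) === y ? 0 : 1);
const cost = (D, a) => D.reduce((s, [x, y]) => s + r(x, y, a), 0);

// Entropy of a 0/1 vector w (the list of r values).
function entropy(w) {
  if (w.length === 0) return 0;
  const p1 = w.reduce((s, v) => s + v, 0) / w.length;
  return [p1, 1 - p1]
    .filter((p) => p > 0)
    .reduce((h, p) => h - p * Math.log2(p), 0);
}

// Information gain of splitting D with guard g, measured on a's errors.
function infoGain(D, g, a) {
  const w = (d) => d.map(([x, y]) => r(x, y, a));
  const Dg = D.filter(([x]) => g(x));
  const Dn = D.filter(([x]) => !g(x));
  return entropy(w(D))
    - (Dg.length / D.length) * entropy(w(Dg))
    - (Dn.length / D.length) * entropy(w(Dn));
}

function synthesize(D, actions, guards, approximate) {
  // a_best: the action with the lowest cost on D
  const aBest = actions.reduce((best, a) =>
    (cost(D, a) < cost(D, best) ? a : best));
  if (cost(D, aBest) === 0) return aBest;
  // g_best: the guard with the highest positive gain; null plays the role of ⊥
  let gBest = null;
  let bestGain = 0;
  for (const g of guards) {
    const gain = infoGain(D, g, aBest);
    if (gain > bestGain) { gBest = g; bestGain = gain; }
  }
  if (gBest === null) return approximate(D);
  const p1 = synthesize(D.filter(([x]) => gBest(x)), actions, guards, approximate);
  const p2 = synthesize(D.filter(([x]) => !gBest(x)), actions, guards, approximate);
  return (x) => (gBest(x) ? p1(x) : p2(x));
}

// Toy run: classify numbers as "small" or "big" (hypothetical data).
const D = [1, 2, 3, 4, 5, 6].map((x) => [x, x <= 3 ? "small" : "big"]);
const actions = [() => "small", () => "big"];
const guards = [(x) => x % 2 === 0, (x) => x <= 3];
const approximate = () => () => "unknown"; // stand-in for a sound fallback
const pa = synthesize(D, actions, guards, approximate);

console.log(cost(D, pa)); // 0
```

In the toy run the guard `x <= 3` separates the errors of the best single action perfectly, so it is chosen over the parity guard and each branch is then solved by a single action.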

Oracle: counter-example generator

Goal
Find counter-example $(x, y)$ s.t. $pa(x) \neq y$ in reasonable time
• Random search too slow
• Prioritize modifications affecting execution path of $pa(x)$

Modification types
• Semantic preserving (Equivalence Modulo Abstraction, EMA)
• Non-semantic preserving (Global jump)

Example
Sample input

    var b = {};
    a = b;

Overfitted analysis

    if there is VarDecl:x(y)       then y
    if y is VarDecl:y preceding x  then y
    else                           ⊥

Counter-example

    var b = {};
    var c = 1;
    a = b;
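The counter-example on this slide can be reproduced with a very small search. The sketch below assumes a toy statement-list encoding of programs, a simplified overfitted rule ("answer with the declaration right before the assignment"), and a single modification that inserts an unrelated declaration. It enumerates modifications exhaustively rather than prioritizing them, so it illustrates the oracle's interface, not its search strategy; all names are hypothetical.

```javascript
// Toy programs: a list of statements. Purely illustrative encoding.
const varDecl = (name, init) => ({ kind: "VarDecl", name, init });
const assign = (target, source) => ({ kind: "Assign", target, source });

// Ground truth for "which variable does the assigned target alias":
// the source of the assignment.
const truth = (prog) => prog.find((s) => s.kind === "Assign").source;

// Overfitted analysis: answers with the VarDecl right before the assignment.
function overfitted(prog) {
  const i = prog.findIndex((s) => s.kind === "Assign");
  const prev = prog[i - 1];
  return prev && prev.kind === "VarDecl" ? prev.name : null; // null plays ⊥
}

// Modification: insert an unrelated declaration at every possible position.
function* addUnrelatedDecl(prog) {
  for (let i = 0; i <= prog.length; i++) {
    yield [...prog.slice(0, i), varDecl("c", "1"), ...prog.slice(i)];
  }
}

// Oracle: the first modified program on which pa disagrees with the truth.
function findCounterExample(prog, pa) {
  for (const x of addUnrelatedDecl(prog)) {
    const y = truth(x);
    if (pa(x) !== y) return { x, y };
  }
  return null;
}

const sample = [varDecl("b", "{}"), assign("a", "b")];
const cex = findCounterExample(sample, overfitted);
// Statement order of the counter-example: VarDecl b, VarDecl c, Assign a
console.log(cex.x.map((s) => s.kind + ":" + (s.name || s.target)).join(" "));
```

The counter-example found has the same shape as the one on the slide: an unrelated `var c = 1;` placed between the declaration of `b` and the assignment.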

Evaluation

Overview
• Learned 2 analyzers
• Points-to analysis subset (this points-to)
• Allocation site analysis
• Input programs from ECMAScript conformance suite (~15000 samples)

Program modifications
F_ema: Adding dead code, Renaming variables, Renaming user functions, Side-effect free expressions
F_gj: Adding method arguments, Adding method parameters, Changing constants

Points-to analysis

Goal
Learn this points-to rules, a function $f$ s.t.
$$\frac{VarPointsTo(v_2, h) \qquad v_2 = f(this)}{VarPointsTo(this, h)}$$

Example

    dat.filter(isBig);       // points to global
    dat.filter(isBig, 42);   // points to boxed 42
    dat.filter(isBig, dat);  // points to dat object

    Array.prototype.filter ::=
      if caller has one argument then
        points-to global object
      else if 2nd argument is Identifier then
        if 2nd argument is undefined then
          points-to global object
        else
          points-to 2nd argument
      else // 2nd arg is a primitive value
        points-to new allocation site

Points-to analysis rules description language

    ⟨MoveCore⟩ ::= Up | Left | Right | DownFirst | DownLast | Top
    ⟨MoveJS⟩   ::= GoToGlobal | GoToUndef | GoToNull | GoToThis | UpUntilFunc
    ⟨Move⟩     ::= ⟨MoveCore⟩ | ⟨MoveJS⟩ | GoToCaller
    ⟨Write⟩    ::= WriteValue | WritePos | WriteType | HasLeft | HasRight | HasChild
    ⟨Action⟩   ::= ϵ | ⟨Move⟩ ⟨Action⟩
    ⟨Guard⟩    ::= ϵ | ⟨Move⟩ ⟨Guard⟩ | ⟨Write⟩ ⟨Guard⟩
    ⟨Context⟩  ::= ϵ | (N ∪ Σ ∪ ℕ) ⟨Context⟩
    ⟨Prog⟩     ::= ϵ | ⟨Action⟩ | 'if' ⟨Guard⟩ '=' ⟨Context⟩ 'then' ⟨Prog⟩ 'else' ⟨Prog⟩

Generate actions with programs up to size 5 and branches with programs up to size 6 (5 moves and 1 write)
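To give a feel for how guards in this language navigate a program, here is a small sketch of some core moves and writes over a toy AST. The node shape, the `sibling` helper, and the choice that an impossible move leaves the position unchanged are assumptions made for this example; the JavaScript-specific moves, `WritePos`, and `GoToCaller` are left out.

```javascript
// Minimal AST node with parent links; the shape is an assumption.
function node(type, value, children = []) {
  const n = { type, value, children, parent: null };
  children.forEach((c) => { c.parent = n; });
  return n;
}

// Sibling at a relative offset, or the node itself if there is none.
function sibling(n, offset) {
  if (!n.parent) return n;
  const sibs = n.parent.children;
  return sibs[sibs.indexOf(n) + offset] || n;
}

// Core moves: each maps a position to a new position and stays in place
// when the move is not possible.
const moves = {
  Up: (n) => n.parent || n,
  Left: (n) => sibling(n, -1),
  Right: (n) => sibling(n, +1),
  DownFirst: (n) => n.children[0] || n,
  DownLast: (n) => n.children[n.children.length - 1] || n,
  Top: (n) => { while (n.parent) n = n.parent; return n; },
};

// Writes observe the current position without moving.
const writes = {
  WriteValue: (n) => n.value,
  WriteType: (n) => n.type,
  HasLeft: (n) => sibling(n, -1) !== n,
  HasRight: (n) => sibling(n, +1) !== n,
  HasChild: (n) => n.children.length > 0,
};

// Run a guard, given as a sequence of move/write names, and return the
// context it writes.
function runGuard(start, ops) {
  let pos = start;
  const context = [];
  for (const op of ops) {
    if (moves[op]) pos = moves[op](pos);
    else context.push(writes[op](pos));
  }
  return context;
}

// Toy AST for a call like dat.filter(isBig, 42), flattened to callee + args.
const callee = node("Identifier", "filter");
const arg1 = node("Identifier", "isBig");
const arg2 = node("Literal", 42);
const call = node("CallExpression", null, [callee, arg1, arg2]);

// Type of the node two steps to the right of the callee: "Literal"
console.log(runGuard(callee, ["Right", "Right", "WriteType"]));
// Value of the first child of arg2's parent: "filter"
console.log(runGuard(arg2, ["Up", "DownFirst", "WriteValue"]));
```

A branch of the form 'if' ⟨Guard⟩ '=' ⟨Context⟩ then compares the written context against a constant, for example checking that the second argument is a `Literal`.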

Points-to analysis results
Columns: Function Name | Dataset Size | Counter-examples Found | Analysis Size*
Function groups: Array.prototype (map, find, forEach, some); Function.prototype (call, apply)
Dataset Size and Counter-examples Found values, in slide order: 73, 53, 177, 604, 82, 229, 64, 315, 372, 26, 6, 182
Analysis Size* values, in slide order: 97 (18), 54 (10), 36 (6), 36 (6), 35 (5), 36 (6)
* Number of instructions in L_pt (number of if branches)

Allocation analysis

Goal
Learn an allocation site analysis function $f$ s.t.
$$\frac{f(l) = true}{AllocSite(l)}$$

Results
• 34721 input/output samples
• 135 branches generated
• 905 counter-examples found
• Learned tricky cases, e.g. new Object(obj)

Summary
• New approach to learn a static analyzer from data
• Algorithm to learn an analyzer from a dataset and inference rules
• Oracle to quickly generate counter-examples, avoiding overfitting
• Learned tricky rules for JavaScript points-to and allocation site analysis

References
P. Bielik, V. Raychev, and M. T. Vechev, "Learning a static analyzer from data," CoRR, vol. abs/1611.01752, 2016. [Online]. Available: http://arxiv.org/abs/1611.01752
J. R. Quinlan, "Induction of decision trees," Mach. Learn., vol. 1, no. 1, pp. 81–106, Mar. 1986. [Online]. Available: http://dx.doi.org/10.1023/A:1022643204877
