
Learning a Static Analyzer from Data



  1. Learning a Static Analyzer from Data
     by Pavol Bielik, Veselin Raychev, and Martin Vechev
     Presented by Daniel Perez, The University of Tokyo, January 29, 2018

  2. Static analyzers
     Writing a static analyzer is hard
     JavaScript points-to sample:
       global.length = 4;
       var dat = [5, 3, 9, 1];
       function isBig(value) { return value >= this.length; }
       dat.filter(isBig);
       dat.filter(isBig, 42);
       dat.filter(isBig, dat);
     Many corner cases → many handcrafted rules
     FlowJS type checking core is ~12,000 lines of ML

  3. Sample learned analyzer
       dat.filter(isBig);       // points to global
       dat.filter(isBig, 42);   // points to boxed 42
       dat.filter(isBig, dat);  // points to dat object

       Array.prototype.filter ::=
         if caller has one argument then
           points-to global object
         else if 2nd argument is Identifier then
           if 2nd argument is undefined then
             points-to global object
           else
             points-to 2nd argument
         else // 2nd arg is a primitive value
           points-to new allocation site

     We would like to automatically learn such rules while avoiding overfitting the training data
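
The learned rule on this slide reads like an ordinary decision procedure. The sketch below is a hypothetical encoding of it, not the authors' implementation: the call-site shape (`{ args: [...] }`), the argument kinds, and the result labels are assumptions made for illustration.

```javascript
// Hypothetical encoding of the learned rule for Array.prototype.filter.
// A call site is described by its argument list; each argument is
// { kind: "Identifier", name } or { kind: "Literal", value }.
function filterThisPointsTo(callSite) {
  const args = callSite.args;
  if (args.length === 1) {
    return { target: "global" };                       // dat.filter(isBig)
  }
  const second = args[1];
  if (second.kind === "Identifier") {
    if (second.name === "undefined") {
      return { target: "global" };                     // dat.filter(isBig, undefined)
    }
    return { target: "variable", name: second.name };  // dat.filter(isBig, dat)
  }
  // 2nd argument is a primitive value: `this` is a freshly boxed object.
  return { target: "newAllocation", site: second.value };
}

// Small helpers for building argument descriptions (illustrative names).
const id = (name) => ({ kind: "Identifier", name });
const lit = (value) => ({ kind: "Literal", value });

const r1 = filterThisPointsTo({ args: [id("isBig")] });
const r2 = filterThisPointsTo({ args: [id("isBig"), lit(42)] });
const r3 = filterThisPointsTo({ args: [id("isBig"), id("dat")] });
console.log(r1.target, r2.target, r3.target);
```

Each of the three calls from the slide takes a different branch, which is why a hand-written analyzer needs one rule per corner case.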

  4. Overview
     The system takes a dataset and a rules description language as input and outputs an analysis

  5. Model input
     System takes a dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$ where
     • x is an input program
     • y is the analysis result

     Example sample
       Sample input x:
         var b = {}; // object s0
         a = b;
       Analysis result:
         y = {(a → {s0})}

     Rules description language (language template)
       ⟨Prog⟩ ::= ⟨Action⟩ | ‘if’ ⟨Guard⟩ ‘then’ ⟨Prog⟩ ‘else’ ⟨Prog⟩
       ⟨Action⟩ ::= action on AST
       ⟨Guard⟩ ::= condition
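
To make the template language concrete, here is a minimal sketch of how a ⟨Prog⟩ could be represented and evaluated. The object shapes, the `run` helper, and the pre-parsed statement list used for the sample input are all assumptions for illustration; the real system works on ASTs.

```javascript
// Hypothetical representation of the template language:
//   Prog ::= Action | if Guard then Prog else Prog
// An action maps an input to an analysis result; a guard is a predicate.
const action = (fn) => ({ type: "action", fn });
const branch = (guard, thenProg, elseProg) =>
  ({ type: "if", guard, thenProg, elseProg });

// Evaluate a program on input x by following guards down to an action.
function run(prog, x) {
  if (prog.type === "action") return prog.fn(x);
  return run(prog.guard(x) ? prog.thenProg : prog.elseProg, x);
}

// One dataset sample (x, y). The input program is given as a
// pre-parsed list of statements, which is a simplification.
const sample = {
  x: [
    { op: "alloc", target: "b", site: "s0" },  // var b = {};
    { op: "copy", target: "a", source: "b" },  // a = b;
  ],
  y: { a: ["s0"] },                            // y = {(a -> {s0})}
};

// Toy program: if the last statement is a copy, its target points to
// the site allocated for its source; otherwise report nothing.
const pa = branch(
  (x) => x[x.length - 1].op === "copy",
  action((x) => {
    const last = x[x.length - 1];
    const alloc = x.find((s) => s.op === "alloc" && s.target === last.source);
    return { [last.target]: alloc ? [alloc.site] : [] };
  }),
  action(() => ({}))
);

const result = run(pa, sample.x);
console.log(JSON.stringify(result));
```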

  6. Analyzer properties
     We want the analyzer pa to be sound and precise

     Sound
     Analyzer pa is sound if
       $\forall p \in T_L,\ \alpha([\![p]\!]) \sqsubseteq pa(p)$
     but this is too hard to prove; instead we require
       $\forall i \in 1 \dots N,\ y_i \sqsubseteq pa(x_i)$
     i.e. sound on the dataset

     Precise
     Given
       $r(x, y, pa) = \begin{cases} 1 & \text{if } y \neq pa(x) \\ 0 & \text{otherwise} \end{cases}$
     the goal is to minimize
       $cost(\mathcal{D}, pa) = \sum_{(x, y) \in \mathcal{D}} r(x, y, pa)$
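
The two properties translate directly into code. The following is a sketch under stated assumptions: analysis results are arrays of abstract values, ⊑ is modeled as the subset relation, and equality is checked by comparing JSON strings. The dataset and the three candidate analyzers are made up for illustration.

```javascript
// r(x, y, pa) = 1 if pa(x) differs from y, 0 otherwise.
// Results are compared via JSON, which assumes plain values
// listed in a consistent order.
function r(x, y, pa) {
  return JSON.stringify(pa(x)) !== JSON.stringify(y) ? 1 : 0;
}

// cost(D, pa) = sum of r(x, y, pa) over all samples (x, y) in D.
function cost(D, pa) {
  return D.reduce((sum, [x, y]) => sum + r(x, y, pa), 0);
}

// Soundness on the dataset: every value in y_i must appear in pa(x_i).
function soundOn(D, pa) {
  return D.every(([x, y]) => y.every((v) => pa(x).includes(v)));
}

// Toy dataset: inputs are plain numbers, results are sets of sites.
const D = [
  [1, ["s0"]],
  [2, ["s1"]],
  [3, ["s0"]],
];
const precise = (x) => (x === 2 ? ["s1"] : ["s0"]);  // sound and precise
const coarse = () => ["s0", "s1"];                   // sound but imprecise
const unsound = () => ["s0"];                        // misses s1 for x = 2

console.log(cost(D, precise), cost(D, coarse), cost(D, unsound));
console.log(soundOn(D, precise), soundOn(D, coarse), soundOn(D, unsound));
```

The `coarse` analyzer shows why both properties are needed: it is sound on every sample yet has the highest cost, because it never returns the exact result.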

  7. Learning algorithm
     Learning procedure based on ID3

     procedure Synthetize($\mathcal{D}$)
       Input: Dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$
       Output: Program $pa \in L$
       $a_{best} \leftarrow \arg\min_{a \in Actions} cost(\mathcal{D}, a)$
       if $cost(\mathcal{D}, a_{best}) = 0$ then return $a_{best}$
       $g_{best} \leftarrow \arg\max_{g \in Guards} IG^{a_{best}}(\mathcal{D}, g)$
       if $g_{best} = \bot$ then return Approximate($\mathcal{D}$)
       $p_1 \leftarrow$ Synthetize($\{(x, y) \in \mathcal{D} \mid g_{best}(x)\}$)
       $p_2 \leftarrow$ Synthetize($\{(x, y) \in \mathcal{D} \mid \neg g_{best}(x)\}$)
       return (if $g_{best}$ then $p_1$ else $p_2$)

     Information gain
     IG is information gain: difference of entropy
       $IG^{a_{best}}(\mathcal{D}, g) = H(w^{a_{best}}_{\mathcal{D}}) - \frac{|\mathcal{D}_g|}{|\mathcal{D}|} H(w^{a_{best}}_{\mathcal{D}_g}) - \frac{|\mathcal{D}_{\neg g}|}{|\mathcal{D}|} H(w^{a_{best}}_{\mathcal{D}_{\neg g}})$
       $w^{a_{best}}_{d} = \langle r(x_i, y_i, a_{best}) \mid i \in 1 \dots |d| \rangle$

     Algorithm properties
     • Greedy, locally optimal
     • Sound on $\mathcal{D}$ iff Approximate is sound
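
The procedure on this slide can be sketched in a few dozen lines. This is a toy version under stated assumptions, not the authors' implementation: actions and guards are `{ name, fn }` objects, analysis results are strings compared with `===`, `null` stands in for ⊥, programs are returned as strings, and `approximate` is a caller-supplied fallback.

```javascript
// Shannon entropy (base 2) of a list of 0/1 values.
function entropy(bits) {
  if (bits.length === 0) return 0;
  const p = bits.filter((b) => b === 1).length / bits.length;
  return [p, 1 - p]
    .filter((q) => q > 0)
    .reduce((h, q) => h - q * Math.log2(q), 0);
}

const r = (x, y, a) => (a.fn(x) === y ? 0 : 1);
const cost = (D, a) => D.reduce((s, [x, y]) => s + r(x, y, a), 0);

// IG^a(D, g) = H(w_D) - |D_g|/|D| H(w_Dg) - |D_!g|/|D| H(w_D!g)
function infoGain(D, a, g) {
  const w = (d) => d.map(([x, y]) => r(x, y, a));
  const Dg = D.filter(([x]) => g.fn(x));
  const Dn = D.filter(([x]) => !g.fn(x));
  return entropy(w(D))
    - (Dg.length / D.length) * entropy(w(Dg))
    - (Dn.length / D.length) * entropy(w(Dn));
}

function synthesize(D, actions, guards, approximate) {
  const aBest = actions.reduce((b, a) => (cost(D, a) < cost(D, b) ? a : b));
  if (cost(D, aBest) === 0) return aBest.name;
  let gBest = null; // plays the role of ⊥: no guard with positive gain
  let best = 0;
  for (const g of guards) {
    const ig = infoGain(D, aBest, g);
    if (ig > best) { best = ig; gBest = g; }
  }
  if (gBest === null) return approximate(D);
  const p1 = synthesize(D.filter(([x]) => gBest.fn(x)), actions, guards, approximate);
  const p2 = synthesize(D.filter(([x]) => !gBest.fn(x)), actions, guards, approximate);
  return `if ${gBest.name} then ${p1} else ${p2}`;
}

// Toy task: x is the number of call arguments, y is what `this` points to.
const D = [[1, "global"], [2, "arg2"], [1, "global"], [2, "arg2"], [2, "arg2"]];
const actions = [
  { name: "global", fn: () => "global" },
  { name: "arg2", fn: () => "arg2" },
];
const guards = [{ name: "oneArg", fn: (x) => x === 1 }];
const program = synthesize(D, actions, guards, () => "top");
console.log(program);
```

A guard that leaves one side of the split empty has zero information gain, so every recursive call receives a strictly smaller dataset and the recursion terminates.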

  8. Oracle: counter-example generator
     Goal
     Find counter-example (x, y) s.t. pa(x) ≠ y in reasonable time
     • Random search too slow
     • Prioritize modifications affecting execution path of pa(x)

     Modification types
     • Semantic preserving (Equivalence Modulo Abstraction, EMA)
     • Non-semantic preserving (Global jump)

     Example
       Sample input:
         var b = {}; a = b;
       Overfitted analysis:
         if y is VarDecl:y preceding x then y
       Desired analysis:
         if there is VarDecl:x(y) then y else ⊥
       Counter-example:
         var b = {}; var c = 1; a = b;
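
The example on this slide can be replayed with a small search loop. The sketch below is an assumption-heavy simplification: programs are flat statement lists rather than ASTs, `groundTruth` stands in for the result obtained by actually running the program, and the only modification tried is inserting an unrelated declaration. It enumerates modifications exhaustively and does not model the prioritization described above.

```javascript
// Programs are arrays of statements (a simplification of real ASTs):
//   { decl: "b" }              stands for  var b = ...;
//   { assign: "a", use: "b" }  stands for  a = b;

// Overfitted analysis: resolve the use to whatever declaration
// immediately precedes the assignment.
function overfitted(prog) {
  const i = prog.findIndex((s) => s.use !== undefined);
  return i > 0 && prog[i - 1].decl !== undefined ? prog[i - 1].decl : null;
}

// Ground truth: the declaration whose name matches the used variable.
function groundTruth(prog) {
  const use = prog.find((s) => s.use !== undefined).use;
  const d = prog.find((s) => s.decl === use);
  return d ? d.decl : null;
}

// Hypothetical modification: insert `var c = ...;` at every position.
function modifications(prog) {
  const out = [];
  for (let k = 0; k <= prog.length; k++) {
    out.push([...prog.slice(0, k), { decl: "c" }, ...prog.slice(k)]);
  }
  return out;
}

// Return the first modified program on which pa disagrees with the truth.
function findCounterExample(pa, truth, seed) {
  for (const x of modifications(seed)) {
    const y = truth(x);
    if (pa(x) !== y) return { x, y };
  }
  return null; // no counter-example among the tried modifications
}

const seed = [{ decl: "b" }, { assign: "a", use: "b" }];
const cex = findCounterExample(overfitted, groundTruth, seed);
console.log(JSON.stringify(cex));
```

On the seed program both analyses agree, so the overfitting is only exposed once `var c` is inserted between the declaration and the assignment, which matches the counter-example on the slide.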

  9. Evaluation
     Overview
     • Learned 2 analyzers
     • Points-to analysis subset (this points-to)
     • Site-call allocation analysis
     • Input programs from ECMAScript conformance suite (~15000 samples)

     Program modifications
       $F_{ema}$                        $F_{gj}$
       Adding dead code                 Adding method arguments
       Renaming variables               Adding method parameters
       Renaming user functions          Changing constants
       Side-Effect Free expressions

  10. Points-to analysis
      Goal
      Learn this points-to rules, a function f s.t.
        $\dfrac{VarPointsTo(v_2, h) \qquad v_2 = f(this)}{VarPointsTo(this, h)}$

      Example
        dat.filter(isBig);       // points to global
        dat.filter(isBig, 42);   // points to boxed 42
        dat.filter(isBig, dat);  // points to dat object

        Array.prototype.filter ::=
          if caller has one argument then
            points-to global object
          else if 2nd argument is Identifier then
            if 2nd argument is undefined then
              points-to global object
            else
              points-to 2nd argument
          else // 2nd arg is a primitive value
            points-to new allocation site

  11. Points-to analysis rules description language
      ⟨MoveCore⟩ ::= Up | Left | Right | DownFirst | DownLast | Top
      ⟨MoveJS⟩ ::= GoToGlobal | GoToUndef | GoToNull | GoToThis | UpUntilFunc
      ⟨Move⟩ ::= ⟨MoveCore⟩ | ⟨MoveJS⟩ | GoToCaller
      ⟨Write⟩ ::= WriteValue | WritePos | WriteType | HasLeft | HasRight | HasChild
      ⟨Action⟩ ::= ϵ | ⟨Move⟩ ⟨Action⟩
      ⟨Guard⟩ ::= ϵ | ⟨Move⟩ ⟨Guard⟩ | ⟨Write⟩ ⟨Guard⟩
      ⟨Context⟩ ::= ϵ | ( N ∪ Σ ∪ N ) ⟨Context⟩
      ⟨Prog⟩ ::= ϵ | ⟨Action⟩ | ‘if’ ⟨Guard⟩ ‘=’ ⟨Context⟩ ‘then’ ⟨Prog⟩ ‘else’ ⟨Prog⟩

      Generate actions with programs up to size 5 and branches with programs up to size 6 (5 moves and 1 write)
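
A guard in this language is a short sequence of tree moves and writes whose output is compared against a ⟨Context⟩. The sketch below interprets only the ⟨MoveCore⟩ instructions and a few ⟨Write⟩ instructions on a toy tree. The node shape, the helper names, and the choice that an inapplicable move leaves the position unchanged are assumptions, not the semantics from the paper.

```javascript
// Build a tree node and set parent links on its children.
function node(type, value, children = []) {
  const n = { type, value, children, parent: null };
  children.forEach((c) => { c.parent = n; });
  return n;
}

// Sibling at the given offset, or null if there is none.
function sibling(n, offset) {
  if (!n.parent) return null;
  const i = n.parent.children.indexOf(n);
  return n.parent.children[i + offset] || null;
}

// Assumption: a move that cannot be applied leaves the position unchanged.
const moves = {
  Up: (n) => n.parent || n,
  Left: (n) => sibling(n, -1) || n,
  Right: (n) => sibling(n, 1) || n,
  DownFirst: (n) => n.children[0] || n,
  DownLast: (n) => n.children[n.children.length - 1] || n,
  Top: (n) => { while (n.parent) n = n.parent; return n; },
};

const writes = {
  WriteValue: (n) => n.value,
  WriteType: (n) => n.type,
  HasLeft: (n) => sibling(n, -1) !== null,
  HasRight: (n) => sibling(n, 1) !== null,
  HasChild: (n) => n.children.length > 0,
};

// Run a guard: moves change the position, writes append to the context.
function runGuard(instructions, start) {
  let pos = start;
  const context = [];
  for (const ins of instructions) {
    if (moves[ins]) pos = moves[ins](pos);
    else context.push(writes[ins](pos));
  }
  return { pos, context };
}

// dat.filter(isBig, 42) as a tiny call node: callee followed by arguments.
const callee = node("MemberExpression", "dat.filter");
const arg1 = node("Identifier", "isBig");
const arg2 = node("Literal", 42);
node("CallExpression", null, [callee, arg1, arg2]);

// From the callee, go to the last argument and record its type and
// whether anything sits to its right.
const res = runGuard(["Up", "DownLast", "WriteType", "HasRight"], callee);
console.log(res.context);
```

A branch such as `if Up DownLast WriteType = Literal then ... else ...` would then compare `res.context` against the expected context to pick a sub-program.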

  12. Points-to analysis results
      Function Name         Dataset Size   Counter-examples Found   Analysis Size*
      Function.prototype
        call                26             372                      97 (18)
        apply               6              182                      54 (10)
      Array.prototype
        map                 315            64                       36 (6)
        some                229            82                       36 (6)
        forEach             604            177                      35 (5)
        find                53             73                       36 (6)
      * Number of instructions in $L_{pt}$ (number of if branches)

  13. Allocation analysis
      Goal
      Learn an allocation site analysis function f s.t.
        $\dfrac{f(l) = true}{AllocSite(l)}$

      Results
      • 34721 input/output samples
      • 905 counter-examples found
      • 135 branches generated
      • Learned tricky cases, e.g. new Object(obj)

  14. Summary
      • New approach to learning static analyzers from data
      • Algorithm to learn an analyzer from a dataset and inference rules
      • Oracle to quickly generate counter-examples, avoiding overfitting
      • Learned tricky rules for JavaScript points-to and site-allocation analysis

  15. References
      P. Bielik, V. Raychev, and M. T. Vechev, “Learning a static analyzer from data,” CoRR, vol. abs/1611.01752, 2016. [Online]. Available: http://arxiv.org/abs/1611.01752
      J. R. Quinlan, “Induction of decision trees,” Mach. Learn., vol. 1, no. 1, pp. 81–106, Mar. 1986. [Online]. Available: http://dx.doi.org/10.1023/A:1022643204877
