
Learning a Variable-Clustering Strategy for Octagon from Labeled Data Generated by a Static Analysis

Kihong Heo (Seoul National University), Hakjoo Oh (Korea University), Hongseok Yang (University of Oxford). SAS 2016, Edinburgh.


  1. Learning a Variable-Clustering Strategy for Octagon from Labeled Data Generated by a Static Analysis
     Kihong Heo (1), Hakjoo Oh (2), Hongseok Yang (3)
     (1) Seoul National University, (2) Korea University, (3) University of Oxford
     SAS 2016 @ Edinburgh

  2. Long Term Goal
     • Self-evolving static analysis by learning from big data
     • Data: similar code, old versions, user feedback, bug reports, test results, etc.
     • Mature in other fields: …
     [Figure: Static Analyzer + Big Data]

  3. Long Term Goal
     • Finding a good abstraction for adaptive static analysis
     • F ∈ Pgm × Π → A
     • Machine Learning (learner) + Static Analysis (teacher)
     • e.g.) relation, context, flow, etc.
     [Figure: soundness, scalability, precision]

  4. Relational Analysis
     • Tracking relationships among variables
     • e.g.) octagon analysis: (±x) − (±y) ≤ c
     • Example program over {a, b, c, i}:
         int a = b;
         int c = input();  // User input
         for (i = 0; i < b; i++) {
           assert (i < a);  // Query 1
           assert (i < c);  // Query 2
         }
     • *Consider x − y ≤ c only, for simplicity
     [Figure: 4×4 difference-bound matrix over a, b, c, i; diagonal entries 0, all other entries ∞]

  5. Relational Analysis
     • Tracking relationships among variables
     • e.g.) octagon analysis: (±x) − (±y) ≤ c
     • Same example program over {a, b, c, i}; highlighted constraints: b − a ≤ 0, a − b ≤ 0
     [Figure: difference-bound matrix over a, b, c, i with the entries for these two constraints set to 0]

  6. Relational Analysis
     • Tracking relationships among variables
     • e.g.) octagon analysis: (±x) − (±y) ≤ c
     • Same example program over {a, b, c, i}; highlighted constraints: c − a ≤ ∞, c − b ≤ ∞, a − c ≤ ∞, b − c ≤ ∞
     [Figure: difference-bound matrix over a, b, c, i; the entries relating c to a and b stay ∞]

  7. Relational Analysis
     • Tracking relationships among variables
     • e.g.) octagon analysis: (±x) − (±y) ≤ c
     • Same example program over {a, b, c, i}; highlighted constraint: i − b ≤ −1
     [Figure: difference-bound matrix over a, b, c, i with the entry for i − b set to −1]

  8. Relational Analysis
     • Tracking relationships among variables
     • e.g.) octagon analysis: (±x) − (±y) ≤ c
     • Same example program over {a, b, c, i}; highlighted constraint: i − a ≤ −1
     [Figure: difference-bound matrix over a, b, c, i with the entries for i − b and i − a set to −1]

  9. Relational Analysis
     • Tracking relationships among variables
     • e.g.) octagon analysis: (±x) − (±y) ≤ c
     • Same example program over {a, b, c, i}; highlighted constraint: i − c ≤ ∞
     [Figure: difference-bound matrix over a, b, c, i; the entry for i − c stays ∞]

  10. Relational Analysis
     • Tracking relationships among variables
     • e.g.) octagon analysis: (±x) − (±y) ≤ c
     • Same example program over {a, b, c, i}
     • Do we need c?
     [Figure: final difference-bound matrix over a, b, c, i; every entry involving c other than the diagonal is ∞]
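The running example can be replayed with a small difference-bound matrix. The following is a minimal sketch, assuming integer variables and only constraints of the form x − y ≤ c (as the slides do for simplicity); the names `new_dbm`, `add_constraint`, `close`, and `proves_less_than` are hypothetical and not taken from the authors' analyzer.

```python
import math

INF = math.inf
VARS = ["a", "b", "c", "i"]

def new_dbm(variables):
    """bound[(x, y)] is an upper bound on x - y; INF means no known bound."""
    return {(x, y): (0 if x == y else INF) for x in variables for y in variables}

def add_constraint(dbm, x, y, c):
    """Record x - y <= c, keeping the tighter of the old and new bounds."""
    dbm[(x, y)] = min(dbm[(x, y)], c)

def close(dbm, variables):
    """Shortest-path closure: x - y <= (x - k) + (k - y) for every k."""
    for k in variables:
        for x in variables:
            for y in variables:
                via_k = dbm[(x, k)] + dbm[(k, y)]
                if via_k < dbm[(x, y)]:
                    dbm[(x, y)] = via_k
    return dbm

def proves_less_than(dbm, x, y):
    """Over integers, x < y holds whenever x - y <= -1."""
    return dbm[(x, y)] <= -1

dbm = new_dbm(VARS)
add_constraint(dbm, "a", "b", 0)   # int a = b;  gives a - b <= 0
add_constraint(dbm, "b", "a", 0)   #             and   b - a <= 0
add_constraint(dbm, "i", "b", -1)  # loop guard i < b gives i - b <= -1
close(dbm, VARS)

query1 = proves_less_than(dbm, "i", "a")  # assert (i < a)
query2 = proves_less_than(dbm, "i", "c")  # assert (i < c)
```

The closure derives i − a ≤ −1 from i − b ≤ −1 and b − a ≤ 0, so Query 1 is proved. Nothing bounds i − c, so Query 2 stays unproved whether or not c is tracked.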

  11. Selective Relational Analysis
     • Selectively tracking relationships among variables
     • within the same cluster
     • Same example program, with clusters {a,b,i} and {c}
     • Matrix for cluster {a,b,i} (columns a, b, i):
         a:  0  0  −1
         b:  0  0  −1
         i:  ∞  ∞   0
     • Cluster {c}: −∞ ≤ c ≤ +∞
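A rough way to see what clustering saves, assuming one square matrix per cluster and the simplified x − y ≤ c form. This is an illustrative sketch only: `matrix_entries` and `restrict` are hypothetical helpers, and a real octagon matrix over n variables has 2n × 2n entries rather than n × n.

```python
def matrix_entries(clusters):
    """Total number of matrix entries when each cluster gets its own matrix."""
    return sum(len(cluster) ** 2 for cluster in clusters)

def restrict(constraints, clusters):
    """Keep only constraints (x, y, c) whose two variables share a cluster."""
    cluster_of = {v: idx for idx, cluster in enumerate(clusters) for v in cluster}
    return [(x, y, c) for (x, y, c) in constraints if cluster_of[x] == cluster_of[y]]

# Constraints x - y <= c from the running example.
constraints = [("a", "b", 0), ("b", "a", 0), ("i", "b", -1)]

full = [{"a", "b", "c", "i"}]
selective = [{"a", "b", "i"}, {"c"}]

full_size = matrix_entries(full)            # one 4x4 matrix
selective_size = matrix_entries(selective)  # one 3x3 matrix plus one 1x1 matrix
kept = restrict(constraints, selective)
```

With the clusters {a,b,i} and {c}, every constraint of the example survives the restriction, so the facts needed for Query 1 are still available while fewer entries are tracked.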

  12. Previous Solution [PLDI’14]
     • Variable clustering by impact pre-analysis
     • estimating the impact of relationships
     • more scalable than the baseline Octagon analysis
     • more scalable & precise than other clustering methods

  13. Problem [PLDI’14]
     • Variable clustering by impact pre-analysis
     • fully relational pre-analysis as an online estimator
     • e.g.) 17 open source benchmarks (~100KLOC)
     [Figure: bar chart of analysis time for [PLDI’14], split into Var. Clustering and Main Analysis, annotated 98%; time axis from 0 to 40000]

  14. New Solution (This Work)
     • Learning a variable-clustering strategy from big data
     • fully relational pre-analysis as an offline teacher
     • 33x faster yet similarly precise
     [Figure: bar chart of analysis time for [PLDI’14] and [ML-based], split into Var. Clustering and Main Analysis; time axis from 0 to 40000]

  15. Big Picture
     • Learning a variable-clustering strategy from big data
     [Diagram labels: Codebase, Static Analysis, Training Data (Var. relationship), Machine Learning, Classifier, Target Program (Var. relationship), Variable Clustering Π, Clusters, Results]

  16. Big Picture
     • Learning a variable-clustering strategy from big data
     [Diagram labels: Codebase, Static Analysis, Training Data (Var. relationship), Machine Learning, Classifier, Target Program (Var. relationship), Variable Clustering Π, Clusters, Results]

  17. Training Data
     • Pairs of two variables with label {⊕, ⊖}
     • ⊕: precise (< +∞), ⊖: imprecise (= +∞)
     • Same example program over {a, b, c, i}
     • ⊕: {(a,b), (a,i), (b,a) …}
     • ⊖: {(a,c), (b,c), (c,a) …}
     [Figure: matrix over a, b, c, i produced by the Octagon Analysis]
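Turning an analysis result into labeled pairs can be sketched as follows. This is a minimal illustration, assuming the entry for pair (x, y) is read from row x, column y of the matrix on the slides; that orientation and the helper `label_pairs` are assumptions, not the authors' code. The bounds are the ones from the running example.

```python
import math

INF = math.inf

def label_pairs(bounds):
    """Label each ordered pair: finite bound is precise, infinite bound is imprecise."""
    return {
        pair: ("⊕" if bound < INF else "⊖")
        for pair, bound in bounds.items()
        if pair[0] != pair[1]  # skip the trivial diagonal entries
    }

# Matrix entries of the running example (row, column) -> bound.
bounds = {
    ("a", "b"): 0,   ("b", "a"): 0,
    ("a", "i"): -1,  ("b", "i"): -1,
    ("i", "a"): INF, ("i", "b"): INF,
    ("a", "c"): INF, ("b", "c"): INF, ("i", "c"): INF,
    ("c", "a"): INF, ("c", "b"): INF, ("c", "i"): INF,
}

labels = label_pairs(bounds)
positive = sorted(pair for pair, label in labels.items() if label == "⊕")
```

Here the ⊕ pairs are (a,b), (a,i), (b,a), and (b,i), and every pair involving c is ⊖, matching the sets listed on the slide.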

  18. Training Data
     • Automatically generated by impact pre-analysis [PLDI’14]
     • fully relational, yet more scalable than the full octagon
     • γ(★) = ℤ, γ(⊤) = ℤ ∪ {+∞}
     • Same example program over {a, b, c, i}
     • ⊕: {(a,b), (a,i), (b,a) …}
     • ⊖: {(a,c), (b,c), (c,a) …}
     [Figure: Octagon Analysis matrix beside the Impact Pre-analysis matrix, whose entries are ★ or ⊤]

  19. Big Picture
     • Learning a variable-clustering strategy from big data
     [Diagram labels: Codebase, Static Analysis, Training Data (Var. relationship), Machine Learning, Classifier, Target Program (Var. relationship), Variable Clustering Π, Clusters, Results]

  20. Features
     • 30 features of variable pairs
     • boolean predicate of (x,y) in program P
     (Positive situations for Octagon)
       - x=y+k or y=x+k
       - x<=y+k or y<=x+k
       - x=malloc(y) or y=malloc(x)
       - x[y] or y[x]
       - …
     (Negative situations for Octagon)
       - x=cy or y=cx (c != 1)
       - x=yz or y=xz
       - x=y/z or y=x/z
       - …
     (General syntactic features)
       - x or y is a field
       - x and y represent sizes of arrays
       - x or y is the size of a const string
       - x or y is a global variable
       - …
     (General semantic features)
       - x or y has a finite interval
       - x or y is a local var in a recursive function
       - x, y are not accessed in the same function
       - …
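Each feature is a boolean predicate on a variable pair, so a pair maps to a 0/1 vector. The sketch below shows that shape for three of the listed features. `ProgramFacts` and its fields are hypothetical stand-ins for whatever a front end would extract, and reading `int a = b` as a = b + 0 and `i < b` as i <= b + (−1) is an assumption made for the illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ProgramFacts:
    """Hypothetical summary of syntactic facts extracted from a program P."""
    additive_assignments: set = field(default_factory=set)  # (x, y) for x = y + k
    additive_comparisons: set = field(default_factory=set)  # (x, y) for x <= y + k
    global_variables: set = field(default_factory=set)

def either_order(pairs, x, y):
    """True if the relation was recorded for (x, y) or for (y, x)."""
    return (x, y) in pairs or (y, x) in pairs

# (name, predicate) for three of the features named on the slide.
FEATURES = [
    ("x=y+k or y=x+k",
     lambda p, x, y: either_order(p.additive_assignments, x, y)),
    ("x<=y+k or y<=x+k",
     lambda p, x, y: either_order(p.additive_comparisons, x, y)),
    ("x or y is a global variable",
     lambda p, x, y: x in p.global_variables or y in p.global_variables),
]

def feature_vector(facts, x, y):
    """Evaluate every predicate on the pair (x, y) and return a 0/1 vector."""
    return [int(predicate(facts, x, y)) for _, predicate in FEATURES]

facts = ProgramFacts(
    additive_assignments={("a", "b")},  # int a = b;  read as a = b + 0 (assumption)
    additive_comparisons={("i", "b")},  # i < b       read as i <= b + (-1) (assumption)
)

vec_ab = feature_vector(facts, "a", "b")
vec_bi = feature_vector(facts, "b", "i")
vec_ac = feature_vector(facts, "a", "c")
```

Keeping the predicates in a flat list makes it easy to extend toward the 30 features the slide mentions without changing `feature_vector`.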

  21. Features
     • Importance of features by Gini Index
     • negative & general > positive & domain-specific
     • Feature list as on slide 20: positive situations for Octagon, negative situations for Octagon, general syntactic features, general semantic features
     [Figure: the slide marks the top 5 most important features with *; the marks did not survive extraction]
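One common way to score a boolean feature with the Gini index is the impurity decrease it achieves when used as a split. The sketch below shows that computation on toy 0/1 labels; the data is invented purely for illustration, and the slides do not say exactly how the importance ranking was computed.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def gini_decrease(feature_values, labels):
    """Impurity before the split minus the size-weighted impurity after it."""
    yes = [label for value, label in zip(feature_values, labels) if value]
    no = [label for value, label in zip(feature_values, labels) if not value]
    n = len(labels)
    after = len(yes) / n * gini(yes) + len(no) / n * gini(no)
    return gini(labels) - after

# Toy data (illustrative only): 1 stands for ⊕, 0 for ⊖.
labels = [1, 1, 0, 0]
informative = [1, 1, 0, 0]    # feature agrees with the label on every pair
uninformative = [1, 0, 1, 0]  # feature is independent of the label

gain_informative = gini_decrease(informative, labels)
gain_uninformative = gini_decrease(uninformative, labels)
```

A feature that separates ⊕ from ⊖ perfectly removes all impurity, while one that is independent of the label removes none.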

  22. Classifier
     • Learning a binary classifier C : Var × Var → {⊕, ⊖}
     • using an off-the-shelf ML algorithm: decision tree
     • Why decision tree?
     • more expressive than linear models
     • e.g.) Octagon with logistic regression: 10~12x slower
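To illustrate the expressiveness point, here is a tiny greedy decision-tree learner over boolean features, written from scratch rather than with the off-the-shelf tool the authors used. The feature names `f1` and `f2` and the XOR-shaped toy data are hypothetical; XOR is chosen because no linear model over the two features can separate it, while a depth-2 tree can.

```python
from collections import Counter

def gini(labels):
    """Gini impurity for labels of any hashable type."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, features):
    """Return a label (leaf) or a tuple (feature, subtree_if_true, subtree_if_false)."""
    if len(set(labels)) <= 1 or not features:
        return majority(labels)

    def weighted_gini(feature):
        yes = [l for r, l in zip(rows, labels) if r[feature]]
        no = [l for r, l in zip(rows, labels) if not r[feature]]
        return sum(len(part) * gini(part) for part in (yes, no) if part) / len(labels)

    best = min(features, key=weighted_gini)  # greedy choice; ties go to the first feature
    rest = [f for f in features if f != best]
    branches = []
    for value in (True, False):
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        if subset:
            branches.append(build_tree([r for r, _ in subset], [l for _, l in subset], rest))
        else:
            branches.append(majority(labels))
    return (best, branches[0], branches[1])

def classify(tree, row):
    """Walk from the root to a leaf following the row's feature values."""
    while isinstance(tree, tuple):
        feature, if_true, if_false = tree
        tree = if_true if row[feature] else if_false
    return tree

# Toy training set (illustrative only): the label is ⊕ exactly when f1 != f2.
rows = [
    {"f1": False, "f2": False},
    {"f1": False, "f2": True},
    {"f1": True, "f2": False},
    {"f1": True, "f2": True},
]
labels = ["⊖", "⊕", "⊕", "⊖"]

tree = build_tree(rows, labels, ["f1", "f2"])
predictions = [classify(tree, row) for row in rows]
```

The learned tree tests one feature at the root and the other in both subtrees, which is enough to fit all four rows.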

  23. Big Picture
     • Learning a variable-clustering strategy from big data
     [Diagram labels: Codebase, Static Analysis, Training Data (Var. relationship), Machine Learning, Classifier, Target Program (Var. relationship), Variable Clustering Π, Clusters, Results]

  24. Clustering Strategy
     • ⊕-marked variable pairs in the same cluster
     • naturally covers transitive relationships
     • Same example program over {a, b, c, i}
     • Classifier output C(x,y): ⊕ (a,b), ⊕ (b,i), ⊖ (a,i), ⊖ (a,c), …
     [Figure: graph over a, b, c, i with the two ⊕ pairs drawn as edges]
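Putting ⊕-marked pairs into the same cluster amounts to taking connected components, which a union-find structure handles directly. The following is a minimal sketch under that reading; `cluster_variables` is a hypothetical helper, and the input pairs are the classifier outputs shown on the slide.

```python
def cluster_variables(variables, positive_pairs):
    """Group variables so that every ⊕-marked pair ends up in one cluster."""
    parent = {v: v for v in variables}

    def find(v):
        # Follow parent links to the representative, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for x, y in positive_pairs:
        parent[find(x)] = find(y)  # merge the two clusters

    clusters = {}
    for v in variables:
        clusters.setdefault(find(v), set()).add(v)
    return sorted(clusters.values(), key=sorted)

# Classifier output from the slide: (a,b) and (b,i) are ⊕; (a,i) and (a,c) are ⊖.
clusters = cluster_variables(["a", "b", "c", "i"], [("a", "b"), ("b", "i")])
```

Although (a,i) is marked ⊖, a and i still share a cluster through b, which is the transitive coverage the slide refers to; c stays alone.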

  25. Experiments
     • Implemented on top of
     • sound & global analyzer
     • a buffer overrun detector for full C
     • 17 open source benchmarks (~100KLOC)
