Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald C. Gall


  1. 59th CREST Open Workshop, Centre for Research on Evolution, Search and Testing, University College London, London, United Kingdom • Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald C. Gall • Software Evolution and Architecture Lab, University of Zurich, Switzerland • {alexandru,panichella,proksch,gall}@ifi.uzh.ch • 26.03.2018

  4. The Problem Domain • Static analysis (e.g. #Attr., McCabe, coupling...) • Many revisions, fine-grained historical data (release timeline: v0.7.0, v1.0.0, v1.3.0, v2.0.0, v3.0.0, v3.3.0, v3.5.0)

  9. A Typical Analysis Process • select project (www) • clone • select revision • checkout • apply tool (purpose-built, language-specific tool) • store analysis results • more revisions? (back to select revision) • more projects? (back to select project)

  15. Redundancies all over... • Redundancies in historical code analysis and their impact on code study tools • Across revisions: few files change; only small parts of a file change; changes may not even affect results. Impact: repeated analysis of "known" code; storing redundant results • Across languages: each language has its own toolchain, yet they share many metrics. Impact: re-implementing identical analyses; generalizability is expensive

  17. #1: Avoid Checkouts

  21. Avoid checkouts • With a checkout: clone, checkout (read + write), analyze (read) • For every file: 2 read ops + 1 write op • Checkout includes irrelevant files • Need 1 CWD for every revision to be analyzed in parallel

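  25. Avoid checkouts • Without a checkout: clone, analyze (read directly from Git) • Only read relevant files in a single read op • No write ops • No overhead for parallelization • Stack: Analysis Tool, File Abstraction Layer, Git • E.g. for the JDK Compiler:
      import java.net.URI
      import java.nio.CharBuffer
      import javax.tools.JavaFileObject.Kind
      import javax.tools.SimpleJavaFileObject

      class JavaSourceFromCharrArray(name: String, val code: CharBuffer)
          extends SimpleJavaFileObject(URI.create("string:///" + name), Kind.SOURCE) {
        // Serve the source text from memory instead of from a checked-out file
        override def getCharContent(ignoreEncodingErrors: Boolean): CharSequence = code
      }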
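The in-memory file object idea can be tried out with the JDK's own compiler API. The following is a minimal Java sketch, not the tool's actual code: the class names `InMemoryParseDemo` and `InMemorySource` are hypothetical, the source text is passed in as a string (reading it from Git is left out), and it assumes a JDK is available so that `ToolProvider.getSystemJavaCompiler()` returns a compiler.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

final class InMemoryParseDemo {

    /** Source "file" backed by a string; hypothetical name, modelled on the slide's class. */
    static final class InMemorySource extends SimpleJavaFileObject {
        private final CharSequence code;

        InMemorySource(String name, CharSequence code) {
            super(URI.create("string:///" + name), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code; // served from memory, no file is opened
        }
    }

    /** Parses source text held in memory and returns the number of top-level types. */
    static int countTopLevelTypes(String fileName, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler (a JDK is required)");
        }
        List<JavaFileObject> units =
                Collections.<JavaFileObject>singletonList(new InMemorySource(fileName, source));
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null, units);
        try {
            int count = 0;
            for (CompilationUnitTree unit : task.parse()) { // parse only, no class files are written
                count += unit.getTypeDecls().size();
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String source = "public class Demo { public void run() {} }";
        System.out.println(countTopLevelTypes("Demo.java", source));
    }
}
```

Because nothing is checked out, several revisions of the same file could be parsed side by side by creating one `InMemorySource` per revision.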

  27. #2: Use a multi-revision representation of your sources

  28. Merge ASTs • Figure: the ASTs of rev. 1, rev. 2, rev. 3 and rev. 4 are merged one revision at a time


  34. Merge ASTs • Figure: nodes of the merged AST (rev. 1 to rev. 4) carry the revision range in which they exist, e.g. rev. range [1-4] and rev. range [1-2]

  35. Merge ASTs • AspectJ (~440k LOC): 1 commit: 2.2M nodes; all >7000 commits: 6.5M nodes
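A small Java sketch may help to see why a merged representation grows so slowly. It is an illustration under simplifying assumptions, not the presented tool's implementation: all names (`MergedAstDemo`, `MergedNode`, `merge`) are hypothetical, children are matched by label only, and revisions are merged in ascending order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MergedAstDemo {

    /** Plain AST node of a single revision. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node kid : kids) children.add(kid);
        }
    }

    /** Node of the merged graph, stored once for all revisions in which it exists. */
    static final class MergedNode {
        final String label;
        final List<int[]> ranges = new ArrayList<>(); // inclusive {first, last} revision pairs
        final Map<String, MergedNode> children = new LinkedHashMap<>();

        MergedNode(String label) { this.label = label; }

        /** Records presence in rev; revisions must arrive in ascending order. */
        void markPresent(int rev) {
            if (!ranges.isEmpty()) {
                int[] last = ranges.get(ranges.size() - 1);
                if (last[1] == rev) return;                        // already recorded
                if (last[1] == rev - 1) { last[1] = rev; return; } // extend contiguous range
            }
            ranges.add(new int[] {rev, rev});                      // open a new range
        }

        String rangeText() {
            StringBuilder sb = new StringBuilder();
            for (int[] r : ranges) sb.append('[').append(r[0]).append('-').append(r[1]).append(']');
            return sb.toString();
        }

        int size() {
            int n = 1;
            for (MergedNode child : children.values()) n += child.size();
            return n;
        }
    }

    /** Merges one revision's tree into the graph; children are matched by label only (simplification). */
    static void merge(MergedNode target, Node source, int rev) {
        target.markPresent(rev);
        for (Node child : source.children) {
            merge(target.children.computeIfAbsent(child.label, MergedNode::new), child, rev);
        }
    }

    /** Builds the example: method "stop" exists only in revisions 1 and 2. */
    static MergedNode buildExample() {
        Node withStop = new Node("Class Demo", new Node("Method run"), new Node("Method stop"));
        Node withoutStop = new Node("Class Demo", new Node("Method run"));
        MergedNode root = new MergedNode("Class Demo");
        merge(root, withStop, 1);
        merge(root, withStop, 2);
        merge(root, withoutStop, 3);
        merge(root, withoutStop, 4);
        return root;
    }

    public static void main(String[] args) {
        MergedNode root = buildExample();
        System.out.println(root.label + " " + root.rangeText() + ", nodes stored: " + root.size());
        for (MergedNode child : root.children.values()) {
            System.out.println("  " + child.label + " " + child.rangeText());
        }
    }
}
```

Four revisions with five nodes in total are stored as three merged nodes here; unchanged nodes only have their range extended.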

  38. #3: Store AST nodes only if they're needed for analysis

  41. Example source:
      public class Demo {
        public void run() {
          for (int i = 1; i < 100; i++) {
            if (i % 3 == 0 || i % 5 == 0) {
              System.out.println(i);
            }
          }
        }
      }
      • What's the complexity (1+#forks) and name for each method and class?
      • parse: 140 AST nodes (using ANTLR), e.g. CompilationUnit > TypeDeclaration > { Name: Demo; Modifiers: public; Members > Method > { Name: run; Modifiers: public; ReturnType > PrimitiveType > VOID; Parameters; Body > { ...; Statements > ... } } }

  42. Same Demo class and question as slide 41 • parse: 140 AST nodes (using ANTLR) • filtered parse: 7 AST nodes (using ANTLR): TypeDeclaration > { Name: Demo; Method > { Name: run; ForStatement > IfStatement > ConditionalExpression } }
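The filtering idea can be sketched in Java. Note the swap: the slides use ANTLR, while this sketch uses the JDK's Compiler Tree API (`com.sun.source`), so node kinds and counts differ from the 140 and 7 quoted above. The class name `FilteredParseDemo` and the choice of which kinds to keep are assumptions made for illustration, and a JDK is required at run time.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

final class FilteredParseDemo {
    // Node kinds that count as forks, and kinds that carry a name we want to report.
    static final Set<Tree.Kind> FORKS =
            EnumSet.of(Tree.Kind.FOR_LOOP, Tree.Kind.IF, Tree.Kind.CONDITIONAL_OR);
    static final Set<Tree.Kind> NAMED = EnumSet.of(Tree.Kind.CLASS, Tree.Kind.METHOD);

    static final String DEMO = "public class Demo { public void run() {"
            + " for (int i = 1; i < 100; i++) {"
            + " if (i % 3 == 0 || i % 5 == 0) { System.out.println(i); } } } }";

    /** Parses source from memory and keeps only the node kinds the analysis needs. */
    static List<Tree.Kind> filteredParse(String source) {
        JavaFileObject unit = new SimpleJavaFileObject(
                URI.create("string:///Demo.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) throw new IllegalStateException("A JDK is required");
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, Collections.singletonList(unit));
        List<Tree.Kind> kept = new ArrayList<>();
        TreeScanner<Void, Void> filter = new TreeScanner<Void, Void>() {
            @Override
            public Void scan(Tree tree, Void unused) {
                if (tree != null
                        && (FORKS.contains(tree.getKind()) || NAMED.contains(tree.getKind()))) {
                    kept.add(tree.getKind()); // every other node is dropped
                }
                return super.scan(tree, unused);
            }
        };
        try {
            for (CompilationUnitTree cu : task.parse()) filter.scan(cu, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return kept;
    }

    /** Complexity as defined on the slide: 1 + number of forks (here for the whole file). */
    static int complexity(List<Tree.Kind> keptNodes) {
        int forks = 0;
        for (Tree.Kind kind : keptNodes) if (FORKS.contains(kind)) forks++;
        return 1 + forks;
    }

    public static void main(String[] args) {
        List<Tree.Kind> kept = filteredParse(DEMO);
        System.out.println(kept + " complexity=" + complexity(kept));
    }
}
```

The example computes one complexity value for the whole file, which is enough here because `Demo` has a single method; a per-method variant would restart the count at each `METHOD` node.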

  45. #4: Use non-duplicative data structures to store your results

  48. rev. 1, rev. 2, rev. 3, rev. 4 • Revision ranges: [1-1], [2-3], [4-4] • Values stored per range: label (InnerClass), #attr (0, 4), mcc (1, 2, 4)
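One way to store results without duplication is to key each value by the revision range over which it stays unchanged. The Java sketch below is illustrative only: `RangedMetricDemo`, `RangedMetric` and `Entry` are hypothetical names, and the assignment of the mcc values 1, 2, 2, 4 to revisions 1 to 4 is an assumption chosen to reproduce the ranges shown on the slide.

```java
import java.util.ArrayList;
import java.util.List;

final class RangedMetricDemo {

    /** One stored value together with the inclusive revision range it covers. */
    static final class Entry {
        final int first;
        int last;
        final int value;

        Entry(int rev, int value) {
            this.first = rev;
            this.last = rev;
            this.value = value;
        }
    }

    /** Values of one metric for one node; a new entry is stored only when the value changes. */
    static final class RangedMetric {
        private final List<Entry> entries = new ArrayList<>();

        /** Records the value for rev; revisions must arrive in ascending order. */
        void record(int rev, int value) {
            if (!entries.isEmpty()) {
                Entry last = entries.get(entries.size() - 1);
                if (last.value == value && last.last == rev - 1) {
                    last.last = rev; // unchanged: extend the range instead of duplicating
                    return;
                }
            }
            entries.add(new Entry(rev, value));
        }

        /** Returns the value at rev, or null if none was recorded. */
        Integer valueAt(int rev) {
            for (Entry e : entries) {
                if (e.first <= rev && rev <= e.last) return e.value;
            }
            return null;
        }

        int storedEntries() { return entries.size(); }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            for (Entry e : entries) {
                if (sb.length() > 0) sb.append(' ');
                sb.append('[').append(e.first).append('-').append(e.last).append("]=").append(e.value);
            }
            return sb.toString();
        }
    }

    /** mcc of one node over four revisions (illustrative values). */
    static RangedMetric example() {
        RangedMetric mcc = new RangedMetric();
        int[] perRevision = {1, 2, 2, 4};
        for (int i = 0; i < perRevision.length; i++) mcc.record(i + 1, perRevision[i]);
        return mcc;
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints [1-1]=1 [2-3]=2 [4-4]=4
    }
}
```

Four per-revision values collapse into three entries; over thousands of revisions in which a metric rarely changes, the saving grows accordingly.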

  50. LISA also does: • #5: Parallel Parsing • #6: Asynchronous graph computation • #7: Generic graph computations applying to ASTs from compatible languages

  51. A light-weight view on multi-language analysis

  52. Typical solutions • Toolchains / Frameworks: integrate language-specific tooling; lots of engineering required • Meta-models: translate language code to some common representation; significant overhead / rigid models

  53. Structure matters most • Complexity? if (true) { if (true) { } if (true) { } if (true) { } } # CYCLO: 3 # CYCLO: 4 • # of Functions / Attributes etc. • Coupling between Classes • Call graphs
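If only structure matters, a metric such as complexity can be computed on any tree once it is known which node labels count as forks in a given language. The following Java sketch illustrates that idea under stated assumptions: `StructuralComplexityDemo` and its methods are hypothetical, the Java labels are taken from the filtered-parse slide, and the Python-style labels (`funcdef`, `for_stmt`, `if_stmt`, `or_test`) are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class StructuralComplexityDemo {

    /** Language-neutral tree node; only the label and the nesting matter. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            children.addAll(Arrays.asList(kids));
        }
    }

    // Per-language label sets: the only language-specific part of the analysis.
    static final Set<String> JAVA_FORKS =
            new HashSet<>(Arrays.asList("ForStatement", "IfStatement", "ConditionalExpression"));
    static final Set<String> PYTHON_FORKS = // hypothetical labels
            new HashSet<>(Arrays.asList("for_stmt", "if_stmt", "or_test"));

    /** Counts the nodes below (and including) node whose label is a fork label. */
    static int countForks(Node node, Set<String> forkLabels) {
        int forks = forkLabels.contains(node.label) ? 1 : 0;
        for (Node child : node.children) forks += countForks(child, forkLabels);
        return forks;
    }

    /** Complexity as 1 + number of forks, independent of the source language. */
    static int complexity(Node root, Set<String> forkLabels) {
        return 1 + countForks(root, forkLabels);
    }

    static Node javaExample() {
        return new Node("Method",
                new Node("ForStatement",
                        new Node("IfStatement", new Node("ConditionalExpression"))));
    }

    static Node pythonExample() {
        return new Node("funcdef",
                new Node("for_stmt",
                        new Node("if_stmt", new Node("or_test"))));
    }

    public static void main(String[] args) {
        System.out.println("Java:   " + complexity(javaExample(), JAVA_FORKS));
        System.out.println("Python: " + complexity(pythonExample(), PYTHON_FORKS));
    }
}
```

The traversal is shared; supporting another language in this sketch only means supplying another set of labels.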
