Generating Precise Dependencies for Large Software



SLIDE 1

Generating Precise Dependencies for Large Software

Pei Wang, Jinqiu Yang, Lin Tan (University of Waterloo)
Robert Kroeger, David Morgenthaler (Google Inc.)

  • P. Wang (UWaterloo)

1 / 13

SLIDE 2

Code Base Size is Growing

[Figure: Mozilla Firefox Code Base Size (2010-2013)†]
[Figure: Chromium (Google Chrome) Code Base Size (2010-2013)†]

† Data from Ohloh

SLIDE 3

Software Complexity is Increasing

[Figure: Dependencies between Some Key Components of Chromium. Components shown: webkit, v8_base, glue, net, base, ui, content_common, ipc, renderer]

By December 2012, Chromium (svn-171054) has 238 modules.

SLIDE 4

Technical Debt Caused by Increasing Structural Complexity

Technical Debt in Software Development
  • Compromises made for short-term benefits (meeting a product release deadline, etc.) that hurt the long-term maintainability of the software

Two Kinds of Bad Dependencies
  • Inconsistent Dependency: a dependency that violates the software design
  • Underutilized Dependency: only a small portion of a target module is utilized by a client module

Bad Dependencies Tell Us About
  • Modularity Violation
  • Loosely Coupled Components & Useless Code
  • Refactoring Cost

SLIDE 5

Light-Weight Dependency Analysis is Not Enough

Light-Weight Analysis Techniques
  • Pattern Matching
  • Abstract Syntax Tree Based Analysis

Challenges in Large-Scale C++ Dependency Analysis
  • Function/Operator overloading and default parameters
  • Non-standard language syntax
  • Implicit call sites
  • Templates
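
A small C++ sketch of why these features defeat pattern matching. All names here (Logger, scale, twice, demo) are hypothetical and are not taken from Chromium; the point is only that the callee of each call site is decided by the compiler, not by the text of the call.

```cpp
#include <string>

// Hypothetical names throughout; none of this is taken from Chromium.
struct Logger {
    Logger() { ++live; }    // invoked implicitly wherever a Logger is defined
    ~Logger() { --live; }   // invoked implicitly at scope exit
    static int live;
};
int Logger::live = 0;

// Overloads plus a default parameter: the token "scale" alone does not
// identify which function a call site binds to.
int scale(int x, int factor = 2) { return x * factor; }
double scale(double x) { return x * 1.5; }

// A template: each instantiation becomes a distinct symbol after compilation.
template <typename T>
T twice(T v) { return v + v; }

int demo() {
    Logger guard;                              // implicit constructor call
    int a = scale(10);                         // binds to scale(int, int), factor = 2
    double b = scale(10.0);                    // binds to scale(double)
    std::string s = twice(std::string("ab"));  // instantiates twice<std::string>
    return a + static_cast<int>(b) + static_cast<int>(s.size()) + Logger::live;
}   // implicit destructor call for guard happens here
```

After compilation, each of these call sites, including the constructor and destructor calls that never appear in the source text, is an explicit call to one specific symbol, which is what makes a compiler-level representation attractive for dependency extraction.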

SLIDE 6

Tool Design Overview

[Diagram: source code → LLVM Compiler → LLVM IR → IR Analyzer → symbol-level dependencies → Post Processor → module-level dependencies; additional inputs: configuration, grouping strategy]

Workflow

1. Compile C/C++ source into LLVM Intermediate Representation (IR).
2. Extract symbol-level dependencies from LLVM IR instructions.
3. Group symbol-level dependencies to get module-level dependencies.

SLIDE 7

Step 2: Symbol-Level Dependency Extraction

  • Obtain symbol references by traversing LLVM IR instructions.
  • Resolve symbol linkage through a mock linking process.
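
To make the first point concrete, here is a deliberately simplified sketch that collects global symbol names from one line of textual LLVM IR. It is a toy under stated assumptions: the function name globalSymbolsIn is hypothetical, quoted IR names are ignored, and a real analyzer would presumably walk the in-memory IR through LLVM's own C++ API rather than scan text.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for walking LLVM IR instructions: collect every global symbol
// (an identifier introduced by '@') mentioned on one line of textual IR.
// Quoted names such as @"a b" are not handled in this sketch.
std::vector<std::string> globalSymbolsIn(const std::string& irLine) {
    std::vector<std::string> names;
    for (std::size_t i = 0; i < irLine.size(); ++i) {
        if (irLine[i] != '@') continue;
        std::size_t j = i + 1;
        // Consume the characters allowed in an unquoted IR identifier.
        while (j < irLine.size() &&
               (std::isalnum(static_cast<unsigned char>(irLine[j])) ||
                irLine[j] == '_' || irLine[j] == '.' ||
                irLine[j] == '$' || irLine[j] == '-')) {
            ++j;
        }
        if (j > i + 1) names.push_back(irLine.substr(i + 1, j - i - 1));
        i = j - 1;  // resume scanning right after the identifier
    }
    return names;
}
```

Because overloads, templates, and implicit calls have already been resolved by the compiler at this level, every name found this way refers to exactly one symbol.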

Example: Non-Standard Syntax Support

C++ Code (chromium/src/content/zygote/zygote_main_linux.cc:182):

  struct tm* localtime_override(const time_t* timep) asm ("localtime");

LLVM IR (obj.target/content_browser/content/zygote/zygote_main_linux.o):

  define %struct.tm* @localtime(i64* %timep) nounwind uwtable
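
The mock linking step can be pictured with the following minimal sketch. It assumes each object file has already been reduced to the names it defines and the names it references; ObjectFile and resolveReferences are hypothetical names, not the tool's actual interface. In the example above the asm label makes the function's IR-level name localtime, so a reference to localtime from another object would resolve to the object that defines the override.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// One compiled object file, reduced to the symbols it defines and references.
struct ObjectFile {
    std::string path;
    std::vector<std::string> defined;
    std::vector<std::string> referenced;
};

// Mock linking: map each referenced symbol to the object that defines it.
// Returns (referencing object path, symbol) -> defining object path.
// Symbols defined nowhere in the input (e.g. system libraries) are left out.
std::map<std::pair<std::string, std::string>, std::string>
resolveReferences(const std::vector<ObjectFile>& objects) {
    std::map<std::string, std::string> definer;
    for (const auto& obj : objects)
        for (const auto& sym : obj.defined)
            definer.emplace(sym, obj.path);  // first definition wins in this sketch

    std::map<std::pair<std::string, std::string>, std::string> resolved;
    for (const auto& obj : objects)
        for (const auto& sym : obj.referenced) {
            auto it = definer.find(sym);
            if (it != definer.end())
                resolved[std::make_pair(obj.path, sym)] = it->second;
        }
    return resolved;
}
```

A real linker also handles weak symbols, visibility, and archive ordering; this sketch ignores all of that and keeps only the name-to-definition lookup.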

SLIDE 8

Step 3: Module-Level Dependency Analysis

Group symbols into modules:

  • The grouping strategy can simply be the build configuration of the software, and users can customize it.

Utilization-related metrics:

  • Pairwise Utilization
  • Overall Utilization

$$\text{Target-Module-Util} = \frac{\#\text{ of symbols in client's dependency}}{\#\text{ of symbols defined in the target}}$$
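
A minimal sketch of grouping and of the Target-Module-Util ratio, read here as a pairwise metric between one client module and one target module (that reading is an assumption, as are the names SymbolDep, moduleOf, and targetModuleUtil). The grouping strategy is modeled as a plain symbol-to-module map.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using SymbolDep = std::pair<std::string, std::string>;  // (client symbol, target symbol)

// Distinct symbols of `target` referenced from `client`, divided by the
// number of symbols that `target` defines. `moduleOf` plays the role of the
// grouping strategy: it assigns every symbol to a module.
double targetModuleUtil(const std::vector<SymbolDep>& deps,
                        const std::map<std::string, std::string>& moduleOf,
                        const std::string& client,
                        const std::string& target) {
    std::size_t definedInTarget = 0;
    for (const auto& entry : moduleOf)
        if (entry.second == target) ++definedInTarget;

    std::set<std::string> used;  // a set, so repeated references count once
    for (const auto& dep : deps) {
        auto from = moduleOf.find(dep.first);
        auto to = moduleOf.find(dep.second);
        if (from == moduleOf.end() || to == moduleOf.end()) continue;
        if (from->second == client && to->second == target) used.insert(dep.second);
    }
    return definedInTarget == 0
               ? 0.0
               : static_cast<double>(used.size()) / definedInTarget;
}
```

With a target that defines four symbols and a client that references two of them, however many times, the ratio is 0.5; a low value flags a candidate underutilized dependency.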

SLIDE 9

Performance Evaluation

Analysis Scale (Chromium svn-171054)

  Lines of C/C++ Code       6 Million
  # of Symbols              470,797
  # of Symbol References    13,912,651
  # of Modules              238

Analysis time: ∼123 minutes (3.1GHz Core i5)

  • ∼88 minutes' compilation time
  • ∼35 minutes' analysis time

Peak memory usage: 5.6GB

SLIDE 10

Preliminary Findings

Partial List of Underutilized Modules in Chromium

  Module                  # of Symbols    Overall Util†
  notifier                181             4.4∼17.1%
  ppapi_cpp_objects       1195            17.5∼17.6%
  dbus                    334             18.9∼18.9%
  ppapi_ipc               3228            19.4∼19.4%
  remoting_jingle_glue    97              12.4∼19.6%

†The range shows the impact of virtual function calls.
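
As a rough reading of the first row, assume Overall Util is the number of a module's symbols used by its clients divided by the number of symbols it defines (by analogy with Target-Module-Util; the slide does not spell this out). Under that assumption, the reported bounds for notifier correspond to about 8 and about 31 of its 181 symbols:

$$0.044 \times 181 \approx 8, \qquad 0.171 \times 181 \approx 31$$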

A Potential Inconsistent Dependency

The module base, which is not supposed to depend on any other module, uses a third-party Base64 encoding/decoding library.
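
One way to flag such a case is to compare observed module-level dependencies against a table of what the design permits. The sketch below is an illustration only: findInconsistentDeps, the rule table, and the module name third_party_base64 are hypothetical, and the slides do not say how the design rules are actually represented.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using ModuleDep = std::pair<std::string, std::string>;  // (client module, target module)

// Report every observed module-level dependency that the design does not allow.
// `allowed` maps a module to the set of modules it may depend on; a module
// that is absent from the map is treated as unconstrained in this sketch.
std::vector<ModuleDep> findInconsistentDeps(
    const std::vector<ModuleDep>& observed,
    const std::map<std::string, std::set<std::string>>& allowed) {
    std::vector<ModuleDep> violations;
    for (const auto& dep : observed) {
        auto rule = allowed.find(dep.first);
        if (rule == allowed.end()) continue;     // no rule for this client
        if (dep.first != dep.second &&           // self-dependencies are fine
            rule->second.count(dep.second) == 0)
            violations.push_back(dep);
    }
    return violations;
}
```

Giving base an empty allowed set encodes "base depends on nothing", so any observed edge out of base is reported.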

SLIDE 11

Conclusion

Scalable and precise structural dependency extraction and analysis
  • Scales to millions of lines of code

Full C++ Support
  • Can analyze most salient C++ features
  • Supports some non-standard syntax

Detected potential bad dependencies in Chromium

SLIDE 12

Future Work

More Advanced Analysis Based on Precise Dependency Data
  • Modularity Violation Detection
  • Invalid Dependency Injection Diagnosis
  • Large-scale Refactoring Assistance
