SLIDE 1

Static (Software) Analysis

Dagstuhl 16172: Machine Learning for Dynamic Software Analysis

Reiner Hähnle
Software Engineering Group
Department of Computer Science
Technische Universität Darmstadt

http://www.se.tu-darmstadt.de/
haehnle@cs.tu-darmstadt.de

160425 | TUD CS SE | R. Hähnle | Static Analysis | Dagstuhl 16172 ML & SA | 0

SLIDE 3

What Is Static Analysis (SA) of Software?

Establish a property of a program at compile time, without executing it

Some Facts

◮ Checking done by a tool, not a human
◮ Performed usually on source or assembler code
◮ Original motivation: compiler optimization
    ◮ Data flow analysis, e.g., used variables
    ◮ Control flow analysis, e.g., reachable code
◮ Current focus: software quality
    ◮ Security, e.g., confidentiality, vulnerability
    ◮ Compliance, e.g., MISRA-C, web service protocols
    ◮ Defects (bug finding), e.g., memory leaks, buffer overflows
    ◮ Code quality, e.g., metrics, “code smells”
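
The control flow analysis mentioned above (reachable code) can be illustrated with a minimal sketch. The graph `CFG`, its block names, and the helper functions are hypothetical and chosen only for illustration; a real analyzer would first build the graph from source or assembler code.

```python
from collections import deque

# Hypothetical control flow graph: each basic block maps to its successors.
CFG = {
    "entry": ["check"],
    "check": ["then", "exit"],
    "then": ["exit"],
    "dead": ["exit"],  # no edge leads to this block
    "exit": [],
}


def reachable_blocks(cfg, start):
    """Return the set of blocks reachable from start (breadth-first search)."""
    seen = {start}
    work = deque([start])
    while work:
        block = work.popleft()
        for succ in cfg.get(block, []):
            if succ not in seen:
                seen.add(succ)
                work.append(succ)
    return seen


def unreachable_blocks(cfg, start):
    """Blocks present in the graph that no path from start can reach."""
    return set(cfg) - reachable_blocks(cfg, start)


if __name__ == "__main__":
    print(sorted(unreachable_blocks(CFG, "entry")))
```

The property is established without running the analyzed program: only the graph is traversed, which is why this kind of check has low polynomial cost.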

SLIDE 5

Static Analysis in the Narrow/Wide Sense

Static Analysis in the Narrow Sense

◮ Check a fixed property
◮ Low polynomial decision complexity
◮ Value-insensitive abstraction, e.g., control flow graph

Static Analysis in the Wide Sense (which is my sense)

◮ Complex properties, expressed in specification language
    ◮ security policy, interface protocol, functional property
◮ NP-hard or even undecidable
    ◮ heuristics optimize the “common case”, no guarantees
    ◮ interaction by human expert user
◮ Fully precise control flow and data model
◮ Often based on formal methods

SLIDE 7

Static Analysis Techniques

◮ Graph-based program abstractions
    ◮ Control flow graph
    ◮ Program dependence graph
◮ Constraint Solving
    ◮ Recently popular: SAT/SMT solvers as backend
◮ Automata-based representations
    ◮ Model checking
◮ Abstract Interpretation
◮ Symbolic Execution
◮ Deductive Verification

Algorithms, Search, Inference

ML has most potential in complex SA techniques: Search ⇒ Lookup
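
As one illustration of the techniques listed above, abstract interpretation can be sketched with an interval domain. The `Interval` class, the `analyze` function, and the tiny analyzed program are assumptions made for this example and are not taken from the talk or from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Abstract value: every integer between lo and hi (inclusive)."""
    lo: int
    hi: int

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        products = [a * b for a in (self.lo, self.hi) for b in (other.lo, other.hi)]
        return Interval(min(products), max(products))

    def join(self, other):
        """Least upper bound: smallest interval covering both operands."""
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def contains(self, value):
        return self.lo <= value <= self.hi


def const(c):
    return Interval(c, c)


def analyze(x):
    """Abstractly execute: if cond: y = x + 1 else: y = x * 2; z = y - x."""
    y_then = x + const(1)
    y_else = x * const(2)
    y = y_then.join(y_else)  # merge both branches, the condition is ignored
    z = y - x                # the relation between y and x is lost here
    return y, z


if __name__ == "__main__":
    y, z = analyze(Interval(0, 10))
    print(y, z)
```

For inputs in [0, 10] the concrete values of `z` never drop below 0, yet the computed interval for `z` includes negative numbers. A check such as "z is non-negative" would therefore raise a false positive, which is the kind of imprecision that sound over-approximation trades for termination and speed.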

SLIDE 8

Interlude

A State-of-the-Art Tool for Complex Static Analysis

[Image. Credit: By Source, Fair use, https://en.wikipedia.org/w/index.php?curid=20208543]

SLIDE 11

Challenges for SA

Scaling

◮ Intra- vs. inter-procedural: compositionality difficult for complex SA
◮ Coverage/rapid evolution of industrial programming languages
◮ Hard to analyze language features: dynamic typing, reflection, HO closures

Precision

◮ Incompleteness, false positives
◮ “Soundiness”, see B. Livshits et al., CACM 58(2) 44–46, Feb. 2015

Modern computer architecture

The deployment gap

◮ Multi-level caches, stale data
◮ Parallel computing: GPUs, weak memory models
◮ Cloud: provisioning bugs, resource-aware computing

SLIDE 15

Current Trends

More complex properties, often combine behavior and data

◮ Integration tasks (web interfaces, frameworks, APIs, ...)
◮ Security-related: information flow
◮ Evolution: regression, certification, ...

Combine Static and Dynamic Analysis

◮ Concolic or dynamic symbolic execution
◮ Incomplete static analysis to speed up runtime monitoring
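
The idea behind concolic (dynamic symbolic) execution can be sketched as follows: run the program on a concrete input, record the branch decisions, negate one decision, and search for an input that follows the new path. Everything here is hypothetical: the program `run`, the table `CONDITIONS`, and in particular `solve`, which uses brute-force search over a small integer range as a stand-in for the SAT/SMT solver a real tool would call.

```python
# Branch predicates of the hypothetical program under test, indexed by branch id.
CONDITIONS = [
    lambda x: x > 10,
    lambda x: x % 7 == 0,
]


def run(x):
    """Execute the program concretely and record (branch id, outcome) pairs."""
    path = []
    taken = CONDITIONS[0](x)
    path.append((0, taken))
    if taken:
        taken = CONDITIONS[1](x)
        path.append((1, taken))
        if taken:
            return "bug", path
        return "large", path
    return "small", path


def solve(constraints, domain=range(-50, 51)):
    """Stand-in for an SMT solver: brute-force search for a satisfying input."""
    for candidate in domain:
        if all(CONDITIONS[i](candidate) == want for i, want in constraints):
            return candidate
    return None


def explore(seed):
    """Discover program paths by repeatedly flipping recorded branch decisions."""
    worklist = [seed]
    results = {}  # path -> (input, outcome)
    while worklist:
        x = worklist.pop()
        outcome, path = run(x)
        key = tuple(path)
        if key in results:
            continue
        results[key] = (x, outcome)
        for k in range(len(path)):
            # Keep the first k decisions, negate decision k, drop the rest.
            flipped = path[:k] + [(path[k][0], not path[k][1])]
            new_input = solve(flipped)
            if new_input is not None:
                worklist.append(new_input)
    return results


if __name__ == "__main__":
    for path, (value, outcome) in explore(0).items():
        print(value, outcome, path)
```

Starting from a single seed input, the exploration reaches all three outcomes of the toy program, including the one guarded by two nested conditions that random testing might take long to hit.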

Immersion in IDEs

◮ Use machine idle time while user deliberates
◮ Improved usability

Resource Analysis

Resource management and deployment become separate development phases

SLIDE 16

Related Fields

Static Analysis of Programs Is Not a Separate Discipline

Cross-cutting with many other fields, distinction is blurry

◮ Type Theory
◮ Abstract Interpretation
◮ Model Checking
◮ Software Verification

SLIDE 17

Program Analysis: The Two Worlds

The Two Worlds Meet

◮ Glassbox
◮ Symbolic
◮ Heuristic
◮ Analysis
◮ Static Analyses
    ◮ Model checking
    ◮ Abstract interpretation
    ◮ Symbolic execution
    ◮ Deductive verification

◮ Blackbox
◮ Statistical
◮ Complete
◮ Synthesis
◮ Learning Techniques
    ◮ Extract Behavior from Traces
    ◮ Learning-based Software Testing
    ◮ Learning-based synthesis
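
On the blackbox side, "extract behavior from traces" can be illustrated with a prefix tree acceptor, the usual starting point of state-merging automata learners. This is a minimal sketch: the function names and the example traces are hypothetical, and no state merging (the step that actually generalizes) is performed.

```python
def build_prefix_tree(traces):
    """Build a prefix tree acceptor from observed traces.

    States are integers, state 0 is initial. Returns (transitions, accepting),
    where transitions maps (state, event) to the next state.
    """
    transitions = {}
    accepting = set()
    next_state = 1
    for trace in traces:
        state = 0
        for event in trace:
            key = (state, event)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)  # the state reached at the end of an observed trace
    return transitions, accepting


def accepts(model, trace):
    """Check whether the learned model accepts the given trace."""
    transitions, accepting = model
    state = 0
    for event in trace:
        key = (state, event)
        if key not in transitions:
            return False
        state = transitions[key]
    return state in accepting


if __name__ == "__main__":
    # Hypothetical traces observed from a file-handling component.
    observed = [
        ("open", "read", "close"),
        ("open", "write", "close"),
        ("open", "close"),
    ]
    model = build_prefix_tree(observed)
    print(accepts(model, ("open", "read", "close")))
    print(accepts(model, ("read",)))
```

The model is built without any access to source code, but it only describes behavior that was actually observed, which reflects the coverage limitation of purely trace-based learning.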

SLIDE 18

Program Analysis: The Two Worlds

Advantages

Glassbox (static analyses)

◮ Precise, rich modelling
◮ Executable/compilable target
◮ Can be scalable
◮ Certificates possible

Blackbox (learning techniques)

◮ Source code not needed
◮ Applicable to any system level
◮ Fully automatic
◮ Robust

SLIDE 19

Program Analysis: The Two Worlds

Disadvantages

Glassbox (static analyses)

◮ Must have/generate source code
◮ Where do the specs come from?
◮ Some expert interaction necessary
◮ Evolution of target expensive
◮ Incompleteness, soundiness

Blackbox (learning techniques)

◮ Learned models very abstract
◮ How to map abstract to code level?
◮ No use of symbolic techniques
◮ Slow convergence, small coverage
◮ Doesn’t scale/not compositional

SLIDE 22

Software Model ⇔ Executable Code

Pivotal Issue: The Link between Models and Code

Too tight: Need to hand-craft modelling abstractions
Too loose: Unsatisfactory precision/coverage

Increase elasticity of model-code link without sacrificing precision

Potential Benefits

◮ Decrease dependency of glassbox on availability of specs, source code
◮ Integrate blackbox into precise/sound(y) reasoning framework
◮ Dramatically improve performance of both glass-/blackbox

SLIDE 23

Summary

There is a substantial, unexploited potential in systematically linking static analysis with symbolic methods and automata-based/statistical learning methods

◮ The static analysis community needs to understand better recent advances in learning system behavior
◮ The learning community needs to understand better the opportunities offered by automated reasoning and symbolic formalisms

That’s why we are here!
