Finding Software Bugs Using Active Automata Learning

SLIDE 1

Introduction to Model Learning Applications Theory & Tools for Model Learning Conclusions and Future Work

Finding Software Bugs Using Active Automata Learning

Frits Vaandrager

Radboud University Nijmegen

RV 2018, Limassol, November 2018

Frits Vaandrager Finding Bugs Using Automata Learning

SLIDE 2

Outline

1 Introduction to Model Learning
2 Applications
3 Theory & Tools for Model Learning
4 Conclusions and Future Work

SLIDE 3

Research Question

SLIDE 4

Research Question

We assume the SUT (system under test) behaves deterministically and can be reset.

SLIDE 5

Machine Learning in General

SLIDE 6

Learning Regular Languages

SLIDE 7

Regular Languages and Congruences

Definition. The equivalence relation ∼L on Σ∗ induced by a language L ⊆ Σ∗:
u ∼L v iff ∀w ∈ Σ∗ : u · w ∈ L ⇔ v · w ∈ L

This relation is a right-congruence with respect to concatenation:
∀u, v, w ∈ Σ∗ : u ∼L v ⇒ u · w ∼L v · w

Theorem (Myhill-Nerode, 1958). Language L is regular iff ∼L has finitely many equivalence classes.

SLIDE 8

Visualisation

Consider the regular language a(a | b)∗b

SLIDE 9

Visualisation

Consider the regular language a(a | b)∗b

SLIDE 10

Hankel Matrix

SLIDE 11

Hankel Matrix

SLIDE 12

Hankel Matrix

SLIDE 13

Hankel Matrix

SLIDE 14

Hankel Matrix

SLIDE 15

Hankel Matrix

SLIDE 16

Hankel Matrix

u ∼L v iff rows of u and v in Hankel matrix for L have the same color

SLIDE 17

Hankel Matrix

u ∼L v iff rows of u and v in Hankel matrix for L have the same color
Language L is regular iff its Hankel matrix contains a finite number of distinct rows, i.e., colors

SLIDE 18

Hankel Matrix

u ∼L v iff rows of u and v in Hankel matrix for L have the same color
Language L is regular iff its Hankel matrix contains a finite number of distinct rows, i.e., colors
The number of states in the smallest DFA for L equals the number of colors in the Hankel matrix
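
To make the row-coloring idea concrete, the following Python sketch computes a finite block of the Hankel matrix for the example language a(a | b)∗b and counts its distinct rows. The helper names (`in_language`, `words_up_to`, `hankel_rows`) and the length bounds are illustrative assumptions, not part of the talk, and a finite block can only approximate the infinite matrix.

```python
from itertools import product
import re

ALPHABET = "ab"
PATTERN = re.compile(r"a[ab]*b")  # the example language a(a | b)*b


def in_language(word: str) -> bool:
    """Membership test for the example language."""
    return PATTERN.fullmatch(word) is not None


def words_up_to(max_len: int) -> list[str]:
    """All words over ALPHABET of length 0..max_len, shortest first."""
    return ["".join(t) for n in range(max_len + 1)
            for t in product(ALPHABET, repeat=n)]


def hankel_rows(max_prefix: int, max_suffix: int) -> dict[str, tuple[bool, ...]]:
    """Row of prefix u: one entry per suffix w, telling whether u.w is in L."""
    suffixes = words_up_to(max_suffix)
    return {u: tuple(in_language(u + w) for w in suffixes)
            for u in words_up_to(max_prefix)}


rows = hankel_rows(max_prefix=3, max_suffix=2)
colors = set(rows.values())  # each distinct row is one "color"
print(len(colors))
```

For this language the block already shows four distinct rows, which matches a minimal DFA with four states (the rejecting sink state included).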

SLIDE 19

What is the FSM for this Hankel Matrix?

SLIDE 20

What is the FSM for this Hankel Matrix?

SLIDE 21

Solution

Colors of rows in Hankel matrix give us the states. Access strings and one-letter extensions allow us to determine transitions. Column for empty suffix gives us the accepting states.
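
The three steps of this solution can be sketched in Python for the example language a(a | b)∗b. This is a minimal illustration under assumptions: the function names, the suffix bound of 2 and the prefix bound of 2 are chosen by hand so that all colors of this particular language show up; they are not taken from the talk.

```python
from itertools import product
import re

ALPHABET = "ab"


def in_language(word):
    return re.fullmatch(r"a[ab]*b", word) is not None


def words_up_to(max_len):
    return ["".join(t) for n in range(max_len + 1)
            for t in product(ALPHABET, repeat=n)]


SUFFIXES = words_up_to(2)  # SUFFIXES[0] is the empty suffix


def row(u):
    """Row (color) of prefix u in the finite Hankel block."""
    return tuple(in_language(u + w) for w in SUFFIXES)


def build_dfa(max_prefix=2):
    # States: one shortest access string per color.
    access = {}
    for u in words_up_to(max_prefix):
        access.setdefault(row(u), u)
    # Transitions: color of the one-letter extension of each access string.
    transitions = {(u, a): access[row(u + a)]
                   for u in access.values() for a in ALPHABET}
    # Accepting states: entry in the column of the empty suffix.
    accepting = {u for r, u in access.items() if r[0]}
    return access[row("")], transitions, accepting


def accepts(dfa, word):
    state, transitions, accepting = dfa
    for a in word:
        state = transitions[(state, a)]
    return state in accepting


dfa = build_dfa()
print(sorted(dfa[2]), accepts(dfa, "aab"), accepts(dfa, "ba"))
```

States are named by their access strings, so the single accepting state of the resulting automaton is "ab".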

SLIDE 22

What if Hankel Matrix is Incomplete?

The problem of coloring such a matrix is NP-hard!

SLIDE 23

Minimally Adequate Teacher (Angluin)

Learner → Teacher, membership query (MQ): string; answer: +/-
Learner → Teacher, equivalence query (EQ): hypothesis; answer: y/n, counterexample

Learner asks membership queries and equivalence queries
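
The interplay of the two query types can be sketched as a small observation-table learner in Python. Several things here are assumptions rather than content of the talk: the `Teacher` class answers equivalence queries only approximately, by testing all words up to a fixed length, and counterexamples are handled by adding all their suffixes as columns (a common variant of L∗, often attributed to Maler and Pnueli) instead of Angluin's original prefix-based treatment.

```python
from itertools import product
import re

ALPHABET = "ab"


def target(word):
    """Hidden language known only to the teacher: a(a | b)*b."""
    return re.fullmatch(r"a[ab]*b", word) is not None


class Teacher:
    def __init__(self, language, max_test_len=6):
        self.language, self.max_test_len = language, max_test_len

    def membership(self, word):            # MQ: string -> +/-
        return self.language(word)

    def equivalence(self, accepts):        # EQ: hypothesis -> counterexample or None
        for n in range(self.max_test_len + 1):
            for t in product(ALPHABET, repeat=n):
                w = "".join(t)
                if accepts(w) != self.language(w):
                    return w
        return None                        # no difference found within the bound


def run(delta, accepting, word):
    state = ""                             # states are named by access strings
    for a in word:
        state = delta[(state, a)]
    return state in accepting


def learn(teacher):
    S, E = [""], [""]                      # access strings (rows), suffixes (columns)

    def row(u):
        return tuple(teacher.membership(u + e) for e in E)

    while True:
        changed = True
        while changed:                     # make the table closed
            changed = False
            known = {row(s) for s in S}
            for s in list(S):
                for a in ALPHABET:
                    if row(s + a) not in known:
                        S.append(s + a)
                        known.add(row(s + a))
                        changed = True
        rep = {}
        for s in S:
            rep.setdefault(row(s), s)
        delta = {(s, a): rep[row(s + a)] for s in rep.values() for a in ALPHABET}
        accepting = {s for s in rep.values() if teacher.membership(s)}
        cex = teacher.equivalence(lambda w: run(delta, accepting, w))
        if cex is None:
            return delta, accepting
        for i in range(len(cex)):          # add all suffixes of the counterexample
            if cex[i:] not in E:
                E.append(cex[i:])


delta, accepting = learn(Teacher(target))
print(len({s for (s, _) in delta}), sorted(accepting))
```

On this target the first hypothesis has a single rejecting state, the teacher returns the counterexample "ab", and the second hypothesis has four states and passes the bounded equivalence check.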

SLIDE 24

Angluin’s L∗ Algorithm

SLIDE 25

Black Box Checking (Peled, Vardi & Yannakakis)

[Figure labels: Learner, Teacher, SUL, CT, MQ, EQ, TQs]

Learner: formulate hypotheses
Conformance Tester (CT): test correctness of hypotheses

SLIDE 26

Black Box Checking (Peled, Vardi & Yannakakis)

[Figure labels: Learner, Teacher, SUL, CT, MQ, EQ, TQs]

Learner: formulate hypotheses
Conformance Tester (CT): test correctness of hypotheses

Model learning and conformance testing are two sides of the same coin!
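
As a rough illustration of how a conformance tester can stand in for an equivalence oracle, the Python sketch below runs random test sequences against a resettable system and reports the first input sequence on which it disagrees with the hypothesis. Everything in it is an assumption made for illustration: the `CounterSUL` class, the one-state hypothesis, and the use of plain random walks, which is much weaker than the structured test methods discussed in the talk.

```python
import random


class CounterSUL:
    """Hypothetical system under learning: every third 'inc' outputs 'wrap'."""

    def __init__(self):
        self.count = 0

    def reset(self):
        self.count = 0

    def step(self, symbol):
        if symbol == "inc":
            self.count = (self.count + 1) % 3
            return "wrap" if self.count == 0 else "ok"
        return "ok"  # 'nop' leaves the counter unchanged


def random_walk_eq(sul, hypothesis, initial, inputs, walks=200, max_len=10, seed=0):
    """Approximate equivalence query: return a counterexample trace or None.

    hypothesis maps (state, input) to (next_state, expected_output).
    """
    rng = random.Random(seed)
    for _ in range(walks):
        sul.reset()
        state, trace = initial, []
        for _ in range(rng.randint(1, max_len)):
            symbol = rng.choice(inputs)
            trace.append(symbol)
            state, expected = hypothesis[(state, symbol)]
            if sul.step(symbol) != expected:
                return trace  # the SUL and the hypothesis disagree on this trace
    return None


# A too-coarse hypothesis with a single state that always answers 'ok'.
hypothesis = {("q", "inc"): ("q", "ok"), ("q", "nop"): ("q", "ok")}
counterexample = random_walk_eq(CounterSUL(), hypothesis, "q", ["inc", "nop"])
print(counterexample)
```

A returned trace ends exactly at the first disagreement, so here it ends with the third "inc"; the learner would use it to refine the hypothesis.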

SLIDE 27

Implements MAT framework for DFAs and Mealy machines

SLIDE 28

A Theory of Mappers (FMSD, 2015)

SLIDE 29

A Theory of Mappers (cont.)

Formally, a mapper can be viewed as a transducer (deterministic Mealy machine).
A mapper A induces an abstraction operation αA and a concretization operator γA.

Theorem. For a mapper A and nondeterministic Mealy machines M and H, αA(M) ≤ H iff M ≤ γA(H) (modulo a minor technical condition).

SLIDE 30

Our Research Method

SLIDE 31

EMV Protocol (Aarts et al, 2013)

EMV = Europay/Mastercard/Visa
Compatibility between smartcards and terminals
SEPA requires EMV compliance
EMV standard has >700 pages
Learning took at most 1500 membership queries, less than 30 minutes
Useful for fingerprinting cards

SLIDE 32

E.dentifier2 (WOOT’14)

SLIDE 33

State Machines for Old and New E.dentifier2

SLIDE 34

Bugs in Protocol Implementations

Standard violations found in implementations of major protocols:
TLS (Usenix Security’15)
TCP (CAV’16)
SSH (Spin’17)

SLIDE 35

Bugs in Protocol Implementations

Standard violations found in implementations of major protocols:
TLS (Usenix Security’15)
TCP (CAV’16)
SSH (Spin’17)

These findings led to bug fixes in implementations.

SLIDE 36

Learned Model for SSH Implementation

SLIDE 37

SSH Model Checking Results

SLIDE 38

Other Protocol Case Studies

Session Initiation Protocol (SIP)
Message Queuing Telemetry Transport (MQTT) protocol
Quick UDP Internet Connections (QUIC) protocol
WiFi
IEC 60870-5-104 protocol
...

SLIDE 39

Lorentz Workshop

Participants from automata learning, model-based testing, cryptography, and security protocol implementation.
Working groups on, e.g., WiFi, side channels in TLS, and LTE.

SLIDE 40

Engine Status Manager in Océ Printer (ICFEM’15)

Can we learn models of realistic printer controllers?

SLIDE 41

Engine Status Manager in Océ Printer (ICFEM’15)

Can we learn models of realistic printer controllers?
Potential applications: regression testing, generation of new implementations

SLIDE 42

Conformance Testing Becomes the Bottleneck!

None of the existing conformance testing methods (W, Wp, HSI, ADS, UIOv, P, H, SPY, ...) was able to find counterexamples for some hypothesis models of the printer software.
We had to develop a new hybrid ADS method, based on work of Lee & Yannakakis.

SLIDE 43

Mealy Machine for Engine Status Manager

SLIDE 44

Power Control Service from Philips Healthcare (iFM’16)

Are the legacy component and the refactored implementation equivalent?

SLIDE 45

Refactoring Legacy Implementations

SLIDE 46

Refactoring Legacy Implementations

This approach allowed us to find several bugs in refactored implementations of the power control service.

SLIDE 47

Refactoring Legacy Implementations

This approach allowed us to find several bugs in refactored implementations of the power control service.
A learned model of a legacy component may also be used as a runtime monitor for a refactored implementation (“lifelong learning”).
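
One way to compare a learned model of a legacy component with a learned model of its refactoring is a breadth-first search over pairs of states that returns a shortest input sequence on which the outputs differ. The Python sketch below illustrates this; the two tiny Mealy machines are hypothetical examples and not the Philips models, and the talk does not state which equivalence checker was used.

```python
from collections import deque


def find_difference(m1, init1, m2, init2, inputs):
    """Return a shortest distinguishing input sequence, or None if equivalent.

    Each machine maps (state, input) to (next_state, output) and must be
    defined for every state and input (input-enabled, deterministic).
    """
    seen = {(init1, init2)}
    queue = deque([(init1, init2, ())])
    while queue:
        s1, s2, trace = queue.popleft()
        for symbol in inputs:
            n1, o1 = m1[(s1, symbol)]
            n2, o2 = m2[(s2, symbol)]
            if o1 != o2:
                return trace + (symbol,)
            if (n1, n2) not in seen:
                seen.add((n1, n2))
                queue.append((n1, n2, trace + (symbol,)))
    return None


# Hypothetical legacy model: switching on twice in a row is an error.
legacy = {
    ("off", "on"): ("on", "ok"), ("off", "off"): ("off", "ok"),
    ("on", "on"): ("on", "err"), ("on", "off"): ("off", "ok"),
}
# Hypothetical refactored model: the error case was lost.
refactored = dict(legacy)
refactored[("on", "on")] = ("on", "ok")

print(find_difference(legacy, "off", refactored, "off", ["on", "off"]))
```

The returned sequence can be replayed on both implementations to decide which of them deviates from the intended behavior.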

SLIDE 48

ASML Twinscan

SLIDE 49

ASML Challenge

Can active automata learning be used to support refactoring of legacy software at ASML?
ASML machines run on legacy software.
Recent components have been designed using model-based techniques. Can we learn those?
Can we learn the hundreds of design and interface models used for high-level control of the wafer flow during lot operation?

SLIDE 50

ASML Challenge

Can active automata learning be used to support refactoring of legacy software at ASML?
ASML machines run on legacy software.
Recent components have been designed using model-based techniques. Can we learn those?
Can we learn the hundreds of design and interface models used for high-level control of the wafer flow during lot operation?

⇒ RERS @ TOOLympics’19

SLIDE 51

Benchmark Wiki automata.cs.ru.nl

SLIDE 52

Benchmark Wiki automata.cs.ru.nl

SLIDE 53

Benchmark Wiki: Supported Automata Frameworks

SLIDE 54

Automata Wiki: Most Benchmarks are not that Big

SLIDE 55

Learning Product Automata (Moerman, ICGI’18)

Consider a Moore machine with outputs O1 × O2

SLIDE 56

An Example from Rivest & Schapire

SLIDE 57

Experimental Evaluation

SLIDE 58

Register Automata

Actions may carry data parameters that may be stored in registers:

SLIDE 59

Data Types

Register automata may be parametrized by a (relational) structure: a pair ⟨D, R⟩ where D is an unbounded domain of data values, and R is a collection of relations on D.

Examples of simple structures include:
⟨ℕ, {=}⟩, the natural numbers with equality;
⟨ℝ, {<}⟩, the real numbers with inequality: this structure also allows one to express equality between elements.

Transition guards are conjunctions of negated and unnegated relations from R.
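
A small register automaton over the natural numbers with equality can be written down directly in Python. The sketch below is a hypothetical login example with a single register, chosen for illustration and not taken from the talk: `register(p)` stores its parameter, and `login(p)` succeeds only if the guard p = r holds for the stored value r.

```python
from typing import Callable, NamedTuple, Optional


class Transition(NamedTuple):
    source: str
    action: str
    guard: Callable[[int, Optional[int]], bool]  # (parameter p, register r) -> bool
    store: bool                                  # store p in the register?
    output: str
    target: str


def always(p, r):
    return True


def equal(p, r):        # unnegated relation: p = r
    return p == r


def not_equal(p, r):    # negated relation: not (p = r)
    return p != r


TRANSITIONS = [
    Transition("init", "register", always, True, "OK", "registered"),
    Transition("init", "login", always, False, "NOK", "init"),
    Transition("registered", "register", always, False, "NOK", "registered"),
    Transition("registered", "login", equal, False, "OK", "logged_in"),
    Transition("registered", "login", not_equal, False, "NOK", "registered"),
    Transition("logged_in", "register", always, False, "NOK", "logged_in"),
    Transition("logged_in", "login", always, False, "NOK", "logged_in"),
]


def run(word):
    """Feed a list of (action, data value) pairs; return the list of outputs."""
    state, register = "init", None
    outputs = []
    for action, p in word:
        t = next(t for t in TRANSITIONS
                 if t.source == state and t.action == action and t.guard(p, register))
        if t.store:
            register = p
        outputs.append(t.output)
        state = t.target
    return outputs


print(run([("register", 7), ("login", 3), ("login", 7)]))
```

Because the guards only test equality, replacing 7 and 3 by any other pair of distinct numbers yields the same outputs; the behavior depends on the relations between data values, not on the values themselves.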

SLIDE 60

Learning Tools for Register Automata

Tomte, Radboud University, can only handle ⟨ℕ, {=}⟩
LearnLib, TU Dortmund, can only handle ⟨ℕ, {=}⟩
RALib, Uppsala/Dortmund, can handle some richer structures

SLIDE 61

TCP Protocol Case Study (FMICS-AVoCS’17)

SLIDE 62

TCP Protocol Case Study (FMICS-AVoCS’17)

These findings led to a bug fix in the Linux TCP implementation!

SLIDE 63

Limits of Black-box Learning?

Model learning is a highly effective bug-finding technique

SLIDE 64

Limits of Black-box Learning?

Model learning is a highly effective bug-finding technique
... but it has some serious scalability problems

SLIDE 65

Limits of Black-box Learning?

Model learning is a highly effective bug-finding technique
... but it has some serious scalability problems
Can we use white-box information while preserving the extensionality of black-box models?

SLIDE 66

Limits of Black-box Learning?

Model learning is a highly effective bug-finding technique
... but it has some serious scalability problems
Can we use white-box information while preserving the extensionality of black-box models?
Yes, we can!

SLIDE 67

Fuzzing

By combining LearnLib, hybrid ADS testing, and the American fuzzy lop fuzzer (AFL), my group, together with colleagues from Delft, won the RERS 2016 challenge.

SLIDE 68

Taint Analysis

SLIDE 69

Taint Analysis

White-box technique for code analysis
Instruments code to track input values
Many tools focus on specific vulnerabilities, e.g. buffer overflows and SQL injections
Usually implemented using Dynamic Binary Analysis, e.g. Valgrind
We use a Python library from the Pygmalion tool by Andreas Zeller et al.
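
The idea of tracking input values can be illustrated with a hand-written wrapper class in Python. This is only a sketch under assumptions: `TaintedInt` and the `Login` class are hypothetical, and the code does not use or imitate the Pygmalion API. It shows how labelling inputs lets an observer record which comparisons the code under analysis performs on them.

```python
class TaintedInt:
    """Integer input carrying a label; equality tests on it are logged."""

    def __init__(self, value, label, log):
        self.value, self.label, self.log = value, label, log

    def __eq__(self, other):
        if isinstance(other, TaintedInt):
            other_value, other_label = other.value, other.label
        else:
            other_value, other_label = other, repr(other)
        result = self.value == other_value
        # Record which inputs were compared and how the test turned out.
        self.log.append((self.label, "==", other_label, result))
        return result

    def __hash__(self):
        return hash(self.value)


class Login:
    """Hypothetical system under test with one stored password."""

    def __init__(self):
        self.password = None

    def register(self, p):
        self.password = p
        return "OK"

    def login(self, p):
        return "OK" if p == self.password else "NOK"


log = []
sut = Login()
outputs = [
    sut.register(TaintedInt(7, "p1", log)),
    sut.login(TaintedInt(3, "p2", log)),
]
print(outputs, log)
```

The log shows that the second parameter was compared for equality with the first one, which is the kind of guard information a learner of register automata would otherwise have to infer from many black-box queries.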

SLIDE 70

What Does Pygmalion Do For Us?

SLIDE 71

What Does Pygmalion Do For Us?

Potential of exponential gains during learning!

SLIDE 72

Architecture of the RALib Tool for Learning Register Automata

SLIDE 73

Tree Oracle

SLIDE 74

Ongoing Work

Replace the tree oracle in RALib by a version that uses taint analysis.

SLIDE 75

Ongoing Work

Replace the tree oracle in RALib by a version that uses taint analysis.
First prototype finished (for integers with equality)

SLIDE 76

Conclusions

Active automata learning is emerging as a highly effective bug-finding technique, and (very) slowly becoming a standard tool in the toolbox of the software engineer. But much further research is needed!

SLIDE 77

Future Work

1 Further improvement of black-box learning/testing algorithms for FSMs
2 Explore combinations of black-box and white-box learning
3 Develop algorithms for models with time and probabilities
4 Refactoring of legacy software is potentially an excellent application domain
