SLIDE 1
Active Learning of State Machines
Tutorial
Frits Vaandrager
Radboud University Nijmegen
Dagstuhl, March 2018
SLIDE 2 Informationsteknologi Informationsteknologi
Goal of Active Automaton Learning
What state machine governs the behavior of the SUT?
[Figure: SUT with input 1, input 2, and reset]
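The black-box setting on this slide can be sketched in a few lines of Python. This is only an illustration: `ToySUT`, its hidden toggle behavior, the input names `"input1"` and `"input2"`, and the helper `output_query` are assumptions made for the example, not part of the tutorial. The learner may only reset the system, send inputs, and observe outputs.

```python
class ToySUT:
    """Hypothetical system under test: two inputs and a reset."""

    def __init__(self):
        self._state = 0  # hidden from the learner

    def reset(self):
        """Bring the system back to its initial state."""
        self._state = 0

    def step(self, symbol):
        """Apply one input and return the observed output."""
        # Assumed hidden behavior: "input1" toggles a flag, "input2" reports it.
        if symbol == "input1":
            self._state = 1 - self._state
            return "ok"
        if symbol == "input2":
            return "on" if self._state else "off"
        raise ValueError(f"unknown input: {symbol}")


def output_query(sut, word):
    """Reset the SUT, apply the inputs in order, and collect the outputs."""
    sut.reset()
    return [sut.step(symbol) for symbol in word]
```

Every observation starts from a reset, so repeated queries are independent of each other.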
SLIDE 3
Why Study Automata Learning?
Fundamental: system identification
Useful:
- Often we don’t have models of software components
- When we have models, we often don’t know whether they are correct
SLIDE 4
Machine Learning in General
SLIDE 5
Learning Regular Languages
SLIDE 6
Minimally Adequate Teacher
Learner asks the Teacher:
- Membership Queries, answered Yes / No
- Equivalence Queries, answered Yes / No + Counterexample
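The two query types can be sketched as a small Python interface. This is a minimal sketch under stated assumptions: the `DFA` and `Teacher` classes, the string encoding of words, and the example languages are hypothetical, and the teacher is assumed to know the target automaton, so it can answer equivalence queries exactly by a breadth-first search over the product of target and hypothesis.

```python
from collections import deque


class DFA:
    def __init__(self, alphabet, start, accepting, delta):
        self.alphabet = alphabet    # iterable of one-character symbols
        self.start = start
        self.accepting = accepting  # set of accepting states
        self.delta = delta          # dict: (state, symbol) -> state

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.delta[(state, symbol)]
        return state in self.accepting


class Teacher:
    """Answers both query types for a target DFA that it knows."""

    def __init__(self, target):
        self.target = target

    def membership_query(self, word):
        return self.target.accepts(word)

    def equivalence_query(self, hypothesis):
        """Return (True, None), or (False, shortest counterexample)."""
        start = (self.target.start, hypothesis.start)
        queue, seen = deque([(start, "")]), {start}
        while queue:
            (p, q), word = queue.popleft()
            if (p in self.target.accepting) != (q in hypothesis.accepting):
                return False, word
            for a in self.target.alphabet:
                nxt = (self.target.delta[(p, a)], hypothesis.delta[(q, a)])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, word + a))
        return True, None


# Assumed example: target accepts words with an even number of 'a's.
even_a = DFA("ab", 0, {0}, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1})
accept_all = DFA("ab", 0, {0}, {(0, "a"): 0, (0, "b"): 0})
teacher = Teacher(even_a)
```

For a real black-box system the teacher has no target automaton to inspect, so the equivalence query can only be approximated.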
SLIDE 7
Regular Sets and Congruences
SLIDE 8
Angluin’s L* Algorithm
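As a rough illustration of the idea behind L*, the sketch below learns a DFA from an observation table. It is not Angluin's original formulation: it uses the variant usually attributed to Maler and Pnueli, in which all suffixes of a counterexample are added as columns, so only closedness of the table has to be restored. The function names, the string encoding of words, the bounded equivalence oracle, and the target language are all assumptions made for the example.

```python
from itertools import product


def lstar(alphabet, member, equiv):
    """Learn a DFA from a membership oracle and an equivalence oracle.

    Returns (states, start, accepting, delta); states are access strings.
    """
    S, E = [""], [""]  # access prefixes (states) and distinguishing suffixes
    cache = {}         # membership answers seen so far

    def ask(word):
        if word not in cache:
            cache[word] = member(word)
        return cache[word]

    def row(prefix):
        return tuple(ask(prefix + e) for e in E)

    while True:
        # Close the table: each one-step extension must equal a row of S.
        changed = True
        while changed:
            changed = False
            known = {row(s) for s in S}
            for s in list(S):
                for a in alphabet:
                    r = row(s + a)
                    if r not in known:
                        S.append(s + a)
                        known.add(r)
                        changed = True

        # Build the hypothesis; rows of S are pairwise distinct by construction.
        state_of = {row(s): s for s in S}
        delta = {(s, a): state_of[row(s + a)] for s in S for a in alphabet}
        accepting = {s for s in S if ask(s)}
        hypothesis = (list(S), "", accepting, delta)

        cex = equiv(hypothesis)
        if cex is None:
            return hypothesis
        # Add every non-empty suffix of the counterexample as a new column.
        for i in range(len(cex)):
            if cex[i:] not in E:
                E.append(cex[i:])


def accepts(hypothesis, word):
    _, state, accepting, delta = hypothesis
    for a in word:
        state = delta[(state, a)]
    return state in accepting


def bounded_equiv(alphabet, member, max_len):
    """Approximate equivalence oracle: compare on all words up to max_len."""
    def equiv(hypothesis):
        for n in range(max_len + 1):
            for letters in product(alphabet, repeat=n):
                word = "".join(letters)
                if accepts(hypothesis, word) != member(word):
                    return word
        return None
    return equiv


def target(word):
    # Hypothetical target language: number of 'a's divisible by three.
    return word.count("a") % 3 == 0


model = lstar("ab", target, bounded_equiv("ab", target, max_len=6))
```

The bounded oracle stands in for a true equivalence query; it can miss differences that only show up on words longer than `max_len`.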
SLIDE 9
Black Box Checking (Peled, Vardi & Yannakakis, ’99)
Learner: Formulate hypothesis
Model-Based Testing: Test hypothesis
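The "test hypothesis" half of this loop can be sketched as follows. This is a simplified stand-in that uses seeded random testing rather than a systematic conformance testing method; the Mealy machine encoding, the function names, and the toy SUT are assumptions made for the example.

```python
import random


def run_mealy(delta, start, word):
    """Run a Mealy machine; delta maps (state, input) -> (next_state, output)."""
    state, outputs = start, []
    for symbol in word:
        state, out = delta[(state, symbol)]
        outputs.append(out)
    return outputs


def find_counterexample(sut_query, delta, start, inputs,
                        tests=1000, max_len=10, seed=0):
    """Return an input word on which SUT and hypothesis disagree, else None."""
    rng = random.Random(seed)  # seeded so that test runs are repeatable
    for _ in range(tests):
        word = [rng.choice(inputs) for _ in range(rng.randint(1, max_len))]
        if sut_query(word) != run_mealy(delta, start, word):
            return word
    return None


def toy_sut_query(word):
    # Hypothetical SUT: every third input since the reset outputs "wrap".
    return ["wrap" if (i + 1) % 3 == 0 else "tick" for i in range(len(word))]


# A too-small hypothesis and a correct one, both over the single input "inc".
one_state = {(0, "inc"): (0, "tick")}
three_states = {(0, "inc"): (1, "tick"), (1, "inc"): (2, "tick"),
                (2, "inc"): (0, "wrap")}
```

A returned word goes back to the learner as a counterexample; `None` only means that no difference was found within the test budget, not that the hypothesis is correct.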
SLIDE 10
SLIDE 11
Our Research Method
Applications Tools Theory
SLIDE 12
Application 1: EMV protocol
Inference of EMV protocol
- Credit card with EMV chip
- EMV = Europay, Mastercard and Visa
- Compatibility between smartcards and terminals
- EMV-compliance required for
SLIDE 13
Model of SecureCode app on Dutch banking card
EMV standard has over 700 pages
At most 1500 membership queries, less than 30 minutes
SLIDE 14
Different cards, different state machines
Specification?
Learned models provide unique fingerprints of cards!
SLIDE 15
Application 2: E.dentifier2
SLIDE 16
State Machines for Old and New E.dentifier2
SLIDE 17
A Theory of Abstractions
(Aarts, Jonsson, Uijen & Vaandrager, 2015)
Learner (small Σ) <-> Mapper <-> Teacher (probably large Σ)
Learner and Mapper exchange abstract inputs and abstract outputs
Mapper and Teacher exchange concrete inputs and concrete outputs
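A mapper can be sketched as a stateful wrapper that sits between the learner and the system. The example below is hypothetical and not taken from the cited work: `SessionSUT`, its session identifiers, and the abstract symbols `"login"`, `"send_valid"`, and `"send_invalid"` are assumptions chosen to show how a large concrete alphabet (all integers) collapses into a small abstract one.

```python
class SessionSUT:
    """Hypothetical SUT with a large concrete alphabet (integer session ids)."""

    def __init__(self):
        self.counter = 1000  # keeps growing, so ids differ between runs
        self.current = None

    def reset(self):
        self.current = None

    def step(self, message):
        if message[0] == "login":
            self.counter += 7
            self.current = self.counter
            return ("session", self.current)
        if message[0] == "send":
            return ("ack",) if message[1] == self.current else ("reject",)
        raise ValueError(f"unknown message: {message}")


class Mapper:
    """Translates abstract inputs to concrete ones and outputs back."""

    INVALID_ID = -1  # assumed never to be issued by the SUT

    def __init__(self, sut):
        self.sut = sut
        self.last_id = None

    def reset(self):
        self.sut.reset()
        self.last_id = None

    def step(self, abstract_input):
        # Concretize: pick a concrete message for the abstract symbol.
        if abstract_input == "login":
            concrete = ("login",)
        elif abstract_input == "send_valid":
            known = self.last_id is not None
            concrete = ("send", self.last_id if known else self.INVALID_ID)
        elif abstract_input == "send_invalid":
            concrete = ("send", self.INVALID_ID)
        else:
            raise ValueError(f"unknown abstract input: {abstract_input}")
        output = self.sut.step(concrete)
        # Abstract: remember the concrete id, report only the message type.
        if output[0] == "session":
            self.last_id = output[1]
        return output[0]


def abstract_query(mapper, word):
    mapper.reset()
    return [mapper.step(symbol) for symbol in word]
```

The learner sees the same abstract behavior on every run, even though the concrete session ids change each time.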
SLIDE 18
Application 3-5: Protocol Implementations
We found standard violations in implementations of major protocols:
- TCP (CAV’16, FMICS’17)
- TLS (Usenix Security ’15)
- SSH (Spin’17)
SLIDE 19
SSH Learning Results
SLIDE 20
SSH Model Checking Results
SLIDE 21
Application 6: Power Control Service of Philips
Legacy component vs. refactored component: equivalent?
SLIDE 22
Our Approach
Legacy implementation -> model learner -> model
Refactored implementation -> model learner -> model
Both models -> equivalence checker: equivalent?
- Y: done
- N: counterexample. Are the models correct for the counterexample?
- N: adapt model(s) using the counterexample
- Y: adapt implementation(s)
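The decision loop on this slide can be sketched in Python. This is an illustrative reading of the flowchart, not the tooling used in the case study: the Mealy machine encoding as `(start, delta)` pairs and the names `find_difference` and `triage` are assumptions.

```python
from collections import deque


def run(machine, word):
    """machine = (start, delta); delta maps (state, input) -> (next, output)."""
    state, delta = machine
    outputs = []
    for symbol in word:
        state, out = delta[(state, symbol)]
        outputs.append(out)
    return outputs


def find_difference(m1, m2, inputs):
    """Equivalence check: shortest word with different outputs, or None."""
    (s1, d1), (s2, d2) = m1, m2
    queue, seen = deque([((s1, s2), [])]), {(s1, s2)}
    while queue:
        (p, q), word = queue.popleft()
        for a in inputs:
            (p2, out1), (q2, out2) = d1[(p, a)], d2[(q, a)]
            if out1 != out2:
                return word + [a]
            if (p2, q2) not in seen:
                seen.add((p2, q2))
                queue.append(((p2, q2), word + [a]))
    return None


def triage(cex, legacy_query, refactored_query, legacy_model, refactored_model):
    """Decide what to adapt after the models disagree on counterexample cex."""
    if legacy_query(cex) != run(legacy_model, cex):
        return "adapt legacy model"
    if refactored_query(cex) != run(refactored_model, cex):
        return "adapt refactored model"
    # Both models are correct for cex, so the implementations really differ.
    return "adapt implementation(s)"
```

Here `legacy_query` and `refactored_query` stand for running the counterexample on the real implementations after a reset.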
SLIDE 23
Application 7: Engine Status Manager Océ Printer
Goal: learn models of realistic printer controllers
Possible use: regression testing, generation of new implementations, ...
SLIDE 24
Adaptive Distinguishing Sequences
(Lee & Yannakakis, 1994)
SLIDE 25
Results
- Learned model from SUT equivalent to handcrafted model
- 114 hypotheses generated
- 8.5 hours needed
- 29,933,643 membership queries with ≈35 inputs
- 30,629,711 validity queries with ≈30 inputs
SLIDE 26
Theory+Tools: Learning Register Automata
Three approaches:
1. Using adapted Myhill-Nerode (LearnLib, RALib)
2. Using mappers and CEGAR (Tomte)
3. Using NLambda Haskell library for nominal automata
SLIDE 27
Theory: Learning Timed Mealy Machines
(Jonsson & Vaandrager, 2018)
SLIDE 28
Future Work: Opening the Box
Some possible approaches:
1. Fuzzing
2. Static analysis
3. Tainting
SLIDE 29
Other Research Challenges
- I/O transition systems
- Nondeterminism
- More complex (operations on) data
- Quality of learned models
- …
SLIDE 30
Conclusions
- Nice mix of theory and applications
- Numerous challenges