Towards automatic state machine reconstruction from legacy PLC using data collection

SLIDE 1

Towards automatic state machine reconstruction from legacy PLC using data collection

IEEE INDIN 2019, Helsinki, Finland 24 July 2019

Daniil Chivilikhin, Sandeep Patil, Anthony Cordonnier, Valeriy Vyatkin

SLIDE 2

Goal

Legacy PLC, IEC 61131-3 (black box) → IEC 61499 state machine

SLIDE 3

Contributions

  • 1. Hardware and software architecture for collecting behavioral data from legacy PLCs in production
  • 2. Algorithm based on translation to the Boolean satisfiability problem (SAT) for reconstructing controller logic in the form of a state machine from data collected from the PLC
  • 3. Demonstration of the proposed solution on a laboratory-scale model of a distribution station

SLIDE 4

Overview of the proposed approach

SLIDE 5

Data collection

SLIDE 6

Hardware architecture for data collection

  • Black-box PLC
  • Data collection PLC running IEC 61499 app
  • Reconstructed PLC

SLIDE 7

Example system: Festo distribution station

SLIDE 8

Data preprocessing (1/4)

Raw data:

Input=[01000101001001] Output=[10000010]
Input=[01000101001001] Output=[10000010]
Input=[01000101001001] Output=[10000010]
Input=[01000101001001] Output=[10000010]
Input=[11000101001001] Output=[00000010]
Input=[11000101001001] Output=[00000010]
Input=[11000101001001] Output=[00000010]
Input=[11000101001001] Output=[00000010]

SLIDE 9

Data preprocessing (2/4)

SLIDE 10

Data preprocessing (3/4)

SLIDE 11

Data preprocessing (4/4)

Input=[01000101001001] Output=[10000010]
Input=[01000101001001] Output=[10000010]
Input=[01000101001001] Output=[10000010]
Input=[11000101001001] Output=[00000010]
Input=[11000101001001] Output=[00000010]
Input=[11000101001001] Output=[00000010]

<REQ[01000101001001], CNF[10000010]>; <REQ[11000101001001], CNF[00000010]>
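The four preprocessing slides appear to collapse each run of identical consecutive samples into a single trace element of the form <REQ[input], CNF[output]>. A minimal sketch of that step, assuming samples are stored as (input, output) string pairs; the function name `collapse_samples` and the tuple layout are illustrative and not taken from the talk:

```python
from itertools import groupby


def collapse_samples(samples):
    """Keep one (input, output) pair per run of identical consecutive samples."""
    return [key for key, _run in groupby(samples)]


# Raw log as shown on the slides: four identical samples, then four others.
raw = [("01000101001001", "10000010")] * 4 + [("11000101001001", "00000010")] * 4

trace = collapse_samples(raw)
print("; ".join(f"<REQ[{inp}], CNF[{out}]>" for inp, out in trace))
```

Only consecutive duplicates are merged, so a later return to an earlier input/output pair would still show up as a separate trace element.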

SLIDE 12

Basic function block model

Boolean input/output vars

SLIDE 13

State machine reconstruction

SLIDE 14

Background

  • Constructing a minimum deterministic finite automaton (DFA) from labeled data is NP-complete [Gold, 1978]

Approaches:

  • Heuristics, e.g. state merging, k-tails
  • Metaheuristics, e.g. genetic algorithms
  • SAT-based

T₊={ab, b, ba, bbb} T₋={abbb, baba}
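To make the problem statement concrete, the sketch below searches exhaustively for the smallest DFA that accepts every word in the positive set and rejects every word in the negative set shown above. Exhaustive search is not one of the approaches listed on the slide; it is used here only because it fits in a few lines, and it is practical only for toy inputs. All function names are illustrative.

```python
from itertools import product

T_POS = ["ab", "b", "ba", "bbb"]   # words that must be accepted
T_NEG = ["abbb", "baba"]           # words that must be rejected
ALPHABET = "ab"


def accepts(delta, accepting, word):
    state = 0  # state 0 is taken as the initial state
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting


def find_dfa(n):
    """Try every n-state DFA; return a consistent one or None."""
    keys = [(s, c) for s in range(n) for c in ALPHABET]
    for targets in product(range(n), repeat=len(keys)):
        delta = dict(zip(keys, targets))
        for mask in range(2 ** n):
            accepting = {s for s in range(n) if (mask >> s) & 1}
            if (all(accepts(delta, accepting, w) for w in T_POS)
                    and not any(accepts(delta, accepting, w) for w in T_NEG)):
                return delta, accepting
    return None


n = 1
while (result := find_dfa(n)) is None:
    n += 1  # no consistent DFA of this size, try a larger one
print(n, result)
```

For these two sets the search stops at three states. The number of candidates grows as N^(2N) · 2^N for a two-letter alphabet, which is why heuristic, metaheuristic and SAT-based methods are used in practice.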

SLIDE 15

SAT-based state machine synthesis (1/3)

Data → Propositional encoding → SAT solver → Solution reconstruction → Solution

https://srlabs.de/bites/minisat-intro/

  • Heule et al. Exact DFA Identification Using SAT Solvers [ICGI’10]
  • ...
  • Ulyantsev et al. Exact finite-state machine identification from scenarios and temporal properties [STTT’18]
  • Chivilikhin et al. Function block finite-state model identification using SAT and CSP solvers [TII’19]

SLIDE 16

SAT-based state machine synthesis (2/3)

T₊={ab, b, ba, bbb} T₋={abbb, baba}

SLIDE 17

SAT-based state machine synthesis (3/3)

Traces T: 〈...〉, ... , 〈...〉

  • Trace tree construction from the traces T
  • Translation function f builds a Boolean formula from the trace tree and the number of states N
  • SAT solver: if there is no solution (UNSAT), set N := N + 1 and repeat
  • Otherwise, the values of variables define the automaton
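The loop on this slide can be sketched end to end for small inputs. The code below builds a trace tree (a prefix tree of the traces) and then tries N = 1, 2, ... until the tree nodes can be mapped to N states. Two assumptions are made purely for illustration: a plain backtracking search stands in for the SAT solver, and the automaton is treated as a Moore-style machine in which each state has one output. The names `build_trace_tree`, `color_tree` and `synthesize`, and the `initial_output` parameter, are hypothetical.

```python
def build_trace_tree(traces, initial_output):
    """Prefix tree of traces; node 0 is the root. Each trace is a list of
    (input, output) pairs, and every node stores the output seen on arrival."""
    parent, output, index = [None], [initial_output], {}
    for trace in traces:
        node = 0
        for inp, out in trace:
            child = index.get((node, inp))
            if child is None:
                child = len(output)
                index[(node, inp)] = child
                parent.append((node, inp))
                output.append(out)
            elif output[child] != out:
                raise ValueError(f"inconsistent traces at node {child}")
            node = child
    return parent, output


def color_tree(parent, output, n):
    """Map tree nodes to n states by backtracking (stand-in for a SAT solver).
    Returns (state outputs, transition table) or None, the UNSAT case."""
    color, state_out, delta = [None] * len(output), {}, {}

    def assign(node):
        if node == len(output):
            return True
        for k in range(n):
            if state_out.get(k, output[node]) != output[node]:
                continue  # state k already has a different output
            edge = None
            if parent[node] is not None:
                p, inp = parent[node]
                edge = (color[p], inp)
                if delta.get(edge, k) != k:
                    continue  # transition already leads to another state
            new_out = k not in state_out
            new_edge = edge is not None and edge not in delta
            state_out[k] = output[node]
            if new_edge:
                delta[edge] = k
            color[node] = k
            if assign(node + 1):
                return True
            if new_out:
                del state_out[k]
            if new_edge:
                del delta[edge]
        color[node] = None
        return False

    return (state_out, delta) if assign(0) else None


def synthesize(traces, initial_output):
    parent, output = build_trace_tree(traces, initial_output)
    n = 1
    while True:
        solution = color_tree(parent, output, n)
        if solution is not None:
            return n, solution
        n += 1  # no solution for this N: N := N + 1


traces = [[("i1", "1"), ("i2", "2")],
          [("i1", "1"), ("i3", "2"), ("i2", "1")]]
print(synthesize(traces, initial_output="1"))
```

The loop always terminates, because giving every tree node its own state is a valid (if useless) solution once N reaches the number of nodes.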

SLIDE 18

Example

[Figure: example graph, animation frame; nodes labeled 1 and 2, edges labeled i1, i2, i3, i4]

SLIDE 19

Example

[Figure: example graph, animation frame; nodes labeled 1 and 2, edges labeled i1, i2, i3, i4]

SLIDE 20

Example

[Figure: example graph, animation frame; nodes labeled 1 and 2, edges labeled i1, i2, i3, i4]

SLIDE 21

Example

[Figure: example graph, animation frame; nodes labeled 1 and 2, edges labeled i1, i2, i3, i4]

SLIDE 22

Example

[Figure: example graph, animation frame; nodes labeled 1 and 2, edges labeled i1, i2, i3, i4]

SLIDE 23

Example

[Figure: example graph, animation frame; nodes labeled 1 and 2, edges labeled i1, i2, i3, i4]

SLIDE 24

Example

[Figure: example graph, animation frame; nodes labeled 1 and 2, edges labeled i1, i2, i3, i4]

SLIDE 25

Example

[Figure: example graph, animation frame; nodes labeled 1 and 2, edges labeled i1, i2, i3, i4]

SLIDE 26

Example

[Figure: example graph with nodes labeled 1 and 2 and edges labeled i1, i2, i3, i4, next to a second graph with nodes labeled 1, 1, 4 and edges labeled i1, i2]

SLIDE 27

Example: fail!

[Figure: example graph with nodes labeled 1, 2 and 4 and edges labeled i1, i2, i3, i4]

Difference in scan cycles of PLCs leads to inconsistent traces!
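One way to see such an inconsistency in logged data is to look for input prefixes that were observed with more than one output, which a deterministic controller started from the same state could not produce. A minimal sketch under that assumption; the trace values and the name `find_conflicts` are made up for illustration:

```python
def find_conflicts(traces):
    """Return {input prefix: outputs} for prefixes seen with several outputs."""
    seen = {}
    for trace in traces:
        prefix = ()
        for inp, out in trace:
            prefix += (inp,)
            seen.setdefault(prefix, set()).add(out)
    return {p: outs for p, outs in seen.items() if len(outs) > 1}


# Hypothetical logs: the second run sampled a different output after i1, i2,
# as could happen when the two PLCs are in different phases of their scan cycles.
traces = [
    [("i1", "1"), ("i2", "2"), ("i3", "2")],
    [("i1", "1"), ("i2", "4")],
]
conflicts = find_conflicts(traces)
print(conflicts)
```

An exact synthesis procedure has no solution for such data at any number of states, which motivates the error model introduced on the following slides.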

SLIDE 28

Challenges & approach

Challenges

  • 1. Traces contain errors due to the trace collection procedure

  • 2. We do not know the ground truth

Approach

  • 1. Account for errors in the SAT reduction
  • 2. Enumerate all possible solutions

SLIDE 29

Trace tree → Trace graph

SLIDE 30

Error model for trace graph

Add multi-edges on the interface between different outputs

  • Simple model; richer models are possible
    ○ Up to a fully connected graph in the worst case

SLIDE 31

Constraints...

SLIDE 32

Color graph nodes in N colors = map graph nodes to automaton states

SLIDE 33

Only one of the multi-edges may be used for each pair of nodes

[Figure: graph fragment with nodes labeled 1 and 2 and edges labeled i1, i2, i3, i4]
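The two constraints named so far, one color per graph node and at most one edge out of each group of alternative multi-edges, can be written as CNF clauses in the usual way. The sketch below is an assumption about how such an encoding might look, using the common pairwise at-most-one encoding; it omits the transition and output consistency constraints that a full reduction would need. The input format of `encode` is hypothetical, and `count_models` is a brute-force check suitable only for tiny formulas.

```python
from itertools import combinations, product


def at_most_one(variables):
    """Pairwise encoding: no two of the variables are true together."""
    return [[-a, -b] for a, b in combinations(variables, 2)]


def exactly_one(variables):
    """At least one variable is true, and no two are."""
    return [list(variables)] + at_most_one(variables)


def encode(num_nodes, num_colors, multi_edges):
    """multi_edges: list of groups, each listing the ids of alternative edges
    between one pair of nodes (hypothetical input format)."""
    var = {}

    def v(key):
        # Number variables 1, 2, 3, ... in order of first use (DIMACS style).
        return var.setdefault(key, len(var) + 1)

    clauses = []
    for node in range(num_nodes):
        clauses += exactly_one([v(("color", node, c)) for c in range(num_colors)])
    for group in multi_edges:
        clauses += at_most_one([v(("edge", e)) for e in group])
    return var, clauses


def count_models(num_vars, clauses):
    """Count satisfying assignments by enumeration (tiny formulas only)."""
    return sum(
        all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in clauses)
        for bits in product([False, True], repeat=num_vars)
    )


var, clauses = encode(num_nodes=2, num_colors=2, multi_edges=[["e1", "e2"]])
print(len(var), len(clauses), count_models(len(var), clauses))
```

Negative integers denote negated variables, so the clause lists could be written out in DIMACS format for a real solver.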

SLIDE 34

Find all solutions with different alternative edge choices

SLIDE 35

Find all solutions with different alternative edge choices

Still, exponential number of solutions!
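A standard way to enumerate solutions that differ in a chosen set of variables is to add a blocking clause after each solution and solve again. Whether the talk uses exactly this scheme is an assumption. In the sketch, `solve` is a brute-force stand-in for a SAT solver, and the `projection` argument names the variables that encode the edge choices; both names are illustrative.

```python
from itertools import product


def solve(num_vars, clauses):
    """Toy stand-in for a SAT solver: first satisfying assignment or None."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses):
            return bits
    return None


def enumerate_choices(num_vars, clauses, projection):
    """Yield each distinct assignment to the projection variables once."""
    clauses = list(clauses)
    while (model := solve(num_vars, clauses)) is not None:
        choice = {v: model[v - 1] for v in projection}
        yield choice
        # Blocking clause: at least one projected variable must change.
        clauses.append([-v if value else v for v, value in choice.items()])


# Variables 1 and 2: two alternative edges, exactly one is used.
# Variable 3: some other variable that the enumeration should ignore.
formula = [[1, 2], [-1, -2]]
choices = list(enumerate_choices(3, formula, projection=[1, 2]))
print(choices)
```

With k independent pairs of alternative edges this loop produces 2^k choices, which is the exponential growth noted on the slide.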

SLIDE 36

Coping with exponential number of solutions

  • Zakirzyanov et al. Efficient Symmetry Breaking for SAT-Based Minimum DFA Inference [LATA’19]
  • Minimize parameters of state machine
    ○ N – number of states
    ○ K – outgoing degree of states
    ○ R – number of transitions
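Many of the enumerated solutions are the same automaton with its states renumbered. The cited paper breaks this symmetry inside the SAT formula; the sketch below is not that encoding. It only illustrates the underlying idea as a post-processing step, by renumbering states in breadth-first order from the initial state so that isomorphic automata get identical descriptions. The function name and the dictionary format are illustrative.

```python
from collections import deque


def canonical_form(delta, initial=0):
    """Renumber states in BFS order, visiting inputs in sorted order.
    delta maps (state, input) -> next state."""
    inputs = sorted({inp for _state, inp in delta})
    order = {initial: 0}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for inp in inputs:
            nxt = delta.get((state, inp))
            if nxt is not None and nxt not in order:
                order[nxt] = len(order)
                queue.append(nxt)
    # Transitions from unreachable states are dropped.
    return frozenset(
        (order[s], inp, order[t]) for (s, inp), t in delta.items() if s in order
    )


# The same three-state machine with states 1 and 2 swapped.
a = {(0, "i1"): 1, (1, "i2"): 2, (2, "i1"): 0}
b = {(0, "i1"): 2, (2, "i2"): 1, (1, "i1"): 0}
print(canonical_form(a) == canonical_form(b))
```

Collecting canonical forms in a set removes up to (N − 1)! renumbered copies of each solution, since the initial state is fixed.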

SLIDE 37

Algorithm

SLIDE 38

Experiment with distribution station

  • 12 inputs, 8 outputs
  • Six logs for different use cases with varying complexity and length of runs
  • Algorithm found 63 different state machines that satisfy the traces with respect to the error model
  • Launch simulation of use cases in NxtStudio
  • Only one (!) state machine was truly correct

SLIDE 39

Generated state machine

SLIDE 40

Conclusion & Future work

  • Developed hardware and software architecture for data collection from PLC
  • Developed algorithm for reconstructing state machine from (noisy) PLC traces

Future work

  • Improve synthesis algorithm
  • Automate validation against legacy system (model)
  • Improve data collection, add time synchronization
  • Target distributed controller reconstruction
  • Move data storage and synthesis to the cloud

SLIDE 41

Thank you! Daniil Chivilikhin, chivdan@itmo.ru