Defending Distributed Cyber-Physical Systems with Bounded Time Recovery
Brian Sandler, Neeraj Gandhi, Linh Thi Xuan Phan, Andreas Haeberlen
NSF/Intel CPS PI Meeting, July 2018
Machines in Control
Vulnerable CPS can cause disaster.
BTR - NSF/Intel PI Meeting - July 2018
Bellingham, WA
Oil pipeline explosion after the two controlling computers failed.
We want to prevent disaster.
Iran
The Stuxnet worm destroyed centrifuges used for nuclear enrichment.
Ivano-Frankivsk, Ukraine
Power grid control systems were compromised, leaving residents in the dark.
Crashes
Byzantine Faults
Non-Crash Bugs Hacking
Let’s take a simple example system…
[Diagram: four nodes N1, N2, N3, N4 with attached devices S1, S2, A1, A2, A3, A4]
This system will run four applications.
[Diagram: the same system, with numbered tasks 1-8 placed on the nodes]
We’ll focus on the burner control application…
What can go wrong?
N4 can send an incorrect value to A1 and light the building on fire.
N4 can drop or delay messages and ruin the chemical processing.
Benefits
gains control?
that resist quick changes
perfect
We can leverage this!
N4
A time period usually exists where faulty behavior is OK, so long as the system returns to its correct behavior within that period.
DC/DC converters (STM)               20 μs
Direct torque control (ABB)          25 μs
AC/DC converters                     50 μs
Electronic throttle control (Ford)   5 ms
Traction control (Ford)              20 ms
Micro-scale race cars                40 ms
Autonomous vehicle steering          50 ms
Energy-efficient building control    500 ms
Source: M. Morari. Fast model predictive control (mpc).
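As a rough illustration of the idea, the figures listed above can be read as the period each system tolerates, and a recovery mechanism is acceptable only if its worst-case recovery time fits inside that period. This is a minimal sketch in Python; the function name `recovers_in_time` and the recovery times passed to it are hypothetical, and reading the table as tolerance windows is an assumption.

```python
# Periods from the table above, converted to seconds.
# Treating them as tolerance windows is an assumption of this sketch.
TOLERANCE_S = {
    "DC/DC converters (STM)": 20e-6,
    "Direct torque control (ABB)": 25e-6,
    "AC/DC converters": 50e-6,
    "Electronic throttle control (Ford)": 5e-3,
    "Traction control (Ford)": 20e-3,
    "Micro-scale race cars": 40e-3,
    "Autonomous vehicle steering": 50e-3,
    "Energy-efficient building control": 500e-3,
}


def recovers_in_time(application: str, worst_case_recovery_s: float) -> bool:
    """Return True if the worst-case recovery time fits in the tolerated period."""
    return worst_case_recovery_s <= TOLERANCE_S[application]


if __name__ == "__main__":
    # Hypothetical 10 ms worst-case recovery time.
    for name in TOLERANCE_S:
        print(name, recovers_in_time(name, 10e-3))
```

With a hypothetical 10 ms recovery bound, the slower control loops (traction control and below in the table) pass the check, while the microsecond-scale converters do not.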
short period of time, so that the end goal will be met
[Timeline: Correct Operation → Fault → Recovery Period → Recovered → Correct Operation]
N2 fails
N1: N3: N4:
N1 N2 N4 N3 N1 N2 N4 N3
Evidence
N4 is faulty.
Nodes watch over each other to detect faults.
[Diagram: the example system, with SEND… RECV… message logs shown as evidence]
N4 is faulty
Flood evidence throughout the system.
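Flooding can be sketched as a breadth-first traversal in which every correct node forwards the evidence to its neighbors. The sketch below is illustrative only: the link layout in `LINKS` is hypothetical (the slides do not say which nodes are connected), and it assumes the faulty node forwards nothing.

```python
from collections import deque

# Hypothetical link layout; the slides do not specify the topology.
LINKS = {
    "N1": ["N2", "N3"],
    "N2": ["N1", "N4"],
    "N3": ["N1", "N4"],
    "N4": ["N2", "N3"],
}


def flood(origin, faulty, links=LINKS):
    """Return {node: hops} for each correct node the evidence reaches.

    Assumption: faulty nodes never forward evidence, so they are skipped.
    """
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for peer in links[node]:
            if peer in faulty or peer in hops:
                continue
            hops[peer] = hops[node] + 1
            queue.append(peer)
    return hops


if __name__ == "__main__":
    # N2 detects that N4 is faulty and starts the flood.
    print(flood("N2", {"N4"}))
```

If each hop has a bounded delay, the largest hop count bounds how long it takes for every correct node to learn of the fault.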
Each node independently transitions to a new mode
[Diagram: each node's status label changes from "All nodes OK" to "N4 is faulty"]
For every* mode, we have a precomputed schedule and plan for every node.
* Can limit the number of faults to improve computation time.
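One way to picture the precomputed plans is a table keyed by the set of faulty nodes, built offline and consulted at run time. The sketch below is not the authors' planner: `placeholder_plan` is a hypothetical stand-in that spreads eight numbered tasks round-robin over the healthy nodes, and `max_faults` plays the role of the fault limit mentioned in the footnote.

```python
from itertools import combinations

NODES = ("N1", "N2", "N3", "N4")


def enumerate_modes(nodes, max_faults):
    """Every set of at most max_faults faulty nodes (the empty set is 'no faults')."""
    return [
        frozenset(combo)
        for k in range(max_faults + 1)
        for combo in combinations(nodes, k)
    ]


def placeholder_plan(nodes, faulty):
    """Hypothetical planner: assign tasks 1-8 round-robin to the healthy nodes."""
    healthy = [n for n in nodes if n not in faulty]
    plan = {n: [] for n in healthy}
    for task in range(1, 9):
        plan[healthy[(task - 1) % len(healthy)]].append(task)
    return plan


# Built offline, once, for every mode up to the fault limit.
MODE_TABLE = {
    mode: placeholder_plan(NODES, mode)
    for mode in enumerate_modes(NODES, max_faults=2)
}


def plan_for(known_faulty):
    """Run-time lookup: no planning happens while recovering from a fault."""
    return MODE_TABLE[frozenset(known_faulty)]


if __name__ == "__main__":
    print(len(MODE_TABLE), "modes precomputed")
    print(plan_for({"N4"}))
```

Limiting the number of faults keeps the table small: with four nodes and at most two faults there are 1 + 4 + 6 = 11 modes instead of all 16 subsets.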
Example modes: No Faults, Node 1 Faulty, Link 1-2 Faulty, Nodes 1&4 Faulty, …
Omission Faults
from a neighbor is not received
mode.
Commission Faults
misbehavior.
Challenge: Bounding Time of Detection
RECV… SEND… RECV…
2 4
Audit/Witne Task (runs a replica
4 2 4 2 N1 N2
I declare link N1 – N be fault
RECV… SEND… RECV… RECV… SEND… RECV…
We need a solution where…
state of the system
communicate
Strawman: flood the system periodically with signed attestations of the current mode.
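A signed mode attestation could look roughly like the sketch below. It uses an HMAC from the Python standard library as a stand-in for a real digital signature, which is a simplification: an HMAC needs a shared secret key and does not provide the non-repudiation a signature would. The field names and the functions `attest` and `verify` are hypothetical.

```python
import hashlib
import hmac


def attest(node: str, mode: str, round_no: int, key: bytes) -> dict:
    """Build an attestation of the node's current mode, tagged with an HMAC."""
    message = f"{node}|{mode}|{round_no}".encode()
    tag = hmac.new(key, message, hashlib.sha256).hexdigest()
    return {"node": node, "mode": mode, "round": round_no, "tag": tag}


def verify(attestation: dict, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    message = (
        f"{attestation['node']}|{attestation['mode']}|{attestation['round']}"
    ).encode()
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, attestation["tag"])


if __name__ == "__main__":
    key = b"hypothetical-shared-key"
    att = attest("N1", "N4 is faulty", round_no=7, key=key)
    print(verify(att, key))          # untampered attestation
    att["mode"] = "All nodes OK"     # a tampered copy should fail verification
    print(verify(att, key))
```

Including a round number in the tagged message lets receivers reject stale attestations that are replayed in a later period.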
bounded period of time.
[Diagram: successive failures and the resulting modes. N2 fails → "N2 Faulty"; N1 fails → "N1 & N2 Faulty"; N4 fails → "N1,N2,N4 Faulty"]
to their neighbors
nodes.
nodes, f.
for the lifetime of the system.
parallelizable.
f = # of faulty nodes protected against
Unprotected System, N2 Compromised
Protected System, N2 Compromised
Recovery Period
Protected System, N1, N2, N3 Compromised
Key Idea: Period of Imperfection
Many CPS can tolerate a short period of faulty behavior.
Approach: Bounded Time Recovery
Bounded time recovery guarantees that the system quickly returns to correct behavior after a fault.
Solution: REBOUND
Algorithms and protocols to provide BTR