Designing fault-diagnosis and reintegration to prevent node redundancy attrition in highly reliable control systems based on FTT-Ethernet

  1. Designing fault-diagnosis and reintegration to prevent node redundancy attrition in highly reliable control systems based on FTT-Ethernet
     Sinisa Derasevic, Manuel Barranco, Julián Proenza
     Mathematics and Computer Science Department, University of the Balearic Islands (UIB), Spain

  2. diagnosis and reintegration of faulty nodes in highly reliable Distributed Control Systems based on FTT-Ethernet
     [Diagram: nodes 1, 2, 3, ..., M connected to a switch]

  3. diagnosis and reintegration of faulty nodes in highly reliable Distributed Control Systems based on FTT-Ethernet: a relevant piece of FT4FTT
     [Diagram: nodes 1, 2, 3, ..., M connected to a switch]

  4. • high reliability by tolerating faults at
       o switch → duplicate
       o links → duplicate
       o nodes
     [Diagram: nodes 1, 2, 3, ..., M connected to a leader switch and a follower switch]

  5. • high reliability by tolerating faults at
       o switch → duplicate
       o links → duplicate
       o nodes → actively replicate critical nodes & vote
     [Diagram: nodes 1, 2, 3, ..., M connected to a leader switch and a follower switch]

  6. which are the critical nodes?

  7. which are the critical nodes?
     [Diagram: a plant with sensor (S) and actuation (A) nodes, a controller node (C), and further nodes up to M]

  8. which are the critical nodes?
     in principle, all these nodes can be considered critical
     [Diagram: plant with sensor (S), actuation (A), and controller (C) nodes; label: system failure]

  9. which are the critical nodes?
     replicating sensor and actuation nodes is trivial
     [Diagram: plant with replicated sensor (S) and actuation (A) nodes and a controller node (C)]

  10. which are the critical nodes?
     replicating a controller node is complex: the replicas must coordinate among themselves
     [Diagram: plant with sensor(s) and actuator(s); controller replicas 1, 2, ..., N coordinate among themselves]

  11. how do replicas coordinate?
     • synchronize at communication & app. levels
       o using the Trigger Message (TM)
     • vote on intermediate results
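
As a rough illustration of TM-based synchronization: in FTT protocols the master periodically broadcasts the Trigger Message to start each elementary cycle, so replicas can align on it instead of on their local clocks. The sketch below assumes a hypothetical `TriggerMessage` carrying only a cycle number and a hypothetical mapping of one application phase per cycle; neither detail comes from the slides.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical message layout: the slides do not describe the TM contents.
@dataclass(frozen=True)
class TriggerMessage:
    cycle: int

# Assumed for illustration: one application phase per elementary cycle.
PHASES = ("sense", "control", "actuate")

class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.cycle: Optional[int] = None
        self.phase: Optional[str] = None

    def on_trigger_message(self, tm: TriggerMessage) -> None:
        # Derive both the cycle and the application phase from the TM,
        # so every replica that receives it reaches the same view.
        self.cycle = tm.cycle
        self.phase = PHASES[tm.cycle % len(PHASES)]

replicas = [Replica(f"replica{i}") for i in (1, 2, 3)]

# The master broadcasts one TM per cycle; every replica receives it.
for cycle in range(4):
    tm = TriggerMessage(cycle)
    for replica in replicas:
        replica.on_trigger_message(tm)

print([(r.name, r.cycle, r.phase) for r in replicas])
```

Because every replica derives its phase from the same broadcast, the replicas stay aligned at both the communication level (cycle) and the application level (phase) as long as each one receives the TM.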

  13. voting
     app control cycle: sense, control, actuate
     [Diagram: replicas 1, 2, 3 connected to a leader switch and a follower switch]

  14. voting
     app control cycle: sense, control, actuate
     acquire sensors: replicas 1, 2, 3 hold A, A, B
     [Diagram: replicas 1, 2, 3 connected to a leader switch and a follower switch]

  15. voting
     acquire sensors: replicas 1, 2, 3 hold A, A, B
     exchange sensors: A, A, B
     [Diagram: replicas 1, 2, 3 connected to a leader switch and a follower switch]

  16. voting
     acquire sensors: replicas 1, 2, 3 hold A, A, B
     exchange sensors: A, A, B
     vote on sensors: each replica votes on (A, A, B)
     [Diagram: replicas 1, 2, 3 connected to a leader switch and a follower switch]

  17. voting
     acquire sensors: replicas 1, 2, 3 hold A, A, B
     exchange sensors: A, A, B
     vote on sensors: each replica votes on (A, A, B)
     consensus: A, A, A
     [Diagram: replicas 1, 2, 3 connected to a leader switch and a follower switch]

  18. voting
     app control cycle: sense, control, actuate
     consensus: A, A, A
     [Diagram: replicas 1, 2, 3 connected to a leader switch and a follower switch]
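
The acquire, exchange, and vote sequence above can be sketched in a few lines. This is a minimal sketch that assumes a strict-majority vote; the slides do not state which voting function the replicas use, and the name `majority_vote` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence

def majority_vote(values: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value held by a strict majority of replicas, else None."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else None

# Acquire: each replica reads its sensors; replica 3 obtains a deviating "B".
acquired = {"replica1": "A", "replica2": "A", "replica3": "B"}

# Exchange: afterwards every replica holds the same vector of readings.
exchanged = list(acquired.values())

# Vote: each replica votes locally on the same vector and reaches the same result.
voted = {name: majority_vote(exchanged) for name in acquired}
print(voted)
```

Since all replicas vote on the same exchanged vector, they all continue the control cycle with the same value, which is the consensus shown on the slides.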

  19. benefits of active node replication with voting?

  20. compensate errors ✔
     the system can correctly deliver its service
     [Diagram: replicas 1, 2, 3 with an error (e), connected to a leader switch and a follower switch]

  21. replicas may recover from errors ✔
     replica 3 recovers and keeps participating if replica 3 can vote
     [Diagram: replicas 1, 2, 3 with a temporary error (e), connected to a leader switch and a follower switch]
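
One way to picture this recovery is a replica that overwrites its local state with the voted value at the end of every cycle. The sketch below is an assumption-laden illustration: the `Replica` class, its integer state, and the "increment by one" control computation are hypothetical stand-ins, not the controller described in the talk.

```python
from collections import Counter

def vote(values):
    """Strict-majority vote; returns None when no majority exists."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else None

class Replica:
    def __init__(self, name):
        self.name = name
        self.state = 0              # replicated controller state

    def step(self):
        self.state += 1             # hypothetical control computation
        return self.state

    def adopt(self, voted):
        if voted is not None:
            self.state = voted      # recovery: overwrite local state with the voted value

def control_cycle(replicas):
    proposals = [r.step() for r in replicas]
    voted = vote(proposals)
    for r in replicas:
        r.adopt(voted)
    return voted

replicas = [Replica(f"replica{i}") for i in (1, 2, 3)]
control_cycle(replicas)             # fault-free cycle: all replicas agree on 1
replicas[2].state = 99              # a temporary fault corrupts replica 3
voted = control_cycle(replicas)     # proposals are 2, 2, 100
print(voted, [r.state for r in replicas])
```

The error is compensated because replicas 1 and 2 outvote the corrupted proposal, and replica 3 recovers because it was still able to take part in the vote and adopt its result.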

  22. however …

  23. what if a temporary fault causes a replica to be lost from then on?
     [Diagram: replicas 1, 2, 3 connected to a leader switch and a follower switch]

  24. what if a temporary fault causes a replica to be lost from then on?
     a temporary fault affects replica 3's internals or communication capabilities
     [Diagram: replicas 1, 2, 3 connected to a leader switch and a follower switch]

  25. what if a temporary fault causes a replica to be lost from then on?
     a temporary fault affects replica 3's internals or communication capabilities
     replica 3 may desynchronize at the level of application and/or communication
     [Diagram: replicas 1, 2, 3 connected to a leader switch and a follower switch]

  26. what if a temporary fault causes a replica to be lost from then on?
     a temporary fault affects replica 3's internals or communication capabilities
     replica 3 may desynchronize at the level of application and/or communication
     replica 3: "I cannot recover!"
     [Diagram: replicas 1, 2, 3 connected to a leader switch and a follower switch]

  27. node redundancy attrition
     replica 3 is not permanently faulty, but cannot be used!
     replica 3: "I cannot recover!"
     [Diagram: replicas 1, 2, 3 connected to a leader switch and a follower switch; replica 3 is crossed out]

  28. temporary faults are more probable than permanent ones

  29. if we do not prevent redundancy attrition caused by temporary faults

  30. then we do not take full advantage of the redundancy investment

  31. objective: prevent node redundancy attrition

  32. objective: identify and implement mechanisms to diagnose and reintegrate temporarily faulty nodes that are lost
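
To make the objective concrete, here is a purely illustrative sketch of what diagnosis and reintegration could look like. It is not the authors' mechanism, whose design is only outlined in the slides. Every name (`ReplicaStatus`, `MISS_THRESHOLD`, `diagnose`, `reintegrate`) and the "missed votes" criterion are assumptions made for the example.

```python
from dataclasses import dataclass

MISS_THRESHOLD = 3  # hypothetical: consecutive missed votes before a replica counts as lost

@dataclass
class ReplicaStatus:
    name: str
    cycle: int = 0          # replica's local view of the control cycle
    state: int = 0          # replica's local controller state
    missed_votes: int = 0
    lost: bool = False

def diagnose(replica: ReplicaStatus, participated: bool) -> None:
    """Track participation in votes and flag a replica that stays silent."""
    if participated:
        replica.missed_votes = 0
        return
    replica.missed_votes += 1
    if replica.missed_votes >= MISS_THRESHOLD:
        replica.lost = True

def reintegrate(replica: ReplicaStatus, reference_cycle: int, voted_state: int) -> None:
    """Resynchronize a lost replica at communication and application level."""
    replica.cycle = reference_cycle   # realign with the current cycle
    replica.state = voted_state       # restore the state agreed by the healthy replicas
    replica.missed_votes = 0
    replica.lost = False

r3 = ReplicaStatus("replica3")
for _ in range(3):                    # a temporary fault keeps replica 3 out of three votes
    diagnose(r3, participated=False)
print(r3.lost)                        # diagnosed as lost

reintegrate(r3, reference_cycle=42, voted_state=7)
print(r3)
```

Without a step like `reintegrate`, the replica would stay flagged as lost even after the temporary fault disappears, which is the attrition the objective aims to prevent.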

  33. steps
     • classify faults
     • exhaustively analyze how they can affect a replica
     • design needed mechanisms
     • implement and test them

  34. steps
     • classify faults
     • exhaustively analyze how they can affect a replica
     • design needed mechanisms
     • implement and test them → pending

  35. we plan to quantify the reliability improvement
