
Verifying Safety of Fault-Tolerant Distributed Components


  1. Verifying Safety of Fault-Tolerant Distributed Components
     R. Ameur-Boulifa (1), R. Halalai (2), L. Henrio (2), E. Madelaine (2)
     (1) Telecom-ParisTech, Sophia-Antipolis
     (2) Oasis team, INRIA -- CNRS - I3S -- Univ. of Nice Sophia-Antipolis
     Sept. 2011

  2. Motivations
     Programming asynchronous (component-based) applications is difficult; we must provide tools for analysing and debugging complex behaviours.
     We want to provide a full behavioural semantics for Fractal/GCM components, including their advanced features: asynchronous request queues, future proxies, and multicast interfaces.
     “Compositional model-checking can scale very far.” How far?

  3. Byzantine Fault Tolerant Systems
     • Byzantine hypothesis:
       – “bad” guys can have any possible behaviour,
       – everybody can turn bad, but only up to a fixed % of the processes.

  4. Byzantine Fault Tolerant Systems
     • The correctness of BFT systems is difficult to prove [see bibs in the paper], but it is important in the context of large distributed infrastructures (e.g. P2P networks).
     • High complexity because of the behaviour of faulty processes and asynchronous group communication.
     • Several advanced features of the GCM component model are involved.

  5. Challenges
     • Scaling up: are finite-state models able to tame complex, hierarchical, distributed systems?
       – Compositionality: hierarchical semantic model for hierarchical components
       – Bisimulations; context-dependent minimization
     • Combining reduction techniques:
       – Data abstraction + compositionality + distributed MC

  6. Agenda
     • Use-case
     • Formalisms and Semantics
     • Use-case: state-space generation and model-checking
     • Conclusion and Perspectives

  7. Use-case modeling in GCM
     [Figure: Fig 3 from paper. Note: 3f+1 slaves]
     - 1 composite component with 2 external services, Read and Write.
     - The service requests are delegated to the Master.
     - 1 multicast interface sending write/read/commit requests to all slaves.
     - The slaves reply asynchronously; the master only needs 2f+1 coherent answers to terminate.

  8. Simplification hypotheses
     1. The master is reliable: this simplifies the 3-phase commit protocol and avoids the consensus search phase.
     2. The underlying middleware ensures safe communications: faulty components only respond to their own requests, and communication order is preserved.
     3. To tolerate f faults we use 3f+1 slaves and require 2f+1 agreeing answers, as in the usual BFT algorithms.
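The third hypothesis can be illustrated with a minimal Python sketch. The function names `quorum_sizes` and `agreed_value` are hypothetical and do not come from the presentation; the sketch only shows the 3f+1 / 2f+1 arithmetic and a simple agreement check on the collected answers, not the authors' protocol.

```python
from collections import Counter

def quorum_sizes(f):
    """Return (number of slaves, agreeing answers required) to tolerate f faults."""
    return 3 * f + 1, 2 * f + 1

def agreed_value(answers, f):
    """Return the value carried by at least 2f+1 answers, or None if there is none."""
    _, needed = quorum_sizes(f)
    if not answers:
        return None
    # most_common(1) gives the most frequent answer and its number of occurrences
    value, count = Counter(answers).most_common(1)[0]
    return value if count >= needed else None

# With f = 1: 4 slaves, 3 agreeing answers required (illustrative values only).
print(quorum_sizes(1))
print(agreed_value([7, 7, 7, 0], 1))
print(agreed_value([7, 7, 0, 0], 1))
```

With f = 1, a single faulty slave answering 0 does not prevent the three good slaves from reaching the quorum; with two disagreeing answers no value qualifies.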

  9. Properties
     Reachability (*):
       1- The Read service can terminate:
          ∀ fid:nat among {0...2}. ∃ b:bool. <true* . {!R_Read !fid !b}> true
       2- Is the BFT hypothesis respected by the model?
          <true* . 'Error (NotBFT)'> true
     Termination: after receiving a Q_Write(f,x) request, it is (fairly) inevitable that the Write service terminates with an R_Write(f) answer, or an Error is raised.
     Functional correctness: after receiving a ?Q_Write(f1,x), and before the next ?Q_Write, a ?Q_Read request raises a !R_Read(y) response, with y=x.
     (*) Model Checking Language (MCL), Mateescu et al., FM’08
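A reachability formula of the shape `<true* . a> true` asks whether some path from the initial state eventually performs an action matching `a`. The following Python sketch checks this by breadth-first search on an explicit LTS. It is not how MCL formulas are evaluated in the authors' toolchain; the toy LTS, the string encoding of labels, and the name `satisfies_diamond` are assumptions made for illustration.

```python
from collections import deque

def satisfies_diamond(lts, init, is_target):
    """Check <true* . a> true: is a transition whose label satisfies
    is_target reachable from init?  lts maps state -> [(label, next_state)]."""
    seen = {init}
    todo = deque([init])
    while todo:
        state = todo.popleft()
        for label, nxt in lts.get(state, []):
            if is_target(label):
                return True
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return False

# Hypothetical two-state LTS: a read request followed by its response.
toy = {
    0: [("?Q_Read(0)", 1)],
    1: [("!R_Read(0,true)", 0)],
}

read_can_terminate = satisfies_diamond(toy, 0, lambda a: a.startswith("!R_Read"))
error_reachable = satisfies_diamond(toy, 0, lambda a: a.startswith("Error"))
```

On this toy system the first check succeeds and the second fails, mirroring the two reachability questions of the slide.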

  10. Agenda
      • Use-case
      • Formalisms and Semantics
      • State-space generation and model-checking
      • Conclusion and Perspectives

  11. Semantic Formalism: the pNet model [Annals of Telecoms 2008]
      • LTS with explicit data handling (value-passing) with 1st order types
      • Parallelism and hierarchy using extended synchronization vectors, with parameterized topology.
      Compromise:
      • Flexible: accommodates a wide choice of communication / synchronization mechanisms
      • Open to convenient “abstractions” towards specific classes of decidable models (finite, regular, etc.)

  12. Full picture

  13. Building pNets (1): parameterized LTSs
      Labelled transition systems, with:
      • Value passing
      • Local variables
      • Guards…

  14. Building pNets (2): generalized parallel operator
      [Figure: BFT-Net, composed of Master and Slave[k]; actions shown: Write(x), Read(), Q_Write(f,x), R_Write(), Q_Commit(f), R_Commit(), Q_Read(), R_Read()]
      BFT-Net: < Master, Slave_1, …, Slave_n >, k ∈ [1:n]
      with synchronisation vectors:
        <?Write(x), -, …, -> => ?Write(x)
        <!Q_Write(f,x), ?Q_Write(f,x), …, ?Q_Write(f,x)> => Q_Write(f,x)
        ∀k: <?R_Write(f,k), -, …, !R_Write(f), …, -> => R_Write(f,k)
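The role of synchronisation vectors can be sketched in Python as follows. This is a simplified illustration, not the pNet semantics itself: action parameters (f, x, k) are dropped, `None` stands for the idle entry "-", and the function name `product_successors` and the toy Master/Slave LTSs are assumptions.

```python
import itertools

def product_successors(components, vectors, state):
    """Global transitions enabled in `state` (a tuple of local states).
    components: list of LTSs, each a dict local_state -> [(action, next_state)].
    vectors: list of (per-component action or None, global label)."""
    result = []
    for local_actions, label in vectors:
        options = []
        for lts, local, action in zip(components, state, local_actions):
            if action is None:
                options.append([local])  # this component does not move
            else:
                options.append([t for a, t in lts.get(local, []) if a == action])
        # A vector fires only if every non-idle component can perform its action.
        for combo in itertools.product(*options):
            result.append((label, combo))
    return result

# Hypothetical Master with two Slaves.
master = {0: [("?Write", 1)], 1: [("!Q_Write", 2)]}
slave = {0: [("?Q_Write", 1)]}
components = [master, slave, slave]
vectors = [
    (("?Write", None, None), "?Write"),
    (("!Q_Write", "?Q_Write", "?Q_Write"), "Q_Write"),
]

step1 = product_successors(components, vectors, (0, 0, 0))
step2 = product_successors(components, vectors, (1, 0, 0))
```

In the initial state only the first vector applies (the Master accepts a Write request alone); once the Master has moved, the second vector makes the Master and both Slaves advance together in a single multicast step.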

  15. Building pNet models (3)
      Proxies for asynchronous group requests manage the return of results, with flexible policies:
      - Vector of results
      - First N results
      - Individual results
      [Figure: Group Proxy [c] with Collate (CO) and Broadcast (BC) processes and the Body; actions shown: ?Q_m(d), !Q_m(d), Q_m(d), R_m(v), ?R_m(x), !waitN_m(n,x), !get_m(x), !get_m(i,x), ?get_m(x), with i ∈ D]

  16. Agenda
      • Challenges
      • Formalisms and Semantics
      • Use-case: state-space generation and model-checking
      • Conclusion and Perspectives

  17. Tool Architecture
      Goal: fully automatic chain.
      Current state of the platform: production of the CADP input formats is only partially (~50%) available.

  18. Generation of state-space
      Taming state-space explosion:
      - Data abstraction (through abstract interpretation): integers => small intervals; arrays => open question.
      - Partitioning and minimizing by bisimulation + context specification.
      - Distributed verification: only partially available (state-space generation, but no model checking). 3 Tbytes of RAM?
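The "integers => small intervals" abstraction can be pictured with a short Python sketch. The bounds `[0, 10]` and the name `abstract_int` are hypothetical; the presentation does not say which intervals were used.

```python
import bisect

def abstract_int(n, bounds):
    """Map integer n to the index of its interval.
    For sorted bounds [b0, ..., bk] the intervals are
    (-inf, b0), [b0, b1), ..., [bk, +inf)."""
    return bisect.bisect_right(bounds, n)

# Hypothetical bounds: three abstract values (negative, 0..9, 10 and above).
bounds = [0, 10]
abstract_values = {abstract_int(n, bounds) for n in range(-1000, 1000)}
```

Here 2000 concrete integers collapse to 3 abstract values, which is the kind of reduction that keeps the data part of each LTS finite and small.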

  19. State-space generation workflow
      [Figure: workflow for the Master. Inputs WriteProxy.fcr, MQueue.fcr, MBody.fcr, … (Write Proxy, Master Queue, Master Body) -> Flac + Distributor + Minimize -> WriteProxy.bcg, MQueue.bcg, … -> Build product Master.exp (Hide/Rename) -> Minimize -> …]
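The "Minimize" steps of the workflow reduce each intermediate LTS modulo a bisimulation before building the next product. As a rough illustration only, the Python sketch below computes strong bisimulation classes by naive partition refinement; it is not the algorithm used by CADP, and the function name and toy LTS are assumptions.

```python
def bisimulation_classes(states, trans):
    """Naive partition refinement for strong bisimulation.
    states: list of states; trans: set of (source, label, target).
    Returns a dict state -> block id; equal ids mean bisimilar states."""
    block = {s: 0 for s in states}
    while True:
        # Signature of a state: the (label, target block) pairs it can reach.
        sig = {s: frozenset((a, block[t]) for (p, a, t) in trans if p == s)
               for s in states}
        keys = {}
        new_block = {}
        for s in states:
            # Split each block according to the signatures of its members.
            new_block[s] = keys.setdefault((block[s], sig[s]), len(keys))
        if len(keys) == len(set(block.values())):
            return new_block  # no block was split: the partition is stable
        block = new_block

# Hypothetical LTS: states 1 and 2 behave identically.
states = [0, 1, 2, 3]
trans = {(0, "a", 1), (0, "a", 2), (1, "b", 3), (2, "b", 3)}
classes = bisimulation_classes(states, trans)
```

States 1 and 2 end up in the same block, so the minimized LTS has 3 states instead of 4; applied to every component before composition, this is what keeps the intermediate products tractable.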

  20. Model generation workflow
      [Figure: component hierarchy annotated with the number of cores used and the number of (minimized) states for each part: Read/Write/Commit proxies, Queue, Body, Master, Client, Good Slaves, Bad Slaves, and the complete BFT system (34 677 states)]
      Altogether: ~60 cores, 300 GB RAM, 1 hour clock time.

  21. Distributed State generation
      Abstract model: f=1 (=> 4 slaves), |data|=2, |proxies|=3*3, |client requests|=3, Master queue size = 2.
      ~100 cores, max 300 GB RAM.
      System parts sizes (states/transitions):
        Queue: 237/3189
        Master: 524/3107
        Largest intermediate: 5M/103M
        Good Slave: 5936/61K
        Global: 34K/164K
        Time: 59’
      Estimated brute force state spaces: 10^18, 6.10^3, ~10^32

  22. Conclusions
      [Figure: timeline of previous work]
      - pNets: 2004-2008
      - FACS’08: 1st class futures
      - CoCoME: hier. & synch.
      - WCSI’10: group communication
      - This paper: GCM & multicast interfaces
      - More to come? Reconfiguration...
      Contributions:
      - Semantics of GCM components with multicast interfaces.
      - Scaling-up: gained 2 orders of magnitude by a combination of data abstraction, compositional and contextual minimization, and distributed state-space generation.
      - Verification of the correctness of a simple BFT application.

  23. Ongoing and Future Work
      1. Tooling
      2. Verifying dynamic distributed systems (GCM + Reconfiguration):
         – handle Life-cycle and Binding Controllers,
         – encode sub-component updates,
         – several orders of magnitude bigger.
      3. Support for distributed MC:
         – scripting languages,
         – partitioning strategies.

  24. Open Questions
      1. More on data abstraction:
         – symmetry in useful data structures (intervals, arrays, …).
      2. Context constraints:
         – ad-hoc correctness proofs (e.g. through proof obligations),
         – links with assume-guarantee approaches, with behavioural typing.
      3. Tooling:
         – assisted definition of (valid) abstractions,
         – assisted definition of MC partitioning and strategies.

  25. Thank you 谢谢 Takk Mulțumesc mult
      Papers, Use-cases, and Tools at: http://www-sop.inria.fr/oasis/Vercors
      Partially funded by ANR Blanc with Tsinghua Univ., Beijing.

  26. Active Objects (very short…)
      - Runnable (mono-threaded) objects
      - Communicating by remote method call
      - Asynchronous computation
      - Request queues (user-definable policy)
      - No shared memory
      - Futures
      [Figure: client object A calling server object B]

  27. Fractal hierarchical model: composites encapsulate primitives, which encapsulate code
      • Provided/Required Interfaces
      • Hierarchy
      • Separation of concern: functional / non-functional
      • ADL
      • Extensible
      [Figure: Controller / membrane with Attribute Controller, Lifecycle Controller, Binding Controller and Content Controller, surrounding the Content]
