experimental performability evaluation of middleware for
play

Experimental Performability Evaluation of Middleware for Large-Scale - PowerPoint PPT Presentation

Experimental Performability Evaluation of Middleware for Large-Scale Distributed Systems L. Soares J. Pereira losoares@di.uminho.pt jop@di.uminho.pt Distributed Systems Group Informatics Department University of Minho PORTUGAL L. Soares


  1. Experimental Performability Evaluation of Middleware for Large-Scale Distributed Systems L. Soares J. Pereira losoares@di.uminho.pt jop@di.uminho.pt Distributed Systems Group Informatics Department University of Minho PORTUGAL L. Soares and J. Pereira PMCCS’05 - September 24

  2. Context Database Replication providing high availabitiy and high performance. Replication protocols based on group communication services, for example: DBSM (Database State Machine). Optimistic Execution. Atomic Broadcast. Deterministic Certification Process. We study both wide area and cluster replication. L. Soares and J. Pereira PMCCS’05 - September 24

  3. Problem Statement Goals Incremental development and testing repeatability. Keep control of testing variables (specially hard in WAN networks). Problem Statement How to perform realistic tests under self-contained and highly controllable environments? How to setup and support a cost-effective test environment with few resources? Existing testing and validation approaches do not meet all the requirements. L. Soares and J. Pereira PMCCS’05 - September 24

  4. Existing Approaches DART and PlanetLab A collection of machines distributed over the globe, in which applications run across all (or some) of the machines. Full implementations required. FAUMachine High realistic virtualization software, which allows fault injection. Small scale tests and full implementations required. L. Soares and J. Pereira PMCCS’05 - September 24

  5. Existing Approaches Unit Testing Designed to evaluate the correctness of a particular unit of software. Limited when analyzing performance and dependability of large scale distributed middleware. CESIUM Presents the notion of centralized simulation, in which distributed processes execute under the same address space. It is nearly want we wanted. L. Soares and J. Pereira PMCCS’05 - September 24

  6. Discrete Event Simulation and Real Execution Provides deterministic execution and repeatability which is most valuable for debugging and tuning. Reuse validated simulation models mimicking the real components. Embed real implementations into the simulation environment. Freedom to choose testing and evaluation scenarios. Reduced probe effect in observations. Demands less resources than a real system. L. Soares and J. Pereira PMCCS’05 - September 24

  7. � � � � � � � � � � � Simulation Clock vs Real-Time Clock - Issues A simulation keeps virtual timelines that are explicitly advanced only by scheduling events When running real code, real time must modify the simulation clock An example: A request message is transmitted over a simulated network with delay simulation time δ 1 . δ 1 � • δ 2 =? � Some real code is run at the server real time • � � � � � to handle the message and is ∆ 1 δ 3 � • � • measured as taking △ 1 . � � δ 2 =∆ 1 A reply message is transmitted back with a delay δ 3 . L. Soares and J. Pereira PMCCS’05 - September 24

  8. Simulation Clock vs Real-Time Clock - Desired Features The timing problem gets solved if the simulation kernel was capable of: Accounting time spent in real execution (precision issues). Scheduling events from the real execution into the simulation run-time, and making them show up at the correct instant in virtual time. Embedding real components into simulation framework without changes to the component source code. L. Soares and J. Pereira PMCCS’05 - September 24

  9. Simulation Kernel With Real-Time Concerns - Contribution Augmented a popular open-standard simulation kernel, Scalable Simulation Framework (SSF), with real-time extensions: Interface enabling transparent accounting of time spent on execution of real code. Proxies enabling the interaction between unmodified implementations and simulation models. Extending SSF kernel enables the usage of libraries of simulation models already available for the original one. L. Soares and J. Pereira PMCCS’05 - September 24

  10. Simulation Kernel With Real-Time Concerns - Contribution Figure: Simulation Calls Real Figure: Real Calls Simulation L. Soares and J. Pereira PMCCS’05 - September 24

  11. Database Replication Evaluation Fault Tolerante Scalable Distributed Databases (Cluster Approach) Three sites connected through a LAN. Simulation Models: Database engine and network. Real Code: Group Communication and DBSM Protocol. Results published in the DSN-PDS’05. Open Replication of Databases (Large Scale Approach) Nine sites connected through a LAN/WAN. Simulation Models: Database engine and network. Real Code: Group Communication, Middle-R, Postgres-R and DBSM protocol. Results published in the LADC’05. L. Soares and J. Pereira PMCCS’05 - September 24

  12. Database Replication Evaluation The kernel, we named it MinhaSSF , embraces all components which also make use of logging facilities. Load is generated according the TPC-C benchmark. The database box is a simulation model. Grey shaded boxes, are real components embedded in the simulation. Blue shaded box is a library of fully featured and fully tested network Figure: Database Site Model simulation models. L. Soares and J. Pereira PMCCS’05 - September 24

  13. Database Replication Evaluation - Fault Injection No Faults 0.8 0.8 ratio of latencies ratio of latencies Random Loss Bursty Loss 0.4 0.4 0.0 0.0 1 5 10 50 500 1 2 5 10 20 50 100 500 (ms) (ms) (a) Transaction latency. (b) Certification latency. Figure: Performance results with fault injection (750 Clients, LAN). L. Soares and J. Pereira PMCCS’05 - September 24

  14. Database Replication Evaluation - Fault Injection (II) No Faults Random Bursty Loss Loss (5%) (5%) 6.72 11.94 7.96 (a) Abort rates with 3 sites and 750 clients (%). Run Usage No Faults 1.22 Random Loss 1.90 Bursty Loss 1.89 (b) Protocol CPU us- age (%). L. Soares and J. Pereira PMCCS’05 - September 24

  15. Simulation Performance 10 Simulation was performed in a LAN WAN 8 Dual Opteron at 2.4GHz. Real vs Sim. 6 Each database connection 4 emulates a TPC-C client. 2 Simulating 1 minute takes 0 from 6 to 10 in real time when 0 1000 2000 3000 4000 there are 4000 clients. Clients The 10 minutes regard the centralized execution of 9 Figure: Ratio between real and simulation time. distributed sites. L. Soares and J. Pereira PMCCS’05 - September 24

  16. Conclusions and Future Work Extended the SSF kernel to enable centralized execution. Use recognized simulation models, SSFNet, or even develop our own. The framework enables incremental development. It has been used to evaluate database replication using real implemented replication protocols. We measure performance, availability and reliability metrics with neglectible probe effect. Issues and future work: Accounting of single-threaded run-times only. Enhance the calibration of simulation models. L. Soares and J. Pereira PMCCS’05 - September 24

  17. Contact Information People L. Soares - losoares@di.uminho.pt J. Pereira - jop@di.uminho.pt Projects GORDA - http://gorda.di.uminho.pt [/community] ESCADA - http://escada.lsd.di.uminho.pt StrongRep - http://strongrep.lsd.di.uminho.pt Research Team GSD - http://gsd.di.uminho.pt L. Soares and J. Pereira PMCCS’05 - September 24

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend