
How to Test the Ability of Large-Scale, Distributed Software Systems to Cope with Failures



  1. How to Test the Ability of Large-Scale, Distributed Software Systems to Cope with Failures Pavel Lipsky 
 Dell Technologies 
 06/04/2019

  2. Who am I? Pavel Lipsky (https://github.com/leapsky) • Before 2005: Building scalable web sites • From 2005 to 2014: Test automation and DevOps • From 2014: Performance and reliability of large-scale, distributed systems

  3. Agenda • What is Fault Injection? • Test Object • Stories & Demos - https://github.com/leapsky • Tools & Frameworks

  4. Story 1 Memcached

  5. Fetching Data from Memcached [Diagram: request flow between Application, Memcached, and Database, steps numbered 1-5]

  6. Changing Data in Memcached [Diagram: update flow between Application, Database, and Memcached, steps numbered 1-5]

  7. Types of Software Testing • Functional testing • Load testing • Fault Injection • Security testing • Usability testing

  8. Story 2

  9. Payments for Goods with Payment Cards Issued by Russian Banks

  10. New IT Platform • Horizontal scaling • Using open-source software • Affordable low-end hardware • Reliability • Storing data in RAM

  11. GridGain Enterprise • SQL support • Quick access to objects by key • In-memory computing • Persistent Data Store • Strong consistency • Failure resistance • Horizontal scalability • …

  12. Forcing a System to Fail “Without explicitly forcing a system to fail, it is unreasonable to have any confidence it will operate correctly in failure modes.” Caitie McCaffrey (Backend Brat & Distributed Systems Diva), The Verification of a Distributed System

  13. Story 3 Lost Updates

  14. Example of Fund Transfer 1. read(A) 
 2. A := A - 50 
 3. write(A) 
 4. read(B) 
 5. B := B + 50 
 6. write(B)

  15. Fund Transfers Between Bank Accounts [Diagram: ten numbered accounts holding $100 each, with transfers of various amounts between them]
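
The six steps of the fund transfer can be written out in a few lines of Java. This is a minimal single-JVM sketch, not code from the talk: the class `TransferSketch`, its methods, and the account names are hypothetical, and `synchronized` here only stands in for whatever transaction mechanism the real system provides.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory ledger used only to illustrate the six transfer steps.
final class TransferSketch {
    private final Map<String, Long> balances = new HashMap<>();

    void open(String account, long amount) {
        balances.put(account, amount);
    }

    long balance(String account) {
        return balances.get(account);
    }

    long total() {
        long sum = 0;
        for (long b : balances.values()) {
            sum += b;
        }
        return sum;
    }

    // synchronized makes transfers on this object mutually exclusive inside one JVM.
    // It is NOT a distributed transaction; it only keeps the six steps together.
    synchronized void transfer(String from, String to, long amount) {
        long a = balances.get(from);   // 1. read(A)
        a = a - amount;                // 2. A := A - amount
        balances.put(from, a);         // 3. write(A)
        long b = balances.get(to);     // 4. read(B)
        b = b + amount;                // 5. B := B + amount
        balances.put(to, b);           // 6. write(B)
    }

    public static void main(String[] args) {
        TransferSketch ledger = new TransferSketch();
        ledger.open("A", 100);
        ledger.open("B", 100);
        ledger.transfer("A", "B", 50);
        // prints: A=50 B=150 total=200
        System.out.println("A=" + ledger.balance("A") + " B=" + ledger.balance("B")
                + " total=" + ledger.total());
    }
}
```

The useful property to watch in a fault injection test is the total: a correct transfer never changes it, so any drift points at a half-applied or lost write.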

  16. Demo Time Lost Updates

  17. Lost Updates (A := $50)
 Time  Task 1        Task 2
 T1    read(A)       read(A)
 T2    A := A - 50   A := A - 50
 T3    write(A)
 T4                  write(A)
 T5    …             …
 Expected value of A is $50. Real value of A is $0.
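
The interleaving on this slide can be replayed deterministically. The sketch below is an illustration only: `LostUpdateSketch` and its method names are hypothetical, and the starting balance of 100 with two withdrawals of 50 is an assumed example, not the slide's figures. It runs the schedule in a single thread so the outcome does not depend on thread timing.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo: the same two updates, once interleaved and once atomic.
final class LostUpdateSketch {

    // Replays the T1..T4 schedule: both tasks read before either one writes.
    static long interleaved(long initial, long delta1, long delta2) {
        long shared = initial;
        long local1 = shared;       // T1: task 1 read(A)
        long local2 = shared;       // T1: task 2 read(A)
        local1 = local1 + delta1;   // T2: task 1 computes its new value
        local2 = local2 + delta2;   // T2: task 2 computes its new value
        shared = local1;            // T3: task 1 write(A)
        shared = local2;            // T4: task 2 write(A) overwrites task 1's update
        return shared;
    }

    // Applies the same two updates as indivisible read-modify-write operations.
    static long atomic(long initial, long delta1, long delta2) {
        AtomicLong shared = new AtomicLong(initial);
        shared.addAndGet(delta1);
        shared.addAndGet(delta2);
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(interleaved(100, -50, -50)); // prints 50: one withdrawal is lost
        System.out.println(atomic(100, -50, -50));      // prints 0: both withdrawals applied
    }
}
```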

  18. Story 4 ACID

  19. ACID Properties • Atomicity • Consistency • Isolation • Durability
 1. read(A)
 2. A := A - 50
 3. write(A)
 4. read(B)
 5. B := B + 50
 6. write(B)

  20. Isolation Levels and the ANSI/ISO SQL Standard
 Isolation Level    Dirty Read   Non-Repeatable Read   Phantom Read
 READ UNCOMMITTED   Permitted    Permitted             Permitted
 READ COMMITTED     --           Permitted             Permitted
 REPEATABLE READ    --           --                    Permitted
 SERIALIZABLE       --           --                    --
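
The table can be encoded directly, which makes it easy to ask which level a test needs. This is a small sketch under stated assumptions: the names `IsolationTableSketch`, `Anomaly`, `Level`, and `weakestForbidding` are hypothetical and exist only for this illustration; the permitted/forbidden entries are taken from the table on this slide.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical encoding of the ANSI/ISO isolation table from the slide.
final class IsolationTableSketch {
    enum Anomaly { DIRTY_READ, NON_REPEATABLE_READ, PHANTOM_READ }

    // Levels are declared from weakest to strongest.
    enum Level {
        READ_UNCOMMITTED(EnumSet.allOf(Anomaly.class)),
        READ_COMMITTED(EnumSet.of(Anomaly.NON_REPEATABLE_READ, Anomaly.PHANTOM_READ)),
        REPEATABLE_READ(EnumSet.of(Anomaly.PHANTOM_READ)),
        SERIALIZABLE(EnumSet.noneOf(Anomaly.class));

        private final Set<Anomaly> permitted;

        Level(Set<Anomaly> permitted) {
            this.permitted = permitted;
        }

        boolean permits(Anomaly anomaly) {
            return permitted.contains(anomaly);
        }
    }

    // Returns the weakest level that permits none of the unwanted anomalies.
    static Level weakestForbidding(Set<Anomaly> unwanted) {
        for (Level level : Level.values()) {
            boolean forbidsAll = true;
            for (Anomaly anomaly : unwanted) {
                if (level.permits(anomaly)) {
                    forbidsAll = false;
                }
            }
            if (forbidsAll) {
                return level;
            }
        }
        return Level.SERIALIZABLE; // not reached: SERIALIZABLE permits nothing
    }

    public static void main(String[] args) {
        // prints REPEATABLE_READ
        System.out.println(weakestForbidding(EnumSet.of(Anomaly.NON_REPEATABLE_READ)));
    }
}
```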

  21. READ_COMMITTED (A := $50)
 Time  Transaction 1   Transaction 2
 T1    read(A)         read(A)
 T2    A := A - 50     A := A + 50
 T3    write(A)
 T4    commit          write(A)
 T5    …               commit
 Expected value of A is $50. Real value of A is $100.

  22. Apache Ignite Concurrency Modes and Isolation Levels
 Concurrency Modes: • PESSIMISTIC • OPTIMISTIC
 Isolation Levels: • READ_COMMITTED • REPEATABLE_READ • SERIALIZABLE

  23. Apache Ignite Documentation: Concurrency Modes and Isolation Levels
 • PESSIMISTIC REPEATABLE_READ - Entry lock is acquired and data is fetched from the primary node on the first read or write access and stored in the local transactional map. All consecutive access to the same data is local and will return the last read or updated transaction value. This means no other concurrent transactions can make changes to the locked data, and you are getting Repeatable Reads for your transaction.
 • OPTIMISTIC SERIALIZABLE - Stores an entry version upon first read access. Ignite will fail a transaction at the commit stage if the Ignite engine detects that at least one of the entries used as part of the initiated transaction has been modified.
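
The OPTIMISTIC SERIALIZABLE idea (remember the version at first read, reject the commit if the entry changed) can be illustrated without any Ignite dependency. The sketch below is plain Java and is not Ignite's implementation: `OptimisticVersionSketch`, `Cell`, and `Snapshot` are hypothetical names. It replays the two-transaction schedule with a starting balance of $50, one transaction subtracting 50 and the other adding 50.

```java
// Hypothetical, single-JVM illustration of commit-time version checking.
final class OptimisticVersionSketch {

    // Result of a read: the value plus the version it was read at.
    static final class Snapshot {
        final long value;
        final long version;

        Snapshot(long value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    // One entry whose version increases with every successful commit.
    static final class Cell {
        private long value;
        private long version;

        Cell(long initial) {
            this.value = initial;
        }

        synchronized Snapshot read() {
            return new Snapshot(value, version);
        }

        // Succeeds only if nothing was committed since the snapshot was taken.
        synchronized boolean commit(Snapshot seen, long newValue) {
            if (seen.version != version) {
                return false;
            }
            value = newValue;
            version++;
            return true;
        }
    }

    // Both transactions read A = 50, then try to commit A - 50 and A + 50.
    static long replay() {
        Cell a = new Cell(50);
        Snapshot t1 = a.read();
        Snapshot t2 = a.read();
        boolean first = a.commit(t1, t1.value - 50);   // accepted, A becomes 0
        boolean second = a.commit(t2, t2.value + 50);  // rejected, t2 is stale
        if (first && !second) {
            Snapshot retry = a.read();                 // transaction 2 starts over
            a.commit(retry, retry.value + 50);         // accepted, A becomes 50
        }
        return a.read().value;
    }

    public static void main(String[] args) {
        System.out.println("A = " + replay()); // prints: A = 50
    }
}
```

The rejected commit is what turns a silent lost update into a visible failure that the application can retry.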

  24. Demo Time Transactions

  25. .txStart(CONCURRENCY_MODE, ISOLATION_LEVEL)
 try (Transaction tx = ignite.transactions().txStart(OPTIMISTIC, SERIALIZABLE)) {
     Account fromAccount = cache.get(fromAccountId);
     Account toAccount = cache.get(toAccountId);
     ...
     tx.commit();
 }

  26. Story 5 Testing Under Load

  27. Performance Testing Tools

  28. Demo Time

  29. What cache mode to choose? [Diagram: PARTITIONED vs. REPLICATED cache modes, showing where the primary and backup copies of partitions 1-4 are placed]
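
The trade-off between the two modes comes down to how many copies of each partition exist and how many node failures they survive. The toy model below is a sketch under loud assumptions: it places copies round-robin, which is a simplification chosen for this illustration and not Ignite's affinity function, and `PartitionPlacementSketch`, `owners`, and `survives` are hypothetical names. A REPLICATED cache is modeled as a copy on every node.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of primary/backup placement (round-robin, not Ignite's algorithm).
final class PartitionPlacementSketch {

    // Nodes holding a partition: index 0 is the primary, the rest are backups.
    static List<Integer> owners(int partition, int nodes, int backups) {
        int copies = Math.min(nodes, backups + 1);
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            result.add((partition + i) % nodes);
        }
        return result;
    }

    // True if every partition still has at least one copy on a live node.
    static boolean survives(int partitions, int nodes, int backups, Set<Integer> failedNodes) {
        for (int p = 0; p < partitions; p++) {
            boolean alive = false;
            for (int owner : owners(p, nodes, backups)) {
                if (!failedNodes.contains(owner)) {
                    alive = true;
                }
            }
            if (!alive) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<Integer> twoDown = new HashSet<>(List.of(0, 1));
        System.out.println(survives(4, 4, 2, twoDown)); // prints true: a third copy remains
        System.out.println(survives(4, 4, 0, twoDown)); // prints false: no backups at all
        System.out.println(survives(4, 4, 3, twoDown)); // prints true: a copy on every node
    }
}
```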

  30. .txStart(CONCURRENCY_MODE, ISOLATION_LEVEL)
 CacheConfiguration<Integer, Account> cfg = new CacheConfiguration<>(CACHE_NAME);
 cfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
 cfg.setCacheMode(CacheMode.PARTITIONED);
 cfg.setBackups(2);

  31. Demo Time

  32. Jepsen Test
 lein run test \
   --test bank \
   --time-limit 60 \
   --concurrency 5 \
   --nodes-file nodes \
   --username root \
   --password root \
   --cache-mode PARTITIONED \
   --cache-atomicity-mode TRANSACTIONAL \
   --cache-write-sync-mode FULL_SYNC \
   --read-from-backup YES \
   --transaction-concurrency PESSIMISTIC \
   --transaction-isolation REPEATABLE_READ \
   --backups 2 \
   --pds true \
   --version 2.7.0 \
   --os debian \
   --nemesis kill-node
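
A bank-style test is usually judged by an invariant over every observed read: transfers move money around, so the sum of all balances should never change. The checker below is a sketch of that idea in Java, not the Jepsen checker itself (Jepsen is written in Clojure). `BankInvariantSketch` and `violation` are hypothetical names, and treating a negative balance as an error is an assumption that may not match the checks in the Ignite test.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical invariant check applied to one observed snapshot of all accounts.
final class BankInvariantSketch {

    // Returns a description of a violated invariant, or empty if the read looks consistent.
    static Optional<String> violation(Map<Integer, Long> balances, long expectedTotal) {
        long total = 0;
        for (Map.Entry<Integer, Long> e : balances.entrySet()) {
            if (e.getValue() < 0) {
                return Optional.of("negative balance on account " + e.getKey());
            }
            total += e.getValue();
        }
        if (total != expectedTotal) {
            return Optional.of("total is " + total + ", expected " + expectedTotal);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<Integer, Long> consistent = Map.of(1, 50L, 2, 150L);
        Map<Integer, Long> lostCredit = Map.of(1, 50L, 2, 100L);
        System.out.println(violation(consistent, 200).isPresent());  // prints false
        System.out.println(violation(lostCredit, 200).orElse("ok")); // prints: total is 150, expected 200
    }
}
```

Running such a check on every read while a nemesis kills nodes gives a clear pass/fail signal instead of a log that has to be interpreted by hand.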

  33. Story 6 Disruptive Scenarios

  34. Node failure • Application crash • Hardware crash • JVM crash • OS crash

  35. Disruptive Scenarios • Hardware • Network • Application • Other scenarios

  36. Disruptive Scenarios: Hardware [Diagram: Data Center #1 and Data Center #2, with primary and backup copies of partitions 1-8 spread across their nodes]

  37. Disruptive Scenarios: Hardware [Diagram: Data Center #1 and Data Center #2, with primary and backup copies of partitions 1-8 spread across their nodes]

  38. Disruptive Scenarios: Network • iptables • NetEm, which emulates: network delays with different distribution functions; packet loss; packet duplication; packet reordering; packet corruption
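
Each of the NetEm effects listed above corresponds to a `tc qdisc ... netem` option. The helper below only builds the command lines so a test harness can log them or hand them to `ProcessBuilder`; it does not run anything. It is a sketch with assumptions: `NetemCommandSketch` and its methods are hypothetical, the interface name `eth0` and all delay and percentage values are placeholders, and actually applying the rules requires root on the target node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that assembles tc/netem command lines without executing them.
final class NetemCommandSketch {

    // Builds: tc qdisc add dev <device> root netem <options...>
    static List<String> add(String device, String... netemOptions) {
        List<String> cmd = new ArrayList<>(
                Arrays.asList("tc", "qdisc", "add", "dev", device, "root", "netem"));
        cmd.addAll(Arrays.asList(netemOptions));
        return cmd;
    }

    // Builds: tc qdisc del dev <device> root (removes the emulation again)
    static List<String> remove(String device) {
        return Arrays.asList("tc", "qdisc", "del", "dev", device, "root");
    }

    public static void main(String[] args) {
        String dev = "eth0"; // assumed interface name
        // Delay with jitter and a normal distribution
        System.out.println(String.join(" ",
                add(dev, "delay", "100ms", "20ms", "distribution", "normal")));
        // Packet loss, duplication, and corruption
        System.out.println(String.join(" ", add(dev, "loss", "5%")));
        System.out.println(String.join(" ", add(dev, "duplicate", "1%")));
        System.out.println(String.join(" ", add(dev, "corrupt", "1%")));
        // Reordering needs a delay to reorder against
        System.out.println(String.join(" ", add(dev, "delay", "10ms", "reorder", "25%", "50%")));
        System.out.println(String.join(" ", remove(dev)));
    }
}
```

Always pairing an `add` with a `remove` in the test teardown keeps a failed run from leaving the network degraded for the next one.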

  39. Disruptive Scenarios: Network [Diagram: partitions 1-8 distributed across Data Center #1 and Data Center #2]

  40. Disruptive Scenarios: Application

  41. Disruptive Scenarios: Application • Presentation Layer (UI) • Integration Layer (Kafka & ZeroMQ) • Business Modules • Data Storage & Computing (GridGain) • Logging, Access Granting

  42. Disruptive Scenarios: Other Scenarios

  43. Tools to Start Using Fault Injection
 Code examples: https://github.com/leapsky/FaultInjectionExamples
 Load testing tools: • JMeter - https://jmeter.apache.org
 Frameworks: • Jepsen - https://github.com/jepsen-io/jepsen • Chaos Monkey - https://github.com/Netflix/SimianArmy/wiki/Chaos-Monkey
 Configuration management: • Ansible - https://docs.ansible.com • Puppet - https://puppet.com
 Linux utilities: • NetEm (tc) - https://wiki.linuxfoundation.org/networking/netem • stress-ng - https://manned.org/stress-ng/fd34c972 • Iperf - https://iperf.fr • kill -9 • iptables

  44. Lessons Learned • Fault Injection is the art of explicitly forcing a system to fail to make sure that it will operate correctly in failure modes. • No risk - no test! • Test results must be clear and unambiguous. • The closer your test environments match your production environments, the more accurate your testing will be.

  45. Thank you! Questions? Pavel Lipsky • pavel.lipsky@gmail.com • https://github.com/jepsen-io/jepsen/tree/master/ignite • https://github.com/leapsky/
