Why Is Random Testing Effective for Partition Tolerance Bugs? Rupak Majumdar, Filip Niksic Max Planck Institute for Software Systems (MPI-SWS)

Despite Many Formal Approaches…
…practitioners test their code
…by providing random inputs.
And despite our best judgement,
…testing is surprisingly effective in finding bugs.
We explore this unexpected effectiveness in testing distributed systems under partition faults.

Jepsen: Call Me Maybe
A framework for black-box testing of distributed systems by randomly inserting network partition faults
Analyses on http://jepsen.io/: etcd, Postgres, Redis, Riak, MongoDB, Cassandra, Kafka, RabbitMQ, Consul, Elasticsearch, Aerospike, Zookeeper, Chronos…

1. General Random Testing Framework
2. Randomly Testing Distributed Systems
3. Wider Context: Combinatorial Testing

Tests and Goal Coverage
Tests T, goals G
A test covers some goals
Covering family = Set of tests that cover all goals
“Small” covering families = Efficient testing

Random Testing
Pick a random test t from T
Fix a goal g from G
Suppose P[t covers g] ≥ p
Characterize covering families with respect to p and |G|

Probabilistic Method
Let G be the set of goals and P[random t covers g] ≥ p for every goal g
Theorem. There exists a covering family of size $p^{-1} \log|G|$.
Proof.
P[random t does not cover g] ≤ $1 - p$
P[K independent tests do not cover g] ≤ $(1 - p)^K$
P[K independent tests are not a covering family] ≤ $|G| (1 - p)^K$ (union bound over the goals)
For $K = p^{-1} \log|G|$, this probability is strictly less than 1, since $(1 - p)^K < e^{-pK} = |G|^{-1}$.
Therefore, there must exist K tests that are a covering family!

Probabilistic Method
Let G be the set of goals and P[random t covers g] ≥ p
Theorem. There exists a covering family of size $p^{-1} \log|G|$.
Theorem. For $\epsilon > 0$, a random family of $p^{-1} \log|G| + p^{-1} \log \epsilon^{-1}$ tests is a covering family with probability at least $1 - \epsilon$.
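The two bounds above are easy to evaluate numerically. The following is a minimal sketch, assuming natural logarithms and rounding the family size up to an integer; the function names `covering_family_size` and `failure_bound` are illustrative and do not come from the slides.

```python
import math


def covering_family_size(p: float, num_goals: int, eps: float) -> int:
    """Family size p^-1 * (log|G| + log(1/eps)), rounded up to an integer."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if num_goals < 1:
        raise ValueError("there must be at least one goal")
    if not 0.0 < eps <= 1.0:
        raise ValueError("eps must lie in (0, 1]")
    return math.ceil((math.log(num_goals) + math.log(1.0 / eps)) / p)


def failure_bound(p: float, num_goals: int, k: int) -> float:
    """Union bound |G| * (1 - p)^K on P[K random tests are not a covering family]."""
    return num_goals * (1.0 - p) ** k


# Hypothetical numbers: each test covers a fixed goal with probability >= 0.5,
# there are 10 goals, and we accept a 20% chance of missing some goal.
size = covering_family_size(p=0.5, num_goals=10, eps=0.2)
print(size, failure_bound(0.5, 10, size))
```

With these made-up inputs the union bound on the failure probability stays below the requested ϵ, which is the content of the second theorem.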

Random Testing Framework
1. What are tests?
2. What are testing goals?
3. What is the notion of coverage?
4. Can we bound P[random test covers goal]?

1. General Random Testing Framework
2. Randomly Testing Distributed Systems
3. Wider Context: Combinatorial Testing

Ninjas in Training
In a dojo in Kaiserslautern, n ninjas are in training.
Training is complete if for every pair of ninjas, there is a round where they are in opposing teams.
How many rounds make the training complete?
• Naïve: $O(n^2)$
• Can you do it in $\log n$ rounds?
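One well-known way to reach a logarithmic number of rounds for two teams is a binary encoding: number the ninjas from 0 to n - 1 and, in round i, assign each ninja to a team by bit i of its number. Two distinct numbers differ in at least one bit, so every pair is separated in some round. The sketch below illustrates this construction; it is not taken from the slides, and the helper names are hypothetical.

```python
from itertools import combinations


def binary_rounds(n: int) -> list[tuple[set[int], set[int]]]:
    """Round i puts ninja x on team 0 or team 1 according to bit i of x."""
    num_rounds = max(1, (n - 1).bit_length())  # ceil(log2 n) for n >= 2
    rounds = []
    for i in range(num_rounds):
        team0 = {x for x in range(n) if not (x >> i) & 1}
        team1 = {x for x in range(n) if (x >> i) & 1}
        rounds.append((team0, team1))
    return rounds


def training_complete(n: int, rounds: list[tuple[set[int], set[int]]]) -> bool:
    """Check that every pair of ninjas is on opposing teams in some round."""
    return all(
        any((a in team0) != (b in team0) for team0, _ in rounds)
        for a, b in combinations(range(n), 2)
    )


rounds = binary_rounds(8)
print(len(rounds), training_complete(8, rounds))
```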

Ninjas in Training
More generally, n ninjas are training in k teams.
Training is complete if for every choice of k ninjas, there is a round where each of them is in a different team.
How many rounds make the training complete?
• Naïve: $O(n^k)$
• Can you do it in $k^{k+1} (k!)^{-1} \log n$ rounds?

From Training Ninjas to Distributed Systems with Partition Faults
ninjas → nodes in a network
teams → blocks in a partition
rounds → partitions
complete training → covering family

Splitting Coverage
Given n nodes and k ≤ n:
• Tests are partitions of nodes into k blocks: $P = \{B_1, \ldots, B_k\}$
• Testing goals are sets of k nodes: $S = \{x_1, \ldots, x_k\}$
• P covers S if P splits S: $x_1 \in B_1, \ldots, x_k \in B_k$
Covering families are called k-splitting families here
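The definitions above translate directly into a brute-force check. This is a minimal sketch, assuming a partition is given as a list of disjoint sets that together contain every node; `splits` and `is_splitting_family` are illustrative names.

```python
from itertools import combinations


def splits(partition: list[set[int]], goal: tuple[int, ...]) -> bool:
    """True if the nodes of `goal` all lie in pairwise different blocks."""
    block_of = {node: i for i, block in enumerate(partition) for node in block}
    return len({block_of[x] for x in goal}) == len(goal)


def is_splitting_family(family: list[list[set[int]]], nodes: range, k: int) -> bool:
    """True if every set of k nodes is split by at least one partition."""
    return all(
        any(splits(partition, goal) for partition in family)
        for goal in combinations(nodes, k)
    )


# Two partitions of four nodes into two blocks each.
family = [[{0, 1}, {2, 3}], [{0, 2}, {1, 3}]]
print(is_splitting_family(family, range(4), 2))
```

The check enumerates all sets of k nodes, so it is only practical for small n and k.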

A Bug in Chronos
• A distributed fault-tolerant job scheduler
• Works in conjunction with Mesos and Zookeeper
• Three special nodes: Chronos leader, Mesos leader, Zookeeper leader

Splitting Coverage
Given n nodes and k ≤ n:
• Number of partitions with k blocks: ${n \brace k} \approx \frac{k^n}{k!}$
• Number of sets of k nodes: $\binom{n}{k} \approx \frac{n^k}{k!}$
• Splitting a set with a random partition: $p = \frac{k^{n-k}}{{n \brace k}} \approx \frac{k!}{k^k}$
By the general theorem, there exists a k-splitting family of size $k^{k+1} (k!)^{-1} \log n$
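For small n and k the splitting probability can be computed exactly and compared with the approximation k!/k^k. The sketch below assumes the count on the slide refers to Stirling numbers of the second kind, computed here with the standard recurrence; the function names are illustrative.

```python
from functools import lru_cache
from math import factorial


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Number of partitions of n elements into k nonempty blocks."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def split_probability(n: int, k: int) -> float:
    """Exact p: the n - k remaining nodes each join one of the k split blocks."""
    return k ** (n - k) / stirling2(n, k)


def split_probability_approx(k: int) -> float:
    """The approximation k! / k^k, which does not depend on n."""
    return factorial(k) / k ** k


for n, k in [(5, 2), (5, 3), (10, 3)]:
    print(n, k, split_probability(n, k), split_probability_approx(k))
```

Because k! times the Stirling number counts surjections onto k labeled blocks, which is at most k^n, the approximation never exceeds the exact value, so using it in the general theorem stays on the safe side.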

Effectiveness of Jepsen
Theorem. For $\epsilon > 0$, a random family of partitions of size $k^{k+1} (k!)^{-1} \log n + k^k (k!)^{-1} \log \epsilon^{-1}$ is a k-splitting family with probability at least $1 - \epsilon$.
For Chronos, with n = 5, k = 2, ϵ = 0.2: a family of 10 randomly chosen partitions is splitting with probability at least 80%
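The Chronos figure can be reproduced by plugging the numbers into the bound, and sanity-checked with a small simulation. This sketch assumes natural logarithms and uniformly random partitions into exactly k nonempty blocks, sampled by rejection; the function names and the seed are illustrative, and the simulation is not how Jepsen itself schedules partitions.

```python
import math
import random
from itertools import combinations


def jepsen_bound(n: int, k: int, eps: float) -> float:
    """k^(k+1)/k! * log n + k^k/k! * log(1/eps), before rounding up."""
    c = k ** k / math.factorial(k)
    return c * k * math.log(n) + c * math.log(1.0 / eps)


def random_partition(n: int, k: int, rng: random.Random) -> list[int]:
    """Block label per node; retry until all k blocks are nonempty."""
    while True:
        labels = [rng.randrange(k) for _ in range(n)]
        if len(set(labels)) == k:
            return labels


def is_splitting(family: list[list[int]], n: int, k: int) -> bool:
    """True if every set of k nodes gets k distinct labels in some partition."""
    return all(
        any(len({labels[x] for x in goal}) == k for labels in family)
        for goal in combinations(range(n), k)
    )


def estimate_success(n: int, k: int, size: int, trials: int, seed: int = 0) -> float:
    """Fraction of random families of the given size that are k-splitting."""
    rng = random.Random(seed)
    hits = sum(
        is_splitting([random_partition(n, k, rng) for _ in range(size)], n, k)
        for _ in range(trials)
    )
    return hits / trials


size = math.ceil(jepsen_bound(n=5, k=2, eps=0.2))
print(size, estimate_success(n=5, k=2, size=size, trials=2000))
```

The theorem only gives a lower bound on the success probability, so the simulated fraction can be expected to come out above 80%.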
