DOMINO: Fast and Effective Test Data Generation for Relational - - PowerPoint PPT Presentation

domino fast and effective test data generation for
SMART_READER_LITE
LIVE PREVIEW

DOMINO: Fast and Effective Test Data Generation for Relational - - PowerPoint PPT Presentation

DOMINO: Fast and Effective Test Data Generation for Relational Database Schemas Abdullah Alsharif, Gregory M. Kapfhammer, and Phil McMinn 1 2 Database Schema 3 4 5 6 7 8 9 10 Testing Database Schemas Motivation Industrial


slide-1
SLIDE 1

DOMINO: Fast and Effective Test Data Generation for Relational Database Schemas

Abdullah Alsharif, Gregory M. Kapfhammer, and Phil McMinn

1

slide-2
SLIDE 2

2

slide-3
SLIDE 3

3

Database Schema

slide-4
SLIDE 4

4

slide-5
SLIDE 5

5

slide-6
SLIDE 6

6

slide-7
SLIDE 7

7

slide-8
SLIDE 8

8

slide-9
SLIDE 9

9

slide-10
SLIDE 10

10

slide-11
SLIDE 11

Testing Database Schemas Motivation

  • Industrial practitioners recommend testing databases (S. Guz, 2011)
  • Databases schema, if changed, it need to be tested
  • If the DBMS is changed, we need to test schemas behaviour
  • Forgetting to add a UNIQUE to a column will duplicate data within a database

11

slide-12
SLIDE 12

PRIMARY KEY constraint must be NOT NULL

12

slide-13
SLIDE 13

PRIMARY KEY constraint must be NOT NULL SQLite allows NULLs in a PRIMARY KEY column

13

slide-14
SLIDE 14

PRIMARY KEY constraint must be NOT NULL SQLite allows NULLs in a PRIMARY KEY column Follows the standard

14

slide-15
SLIDE 15

PRIMARY KEY constraint must be NOT NULL SQLite allows NULLs in a PRIMARY KEY column Follows the standard DEVELOPMENT TO PRODUCTION DEPLOYMENT ISSUES!

15

slide-16
SLIDE 16

16

slide-17
SLIDE 17

1) INSERT INTO products(product_no, name, price, discounted_price) VALUES (0, 'ijyv', 638, 168) 2) INSERT INTO orders(order_id, shipping_address) VALUES (192, 'mrus') 3) INSERT INTO order_items(product_no, order_id, quantity) VALUES (0, 192, 750) 4) INSERT INTO products(product_no, name, price, discounted_price) VALUES (-602, 'ehm', 960, 126) 5) INSERT INTO orders(order_id, shipping_address) VALUES (0, 'u') 6) INSERT INTO order_items(product_no, order_id, quantity) VALUES (0, 192, 64)

17

Manual Testing

slide-18
SLIDE 18

1) INSERT INTO products(product_no, name, price, discounted_price) VALUES (0, 'ijyv', 638, 168) 2) INSERT INTO orders(order_id, shipping_address) VALUES (192, 'mrus') 3) INSERT INTO order_items(product_no, order_id, quantity) VALUES (0, 192, 750) 4) INSERT INTO products(product_no, name, price, discounted_price) VALUES (-602, 'ehm', 960, 126) 5) INSERT INTO orders(order_id, shipping_address) VALUES (0, 'u') 6) INSERT INTO order_items(product_no, order_id, quantity) VALUES (-602, 192, 64)

18

Manual Testing

slide-19
SLIDE 19

Manual Database Schema Testing is Challenging

19

slide-20
SLIDE 20

Automated Test Data Generation - Background

  • SchamaAnalyst is a framework that generates

test data for database schemas.

  • It has two data generators:

○ Random+ that uses a pool of constants. ○ The state of the art generator uses Alternating Variable Method (AVM).

  • It searches for a value for each column involved

in the INSERT statement.

20

http://schemaanalyst.org

slide-21
SLIDE 21
  • There are two variants of AVM:

○ AVM-Random which uses random values as a starting point for the first generation. ○ AVM-Defaults which uses default values (i.e., empty strings for string and 0s for numerics) as a starting point for the first generation. This helps optimise test generation timing.

  • The search is evaluated depending on the test

requirement, which what drives the search (i.e., fitness function).

Alternating Variable Method - Background

21

http://schemaanalyst.org

slide-22
SLIDE 22

Algorithms

Random+ AVM

22

slide-23
SLIDE 23

1) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, 'abc', 2, 1); 2) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, NULL, -1, -1);

23

slide-24
SLIDE 24

1) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, 'abc', 2, 1); 2) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, NULL, -1, -1);

24

slide-25
SLIDE 25

1) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, 'abc', 2, 1); 2) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, NULL, -1, -1); 10 11

25

slide-26
SLIDE 26

1) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, 'abc', 2, 1); 2) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, NULL, -1, -1); 10 NULL ‘def’ 11

26

slide-27
SLIDE 27

1) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, 'abc', 2, 1); 2) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, NULL, -1, -1); 10 NULL ‘def’ 11

  • 1

1

27

slide-28
SLIDE 28

1) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, 'abc', 2, 1); 2) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, NULL, -1, -1); 10 NULL ‘def’ 11

  • 1

1

  • 1

1

28

slide-29
SLIDE 29

1) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, 'abc', 2, 1); 2) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, NULL, -1, -1); 10 NULL ‘def’ 11

  • 1

1

  • 1

1 1 > 1 2 2 > 1

29

slide-30
SLIDE 30

Automated Test Generation - Prior Work

Coverage <= 70% Coverage > 70% Coverage == 100%

30

Based on Integrity Constraint Coverage Criterion (McMinn et al, 2015)

slide-31
SLIDE 31

AVM Inefficiencies

31

slide-32
SLIDE 32

AVM Inefficiencies

32

Can we improve ?

slide-33
SLIDE 33

Domain Specific Operators

33

Copying Values

slide-34
SLIDE 34

Domain Specific Operators

34

Copying Values Flipping NULLs

slide-35
SLIDE 35

Domain Specific Operators

35

Copying Values Flipping NULLs Randomise

slide-36
SLIDE 36

DOMINO stands for DOMain-specific approach to INtegrity cOnstraint test data generation

36

http://schemaanalyst.org

slide-37
SLIDE 37

Algorithms

DOMINO AVM

37

slide-38
SLIDE 38

It is like playing DOMINO

38

Jessica Peterson/Getty Images

slide-39
SLIDE 39

39

slide-40
SLIDE 40

40

slide-41
SLIDE 41

INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, 'abc', 2, 1); INSERT INTO orders(order_id, shipping_address) VALUES(100, 'uvw'); INSERT INTO order_items(product_no, order_id, quantity) VALUES(-300, 80, 2);

41

slide-42
SLIDE 42

INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, 'abc', 2, 1); INSERT INTO orders(order_id, shipping_address) VALUES(100, 'uvw'); INSERT INTO order_items(product_no, order_id, quantity) VALUES(-300, 80, 2);

42

slide-43
SLIDE 43

INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, 'abc', 2, 1); INSERT INTO orders(order_id, shipping_address) VALUES(100, 'uvw'); INSERT INTO order_items(product_no, order_id, quantity) VALUES(-300, 80, 2); Copying Values

43

slide-44
SLIDE 44

INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, 'abc', 2, 1); INSERT INTO orders(order_id, shipping_address) VALUES(100, 'uvw'); INSERT INTO order_items(product_no, order_id, quantity) VALUES(10, 80, 2); Copying Values

44

slide-45
SLIDE 45

INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, 'abc', 2, 1); INSERT INTO orders(order_id, shipping_address) VALUES(100, 'uvw'); INSERT INTO order_items(product_no, order_id, quantity) VALUES( 10, 80, 2); Copying Values

45

slide-46
SLIDE 46

INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, 'abc', 2, 1); INSERT INTO orders(order_id, shipping_address) VALUES(100, 'uvw'); INSERT INTO order_items(product_no, order_id, quantity) VALUES( 10, 100, 2); Copying Values

46

slide-47
SLIDE 47

INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, 'abc', 2, 1); INSERT INTO orders(order_id, shipping_address) VALUES(100, 'uvw'); INSERT INTO order_items(product_no, order_id, quantity) VALUES( 10, 100, 2); Copying Values

47

slide-48
SLIDE 48

Flipping NULLs 1) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, 'abc', 2, 1); 2) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, ‘def’, 2, 1);

48

slide-49
SLIDE 49

1) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, 'abc', 2, 1); 2) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, ‘def’, 2, 1); 1) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, 'abc', 2, 1); 2) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, NULL, 2, 1); Flipping NULLs

49

slide-50
SLIDE 50

1) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, 'abc', 2, 1); 2) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, ‘def’, 2, 1); 1) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, 'abc', 2, 1); 2) INSERT INTO products(product_no, name, price, discounted_price) VALUES(28, NULL, 2, 1); Randomise

50

slide-51
SLIDE 51

1) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, 'abc', 2, 1); 2) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, ‘def’, 2, 1); 1) INSERT INTO products(product_no, name, price, discounted_price) VALUES(10, 'abc', 2, 1); 2) INSERT INTO products(product_no, name, price, discounted_price) VALUES(28, ‘def’, 2, 1); Randomise

51

slide-52
SLIDE 52

Can we get the best of two worlds?

52

http://schemaanalyst.org

slide-53
SLIDE 53

Hybrid Technique

  • DOMINO still uses the pool of constants and

random picking to solve CHECK constraints.

  • AVM is a guided search technique that can help

solve CHECK constraints more efficiently.

53

slide-54
SLIDE 54

Research Question 1 - Effectiveness and Efficiency

VS

54

slide-55
SLIDE 55

Research Question 2 - Fault-Finding Effectiveness

VS

55

slide-56
SLIDE 56

Research Question 3 - DOMINO-AVM Technique

VS

56

slide-57
SLIDE 57

Experimental Setup

57

slide-58
SLIDE 58

Experimental Setup

34 Schemas 1 to 42 tables 3 to 309 columns 590 ICs

58

slide-59
SLIDE 59

Experimental Setup

34 Schemas 1 to 42 tables 3 to 309 columns 590 ICs

59

30 Runs

slide-60
SLIDE 60

60

slide-61
SLIDE 61

DOMINO’s test coverage scores are either equal to or higher than those of tests from either AVM variant

61

slide-62
SLIDE 62

62

slide-63
SLIDE 63

Test suite generation with DOMINO is faster than both

  • f the state-of-the-art AVM methods

63

slide-64
SLIDE 64

Answering RQ1

64

>

slide-65
SLIDE 65

65

slide-66
SLIDE 66

DOMINO achieved significantly higher mutation scores than the state-of-the-art AVM-Defaults and competitive with AVM-Random

66

slide-67
SLIDE 67

Answering RQ2

67

>

slide-68
SLIDE 68

68

DOMINO-AVM is slower because AVM has

  • verheads
slide-69
SLIDE 69

69

slide-70
SLIDE 70
  • DOMINO-AVM is significantly better in regard of fault finding than DOMINO

for two DBMSs (PostgreSQL & HyperSQL) in just a few cases

○ DOMINO-AVM will generate more diverse data for CHECK constraints ○ DOMINO uses pool of constants which less diverse than guided search ○ We found that constants impact relational operators within CHECK constraints

Results - Hybrid (DOMINO-AVM)

70

slide-71
SLIDE 71

Answering RQ3

  • Using AVM has a potential to improve the generation of data involving

CHECK constraints is only of benefit for a few cases

  • However, the use of random search, as employed by DOMINO, achieves

similar results to DOMINO-AVM in a shorter amount of time

71

slide-72
SLIDE 72
  • This paper introduced DOMINO, a method for automatically generating test

data that systematically exercise the integrity constraints in relational database schemas

  • DOMINO uses domain-specific operators and it is extremely competitive and

faster to the state-of-the-art methods

  • In future, we will look at adding readability to the diverse generated data to

help with maintainability

  • We are planning to compare DOMINO with more techniques (e.g.,

Evolutionary Algorithms or/and constraint solvers)

Conclusion and Future Work

72

slide-73
SLIDE 73

73