Specialised vs Declarative Data Mining Software Testing - - PowerPoint PPT Presentation


Specialised vs Declarative Data Mining: Software Testing Applications. Nadjib Lazaar, CNRS, University of Montpellier. Joint work with: M. Maamar, Y. Lebbah, S. Loudni, C. Bessiere, et al. SIMULA, Oslo, 11 Oct. 2018.


slide-1
SLIDE 1

Specialised vs Declarative Data Mining

Software Testing Applications

Nadjib Lazaar, CNRS, University of Montpellier

Joint work with: M. Maamar, Y. Lebbah, S. Loudni, C. Bessiere, et al.

SIMULA, Oslo, 11 Oct. 2018

slide-7
SLIDE 7

DATA MINING

➤ Data Mining (DM), or Knowledge Discovery in Databases (KDD), revolves around the investigation and creation of knowledge, processes, algorithms, and mechanisms for retrieving potential knowledge from data collections.

Mining on:

➤ Itemsets (finding itemsets in a collection of transactions)
➤ Sequences (finding subsequences in a collection of sequences)
➤ Graphs (finding subgraphs in a collection of graphs)
➤ Trees, geometric structures…

slide-15
SLIDE 15

DATA MINING APPLICATIONS

➤ Market Basket Analysis [Agrawal93]
➤ Future Healthcare
  ➤ Great potential to improve health systems [Obenshain04]
➤ Education
  ➤ Knowledge from data in educational environments [Scheuer12]
➤ Fraud and Intrusion Detection [Wang10] [Lee98]
➤ Lie Detection and Criminal Investigation [Chen04]
➤ Bioinformatics [Hoffman97]
➤ …

slide-16
SLIDE 16

DATA MINING APPLICATIONS

[Diagram: Inputs → Mining process → Outputs, shown for three application domains]

Marketing

  • Customers’ behavior
  • Frequent products
  • Shopping basket

Bioinformatics

  • Protein structure prediction
  • Cancer classification
  • DNA sequencing
  • Classes of genes

Software Engineering

  • Program comprehension
  • Fault localization/prediction
  • Execution traces
  • Flow diagram
  • Source code

Aurora Project

slide-23
SLIDE 23

FREQUENT ITEMSET MINING

➤ Aims at finding regularities in datasets (e.g., the shopping behavior of customers) [Agrawal et al., 93]

In market basket analysis:

➤ Find sets of products that are frequently bought together

The patterns found are often expressed as association rules, for example:

➤ If a customer buys bread and wine, then she/he will probably also buy cheese.

slide-27
SLIDE 27

FREQUENT ITEMSET MINING (PROBLEM)

➤ Aims at finding regularities in datasets (e.g., the shopping behavior of customers)
➤ Given:
  ➤ A set of items $I = \{i_1, \dots, i_n\}$
  ➤ A set of transactions over the items $T = \{t_1, \dots, t_m\}$
  ➤ A minimum support $\theta$
➤ The need:
  ➤ The set of itemsets $P$ s.t. $freq(P) \ge \theta$

slide-35
SLIDE 35

STANDARD ITEMSET MINING

Transaction dataset:

t1: B C E F G H
t2: A D G
t3: A C D H
t4: A E F
t5: B E F
t6: B E F G

cover(BEF) = {t1, t5, t6}
freq(BEF) = 50 %

➤ Brute-force enumeration is infeasible
  ➤ 128 items → $2^{128} \approx 3.4 \times 10^{38}$ itemsets (compare: atoms in the universe)
➤ Several specialised algorithms have been developed: Apriori, Eclat, FP-Growth, LCM…
➤ Dealing with basic user’s constraints: frequency, condensed representations (closedness, maximality, …), size…
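The definitions of cover and freq above can be checked on the six-transaction dataset with a short Python sketch. The function names (`cover`, `freq`, `frequent_itemsets`) and the use of an absolute count as threshold are choices made for this illustration, not part of the talk. The brute-force loop is only workable because the toy dataset has 8 items; it is exactly the enumeration that the specialised algorithms avoid.

```python
from itertools import combinations

# Toy transaction dataset from the slide.
TRANSACTIONS = {
    "t1": set("BCEFGH"),
    "t2": set("ADG"),
    "t3": set("ACDH"),
    "t4": set("AEF"),
    "t5": set("BEF"),
    "t6": set("BEFG"),
}

def cover(itemset):
    """Return the ids of the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return {tid for tid, t in TRANSACTIONS.items() if items <= t}

def freq(itemset):
    """Relative frequency: fraction of transactions covering `itemset`."""
    return len(cover(itemset)) / len(TRANSACTIONS)

def frequent_itemsets(min_count):
    """Brute force: test every non-empty itemset (2^n - 1 candidates)."""
    items = sorted(set().union(*TRANSACTIONS.values()))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            support = len(cover(candidate))
            if support >= min_count:
                result["".join(candidate)] = support
    return result

print(sorted(cover("BEF")))   # ['t1', 't5', 't6']
print(freq("BEF"))            # 0.5
print(frequent_itemsets(3))
```

With a minimum count of 3, nine itemsets are frequent here: A, B, E, F, G, BE, BF, EF and BEF.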

slide-41
SLIDE 41

EXAMPLE

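[Figure: itemset lattice $(2^I, \subseteq)$ over the dataset $D$, with minimum support $\theta = 3$]

Maximal frequent itemsets:

$M_\theta = \{P \subseteq I \mid freq(P) \ge \theta \wedge \forall P' \supset P : freq(P') < \theta\}$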

slide-45
SLIDE 45

EXAMPLE

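[Figure: itemset lattice $(2^I, \subseteq)$ over the dataset $D$, with minimum support $\theta = 3$]

Closedness:

$C_\theta = \{P \subseteq I \mid freq(P) \ge \theta \wedge \forall P' \supset P : freq(P') < freq(P)\}$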
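A small Python sketch can make the two condensed representations concrete on the toy dataset of the earlier slides with θ = 3 (an absolute count). It assumes the standard definitions: a frequent itemset is maximal if no proper superset is frequent, and closed if no proper superset has the same frequency. All function names are illustrative, and the enumeration is brute force rather than what a real miner would do.

```python
from itertools import combinations

# Same toy dataset as the earlier slides; theta is an absolute count here.
TRANSACTIONS = [set("BCEFGH"), set("ADG"), set("ACDH"),
                set("AEF"), set("BEF"), set("BEFG")]
THETA = 3

def support(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)

def frequent(theta):
    """Map each frequent itemset (as a frozenset) to its support."""
    items = sorted(set().union(*TRANSACTIONS))
    found = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = support(frozenset(combo))
            if s >= theta:
                found[frozenset(combo)] = s
    return found

def maximal(freq_sets):
    """Frequent itemsets with no frequent proper superset."""
    return {p for p in freq_sets if not any(p < q for q in freq_sets)}

def closed(freq_sets):
    """Frequent itemsets with no proper superset of equal support."""
    return {p for p, s in freq_sets.items()
            if not any(p < q and s == sq for q, sq in freq_sets.items())}

def show(itemsets):
    """Render itemsets as sorted strings such as 'BEF'."""
    return sorted("".join(sorted(p)) for p in itemsets)

F = frequent(THETA)
print(len(F), show(maximal(F)), show(closed(F)))
# 9 ['A', 'BEF', 'G'] ['A', 'BEF', 'EF', 'G']
```

On this dataset the 9 frequent itemsets condense to 4 closed and 3 maximal ones, and every maximal itemset is also closed.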

slide-50
SLIDE 50

CONDENSED REPRESENTATION

Dataset      #Frequent    #Closed      #Maximal
Zoo-1        151 807      3 292        230
Mushroom     155 734      3 287        453
Lymph        9 967 402    46 802       5 191
Hepatitis    27 · 10^7    1 827 264    189 205

slide-61
SLIDE 61

SPECIALISED VS DECLARATIVE DATA MINING

[Diagram: Query (basic user’s constraints) + dataset → Specialised Miner → Patterns]

Limitations: dealing with sophisticated user’s constraints [Wojciechowski and Zakrzewicz, 02]

Sophisticated user’s constraints are handled through:

1. Preprocessing
2. Post-processing
3. A new algorithm

Need: a declarative way to deal with more complex queries

➤ Declarative Data Mining

slide-64
SLIDE 64

SPECIALISED VS DECLARATIVE DATA MINING

[Diagram: Query (basic and sophisticated user’s constraints) + dataset → CP model + CP solver → Patterns]

Limitations of specialised miners: dealing with sophisticated user’s constraints [Wojciechowski and Zakrzewicz, 02]

Need: a declarative way to deal with more complex queries

➤ Declarative Data Mining: CP (constraint programming) model + CP solver
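The declarative idea can be illustrated with a rough Python sketch: the query is a list of constraints, and one generic search procedure serves every query. This is not a CP model and uses no CP solver. The constraint constructors (`min_support`, `max_size`, `contains`) and the `solve` function are hypothetical names invented for this example, and the only "propagation" is pruning on anti-monotone constraints.

```python
# Toy dataset from the earlier slides.
TRANSACTIONS = [set("BCEFGH"), set("ADG"), set("ACDH"),
                set("AEF"), set("BEF"), set("BEFG")]
ITEMS = sorted(set().union(*TRANSACTIONS))

def support(p):
    """Number of transactions containing every item of p."""
    return sum(1 for t in TRANSACTIONS if p <= t)

# Hypothetical constraint constructors: each returns (predicate, anti_monotone).
# Anti-monotone: once violated by P, violated by every superset of P.
def min_support(theta):
    return (lambda p: support(p) >= theta, True)

def max_size(k):
    return (lambda p: len(p) <= k, True)

def contains(item):
    return (lambda p: item in p, False)

def solve(constraints):
    """Generic depth-first search over itemsets, shared by all queries."""
    prunable = [c for c, anti in constraints if anti]
    filters = [c for c, anti in constraints if not anti]
    solutions = []

    def extend(p, start):
        for i in range(start, len(ITEMS)):
            q = p | {ITEMS[i]}
            if not all(c(q) for c in prunable):
                continue              # no superset of q can be a solution
            if all(c(q) for c in filters):
                solutions.append("".join(sorted(q)))
            extend(q, i + 1)

    extend(frozenset(), 0)
    return solutions

# The query is a conjunction of constraints; the search code is untouched.
print(solve([min_support(3), contains("E"), max_size(2)]))
# ['BE', 'E', 'EF']
```

Changing the query means changing only the list passed to `solve`, which is the property the slide contrasts with writing preprocessing, post-processing, or a new algorithm for each sophisticated constraint.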

slide-69
SLIDE 69

SPECIALISED VS DECLARATIVE DATA MINING

Specialised is the winner!

Declarative is the winner!

slide-71
SLIDE 71

SPECIALISED VS DECLARATIVE DATA MINING

Preprocessing + Specialised step vs Declarative

slide-73
SLIDE 73

SPECIALISED VS DECLARATIVE DATA MINING

Specialised + postprocessing vs Declarative

slide-77
SLIDE 77

CONCLUSIONS (PART I)

➤ Specialised methods are suitable for:
  ➤ Enumerating patterns
  ➤ Taking into account classic constraints (simple queries)
➤ Declarative methods are suitable for:
  ➤ Taking into account user’s constraints (complex queries)
  ➤ Iterative data mining processes

Time left?

slide-80
SLIDE 80

FAULT LOCALISATION

➤ The need: identify a subset of statements that are likely to explain a fault in a program
➤ Precision <=> Efficiency
➤ Spectrum-based approaches (ranking metrics, suspiciousness score):
  ➤ Tarantula [Jones and Harrold 05]
  ➤ Ochiai [Abreu et al. 07]
  ➤ Jaccard [Abreu et al. 07]
  ➤ …
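The three ranking metrics named above can be sketched in Python using their usual textbook formulas, where ef and ep are the numbers of failing and passing test cases that execute a statement. The coverage data below is hypothetical (it is not taken from the talk), and the statement names `s1` to `s4` are placeholders.

```python
from math import sqrt

# Hypothetical coverage spectrum (not the slide's data): for each test case,
# the set of executed statements and whether the test passed.
TESTS = [
    ({"s1", "s2", "s3"}, False),
    ({"s1", "s3"}, False),
    ({"s1", "s2"}, True),
    ({"s1", "s4"}, True),
]

TOTAL_FAIL = sum(1 for _, ok in TESTS if not ok)
TOTAL_PASS = sum(1 for _, ok in TESTS if ok)

def counts(stmt):
    """Return (ef, ep): failing and passing tests that execute `stmt`."""
    ef = sum(1 for cov, ok in TESTS if stmt in cov and not ok)
    ep = sum(1 for cov, ok in TESTS if stmt in cov and ok)
    return ef, ep

def tarantula(stmt):
    ef, ep = counts(stmt)
    fail_ratio, pass_ratio = ef / TOTAL_FAIL, ep / TOTAL_PASS
    total = fail_ratio + pass_ratio
    return fail_ratio / total if total else 0.0

def ochiai(stmt):
    ef, ep = counts(stmt)
    denom = sqrt(TOTAL_FAIL * (ef + ep))
    return ef / denom if denom else 0.0

def jaccard(stmt):
    ef, ep = counts(stmt)
    return ef / (TOTAL_FAIL + ep)

# Rank statements from most to least suspicious under each metric.
for metric in (tarantula, ochiai, jaccard):
    ranking = sorted(["s1", "s2", "s3", "s4"], key=metric, reverse=True)
    print(metric.__name__, ranking)
```

Each statement is scored on its own, from its own ef and ep only; no metric looks at which statements are executed together.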

slide-85
SLIDE 85

FAULT LOCALISATION (MOTIVATIONS)

Program: Character counter (test cases tc1 to tc8)

function count (char *s) {
int let, dig, other, i = 0;
char c;

Statement                                         Executed by (of 8 test cases)
e1:  while (c = s[i++]) {                         8
e2:  if ('A'<=c && 'Z'>=c)                        7
e3:  let += 2; // - fault -                       6
e4:  else if ('a'<=c && 'z'>=c)                   6
e5:  let += 1;                                    3
e6:  else if ('0'<=c && '9'>=c)                   5
e7:  dig += 1;                                    2
e8:  else if (isprint (c))                        3
e9:  other += 1;                                  3
e10: printf("%d %d %d\n", let, dig, other);}      8

Passing/Failing: tc1 to tc6 fail (F), tc7 and tc8 pass (P)

slide-92
SLIDE 92

FAULT LOCALISATION (MOTIVATIONS)

➤ Pros: quick localisation
➤ Cons: independent evaluation of each statement, at the expense of accuracy
➤ Need: finer-grained localisation, taking into account user’s constraints
➤ How: use of Declarative Data Mining

slide-94
SLIDE 94

FAULT LOCALISATION (MOTIVATIONS)

Fault localisation = Mining Task

slide-98
SLIDE 98

PATTERN SUSPICIOUSNESS DEGREE (PSD)

➤ PSD function. Given a pattern $P$ of a program:

$PSD(P) = freq^{-}(P) + \frac{|FAIL| - freq^{+}(P)}{|PASS| + 1}$

➤ PSD-dominance relation. Given two patterns $P_i$ and $P_j$:

$P_i \rhd_{PSD} P_j \Leftrightarrow PSD(P_i) > PSD(P_j)$

➤ Top-k suspicious patterns:

$\text{top-}k = \{P \mid \nexists P_1, \dots, P_k : \forall\, 1 \le j \le k,\ P_j \rhd_{PSD} P\}$
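A short Python sketch shows how the PSD score and the top-k set could be evaluated. It follows the formula as written on the slide and assumes that freq⁻ counts the failing test cases that execute every statement of the pattern and freq⁺ counts the passing ones. The test suite and the candidate patterns are hypothetical, and the candidates are listed by hand rather than mined.

```python
# Hypothetical test suite (not the slide's data): executed statements per test.
FAIL = [{"e1", "e2", "e3"}, {"e1", "e2", "e3", "e4"}, {"e1", "e3"}]
PASS = [{"e1", "e2"}, {"e1", "e4"}]

def freq(pattern, tests):
    """Number of test cases in `tests` that execute every statement of `pattern`."""
    return sum(1 for cov in tests if pattern <= cov)

def psd(pattern):
    """PSD as written on the slide, assuming freq- counts failing tests
    and freq+ counts passing tests."""
    return freq(pattern, FAIL) + (len(FAIL) - freq(pattern, PASS)) / (len(PASS) + 1)

def top_k(patterns, k):
    """Patterns dominated by fewer than k other patterns (ties may yield more than k)."""
    return [p for p in patterns
            if sum(1 for q in patterns if psd(q) > psd(p)) < k]

candidates = [frozenset({"e1"}), frozenset({"e2"}),
              frozenset({"e3"}), frozenset({"e4"})]
for p in top_k(candidates, 2):
    print(sorted(p), round(psd(p), 3))
```

Here `{e3}` is executed by all three failing tests and by no passing test, so it gets the highest score; `{e1}` is executed by every test and comes second.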

slide-99
SLIDE 99

FCP-MINER TOOL (SOME RESULTS)

25

slide-102
SLIDE 102

CONCLUSIONS (PART II)

➤ Software Testing/Program comprehension tasks can be tackled using Data Mining:
  ➤ Trace analysis
  ➤ Test suite mining
  ➤ Source code mining
  ➤ …
➤ Think about using Declarative methods in Software Testing
