Specialised vs Declarative Data Mining: Software Testing Applications

Nadjib Lazaar, CNRS, University of Montpellier. Joint work with M. Maamar, Y. Lebbah, S. Loudni, C. Bessiere, et al. SIMULA, Oslo, 11 Oct. 2018.


  1. EXAMPLE: $\theta = 3$, dataset $D$, pattern lattice $(2^{I}, \subseteq)$. Closedness. $M_\theta = \{ P \in 2^{I} \mid \mathrm{freq}(P) \ge \theta \wedge \forall P' \supset P : \mathrm{freq}(P') < \theta \}$
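
The definitions on the example slide can be checked by brute force on a tiny dataset. The sketch below is only an illustration: the five transactions are hypothetical (the slide's dataset did not survive extraction), and the names `DATASET`, `freq` and `supersets` are introduced here. It enumerates every itemset, keeps the frequent ones for θ = 3, then filters the closed ones (no proper superset with the same frequency) and the maximal ones (no frequent proper superset, as in the $M_\theta$ formula).

```python
from itertools import combinations

# Hypothetical toy dataset, NOT the one shown on the slide.
DATASET = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B"},
    {"C"},
]
THETA = 3  # minimum frequency threshold, as on the slide
ITEMS = sorted(set().union(*DATASET))


def freq(pattern):
    """Number of transactions that contain every item of the pattern."""
    return sum(pattern <= t for t in DATASET)


def supersets(pattern, candidates):
    """All candidates that strictly contain the pattern."""
    return [q for q in candidates if pattern < q]


# Every non-empty itemset over ITEMS (fine for a toy example only).
candidates = [
    frozenset(c)
    for n in range(1, len(ITEMS) + 1)
    for c in combinations(ITEMS, n)
]

frequent = [p for p in candidates if freq(p) >= THETA]
# Closed: every proper superset is strictly less frequent.
closed = [p for p in frequent
          if all(freq(q) < freq(p) for q in supersets(p, candidates))]
# Maximal: every proper superset is infrequent.
maximal = [p for p in frequent
           if all(freq(q) < THETA for q in supersets(p, candidates))]

print(len(frequent), len(closed), len(maximal))  # prints: 7 3 1
```

On this toy input the three counts shrink from 7 to 3 to 1, which is the condensation effect the next slides quantify on real datasets.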

  6. CONDENSED REPRESENTATION

      Dataset      #Frequent      #Closed      #Maximal
      Zoo-1        151 807        3 292        230
      Mushroom     155 734        3 287        453
      Lymph        9 967 402      46 802       5 191
      Hepatitis    27 × 10^7      1 827 264    189 205

  17. SPECIALISED VS DECLARATIVE DATA MINING
      Specialised pipeline: Query (basic user’s constraints) + dataset ➤ Specialised Miner ➤ Patterns
      ➤ Limitations: dealing with sophisticated user’s constraints [Wojciechowski and Zakrzewicz, 02], which calls for one of:
          1. preprocessing (on the dataset)
          2. post-processing (on the patterns)
          3. a new algorithm
      ➤ Need: a declarative way to deal with more complex queries ➤ Declarative data mining

  20. SPECIALISED VS DECLARATIVE DATA MINING
      Declarative pipeline: Query (basic and sophisticated user’s constraints) + dataset ➤ CP model + CP solver ➤ Patterns (CP: constraint programming)
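
The declarative idea can be sketched as a query built from composable constraints, so that refining the query means adding a constraint rather than writing a new miner. The code below is a minimal illustration under loud assumptions: it uses plain generate-and-test in place of a real CP model and solver (no propagation, no search heuristics), the dataset is hypothetical, and the names `min_freq`, `min_size`, `excludes`, `closed` and `solve` are invented for this example.

```python
from itertools import combinations

# Hypothetical toy dataset, not taken from the slides.
DATASET = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B"},
    {"C"},
]
ITEMS = sorted(set().union(*DATASET))


def freq(pattern):
    """Number of transactions that contain every item of the pattern."""
    return sum(pattern <= t for t in DATASET)


# Constraint constructors: each returns a predicate over a candidate pattern.
def min_freq(theta):
    return lambda p: freq(p) >= theta


def min_size(n):
    return lambda p: len(p) >= n


def excludes(item):
    return lambda p: item not in p


def closed():
    # A pattern is closed when adding any missing item lowers its frequency.
    return lambda p: all(freq(p | {i}) < freq(p) for i in ITEMS if i not in p)


def solve(constraints):
    """Generate-and-test stand-in for a CP solver (assumption, see lead-in):
    return every non-empty itemset that satisfies all constraints."""
    candidates = (
        frozenset(c)
        for n in range(1, len(ITEMS) + 1)
        for c in combinations(ITEMS, n)
    )
    return [p for p in candidates if all(c(p) for c in constraints)]


basic_query = [min_freq(3)]                          # a "simple" query
refined_query = basic_query + [closed(), min_size(2)]  # user adds constraints
print(len(solve(basic_query)))                       # prints: 7
print(sorted(sorted(p) for p in solve(refined_query)))
# prints: [['A', 'B'], ['A', 'B', 'C']]
```

Each refinement reuses the same `solve` call; only the list of constraints changes, which is what makes the approach convenient for an iterative mining process.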

  25. SPECIALISED VS DECLARATIVE DATA MINING: Specialised is the winner! Declarative is the winner!

  27. SPECIALISED VS DECLARATIVE DATA MINING: Preprocessing + Specialised step vs Declarative

  29. SPECIALISED VS DECLARATIVE DATA MINING: Specialised + postprocessing vs Declarative

  33. CONCLUSIONS (PART I)
      ➤ Specialised methods are suitable for:
          ➤ Enumerating patterns
          ➤ Taking into account classic constraints (simple queries)
      ➤ Declarative methods are suitable for:
          ➤ Taking into account user’s constraints (complex queries)
          ➤ Iterative data mining process
      Time left?

  36. FAULT LOCALISATION
      ➤ The need: identify a subset of statements that are likely to explain a fault in a program
      ➤ Precision <=> Efficiency
      ➤ Spectrum-based approaches (ranking metrics, suspiciousness score):
          ➤ Tarantula [Jones and Harrold 05]
          ➤ Ochiai [Abreu et al. 07]
          ➤ Jaccard [Abreu et al. 07]
          ➤ …
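
For readers unfamiliar with spectrum-based ranking, the three metrics named on this slide can be sketched as follows. The formulas are the commonly cited definitions, not transcribed from the deck, and the parameter names are introduced here: `ef` and `ep` are the numbers of failing and passing test cases that execute a statement, `total_fail` and `total_pass` are the sizes of the failing and passing test sets. The counts in the usage lines are illustrative.

```python
import math


def tarantula(ef, ep, total_fail, total_pass):
    """Tarantula: failing ratio divided by (failing ratio + passing ratio)."""
    fail_ratio = ef / total_fail if total_fail else 0.0
    pass_ratio = ep / total_pass if total_pass else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom else 0.0


def ochiai(ef, ep, total_fail, total_pass):
    """Ochiai: ef / sqrt(total_fail * (ef + ep)); total_pass is unused."""
    denom = math.sqrt(total_fail * (ef + ep))
    return ef / denom if denom else 0.0


def jaccard(ef, ep, total_fail, total_pass):
    """Jaccard: ef / (total_fail + ep); total_pass is unused."""
    denom = total_fail + ep
    return ef / denom if denom else 0.0


# Illustrative counts: 6 failing and 2 passing test cases overall.
# Statement X is run by all failing tests and no passing test,
# statement Y is run by every test case.
for name, metric in [("tarantula", tarantula), ("ochiai", ochiai), ("jaccard", jaccard)]:
    x = metric(6, 0, 6, 2)
    y = metric(6, 2, 6, 2)
    print(f"{name}: X={x:.3f} Y={y:.3f}")
```

Every metric scores each statement from its own four counts only, which is the "independent evaluation of each statement" that the motivation slides list as a drawback.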

  39. FAULT LOCALISATION (MOTIVATIONS)
      ➤ Pros: Quick localisation
      ➤ Cons: independent evaluation of each statement at the expense of accuracy

  43. FAULT LOCALISATION (MOTIVATIONS)

      Program: Character counter

      function count (char *s) {
        int let, dig, other, i = 0;
        char c;
      e1:    while (c = s[i++]) {
      e2:      if ('A'<=c && 'Z'>=c)
      e3:        let += 2; //- fault -
      e4:      else if ('a'<=c && 'z'>=c)
      e5:        let += 1;
      e6:      else if ('0'<=c && '9'>=c)
      e7:        dig += 1;
      e8:      else if (isprint (c))
      e9:        other += 1;
      e10:   printf("%d %d %d\n", let, dig, other); }

      Test cases (1 = the statement is executed by the test case)

                        tc1  tc2  tc3  tc4  tc5  tc6  tc7  tc8
      e1                 1    1    1    1    1    1    1    1
      e2                 1    1    1    1    1    1    0    1
      e3                 1    1    1    1    1    1    0    0
      e4                 1    1    1    1    1    0    0    1
      e5                 1    1    0    0    1    0    0    0
      e6                 1    1    1    1    0    0    0    1
      e7                 0    1    0    1    0    0    0    0
      e8                 1    0    1    0    0    0    0    1
      e9                 1    0    1    0    0    0    0    1
      e10                1    1    1    1    1    1    1    1
      Passing/Failing    F    F    F    F    F    F    P    P

  48. FAULT LOCALISATION (MOTIVATIONS)
      ➤ Pros: Quick localisation
      ➤ Cons: independent evaluation of each statement at the expense of accuracy
      ➤ Need: finer-grained localisation, taking into account user’s constraints
      ➤ How: Use of Declarative Data Mining

  50. FAULT LOCALISATION (MOTIVATIONS): the character-counter program and its test-case coverage matrix (unchanged from slide 43), now annotated "Fault localisation = Mining Task"

  54. PATTERN SUSPICIOUSNESS DEGREE (PSD)
      ➤ PSD function. Given a pattern $P$ of a program: $\mathrm{PSD}(P) = \mathrm{freq}^{-}(P) + \dfrac{|FAIL| - \mathrm{freq}^{+}(P)}{|PASS| + 1}$
      ➤ PSD-dominance relation. Given two patterns $P_i$ and $P_j$: $P_i \rhd_{PSD} P_j \Leftrightarrow \mathrm{PSD}(P_i) > \mathrm{PSD}(P_j)$
      ➤ Top-k suspicious patterns. $\text{top-}k = \{ P \mid \nexists P_1, \ldots, P_k : \forall\, 1 \le j \le k,\ P_j \rhd_{PSD} P \}$
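
A small worked example may help make the PSD definitions concrete. The sketch below rests on several assumptions that are not confirmed by the deck: each test case is treated as a transaction whose items are the executed statements, freq⁻ and freq⁺ count the failing and passing test cases that execute every statement of the pattern, the flattened formula is read as freq⁻(P) + (|FAIL| − freq⁺(P)) / (|PASS| + 1), and the patterns P₁, …, P_k in the top-k definition are taken to be distinct. The coverage matrix is transcribed from the character-counter slide; the function names are introduced here.

```python
from fractions import Fraction
from itertools import combinations

# Coverage matrix from the character-counter slide:
# one row per statement e1..e10, one column per test case tc1..tc8.
COVERAGE = {
    "e1":  [1, 1, 1, 1, 1, 1, 1, 1],
    "e2":  [1, 1, 1, 1, 1, 1, 0, 1],
    "e3":  [1, 1, 1, 1, 1, 1, 0, 0],  # the faulty statement
    "e4":  [1, 1, 1, 1, 1, 0, 0, 1],
    "e5":  [1, 1, 0, 0, 1, 0, 0, 0],
    "e6":  [1, 1, 1, 1, 0, 0, 0, 1],
    "e7":  [0, 1, 0, 1, 0, 0, 0, 0],
    "e8":  [1, 0, 1, 0, 0, 0, 0, 1],
    "e9":  [1, 0, 1, 0, 0, 0, 0, 1],
    "e10": [1, 1, 1, 1, 1, 1, 1, 1],
}
VERDICTS = ["F", "F", "F", "F", "F", "F", "P", "P"]
FAIL = [t for t, v in enumerate(VERDICTS) if v == "F"]
PASS = [t for t, v in enumerate(VERDICTS) if v == "P"]


def freq(pattern, tests):
    """Number of the given test cases that execute every statement in pattern."""
    return sum(all(COVERAGE[s][t] for s in pattern) for t in tests)


def psd(pattern):
    """PSD under the assumed grouping of the slide's formula (see lead-in)."""
    return freq(pattern, FAIL) + Fraction(
        len(FAIL) - freq(pattern, PASS), len(PASS) + 1
    )


# All non-empty sets of statements (1023 patterns; brute force, toy scale only).
STATEMENTS = sorted(COVERAGE)
PATTERNS = [
    frozenset(c)
    for n in range(1, len(STATEMENTS) + 1)
    for c in combinations(STATEMENTS, n)
]
SCORES = {p: psd(p) for p in PATTERNS}


def top_k(k):
    """Patterns that are PSD-dominated by fewer than k other patterns (k >= 1)."""
    ordered = sorted(SCORES.values(), reverse=True)
    threshold = ordered[min(k, len(ordered)) - 1]
    return [p for p in PATTERNS if SCORES[p] >= threshold]


best = top_k(1)
print(psd(frozenset({"e3"})), len(best), all("e3" in p for p in best))
```

Under these assumptions every undominated pattern contains the faulty statement e3, and ties mean that top-1 holds several patterns rather than exactly one.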

  55. FCP-MINER TOOL (SOME RESULTS)

  56. CONCLUSIONS (PART II)
