Automatic Identification of Bug-fix Commits: The Case of GitHub Projects



slide-1
SLIDE 1

Automatic Identification of Bug-fix Commits:

The Case of GitHub Projects

Yujuan Jiang, Rodrigo Morales, Bram Adams, Foutse Khomh

1

slide-2
SLIDE 2
  • Case study projects
  • Approach
  • Research questions
  • Result (so far)

2

slide-3
SLIDE 3

Case Study Projects

Keywords: GitHub, C language

3

slide-4
SLIDE 4

Approach

  • Data Collection
  • Feature Extraction (Text & Source code)
  • Model Training
  • Evaluation

4

slide-5
SLIDE 5

Approach: Data collection

5

slide-6
SLIDE 6

Approach: Feature Extraction

  • Textual Analysis: keywords
  • Code Analysis

6

slide-12
SLIDE 12

Approach: Feature Extraction

1) Textual Analysis: keywords + feature words

All words → Stem + remove stop words → Filter
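
The three textual steps above (all words, stemming plus stop-word removal, filtering) could be sketched roughly as follows. This is only an illustration under assumptions: the stop-word list, the `crude_stem` suffix stripper (a stand-in for a real stemmer), and the frequency threshold `min_count` used as the filter are all hypothetical and are not taken from the slides.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "to", "and", "is", "for", "when"}

def crude_stem(word):
    """Strip a few common English suffixes (crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def feature_words(messages, min_count=2):
    """All words -> stem + remove stop words -> filter by frequency (assumed filter)."""
    counts = Counter()
    for message in messages:
        tokens = re.findall(r"[a-z]+", message.lower())
        stems = {crude_stem(t) for t in tokens if t not in STOP_WORDS}
        counts.update(stems)  # each stem counted once per commit message
    return sorted(w for w, c in counts.items() if c >= min_count)

# Made-up commit messages, for illustration only.
messages = [
    "Fixed crash when parsing empty index",
    "Fixes memory leak in parser",
    "Add documentation for the index API",
]
print(feature_words(messages))  # ['fix', 'index']
```

Only stems that occur in at least two of the toy messages survive the filter; the surviving words would then serve as textual features next to the keywords.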

7

slide-20
SLIDE 20

Approach: Feature Extraction

2) Source Code Analysis: Patch Parser + re Script

Commits → Parser → Commit Profile → Features

Features:
  • # of while loops
  • # of ifs
  • # of boolean ......
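
As a rough illustration of turning a commit's patch into a profile of counts, the sketch below scans a unified diff with regular expressions. It is an assumption-laden stand-in, not the authors' patch parser: the `PATTERNS` table, the `commit_profile` function, the split into added and removed lines, and the sample patch (including the path `src/buffer.c`) are all hypothetical, and regex matching only approximates real C parsing.

```python
import re

# Hypothetical patterns; a regex count is only an approximation of real C parsing.
PATTERNS = {
    "while_loops": re.compile(r"\bwhile\s*\("),
    "ifs": re.compile(r"\bif\s*\("),
}

def commit_profile(patch_text):
    """Count constructs on added ('+') and removed ('-') lines of a unified diff."""
    profile = {f"{kind}_{name}": 0 for kind in ("added", "removed") for name in PATTERNS}
    for line in patch_text.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file headers, not code changes
        if line.startswith("+"):
            kind = "added"
        elif line.startswith("-"):
            kind = "removed"
        else:
            continue  # context lines and hunk headers
        for name, pattern in PATTERNS.items():
            profile[f"{kind}_{name}"] += len(pattern.findall(line[1:]))
    return profile

# Made-up patch, for illustration only.
patch = """\
--- a/src/buffer.c
+++ b/src/buffer.c
@@ -10,4 +10,6 @@
 int len = 0;
-while (p)
+if (p == NULL)
+    return -1;
+while (p != NULL)
     len++;
"""
print(commit_profile(patch))
```

Each entry of the resulting dictionary would become one numeric feature of the commit.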

8

slide-21
SLIDE 21

Approach: Feature Extraction

9

slide-25
SLIDE 25

Approach: Model Training

Black data (manually label 300 bug-fixing commits for each project)

Grey data (unlabelled)

Black data + Grey data → LPU → White data (bottom k)

Black data + White data → SVM, Random Forest
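
The two-stage idea on this slide (learn from positive and unlabelled data to pick the bottom k "white" commits, then train a two-class model on black versus white) might be sketched as below. This is a dependency-free toy under stated assumptions, not the LPU tool and not an SVM or Random Forest: ranking by cosine similarity to the black centroid and the nearest-centroid classifier are simple stand-ins, and all function names and vectors are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pick_white(black, grey, k):
    """Stage 1: the k grey commits least similar to the black centroid become 'white'."""
    black_centre = centroid(black)
    ranked = sorted(grey, key=lambda x: cosine(x, black_centre))
    return ranked[:k]

def train(black, white):
    """Stage 2: nearest-centroid model (stand-in for SVM / Random Forest)."""
    black_centre, white_centre = centroid(black), centroid(white)
    def is_bug_fix(x):
        return cosine(x, black_centre) > cosine(x, white_centre)
    return is_bug_fix

# Toy vectors: [count of "fix" in message, added ifs, added documentation lines]
black = [[1, 2, 0], [1, 1, 0], [2, 3, 0]]
grey = [[1, 2, 0], [0, 0, 5], [0, 1, 4], [1, 1, 1]]
white = pick_white(black, grey, k=2)
classify = train(black, white)
print(white)                                      # [[0, 0, 5], [0, 1, 4]]
print(classify([1, 3, 0]), classify([0, 0, 3]))   # True False
```

In this sketch a larger k gives the second stage more negative examples, but raises the chance that a real bug fix is mislabelled as white.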

10

slide-26
SLIDE 26

Approach: Evaluation

11

slide-27
SLIDE 27

Research Questions

  • Does our classifier outperform the baseline keyword-based approach?
  • How does the parameter k affect the classifier?
  • Which metrics matter most for identifying bug-fixing commits?
  • Is the hybrid approach (namely the combination of LPU and SVM) more effective than a single-classifier approach?
  • Which combination of LPU tool options makes the classifier work best?
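
For context on the first question, a keyword-based baseline can be as simple as the sketch below. The keyword list and the function name are assumptions; the slides do not state which keywords the baseline uses.

```python
import re

# Hypothetical keyword list; the slides do not say which keywords the baseline uses.
BUG_KEYWORDS = ("fix", "bug", "defect", "crash", "error")
# Match a keyword at the start of a word, so "fixes" matches but "prefix" does not.
KEYWORD_RE = re.compile(r"\b(?:" + "|".join(BUG_KEYWORDS) + r")", re.IGNORECASE)

def keyword_baseline(message):
    """Label a commit as a bug fix if its message contains any keyword."""
    return KEYWORD_RE.search(message) is not None

print(keyword_baseline("Fixed NULL dereference in tree walker"))  # True
print(keyword_baseline("Add prefix option to config"))            # False
```

Such a baseline misses bug fixes whose messages avoid these words, which is the gap a learned classifier is meant to close.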

12

slide-28
SLIDE 28

Result (so far): recall

  • Libgit2: 76.95%
  • openFrameworks: 96.67%
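
Recall here is the share of true bug-fixing commits that the classifier identifies, TP / (TP + FN). A minimal sketch of the computation follows; the labels are made up for illustration and are unrelated to the project results above.

```python
def recall(predicted, actual):
    """Recall = true positives / (true positives + false negatives)."""
    true_positives = sum(1 for p, a in zip(predicted, actual) if p and a)
    false_negatives = sum(1 for p, a in zip(predicted, actual) if not p and a)
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0

# Made-up labels: True means "bug-fixing commit".
actual = [True, True, True, True, False, False]
predicted = [True, True, True, False, True, False]
print(f"{recall(predicted, actual):.2%}")  # 75.00%
```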

13

slide-29
SLIDE 29

Result (so far): key features

[Figure: key-feature plot for Libgit2. Features carry X-numbered labels, listed starting with X38, X41, X33, X37, X26; the value axis runs from 0.000 to 0.030 in steps of 0.005.]

14

slide-34
SLIDE 34

LPU SVM

[Figure: key-feature plot with the same feature labels and value axis (0.000 to 0.030) as the Libgit2 plot on slide 29.]

15