FourD: “Do Developers Discuss Design?” Revisited, Abbas Shakiba (PowerPoint presentation)



SLIDE 1

FourD: “Do Developers Discuss Design?” Revisited

Abbas Shakiba, Robert Green, Robert Dyer

Bowling Green State University

supported in part by the US National Science Foundation under CCF-15-18776 and CNS-15-12947

SLIDE 2

Do developers discuss design decisions?

  • Are design decisions made only before implementation?
  • Do design discussions/decisions show in the commit logs?

SLIDE 3

Prior work

  • Brunet, João, et al. “Do developers discuss design?” 11th Working Conference on Mining Software Repositories, 2014
  • Selected set of 5 projects for analysis
  • Analyzed:
    • commit logs
    • bug reports
    • discussions

SLIDE 4

Our Study

  • Data from 2 software repositories
    • GitHub, SourceForge
  • For each, 5 randomly selected projects
  • Focus on commit logs
  • 200 randomly selected non-empty commits per project
  • 2 × 200 × 5 = 2,000 commits total
  • Train ML classifiers to identify commits discussing design

SLIDE 5

Tools Used

  • Boa language and infrastructure
    • A language for analyzing ultra-large-scale software repositories
  • Weka
    • Data mining tool written in Java
  • Ruby on Rails
    • A web application framework written in Ruby

SLIDE 6

Approach

Getting Data (Boa) → Manual Classification (survey) → Pre-Processing (Weka) → Build Models (Weka) → Test Models (Weka) → Analyze Results

SLIDE 7

Approach (Cont'd)

  • Boa queries
    • Randomly pick 5 projects (not shown)
    • Randomly pick 200 commits (shown)

COMMITS: output top(200)[string] of string weight float;

# IDs of the 5 randomly selected projects
ids := {"6176545", "6150849", "209281", "13151128", "1019785"};

# A log counts as empty if it is blank or a placeholder message
isempty := function(s: string) : bool {
    s2 := trim(s);
    if (match(`^\s*$`, s2))
        return true;
    if (match(`^no message$`, lowercase(s2)))
        return true;
    if (match(`^\*\*\* empty log message \*\*\*$`, lowercase(s2)))
        return true;
    return false;
};

# For the selected projects only, give every non-empty log a random weight;
# top(200) then keeps 200 randomly chosen logs per project
exists (i: int; input.id == ids[i])
    visit(input, visitor {
        before rev: Revision ->
            if (!isempty(rev.log))
                COMMITS[input.id] << rev.log weight rand();
    });
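For readers unfamiliar with Boa, the sampling logic of this query can be sketched in Python. This is a minimal illustration, not part of the study's tooling: it assumes the commit logs have already been fetched into a dictionary mapping project IDs to lists of log messages, and the names `is_empty` and `sample_commits` are hypothetical. Giving every non-empty log a random weight and keeping the heaviest k, roughly as `top(200) ... weight rand()` does, amounts to a uniform random sample without replacement.

```python
import random

# Project IDs copied from the Boa query.
PROJECT_IDS = {"6176545", "6150849", "209281", "13151128", "1019785"}


def is_empty(log: str) -> bool:
    """Same test as the Boa isempty(): blank logs and placeholder messages count as empty."""
    s = log.strip().lower()
    return s in ("", "no message", "*** empty log message ***")


def sample_commits(commits_by_project: dict[str, list[str]], k: int = 200, seed=None) -> dict[str, list[str]]:
    """Keep up to k random non-empty logs per selected project (hypothetical helper).

    Roughly mirrors `top(k) ... weight rand()`: each log gets a random weight and
    the k heaviest survive, which is a uniform sample without replacement.
    """
    rng = random.Random(seed)
    sample = {}
    for project_id, logs in commits_by_project.items():
        if project_id not in PROJECT_IDS:      # the `exists (...)` project filter
            continue
        weighted = [(rng.random(), log) for log in logs if not is_empty(log)]
        weighted.sort(reverse=True)            # heaviest random weights first
        sample[project_id] = [log for _, log in weighted[:k]]
    return sample
```

Passing a fixed `seed` makes the sample reproducible, which Boa's `rand()` does not guarantee across runs.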

Pipeline step: Getting Data (Boa)

SLIDE 8

Approach (Cont'd)

  • Survey website for crowdsourcing the manual classification
  • Each log shown to 2-3 users
  • A label required 2 YES or 2 NO votes
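The agreement rule can be expressed as a small function. This sketch assumes votes arrive one at a time and that a third user is consulted only when the first two disagree, which is one plausible reading of "2-3 users"; `consensus_label` is a hypothetical name, not part of the survey website.

```python
from typing import Optional, Sequence


def consensus_label(votes: Sequence[str], needed: int = 2) -> Optional[str]:
    """Return "yes" or "no" as soon as one answer has `needed` votes, else None.

    Assumption: None means the log should be shown to another user.
    """
    yes = no = 0
    for vote in votes:
        v = vote.strip().lower()
        if v == "yes":
            yes += 1
        elif v == "no":
            no += 1
        else:
            raise ValueError(f"unexpected vote: {vote!r}")
        if yes >= needed:
            return "yes"
        if no >= needed:
            return "no"
    return None  # not enough agreement yet


print(consensus_label(["yes", "no"]))         # disagreement: ask a third user
print(consensus_label(["yes", "no", "no"]))   # the third vote settles it
```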

Pipeline step: Manual Classification (survey)

SLIDE 9

Approach (Cont'd)

  • Convert data to ARFF format
    • e.g., data1: “swapping the position of the input function <</>>” (classified: No)
    • e.g., data2: “reorganized a package structure to better reflect a layered approach” (classified: Yes)

data1 (class: No) attributes: swapping 1, the 2, position 1, of 1, input 1, function 1, <</>> 1
data2 (class: Yes) attributes: reorganized 1, a 2, package 1, structure 1, to 1, better 1, reflect 1, layered 1, approach 1
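A minimal Python sketch of this conversion, assuming whitespace tokenization and one numeric count attribute per distinct token; the helper names are hypothetical. In practice Weka's `StringToWordVector` filter performs this step, so the sketch only shows what the resulting ARFF file contains. The class attribute is named `design_class` here (an assumption) so it cannot clash with a commit-log token such as "class".

```python
from collections import Counter


def bag_of_words(message: str) -> Counter:
    """Count whitespace-separated tokens, as in the attribute lists above."""
    return Counter(message.split())


def to_arff(examples, relation="commits"):
    """Render (message, label) pairs as a dense ARFF file, one numeric attribute per token."""
    counts = [bag_of_words(msg) for msg, _ in examples]
    vocab = sorted(set().union(*counts))
    lines = [f"@relation {relation}", ""]
    for token in vocab:
        # Quote names so symbols such as <</>> are legal attribute names.
        escaped = token.replace("\\", "\\\\").replace("'", "\\'")
        lines.append(f"@attribute '{escaped}' numeric")
    lines.append("@attribute design_class {yes,no}")
    lines += ["", "@data"]
    for c, (_, label) in zip(counts, examples):
        row = [str(c[token]) for token in vocab] + [label]
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"


examples = [
    ("swapping the position of the input function <</>>", "no"),
    ("reorganized a package structure to better reflect a layered approach", "yes"),
]
print(to_arff(examples))
```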

Pipeline step: Pre-Processing (Weka)

SLIDE 10

Approach (Cont'd)

  • Convert data to ARFF format
  • Tokenization
  • Remove tokens without letters
  • Stemming
  • Remove stop words (a, an, the, to, etc.)
  • Eliminate prefixes and suffixes (-ing, -ed, -ly, etc.)

data1 (class: No) attributes: swapping 1, the 2, position 1, of 1, input 1, function 1, <</>> 1
data2 (class: Yes) attributes: reorganized 1, a 2, package 1, structure 1, to 1, better 1, reflect 1, layered 1, approach 1

Pipeline step: Pre-Processing (Weka)
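The pre-processing steps can be sketched as follows. This is a toy illustration under stated assumptions: the stop-word list adds "of" to the slide's examples, and the affix stripper handles only the suffixes -ing, -ed, -ly and a hypothetical prefix list containing re-. It is not the stemmer Weka uses, so its output differs from the slides (for example, it keeps "position" rather than "pos").

```python
# Assumptions for illustration: the slide lists "a, an, the, to, etc."; "of" is an added guess.
STOP_WORDS = {"a", "an", "the", "to", "of"}
SUFFIXES = ("ing", "ed", "ly")   # the slide's example suffixes
PREFIXES = ("re",)               # hypothetical prefix list
MIN_STEM = 3                     # never strip an affix if fewer than 3 characters would remain


def strip_affixes(token: str) -> str:
    """Toy affix stripper (not Weka's stemmer): drop one listed suffix, then one listed prefix."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= MIN_STEM:
            token = token[: -len(suffix)]
            # Undouble a final consonant (swapping -> swapp -> swap), but keep ll, ss, zz.
            if token[-1] == token[-2] and token[-1] not in "aeioulsz":
                token = token[:-1]
            break
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= MIN_STEM:
            token = token[len(prefix):]
            break
    return token


def preprocess(message: str) -> list[str]:
    """Tokenize, drop letterless tokens, remove stop words, then strip affixes."""
    tokens = message.lower().split()                                # tokenization
    tokens = [t for t in tokens if any(ch.isalpha() for ch in t)]   # remove tokens without letters
    tokens = [t for t in tokens if t not in STOP_WORDS]             # remove stop words
    return [strip_affixes(t) for t in tokens]                       # stemming (toy version)
```

For data1, `preprocess("swapping the position of the input function <</>>")` returns `["swap", "position", "input", "function"]`; counting the result with `collections.Counter` gives the per-commit attribute values.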
SLIDE 11

Approach (Cont'd)

  • Convert data to ARFF format
  • Tokenization
  • Remove tokens without letters
  • Stemming
  • Remove stop words (a, an, the, to, etc.)
  • Eliminate prefixes and suffixes (-ing, -ed, -ly, etc.)

data1 (class: No) attributes: swapping 1, the 2, position 1, of 1, input 1, function 1, <</>> 1
data2 (class: Yes) attributes: reorganized 1, a 2, package 1, structure 1, to 1, better 1, reflect 1, layered 1, approach 1

Pipeline step: Pre-Processing (Weka)

SLIDE 12

Approach (Cont'd)

  • Convert data to ARFF format
  • Tokenization
  • Remove tokens without letters
  • Stemming
  • Remove stop words (a, an, the, to, etc.)
  • Eliminate prefixes and suffixes (-ing, -ed, -ly, etc.)

After pre-processing:
data1 (class: No) attributes: swap 1, input 1, pos 1, function 1
data2 (class: Yes) attributes: organ 1, pack 1, struc 1, better 1, flect 1, layer 1, approach 1

Pipeline step: Pre-Processing (Weka)

SLIDE 13

Approach (Cont'd)

  • Machine learning algorithms in Weka
    • Decision Tree
    • Random Forest
    • Naïve Bayes
    • Multinomial Naïve Bayes
    • Support Vector Machines
    • K-Nearest Neighbor
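Of the listed algorithms, multinomial Naïve Bayes is the simplest to show directly on word counts. The sketch below is a from-scratch illustration with Laplace smoothing (alpha = 1), not Weka's implementation; the class name and the two extra training commits are hypothetical, and the other two token lists are the pre-processed slide examples.

```python
import math
from collections import Counter


class MultinomialNB:
    """From-scratch multinomial Naive Bayes over token lists, with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)                  # documents per class (priors)
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)         # token frequencies per class
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_score(self, tokens, c):
        """log P(c) plus the sum of log P(token | c), skipping tokens never seen in training."""
        score = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        denom = self.total_words[c] + self.alpha * len(self.vocab)
        for t in tokens:
            if t in self.vocab:
                score += math.log((self.word_counts[c][t] + self.alpha) / denom)
        return score

    def predict(self, tokens):
        return max(self.classes, key=lambda c: self.log_score(tokens, c))


docs = [
    ["swap", "input", "pos", "function"],                               # data1
    ["organ", "pack", "struc", "better", "flect", "layer", "approach"],  # data2
    ["fix", "typo"],                                                     # hypothetical
    ["refactor", "layer", "architecture"],                               # hypothetical
]
labels = ["no", "yes", "no", "yes"]
model = MultinomialNB().fit(docs, labels)
print(model.predict(["layer", "approach"]))
```

Smoothing keeps a single unseen-in-class word from driving a class probability to zero, which matters with vocabularies this sparse.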

Pipeline step: Build Models (Weka)

SLIDE 14

Difficulties

[Chart: Different Data Distributions (Class: No vs. Class: Yes) for Dataset 1 and Dataset 2]

Pipeline step: Test Models (Weka)

SLIDE 15

Difficulties (Cont'd)

  • Confusion matrix
    • Add weight to cells
  • Statistical measurements
    • F-Measure
    • G-Mean


Confusion Matrix

             | Predicted Yes       | Predicted No
Actual Yes   | True Positive (TP)  | False Negative (FN)
Actual No    | False Positive (FP) | True Negative (TN)

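$$\text{Accuracy}^{+} = \frac{TP}{TP + FN} \qquad \text{Accuracy}^{-} = \frac{TN}{TN + FP} \qquad \text{G-mean} = \sqrt{\text{Accuracy}^{+} \times \text{Accuracy}^{-}}$$

$$\text{Precision} = \frac{TP}{TP + FP} \qquad \text{Recall} = \frac{TP}{TP + FN} \qquad \text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$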
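These formulas translate directly into code. The sketch below treats Yes (design discussion) as the positive class and returns 0.0 when a denominator is zero, a convention chosen here rather than one the slides specify. The cell counts are hypothetical; they illustrate why plain accuracy is misleading when one class dominates, which is the motivation for F-measure and G-mean.

```python
import math


def _ratio(num: float, den: float) -> float:
    """Division that returns 0.0 instead of raising when the denominator is zero."""
    return num / den if den else 0.0


def imbalance_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute the measures above from the confusion-matrix cells (Yes = positive class)."""
    acc_pos = _ratio(tp, tp + fn)          # Accuracy+ (true positive rate)
    acc_neg = _ratio(tn, tn + fp)          # Accuracy- (true negative rate)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)           # identical to Accuracy+
    return {
        "accuracy": _ratio(tp + tn, tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": _ratio(2 * precision * recall, precision + recall),
        "g_mean": math.sqrt(acc_pos * acc_neg),
    }


# Hypothetical 200-commit test set containing 10 design commits.
always_no = imbalance_metrics(tp=0, fn=10, fp=0, tn=190)       # classifier that always says No
some_hits = imbalance_metrics(tp=5, fn=5, fp=5, tn=185)
# Both reach 0.95 accuracy, but only the second has non-zero F-measure and G-mean.
```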

Pipeline step: Test Models (Weka)

SLIDE 16

All Results

Pipeline step: Analyze Results

SLIDE 17

Interesting Results

Pipeline step: Analyze Results

SLIDE 18

F-measure and G-mean

[Chart: F-measure and G-mean results; panels: GitHub, SourceForge]

Pipeline step: Analyze Results

SLIDE 19

Future Work

  • Move analysis completely into Boa
    • Pre-processing tasks
    • Machine learning models
  • Do developers discuss other topics?
    • testing
    • debugging
    • etc.

SLIDE 20



http://boa.cs.iastate.edu/

To summarize…