A Degree-of-Knowledge Model to Capture Source Code Familiarity - PowerPoint PPT Presentation



SLIDE 1

A Degree-of-Knowledge Model to Capture Source Code Familiarity

Thomas Fritz, Jingwen Ou, Gail C. Murphy and Emerson Murphy-Hill

Presented by: Haifa Alharthi

SLIDE 2

Outline

  • Problem statement
  • Dataset
  • Description of the main elements used in the model
  • Description of the degree-of-knowledge model
  • Determining the weightings needed in degree-of-knowledge model
  • Case studies
  • Discussion and Future Work
SLIDE 3

Problem statement

  • The size and high rate of change of source code make it difficult for software developers to keep up with who on the team knows about particular parts of the code.
  • The lack of this knowledge:
  • may complicate many activities, e.g., deciding who to ask when questions arise
  • makes it difficult to know who can bring a new team member up to speed on a particular part of the code
  • Existing approaches to this problem are based solely on authorship of code.

SLIDE 4

Dataset

  • Data was gathered from two professional development sites:
  • Site 1:
  • 7 professional developers
  • Developers’ experience: 1-22 years (11.6 years on average)
  • Each worked on multiple streams of the code
  • Site 2:
  • 2 professional developers
  • Build open-source frameworks for Eclipse
  • Developers’ experience: 3 and 5 years
SLIDE 5

Degree-of-Authorship (DOA)

  • Three factors determine DOA:
  • First authorship (FA): whether developer D created the first version of the element
  • Number of deliveries (DL): subsequent changes made to the element by D after first authorship
  • Acceptances (AC): changes to the element that were not made by D
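To make the three factors concrete, here is a minimal sketch of how they might be combined into a single DOA score. The linear form and all weights are hypothetical illustrations; the actual weightings are derived empirically later in the talk:

```python
import math

def degree_of_authorship(fa, dl, ac, w_fa=1.0, w_dl=0.5, w_ac=0.3):
    """Hypothetical DOA score: first authorship (fa, 0 or 1) and
    deliveries (dl) raise the score; acceptances (ac, changes made
    by others) lower it. All weights here are illustrative only."""
    return w_fa * fa + w_dl * dl - w_ac * math.log(1 + ac)
```

Under these toy weights, a developer who authored an element first and delivered three changes, with no outside changes, scores higher than one whose element was also changed five times by others.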

SLIDE 6

Degree-of-Interest (DOI)

The degree-of-interest (DOI): a real value that represents the amount of interaction -- selections and edits -- a developer has had with a source code element.

  • A selection occurs when a developer touches a code element (e.g., opens a class)
  • An edit occurs when a keystroke is detected in an editor window
  • A selection of an element contributes less to DOI than an edit of an element
  • A positive DOI value: the developer has recently and frequently been interacting with the element
  • A negative DOI value: the developer has interacted substantially with other elements since last interacting with this element
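The interaction behavior above can be sketched as a simple accumulator in which edits weigh more than selections and every event slightly decays all known elements, so untouched elements drift toward negative values over many events. All weights here are invented; the actual DOI computation used in the study is more elaborate:

```python
SELECT_WEIGHT = 1.0   # a selection contributes less to DOI ...
EDIT_WEIGHT = 2.0     # ... than an edit (both values invented)
DECAY = 0.1           # per-event decay applied to every known element

def update_doi(doi, element, kind):
    """Apply one interaction event ('select' or 'edit') to the DOI map."""
    for e in doi:                      # every event decays all elements
        doi[e] -= DECAY
    gain = EDIT_WEIGHT if kind == "edit" else SELECT_WEIGHT
    doi[element] = doi.get(element, 0.0) + gain
    return doi

doi = {}
for event in [("A", "edit"), ("B", "select"), ("B", "select"), ("B", "edit")]:
    update_doi(doi, *event)
# "B" was touched recently and often, so its DOI ends up above "A"'s
```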

SLIDE 7

Difference between authorship and interaction

  • Site 1: On average, only 27% of elements with a positive DOI also had at least one first-authorship or delivery event in the previous three months.
  • Site 2: Only four elements (7%) with a positive DOI also had at least one first-authorship or delivery event.

SLIDE 8

Degree-of-Knowledge Model

  • Degree-of-knowledge model: assigns a real value to each source code element -- class, method, field -- for each developer.
  • Two components of degree-of-knowledge:
  • Developer's longer-term knowledge: represented by a degree-of-authorship (DOA) value
  • Developer's shorter-term knowledge: represented by a degree-of-interest (DOI) value
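A minimal sketch of the two-component combination, with invented weights (the empirically determined weightings are the subject of the following slides):

```python
def degree_of_knowledge(doa, doi, w_long=1.0, w_short=0.5):
    """Hypothetical DOK: long-term knowledge (DOA) plus short-term
    knowledge (DOI), each scaled by a weight. The real weightings
    were fit by regression on developer ratings; these are invented."""
    return w_long * doa + w_short * doi
```

For example, a developer with DOA = 2.0 on an element and a recent burst of interaction (DOI = 4.0) gets DOK = 4.0 under these toy weights, while negative DOI pulls the score down.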
SLIDE 9

Determining DOK Weightings

Appropriate values were determined empirically:

  • 1. Initial determination of weighting values based on the data collected from Site 1
  • 2. The weightings were then tested at Site 2

Determining weightings that might apply across a broader range of development situations would require gathering data from many more projects.

SLIDE 10

Determining DOK Weightings: Method

  • At time T3, for each developer:
  • Collected 40 random code elements with (DOI ≠ 0) or (FA > 0) or (DL > 0)
  • Developers assessed their knowledge of those elements on a scale of 1-5
  • 246 ratings were collected; the ratings are ordinal
SLIDE 11

Determining DOK Weightings: Analysis and Results

  • Multiple linear regression analysis
  • Independent variables: FA, DL, AC, DOI
  • Dependent variable: developer ratings
  • Because DOI and AC values can be substantially high, their natural logarithms were used
  • The resulting DOK equation (shown on the slide) is a weighted linear combination of these variables

The lack of significance of DOI might stem from the lack of elements with a positive DOI in the set of randomly chosen elements (only 7%).
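The fitting procedure can be sketched with ordinary least squares on made-up data. The rows, the sign-preserving log transform for DOI (which can be negative), and the resulting coefficients are all illustrative stand-ins for the study's 246 ratings:

```python
import math
import numpy as np

def slog(x):
    """Sign-preserving log scale, so negative DOI values stay negative."""
    return math.copysign(math.log(1 + abs(x)), x)

# Invented (FA, DL, AC, DOI, rating) rows standing in for the real data.
rows = [
    (1, 5, 0, 3.0, 5), (1, 2, 1, 1.0, 4), (0, 1, 4, -2.0, 2),
    (0, 0, 6, -3.0, 1), (1, 3, 2, 0.5, 4), (0, 0, 1, -1.0, 2),
    (1, 1, 0, 2.0, 4), (0, 2, 3, -0.5, 3),
]
X = np.array([[1.0, fa, dl, math.log(1 + ac), slog(doi)]
              for fa, dl, ac, doi, _ in rows])
y = np.array([r for *_, r in rows], dtype=float)

# Least-squares fit: intercept + weights for FA, DL, ln(1+AC), slog(DOI)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ coef
```

With an intercept column included, the fitted predictions can never explain less variance than the mean rating alone, which is what the R-squared on the next slide measures on the real data.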

SLIDE 12

Determining DOK Weightings: Analysis and Results

  • F-test: 19.6 with p < 0.000001
  • The F-test indicates that the independent variables (FA, DL, AC, DOI) are jointly significant in explaining the dependent variable (user rating)
  • Goodness of fit (R-squared) = 0.25
  • R-squared assesses whether the addition of an independent variable (e.g., FA) has contributed to increased strength of the model
  • The R-squared value shows that the model does not predict the user rating completely
  • Each of the four variables contributes to the overall explanation of the user rating

Definitions from: http://www.analystforum.com/forums/cfa-forums/cfa-level-ii-forum/91311972

SLIDE 13

Determining DOK Weightings: External Validity of the Model

Weightings were tested at Site 2:

  • 1. Each developer ranked 40 random code elements from 1-5
  • 2. DOK values for each of the elements were computed using the weightings determined before (at Site 1)
  • 3. The Spearman rank correlation coefficient statistic was applied
  • A non-parametric statistic designed to measure the strength of association between ranked variables
  • 80 code elements from the two developers

Results:

  • There is a statistically significant correlation, rs = 0.3847 (p = 0.0004)
  • The model can predict DOK values with reasonable accuracy

Spearman rank correlation: https://statistics.laerd.com/statistical-guides/spearmans-rank-order-correlation-statistical-guide.php
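The Spearman statistic itself is easy to sketch; a minimal pure-Python version using the classic formula, assuming no tied values (which the real analysis would have to handle):

```python
def spearman_rho(xs, ys):
    """Spearman rank correlation: rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)).
    Assumes no tied values, for simplicity."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

Perfectly monotone agreement between DOK values and developer ratings gives rho = 1; a perfect reversal gives -1; the study's rs = 0.3847 sits between, indicating moderate agreement.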

SLIDE 14

Case Studies

  • Goal: determine whether degree-of-knowledge (DOK) values can provide value to software developers
  • Two case studies involved the seven developers at Site 1
  • A third case study involved three different developers at Site 2 (average of 2.5 years of professional experience)

SLIDE 15

Case study 1: Finding Experts

  • Problem: identify which team member knows the most about each part of the code base
  • Method:
  • Compared packages predicted by the DOK model to the reported assignments of packages
  • Results:
  • 55% of the results computed based on DOK values were consistent with the assignments by the developers
  • However, the developer assignments were sometimes guesses
  • All six developers stated that the knowledge map was reasonable

Each package is colored (or labelled) according to the developer with the highest DOK values for that package.
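The knowledge map described above amounts to an argmax over per-package DOK sums; a sketch with invented developers, packages, and values:

```python
from collections import defaultdict

# (developer, element) -> DOK value; all names and numbers invented.
dok = {
    ("alice", "p1.ClassA"): 4.2, ("alice", "p1.ClassB"): 1.0,
    ("bob",   "p1.ClassA"): 0.5, ("bob",   "p2.ClassC"): 3.7,
}

def expert_map(dok):
    """Assign each package to the developer with the highest summed DOK."""
    totals = defaultdict(float)              # (package, dev) -> summed DOK
    for (dev, element), value in dok.items():
        package = element.rsplit(".", 1)[0]  # crude element -> package
        totals[(package, dev)] += value
    best = {}
    for (package, dev), total in totals.items():
        if package not in best or total > best[package][1]:
            best[package] = (dev, total)
    return {package: dev for package, (dev, _) in best.items()}
```

Here "alice" dominates package "p1" (summed DOK 5.2 vs 0.5), so the map would color "p1" with her name.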

SLIDE 16

Case study 1: Finding Experts

Comparison to Expertise Recommenders

  • Expertise approach:
  • Represents code familiarity based solely on authorship
  • Experts for each package are computed by summing all first-authorship and delivery events from the last three months for a developer, for each class in the package
  • Results:
  • In 49% of the cases, the DOK-based approach agreed with the developer assignments, whereas the expertise approach agreed in only 24% of these cases
  • DOK values can improve on existing approaches to finding experts
SLIDE 17

Case study 2: Onboarding

  • Problem:
  • Considers a mentoring situation in which an experienced developer might use his DOK values to help a new team member become familiar with (onboard into) that part of the code base
  • Method:
  • Three random developers
  • For each developer, find the twenty elements with the highest DOK and ask her to specify whether each is likely to be helpful for a newcomer
  • Results:
  • Only 3% of the elements were considered likely to be helpful for a newcomer
  • The DOK values for the API elements were either very low or zero, as they were neither changing nor referred to frequently by the developers who authored them
  • The elements with high DOK values were not considered helpful

SLIDE 18

Case study 3: Identifying Changes of Interest

  • Problem: investigated whether a developer's DOK values can be used to select changes of interest to the developer, based on overlap between the source code change and the developer's DOK model
  • Method:
  • A DOK model was computed for each of three developers from Site 2
  • For each bug of interest, the developer specified whether they had read the bug or whether they would have wanted to be aware of it
  • Results:
  • The DOK model provided relevant information to developers in four out of six cases by recommending non-obvious bugs based on the developers' DOK values
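The overlap test can be sketched as flagging any change that touches an element with high DOK for the developer; the threshold and all identifiers below are invented:

```python
def changes_of_interest(dok, changes, threshold=2.0):
    """Recommend change IDs whose touched elements overlap the
    developer's high-DOK elements. Threshold is an invented cutoff."""
    hot = {element for element, value in dok.items() if value >= threshold}
    return [cid for cid, touched in changes.items() if hot & set(touched)]

dok = {"ClassA": 3.5, "ClassB": 0.2}                    # developer's DOK model
changes = {"bug-101": ["ClassA"], "bug-102": ["ClassB"]}
interesting = changes_of_interest(dok, changes)         # -> ["bug-101"]
```

Only "bug-101" is recommended: it touches an element the developer knows well, while "bug-102" touches one with near-zero DOK.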


SLIDE 19

Discussion and Future Work

  • The DOK weighting experiment took place during the testing stage:
  • Long-term studies are needed to better understand the impact of project phases on indicators such as DOK.
  • The style of code ownership influences DOK values:
  • Study of more teams is needed to determine how robust the DOK values are to team and individual styles.
  • Elements found using the DOK model are often one or two layers below the API elements:
  • Infer familiarity from subclasses up to the API elements that are their supertypes, as a subclass user likely knows the API elements to some extent.

SLIDE 20

Questions

  • How serious do you think the problem they are solving is? Give examples.
  • Are there any other factors you would consider for learning the degree-of-knowledge (in addition to authorship and interactions)?
  • What do you think of the methods for learning the weights of the DOK model? Any alternatives?
  • In addition to the case studies, what other cases can the DOK model be used for?