A Degree-of-Knowledge Model to Capture Source Code Familiarity - PowerPoint PPT Presentation



SLIDE 1

A Degree-of-Knowledge Model to Capture Source Code Familiarity

Thomas Fritz, Jingwen Ou, Gail C. Murphy and Emerson Murphy-Hill

Presented by: Haifa Alharthi

SLIDE 2

Outline

  • Problem statement
  • Dataset
  • Description of the main elements used in the model
  • Description of the degree-of-knowledge model
  • Determining the weightings needed in degree-of-knowledge model
  • Case studies
  • Discussion and Future Work
SLIDE 3

Problem statement

  • The size and high rate of change of source code make it difficult for software developers to keep up with who on the team knows about particular parts of the code.
  • The lack of this knowledge:
  • may complicate many activities, e.g., deciding who to ask when questions arise
  • makes it difficult to know who can bring a new team member up to speed on a particular part of the code
  • Existing approaches to this problem are based solely on authorship of code.

SLIDE 4

Dataset

  • Data was gathered from two professional development sites:
  • Site 1:
  • 7 professional developers
  • Developers’ experience: 1-22 years (11.6 years on average)
  • Each worked on multiple streams of the code
  • Site 2:
  • 2 professional developers
  • Build open-source frameworks for Eclipse
  • Developers’ experience: 3 and 5 years
SLIDE 5

Degree-of-Authorship (DOA)

  • Three factors determine DOA:
  • First authorship (FA): whether developer D created the first version of the element
  • Number of deliveries (DL): subsequent changes made to the element by D after first authorship
  • Acceptances (AC): changes to the element that were not made by D
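To make the three factors concrete, here is a minimal sketch of how they might be combined into a single DOA score. The linear form and all weights are hypothetical illustrations; the actual weightings are derived empirically later in the talk:

```python
import math

def degree_of_authorship(fa, dl, ac, w_fa=1.0, w_dl=0.5, w_ac=0.3):
    """Hypothetical DOA score: first authorship (fa, 0 or 1) and
    deliveries (dl) raise the score; acceptances (ac, changes made
    by others) lower it. All weights here are illustrative only."""
    return w_fa * fa + w_dl * dl - w_ac * math.log(1 + ac)
```

Under these toy weights, a developer who authored an element first and delivered three changes, with no outside changes, scores higher than one whose element was also changed five times by others.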

SLIDE 6

Degree-of-Interest (DOI)

The degree-of-interest (DOI): a real value that represents the amount of interaction -- selections and edits -- a developer has had with a source code element.

  • A selection occurs when a developer touches a code element (e.g., opens a class)
  • An edit occurs when a keystroke is detected in an editor window
  • A selection of an element contributes less to DOI than an edit of an element
  • A positive DOI value: the developer has recently and frequently been interacting with the element
  • A negative DOI value: the developer has interacted substantially with other elements since last interacting with this element
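The interaction behavior above can be sketched as a simple accumulator in which edits weigh more than selections and every event slightly decays all known elements, so untouched elements drift toward negative values over many events. All weights here are invented; the actual DOI computation used in the study is more elaborate:

```python
SELECT_WEIGHT = 1.0   # a selection contributes less to DOI ...
EDIT_WEIGHT = 2.0     # ... than an edit (both values invented)
DECAY = 0.1           # per-event decay applied to every known element

def update_doi(doi, element, kind):
    """Apply one interaction event ('select' or 'edit') to the DOI map."""
    for e in doi:                      # every event decays all elements
        doi[e] -= DECAY
    gain = EDIT_WEIGHT if kind == "edit" else SELECT_WEIGHT
    doi[element] = doi.get(element, 0.0) + gain
    return doi

doi = {}
for event in [("A", "edit"), ("B", "select"), ("B", "select"), ("B", "edit")]:
    update_doi(doi, *event)
# "B" was touched recently and often, so its DOI ends up above "A"'s
```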

SLIDE 7

Difference between authorship and interaction

  • Site 1: On average, only 27% of elements with a positive DOI also had at least one first-authorship or delivery event in the previous three months.
  • Site 2: Only four elements (7%) with a positive DOI also had at least one first-authorship or delivery event.

SLIDE 8

Degree-of-Knowledge Model

  • Degree-of-knowledge model: assigns a real value to each source code element -- class, method, field -- for each developer.
  • Two components of degree-of-knowledge:
  • Developer's longer-term knowledge: represented by a degree-of-authorship (DOA) value
  • Developer's shorter-term knowledge: represented by a degree-of-interest (DOI) value
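A minimal sketch of the two-component combination, with invented weights (the empirically determined weightings are the subject of the following slides):

```python
def degree_of_knowledge(doa, doi, w_long=1.0, w_short=0.5):
    """Hypothetical DOK: long-term knowledge (DOA) plus short-term
    knowledge (DOI), each scaled by a weight. The real weightings
    were fit by regression on developer ratings; these are invented."""
    return w_long * doa + w_short * doi
```

For example, a developer with DOA = 2.0 on an element and a recent burst of interaction (DOI = 4.0) gets DOK = 4.0 under these toy weights, while negative DOI pulls the score down.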
SLIDE 9

Determining DOK Weightings

Appropriate values were determined empirically:

  • 1. Initial determination of weighting values based on the data collected from Site 1
  • 2. The weightings were then tested at Site 2

Determining weightings that might apply across a broader range of development situations would require gathering data from many more projects.

SLIDE 10

Determining DOK Weightings: Method

  • At time T3, for each developer:
  • Collected 40 random code elements with (DOI ≠ 0) or (FA > 0) or (DL > 0)
  • Developers assessed their knowledge of those elements on a scale of 1-5
  • 246 ratings were collected; the ratings are ordinal
SLIDE 11

Determining DOK Weightings: Analysis and Results

  • Multiple linear regression analysis
  • Independent variables: FA, DL, AC, DOI
  • Dependent variable: developer ratings
  • Because DOI and AC values can be substantially high, their natural logarithms were used
  • The resulting DOK equation (shown on the slide) is a weighted linear combination of these variables

The lack of significance of DOI might stem from the lack of elements with a positive DOI in the set of randomly chosen elements (only 7%).
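The fitting procedure can be sketched with ordinary least squares on made-up data. The rows, the sign-preserving log transform for DOI (which can be negative), and the resulting coefficients are all illustrative stand-ins for the study's 246 ratings:

```python
import math
import numpy as np

def slog(x):
    """Sign-preserving log scale, so negative DOI values stay negative."""
    return math.copysign(math.log(1 + abs(x)), x)

# Invented (FA, DL, AC, DOI, rating) rows standing in for the real data.
rows = [
    (1, 5, 0, 3.0, 5), (1, 2, 1, 1.0, 4), (0, 1, 4, -2.0, 2),
    (0, 0, 6, -3.0, 1), (1, 3, 2, 0.5, 4), (0, 0, 1, -1.0, 2),
    (1, 1, 0, 2.0, 4), (0, 2, 3, -0.5, 3),
]
X = np.array([[1.0, fa, dl, math.log(1 + ac), slog(doi)]
              for fa, dl, ac, doi, _ in rows])
y = np.array([r for *_, r in rows], dtype=float)

# Least-squares fit: intercept + weights for FA, DL, ln(1+AC), slog(DOI)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ coef
```

With an intercept column included, the fitted predictions can never explain less variance than the mean rating alone, which is what the R-squared on the next slide measures on the real data.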

SLIDE 12

Determining DOK Weightings: Analysis and Results

  • F-test: 19.6 with p < 0.000001
  • The F-test indicates that the independent variables (FA, DL, AC, DOI) are jointly significant in explaining the dependent variable (user rating)
  • Goodness of fit (R-squared) = 0.25
  • R-squared assesses whether the addition of an independent variable (e.g., FA) has contributed to increased strength of the model
  • The R-squared value shows that the model does not predict the user rating completely
  • Each of the four variables contributes to the overall explanation of the user rating

Definitions from: http://www.analystforum.com/forums/cfa-forums/cfa-level-ii-forum/91311972

SLIDE 13

Determining DOK Weightings: External Validity of the Model

Weightings were tested at Site 2:

  • 1. Each developer ranked 40 random code elements from 1-5
  • 2. DOK values for each of the elements were computed using the weightings determined before (at Site 1)
  • 3. The Spearman rank correlation coefficient statistic was applied
  • A non-parametric statistic designed to measure the strength of association between ranked variables
  • 80 code elements from the two developers

Results:

  • There is a statistically significant correlation, rs = 0.3847 (p = 0.0004)
  • The model can predict DOK values with reasonable accuracy

Spearman rank correlation: https://statistics.laerd.com/statistical-guides/spearmans-rank-order-correlation-statistical-guide.php
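The Spearman statistic itself is easy to sketch; a minimal pure-Python version using the classic formula, assuming no tied values (which the real analysis would have to handle):

```python
def spearman_rho(xs, ys):
    """Spearman rank correlation: rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)).
    Assumes no tied values, for simplicity."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

Perfectly monotone agreement between DOK values and developer ratings gives rho = 1; a perfect reversal gives -1; the study's rs = 0.3847 sits between, indicating moderate agreement.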

SLIDE 14

Case Studies

  • Goal: determine whether degree-of-knowledge (DOK) values can provide value to software developers
  • Two case studies involved the seven developers at Site 1
  • A third case study involved three different developers at Site 2 (average of 2.5 years of professional experience)

SLIDE 15

Case study 1: Finding Experts

  • Problem: identify which team member knows the most about each part of the code base
  • Method:
  • Compared packages predicted by the DOK model to the reported assignments of packages
  • Results:
  • 55% of the results computed based on DOK values were consistent with the assignments by the developers
  • However, the developer assignments were sometimes guesses
  • All six developers stated that the knowledge map was reasonable

Each package is colored (or labelled) according to the developer with the highest DOK values for that package.
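The knowledge map described above amounts to an argmax over per-package DOK sums; a sketch with invented developers, packages, and values:

```python
from collections import defaultdict

# (developer, element) -> DOK value; all names and numbers invented.
dok = {
    ("alice", "p1.ClassA"): 4.2, ("alice", "p1.ClassB"): 1.0,
    ("bob",   "p1.ClassA"): 0.5, ("bob",   "p2.ClassC"): 3.7,
}

def expert_map(dok):
    """Assign each package to the developer with the highest summed DOK."""
    totals = defaultdict(float)              # (package, dev) -> summed DOK
    for (dev, element), value in dok.items():
        package = element.rsplit(".", 1)[0]  # crude element -> package
        totals[(package, dev)] += value
    best = {}
    for (package, dev), total in totals.items():
        if package not in best or total > best[package][1]:
            best[package] = (dev, total)
    return {package: dev for package, (dev, _) in best.items()}
```

Here "alice" dominates package "p1" (summed DOK 5.2 vs 0.5), so the map would color "p1" with her name.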

SLIDE 16

Case study 1: Finding Experts

Comparison to Expertise Recommenders

  • Expertise approach:
  • Represents code familiarity based solely on authorship
  • Experts for each package are computed by summing all first-authorship and delivery events from the last three months for a developer, for each class in the package
  • Results:
  • In 49% of the cases, the DOK-based approach agreed with the developer assignments, whereas the expertise approach agreed in only 24% of these cases
  • DOK values can improve on existing approaches to finding experts
SLIDE 17

Case study 2: Onboarding

  • Problem:
  • Considers a mentoring situation in which an experienced developer might use his DOK values to help a new team member become familiar with (onboard into) that part of the code base
  • Method:
  • Three random developers
  • For each developer, find the twenty elements with the highest DOK and ask her to specify whether each is likely to be helpful for a newcomer
  • Results:
  • Only 3% of the elements were considered likely to be helpful for a newcomer
  • The DOK values for the API elements were either very low or zero, as they were neither changing nor referred to frequently by the developers who authored them
  • The elements with high DOK values were not considered helpful

SLIDE 18

Case study 3: Identifying Changes of Interest

  • Problem: investigated whether a developer's DOK values can be used to select changes of interest to the developer, based on overlap between the source code change and the developer's DOK model
  • Method:
  • A DOK model was computed for each of three developers from Site 2
  • For each bug of interest, the developer specified whether they had read the bug or whether they would have wanted to be aware of it
  • Results:
  • The DOK model provided relevant information to developers in four out of six cases by recommending non-obvious bugs based on the developers' DOK values
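The overlap test can be sketched as flagging any change that touches an element with high DOK for the developer; the threshold and all identifiers below are invented:

```python
def changes_of_interest(dok, changes, threshold=2.0):
    """Recommend change IDs whose touched elements overlap the
    developer's high-DOK elements. Threshold is an invented cutoff."""
    hot = {element for element, value in dok.items() if value >= threshold}
    return [cid for cid, touched in changes.items() if hot & set(touched)]

dok = {"ClassA": 3.5, "ClassB": 0.2}                    # developer's DOK model
changes = {"bug-101": ["ClassA"], "bug-102": ["ClassB"]}
interesting = changes_of_interest(dok, changes)         # -> ["bug-101"]
```

Only "bug-101" is recommended: it touches an element the developer knows well, while "bug-102" touches one with near-zero DOK.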


SLIDE 19

Discussion and Future Work

  • The DOK weighting experiment took place during the testing stage:
  • Long-term studies are needed to better understand the impact of project phases on indicators such as DOK.
  • The style of code ownership influences DOK values:
  • Study of more teams is needed to determine how robust the DOK values are to team and individual styles.
  • Elements found using the DOK model are often one or two layers below the API elements:
  • Infer familiarity from subclasses up to the API elements that are their supertypes, as a subclass user likely knows the API elements to some extent.

SLIDE 20

Questions

  • How serious do you think the problem they are solving is? Give examples.
  • Are there any other factors you would consider for learning the degree-of-knowledge (in addition to authorship and interactions)?
  • What do you think of the methods for learning the weights of the DOK model? Any alternatives?
  • In addition to the case studies, what other cases can the DOK model be used for?