SLIDE 1

Extracting new metrics from Version Control System for the comparison of software developers

Marcello Moura¹, Hugo Nascimento² and Thierson Rosa²

¹ Centro de Recursos Computacionais, ² Instituto de Informática
Universidade Federal de Goiás (UFG)
Caixa Postal 131 – 74.001-970 – Goiânia – GO – Brazil
marcello@ufg.br, {hadn,thierson}@inf.ufg.br

Goiânia, September 21, 2014

Moura, Nascimento e Rosa Extracting new metrics from VCS ... 1 / 48

SLIDE 2

Summary

1. Introduction
2. Extracting fine-grain operations from VCS
3. Metrics for the developers
4. Comparison of the developers
5. The case study
6. Conclusion

SLIDE 4

Introduction

Version Control Systems (VCSs), like Subversion and Git, store revisions of the files of a software development project, registering its historical evolution.

SLIDE 5

Introduction

VCSs have been used for:

  • Helping to understand the software development process: Lopez-Fernandez et al. [2004], Huang and Liu [2005], Girba et al. [2005], Voinea and Telea [2006] and Voinea et al. [2007].
  • Helping to know more about the developers: Gilbert and Karahalios [2007], Jermakovics et al. [2011], Mockus and Herbsleb [2002], Minto and Murphy [2007], Schuler and Zimmermann [2008], Zhang et al. [2008a,b] and Di Bella et al. [2013].

SLIDE 6

Introduction

Our work focuses on understanding the developers through the analysis of their work.

1. We identify and count finer-grained operations at the line and file levels that can be extracted from a VCS, such as additions, deletions and modifications.
   This allows us to derive much more detailed and richer information about the work performed by the developers.
2. We calculate a new set of formally defined metrics.
3. Developers are characterized by comparing each one of them against the others.
   Two comparison approaches for this purpose are described.

SLIDE 7

Introduction

Note: VCS data cannot be taken as a full and precise description of the software development process. It is incomplete and may lead to distinct interpretations (e.g., Negara et al. [2012]). Information extracted from a VCS has to be revalidated by the project managers and complemented with their own knowledge.

SLIDE 10

Extracting fine-grain operations from VCS

Basic notation:

  • P: a software project in a VCS.
  • D: the set of developers that worked on P.
  • A: the set of all files created during the development of P.
  • A_r ⊆ A: the set of files that were removed from P (they did not reach its final version).

SLIDE 11

Extracting fine-grain operations from VCS

We mine the VCS for three types of operations: additions, deletions and modifications of files and lines of code.

[Figure: Project History]
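As an illustration of what line-level mining could look like, the following sketch classifies the differences between two revisions of a file into ADD, MOD and DEL line operations. It is not the authors' extraction procedure: it assumes Python's standard `difflib` as the diff engine, and the rule that a replaced block pairs old and new lines one-to-one as modifications (with the surplus counted as additions or deletions) is an assumption made only for this example. The function name `classify_line_operations` is hypothetical.

```python
from difflib import SequenceMatcher


def classify_line_operations(old_lines, new_lines):
    """Count ADD, MOD and DEL line operations between two revisions of a file.

    Assumption (not from the slides): inside a replaced block, old and new
    lines are paired one-to-one as MOD; any surplus is ADD or DEL.
    """
    counts = {"ADD": 0, "MOD": 0, "DEL": 0}
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        removed, inserted = i2 - i1, j2 - j1
        if tag == "insert":
            counts["ADD"] += inserted
        elif tag == "delete":
            counts["DEL"] += removed
        elif tag == "replace":
            paired = min(removed, inserted)
            counts["MOD"] += paired
            counts["ADD"] += inserted - paired
            counts["DEL"] += removed - paired
    return counts


# Two toy revisions: one line changed, one line appended.
old = ["a = 1", "b = 2", "c = 3"]
new = ["a = 1", "b = 20", "c = 3", "d = 4"]
result = classify_line_operations(old, new)
print(result)
```

Attributing each counted operation to the developer who committed the newer revision would then give the per-developer operation records that the metrics rely on.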

SLIDE 14

Metrics for the developers

Aspects defined for consideration:

1. Effort: represents the total number of operations of a given type performed by a developer.
2. Code-survival: indicates the number of operations of a given type performed by a developer and not changed later by anyone.

SLIDE 15

Metrics for the developers

  • A. Metrics for evaluating developers individually

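\[
\mathrm{Effo\_Add}(d) = \sum_{a \in A} \sum_{i=1}^{|H_a|}
\begin{cases}
1 & \text{if } o^{a,i}_{1}.\mathit{devel} = d \\
0 & \text{otherwise.}
\end{cases}
\]

\[
\mathrm{Effo\_Mod}(d) = \sum_{a \in A} \sum_{i=1}^{|H_a|} \sum_{j=1}^{|h^{a}_{l_i}|}
\begin{cases}
1 & \text{if } o^{a,i}_{j}.\mathit{devel} = d \text{ and } o^{a,i}_{j}.\mathit{type} = \mathrm{MOD}; \\
0 & \text{otherwise.}
\end{cases}
\]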
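The two effort formulas can be read as simple counting loops. The sketch below assumes a hypothetical in-memory layout that the slides do not prescribe: `histories[a]` plays the role of H_a (one list per line of file a), each inner list is the line history h^a_{l_i}, and each operation is an `Op` record with `devel` and `type` fields.

```python
from collections import namedtuple

# Hypothetical record for one operation o^{a,i}_j (names are assumptions).
Op = namedtuple("Op", ["devel", "type"])


def effo_add(histories, d):
    """Effo_Add(d): lines whose first recorded operation was made by d."""
    return sum(
        1
        for line_histories in histories.values()
        for h in line_histories
        if h and h[0].devel == d
    )


def effo_mod(histories, d):
    """Effo_Mod(d): all MOD operations made by d, over every line history."""
    return sum(
        1
        for line_histories in histories.values()
        for h in line_histories
        for op in h
        if op.devel == d and op.type == "MOD"
    )


# Toy project with two files; file and developer names are made up.
histories = {
    "app.rb": [
        [Op("d1", "ADD"), Op("d2", "MOD")],
        [Op("d1", "ADD")],
        [Op("d2", "ADD"), Op("d2", "MOD"), Op("d1", "MOD")],
    ],
    "util.rb": [[Op("d2", "ADD"), Op("d1", "DEL")]],
}
print(effo_add(histories, "d1"), effo_mod(histories, "d1"))
print(effo_add(histories, "d2"), effo_mod(histories, "d2"))
```

Note that Effo_Add counts each line at most once, while Effo_Mod may count the same line several times if a developer modified it repeatedly.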

SLIDE 16

Metrics for the developers

  • A. Metrics for evaluating developers individually

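\[
\mathrm{Surv\_Add}(d) = \sum_{a \in (A - A_r)} \sum_{i=1}^{|H_a|}
\begin{cases}
1 & \text{if } o^{a,i}_{1}.\mathit{devel} = d \text{ and } \forall\, o^{a,i}_{s} \text{ with } s > 1, \\
  & (o^{a,i}_{s}.\mathit{type} = \mathrm{MOD} \text{ and } o^{a,i}_{s}.\mathit{devel} = d); \\
0 & \text{otherwise.}
\end{cases}
\]

\[
\mathrm{Surv\_Mod}(d) = \sum_{a \in (A - A_r)} \sum_{i=1}^{|H_a|}
\begin{cases}
1 & \text{if } o^{a,i}_{\mathrm{end}}.\mathit{type} = \mathrm{MOD} \text{ and } o^{a,i}_{\mathrm{end}}.\mathit{devel} = d \\
  & \text{and } \exists\, w,\ 1 \le w < |h^{a}_{l_i}|, \text{ such that } o^{a,i}_{w}.\mathit{devel} = d; \\
0 & \text{otherwise.}
\end{cases}
\]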
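A literal transcription of the two survival formulas into code may help in checking their conditions. As before, the data layout is a hypothetical one chosen for this sketch: `histories[a]` stands for H_a, each inner list is a line history, `removed` stands for A_r, and `Op` is an assumed record type. The conditions are implemented exactly as they appear on the slide, without interpretation.

```python
from collections import namedtuple

# Hypothetical record for one operation o^{a,i}_j (names are assumptions).
Op = namedtuple("Op", ["devel", "type"])


def surv_add(histories, removed, d):
    """Surv_Add(d): lines added by d where every later operation is a MOD by d."""
    return sum(
        1
        for a, line_histories in histories.items()
        if a not in removed  # only files in A - A_r
        for h in line_histories
        if h
        and h[0].devel == d
        and all(op.type == "MOD" and op.devel == d for op in h[1:])
    )


def surv_mod(histories, removed, d):
    """Surv_Mod(d): last operation is a MOD by d and some earlier one is by d."""
    return sum(
        1
        for a, line_histories in histories.items()
        if a not in removed
        for h in line_histories
        if h
        and h[-1].type == "MOD"
        and h[-1].devel == d
        and any(op.devel == d for op in h[:-1])  # 1 <= w < |h|
    )


histories = {
    "app.rb": [
        [Op("d1", "ADD"), Op("d2", "MOD")],
        [Op("d1", "ADD")],
        [Op("d2", "ADD"), Op("d2", "MOD"), Op("d1", "MOD")],
        [Op("d2", "ADD"), Op("d2", "MOD")],
    ],
    "util.rb": [[Op("d2", "ADD")]],  # removed file, ignored below
}
removed = {"util.rb"}
print(surv_add(histories, removed, "d1"), surv_add(histories, removed, "d2"))
print(surv_mod(histories, removed, "d1"), surv_mod(histories, removed, "d2"))
```

In the toy data, the line added in `util.rb` does not count for d2 because the file belongs to the removed set.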

SLIDE 17

Metrics for the developers

  • A. Metrics for evaluating developers individually

\[
\mathrm{Surv\_Add\_Div\_Effo\_Add}(d) = \frac{\mathrm{Surv\_Add}(d)}{\mathrm{Effo\_Add}(d)}
\]

SLIDE 18

Metrics for the developers

  • B. Uncovering and measuring relationships between developers

Also: ADD_DEL, MOD_MOD and MOD_DEL.

SLIDE 19

Metrics for the developers

  • B. Uncovering and measuring relationships between developers

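\[
\mathrm{Line\_Add\_Mod}(x, y) = \sum_{a \in A} \sum_{i=1}^{|H_a|}
\begin{cases}
1 & \text{if } |h^{a}_{l_i}| > 1 \text{ and } o^{a,i}_{1}.\mathit{devel} = x \\
  & \text{and } o^{a,i}_{1}.\mathit{type} = \mathrm{ADD} \\
  & \text{and } o^{a,i}_{2}.\mathit{devel} = y \\
  & \text{and } o^{a,i}_{2}.\mathit{type} = \mathrm{MOD}; \\
0 & \text{otherwise.}
\end{cases}
\]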

SLIDE 20

Metrics for the developers

  • B. Uncovering and measuring relationships between developers

\[
\mathrm{Line\_Add\_\Sigma Mod}(d) = \sum_{y \in D - \{d\}} \mathrm{Line\_Add\_Mod}(d, y)
\]

\[
\mathrm{Line\_\Sigma Add\_Mod}(d) = \sum_{x \in D - \{d\}} \mathrm{Line\_Add\_Mod}(x, d)
\]
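The pairwise metric Line_Add_Mod and its two aggregates can be sketched as follows. The layout is again hypothetical (`histories[a]` for H_a, `Op` records with `devel` and `type`), and the function names are chosen only for this example.

```python
from collections import namedtuple

# Hypothetical record for one operation o^{a,i}_j (names are assumptions).
Op = namedtuple("Op", ["devel", "type"])


def line_add_mod(histories, x, y):
    """Lines added by x whose second operation is a modification by y."""
    return sum(
        1
        for line_histories in histories.values()
        for h in line_histories
        if len(h) > 1
        and h[0].devel == x and h[0].type == "ADD"
        and h[1].devel == y and h[1].type == "MOD"
    )


def line_add_sigma_mod(histories, developers, d):
    """Line_Add_SigmaMod(d): lines added by d and then modified by others."""
    return sum(line_add_mod(histories, d, y) for y in developers if y != d)


def line_sigma_add_mod(histories, developers, d):
    """Line_SigmaAdd_Mod(d): lines added by others and then modified by d."""
    return sum(line_add_mod(histories, x, d) for x in developers if x != d)


developers = {"d1", "d2", "d3"}
histories = {
    "app.rb": [
        [Op("d1", "ADD"), Op("d2", "MOD")],
        [Op("d1", "ADD")],
        [Op("d2", "ADD"), Op("d2", "MOD"), Op("d1", "MOD")],
        [Op("d3", "ADD"), Op("d1", "MOD")],
        [Op("d1", "ADD"), Op("d3", "MOD")],
    ],
}
print(line_add_mod(histories, "d1", "d2"))
print(line_add_sigma_mod(histories, developers, "d1"))
print(line_sigma_add_mod(histories, developers, "d1"))
```

Because both aggregates sum over D − {d}, a developer modifying their own freshly added line (the third history above) contributes to Line_Add_Mod(d2, d2) but to neither aggregate.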

SLIDE 21

Metrics for the developers

  • C. Extending the metrics for the file level

A project revision is a triple (r, d, L), where:

  • r is the label of the revision;
  • d is an identifier of the developer who made the revision, with d ∈ D; and
  • L is a list of pairs (a, t), where a is a file and t ∈ {A, M, D} describes the operation.

A project revision sequence is a sequence S = (r_1, d_1, L_1), (r_2, d_2, L_2), ..., (r_m, d_m, L_m) of project revisions that represents the history of changes made to the files of P, without going into detail about the changes made to their individual lines.

SLIDE 22

Metrics for the developers

  • C. Extending the metrics for the file level

\[
\mathrm{File\_Add\_Mod}(x, y) = \sum_{a \in A}
\begin{cases}
1 & \text{if there are triples } (r_i, d_i, L_i) \text{ and } (r_j, d_j, L_j) \text{ in } S, \text{ with } i < j, \\
  & \text{such that } d_i = x,\ d_j = y, \\
  & (a, A) \in L_i \text{ and } (a, M) \in L_j, \\
  & \text{and for which there is no triple } (r_k, d_k, L_k) \text{ with } i < k < j \\
  & \text{such that } (a, t) \in L_k \text{ for any operation of type } t; \\
0 & \text{otherwise.}
\end{cases}
\]
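One way to evaluate File_Add_Mod is to group the revision sequence by file and look for an addition by x immediately followed, in that file's own history, by a modification by y. The sketch below represents S as a Python list of (r, d, L) tuples; it assumes that a file appears at most once in the list L of a single revision, which the definition does not state explicitly. Revision labels and file names are made up.

```python
def file_add_mod(S, x, y):
    """Files added by x whose next recorded operation is a modification by y."""
    events = {}  # file -> [(developer, operation), ...] in revision order
    for _label, developer, changes in S:
        for file_name, op in changes:
            events.setdefault(file_name, []).append((developer, op))
    # Each file contributes at most 1, as in the indicator inside the sum.
    return sum(
        1
        for ops in events.values()
        if any(p == (x, "A") and q == (y, "M") for p, q in zip(ops, ops[1:]))
    )


# Toy project revision sequence S.
S = [
    ("r1", "d1", [("a.rb", "A"), ("b.rb", "A")]),
    ("r2", "d2", [("a.rb", "M")]),
    ("r3", "d3", [("b.rb", "M")]),
    ("r4", "d2", [("b.rb", "M")]),
    ("r5", "d2", [("c.rb", "A")]),
    ("r6", "d1", [("c.rb", "M")]),
]
print(file_add_mod(S, "d1", "d2"))
print(file_add_mod(S, "d1", "d3"))
print(file_add_mod(S, "d2", "d1"))
```

In the toy sequence, `b.rb` counts for the pair (d1, d3) but not for (d1, d2), because revision r3 touches the file between d1's addition and d2's modification.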

SLIDE 23

Metrics for the developers

  • C. Extending the metrics for the file level

\[
\mathrm{File\_Add\_\Sigma Mod}(d) = \sum_{y \in D - \{d\}} \mathrm{File\_Add\_Mod}(d, y)
\]

\[
\mathrm{File\_\Sigma Add\_Mod}(d) = \sum_{x \in D - \{d\}} \mathrm{File\_Add\_Mod}(x, d)
\]

SLIDE 24

Metrics for the developers

  • D. Metrics regarding commits

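\[
\mathrm{Commits}(x, y) = \sum_{i=1}^{|S|-1}
\begin{cases}
1 & \text{if triples } (r_i, d_i, L_i) \text{ and } (r_{i+1}, d_{i+1}, L_{i+1}) \text{ are such that } d_i = x \text{ and } d_{i+1} = y; \\
0 & \text{otherwise.}
\end{cases}
\]

\[
\mathrm{\Sigma Commits}(d) = \sum_{i=1}^{|S|}
\begin{cases}
1 & \text{if triple } (r_i, d_i, L_i) \text{ is such that } d_i = d; \\
0 & \text{otherwise.}
\end{cases}
\]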
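Both commit metrics only need the order of the developers in S. A minimal sketch, assuming S is held as a Python list of (r, d, L) tuples (a representation chosen for this example):

```python
def commits(S, x, y):
    """Commits(x, y): times a revision by x is immediately followed by one by y."""
    return sum(
        1
        for (_r1, d_i, _l1), (_r2, d_next, _l2) in zip(S, S[1:])
        if d_i == x and d_next == y
    )


def sigma_commits(S, d):
    """SigmaCommits(d): total number of revisions made by d."""
    return sum(1 for _r, d_i, _changes in S if d_i == d)


# Toy sequence; only the developer field matters for these two metrics.
S = [
    ("r1", "d1", []),
    ("r2", "d2", []),
    ("r3", "d3", []),
    ("r4", "d2", []),
    ("r5", "d2", []),
    ("r6", "d1", []),
]
print(commits(S, "d1", "d2"), commits(S, "d2", "d2"))
print(sigma_commits(S, "d2"))
```

Commits(x, y) is not symmetric, and Commits(d, d) counts consecutive revisions by the same developer.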

SLIDE 25

Metrics for the developers

\[
\mathrm{Metric\_Rel}(d) = \frac{\mathrm{Metric}(d)}{\sum_{x \in D} \mathrm{Metric}(x)}
\]
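The relative form of a metric is a plain normalization over all developers. The sketch below applies it to the per-developer commit counts reported in the case study (1,294 commits in total). The helper name `relative_metric` and the choice to return 0.0 for everyone when the total is zero are assumptions of this example, since the slide does not cover that case.

```python
def relative_metric(values):
    """Metric_Rel(d) = Metric(d) / sum of Metric(x) over all developers x."""
    total = sum(values.values())
    if total == 0:
        # Assumption: the zero-total case is not defined on the slide.
        return {d: 0.0 for d in values}
    return {d: v / total for d, v in values.items()}


# Commit counts per developer, as reported in the case study.
commit_counts = {
    "d1": 474, "d2": 159, "d3": 2, "d4": 170, "d5": 30, "d6": 99,
    "d7": 61, "d8": 183, "d9": 20, "d10": 24, "d11": 72,
}
rel = relative_metric(commit_counts)
for d in ("d1", "d8", "d3"):
    print(d, round(rel[d], 3))
```

The relative values sum to 1 across D, which makes metrics with very different magnitudes (for example, commits and added lines) comparable as shares of the project total.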

SLIDE 27

Comparison of the developers

  • A. Performance-based hierarchy

All metrics should have the same orientation

SLIDE 28

Comparison of the developers

  • B. Similarity Comparison

SLIDE 30

The case study

Evaluating the metrics and the comparison approaches with a qualitative assessment on a real software-development project.

The software: Weby

  • A content management system built by UFG.
  • Hosts more than 400 internal web sites¹.
  • Period considered: 1 year and 7 months.
  • Eleven (11) developers contributed to the evolution of the source code; one developer was also the project manager.
  • 1,294 code revisions committed to the VCS (Subversion) of UFG.

¹ Available at https://github.com/cercomp/weby.

SLIDE 31

The case study

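| D.    | Commits | Files Add. | Files Mod. | Files Del. | Lines Add. | Lines Mod. | Lines Del. |
|-------|---------|------------|------------|------------|------------|------------|------------|
| d1    | 474     | 482        | 1,807      | 64         | 110,204    | 7,026      | 54,710     |
| d2    | 159     | 47         | 453        | 4          | 4,340      | 1,531      | 1,587      |
| d3    | 2       |            | 6          |            | 26         | 31         | 165        |
| d4    | 170     | 314        | 585        | 12         | 44,013     | 1,577      | 1,224      |
| d5    | 30      | 43         | 78         | 1          | 1,736      | 142        | 205        |
| d6    | 99      | 333        | 367        | 17         | 51,673     | 1,548      | 3,220      |
| d7    | 61      | 12         | 379        | 15         | 1,116      | 923        | 1,214      |
| d8    | 183     | 848        | 783        | 29         | 85,686     | 4,688      | 5,289      |
| d9    | 20      | 1          | 34         |            | 102        | 398        | 15         |
| d10   | 24      | 8          | 74         | 5          | 542        | 196        | 476        |
| d11   | 72      | 7          | 199        | 4          | 1,190      | 489        | 308        |
| Total | 1,294   | 2,095      | 4,765      | 151        | 300,628    | 18,549     | 68,413     |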

SLIDE 32

The case study

The evaluation was conducted through two assessments involving four steps each:

1. Calculation of the values of a set of metrics for all developers.
2. Computation of the hierarchy of classes and the MDS visualization.
3. Interview with the project manager, aiming to verify whether the classes and the visualization produced by the comparison approaches match his/her perception of the developers.
4. Analysis and interpretation of the results obtained from the interview.

SLIDE 33

The case study

Interview Form

Interviewee name:
Project name:
Position:
Education:
Place and date:

1. Explain the existing data and the metrics. (Explain what the developed system does.)
2. Present the classification by dominance class. (Explain the meaning of each class.)
3. Questions about the dominance classes.
   a) "Does this separation make sense to you?"
   b) "If you were to choose one or more developers for a future project, would this classification help? Why? Which developers would you choose?"
   c) "Would you classify the developers in this same way? Why? If not, what would your classification be?"
   d) "Is there any developer you think was classified incorrectly?"
4. Present the MDS visualization. (Explain what the distance between two developers means.)
5. Questions about the MDS visualization.
   e) "Are the developers who are close to each other in fact similar in their technical output? Do they produce similar results?"
   f) "How would you label (name, based on some similarity characteristic) the 'groups' of people who are visibly close?"
   g) "Is there any discrepancy or similarity between the results of the dominance classes, presented earlier, and the current MDS visualization?"
6. Questions about the full set of metrics.
   h) "Do you agree that the higher the value obtained for each of these 4 metrics, the better the developer's performance? Why?"
   i) "Which other metrics (from the complete spreadsheet) do you find interesting/useful for evaluating the developers? Why?"

SLIDE 34

The case study

  • A. Metrics and comparisons computed in the first assessment

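| D.  | Surv_Add | Surv_Mod | Surv_Add_Div_Effo_Add | Surv_Mod_Div_Effo_Dist_Mod |
|-----|----------|----------|-----------------------|----------------------------|
| d1  | 102,817  | 539      | 0.932                 | 0.253                      |
| d2  | 3,188    | 294      | *0.734                | *0.609                     |
| d3  |          |          | 0.000                 | 0.000                      |
| d4  | 41,929   | 410      | 0.952                 | 0.455                      |
| d5  | 1,185    | 21       | *0.682                | *0.437                     |
| d6  | 50,630   | 479      | 0.979                 | *0.807                     |
| d7  | 483      | 163      | *0.432                | *0.612                     |
| d8  | 83,409   | 1,302    | 0.973                 | 0.632                      |
| d9  | 55       | 211      | *0.539                | *0.875                     |
| d10 | 225      | 43       | *0.415                | *0.605                     |
| d11 | 1,053    | 315      | *0.884                | *0.734                     |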

SLIDE 35

The case study

| Equivalence class | Developers |
|-------------------|------------|
| 1                 | d1, d6, d8 |
| 2                 | d4         |
| 3                 | d2, d11    |
| 4                 | d5, d7, d9 |
| 5                 | d10        |
| 6                 | d3         |

SLIDE 37

The case study

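| Equivalence class | Developers [first] | Developers [second] |
|-------------------|--------------------|---------------------|
| 1                 | d1, d6, d8         | d1, d6, d4, d8      |
| 2                 | d4                 | d2, d11             |
| 3                 | d2, d11            | d5, d7, d9          |
| 4                 | d5, d7, d9         | d10                 |
| 5                 | d10                | d3                  |
| 6                 | d3                 |                     |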

SLIDE 40

Conclusion I

  • We presented new formal definitions and metrics that allow the extraction of basic but important information from projects hosted in VCSs.
  • We considered measures of effort and code-survival.
  • Two approaches were suggested for comparing the developers.
  • A case study with a real software project was carried out.
  • The results showed the usefulness of the metrics and of the comparison approaches.
  • The new metrics may help to unveil interesting facts.
  • But there are limitations in the use of VCS data: the logs are in general incomplete and can lead to ambiguous interpretations.

SLIDE 41

Conclusion II

We tried to compensate for this weakness by involving the project manager.

SLIDE 42

Future Work

Future investigations include:

  • formulating new metrics;
  • using other techniques to compare the developers;
  • improving the diff analysis for detecting other types of operation;
  • exploring more sources of data.

SLIDE 43

Questions?

SLIDE 45

References

Enrico Di Bella, Alberto Sillitti, and Giancarlo Succi. A multivariate classification of open source developers. Information Sciences, 221(0):72–83, February 2013. ISSN 0020-0255. doi: http://dx.doi.org/10.1016/j.ins.2012.09.031.

Eric Gilbert and Karrie Karahalios. Codesaw: A social visualization of distributed software development. In Proceedings of the 11th IFIP TC 13 International Conference on Human-computer Interaction - Volume Part II, INTERACT'07, pages 303–316, Berlin, Heidelberg, 2007. Springer-Verlag. ISBN 3-540-74799-0, 978-3-540-74799-4.

Tudor Girba, Adrian Kuhn, Mauricio Seeberger, and Stéphane Ducasse. How Developers Drive Software Evolution. In Proceedings of the Eighth International Workshop on Principles of Software Evolution, IWPSE'05, pages 113–122, Washington, DC, USA, 2005. IEEE Computer Society. ISBN 0-7695-2349-8. doi: 10.1109/IWPSE.2005.21.

Shih-Kun Huang and Kang-min Liu. Mining version histories to verify the learning process of legitimate peripheral participants. SIGSOFT Software Engineering Notes, 30(4):1–5, May 2005. ISSN 0163-5948. doi: 10.1145/1082983.1083158.

Andrejs Jermakovics, Alberto Sillitti, and Giancarlo Succi. Mining and visualizing developer networks from version control systems. In Proceedings of the 4th International Workshop on Cooperative and Human Aspects of Software Engineering, CHASE '11, pages 24–31, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0576-1. doi: 10.1145/1984642.1984647.

Luis Lopez-Fernandez, Gregorio Robles, and Jesus M. Gonzalez-Barahona. Applying Social Network Analysis to the Information in CVS Repositories. In First International Workshop on Mining Software Repositories, pages 101–105, 2004.

Shawn Minto and Gail C. Murphy. Recommending emergent teams. In Proceedings of the Fourth International Workshop on Mining Software Repositories, MSR '07, page 5, Washington, DC, USA, 2007. IEEE Computer Society. ISBN 0-7695-2950-X. doi: 10.1109/MSR.2007.27.

Audris Mockus and James D. Herbsleb. Expertise browser: A quantitative approach to identifying expertise. In Proceedings of the 24th International Conference on Software Engineering, ICSE '02, pages 503–512, New York, NY, USA, 2002. ACM. ISBN 1-58113-472-X. doi: 10.1145/581339.581401.

Stas Negara, Mohsen Vakilian, Nicholas Chen, Ralph E. Johnson, and Danny Dig. Is It Dangerous to Use Version Control Histories to Study Source Code Evolution? In James Noble, editor, ECOOP 2012 - Object-Oriented Programming, volume 7313 of Lecture Notes in Computer Science, pages 79–103. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-31056-0. doi: 10.1007/978-3-642-31057-7_5.

David Schuler and Thomas Zimmermann. Mining usage expertise from version archives. In Proceedings of the 2008 International Working Conference on Mining Software Repositories, MSR '08, pages 121–124, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-024-1. doi: 10.1145/1370750.1370779.

L Voinea, J Lukkien, and A Telea. Visual Assessment of Software Evolution. Science of Computer Programming, 65(3):222–248, April 2007. ISSN 01676423.

Lucian Voinea and Alexandru Telea. An Open Framework for CVS repository Querying, Analysis and Visualization. In Proceedings of the 2006 international workshop on Mining software repositories - MSR'06, pages 33–39, New York, NY, USA, May 20-28 2006. ACM Press. ISBN 1595933972. doi: 10.1145/1137983.1137993.

Shen Zhang, Yongji Wang, and Junchao Xiao. Mining Individual Performance Indicators in Collaborative Development Using Software Repositories. In Software Engineering Conference, 2008. APSEC '08. 15th Asia-Pacific, pages 247–254, December 2008a. doi: 10.1109/APSEC.2008.12.

Shen Zhang, Yongji Wang, Ye Yang, and Junchao Xiao. Capability assessment of individual software development processes using software repositories and DEA. In Proceedings of the Software Process, 2008 International Conference on Making Globally Distributed Software Development a Success Story, ICSP'08, pages 147–159, Berlin, Heidelberg, 2008b. Springer-Verlag. ISBN 3-540-79587-1, 978-3-540-79587-2.