Recovering Traceability Links via Information Retrieval Methods ‐ Challenges and Opportunities
Dr. Rocco Oliveto, Ph.D.
Department of Mathematics and Informatics, University of Salerno, 84084 Fisciano (SA), Italy ‐ roliveto@unisa.it
École Polytechnique de Montréal, Montréal, Québec, Canada ‐ September 3rd, 2009
Agenda
• Traceability recovery: why?
  – Context and motivation
• IR-based traceability recovery: how?
  – Canonical IR-based traceability recovery process
  – A two-step process: incremental process and coverage link analysis
• IR-based traceability recovery in practice
  – Lessons learned from case studies and controlled experiments
• Conclusion and challenges in traceability recovery
Traceability recovery: why?
Context
• Traceability...
  – the ability to describe and follow the artefact life-cycle
  – Example: a use case is implemented by one or more classes that are tested by a set of test cases
• Maintaining traceability between software artefacts is important for software development and maintenance
  – program comprehension
  – requirement tracing
  – impact analysis
  – software reuse
  – ...
Motivations
• Maintaining traceability links during software evolution is a tedious and error-prone task
  – Often this information becomes out of date or is completely absent
  – Inadequate traceability contributes to project over-runs and failures
• Artefact management tools that support traceability do not provide adequate automatic or semi-automatic traceability link generation and maintenance
  – The traceability matrix has to be managed manually
  – Need for automatic (or semi-automatic) traceability link recovery
• Promising results have been achieved by using Information Retrieval methods
  – The approach was proposed in 1999 by Antoniol et al.
IR-based Traceability Recovery
• Rationale...
  – Most software artefacts contain text
  – Requirement specifications, design documents, identifiers and comments in UML diagrams and source code, test case specifications, manual pages, maintenance reports, change logs
• Conjecture...
  – Artefacts having a high textual similarity are likely good candidates to be traced onto each other
  – Artefacts with high similarity probably describe similar concepts
• Assumption...
  – Consistent use of domain terms in the software documents (e.g., programmers use meaningful names for program items, such as functions, variables, types, classes, and methods)
IR-based traceability recovery: how?
The traceability recovery process
Indexer and classifier: two basic models
• Probabilistic model
  – The similarity between a source and a target artefact is based on the probability that the target artefact is related to the source artefact
  – Not discussed in detail in this talk...
• Vector space model
  – Source and target artefacts are represented in a vector space and the similarity is computed through vector operations, e.g. the cosine of the angle between the two vectors
• Many improvements of the basic models
  – Latent Semantic Indexing
  – Keyword list
  – Relevance feedback analysis
Vector Space Model (VSM)
• Software artefacts are represented as vectors in the space of the terms (vocabulary)
  – Also possible to use a combination of terms (i.e., n-grams) as vector characteristics (...expensive)
  – The artefact space is represented by the term-by-document matrix
• Example term-by-document matrix (the slide also shows its geometrical representation, i.e. D1, D2, D3 plotted as vectors in the (T1, T2) plane):

        D1  D2  D3
    T1   1   4   0
    T2   2   1   3
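As an illustration (not part of the original slides), the term-by-document matrix above can be built with a few lines of plain Python; the corpus below is a hypothetical toy example chosen so the counts match the slide's matrix:

```python
# A minimal sketch of the VSM indexing step: rows are terms, columns
# are documents, and each entry counts term occurrences in a document.
def term_by_document(docs, vocabulary):
    """Return a {term: [count in D1, count in D2, ...]} matrix."""
    return {t: [d.count(t) for d in docs] for t in vocabulary}

# Toy corpus chosen so the counts reproduce the slide's example matrix.
docs = [
    ["t1", "t2", "t2"],              # D1
    ["t1", "t1", "t1", "t1", "t2"],  # D2
    ["t2", "t2", "t2"],              # D3
]
matrix = term_by_document(docs, ["t1", "t2"])
print(matrix)  # {'t1': [1, 4, 0], 't2': [2, 1, 3]}
```

Each document vector is then a column of this matrix; in practice the raw counts are replaced by the weighting schemes discussed next.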
Term weighting
• How to represent the importance (i.e., weight) of a term in a document?
  – Term occurrences
  – Boolean value (1 if the term occurs, 0 otherwise)
  – An advanced approach considers local and global weights
• Generally, a generic entry a_{i,j} of the term-by-document matrix is calculated as follows:

    a_{i,j} = L(i, j) · G(i)

• Tf-Idf term weighting:

    tf_{i,j} = n_{i,j} / Σ_k n_{k,j}        idf_i = log( n_doc / n_{doc_i} )

  where n_{i,j} is the number of occurrences of term i in document j, n_doc is the total number of documents, and n_{doc_i} is the number of documents containing term i
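A minimal sketch of the tf-idf entry a_{i,j} = L(i,j) · G(i), with L = tf and G = idf as in the formulas above (documents are hypothetical lists of terms):

```python
import math

def tf_idf(term, doc, docs):
    """One entry of the weighted term-by-document matrix.

    tf  = n_{i,j} / sum_k n_{k,j}  (occurrences of term in doc / doc length)
    idf = log(n_doc / n_doc_i)     (total docs / docs containing the term)
    """
    tf = doc.count(term) / len(doc)
    n_docs_with_term = sum(1 for d in docs if term in d)
    idf = math.log(len(docs) / n_docs_with_term)
    return tf * idf

# A term appearing in every document gets idf = log(1) = 0,
# so it carries no discriminating weight.
corpus = [["car", "engine"], ["car", "wheel"], ["car", "train"]]
print(tf_idf("car", corpus[0], corpus))  # 0.0
```

The global idf factor is what down-weights ubiquitous terms, which plain occurrence counts cannot do.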
Artefact similarity
• How to define the textual similarity between artefacts?
  – Using the corresponding vectors
  – Dot product or...
  – cosine of the angle between the two corresponding vectors (better)

    sim(D, Q) = (D⃗ · Q⃗) / (‖D⃗‖ · ‖Q⃗‖)
              = Σ_{t_i ∈ D ∩ Q} w_{t_i,D} · w_{t_i,Q}  /  ( √(Σ_{t_i ∈ D} w²_{t_i,D}) · √(Σ_{t_i ∈ Q} w²_{t_i,Q}) )

• The cosine:
  – Has values in [0, 1], since with non-negative weights the maximum angle is 90°
  – Increases as more terms are shared
• Thus, two artefacts are considered similar if their corresponding vectors point in the same direction (the angle is close to 0°)
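The cosine formula above can be sketched directly in Python, representing each artefact as a sparse {term: weight} dict (a hypothetical representation, not the author's implementation); only shared terms contribute to the dot product:

```python
import math

def cosine(w_d, w_q):
    """Cosine of the angle between two weight vectors given as
    {term: weight} dicts; terms absent from a dict have weight 0."""
    dot = sum(w_d[t] * w_q[t] for t in w_d.keys() & w_q.keys())
    norm_d = math.sqrt(sum(w * w for w in w_d.values()))
    norm_q = math.sqrt(sum(w * w for w in w_q.values()))
    if norm_d == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_d * norm_q)

# Same direction -> cosine ~1; no shared terms -> cosine 0.
print(cosine({"car": 1.0, "engine": 2.0}, {"car": 2.0, "engine": 4.0}))  # ~1.0
print(cosine({"car": 1.0}, {"wheel": 1.0}))                              # 0.0
```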
Limitations of the VSM
• The vector space model does not take into account relations between terms
  – It suffers from the synonymy and polysemy problems
  – synonymy: different words with the same meaning
  – polysemy: the same word with different meanings (depending on the context)
• For instance, having "automobile" in one artefact and "car" in another artefact does not contribute to the similarity measure between these two documents
• How to try to mitigate such problems
  – Using a dictionary
  – By using morphological analysis, like stemming
    • Stemming aims at removing suffixes of words to extract their stems
    • Example: working, worker, worked have the same stem work
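The stemming idea can be sketched with a naive suffix stripper; this is only an illustration of the principle, with a made-up suffix list — a real pipeline would use a proper algorithm such as Porter's stemmer:

```python
# Naive suffix-stripping stemmer (illustrative only, not Porter's algorithm).
# Suffixes are tried longest-first; a minimum stem length avoids over-stripping.
SUFFIXES = ("ing", "ers", "er", "ed", "s")

def stem(word):
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[:-len(suf)]
    return word

print([stem(w) for w in ["working", "worker", "worked"]])  # ['work', 'work', 'work']
```

After stemming, the slide's example words all map to the same vocabulary entry "work", so occurrences of any of them contribute to the same vector component.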
Latent Semantic Indexing (LSI)
• Extension of the vector space model
  – Provides a way to automatically deal with synonymy and polysemy
  – Avoids preliminary morphological analysis
• How does LSI mitigate the synonymy and polysemy problems?
  – It analyses the co-occurrence of the terms by using Singular Value Decomposition (SVD)
    • SVD is used to decompose the term-by-document matrix into a set of k orthogonal factors from which the original matrix can be approximated by linear combination
  – The idea is to reduce the space of the terms
  – Reducing the term space also reduces the noise in word usage caused by synonymous and polysemous words
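The core SVD step can be sketched with NumPy; this is only a sketch of the rank-k truncation at the heart of LSI (the toy matrix and term labels are hypothetical), not a full LSI traceability pipeline:

```python
import numpy as np

def lsi_reduce(A, k):
    """Approximate the term-by-document matrix A by its rank-k SVD
    truncation A_k = U_k * S_k * V_k^T, the core step of LSI."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Toy term-by-document matrix: rows could stand for terms such as
# "automobile", "car", "engine" that partly co-occur across documents.
A = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
])
A_k = lsi_reduce(A, k=2)  # documents now compared in a 2-factor space
```

Because co-occurring terms are merged into shared factors, documents that use different but related vocabulary end up closer in the reduced space than in the original term space.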