Recovering Traceability Links via Information Retrieval Methods: Challenges and Opportunities

Dr. Rocco Oliveto, Ph.D., Department of Mathematics and Informatics, University of Salerno


slide-1
SLIDE 1

École Polytechnique de Montréal, Montréal, Québec, Canada - September 3rd, 2009

Recovering Traceability Links via Information Retrieval Methods
- Challenges and Opportunities -

Dr. Rocco Oliveto, Ph.D.
Department of Mathematics and Informatics, University of Salerno
84084, Fisciano (SA), Italy
roliveto@unisa.it

slide-2
SLIDE 2

Agenda

  • Traceability recovery: why?
    – Context and motivation
  • IR-based traceability recovery: how?
    – Canonical IR-based traceability recovery process
    – A two-step process: incremental process and coverage link analysis
  • IR-based traceability recovery in practice
    – Lessons learned from case studies and controlled experiments
  • Conclusion and challenges in traceability recovery

slide-3
SLIDE 3

Traceability recovery: why?

slide-4
SLIDE 4

Context

  • Traceability...
    – the ability to describe and follow the artefact life-cycle
    – Example: a use case is implemented by one or more classes that are tested by a set of test cases
  • Maintaining traceability between software artefacts is important for software development and maintenance
    – program comprehension
    – requirement tracing
    – impact analysis
    – software reuse
    – …

slide-5
SLIDE 5

Motivations

  • Maintaining traceability links during software evolution
    – Tedious and error-prone task
    – Often this information becomes out of date or is completely absent
    – Inadequate traceability contributes to project over-runs and failures
  • Artefact management tools that support traceability do not provide adequate automatic or semi-automatic traceability link generation and maintenance
    – The traceability matrix has to be manually managed
    – Need for automatic (or semi-automatic) traceability link recovery
  • Promising results have been achieved by using Information Retrieval methods
    – The approach was proposed in 1999 by Antoniol et al.

slide-6
SLIDE 6

IR-based Traceability Recovery

  • Rationale...
    – Most software artefacts contain text
    – Requirement specifications, design documents, identifiers and comments in UML diagrams and source code, test case specifications, manual pages, maintenance reports, change logs
  • Conjecture...
    – Artefacts having a high text similarity are likely good candidates to be traced to each other
    – Artefacts with high similarity probably describe similar concepts
  • Assumption...
    – Consistent use of domain terms in the software documents (e.g., programmers use meaningful names for program’s items, such as functions, variables, types, classes, and methods)

slide-7
SLIDE 7

IR-based traceability recovery: how?

slide-8
SLIDE 8

The traceability recovery process

slide-9
SLIDE 9

Indexer and classifier: two basic models

  • Probabilistic model
    – The similarity between a source and a target artefact is based on the probability that the target artefact is related to the source artefact
    – Not discussed in detail in this talk…
  • Vector space model
    – Source and target artefacts are represented in a vector space and the similarity is computed through vector operations, e.g. the cosine of the angle between the two vectors
  • Many improvements of the basic models
    – Latent Semantic Indexing
    – Keyword list
    – Relevance feedback analysis

slide-10
SLIDE 10

Vector Space Model (VSM)

  • Software artefacts are represented as vectors in the space of the terms (vocabulary)
    – Also possible to use a combination of terms (i.e., n-grams) as vector characteristics (…expensive)
    – The artefact space is represented by the term-by-document matrix

[Figure: term-by-document matrix (terms T1, T2; documents D1, D2, D3) and its geometrical representation]

slide-11
SLIDE 11

Term weighting

  • How to represent the importance (i.e., weight) of a term in a document?
    – Term occurrences
    – Boolean value (1 if the term occurs, 0 otherwise)
    – An advanced approach considers local and global weights
  • Generally, a generic entry a_{i,j} of the term-by-document matrix is calculated as follows:

    $a_{i,j} = L(i, j) \cdot G(i)$

  • Tf-Idf term weighting, with local weight L(i, j) = tf_{i,j} and global weight G(i) = idf_i:

    $tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}, \qquad idf_i = \log\left(\frac{n}{doc_i}\right)$

    where n_{i,j} is the number of occurrences of term i in document j, n is the total number of documents, and doc_i is the number of documents containing term i
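
The tf-idf weighting above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: whitespace tokenisation, the natural logarithm, and the helper name `tf_idf_matrix` are choices made here, not taken from the talk, and the three sample documents are invented.

```python
import math
from collections import Counter


def tf_idf_matrix(documents):
    """Return (terms, matrix) with matrix[i][j] = tf(i, j) * idf(i).

    Rows are terms, columns are documents (term-by-document layout).
    """
    # n_{i,j}: occurrences of each term in each document (naive tokeniser).
    counts = [Counter(doc.lower().split()) for doc in documents]
    terms = sorted(set().union(*counts))
    n_docs = len(documents)

    matrix = []
    for term in terms:
        doc_freq = sum(1 for c in counts if term in c)  # doc_i
        idf = math.log(n_docs / doc_freq)               # global weight G(i)
        row = []
        for c in counts:
            total = sum(c.values())                     # sum_k n_{k,j}
            tf = c[term] / total if total else 0.0      # local weight L(i, j)
            row.append(tf * idf)
        matrix.append(row)
    return terms, matrix


# Invented sample artefacts, for illustration only.
docs = [
    "patient books visit",
    "patient cancels visit",
    "doctor prints report",
]
terms, a = tf_idf_matrix(docs)
```

Note that a term occurring in every document gets idf = log(1) = 0, so it contributes nothing to any similarity computed from the matrix.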

slide-12
SLIDE 12

Artefact similarity

  • How to define the textual similarity between artefacts?
    – Using the corresponding vectors
    – Dot product or...
    – cosine of the angle between the two corresponding vectors (better)
  • The cosine:
    – Has values in [0, 1] since the maximum angle is 90° (term weights are non-negative)
    – Increases as more terms are shared
  • Thus, two artefacts are considered similar if their corresponding vectors point in the same direction (the angle is close to 0°)

    $sim(D, Q) = \frac{\vec{D} \cdot \vec{Q}}{\|\vec{D}\| \cdot \|\vec{Q}\|} = \frac{\sum_{t_i \in D, Q} w_{t_i D} \cdot w_{t_i Q}}{\sqrt{\sum_{t_i \in D} w_{t_i D}^2} \cdot \sqrt{\sum_{t_i \in Q} w_{t_i Q}^2}}$
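
As a minimal sketch of the cosine measure, assuming both artefacts are already given as dense weight vectors over the same vocabulary (the function name and the zero-vector convention are assumptions made for this example):

```python
import math


def cosine_similarity(d, q):
    """Cosine of the angle between two equally long weight vectors."""
    dot = sum(x * y for x, y in zip(d, q))
    norm_d = math.sqrt(sum(x * x for x in d))
    norm_q = math.sqrt(sum(y * y for y in q))
    # Convention chosen here: an empty (all-zero) artefact is similar to nothing.
    if norm_d == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_d * norm_q)


same_direction = cosine_similarity([1, 2], [2, 4])   # parallel vectors
no_shared_terms = cosine_similarity([1, 0], [0, 1])  # orthogonal vectors
```

Terms that occur in only one of the two artefacts contribute zero to the dot product, which matches the sum over shared terms in the numerator of the formula.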

slide-13
SLIDE 13

Limitations of the VSM

  • The vector space model does not take into account relations between terms
    – It suffers from the synonymy and polysemy problems
    – synonymy: different words with the same meaning
    – polysemy: same words with different meanings (depending on the context)
  • For instance, having “automobile” in one artefact and “car” in another artefact does not contribute to the similarity measure between these two documents
  • How to try to mitigate such problems
    – Using a dictionary
    – By using morphological analysis, like stemming
      • Stemming aims at removing suffixes of words to extract their stems
      • Example: working, worker, worked have the same stem work
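
To make the stemming example concrete, here is a deliberately naive suffix stripper. It is not the algorithm used in the talk or a real stemmer such as Porter's; the suffix list and the minimum stem length are assumptions chosen only so that the slide's example works.

```python
# Hypothetical, very small suffix list; real stemmers use many more rules.
SUFFIXES = ("ing", "ed", "er", "s")


def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


stems = {naive_stem(w) for w in ("working", "worker", "worked", "work")}
```

After stemming, the four word forms collapse to a single vocabulary entry, so they all contribute to the same row of the term-by-document matrix.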

slide-14
SLIDE 14

Latent Semantic Indexing (LSI)

  • Extension of the vector space model
    – Provides a way to automatically deal with synonymy and polysemy
    – Avoids preliminary morphological analysis
  • How does LSI mitigate the synonymy and polysemy problems?
    – It analyses the co-occurrence of the terms by using the Singular Value Decomposition (SVD)
  • SVD is used to decompose the term-by-document matrix into a set of k orthogonal factors from which the original matrix can be approximated by linear combination
    – The idea is to reduce the space of the terms
    – By reducing the term space we also reduce the noise in word usage caused by synonymous and polysemous words

slide-15
SLIDE 15

Singular Value Decomposition

  • Let A be the term-by-document matrix, computed as in the VSM:

    $A = T_0 \cdot S_0 \cdot D_0$

  • where:
    – T0 is the m x r matrix of the terms containing the left singular vectors (rows of the matrix),
    – D0 is the r x n matrix of the documents containing the right singular vectors (columns of the matrix),
    – S0 is an r x r diagonal matrix of singular values,
    – r is the rank of A.

slide-16
SLIDE 16

SVD and concept clustering

  • SVD can be viewed as a technique for deriving a set of uncorrelated indexing factors or concepts
    – their number is given by the rank r of A,
    – their relevance is given by the singular values in S0.
  • Concepts are a way to cluster related terms with respect to documents and related documents with respect to terms
  • The product S0 D0 (T0 S0, respectively) is a matrix whose columns (rows, respectively) are the document (term, respectively) vectors in the r-space of the concepts
  • The cosine of the angle between two vectors in this space represents the similarity of the two documents (terms, respectively) with respect to the concepts they share

slide-17
SLIDE 17

Reduced concept space

  • SVD also allows a simple strategy for optimal approximate fit using smaller matrices
  • If the singular values in S0 are ordered by size, the first k largest values may be kept and the remaining smaller ones set to zero
  • S is obtained from S0 by deleting zero rows and columns (T and D are obtained from T0 and D0 by deleting the corresponding columns and rows), so that:

    $A \approx A_k = T \cdot S \cdot D$

    Reduced concept-by-document matrix

  • The truncated SVD captures most of the important underlying structure in the association of terms and documents; at the same time it removes the noise or variability in word usage
  • The choice of k is critical and the proper way to make such a choice is still an open issue
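
A short derivation may help to see why document similarity in the reduced space can be computed from the columns of S·D alone. It assumes the standard SVD property that T has orthonormal columns (T^T T = I) and that S is diagonal; neither is stated explicitly on the slides.

```latex
\begin{aligned}
A_k^{\top} A_k &= (T S D)^{\top} (T S D) \\
               &= D^{\top} S^{\top} \left( T^{\top} T \right) S D \\
               &= D^{\top} S^{2} D \\
               &= (S D)^{\top} (S D)
\end{aligned}
```

Entry (p, q) of the result is the dot product between documents p and q in A_k, and it equals the dot product between columns p and q of S·D. Dot products, norms, and therefore cosines between documents can thus be computed on k-dimensional vectors instead of m-dimensional ones.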

slide-18
SLIDE 18

Cutting the ranked list

  • The outcome of the classifier is a ranked list of pairs of artefacts (links) with their similarity
  • Methods have to be used to cut the ranked list and retrieve only the top links in the list
  • The simplest methods do not take into account the similarity values
    – Cut point: a fixed number of top links are retrieved
    – Threshold based: the links with similarity above the threshold are returned

slide-19
SLIDE 19

Cutting the ranked list

Source     Target     Similarity
Source_1   Target_2   95.4%
Source_3   Target_4   92.1%
Source_1   Target_1   85.6%
Source_2   Target_2   83.2%
Source_3   Target_3   81.2%
Source_1   Target_3   79.0%
Source_3   Target_2   77.5%
Source_2   Target_4   64.3%
Source_2   Target_3   53.2%
Source_3   Target_1   43.9%
Source_2   Target_1   38.7%
Source_1   Target_4   23.6%
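
The two cutting strategies can be sketched on the ranked list shown on this slide. The function names are invented for this example, similarities are written as fractions, and treating the threshold as inclusive is a choice made here.

```python
# Ranked list from the slide: (source, target, similarity), best first.
ranked_list = [
    ("Source_1", "Target_2", 0.954),
    ("Source_3", "Target_4", 0.921),
    ("Source_1", "Target_1", 0.856),
    ("Source_2", "Target_2", 0.832),
    ("Source_3", "Target_3", 0.812),
    ("Source_1", "Target_3", 0.790),
    ("Source_3", "Target_2", 0.775),
    ("Source_2", "Target_4", 0.643),
    ("Source_2", "Target_3", 0.532),
    ("Source_3", "Target_1", 0.439),
    ("Source_2", "Target_1", 0.387),
    ("Source_1", "Target_4", 0.236),
]


def cut_point(ranked, n):
    """Keep a fixed number of top links, ignoring the similarity values."""
    return ranked[:n]


def threshold_cut(ranked, threshold):
    """Keep the links whose similarity reaches the threshold."""
    return [link for link in ranked if link[2] >= threshold]


top_three = cut_point(ranked_list, 3)
above_80 = threshold_cut(ranked_list, 0.80)
```

With this list, a cut point of 3 returns the links down to 85.6%, while a threshold of 80% returns five links, down to 81.2%.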

slide-20
SLIDE 20

The Precision/Recall problem

  • Two metrics to measure the performance of IR-based tools
    – Recall = |correct ∩ retrieved| / |correct|
    – Precision = |correct ∩ retrieved| / |retrieved|
    – Low precision => high number of false positives to discard
  • Recovering all correct links is in general impractical!
    – Necessary to use a very low threshold
    – Low threshold => high number of links retrieved => low precision
    – High effort to discard too many false positives
  • Results from a case study show that...
    – about 50,000 false positives have to be discarded by the software engineer in order to trace 361 links among about 200 artefacts
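
The two metrics translate directly into set operations. The sketch below uses invented link names (UC1, ClassA, ...) purely for illustration, and returning 0.0 for an empty set is a convention assumed here.

```python
def precision_recall(retrieved, correct):
    """Return (precision, recall) for sets of (source, target) links."""
    retrieved, correct = set(retrieved), set(correct)
    hits = len(retrieved & correct)  # |correct ∩ retrieved|
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(correct) if correct else 0.0
    return precision, recall


# Hypothetical data: four retrieved links, three correct links, two in common.
retrieved = {("UC1", "ClassA"), ("UC1", "ClassB"), ("UC2", "ClassC"), ("UC3", "ClassA")}
correct = {("UC1", "ClassA"), ("UC2", "ClassC"), ("UC2", "ClassD")}
precision, recall = precision_recall(retrieved, correct)
```

Here half of the retrieved links are false positives (precision 0.5) and one correct link is missed (recall 2/3); lowering the threshold would retrieve more links, which can only raise recall but typically lowers precision.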

slide-21
SLIDE 21

Precision/Recall in EasyClinic

[Figure: precision and recall in EasyClinic]

When recall is 100%, precision ranges between 5% and 30%

slide-22
SLIDE 22

Trend of correct links and false positives

[Figure: trend of correct links and false positives, two plots]

  – 1013 links analysed: 93 correct links and 990 false positives (1110 possible links)
  – 452 links analysed: 83 correct links and 369 false positives (1260 possible links)

slide-23
SLIDE 23

Density of correct links and false positives

  • Automatic tracing should be combined with manual tracing!

[Figure legend: correct link, false positive]

slide-24
SLIDE 24

Density of correct links and false positives

  • Automatic tracing should be combined with manual tracing!

[Figure legend: correct link, false positive; annotations: ideal (optimal) threshold, cut point]

slide-25
SLIDE 25

Incremental traceability recovery

[Figure legend: correct link, false positive]

slide-26
SLIDE 26

Incremental traceability recovery

Source     Target     Similarity
Source_1   Target_2   95.4%
Source_3   Target_4   92.1%
Source_1   Target_1   85.6%
Source_2   Target_2   83.2%
Source_3   Target_3   81.2%
Source_1   Target_3   79.0%
Source_3   Target_2   77.5%
Source_2   Target_4   72.3%
Source_2   Target_3   53.2%
Source_3   Target_1   43.9%
Source_2   Target_1   38.7%
Source_1   Target_4   23.6%

[Figure legend: correct link, false positive]

slide-27
SLIDE 27

Incremental traceability recovery

Threshold: 90%

Link classification (links in the ranked list with similarity of at least 90%):

Source     Target     Similarity
Source_1   Target_2   95.4%
Source_3   Target_4   92.1%

slide-29
SLIDE 29

Incremental traceability recovery

Threshold: 80%

Link classification (links in the ranked list with similarity between 80% and 90%):

Source     Target     Similarity
Source_1   Target_1   85.6%
Source_2   Target_2   83.2%
Source_3   Target_3   81.2%

slide-31
SLIDE 31

Incremental traceability recovery

Threshold: 70%

Link classification (links in the ranked list with similarity between 70% and 80%):

Source     Target     Similarity
Source_1   Target_3   79.0%
Source_3   Target_2   77.5%
Source_2   Target_4   72.3%

slide-32
SLIDE 32

Incremental traceability recovery

Threshold: 70%

Link classification (links in the ranked list with similarity between 70% and 80%):

Source     Target     Similarity
Source_1   Target_3   79.0%
Source_3   Target_2   77.5%
Source_2   Target_4   72.3%

The software engineer decides to stop the process as the effort to discard false positives is becoming too high. Probably he does not retrieve all correct links!
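
The incremental process walked through on these slides can be sketched as a loop that lowers the threshold step by step and shows the engineer only the links not yet classified. Several things are assumptions made for this sketch: the `classify` callback stands in for the engineer's manual decision, the fixed `stop` value stands in for the engineer's choice to stop, and the set `hypothetical_correct` is invented (the slides mark correct links only by colour).

```python
# Ranked list from the slides, similarities in percent.
ranked_list = [
    ("Source_1", "Target_2", 95.4), ("Source_3", "Target_4", 92.1),
    ("Source_1", "Target_1", 85.6), ("Source_2", "Target_2", 83.2),
    ("Source_3", "Target_3", 81.2), ("Source_1", "Target_3", 79.0),
    ("Source_3", "Target_2", 77.5), ("Source_2", "Target_4", 72.3),
    ("Source_2", "Target_3", 53.2), ("Source_3", "Target_1", 43.9),
    ("Source_2", "Target_1", 38.7), ("Source_1", "Target_4", 23.6),
]


def incremental_recovery(ranked, classify, start=90, step=10, stop=70):
    """Lower the threshold from start to stop, classifying new links each time."""
    traced, discarded = [], []
    already_shown = set()
    for threshold in range(start, stop - 1, -step):
        for link in ranked:
            pair = link[:2]
            if link[2] >= threshold and pair not in already_shown:
                already_shown.add(pair)
                # The engineer confirms the link or discards a false positive.
                (traced if classify(link) else discarded).append(link)
    return traced, discarded


# Invented oracle, for illustration only.
hypothetical_correct = {
    ("Source_1", "Target_2"), ("Source_1", "Target_1"), ("Source_2", "Target_2"),
    ("Source_3", "Target_3"), ("Source_2", "Target_4"),
}
traced, discarded = incremental_recovery(
    ranked_list, lambda link: link[:2] in hypothetical_correct
)
```

With the default thresholds of 90%, 80% and 70%, the engineer analyses eight links in batches of two, three and three, and never sees the four links below 70%.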

slide-33
SLIDE 33

Drawback of the incremental process

  • The threshold used to stop the process plays an important role
    – the lower the threshold, the higher the number of correct links retrieved
  • Thus, better recall could be achieved by providing the software engineer with the full ranked list...
    – but the ranked list contains a high density of correct links in the upper part and a low density of such links in the lower part
  • Thus
    – the full ranked list method might result in better recall
    – but in worse precision (more tracing errors) with respect to the incremental process

slide-34
SLIDE 34

Coverage link analysis: a two-step approach

  • First step: the software engineer performs an incremental coarse-grained traceability recovery between a set of source artefacts and a set of target artefacts
    – During this step she traces as many links as possible while keeping low the effort to discard false positives
  • Second step: she uses a coverage link analysis aimed at identifying poorly traced source artefacts and guiding focused fine-grained traceability recovery sessions to recover links missed in the first step
    – Coverage index of a generic artefact a from the set of source artefacts:

      $traceabilityCoverage_a = \frac{|links_a(targets)|}{|targets|}$
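
A minimal sketch of the coverage index follows, reading links_a(targets) as the set of target artefacts that source artefact a has been traced to. The sample links and the 0.5 cutoff used to flag "poorly traced" artefacts are invented for illustration; the talk does not state a cutoff.

```python
def traceability_coverage(artefact, traced_links, targets):
    """Fraction of target artefacts that `artefact` is traced to."""
    if not targets:
        return 0.0
    linked = {t for s, t in traced_links if s == artefact and t in targets}
    return len(linked) / len(targets)


# Hypothetical outcome of the first (incremental) step.
targets = {"Target_1", "Target_2", "Target_3", "Target_4"}
traced_links = [
    ("Source_1", "Target_1"),
    ("Source_1", "Target_2"),
    ("Source_2", "Target_2"),
]
sources = ["Source_1", "Source_2", "Source_3"]

coverage = {s: traceability_coverage(s, traced_links, targets) for s in sources}
# Assumed cutoff: artefacts below it become candidates for a focused,
# fine-grained recovery session in the second step.
poorly_traced = [s for s in sources if coverage[s] < 0.5]
```

In this example Source_3 has no traced links at all, so it would be the first candidate for a focused session against the target set.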

slide-35
SLIDE 35

IR-based traceability recovery in practice

slide-36
SLIDE 36

ADAMS Re-Trace: wizard

slide-37
SLIDE 37

ADAMS Re-Trace: candidate links

slide-38
SLIDE 38

Support given by IR-based traceability recovery tools

  • Goal…
    – evaluate the support given by a traceability recovery tool during tracing activities...
  • Subjects’ performance was measured by tracing accuracy and time
  • Context…
    – 20 first-year and 12 second-year master students (different ability and experience)
    – They had to perform two traceability recovery tasks on a repository of a completed software project
      • T1: recovering links between use cases and classes
      • T2: recovering links between UML diagrams and test cases
    – Tasks were performed in two separate laboratory sessions
    – Tasks were performed with and without the tool support

slide-39
SLIDE 39

Lessons learned

  • The tool reduces the time spent by the software engineer
  • In general, the tool reduces tracing errors
  • Ability and experience are influencing factors
    – The tool helps to reduce the gap between high and low ability subjects
  • The performance of the IR method is an influencing factor
    – The better the performance of the IR engine, the better the subjects’ performance
  • Almost all the subjects used the incremental process

slide-40
SLIDE 40

Role played by the tracing process

  • Goal…
    – evaluate the impact of the traceability recovery process adopted (incremental vs “one-shot”) on the tracing performance of the software engineer...
  • Subjects’ performance was measured by tracing accuracy and time
  • Context…
    – 20 master students of the University of Salerno (different abilities)
    – They had to perform two traceability recovery tasks on a repository of a completed software project
      • T1: recovering links between use cases and classes
      • T2: recovering links between UML diagrams and test cases
    – Tasks were performed in two separate laboratory sessions
    – Tasks were performed using ADAMS Re-Trace (using different approaches)

slide-41
SLIDE 41

Lessons learned

  • In general, the incremental process improves the traceability link identification performance of the software engineer, in terms of correct links traced and false positives discarded
  • In conclusion...
    – the incremental process reduces the effort required to complete a traceability recovery task
      • comparable tracing performance achieved analysing a lower number of links
    – Average number of links analysed with the “one-shot” approach: about 400
    – Average number of links analysed with the incremental approach: about 250

slide-42
SLIDE 42

Role played by the coverage analysis

  • Goal…
    – evaluate the impact of the coverage link analysis on the tracing performance of the software engineer...
  • Subjects’ performance was measured by tracing accuracy and time
  • Context…
    – 30 master students of the University of Salerno (different abilities)
    – They had to perform two traceability recovery tasks on a repository of a completed software project
      • T1: recovering links between use cases and classes
      • T2: recovering links between UML diagrams and test cases
    – Tasks were performed in two separate laboratory sessions
    – Tasks were performed using ADAMS Re-Trace (with or without the use of the coverage link analysis)

slide-43
SLIDE 43

Lessons learned

  • In general, the two-step process (incremental + coverage link analysis) positively affects the tracing performance of the software engineer with respect to the pure incremental approach
    – Same precision but better recall
  • The coverage analysis reduces the gap between the number of tracing errors made by low and high ability subjects
  • The coverage analysis reduces the time spent by high ability subjects to complete the traceability recovery task
    – Performed few iterations using the incremental approach
    – Stopped the first step of the process with a relatively high threshold (75%)
    – Exploited the coverage analysis several times to perform fine-grained traceability recovery activities

slide-44
SLIDE 44

Challenges and open problems

slide-45
SLIDE 45

Conclusion

  • Advantages of IR-based traceability recovery
    – It reduces the traceability recovery effort
    – It reduces the gap between skills and competences of software engineers (more advantages for less skilled people)
  • Limitations of IR-based traceability recovery
    – It is generally impossible to achieve 100% recall with an acceptable precision
    – It has to be applied semi-automatically (link classification)
  • An interesting direction for improvement
    – Combining textual information with other (orthogonal) information
    – For example: combining with syntactic (structural) and version history mining techniques (e.g., co-change analysis)

slide-46
SLIDE 46

Challenges in Traceability (1)

  • Challenge 1. Improve the tracing accuracy of recovery methods to reduce the effort required of the software engineer for discarding false positives (one-click traceability)
    – Combination of different techniques?
  • Challenge 2. Manage multi-language document traceability?
    – Using cross-language IR methods?
  • Challenge 3. Manage the traceability links between unstructured documents or multimedia artefacts
    – Using clustering techniques?

slide-47
SLIDE 47

Challenges in Traceability (2)

  • Challenge 4. Stimulate developers to keep traceability links up to date during software development?
    – Integrating traceability recovery functionality in the development environment?
  • Challenge 5. Manage the evolution of the links
    – Using textual similarity between traced artefacts?
  • Challenge 6. Manage the semantics of the links
    – Can NLP be used to extract the semantics of a link?

slide-49
SLIDE 49

Acknowledgements

The work presented in this talk is the result of several years of work whereby I have been accompanied and supported by many people. Although my name sits on the first slide, many others have assisted in various forms to bring this work to its conclusion.

  • Andrea De Lucia, Full professor, University of Salerno
  • Massimiliano Di Penta, Assistant professor, University of Sannio
  • Fausto Fasano, Assistant professor, University of Molise
  • Simone Avossa, BSc student (integration of ADAMS Re-Trace in Eclipse)
  • Gabriele Bavota, MSc student (coverage link analysis)
  • Davide Mastrogiovanni, BSc student (integration of ADAMS Re-Trace in Eclipse)
  • Annibale Panichella, MSc student (experimentation of traceability recovery methods)
  • Sebastiano Panichella, MSc student (experimentation of traceability recovery methods)
  • Giuseppe Sarno, BSc student (optimisation of indexing process in ADAMS Re-Trace)
  • Paola Sgueglia (improvement of canonical traceability recovery methods)
  • All the students who were involved in the experiment as subjects

slide-50
SLIDE 50

Thank you!

Questions and/or comments

Rocco Oliveto, Ph.D.
DMI - University of Salerno
roliveto@unisa.it

slide-51
SLIDE 51

ADAMS overview

  • A fine-grained artefact management system
    – Projects, resources, and software artefacts
  • Traceability in ADAMS
    – Used for event notification
    – Traceability links were manually managed
  • ADAMS Re-Trace
    – LSI-based traceability recovery tool

slide-52
SLIDE 52

Preliminary evaluation

  • About 150 students allocated in 17 projects
  • Team composition
    – Between 6 and 8 undergraduate students with development roles
    – 2 master students with project management roles
  • Master students also in charge of inserting traceability links
    – Task periodically performed
    – Traced links validated during review meetings
  • Traceability links between artefacts of the Requirement Analysis Document (RAD) were traced in all projects
    – Only in nine projects did the managers also trace links between use cases and code classes

slide-53
SLIDE 53

Discussion

  • The students agreed on the need for some traceability recovery support
    – On average, more than 70% of links were traced with the tool support
  • The students preferred the incremental approach over the “one-shot” approach
  • Without the correct traceability matrix we could not compute the recall, but the precision of the recovered links was quite good
    – Getting on average a good recall (typically at least 70%) with an acceptable precision (typically at least 30%) can be a problem if the traceability links have to be retrieved at the end of the project
    – The IR-based approach improves the performance of manual tracing