University of Waterloo

Tracking Structural Evolution using Origin Analysis

Michael Godfrey and Qiang Tu

Software Architecture Group (SWAG) University of Waterloo

20 May, 2002 IWPSE-02 2

Overview

  • Open questions in software evolution research
  • Motivation
  • “Origin analysis” and Beagle
  • Efficiency considerations
  • An example
  • Open questions in origin analysis

Some open questions

  • Philosophical:
    – Does software evolve in the same way as frogs and social structures?
      • The Nature of Economies, by Jane Jacobs
    – What are the recurring patterns and compelling metaphors of software evolution?
  • Methodological:
    – How to measure size?
      • How to correlate size and quality?
    – How to measure change?
      • How to model architectural change?
    – What is the predictive power of such models?
      • Do the “other phenomena” dominate?

Some open questions

  • Practical:
    – What information do developers need to know about how a software system has evolved?
    – What kinds of tools would be useful:
      • to the front-line developer?
      • to the manager?
    – How best to deal with:
      • Large data sets (large_system ∗ many_versions)
      • Visualization and navigation

Motivation

  • Want to build tools to aid developers in understanding how software evolves.
    – Change can be mostly additive … or much more invasive
  • Building an accurate model of how a system has evolved is hard in the presence of refactoring, redesign, structural and architectural change.
    – Usual assumption:
      • A change in name/location of a software entity means the old one died and a new one was born
    – … which means that “structural” discontinuities break old models of the system, and cause useful knowledge to be lost.

Motivation

  • This also raises the question of software artifact ontology:
    – What are the software entities/artifacts of interest in evolutionary studies?
      • All CVS’d things?
      • “Hard” machine-processable things, like source code files?
      • User docs, requirements docs, …?
      • Atomic vs. composite things? (subsystems vs. files vs. classes vs. methods)
    – What does it mean for an artifact/entity to be a different version of an older artifact/entity?
      • Same name? file? location? CVS control?
      • “Because I say so”?

“Origin analysis”

Suppose that:

– f is the name of a software entity (e.g., function, type, global variable) of version Vnew of a software system.
– There is no entity of the same name/kind in the previous version Vold.

We define origin analysis as the process of deciding:

– if f was newly introduced in Vnew, or
– if it should be more accurately viewed as a changed/moved/renamed version of a differently named entity of Vold.

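[Figure: Vold shows entity g together with y, x, z; Vnew shows entity f together with the same y, x, z; “???” marks the unknown link between g and f.]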

The Beagle tool

[IWPC-02]

Design goals:

  • Support browsing of evolutionary histories of software systems
  • Visual navigation and querying
  • Architectural-level modelling
  • Compare system snapshots
  • Support identification and detection of change patterns

The Beagle tool

[IWPC-02]

At system check-in:

  • Populate database with “facts” and metrics info from various tools.
  • grok scripts “lift” facts to file/subsystem/architectural level.

At runtime:

  • SWAGkit (PBS) engine for visualization/navigation.
  • Java-based infrastructure using DB/2, VA-Java, IBM WebSphere.

Origin analysis: Two techniques

1. Entity analysis (i.e., metrics-based “Bertillonage”)

– For each “added” entity f:
  • Calculate combined Euclidean distance from each “deleted” entity for five metrics [Kostas].
  • Select top k matches; compare entity names.

2. Relationship analysis (e.g., calls, is-called-by, refs)

– For each “added” entity f:
  • Find Rf, set of all entities that call f that are present in both versions.
  • For each g ∈ Rf, calculate Qg, set of all “deleted” entities that g calls in the old version.
  • Look at intersection of the Qgs; these are good candidates.
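
The two techniques above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Beagle's implementation: the function names `entity_candidates` and `relationship_candidates`, the metric tuples, and the dictionary-of-sets call graphs are all hypothetical, and the five metrics are treated as an opaque tuple of numbers.

```python
import math


def entity_candidates(added_metrics, deleted_metrics, k=3):
    """Entity analysis: rank "deleted" entities by Euclidean distance.

    added_metrics: tuple of metric values for one "added" entity f.
    deleted_metrics: dict mapping each "deleted" entity name to its tuple.
    Returns the names of the k nearest deleted entities, nearest first.
    """
    ranked = sorted(
        deleted_metrics,
        # Ties are broken by name so the ordering is deterministic.
        key=lambda name: (math.dist(added_metrics, deleted_metrics[name]), name),
    )
    return ranked[:k]


def relationship_candidates(f, calls_old, calls_new, deleted):
    """Relationship analysis: intersect what f's surviving callers used to call.

    calls_old / calls_new: dict mapping caller -> set of callees, per version.
    Assumes every entity of a version has an entry (possibly an empty set).
    deleted: set of entity names present in Vold but not in Vnew.
    """
    # R_f: entities that call f in Vnew and also exist in Vold.
    r_f = {g for g, callees in calls_new.items() if f in callees and g in calls_old}
    # Q_g: "deleted" entities that g calls in the old version.
    q_sets = [calls_old[g] & deleted for g in r_f]
    if not q_sets:
        return set()
    return set.intersection(*q_sets)


# Toy data (hypothetical): g was renamed to f, h was genuinely deleted.
deleted_metrics = {"g": (10, 2, 3, 1, 4), "h": (50, 9, 1, 0, 7)}
calls_old = {"x": {"g"}, "y": {"g", "h"}, "z": {"g"}}
calls_new = {"x": {"f"}, "y": {"f"}, "z": {"f"}}

print(entity_candidates((11, 2, 3, 1, 4), deleted_metrics, k=1))          # ['g']
print(relationship_candidates("f", calls_old, calls_new, {"g", "h"}))     # {'g'}
```

Both functions only produce candidates; comparing names and making the final call is left to the analyst, in keeping with the semi-automatic nature of the technique.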

Efficiency considerations

  • When comparing Vnew to Vold, need to find the entities that seem to have been added and deleted.
    – These sets are fast to determine.
    – Most subsequent calculations involve only these small subsets of the entire entity space (plus the other entities they have “relationships” with).
  • Computationally expensive approaches for clone detection (e.g., graph matching) were not considered.
    – Can’t pre-compute easily.
    – Precise matching not worth the effort, as it doesn’t seem to help much for this task.
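
One way to see why the “added” and “deleted” sets are cheap to determine: if each version is represented as a set of (name, kind) pairs, they are two set differences. The sketch below assumes that representation; the helper name `added_and_deleted` and the sample entities are hypothetical.

```python
def added_and_deleted(old_entities, new_entities):
    """Return (added, deleted) for two versions of a system.

    Each argument is a set of (name, kind) pairs, e.g. ("f", "function").
    An entity counts as "added" if no entity of the same name/kind exists
    in the old version, and as "deleted" in the symmetric case.
    """
    added = new_entities - old_entities
    deleted = old_entities - new_entities
    return added, deleted


# Toy versions (hypothetical): g disappears, f appears, x and y persist.
v_old = {("g", "function"), ("x", "function"), ("y", "function")}
v_new = {("f", "function"), ("x", "function"), ("y", "function")}

added, deleted = added_and_deleted(v_old, v_new)
print(added)    # {('f', 'function')}
print(deleted)  # {('g', 'function')}
```

Later steps then only need to look at `added`, `deleted`, and the entities related to them, rather than at every entity in both versions.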

Efficiency considerations

  • Entity analysis:
    – Entity info is generated by fact extractor and metrics tool.
      • Info is generated only once per version, when system is checked into repository.
    – Performing entity analysis is a matter of a simple numerical calculation on a small set of “likely candidates”.
  • Relationship analysis:
    – Relationship info (who-calls-whom, who-inherits-from-whom, etc.) is generated by fact extractor.
      • Info is generated only once per version, when system is checked into repository.
    – Computation and comparison of relational images is fairly fast.
      • Special-purpose tool (grok) and relatively small amount of data.

Case study: gcc/g++/egcs

  • Have extracted full info for 29 versions of gcc/g++/egcs
    – Want to examine major breaks in development to see how well origin analysis works.
  • EGCS v1.0 was forked from the GCC v2.7.2.3 codebase
    – EGCS project goals:
      • C++ compiler more ANSI compliant,
      • new FORTRAN front-end,
      • new optimizations and code-generation algorithms, …
    – … and EGCS introduced a new directory structure and a new file naming scheme, in addition to all of the other redesign and restructuring.
    – Naïve analysis indicated “everything old is new again”

Case study: gcc/g++/egcs

  • Example:
    – The EGCS 1.0 Parser subsystem contains 15 (non-trivial) implementation files, comprising 848 functions.
    – Using origin analysis and common sense, Qiang decided that about half of the “new” functions weren’t new.
    – That’s still a massive amount of change for a new release of a compiler!

File                  # Fcns   # New   # Old   % New
gcc/cp/errfn.c             9       9            100%
gcc/cp/pt.c               59      57       2     97%
gcc/except.c              55      52       3     95%
gcc/cp/decl2.c            57      50       7     88%
gcc/c-lang.c              16      14       2     88%
gcc/cp/method.c           30      26       4     87%
gcc/cp/except.c           25      20       5     80%
gcc/cp/decl.c            134      84      50     63%
gcc/cp/error.c            31      16      15     52%
gcc/cp/class.c            61      31      30     51%
gcc/cp/search.c           81      40      41     49%
gcc/c-decl.c              70      29      41     41%
gcc/fold-const.c          44      15      29     34%
gcc/objc/objc-act.c      167      17     150     10%
gcc/c-aux-info.c           9               9      0%
TOTAL                    848     460     388     54%
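
As a quick arithmetic check, the TOTAL row can be recomputed from the per-file rows. The sketch below copies the counts from the table and reads the two blank cells as 0, which is an assumption; the variable names are illustrative only.

```python
# (file, functions, new, old) copied from the table; blank cells read as 0.
rows = [
    ("gcc/cp/errfn.c", 9, 9, 0),
    ("gcc/cp/pt.c", 59, 57, 2),
    ("gcc/except.c", 55, 52, 3),
    ("gcc/cp/decl2.c", 57, 50, 7),
    ("gcc/c-lang.c", 16, 14, 2),
    ("gcc/cp/method.c", 30, 26, 4),
    ("gcc/cp/except.c", 25, 20, 5),
    ("gcc/cp/decl.c", 134, 84, 50),
    ("gcc/cp/error.c", 31, 16, 15),
    ("gcc/cp/class.c", 61, 31, 30),
    ("gcc/cp/search.c", 81, 40, 41),
    ("gcc/c-decl.c", 70, 29, 41),
    ("gcc/fold-const.c", 44, 15, 29),
    ("gcc/objc/objc-act.c", 167, 17, 150),
    ("gcc/c-aux-info.c", 9, 0, 9),
]

total_fcns = sum(fcns for _, fcns, _, _ in rows)
total_new = sum(new for _, _, new, _ in rows)
total_old = sum(old for _, _, _, old in rows)
pct_new = round(100 * total_new / total_fcns)

print(total_fcns, total_new, total_old, pct_new)  # 848 460 388 54
```

The 388 functions judged “old” are about 46% of the 848, which is the “about half” quoted on the slide.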

Origin analysis: Open issues

  • Origin analysis is a semi-automatic technique; it requires human intervention to make intelligent decisions.
    – In general, there’s no ultimate arbiter of correctness/appropriateness.
    – Techniques are fast and approximate.
      • Bertillonage, not DNA comparison
  • What are the most effective ways of performing entity and relationship analysis?
    – Which metrics? Which relationships? How best to combine them all?
    – Requires case studies, validation.
  • What is the best way to consider composite software entities? (e.g., files, classes, subsystems)
    – Can evaluate as atoms, or
    – Can simply use hints from contained entities.
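
The last option, using hints from contained entities, could look something like the following. This is only one possible reading of the idea, offered as a sketch: the majority-vote rule, the `threshold` parameter, the function name `file_origin_hint`, and the file names are all assumptions, not something the slides specify.

```python
from collections import Counter


def file_origin_hint(function_origins, threshold=0.5):
    """Guess the old file a new file came from, using its functions as hints.

    function_origins: dict mapping each function in the new file to the old
    file that origin analysis matched it to, or None if it looks new.
    Returns the most common old file if it accounts for more than
    `threshold` of all functions in the new file, otherwise None.
    """
    if not function_origins:
        return None
    votes = Counter(o for o in function_origins.values() if o is not None)
    if not votes:
        return None
    best, count = votes.most_common(1)[0]
    return best if count / len(function_origins) > threshold else None


# Hypothetical example: two of three functions trace back to old/parse.c.
hints = {"a": "old/parse.c", "b": "old/parse.c", "c": None}
print(file_origin_hint(hints))  # old/parse.c
```

A result of None means the contained entities give no clear hint, and the file would need to be evaluated as an atom or judged by hand.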