Extension, Abbreviation and Refinement - Identifying High-Level Dependence Structures Using Slice-Based Dependence Analysis



slide-1
SLIDE 1

Extension, Abbreviation and Refinement

  • Identifying High-Level Dependence Structures Using Slice-Based Dependence Analysis

Zheng Li

CREST, King’s College London, UK

slide-2
SLIDE 2

Overview

  • Motivation
  • Three combination techniques

    – Extension
    – Abbreviation
    – Refinement

slide-3
SLIDE 3

Many analysis techniques for program comprehension have been proposed, ranging from high-level (domain knowledge) to low-level (source code):

  • Pattern recognition
  • Concept assignment
  • Data-flow analysis
  • Dependence analysis

slide-4
SLIDE 4

Advantages and Disadvantages

                   High-level   Low-level
Accuracy           Low          High
Scalability        Yes          No
Human knowledge    Yes          No

slide-5
SLIDE 5

What if we combine the two?

  • High-level techniques use domain knowledge to give low-level techniques a reasonable analysis scope, which avoids the scalability problem of low-level techniques.
  • Low-level techniques can improve the accuracy of high-level techniques.

slide-6
SLIDE 6

In this thesis

  • Concept Assignment
  • Program Slicing

slide-7
SLIDE 7

Concept Assignment

  • First defined in 1993 and aimed at comprehension tasks
  • Allocates specific high-level meaning to specific parts of a program
  • Hypothesis-Based Concept Assignment (HB-CA)
    – Existing implementation
    – Uses domain and program semantics
    – Good quality assignments

slide-8
SLIDE 8

Program Slicing

We only care about this line. Which other lines affect the selected line?
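The question on this slide can be sketched as a reachability search over a dependence graph. The following is a minimal Python sketch, not the tool used in the thesis: the five-line program and its dependence table are hypothetical, and a real slicer would also track control dependence.

```python
from collections import deque

def backward_slice(depends_on, criterion):
    """Return every line that can affect `criterion`, the line itself included."""
    seen = {criterion}
    work = deque([criterion])
    while work:
        line = work.popleft()
        for dep in depends_on.get(line, ()):
            if dep not in seen:
                seen.add(dep)
                work.append(dep)
    return seen

# Hypothetical program: each entry lists the lines that a line depends on.
DEPENDS_ON = {
    1: [],   # x = read()
    2: [],   # y = read()
    3: [1],  # a = x + 1
    4: [2],  # b = y * 2
    5: [3],  # print(a)
}

print(sorted(backward_slice(DEPENDS_ON, 5)))  # → [1, 3, 5]
```

Lines 2 and 4 are left out of the slice because they cannot affect the selected line 5.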

slide-9
SLIDE 9

Concept Assignment vs. Program Slicing

  • Contiguous?
  • Executable?
  • High/low level?

slide-10
SLIDE 10

Combination 1: Extension

  • Concept Slice
    – Uses program slicing to ‘extend’ a concept binding by tracing its dependencies
  • Algorithm
    – With the concept as the slicing criterion, the concept slice is the union of the slices for each program point in the concept
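The union described in the algorithm above can be sketched in a few lines of Python. This is an illustration under assumptions: the dependence table and the concept binding {4, 5} are hypothetical, and `backward_slice` is a plain data-dependence reachability search rather than the slicer used in the thesis.

```python
def backward_slice(depends_on, criterion):
    """Lines that can affect `criterion`, the line itself included."""
    seen, stack = {criterion}, [criterion]
    while stack:
        for dep in depends_on.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

def concept_slice(depends_on, concept_binding):
    """Union of the backward slices taken at every program point of the concept."""
    result = set()
    for point in concept_binding:
        result |= backward_slice(depends_on, point)
    return result

# Hypothetical dependence table: line -> lines it depends on.
DEPENDS_ON = {1: [], 2: [], 3: [1], 4: [2], 5: [3], 6: [4, 5]}

# Hypothetical concept binding covering lines 4 and 5.
print(sorted(concept_slice(DEPENDS_ON, {4, 5})))  # → [1, 2, 3, 4, 5]
```

The binding {4, 5} is extended by the lines it depends on (1, 2, 3); line 6 only uses the concept's results, so it stays outside the concept slice.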

slide-11
SLIDE 11

Combination 2: Abbreviation

Less is More!

  • Extract key statements within concept bindings
    – Key statements capture the most impact with the highest cohesion
    – They help focus attention more rapidly on the core of a concept binding
  • Algorithm
    – Intersection of slices with respect to the principal variables within a concept binding

slide-12
SLIDE 12

D = 2*r;
perimeter = PI*D;
undersurface = PI*r*r;
sidesurface = perimeter*h;
area = 2*undersurface + sidesurface;
volume = undersurface*h;
printf("\nThe Area is %d\n", area);
printf("\nThe Volume is %d\n", volume);

Variables marked on the slide: r, h, area, volume
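The intersection algorithm can be traced on the program of this slide. The Python sketch below rests on two assumptions: the data dependences are derived by hand from the eight statements, and the principal variables are taken to be area and volume (defined at statements 5 and 6). Whether the thesis tool reports the same key statement is not stated on the slides.

```python
def backward_slice(depends_on, criterion):
    """Statements that can affect `criterion`, the statement itself included."""
    seen, stack = {criterion}, [criterion]
    while stack:
        for dep in depends_on.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

def key_statements(depends_on, principal_points):
    """Intersection of the slices taken at each principal variable's definition."""
    slices = [backward_slice(depends_on, p) for p in principal_points]
    return set.intersection(*slices)

# Hand-derived data dependences: statement -> statements it depends on.
DEPENDS_ON = {
    1: [],      # D = 2*r
    2: [1],     # perimeter = PI*D
    3: [],      # undersurface = PI*r*r
    4: [2],     # sidesurface = perimeter*h
    5: [3, 4],  # area = 2*undersurface + sidesurface
    6: [3],     # volume = undersurface*h
    7: [5],     # printf(..., area)
    8: [6],     # printf(..., volume)
}

print(sorted(key_statements(DEPENDS_ON, [5, 6])))  # → [3]
```

Under these assumptions the only statement shared by both slices is statement 3, the computation of undersurface, which both area and volume rely on.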

slide-13
SLIDE 13

The Results so far

  • The concept slice shows no size explosion.
  • The identified key statements have high Impact and Cohesion, but some concept bindings contain no key statements.

slide-14
SLIDE 14

Combination 3: Refinement

Produce a more accurate, dependence-based concept binding by removing non-concept-dependent statements

slide-15
SLIDE 15

D = 2*r;
perimeter = PI*D;
undersurface = PI*r*r;
sidesurface = perimeter*h;
area = 2*undersurface + sidesurface;
volume = undersurface*h;
printf("\nThe Area is %d\n", area);
printf("\nThe Volume is %d\n", volume);

Variables marked on the slide: r, h

slide-16
SLIDE 16

Program Chopping

Given source S and target T, what program points transmit effects from S to T?
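One common way to answer this question on a single-procedure dependence graph is to intersect what S can reach with what can reach T. The Python sketch below takes that approach as an assumption; the slides do not say how the thesis computes chops. The edge list is hand-derived from the eight-statement area/volume program shown earlier, and the choice of S and T is hypothetical.

```python
def reachable(start, neighbours):
    """All nodes reachable from `start` by following `neighbours`."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def chop(edges, source, target):
    """Nodes lying on some dependence path from `source` to `target`."""
    succ, pred = {}, {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
        pred.setdefault(b, []).append(a)
    return reachable(source, succ) & reachable(target, pred)

# Edge (a, b) means statement b depends on statement a.
# 1: D  2: perimeter  3: undersurface  4: sidesurface
# 5: area  6: volume  7: printf(area)  8: printf(volume)
EDGES = [(1, 2), (2, 4), (3, 5), (4, 5), (3, 6), (5, 7), (6, 8)]

print(sorted(chop(EDGES, 1, 7)))  # → [1, 2, 4, 5, 7]
print(sorted(chop(EDGES, 1, 8)))  # → []
```

Statement 3 affects the target 7 but is not affected by the source 1, so it is not part of the first chop. The second chop is empty because nothing computed from D reaches the volume output.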


slide-17
SLIDE 17
slide-18
SLIDE 18

Vertex Rank Model

  • Google’s PageRank model
  • Dependence is transitive
  • The weight of a vertex is distributed along its outgoing edges and inherited through its incoming edges

slide-19
SLIDE 19

Weight of Nodes

  • Sum of all node weights = 1
  • The weight of a node represents how important that vertex is in the dependence structure

slide-20
SLIDE 20

Weights of Edges

  • Node weight is distributed to each outgoing edge
  • Edge weights are collected at the destination node
  • sum of all outgoing edge weights = origin node weight
  • sum of all incoming edge weights = destination node weight

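Figure: node A (weight 0.2) has four outgoing edges, each with distribution ratio d = 1/4 and edge weight 0.05. Node B (weight 0.2) receives incoming edge weights 0.05 and 0.15. Another value shown in the figure: 0.4.

d: distribution ratio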
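The distribute and collect rules above can be checked with a small Python sketch. It reuses node A from the slide (weight 0.2, four outgoing edges with d = 1/4); the destination names X1 to X4 and the function names are hypothetical.

```python
import math

def distribute(node_weight, ratios):
    """Edge weight = origin node weight * distribution ratio d of that edge."""
    return {(src, dst): node_weight[src] * d for (src, dst), d in ratios.items()}

def collect(edge_weight, nodes):
    """New node weight = sum of the weights on its incoming edges."""
    collected = {v: 0.0 for v in nodes}
    for (_, dst), w in edge_weight.items():
        collected[dst] += w
    return collected

# Node A from the slide; X1..X4 are hypothetical destination nodes.
NODE_WEIGHT = {"A": 0.2}
RATIOS = {("A", f"X{i}"): 0.25 for i in range(1, 5)}

EDGES = distribute(NODE_WEIGHT, RATIOS)
INCOMING = collect(EDGES, ["A", "X1", "X2", "X3", "X4"])

# The outgoing edge weights of A add up to A's own weight again.
print(math.isclose(sum(EDGES.values()), NODE_WEIGHT["A"]))  # → True
```

Each edge carries 0.2 × 1/4 = 0.05, matching the figure, and each destination collects exactly what arrives on its incoming edge.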

slide-21
SLIDE 21

Definition of Weights

              ) ( ) ( ) (

2 1 n

v w v w v w 

              ) ( ) ( ) (

2 1 n

v w v w v w 

t d d d d d d d d d

nn n n n n

                    

2 1 2 22 21 1 12 11

=

.

Dt: transposed matrix of distribution ratios W: node weight vector

slide-22
SLIDE 22

Propagating Weights

Figure: node weights A = 0.34, B = 0.33, C = 0.33; edge weights 0.17, 0.17, 0.33, 0.33.

slide-23
SLIDE 23

Propagating Weights

Figure: node weights A = 0.33, B = 0.17, C = 0.5; edge weights 0.175, 0.175, 0.17, 0.5.

slide-24
SLIDE 24

Propagating Weights

Figure: node weights A = 0.5, B = 0.175, C = 0.345; edge weights 0.25, 0.25, 0.175, 0.345.

slide-25
SLIDE 25

Propagating Weights

  • Stable weight assignment
    – Next-step weights are the same as the previous ones

Figure: node weights A = 0.4, B = 0.2, C = 0.4; edge weights 0.2, 0.2, 0.2, 0.4.
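The propagation on these slides can be replayed as repeated multiplication by the transposed distribution matrix. In the Python sketch below, the graph shape (A→B, A→C, B→C, C→A, with A splitting its weight evenly) is an assumption inferred from the numbers on the slides, since the figures themselves are not available; the function names are hypothetical.

```python
def propagate(weights, ratios):
    """One step W' = D^t · W; ratios[i][j] is the share node i sends to node j."""
    n = len(weights)
    return [sum(ratios[i][j] * weights[i] for i in range(n)) for j in range(n)]

def stable_weights(weights, ratios, tol=1e-12, max_steps=10_000):
    """Repeat propagation until the next-step weights equal the previous ones."""
    for _ in range(max_steps):
        nxt = propagate(weights, ratios)
        if max(abs(a - b) for a, b in zip(nxt, weights)) < tol:
            return nxt
        weights = nxt
    raise RuntimeError("weights did not stabilise")

# Assumed graph, rows and columns in the order A, B, C.
RATIOS = [
    [0.0, 0.5, 0.5],  # A -> B and A -> C, half each
    [0.0, 0.0, 1.0],  # B -> C
    [1.0, 0.0, 0.0],  # C -> A
]
START = [0.34, 0.33, 0.33]  # initial weights from the first propagation slide

print([round(w, 3) for w in propagate(START, RATIOS)])       # → [0.33, 0.17, 0.5]
print([round(w, 3) for w in stable_weights(START, RATIOS)])  # → [0.4, 0.2, 0.4]
```

The first step reproduces the node weights of the second propagation slide, and the iteration settles on the stable assignment A = 0.4, B = 0.2, C = 0.4.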

slide-26
SLIDE 26

Pseudo Use Relation

  • Weight computation does not always converge
  • Add a pseudo edge from a node to another node if there is no 'real' edge between them
  • Distribution ratios: pseudo edges << real edges

A B C
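A three-node cycle shows why pseudo edges help. The Python sketch below makes several assumptions: the cycle A→B→C→A is a hypothetical example, each real edge gets relative strength 1 and each pseudo edge a small strength `pseudo`, and the strengths are normalised per node. The slides only state that pseudo ratios are much smaller than real ones.

```python
def distribution_ratios(nodes, real_edges, pseudo=0.05):
    """Ratio per ordered node pair; pairs without a real edge get a pseudo edge.

    Assumes every node has at least one real outgoing edge when pseudo is 0.
    """
    ratios = {}
    for u in nodes:
        strength = {v: (1.0 if (u, v) in real_edges else pseudo)
                    for v in nodes if v != u}
        total = sum(strength.values())
        for v, s in strength.items():
            ratios[(u, v)] = s / total
    return ratios

def step(weights, ratios):
    """Send each node's weight along its edges and collect at the destinations."""
    nxt = {v: 0.0 for v in weights}
    for (u, v), d in ratios.items():
        nxt[v] += d * weights[u]
    return nxt

NODES = ["A", "B", "C"]
REAL = {("A", "B"), ("B", "C"), ("C", "A")}  # hypothetical pure cycle
START = {"A": 1.0, "B": 0.0, "C": 0.0}

# Without pseudo edges the whole weight just travels around the cycle.
no_pseudo = distribution_ratios(NODES, REAL, pseudo=0.0)
w = START
for _ in range(3):
    w = step(w, no_pseudo)
print(w == START)  # → True, back at the start after three steps

# With small pseudo edges the weights settle.
with_pseudo = distribution_ratios(NODES, REAL, pseudo=0.05)
w = START
for _ in range(2000):
    w = step(w, with_pseudo)
print({v: round(x, 3) for v, x in w.items()})  # → {'A': 0.333, 'B': 0.333, 'C': 0.333}
```

In this symmetric example the stable weights are equal; on an asymmetric graph the real edges would still dominate because their ratios are much larger.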

slide-27
SLIDE 27
slide-28
SLIDE 28

Empirical Study

  • Tools
    – WeSCA and CodeSurfer
  • 10 subject programs
    – Open source and industry code
    – More than 600 concept bindings extracted
  • Dependence-based metrics are defined
  • Statistical analysis
slide-29
SLIDE 29

Size reduction

slide-30
SLIDE 30

Impact

slide-31
SLIDE 31

Cohesion

slide-32
SLIDE 32

Summary

  • The combination of approaches can be fully automated and implemented.
  • Concept refinement is better than concept extension and concept abbreviation.

slide-33
SLIDE 33

Questions?