Exploring Action Unit Granularity for Automatically Generating Natural Language Descriptions for Methods - PowerPoint PPT Presentation



SLIDE 1

Exploring Action Unit Granularity for Automatically Generating Natural Language Descriptions for Methods

Lori Pollock Collaborators: Xiaoran Wang, K. Vijay-Shanker University of Delaware

SLIDE 2

UD-Summarize

( Sridhara et al. 2010)

Method M's code
→ Build structural and linguistic models
→ Select statements for summary
→ Generate phrases for selected statements and combine phrases
→ Summary comment for M

SLIDE 3

class Player {
  /**
   * Play a specified file with specified time interval
   */
  public static boolean play(final File file, final float fPosition, final long length) {
    fCurrent = file;
    try {
      playerImpl = null;
      // make sure to stop non-fading players
      stop(false);
      // Choose the player
      Class cPlayer = file.getTrack().getType().getPlayerImpl();
      …
}

Class names Method comments Method names Parameter names Internal comments Other variables

SLIDE 4

class Player {
  /**
   * Play a specified file with specified time interval
   */
  public static boolean play(final File file, final float fPosition, final long length) {
    fCurrent = file;
    try {
      playerImpl = null;
      // make sure to stop non-fading players
      stop(false);
      // Choose the player
      Class cPlayer = file.getTrack().getType().getPlayerImpl();
      …
}

Code characteristics are not as natural as English.

• More regular word usage
• Not full sentences
• No spaces in names

SLIDE 5

Preprocessing Text Analysis

• Split names into words
• Expand abbreviations
• Stem words
• Identify part-of-speech
• Identify word relations (synonyms, …)

Extract & Preprocess Words
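The preprocessing steps listed above can be sketched as a small Java routine. The splitting rules and the abbreviation table (e.g., impl → implementation) are simplified assumptions for illustration, not the actual tool's implementation.

```java
import java.util.*;

// Sketch of the preprocessing steps: split names into words, then expand
// abbreviations. The rules and table here are simplified assumptions.
public class IdentifierPreprocessor {
    // Hypothetical abbreviation table; a real tool mines these from a corpus.
    private static final Map<String, String> ABBREVIATIONS =
            Map.of("impl", "implementation", "cur", "current", "pos", "position");

    // Split a camelCase or snake_case identifier into lowercase words.
    public static List<String> splitName(String identifier) {
        String spaced = identifier
                .replaceAll("([a-z0-9])([A-Z])", "$1 $2") // camelCase boundary
                .replace('_', ' ');                       // snake_case separator
        return Arrays.asList(spaced.toLowerCase().split("\\s+"));
    }

    // Expand known abbreviations after splitting.
    public static List<String> preprocess(String identifier) {
        List<String> words = new ArrayList<>();
        for (String w : splitName(identifier)) {
            words.add(ABBREVIATIONS.getOrDefault(w, w));
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(preprocess("getPlayerImpl"));  // [get, player, implementation]
        System.out.println(preprocess("f_cur_position")); // [f, current, position]
    }
}
```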

SLIDE 6

A Software Word Usage Model

SLIDE 7

/* Update linear edge view. If width <= 1, draw line to given graphics2d, else draw polyline to graphics2d */

SLIDE 8

Lesson: Method = Multiple High-level Algorithmic Steps
• Create and set up a queue menu item.
• Build the menu.
• Create and set up a stop menu.
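As a hand-written illustration of this lesson (not code from the talk), the method below decomposes into commented step blocks, each a short sequence of consecutive statements; the names are hypothetical.

```java
import javax.swing.*;

// Hypothetical example of one method body that naturally decomposes into
// several high-level algorithmic steps, each a block of statements.
public class MenuStepsExample {
    public static JMenu buildPlayerMenu() {
        // Step 1: create and set up a queue menu item
        JMenuItem queueItem = new JMenuItem("Queue");
        queueItem.setToolTipText("Add the selected track to the play queue");

        // Step 2: create and set up a stop menu item
        JMenuItem stopItem = new JMenuItem("Stop");
        stopItem.setToolTipText("Stop playback");

        // Step 3: build the menu from the items
        JMenu menu = new JMenu("Player");
        menu.add(queueItem);
        menu.add(stopItem);
        return menu;
    }

    public static void main(String[] args) {
        System.out.println(buildPlayerMenu().getItemCount()); // 2
    }
}
```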

SLIDE 9

Which Led To…

Initial Approach: Manually created templates for a set of common high-level actions (Sridhara et al. 2011)

Limitation: Not extensible

SLIDE 10

Research Question

Can we define and automatically identify these high-level algorithmic steps in real-world code?

Action Unit (noun): a code block that consists of a sequence of consecutive statements that logically implement a high-level action as a substep within a method's primary function.

SLIDE 11

Goal #1: Identify Action Units

An Action Unit = code block consisting of a sequence of consecutive statements that logically implement a high-level action.

SLIDE 12

Goal #2: Generate Descriptions

• Determine if an element exists in the bitstream
• Add given bitstream to bitstreams
• Add the newly created mapping row to the database

SLIDE 13

What We Have Done So Far

Automatically identify and generate natural language descriptions for specific high-level algorithmic steps

✔ Loop-based action units
✔ Object-related sequences
✔ Evaluated effectiveness: human judgement studies

SLIDE 14

Loop-based Action Units

✔ Identify Java loop action units based on their structure, data flow, & linguistic features learned from a code corpus
✔ Demonstrate feasibility of automatically characterizing loops into stereotypes from a code corpus
✔ Determine the action to represent a loop stereotype by clustering loops based on verb distributions in existing internal comments

SLIDE 15

Action Identification Process

SLIDE 16

Targeted Loops

Loop-if: a Java loop (for, enhanced-for, while, do-while) with a single if-statement as the last lexical statement.

Of 14,317 Java projects: 1.3 M loops, 26% loop-ifs.
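A minimal hand-written example of the loop-if shape (not drawn from the mined corpus): a loop whose body ends in a single if-statement, instantiating the common "search/contains" stereotype.

```java
// Hand-written loop-if example: a Java loop whose last lexical statement
// is a single if-statement, here implementing a search/contains action.
public class LoopIfExample {
    // Determine whether any element in the array equals the target.
    public static boolean contains(int[] values, int target) {
        for (int v : values) {
            if (v == target) {      // single trailing if: the loop-if pattern
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(contains(new int[]{3, 1, 4}, 4)); // true
    }
}
```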

SLIDE 17

Loop-if Feature Vectors

SLIDE 18

Loop Action Identification Model

SLIDE 19

Building the Loop Action Identification Model

1. Automatically mine loop-ifs that have descriptive comments (loop-comment associations).
2. Extract the main verb (action) from each comment.

Hypothesis: Different verbs might be associated with loops that have same feature vector; however, those verbs are related.

SLIDE 20

Building the Loop Action Identification Model

⇒ We should expect that two loop vectors that have similar verb distributions associated with them correspond to similar actions.

⇒ Cluster feature vectors by their probability distribution of verbs in loop-comment associations (230 unique verbs in the top 100 most frequent feature vectors).

RESULT: The top 100 most frequently occurring loop feature vectors cluster into 12 actions.
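The clustering intuition can be sketched by comparing verb distributions directly; cosine similarity here is an illustrative stand-in, not necessarily the distance measure the actual model used.

```java
import java.util.*;

// Sketch of the clustering intuition: two loop feature vectors belong
// together when the verb distributions of their associated comments are
// similar. Cosine similarity is an assumed stand-in for the real measure.
public class VerbDistributionSimilarity {
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        Set<String> verbs = new HashSet<>(a.keySet());
        verbs.addAll(b.keySet());
        for (String v : verbs) {
            int x = a.getOrDefault(v, 0);
            int y = b.getOrDefault(v, 0);
            dot += x * y;
            normA += x * x;
            normB += y * y;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Two loop vectors whose comments mostly use "find"/"search"
        Map<String, Integer> loopA = Map.of("find", 8, "search", 2);
        Map<String, Integer> loopB = Map.of("find", 5, "search", 4, "get", 1);
        System.out.printf("%.2f%n", cosine(loopA, loopB)); // high similarity
    }
}
```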

SLIDE 21

Loop Action Identification Model

SLIDE 22

Developing the Loop Action Identification Model

SLIDE 23

Action Identification Process

SLIDE 24

Evaluation Methodology

1. Effectiveness: 15 humans; 180 judgements on 60 loops total, 3 per loop, over all action stereotypes.
   1. How much do you agree that the loop code implements this action?
   2. How confident are you in your assessment?
2. Prevalence (Impact):
   1. Ran prototype on a test set of 7,159 projects (over 9M methods).
   2. Collected frequency of each of the 12 actions.
SLIDE 25

Evaluation Results & Conclusions

Effectiveness

Conclusion: Human judges view our automatically identified descriptions as accurately expressing the high-level actions of loop-ifs.

[Chart: confidence; agreement with identified action]

SLIDE 26

Evaluation Results & Conclusions

Prevalence (Impact)

Question for Charles & company: Extend through idiom mining work applied to commented loops?

1.3 M loops contain 337,294 loop-ifs.
Identified 195,277 high-level actions (57%).

SLIDE 27

Object-related Action Units

• Algorithm to identify object-related action units
• Rules to synthesize natural language descriptions for them
• Evaluation study of action & argument identification & generated descriptions

Object-related action units consist of non-structured consecutive statements associated with each other by object(s). In 1000 open source projects, 23% of blank-line separated blocks are object-related.

SLIDE 28

Identifying Object-related Action Units

An action unit contains 3 parts:
1. Declaration of, or assignment to, object reference o
2. Method call invoked on o
3. Use of object o
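A hand-written Swing sketch of the three-part shape (the names are hypothetical, chosen to echo the deck's own formPanel.add(xLabel2) example):

```java
import javax.swing.*;

// Hypothetical illustration of an object-related action unit's three parts,
// with a JLabel as the shared object o.
public class ActionUnitExample {
    public static JPanel buildForm() {
        // (1) Declaration of / assignment to object reference o
        JLabel xLabel2 = new JLabel("User ID:");
        // (2) Method call invoked on o
        xLabel2.setToolTipText("Enter a user id");
        // (3) Use of object o -- the focal statement: "add label to panel"
        JPanel formPanel = new JPanel();
        formPanel.add(xLabel2);
        return formPanel;
    }

    public static void main(String[] args) {
        System.out.println(buildForm().getComponentCount()); // 1
    }
}
```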

SLIDE 29

Identifying Focal Statement of Object-related Action Units

Focal Statement: provides primary content for the description: action, theme, secondary argument.

Three cases: part (3) exists; part (3) does not exist; multiple objects.

1. Declaration or assignment to object reference o
2. Method call invoked on o
3. Use of object o

SLIDE 30

Overall Approach

SLIDE 31

Overall Approach

SLIDE 32

Generating Description

• Identify Action, Theme, Secondary Argument
  – Focused on method calls: receiver.verbNoun(arg), e.g., formPanel.add(xLabel2)
• Lexicalize to form a verb phrase
  – Extend prior work to get more detailed descriptions: "add label to panel"
• Add adjectives from class names, string literals, program structure: "add user id label to form panel"
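The lexicalization step can be sketched as below. The head-noun heuristic and the fixed "to" preposition are simplifying assumptions for illustration, not the full linguistic analysis.

```java
// Sketch of lexicalization: turn a call like formPanel.add(xLabel2) into a
// phrase like "add label to panel". Heuristics here are assumptions.
public class CallLexicalizer {
    // Keep only the last camelCase word, dropping digits and qualifiers
    // (e.g., "xLabel2" -> "label", "formPanel" -> "panel").
    static String headNoun(String identifier) {
        String[] words = identifier.replaceAll("[0-9]", "")
                .replaceAll("([a-z])([A-Z])", "$1 $2")
                .toLowerCase().split("\\s+");
        return words[words.length - 1];
    }

    public static String lexicalize(String receiver, String method, String arg) {
        return method + " " + headNoun(arg) + " to " + headNoun(receiver);
    }

    public static void main(String[] args) {
        System.out.println(lexicalize("formPanel", "add", "xLabel2")); // add label to panel
    }
}
```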

SLIDE 33

Evaluation: Effectiveness of Action & Argument Identification

Methodology: 10 human annotators for 100 action units. "Given code segments, write an action description adequate to be copied from this local context."

Results:
97/100: human action = system action
98/100: human theme = system theme
94/100: human secondary argument = system secondary argument

SLIDE 34

Evaluation: Text Generation

Methodology: Humans created descriptions, given an action. Other humans judged both human and system descriptions without knowledge of origin. How much do you agree with: "The description serves as an adequate and concise abstraction of the code block's high-level action."

Results (5-point Likert scale):
Average score of 100 system-generated descriptions = 4.24
Average score of 100 human-written descriptions = 4.43
63/100 system cases rated equal or better than human cases

SLIDE 35

Conclusions & Future Work

• Automatically identify & describe object-related and loop-if action units
• Comparable descriptions to human descriptions

Future Work:
• Other kinds of action units
• Use to generate better method summaries & internal comments
• Other use cases
SLIDE 36

Another Thought

Do the features learned through this work lead to alternate representations for machine learning approaches to mining patterns?

SLIDE 37

What have we learned?

SLIDE 38

Current Source Code Analyses: Unit = Method, Statement or Word

SLIDE 39

Should we worry about that?

SLIDE 40

Yes

✔ Method name too coarse

“Shouldn’t judge a book by its cover”

SLIDE 41

Yes

✔ Individual statements are related.

“Small steps can lead to BIG CHANGES”

Eat fruits, proteins, veggies. Stop eating sweets and carbs. Eat less overall. Reduce alcohol intake. Exercise daily. Reduce time spent sitting. Lift weights.

SLIDE 42

Yes

✔ Words can have different meaning when put together. “The whole is not always the sum of its parts.”

SLIDE 43

Who Cares?

Text and structure analyzers in client tools care, e.g.:
✓ Code Search
✓ Code Summary generators
✓ Traceability
✓ Code reuse analysis