Exploring Action Unit Granularity for Automatically Generating Natural Language Descriptions for Methods - PowerPoint PPT Presentation



SLIDE 1

Exploring Action Unit Granularity for Automatically Generating Natural Language Descriptions for Methods

Lori Pollock Collaborators: Xiaoran Wang, K. Vijay-Shanker University of Delaware

SLIDE 2

UD-Summarize

( Sridhara et al. 2010)

Method M's code
→ Build structural and linguistic models
→ Select statements for summary
→ Generate phrases for selected statements and combine phrases
→ Summary comment for M

SLIDE 3

class Player {
  /**
   * Play a specified file with specified time interval
   */
  public static boolean play(final File file, final float fPosition, final long length) {
    fCurrent = file;
    try {
      playerImpl = null;
      // make sure to stop non-fading players
      stop(false);
      // Choose the player
      Class cPlayer = file.getTrack().getType().getPlayerImpl();
      …
}

Class names Method comments Method names Parameter names Internal comments Other variables

SLIDE 4

class Player {
  /**
   * Play a specified file with specified time interval
   */
  public static boolean play(final File file, final float fPosition, final long length) {
    fCurrent = file;
    try {
      playerImpl = null;
      // make sure to stop non-fading players
      stop(false);
      // Choose the player
      Class cPlayer = file.getTrack().getType().getPlayerImpl();
      …
}

Code characteristics are not as natural as English.

• More regular word usage
• Not full sentences
• No spaces in names

SLIDE 5

Preprocessing Text Analysis

• Split names into words
• Expand abbreviations
• Stem words
• Identify part-of-speech
• Identify word relations (synonyms, …)

Extract & Preprocess Words
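The preprocessing steps listed above can be sketched as a small Java routine. The splitting rules and the abbreviation table (e.g., impl → implementation) are simplified assumptions for illustration, not the actual tool's implementation.

```java
import java.util.*;

// Sketch of the preprocessing steps: split names into words, then expand
// abbreviations. The rules and table here are simplified assumptions.
public class IdentifierPreprocessor {
    // Hypothetical abbreviation table; a real tool mines these from a corpus.
    private static final Map<String, String> ABBREVIATIONS =
            Map.of("impl", "implementation", "cur", "current", "pos", "position");

    // Split a camelCase or snake_case identifier into lowercase words.
    public static List<String> splitName(String identifier) {
        String spaced = identifier
                .replaceAll("([a-z0-9])([A-Z])", "$1 $2") // camelCase boundary
                .replace('_', ' ');                       // snake_case separator
        return Arrays.asList(spaced.toLowerCase().split("\\s+"));
    }

    // Expand known abbreviations after splitting.
    public static List<String> preprocess(String identifier) {
        List<String> words = new ArrayList<>();
        for (String w : splitName(identifier)) {
            words.add(ABBREVIATIONS.getOrDefault(w, w));
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(preprocess("getPlayerImpl"));  // [get, player, implementation]
        System.out.println(preprocess("f_cur_position")); // [f, current, position]
    }
}
```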

SLIDE 6

A Software Word Usage Model

SLIDE 7

/* Update linear edge view. If width <= 1, draw line to given graphics2d, else draw polyline to graphics2d */

SLIDE 8

Lesson: Method = Multiple High-level Algorithmic Steps
• Create and set up a queue menu item.
• Build the menu.
• Create and set up a stop menu.
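As a hand-written illustration of this lesson (not code from the talk), the method below decomposes into commented step blocks, each a short sequence of consecutive statements; the names are hypothetical.

```java
import javax.swing.*;

// Hypothetical example of one method body that naturally decomposes into
// several high-level algorithmic steps, each a block of statements.
public class MenuStepsExample {
    public static JMenu buildPlayerMenu() {
        // Step 1: create and set up a queue menu item
        JMenuItem queueItem = new JMenuItem("Queue");
        queueItem.setToolTipText("Add the selected track to the play queue");

        // Step 2: create and set up a stop menu item
        JMenuItem stopItem = new JMenuItem("Stop");
        stopItem.setToolTipText("Stop playback");

        // Step 3: build the menu from the items
        JMenu menu = new JMenu("Player");
        menu.add(queueItem);
        menu.add(stopItem);
        return menu;
    }

    public static void main(String[] args) {
        System.out.println(buildPlayerMenu().getItemCount()); // 2
    }
}
```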

SLIDE 9

Which Led To…

Initial Approach: Manually created templates for a set of common high-level actions (Sridhara et al. 2011)

Limitation: Not extensible

SLIDE 10

Research Question

Can we define and automatically identify these high-level algorithmic steps in real-world code?

Action Unit (noun): a code block that consists of a sequence of consecutive statements that logically implement a high-level action as a substep within a method's primary function.

SLIDE 11

Goal #1: Identify Action Units

An Action Unit = code block consisting of a sequence of consecutive statements that logically implement a high-level action.

SLIDE 12

Goal #2: Generate Descriptions

• Determine if an element exists in the bitstream
• Add given bitstream to bitstreams
• Add the newly created mapping row to the database

SLIDE 13

What We Have Done So Far

Automatically identify and generate natural language descriptions for specific high-level algorithmic steps

✔ Loop-based action units
✔ Object-related sequences
✔ Evaluated effectiveness: human judgement studies

SLIDE 14

Loop-based Action Units

✔ Identify Java loop action units based on their structure, data flow, & linguistic features learned from a code corpus
✔ Demonstrate feasibility of automatically characterizing loops into stereotypes from a code corpus
✔ Determine the action to represent a loop stereotype by clustering loops based on verb distributions in existing internal comments

SLIDE 15

Action Identification Process

SLIDE 16

Targeted Loops

Loop-if: a Java loop (for, enhanced-for, while, do-while) with a single if-statement as the last lexical statement.

Of 14,317 Java projects: 1.3 M loops, 26% loop-ifs.
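A minimal hand-written example of the loop-if shape (not drawn from the mined corpus): a loop whose body ends in a single if-statement, instantiating the common "search/contains" stereotype.

```java
// Hand-written loop-if example: a Java loop whose last lexical statement
// is a single if-statement, here implementing a search/contains action.
public class LoopIfExample {
    // Determine whether any element in the array equals the target.
    public static boolean contains(int[] values, int target) {
        for (int v : values) {
            if (v == target) {      // single trailing if: the loop-if pattern
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(contains(new int[]{3, 1, 4}, 4)); // true
    }
}
```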

SLIDE 17

Loop-if Feature Vectors

SLIDE 18

Loop Action Identification Model

SLIDE 19

Building the Loop Action Identification Model

1. Automatically mine loop-ifs that have descriptive comments (loop-comment associations).
2. Extract the main verb (action) from each comment.

Hypothesis: Different verbs might be associated with loops that have same feature vector; however, those verbs are related.

SLIDE 20

Building the Loop Action Identification Model

⇒ We should expect that two loop vectors that have similar verb distributions associated with them correspond to similar actions.

⇒ Cluster feature vectors by their probability distribution of verbs in loop-comment associations (230 unique verbs in the top 100 most frequent feature vectors).

RESULT: The top 100 most frequently occurring loop feature vectors cluster into 12 actions.
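The clustering intuition can be sketched by comparing verb distributions directly; cosine similarity here is an illustrative stand-in, not necessarily the distance measure the actual model used.

```java
import java.util.*;

// Sketch of the clustering intuition: two loop feature vectors belong
// together when the verb distributions of their associated comments are
// similar. Cosine similarity is an assumed stand-in for the real measure.
public class VerbDistributionSimilarity {
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        Set<String> verbs = new HashSet<>(a.keySet());
        verbs.addAll(b.keySet());
        for (String v : verbs) {
            int x = a.getOrDefault(v, 0);
            int y = b.getOrDefault(v, 0);
            dot += x * y;
            normA += x * x;
            normB += y * y;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Two loop vectors whose comments mostly use "find"/"search"
        Map<String, Integer> loopA = Map.of("find", 8, "search", 2);
        Map<String, Integer> loopB = Map.of("find", 5, "search", 4, "get", 1);
        System.out.printf("%.2f%n", cosine(loopA, loopB)); // high similarity
    }
}
```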

SLIDE 21

Loop Action Identification Model

SLIDE 22

Developing the Loop Action Identification Model

SLIDE 23

Action Identification Process

SLIDE 24

Evaluation Methodology

1. Effectiveness: 15 humans; 180 judgements on 60 loops total, 3 per loop, over all action stereotypes.
   1. How much do you agree that the loop code implements this action?
   2. How confident are you in your assessment?
2. Prevalence (Impact):
   1. Ran prototype on a test set of 7,159 projects (over 9M methods).
   2. Collected frequency of each of the 12 actions.
SLIDE 25

Evaluation Results & Conclusions

Effectiveness

Conclusion: Human judges view our automatically identified descriptions as accurately expressing the high-level actions of loop-ifs.

[Chart: confidence; agreement with identified action]

SLIDE 26

Evaluation Results & Conclusions

Prevalence (Impact)

Question for Charles & company: Extend through idiom mining work applied to commented loops?

1.3 M loops contain 337,294 loop-ifs.
Identified 195,277 high-level actions (57%).

SLIDE 27

Object-related Action Units

• Algorithm to identify object-related action units
• Rules to synthesize natural language descriptions for them
• Evaluation study of action & argument identification & generated descriptions

Object-related action units consist of non-structured consecutive statements associated with each other by object(s). In 1000 open source projects, 23% of blank-line separated blocks are object-related.

SLIDE 28

Identifying Object-related Action Units

An action unit contains 3 parts:
1. Declaration of, or assignment to, object reference o
2. Method call invoked on o
3. Use of object o
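A hand-written Swing sketch of the three-part shape (the names are hypothetical, chosen to echo the deck's own formPanel.add(xLabel2) example):

```java
import javax.swing.*;

// Hypothetical illustration of an object-related action unit's three parts,
// with a JLabel as the shared object o.
public class ActionUnitExample {
    public static JPanel buildForm() {
        // (1) Declaration of / assignment to object reference o
        JLabel xLabel2 = new JLabel("User ID:");
        // (2) Method call invoked on o
        xLabel2.setToolTipText("Enter a user id");
        // (3) Use of object o -- the focal statement: "add label to panel"
        JPanel formPanel = new JPanel();
        formPanel.add(xLabel2);
        return formPanel;
    }

    public static void main(String[] args) {
        System.out.println(buildForm().getComponentCount()); // 1
    }
}
```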

SLIDE 29

Identifying Focal Statement of Object-related Action Units

Focal Statement: provides primary content for the description: action, theme, secondary argument.

Three cases: part (3) exists; part (3) does not exist; multiple objects.

1. Declaration or assignment to object reference o
2. Method call invoked on o
3. Use of object o

SLIDE 30

Overall Approach

SLIDE 31

Overall Approach

SLIDE 32

Generating Description

• Identify Action, Theme, Secondary Argument
  – Focused on method calls: receiver.verbNoun(arg), e.g., formPanel.add(xLabel2)
• Lexicalize to form a verb phrase
  – Extend prior work to get more detailed descriptions: "add label to panel"
• Add adjectives from class names, string literals, program structure: "add user id label to form panel"
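The lexicalization step can be sketched as below. The head-noun heuristic and the fixed "to" preposition are simplifying assumptions for illustration, not the full linguistic analysis.

```java
// Sketch of lexicalization: turn a call like formPanel.add(xLabel2) into a
// phrase like "add label to panel". Heuristics here are assumptions.
public class CallLexicalizer {
    // Keep only the last camelCase word, dropping digits and qualifiers
    // (e.g., "xLabel2" -> "label", "formPanel" -> "panel").
    static String headNoun(String identifier) {
        String[] words = identifier.replaceAll("[0-9]", "")
                .replaceAll("([a-z])([A-Z])", "$1 $2")
                .toLowerCase().split("\\s+");
        return words[words.length - 1];
    }

    public static String lexicalize(String receiver, String method, String arg) {
        return method + " " + headNoun(arg) + " to " + headNoun(receiver);
    }

    public static void main(String[] args) {
        System.out.println(lexicalize("formPanel", "add", "xLabel2")); // add label to panel
    }
}
```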

SLIDE 33

Evaluation: Effectiveness of Action & Argument Identification

Methodology: 10 human annotators for 100 action units. "Given code segments, write an action description adequate to be copied from this local context."

Results:
97/100: human action = system action
98/100: human theme = system theme
94/100: human secondary argument = system secondary argument

SLIDE 34

Evaluation: Text Generation

Methodology: Humans created descriptions, given an action. Other humans judged both human and system descriptions without knowledge of origin. How much do you agree with: "The description serves as an adequate and concise abstraction of the code block's high-level action."

Results (5-point Likert scale):
Average score of 100 system-generated descriptions = 4.24
Average score of 100 human-written descriptions = 4.43
63/100 system cases rated equal or better than human cases

SLIDE 35

Conclusions & Future Work

• Automatically identify & describe object-related and loop-if action units
• Comparable descriptions to human descriptions

Future Work:
• Other kinds of action units
• Use to generate better method summaries & internal comments
• Other use cases
SLIDE 36

Another Thought

Do the features learned through this work lead to alternate representations for machine learning approaches to mining patterns?

SLIDE 37

What have we learned?

SLIDE 38

Current Source Code Analyses: Unit = Method, Statement or Word

SLIDE 39

Should we worry about that?

SLIDE 40

Yes

✔ Method name too coarse

“Shouldn’t judge a book by its cover”

SLIDE 41

Yes

✔ Individual statements are related.

“Small steps can lead to BIG CHANGES”

Eat fruits, proteins, veggies. Stop eating sweets and carbs. Eat less overall. Reduce alcohol intake. Exercise daily. Reduce time spent sitting. Lift weights.

SLIDE 42

Yes

✔ Words can have different meaning when put together. “The whole is not always the sum of its parts.”

SLIDE 43

Who Cares?

Text and structure analyzers in client tools care, e.g.:
✓ Code Search
✓ Code Summary generators
✓ Traceability
✓ Code reuse analysis