Automated Topic Naming to Support Cross-project Analysis of Software Maintenance Activities
Abram Hindle, Dept. of Computer Science, University of California, Davis, Davis, CA, USA
Neil A. Ernst, Dept. of Computer Science, University of Toronto
Dept. Information Eng. and
US NSF SHF Medium 0964703

Who Cares About Quality?
Customers

Added a test for bug #1326 on OSX
What is this commit about?
Reliability, Maintainability, Portability
But we have many commits...
Reliability, Maintainability, Portability

[Diagram labels: Commit, Developer Topic, Developer Topics, Maintainability, Reliability, purpose?]

Shared Concepts
Cross Project Relevance
Quality-related Non Functional Requirements (NFRs)
portability, maintainability, reliability and functionality (includes correctness), efficiency, usability
[iso9126] [cleland-huang03] [ernst10]

Can't we just summarize quality related efforts within this project?
Software Repositories: Source Code, Documentation, Build / Configuration, Tests, Revisions

[Figure: Labelled Developer Topics. Axes: Time (months), Unique Topics. Example topics: Linux Kernel, Windows AMD64. Labels: maintainability, portability, reliability, efficiency, functionality]
Example [Blei]

Arts, International News, Opinion
[Diagram: Sections (Arts, International News), each containing Articles]

What if we didn't know what section the articles were in?

[Diagram: Articles, LDA, LSI]
Article: cat dog car city pound festival street mischief
Word Distribution
Documents are represented as word distributions (word counts)
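
The word-count representation on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the study: the function name `word_distribution` and the simple regex tokenizer are assumptions, and the sample text reuses the slide's words with `dog` repeated once so that one count exceeds 1.

```python
import re
from collections import Counter


def word_distribution(text):
    """Return a bag-of-words count for a single document (hypothetical helper)."""
    # Lowercase the text and keep runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


# Sample document built from the words shown on the slide.
article = "cat dog car city pound festival street mischief dog"
counts = word_distribution(article)
print(counts.most_common(3))
```

Word order is discarded; only the counts are kept, which is the input that topic models such as LDA work from.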
Sports, Entertainment
Athlete and Actor Award Nominees Baseball Movie Theatre Review
Original Article

$C_1 \times \text{Sports} + C_2 \times \text{Entertainment} \approx \text{Original Article}$
Documents are represented as a linear combination of independent topics
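
As a rough illustration of the linear-combination idea, the sketch below mixes two topic word distributions with document-level weights. The topic probabilities, the weights 0.7 and 0.3, and the helper name `mix_topics` are all made up for the example; this shows only the mixing step, not LDA or LSI inference.

```python
def mix_topics(weights, topics):
    """Combine topic word distributions into one document distribution.

    weights: one coefficient per topic (assumed to sum to 1).
    topics:  list of dicts mapping word -> probability within that topic.
    """
    mixed = {}
    for weight, topic in zip(weights, topics):
        for word, prob in topic.items():
            mixed[word] = mixed.get(word, 0.0) + weight * prob
    return mixed


# Hypothetical topic distributions, loosely based on the slide's word lists.
sports = {"play": 0.4, "game": 0.3, "inning": 0.3}
entertainment = {"play": 0.2, "movie": 0.5, "theatre": 0.3}

# A document that is 70% Sports and 30% Entertainment (assumed weights).
article = mix_topics([0.7, 0.3], [sports, entertainment])
print(article)
```

A word such as `play` that appears in both topics receives probability mass from each, which is why a shared word alone does not identify a topic.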
LDA / LSI: Here are two topics. I don't know what they are about!

Topic 1
* play
* game
* inning
* player
* quarter
* opponent
* ...

Topic 2
* gambling
* play
* night life
* comedy
* movie
* theatre
* ...

These word lists look like: Sports and Entertainment!
Word bag analysis
Portability, Usability, Reliability, Efficiency, Maintainability

Word Bag Examples
Portability: portability, transferability, interoperability, documentation, internationalization, i18n, ...
Reliability: reliability, failure, error, redundancy, fails, bug, ...
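
One way to picture word-bag labelling is as a set intersection between a topic's words and each NFR word bag. The sketch below uses only the example words listed on the slide (the real bags are longer, as the "..." indicates). The matching rule, where any shared word assigns the label, and the name `label_topic` are assumptions made for illustration.

```python
# Partial word bags, limited to the example words shown on the slide.
WORD_BAGS = {
    "portability": {"portability", "transferability", "interoperability",
                    "documentation", "internationalization", "i18n"},
    "reliability": {"reliability", "failure", "error", "redundancy",
                    "fails", "bug"},
}


def label_topic(topic_words, word_bags=WORD_BAGS):
    """Return the sorted NFR labels whose word bag shares a word with the topic.

    Assumed rule: a single overlapping word is enough to assign a label.
    """
    words = {w.lower() for w in topic_words}
    return sorted(label for label, bag in word_bags.items() if words & bag)


# Words from the commit message "Added a test for bug #1326 on OSX".
print(label_topic(["added", "test", "bug", "osx"]))  # ['reliability']
```

A topic can receive several labels or none. With these partial bags, `osx` does not trigger `portability`, which shows how much the result depends on the contents of the bags.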
[Figure: Labelled Topics of MaxDB 7.500. Axes: Time (months), Unique Topics]
[Figure: MaxDB 7.500 Timeline. Labels: Maintainability, Portability, Reliability, Efficiency]
[Figure: Topics of MySQL 3.23. Axes: Time (months), Unique Topics; tags]
[Figure: MySQL 3.23 Timeline]
[Figure: ROC Values of Semi-Supervised Word Bags. Axes: ROC, NFR]

Supervised Tags
[Figure: Supervised Multitag Classifiers: MySQL and MaxDB. Panels: MaxDB Classifiers, MySQL Classifiers]

Conclusions
Managers, Investors and Acquisitions, New Developers, Core Developers, Customers
Version Control
portability, efficiency, maintainability, usability, reliability and functionality (includes correctness)
Shared Concepts
http://softwareprocess.es/name/

[Figure: F-1 Measure of Semi-Supervised Word Bags. Axes: F-1, NFR]
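
For readers unfamiliar with the F-1 measure, the sketch below computes it for a single NFR label from predicted and manually assigned tags. The sample data and the function name `f1_score` are hypothetical and do not come from the study; the formula itself is the standard harmonic mean of precision and recall.

```python
def f1_score(predicted, actual):
    """F-1 measure for one binary label (e.g. 'is this topic about reliability?')."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # true positives
    fp = sum(1 for p, a in pairs if p and not a)    # false positives
    fn = sum(1 for p, a in pairs if not p and a)    # false negatives
    if tp == 0:
        return 0.0  # no correct positive predictions, so F-1 is 0 by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for five topics: word-bag prediction vs. manual annotation.
predicted = [True, True, False, True, False]
actual = [True, False, False, True, True]
score = f1_score(predicted, actual)
print(round(score, 3))
```

Here precision and recall are both 2/3, so the F-1 measure is also 2/3.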
Annotation: Stop Words
STOP words
topics joined due to similarity
2 long trends instead of one
MaxDB 7.500 Case Study

Annotation: Training Sets
Version Control, Maintainability+, Maintainability-

Annotation: Stop Words
STOP words
[Figure: stop words. Legible entries include: already, might, beside, perhaps, tends, necessary, been, gets, between, immediate, plus, everything, example, third, much, nevertheless, maybe, during, sensible, elsewhere, except, unless, can't, anywhere, everywhere, neither, believe, likely, specifying, right, meanwhile, twenty, sure, regards]
Used in topic analysis

Annotation: Training Sets
Version Control, Maintainability+, Maintainability-, Maintainability
sample and correct
Maintainability+, Maintainability-

Message, Word Distribution, Topic, Trend
Top 10 Words:
* perforce
* bug #
* POSIX
* Opteron
* ...