Automated Topic Naming to Support Cross-project Analysis of Software Maintenance Activities


  1. Automated Topic Naming to Support Cross-project Analysis of Software Maintenance Activities. Abram Hindle (Dept. of Computer Science, University of California, Davis, Davis, CA, USA, abram@softwareprocess.es); Neil A. Ernst (Dept. of Computer Science, University of Toronto, Toronto, Ontario, CANADA, nernst@cs.toronto.edu); Michael W. Godfrey (David Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, CANADA, migod@uwaterloo.ca); John Mylopoulos (Dept. Information Eng. and Computer Science, University of Trento, Trento, ITALY, jm@disi.unitn.it). US NSF SHF Medium 0964703

  2. Who Cares About Quality? Managers, Developers, New Developers, Investors, Customers

  3. What is this commit about? Added a test for bug #1326 on OSX

  4. What is this commit about? Added a test for bug #1326 on OSX

  5. What is this commit about? Added a test for bug #1326 on OSX. Maintainability, Reliability, Portability

  6. But we have many commits... Maintainability, Reliability, Portability

  7. Developer Topics. [Diagram: Commits -> LDA / LSI -> Developer Topics; Commit purpose? Maintainability, Reliability]

  8. Cross Project Relevance. [Diagram: four Version Control repositories around Shared Concepts: efficiency, usability, reliability and functionality (includes correctness), maintainability, portability]

  9. Quality-related Non Functional Requirements (NFRs): portability, reliability and functionality (includes correctness), usability, efficiency, maintainability [iso9126] [cleland-huang03] [ernst10]

  10. Can't we just summarize quality related efforts within this project? [Diagram labels: Software Repositories, Revisions, Source Code, Build / Configuration, Tests, Documentation; Non-Functional Requirements: Maintainability, Functionality, Portability, Efficiency, Usability, Reliability; time ->]

  11. Labelled Developer Topics. [Plot: Unique Topics vs. Time (months)]

  12. Labelled Developer Topics. [Plot: Unique Topics vs. Time (months); annotations: Linux, Kernel, Windows, AMD64]

  13. Labelled Developer Topics. [Plot: Unique Topics vs. Time (months); topic labels: efficiency, portability, functionality, maintainability, reliability]

  14. Example [Blei] (apologies to those with prior LDA/LSI experience)

  15. Opinion, Arts, International News

  16. [Diagram: an Arts Section and an International News Section, each containing Articles]

  17. What if we didn't know what section the articles were in? [Diagram: unsorted Articles]

  18. [Diagram: Articles and LDA / LSI]

  19. [Diagram: Articles and LDA / LSI]

  20. Word Distribution. Documents are represented as word distributions (word counts). [Diagram: Article and LDA / LSI, with example words: dog, cat, car, city, pound, festival, street, mischief]
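
The word-count representation on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' tooling: the tokenizer, the function names `word_counts` and `word_distribution`, and the sample sentence are all assumptions made for the example.

```python
import re
from collections import Counter

def word_counts(text):
    """Return a bag-of-words Counter for one document (hypothetical tokenizer)."""
    # Lowercase, then keep runs of letters, digits and '#'.
    tokens = re.findall(r"[a-z0-9#]+", text.lower())
    return Counter(tokens)

def word_distribution(text):
    """Normalise the word counts so that they sum to 1."""
    counts = word_counts(text)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

# Illustrative sentence built from the words on the slide.
article = "Dog chases cat; cat escapes to the city pound. Dog festival on the street."
counts = word_counts(article)
dist = word_distribution(article)
```

Word order is discarded here; only how often each word occurs is kept, which is the input that topic models such as LDA work from.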

  21. Word Distributions. Topics: Independent Word Distributions. LDA finds independent word distributions that the documents are related to. Documents can be associated with more than one topic. [Diagram labels: LDA, LSI]

  22. Original Article Word Distributions (Baseball, Movie, Athlete and Actor Award Nominees, Theatre Review) and Topics: Independent Word Distributions (Sports, Entertainment)

  23. Documents are represented as a linear combination of independent topics. Topics: Independent Word Distributions. [Diagram: $\text{Athlete and Actor} \approx C_0 \times \text{Sports} + C_1 \times \text{Entertainment}$]
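
The idea of a document as a linear combination of topics can be made concrete with a toy example. The sketch below is only an illustration: the word probabilities, the weights (standing in for the slide's two coefficients) and the helper name `mix_topics` are invented for the example and do not come from the talk.

```python
def mix_topics(topics, weights):
    """Combine topic word distributions into one document word distribution."""
    mixed = {}
    for name, weight in weights.items():
        for word, p in topics[name].items():
            # Each topic contributes its word probability scaled by its weight.
            mixed[word] = mixed.get(word, 0.0) + weight * p
    return mixed

# Hypothetical topic word distributions (each sums to 1).
topics = {
    "sports": {"game": 0.5, "player": 0.3, "play": 0.2},
    "entertainment": {"movie": 0.5, "theatre": 0.3, "play": 0.2},
}
# Hypothetical mixing weights for one article (these also sum to 1).
weights = {"sports": 0.75, "entertainment": 0.25}
doc = mix_topics(topics, weights)
```

Because both the topic distributions and the weights sum to 1, the mixed document distribution does too. A word such as "play" that appears in both topics receives probability from each.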

  24. Here are two topics. I don't know what they are about! These word lists look like: Sports and Entertainment! Topic 1: play, game, inning, player, quarter, opponent, ... Topic 2: gambling, play, night life, comedy, movie, theatre, ... [Diagram: Articles and LDA / LSI]

  25. 25

  26. Word bag analysis: Usability, Maintainability, Portability, Reliability, Efficiency

  27. Word Bag Examples. Reliability: reliability, failure, error, redundancy, fails, bug, ... Portability: portability, transferability, interoperability, documentation, internationalization, i18n, ...
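
One way to picture word-bag labelling is as a set intersection between a topic's top words and each NFR word bag. The sketch below uses only the words visible on this slide (the slide's lists are elided with "...", so these bags are incomplete). The matching rule (any shared word assigns the label), the function name `label_topic` and the sample topics are assumptions for illustration, not necessarily the authors' exact procedure.

```python
# Word bags copied from the slide; the full lists are elided there.
WORD_BAGS = {
    "portability": {"portability", "transferability", "interoperability",
                    "documentation", "internationalization", "i18n"},
    "reliability": {"reliability", "failure", "error", "redundancy",
                    "fails", "bug"},
}

def label_topic(top_words, word_bags=WORD_BAGS):
    """Return the sorted NFR labels whose word bag shares a word with the topic.

    Assumed rule: a single overlapping word is enough to assign a label.
    """
    words = {w.lower() for w in top_words}
    return sorted(label for label, bag in word_bags.items() if words & bag)

# Hypothetical topics, given as lists of their most frequent words.
labels = label_topic(["bug", "fix", "i18n", "crash"])
unlabelled = label_topic(["perforce", "merge"])
```

Under this rule a topic can receive several labels at once, or none at all when its words fall outside every bag.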

  28. Labelled Topics of MaxDB 7.500. [Plot: Unique Topics vs. Time (months); topic labels: efficiency, portability, functionality, maintainability, reliability]

  29. MaxDB 7.500 Timeline. [Timeline labels: Maintainability, Maintainability, Maintainability, Portability, Portability, Efficiency, Reliability, Efficiency]

  30. Topics of MySQL 3.23. [Plot: Unique Topics vs. Time (months); topic tags: functionality, usability, efficiency, portability, reliability, functionality/portability, reliability/usability, maintainability/reliability/portability]

  31. MySQL 3.23 Timeline. [Timeline labels: Maintainability, Maintainability, Functionality, Functionality, Functionality, Reliability, Efficiency, Portability, Portability]

  32. ROC Values of Semi-Supervised Word Bags. [Chart: ROC (y-axis, 0.4 to 0.8) per NFR (portability, efficiency, reliability, functionality, maintainability, usability, total NFR); series: MaxDB exp2, MySQL exp2, MaxDB exp3, MySQL exp3]
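
The slide does not say how its ROC values were computed. Assuming "ROC value" means the area under the ROC curve for one NFR label, a minimal sketch is to compare every positive example against every negative one. The function name `roc_auc` and the sample labels are hypothetical.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Assumption: this is the quantity meant by "ROC value" on the slide.
    Returns the probability that a random positive example is scored
    above a random negative one, counting ties as half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical data: 1 means the topic truly concerns the NFR.
y_true = [1, 0, 1, 0, 0]
predicted = [1, 0, 0, 0, 1]  # binary output of a word-bag labeller
auc = roc_auc(y_true, predicted)
```

A value of 0.5 corresponds to guessing, which puts bars near 0.5 on such a chart close to chance. For binary predictions like these, the result equals the mean of the true positive rate and the true negative rate.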

  33. Supervised Tags

  34. Supervised Multitag Classifiers: MySQL and MaxDB. [Charts: one panel per project (MySQL, MaxDB); x-axis: Classifiers (CLR, HOMER, BR); y-axis: 0.4 to 1; series: micro, macro]
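
The "micro" and "macro" series refer to two ways of averaging a score over labels in multi-label classification. The transcript does not say which metric the chart plots, so the sketch below uses F1 as an assumption. The function names and the sample data are hypothetical.

```python
def f1(tp, fp, fn):
    """F1 score from true positive, false positive and false negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(true_sets, pred_sets, labels):
    """Return (micro, macro) F1 for multi-label predictions."""
    per_label = {}
    for label in labels:
        pairs = list(zip(true_sets, pred_sets))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        per_label[label] = (tp, fp, fn)
    # Macro: average the per-label scores, so every label counts equally.
    macro = sum(f1(*c) for c in per_label.values()) / len(labels)
    # Micro: pool the counts first, so frequent labels dominate.
    totals = [sum(c[i] for c in per_label.values()) for i in range(3)]
    micro = f1(*totals)
    return micro, macro

labels = ["portability", "reliability"]
true_sets = [{"portability"}, {"portability", "reliability"},
             {"reliability"}, {"portability"}]
# A hypothetical classifier that always predicts "portability".
pred_sets = [{"portability"}] * 4
micro, macro = micro_macro_f1(true_sets, pred_sets, labels)
```

In this toy case the micro average is higher than the macro average, because the classifier does well on the frequent label and fails completely on the rarer one.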

  35. Conclusions. [Recap diagram: stakeholders (Managers, Core Developers, New Developers, Investors and Acquisitions, Customers); Developer Topic Analysis and Labelling of Version Control Revisions with LDA / LSI [Hindle09ICSM]; Shared Concepts across projects: efficiency, usability, reliability and functionality (includes correctness), maintainability, portability] http://softwareprocess.es/name/

  36. F-1 Measure of Semi-Supervised Word Bags. [Chart: F-1 (y-axis, 0 to 0.8) per NFR (portability, efficiency, reliability, functionality, maintainability, usability, total NFR); series: MaxDB exp2, MySQL exp2, MaxDB exp3, MySQL exp3]

  37. [Diagram: topics ranked from Many Documents to Few Documents: Topic 1, Topic 10, Topic 20]

  38. Annotation: Stop Words. MaxDB 7.500 Case Study. 2 long trends instead of one; topics joined due to similarity. [Plot annotations: STOP words, STOP words]

  39. Annotation: Training Sets. [Diagram: Version Control, Maintainability+, Maintainability-]

  40. Annotation: Stop Words. Used in topic analysis or to reduce # of features for learners. [Figure: cloud of STOP words; individual words not recoverable]
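
Stop-word removal, as used before topic analysis or to shrink a learner's feature set, can be sketched as a simple filter. The stop-word list below is a tiny hypothetical one (the talk's actual list is not recoverable from this transcript), and `remove_stop_words` is an illustrative name.

```python
# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "for", "on", "of", "to", "and"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stop_words]

# The commit message shown earlier in the talk, split on whitespace.
tokens = "Added a test for bug #1326 on OSX".split()
kept = remove_stop_words(tokens)
```

Removing such words leaves fewer distinct features and keeps very common words from joining otherwise unrelated topics.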

  41. Annotation: Training Sets. [Diagram labels: AUTO: Version Control, Maintainability+, Maintainability-; MANUAL: sample and correct; Maintainability, Maintainability+, Maintainability-]

  42. Message Word Distribution. Topic Trend. Top 10 Words: perforce, bug #, POSIX, Opteron, ...
