Learning from evolving streams: Online triage of bug reports


  1. Learning from evolving streams: Online triage of bug reports
     Grzegorz Chrupała
     Spoken Language Systems, Saarland University
     EACL 2012
     G. Chrupała (Saarland Uni) Learning from streams EACL 2012 1 / 23

  2. Issue trackers
     - Used to track bugs or feature requests in software projects
     - May receive hundreds of reports per day
     - Need to be triaged: labeled and assigned to developers
     - Domain-specific challenges

  4. Automate
     - Predict project subcomponent labels
     - Predict developers assigned to bugs

  5. As social media
     Issue trackers:
     - Very specialized social media
     Practices (labeling, triage):
     - Negotiated explicitly
     - Emerging via imitation
     - Influenced by automation

  6. Concept drift
     - Practices evolve
     - Software projects mature
     - People involved come and go
     For a learner, input and output change over time.

  7. Contributions
     - Collect data from modern software projects
     - Analyze concept drift
     - Apply state-of-the-art online learning and improve on current approaches

  8. Data
     - Alternate items assigned to dev and test
     - Dev set sizes:
     Tracker    Output        # Items  # Labels
     Chromium   Subcomponent   31,953        75
     Chromium   Assigned       16,154       591
     Android    Subcomponent      888        12
     Android    Assigned          718        72
     Firefox    Assigned       12,733       503
     Launchpad  Assigned       18,634     1,970
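
The alternating dev/test assignment on the Data slide can be sketched as follows. This is only an illustration: the slide does not say which parity goes to which set, so sending even positions to dev is an assumption, and `alternate_split` is a hypothetical helper name.

```python
def alternate_split(items):
    """Split a chronologically ordered stream into dev and test by alternation.

    Assumption: items at even positions go to dev, odd positions to test.
    """
    items = list(items)
    dev = items[0::2]
    test = items[1::2]
    return dev, test

# Five reports, identified here only by their position in the stream.
dev, test = alternate_split(range(5))
```

Both halves preserve the original time order, so each can still be replayed as a stream.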

  9. Evolving class distribution: Chromium Subcomponent
     [Figure: class distribution over time]

  10. Evolving class distribution: Launchpad Assigned
      [Figure: class distribution over time]

  11. Progressive validation
      For i = 1 to ∞:
      - Send input i to learner
      - Receive prediction i and record error i
      - Send true output i to learner
      $\mathrm{Error}(n) = \sum_{i=1}^{n} \mathrm{error}(i)$
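
The progressive validation loop can be sketched in a few lines of Python. The `MajorityLearner` class and its `predict`/`update` interface are hypothetical stand-ins, not the learners used in the talk, and the error here is assumed to be 0/1 loss.

```python
class MajorityLearner:
    """Toy learner (hypothetical): predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = {}

    def predict(self, x):
        if not self.counts:
            return None  # nothing seen yet
        return max(self.counts, key=self.counts.get)

    def update(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1


def progressive_validation(learner, stream):
    """Return the cumulative error: predict first, then reveal the true output."""
    total_error = 0
    for x, y in stream:
        y_hat = learner.predict(x)       # send input, receive prediction
        total_error += int(y_hat != y)   # record 0/1 error (assumed loss)
        learner.update(x, y)             # send true output to the learner
    return total_error


stream = [("a", "ui"), ("b", "ui"), ("c", "net"), ("d", "ui")]
errors = progressive_validation(MajorityLearner(), stream)
```

Because every item is predicted before its label is revealed, each item serves as test data first and training data second.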

  12. Evaluation of ranking
      Triage assistant:
      - Show user a ranked list of suggested targets
      Mean reciprocal rank:
      $\mathrm{MRR} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{rank}(i)^{-1}$
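
As a small worked example of mean reciprocal rank, the sketch below scores ranked suggestion lists against the true targets. The function name and the choice to count a missing target as contributing 0 are assumptions made for illustration.

```python
def mean_reciprocal_rank(rankings, targets):
    """Average of 1/rank of the true target in each ranked list.

    Assumption: a target absent from its list contributes 0.
    """
    total = 0.0
    for ranked, target in zip(rankings, targets):
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)  # ranks start at 1
    return total / len(targets)


rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
targets = ["a", "a", "a"]
mrr = mean_reciprocal_rank(rankings, targets)  # (1 + 1/2 + 1/3) / 3
```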

  13. Features
      - Title unigram and bigram counts
      - Description unigram and bigram counts
      - Author ID
      - Year, month and day of submission
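
One way to turn a report into the sparse features listed above is sketched here. The whitespace tokenizer, lowercasing, and the feature-name prefixes are all assumptions; the slides do not specify how text was tokenized or how features were named.

```python
from collections import Counter
from datetime import date


def ngram_counts(text, prefix):
    """Unigram and bigram counts with a field prefix (naive whitespace tokens)."""
    tokens = text.lower().split()
    feats = Counter()
    for tok in tokens:
        feats[f"{prefix}:1:{tok}"] += 1
    for a, b in zip(tokens, tokens[1:]):
        feats[f"{prefix}:2:{a}_{b}"] += 1
    return feats


def extract_features(title, description, author, submitted):
    """Sparse feature dict: n-gram counts plus author and date indicators."""
    feats = Counter()
    feats.update(ngram_counts(title, "title"))
    feats.update(ngram_counts(description, "desc"))
    feats[f"author:{author}"] = 1
    feats[f"year:{submitted.year}"] = 1
    feats[f"month:{submitted.month}"] = 1
    feats[f"day:{submitted.day}"] = 1
    return feats


feats = extract_features("Crash on crash", "tab closes", "dev42", date(2011, 5, 3))
```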

  14. Baselines
      Window frequency:
      - Relative class frequencies in previous k ∈ {100, 1000} items
      SVM minibatch:
      - Retrain every n = 100 steps on previous k = 1000 items
      Perceptron:
      - Single pass, constant learning rate
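
The window-frequency baseline ranks labels by how often they occurred in the previous k items. A minimal sketch follows; the class name and its `rank`/`update` methods are hypothetical, and tie-breaking between equally frequent labels is left to `Counter.most_common`.

```python
from collections import Counter, deque


class WindowFrequency:
    """Rank labels by frequency within a sliding window of the last k items."""

    def __init__(self, k=100):
        self.window = deque(maxlen=k)
        self.counts = Counter()

    def rank(self):
        # Most frequent label in the window first.
        return [label for label, _ in self.counts.most_common()]

    def update(self, label):
        # Drop the oldest label's count before the deque evicts it.
        if len(self.window) == self.window.maxlen:
            oldest = self.window[0]
            self.counts[oldest] -= 1
            if self.counts[oldest] == 0:
                del self.counts[oldest]
        self.window.append(label)
        self.counts[label] += 1


baseline = WindowFrequency(k=3)
for label in ["ui", "ui", "net"]:
    baseline.update(label)
early_ranking = baseline.rank()
for label in ["net", "net"]:
    baseline.update(label)
late_ranking = baseline.rank()
```

The baseline ignores the report text entirely, which is why it makes a useful reference point for how much of the signal is pure class-distribution drift.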

  15. Bugzie
      Tamrawi et al. 2011, "Fuzzy set and cache-based approach for bug triaging".
      - Based on a fuzzy set membership function:
        $\mu(y, X) = 1 - \prod_{x \in X} \left( 1 - \frac{n(y, x)}{n(y, \cdot) + n(\cdot, x) - n(y, x)} \right)$
      - Counts n(·,·) updated incrementally
      - Feature cache: keep track of k most significant features
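
The membership function and its incremental counts can be sketched as below. This is an illustration of the formula only, not the Bugzie implementation: the feature cache is omitted, counts are assumed to be per-report (each feature counted once per report), and the class and method names are hypothetical.

```python
from collections import Counter


class FuzzySetScorer:
    """Incremental counts and the fuzzy membership score mu(y, X)."""

    def __init__(self):
        self.n_label = Counter()  # n(y, .): reports with label y
        self.n_feat = Counter()   # n(., x): reports containing feature x
        self.n_joint = Counter()  # n(y, x): reports with label y and feature x

    def update(self, label, features):
        self.n_label[label] += 1
        for x in set(features):
            self.n_feat[x] += 1
            self.n_joint[(label, x)] += 1

    def score(self, label, features):
        prod = 1.0
        for x in set(features):
            joint = self.n_joint[(label, x)]
            denom = self.n_label[label] + self.n_feat[x] - joint
            if denom > 0:  # skip features and labels never seen
                prod *= 1.0 - joint / denom
        return 1.0 - prod


scorer = FuzzySetScorer()
scorer.update("ui", ["crash", "tab"])
scorer.update("net", ["crash", "proxy"])
```

With these two reports, "tab" occurs only with "ui", so it gives full membership, while "crash" is shared between both labels and gives 0.5.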

  16. Regression SGD
      SGD with square loss as basic learner:
      $\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta^{(t)} \nabla L(y^{(t)}, \mathbf{w}^{(t)\top} \mathbf{x}^{(t)})$
      $L(y, \hat{y}) = (y - \hat{y})^2$
      - Adaptive, per-feature learning rate (Duchi et al. 2010, Streeter and McMahan 2010)
      - Learning rate larger for infrequent features
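
A minimal sketch of SGD with square loss and a per-feature adaptive learning rate is given below. The specific rate used here, the base rate divided by the root of the accumulated squared gradients for that feature, is an AdaGrad-style assumption; the slides do not give the exact schedule, and the class name and `eta` default are hypothetical.

```python
import math


class AdaptiveSGDRegressor:
    """Sparse linear regression trained by SGD with per-feature learning rates."""

    def __init__(self, eta=0.5):
        self.eta = eta  # base learning rate (assumed value)
        self.w = {}     # feature -> weight
        self.g2 = {}    # feature -> sum of squared gradients

    def predict(self, x):
        return sum(self.w.get(i, 0.0) * v for i, v in x.items())

    def update(self, x, y):
        y_hat = self.predict(x)
        for i, v in x.items():
            g = 2.0 * (y_hat - y) * v  # gradient of (y - y_hat)^2 w.r.t. w_i
            if g == 0.0:
                continue
            self.g2[i] = self.g2.get(i, 0.0) + g * g
            # Rarely updated features have small g2, hence a larger step.
            self.w[i] = self.w.get(i, 0.0) - self.eta / math.sqrt(self.g2[i]) * g
        return y_hat


model = AdaptiveSGDRegressor()
for _ in range(50):
    model.update({"a": 1.0}, 1.0)
prediction = model.predict({"a": 1.0})
```

Because the accumulated squared gradient grows only when a feature is active, a feature seen for the first time late in the stream still gets a large step, which matches the slide's remark about infrequent features.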

  17. Reduction from multiclass
      One-versus-all reduction:
      $T(\mathbf{x}, y) = \{ (\mathbf{x}', I(y = y')) \mid y' \in Y,\ x'_{h(i, y')} = x_i \}$
      $h(i, y')$ composes the index $i$ with the label $y'$ by hashing.
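
The reduction can be sketched as a function that turns one multiclass example into one regression example per label, with each feature index hashed together with the label. The hash function (CRC32), the number of bins, and the function names are assumptions for illustration; colliding features are simply summed.

```python
import zlib


def hashed_index(feature, label, num_bins=2**18):
    """h(i, y'): combine a feature name and a label into one hashed index."""
    key = f"{label}\x00{feature}".encode("utf-8")
    return zlib.crc32(key) % num_bins


def one_versus_all(x, y, labels, num_bins=2**18):
    """T(x, y): one (x', target) pair per label, target is 1.0 iff label == y."""
    examples = []
    for y_prime in labels:
        x_prime = {}
        for i, v in x.items():
            j = hashed_index(i, y_prime, num_bins)
            x_prime[j] = x_prime.get(j, 0.0) + v  # collisions add up
        examples.append((x_prime, 1.0 if y_prime == y else 0.0))
    return examples


examples = one_versus_all({"crash": 1.0, "tab": 2.0}, "ui", ["ui", "net"])
```

A single weight vector over the hashed indices then serves all labels at once; with many labels and a fixed number of bins, distinct (feature, label) pairs increasingly share an index.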

  18. Summary of results (test)
      [Figure: bar chart of MRR for Win, Svm, Perc, Bugz and Regr on CS, AS, CA, AA, FA and LA]

  19. Chromium Subcomponent
      [Figure]

  20. Firefox Assigned
      - Data becomes more difficult around 9,000

  21. Launchpad Assigned
      - Little concept drift
      - ≈ 2000 labels: hashing collisions

  22. Best improvement over Window
      [Figure: bar chart over ChS, AnS, ChA, AnA, FiA and LaA]

  23. To conclude
      - Concept drift is a crucial concern
      - Modern online learner successfully tracks stream evolution
      - Data available at: www.lsv.uni-saarland.de/resources.htm
      - Ready to go beyond bag-of-words
