
DSE = Data-Driven Search-Based SE (MSR’18, Gothenburg, Sweden)



  1. Computer Science: software … if engineering, then NC State ... (williams, stolee, heckman, parnin, murphy-hill, menzies, king). DSE = Data-Driven Search-Based SE. MSR’18, Gothenburg, Sweden. Vivek Nair, Amritanshu Agrawal, Jianfeng Chen, Wei Fu, George Mathew, Tim Menzies, Leandro Minku, Markus Wagner, Zhe Yu. leandro.minku@le.ac.uk, markus.wagner@adelaide.edu.au, timm@ieee.org. @timmenzies tiny.cc/18msr

  2. Why did these MSR people meet in Japan in Dec’17? DSE = Data-Driven Search-based SE

  3. Search-based SE: highly acceptable at MSR

  4. What is SBSE? (Search-based Software Engineering) • Many SE activities are like optimization problems [Harman’02] • Due to computational complexity, exact optimization methods are impractical • Alternative: find good-enough solutions using meta-heuristic search as our optimizers – e.g. genetic algorithms – e.g. simulated annealing – e.g. tabu search – e.g. NSGA-II, SPEA2, MOEA/D, Differential Evolution, Bayesian parameter optimization, etc. [Figure: recall vs. false alarms]
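As an illustration of "good-enough solutions using meta-heuristic search", here is a minimal simulated-annealing sketch for one SBSE-style task: shrinking a test suite while keeping every requirement covered. The `coverage` map, the penalty weight of 10, and the linear cooling schedule are all assumptions made for this example; none of them come from the slides.

```python
import math
import random

def anneal(coverage, steps=2000, t0=1.0, seed=1):
    """Simulated annealing over subsets of tests (hypothetical toy problem)."""
    rng = random.Random(seed)
    tests = sorted(coverage)
    needed = set().union(*coverage.values())

    def cost(chosen):
        # Fewer tests is better; each uncovered requirement is penalised heavily.
        covered = set().union(*(coverage[t] for t in chosen))
        return len(chosen) + 10 * len(needed - covered)

    now = set(tests)           # start from the full suite
    best = set(now)
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9    # linear cooling (assumption)
        nxt = now ^ {rng.choice(tests)}          # flip one test in or out
        delta = cost(nxt) - cost(now)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            now = nxt
            if cost(now) < cost(best):
                best = set(now)
    return best

# Hypothetical coverage data: test name -> requirements it exercises.
coverage = {"t1": {1, 2}, "t2": {2, 3}, "t3": {1, 2, 3}, "t4": {4}, "t5": {3, 4}}
print(sorted(anneal(coverage)))
```

The search is not guaranteed to find the smallest suite; it only returns the best subset it visited, which is the "good-enough" trade-off the slide describes.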

  5. DSE = Data-Driven Search-based SE • Conceptually, a common higher-level goal – supporting and giving insights to software engineers

  6. Data-Driven Search-based SE (DSE) • To solve an SE problem: – Insert a data miner into an optimizer; – Or use an optimizer to improve a data miner. • A new era for MSR (better MSR) • A new era for SBSE (better SBSE)

  7. A new era for SBSE: Supercharging MSR • Black art: hyperparameter optimization • e.g. learning how many trees to use in a random forest • e.g. learning the “k” in k-nearest neighbors • Thanks to SBSE: massive improvements in, say, defect prediction – e.g. Agrawal & Menzies, ICSE 2018 – performance details (after - before) tuning
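To make "use an optimizer to improve a data miner" concrete, the sketch below tunes the “k” of a tiny k-nearest-neighbor classifier with a one-dimensional differential evolution loop. Everything here is a hypothetical toy: the synthetic data, the hand-written classifier, and the control settings (`pop_size`, `gens`, `f`, `cr`) are assumptions for illustration, not the setup used in the work cited on the slide.

```python
import random

def make_data(n, rng):
    """Hypothetical noisy data: label is 1 when x > 0.5, flipped 20% of the time."""
    rows = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.2:
            y = 1 - y
        rows.append((x, y))
    return rows

def knn_predict(train, x, k):
    """Majority vote among the k training rows closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(y for _, y in nearest)
    return int(2 * votes > len(nearest))

def accuracy(train, test, k):
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def tune_k(train, valid, lo=1, hi=30, pop_size=8, gens=10, f=0.8, cr=0.9, seed=1):
    """Differential evolution over one real-valued gene, rounded to an integer k."""
    rng = random.Random(seed)
    score = lambda v: accuracy(train, valid, max(lo, min(hi, round(v))))
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    fit = [score(v) for v in pop]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            trial = a + f * (b - c) if rng.random() < cr else pop[i]
            trial = max(lo, min(hi, trial))      # keep k inside its legal range
            s = score(trial)
            if s >= fit[i]:                      # keep the trial only if it is no worse
                pop[i], fit[i] = trial, s
    best = max(range(pop_size), key=lambda i: fit[i])
    return round(pop[best]), fit[best]

rng = random.Random(0)
train, valid = make_data(200, rng), make_data(100, rng)
k, acc = tune_k(train, valid)
print("tuned k:", k, "validation accuracy:", acc)
```

A real study would score candidates on held-out data that is separate from the final test set, so the tuned value is not overfitted to one split.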

  8. A new era for SBSE: Let MSR help you run faster • Landscape analysis – Find the lay of the land (shape of data) – Jump faster to better conclusions – e.g. GALE, TSE 2015 • Note that this “optimizer” is really a “data miner” – clustering, PCA [Figure annotations: mutate all orange points this way; red ignored]

  9. Q: Why explore MSR+SBSE? A: So many application areas (so many novel contributions to so many areas)
       1. Requirements: Menzies, Feather, Bagnall, Mansouri, Zhang
       2. Transformation: Cooper, Ryan, Schielke, Subramanian, Fatiregun, Williams
       3. Effort prediction: Aguilar-Ruiz, Burgess, Dolado, Lefley, Shepperd
       4. Management: Alba, Antoniol, Chicano, Di Penta, Greer, Ruhe
       5. Heap allocation: Cohen, Kooi, Srisa-an
       6. Regression test: Li, Yoo, Elbaum, Rothermel, Walcott, Soffa, Kapfhammer
       7. SOA: Canfora, Di Penta, Esposito, Villani
       8. Refactoring: Antoniol, Briand, Cinneide, O’Keeffe, Merlo, Seng, Tratt
       9. Test generation: Alba, Binkley, Bottaci, Briand, Chicano, Clark, Cohen, Gutjahr, Harrold, Holcombe, Jones, Korel, Pargas, Reformat, Roper, McMinn, Michael, Sthamer, Tracy, Tonella, Xanthakis, Xiao, Wegener, Wilkins
       10. Maintenance: Antoniol, Lutz, Di Penta, Mahdavi, Mancoridis, Mitchell, Swift
       11. Model checking: Alba, Chicano, Godefroid
       12. Probing: Cohen, Elbaum
       13. Comprehension: Gold, Li, Mahdavi
       14. Protocols: Alba, Clark, Jacob, Troya
       15. Component selection: Baker, Skaliotis, Steinhofel, Yoo
       16. Agent oriented: Haas, Peysakhov, Sinclair, Shami, Mancoridis

  10. Q: Why explore MSR+SBSE? A2: ’Cause you got to • How to get a paper rejected (in 2020): – Publish data mining results without hyper-parameter optimization • Coming to the end of “merely mining” – See debates on “unsupervised learning” • Too easy to just chase precision, recall, etc. • Complex problems need complex inference – e.g. minimizing #false alarms before first defect [Huang et al., ICSME’17] – Needed to reply to (e.g.) [Parnin, Orso, ISSTA’11]

  11. http://tiny.cc/data-SE: A new resource for MSR researchers • 89 DSE artifacts, in 13 groups (e.g. RE, software product lines, software processes) • Existing results; useful for testing new methods

  12. So now we know why all these MSR people are so interested in SBSE • Thanks to the organizers of the Dec’17 NII Shonan Meeting – Data-Driven Search-based SE, Dec 11-14, 2017 – Markus Wagner, Leandro Minku, Ahmed Hassan, John Clark

  13. DSE = Data-Driven Search-based SE • To solve an SE problem: – Insert a data miner into an optimizer; – Or use an optimizer to improve a data miner. • A new era for MSR (better MSR) • A new era for SBSE (better SBSE)

  14. software … if engineering, then NC State ... (williams, stolee, heckman, parnin, murphy-hill, menzies, king)

  15. Back-up slides

  16. A new era for MSR: Data farming (MSR + SBSE) • Big data and massive Monte Carlo analysis – find important interactions • domain intuitions ⇒ model ⇒ – generation += 1 – simulation[i] – data – mining – insight – repeat
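The data-farming loop above (simulate, collect data, mine, gain insight, repeat) can be sketched as follows. The `model` function is a made-up stand-in for a real simulation, and the "mining" step is a deliberately crude contrast between the best 20% of runs and the input ranges; both are assumptions chosen to keep the example short, not a method taken from the slides.

```python
import random

def model(cfg, rng):
    """Hypothetical model: cost depends strongly on 'a', weakly on 'b', not on 'c'."""
    return (cfg["a"] - 0.8) ** 2 + 0.1 * cfg["b"] + rng.gauss(0, 0.01)

def farm(generations=5, runs=200, seed=1):
    rng = random.Random(seed)
    ranges = {name: (0.0, 1.0) for name in "abc"}
    for _ in range(generations):                 # generation += 1
        # Simulation: Monte Carlo sampling inside the current input ranges.
        data = []
        for _ in range(runs):
            cfg = {n: rng.uniform(lo, hi) for n, (lo, hi) in ranges.items()}
            data.append((model(cfg, rng), cfg))
        # Mining: look only at the best 20% of runs (lowest cost).
        data.sort(key=lambda row: row[0])
        best = [cfg for _, cfg in data[: runs // 5]]

        def spread(name):
            # Fraction of the current range that the best runs still occupy.
            vals = [c[name] for c in best]
            lo, hi = ranges[name]
            return (max(vals) - min(vals)) / (hi - lo)

        # Insight: the input whose good values cluster most tightly matters most,
        # so narrow that input's range before the next generation.
        key = min(ranges, key=spread)
        vals = [c[key] for c in best]
        ranges[key] = (min(vals), max(vals))
    return ranges

for name, (lo, hi) in farm().items():
    print(name, round(lo, 3), round(hi, 3))
```

Inputs whose ranges stay wide after several generations are the ones the model output barely depends on, which is one simple way to "find important interactions".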

  17. Q: Why explore farming data from models? A: ’Cause models are everywhere
       1. Silicon Valley developers: new features are experiments, to be tested
       2. Chemists win Nobel Prize for model sims http://goo.gl/Lwensc
       3. Engineers test designs via models: radiation therapy, remote sensing, chip design, http://goo.gl/qBMyIZ
       4. Web analysts use models to analyze clickstreams to improve marketing: http://goo.gl/b26CfY
       5. Stock traders use models to simulate trading strategies http://www.quantopian.com
       6. Analysts review proposed gov policies via models of labor statistics data http://goo.gl/X4kgnc
       7. Journalists use models to analyze economic data http://fivethirtyeight.com
       8. In London or New York, ambulances wait at locations determined by a model http://goo.gl/8SMd1p
       9. Etc etc etc

  18. Why explore SBSE + MSR? (the carrot): so many novel contributions to so many areas
       1. Requirements: Menzies, Feather, Bagnall, Mansouri, Zhang
       2. Transformation: Cooper, Ryan, Schielke, Subramanian, Fatiregun, Williams
       3. Effort prediction: Aguilar-Ruiz, Burgess, Dolado, Lefley, Shepperd
       4. Management: Alba, Antoniol, Chicano, Di Penta, Greer, Ruhe
       5. Heap allocation: Cohen, Kooi, Srisa-an
       6. Regression test: Li, Yoo, Elbaum, Rothermel, Walcott, Soffa, Kapfhammer
       7. SOA: Canfora, Di Penta, Esposito, Villani
       8. Refactoring: Antoniol, Briand, Cinneide, O’Keeffe, Merlo, Seng, Tratt
       9. Test generation: Alba, Binkley, Bottaci, Briand, Chicano, Clark, Cohen, Gutjahr, Harrold, Holcombe, Jones, Korel, Pargas, Reformat, Roper, McMinn, Michael, Sthamer, Tracy, Tonella, Xanthakis, Xiao, Wegener, Wilkins
       10. Maintenance: Antoniol, Lutz, Di Penta, Mahdavi, Mancoridis, Mitchell, Swift
       11. Model checking: Alba, Chicano, Godefroid
       12. Probing: Cohen, Elbaum
       13. Comprehension: Gold, Li, Mahdavi
       14. Protocols: Alba, Clark, Jacob, Troya
       15. Component selection: Baker, Skaliotis, Steinhofel, Yoo
       16. Agent oriented: Haas, Peysakhov, Sinclair, Shami, Mancoridis

  19. Some technical differences (MSR vs. SBSE)
       • Inference: MSR: induction, visualize; SBSE: optimization
       • Speed: MSR: faster, often more scalable; SBSE: becoming faster
       • Data: MSR: collected before inference; SBSE: sampling controlled by inference
       • Tools: MSR: R, SciKitLearn, WEKA; SBSE: jMetal, AutoWeka, AutoSklearn, Opt4j, DEAP
       • Example problems: MSR: e.g. defect prediction, StackOverflow mining; SBSE: minimize a test suite, configure software
       • Goals: MSR: e.g. just a few: recall, precision, MRE; SBSE: domain-specific goals, meta-criteria (hypervolume, spread, IGD)

  20. Optimization = surfing the landscape • Particle Swarm Optimization: murmuration of starlings (learn safe “shapes” to avoid predators) • new = old + φ1*rand(ourBest - now) ;; social cognition + φ2*rand(myBest - now) ;; private cognition • Use data miners to learn the landscape, guide our optimizers?
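The update rule on this slide can be sketched in a few lines of Python. The `sphere` cost function, the swarm size, and the inertia weight `w` that damps the old velocity are assumptions added for this example (the slide writes the rule as plain `old + ...`, without a damping term); `phi1` weights the pull toward the swarm's best point and `phi2` the pull toward each particle's own best.

```python
import random

def sphere(xs):
    """Hypothetical cost landscape: minimum of 0 at the origin."""
    return sum(x * x for x in xs)

def pso(cost, dims=2, particles=20, steps=100, w=0.7, phi1=1.5, phi2=1.5, seed=1):
    rng = random.Random(seed)
    now = [[rng.uniform(-5, 5) for _ in range(dims)] for _ in range(particles)]
    vel = [[0.0] * dims for _ in range(particles)]
    my_best = [p[:] for p in now]                # each particle's private memory
    our_best = min(my_best, key=cost)[:]         # the swarm's shared memory
    start = cost(our_best)
    for _ in range(steps):
        for i in range(particles):
            for d in range(dims):
                vel[i][d] = (w * vel[i][d]
                             + phi1 * rng.random() * (our_best[d] - now[i][d])     # social
                             + phi2 * rng.random() * (my_best[i][d] - now[i][d]))  # private
                now[i][d] += vel[i][d]
            if cost(now[i]) < cost(my_best[i]):
                my_best[i] = now[i][:]
                if cost(my_best[i]) < cost(our_best):
                    our_best = my_best[i][:]
    return our_best, start, cost(our_best)

best, before, after = pso(sphere)
print("best cost before:", before, "after:", after)
```

A data miner could slot in here by, for example, clustering the visited points and steering particles only toward promising clusters, which is the question the slide closes on.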

  21. Something is changing. Things are … different. Strange new words: - “hyper-parameter optimization” - “evolutionary algorithms” - “differential evolution” - “model-based reasoning” What is going on?
