
SLIDE 1

Predicting What Follows Predictive Modeling

Tim Menzies, CS, NC State, USA (tim.menzies@gmail.com)
UCL, CREST Open Workshop, Nov 23-24, 2015
view: tiny.cc/timcow15
discuss: tiny.cc/timcow15discuss

SLIDE 2

Tools to let other people run data miners… better

SLIDE 3

Sound bites

  • “Prediction” = a combination of many things
    ○ Can remix and reuse them in novel ways
  • Is it “prediction”?
    ○ Or “optimization”?
    ○ Or “spectral learning”?
    ○ Or “response surface methods”?
    ○ Or “surrogate modeling”?
    ○ Or “local search”? Or...
    ○ Or “finding useful quirks in the data”?
  • Call it anything:
    ○ But expand your mind,
    ○ Refactor your tools,
    ○ Expand your role

SLIDE 4

Why expand our role?

  • After continuous deployment:
    ○ Next-gen SE = “continuous science”
    ○ Services for data repositories supporting large teams running data miners
  • NOW: we run the data miners
    ○ NEXT: we write tools that let other people run data miners… better

SLIDE 5

Tools to let other people run data miners… better

SLIDE 6

eg #1: Helping Magne

Models: useful for exploring uncertainty (Menzies ASE’07; Gay, ASE journal, March ’10)

SLIDE 7

eg #2: Helping Queens

Yesterday: 30 mins per optimizer? Can we do better than that?

SLIDE 8

eg #2: Helping Queens

Decision-tree options = #examples to split, #examples to stop, etc. (usually 6 settings per learner)

  • Differential evolution (Storn, 1995); a runnable sketch follows:

    frontier = pick N options at random    # e.g. N = 5
    repeat R times:                        # e.g. R = 10
        for Parent in frontier:
            j, k, l = three other frontier items
            Candidate = j + f * (k - l)    # -ish
            if Candidate is “better”, it replaces Parent

  • Large improvements in defect prediction (Xalan, Jedit, Lucene, etc.)
  • For astonishingly little effort: seconds to run
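To make that loop concrete, here is a minimal Python sketch of the same differential-evolution recipe. Everything here is illustrative: the score() function, the two tuning ranges, and the settings N=5, R=10, f=0.75 are assumptions, not the exact setup from the talk (classic DE also crosses the candidate with the parent; the slide's simplified update rule is used here).

    import random

    # Hypothetical ranges for two decision-tree options:
    # min #examples to split a node, min #examples to stop splitting.
    LO = [2, 1]
    HI = [20, 10]

    def score(option):
        # Placeholder objective: in practice, train a learner with these
        # settings and return, say, its defect-prediction F-measure.
        return -sum((x - h / 2) ** 2 for x, h in zip(option, HI))

    def de(n=5, r=10, f=0.75):
        # frontier = pick N options at random
        frontier = [[random.uniform(lo, hi) for lo, hi in zip(LO, HI)]
                    for _ in range(n)]
        for _ in range(r):                        # repeat R times
            for i, parent in enumerate(frontier):
                # j, k, l = three other frontier items
                others = [x for x in frontier if x is not parent]
                j, k, l = random.sample(others, 3)
                # Candidate = j + f * (k - l), clipped to the legal ranges
                candidate = [max(lo, min(hi, a + f * (b - c)))
                             for a, b, c, lo, hi in zip(j, k, l, LO, HI)]
                if score(candidate) > score(parent):  # "better" replaces Parent
                    frontier[i] = candidate
        return max(frontier, key=score)

    print(de())  # best settings found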
SLIDE 9

eg #2: Helping Queens (repeat of Slide 8, adding:)

  • No more prediction without a pre-tuning study
SLIDE 10

eg #3: Helping tune HARDER problems

  • GALE: Krall & Menzies, TSE 2015
  • k=2 divisive clustering (a runnable sketch follows):

    function GALE():
        1. (X, Y) = two very distant points, found in O(2N) time
           ○ Euclidean distance in decision space
        2. Evaluate only (X, Y)
        3. If X “better” than Y:
               if size(cluster) < sqrt(N): mutate towards X
               else: split, cull the worst half, goto 1

    Only log2(N) evaluations.
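A minimal Python sketch of that loop, under loud assumptions: the objective function, the data, and the stopping rule are invented for illustration, and the "mutate towards X" step is skipped; the real GALE (Krall & Menzies, TSE 2015) is multi-objective and mutates survivors along the line between the two poles.

    import math, random

    def dist(a, b):
        # Euclidean distance in decision space
        return math.dist(a, b)

    def two_distant_points(pop):
        # FastMap-style heuristic: two "farthest from" passes find
        # two very distant points in O(2N) distance calculations.
        anchor = random.choice(pop)
        x = max(pop, key=lambda p: dist(p, anchor))
        y = max(pop, key=lambda p: dist(p, x))
        return x, y

    def gale(pop, evaluate, min_size=None):
        # k=2 divisive clustering: evaluate only the two poles, keep the
        # half nearer the better pole; so only ~log2(N) evaluations total.
        if min_size is None:
            min_size = math.sqrt(len(pop))
        if len(pop) < min_size:
            return pop  # small cluster (real GALE mutates towards X here)
        x, y = two_distant_points(pop)
        better = x if evaluate(x) > evaluate(y) else y
        nearest_half = sorted(pop, key=lambda p: dist(p, better))[:len(pop) // 2]
        return gale(nearest_half, evaluate, min_size)

    # Toy usage: 128 random 3-d decisions, maximizing -sum(v^2).
    pop = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(128)]
    print(gale(pop, lambda p: -sum(v * v for v in p))[0])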

SLIDE 11

eg #3: Helping tune HARDER problems (repeat of Slide 10, adding the headline result:)

4 minutes, not 7 hours

SLIDE 12

(Repeat of Slide 10.)
SLIDE 13

And more...

SLIDE 14

http://www.slideshare.net/timmenzies/actionable-analytics-why-how
http://www.slideshare.net/timmenzies/future-se-oct15

MSR’13

SLIDE 15

(Recap of Slide 3: Sound bites.)

SLIDE 16

(Recap of Slide 4: Why expand our role?)

SLIDE 17

Tools to let other people run data miners… better

SLIDE 18

(No recoverable text on this slide.)

SLIDE 19

Backup slides

SLIDE 20

Next gen #3: Insight generators

Fewer numbers, more insight

  • Burak Turhan’s “The graph”
    ○ circle = reported to
    ○ red = error report
    ○ green = error fix
    ○ blue = report + fix in the same team

More coarse-grained control

  • (“ontime”, “aLittleLate”, “wayOverdue”)
  • E.g., predicting delays in software projects using networked classification
    ○ Choetkiertikul et al., ASE’15

SLIDE 21

Good News

Software project data can
  • be shared,
  • still be private,
  • still be used to build predictors.

  • Peters ICSE’12
  • Peters TSE’13
  • Peters ICSE’15

SLIDE 22

(Repeat of Slide 21.)

SLIDE 23

Gooder news: Transfer learning

Cross-company learning works:
  • even from proprietary to open source,
  • even between data sets with different column names.

  • Turhan, Menzies, Bener, ESE’09
  • He et al., ESEM’13
  • Peters, ICSE’15
  • Nam, FSE’15 (heterogeneous)

SLIDE 24

Scales up to massive studies

e.g., every Devanbu et al. study of GitHub

SLIDE 25

A little advertisement

SLIDE 26

Let’s all share more data

  • openscience.us/repo

SLIDE 27

(My) Lessons from the PROMISE project

data mining:

  • Ensembles rule (N models beat one)
    ○ Kocaguneli TSE’12 (ensembles)
    ○ Minku IST’13, 55(8)
  • The best thing to do with data is to throw most of it away (see the sketch after this outline):
    ○ select sqrt(columns) and sqrt(rows),
    ○ so n^2 cells become (n^0.5)^2 = n cells
  • Combine the survivors, synthesize dimensions (e.g. using WHERE), then cluster in the synthesized space
    ○ Menzies TSE’13 (local vs global)
  • Can’t assure that the best models are human-comprehensible, or contain initial expectations:
    ○ no “best” model, no “best” metrics
  • Poor method to confirm a hypothesis; good method to refute a hypothesis (when the target is not in any model); great way to generate hypotheses (user meetings: “heh… that’s funny”)
    ○ Inductive SE Manifesto: Menzies, MALETS’11

more data:

  • More data does not actually help:
    ○ it increases variance in conclusions;
    ○ need to reason within data clusters
    ○ Menzies TSE’13 (local vs global); IST ’13, 55(8), PROMISE issue
  • Not general models, but general methods for finding local models
    ○ Menzies TSE’13 (local vs global); IST ’13, 55(8), PROMISE issue

goals:

  • Learners must be biased. No bias ⇒ no way to cull “dull” stuff ⇒ no summary ⇒ no model ⇒ no predictions. So bias makes us blind, but bias lets us see (the future). Need learners that are biased by the users’ goals
    ○ Menzies, Bener et al., ASE journal, 2010, 17(4)
    ○ Krall, TSE 2015
    ○ Minku, TOSEM’13
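The "throw most of it away" arithmetic, as a minimal Python sketch. The random sampling here is only to show the numbers; in the cited papers the column pruning is typically merit-based feature selection and the row pruning comes from clustering, not blind random draws.

    import random

    def shrink(header, rows):
        # Keep sqrt(#columns) columns and sqrt(#rows) rows, so an
        # n-by-n table of n^2 cells shrinks to (n^0.5)^2 = n cells.
        c = max(1, int(len(header) ** 0.5))
        r = max(1, int(len(rows) ** 0.5))
        keep = sorted(random.sample(range(len(header)), c))
        sampled = random.sample(rows, r)
        return ([header[i] for i in keep],
                [[row[i] for i in keep] for row in sampled])

    # Toy usage: 100 x 100 = 10,000 cells become about 10 x 10 = 100.
    header = [f"col{i}" for i in range(100)]
    rows = [[random.random() for _ in header] for _ in range(100)]
    small_header, small_rows = shrink(header, rows)
    print(len(small_rows), "rows x", len(small_header), "columns")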
SLIDE 28

Next gen challenges

no “best” model:

  • Ensembles rule (N models beat one)
    ○ Kocaguneli TSE’12 (ensembles)
    ○ Minku IST’13, 55(8)

always re-learning:

  • Dramatic improvements to learner performance via data-set-dependent tunings (see next slide)
  • Hyper-parameter optimization
    ○ Maybe N papers at ICSE’16
  • New data? Then, maybe, a new model.
  • Not general models, but general methods for finding local models
    ○ Menzies TSE’13 (local vs global); IST ’13, 55(8), PROMISE issue
  • Conclusions that hold for all may not hold for one (so beware SLRs)
    ○ Posnett et al., ASE’11

goals matter:

  • Learners must be biased. No bias? Then no way to cull “dull” stuff ⇒ no summary ⇒ no model ⇒ no predictions. So bias makes us blind, but bias lets us see (the future). Need learners that are biased by the users’ goals
    ○ Menzies, Bener et al., ASE journal, 2010, 17(4)
    ○ Krall, TSE 2015
    ○ Minku, TOSEM’13

no “best” model generator, no “best” prediction:

  • Need to know the range of outputs (toy sketch below):
    ○ then summarize the output;
    ○ then try to pick inputs that minimize variance in the output
    ○ Jørgensen 2015, COW; Menzies, ASE’07
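A toy Monte-Carlo reading of that last point, with everything invented for illustration: sample a stand-in model's inputs at random to learn the range of its outputs, summarize that spread, then check which single input restriction most shrinks it.

    import random
    import statistics

    def model(a, b, c):
        # Stand-in for some process model under study (pure invention).
        return a * 10 + b * b + random.gauss(0, 1) * c

    # 1. Know the range of outputs: run the model many times.
    runs = [(a, b, c, model(a, b, c))
            for a, b, c in ((random.random(), random.random(), random.random())
                            for _ in range(10_000))]

    # 2. Summarize the output.
    print("overall spread:", round(statistics.stdev(y for *_, y in runs), 2))

    # 3. Pick inputs that minimize variance in the output:
    #    here, try halving the range of each input in turn.
    for i, name in enumerate("abc"):
        subset = [y for *xs, y in runs if xs[i] < 0.5]
        print(f"{name} < 0.5 -> spread {statistics.stdev(subset):.2f}")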
SLIDE 29

Lessons from the PROMISE project

software project data:

  • Conclusions that hold for all may not hold for one (so beware SLRs)
    ○ Posnett et al., ASE’11
  • Not general models, but general methods for finding local models
    ○ Menzies TSE’13 (local vs global); IST ’13, 55(8), PROMISE issue
  • Context is best uncovered automatically, not specified manually
    ○ Menzies TSE’13 (local vs global); Kocaguneli, ESEM’11

effort estimation:

  • Humans rarely use lessons from past projects to improve their future reasoning
    ○ Jørgensen TSE, 2009; Passos ESEM’11
  • “Size” metrics are useful, but not essential, for accurate estimates
    ○ Kocaguneli, PROMISE’12
  • Model-based effort estimation, new high-water mark:
    ○ Choetkiertikul et al., ASE’15

more data:

  • More data does not actually help:
    ○ it increases variance in conclusions;
    ○ need to reason within data clusters
    ○ Menzies TSE’13 (local vs global); IST ’13, 55(8), PROMISE issue