Predicting What Follows Predictive Modeling
Tim Menzies, CS, NC State, USA (tim.menzies@gmail.com)
UCL, CREST Open Workshop, Nov 23-24, 2015
View: tiny.cc/timcow15 | Discuss: tiny.cc/timcow15discuss
○ Can remix, reuse in novel ways
○ Or “optimization”?
○ But expand your mind
○ Refactor your tools
○ Expand your role
○ Next-gen SE = “continuous science”
○ Services for data repositories supporting large teams running data miners
○ NEXT: we write tools that let other people run data miners… better
Models: useful for exploring uncertainty (Menzies, ASE ’07; Gay, ASE Journal, March ’10)
Yesterday: 30 mins per optimizer? Can we do better than that?
Decision-tree options = #examples to split, #examples to stop, etc. (usually ~6 settings per learner)
Repeat R times (e.g. R = 10): for Parent in frontier: …
Data sets: Xalan, JEdit, Lucene, etc.
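The tuning loop above (repeat R times, sample settings, keep the best) can be sketched as plain random search. The setting names mirror typical decision-tree options, and the scoring function is a hypothetical stand-in for training and cross-validating a real learner on one data set:

```python
# Hedged sketch: random search over a learner's settings.
# SETTINGS and toy_score are illustrative stand-ins, not a real learner's API.
import random

SETTINGS = {                      # ~6 settings per learner
    "min_samples_split": range(2, 21),
    "min_samples_leaf":  range(1, 13),
    "max_depth":         range(1, 13),
    "max_features":      [0.25, 0.5, 0.75, 1.0],
}

def tune(score, repeats=10, seed=1):
    """Repeat R times: pick random settings, keep the best-scoring ones."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(repeats):
        candidate = {k: rng.choice(list(v)) for k, v in SETTINGS.items()}
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score

def toy_score(opt):
    """Stand-in for cross-val accuracy; peaks at split=8, depth=6."""
    return -abs(opt["min_samples_split"] - 8) - abs(opt["max_depth"] - 6)

best, s = tune(toy_score)
```

Because the best settings depend on the data set being scored, rerunning `tune` per data set gives the data-set-dependent tunings discussed later in the talk.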
function GALE():
  1. (X, Y) = two very distant points, found in O(2N) time
     (Euclidean distance in decision space)
  2. Evaluate only X and Y
  3. If X “better” than Y:
       if size(cluster) < sqrt(N): mutate towards X
       else: split, cull worst half, goto 1
  Only log2(N) evaluations.
4 minutes, not 7 hours
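The GALE loop above can be sketched in a few lines of Python. This is a simplified sketch under stated assumptions: the helper names and the toy objective are mine, and it only culls towards the better pole; the real GALE also mutates survivors towards that pole across generations.

```python
# Hedged sketch of GALE's core selection loop (helper names are hypothetical).
import math
import random

def distant_pair(pop, dist):
    """Two very distant points found in O(2N) time: from any point,
    walk to its farthest neighbour X, then to X's farthest neighbour Y."""
    anyone = random.choice(pop)
    x = max(pop, key=lambda p: dist(p, anyone))
    y = max(pop, key=lambda p: dist(p, x))
    return x, y

def gale_select(pop, dist, better, enough=None):
    """Keep halving the population towards the better pole.
    Only the two poles are evaluated per split: ~log2(N) evaluations."""
    if enough is None:
        enough = math.sqrt(len(pop))
    if len(pop) <= enough:
        return pop
    x, y = distant_pair(pop, dist)      # Euclidean, in decision space
    if better(y, x):                    # evaluate only X and Y
        x, y = y, x                     # x is now the better pole
    nearest_x = sorted(pop, key=lambda p: dist(p, x))
    return gale_select(nearest_x[:len(pop) // 2], dist, better, enough)

random.seed(1)
pop = [[random.random() for _ in range(3)] for _ in range(128)]
better = lambda a, b: sum(a) < sum(b)   # toy objective: minimize the sum
survivors = gale_select(pop, math.dist, better)
```

With N = 128 candidates, the recursion stops once a cluster is smaller than sqrt(N), so only a handful of pole pairs ever get evaluated; that is where the “4 minutes, not 7 hours” speed-up comes from.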
http://www.slideshare.net/timmenzies/actionable-analytics-why-how http://www.slideshare.net/timmenzies/future-se-oct15
MSR’13
Fewer numbers, more insight
More coarse-grained control
Networked classification
Software project data can
Cross-company learning works:
column names
e.g. every Devanbu et al. study of GitHub
Ensembles rule (N models beat one)
The best thing to do with data is to throw most of it away
Combine survivors, synthesize dimensions (e.g. using WHERE), then cluster in the synthesized space
We can’t assure that the best models are human-comprehensible, or that they contain our initial expectations
No “best” model; no “best” metrics
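The WHERE step named above, synthesize a dimension, then cluster in the synthesized space, can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names are mine, and a real WHERE would also prune rows and recurse with more care.

```python
# Hedged sketch of WHERE-style clustering via a synthesized dimension.
import math
import random

def synthesize_dimension(rows, dist):
    """Project every row onto the line joining two distant rows
    (FastMap-style; the cosine rule gives each row's position)."""
    anyone = random.choice(rows)
    east = max(rows, key=lambda r: dist(r, anyone))
    west = max(rows, key=lambda r: dist(r, east))
    c = dist(east, west) or 1e-32        # guard: all rows identical
    def x(row):
        a, b = dist(row, west), dist(row, east)
        return (a * a + c * c - b * b) / (2 * c)
    return sorted(rows, key=x)

def where_cluster(rows, dist, enough=None):
    """Recursively split at the median of the synthesized dimension."""
    if enough is None:
        enough = math.sqrt(len(rows))
    if len(rows) <= enough:
        return [rows]
    ordered = synthesize_dimension(rows, dist)
    mid = len(ordered) // 2
    return (where_cluster(ordered[:mid], dist, enough) +
            where_cluster(ordered[mid:], dist, enough))

random.seed(1)
rows = [[random.random() for _ in range(4)] for _ in range(64)]
clusters = where_cluster(rows, math.dist)
```

Splitting stops once a cluster holds fewer than sqrt(N) rows, so 64 rows yield small leaf clusters in which local models can then be learned.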
Data mining (Menzies, MALETS ’11):
○ Poor method to confirm a hypothesis
○ Good method to refute a hypothesis (when the target is not in any model)
○ Great way to generate hypotheses (user meetings: “heh… that’s funny”)
More data does not actually help
Conclusions hold within data clusters (local vs. global)
PROMISE issue: not general models, but general methods for finding local models
Learners must be biased. No bias ⇒ no way to cull “dull” stuff ⇒ no summary ⇒ no model ⇒ no predictions. So bias makes us blind, but bias lets us see (the future). We need learners that are biased by the users’ goals.
ASE journal, 2010, 17(4)
Always re-learning
Dramatic improvements to learner performance via data-set-dependent tunings
Hyper-parameter optimization
Goals matter
New data?
No “best” model generator; no “best” prediction
Need to know the range of outputs; minimize variance in output
Conclusions that hold for all may not hold for one (so beware SLRs)
Not general models, but general methods for finding local models
Context is best uncovered automatically, not specified manually
Humans rarely use lessons from past projects to improve their future reasoning
“Size” metrics are useful, but not essential for accurate estimates
Model-based effort estimation: a new high-water mark
More data does not actually help