SLIDE 1 Better Machine Learning Through Data
Saleema Amershi, Machine Teaching Group, Microsoft Research
August 14, 2016
SLIDE 2
Making better sense of data. Better data makes better machine learning.
SLIDE 3
Data + Algorithm = Model
Data Model Algorithm
SLIDE 4
Data + Algorithm = Model
Data Model Algorithm Machine learning research often takes the data as given.
SLIDE 5 When Algorithms Discriminate – The New York Times, 2015 Big Data’s all-too-human failings – Reuters, 2016 Artificial Intelligence’s White Guy Problem
– The New York Times, 2016
Mapping Crime – Or Stirring Hate?– Financial Times, 2014
SLIDE 6
Making better sense of data. Better data makes better machine learning. The most influence practitioners have on machine learning is through data.
SLIDE 7
Data + Algorithm = Model
Data Model Algorithm In research, data is often taken as given.
SLIDE 8
Algorithm
Data + Algorithm = Model
Model Data In practice, the algorithm is often taken as given. In research, data is often taken as given.
SLIDE 9 Algorithm
Data + Algorithm = Model
Model Data
“Data scientists, according to interviews and expert estimates, spend 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data.”
– New York Times, 2014
In practice, the algorithm is often taken as given.
SLIDE 10
Algorithm
Data + Algorithm = Model
Model Data
SLIDE 11 [Patel et al., CHI 2008]
SLIDE 12
Algorithm
Data + Algorithm = Model
Model Data
SLIDE 13
Algorithm
Data + Algorithm = Model
Data Model Iterations are driven by evaluating models on data.
SLIDE 14
Algorithm
Data + Algorithm = Model
Model Data Iterations are driven by evaluating models on data. In practice, most effort is spent crafting input data.
SLIDE 15
Algorithm Model Data
Machine learning in theory
SLIDE 16 Algorithm Collect & Label Samples Create Features Evaluate Results
Machine learning in practice
SLIDE 17 Algorithm Evaluate Results Collect & Label Samples Create Features
SLIDE 18 Algorithm Evaluate Results Collect & Label Samples Create Features Structured Labeling [CHI 2014] Feature Insight [VAST 2015] ModelTracker [CHI 2015, VAST 2016]
SLIDE 19 Evaluate Results Create Features Algorithm Feature Insight [VAST 2015] ModelTracker [CHI 2015, VAST 2016] Collect & Label Samples Structured Labeling [CHI 2014]
SLIDE 20 Traditional Labeling
Pre-defined high-level categories.
[Diagram: a stream of items labeled into Cat / Not Cat in answer to “Is this a Cat?”]
SLIDE 36 Traditional Labeling
Pre-defined high-level categories.
[Diagram: items labeled into Cat / Not Cat]
Does not support concept evolution (refining the target concept as data is labeled).
SLIDE 37 How common is concept evolution?
Nine machine learning experts labeled the same 200 pages in two sessions (4 weeks apart). Average consistency: 81.7% (SD=6.8%). 6 out of 9 people’s labels changed significantly (via chi-square test of symmetry).
[Chart: per-participant labeling consistency, 25–100%]
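The test of symmetry used above can be sketched concretely. For binary labels it reduces to McNemar's test, which only looks at the items whose label flipped between sessions; the session data below is hypothetical, not the study's.

```python
# McNemar's chi-square test of symmetry for two labeling sessions.
# Only discordant pairs matter:
#   b = items labeled "cat" in session 1 but "not cat" in session 2
#   c = items labeled "not cat" in session 1 but "cat" in session 2

def mcnemar_chi2(session1, session2):
    """Chi-square statistic for label symmetry between two sessions."""
    b = sum(1 for s1, s2 in zip(session1, session2)
            if s1 == "cat" and s2 == "not cat")
    c = sum(1 for s1, s2 in zip(session1, session2)
            if s1 == "not cat" and s2 == "cat")
    if b + c == 0:
        return 0.0  # perfectly symmetric: no label changes
    return (b - c) ** 2 / (b + c)

# Hypothetical labeler: 15 items flipped cat -> not cat, 5 flipped back,
# 80 labeled the same way both times (80% consistency).
s1 = ["cat"] * 15 + ["not cat"] * 5 + ["cat"] * 80
s2 = ["not cat"] * 15 + ["cat"] * 5 + ["cat"] * 80
chi2 = mcnemar_chi2(s1, s2)
# chi2 = (15 - 5)**2 / 20 = 5.0, which exceeds 3.84 (df=1, alpha=.05):
# this labeler's concept changed significantly between sessions.
```

A statistic this simple makes concept evolution measurable: a labeler can be highly "consistent" overall while still drifting systematically in one direction.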
SLIDE 38 Proposed Solution – Structured Labeling
Enable people to explicitly organize their concept via grouping and tagging within a traditional labeling scheme.
SLIDE 39 Traditional Labeling
Pre-defined high-level categories.
[Diagram: items labeled directly into Cat / Not Cat]
SLIDE 40 Structured Labeling
[Diagram: the same labeling task, with items organized into groups “Definitely Cat” and “Definitely Not Cat” within the high-level categories]
SLIDE 46 Structured Labeling
[Diagram: items organized into tagged groups such as “Definitely Cat”, “Lions”, “Blogs”, “Cat Poster”, and “Definitely Not Cat” within the high-level categories]
Grouping within high-level categories. User-provided tags on groups aid recall.
Can move, merge, and split groups as desired.
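The operations above can be sketched as a small data structure: groups nested under the high-level categories, with move and merge operations. Class and method names here are hypothetical, not the paper's implementation.

```python
# Minimal sketch of structured labeling state: each high-level category
# holds named groups of items; groups can be moved and merged.

class StructuredLabels:
    def __init__(self, categories):
        # e.g. {"cat": {"Definitely Cat": [...]}, "not cat": {...}}
        self.groups = {c: {} for c in categories}

    def add(self, category, group, item):
        self.groups[category].setdefault(group, []).append(item)

    def merge(self, category, src, dst):
        """Fold group `src` into group `dst` within the same category."""
        self.groups[category].setdefault(dst, []).extend(
            self.groups[category].pop(src, []))

    def move(self, category, group, item, new_category, new_group):
        """Relabel an item by moving it to another group."""
        self.groups[category][group].remove(item)
        self.add(new_category, new_group, item)

labels = StructuredLabels(["cat", "not cat"])
labels.add("cat", "Definitely Cat", "img_01")
labels.add("cat", "Lions", "img_02")
labels.add("not cat", "Cat Poster", "img_03")
labels.merge("cat", "Lions", "Definitely Cat")
labels.move("not cat", "Cat Poster", "img_03", "not cat", "Definitely Not Cat")
# The final label of every item is still just its category, so the
# structure adds organization without changing the labeling scheme.
```

Note that collapsing the structure (ignoring groups and tags) recovers an ordinary labeled dataset, which is why this can sit inside a traditional labeling workflow.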
SLIDE 47 Assisted Structured Labeling
[Diagram: the structured labeling interface with a grouping recommendation highlighted]
Grouping recommendations to improve label consistency.
SLIDE 48 Assisted Structured Labeling
[Diagram: the interface showing similar items alongside the item being labeled]
Grouping recommendations to improve label consistency. Similar items to help users make decisions.
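One plausible way to generate such grouping recommendations (the paper's actual method may differ) is to suggest the existing group whose centroid is most similar to the item being labeled. All feature vectors below are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (math.sqrt(sum(a * a for a in u)) *
            math.sqrt(sum(b * b for b in v)))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def recommend_group(item_vec, groups):
    """Return the group whose centroid is most similar to item_vec."""
    return max(groups, key=lambda g: cosine(item_vec, centroid(groups[g])))

# Hypothetical feature vectors for already-grouped items.
groups = {
    "Lions":      [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "Cat Poster": [[0.1, 0.9, 0.8], [0.0, 0.8, 0.9]],
}
print(recommend_group([0.85, 0.15, 0.05], groups))  # -> Lions
```

The recommendation nudges the labeler toward placing similar items together, which is exactly the mechanism by which consistency improves.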
SLIDE 49 Findings
People revised labels significantly more with structured labeling. People labeled more consistently. People preferred it over traditional labeling.
[Charts: label consistency (X2=6.53, df=2, p<.038; X2=20.19, df=2, p<.001), mean # groups, and # revisions (X2=12, df=2, p<.002)]
SLIDE 50 Structured Labeling Summary
Current tools do not support concept evolution. Structured labeling helps people refine their concepts by surfacing labeling decisions and aiding recall. People used structured labeling when it was available and labeled more consistently. Structure contains additional information (e.g., group-related features, group-related accuracy, decisions made…).
SLIDE 51 Evaluate Results Algorithm ModelTracker [CHI 2015, VAST 2016] Collect & Label Samples Create Features Feature Insight [VAST 2015] Structured labeling improves consistency [CHI 2014]
SLIDE 52 “At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used.”
[Domingos, CACM 2012]
…yet, little guidance or best practices exist.
SLIDE 53
How do people come up with features?
Look for features used in related domains. Use intuition or domain knowledge. Apply automated techniques. Feature ideation – think of and experiment with custom features (a “black art”).
SLIDE 54
Proposed Solution – Feature Insight
Support compare and contrast of data.
SLIDE 55
What makes a cat a cat?
SLIDE 56
What makes a cat a cat?
SLIDE 57
Proposed Solution – Feature Insight
Support compare and contrast of data. Comparing pairs vs sets?
SLIDE 58 Comparing Pairs vs Sets
Sets may help people think of generalizable features.
[Diagram: a single positive/negative pair vs. sets of positives and negatives]
SLIDE 59
Proposed Solution – Feature Insight
Support compare and contrast of data. Comparing pairs vs sets? Raw data vs visual summaries?
SLIDE 60 Looking at Raw Data vs. Visual Summaries
Visual summaries may reveal relevant characteristics and hide irrelevant noise.
[Diagram: raw data vs. a visual summary of the same data]
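As a concrete sketch of what a set-level summary can surface: aggregate token statistics over the positive and negative sets highlight candidate features that no single pair comparison would reveal. The documents and function names below are hypothetical, not from Feature Insight.

```python
from collections import Counter

def class_frequencies(documents):
    """Fraction of documents in which each token appears."""
    counts = Counter(tok for doc in documents for tok in set(doc.split()))
    return {tok: c / len(documents) for tok, c in counts.items()}

def discriminative_tokens(positives, negatives, k=3):
    """Tokens whose document frequency differs most between the sets."""
    pos, neg = class_frequencies(positives), class_frequencies(negatives)
    tokens = set(pos) | set(neg)
    return sorted(tokens,
                  key=lambda t: abs(pos.get(t, 0) - neg.get(t, 0)),
                  reverse=True)[:k]

# Toy "cat page" vs "not cat page" documents.
positives = ["cat whiskers fur", "cat fur tail", "whiskers fur cat"]
negatives = ["dog fur tail", "dog bone tail", "dog fur bone"]
top = discriminative_tokens(positives, negatives)
# "cat" and "dog" separate the sets perfectly; "fur" and "tail",
# common to both, rank low -- the summary hides irrelevant noise.
```

This is the set-vs-pair argument in miniature: any single positive/negative pair also differs on "fur" or "tail", but only the aggregate view shows those differences do not generalize.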
SLIDE 61
[Study design: Raw Data vs. Visual Summaries, crossed with Individual vs. Set Comparison]
SLIDE 62 [Charts: classifier performance (p<.01), feature count, and preference rank (smaller is better, p=.03) for the four conditions Raw + Individual, Raw + Set, Visual + Individual, Visual + Set]
Findings
Visual summaries led to better features. Visual summaries were preferred over looking at raw data. Sets were useful only in combination with visual summaries.
SLIDE 63
Feature Insight Summary
Featuring is arguably the most important step in machine learning, but there is little guidance on feature ideation. Feature Insight supports error comparison, examination of sets, and visual summaries. Visual summaries help people create better quality features.
SLIDE 64 Algorithm Collect & Label Samples Structured Labeling [CHI 2014] Create Features Feature Insight [VAST 2015] Evaluate Results ModelTracker [CHI 2015, VAST 2016]
SLIDE 65 Algorithm Evaluate Results ModelTracker [CHI 2015, VAST 2016] Collect & Label Samples Structured Labeling [CHI 2014] Create Features Feature Insight [VAST 2015]
How do people evaluate performance?
SLIDE 66 Algorithm Evaluate Results Collect & Label Samples Create Features
[Summary statistics: 0.71, 0.67, 0.70, ???]
Confusion matrix: actual positive: 143 predicted positive, 72 predicted negative; actual negative: 35 predicted positive, 190 predicted negative.
How do people evaluate performance?
Summary statistics hide important information about model behavior.
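The point can be made concrete with the confusion matrix on the slide: several defensible summary statistics are derivable from the same four counts, and none of them identifies which examples fail or why.

```python
# Confusion matrix from the slide: rows = actual, columns = predicted.
tp, fn = 143, 72   # actual positive
fp, tn = 35, 190   # actual negative

accuracy  = (tp + tn) / (tp + fn + fp + tn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
# prints: accuracy=0.76 precision=0.80 recall=0.67 f1=0.73
# Four different "scores" for one model -- and none of them says
# *which* 107 examples were misclassified, or whether they share a cause.
```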
SLIDE 68 Algorithm Evaluate Results Collect & Label Samples Create Features
Summary statistics hide important information about model behavior. Switching tools to examine data is disruptive and leads to a trial-and-error approach [Patel et al., AAAI 2008].
How do people evaluate performance?
SLIDE 69
Example: Predicting Income Levels
SLIDE 70
SLIDE 71
Decision Tree 86% Accuracy Support Vector Machine 85% Accuracy
SLIDE 73
ModelTracker Demo
SLIDE 74 Significantly faster and more accurate performance analysis
[Chart: ModelTracker vs. a common confusion matrix display]
SLIDE 75
ModelTracker Summary
Current tools for performance analysis and debugging hide a lot of important information about model behavior. ModelTracker supports estimating performance at multiple levels of granularity while enabling direct access to data. People are significantly faster and more accurate at performance analysis with ModelTracker.
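ModelTracker's core idea of laying every example out by its prediction score can be sketched in a few lines of text-mode Python. The scores and labels below are made up; this illustrates the layout, not the actual tool.

```python
# Text-mode sketch of a ModelTracker-style view: examples binned by
# prediction score, with true labels shown per bin so mistakes are
# visible where they cluster (high-score negatives, low-score positives).

def score_bins(examples, n_bins=5):
    """examples: list of (score in [0, 1], true_label) pairs."""
    bins = [[] for _ in range(n_bins)]
    for score, label in examples:
        idx = min(int(score * n_bins), n_bins - 1)
        bins[idx].append(label)
    return bins

examples = [(0.05, "-"), (0.15, "-"), (0.35, "-"), (0.45, "+"),
            (0.55, "-"), (0.65, "+"), (0.85, "+"), (0.95, "+")]
for i, labels in enumerate(score_bins(examples)):
    lo, hi = i / 5, (i + 1) / 5
    print(f"[{lo:.1f}-{hi:.1f}) {''.join(labels)}")
```

Unlike a confusion matrix, this view keeps every example addressable: a suspicious symbol in a bin can be clicked (here, indexed) to reach the underlying data directly, which is the "direct access" the summary above refers to.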
SLIDE 76 Algorithm Evaluate Results ModelTracker [CHI 2015, VAST 2016] Collect & Label Samples Structured Labeling [CHI 2014] Create Features Feature Insight [VAST 2015]
SLIDE 77 [Diagram: the broader pipeline: Collect, Clean, Label, Feature, Train, Tune, Evaluate, Deploy]
Many more opportunities to better support machine learning in practice.
SLIDE 79 [Diagram: the broader pipeline: Collect, Clean, Label, Feature, Train, Tune, Evaluate, Deploy]
Many more opportunities to better support machine learning in practice and theory.
SLIDE 80
Making better sense of data. Better data means better machine learning. The most influence practitioners have on machine learning is through data. Many more opportunities!
SLIDE 81 Better Machine Learning Through Data
Saleema Amershi, samershi@microsoft.com
Machine Teaching Group, Microsoft Research
August 14, 2016
Thanks! Questions?