
Better Machine Learning Through Data - Saleema Amershi



  1. Better Machine Learning Through Data. Saleema Amershi, Machine Teaching Group, Microsoft Research. August 14, 2016

  2. Making better sense of data. Better data makes better machine learning.

  3. Data + Algorithm = Model

  4. Data + Algorithm = Model. Machine learning research often takes the data as given.

  5. When Algorithms Discriminate – The New York Times, 2015; Big Data’s all-too-human failings – Reuters, 2016; Artificial Intelligence’s White Guy Problem – The New York Times, 2016; Mapping Crime – Or Stirring Hate? – Financial Times, 2014

  6. Making better sense of data. Better data makes better machine learning. Most influence practitioners have on machine learning is through data.

  7. Data + Algorithm = Model. In research, data is often taken as given.

  8. Data + Algorithm = Model. In research, data is often taken as given. In practice, the algorithm is often taken as given.

  9. Data + Algorithm = Model. In practice, the algorithm is often taken as given. “Data scientists, according to interviews and expert estimates, spend 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data.” - New York Times, 2014

  10. Data + Algorithm = Model

  11. [Patel et al., CHI 2008]

  12. Data + Algorithm = Model

  13. Data + Algorithm = Model. Iterations are driven by evaluating models on data.

  14. Data + Algorithm = Model. Iterations are driven by evaluating models on data. In practice, most effort is spent crafting input data.

  15. Machine learning in theory: Data, Algorithm, Model.

  16. Machine learning in practice: Collect & Label Samples, Create Features, Algorithm, Evaluate Results.

  17. Collect & Label Samples, Create Features, Algorithm, Evaluate Results.

  18-19. Collect & Label Samples, Create Features, Algorithm, Evaluate Results. Structured Labeling [CHI 2014], Feature Insight [VAST 2015], ModelTracker [CHI 2015, VAST 2016].

  20-35. Traditional Labeling. Is this a Cat? Pre-defined high-level categories: Cat, Not Cat. [Animation: pages are labeled one at a time; the two category counters rise from 0 and 0 to 4 and 3.]

  36. Traditional Labeling. Is this a Cat? Pre-defined high-level categories: Cat, Not Cat (counters at 4 and 3). Does not support concept evolution (refining the target concept as data is observed).

  37. How common is concept evolution? Nine machine learning experts labeled the same 200 pages in two sessions (4 weeks apart). Average consistency: 81.7% (SD=6.8%). 6 out of 9 people’s labels changed significantly (via Chi-Square test of symmetry). [Chart: consistency (0 to 100) per participant.]
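
As a rough illustration of the measurement on the slide above, the sketch below computes label consistency between two sessions and a chi-square test of symmetry. It assumes binary labels, in which case the symmetry test reduces to McNemar's test. The function names and the ten example labels are made up for illustration; they are not the study's data or analysis code.

```python
import math


def consistency(session_a, session_b):
    """Fraction of items that received the same label in both sessions."""
    if len(session_a) != len(session_b):
        raise ValueError("both sessions must label the same items")
    same = sum(a == b for a, b in zip(session_a, session_b))
    return same / len(session_a)


def mcnemar_symmetry(session_a, session_b, positive="cat"):
    """Chi-square test of symmetry for binary labels (McNemar, no correction).

    Returns (statistic, p_value). Only items whose label flipped between
    sessions contribute to the statistic.
    """
    pairs = list(zip(session_a, session_b))
    b = sum(x == positive and y != positive for x, y in pairs)  # cat -> not
    c = sum(x != positive and y == positive for x, y in pairs)  # not -> cat
    if b + c == 0:
        return 0.0, 1.0
    stat = (b - c) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Made-up labels for ten pages, four weeks apart (illustrative only).
first = ["cat"] * 6 + ["not"] * 4
second = ["cat"] * 2 + ["not"] * 8

rate = consistency(first, second)
stat, p_value = mcnemar_symmetry(first, second)
print(f"consistency={rate:.2f}, chi2={stat:.2f}, p={p_value:.3f}")
```

With more than two label values, the general form of the symmetry test (Bowker's test) would be needed instead.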

  38. Proposed Solution – Structured Labeling. Enable people to explicitly organize their concept via grouping and tagging within a traditional labeling scheme.

  39. Traditional Labeling. Is this a Cat? Pre-defined high-level categories: Cat (2), Not Cat (2).

  40-41. Structured Labeling. Is this a Cat? Cat: Definitely Cat (2). Not Cat: Definitely Not Cat (2).

  42. Structured Labeling. Is this a Cat? Grouping within high-level categories. User provided tags on groups aid recall. Cat: Definitely Cat (2), Cat Poster (1). Not Cat: Definitely Not Cat (2).

  43-44. Structured Labeling. Is this a Cat? Grouping within high-level categories. User provided tags on groups aid recall. Cat: Definitely Cat (2), Cat Poster (1), Blogs (2). Not Cat: Definitely Not Cat (2), Lions (2).

  45. Structured Labeling. Is this a Cat? Grouping within high-level categories. User provided tags on groups aid recall. Cat: Definitely Cat (2), Cat Poster (2), Blogs (2). Not Cat: Definitely Not Cat (2), Lions (2).

  46. Structured Labeling. Is this a Cat? Grouping within high-level categories. User provided tags on groups aid recall. Can move, merge and split groups as desired. Cat: Definitely Cat (2), Blogs (2). Not Cat: Definitely Not Cat (2), Lions (2), Cat Poster (2).
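
One way to picture the grouping operations described above is as a small data structure: tagged groups that each sit under one high-level category and can be moved, merged, or split. This is a minimal sketch under assumptions. The names `Group`, `StructuredLabels`, `move`, `merge`, and `split` are hypothetical and do not come from the tool shown in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    tag: str                  # user-provided tag, e.g. "Cat Poster"
    category: str             # high-level category, e.g. "Cat" or "Not Cat"
    items: list = field(default_factory=list)


class StructuredLabels:
    def __init__(self):
        self.groups = {}      # tag -> Group

    def add(self, item, tag, category):
        """Put an item in a tagged group, creating the group if needed.

        If the tag already exists, the group's current category is kept.
        """
        group = self.groups.setdefault(tag, Group(tag, category))
        group.items.append(item)

    def move(self, tag, new_category):
        """Relabel every item in a group with a single decision."""
        self.groups[tag].category = new_category

    def merge(self, keep_tag, absorbed_tag):
        """Fold one group into another; the absorbed tag disappears."""
        absorbed = self.groups.pop(absorbed_tag)
        self.groups[keep_tag].items.extend(absorbed.items)

    def split(self, tag, new_tag, items):
        """Move the selected items into a new group in the same category."""
        source = self.groups[tag]
        moving = [i for i in source.items if i in items]
        source.items = [i for i in source.items if i not in items]
        self.groups[new_tag] = Group(new_tag, source.category, moving)

    def labels(self):
        """Flatten the structure to traditional item -> category labels."""
        return {i: g.category for g in self.groups.values() for i in g.items}


store = StructuredLabels()
store.add("page1", "Definitely Cat", "Cat")
store.add("page2", "Cat Poster", "Cat")
store.add("page3", "Cat Poster", "Cat")
store.add("page4", "Lions", "Not Cat")

# The labeler decides that posters of cats are not cats after all.
store.move("Cat Poster", "Not Cat")
flat = store.labels()
print(flat)
```

Because the category is stored on the group rather than on each item, a revised decision changes every member's label at once.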

  47. Assisted Structured Labeling. Is this a Cat? Grouping recommendations to improve label consistency. Cat: Definitely Cat (2), Blogs (2). Not Cat: Definitely Not Cat (2), Lions (2), Cat Poster (2).

  48. Assisted Structured Labeling. Is this a Cat? Grouping recommendations to improve label consistency. Similar items to help users make decisions. Cat: Definitely Cat (2), Blogs (2). Not Cat: Definitely Not Cat (2), Lions (2), Cat Poster (2).
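
The slides do not say how grouping recommendations or similar items are computed. A minimal sketch, assuming each page is represented as a list of words and compared with Jaccard similarity (both are assumptions for illustration, as are the function names and example pages):

```python
def jaccard(words_a, words_b):
    """Jaccard similarity between two word collections (0.0 to 1.0)."""
    a, b = set(words_a), set(words_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_group(item_words, groups):
    """Return (tag, score) for the group whose members are most similar on average."""
    best_tag, best_score = None, -1.0
    for tag, members in groups.items():
        if not members:
            continue
        score = sum(jaccard(item_words, m) for m in members) / len(members)
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag, best_score


def similar_items(item_words, labeled, k=3):
    """Return the names of the k already-labeled items closest to the new item."""
    ranked = sorted(
        labeled.items(),
        key=lambda pair: jaccard(item_words, pair[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical pages, already grouped by the labeler.
groups = {
    "Cat Poster": [["cat", "poster", "print", "buy"],
                   ["kitten", "poster", "wall", "buy"]],
    "Lions": [["lion", "safari", "africa"],
              ["lion", "pride", "savanna"]],
}
labeled = {
    "poster_shop": ["cat", "poster", "print", "buy"],
    "kitten_poster": ["kitten", "poster", "wall", "buy"],
    "lion_safari": ["lion", "safari", "africa"],
}
new_page = ["cat", "poster", "frame", "buy"]

tag, score = recommend_group(new_page, groups)
neighbors = similar_items(new_page, labeled, k=2)
print(tag, round(score, 3), neighbors)
```

A real system would likely use richer features than raw word overlap; the point here is only that a recommendation can be a ranking of existing groups by similarity to the item being labeled.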

  49. Findings. People revised labels significantly more with structured labeling. People labeled more consistently. People preferred it over traditional labeling. [Charts: Label Consistency, Mean # Groups, # Revisions. Reported statistics, in slide order: χ²=12, df=2, p < .002; χ²=6.53, df=2, p < .038; χ²=20.19, df=2, p < .001.]

  50. Structured Labeling Summary. Current tools do not support concept evolution. Structured labeling helps people refine their concepts by surfacing labeling decisions and aiding recall. People used structured labeling when it was available and labeled more consistently. Structure contains additional information (e.g., group related features, group related accuracy, decisions made…)

  51. Collect & Label Samples, Create Features, Algorithm, Evaluate Results. Structured labeling improves consistency [CHI 2014], Feature Insight [VAST 2015], ModelTracker [CHI 2015, VAST 2016].

  52. “At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used.” [Domingos, CACM 2012] …yet, little guidance or best practices exist.

  53. How do people come up with features? Look for features used in related domains. Use intuition or domain knowledge. Apply automated techniques. Feature ideation – Think of and experiment with custom features (a “black art”).

  54. Proposed Solution – Feature Insight. Support compare and contrast of data.

  55-56. What makes a cat a cat?

  57. Proposed Solution – Feature Insight. Support compare and contrast of data. Comparing pairs vs sets?

  58. Comparing Pairs vs Sets. Sets may help people think of generalizable features. [Figure: one positive vs one negative (pair) next to positives vs negatives (sets).]

  59. Proposed Solution – Feature Insight. Support compare and contrast of data. Comparing pairs vs sets? Raw data vs visual summaries?

  60. Looking at Raw Data vs. Visual Summaries. Visual summaries may reveal relevant characteristics and hide irrelevant noise. [Figure: Raw Data vs Visual Summary.]
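
To make the idea of comparing sets through a summary concrete, the sketch below contrasts a set of positive documents with a set of negative ones by how often each word appears in each set. It assumes plain-text documents and uses document frequency as the summary. This is an illustrative stand-in, not how Feature Insight builds its visual summaries, and the function names and example documents are made up.

```python
from collections import Counter


def document_frequency(docs):
    """Fraction of documents in a non-empty set that contain each word."""
    counts = Counter(word for doc in docs for word in set(doc.split()))
    return {word: n / len(docs) for word, n in counts.items()}


def contrast_sets(positives, negatives, top=3):
    """Rank words by how much more common they are in positives than negatives."""
    pos = document_frequency(positives)
    neg = document_frequency(negatives)
    diff = {w: pos.get(w, 0.0) - neg.get(w, 0.0) for w in set(pos) | set(neg)}
    # Sort by descending difference, then alphabetically for a stable order.
    ranked = sorted(diff.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top]


# Hypothetical example documents.
positives = ["my cat purrs", "cat food review", "adopt a cat today"]
negatives = ["dog food review", "lion safari today"]

candidates = contrast_sets(positives, negatives, top=3)
print(candidates)
```

Words shared by both sets, such as "food" or "today" here, rank low, which mirrors the slide's point that a summary can hide irrelevant noise while surfacing candidate features.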
