SLIDE 1

Foster Provost – 11/17/17

So You’ve Built a Machine Learning Model…

Now What?

Foster Provost

Thanks to Josh Attenberg, Henry Chen, Brian Dalessandro, Sam Fraiberger, Thore Graepel, Panos Ipeirotis, Michal Kosinski, David Martens, Claudia Perlich, David Stillwell

The Data Science Process is a useful framework for thinking through lots of modeling & managerial decisions about solving problems with AI/Machine Learning/Data Science

For more, see Data Science for Business, Provost & Fawcett, O’Reilly Media, 2013

SLIDE 2

Just a few issues:

  • Misalignment of problem formulation
  • Leakage in features
  • Sampling bias
  • Learning bias (ML favors larger subpopulations)
  • Labeling bias
  • Evaluation bias
In Reality…

SLIDE 3

In this talk I’ll focus on two common problems faced when deploying machine learned models:

  • Lack of transparency into why model-driven systems make the decisions that they do
    – important for a whole bunch of reasons: user acceptance, managerial acceptance, debugging/improving
    – of current interest: are your decisions fair?
  • “Unknown Unknowns”
    – do you know what your model is missing? Especially what it’s missing and “thinks” it’s getting right?

Gabrielle Giffords shooting, Tucson, AZ, Jan 2011

SLIDE 4

Why was Mariko shown this Pottery Barn ad?

SLIDE 5

Why was this decision made?

[Diagram: evidence → data-driven model → decision, with “?” arrows from the Customer, the Manager, and the Data Science Team]

Explanations for whom?

SLIDE 6

The Complex World of Models

(Martens & FP, “Explaining Data-driven Document Classification.” MISQ 2014)

A notion of explanation: The Evidence Counterfactual

  • Models can be viewed as evidence-combining systems
  • We are considering cases where individual pieces of evidence are interpretable
  • Thus, for any specific decision* from any model we can ask:

What is a minimal set of evidence such that if it were not present, the decision* would not have been made?

*The “decision” can be a threshold crossing for a probability estimation, scoring, or regression model

see (Martens & FP MISQ 2014); (Chen, Moakler, Fraiberger, FP, Big Data 2017); (Moeyersoms et al.; Chen et al.; ICML’16 Wkshp on Human Interpretability in ML); (cf. Hume 1748)
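For a linear model, the Evidence Counterfactual can be sketched as a greedy search: drop the strongest pieces of positive evidence until the decision score falls back below the threshold. This is a minimal illustration in that spirit, not a faithful reimplementation of the published algorithm; the feature weights and toy instance below are invented.

```python
# Greedy evidence-counterfactual sketch for a linear scoring model.
import numpy as np

def evidence_counterfactual(weights, bias, x, threshold=0.0):
    """Greedily find a small set of active features whose removal
    pushes the decision score back below the threshold."""
    x = x.copy().astype(float)
    removed = []
    # Consider active features with positive evidence, strongest first.
    order = sorted((i for i in range(len(x)) if x[i] != 0 and weights[i] > 0),
                   key=lambda i: -weights[i] * x[i])
    for i in order:
        if weights @ x + bias <= threshold:
            break
        x[i] = 0.0          # "remove" this piece of evidence
        removed.append(i)
    if weights @ x + bias <= threshold:
        return removed      # small (greedy, not guaranteed minimal) set
    return None             # decision cannot be flipped this way

# Toy example: 4 interpretable features (e.g., visits to 4 websites).
w = np.array([2.0, 1.5, 0.5, -1.0])
x = np.array([1.0, 1.0, 1.0, 1.0])   # all evidence present; score = 3.0
print(evidence_counterfactual(w, 0.0, x))   # [0, 1]
```

Removing features 0 and 1 drops the score from 3.0 to −0.5, so those two websites are the (greedy) counterfactual explanation for the positive decision.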

SLIDE 7

Why was Mariko shown this Pottery Barn ad?

Because she visited:

  • www.diningroomtableshowroom.com
  • www.mazeltovfurniture.com
  • www.realtor.com
  • www.recipezaar.com
  • www.americanidol.com
SLIDE 8

Let’s focus on the developers

Explanations aid the data science process:

  • Help to understand false positives – often revealing problems with the training data
  • Can reveal problems with the model

SLIDE 9

With the increasing use of predictive models built from massive fine-grained behavior data, consumers are increasingly concerned about the inferences drawn about them.

Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 110(15), 5802–5805.

SLIDE 10

Effect of removing selected Facebook Likes from consideration by the predictive model

Two guys predicted to be gay:

Model: logistic regression on the top 100 latent dimensions from an SVD of the user/Like matrix.

(Chen, Moakler, Fraiberger, … Big Data 2017) (Chen, et al., ICML Wkshp Interpretability 2016)
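The modeling pipeline just described – logistic regression on the top latent dimensions of an SVD of the user/Like matrix – can be sketched as follows. Everything here is synthetic: the matrix is random, the trait labels are random, and k=5 stands in for the talk’s 100 dimensions.

```python
# Sketch: logistic regression on the top-k SVD dimensions of a
# synthetic binary user-by-Like matrix (all data invented).
import numpy as np

rng = np.random.default_rng(0)
n_users, n_likes, k = 200, 50, 5

M = (rng.random((n_users, n_likes)) < 0.1).astype(float)  # user/Like matrix
U, s, Vt = np.linalg.svd(M, full_matrices=False)
Z = U[:, :k] * s[:k]                 # users projected onto top-k latent dims

y = (rng.random(n_users) < 0.3).astype(float)  # synthetic binary trait

# Plain gradient-descent logistic regression on the latent features.
w, b = np.zeros(k), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
    w -= 0.5 * (Z.T @ (p - y)) / n_users
    b -= 0.5 * np.mean(p - y)

p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
print("predicted positives:", int((p > 0.5).sum()))
```

“Removing a Like” in this setup means zeroing an entry of the user’s row of M, re-projecting through the SVD, and re-scoring – which is exactly the knob the cloaking idea on the next slides turns.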

SLIDE 11

Why was this guy predicted to be smart?

Opportunity for offering users control via a “cloaking device”?

Effect of removing selected Likes from consideration by the predictive model – False Positives

(Chen, Moakler, Fraiberger, … Big Data 2017) (Chen, et al., ICML Wkshp Interpretability 2016)

SLIDE 12

But there’s a twist…

A firm could purport to give users transparency and control… but actually make it cumbersome for users to affect the inferences drawn about them:

(Chen, Moakler, Fraiberger, … Big Data 2017) (Chen, et al., ICML Wkshp Interpretability 2016)

SLIDE 13

So: explanations of individual decisions can help with many issues in the process of building and using machine learned models. But we need more help with one very important problem…

The problem of Unknown Unknowns

  • What is your model missing? What is it missing while really “thinking” that it’s correct?
  • Why would it be missing things?
SLIDE 14

We need to think carefully about the data-generating process(es) and the data preparation processes – especially the process of getting labeled training & testing data.

The problem of Unknown Unknowns

  • What is your model missing? What is it missing while really “thinking” that it’s correct?
  • Why would it be missing things?
    – Sampling bias
    – Learning bias (ML favors larger subpopulations)
    – Labeling bias
    – Especially severe for non-self-revealing problems

(Attenberg, Ipeirotis & Provost JDIQ 2015)

SLIDE 15

Harness Humans to Improve Machine Learning

  • With normal labeling, humans are passively labeling the data that we give them
  • Instead, ask humans to search for and find positive instances of a rare class

Searching instead of labeling has intriguing performance

(Attenberg & FP KDD 2010)
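The intuition behind search-instead-of-label can be shown with a toy simulation (not from the paper): under extreme class imbalance, a fixed labeling budget spent on randomly drawn instances yields almost no positives, while “search” – idealized here as an oracle that retrieves positives directly – fills the budget with them.

```python
# Toy comparison of labeling vs. searching under extreme class imbalance.
import numpy as np

rng = np.random.default_rng(1)
population = (rng.random(100_000) < 0.001)  # ~0.1% positive rate
budget = 100                                # human-effort budget

# Strategy 1: spend the budget labeling randomly drawn instances.
random_sample = rng.choice(population, size=budget, replace=False)
positives_by_labeling = int(random_sample.sum())

# Strategy 2: ask humans to search for positives (idealized oracle
# that always finds one per unit of effort, until they run out).
positives_by_search = min(budget, int(population.sum()))

print(positives_by_labeling, positives_by_search)
```

With a 0.1% base rate, random labeling of 100 instances finds roughly zero positives in expectation, while search fills essentially the whole budget – which is why searching is so attractive for building classifiers for rare classes.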

SLIDE 16

Active learning missing disjunctive subconcepts

(Attenberg & FP KDD 2010)

NIPS 2016

SLIDE 17

Better, but…

  • Classifier seems great: cross-validation tests show excellent performance
  • Alas, the classifier fails on “unknown unknowns”

“Unknown unknowns” → classifier fails with high confidence

(Attenberg, Ipeirotis & Provost JDIQ 2015)
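A small synthetic experiment makes the failure mode concrete: when the training sample only ever contains one kind of positive (sampling/labeling bias), in-sample evaluation – standing in here for a cross-validation estimate – looks excellent, yet an unseen positive subconcept is scored as negative with high confidence. All numbers below are invented for illustration.

```python
# Toy demo: biased training data -> great-looking evaluation, but
# confidently wrong predictions on an unseen positive subconcept.
import numpy as np

rng = np.random.default_rng(2)

# Training sample: negatives near x=0, positives only near x=+2.
X = np.concatenate([rng.normal(0.0, 0.5, 500), rng.normal(2.0, 0.5, 500)])
y = np.concatenate([np.zeros(500), np.ones(500)])

# Fit 1-D logistic regression by gradient descent.
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(w * X + b)))
    w -= 0.1 * np.mean((p - y) * X)
    b -= 0.1 * np.mean(p - y)

# In-sample accuracy (proxy for a cross-validation estimate) is high...
acc = np.mean(((1 / (1 + np.exp(-(w * X + b)))) > 0.5) == y)

# ...but an unseen positive subconcept near x=-2 gets near-zero
# predicted probability: the model "knows" these are negatives.
unknowns = rng.normal(-2.0, 0.5, 100)
p_unknown = 1 / (1 + np.exp(-(w * unknowns + b)))
print(round(float(acc), 3), round(float(p_unknown.mean()), 3))
```

Because the held-out folds come from the same biased sample, no amount of cross-validation reveals the missing subconcept – the evaluation itself shares the blind spot.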

SLIDE 18

Beat the Machine!

Ask humans to find examples that:

  • the classifier will classify incorrectly
  • another human will classify correctly

Example: Find hate speech pages that the machine will classify as benign

Incentive structure:

  • $1 if you “beat the machine”
  • $0.001 if the machine already knows

(Attenberg, Ipeirotis & Provost JDIQ 2015)
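The incentive rule above boils down to a tiny payout function: the big reward is paid only when the machine’s label disagrees with the human’s (i.e., the machine was beaten). The function name and payout handling below are illustrative, not from the paper’s actual system.

```python
# Sketch of the Beat-the-Machine payout rule (illustrative only).
def btm_payout(machine_label, human_label, beat_reward=1.0, consolation=0.001):
    """Pay $1 when the submitted example beats the machine,
    $0.001 when the machine already classifies it correctly."""
    return beat_reward if machine_label != human_label else consolation

# The machine calls a hate-speech page "benign"; the human annotator
# (treated as ground truth) says "hate speech" -> the machine is beaten.
print(btm_payout("benign", "hate speech"))   # 1.0
print(btm_payout("benign", "benign"))        # 0.001
```

The 1000:1 payout ratio is what pushes workers to hunt for the model’s high-confidence mistakes rather than submit easy examples.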

SLIDE 19

AAAI 2017

(Attenberg, Ipeirotis & Provost JDIQ 2015)

SLIDE 20

Summary

  • We can provide transparency into the reasons why AI systems make the decisions that they do
  • We can create mechanisms to help find the “Unknown Unknowns”
  • As a research area, there’s still a lot to do

Some reading

  • Martens, D. & Provost, F. “Explaining Data-driven Document Classification.” MISQ 2014.
  • Moeyersoms et al. ICML’16 Wkshp on Human Interpretability in ML, 2016.
  • Chen et al. ICML’16 Wkshp on Human Interpretability in ML, 2016.
  • Chen, Fraiberger, Moakler, Provost. Big Data 5(3), 2017.
  • Attenberg, J. & Provost, F. “Why Label When You Can Search? Alternatives to Active Learning for Applying Human Resources to Build Classification Models under Extreme Class Imbalance.” KDD 2010.
  • Attenberg, J., Ipeirotis, P. & Provost, F. “Beat the Machine: Challenging Humans to Find a Predictive Model’s ‘Unknown Unknowns’.” JDIQ 6(1), 2015.