Machine Learning: Chenhao Tan, University of Colorado Boulder


  1. Machine Learning: Chenhao Tan, University of Colorado Boulder. Lecture 2. Slides adapted from Jordan Boyd-Graber, Thorsten Joachims, Kilian Weinberger. Machine Learning: Chenhao Tan | Boulder | 1 of 31

  2. Logistics
      • Piazza: https://piazza.com/colorado/fall2017/csci5622/
      • Moodle: https://moodle.cs.colorado.edu/course/view.php?id=507
      • Prerequisite quiz
      • Final project
      • iClicker

  3. Outline
      • Supervised Learning
      • Data representation
      • K-nearest neighbors: Overview, Performance Guarantee, Curse of Dimensionality

  4. Supervised Learning

  5. Supervised Learning: data X, labels Y
      • Supervised methods find patterns in fully observed data and then try to predict something from partially observed data.
      • For example, in sentiment analysis, after learning something from annotated reviews, we want to take new reviews and automatically identify sentiments.

  7. Supervised Learning: Formal Definitions
      • Labels Y, e.g., binary labels y ∈ {+1, −1}
      • Instance space X, all the possible instances (based on data representation)
      • Target function f : X → Y (f is unknown)
      • Example/instance (x, y)
      • Training data S_train: collection of examples observed by the algorithm

  8. Supervised Learning: Formal Definitions
      • Goal of a learning algorithm: find a function h : X → Y from training data S_train so that h approximates f

  9. Supervised Learning: Supervised learning in a nutshell
      S_train = {(x, y)} → h
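The mapping from training data to a hypothesis can be sketched in a few lines of Python. This is only an illustration of the definitions above, not a method from the slides: the learner `learn_majority`, the helper `training_error`, and the toy reviews are hypothetical, and predicting the most frequent training label is chosen purely because it is the simplest possible h.

```python
from collections import Counter
from typing import Callable, List, Tuple

Example = Tuple[str, int]          # one (x, y) pair with y in {+1, -1}
Hypothesis = Callable[[str], int]  # h : X -> Y


def learn_majority(s_train: List[Example]) -> Hypothesis:
    """Hypothetical learner: return an h that always predicts the most common training label."""
    most_common_label = Counter(y for _, y in s_train).most_common(1)[0][0]

    def h(x: str) -> int:
        # This h ignores x entirely; a useful learner would not.
        return most_common_label

    return h


def training_error(h: Hypothesis, s_train: List[Example]) -> float:
    """Fraction of training examples on which h disagrees with the observed label."""
    return sum(h(x) != y for x, y in s_train) / len(s_train)


# Toy annotated reviews (made up for illustration).
s_train = [("great movie", +1), ("loved it", +1), ("terrible plot", -1)]
h = learn_majority(s_train)
```

Here `h` plays the role of the learned function, and `training_error(h, s_train)` measures how well it fits S_train; how well h approximates the unknown f on new instances is a separate question.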

  12. Supervised Learning: No Free Lunch Theorems
      • No free lunch for supervised machine learning [Wolpert, 1996]: in a noise-free scenario where the loss function is the misclassification rate, if one is interested in off-training-set error, then there are no a priori distinctions between learning algorithms.
        Corollary I: there is no single ML algorithm that works for everything.
        Corollary II: every successful ML algorithm makes assumptions.
      • No free lunch for search/optimization [Wolpert and Macready, 1997]: all algorithms that search for an extremum of a cost function perform exactly the same when averaged over all possible cost functions.
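The supervised-learning statement can be checked by brute force on a tiny instance space. The sketch below is an illustration under stated assumptions, not Wolpert's proof: instances are 3-bit vectors, labels are binary and noise-free, the first five instances are the training set, and the average is taken uniformly over all 256 possible target functions. The two learners (`majority_learner`, `parity_learner`) are hypothetical stand-ins for "any two algorithms".

```python
from itertools import product

# Instance space X: all 3-bit vectors (8 instances).
X = list(product([0, 1], repeat=3))
train_x = X[:5]  # instances seen during training
test_x = X[5:]   # off-training-set instances


def majority_learner(train):
    """Predict the most common training label everywhere (ties go to 1)."""
    labels = [y for _, y in train]
    guess = 1 if 2 * sum(labels) >= len(labels) else 0
    return lambda x: guess


def parity_learner(train):
    """Ignore the training data and predict the parity of the input bits."""
    return lambda x: sum(x) % 2


def avg_ots_error(learner):
    """Off-training-set misclassification rate, averaged over every target function f."""
    total_errors = 0
    n_targets = 0
    for labels in product([0, 1], repeat=len(X)):
        f = dict(zip(X, labels))              # one possible noise-free target function
        train = [(x, f[x]) for x in train_x]  # S_train labeled by f
        h = learner(train)
        total_errors += sum(h(x) != f[x] for x in test_x)
        n_targets += 1
    return total_errors / (n_targets * len(test_x))


print(avg_ots_error(majority_learner), avg_ots_error(parity_learner))
```

Both learners average exactly 0.5 off the training set, because for any fixed training labels the unseen labels are equally likely to be 0 or 1 under this uniform average. A learner only does better than chance when the targets that actually occur are not uniform, which is the sense in which every successful algorithm makes assumptions.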

  13. Data representation

  14. Data representation: Republican nominee George Bush said he felt nervous as he voted today in his adopted home state of Texas, where he ended... (From Chris Harrison's WikiViz)

  16. Data representation: Let us have an interactive example to think through data representation!
      Auto insurance quotes

      id  rent  income   urban  state  car value  car year
      1   yes   50,000   no     CO     20,000     2010
      2   yes   70,000   no     CO     30,000     2012
      3   no    250,000  yes    CO     55,000     2017
      4   yes   200,000  yes    NY     50,000     2016
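One possible way to turn the rows above into numeric feature vectors is sketched below. The slide leaves the representation open for discussion, so every choice here is an assumption: yes/no columns become 0/1, the state becomes a one-hot block over the states that appear in the table, numeric columns are kept as raw values, and the id is dropped because it carries no information about the customer. The names `rows`, `states`, and `encode` are hypothetical.

```python
# The four rows from the auto insurance table.
rows = [
    {"id": 1, "rent": "yes", "income": 50_000, "urban": "no", "state": "CO", "car_value": 20_000, "car_year": 2010},
    {"id": 2, "rent": "yes", "income": 70_000, "urban": "no", "state": "CO", "car_value": 30_000, "car_year": 2012},
    {"id": 3, "rent": "no", "income": 250_000, "urban": "yes", "state": "CO", "car_value": 55_000, "car_year": 2017},
    {"id": 4, "rent": "yes", "income": 200_000, "urban": "yes", "state": "NY", "car_value": 50_000, "car_year": 2016},
]

# Fixed, sorted list of observed states defines the one-hot positions.
states = sorted({row["state"] for row in rows})


def encode(row):
    """Map one table row to a feature vector x (assumed encoding, see lead-in)."""
    vec = [
        1.0 if row["rent"] == "yes" else 0.0,   # binary: rents or not
        float(row["income"]),                   # raw income
        1.0 if row["urban"] == "yes" else 0.0,  # binary: urban or not
    ]
    vec += [1.0 if row["state"] == s else 0.0 for s in states]  # one-hot state
    vec += [float(row["car_value"]), float(row["car_year"])]    # raw numeric columns
    return vec


features = [encode(row) for row in rows]
```

With this encoding each instance is a 7-dimensional vector. The raw income and car value are several orders of magnitude larger than the binary features, which matters for distance-based methods and is one reason the choice of representation deserves thought.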
