Machine Learning: Chenhao Tan University of Colorado Boulder - PowerPoint PPT Presentation

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 2 Slides adapted from Jordan Boyd-Graber, Thorsten Joachims, Kilian Weinberger Machine Learning: Chenhao Tan | Boulder | 1 of 31

Logistics • Piazza: https://piazza.com/colorado/fall2017/csci5622/ • Moodle: https://moodle.cs.colorado.edu/course/view.php?id=507 • Prerequisite quiz • Final project • iCliker Machine Learning: Chenhao Tan | Boulder | 2 of 31

Outline Supervised Learning Data representation K-nearest neighbors Overview Performance Guarantee Curse of Dimensionality Machine Learning: Chenhao Tan | Boulder | 3 of 31

Supervised Learning Outline Supervised Learning Data representation K-nearest neighbors Overview Performance Guarantee Curse of Dimensionality Machine Learning: Chenhao Tan | Boulder | 4 of 31

Supervised Learning Supervised Learning Data Labels X Y • Supervised methods find patterns in fully observed data and then try to predict something from partially observed data. • For example, in sentiment analysis, after learning something from annotated reviews, we want to take new reviews and automatically identify sentiments. Machine Learning: Chenhao Tan | Boulder | 5 of 31

Supervised Learning Formal Definitions • Labels Y , e.g., binary labels y ∈ { + 1 , − 1 } • Instance space X , all the possible instances (based on data representation) • Target function f : X → Y ( f is unknown) Machine Learning: Chenhao Tan | Boulder | 6 of 31

Supervised Learning Formal Definitions • Labels Y , e.g., binary labels y ∈ { + 1 , − 1 } • Instance space X , all the possible instances (based on data representation) • Target function f : X → Y ( f is unknown) • Example/instance ( x , y ) • Training data S train : collection of examples observed by the algorithm Machine Learning: Chenhao Tan | Boulder | 6 of 31

Supervised Learning Formal Definitions • Goal of a learning algorithm: Find a function h : X → Y from training data S train so that h approximates f Machine Learning: Chenhao Tan | Boulder | 7 of 31

Supervised Learning Supervised learning in a nutshell S train = { ( x , y ) } → h Machine Learning: Chenhao Tan | Boulder | 8 of 31

Supervised Learning No Free Lunch Theorems • No free lunch for supervised machine learning [Wolpert, 1996]: in a noise-free scenario where the loss function is the misclassification rate, if one is interested in off-training-set error, then there are no a priori distinctions between learning algorithms. Machine Learning: Chenhao Tan | Boulder | 9 of 31

Supervised Learning No Free Lunch Theorems • No free lunch for supervised machine learning [Wolpert, 1996]: in a noise-free scenario where the loss function is the misclassification rate, if one is interested in off-training-set error, then there are no a priori distinctions between learning algorithms. Corollary I: there is no single ML algorithm that works for everything. Corollary II: every successful ML algorithm makes assumptions. Machine Learning: Chenhao Tan | Boulder | 9 of 31

Supervised Learning No Free Lunch Theorems • No free lunch for supervised machine learning [Wolpert, 1996]: in a noise-free scenario where the loss function is the misclassification rate, if one is interested in off-training-set error, then there are no a priori distinctions between learning algorithms. Corollary I: there is no single ML algorithm that works for everything. Corollary II: every successful ML algorithm makes assumptions. • No free lunch for search/optimization [Wolpert and Macready, 1997]: All algorithms that search for an extremum of a cost function perform exactly the same when averaged over all possible cost functions. Machine Learning: Chenhao Tan | Boulder | 9 of 31

Data representation Outline Supervised Learning Data representation K-nearest neighbors Overview Performance Guarantee Curse of Dimensionality Machine Learning: Chenhao Tan | Boulder | 10 of 31

Data representation Data representation Republican nominee George Bush said he felt nervous as he voted today in his adopted home state of Texas, where he ended... ( (From Chris Harrison's WikiViz) Machine Learning: Chenhao Tan | Boulder | 11 of 31

Data representation Data representation Let us have an interactive example to think through data representation! Machine Learning: Chenhao Tan | Boulder | 12 of 31

Data representation Data representation Let us have an interactive example to think through data representation! Auto insurance quotes id rent income urban state car value car year 1 yes 50,000 no CO 20,000 2010 2 yes 70,000 no CO 30,000 2012 3 no 250,000 yes CO 55,000 2017 4 yes 200,000 yes NY 50,000 2016 Machine Learning: Chenhao Tan | Boulder | 12 of 31

Machine Learning: Chenhao Tan University of Colorado Boulder - PowerPoint PPT Presentation

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 2 Slides adapted from Jordan Boyd-Graber, Thorsten Joachims, Kilian Weinberger Machine Learning: Chenhao Tan | Boulder | 1 of 31 Logistics Piazza:

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 3 Slides adapted from

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 7 Slides adapted from

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 4 Slides adapted from

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 1 Slides adapted from

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 10 Slides adapted from

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5 Slides adapted from

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 6 Slides adapted from

Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 23: Machine

User Level Sentiment Analysis Incorporating Social Networks Chenhao Tan Department of Computer

Natural Language Processing (CSEP 517): Computational Pragmatics Chenhao Tan 2017 c

Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 18: Clustering

Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 13: Boosting

Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 12:

Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 21: Reinforcement

Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 17: Midterm

Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 16:

Couple scientific simulation codes with preCICE A journey towards sustainable research software

Governance from Below website http://personal.lse.ac.uk/faguetj/ DECENTRALIZATION AND POPULAR

Agricultural transformation in the drylands- role of Data & Disruptive technologies Wayne

Overview of Week 4 September 19-September 23, 2016 Concept: Geography and Civilization Essential

Foreign Real Estate Investor Tax Planning Techniques By Richard S. Lehman Esq. 1 Richard S.

English as the academic lingua franca: looking back in anger and looking forward Ann

Single Static Assignment CSE 401 Section 10/10 Aaron Johnston & Nate Yazdani Adapted from

Modeling Solution Dominance over CSPs Tias Guns, Peter Stuckey, Guido Tack ModRef 2018 C o n

Machine Learning: Chenhao Tan University of Colorado Boulder - PowerPoint PPT Presentation

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 2 Slides adapted from Jordan Boyd-Graber, Thorsten Joachims, Kilian Weinberger Machine Learning: Chenhao Tan | Boulder | 1 of 31 Logistics Piazza:

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 3 Slides adapted from

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 7 Slides adapted from

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 4 Slides adapted from

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 1 Slides adapted from

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 10 Slides adapted from

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5 Slides adapted from

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 6 Slides adapted from

Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 23: Machine

User Level Sentiment Analysis Incorporating Social Networks Chenhao Tan Department of Computer

Natural Language Processing (CSEP 517): Computational Pragmatics Chenhao Tan 2017 c

Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 18: Clustering

Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 13: Boosting

Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 12:

Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 21: Reinforcement

Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 17: Midterm

Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 16:

Couple scientific simulation codes with preCICE A journey towards sustainable research software

Governance from Below website http://personal.lse.ac.uk/faguetj/ DECENTRALIZATION AND POPULAR

Agricultural transformation in the drylands- role of Data &amp; Disruptive technologies Wayne

Overview of Week 4 September 19-September 23, 2016 Concept: Geography and Civilization Essential

Foreign Real Estate Investor Tax Planning Techniques By Richard S. Lehman Esq. 1 Richard S.

English as the academic lingua franca: looking back in anger and looking forward Ann

Single Static Assignment CSE 401 Section 10/10 Aaron Johnston &amp; Nate Yazdani Adapted from

Modeling Solution Dominance over CSPs Tias Guns, Peter Stuckey, Guido Tack ModRef 2018 C o n

Agricultural transformation in the drylands- role of Data & Disruptive technologies Wayne

Single Static Assignment CSE 401 Section 10/10 Aaron Johnston & Nate Yazdani Adapted from