10-601 Introduction to Machine Learning
Machine Learning Department
School of Computer Science
Carnegie Mellon University

Midterm Review + Ensemble Methods + Recommender Systems

Matt Gormley
Lecture 21
Nov. 4, 2019

Reminders
– Homework: out Fri, Oct. 25; due Fri, Nov. 8 at 11:59pm
– Midterm Exam: Thu, Nov. 14, 6:30pm – 8:00pm (more details announced on Piazza)
– Homework: out Fri, Nov. 8; due Sun, Nov. 24 at 11:59pm
– http://p21.mlcourse.org
Midterm Exam
– Time: evening exam, Thu, Nov. 14, 6:30pm – 8:00pm
– Room: we will contact each student individually with your room assignment
– Seats: there will be assigned seats; please arrive early
– Please watch Piazza carefully for announcements regarding room / seat assignments
– Covered material: Lecture 9 – Lecture 19 (95%), Lecture 1 – Lecture 8 (5%)
– Format of questions:
– No electronic devices
– You are allowed to bring one 8½ x 11 sheet of notes (front and back)
3.2 Logistic regression

Given a training set $\{(x_i, y_i)\}$, $i = 1, \ldots, n$, where $x_i \in \mathbb{R}^d$ is a feature vector and $y_i \in \{0, 1\}$ is a binary label, we want to find the parameters $\hat{w}$ that maximize the likelihood of the training set, assuming a parametric model of the form

$p(y = 1 \mid x; w) = \frac{1}{1 + \exp(-w^T x)}.$

The conditional log likelihood of the training set is

$\ell(w) = \sum_{i=1}^n y_i \log p(y_i \mid x_i; w) + (1 - y_i) \log(1 - p(y_i \mid x_i; w)),$

and the gradient is

$\nabla \ell(w) = \sum_{i=1}^n (y_i - p(y = 1 \mid x_i; w))\, x_i.$

(b) [5 pts.] What is the form of the classifier output by logistic regression?

(c) [2 pts.] Extra Credit: Consider the case with binary features, i.e., $x \in \{0, 1\}^d \subset \mathbb{R}^d$, where feature $x_1$ is rare and happens to appear in the training set with only label 1. What is $\hat{w}_1$? Is the gradient ever zero for any finite $w$? Why is it important to include a regularization term to control the norm of $\hat{w}$?
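As a sanity check of these formulas, here is a minimal NumPy sketch of the log likelihood and its gradient (the toy data, step size, and iteration count are illustrative, not from the exam):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y):
    # l(w) = sum_i y_i log p_i + (1 - y_i) log(1 - p_i)
    p = sigmoid(X @ w)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient(w, X, y):
    # grad l(w) = sum_i (y_i - p_i) x_i
    p = sigmoid(X @ w)
    return X.T @ (y - p)

# Toy data: n = 4 examples, d = 2 features.
X = np.array([[0., 1.], [1., 1.], [1., 0.], [0., 0.]])
y = np.array([1., 1., 0., 0.])

# Gradient ascent on the conditional log likelihood.
w = np.zeros(2)
for _ in range(200):
    w += 0.1 * gradient(w, X, y)
print(w, log_likelihood(w, X, y))
```

Note that this toy set is linearly separable, so the norm of w keeps growing with more iterations and the gradient never reaches zero at any finite w, which is exactly the phenomenon part (c) asks about.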
2.1 Train and test errors
In this problem, we will see how you can debug a classifier by looking at its train and test errors. Consider a classifier trained to convergence on some training data Dtrain and tested on a separate test set Dtest. You look at the test error and find that it is very high. You then compute the training error and find that it is close to 0. Which of the following is a reasonable next step?

(a) Increase the training data size.
(b) Decrease the training data size.
(c) Increase model complexity (for example, if your classifier is an SVM, use a more complex kernel; or if it is a decision tree, increase the depth).
(d) Decrease model complexity.
(e) Train on a combination of Dtrain and Dtest and test on Dtest.
(f) Conclude that Machine Learning does not work.
Neural Networks

(a) The dataset with groups S1, S2, and S3.
(b) The neural network architecture: inputs x1 and x2, hidden units h1 and h2, and output y, with weights w11, w21, w12, w22 from the inputs to the hidden units and w31, w32 from the hidden units to the output.

Can the neural network in Figure (b) correctly classify the dataset given in Figure (a)?
Apply the backpropagation algorithm to obtain the partial derivative of the mean-squared error with respect to the weight w22, assuming a sigmoid nonlinear activation function for the hidden layer.
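A sketch of the chain-rule computation, under the assumptions that $w_{ij}$ connects input $x_i$ to hidden unit $h_j$, the output $\hat{y}$ is a linear combination of the hidden units, and the error on a single example with target $y^*$ is squared error (using $\sigma'(a) = \sigma(a)(1 - \sigma(a))$):

\begin{align*}
a_2 &= w_{12} x_1 + w_{22} x_2, \qquad h_2 = \sigma(a_2) \\
\hat{y} &= w_{31} h_1 + w_{32} h_2, \qquad E = (\hat{y} - y^*)^2 \\
\frac{\partial E}{\partial w_{22}}
  &= \frac{\partial E}{\partial \hat{y}} \cdot
     \frac{\partial \hat{y}}{\partial h_2} \cdot
     \frac{\partial h_2}{\partial a_2} \cdot
     \frac{\partial a_2}{\partial w_{22}}
   = 2 (\hat{y} - y^*)\, w_{32}\, h_2 (1 - h_2)\, x_2
\end{align*}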
7.1 Reinforcement Learning
… exploitation compared with policy iteration. True / False
Compare reinforcement learning and supervised learning: Reinforcement learning is a kind of supervised learning problem, because you can treat the reward and next state as the label and each (state, action) pair as the training data. Reinforcement learning differs from supervised learning because it has a temporal structure in the learning process, whereas in supervised learning the prediction on a data point does not affect the data you would see in the future.
7.1 Reinforcement Learning
(Figure: a grid world whose transitions carry rewards 2, 2, 4, 4, 8, 4, and 8.)

What is the corresponding optimal policy? Assume the discount factor is 0.1.
What are the corresponding V*(s) values? Assume the discount factor is 0.1.
What are the corresponding Q*(s,a) values? Assume the discount factor is 0.1.
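The original grid is not recoverable from this transcript, but the quantities the question asks for come from standard value iteration. A minimal sketch on a made-up two-state, two-action MDP (the states, actions, rewards, and transitions below are hypothetical), with the question's discount factor of 0.1:

```python
# Value iteration on a tiny, hypothetical deterministic MDP.
gamma = 0.1                                        # discount factor from the question
states, actions = [0, 1], [0, 1]
next_state = {0: {0: 0, 1: 1}, 1: {0: 0, 1: 1}}    # s' = next_state[s][a]
reward = {0: {0: 2, 1: 4}, 1: {0: 4, 1: 8}}        # R(s, a)

V = {s: 0.0 for s in states}
for _ in range(100):                               # Bellman backups to convergence
    V = {s: max(reward[s][a] + gamma * V[next_state[s][a]] for a in actions)
         for s in states}

# Q*(s, a) and the greedy optimal policy follow directly from V*.
Q = {(s, a): reward[s][a] + gamma * V[next_state[s][a]]
     for s in states for a in actions}
policy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
print(V, Q, policy)
```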
Figure from Tom Mitchell
Learning Paradigms: What data is available and when? What form of prediction?

Problem Formulation: What is the structure of our output prediction?
– boolean → Binary Classification
– categorical → Multiclass Classification
– ordinal → Ordinal Classification
– real → Regression
– ordering → Ranking
– multiple discrete → Structured Prediction
– multiple continuous → (e.g. dynamical systems)
– both discrete & continuous → (e.g. mixed graphical models)

Theoretical Foundations: What principles guide learning?
– probabilistic
– information theoretic
– evolutionary search
– ML as optimization

Facets of Building ML Systems: How to build systems that are robust, efficient, adaptive, effective?
1. Data prep
2. Model selection
3. Training (optimization / search)
4. Hyperparameter tuning on validation data
5. (Blind) assessment on test data

Big Ideas in ML: Which are the ideas driving development of the field?

Application Areas: What are the key challenges? NLP, Speech, Computer Vision, Robotics, Medicine, Search
The Netflix Prize: $1 million to the first team to predict user ratings 10% better than Netflix's existing system on 3 million held-out ratings.
Top performing systems were ensembles
The Weighted Majority Algorithm:
– Initially, weight all classifiers equally.
– On receiving a training example, predict the (weighted) majority vote of the classifiers in the pool.
– Down-weight classifiers that contribute to a mistake by a factor of β.
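A minimal sketch of that update rule (the classifier-pool interface and β = 0.5 are illustrative; this follows the slide's variant, which only down-weights when the ensemble itself errs):

```python
def weighted_majority(pool, stream, beta=0.5):
    # pool: list of classifiers h(x) -> {-1, +1}; stream: iterable of (x, y) pairs.
    w = [1.0] * len(pool)                  # initially weight all classifiers equally
    for x, y in stream:
        preds = [h(x) for h in pool]
        vote = sum(wi * p for wi, p in zip(w, preds))
        y_hat = 1 if vote >= 0 else -1     # (weighted) majority vote
        if y_hat != y:                     # the ensemble made a mistake:
            # down-weight every classifier that contributed to it by a factor of beta
            w = [wi * beta if p != y else wi for wi, p in zip(w, preds)]
    return w
```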
(Littlestone & Warmuth, 1994)
Weighted Majority Algorithm: what is stored?
– the classifiers themselves
– a (majority vote) weight for each classifier
Toy example: the initial distribution D1 is uniform over the training points; the weak classifiers are vertical or horizontal half-planes.
Slide from Schapire NIPS Tutorial
Round 1: weak hypothesis h1, error ε1 = 0.30, weight α1 = 0.42; updated distribution D2.
Slide from Schapire NIPS Tutorial
Round 2: weak hypothesis h2, error ε2 = 0.21, weight α2 = 0.65; updated distribution D3.
Slide from Schapire NIPS Tutorial
Round 3: weak hypothesis h3, error ε3 = 0.14, weight α3 = 0.92.
Slide from Schapire NIPS Tutorial
H_final = sign(0.42 h1 + 0.65 h2 + 0.92 h3)
Slide from Schapire NIPS Tutorial
AdaBoost:
Given: $(x_1, y_1), \ldots, (x_m, y_m)$ where $x_i \in X$, $y_i \in \{-1, +1\}$.
Initialize $D_1(i) = 1/m$.
For $t = 1, \ldots, T$:
– Train weak learner using distribution $D_t$.
– Get weak hypothesis $h_t : X \to \{-1, +1\}$ with error $\epsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \neq y_i]$.
– Choose $\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$.
– Update:
$D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } h_t(x_i) = y_i \\ e^{\alpha_t} & \text{if } h_t(x_i) \neq y_i \end{cases}$
where $Z_t$ is a normalization factor (chosen so that $D_{t+1}$ will be a distribution).
Output the final hypothesis: $H(x) = \text{sign}\left(\sum_{t=1}^T \alpha_t h_t(x)\right)$.
Algorithm from (Freund & Schapire, 1999)
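A compact NumPy sketch of these updates. The decision-stump weak learner is a stand-in consistent with the half-plane classifiers in the toy example; the small floor on ε_t is an implementation detail, not part of the pseudocode above:

```python
import numpy as np

def train_stump(X, y, D):
    # Exhaustively pick the axis-aligned threshold stump with lowest weighted error.
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (+1, -1):
                pred = np.where(X[:, j] <= t, s, -s)
                err = D[pred != y].sum()
                if err < best_err:
                    best_err, best = err, (j, t, s)
    j, t, s = best
    return lambda Z: np.where(Z[:, j] <= t, s, -s)

def adaboost(X, y, T=10):
    # X: (m, d) features; y: (m,) labels in {-1, +1}.
    m = len(y)
    D = np.full(m, 1.0 / m)                    # D_1(i) = 1/m
    ensemble = []
    for _ in range(T):
        h = train_stump(X, y, D)               # weak learner trained on D_t
        pred = h(X)
        eps = max(D[pred != y].sum(), 1e-12)   # weighted error epsilon_t
        alpha = 0.5 * np.log((1 - eps) / eps)  # alpha_t
        D *= np.exp(-alpha * y * pred)         # e^{-alpha} if correct, e^{+alpha} if not
        D /= D.sum()                           # renormalize (divide by Z_t)
        ensemble.append((alpha, h))
    return lambda Z: np.sign(sum(a * h(Z) for a, h in ensemble))

# Usage: H = adaboost(X_train, y_train, T=3); y_pred = H(X_test)
```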
Figure from (Freund & Schapire, 1999)
Figure 2: Error curves and the margin distribution graph for boosting C4.5 on the letter dataset, as reported by Schapire et al. [41]. Left: the training and test error curves (lower and upper curves, respectively) of the combined classifier as a function of the number of rounds of boosting; the horizontal lines indicate the test error rate of the base classifier and the test error of the final combined classifier. Right: the cumulative distribution of margins of the training examples after 5, 100, and 1000 iterations, indicated by short-dashed, long-dashed (mostly hidden), and solid curves, respectively.
Recall the Netflix Prize: $1 million to the first team to predict user ratings 10% better than Netflix's existing system on 3 million held-out ratings.
Recommender systems involve:
– Items: movies, songs, products, etc. (often many thousands)
– Users: watchers, listeners, purchasers, etc. (often many millions)
– Feedback: 5-star ratings, not clicking 'next', purchases, etc.

Key observations:
– We can represent ratings numerically as a user/item matrix.
– Users rate only a small number of items, so the matrix is sparse.
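For example, a minimal sketch of holding such feedback as a sparse user/item matrix (the rating triples are illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix

# (user, item, rating) triples -- only the observed cells are stored.
users   = np.array([0, 0, 1, 1, 2, 2, 2])
items   = np.array([0, 2, 0, 1, 0, 1, 2])
ratings = np.array([1, 5, 3, 4, 3, 5, 2])

R = csr_matrix((ratings, (users, items)), shape=(3, 3))
print(R.toarray())   # zeros stand for unobserved entries, not 0-star ratings
```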
          Doctor Strange   Star Trek: Beyond   Zootopia
Alice           1                  ?               5
Bob             3                  4               ?
Charlie         3                  5               2
Slide from William Cohen
Figures from Koren et al. (2009)
Matrix Factorization: movies and users live in some low-dimensional space describing their properties. We recommend a movie based on its proximity to the user in the latent space.
Figures from Aggarwal (2016)
(a) Example of a rank-2 matrix factorization: the 7-user × 6-movie ratings matrix R (movies: Nero, Julius Caesar, Cleopatra, Sleepless in Seattle, Pretty Woman, Casablanca) is factored as R ≈ U Vᵀ, with two latent factors, History and Romance, describing both the users (rows of U) and the movies (rows of V). (b) The corresponding residual matrix E.
Figure: classification vs. matrix completion. In classification there is a clear demarcation between training rows and test rows, and between the independent variables and the dependent variable. In matrix completion there is no demarcation between training and test rows, and no demarcation between dependent and independent variables.
Figures from Aggarwal (2016)
Matrix factorization seeks factors that minimize the squared error on the observed ratings:

$\min_{P, Q} \sum_{(u,i) \in \mathcal{Z}} (r_{ui} - q_i^T p_u)^2 + \lambda \left( \|q_i\|^2 + \|p_u\|^2 \right)$

where $\mathcal{Z}$ is the set of observed (user, item) pairs and the $\lambda$ term is optional regularization. SGD visits one observed rating at a time, computes the error $e_{ui} = r_{ui} - q_i^T p_u$, and updates both factors with step size $\eta$:

$q_i \leftarrow q_i + \eta (e_{ui} p_u - \lambda q_i), \qquad p_u \leftarrow p_u + \eta (e_{ui} q_i - \lambda p_u)$

Figures from Koren et al. (2009) and Gemulla et al. (2011)
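A minimal sketch of those SGD updates (the rank k, step size η, regularization λ, and epoch count are illustrative choices):

```python
import numpy as np

def sgd_mf(ratings, n_users, n_items, k=2, eta=0.01, lam=0.1, epochs=100):
    # ratings: list of (u, i, r_ui) triples for the observed cells only.
    rng = np.random.default_rng(0)
    P = 0.1 * rng.standard_normal((n_users, k))   # user factors p_u
    Q = 0.1 * rng.standard_normal((n_items, k))   # item factors q_i
    for _ in range(epochs):
        for u, i, r in ratings:
            p_u = P[u].copy()                     # snapshot so both updates use old values
            e = r - Q[i] @ p_u                    # e_ui = r_ui - q_i^T p_u
            P[u] += eta * (e * Q[i] - lam * p_u)
            Q[i] += eta * (e * p_u - lam * Q[i])
    return P, Q

# Predicted rating for user u on item i: P[u] @ Q[i]
```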
Figure 3 from Koren et al. (2009): the first two factor vectors from a matrix decomposition of the Netflix Prize data. Selected movies (e.g. Freddy Got Fingered, The Sound of Music, Citizen Kane, Kill Bill: Vol. 1, Lost in Translation, Being John Malkovich) are placed according to their factor vectors in two dimensions. The plot reveals distinct genres, including clusters of movies with strong female leads, fraternity humor, and quirky independent films.
ALS = alternating least squares
Figure from Gemulla et al. (2011)
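A dense-matrix sketch of ALS, assuming for simplicity that R is fully observed (real recommender implementations restrict each least-squares problem to the observed cells of the corresponding row or column):

```python
import numpy as np

def als(R, k=2, lam=0.1, iters=20):
    # Alternating least squares for R (n x m) ~ P Q^T with ridge penalty lam.
    n, m = R.shape
    rng = np.random.default_rng(0)
    P = rng.standard_normal((n, k))
    Q = rng.standard_normal((m, k))
    I = np.eye(k)
    for _ in range(iters):
        # Fix Q; each user's factors solve a k x k ridge-regression system.
        P = np.linalg.solve(Q.T @ Q + lam * I, Q.T @ R.T).T
        # Fix P; same for each item's factors.
        Q = np.linalg.solve(P.T @ P + lam * I, P.T @ R).T
    return P, Q
```

Each half-step is a closed-form least-squares solve, which is why ALS is easy to parallelize across users (or items).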
Theorem: If R is fully observed and we use no regularization, then the solution given by SVD equals the solution of Unconstrained Matrix Factorization.
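A quick numerical check of this claim on a small fully observed matrix: gradient descent on the unregularized MF objective reaches the same reconstruction error as the rank-2 truncated SVD (the matrix size, rank, step size, and iteration count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.standard_normal((6, 5))

# Rank-2 truncated SVD reconstruction.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_svd = U[:, :2] @ np.diag(s[:2]) @ Vt[:2]

# Unconstrained rank-2 MF via gradient descent, no regularization.
P = 0.1 * rng.standard_normal((6, 2))
Q = 0.1 * rng.standard_normal((5, 2))
for _ in range(20000):
    E = R - P @ Q.T
    P, Q = P + 0.01 * E @ Q, Q + 0.01 * E.T @ P

print(np.linalg.norm(R - P @ Q.T))   # approximately equal to the SVD error below
print(np.linalg.norm(R - R_svd))
```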
Implicit feedback:
– In many settings, users don't have a way of expressing dislike for an item (e.g. they can't provide negative ratings).
– The only mechanism for feedback is to "like" something.

Examples:
– Facebook has a "Like" button, but no "Dislike" button.
– Google's "+1" button.
– Pinterest pins.
– Purchasing an item on Amazon indicates a preference for it, but there are many reasons you might not purchase an item (besides dislike).
– Search engines collect click data but have no clear mechanism for observing dislike of a webpage.
Examples from Aggarwal (2016)