Course Overview: Matt Gormley, Lecture 1, August 27, 2018

10-601 Introduction to Machine Learning
Machine Learning Department, School of Computer Science, Carnegie Mellon University
Course Overview
Matt Gormley, Lecture 1, August 27, 2018

WHAT IS MACHINE LEARNING?

Artificial Intelligence
The basic goal of AI is to develop intelligent machines. This consists of many sub-goals:
• Perception
• Reasoning
• Control / Motion / Manipulation
• Planning
• Communication
• Creativity
• Learning
[Figure: Machine Learning drawn as a part of Artificial Intelligence]

What is Machine Learning?

What is ML?
[Figure: diagram relating Machine Learning to Computer Science, a Domain of Interest, Optimization, Statistics, Probability, Measure Theory, Calculus, and Linear Algebra]

Speech Recognition
1. Learning to recognize spoken words
THEN: “…the SPHINX system (e.g. Lee 1989) learns speaker-specific strategies for recognizing the primitive sounds (phonemes) and words from the observed speech signal…neural network methods…hidden Markov models…” (Mitchell, 1997)
NOW: [image] Source: https://www.stonetemple.com/great-knowledge-box-showdown/#VoiceStudyResults

Robotics
2. Learning to drive an autonomous vehicle
THEN: “…the ALVINN system (Pomerleau 1989) has used its learned strategies to drive unassisted at 70 miles per hour for 90 miles on public highways among other cars…” (Mitchell, 1997)
NOW: [images] Sources: waymo.com; https://www.geek.com/wp-content/uploads/2016/03/uber.jpg

Games / Reasoning
3. Learning to beat the masters at board games
THEN: “…the world’s top computer program for backgammon, TD-GAMMON (Tesauro, 1992, 1995), learned its strategy by playing over one million practice games against itself…” (Mitchell, 1997)
NOW: [image]

Computer Vision
4. Learning to recognize images
THEN: “…The recognizer is a convolution network that can be spatially replicated. From the network output, a hidden Markov model produces word scores. The entire system is globally trained to minimize word-level errors.…” (LeCun et al., 1995)
[Figure, from “LeRec: Hybrid for On-Line Handwriting Recognition”. Figure 2: Convolutional neural network character recognizer. This architecture is robust to local translations and distortions, with subsampling, shared weights, and local receptive fields.]
NOW: [images] Sources: slide from Kaiming He’s recent presentation; Fei-Fei Li & Andrej Karpathy & Justin Johnson, Lecture 7, 27 Jan 2016; images from https://blog.openai.com/generative-models/
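The LeCun et al. (1995) recognizer is described as relying on three ideas: local receptive fields, shared weights, and subsampling. The sketch below illustrates just those ideas on a toy image. It is not the LeRec code. The 3x3 kernel, the 2x2 average subsampling, and the example image are assumptions chosen for illustration. As in most deep-learning libraries, the "convolution" is computed as cross-correlation, so the kernel is not flipped.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide one shared kernel over the image (stride 1, no padding).

    Computed as cross-correlation (kernel not flipped), the usual
    convention in deep-learning libraries."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    fmap = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Local receptive field: this unit only sees a kh x kw patch.
            # Shared weights: every unit uses the same kernel.
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        fmap.append(row)
    return fmap


def subsample_2x2(fmap: Matrix) -> Matrix:
    """Average non-overlapping 2x2 blocks, halving height and width
    (an odd trailing row or column is dropped)."""
    return [
        [
            (fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
            for j in range(0, len(fmap[0]) - 1, 2)
        ]
        for i in range(0, len(fmap) - 1, 2)
    ]


if __name__ == "__main__":
    # Hypothetical 6x6 image with a vertical edge between columns 1 and 2.
    image = [[0, 0, 1, 1, 1, 1] for _ in range(6)]
    # Hypothetical vertical-edge detector.
    kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
    fmap = conv2d_valid(image, kernel)  # 4x4 feature map
    pooled = subsample_2x2(fmap)        # 2x2 after subsampling
    print(fmap[0])  # [3.0, 3.0, 0.0, 0.0]
    print(pooled)   # [[3.0, 0.0], [3.0, 0.0]]
```

The edge response survives subsampling in the left column of the pooled map. This is the kind of coarse, shift-tolerant summary that the caption credits to subsampling.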

Learning Theory
• 5. In what cases and how well can we learn?
[Figure: Sample Complexity Results. Four Cases we care about… (Realizable, Agnostic)]
1. How many examples do we need to learn?
2. How do we quantify our ability to generalize to unseen data?
3. Which algorithms are better suited to specific learning settings?
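The first question, how many examples we need, has a classic answer for a finite hypothesis class H. The sketch below assumes the standard textbook bounds. They are not taken from the slide, whose four-case table did not survive extraction.

In the realizable case, N >= (1/ε)(ln|H| + ln(1/δ)) examples are enough. With probability at least 1 - δ, every hypothesis consistent with the training data then has true error at most ε.

In the agnostic case, N >= (1/(2ε²))(ln|H| + ln(2/δ)) examples are enough. With probability at least 1 - δ, every hypothesis's training error is then within ε of its true error.

The |H|, ε, and δ values in the example are hypothetical.

```python
import math


def realizable_sample_size(h_size: int, eps: float, delta: float) -> int:
    """Examples sufficient so that, with probability >= 1 - delta, every
    hypothesis in a finite class H that is consistent with the training
    data has true error <= eps:  N >= (1/eps) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)


def agnostic_sample_size(h_size: int, eps: float, delta: float) -> int:
    """Examples sufficient so that, with probability >= 1 - delta, the
    training error of every h in H is within eps of its true error
    (Hoeffding + union bound):  N >= (1/(2 eps^2)) * (ln|H| + ln(2/delta))."""
    return math.ceil((math.log(h_size) + math.log(2.0 / delta)) / (2.0 * eps ** 2))


if __name__ == "__main__":
    # Hypothetical setting: |H| = 1000 hypotheses, eps = 0.1, delta = 0.05.
    print(realizable_sample_size(1000, 0.1, 0.05))  # 100
    print(agnostic_sample_size(1000, 0.1, 0.05))    # 530
```

Halving ε to 0.05 roughly doubles the realizable requirement (199) but quadruples the agnostic one (2120). That is the 1/ε versus 1/ε² gap between the two cases.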

What is Machine Learning? To solve all the problems above and more

Topics
• Foundations
  – Probability
  – MLE, MAP
  – Optimization
• Classifiers
  – KNN
  – Naïve Bayes
  – Logistic Regression
  – Perceptron
  – SVM
• Regression
  – Linear Regression
• Important Concepts
  – Kernels
  – Regularization and Overfitting
  – Experimental Design
• Unsupervised Learning
  – K-means / Lloyd’s method
  – PCA
  – EM / GMMs
• Neural Networks
  – Feedforward Neural Nets
  – Basic architectures
  – Backpropagation
  – CNNs
• Graphical Models
  – Bayesian Networks
  – HMMs
  – Learning and Inference
• Learning Theory
  – Statistical Estimation (covered right before midterm)
  – PAC Learning
• Other Learning Paradigms
  – Matrix Factorization
  – Reinforcement Learning
  – Information Theory

ML Big Picture
Learning Paradigms: What data is available and when? What form of prediction?
• supervised learning
• unsupervised learning
• semi-supervised learning
• reinforcement learning
• active learning
• imitation learning
• domain adaptation
• online learning
• density estimation
• recommender systems
• feature learning
• manifold learning
• dimensionality reduction
• ensemble learning
• distant supervision
• hyperparameter optimization
Problem Formulation: What is the structure of our output prediction?
• boolean: Binary Classification
• categorical: Multiclass Classification
• ordinal: Ordinal Classification
• real: Regression
• ordering: Ranking
• multiple discrete: Structured Prediction
• multiple continuous: (e.g. dynamical systems)
• both discrete & continuous: (e.g. mixed graphical models)
Application Areas: Key challenges? NLP, Speech, Computer Vision, Robotics, Medicine, Search
Facets of Building ML Systems: How to build systems that are robust, efficient, adaptive, effective?
1. Data prep
2. Model selection
3. Training (optimization / search)
4. Hyperparameter tuning on validation data
5. (Blind) Assessment on test data
Big Ideas in ML: Which are the ideas driving development of the field?
• inductive bias
• generalization / overfitting
• bias-variance decomposition
• generative vs. discriminative
• deep nets, graphical models
• PAC learning
• distant rewards
Theoretical Foundations: What principles guide learning?
• probabilistic
• information theoretic
• evolutionary search
• ML as optimization
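The five Facets of Building ML Systems can be walked through end to end on a toy problem. This is a minimal sketch built on assumptions of our own:

• The data set is hypothetical: 20 integer points labeled 1 when x >= 10, with two labels flipped as noise.
• The split rule is hypothetical.
• k-nearest neighbours stands in for the model.

The point is the discipline. k is chosen on validation data only, and the test set is touched exactly once.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Example = Tuple[int, int]  # (feature x, label y)


def make_data() -> List[Example]:
    # Hypothetical toy data: y = 1 iff x >= 10, with the labels of
    # x = 4 and x = 14 flipped to simulate noise.
    noisy = {4, 14}
    return [(x, int((x >= 10) != (x in noisy))) for x in range(20)]


def split(data: Sequence[Example]) -> Tuple[List[Example], List[Example], List[Example]]:
    # 1. Data prep: fixed, disjoint train / validation / test sets.
    train = [ex for ex in data if ex[0] % 2 == 0]
    val = [ex for ex in data if ex[0] % 4 == 1]
    test = [ex for ex in data if ex[0] % 4 == 3]
    return train, val, test


def knn_predict(train: Sequence[Example], x: int, k: int) -> int:
    # 2./3. Model selection and training: k-nearest neighbours, whose
    # "training" is simply storing the examples. Distance ties are broken
    # toward the smaller x so the result is deterministic.
    nearest = sorted(train, key=lambda ex: (abs(ex[0] - x), ex[0]))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def accuracy(train: Sequence[Example], data: Sequence[Example], k: int) -> float:
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)


def run_pipeline(candidate_ks: Sequence[int] = (1, 3, 5)) -> Tuple[Dict[int, float], int, float]:
    train, val, test = split(make_data())
    # 4. Hyperparameter tuning: pick k using validation data only.
    val_scores = {k: accuracy(train, val, k) for k in candidate_ks}
    best_k = max(candidate_ks, key=lambda k: val_scores[k])
    # 5. (Blind) assessment: the test set is used once, with the chosen k.
    test_acc = accuracy(train, test, best_k)
    return val_scores, best_k, test_acc


if __name__ == "__main__":
    scores, k, test_acc = run_pipeline()
    print(scores, k, test_acc)  # {1: 0.8, 3: 1.0, 5: 0.8} 3 1.0
```

Each rejected setting fails on one validation point:

• With k = 1, the flipped label at x = 4 misleads the prediction for x = 5.
• With k = 5, the neighbourhood of x = 9 includes enough positive neighbours (among them the noisy x = 4) to mislabel it.

Validation therefore selects k = 3.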

DEFINING LEARNING PROBLEMS

Well-Posed Learning Problems
Three components <T,P,E>:
1. Task, T
2. Performance measure, P
3. Experience, E
Definition of learning: A computer program learns if its performance at tasks in T, as measured by P, improves with experience E.
Definition from (Mitchell, 1997)
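Mitchell's definition can be made concrete with a toy learner. All names, data, and the hidden rule in this sketch are hypothetical:

• T is labeling the integers 0 to 99 by an unknown rule (here, x >= 50).
• P is the fraction of those 100 inputs labeled correctly.
• E is a stream of labeled examples.

The program learns in Mitchell's sense because P rises as E grows.

```python
from typing import List, Sequence, Tuple


def target(x: int) -> bool:
    # Hypothetical hidden concept the learner must discover.
    return x >= 50


def learn_threshold(examples: Sequence[Tuple[int, bool]]) -> float:
    # Put the decision threshold midway between the largest negative
    # and the smallest positive example seen so far.
    lo = max((x for x, y in examples if not y), default=0)
    hi = min((x for x, y in examples if y), default=100)
    return (lo + hi) / 2


def performance(theta: float, test_xs: Sequence[int]) -> float:
    # P: fraction of test inputs that the learned rule labels correctly.
    return sum((x >= theta) == target(x) for x in test_xs) / len(test_xs)


def learning_curve(stream: Sequence[int], sizes: Sequence[int], test_xs: Sequence[int]) -> List[float]:
    # E: the first n labeled examples of the stream, for each n in sizes.
    examples = [(x, target(x)) for x in stream]
    return [performance(learn_threshold(examples[:n]), test_xs) for n in sizes]


if __name__ == "__main__":
    stream = [20, 60, 35, 58, 44, 53, 49, 50]  # hypothetical experience
    print(learning_curve(stream, [2, 4, 6, 8], range(100)))
    # [0.9, 0.97, 0.99, 1.0]
```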

Example Learning Problems
3. Learning to beat the masters at chess
1. Task, T:
2. Performance measure, P:
3. Experience, E:

Example Learning Problems
4. Learning to respond to voice commands (Siri)
1. Task, T:
2. Performance measure, P:
3. Experience, E:

Capturing the Knowledge of Experts
[Timeline: 1980, 1990, 2000, 2010]
Solution #1: Expert Systems
• Over 20 years ago, we had rule-based systems
• Ask the expert to
  1. Obtain a PhD in Linguistics
  2. Introspect about the structure of their native language
  3. Write down the rules they devise
Give me directions to Starbucks
  If: “give me directions to X”
  Then: directions(here, nearest(X))
How do I get to Starbucks?
  If: “how do i get to X”
  Then: directions(here, nearest(X))
Where is the nearest Starbucks?
  If: “where is the nearest X”
  Then: directions(here, nearest(X))
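The rule-based approach can be sketched in a few lines. The regular expressions are a hypothetical rendering of the three If/Then rules. directions and nearest are only produced as symbolic terms; they are not implemented.

The sketch also shows the weakness of hand-written rules: any phrasing the expert did not anticipate gets no answer.

```python
import re
from typing import Optional

# Hypothetical regex renderings of the slide's three hand-written rules.
# Every rule fires the same action, directions(here, nearest(X)).
RULES = [
    re.compile(r"give me directions to (?P<x>.+)", re.IGNORECASE),
    re.compile(r"how do i get to (?P<x>.+)", re.IGNORECASE),
    re.compile(r"where is the nearest (?P<x>.+)", re.IGNORECASE),
]


def interpret(utterance: str) -> Optional[str]:
    # Drop surrounding whitespace and trailing punctuation before matching.
    text = utterance.strip().rstrip("?.!")
    for rule in RULES:
        match = rule.fullmatch(text)
        if match:
            # Return the action as a symbolic term; nothing is executed.
            return f"directions(here, nearest({match.group('x')}))"
    return None  # No rule anticipated this phrasing.


if __name__ == "__main__":
    for u in ["Give me directions to Starbucks",
              "How do I get to Starbucks?",
              "Where is the nearest Starbucks?",
              "Take me to Starbucks"]:
        print(u, "->", interpret(u))
```

The first three utterances all map to directions(here, nearest(Starbucks)). "Take me to Starbucks" returns None until someone writes a fourth rule.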
