Course Summary - Introduction: Basic problems and questions in machine learning - PowerPoint PPT Presentation



SLIDE 1

Course Summary

Introduction:

– Basic problems and questions in machine learning.

Linear Classifiers

– Naïve Bayes
– Logistic Regression
– LMS

Five Popular Algorithms

– Decision trees (C4.5)
– Neural networks (backpropagation)
– Probabilistic networks (Naïve Bayes; mixture models)
– Support Vector Machines (SVMs)
– Nearest Neighbor Method

Theories of Learning:

– PAC, Bayesian, Bias-Variance analysis

Optimizing Test Set Performance:

– Overfitting, Penalty methods, Holdout Methods, Ensembles

Sequential Data

– Hidden Markov models, Conditional Random Fields; Hidden Markov SVMs

SLIDE 2

Course Summary

Goal of Learning
Loss Functions
Optimization Algorithms
Learning Algorithms
Learning Theory
Overfitting and the Triple Tradeoff
Controlling Overfitting
Sequential Learning
Statistical Evaluation

SLIDE 3

Goal of Learning

Classifier: ŷ = f(x) "Do the right thing!"
Conditional probability estimator: P(y|x)
Joint probability estimator: P(x,y)

– compute conditional probability at classification time
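The three views above can be sketched with a toy discrete estimator. This is a minimal illustration, not from the slides: the weather/play data and the function names (`fit_joint`, `conditional`, `classify`) are invented. A joint estimate P(x,y) supports both of the other views, with P(y|x) computed at classification time:

```python
from collections import Counter

def fit_joint(pairs):
    """Joint probability estimator: P(x, y) by counting (x, y) examples."""
    counts = Counter(pairs)
    n = len(pairs)
    return {xy: c / n for xy, c in counts.items()}

def conditional(joint, x, labels):
    """At classification time, derive P(y | x) = P(x, y) / P(x)."""
    px = sum(joint.get((x, y), 0.0) for y in labels)
    return {y: joint.get((x, y), 0.0) / px for y in labels}

def classify(joint, x, labels):
    """Classifier view: y_hat = f(x) = argmax_y P(y | x)."""
    posterior = conditional(joint, x, labels)
    return max(posterior, key=posterior.get)

# Hypothetical data: weather observation x, decision y.
data = [("sunny", "play")] * 3 + [("sunny", "stay")] + [("rain", "stay")] * 2
joint = fit_joint(data)
```

With this data, P(sunny, play) = 3/6 and P(play | sunny) = 0.75, so the classifier outputs "play" for sunny and "stay" for rain.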

SLIDE 4

Loss Functions

Cost matrices and Bayesian decision theory

– Minimize expected loss
– Reject option

Log Likelihood: ∑k –I(y=k) log P(y=k|x,h)

0/1 loss: need to approximate

– squared error
– mutual information
– margin slack ("hinge loss")
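These losses are short enough to write directly. A sketch (the function names are invented for illustration): the log likelihood sums –I(y=k) log P(y=k|x,h) over classes, so only the true class contributes; the hinge loss is the margin slack surrogate for the 0/1 loss.

```python
import math

def log_loss(probs, y):
    """Sum over k of -I(y = k) * log P(y = k | x, h).
    The indicator keeps only the true class's term."""
    return -math.log(probs[y])

def zero_one_loss(probs, y):
    """The loss we actually care about, but hard to optimize directly."""
    return 0.0 if max(probs, key=probs.get) == y else 1.0

def hinge_loss(score, y):
    """Margin slack for a label y in {-1, +1}: max(0, 1 - y * f(x))."""
    return max(0.0, 1.0 - y * score)
```

A uniform two-class prediction costs log 2 under the log loss; a point predicted with margin at least 1 costs nothing under the hinge loss.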

SLIDE 5

Optimization Algorithms

None: direct estimation of µ, Σ, P(y), P(x | y)
Gradient Descent: LMS, logistic regression, neural networks, CRFs
Greedy Construction: Decision trees
Boosting
None: nearest neighbor
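Gradient descent on the logistic-regression log loss is the simplest of these to show end to end. This is a minimal one-feature sketch with invented hyperparameters (lr, epochs), not the course's implementation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(data, lr=0.5, epochs=200):
    """Batch gradient descent on the log loss for 1-D logistic regression.
    data: list of (x, y) pairs with scalar x and y in {0, 1}."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y   # d(log loss)/d(logit)
            gw += err * x
            gb += err
        w -= lr * gw / len(data)           # step against the averaged gradient
        b -= lr * gb / len(data)
    return w, b
```

On linearly separable data the weights keep growing and the predicted probabilities approach 0 and 1.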

SLIDE 6

Learning Algorithms

LMS
Logistic Regression
Multivariate Gaussian and LDA
Naïve Bayes (gaussian, discrete, kernel density estimation)
Decision Trees
Neural Networks (squared error and softmax)
k-nearest neighbors
SVMs (dot product, gaussian, and polynomial kernels)
HMMs/CRFs/averaged perceptron
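Of the algorithms listed, k-nearest neighbors fits in a few lines and needs no optimization step at all, matching the "None" entry on the previous slide. A sketch (names invented for illustration):

```python
from collections import Counter

def knn_predict(train, x, k=3):
    """k-nearest-neighbor classification under squared Euclidean distance.
    train: list of (point, label) pairs, where point is a tuple of floats."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    # All the work happens at classification time: sort by distance, vote.
    nearest = sorted(train, key=lambda pl: dist2(pl[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For two well-separated clusters, a query near either cluster takes that cluster's label.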

SLIDE 7

The Statistical Problem: Overfitting

Goal: choose h to optimize test set performance

Triple tradeoff: sample size, test set accuracy, complexity

– For fixed sample size, there is an accuracy/complexity tradeoff

Measures of complexity:

– |H|, VC dimension, log P(h), ||w||, number of nodes in tree

Bias/Variance analysis

– Bias: systematic error in h
– Variance: high disagreement between different h's
– test error = Bias² + variance + noise (square loss, log loss)
– test error = Bias + unbiased-variance − biased-variance (0/1 loss)

Most accurate hypothesis on training data is not usually most accurate on test data

Most accurate hypothesis on test data may be deliberately wrong (i.e., biased)
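The squared-loss decomposition can be checked numerically by refitting on many training sets drawn from the same source. This Monte-Carlo sketch is illustrative (the function names, the constant-predictor hypothesis class, and the target x² are all invented): a constant predictor has systematic error (bias) but, averaged over many samples, little variance.

```python
import random

def bias_variance(fit, true_f, noise_sd, n_train=20, trials=500, x0=0.0, seed=0):
    """Monte-Carlo estimate of the squared-loss decomposition
    test error = Bias^2 + variance + noise, at a single test point x0.
    fit(xs, ys) must return a hypothesis h (a callable)."""
    rng = random.Random(seed)
    preds = []
    for _ in range(trials):
        xs = [rng.uniform(0, 1) for _ in range(n_train)]
        ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]
        preds.append(fit(xs, ys)(x0))
    mean_pred = sum(preds) / trials
    bias2 = (mean_pred - true_f(x0)) ** 2          # systematic error, squared
    variance = sum((p - mean_pred) ** 2 for p in preds) / trials
    return bias2, variance, noise_sd ** 2

# A deliberately biased hypothesis class: constant predictors.
def fit_constant(xs, ys):
    mean = sum(ys) / len(ys)
    return lambda x: mean
```

For true_f(x) = x² on [0, 1] and x0 = 0, the constant fit converges to about 1/3, so Bias² is about 1/9 while the variance term stays small.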

SLIDE 8

Controlling Overfitting

Penalty Methods

– Pessimistic pruning of decision trees
– Weight decay
– Weight elimination
– Maximum Margin

Holdout Methods

– Early stopping for neural networks
– Reduced-error pruning

Combined Methods (use CV to set penalty level)

– Cost-complexity pruning
– CV to choose pruning confidence, weight decay level, SVM parameters C and σ

Ensemble Methods

– Bagging
– Boosting
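The combined approach, a penalty method with the penalty level set by a holdout set, can be sketched with weight decay on one-feature ridge regression. Illustrative only (the function names and the single-fold holdout are simplifications of full cross-validation):

```python
def fit_ridge_1d(xs, ys, lam):
    """Weight decay in closed form: minimizing sum (w*x - y)^2 + lam*w^2
    over a single weight gives w = sum(x*y) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def choose_penalty(train, holdout, lambdas):
    """Combined method: fit once per penalty level, keep the level
    with the lowest holdout-set error."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    def holdout_mse(lam):
        w = fit_ridge_1d(xs, ys, lam)
        return sum((w * x - y) ** 2 for x, y in holdout) / len(holdout)
    return min(lambdas, key=holdout_mse)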

SLIDE 9

Off-The-Shelf Criteria

Criterion             LMS   Logistic   LDA   Trees   Nets   NNbr   SVM    Boosted Trees   NB
Mixed data            no    no         no    yes     no     no     no     ?               yes
Missing values        no    no         yes   yes     no     some   no     yes             yes
Outliers              no    yes        no    yes     yes    yes    yes    yes             disc
Monotone transforms   no    no         no    yes     some   no     no     yes             disc
Scalability           yes   yes        yes   yes     yes    no     no     yes             yes
Irrelevant inputs     no    no         no    some    no     no     some   yes             some
Linear combinations   yes   yes        yes   no      yes    some   yes    some            yes
Interpretable         yes   yes        yes   yes     no     no     some   no              yes
Accurate              yes   yes        yes   no      yes    no     yes    yes             yes

SLIDE 10

What We've Skipped

Unsupervised Learning

– Given examples Xi
– Find: P(X)
– Clustering
– Dimensionality Reduction
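Of the skipped topics, clustering is easy to illustrate. A one-dimensional k-means sketch (Lloyd's algorithm; the function name and initialization scheme are invented for illustration):

```python
def kmeans_1d(points, k=2, iters=20):
    """Lloyd's algorithm in one dimension: alternate between assigning
    each point to its nearest center and recomputing centers as means."""
    # Spread the initial centers across the sorted data.
    centers = sorted(points)[::max(1, len(points) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
            clusters[i].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)
```

Two well-separated groups of points yield their two group means as centers.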

SLIDE 11

What We Skipped (2)

Reinforcement Learning: Agent interacting with an environment

– At each time step t:

  • Agent perceives current state s of environment
  • Agent chooses an action to perform according to a policy: a = π(s)
  • Action is executed, environment moves to new state s' and returns reward r

– Goal: Find π to maximize the long-term sum of rewards
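The perceive-act-reward loop above can be written directly. A toy sketch (the chain environment, the reward scheme, and the function name are invented): states 0..goal on a line, actions move left or right, and reward 1 arrives only at the goal.

```python
def run_episode(policy, start=0, goal=3, steps=10):
    """One agent-environment episode: at each step the agent perceives
    state s, acts a = pi(s), and the environment returns s' and reward r."""
    s, total = start, 0
    for _ in range(steps):
        a = policy(s)                    # a = pi(s)
        s = max(0, min(goal, s + a))     # environment moves to new state s'
        r = 1 if s == goal else 0        # reward
        total += r                       # long-term sum of rewards
        if s == goal:
            break
    return total
```

A policy that always moves right collects the reward; a policy that always moves left never does, which is exactly the difference the agent must learn.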

SLIDE 12

What We Skipped (3): Semi-Supervised Learning

Learning from a mixture of supervised and unsupervised data

In many applications, unlabeled data is very cheap

– BodyMedia
– Task Tracer
– Natural Language Processing
– Computer Vision

How can we use this?
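One common answer is self-training: fit on the labeled data, then absorb unlabeled points the model labels confidently. The slides do not name a specific method, so this nearest-centroid sketch (function name, threshold, and data all invented) is illustrative only:

```python
def self_train(labeled, unlabeled, threshold=2.0):
    """Self-training sketch: fit per-class centroids on the labeled data,
    then adopt labels for unlabeled points close to some centroid."""
    groups = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(x)
    centroids = {y: sum(xs) / len(xs) for y, xs in groups.items()}
    added = []
    for x in unlabeled:
        dists = sorted((abs(x - c), y) for y, c in centroids.items())
        if dists[0][0] < threshold:        # only accept confident labels
            added.append((x, dists[0][1]))
    return added
```

Points near a class centroid get that class's label and join the training set; ambiguous points in the middle are left unlabeled.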

SLIDE 13

Research Frontier

More complex data objects

– sequences, images, networks, relational databases

More complex runtime tasks

– planning, scheduling, diagnosis, configuration

Learning in changing environments
Learning online
Combining supervised and unsupervised learning
Multi-agent reinforcement learning
Cost-sensitive learning; imbalanced classes
Learning with prior knowledge