Course Summary - Introduction: Basic problems and questions in machine learning - PowerPoint PPT Presentation



SLIDE 1

Course Summary

Introduction:

– Basic problems and questions in machine learning.

Linear Classifiers

– Naïve Bayes
– Logistic Regression
– LMS

Five Popular Algorithms

– Decision trees (C4.5)
– Neural networks (backpropagation)
– Probabilistic networks (Naïve Bayes; mixture models)
– Support Vector Machines (SVMs)
– Nearest Neighbor Method

Theories of Learning:

– PAC, Bayesian, Bias-Variance analysis

Optimizing Test Set Performance:

– Overfitting, Penalty methods, Holdout Methods, Ensembles

Sequential Data

– Hidden Markov models, Conditional Random Fields; Hidden Markov SVMs

SLIDE 2

Course Summary

Goal of Learning
Loss Functions
Optimization Algorithms
Learning Algorithms
Learning Theory
Overfitting and the Triple Tradeoff
Controlling Overfitting
Sequential Learning
Statistical Evaluation

SLIDE 3

Goal of Learning

Classifier: ŷ = f(x) "Do the right thing!"
Conditional probability estimator: P(y|x)
Joint probability estimator: P(x,y)

– compute conditional probability at classification time
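The three views above can be sketched with a toy discrete estimator. This is a minimal illustration, not from the slides: the weather/play data and the function names (`fit_joint`, `conditional`, `classify`) are invented. A joint estimate P(x,y) supports both of the other views, with P(y|x) computed at classification time:

```python
from collections import Counter

def fit_joint(pairs):
    """Joint probability estimator: P(x, y) by counting (x, y) examples."""
    counts = Counter(pairs)
    n = len(pairs)
    return {xy: c / n for xy, c in counts.items()}

def conditional(joint, x, labels):
    """At classification time, derive P(y | x) = P(x, y) / P(x)."""
    px = sum(joint.get((x, y), 0.0) for y in labels)
    return {y: joint.get((x, y), 0.0) / px for y in labels}

def classify(joint, x, labels):
    """Classifier view: y_hat = f(x) = argmax_y P(y | x)."""
    posterior = conditional(joint, x, labels)
    return max(posterior, key=posterior.get)

# Hypothetical data: weather observation x, decision y.
data = [("sunny", "play")] * 3 + [("sunny", "stay")] + [("rain", "stay")] * 2
joint = fit_joint(data)
```

With this data, P(sunny, play) = 3/6 and P(play | sunny) = 0.75, so the classifier outputs "play" for sunny and "stay" for rain.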

SLIDE 4

Loss Functions

Cost matrices and Bayesian decision theory

– Minimize expected loss
– Reject option

Log Likelihood: ∑k –I(y=k) log P(y=k|x,h)

0/1 loss: need to approximate

– squared error
– mutual information
– margin slack ("hinge loss")
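These losses are short enough to write directly. A sketch (the function names are invented for illustration): the log likelihood sums –I(y=k) log P(y=k|x,h) over classes, so only the true class contributes; the hinge loss is the margin slack surrogate for the 0/1 loss.

```python
import math

def log_loss(probs, y):
    """Sum over k of -I(y = k) * log P(y = k | x, h).
    The indicator keeps only the true class's term."""
    return -math.log(probs[y])

def zero_one_loss(probs, y):
    """The loss we actually care about, but hard to optimize directly."""
    return 0.0 if max(probs, key=probs.get) == y else 1.0

def hinge_loss(score, y):
    """Margin slack for a label y in {-1, +1}: max(0, 1 - y * f(x))."""
    return max(0.0, 1.0 - y * score)
```

A uniform two-class prediction costs log 2 under the log loss; a point predicted with margin at least 1 costs nothing under the hinge loss.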

SLIDE 5

Optimization Algorithms

None: direct estimation of µ, Σ, P(y), P(x | y)
Gradient Descent: LMS, logistic regression, neural networks, CRFs
Greedy Construction: Decision trees
Boosting
None: nearest neighbor
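Gradient descent on the logistic-regression log loss is the simplest of these to show end to end. This is a minimal one-feature sketch with invented hyperparameters (lr, epochs), not the course's implementation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(data, lr=0.5, epochs=200):
    """Batch gradient descent on the log loss for 1-D logistic regression.
    data: list of (x, y) pairs with scalar x and y in {0, 1}."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y   # d(log loss)/d(logit)
            gw += err * x
            gb += err
        w -= lr * gw / len(data)           # step against the averaged gradient
        b -= lr * gb / len(data)
    return w, b
```

On linearly separable data the weights keep growing and the predicted probabilities approach 0 and 1.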

SLIDE 6

Learning Algorithms

LMS
Logistic Regression
Multivariate Gaussian and LDA
Naïve Bayes (gaussian, discrete, kernel density estimation)
Decision Trees
Neural Networks (squared error and softmax)
k-nearest neighbors
SVMs (dot product, gaussian, and polynomial kernels)
HMMs/CRFs/averaged perceptron
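Of the algorithms listed, k-nearest neighbors fits in a few lines and needs no optimization step at all, matching the "None" entry on the previous slide. A sketch (names invented for illustration):

```python
from collections import Counter

def knn_predict(train, x, k=3):
    """k-nearest-neighbor classification under squared Euclidean distance.
    train: list of (point, label) pairs, where point is a tuple of floats."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    # All the work happens at classification time: sort by distance, vote.
    nearest = sorted(train, key=lambda pl: dist2(pl[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For two well-separated clusters, a query near either cluster takes that cluster's label.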

SLIDE 7

The Statistical Problem: Overfitting

Goal: choose h to optimize test set performance

Triple tradeoff: sample size, test set accuracy, complexity

– For fixed sample size, there is an accuracy/complexity tradeoff

Measures of complexity:

– |H|, VC dimension, log P(h), ||w||, number of nodes in tree

Bias/Variance analysis

– Bias: systematic error in h
– Variance: high disagreement between different h's
– test error = Bias² + variance + noise (square loss, log loss)
– test error = Bias + unbiased-variance − biased-variance (0/1 loss)

Most accurate hypothesis on training data is not usually most accurate on test data

Most accurate hypothesis on test data may be deliberately wrong (i.e., biased)
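The squared-loss decomposition can be checked numerically by refitting on many training sets drawn from the same source. This Monte-Carlo sketch is illustrative (the function names, the constant-predictor hypothesis class, and the target x² are all invented): a constant predictor has systematic error (bias) but, averaged over many samples, little variance.

```python
import random

def bias_variance(fit, true_f, noise_sd, n_train=20, trials=500, x0=0.0, seed=0):
    """Monte-Carlo estimate of the squared-loss decomposition
    test error = Bias^2 + variance + noise, at a single test point x0.
    fit(xs, ys) must return a hypothesis h (a callable)."""
    rng = random.Random(seed)
    preds = []
    for _ in range(trials):
        xs = [rng.uniform(0, 1) for _ in range(n_train)]
        ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]
        preds.append(fit(xs, ys)(x0))
    mean_pred = sum(preds) / trials
    bias2 = (mean_pred - true_f(x0)) ** 2          # systematic error, squared
    variance = sum((p - mean_pred) ** 2 for p in preds) / trials
    return bias2, variance, noise_sd ** 2

# A deliberately biased hypothesis class: constant predictors.
def fit_constant(xs, ys):
    mean = sum(ys) / len(ys)
    return lambda x: mean
```

For true_f(x) = x² on [0, 1] and x0 = 0, the constant fit converges to about 1/3, so Bias² is about 1/9 while the variance term stays small.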

SLIDE 8

Controlling Overfitting

Penalty Methods

– Pessimistic pruning of decision trees
– Weight decay
– Weight elimination
– Maximum Margin

Holdout Methods

– Early stopping for neural networks
– Reduced-error pruning

Combined Methods (use CV to set penalty level)

– Cost-complexity pruning
– CV to choose pruning confidence, weight decay level, SVM parameters C and σ

Ensemble Methods

– Bagging
– Boosting
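The combined approach, a penalty method with the penalty level set by a holdout set, can be sketched with weight decay on one-feature ridge regression. Illustrative only (the function names and the single-fold holdout are simplifications of full cross-validation):

```python
def fit_ridge_1d(xs, ys, lam):
    """Weight decay in closed form: minimizing sum (w*x - y)^2 + lam*w^2
    over a single weight gives w = sum(x*y) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def choose_penalty(train, holdout, lambdas):
    """Combined method: fit once per penalty level, keep the level
    with the lowest holdout-set error."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    def holdout_mse(lam):
        w = fit_ridge_1d(xs, ys, lam)
        return sum((w * x - y) ** 2 for x, y in holdout) / len(holdout)
    return min(lambdas, key=holdout_mse)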

SLIDE 9

Off-The-Shelf Criteria

Criterion             LMS   Logistic   LDA   Trees   Nets   NNbr   SVM    Boosted Trees   NB
Mixed data            no    no         no    yes     no     no     no     ?               yes
Missing values        no    no         yes   yes     no     some   no     yes             yes
Outliers              no    yes        no    yes     yes    yes    yes    yes             disc
Monotone transforms   no    no         no    yes     some   no     no     yes             disc
Scalability           yes   yes        yes   yes     yes    no     no     yes             yes
Irrelevant inputs     no    no         no    some    no     no     some   yes             some
Linear combinations   yes   yes        yes   no      yes    some   yes    some            yes
Interpretable         yes   yes        yes   yes     no     no     some   no              yes
Accurate              yes   yes        yes   no      yes    no     yes    yes             yes

SLIDE 10

What We've Skipped

Unsupervised Learning

– Given examples Xi
– Find: P(X)
– Clustering
– Dimensionality Reduction
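Of the skipped topics, clustering is easy to illustrate. A one-dimensional k-means sketch (Lloyd's algorithm; the function name and initialization scheme are invented for illustration):

```python
def kmeans_1d(points, k=2, iters=20):
    """Lloyd's algorithm in one dimension: alternate between assigning
    each point to its nearest center and recomputing centers as means."""
    # Spread the initial centers across the sorted data.
    centers = sorted(points)[::max(1, len(points) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
            clusters[i].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)
```

Two well-separated groups of points yield their two group means as centers.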

SLIDE 11

What We Skipped (2)

Reinforcement Learning: Agent interacting with an environment

– At each time step t:

  • Agent perceives current state s of environment
  • Agent chooses an action to perform according to a policy: a = π(s)
  • Action is executed, environment moves to new state s' and returns reward r

– Goal: Find π to maximize the long-term sum of rewards
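The perceive-act-reward loop above can be written directly. A toy sketch (the chain environment, the reward scheme, and the function name are invented): states 0..goal on a line, actions move left or right, and reward 1 arrives only at the goal.

```python
def run_episode(policy, start=0, goal=3, steps=10):
    """One agent-environment episode: at each step the agent perceives
    state s, acts a = pi(s), and the environment returns s' and reward r."""
    s, total = start, 0
    for _ in range(steps):
        a = policy(s)                    # a = pi(s)
        s = max(0, min(goal, s + a))     # environment moves to new state s'
        r = 1 if s == goal else 0        # reward
        total += r                       # long-term sum of rewards
        if s == goal:
            break
    return total
```

A policy that always moves right collects the reward; a policy that always moves left never does, which is exactly the difference the agent must learn.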

SLIDE 12

What We Skipped (3): Semi-Supervised Learning

Learning from a mixture of supervised and unsupervised data

In many applications, unlabeled data is very cheap

– BodyMedia
– Task Tracer
– Natural Language Processing
– Computer Vision

How can we use this?
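One common answer is self-training: fit on the labeled data, then absorb unlabeled points the model labels confidently. The slides do not name a specific method, so this nearest-centroid sketch (function name, threshold, and data all invented) is illustrative only:

```python
def self_train(labeled, unlabeled, threshold=2.0):
    """Self-training sketch: fit per-class centroids on the labeled data,
    then adopt labels for unlabeled points close to some centroid."""
    groups = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(x)
    centroids = {y: sum(xs) / len(xs) for y, xs in groups.items()}
    added = []
    for x in unlabeled:
        dists = sorted((abs(x - c), y) for y, c in centroids.items())
        if dists[0][0] < threshold:        # only accept confident labels
            added.append((x, dists[0][1]))
    return added
```

Points near a class centroid get that class's label and join the training set; ambiguous points in the middle are left unlabeled.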

SLIDE 13

Research Frontier

More complex data objects

– sequences, images, networks, relational databases

More complex runtime tasks

– planning, scheduling, diagnosis, configuration

Learning in changing environments
Learning online
Combining supervised and unsupervised learning
Multi-agent reinforcement learning
Cost-sensitive learning; imbalanced classes
Learning with prior knowledge