SLIDE 1

CSci 8980 Basics of Machine Learning and Deep Learning (Part I)

SLIDE 2

Machine Learning

  • Tom Mitchell:

– An algorithm that is able to learn from data

  • Learning?

– A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks T, as measured by P, improves with experience E.

SLIDE 3

Machine Learning

  • Task types

– Classification: k categories
– Regression: predict a value
– Structured outputs: decompose/annotate output
– Anomaly detection

  • Experience E; samples x

– Supervised: labelled outputs => p(y|x)
– Unsupervised: non-labelled outputs => p(x)
– Reinforcement learning: sequential experience x1, x2, …

SLIDE 4

Machine Learning

  • Input is represented by features

– Image: pixels, color, …
– Game: move right

  • Extract features from inputs to solve a task

– Classic ML: human provides features
– DL: system learns representation (i.e., features)

  • From simpler to complex (layers of simpler representations)
SLIDE 5

DL vs. ML

  • Learning representations and patterns of data
  • Generalization (failure of classic AI/ML)
  • Learn (multiple levels of) representation by using a hierarchy of multiple layers

https://www.xenonstack.com/blog/static/public/uploads/media/machine-learning-vs-deep-learning.png

SLIDE 6

Why is DL useful?

  • Manual features are over-specified, incomplete, and take a long time to design and validate
  • Learned features are easy to adapt and fast to learn
  • Deep learning provides a universal, learnable framework for representing world information
  • In ~2010, DL started outperforming other ML techniques: e.g. speech, NLP, …

SLIDE 7

Big Win in Vision

SLIDE 8

Machine Learning Basics

Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed: methods that can learn from and make predictions on data.

[Diagram: labeled data → machine learning algorithm → learned model (training); new data → learned model → prediction (prediction)]
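
As a minimal sketch of this train-then-predict loop (scikit-learn and the synthetic data below are assumptions for illustration, not from the slides):

```python
# Minimal sketch of the training/prediction loop in the diagram above.
# Assumes scikit-learn and NumPy; the data is synthetic, for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                # input samples x
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # labels for the labeled data

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)  # training: learn a model
predictions = model.predict(X_test)                 # prediction: apply it to new data
print("test accuracy:", model.score(X_test, y_test))
```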

SLIDE 9

ML in a Nutshell

  • Every machine learning algorithm has three components:

– Representation – Evaluation – Optimization

SLIDE 10

(Model) Representation

  • Decision trees
  • Sets of rules / Logic programs
  • Instances
  • Graphical models (Bayes/Markov nets)
  • Neural networks
  • Support vector machines
  • Model ensembles
  • Logistic regression
  • Randomized Forests
  • Boosted Decision Trees
  • K-nearest neighbor
  • Etc.
SLIDE 11

Evaluation

  • Differs between supervised and unsupervised learning (a sketch of a few measures follows below)

– Accuracy
– Precision and recall
– Mean squared error
– Maximum likelihood
– Posterior probability
– Cost / Utility
– Entropy
– Etc.
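
A few of these measures are simple enough to compute directly; a hedged NumPy sketch (the arrays are made up for illustration):

```python
# Sketch of a few of the evaluation measures above, with NumPy only.
# y_true / y_pred are illustrative arrays, not from the slides.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1])

accuracy = np.mean(y_true == y_pred)
tp = np.sum((y_pred == 1) & (y_true == 1))
precision = tp / np.sum(y_pred == 1)  # of predicted positives, how many are right
recall = tp / np.sum(y_true == 1)     # of actual positives, how many were found

# Mean squared error, for a regression-style evaluation
y_val = np.array([2.0, 0.5, 1.0])
y_hat = np.array([1.8, 0.7, 1.1])
mse = np.mean((y_val - y_hat) ** 2)

print(accuracy, precision, recall, mse)
```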

SLIDE 12

Optimization

  • Combinatorial optimization

– E.g.: Greedy search

  • Convex optimization

– E.g.: Gradient descent

  • Constrained optimization

– E.g.: Linear programming
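
Gradient descent, the convex-optimization example above, can be sketched in a few lines; the data, step size, and iteration count below are illustrative assumptions:

```python
# Gradient descent sketch for the convex case above: minimize the mean
# squared error of a linear model. Data and step size are illustrative.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

w = np.zeros(3)
lr = 0.1                                    # step size
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of MSE w.r.t. w
    w -= lr * grad                          # descend along the negative gradient

print(w)  # should end up close to true_w
```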

SLIDE 13

Types of Learning

  • Supervised learning

– Training data includes desired outputs
– Prediction, Classification, Regression

  • Unsupervised learning

– Training data does not include desired outputs
– Clustering, Probability distribution estimation
– Finding associations (in features), Dimensionality reduction
– Best representation of data

  • Reinforcement learning

– Rewards from a sequence of actions
– Sequential decision making (robots, chess, games)

SLIDE 14

Types of Learning: examples

Supervised: learning with a labeled training set. Example: email classification with already-labeled emails.
Unsupervised: discover patterns in unlabeled data. Example: cluster similar documents based on their text.
Reinforcement learning: learn to act based on feedback/reward. Example: learn to play Go; reward: win or lose.

[Figures: classification (class A vs. class B), regression, anomaly detection, sequence labeling, clustering]

http://mbjoseph.github.io/2013/11/27/measure.html

SLIDE 15

Comparison

[Figure: supervised learning vs. unsupervised learning, side by side]

SLIDE 16

Learning techniques

  • Supervised learning categories and techniques (a short sketch follows below)

– Linear classifier (numerical functions)

  • Works well when the output depends on many features

– Parametric (probabilistic functions)

  • Works well with limited data, given assumptions about the function
  • Naïve Bayes, Gaussian discriminant analysis (GDA), Hidden Markov models (HMM), …

– Non-parametric (instance-based functions)

  • Works well with lots of data and no prior knowledge
  • K-nearest neighbors, kernel regression, kernel density estimation, …
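
A hedged sketch contrasting the parametric and non-parametric families above, using scikit-learn's GaussianNB and KNeighborsClassifier (the synthetic data and hyperparameters are assumptions):

```python
# Sketch contrasting a parametric model (Naive Bayes) with a non-parametric
# one (k-nearest neighbors), per the categories above. Data is synthetic.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Parametric: fits a fixed number of distribution parameters per class
nb = GaussianNB().fit(X, y)
# Non-parametric: keeps the data and compares new points against it
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

print(nb.predict([[1.5, 1.5]]), knn.predict([[1.5, 1.5]]))
```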

SLIDE 17

Learning techniques

  • Unsupervised learning categories and techniques (a short sketch follows below)

– Clustering

  • K-means clustering
  • Spectral clustering

– Density estimation

  • Gaussian mixture model (GMM)
  • Graphical models

– Dimensionality reduction

  • Principal component analysis (PCA)
  • Factor analysis
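
Two of the listed techniques, k-means clustering and PCA, in a short scikit-learn sketch (the toy data and parameter choices are assumptions):

```python
# Sketch of two unsupervised techniques from the list: k-means clustering
# and PCA for dimensionality reduction. Toy data, for illustration only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(4, 1, (50, 5))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
X_2d = PCA(n_components=2).fit_transform(X)   # project 5-D data to 2-D

print(labels[:5], X_2d.shape)
```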

SLIDE 18

Classification

  • Assign input vector to one of two or more classes
  • Any decision rule divides input space into decision regions separated by decision boundaries

Slide credit: L. Lazebnik

SLIDE 19

Linear Classifier

  • Find a linear function to separate the classes:

f(x) = sgn(w · x + b)

Slide credit: L. Lazebnik
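
The rule f(x) = sgn(w · x + b) transcribes directly into NumPy; the weights and bias below are illustrative, not learned:

```python
# Direct transcription of the rule above: f(x) = sgn(w . x + b).
# The weights and bias here are illustrative, not learned from data.
import numpy as np

def linear_classifier(x, w, b):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    return np.sign(np.dot(w, x) + b)

w = np.array([1.0, -2.0])   # normal vector of the separating line
b = 0.5                     # offset of the line from the origin
print(linear_classifier(np.array([3.0, 1.0]), w, b))   # +1 side
print(linear_classifier(np.array([-1.0, 2.0]), w, b))  # -1 side
```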

SLIDE 20

Classifiers: Nearest neighbor

f(x) = label of the example nearest to x

  • All we need is a distance function for our inputs
  • No training required!

[Figure: a test example shown among previous examples from class 1 and class 2]

Slide credit: L. Lazebnik
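
Since all the rule needs is a distance function and the stored examples, a sketch fits in a few lines (pure NumPy; the data and default Euclidean distance are assumptions):

```python
# Sketch of the nearest-neighbor rule: all it needs is a distance function
# and the stored training examples. Pure NumPy; data is illustrative.
import numpy as np

def nearest_neighbor(x, X_train, y_train, dist=None):
    """f(x) = label of the stored example nearest to x."""
    if dist is None:
        dist = lambda a, b: np.linalg.norm(a - b)   # default: Euclidean
    distances = [dist(x, xi) for xi in X_train]
    return y_train[int(np.argmin(distances))]

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
y_train = np.array([1, 1, 2])
print(nearest_neighbor(np.array([4.0, 4.5]), X_train, y_train))  # -> 2
```

Note there is no training step at all: prediction is just a scan over the stored examples.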

SLIDE 21

K-nearest neighbor

[Scatter plot: training points "x" and test points "+" in feature space (x1, x2)]

Assign the label of the nearest training data point to each test data point.

SLIDE 22

1-nearest neighbor

[Scatter plot: the same points in (x1, x2), classified by the single nearest neighbor]

SLIDE 23

3-nearest neighbor

[Scatter plot: the same points in (x1, x2), classified by the 3 nearest neighbors]

SLIDE 24

5-nearest neighbor

[Scatter plot: the same points in (x1, x2), classified by the 5 nearest neighbors]

  • Cannot discriminate between features

– Poor generalization if the training set is small (see the sketch below)
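
The k-nearest-neighbor rule these four slides vary k over can be sketched with a majority vote (NumPy only; the data is illustrative):

```python
# Sketch of k-nearest neighbors with majority vote, the rule the slides
# above vary k over. NumPy only; the data is illustrative.
import numpy as np
from collections import Counter

def knn_predict(x, X_train, y_train, k=3):
    """Assign the majority label among the k nearest training points."""
    distances = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(distances)[:k]
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], float)
y_train = np.array([0, 0, 0, 1, 1, 1])
for k in (1, 3, 5):
    print(k, knn_predict(np.array([2.5, 2.5]), X_train, y_train, k=k))
```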

SLIDE 25

Supervised Learning Goal

y = f(x)   (y: output; f: prediction function; x: feature(s) or inputs)

  • Training: given a training set of labeled examples {(x1, y1), …, (xN, yN)}, estimate f by minimizing the prediction error on the training set (see the sketch below)
  • Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x)

Slide credit: L. Lazebnik
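
For the special case of a linear f and squared error, "minimizing the prediction error on the training set" has a closed-form solution; a sketch under those assumptions:

```python
# Sketch of "estimate f by minimizing prediction error on the training set",
# for a linear f and squared error. NumPy only; data is synthetic.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 2))                                 # training inputs x_i
y = X @ np.array([1.5, -0.7]) + 0.05 * rng.normal(size=50)   # labels y_i

# Least squares: the w minimizing sum_i (y_i - w . x_i)^2 over the training set
w, *_ = np.linalg.lstsq(X, y, rcond=None)

x_new = np.array([0.3, -1.2])       # a never-before-seen test example
print("prediction:", w @ x_new)     # output the predicted value y = f(x)
```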

SLIDE 26

Example

  • Apply a prediction function to a feature representation of the image to get the desired output:

f([apple image]) = "apple"
f([tomato image]) = "tomato"
f([cow image]) = "cow"

Slide credit: L. Lazebnik

SLIDE 27

Generalization

  • How well does a learned model generalize from the data it was trained on to a new test set?

[Figure: training set (labels known) and test set (labels unknown)]

Slide credit: L. Lazebnik

SLIDE 28

Steps

[Diagram: training images → image features (+ training labels) → training → learned model; test image → image features → learned model → prediction]

Slide credit: D. Hoiem and L. Lazebnik

SLIDE 29

Training and testing

  • Training is the process of making the system able to learn/generalize
  • No free lunch rule:

– Training set and testing set come from the same distribution
– No universal ML algorithm!
– Need to make some assumptions

SLIDE 30

Under-/Overfitting

  • An ML algorithm must perform well on unseen inputs ("generalization")

– Training error: error measured by running the training data back through the model
– Testing error: error on new data

  • Underfit

– High training error

  • Overfit

– Gap between training and testing error is too large (see the sketch below)
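
A sketch of how the gap shows up in practice, fitting polynomials of increasing degree (NumPy only; the data, noise level, and degrees are illustrative assumptions):

```python
# Sketch of the training/testing gap: fit polynomials of increasing degree
# and watch training error fall while testing error eventually rises.
import numpy as np

rng = np.random.default_rng(5)
x_train = np.sort(rng.uniform(-1, 1, 20))
x_test = np.sort(rng.uniform(-1, 1, 20))
f = lambda x: np.sin(3 * x)                      # the "true" function
y_train = f(x_train) + 0.1 * rng.normal(size=20)
y_test = f(x_test) + 0.1 * rng.normal(size=20)

for degree in (1, 3, 15):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # degree 1 underfits (high training error); degree 15 overfits (large gap)
    print(f"degree {degree}: train {train_err:.3f}, test {test_err:.3f}")
```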

SLIDE 31

SLIDE 32

Generalization

  • Components of generalization error

– Bias: how much does the average model, over all training sets, differ from the true model? (Error due to simplifications made by the model.)
– Variance: how much do models estimated from different training sets differ from each other?

  • Underfitting: model is too "simple" to represent all the relevant class characteristics

– High bias and low variance
– High training error and high test error

  • Overfitting: model is too "complex" and fits irrelevant characteristics (noise) in the data

– Low bias and high variance
– Low training error and high test error

Slide credit: L. Lazebnik

SLIDE 33

Bias-Variance Trade-off

  • Models with too few parameters are inaccurate because of a large bias (not enough flexibility)
  • Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample)

Slide credit: D. Hoiem

SLIDE 34

Regularization

Prevent overfitting: bias the green (fitted) curve toward the black (true) curve in the figure.
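
One standard way to apply such a bias is L2 (ridge) regularization, which penalizes large weights; a closed-form NumPy sketch (the data and the strength lambda are assumptions):

```python
# Sketch of one common regularizer: L2 (ridge), which penalizes large
# weights to pull the fit toward a simpler model. NumPy only; toy data.
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(30, 5))
y = X @ np.array([1.0, 0.0, 0.0, 0.0, 0.0]) + 0.3 * rng.normal(size=30)

lam = 1.0   # regularization strength: larger = stronger bias toward small w
# Ridge solution: w = (X^T X + lam * I)^(-1) X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
w_plain = np.linalg.solve(X.T @ X, X.T @ y)

print("unregularized:", np.round(w_plain, 2))
print("ridge:        ", np.round(w_ridge, 2))
```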

SLIDE 35

Effect of Training Size

[Plot: error vs. number of training examples for a fixed prediction model, showing training error, testing error, and the generalization error gap between them]

Slide credit: D. Hoiem

SLIDE 36

Comparison of errors

Using logistic regression (a sketch follows below):

– Training error rate: 0.11
– Testing error rate: 0.145
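
A sketch of how such training and testing error rates would be measured with logistic regression (scikit-learn and synthetic data are assumptions, so the numbers will not match 0.11/0.145):

```python
# Sketch of measuring training vs. testing error rates with logistic
# regression. Synthetic data; numbers will differ from the slide's.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.normal(size=(400, 2))
y = (X[:, 0] ** 2 + X[:, 1] > 0.5).astype(int)   # not linearly separable

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)

train_error = 1 - model.score(X_tr, y_tr)   # error rate = 1 - accuracy
test_error = 1 - model.score(X_te, y_te)
print(f"training error: {train_error:.3f}, testing error: {test_error:.3f}")
```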

SLIDE 37

Next Week

  • More on deep learning
  • Start research papers on Thursday