SLIDE 1

Machine Learning Basics

  • Prof. Kuan-Ting Lai

2020/4/4

SLIDE 2

Machine Learning

Francois Chollet, “Deep Learning with Python,” Manning, 2017

SLIDE 3

Machine Learning Flow

Data (collect data) → Training (train the model; Optimization) → Evaluation (assess accuracy; Loss Function)

SLIDE 4

Machine Learning

  • Supervised Learning: Classification, Regression
  • Unsupervised Learning: Clustering, Dimensionality Reduction
  • Reinforcement Learning: Deep Reinforcement Learning

SLIDE 5

Machine Learning

  • Supervised Learning: Classification, Regression (has a teacher to label data!)
  • Unsupervised Learning: Clustering, Dimensionality Reduction
  • Reinforcement Learning: Deep Reinforcement Learning

SLIDE 6

Machine Learning

  • Supervised Learning: Classification (sorting into categories; discrete data), Regression (regression analysis; continuous data)
  • Unsupervised Learning: Clustering (grouping similar things), Dimensionality Reduction (simplifying the complex)
  • Reinforcement Learning: Deep Reinforcement Learning

SLIDE 7

SLIDE 8

scikit-learn.org

SLIDE 9

Types of Data

SLIDE 10

Data Types (Measurement Scales)

[Diagram: data types, spanning discrete and continuous scales]

https://towardsdatascience.com/data-types-in-statistics-347e152e8bee

SLIDE 11

Nominal Data (Labels)

  • Nominal data are labels without any quantitative value
  • Encoded with one-hot encoding for machine learning
  • Examples:
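For instance, a minimal one-hot encoding sketch with scikit-learn; the color labels below are hypothetical stand-ins for the slide's image examples:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical nominal labels: no order, no quantitative meaning
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

encoder = OneHotEncoder()
one_hot = encoder.fit_transform(colors).toarray()
print(encoder.categories_)  # [array(['blue', 'green', 'red'], ...)]
print(one_hot)              # each row has a single 1 marking its category
```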
SLIDE 12

Ordinal Data

  • Ordinal values represent discrete and ordered units
  • The order is meaningful and important
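A minimal sketch of encoding ordinal data with scikit-learn, using made-up size labels whose order is stated explicitly:

```python
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical ordinal labels: discrete units, and the order matters
sizes = [["small"], ["large"], ["medium"]]
encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])
print(encoder.fit_transform(sizes))  # [[0.], [2.], [1.]]
```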
SLIDE 13

Interval Data

  • Interval values represent ordered units with equal spacing between them
  • Problem with interval data: no true zero
  • Example: temperature in Celsius (°C) vs. Fahrenheit (°F)
SLIDE 14

Ratio Data

  • Same as interval data, but with an absolute zero
  • Can be used in both descriptive and inferential statistics
  • Examples: weight & height
SLIDE 15

Machine Learning vs. Statistics

  • https://www.r-bloggers.com/whats-the-difference-between-machine-learning-statistics-and-data-mining/

SLIDE 16

Supervised and Unsupervised Learning

  • Supervised Learning: Regression, Classification
  • Unsupervised Learning: Clustering, Dimension Reduction

SLIDE 17

Iris Flower Classification

SLIDE 18

Extracting Features of Iris

  • Width and length of the petal and sepal
SLIDE 19

Iris Flower Dataset

Jebaseelan Ravi @ Medium
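The dataset also ships with scikit-learn (shown earlier in the deck), so its structure can be inspected directly; a quick sketch:

```python
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.feature_names)  # sepal/petal length and width, in cm
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
print(iris.data.shape)     # (150, 4): 150 flowers, 4 features each
```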

SLIDE 20

Classify Iris Species via Petals and Sepals

  • Iris versicolor and virginica are not linearly separable

https://www.tensorflow.org/tutorials/customization/custom_training_walkthrough

SLIDE 21

Linear Classifier

SLIDE 22

Evaluation (Loss Function)


SLIDE 23

Support Vector Machine (SVM)

  • Choose the hyperplane with the largest separation (margin)
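A sketch of a maximum-margin linear classifier on the two linearly separable iris species, using scikit-learn's SVC (the feature choice and C value are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Setosa vs. versicolor on the two petal features (linearly separable)
iris = load_iris()
mask = iris.target < 2
X, y = iris.data[mask, 2:], iris.target[mask]

clf = SVC(kernel="linear", C=1.0).fit(X, y)  # maximizes the margin
print(clf.support_vectors_)                  # points that define the margin
```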
SLIDE 24

Loss Function of SVM

  • Calculate prediction errors
SLIDE 25

SVM Optimization

  • Maximize the margin while reducing the hinge loss
  • Hinge loss: $\max(0,\ 1 - y \cdot f(x))$ for labels $y \in \{-1, +1\}$ and score $f(x) = \mathbf{w}^T\mathbf{x} + b$
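A direct NumPy sketch of that hinge loss:

```python
import numpy as np

def hinge_loss(y_true, scores):
    """Mean hinge loss for labels in {-1, +1} and raw scores w.x + b."""
    return np.mean(np.maximum(0.0, 1.0 - y_true * scores))

# Correct, confident predictions cost nothing; margin violations are penalized
print(hinge_loss(np.array([1, -1, 1]), np.array([2.0, -0.5, 0.3])))  # 0.4
```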
SLIDE 26

Nonlinear Problem?

  • How to separate Versicolor and Virginica?
SLIDE 27

SVM Kernel Trick

  • Implicitly project data into a higher-dimensional space and compute the inner products there

https://datascience.stackexchange.com/questions/17536/kernel-trick-explanation

SLIDE 28

Nonlinear SVM for Iris Classification
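A possible scikit-learn sketch of such a nonlinear classifier, using the RBF kernel (hyperparameters are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# The RBF kernel separates classes that are not linearly separable
clf = SVC(kernel="rbf", gamma="scale", C=1.0).fit(X_train, y_train)
print(clf.score(X_test, y_test))  # test accuracy
```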

SLIDE 29

Using Neural Network

https://www.tensorflow.org/tutorials/customization/custom_training_walkthrough
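The linked walkthrough trains a small dense network; a Keras sketch along the same lines (the training settings here are illustrative):

```python
import tensorflow as tf
from sklearn.datasets import load_iris

iris = load_iris()

# Two hidden layers, then one logit per iris species
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(3),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"])
model.fit(iris.data, iris.target, epochs=100, verbose=0)
```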

SLIDE 30

Supervised and Unsupervised Learning

  • Supervised Learning: Regression, Classification
  • Unsupervised Learning: Clustering, Dimension Reduction

SLIDE 31

Linear Regression (Least squares)

  • Find a "line of best fit" that minimizes the total of the squared errors
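A minimal least-squares fit on made-up points with NumPy:

```python
import numpy as np

# Toy data: fit y ≈ a*x + b by minimizing the sum of squared errors
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2, 3.8])

a, b = np.polyfit(x, y, deg=1)
print(a, b)  # slope ≈ 0.94, intercept ≈ 1.09
```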

SLIDE 32

Supervised and Unsupervised Learning

  • Supervised Learning: Regression, Classification
  • Unsupervised Learning: Clustering, Dimension Reduction

SLIDE 33

Logistic Regression

  • Sigmoid function

$S(x) = \dfrac{e^{x}}{e^{x} + 1} = \dfrac{1}{1 + e^{-x}}$

https://en.wikipedia.org/wiki/Sigmoid_function

  • Derivative of Sigmoid

$S'(x) = S(x)\,\bigl(1 - S(x)\bigr)$

S-shaped curve
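Both formulas translate directly into NumPy; a small sketch:

```python
import numpy as np

def sigmoid(x):
    """S(x) = 1 / (1 + e^-x), the S-shaped squashing function."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """S'(x) = S(x) * (1 - S(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(0.0), sigmoid_grad(0.0))  # 0.5 0.25
```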

SLIDE 34

Decision Boundary

  • Binary classification with decision boundary t

$\hat{y} = P(y = 1 \mid \mathbf{x}) = P_\theta(\mathbf{x}) = \dfrac{1}{1 + e^{-(\mathbf{w}^T\mathbf{x} + b)}}, \qquad \hat{y} = \begin{cases} 0, & P_\theta(\mathbf{x}) < t \\ 1, & P_\theta(\mathbf{x}) \ge t \end{cases}$

SLIDE 35

Cross Entropy Loss

  • Loss function: cross entropy

$\text{loss} = \begin{cases} -\log\bigl(1 - P_\theta(\mathbf{x})\bigr), & \text{if } y = 0 \\ -\log P_\theta(\mathbf{x}), & \text{if } y = 1 \end{cases}$

https://towardsdatascience.com/a-guide-to-neural-network-loss-functions-with-applications-in-keras-3a3baa9f71c5

SLIDE 36

Cross Entropy Loss

  • Loss function: cross entropy

$\text{loss} = \begin{cases} -\log\bigl(1 - P_\theta(\mathbf{x})\bigr), & \text{if } y = 0 \\ -\log P_\theta(\mathbf{x}), & \text{if } y = 1 \end{cases} \;\Rightarrow\; L_\theta(\mathbf{x}) = -y \log P_\theta(\mathbf{x}) - (1 - y)\log\bigl(1 - P_\theta(\mathbf{x})\bigr), \qquad \nabla L_\theta(\mathbf{x}) = -\bigl(y - P_\theta(\mathbf{x})\bigr)\,\mathbf{x}$

https://towardsdatascience.com/a-guide-to-neural-network-loss-functions-with-applications-in-keras-3a3baa9f71c5
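A NumPy sketch of the combined loss; the clipping is a numerical-safety detail, not part of the slide's math:

```python
import numpy as np

def cross_entropy(y, p):
    """Binary cross-entropy for label y in {0, 1} and prediction p = P_theta(x)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

print(cross_entropy(1, 0.9))  # small loss: confident and correct
print(cross_entropy(1, 0.1))  # large loss: confident and wrong
```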

SLIDE 37

Machine Learning Workflow

https://towardsdatascience.com/workflow-of-a-machine-learning-project-ec1dba419b94

SLIDE 38

Overfitting and Underfitting


https://en.wikipedia.org/wiki/Overfitting

SLIDE 39

Overfitting (overgeneralizing from partial examples)

  • Overfitting is common, especially for neural networks
SLIDE 40

Neural Network Urban Legend: Detecting Tanks

  • The detector learned the illumination of the photos rather than the tanks themselves
SLIDE 41

Bias and Variance Trade-off

  • A model with high variance overfits the training data and does not generalize to unseen test data

http://scott.fortmann-roe.com/docs/BiasVariance.html

SLIDE 42

Model Selection

SLIDE 43

Training, Validation, Testing

  • Never leak test-data information into the model
  • Tune the model's hyperparameters on the validation dataset
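One way to realize this split with scikit-learn (the ratios are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# Hold out the test set first so it never influences modeling decisions
X_rest, X_test, y_rest, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0)

# Split the remainder into training and validation for hyperparameter tuning
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)
```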
SLIDE 44

K-Fold Cross Validation

  • Lowers the variance of the validation estimate
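A k-fold sketch with scikit-learn's cross_val_score (the model and fold count are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = load_iris()

# 5 folds: each sample serves in the validation set exactly once
scores = cross_val_score(SVC(kernel="rbf"), iris.data, iris.target, cv=5)
print(scores.mean(), scores.std())  # averaging lowers the estimate's variance
```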
SLIDE 45

Regularization

  • https://developers.google.com/machine-learning/crash-course/regularization-for-sparsity/l1-regularization
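The linked page covers L1 regularization for sparsity; a scikit-learn sketch (the penalty strength C is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()

# L1 penalty drives some weights exactly to zero, yielding a sparser model
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
clf.fit(iris.data, iris.target)
print(clf.coef_)  # note the zeroed-out coefficients
```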

SLIDE 46

Metrics: Accuracy vs. Precision in Binary Classification

SLIDE 47

Confusion Matrix

https://en.wikipedia.org/wiki/Confusion_matrix


SLIDE 49

Coronavirus Example

  • Precision = 8 / 18 = 44% (8 true positives out of 18 predicted positives)
  • Accuracy = (8 + 90) / 110 = 89% (8 true positives plus 90 true negatives out of 110 cases)

https://www.facebook.com/numeracylab/posts/2997362376951435

SLIDE 50

Popular Metrics

  • Notations

−P: positive samples, N: negative samples, P’: predicted positive samples, TP: true positives, TN: true negatives

  • Recall = TP / P
  • Precision = TP / P′
  • Accuracy = (TP + TN) / (P + N)
  • F1 score = 2 / (1/recall + 1/precision)

  • Miss rate = false negative rate = 1 – recall
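These definitions correspond to scikit-learn's metric functions; a quick check on a made-up prediction vector:

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # P = 4 positives, N = 4 negatives
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # P' = 4 predicted positives, TP = 3

print(recall_score(y_true, y_pred))     # TP / P            = 0.75
print(precision_score(y_true, y_pred))  # TP / P'           = 0.75
print(accuracy_score(y_true, y_pred))   # (TP + TN)/(P + N) = 0.75
print(f1_score(y_true, y_pred))         # harmonic mean     = 0.75
```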
SLIDE 51

Evaluate Decision Boundary t

  • ROC (Receiver Operating Characteristic) Curve: plots the True Positive Rate (TPR) against the False Positive Rate (FPR)
  • Precision-Recall (PR) Curve: plots Precision against Recall
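Both curves can be traced by sweeping the threshold t over a classifier's scores; a scikit-learn sketch on an illustrative binary task (virginica vs. the rest):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, roc_curve

iris = load_iris()
X, y = iris.data, (iris.target == 2).astype(int)  # virginica vs. the rest

scores = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]

fpr, tpr, _ = roc_curve(y, scores)                        # ROC: TPR vs. FPR
precision, recall, _ = precision_recall_curve(y, scores)  # PR curve
```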

SLIDE 52

Summary of ML Training Flow

  • 1. Defining the problem and assembling a dataset
  • 2. Choosing a measure of success
  • 3. Deciding on an evaluation protocol
  • 4. Preparing your data
  • 5. Developing a model that does better than a baseline
  • 6. Scaling up: developing a model that overfits
  • 7. Regularizing your model and tuning your hyperparameters
SLIDE 53

Pedro Domingos – Things to Know about Machine Learning


SLIDE 54

Useful Things to Know about Machine Learning

  • 1. It’s generalization that counts
  • 2. Data alone is not enough
  • 3. Overfitting has many faces
  • 4. Intuition fails in high dimensions
  • 5. Theoretical guarantees are not what they seem
  • 6. More data beats a cleverer algorithm
  • 7. Learn many models, not just one

Pedro Domingos, “A Few Useful Things to Know about Machine Learning,” Commun. ACM, 2012

SLIDE 55

It’s Generalization that Counts

  • The goal of machine learning is to generalize beyond the examples in the training set

  • Don’t use test data for training
  • Use cross validation to verify your model
SLIDE 56

Data Alone Is Not Enough

  • No free lunch theorem (Wolpert)

−Every learner must embody some knowledge or assumptions beyond the data

  • Learners combine knowledge with data to grow programs

SLIDE 57

Overfitting Has Many Faces

  • Example: if your model's accuracy is 100% on training data but only 50% on test data, when it could have been 75% on both, it has overfit
  • Overfitting has many forms, e.g., bias & variance
  • Combat overfitting:

−Cross validation
−Add a regularization term

SLIDE 58

Intuition Fails in High Dimensions (Number of Features)

  • Curse of Dimensionality
  • Algorithms that work fine in low dimensions fail when the input is high-dimensional
  • Generalizing correctly becomes exponentially harder as the dimensionality of the examples grows
  • Our intuition comes only from the 3-dimensional world
SLIDE 59

Theoretical Guarantees Are Not What They Seem

  • Theoretical bounds are usually very loose
  • The main role of theoretical guarantees in machine learning is to help us understand algorithms and to drive algorithm design

SLIDE 60

More Data Beats a Cleverer Algorithm

  • Try the simplest algorithm first
SLIDE 61

Learn Many Models, Not Just One

  • Ensemble methods: Random Forest, XGBoost, Late Fusion
  • Combining different models often yields better results
SLIDE 62

References

  • Francois Chollet, “Deep Learning with Python,” Manning, 2017, Chapter 4
  • Pedro Domingos, “A Few Useful Things to Know about Machine Learning,” Commun. ACM, 2012
  • https://ml-cheatsheet.readthedocs.io/en/latest/index.html
  • https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-classification-in-python/
  • https://towardsdatascience.com/data-types-in-statistics-347e152e8bee
  • https://en.wikipedia.org/wiki/Naive_Bayes_classifier