Introduction to Machine Learning
Yifeng Tao, School of Computer Science, Carnegie Mellon University (PowerPoint PPT presentation)


  1. Introduction to Machine Learning. Yifeng Tao, School of Computer Science, Carnegie Mellon University. Slides adapted from Tom Mitchell, Eric Xing, and Barnabas Poczos.

  2. Logistics
     o Course website: http://www.cs.cmu.edu/~yifengt/courses/machine-learning (slides uploaded after lecture)
     o Time: Mon-Fri, 9:50-11:30am lecture, 11:30am-12:00pm discussion
     o Contact: yifengt@cs.cmu.edu

  3. What is machine learning?
     [Diagram: machine learning at the intersection of application areas (computer vision, natural language processing, computational biology) and mathematical foundations (probability, statistics, calculus, linear algebra).]

  4. Computer vision
     o Object detection [Figure from https://www.cvdeveloper.com/projects and Alex Krizhevsky et al.]

  5. Natural language processing
     o NER, translation, document classification… [Figure from Jacob Devlin et al.]

  6. Computational biology
     o DNA-protein binding [Figure from Haoyang Zeng et al.]

  7. What is machine learning?
     o What are we talking about when we talk about AI and ML?
     [Diagram: deep learning as a subset of machine learning, which is in turn a subset of artificial intelligence.]

  8. What comes after this introduction?
     [Diagram: topics that build on machine learning: deep learning, probabilistic graphical models, learning theory, optimization, conditional probability.]

  9. What is machine learning?
     o Methods that generalize from observed data so that it can be used to make better decisions in the future.
     o Supervised learning: given a set of features and their values, learn a model that predicts a label for a new feature set.
       o Regression: predict continuous values.
       o Classification: predict discrete labels.
     o Unsupervised learning: discover patterns in data.
     o And more! E.g., transfer learning, semi-supervised learning, reinforcement learning, etc.
     [Slide from Eric Xing et al.]

  10. Supervised learning
     o Goal of supervised learning: construct a predictor that minimizes a risk (performance measure) on unseen data, not just the empirical error on the training set.
     o Training and test sets.
     [Slide from Barnabas Poczos et al.]
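The risk expressions on the slide are images in the original; in standard notation they read:

$$ R(f) = \mathbb{E}_{(x,y)\sim P}\,[\ell(f(x), y)] \qquad \text{(true risk)} $$
$$ \hat{R}_n(f) = \frac{1}{n}\sum_{i=1}^n \ell(f(x_i), y_i) \qquad \text{(empirical risk)} $$

Minimizing $\hat{R}_n$ alone can overfit, which is why performance is checked on a held-out test set.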

  11. Topics
     o Supervised learning: linear models
     o Kernel machines: SVMs and duality
     o Unsupervised learning: latent space analysis and clustering
     o Supervised learning: decision tree, kNN, and model selection
     o Learning theory
     o Neural networks (basics)
     o Deep learning in CV and NLP
     o Probabilistic graphical models
     o Reinforcement learning and its application in clinical text mining
     o Attention mechanism and transfer learning in precision medicine

  12. Supervised learning: linear models (Lecture 1, May 13, 2019)

  13. Example of regression
     o Predicting restaurant reviews from factors:

     i | Price | Distance | Cuisine | Review
     1 | 30    | 21       | 7       | 4
     2 | 15    | 12       | 8       | 2
     3 | 27    | 53       | 9       | 5

     [Slide from Barnabas Poczos et al.]

  14. Empirical Risk Minimization (ERM)
     o More in the learning theory part… [Slide from Barnabas Poczos et al.]
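The ERM objective itself is an image on the slide; its standard form is:

$$ \hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^n \ell(f(x_i), y_i) $$

i.e., pick the predictor in the hypothesis class $\mathcal{F}$ with the smallest average loss on the training data.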

  15. Linear Regression [Slide from Barnabas Poczos et al.]

  16. Linear Regression [Slide from Barnabas Poczos et al.]
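The model equations on these two slides are images; in the usual notation, linear regression assumes

$$ y_i = \beta^T x_i + \epsilon_i, \qquad i = 1, \dots, n, $$

or in matrix form $y = X\beta + \epsilon$, where $X \in \mathbb{R}^{n \times p}$ stacks the feature vectors as rows.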

  17. Least Squares Estimator [Slide from Barnabas Poczos et al.]

  18. Least Squares Estimator [Slide from Barnabas Poczos et al.]
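The estimator on these slides (images in the original) minimizes the residual sum of squares:

$$ \hat{\beta} = \arg\min_{\beta} \|y - X\beta\|_2^2 $$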

  19. Normal Equations [Slide from Barnabas Poczos et al.]
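Setting the gradient of the squared error to zero gives the normal equations and, when $X^T X$ is invertible, the closed-form solution:

$$ X^T X \beta = X^T y \quad \Longrightarrow \quad \hat{\beta} = (X^T X)^{-1} X^T y $$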

  20. Cases where A^T A is not invertible
     o Gene expression data: n = 20,000, p = 50-4,000
     o Regularization: Lasso
     [Slide from Barnabas Poczos et al.]
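A minimal scikit-learn sketch of this situation (the data here are synthetic, and the shapes and alpha value are illustrative; the design matrix is called X rather than the slide's A):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Illustrative p >> n setting where the normal equations fail:
# X^T X is singular, but an L1 penalty still yields a sparse fit.
rng = np.random.default_rng(0)
n, p = 100, 2000                                # far more features than samples
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]     # only 5 informative features
y = X @ beta_true + 0.1 * rng.normal(size=n)

model = Lasso(alpha=0.1).fit(X, y)
print("nonzero coefficients:", np.sum(model.coef_ != 0))
```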

  21. Geometric Interpretation [Slide from Barnabas Poczos et al.]

  22. Pseudo Inverse (skip) [Slide from Barnabas Poczos et al.]

  23. Pseudo Inverse (skip) [Slide from Barnabas Poczos et al.]
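The slide content is an image; as a small NumPy sketch of the idea (the example matrix is mine):

```python
import numpy as np

# When X^T X is singular, the Moore-Penrose pseudo-inverse gives the
# minimum-norm least-squares solution beta = X^+ y.
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])   # rank-deficient columns
y = np.array([1.0, 2.0, 3.0])
beta = np.linalg.pinv(X) @ y
print(beta)                                  # minimum-norm solution
print(np.linalg.lstsq(X, y, rcond=None)[0])  # same answer via lstsq
```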

  24. Polynomial Regression [Slide from Barnabas Poczos et al.]
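Polynomial regression is still linear regression, just on expanded features [1, x, x^2, ..., x^d]. A sketch in scikit-learn (the data and degree are illustrative):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Fit a degree-5 polynomial by linear regression on expanded features.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30).reshape(-1, 1)
y = np.sin(3 * x).ravel() + 0.1 * rng.normal(size=30)

model = make_pipeline(PolynomialFeatures(degree=5), LinearRegression())
model.fit(x, y)
print(model.predict(x[:3]))
```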

  25. Maximum Likelihood Estimation (MLE)
     o Goal: estimate distribution parameters θ from a dataset of n independent, identically distributed (i.i.d.), fully observed training cases.
     o Maximum Likelihood Estimation (MLE): one of the most common estimators.
     o Under the i.i.d. and full-observability assumptions, pick the setting of parameters most likely to have generated the data we saw.
     o Maximum conditional likelihood: maximize p(y | x; θ) rather than the joint likelihood.
     [Slide from Eric Xing et al.]
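The estimators on the slide are images; written out:

$$ \hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \prod_{i=1}^n p(x_i; \theta) = \arg\max_{\theta} \sum_{i=1}^n \log p(x_i; \theta) $$

and the maximum conditional likelihood variant is $\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^n \log p(y_i \mid x_i; \theta)$.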

  26. Least Squares and MLE [Slide from Barnabas Poczos et al.]

  27. Least Squares and MLE
     o By the independence assumption, the log-likelihood decomposes over samples.
     o Therefore, maximizing the likelihood is equivalent to minimizing the squared error.
     [Slide from Eric Xing et al.]
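The derivation on the slide is an image; under the Gaussian noise model $y_i = \beta^T x_i + \epsilon_i$ with $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$ i.i.d., it reads:

$$ \log p(y \mid X; \beta) = \sum_{i=1}^n \log \mathcal{N}(y_i;\ \beta^T x_i,\ \sigma^2) = -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \beta^T x_i)^2 + \mathrm{const} $$

so the MLE of $\beta$ coincides with the least squares estimator.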

  28. Regularized Least Squares
     o Recap the polynomial regression example.
     o Intuition for overfitting: very large weights.
     o How to solve/alleviate it?
     [Figure from Christopher M. Bishop]
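One standard answer, penalizing large weights (the objective is an image on the following slides; this is its usual form):

$$ \hat{\beta}_{\mathrm{ridge}} = \arg\min_{\beta} \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2 $$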

  29. Maximum a Posteriori Estimation (MAP)
     o Bayes' theorem: the posterior equals the likelihood times the prior, up to a constant.
     o Maximum a posteriori estimator (MAP).
     o This allows us to capture uncertainty about the model in a principled way.
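Written out (the formulas are images in the original):

$$ p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)} \propto p(D \mid \theta)\, p(\theta), \qquad \hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}\, p(D \mid \theta)\, p(\theta) $$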

  30. Regularized Least Squares and MAP [Slide from Barnabas Poczos et al.]

  31. Regularized Least Squares and MAP [Slide from Barnabas Poczos et al.]
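The connection these slides draw (the derivation is an image) is the standard one: with Gaussian noise and a Gaussian prior $\beta \sim \mathcal{N}(0, \tau^2 I)$,

$$ \hat{\beta}_{\mathrm{MAP}} = \arg\max_{\beta}\, \big[ \log p(y \mid X, \beta) + \log p(\beta) \big] = \arg\min_{\beta}\, \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2, \qquad \lambda = \sigma^2 / \tau^2, $$

i.e., ridge regression; a Laplacian prior on $\beta$ yields the Lasso in the same way.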

  32. Example of classification
     o Predicting restaurant reviews from features:

     i | Price | Distance | Cuisine | Review
     1 | 30    | 21       | 7       | Good
     2 | 15    | 12       | 8       | Bad
     3 | 27    | 53       | 9       | Good

     [Slide from Barnabas Poczos et al.]

  33. Logistic regression
     o Logistic/sigmoid function.
     o p: the probability that y is 1.
     [Figure from Wikipedia]
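The function on the slide is an image; the model is

$$ \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad p = p(y = 1 \mid x) = \sigma(\beta^T x). $$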

  34. Logistic regression: MLE
     o The likelihood function, and therefore the conditional log-likelihood l(β), can be written in closed form.
     o -l(β) is also referred to as the "cross-entropy loss".
     o Good news: l(β) is a concave function of β.
     o Bad news: there is no closed-form solution that maximizes l(β).
     o Solution: optimization algorithms (to be discussed).
     [Slide from Tom Mitchell et al.]
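The likelihood expressions are images in the original; in standard form, with $p_i = \sigma(\beta^T x_i)$:

$$ \ell(\beta) = \sum_{i=1}^n \big[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \big] $$

whose negation is the cross-entropy loss.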

  35. Logistic regression: MAP
     o Gaussian prior of β: corresponds to an L2 penalty on β.
     o Laplacian prior of β: corresponds to an L1 penalty on β.

  36. Maximize conditional likelihood: gradient ascent
     o Gradient ascent algorithm: iterate until the change < ε.
     o This applies to linear regression as well, although a closed-form solution exists there.
     [Figure: gradient ascent trajectories from two different initial points. Slide from Tom Mitchell et al.]
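A minimal NumPy sketch of the update for logistic regression (function and parameter names here are illustrative, not from the slide):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, eps=1e-6, max_iter=10_000):
    """Maximize the conditional log-likelihood l(beta) by gradient ascent."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        # Gradient of l(beta) for logistic regression: X^T (y - p).
        grad = X.T @ (y - sigmoid(X @ beta)) / len(y)
        beta_new = beta + lr * grad
        if np.max(np.abs(beta_new - beta)) < eps:   # change < epsilon
            return beta_new
        beta = beta_new
    return beta
```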

  37. Bayesian classifier

     i | Price | Distance | Cuisine | Review
     1 | 30    | 21       | 7       | Good
     2 | 15    | 12       | 8       | Bad
     3 | 27    | 53       | 9       | Good

     o Generative model vs. discriminative model
     [Slide from Tom Mitchell et al.]

  38. Bayesian classifier
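The rule on the slide is an image; the Bayes classifier picks the most probable class under the posterior:

$$ \hat{y} = \arg\max_{y}\, P(Y = y \mid X) = \arg\max_{y}\, P(X \mid Y = y)\, P(Y = y) $$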

  39. Naïve Bayes
     o The Bayesian classifier requires a large number of samples to train.
     o Naïve Bayes assumes the X_i are conditionally independent, given Y.
     o This yields the classification rule for X_new = (X_1, …, X_n).
     [Slide from Tom Mitchell et al.]
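The rule (an image in the original) is:

$$ \hat{y} = \arg\max_{y}\, P(Y = y) \prod_{i=1}^{n} P(X_i^{\mathrm{new}} \mid Y = y) $$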

  40. Naïve Bayes Algorithm
     o Very fast to train/estimate!
     [Slide from Tom Mitchell et al.]
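For discrete features, training reduces to counting (the estimates on the slide are images; these are their standard MLE forms):

$$ \hat{P}(Y = k) = \frac{\#\{j : y_j = k\}}{n}, \qquad \hat{P}(X_i = x \mid Y = k) = \frac{\#\{j : x_{ji} = x,\ y_j = k\}}{\#\{j : y_j = k\}} $$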

  41. Bag of words: model the documents
     o 8th floor of Gates building, CMU
     [Figure from https://twitter.com/smithamilli/status/837153616116985856]

  42. Document classification
     o The (independent) probability that the i-th word of a given document occurs in a document from class C.
     o The probability that a given document D contains all of its words, given a class C.
     o What is the probability that a given document D belongs to a given class C?
     [Slide from Wikipedia]
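The three quantities (images in the original) are, in the standard bag-of-words Naïve Bayes model:

$$ p(w_i \mid C), \qquad p(D \mid C) = \prod_i p(w_i \mid C), \qquad p(C \mid D) \propto p(C) \prod_i p(w_i \mid C) $$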

  43. Continuous X_i in Naïve Bayes [Slide from Tom Mitchell et al.]

  44. Estimating parameters of GNB
     o Y discrete, X_i continuous
     [Slide from Tom Mitchell et al.]
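Gaussian Naïve Bayes (GNB) models each feature as class-conditionally Gaussian; the estimates on the slide take the standard form (with $n_k$ the number of samples in class $k$):

$$ P(X_i = x \mid Y = k) = \mathcal{N}(x;\ \mu_{ik},\ \sigma_{ik}^2), \qquad \hat{\mu}_{ik} = \frac{1}{n_k} \sum_{j : y_j = k} x_{ji}, \qquad \hat{\sigma}_{ik}^2 = \frac{1}{n_k} \sum_{j : y_j = k} (x_{ji} - \hat{\mu}_{ik})^2 $$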

  45. Inference of Gaussian Naïve Bayes [Slide from Tom Mitchell et al.]
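In practice this is a few lines with scikit-learn (a sketch; the iris dataset here is just a stand-in for any continuous-feature data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Gaussian Naive Bayes on a small continuous-feature dataset.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = GaussianNB().fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```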

  46. Linear models in application
     o R: glmnet package
       o Comprehensive regression formats: linear / logistic / Cox regression…
       o Flexible penalty forms: ridge, Lasso, elastic net regression
       o Optimization algorithms with a lot of heuristics, e.g., coordinate descent, warm start…
       o Easy to analyze results in a few lines
     o Python: scikit-learn package
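For reference, the scikit-learn analogues of glmnet's regularized models look like this (hyperparameter values are illustrative):

```python
from sklearn.linear_model import LogisticRegression, Lasso, Ridge, ElasticNet

ridge = Ridge(alpha=1.0)                        # L2 penalty
lasso = Lasso(alpha=0.1)                        # L1 penalty
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)      # mixed L1/L2 penalty
logreg = LogisticRegression(penalty="l1", C=1.0, solver="liblinear")
```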
