Introduction to Machine Learning CMU-10701: 2. MLE, MAP (PowerPoint presentation)

May 05, 2023

Introduction to Machine Learning CMU-10701, 2. MLE, MAP. What happened last time? Barnabás Póczos & Aarti Singh, 2014 Spring
Administration: Piazza: … Please use it! Blackboard is ready. Self-assessment questions? Slides are online. HW questions next week. Feedback is important! Recitation: this Wednesday at 6pm (probability theory).
Independence. Independent random variables satisfy $P(X, Y) = P(X)\,P(Y)$: X and Y don't contain information about each other, so observing Y doesn't help predict X.
Dependent / Independent. Figure: two plots of Y against X, one for independent X, Y and one for dependent X, Y.
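To make the definition concrete, here is a minimal Python sketch (our own illustration, not from the slides) that enumerates the 36 equally likely outcomes of two fair dice and checks $P(X, Y) = P(X)\,P(Y)$ exactly with fractions. The helper names `joint_and_marginals` and `is_independent` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair six-sided dice, 36 equally likely outcomes.
OUTCOMES = list(product(range(1, 7), repeat=2))
P_OUTCOME = Fraction(1, len(OUTCOMES))


def joint_and_marginals(f, g):
    """Exact joint pmf of (f(w), g(w)) and the two marginal pmfs."""
    joint, pf, pg = {}, {}, {}
    for w in OUTCOMES:
        a, b = f(w), g(w)
        joint[(a, b)] = joint.get((a, b), 0) + P_OUTCOME
        pf[a] = pf.get(a, 0) + P_OUTCOME
        pg[b] = pg.get(b, 0) + P_OUTCOME
    return joint, pf, pg


def is_independent(f, g):
    """True iff P(X=a, Y=b) == P(X=a) * P(Y=b) for every pair of values."""
    joint, pf, pg = joint_and_marginals(f, g)
    return all(joint.get((a, b), 0) == pf[a] * pg[b] for a in pf for b in pg)


def first(w):
    return w[0]         # X: value shown by the first die


def second(w):
    return w[1]         # Y: value shown by the second die


def total(w):
    return w[0] + w[1]  # sum of both dice


print(is_independent(first, second))  # True: the dice carry no information about each other
print(is_independent(first, total))   # False: the sum reveals something about X
```

Exact fractions are used instead of sampling so that "equal" really means equal, not "close up to sampling noise".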
Conditionally independent. Conditional independence: knowing Z makes X and Y independent, $P(X, Y \mid Z) = P(X \mid Z)\,P(Y \mid Z)$. Examples: shoe size and reading skills are dependent; shoe size and reading skills are conditionally independent given age.
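The shoe-size example can be checked numerically. The sketch below builds a joint distribution in which age drives both shoe size and reading skill; all probabilities are hypothetical numbers chosen for illustration, and the function names are our own.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical numbers, chosen only for illustration.
P_AGE = {"child": F(1, 2), "adult": F(1, 2)}
P_BIG_FEET = {"child": F(1, 5), "adult": F(4, 5)}     # P(large shoe size | age)
P_GOOD_READER = {"child": F(1, 5), "adult": F(4, 5)}  # P(good reading skills | age)


def bernoulli(p, outcome):
    """P(outcome) for a yes/no variable that is True with probability p."""
    return p if outcome else 1 - p


# Joint pmf built as P(age) * P(shoe | age) * P(reading | age).
JOINT = {
    (a, s, r): P_AGE[a] * bernoulli(P_BIG_FEET[a], s) * bernoulli(P_GOOD_READER[a], r)
    for a, s, r in product(P_AGE, (True, False), (True, False))
}


def prob(event):
    """Probability of the outcomes (age, shoe, reading) for which event(...) is True."""
    return sum(p for (a, s, r), p in JOINT.items() if event(a, s, r))


def shoe_reading_independent(given_age=None):
    """Check P(S, R | age) == P(S | age) P(R | age); no age means the marginal check."""
    def in_cond(a):
        return given_age is None or a == given_age

    p_cond = prob(lambda a, s, r: in_cond(a))
    for s0, r0 in product((True, False), repeat=2):
        p_sr = prob(lambda a, s, r: in_cond(a) and s == s0 and r == r0) / p_cond
        p_s = prob(lambda a, s, r: in_cond(a) and s == s0) / p_cond
        p_r = prob(lambda a, s, r: in_cond(a) and r == r0) / p_cond
        if p_sr != p_s * p_r:
            return False
    return True


print(shoe_reading_independent())                   # False: dependent overall
print(shoe_reading_independent(given_age="child"))  # True
print(shoe_reading_independent(given_age="adult"))  # True
```

Overall, large feet and good reading go together (17/50 versus 1/2 · 1/2 = 1/4 under independence) only because both are more common among adults; once age is fixed, the dependence disappears.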
Our first machine learning problem: parameter estimation (MLE, MAP)
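MLE for Bernoulli distribution. Data: D, a sequence of coin flips, with $P(\text{Heads}) = \theta$ and $P(\text{Tails}) = 1-\theta$. MLE: choose $\theta$ that maximizes the probability of the observed data. The resulting estimate is the "frequency of heads"; for the data on the slide, the estimated probability is 3/5.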
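A minimal Python sketch of the Bernoulli MLE, assuming a hypothetical data set with 3 heads in 5 flips (consistent with the 3/5 estimate; the order of the flips is made up). It computes the frequency of heads and confirms by brute-force grid search that this value maximizes the log-likelihood.

```python
import math


def bernoulli_mle(flips):
    """MLE of theta = P(Heads) for i.i.d. flips: the frequency of heads."""
    return flips.count("H") / len(flips)


def log_likelihood(theta, flips):
    """log P(D | theta) = (#heads) log(theta) + (#tails) log(1 - theta)."""
    heads = flips.count("H")
    tails = len(flips) - heads
    return heads * math.log(theta) + tails * math.log(1 - theta)


# Hypothetical data: 3 heads in 5 flips (the order of the flips is an assumption).
D = ["H", "T", "H", "H", "T"]
theta_hat = bernoulli_mle(D)
print(theta_hat)  # 0.6, i.e. 3/5

# Sanity check: brute-force search over a grid in (0, 1) finds the same maximizer.
grid = [k / 1000 for k in range(1, 1000)]
theta_grid = max(grid, key=lambda t: log_likelihood(t, D))
print(theta_grid)  # 0.6
```

The grid search is only a check; the closed form makes it unnecessary in practice.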
Maximum Likelihood Estimation. MLE: choose $\theta$ that maximizes the probability of the observed data. The derivation relies on two assumptions about the flips: they are independent draws, and they are identically distributed.
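The two assumptions enter the standard calculation as sketched below. Here $X_i \in \{0, 1\}$ encodes flip i (1 for heads), and $\alpha_H$, $\alpha_T$ are our notation for the numbers of heads and tails in D.

```latex
\begin{aligned}
\hat{\theta}_{\mathrm{MLE}}
  &= \arg\max_{\theta}\, P(D \mid \theta) \\
  &= \arg\max_{\theta}\, \prod_{i=1}^{n} P(X_i \mid \theta)
     && \text{(independent draws)} \\
  &= \arg\max_{\theta}\, \theta^{\alpha_H} (1-\theta)^{\alpha_T}
     && \text{(identically distributed)} \\
  &= \arg\max_{\theta}\, \big[\alpha_H \ln\theta + \alpha_T \ln(1-\theta)\big] \\[4pt]
0 &= \frac{\alpha_H}{\hat{\theta}} - \frac{\alpha_T}{1-\hat{\theta}}
  \quad\Longrightarrow\quad
  \hat{\theta}_{\mathrm{MLE}} = \frac{\alpha_H}{\alpha_H + \alpha_T}
\end{aligned}
```

Since ln is increasing, maximizing the log-likelihood gives the same $\hat{\theta}$; the maximizer is exactly the frequency of heads.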
How good is this estimator? I want to know the coin parameter $\theta \in [0,1]$ within $\epsilon = 0.1$ error, with probability at least $1-\delta = 0.95$. How many flips do I need?
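One standard answer uses Hoeffding's inequality (the bound introduced on the Hoeffding slide of this lecture): for flips in [0, 1], $P(|\hat{\theta}_n - \theta| \ge \epsilon) \le 2e^{-2n\epsilon^2}$, so it suffices to make the right-hand side at most $\delta$. A minimal sketch; the function name `flips_needed` is ours.

```python
import math


def flips_needed(eps, delta):
    """Smallest n with 2 * exp(-2 * n * eps**2) <= delta (Hoeffding for values in [0, 1])."""
    return math.ceil(math.log(2 / delta) / (2 * eps**2))


n = flips_needed(eps=0.1, delta=0.05)
print(n)  # 185
```

The answer does not depend on the unknown $\theta$, which is what makes it usable before any data is collected; the price is that it is usually conservative.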
Rolling a die: estimating the parameters $\theta_1, \theta_2, \dots, \theta_6$. Figure: estimates after 12, 24, 60, and 120 rolls. Does the MLE (the relative frequencies) converge to the right value? How fast does it converge?
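The experiment can be reproduced with a short simulation. This sketch assumes a fair die (true $\theta_k = 1/6$), uses a fixed seed, and adds a larger sample size (10,000 rolls) beyond those in the figure; `relative_frequencies` is a hypothetical helper name.

```python
import random
from collections import Counter


def relative_frequencies(rolls):
    """MLE of (theta_1, ..., theta_6): the fraction of rolls showing each face."""
    counts = Counter(rolls)
    return [counts[face] / len(rolls) for face in range(1, 7)]


rng = random.Random(0)  # fixed seed so the run is reproducible
for n in (12, 24, 60, 120, 10_000):
    rolls = [rng.randint(1, 6) for _ in range(n)]  # assumes a fair die
    est = relative_frequencies(rolls)
    max_err = max(abs(p - 1 / 6) for p in est)     # worst error over the six faces
    print(n, [round(p, 3) for p in est], "max error:", round(max_err, 3))
```

With a dozen rolls some faces typically get no rolls at all; by 10,000 rolls every estimate is close to 1/6.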
Rolling a die: calculating the empirical average. Does the empirical average converge to the true mean? How fast does it converge?
Rolling a die: calculating the empirical average. Figure: five sample traces of the empirical average. How fast do they converge to the true mean?
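A sketch of the five-trace experiment, assuming a fair die (true mean 3.5) and a fixed seed: each trace is the running average after every roll, and we print its distance from 3.5 at a few checkpoints.

```python
import random


def running_means(rolls):
    """Empirical average after each roll: (x_1 + ... + x_k) / k for k = 1..n."""
    means, total = [], 0
    for k, x in enumerate(rolls, start=1):
        total += x
        means.append(total / k)
    return means


TRUE_MEAN = 3.5  # mean of a fair six-sided die: (1 + 2 + ... + 6) / 6
rng = random.Random(1)
traces = [running_means([rng.randint(1, 6) for _ in range(5000)]) for _ in range(5)]
for t in traces:
    # Distance from the true mean after 10, 100, 1000 and 5000 rolls.
    print([round(abs(t[n - 1] - TRUE_MEAN), 3) for n in (10, 100, 1000, 5000)])
```

The errors shrink roughly by a factor of about 3 for every tenfold increase in rolls, which is the $1/\sqrt{n}$ behaviour the Hoeffding bound predicts.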
Hoeffding’s inequality (1963). The bound depends only on the range of the variables, not on their variances.
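For reference, the standard two-sided form of Hoeffding's inequality for an empirical average is shown below. Only the ranges $[a_i, b_i]$ appear on the right-hand side. The last line specializes it to coin flips, where $\hat{\theta}_n$ is the frequency of heads in n flips.

```latex
\begin{aligned}
&X_1, \dots, X_n \text{ independent}, \quad a_i \le X_i \le b_i, \quad
  \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i: \\
&P\big(\lvert \bar{X}_n - \mathbb{E}[\bar{X}_n] \rvert \ge \epsilon\big)
  \le 2 \exp\!\left(-\frac{2 n^2 \epsilon^2}{\sum_{i=1}^{n} (b_i - a_i)^2}\right) \\
&\text{Coin flips, } X_i \in [0, 1]: \quad
  P\big(\lvert \hat{\theta}_n - \theta \rvert \ge \epsilon\big) \le 2 e^{-2 n \epsilon^2}
\end{aligned}
```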
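“Convergence rate” for the law of large numbers (LLN) from Hoeffding. For variables in [0, 1], setting $2e^{-2n\epsilon^2} = \delta$ shows that, with probability at least $1-\delta$, the empirical average $\hat{\theta}_n$ satisfies $|\hat{\theta}_n - \theta| < \sqrt{\ln(2/\delta)/(2n)}$, so it converges at rate $O(1/\sqrt{n})$.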
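A small simulation makes the rate visible. This sketch assumes a Bernoulli coin with a hypothetical $\theta = 0.3$ and $\delta = 0.05$: the Hoeffding radius $\epsilon(n) = \sqrt{\ln(2/\delta)/(2n)}$ halves whenever n quadruples, and the observed fraction of experiments that miss by at least $\epsilon(n)$ should stay below $\delta$. The helper names are ours.

```python
import math
import random


def hoeffding_radius(n, delta):
    """eps with P(|theta_hat - theta| >= eps) <= delta for n flips in [0, 1]."""
    return math.sqrt(math.log(2 / delta) / (2 * n))


def miss_rate(theta, n, eps, trials, rng):
    """Fraction of simulated experiments whose MLE lands at least eps away from theta."""
    misses = 0
    for _ in range(trials):
        heads = sum(rng.random() < theta for _ in range(n))
        if abs(heads / n - theta) >= eps:
            misses += 1
    return misses / trials


rng = random.Random(2)
delta = 0.05
for n in (100, 400, 1600):
    eps = hoeffding_radius(n, delta)
    print(n, round(eps, 4), miss_rate(0.3, n, eps, trials=500, rng=rng))
```

The observed miss rates are usually far below 0.05: Hoeffding uses only the range [0, 1], not the actual variance $\theta(1-\theta)$, so it trades tightness for needing no knowledge of $\theta$.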