Introduction to Machine Learning CMU-10701: 2. MLE, MAP



slide-1
SLIDE 1

Introduction to Machine Learning CMU-10701

  • 2. MLE, MAP

Barnabás Póczos & Aarti Singh 2014 Spring

What happened last time?

slide-2
SLIDE 2

Administration

  • Piazza: … Please use it!
  • Blackboard is ready
  • Self assessment questions?
  • Slides are online
  • HW questions next week
  • Feedback is important!
  • Recitation: This Wednesday at 6pm (prob theory)


slide-3
SLIDE 3

Independence

Independent random variables:

$P(X, Y) = P(X)\,P(Y)$

Y and X don’t contain information about each other. Observing Y doesn’t help in predicting X.
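A minimal sketch of the definition, assuming a small discrete joint distribution stored as a dictionary. The function names and both example tables are illustrative and do not come from the lecture:

```python
from itertools import product
from math import isclose

def marginals(joint):
    """Return the marginals P(X) and P(Y) of a joint table {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

def is_independent(joint, tol=1e-9):
    """Check P(X=x, Y=y) == P(X=x) * P(Y=y) for every pair (x, y)."""
    px, py = marginals(joint)
    return all(
        isclose(joint.get((x, y), 0.0), px[x] * py[y], abs_tol=tol)
        for x, y in product(px, py)
    )

# Hypothetical example: two fair coins flipped separately.
independent_joint = {(x, y): 0.25 for x in "HT" for y in "HT"}
# Hypothetical example: the second coin always copies the first.
dependent_joint = {("H", "H"): 0.5, ("T", "T"): 0.5}

print(is_independent(independent_joint))  # True
print(is_independent(dependent_joint))    # False
```

In the second table, observing Y determines X exactly, so the joint cannot factor into the product of its marginals.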

slide-4
SLIDE 4

Dependent / Independent

[Figure: two panels, “Independent X,Y” and “Dependent X,Y”, each with axes X and Y]

slide-5
SLIDE 5

Conditionally Independent

Conditionally independent: Knowing Z makes X and Y independent:

$P(X, Y \mid Z) = P(X \mid Z)\,P(Y \mid Z)$

Examples:

  • Dependent: shoe size and reading skills
  • Conditionally independent: shoe size and reading skills given age
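The shoe-size example can be checked numerically. The sketch below uses made-up probabilities (Z = age group, X = shoe size, Y = reading skill) and builds the joint distribution so that it factors given Z; all names and numbers are assumptions for illustration only:

```python
from math import isclose

# Hypothetical numbers: Z = age group, X = shoe size, Y = reading skill.
p_z = {"child": 0.5, "adult": 0.5}
p_x_given_z = {"child": {"small": 0.9, "large": 0.1},
               "adult": {"small": 0.2, "large": 0.8}}
p_y_given_z = {"child": {"low": 0.8, "high": 0.2},
               "adult": {"low": 0.1, "high": 0.9}}

# Build the joint so that X and Y are conditionally independent given Z:
# P(x, y, z) = P(z) * P(x | z) * P(y | z).
joint = {
    (x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
    for z in p_z
    for x in p_x_given_z[z]
    for y in p_y_given_z[z]
}

def prob(joint, x=None, y=None, z=None):
    """Sum the joint over all entries matching the specified values."""
    return sum(
        p for (xi, yi, zi), p in joint.items()
        if (x is None or xi == x) and (y is None or yi == y) and (z is None or zi == z)
    )

# Marginal check: does P(x, y) equal P(x) * P(y)?
p_xy = prob(joint, x="large", y="high")
p_x_p_y = prob(joint, x="large") * prob(joint, y="high")
marginally_independent = isclose(p_xy, p_x_p_y, abs_tol=1e-9)

# Conditional check: does P(x, y | z) equal P(x | z) * P(y | z) everywhere?
conditionally_independent = all(
    isclose(
        prob(joint, x=x, y=y, z=z) / prob(joint, z=z),
        (prob(joint, x=x, z=z) / prob(joint, z=z))
        * (prob(joint, y=y, z=z) / prob(joint, z=z)),
        abs_tol=1e-9,
    )
    for (x, y, z) in joint
)

print(marginally_independent)     # False
print(conditionally_independent)  # True
```

With these numbers, P(large, high) = 0.37 while P(large) P(high) = 0.45 × 0.55 = 0.2475, so X and Y are dependent until the age group is known.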

slide-6
SLIDE 6

Parameter estimation: MLE, MAP


Our first machine learning problem:

slide-7
SLIDE 7

MLE for Bernoulli distribution

Data, D = [sequence of coin flips; image not preserved]

P(Heads) = θ, P(Tails) = 1 − θ

MLE: Choose θ that maximizes the probability of observed data.

The estimated probability is:

$\hat{\theta}_{MLE} = 3/5$ (“frequency of heads”)
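A minimal sketch of this estimate in code. The flip sequence is hypothetical, chosen only to contain 3 heads in 5 flips so that it matches the 3/5 estimate; the function names are illustrative. A brute-force grid search over θ is included as a sanity check on the closed form:

```python
def bernoulli_mle(flips):
    """Closed-form MLE for P(Heads): the fraction of heads in the data."""
    return flips.count("H") / len(flips)

def likelihood(theta, flips):
    """P(D | theta) for i.i.d. flips: theta^(#heads) * (1 - theta)^(#tails)."""
    heads = flips.count("H")
    tails = len(flips) - heads
    return theta ** heads * (1 - theta) ** tails

# Hypothetical data with 3 heads and 2 tails, chosen to match the 3/5 estimate.
flips = ["H", "H", "T", "H", "T"]

theta_hat = bernoulli_mle(flips)

# Sanity check: a brute-force search over a grid of theta values
# should peak at the same place as the closed-form estimate.
grid = [i / 1000 for i in range(1001)]
theta_grid = max(grid, key=lambda t: likelihood(t, flips))

print(theta_hat)   # 0.6
print(theta_grid)  # 0.6
```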

slide-8
SLIDE 8

Maximum Likelihood Estimation

The flips are i.i.d.:

  • Independent draws
  • Identically distributed

MLE: Choose θ that maximizes the probability of observed data.
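The maximization can be carried out in closed form. The derivation below is the standard one, written in assumed notation: $X_1, \dots, X_n$ are the individual flips, and $\alpha_H$, $\alpha_T$ are the numbers of heads and tails in $D$.

```latex
\begin{align*}
\hat{\theta}_{MLE}
  &= \arg\max_{\theta} P(D \mid \theta) \\
  &= \arg\max_{\theta} \prod_{i=1}^{n} P(X_i \mid \theta)
     && \text{(independent draws)} \\
  &= \arg\max_{\theta} \theta^{\alpha_H} (1-\theta)^{\alpha_T}
     && \text{(identically distributed)} \\
  &= \arg\max_{\theta} \big[ \alpha_H \log\theta + \alpha_T \log(1-\theta) \big]
     && \text{(log is monotone)}
\end{align*}

% Setting the derivative with respect to theta to zero:
\[
\frac{\alpha_H}{\theta} - \frac{\alpha_T}{1-\theta} = 0
\quad\Longrightarrow\quad
\hat{\theta}_{MLE} = \frac{\alpha_H}{\alpha_H + \alpha_T}
\]
```

The result is the relative frequency of heads, for example 3/5 when $\alpha_H = 3$ and $\alpha_T = 2$.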

slide-9
SLIDE 9

How good is this estimator?

I want to know the coin parameter θ ∈ [0,1] within ε = 0.1 error, with probability at least 1 − δ = 0.95.

How many flips do I need?

slide-10
SLIDE 10

Rolling a Die: Estimation of parameters θ_1, θ_2, …, θ_6

[Figure: four panels labeled 24, 120, 60, and 12]

Does the MLE estimate (relative frequencies) converge to the right value? How fast does it converge?
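One way to explore these questions empirically is a small simulation. The sketch below assumes a fair die and treats 12, 24, 60, and 120 as numbers of rolls, which is an assumption about the figure labels; the seed and function name are arbitrary choices:

```python
import random
from collections import Counter

def relative_frequencies(rolls):
    """MLE of the six face probabilities: the relative frequency of each face."""
    counts = Counter(rolls)
    n = len(rolls)
    return [counts[face] / n for face in range(1, 7)]

rng = random.Random(0)  # fixed seed (an arbitrary choice) for reproducibility
rolls = [rng.randint(1, 6) for _ in range(120)]  # simulated fair die

# Estimates after the first 12, 24, 60, and 120 rolls.
estimates = {n: relative_frequencies(rolls[:n]) for n in (12, 24, 60, 120)}

for n, theta_hat in estimates.items():
    # Largest deviation of any estimated face probability from the true 1/6.
    max_error = max(abs(t - 1 / 6) for t in theta_hat)
    print(n, [round(t, 3) for t in theta_hat], round(max_error, 3))
```

The printed maximum error gives a rough sense of how the estimates behave as the number of rolls grows; a single run is noisy, so the trend is clearer when averaged over several seeds.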

slide-11
SLIDE 11

Rolling a Die: Calculating the Empirical Average

Does the empirical average converge to the true mean? How fast does it converge?

slide-12
SLIDE 12

Rolling a Die: Calculating the Empirical Average

[Figure: 5 sample traces]

How fast do they converge to the true mean?
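Traces like these can be reproduced with a short simulation. The sketch below assumes a fair die (true mean 3.5); the trace length, seed, and checkpoints are arbitrary choices, not values from the lecture:

```python
import random

def running_average(samples):
    """Return the empirical average after each successive sample."""
    averages, total = [], 0
    for n, x in enumerate(samples, start=1):
        total += x
        averages.append(total / n)
    return averages

TRUE_MEAN = 3.5  # (1 + 2 + ... + 6) / 6 for a fair die
N_ROLLS = 1000   # trace length, an arbitrary choice for this sketch

rng = random.Random(0)  # fixed seed (arbitrary) for reproducibility
traces = [
    running_average([rng.randint(1, 6) for _ in range(N_ROLLS)])
    for _ in range(5)
]

for i, trace in enumerate(traces, start=1):
    checkpoints = [round(trace[n - 1], 3) for n in (10, 100, 1000)]
    print(f"trace {i}: averages after 10, 100, 1000 rolls = {checkpoints}")
```

Comparing the checkpoints against `TRUE_MEAN` shows how much the five traces still disagree early on and how they tighten as more rolls are averaged.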

slide-13
SLIDE 13

Hoeffding’s inequality (1963)

The bound depends only on the range of the variables, not on their variances.
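For reference, a standard statement of Hoeffding's inequality, written in assumed notation: $X_1, \dots, X_n$ are independent random variables with $X_i \in [a_i, b_i]$, and $\varepsilon > 0$.

```latex
\[
P\left( \left| \frac{1}{n} \sum_{i=1}^{n} \big( X_i - \mathbb{E}[X_i] \big) \right| \ge \varepsilon \right)
\;\le\;
2 \exp\left( - \frac{2 n^2 \varepsilon^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \right)
\]
```

When every variable lies in $[0, 1]$, the sum in the denominator equals $n$ and the right-hand side simplifies to $2\exp(-2n\varepsilon^2)$.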


slide-14
SLIDE 14

“Convergence rate” for LLN from Hoeffding

From Hoeffding:


Convergence rate
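As a sketch of how Hoeffding's inequality yields a rate, assume i.i.d. variables taking values in [0, 1], such as coin flips coded as 0/1. The bound then reads P(|θ̂ − θ| ≥ ε) ≤ 2·exp(−2nε²), and requiring the right-hand side to be at most δ gives n ≥ ln(2/δ) / (2ε²). The function names below are illustrative; the inputs ε = 0.1 and 1 − δ = 0.95 are the values from the coin-flip question earlier in the lecture:

```python
import math

def hoeffding_sample_size(epsilon, delta):
    """Smallest n with 2 * exp(-2 * n * epsilon**2) <= delta,
    for i.i.d. variables bounded in [0, 1]."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))

def hoeffding_bound(n, epsilon):
    """Upper bound on P(|empirical mean - true mean| >= epsilon)."""
    return 2 * math.exp(-2 * n * epsilon ** 2)

# epsilon = 0.1 and 1 - delta = 0.95
n = hoeffding_sample_size(epsilon=0.1, delta=0.05)
print(n)  # 185
```

Under these assumptions 185 flips suffice. This is a sufficient number derived from an upper bound, not necessarily the smallest number that works for a particular coin.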

slide-15
SLIDE 15

Introduction to Machine Learning CMU-10701

Stochastic Convergence and Tail Bounds

Barnabás Póczos