BBM406 Fundamentals of Machine Learning, Lecture 7: Probability - PowerPoint PPT Presentation





slide-1
SLIDE 1

BBM406 Fundamentals of Machine Learning

Lecture 7: Probability Review (cont’d.), Maximum Likelihood Estimation (MLE)

Aykut Erdem // Hacettepe University // Fall 2019

photo: Chessex Borealis™ Aquerple Polyhedral

slide-2
SLIDE 2

Administrative

  • Project proposal due November 15
  • A half-page description:
    − problem to be investigated
    − why it is interesting
    − what data you will use
    − related work

slide-3
SLIDE 3

Deadlines in the syllabus are closer than they appear

slide-4
SLIDE 4

Today

  • Probabilities
  • Dependence, Independence, Conditional Independence
  • Parameter estimation
  • Maximum Likelihood Estimation (MLE)
  • Maximum a Posteriori (MAP)

slide-5
SLIDE 5

Last time… Sample space

Def: A sample space Ω is the set of all possible outcomes of a (conceptual or physical) random experiment. (Ω can be finite or infinite.)

Examples:
  • Ω may be the set of all possible outcomes of a die roll: {1, 2, 3, 4, 5, 6}
  • Pages of a book opened randomly: {1, …, 157}
  • Real numbers, for temperature, location, time, etc.

slide by Barnabás Póczos & Alex Smola
slide-6
SLIDE 6

Last time… Events

Def: An event A is a subset of the sample space Ω. We will ask the question: what is the probability of a particular event?

Examples: What is the probability that
  • a book opened at random is open at an odd-numbered page?
  • a die roll gives a number < 4?
  • a random person’s height X satisfies a < X < b?

slide by Barnabás Póczos & Alex Smola
slide-7
SLIDE 7
Last time… Probability

Def: Probability P(A), the probability that event (subset) A happens, is a function that maps the event A onto the interval [0, 1]. P(A) is also called the probability measure of A.

[Venn diagram: outcomes in which A is true vs. outcomes in which A is false; P(A) is the volume of A’s area within the sample space.]

Example: What is the probability that the number on a die is 2 or 4? Here A = {2, 4} and its complement is {1, 3, 5, 6}, so P(A) = 2/6 = 1/3.

slide by Barnabás Póczos & Alex Smola
slide-8
SLIDE 8

Last time… Kolmogorov Axioms

Axioms:
  1. P(A) ≥ 0 for every event A
  2. P(Ω) = 1
  3. For disjoint events A₁, A₂, …: P(A₁ ∪ A₂ ∪ …) = P(A₁) + P(A₂) + …

Consequences: P(∅) = 0; P(Aᶜ) = 1 − P(A); if A ⊆ B then P(A) ≤ P(B); P(A) ≤ 1.

slide by Barnabás Póczos & Alex Smola
slide-9
SLIDE 9

Last time… Venn Diagram

[Venn diagram: overlapping events A and B]

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

slide by Barnabás Póczos & Alex Smola
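The identity above can be checked directly on a finite sample space; a minimal sketch using two events on a fair die (the event choices are illustrative):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # sample space of a fair die
A = {2, 4, 6}                # event "even number"
B = {4, 5, 6}                # event "number > 3"

def P(event):
    # uniform probability measure on omega
    return Fraction(len(event & omega), len(omega))

# inclusion-exclusion: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert P(A | B) == P(A) + P(B) - P(A & B)
print(P(A | B))  # 2/3
```

Using `Fraction` keeps the arithmetic exact, so the identity holds with `==` rather than a floating-point tolerance.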
slide-10
SLIDE 10

Last time… Random Variables

Def: A real-valued random variable is a function of the outcome of a randomized experiment.

Examples (discrete random variables, where Ω is discrete):
  • X(ω) = True if a randomly drawn person ω from our class Ω is female
  • X(ω) = the hometown of a randomly drawn person ω from our class Ω

slide by Barnabás Póczos & Alex Smola
slide-11
SLIDE 11

Last time… Discrete Distributions

  • Bernoulli distribution Ber(p): P(X = 1) = p, P(X = 0) = 1 − p

Suppose a coin with head prob. p is tossed n times. What is the probability of getting k heads and n − k tails?

  • Binomial distribution Bin(n, p): P(k heads in n tosses) = C(n, k) p^k (1 − p)^(n−k)

slide by Barnabás Póczos & Alex Smola
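The Bin(n, p) mass function above is easy to sketch; the helper name `binomial_pmf` is mine:

```python
from math import comb

def binomial_pmf(k, n, p):
    # P(exactly k heads in n tosses of a coin with head probability p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Ber(p) is the n = 1 special case
assert binomial_pmf(1, 1, 0.3) == 0.3
# the probabilities over k = 0..n sum to 1
assert abs(sum(binomial_pmf(k, 10, 0.4) for k in range(11)) - 1.0) < 1e-12
```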
slide-15
SLIDE 15

Last time… Conditional Probability

P(X | Y) = fraction of worlds in which event X is true given event Y is true: P(X | Y) = P(X ∩ Y) / P(Y).

[Venn diagram: events X, Y and their intersection X ∩ Y]

Example joint distribution:

              Flu      No Flu
Headache      1/80     7/80
No Headache   1/80     71/80

P(Headache | Flu) = (1/80) / (1/80 + 1/80) = 1/2

slide by Barnabás Póczos & Alex Smola
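The conditional probability can be read straight off the joint table; a sketch (the dictionary layout is mine):

```python
from fractions import Fraction

# joint distribution P(Headache, Flu) from the slide
joint = {
    ("headache", "flu"):       Fraction(1, 80),
    ("headache", "no flu"):    Fraction(7, 80),
    ("no headache", "flu"):    Fraction(1, 80),
    ("no headache", "no flu"): Fraction(71, 80),
}

def conditional(x, y):
    # P(X = x | Y = y) = P(x, y) / P(y), marginalizing over X to get P(y)
    p_y = sum(p for (xv, yv), p in joint.items() if yv == y)
    return joint[(x, y)] / p_y

print(conditional("headache", "flu"))  # 1/2
```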
slide-17
SLIDE 17

Independence

Y and X don’t contain information about each other. Observing Y doesn’t help predicting X. Observing X doesn’t help predicting Y.

Examples:

Independent: Winning on roulette this week and next week. Dependent: Russian roulette

Independent random variables: P(X, Y) = P(X) P(Y), or equivalently P(X | Y) = P(X).

slide by Barnabás Póczos & Alex Smola
slide-18
SLIDE 18

Dependent / Independent

[Venn diagrams: independent X, Y vs. dependent X, Y]

slide by Barnabás Póczos & Alex Smola
slide-19
SLIDE 19

Conditionally Independent

Conditionally independent: knowing Z makes X and Y independent.

Examples:
  • Dependent: shoe size of children and reading skills
  • Conditionally independent: shoe size of children and reading skills, given age
  • Storks deliver babies: a highly statistically significant correlation exists between stork populations and human birth rates across Europe.

slide by Barnabás Póczos & Alex Smola
slide-20
SLIDE 20

Conditionally Independent

  • London taxi drivers: a survey pointed out a positive and significant correlation between the number of accidents and wearing coats. It concluded that coats could hinder the movements of drivers and be the cause of accidents. A new law was prepared to prohibit drivers from wearing coats when driving.

Finally, another study pointed out that people wear coats when it rains…

slide by Barnabás Póczos & Alex Smola
slide-21
SLIDE 21

Correlation ≠ Causation

The number of people who drowned by falling into a swimming pool correlates with the number of films Nicolas Cage appeared in.

Correlation: 0.666004

slide-22
SLIDE 22

Conditional Independence

Formally: X is conditionally independent of Y given Z:

P(X | Y, Z) = P(X | Z)

Equivalent to:

P(X, Y | Z) = P(X | Z) P(Y | Z)

Note: P(Thunder | Rain, Lightning) = P(Thunder | Lightning) does NOT mean Thunder is independent of Rain; it means that, given Lightning, knowing Rain doesn’t give more info about Thunder.

slide by Barnabás Póczos & Alex Smola
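The distinction can be checked numerically on a toy joint distribution built so that Thunder depends on Rain only through Lightning; all the numbers below are made up for illustration:

```python
from itertools import product

# conditional tables (made-up numbers), chosen so the joint factorizes
# as P(R) P(L | R) P(T | L), which enforces T ⫫ R | L by construction
p_rain = {True: 0.3, False: 0.7}
p_light_given_rain = {True: {True: 0.5, False: 0.5},
                      False: {True: 0.1, False: 0.9}}
p_thunder_given_light = {True: {True: 0.9, False: 0.1},
                         False: {True: 0.05, False: 0.95}}

def joint(r, l, t):
    return p_rain[r] * p_light_given_rain[r][l] * p_thunder_given_light[l][t]

def P(pred):
    # probability of the set of (rain, lightning, thunder) worlds where pred holds
    return sum(joint(r, l, t)
               for r, l, t in product([True, False], repeat=3) if pred(r, l, t))

# P(Thunder | Lightning, Rain) == P(Thunder | Lightning): conditionally independent
lhs = P(lambda r, l, t: t and l and r) / P(lambda r, l, t: l and r)
rhs = P(lambda r, l, t: t and l) / P(lambda r, l, t: l)
assert abs(lhs - rhs) < 1e-12

# but Thunder and Rain are NOT unconditionally independent
assert abs(P(lambda r, l, t: t and r)
           - P(lambda r, l, t: t) * P(lambda r, l, t: r)) > 1e-3
```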
slide-25
SLIDE 25

Parameter estimation: MLE, MAP


Estimating Probabilities

slide by Barnabás Póczos & Alex Smola
slide-26
SLIDE 26

Flipping a Coin

I have a coin; if I flip it, what’s the probability that it will fall with the head up?

Let us flip it a few times to estimate the probability.

The estimated probability is: 3/5 (“Frequency of heads”)

slide by Barnabás Póczos & Alex Smola
slide-30
SLIDE 30

Flipping a Coin

Questions:

(1) Why frequency of heads???
(2) How good is this estimation???
(3) Why is this a machine learning problem???

We are going to answer these questions.

The estimated probability is: 3/5 (“Frequency of heads”)

slide by Barnabás Póczos & Alex Smola
slide-31
SLIDE 31

Question (1)

Why frequency of heads???


  • Frequency of heads is exactly the maximum likelihood estimator for this problem
  • MLE has nice properties (interpretation, statistical guarantees, simple)

slide by Barnabás Póczos & Alex Smola
slide-32
SLIDE 32


Maximum Likelihood Estimation

slide by Barnabás Póczos & Alex Smola
slide-33
SLIDE 33

MLE for Bernoulli distribution

Data, D = the observed sequence of flips; P(Heads) = θ, P(Tails) = 1 − θ

Flips are i.i.d.:
  − independent events
  − identically distributed according to a Bernoulli distribution

MLE: Choose θ that maximizes the probability of the observed data

slide by Barnabás Póczos & Alex Smola
slide-37
SLIDE 37

Maximum Likelihood Estimation

MLE: Choose θ that maximizes the probability of the observed data:

θ̂ = argmax_θ P(D | θ)

Since the flips are independent draws, identically distributed:

P(D | θ) = ∏ᵢ P(xᵢ | θ) = θ^(α_H) (1 − θ)^(α_T)

where α_H and α_T are the numbers of heads and tails.

slide by Barnabás Póczos & Alex Smola
slide-42
SLIDE 42

Maximum Likelihood Estimation

MLE: Choose θ that maximizes the probability of the observed data. Setting the derivative of the log-likelihood to zero gives

θ̂_MLE = α_H / (α_H + α_T)

That’s exactly the “Frequency of heads”.

slide by Barnabás Póczos & Alex Smola
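The closed form θ̂ = α_H / (α_H + α_T) can be sanity-checked against a brute-force search over the likelihood; a sketch, not part of the slides:

```python
import math

def log_likelihood(theta, heads, tails):
    # log P(D | theta) for a Bernoulli coin
    return heads * math.log(theta) + tails * math.log(1 - theta)

heads, tails = 3, 2
theta_mle = heads / (heads + tails)   # closed-form MLE: frequency of heads

# a grid search over theta lands on the same value
grid = [i / 1000 for i in range(1, 1000)]
theta_grid = max(grid, key=lambda t: log_likelihood(t, heads, tails))
assert abs(theta_grid - theta_mle) < 1e-9
print(theta_mle)  # 0.6
```

Maximizing the log-likelihood is equivalent to maximizing the likelihood itself, since log is monotone; it just avoids underflow for long flip sequences.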
slide-45
SLIDE 45

Question (2)

  • How good is this MLE estimation???


slide by Barnabás Póczos & Alex Smola
slide-46
SLIDE 46

How many flips do I need?

I flipped the coin 5 times: 3 heads, 2 tails. What if I had flipped it 50 times: 30 heads and 20 tails?

  • Which estimator should we trust more?
  • The more the merrier???

slide by Barnabás Póczos & Alex Smola
slide-47
SLIDE 47

Let θ* be the true parameter. For n = α_H + α_T flips and any ε > 0,

Hoeffding’s inequality:

P(|θ̂ − θ*| ≥ ε) ≤ 2e^(−2nε²)

slide by Barnabás Póczos & Alex Smola
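The bound can be sketched and compared against simulation; the choice θ* = 0.6 below is arbitrary:

```python
import math
import random

def hoeffding_bound(n, eps):
    # P(|theta_hat - theta*| >= eps) <= 2 * exp(-2 * n * eps^2)
    return 2 * math.exp(-2 * n * eps * eps)

# empirical check: the observed failure rate stays below the bound
random.seed(0)
theta_star, n, eps, trials = 0.6, 100, 0.1, 2000
failures = 0
for _ in range(trials):
    theta_hat = sum(random.random() < theta_star for _ in range(n)) / n
    failures += abs(theta_hat - theta_star) >= eps
assert failures / trials <= hoeffding_bound(n, eps)
```

Hoeffding is distribution-free, so the bound is loose: for n = 100 and ε = 0.1 it gives about 0.27, while the simulated failure rate is much smaller.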
slide-48
SLIDE 48

Probably Approximate Correct 
 (PAC) Learning

I want to know the coin parameter θ within ε = 0.1 error with probability at least 1 − δ = 0.95. How many flips do I need?

Sample complexity: require 2e^(−2nε²) ≤ δ, i.e. n ≥ ln(2/δ) / (2ε²)

slide by Barnabás Póczos & Alex Smola
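Inverting Hoeffding’s bound for n gives the sample complexity directly; a sketch:

```python
import math

def sample_complexity(eps, delta):
    # smallest n with 2 * exp(-2 * n * eps^2) <= delta,
    # i.e. n >= ln(2 / delta) / (2 * eps^2)
    return math.ceil(math.log(2 / delta) / (2 * eps * eps))

# eps = 0.1 error with probability at least 1 - delta = 0.95
print(sample_complexity(0.1, 0.05))  # 185
```

Note the 1/ε² dependence: halving the allowed error roughly quadruples the number of flips needed, while tightening δ costs only logarithmically.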
slide-49
SLIDE 49

Question (3)

Why is this a machine learning problem???

  • improve their performance (accuracy of the predicted probability)
  • at some task (predicting the probability of heads)
  • with experience (the more coins we flip, the better we are)

slide by Barnabás Póczos & Alex Smola
slide-50
SLIDE 50

What about continuous 
 features?

[Plots: Gaussian densities with µ = 0 and varying σ²]

Let us try Gaussians…

slide by Barnabás Póczos & Alex Smola
slide-51
SLIDE 51

MLE for Gaussian mean 
 and variance

Choose θ = (µ, σ²) that maximizes the probability of the observed data, assuming independent, identically distributed draws:

θ̂_MLE = argmax_{µ,σ²} ∏ᵢ p(xᵢ | µ, σ²),  where p(x | µ, σ²) = (1/√(2πσ²)) exp(−(x − µ)² / (2σ²))

This gives µ̂_MLE = (1/n) Σᵢ xᵢ and σ̂²_MLE = (1/n) Σᵢ (xᵢ − µ̂)².

slide by Barnabás Póczos & Alex Smola
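The closed-form estimates are a few lines of code; the sketch below also computes the unbiased variant, which divides by n − 1 instead of n (the sample data is illustrative):

```python
def gaussian_mle(xs):
    # MLE for (mu, sigma^2) from samples xs, plus the unbiased variance
    n = len(xs)
    mu = sum(xs) / n
    var_mle = sum((x - mu) ** 2 for x in xs) / n          # biased: divides by n
    var_unbiased = sum((x - mu) ** 2 for x in xs) / (n - 1)
    return mu, var_mle, var_unbiased

mu, v_mle, v_unb = gaussian_mle([2.0, 4.0, 6.0])
print(mu, v_mle, v_unb)  # 4.0, 8/3 ≈ 2.667, 4.0
```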
slide-52
SLIDE 52

MLE for Gaussian mean
 and variance

Note: the MLE for the variance of a Gaussian is biased: the expected result of the estimation is not the true parameter!

Unbiased variance estimator:

σ̂²_unbiased = (1/(n − 1)) Σᵢ (xᵢ − µ̂)²

slide by Barnabás Póczos & Alex Smola
slide-53
SLIDE 53

Next Class:

MAP estimation
Naïve Bayes Classifier