Review of Probability Theory and Bayes Classifiers


SLIDE 1

Review

  • We have provided a basic review of probability theory
    – What is a (discrete) random variable
    – Basic axioms and theorems
    – Conditional distribution
    – Bayes rule

Bayes Rule

P(A|B) = P(A ∧ B) / P(B) = P(B|A) P(A) / P(B)

More general forms:

P(A|B) = P(B|A) P(A) / [ P(B|A) P(A) + P(B|~A) P(~A) ]

P(A|B ∧ X) = P(B|A ∧ X) P(A ∧ X) / P(B ∧ X)
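As a quick sanity check, here is the rule applied in Python to made-up numbers: a rare condition A and a noisy test B. All three input probabilities are invented purely for illustration.

```python
# Hypothetical numbers, just to exercise Bayes rule:
p_a = 0.01            # prior P(A)
p_b_given_a = 0.9     # likelihood P(B|A)
p_b_given_not_a = 0.05

# Denominator via the general form: P(B) = P(B|A)P(A) + P(B|~A)P(~A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes rule: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(f"P(A|B) = {p_a_given_b:.3f}")  # ~ 0.154
```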

SLIDE 2

Commonly used discrete distributions

Binomial distribution: x ~ Binomial(n, p), the probability of seeing x heads out of n flips:

P(x) = [ n(n−1)⋯(n−x+1) / x! ] p^x (1−p)^(n−x)

Categorical distribution: x can take K values; the distribution is specified by a set of θk's, where θk = P(x = vk) and θ1 + θ2 + … + θK = 1

Multinomial distribution: Multinomial(n, [x1, x2, …, xk]), the probability of seeing x1 ones, x2 twos, etc., out of n dice rolls:

P([x1, x2, …, xk]) = [ n! / (x1! x2! ⋯ xk!) ] θ1^x1 θ2^x2 ⋯ θk^xk
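The two formulas above translate directly into code. A minimal sketch, assuming Python 3.8+ for math.comb and math.prod; the example arguments are arbitrary.

```python
from math import comb, factorial, prod

def binomial_pmf(x, n, p):
    """P(x heads out of n flips), x ~ Binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def multinomial_pmf(counts, n, thetas):
    """P([x1, ..., xk]) for n rolls; thetas must sum to 1."""
    assert sum(counts) == n
    coeff = factorial(n) // prod(factorial(x) for x in counts)
    return coeff * prod(t ** x for t, x in zip(thetas, counts))

print(binomial_pmf(3, 10, 0.5))                           # 0.1171875
print(multinomial_pmf([2, 1, 1], 4, [0.5, 0.25, 0.25]))   # 0.1875
```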

Continuous Probability Distribution

  • A continuous random variable x can take any value in an interval on the real line
    – x usually corresponds to some real-valued measurement, e.g., today's lowest temperature
    – It is not possible to talk about the probability of a continuous random variable taking an exact value: P(x = 56.2) = 0
    – Instead we talk about the probability of the random variable taking a value within a given interval, P(x ∈ [50, 60])
    – This is captured in the probability density function

SLIDE 3

PDF: probability density function

  • The probability of X taking a value in a given range [x1, x2] is defined to be the area under the PDF curve between x1 and x2

  • We use f(x) to represent the PDF of x
  • Note:

    – f(x) ≥ 0
    – f(x) can be larger than 1
    – ∫_{−∞}^{+∞} f(x) dx = 1
    – P(X ∈ [x1, x2]) = ∫_{x1}^{x2} f(x) dx

What is the intuitive meaning of f(x)?

If f(x1) = α·a and f(x2) = a, then when x is sampled from this distribution, you are α times more likely to see that x is "very close to" x1 than that x is "very close to" x2.
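One way to check this intuition numerically: sample from a distribution whose PDF we know and compare the empirical frequencies of landing "very close to" two points against the ratio of the PDF values. A sketch, assuming a standard normal (an arbitrary choice) and made-up points x1, x2:

```python
import random, math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """PDF of a Gaussian; our stand-in for a known f(x)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

x1, x2 = 0.0, 1.5   # two arbitrary points
eps = 0.05          # half-width of "very close to"
samples = [random.gauss(0.0, 1.0) for _ in range(500_000)]

near_x1 = sum(abs(s - x1) < eps for s in samples)
near_x2 = sum(abs(s - x2) < eps for s in samples)

print("empirical ratio:", near_x1 / near_x2)                      # ~ alpha
print("pdf ratio f(x1)/f(x2):", normal_pdf(x1) / normal_pdf(x2))  # ~ 3.08
```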

SLIDE 4

Commonly Used Continuous Distributions

[Figure: density curves f(x) of several commonly used continuous distributions]

  • So far we have looked at univariate distributions, i.e., single random variables
  • Now we will briefly look at the joint distribution of multiple variables
  • Why do we need to look at joint distributions?
    – Because sometimes different random variables are clearly related to each other

  • Imagine three random variables
    – A: teacher appears grouchy
    – B: teacher had morning coffee
    – C: Kelly parking lot is full at 8:50 AM
  • How do we represent the distribution of 3 random variables together?

SLIDE 5

The Joint Distribution

Recipe for making a joint distribution of M variables:

1. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2^M rows).

Example: Binary variables A, B, C

A B C
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1

SLIDE 6

The Joint Distribution

Recipe for making a joint distribution of M variables:

1. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2^M rows).
2. For each combination of values, say how probable it is.
3. If you subscribe to the axioms of probability, those numbers must sum to 1.

Example: Boolean variables A, B, C

A B C Prob
0 0 0 0.30
0 0 1 0.05
0 1 0 0.10
0 1 1 0.05
1 0 0 0.05
1 0 1 0.10
1 1 0 0.25
1 1 1 0.10


Question: What is the relationship between p(A,B,C) and p(A)?
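One way to see the answer: p(A) is a marginal of p(A,B,C), obtained by summing the joint over all values of B and C. A small sketch using the table above, with the joint stored as a Python dict:

```python
# The joint from the slide, as a dict mapping (A, B, C) -> probability.
joint = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10, (1, 1, 0): 0.25, (1, 1, 1): 0.10,
}
assert abs(sum(joint.values()) - 1.0) < 1e-9  # axiom: rows sum to 1

# p(A=1) is the sum of the joint over all values of B and C.
p_a = sum(p for (a, b, c), p in joint.items() if a == 1)
print("P(A=1) =", p_a)  # 0.05 + 0.10 + 0.25 + 0.10 = 0.50
```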

SLIDE 7

Using the Joint

Once you have the JD you can ask for the probability of any logical expression E involving your attributes:

P(E) = Σ_{rows matching E} P(row)

Example: P(Poor ∧ Male) = 0.4654

SLIDE 8

Inference with the Joint

P(E1 | E2) = P(E1 ∧ E2) / P(E2) = [ Σ_{rows matching E1 and E2} P(row) ] / [ Σ_{rows matching E2} P(row) ]

Example: P(Male | Poor) = 0.4654 / 0.7604 = 0.612
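A minimal sketch of both operations, reusing the A, B, C joint from the earlier slides (the census table behind the Poor/Male numbers is not reproduced in this extraction). Events are passed as predicates over a row:

```python
joint = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10, (1, 1, 0): 0.25, (1, 1, 1): 0.10,
}

def prob(event):
    """P(E): sum P(row) over rows matching the predicate `event`."""
    return sum(p for row, p in joint.items() if event(row))

def cond_prob(e1, e2):
    """P(E1|E2) = P(E1 and E2) / P(E2)."""
    return prob(lambda r: e1(r) and e2(r)) / prob(e2)

# e.g. P(A | B ∧ C), with rows laid out as (A, B, C): 0.10 / 0.15
print(cond_prob(lambda r: r[0] == 1, lambda r: r[1] == 1 and r[2] == 1))
```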

SLIDE 9

So we have learned that

  • Joint distribution is extremely useful! We can do all kinds of cool inference:
    – I've got a sore neck: how likely am I to have meningitis?
    – Many industries grow around Bayesian inference: examples include medicine, pharma, engine diagnosis, etc.
  • But HOW do we get the joint distribution?
    – We can learn it from data

SLIDE 10

Learning a joint distribution

Build a JD table for your attributes in which the probabilities are unspecified, then fill in each row with

P̂(row) = (# records matching row) / (total number of records)

Before learning (probabilities unspecified):

A B C Prob
0 0 0 ?
0 0 1 ?
0 1 0 ?
0 1 1 ?
1 0 0 ?
1 0 1 ?
1 1 0 ?
1 1 1 ?

After counting:

A B C Prob
0 0 0 0.30
0 0 1 0.05
0 1 0 0.10
0 1 1 0.05
1 0 0 0.05
1 0 1 0.10
1 1 0 0.25
1 1 1 0.10

(e.g., the 0.25 in the 1 1 0 row is the fraction of all records in which A and B are True but C is False)

Example of Learning a Joint

  • This joint was obtained by learning from three attributes in the UCI "Adult" Census Database [Kohavi 1995]

UCI machine learning repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
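The counting formula above is a one-liner over the records. A sketch with a tiny fabricated dataset; the six (A, B, C) records are invented for illustration and are not taken from the Adult database:

```python
from collections import Counter

def learn_joint(records):
    """P^(row) = (# records matching row) / (total number of records)."""
    counts = Counter(records)
    total = len(records)
    return {row: c / total for row, c in counts.items()}

# Tiny made-up dataset of (A, B, C) records.
data = [(1, 1, 0), (1, 1, 0), (0, 0, 0), (1, 0, 1), (0, 0, 0), (1, 1, 0)]
print(learn_joint(data))  # {(1,1,0): 0.5, (0,0,0): 0.333..., (1,0,1): 0.1666...}
```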

SLIDE 11

Where are we?

  • We have recalled the fundamentals of probability
  • We have become content with what JDs are and how to use them
  • And we even know how to learn JDs from data.

Bayes Classifiers

  • A formidable and sworn enemy of decision trees

[Diagram: Input Attributes → Classifier (DT or BC) → Prediction of categorical output]

SLIDE 12

Recipe for a Bayes Classifier

  • Assume you want to predict output Y which has arity nY and values v1, v2, …, v_nY.
  • Assume there are m input attributes called X = (X1, X2, …, Xm)
  • Learn a conditional distribution p(X|y) for each possible y value, y = v1, v2, …, v_nY. We do this by:
    – Breaking the training set into nY subsets called DS1, DS2, …, DS_nY based on the y values, i.e., DSi = records in which Y = vi
    – For each DSi, learning the joint distribution of the input attributes
    – This gives us p(X|Y=vi), i.e., P(X1, X2, …, Xm | Y=vi)
  • Idea: when a new example (X1 = u1, X2 = u2, …, Xm = um) comes along, predict the value of Y that has the highest value of P(Y=vi | X1, X2, …, Xm):

Y_predict = argmax_v P(Y = v | X1 = u1, …, Xm = um)

SLIDE 13

Getting what we need

Y_predict = argmax_v P(Y = v | X1 = u1, …, Xm = um)

Getting a posterior probability:

P(Y = vi | X1 = u1, …, Xm = um)
  = P(X1 = u1, …, Xm = um | Y = vi) P(Y = vi) / P(X1 = u1, …, Xm = um)
  = P(X1 = u1, …, Xm = um | Y = vi) P(Y = vi) / Σ_{j=1..nY} P(X1 = u1, …, Xm = um | Y = vj) P(Y = vj)

SLIDE 14

Bayes Classifiers in a nutshell

Y_predict = argmax_v P(Y = v | X1 = u1, …, Xm = um) = argmax_v P(X1 = u1, …, Xm = um | Y = v) P(Y = v)

  • 1. Learn P(X1, X2, …, Xm | Y = vi) for each value vi
  • 2. Estimate P(Y = vi) as the fraction of records with Y = vi
  • 3. For a new prediction, compute Y_predict as above

Estimating the joint distribution of X1, X2, …, Xm given y can be problematic!
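Steps 1 to 3 fit in a few lines when the per-class density estimator is itself a counted joint table. A hedged sketch (the record layout is assumed to be (x_tuple, y) pairs); note how an unseen x gets probability 0 for every class, which is exactly the problem raised next:

```python
from collections import Counter, defaultdict

def train_bayes(records):
    """records: iterable of (x_tuple, y). Step 1: a joint table per class;
    step 2: P(Y=v) as the fraction of records with Y=v."""
    by_class = defaultdict(list)
    for x, y in records:
        by_class[y].append(x)
    total = sum(len(xs) for xs in by_class.values())
    priors = {v: len(xs) / total for v, xs in by_class.items()}
    likelihoods = {v: {row: c / len(xs) for row, c in Counter(xs).items()}
                   for v, xs in by_class.items()}
    return priors, likelihoods

def predict(priors, likelihoods, x):
    """Step 3: argmax_v P(X=x | Y=v) P(Y=v). An x seen in no class
    scores 0 for every v: the overfitting problem discussed below."""
    return max(priors, key=lambda v: likelihoods[v].get(x, 0.0) * priors[v])
```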

Joint Density Estimator Overfits

  • Typically we don’t have enough data to estimate the joint

distribution accurately

  • It is common to encounter the following situation:

– If no records have the exact X=(u1, u2, …. um), then P(X|Y=vi ) = 0 for all values of Y.

  • In that case, what can we do?

– we might as well guess Y’s value!

SLIDE 15

Example: Spam Filtering

  • Bag-of-words representation is used for emails (X = {x1, x2, …, xm})
  • Assume that we have a dictionary containing all commonly used words and tokens
  • We will create one attribute for each dictionary entry (see the sketch after this list)
    – E.g., xi is a binary variable; xi = 1 (0) means the ith word in the dictionary is (not) present in the email
    – Other possible ways of forming the features exist, e.g., xi = the # of times that the ith word appears
  • Assume that our vocabulary contains 10k commonly used words: we have 10,000 attributes
  • How many parameters do we need to learn? 2 × (2^10,000 − 1)
  • Clearly we don't have enough data to estimate that many parameters
  • What can we do?
    – Make some bold assumptions to simplify the joint distribution
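For the bag-of-words bullets above, the encoding itself is simple. A sketch with a three-word stand-in for the 10k-word dictionary; the words are arbitrary:

```python
# "dictionary" is a toy three-word stand-in for the 10k-word vocabulary.
dictionary = ["cheap", "meeting", "deadline"]

def to_features(email_text):
    """x_i = 1 iff the i-th dictionary word appears in the email."""
    words = set(email_text.lower().split())
    return tuple(int(w in words) for w in dictionary)

print(to_features("Cheap pills, very cheap"))   # (1, 0, 0)
```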

SLIDE 16

Naïve Bayes Assumption

  • Assume that each attribute is independent of any other attributes given the class label:

P(X1 = u1, …, Xm = um | Y = vi) = P(X1 = u1 | Y = vi) ⋯ P(Xm = um | Y = vi)

A note about independence

  • Assume A and B are Boolean random variables. Then "A and B are independent" if and only if P(A|B) = P(A)
  • "A and B are independent" is often notated as A ⊥ B

SLIDE 17

Independence Theorems

  • Assume P(A|B) = P(A). Then P(A ∧ B) = P(A|B) P(B) = P(A) P(B)
  • Assume P(A|B) = P(A). Then P(B|A) = P(A|B) P(B) / P(A) = P(B)
  • Assume P(A|B) = P(A). Then P(~A|B) = 1 − P(A|B) = 1 − P(A) = P(~A)
  • Assume P(A|B) = P(A). Then P(A|~B) = [P(A) − P(A ∧ B)] / [1 − P(B)] = P(A)[1 − P(B)] / [1 − P(B)] = P(A)

SLIDE 18

Conditional Independence

  • P(X1 | X2, y) = P(X1 | y)
    – X1 and X2 are conditionally independent given y
  • If X1 and X2 are conditionally independent given y, then we have
    – P(X1, X2 | y) = P(X1 | y) P(X2 | y)

Naïve Bayes Classifier

  • Assume you want to predict output Y which has arity nY and values v1, v2, …, v_nY.
  • Assume there are m input attributes called X = (X1, X2, …, Xm)
  • Learn a conditional distribution p(X|y) for each possible y value, y = v1, v2, …, v_nY. We do this by:
    – Breaking the training set into nY subsets called DS1, DS2, …, DS_nY based on the y values, i.e., DSi = records in which Y = vi
    – For each DSi, learning the per-attribute conditionals, which the naïve Bayes assumption combines as

P(X1 = u1, …, Xm = um | Y = vi) = P(X1 = u1 | Y = vi) ⋯ P(Xm = um | Y = vi)

Y_predict = argmax_v P(X1 = u1 | Y = v) ⋯ P(Xm = um | Y = v) P(Y = v)
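Putting the assumption and the argmax together, a minimal naïve Bayes sketch for binary attributes: counting estimates, no smoothing yet. Records are assumed to be (x_tuple, y) pairs; the names are hypothetical.

```python
from collections import defaultdict

def train_naive_bayes(records):
    """records: list of (x_tuple, y) with 0/1 attributes.
    Learns P(Y=v) and, per class, P(X_i = 1 | Y=v) by simple counting."""
    by_class = defaultdict(list)
    for x, y in records:
        by_class[y].append(x)
    m = len(records[0][0])
    priors = {v: len(xs) / len(records) for v, xs in by_class.items()}
    cond = {v: [sum(x[i] for x in xs) / len(xs) for i in range(m)]
            for v, xs in by_class.items()}  # cond[v][i] = P(X_i=1 | Y=v)
    return priors, cond

def predict(priors, cond, u):
    """argmax_v P(X_1=u_1|Y=v) ... P(X_m=u_m|Y=v) P(Y=v)."""
    def score(v):
        s = priors[v]
        for i, ui in enumerate(u):
            p1 = cond[v][i]
            s *= p1 if ui == 1 else 1 - p1
        return s
    return max(priors, key=score)

# usage: priors, cond = train_naive_bayes(data); predict(priors, cond, (1, 1, 1))
```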

SLIDE 19

Example

[Training table: binary attributes X1, X2, X3 and label Y; the 0/1 entries did not survive extraction]

Apply Naïve Bayes and make a prediction for (1, 1, 1)?

Final Notes about Bayes Classifier

  • Any density estimator can be plugged in to estimate P(X1, X2, …, Xm | y)
  • Real-valued attributes can be modeled using simple distributions such as the Gaussian (Normal) distribution
  • Zero probabilities are painful for both joint and naïve estimators. A hack called Laplace smoothing can help (see the sketch below)!
  • Naïve Bayes is wonderfully cheap and survives tens of thousands of attributes easily
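One common form of Laplace smoothing for a discrete attribute, sketched below; the add-1 and add-k constants are a convention, not the only choice:

```python
def smoothed_estimate(matching, total, k=2):
    """Laplace-smoothed P(X_i = u | Y = v): add 1 to the matching count
    and k to the total, where k is the number of values X_i can take
    (2 for a binary attribute). No estimate is ever exactly zero."""
    return (matching + 1) / (total + k)

print(smoothed_estimate(0, 50))   # 1/52 ~ 0.019 rather than 0.0
```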

SLIDE 20

What you should know

  • Probability
    – Fundamentals of probability and Bayes rule
    – What's a joint distribution
    – How to do inference (i.e., P(E1|E2)) once you have a JD, using Bayes rule
    – How to learn a joint DE (nothing that simple counting cannot fix)
  • Bayes Classifiers
    – What is a Bayes classifier
    – What is a naïve Bayes classifier; what is the naïve Bayes assumption