  1. Review of probability Nuno Vasconcelos UCSD

  2. Probability • probability is the language to deal with processes that are non-deterministic • examples: – if I flip a coin 100 times, how many heads can I expect to see? – what is the weather going to be like tomorrow? – are my stocks going to be up or down? – am I in front of a classroom or is this just a picture of it?

  3. Sample space • the most important concept is that of a sample space • our process defines a set of events – these are the outcomes or states of the process • example: – we roll a pair of dice – call the value on the up face at the n-th toss xn – note that possible events such as "odd number on the second throw", "two sixes", or "x1 = 2 and x2 = 6" can all be expressed as combinations of the sample space events (figure: the 6×6 grid of pairs (x1, x2), with each axis running from 1 to 6)
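
As a concrete sketch (mine, not the slides'), the 36-point sample space and the example events above can be written out as plain Python sets; the variable names are ad hoc:

```python
from itertools import product

# sample space for one roll of a pair of dice: all 36 ordered pairs (x1, x2)
sample_space = list(product(range(1, 7), repeat=2))

# the example events, expressed as subsets (unions) of sample space points
odd_second_throw = {(x1, x2) for (x1, x2) in sample_space if x2 % 2 == 1}
two_sixes        = {(6, 6)}
two_and_six      = {(2, 6)}

print(len(sample_space))       # 36
print(len(odd_second_throw))   # 18
```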

  4. Sample space • the sample space is the list of possible events that satisfies the following properties: – finest grain: all possible distinguishable events are listed separately – mutually exclusive: if one event happens the others do not (if x1 = 5 it cannot be anything else) – collectively exhaustive: any possible outcome can be expressed as a union of sample space events • the mutually exclusive property simplifies the calculation of the probability of complex events • collectively exhaustive means that there is no possible outcome to which we cannot assign a probability

  5. Probability measure • probability of an event: – a number expressing the chance that the event will be the outcome of the process • probability measure: satisfies three axioms – P(A) ≥ 0 for any event A – P(universal event) = 1 – if A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B) • e.g. – P(x1 ≥ 0) = 1 – P(x1 even ∪ x1 odd) = P(x1 even) + P(x1 odd)

  6. Probability measure • the last axiom – combined with the mutually exclusive property of the sample space – allows us to easily assign probabilities to all possible events • back to our dice example: – suppose that the probability of any pair (x1, x2) is 1/36 – we can then compute the probabilities of all "union" events – P(x2 odd) = 18 × 1/36 = 1/2 – P(U) = 36 × 1/36 = 1 – P(two sixes) = 1/36 – P(x1 = 2 and x2 = 6) = 1/36
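
A small illustration (not part of the original slides; the helper prob and the uniform table p are mine): the same numbers fall out of summing 1/36 over the outcomes in each event.

```python
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))
p = {outcome: 1 / 36 for outcome in sample_space}   # uniform measure on the 36 pairs

def prob(event):
    """Probability of a 'union' event = sum over its mutually exclusive outcomes."""
    return sum(p[o] for o in event)

print(prob([o for o in sample_space if o[1] % 2 == 1]))  # P(x2 odd) = 18/36 = 0.5
print(prob(sample_space))                                # P(U) = 1.0
print(prob([(6, 6)]))                                    # P(two sixes) ≈ 0.0278
print(prob([(2, 6)]))                                    # P(x1 = 2 and x2 = 6) ≈ 0.0278
```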

  7. Probability measure • note that there are many ways to define the universal event U – e.g. A = {x2 odd}, B = {x2 even}, U = A ∪ B – on the other hand, U = (1,1) ∪ (1,2) ∪ (1,3) ∪ … ∪ (6,6) – the fact that the sample space is finest grain, exhaustive, and mutually exclusive, together with the measure axioms, makes the whole procedure consistent

  8. Random variables • a random variable X – is a function that assigns a real value to each sample space event – we have already seen one such function: P_X(x1, x2) = 1/36 for all (x1, x2) • notation: – specify both the random variable and the value that it takes in your probability statements – we do this by writing the random variable as a subscript and the value as the argument: P_X(x1, x2) = 1/36 means Prob[X = (x1, x2)] = 1/36 – without this, probability statements can be hopelessly confusing
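
To make "a function that assigns a real value to each sample space event" concrete, here is a sketch with an example of my own choosing (the sum of the two dice); the names X and P_X are just illustrative:

```python
from itertools import product
from collections import defaultdict

sample_space = list(product(range(1, 7), repeat=2))
p = {o: 1 / 36 for o in sample_space}               # measure on the 36 pairs

# a random variable is a function assigning a real value to each sample space event;
# here, as an invented example, the sum of the two dice
def X(outcome):
    x1, x2 = outcome
    return x1 + x2

# PMF induced on the values of X: P_X(x) = sum of p(outcome) over outcomes mapped to x
P_X = defaultdict(float)
for o in sample_space:
    P_X[X(o)] += p[o]

print(P_X[7])   # 6/36 ≈ 0.167: six pairs sum to 7
```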

  9. Random variables • two types of random variables: – discrete and continuous – this really refers to the type of values the RV can take • if it can take only one of a finite set of possibilities, we call it discrete – this is the dice example we saw; there are only 36 possibilities

  10. Random variables • if it can take values in a real interval we say that the random variable is continuous • e.g. consider the sample space of weather temperature – we know that it could be any number between -50 and 150 degrees – random variable T ∈ [-50, 150] – note that the extremes do not have to be very precise; we can just say that P(T < -45°) = 0 • most probability notions apply equally well to discrete and continuous random variables

  11. Discrete RV • for a discrete RV the probability assignments are given by a probability mass function (PMF) – this can be thought of as a normalized histogram – it satisfies the following properties: 0 ≤ P_X(a) ≤ 1, ∀a, and Σ_a P_X(a) = 1 • example: the random variable X ∈ {1, 2, 3, …, 20}, where X = i if the grade of student z in the class is between 5i and 5(i+1) – from the plotted PMF we read off P_X(14) = α
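
A minimal sketch of these two properties, assuming made-up grades for a hypothetical class (only the binning into [5i, 5(i+1)) and the normalization follow the slide):

```python
import numpy as np

# hypothetical grades for a class, on a 0-100 scale
grades = np.array([62, 71, 88, 93, 55, 71, 84, 68, 77, 90])

# bin i collects grades in [5i, 5(i+1)); normalizing the histogram gives a PMF
counts, _ = np.histogram(grades, bins=20, range=(0, 100))
P_X = counts / counts.sum()

assert np.all((0 <= P_X) & (P_X <= 1))   # 0 <= P_X(a) <= 1 for every a
assert np.isclose(P_X.sum(), 1.0)        # sum over a of P_X(a) = 1
print(P_X[14])                           # probability of a grade in [70, 75)
```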

  12. Continuous RV • for a continuous RV the probability assignments are given by a probability density function (PDF) – this is just a continuous function – it satisfies the following properties: P_X(a) ≥ 0, ∀a, and ∫ P_X(a) da = 1 • example: the Gaussian random variable of mean µ and variance σ², P_X(a) = 1/(√(2π) σ) exp{ -(a - µ)² / (2σ²) }
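
A quick numerical sanity check of the Gaussian density above (a sketch only; the grid, its range, and the Riemann-sum check are arbitrary choices of mine):

```python
import numpy as np

def gaussian_pdf(a, mu=0.0, sigma=1.0):
    """P_X(a) = exp(-(a - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)"""
    return np.exp(-(a - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

a = np.linspace(-10.0, 10.0, 100_001)
da = a[1] - a[0]
print(np.all(gaussian_pdf(a) >= 0))     # P_X(a) >= 0 for all a
print(np.sum(gaussian_pdf(a)) * da)     # Riemann sum of the density over the grid ≈ 1
```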

  13. Discrete vs continuous RVs • in general the same, up to replacing summations by integrals • note that PDF means "density of probability" – this is probability per unit length – the probability of any particular value is always zero (unless the distribution has a discontinuity there) – we can only talk about Pr(a ≤ X ≤ b) = ∫_a^b P_X(t) dt – note also that PDFs are not upper bounded by 1 – e.g. the Gaussian tends to a Dirac delta as the variance goes to zero
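
To illustrate the interval formula, a short sketch using scipy's standard normal (the interval [-1, 1] and the point 0.5 are arbitrary examples of mine):

```python
from scipy import stats

X = stats.norm(loc=0.0, scale=1.0)        # standard Gaussian

# Pr(a <= X <= b) = integral of the density from a to b (computed here via the CDF)
a, b = -1.0, 1.0
print(X.cdf(b) - X.cdf(a))                # ≈ 0.683

# a single value has zero probability, even though the density there is positive
print(X.cdf(0.5) - X.cdf(0.5))            # 0.0
print(X.pdf(0.5))                         # ≈ 0.352: a density value, not a probability
```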

  14. Multiple random variables • frequently we have problems with multiple random variables – e.g. at the doctor's, you are mostly a collection of random variables: · x1: temperature · x2: blood pressure · x3: weight · x4: cough · … • we can summarize this as – a vector X = (x1, …, xn) of n random variables – P_X(x1, …, xn) is the joint probability distribution

  15. Marginalization P(cold) = ? • an important notion for multiple random variables is marginalization – e.g. having a cold does not depend on blood pressure and weight – all that matters are fever and cough – that is, we need to know P_{X1,X4}(a, b) • we marginalize with respect to a subset of variables (in this case X1 and X4) – this is done by summing (or integrating) the others out: P_{X1,X4}(x1, x4) = Σ_{x2,x3} P_{X1,X2,X3,X4}(x1, x2, x3, x4) in the discrete case, and P_{X1,X4}(x1, x4) = ∫∫ P_{X1,X2,X3,X4}(x1, x2, x3, x4) dx2 dx3 in the continuous case
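
A sketch of discrete marginalization on a small joint table (the table, its shape, and the random seed are invented for illustration): summing the unwanted axes out of the joint array leaves the marginal over the remaining variables.

```python
import numpy as np

rng = np.random.default_rng(0)

# invented joint PMF P_{X1,X2,X3,X4} over 4 discrete variables with 3 states each
joint = rng.random((3, 3, 3, 3))
joint /= joint.sum()                      # normalize so the table sums to 1

# marginalize: P_{X1,X4}(x1, x4) = sum over x2, x3 of P_{X1,X2,X3,X4}(x1, x2, x3, x4)
P_x1_x4 = joint.sum(axis=(1, 2))

print(P_x1_x4.shape)                      # (3, 3)
print(np.isclose(P_x1_x4.sum(), 1.0))     # the marginal is still a valid PMF
```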

  16. Conditional probability P_{Y|X}(sick | cough) = ? • another very important notion: – so far, the doctor has P_{X1,X4}(fever, cough) – this still does not allow a diagnosis – for that we need a new variable Y with two states, Y ∈ {sick, not sick} – the doctor measures the fever and cough levels; these are no longer unknowns, or even random quantities – the question of interest is "what is the probability that the patient is sick given the measured values of fever and cough?" • this is exactly the definition of conditional probability – what is the probability that Y takes a given value, given the observations for X, e.g. P_{Y|X1,X4}(sick | 98, high)

  17. Conditional probability • note the very important difference between conditional and joint probability • joint probability is a hypothetical question with respect to all variables – what is the probability that you will be sick and cough a lot? P_{Y,X}(sick, cough) = ?

  18. Conditional probability • conditional probability means that you know the values of some variables – what is the probability that you are sick given that you cough a lot? P_{Y|X}(sick | cough) = ? – "given" is the key word here – conditional probability is very important because it allows us to structure our thinking – it shows up again and again in the design of intelligent systems

  19. Conditional probability • fortunately it is easy to compute – we simply normalize the joint by the probability of what we know: P_{Y|X1}(sick | 98) = P_{Y,X1}(sick, 98) / P_{X1}(98) – note that this makes sense, since P_{Y|X1}(sick | 98) + P_{Y|X1}(not sick | 98) = 1 – and, by the marginalization equation, P_{Y,X1}(sick, 98) + P_{Y,X1}(not sick, 98) = P_{X1}(98) – the definition of conditional probability · just makes these two statements coherent · simply says that, given what we know, we still have a valid probability measure · the universal event {sick} ∪ {not sick} still has probability 1 after the observation
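
The same normalization, sketched in code (the joint table's numbers and the observed column index are made up; only the divide-by-the-marginal structure follows the slide):

```python
import numpy as np

# invented joint P_{Y,X1}(y, x1): rows are Y in {sick, not sick},
# columns are a few discrete fever values
joint = np.array([[0.02, 0.03, 0.10],     # P(sick, x1)
                  [0.40, 0.35, 0.10]])    # P(not sick, x1)

x1_observed = 1                           # index of the observed fever value

P_x1 = joint.sum(axis=0)                  # marginal P_{X1}(x1), by marginalization
P_y_given_x1 = joint[:, x1_observed] / P_x1[x1_observed]

print(P_y_given_x1)                       # conditional distribution over {sick, not sick}
print(P_y_given_x1.sum())                 # 1.0: still a valid probability measure
```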

  20. The chain rule of probability • an important consequence of the definition of conditional probability – note that, from this definition, P_{Y,X1}(y, x1) = P_{Y|X1}(y | x1) P_{X1}(x1) – more generally, it has the form P_{X1,X2,…,Xn}(x1, x2, …, xn) = P_{X1|X2,…,Xn}(x1 | x2, …, xn) × P_{X2|X3,…,Xn}(x2 | x3, …, xn) × … × P_{Xn-1|Xn}(xn-1 | xn) × P_{Xn}(xn) • combined with marginalization, this allows us to make hard probability questions simple
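
A quick numerical check of the two-variable form of the chain rule, on a randomly generated joint table (the table itself is arbitrary; only the factorization is the point):

```python
import numpy as np

rng = np.random.default_rng(1)

# invented joint P_{Y,X1} over 2 states of Y and 4 states of X1
P_y_x1 = rng.random((2, 4))
P_y_x1 /= P_y_x1.sum()

P_x1 = P_y_x1.sum(axis=0)                         # marginal P_{X1}(x1)
P_y_given_x1 = P_y_x1 / P_x1                      # conditional P_{Y|X1}(y|x1), column-wise

# chain rule: P_{Y,X1}(y, x1) = P_{Y|X1}(y|x1) * P_{X1}(x1)
print(np.allclose(P_y_given_x1 * P_x1, P_y_x1))   # True
```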

  21. The chain rule of probability • e.g. what is the probability that you will be sick and have a fever of 104°? P_{Y,X1}(sick, 104) = P_{Y|X1}(sick | 104) P_{X1}(104) – this breaks a hard question (probability of being sick and having a 104° fever) into two easier questions – P(sick | 104): everyone knows that this is close to one – "You have a cold!" P_{Y|X1}(sick | 104) ≈ 1
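
In numbers (both figures below are hypothetical, chosen only to show the decomposition):

```python
# hypothetical numbers; only the factorization follows the slide
P_sick_given_104 = 1.0        # "everyone knows that this is close to one"
P_104 = 0.001                 # assumed (made-up) probability of measuring a 104-degree fever

# chain rule: joint = conditional * marginal
P_sick_and_104 = P_sick_given_104 * P_104
print(P_sick_and_104)         # 0.001
```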
