  1. Basic Concepts
     G. Urvoy-Keller, urvoy@unice.fr
     Probability and Statistics

  2. Outline
     - Basic concepts
     - Probability
     - Conditional Probability
     - Moments
     - Common Distributions: Binomial, Zipf, Poisson, Uniform, Normal, Beta, Gamma

  3. Basic Concepts
     - A random experiment is an experiment whose outcome cannot be predicted with certainty.
     - The sample space is the set of all possible outcomes of an experiment.
     - The outcomes of random experiments are called random variables and are often written as uppercase letters (e.g. X).
     - Random variables can be discrete or continuous.
     - An event is a subset of outcomes in the sample space.
     - Mutually exclusive events: two events that cannot occur simultaneously.
     - Extension: n events are mutually exclusive if every possible pair of them is mutually exclusive.

  4. Probability
     - Probability is the measure of the likelihood that some event will occur.
     - Historically, there are two ways of computing probabilities:
       - Equal likelihood model (classical theory):
         - For an event E, we count (no experiment needed) the number n of favorable outcomes.
         - We also know the total number N of possible outcomes.
         - We then set P = n/N.
         - We thus assume that all outcomes are equally likely; this works well for coin and die tossing and for cards.
       - Relative frequency method:
         - Can be used when all outcomes are not equally likely.
         - An "active" method where the experiment is carried out n times.
         - If the event E occurred f times, then P = f/n.
     - The modern theory of probability is based on axiomatic theory. The probability of an event is computed from:
       - the probability density function in the case of a continuous random variable,
       - the probability mass function in the case of a discrete random variable.
     - Common convention: use "density" (or pdf) for both discrete and continuous random variables.
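To make the relative frequency method concrete, here is a minimal sketch (not from the slides): it estimates the probability that a fair die shows an even number by repeating the experiment many times.

```python
import random

# Relative-frequency estimate: repeat the experiment n times and
# count how often the event "die shows an even number" occurs.
n = 100_000
f = sum(1 for _ in range(n) if random.randint(1, 6) % 2 == 0)
print(f / n)  # close to the classical value 3/6 = 0.5
```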

  5. Probability in the case of a continuous random variable
     - Let f(x) = P(x < X < x+dx)/dx be the probability density function (pdf).
     - $P(a \le X \le b) = \int_a^b f(x)\,dx$
     [Figure: plot of the pdf f(x) against x]
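As an illustration (assumption not in the slides: X is exponential with rate 1, so f(x) = e^(-x)), the integral above can be approximated numerically:

```python
import math

# Approximate P(a <= X <= b) = integral of the pdf over [a, b]
# with a midpoint Riemann sum.
def pdf(x):
    return math.exp(-x)

a, b, steps = 0.5, 2.0, 10_000
dx = (b - a) / steps
p = sum(pdf(a + (i + 0.5) * dx) * dx for i in range(steps))
print(p)                              # numerical integral
print(math.exp(-a) - math.exp(-b))    # exact value for comparison
```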

  6. Probability in the case of a discrete random variable
     - Let f(x) be the probability mass function (pmf).
     - $P(a \le X \le b) = \sum_{a \le x \le b} f(x)$
     [Figure: plot of the pmf f(x) against x, with a and b marked]
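A minimal discrete counterpart (the fair-die pmf is my example, not from the slides): summing the pmf between a and b.

```python
from fractions import Fraction

# pmf of a fair six-sided die; P(2 <= X <= 4) is the sum of the pmf over {2, 3, 4}.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
p = sum(f for x, f in pmf.items() if 2 <= x <= 4)
print(p)  # 1/2
```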

  7. Cumulative Distribution Function
     - The cdf F(x) is the probability that the random variable X is less than or equal to x:
       $F(x) = \int_{-\infty}^{x} f(u)\,du$ (continuous case)
       $F(x) = \sum_{x_i \le x} f(x_i)$ (discrete case)
     - Cdfs converge to 1.
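Continuing the exponential(1) assumption from the sketch above, the cdf has the closed form F(x) = 1 - e^(-x), and one can see it climbing toward 1:

```python
import math

# cdf of an exponential(1) random variable; values approach 1 as x grows.
def cdf(x):
    return 1.0 - math.exp(-x) if x > 0 else 0.0

for x in (0.0, 1.0, 5.0, 10.0):
    print(x, cdf(x))
```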

  8. Axioms of Probability
     - Let S be the sample space and E be an event (i.e., a subset of S).
     - Axiom 1: the probability of event E must be between 0 and 1: 0 ≤ P(E) ≤ 1.
     - Axiom 2: P(S) = 1.
     - Axiom 3: for mutually exclusive events E_1, E_2, ..., E_n:
       $P(E_1 \cup E_2 \cup \dots \cup E_n) = \sum_{i=1}^{n} P(E_i)$

  9. Axioms of Probability
     - Axiom 1 states that a probability must be between 0 and 1; this means that pdfs and pmfs must be non-negative and integrate (or sum) to 1.
     - Axiom 2 says that some outcome must occur and that the sample space covers all possible outcomes.
     - Axiom 3 lets us compute the probability that at least one of several mutually exclusive events occurs by summing their individual probabilities.

  10. Conditional Probability and Independence
     - The conditional probability of event E given event F is defined as:
       $P(E \mid F) = \frac{P(E \cap F)}{P(F)}$
     - P(E∩F) represents the probability that E and F occur together.
     - P(F) appears as a "re-normalization" factor.
     - Example: for mutually exclusive events E and F, P(E∩F) = 0 and thus P(E|F) = 0. The latter denotes a very strong dependence between the two events!
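A small sketch of the definition (the two-dice example is mine, not from the slides): compute P(E|F) by enumerating the sample space.

```python
from fractions import Fraction

# Two fair-die tosses. E = "sum is 8", F = "first toss is 6".
# P(E | F) = P(E and F) / P(F).
space = [(i, j) for i in range(1, 7) for j in range(1, 7)]
p = lambda ev: Fraction(sum(1 for o in space if ev(o)), len(space))

E = lambda o: o[0] + o[1] == 8
F = lambda o: o[0] == 6
print(p(lambda o: E(o) and F(o)) / p(F))  # 1/6
```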

  11. Conditional Probability and Independence
     - Independence: two events E and F are said to be independent if:
       $P(E \mid F) = P(E)$, which is equivalent to $P(E \cap F) = P(E) P(F)$
     - Definition for the case of n events: E_1, ..., E_n are said to be independent if any subset E_(1), E_(2), ..., E_(k) is independent:
       $P(E_{(1)} \cap E_{(2)} \cap \dots \cap E_{(k)}) = P(E_{(1)}) \times P(E_{(2)}) \times \dots \times P(E_{(k)})$
     - Independence is not transitive! If E1 is independent of E2 and E2 of E3, E1 might still depend on E3.
     - Independence is symmetric: if E is independent of F, then F is independent of E, since
       $P(F \mid E) = \frac{P(E \mid F) P(F)}{P(E)}$
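A minimal check of the product rule (the single-die events are my example, not from the slides):

```python
from fractions import Fraction

# One fair-die toss. E = "result is even", F = "result is at most 4".
# E and F are independent because P(E and F) = P(E) * P(F).
space = range(1, 7)
p = lambda ev: Fraction(sum(1 for o in space if ev(o)), 6)

E = lambda o: o % 2 == 0
F = lambda o: o <= 4
print(p(lambda o: E(o) and F(o)), p(E) * p(F))  # both 1/3
```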

  12. Conditional Probability - Illustration
     - It has been demonstrated that there were many free-riders in Gnutella networks.
     - Free-riders: clients that retrieve documents but do not provide any data to other peers.
     - A natural question that arises when studying such systems is: "How many files does a client share with its peers?"
     - Due to free-riding, you will find very low figures. It is thus better to split the question into two sub-questions:
       - What is the probability that a client is a free-rider?
       - What is the probability that a non-free-rider shares n files?

  13. Conditional Probability - Illustration
     - Let:
       - Q be the random variable denoting the number of files offered by a client
       - S be the random variable denoting the type of client: F (free-rider) or non-F (not a free-rider)
     - The previous questions can then be formulated as:
       $P(S = F)$ and $P(Q = n \mid S = \text{non-}F)$
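As a sketch of how such quantities could be estimated in practice (the data below are hypothetical, not measurements from Gnutella):

```python
# Per-client file counts; 0 files shared is taken to mean "free-rider".
files_shared = [0, 0, 3, 0, 12, 0, 1, 0, 3, 7, 0, 0, 3, 25, 0]

n_clients = len(files_shared)
free_riders = sum(1 for q in files_shared if q == 0)
print(free_riders / n_clients)              # estimate of P(S = F)

sharers = [q for q in files_shared if q > 0]
n = 3
print(sharers.count(n) / len(sharers))      # estimate of P(Q = 3 | S = non-F)
```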

  14. Independence - Illustration
     - A die is tossed twice. Consider the following events:
       - A: the first toss gives an odd number
       - B: the second toss gives an odd number
       - C: the sum of the two tosses is an odd number
     - Any pair of these events is independent. Indeed:
       - P(A) = P(B) = P(C) = 1/2
       - P(A∩B) = P(A∩C) = P(B∩C) = 1/4, since the sum is odd exactly when one toss is odd and the other is even
     - Still, P(A∩B∩C) = 0 (two odd tosses give an even sum). Hence (A, B, C) are not independent.
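The claim can be verified by enumerating all 36 outcomes; a minimal sketch:

```python
from fractions import Fraction
from itertools import product

# Check that A, B, C are pairwise independent but not mutually independent.
space = list(product(range(1, 7), repeat=2))
p = lambda ev: Fraction(sum(1 for o in space if ev(o)), len(space))

A = lambda o: o[0] % 2 == 1
B = lambda o: o[1] % 2 == 1
C = lambda o: (o[0] + o[1]) % 2 == 1

print(p(lambda o: A(o) and B(o)) == p(A) * p(B))   # True
print(p(lambda o: A(o) and C(o)) == p(A) * p(C))   # True
print(p(lambda o: B(o) and C(o)) == p(B) * p(C))   # True
print(p(lambda o: A(o) and B(o) and C(o)))         # 0, not 1/8
```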

  15. Total Probability Theorem
     - Theorem: let E_1, E_2, ..., E_n be n mutually exclusive events such that ∪_i E_i = S (S is the sample space) and P(E_i) ≠ 0. Let B be an event. Then:
       $P(B) = \sum_{i=1}^{n} P(B \mid E_i) P(E_i)$
     - Proof: B = B ∩ S = ∪_i (B ∩ E_i), and the events B ∩ E_i are mutually exclusive, so Axiom 3 gives P(B) = Σ_i P(B ∩ E_i) = Σ_i P(B|E_i) P(E_i) by the definition of conditional probability.
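A quick numerical check of the theorem on a toy partition (my example, not from the slides):

```python
from fractions import Fraction

# One fair-die toss. Partition S into E1 = {1,2}, E2 = {3,4}, E3 = {5,6};
# let B = "result is even" and check P(B) = sum_i P(B|Ei) P(Ei).
space = range(1, 7)
p = lambda ev: Fraction(sum(1 for o in space if ev(o)), 6)

parts = [lambda o: o <= 2, lambda o: 3 <= o <= 4, lambda o: o >= 5]
B = lambda o: o % 2 == 0

total = sum(p(lambda o, E=E: B(o) and E(o)) / p(E) * p(E) for E in parts)
print(total, p(B))  # both 1/2
```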

  16. Bayes Theorem
     - Bayes theorem makes it possible to estimate "a posteriori" probabilities from "a priori" probabilities.
     - Consider the following problem: one wants to evaluate the efficiency of a test for a disease. Let:
       - A = event that the test states that the person is infected
       - B = event that the person is infected
       - A^c = event that the test states that the person is not infected
       - B^c = event that the person is not infected
     - Suppose we have the following a priori information:
       - P(A|B) = P(A^c|B^c) = 0.95, obtained from tests on well-defined populations
       - P(B) = 0.005
     - A good measure of the efficiency of the test is the "a posteriori" probability P(B|A).

  17. Bayes Theorem
     - Theorem: given an event F and a set of mutually exclusive events E_1, E_2, ..., E_n whose union makes up the entire sample space:
       $P(E_i \mid F) = \frac{P(E_i) P(F \mid E_i)}{\sum_{k=1}^{n} P(F \mid E_k) P(E_k)}$
       (left-hand side: a posteriori information; right-hand side: a priori information)
     - Derivation of the theorem is straightforward using the definition of conditional probability.

  18. Bayes Theorem
     - Applied to the "disease test" problem stated before, we obtain:
       $P(B \mid A) = \frac{P(B) P(A \mid B)}{P(B) P(A \mid B) + P(B^c) P(A \mid B^c)} = \frac{0.005 \times 0.95}{0.005 \times 0.95 + 0.995 \times (1 - 0.95)} \approx 0.087$
     - Thus, when the test is positive, the person is in fact infected in only 8.7% of the cases! Very bad!
     - Conclusion: even though the a priori tests were correct in 95% of the cases, this was not enough, due to the scarcity of the disease.
     - For example, with P(B) = 0.1 (and the same test accuracy), we would have obtained P(B|A) ≈ 68% (not that good either…).
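The same computation as a small sketch, reproducing the two figures on the slide:

```python
# Bayes theorem with the two-event partition {B, B^c}.
def posterior(p_b, p_a_given_b, p_a_given_not_b):
    """Return P(B | A) from the prior P(B) and the test accuracies."""
    num = p_b * p_a_given_b
    return num / (num + (1 - p_b) * p_a_given_not_b)

print(posterior(0.005, 0.95, 0.05))  # ~0.087
print(posterior(0.1, 0.95, 0.05))    # ~0.68
```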

  19. Mean and Variance
     - The mean or average value E[X] = μ of a distribution provides a measure of its central tendency:
       $E[X] = \int_{-\infty}^{+\infty} x f(x)\,dx$ (continuous case)
       $E[X] = \sum_{i=1}^{+\infty} x_i f(x_i)$ (discrete case)
     - The variance V(X) = σ² of a random variable (r.v.) X measures the average dispersion around the mean μ:
       $\sigma^2 = V(X) = E[(X - \mu)^2] = \int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\,dx$ (continuous case)
       $\sigma^2 = \sum_{i=1}^{+\infty} (x_i - \mu)^2 f(x_i)$ (discrete case)
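A minimal sketch of the discrete-case formulas, reusing the fair-die pmf from earlier (my example, not from the slides):

```python
from fractions import Fraction

# Mean and variance of a fair six-sided die computed from its pmf.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * f for x, f in pmf.items())
var = sum((x - mu) ** 2 * f for x, f in pmf.items())
print(mu, var)  # 7/2 and 35/12
```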

  20. Mean and Variance
     - E[·] is a linear function:
       $E[\alpha X] = \alpha E[X]$ (α is a scalar)
       $E[X + Y] = E[X] + E[Y]$ (X, Y r.v.)
     - Practical formula:
       $V(X) = E[(X - \mu)^2] = E[X^2 - 2\mu X + \mu^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - \mu^2$
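The practical formula can be checked on the same die pmf; the result matches the direct computation above:

```python
from fractions import Fraction

# Verify V(X) = E[X^2] - mu^2 for the fair-die pmf.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mu = sum(x * f for x, f in pmf.items())

e_x2 = sum(x ** 2 * f for x, f in pmf.items())
print(e_x2 - mu ** 2)  # 35/12, same as sum((x - mu)^2 * f(x))
```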

  21. Coefficient of Variation
     - σ = √V(X) is called the standard deviation of the r.v. X.
     - C = σ/μ is called the coefficient of variation of the r.v. X.
     - Interpretation: "C measures the level of divergence of X with respect to its mean", or "C measures the variation of X in units of its mean".
     - C makes it possible to compare two distributions with different means.
     - C is independent of the chosen unit.

  22. Coefficient of Variation
     - To illustrate C, let us consider two sets of values drawn from normal distributions (defined later):
       - Distribution 1 with μ = 1, σ = 10  =>  C = 10
       - Distribution 2 with μ = 100, σ = 10  =>  C = 0.1
     - Looking only at the pdfs, you might miss how close to or far away from their means the sampled values can be:

       Set 1    Set 2
       -11      106.2
       -9.6     108
       16       109.4
       1.6      90.08
       -11      102.1
       0.59     102.4
       -10      89.92
       -12      92.58
       -1.6     110.8
       11       98.69
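A short sketch computing the (sample) coefficient of variation of the two sets on the slide; since these are only ten samples each, the figures are rough approximations of the theoretical C = 10 and C = 0.1:

```python
import statistics

set1 = [-11, -9.6, 16, 1.6, -11, 0.59, -10, -12, -1.6, 11]
set2 = [106.2, 108, 109.4, 90.08, 102.1, 102.4, 89.92, 92.58, 110.8, 98.69]

for name, data in (("Set 1", set1), ("Set 2", set2)):
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    print(name, abs(sigma / mu))  # |C| is large for Set 1, small for Set 2
```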
