Basic Concepts
- G. Urvoy-Keller
- urvoy@unice.fr
- Probability and Statistics

Outline
- Basic concepts
- Probability
- Conditional probability
- Moments
- Common distributions: Binomial, Zipf, Poisson, Uniform
A random experiment is a process whose outcome cannot be predicted with certainty. The outcomes of such an experiment are modeled as random variables, often represented as uppercase letters (e.g. X). Events are mutually exclusive if they cannot occur altogether.
A continuous random variable X is characterized by its probability density function (pdf) f(x):

P(a ≤ X ≤ b) = ∫_a^b f(x) dx
A discrete random variable X is characterized by its probability mass function (pmf) f(x):

P(a ≤ X ≤ b) = Σ_{a ≤ x ≤ b} f(x)
The cumulative distribution function (cdf) F(x) gives the probability that X is less than or equal to x:

F(x) = ∫_{−∞}^x f(u) du (continuous case)
F(x) = Σ_{x_i ≤ x} f(x_i) (discrete case)

Cdfs converge to 1 as x → ∞.
Probability axioms:
- 0 ≤ P(E) ≤ 1
- P(S) = 1
- P(E1 ∪ E2 ∪ ... ∪ En) = Σ_{i=1}^n P(Ei) for mutually exclusive events E1, ..., En
This means that a pdf or pmf must be non-negative and sum (or integrate) to 1, that the events of the sample space cover all possible outcomes, and that the probability of a union of mutually exclusive events is the sum of the individual probabilities.
The conditional probability of E given F (with P(F) > 0) is defined as:

P(E|F) = P(E ∩ F) / P(F)

If E and F are mutually exclusive, then P(E ∩ F) = 0 and thus P(E|F) = 0. The latter denotes a very strong dependence between the two events!
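The definition can be checked numerically; a minimal sketch using a hypothetical fair-die example (E = "roll is even", F = "roll greater than 3", so P(E|F) = (2/6)/(3/6) = 2/3):

```python
import random

random.seed(42)

# Hypothetical illustration: roll a fair die many times.
# E = "roll is even", F = "roll > 3"; P(E|F) = P(E ∩ F) / P(F) = 2/3.
N = 100_000
rolls = [random.randint(1, 6) for _ in range(N)]

n_F = sum(1 for r in rolls if r > 3)                   # occurrences of F
n_EF = sum(1 for r in rolls if r > 3 and r % 2 == 0)   # occurrences of E ∩ F

# The ratio of empirical frequencies estimates P(E|F):
print(n_EF / n_F)  # close to 2/3
```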
Two events E and F are independent if:

P(E|F) = P(E), which is equivalent to: P(E ∩ F) = P(E)P(F)

F is then also independent from E, since P(F|E) = P(E|F) P(F) / P(E).

n events are mutually independent if any subset E(1), E(2), ..., E(k) is independent:

P(E(1) ∩ E(2) ∩ ... ∩ E(k)) = P(E(1)) × P(E(2)) × ... × P(E(k))

Pairwise independence does not imply mutual independence: if E1 is independent from E2 and E2 from E3, E1 might still depend on E3.
Example: free-riding in Gnutella networks, where many peers do not share any data to other peers. A typical question for such systems is: "How many files does a client share with its peers?" It is better to split the above question into two sub-questions: is the client a free-rider, and if not, how many files are shared by a client; i.e. consider P(S = F) and P(Q = n | S = non-F).
Law of total probability: let E1, ..., En be mutually exclusive events such that ∪i Ei = S (S is the sample space) and P(Ei) ≠ 0. Let B be an event. Then:

P(B) = Σ_{i=1}^n P(B|Ei) P(Ei)
Bayes' rule allows computing "a posteriori" probabilities from "a priori" probabilities. A classical example is assessing the efficiency of a test for a disease in given populations. Let A be the event "the test is positive" and B the event "the person has the disease"; we seek the "a posteriori" probability P(B|A).
Bayes' theorem: given mutually exclusive events E1, E2, ..., En whose union makes up the entire sample space:

P(Ei|F) = P(Ei) P(F|Ei) / Σ_{k=1}^n P(F|Ek) P(Ek)

It follows directly from the definition of conditional probabilities. The P(F|Ei) carry the "a priori" information; P(Ei|F) is the "a posteriori" information.
For the disease test (prevalence 0.5%, test accuracy 95%):

P(B|A) = P(B)P(A|B) / [P(B)P(A|B) + P(Bc)P(A|Bc)] = 0.005×0.95 / (0.005×0.95 + 0.995×(1−0.95)) ≈ 0.087

In over 90% of the positive cases the person is thus healthy: the test accuracy was not enough due to the scarcity of the disease. Even with a more accurate test, P(B|A) = 68% (not that good either…).
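The numbers above are easy to reproduce; a minimal sketch of the disease-test computation with the parameters from the slides:

```python
# Bayes' rule for the disease-testing example:
# prevalence P(B) = 0.005, sensitivity P(A|B) = 0.95,
# false-positive rate P(A|B^c) = 1 - 0.95 = 0.05.
p_B = 0.005
p_A_given_B = 0.95
p_A_given_Bc = 0.05

# P(B|A) = P(B)P(A|B) / [P(B)P(A|B) + P(B^c)P(A|B^c)]
p_B_given_A = (p_B * p_A_given_B) / (
    p_B * p_A_given_B + (1 - p_B) * p_A_given_Bc
)
print(round(p_B_given_A, 3))  # 0.087
```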
The mean μ = E[X] is a measure of the central tendency of a distribution:

E[X] = ∫_{−∞}^{+∞} x f(x) dx (continuous case)
E[X] = Σ_i x_i f(x_i) (discrete case)

The variance V(X) = σ² of a random variable (r.v.) X measures the average dispersion around the mean μ:

σ² = V(X) = E[(X − μ)²] = ∫_{−∞}^{+∞} (x − μ)² f(x) dx (continuous case)
σ² = Σ_i (x_i − μ)² f(x_i) (discrete case)
The expectation is a linear function and E[X] is a scalar:

E[αX] = αE[X]; E[X + Y] = E[X] + E[Y]

Hence: V(X) = E[(X − μ)²] = E[X² − 2μX + μ²] = E[X²] − 2μE[X] + μ² = E[X²] − μ²
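The identity V(X) = E[X²] − μ² can be verified numerically; a minimal sketch on a sample drawn from Uniform(0, 1), whose true variance is 1/12:

```python
import random

random.seed(0)
# Sample from Uniform(0, 1): theoretical mean 1/2, variance 1/12.
xs = [random.random() for _ in range(200_000)]
n = len(xs)

mean = sum(xs) / n
mean_sq = sum(x * x for x in xs) / n

# Direct definition E[(X - mu)^2] versus the identity E[X^2] - mu^2:
var_direct = sum((x - mean) ** 2 for x in xs) / n
var_identity = mean_sq - mean ** 2

print(var_direct, var_identity)  # both close to 1/12
```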
Standard deviation and coefficient of variation: σ = √V(X); C = σ/μ
To illustrate C, let us consider two sets of values drawn from normal distributions (defined later):

Distribution 1 with μ=1, σ=10 => C=10
Distribution 2 with μ=100, σ=10 => C=0.1

Looking at the pdfs, you might miss how values can be close to or far away from the means:

Set 1: 11, 9.6, 16, 1.6, 11, 0.59, 10, 12, 1.6, 11
Set 2: 106.2, 108, 109.4, 90.08, 102.1, 102.4, 89.92, 92.58, 110.8, 98.69
The r-th moment about the origin and the r-th central moment of X are:

μ'_r = E[X^r]
μ_r = E[(X − μ)^r]
Skewness measures the asymmetry of the distribution; it is normalized so that it carries no unit:

γ1 = μ3 / μ2^{3/2}

Remark: γ1 = 0 does not mean that the distribution is symmetric.
Kurtosis measures the peakedness of the distribution:

γ2 = μ4 / μ2²

The coefficient of excess kurtosis is γ'2 = μ4/μ2² − 3:
- γ'2 ≤ 0: rv is flatter than a normal distribution
- γ'2 ≥ 0: rv is more peaked than a normal distribution
The n-th percentile (or quantile) of X is the value x such that n percent of the observations fall below x and (100 − n) percent above x. For instance, for the 50th percentile of samples drawn from the same rv, 50% should be smaller than x and 50% larger.

For a continuous rv: n/100 = ∫_{−∞}^x f(u) du = F(x) ⇔ x = F^{−1}(n/100)

For a discrete rv: x = min_i { x_i | Σ_{j=−∞}^{i} f(x_j) ≥ n/100 }
Let p = the percentile of interest and n = the number of data points or observations; then i = (p/100)·n is an index number we will use to find the p-th percentile:
a) If i is not an integer, round up to the next integer. In other words, the next integer higher than i is the position of the p-th percentile.
b) If i is an integer, the p-th percentile is the average of the values in positions i and i + 1.
An example: last 10 golf scores for 18 holes, sorted: 82, 83, 84, 85, 85, 88, 90, 90, 93, 95. To find the 50th percentile score, first find i = (50/100)·10 = 5. Since i is an integer, the 50th percentile is the average of the values in the 5th and 6th positions, 85 and 88: 86.5. Note the 50th percentile is the median we saw before: half the values are below this value and half are above. To find the 25th percentile, i = (25/100)·10 = 2.5, so the 25th percentile is the value in the 3rd position: 84. 25% of the values are less than or equal to this.
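The index rule above can be sketched as a small function; the helper name is illustrative, and the data is the golf example from the text:

```python
import math

def percentile(sorted_data, p):
    """p-th percentile using the i = (p/100)*n index rule."""
    n = len(sorted_data)
    i = (p / 100) * n
    if i != int(i):
        # Not an integer: round up; that position holds the percentile.
        return sorted_data[math.ceil(i) - 1]   # 1-based position -> 0-based index
    i = int(i)
    # Integer: average the values in positions i and i + 1.
    return (sorted_data[i - 1] + sorted_data[i]) / 2

scores = [82, 83, 84, 85, 85, 88, 90, 90, 93, 95]  # golf example
print(percentile(scores, 50))  # 86.5
print(percentile(scores, 25))  # 84
```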
A Bernoulli rv Y models an experiment with two outcomes, "success" or "failure", corresponding to Y=1 and Y=0 respectively:

f(1) = P(Y=1) = p; f(0) = P(Y=0) = 1 − p

A binomial rv X counts the number of successes in n independent Bernoulli trials. Its pmf follows:

f(k; n, p) = P(X=k) = C(n,k) p^k (1−p)^{n−k}, k ∈ [0, n]
E[X] = np; V[X] = np(1−p)
Example with n = 10 trials and error probability p = 0.01:

P(one or more errors) = 1 − P(0 errors) = 1 − f(0; 10, 0.01) = 1 − C(10,0) × 0.99^10 × 0.01^0 ≈ 0.0956
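The computation above can be sketched with a small pmf helper (the function name is illustrative):

```python
from math import comb

def binom_pmf(k, n, p):
    """f(k; n, p) = C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# n = 10 trials, each in error with probability 0.01:
p_at_least_one = 1 - binom_pmf(0, 10, 0.01)
print(round(p_at_least_one, 4))  # 0.0956
```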
A Zipf rv X has pmf:

P(X = k) = C · 1/k^{1+α}, α > 0, k = 1, 2, 3, ... with C = 1 / Σ_{i=1}^{∞} 1/i^{1+α}

Zipf distributions typically model popularity: most files are unpopular (requested < 1% of times).
A rv X follows a Poisson distribution with parameter λ if its pmf is such that:

f(x; λ) = P(X = x) = e^{−λ} λ^x / x!; x = 0, 1, ...

The Poisson distribution approximates the binomial law with λ = np for n >> 1 and p << 1, with np finite.

E[X] = λ; V[X] = λ
Example: with n = 10,000 trials and failure probability p = 10^{−4}, what is the probability of exactly 10 failures in a year?

f_binomial(10; 10,000; 10^{−4}) = C(10,000, 10) (10^{−4})^{10} (1 − 10^{−4})^{9990} [Arghh!!!]

f_Poisson(10; 10,000 × 10^{−4}) [n >> 1, p << 1, np = 1] = 1^{10}/10! × e^{−1} ≈ 10^{−7}
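The two values can be compared directly; Python's exact integer arithmetic in `comb` makes even C(10,000, 10) painless:

```python
from math import comb, exp, factorial

n, p, k = 10_000, 1e-4, 10

# Exact binomial probability (comb evaluates the huge C(10000, 10) exactly):
binom = comb(n, k) * p**k * (1 - p) ** (n - k)

# Poisson approximation with lam = n*p = 1:
lam = n * p
poisson = lam**k * exp(-lam) / factorial(k)

print(f"{binom:.3e}", f"{poisson:.3e}")  # both about 1e-7
```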
A rv X is uniformly distributed over [a, b], with −∞ < a < b < ∞, if its pdf is such that:

f(x; a, b) = 1/(b − a), a ≤ x ≤ b

E[X] = (a + b)/2; V[X] = (b − a)²/12
A rv X is normally distributed with mean μ and variance σ², noted X ~ N(μ, σ²), if its pdf is such that:

f(x; μ, σ²) = 1/(σ√(2π)) exp{−(x − μ)²/(2σ²)}

X ~ N(0, 1) is the standard normal distribution. If X ~ N(μ, σ²), then Z = (X − μ)/σ ~ N(0, 1).
The cdf of the standard normal distribution:

Φ(x) = 1/√(2π) ∫_{−∞}^{x} exp{−y²/2} dy

plays a central role in statistics, notably as a limit for the binomial and the Poisson distributions:
Approximating the two discrete variables by a continuous one calls for a continuity correction of ±0.5 in the formulas:

P[α ≤ X_binomial ≤ β] ≈ Φ[(β − np + 0.5)/√(np(1−p))] − Φ[(α − np − 0.5)/√(np(1−p))]

P[α ≤ X_Poisson ≤ β] ≈ Φ[(β − λ + 0.5)/√λ] − Φ[(α − λ − 0.5)/√λ]
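The binomial formula can be checked numerically; a minimal sketch with a hypothetical Binomial(100, 0.5) and the interval [45, 55], using `math.erf` for Φ:

```python
from math import erf, sqrt, comb

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# Hypothetical check: X ~ Binomial(n=100, p=0.5); P(45 <= X <= 55).
n, p = 100, 0.5
mu, sd = n * p, sqrt(n * p * (1 - p))
alpha, beta = 45, 55

# Normal approximation with the +/- 0.5 continuity correction:
approx = Phi((beta - mu + 0.5) / sd) - Phi((alpha - mu - 0.5) / sd)

# Exact value by summing the binomial pmf:
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(alpha, beta + 1))

print(round(approx, 4), round(exact, 4))  # both about 0.729
```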
About 95% of the values of a normal rv fall within an interval of 4σ around the mean (μ ± 2σ). Proof (μ=0, σ=1):

Φ(2) − Φ(−2) = 1/√(2π) ∫_{−2}^{2} exp(−y²/2) dy ≈ 0.9545
A rv X is exponentially distributed with parameter λ if its pdf is such that:

f(x; λ) = λe^{−λx} for x ≥ 0; λ > 0

E[X] = 1/λ; V[X] = 1/λ²
A rv X follows a gamma distribution with parameters λ > 0 (scale parameter) and t > 0 (shape parameter) if:

f(x; λ, t) = λe^{−λx}(λx)^{t−1} / Γ(t), x > 0, where Γ(t) = ∫_0^∞ e^{−y} y^{t−1} dy

E[X] = t/λ; V[X] = t/λ²
A gamma rv with λ = 1/2 and t = ν/2, where ν is a positive integer, is called a chi-square distribution with ν degrees of freedom, noted χ²_ν:

f(x; ν) = 1/Γ(ν/2) · (1/2)^{ν/2} x^{ν/2−1} e^{−x/2}; x ≥ 0

E[X] = ν; V[X] = 2ν
A rv X follows a beta distribution with parameters α > 0 and β > 0 if:

f(x; α, β) = 1/B(α, β) · x^{α−1}(1−x)^{β−1}, 0 < x < 1, where B(α, β) = ∫_0^1 x^{α−1}(1−x)^{β−1} dx = Γ(α)Γ(β)/Γ(α+β)

E[X] = α/(α+β); V[X] = αβ / [(α+β)²(α+β+1)]

Depending on α and β, the beta pdf can take a U shape or a J shape.
The Pearson system covers distributions with a single mode or anti-mode, for a given γ1 and γ2.
Methodology:
1. Observe a dataset whose histogram looks unimodal
2. Compute γ1 and γ2
3. Use the Pearson graph to decide which type of distribution models your data the best
Pearson distributions are the solutions f(x) of the following differential equation:

df(x)/dx = −(x + a) / (c0 + c1x + c2x²) · f(x)

The behavior of the solution depends on the roots of c0 + c1x + c2x² = 0. Note that f(x) → 0 as |x| → ∞ because:

f(x) ≥ 0 and ∫ f(x) dx < ∞

Hence df(x)/dx → 0 as |x| → ∞.
The shape of the solution depends (considerably) on the values of a, c0, c1 and c2, which can be related to the skewness and kurtosis parameters through the quantity:

w = (1/4) γ1(γ2+3)² / [(4γ2 − 3γ1)(2γ2 − 3γ1 − 6)]
Pearson distributions are classified as follows:

I (Beta): w < 0
II: γ1 = 0, γ2 < 3
III (Gamma): 2γ2 − 3γ1 − 6 = 0
IV: 0 < w < 1
V: w = 1
VI: w > 1
VII: γ1 = 0, γ2 > 3
N (Normal): γ1 = 0, γ2 = 3 (single point)
Compute γ1 and γ2 on the random sample you have obtained; the corresponding family is referred to in the literature as Pearson type x.
The Pearson graph helps assess whether a sample "really follows" a given distribution, or whether 2 (or more) samples come from different distributions, i.e. fall in different regions of the Pearson graph.
To build a histogram, partition the data range into bins B_1, ..., B_n that cover the whole range of data. Let ν_k be the number of samples in bin B_k, N the total number of samples, and h the bin width, so that the estimate integrates to one, like a density. The density histogram is defined through the following function:

f̂(x) = ν_k / (Nh); ∀x ∈ B_k
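The definition can be sketched directly; a minimal illustration (helper name and toy data are hypothetical), checking that f̂ integrates to one:

```python
def density_histogram(data, n_bins):
    """Return bin edges and f_hat values, f_hat(x) = nu_k / (N h)."""
    lo, hi = min(data), max(data)
    h = (hi - lo) / n_bins                      # bin width
    counts = [0] * n_bins
    for x in data:
        k = min(int((x - lo) / h), n_bins - 1)  # clamp max(data) into last bin
        counts[k] += 1
    N = len(data)
    f_hat = [nu_k / (N * h) for nu_k in counts]
    edges = [lo + i * h for i in range(n_bins + 1)]
    return edges, f_hat

data = [0.1, 0.2, 0.25, 0.5, 0.55, 0.7, 0.9, 0.95]
edges, f_hat = density_histogram(data, 4)

# Riemann sum of f_hat over the bins: should be 1, like a density.
h = (max(data) - min(data)) / 4
total = sum(f * h for f in f_hat)
print(total)  # close to 1.0
```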
The key parameter is the number of bins n (equivalently, the bin width h). Instead of following f(x), the estimate f̂(x) may vary too much from one x to the other if h is too small, or be too small for some x and larger for others if h is too large.
[Figure: f̂(x) versus the true f(x) for a sample of size 1000]
To assess the quality of the estimator f̂(x) of f(x), one averages over possible trajectories:

MSE[f̂(x)] = E[(f̂(x) − f(x))²]
ISE = ∫ (f̂(x) − f(x))² dx
MISE = E[∫ (f̂(x) − f(x))² dx]
One can derive an upper bound on the MSE:

MSE(f̂(x)) ≤ f(ξ_k)/(Nh) + γ_k² h²

where ξ_k ∈ B_k and the density is assumed to be Lipschitz-continuous on each bin:

|f(x) − f(y)| < γ_k |x − y|; ∀x, y in B_k
A simple rule for the number of bins is Sturges' rule: n = 1 + log2 N. Alternatively, we can obtain an asymptotic value for the MISE:

MISE → (N large) 1/(Nh) + (1/12) h² R(f′), where R(f′) = ∫ (f′(x))² dx

Minimizing over h gives the optimal bin width:

h* = (6 / (N R(f′)))^{1/3}
For a normally distributed variable, we obtain:

R(f′) = 1/(4σ³√π)

This yields a rule of thumb often applied to other variables (not only normally distributed ones):

h* = (24σ³√π / N)^{1/3} ≈ 3.5 σ N^{−1/3}

In practice σ is estimated from the sample, possibly robustly through the interquartile range: IQR = 75th quantile − 25th quantile.
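The rule of thumb is easy to apply; a minimal sketch on a synthetic normal sample with σ = 2 (sample size and seed are arbitrary):

```python
from math import sqrt
import random

random.seed(1)
N = 10_000
xs = [random.gauss(0, 2) for _ in range(N)]  # N(0, sigma=2) sample

# Estimate sigma from the sample:
mean = sum(xs) / N
sigma = sqrt(sum((x - mean) ** 2 for x in xs) / N)

# Normal-reference bin width  h* ~ 3.5 * sigma * N^(-1/3):
h_star = 3.5 * sigma * N ** (-1 / 3)
print(round(h_star, 3))  # about 0.32 for sigma = 2, N = 10000
```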
The rule above settles the size of the bins but not the issue of x0, the initial value (origin) of the first bin. A way to average over the choice of x0 for a given size of the bins is to consider the m following histograms:

f̂1, ..., f̂m, where f̂1 is the histogram that starts at x0, f̂2 the one that starts at x0 + h/m, ..., and f̂m the one that starts at x0 + (m−1)h/m.
The Averaged Shifted Histogram (ASH) is the average of f̂1, ..., f̂m:

f̂_ASH(x) = (1/m) Σ_{i=1}^{m} f̂i(x)

[Figure: sample of size 100; default histogram of Matlab (10 bins) versus ASH with m=5 and 10 bins]
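The averaging step can be sketched directly; a minimal, unoptimized O(N·m) point evaluation (function name and toy data are illustrative, not from the slides):

```python
def ash_estimate(data, x, h, m):
    """ASH estimate at point x: average of m histograms with bin width h
    and origins shifted left by multiples of h/m."""
    N = len(data)
    x0 = min(data)
    total = 0.0
    for i in range(m):
        origin = x0 - i * h / m          # shift histogram i's origin by h/m
        k = int((x - origin) // h)       # bin index of x in histogram i
        lo_k, hi_k = origin + k * h, origin + (k + 1) * h
        nu_k = sum(1 for d in data if lo_k <= d < hi_k)
        total += nu_k / (N * h)          # density-histogram value at x
    return total / m

data = [i / 1000 for i in range(1000)]            # roughly Uniform(0, 1)
estimate = ash_estimate(data, 0.5, 0.1, 5)
print(round(estimate, 2))  # close to the true density 1.0
```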
Boxplots summarize samples and make it easy to compare them. The box extends from q̂0.25 to q̂0.75; the whiskers extend to the most extreme data values within the following range:

[q̂0.25 − 1.5×IQ̂R, q̂0.75 + 1.5×IQ̂R]

These values are called the adjacent values. Values falling in [min(x), q̂0.25 − 1.5×IQ̂R] and [q̂0.75 + 1.5×IQ̂R, max(x)] are flagged as outliers, even though they are not necessarily real outliers.
[Figure: boxplot of an exponential sample]
To compare two distributions, plot the quantiles of one distribution against the quantiles of the other. When comparing two empirical samples, we call the quantile-based plot a Quantile-Quantile plot or Q-Q plot; when comparing a sample against a theoretical distribution, we call the quantile-based plot a Quantile plot.
Consider two samples (x1, ..., xn) and (y1, ..., ym), m ≤ n. Sort each of them to obtain the order statistics (x(1), ..., x(n)) and (y(1), ..., y(m)), with x(1) ≤ x(2) ≤ ... ≤ x(n) and y(1) ≤ y(2) ≤ ... ≤ y(m).
[Figure: Q-Q plot of two normal samples (one with σ=2)]
[Figure: one normal sample with μ=1 and σ=1 against one exponential sample with λ=1 (i.e. μ=1 and σ=1)]
The i-th order statistic of a sample of size m estimates the quantiles q with (i−1)/m ≤ q ≤ i/m.
To compare the order statistics of a sample x of size n against a theoretical distribution with cdf F, plot F^{−1}((i − 0.5)/n) against x(i).
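The construction can be sketched with `statistics.NormalDist` as the reference F; a sanity check under the assumption that the data really is N(0, 1), where the plotted pairs should align on y = x and thus be almost perfectly correlated:

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(7)
n = 500
x = sorted(random.gauss(0, 1) for _ in range(n))   # order statistics x(i)

# Theoretical quantiles F^{-1}((i - 0.5)/n) of the reference N(0, 1):
inv_cdf = NormalDist(0, 1).inv_cdf
theo = [inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# A quantile plot draws the pairs (theo[i], x[i]); for data that really
# follows F they align on y = x, so their correlation is close to 1.
mt, mx = sum(theo) / n, sum(x) / n
cov = sum((t - mt) * (v - mx) for t, v in zip(theo, x))
corr = cov / sqrt(sum((t - mt) ** 2 for t in theo)
                  * sum((v - mx) ** 2 for v in x))
print(round(corr, 3))  # close to 1 for normal data
```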
If the points of the plot fall approximately on the line y = x, this supports the hypothesis that the samples are drawn from the same variables (a sort of blind test).
Scatter plots visualize the relation between two quantities, e.g. how RTTs and throughputs are related in the Internet: plot all obtained 2-tuples (xi, yi) from the two datasets x and y. Note that any time ordering of the two data is lost here! If the two variables obey a linear relation Y = aX + b, the points diverge from the bisector (unless a is close to 1 and b to 0), even when the two variables have the same type of distribution.
For two-dimensional data with bins of size h1 × h2, the density histogram generalizes to:

f̂(x⃗) = ν_k / (N h1 h2); x⃗ ∈ B_k