Probability, Statistics, and Statistical Methods
Introduction to Statistics and Data Analysis
1.1 Overview: Statistical Inference, Samples, Populations, and the Role of Probability
1.2 Sampling Procedures; Collection of
The sample mode, denoted by $x_{\text{mode}}$, is the observation with the highest frequency.
The sample median is often a better measure of the central tendency of a sample than the sample mean, since it is not affected by extreme values in the sample.
The sample range, denoted by r, is given by $r = x_{\max} - x_{\min}$.
Data collected on a pH meter from a sample of 10 observations are: 7.07, 7.00, 7.10, 6.97, 7.00, 7.03, 7.01, 7.01, 6.98, 7.08
The sample mean
$\bar{x} = (7.07 + 7.00 + \dots + 7.08)/10 = 7.025$
The sample mode
$x_{\text{mode}} = 7.00$ and $7.01$
The sample median
$\tilde{x} = (7.01 + 7.01)/2 = 7.01$
The sample range
$r = 7.10 - 6.97 = 0.13$
The sample variance
$s^2 = [(7.07 - 7.025)^2 + (7.00 - 7.025)^2 + \dots + (7.08 - 7.025)^2]/9 = 0.00194$
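The summary statistics for the pH data can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original slides.

```python
import statistics

# pH readings from the example (10 observations)
ph = [7.07, 7.00, 7.10, 6.97, 7.00, 7.03, 7.01, 7.01, 6.98, 7.08]

mean = statistics.mean(ph)          # arithmetic average
modes = statistics.multimode(ph)    # every value tied for the highest frequency
median = statistics.median(ph)      # mean of the two middle values when n is even
sample_range = max(ph) - min(ph)    # x_max - x_min
variance = statistics.variance(ph)  # sample variance (divides by n - 1)

print(round(mean, 3), modes, round(median, 2),
      round(sample_range, 2), round(variance, 5))
```

Note that `statistics.variance` divides by n - 1, which matches the divisor of 9 used in the example.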
Discrete Data – countable, could be finite or infinite; no additional data point exists between two consecutive values (e.g., number of trees in a forest, …)
Continuous Data – measurable, infinite; additional data points can be found between any two values
[Figure: Histogram of Battery Life, before editing (x-axis: Battery Life; y-axis: Frequency)]
[Figure: Histogram of Battery Life, after editing (x-axis: Battery Life; y-axis: Frequency)]
[Figure: Boxplot of Battery Life (axis: Battery Life)]
Consider the experiment of tossing a die. If we are interested in the number facing up, the sample space would be S = {1, 2, 3, 4, 5, 6}
A B = A B C = C’ = (A B ) (B C) (A C) =
In how many different ways can a buyer order one of these homes? n = 4 · 3 = 12
Sam is going to assemble a computer. He has two choices of chips, four choices of a hard drive, three choices for memory, and five choices of the
n = 2 · 4 · 3 · 5 = 120
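The multiplication rule used here can be cross-checked by brute-force enumeration. The sketch below assumes only the four option counts from the example (2, 4, 3, 5); the options themselves are represented by placeholder indices.

```python
from itertools import product
from math import prod

# Number of options at each step, taken from the example
choices = [2, 4, 3, 5]

# Multiplication rule: multiply the option counts
n_rule = prod(choices)

# Cross-check: enumerate every possible configuration and count them
n_enumerated = sum(1 for _ in product(*(range(c) for c in choices)))

print(n_rule, n_enumerated)
```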
In how many different ways could three individuals, A, B, and C, be arranged in a row from left to right?
ABC, ACB, BAC, BCA, CAB, CBA
3! = 6
Example: If three medals (gold, silver, bronze) could be given to three students in a class of 25 and each student can receive at most one medal, how many possible selections could be made?
$_{25}P_3 = 25!/(25-3)! = 13{,}800$
Example: If three identical medals could be given to three students in a class
ways could the three medals be given? $\binom{25}{3} = \dfrac{25!}{3!\,(25-3)!} = 2{,}300$
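The three counting results above (arrangements of three people, distinct medals, identical medals) can be reproduced with the standard library. This is a small sketch; `math.perm` and `math.comb` require Python 3.8 or later.

```python
from math import comb, factorial, perm

arrangements = factorial(3)     # A, B, C in a row: 3!
distinct_medals = perm(25, 3)   # order matters: 25!/(25-3)!
identical_medals = comb(25, 3)  # order ignored: 25!/(3!(25-3)!)

print(arrangements, distinct_medals, identical_medals)
```

Each unordered selection of three students corresponds to 3! ordered ones, so the permutation count equals the combination count times 3!.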
Example: If an adult is chosen at random, what is the probability that the person is male given that the person is employed?
P(E ∩ M) = n(E ∩ M)/n(S) = 460/900
P(E) = n(E)/n(S) = 600/900
P(M|E) = P(E ∩ M)/P(E) = 460/600 = 23/30
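The conditional probability can be verified with exact rational arithmetic. The sketch below uses only the three counts that appear in the example (900 adults, 600 employed, 460 employed males); the variable names are illustrative.

```python
from fractions import Fraction

n_S = 900        # adults in the sample space
n_E = 600        # employed adults
n_E_and_M = 460  # employed males

p_E = Fraction(n_E, n_S)
p_E_and_M = Fraction(n_E_and_M, n_S)

# Definition of conditional probability: P(M|E) = P(E and M) / P(E)
p_M_given_E = p_E_and_M / p_E

print(p_M_given_E)
```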
Suppose that the number of crashes observed at an intersection over the Memorial Day weekend has the following probability distribution:
x      0    1    2    3    4
f(x)   0.2  0.1  0.3  0.3  0.1
Find the probability of having 3 crashes. Find the probability of having 3 or more crashes.
P(x = 3) = 0.3
P(x ≥ 3) = 0.3 + 0.1 = 0.4
* For a continuous random variable: P(x = a) = 0, and P(a < x < b) = P(x < b) – P(x < a)
The weekly demand for Pepsi, in 1,000 liters, from a local store, is a continuous random variable with the probability density function
$f(x) = 2(x - 1)$ for $1 < x < 2$, and $f(x) = 0$ elsewhere.
Find the probability that x = 1.5. Find the probability that x ≤ 1.5.
$P(x = 1.5) = 0$
$P(x \le 1.5) = \int_1^{1.5} 2(y - 1)\,dy = (y - 1)^2 \Big|_1^{1.5} = 0.25$
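The integral can be checked numerically. The following is a minimal sketch using a hand-written midpoint rule, chosen here only to avoid external dependencies; the helper `integrate` is hypothetical and not part of the original slides.

```python
def f(x):
    """Density from the example: 2(x - 1) on (1, 2), zero elsewhere."""
    return 2.0 * (x - 1.0) if 1.0 < x < 2.0 else 0.0

def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

total_area = integrate(f, 1.0, 2.0)  # a valid density integrates to 1
p_le_1_5 = integrate(f, 1.0, 1.5)    # P(x <= 1.5)

print(round(total_area, 6), round(p_le_1_5, 6))
```

Because the density is linear on its support, the midpoint rule reproduces the exact area up to floating-point rounding.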
[Figure: graph of the density f(x)]
Suppose that the number of crashes observed at an intersection over the Memorial Day weekend has the following probability distribution:
x      0    1    2    3    4
f(x)   0.2  0.1  0.3  0.3  0.1
Find the mean $\mu$ and the variance $\sigma^2$ of X.
$\mu = E(X) = 0 \cdot 0.2 + 1 \cdot 0.1 + 2 \cdot 0.3 + 3 \cdot 0.3 + 4 \cdot 0.1 = 2.0$
$\sigma^2 = E[(X - \mu)^2] = (0-2)^2 \cdot 0.2 + (1-2)^2 \cdot 0.1 + (2-2)^2 \cdot 0.3 + (3-2)^2 \cdot 0.3 + (4-2)^2 \cdot 0.1 = 1.6$
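The mean and variance of a discrete distribution follow directly from the definitions. This sketch assumes the crash counts run from 0 to 4, as the five probabilities and the worked calculation indicate.

```python
# Probability mass function of the crash count (x -> f(x))
pmf = {0: 0.2, 1: 0.1, 2: 0.3, 3: 0.3, 4: 0.1}

# Mean: probability-weighted average of the values
mu = sum(x * p for x, p in pmf.items())

# Variance: probability-weighted average squared deviation from the mean
var = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Probability of 3 or more crashes
p_ge_3 = sum(p for x, p in pmf.items() if x >= 3)

print(round(mu, 4), round(var, 4), round(p_ge_3, 4))
```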
The weekly demand for Pepsi, in 1,000 liters, from a local store, is a continuous random variable with the probability density function
$f(x) = 2(x - 1)$ for $1 < x < 2$, and $f(x) = 0$ elsewhere.
Find the mean $\mu$ and the variance $\sigma^2$ of X.
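The slide states the problem without a worked answer. One way to approach it is to integrate $x f(x)$ and $x^2 f(x)$ over the support and use $\sigma^2 = E(X^2) - \mu^2$. The sketch below does this numerically with a hypothetical midpoint-rule helper; integrating by hand gives $\mu = 5/3$ and $\sigma^2 = 1/18$, which the script approximates.

```python
def f(x):
    """Density from the example: 2(x - 1) on (1, 2), zero elsewhere."""
    return 2.0 * (x - 1.0) if 1.0 < x < 2.0 else 0.0

def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

# E(X): integral of x * f(x) over the support (1, 2)
mu = integrate(lambda x: x * f(x), 1.0, 2.0)

# E(X^2): integral of x^2 * f(x) over the support
ex2 = integrate(lambda x: x * x * f(x), 1.0, 2.0)

# Variance via the shortcut formula E(X^2) - mu^2
var = ex2 - mu ** 2

print(round(mu, 4), round(var, 4))
```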
The binomial distribution is a discrete probability distribution of the number of successes in a sequence of n independent success/failure experiments, each of which yields success with probability p. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial. If the random variable X follows the binomial distribution with parameters n and p, the probability of getting exactly x successes in n trials, written b(x; n, p), is given by the probability mass function:
$b(x; n, p) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, \dots, n$
The probability that a certain kind of component will pass a shock test is 0.75. Find the probability that exactly 2 of the next 4 components tested pass. Find the probability that 2 or more of the next 4 components tested pass.
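The slide poses this problem without a worked solution. A minimal sketch of one way to evaluate it, assuming the four tests are independent with success probability 0.75, applies the standard binomial probability mass function directly; `binom_pmf` is an illustrative helper name.

```python
from math import comb

def binom_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 4, 0.75

p_exactly_2 = binom_pmf(2, n, p)

# "2 or more" sums the pmf over x = 2, 3, 4
p_at_least_2 = sum(binom_pmf(x, n, p) for x in range(2, n + 1))

print(round(p_exactly_2, 4), round(p_at_least_2, 4))
```

The second probability could equally be computed as one minus the probabilities of 0 and 1 successes.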
The hypergeometric distribution is a discrete probability distribution that describes the probability of x successes in n draws, without replacement, from a finite population of size N containing k successes. A random variable X follows the hypergeometric distribution if its probability mass function is given by:
$h(x; N, n, k) = \dfrac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$
Lots of 40 components each are called unacceptable if they contain 3 or more defectives. The procedure for sampling a lot is to select 5 components at random and to reject the lot if a defective is found. What is the probability that exactly 1 defective is found in the sample if there are 3 defectives in the entire lot?
n = 5, N = 40, k = 3, and x = 1
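With n = 5, N = 40, k = 3, and x = 1, the probability can be evaluated from the standard hypergeometric probability mass function. The slide does not show the numeric result; the sketch below is one way to compute it, and `hypergeom_pmf` is an illustrative helper name.

```python
from math import comb

def hypergeom_pmf(x, N, n, k):
    """P(x successes in n draws without replacement; N items, k successes)."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# Exactly 1 defective among 5 sampled, when the lot of 40 holds 3 defectives
p_one_defective = hypergeom_pmf(1, N=40, n=5, k=3)

print(round(p_one_defective, 4))
```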
The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur independently and with a known average rate.
A discrete random variable X is said to have a Poisson distribution with parameter λ > 0 over an interval of length t > 0 if, for x = 0, 1, 2, ..., the probability mass function of X is given by:
$p(x; \lambda t) = \dfrac{e^{-\lambda t} (\lambda t)^x}{x!}$
During a laboratory experiment the average number of radioactive particles passing through a counter in 1 millisecond is 4. What is the probability that 6 particles enter the counter in a given millisecond?
x = 6, λ = 4, and t = 1
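With x = 6, λ = 4, and t = 1, the probability follows from the standard Poisson probability mass function. The slide does not show the numeric result; the sketch below is one way to evaluate it, with `poisson_pmf` as an illustrative helper name.

```python
from math import exp, factorial

def poisson_pmf(x, lam, t=1.0):
    """P(exactly x events) when events occur at average rate lam per unit interval."""
    mean_count = lam * t
    return exp(-mean_count) * mean_count ** x / factorial(x)

# 6 particles in one millisecond, with an average of 4 per millisecond
p_six = poisson_pmf(6, lam=4, t=1)

print(round(p_six, 4))
```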
The continuous uniform distribution is a family of probability distributions such that, for each member of the family, all intervals of the same length on the distribution's support are equally probable. The support is defined by the two parameters, A and B, which are its minimum and maximum values. The probability density function of the continuous uniform random variable X is:
$f(x; A, B) = \dfrac{1}{B - A}$ for $A \le x \le B$, and $f(x; A, B) = 0$ elsewhere.
In probability theory, the normal (or Gaussian) distribution is a continuous probability distribution that has a bell-shaped probability density function, known as the Gaussian function or informally the bell curve:
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
The parameter μ is the mean or expectation (location of the peak) and $\sigma^2$ is the variance. The distribution with μ = 0 and $\sigma^2 = 1$ is called the standard normal distribution.
An arbitrary normal random variable X can be transformed into a standard normal variable Z by means of the transformation Z = (X – μ)/σ
(a) Pr(Z > 1.84) = 1 – Pr(Z < 1.84) = 1 – 0.9671 = 0.0329
(b) Pr(-1.97 < Z < 0.86) = Pr(Z < 0.86) – Pr(Z < -1.97) = 0.8051 – 0.0244 = 0.7807
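The table lookups in (a) and (b) can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a small sketch; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# (a) upper-tail area beyond 1.84
p_a = 1 - Z.cdf(1.84)

# (b) area between -1.97 and 0.86
p_b = Z.cdf(0.86) - Z.cdf(-1.97)

print(round(p_a, 4), round(p_b, 4))
```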
Find k such that (a) Pr(Z > k) = 0.3015 and (b) Pr(k < Z < -0.18) = 0.4197
(a) Pr(Z > k) = 0.3015; thus Pr(Z < k) = 1 – 0.3015 = 0.6985. From the normal table, k = 0.52
(b) Pr(k < Z < -0.18) = 0.4197; thus Pr(Z < -0.18) – Pr(Z < k) = 0.4197; 0.4286 – Pr(Z < k) = 0.4197; Pr(Z < k) = 0.0089, so k = -2.37
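Reading the table in reverse corresponds to evaluating the inverse CDF. The sketch below uses `NormalDist.inv_cdf` from the standard library; because it works with unrounded probabilities, its results agree with the table values only to about two decimal places.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# (a) Pr(Z > k) = 0.3015, so Pr(Z < k) = 1 - 0.3015
k_a = Z.inv_cdf(1 - 0.3015)

# (b) Pr(k < Z < -0.18) = 0.4197, so Pr(Z < k) = Pr(Z < -0.18) - 0.4197
k_b = Z.inv_cdf(Z.cdf(-0.18) - 0.4197)

print(round(k_a, 2), round(k_b, 2))
```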
Given a random variable X having a normal distribution with μ = 50 and σ = 10, find the probability that X will fall between 45 and 62.
Pr(45 < X < 62) = Pr[ (45-50)/10 < Z < (62-50)/10] = Pr (-0.5 < Z < 1.2) = Pr (Z < 1.2) – Pr (Z < -0.5) = 0.8849 – 0.3085 = 0.5764
Given a normal distribution with μ = 40 and σ = 6, find the value x that (a) has 45% of the area to its left, and (b) has 14% of the area to its right.
(a) Pr(X < x) = 0.45 => Pr(Z < z) = 0.45. From the normal table, z = -0.13
z = (x – μ)/σ = (x – 40)/6 = -0.13 => x = 39.22
(b) Pr(X > x) = 0.14 => Pr(Z > z) = 0.14 => Pr(Z < z) = 1 – 0.14 = 0.86. From the normal table, z = 1.08
z = (x – μ)/σ = (x – 40)/6 = 1.08 => x = 46.48
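The same percentiles can be obtained without standardizing by hand. This sketch assumes `statistics.NormalDist` (Python 3.8 or later); since it does not round z to two decimals, its answers differ slightly from the table-based values 39.22 and 46.48.

```python
from statistics import NormalDist

X = NormalDist(mu=40, sigma=6)

# (a) value with 45% of the area to its left
x_a = X.inv_cdf(0.45)

# (b) value with 14% of the area to its right, i.e. 86% to its left
x_b = X.inv_cdf(1 - 0.14)

print(round(x_a, 2), round(x_b, 2))
```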
A certain type of storage battery lasts, on average, 3.0 years with a standard deviation of 0.5 years. Assuming that the battery life is normally distributed, find the probability that a given battery will last less than 2.3 years.
Pr(X < 2.3) = Pr[Z < (2.3 – 3.0)/0.5] = Pr(Z < -1.4) = 0.0808
In an industrial process, the diameter of a ball bearing is an important
It is known that the diameter of the ball bearings follows a normal distribution with a mean of 3.0 and a standard deviation of 0.005. What proportion of manufactured ball bearings will not meet the specifications?
Pr(X < 2.99) + Pr(X > 3.01) = Pr(Z < -2.0) + Pr(Z > 2.0) = 2(0.0228) = 0.0456
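The two-tailed proportion can be checked directly on the unstandardized scale. The sketch below takes the limits 2.99 and 3.01 from the calculation above and uses `statistics.NormalDist`; with unrounded tail areas the total comes out near 0.0455 rather than the table-based 0.0456.

```python
from statistics import NormalDist

diameter = NormalDist(mu=3.0, sigma=0.005)

lower, upper = 2.99, 3.01  # limits used in the calculation above

# Proportion outside the limits: lower tail plus upper tail
p_below = diameter.cdf(lower)
p_above = 1 - diameter.cdf(upper)
p_reject = p_below + p_above

print(round(p_below, 5), round(p_above, 5), round(p_reject, 4))
```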
[Figure: Distribution Plot, Normal, Mean=3, StDev=0.005 (x-axis: X; y-axis: Density); tail areas of 0.02275 below X = 2.99 and above X = 3.01]