Chapter 5: Probability models


SLIDE 1

Applied Statistics

Chapter 5: Probability models

  • 1. Random variables:

a) Idea.
b) Discrete and continuous variables.
c) The probability function (density) and the distribution function.
d) Mean and variance of a random variable.

  • 2. Probability models:

a) Bernoulli.
b) Geometric.
c) Binomial.
d) Normal.
e) Normal approximation to the binomial distribution.

Recommended reading:

  • Chapters 22 and 23 of Peña and Romo (1997)
SLIDE 2

5.1: Random variables

  • A function which assigns a numerical value to each possible result of an experiment is called a random variable.
  • We use capital letters, e.g. X, Y, Z, to represent random variables and lower case letters, x, y, z, to represent particular values of these variables.

Discrete random variables can only take a discrete (finite or countable) set of possible values.

Continuous random variables can take any value within some continuous range.

SLIDE 3

The probability function for a discrete r.v. is the function which associates the probability P(X=x) with each possible value x.

The possible values of a discrete r.v. X and their respective probabilities are often displayed in a probability distribution table:

X          x1   x2   ...   xn
P(X=xi)    p1   p2   ...   pn

Every probability function satisfies

$$p_1 + p_2 + p_3 + \dots + p_n = 1$$

The distribution function for a discrete r.v.: Let X be a r.v. The distribution function of X is the function which gives, for each x, the cumulative probability up to x, that is,

$$F(x) = P(X \le x)$$
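The link between the probability function and the distribution function can be sketched in a few lines of Python. This is only an illustration: the table values below are hypothetical and do not come from the slides, and the helper name `distribution_function` is made up for the example.

```python
from itertools import accumulate

# Hypothetical probability table (illustrative values, not from the slides).
values = [0, 1, 2, 3]
probs = [0.1, 0.4, 0.3, 0.2]

# Every probability function must add up to 1.
assert abs(sum(probs) - 1.0) < 1e-12

# F(x_i) = P(X <= x_i) is the running total of the probabilities.
cumulative = list(accumulate(probs))


def distribution_function(x):
    """Return F(x) = P(X <= x) for any real number x."""
    return sum(p for v, p in zip(values, probs) if v <= x)


for v, p, F in zip(values, probs, cumulative):
    print(f"x={v}  P(X=x)={p:.1f}  F(x)={F:.1f}")
```

Note that F is defined for every real x, not only for the tabulated values: between two possible values it stays constant.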

SLIDE 4

Mean, variance and standard deviation of a discrete r.v.

The mean or expectation of a discrete r.v., X, which takes values x1, x2, ... with probabilities p1, p2, ... is given by the following expression:

$$\mu = \sum_i x_i P(X = x_i) = \sum_i x_i p_i$$

The variance is defined by the formula

$$\sigma^2 = \sum_i (x_i - \mu)^2 p_i$$

which can be calculated using

$$\sigma^2 = \sum_i x_i^2 p_i - \mu^2$$

The standard deviation is the positive square root of the variance.

Example: The probability distribution of the r.v. X is given in the following table:

xi    1     2     3     4     5
pi    0.1   0.3   ?     0.2   0.3

What is P(X=3)? Calculate the mean and variance.
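One possible way to work through the example is sketched below in Python. The missing probability follows from the fact that the probabilities must add up to 1. The variable names are chosen for this illustration only.

```python
from math import sqrt

xs = [1, 2, 3, 4, 5]
known = {1: 0.1, 2: 0.3, 4: 0.2, 5: 0.3}

# The probabilities must add up to 1, which fixes P(X=3).
p3 = 1 - sum(known.values())
probs = [known.get(x, p3) for x in xs]

# Mean: sum of x_i * p_i.
mean = sum(x * p for x, p in zip(xs, probs))

# Variance, computed both from the definition and from the shortcut formula.
var_definition = sum((x - mean) ** 2 * p for x, p in zip(xs, probs))
var_shortcut = sum(x ** 2 * p for x, p in zip(xs, probs)) - mean ** 2

sd = sqrt(var_shortcut)

print(f"P(X=3) = {p3:.2f}")
print(f"mean = {mean:.2f}, variance = {var_shortcut:.2f}, sd = {sd:.3f}")
```

Under these calculations P(X=3) = 0.1, the mean is 3.3 and the variance is 2.01, and both variance formulas give the same value up to rounding error.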

SLIDE 5

5.2: Probability models

  • Discrete models: Bernoulli trials and the geometric and binomial distributions.
  • Continuous models: the normal distribution and related distributions.

SLIDE 6

Bernoulli trials and the geometric and binomial distributions

A Bernoulli model is an experiment with the following characteristics:

  • In each trial, there are only two possible results, success and failure.
  • The result obtained in each trial is independent of the previous results.
  • The probability of success is constant, P(success) = p, and does not change from one trial to the next.

SLIDE 7

The geometric distribution

Suppose we have a Bernoulli model. What is the distribution of the number of failures, F, before the first success?

  • P(F=0) = P(0 failures before the 1st success) = p
  • P(F=1) = P(failure, success) = (1-p)p
  • P(F=2) = P(failure, failure, success) = (1-p)^2 p
  • P(F=f) = P(f failures before the 1st success) = (1-p)^f p for f = 0, 1, 2, …

The distribution of F is called the geometric distribution with parameter p.

E[F] = (1-p)/p
V[F] = (1-p)/p^2
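The geometric probabilities and the formulas for the mean and variance can be checked numerically. The sketch below assumes a hypothetical success probability p = 0.3 and truncates the infinite sums at 2000 terms, which is more than enough for this value of p. The function name `geometric_pmf` is invented for the illustration.

```python
def geometric_pmf(f, p):
    """P(F = f): probability of f failures before the first success."""
    return (1 - p) ** f * p


p = 0.3  # hypothetical success probability, chosen only for illustration
terms = range(2000)  # truncation of the infinite sum; the tail is negligible

total = sum(geometric_pmf(f, p) for f in terms)
mean = sum(f * geometric_pmf(f, p) for f in terms)
variance = sum(f ** 2 * geometric_pmf(f, p) for f in terms) - mean ** 2

print(f"sum of probabilities = {total:.6f}")
print(f"mean     = {mean:.4f}  (formula: {(1 - p) / p:.4f})")
print(f"variance = {variance:.4f}  (formula: {(1 - p) / p ** 2:.4f})")
```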

SLIDE 8

The binomial distribution

Suppose we have a Bernoulli model. What is the distribution of the number of successes, X, in n trials?

$$P(\text{obtaining } r \text{ successes}) = P(X = r) = \binom{n}{r} p^r (1-p)^{n-r}$$

for r = 0, 1, 2, …, n.

The distribution of X is called the binomial distribution with parameters n and p.

E[X] = np
V[X] = np(1-p)

SLIDE 9

EXAMPLE Calculate the probability that in a family with 4 children, 3 of them are boys.

EXAMPLE The probability that a student has to repeat the year is 0.3.

  • We pick students at random, one at a time. What is the probability that the first repeater is the 3rd student we pick?
  • We choose 20 students at random. What is the chance that there are exactly 4 repeaters?

EXAMPLE On average, 4% of the votes in an election are null. Calculate the expected number of null votes in a town with an electorate of 1000.
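These examples can be worked through with the geometric and binomial formulas. The Python sketch below is one possible solution, not necessarily the one intended in class. It assumes that boys and girls are equally likely and births are independent (the slide does not state this), and that every member of the electorate casts a vote. The function names are made up for the illustration.

```python
from math import comb


def geometric_pmf(f, p):
    """P(F = f): f failures before the first success."""
    return (1 - p) ** f * p


def binomial_pmf(r, n, p):
    """P(X = r): r successes in n independent trials."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


# Family example. Assumption: P(boy) = 0.5, independent births.
p_three_boys = binomial_pmf(3, 4, 0.5)

# First repeater is the 3rd student: two non-repeaters, then a repeater.
p_first_repeater_third = geometric_pmf(2, 0.3)

# Exactly 4 repeaters among 20 students.
p_four_repeaters = binomial_pmf(4, 20, 0.3)

# Expected number of null votes: E[X] = n * p.
expected_null_votes = 1000 * 0.04

print(f"P(3 boys out of 4)          = {p_three_boys:.4f}")
print(f"P(first repeater is 3rd)    = {p_first_repeater_third:.4f}")
print(f"P(4 repeaters out of 20)    = {p_four_repeaters:.4f}")
print(f"Expected null votes         = {expected_null_votes:.0f}")
```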

SLIDE 10

Calculation with Excel

Binomial probabilities are tough to calculate “by hand” except in the simple cases of zero or 1 successes or failures. In Excel it is much easier!

SLIDE 11

Example (Test)

Of all the charities in Spain, 30% are dedicated to children. If 50 Spanish charities are chosen at random, how many of them are expected to be dedicated to children?

SLIDE 12

Example (Test)

On average, one in every ten members of the CCOO union is a delegate.

a) In interviews with CCOO members, what is the probability that the first delegate will be the second person interviewed?
b) There are 4 CCOO members in La Chimbomba. What is the chance that none of them are delegates?
c) In a sample of 100 CCOO members, what is the expected number of delegates?
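A possible worked solution, assuming that members are interviewed independently and that each is a delegate with probability p = 0.1. Part a) uses the geometric distribution (one failure before the first success), and parts b) and c) use the binomial distribution:

```latex
\begin{align*}
\text{a) } P(F = 1) &= (1-p)\,p = 0.9 \times 0.1 = 0.09 \\
\text{b) } P(X = 0) &= (1-p)^4 = 0.9^4 = 0.6561 \\
\text{c) } E[X] &= np = 100 \times 0.1 = 10
\end{align*}
```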

SLIDE 13

DENSITY FUNCTION OF A CONTINUOUS R.V.: The density function of a continuous r.v. satisfies the following conditions:

  • It only takes non-negative values, f(x) ≥ 0.
  • The area under the curve is equal to 1.

Distribution function. As with discrete variables, the distribution function gives the cumulative probability up to x, that is:

$$F(x) = P(X \le x)$$

The c.d.f. satisfies the conditions:

  • It takes the value 0 for any x below the minimum possible value of the variable.
  • It takes the value 1 for all points over the maximum possible value of the variable.

SLIDE 14

The normal or Gaussian distribution

Many variables have a bell shaped density. Examples:

  • Weights of a population of the same age and sex.
  • Heights of the same population.
  • The grades in a course (urban myth).

To say that a continuous variable X has a normal distribution with mean μ and standard deviation σ, we write:

X ~ N(μ, σ²)

SLIDE 15

The standard normal distribution

The normal distribution with mean 0 and standard deviation 1 is called the standard normal distribution. There are tables which allow us to calculate the probabilities for this distribution, N(0,1).

If we have a normal r.v., X, with mean μ and standard deviation σ, we can convert this to a standard, N(0,1), r.v. using the transformation:

$$Z = \frac{X - \mu}{\sigma}$$

SLIDE 16

Examples

Let Z ~ N(0,1). Calculate the following probabilities:

  • P(Z < -1)
  • P(Z > 1)
  • P(-1.5 < Z < 2)

Calculate the 90%, 95%, 97.5% and 99% percentiles of the standard normal distribution. (These values are useful in the next chapter.)

Let X ~ N(2,4). Calculate the following probabilities:

  • P(X < 0)
  • P(-1 < X < 1)
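As a check on table look-ups, these exercises can also be sketched with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The sketch assumes that the second parameter in N(2,4) is the variance, following the N(μ, σ²) notation, so the standard deviation is 2.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)

p_z_below_minus1 = Z.cdf(-1)            # P(Z < -1)
p_z_above_1 = 1 - Z.cdf(1)              # P(Z > 1)
p_z_between = Z.cdf(2) - Z.cdf(-1.5)    # P(-1.5 < Z < 2)

# Percentiles of the standard normal distribution.
percentiles = {q: Z.inv_cdf(q) for q in (0.90, 0.95, 0.975, 0.99)}

# Assumption: N(2, 4) means mean 2 and variance 4, so sigma = 2.
X = NormalDist(mu=2, sigma=2)
p_x_below_0 = X.cdf(0)                  # P(X < 0)
p_x_between = X.cdf(1) - X.cdf(-1)      # P(-1 < X < 1)

# Standardising by hand gives the same answer: z = (x - mu) / sigma.
z0 = (0 - 2) / 2
p_x_below_0_standardised = Z.cdf(z0)

print(f"P(Z < -1) = {p_z_below_minus1:.4f}")
print(f"P(Z > 1) = {p_z_above_1:.4f}")
print(f"P(-1.5 < Z < 2) = {p_z_between:.4f}")
for q, z in percentiles.items():
    print(f"{q:.1%} percentile = {z:.4f}")
print(f"P(X < 0) = {p_x_below_0:.4f}")
print(f"P(-1 < X < 1) = {p_x_between:.4f}")
```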

SLIDE 19

Calculation with Excel

It is easier to do the calculations with Excel, either with the standard normal distribution or directly with the original distribution.

SLIDE 20

Approximation of the binomial distribution using a normal

When n is large enough, the binomial distribution, X ~ B(n, p), looks like a normal distribution,

$$N\big(np,\; np(1-p)\big)$$

The approximation is usually considered to be good if np > 5 and n(1-p) > 5.

EXAMPLE We throw a fair coin in the air 400 times. What is the probability of getting between 180 and 210 heads?

SLIDE 21

The exact solution using Excel for the binomial distribution is 0.833. The estimated solution using the normal approximation is 0.819. This can be improved with a continuity correction, and then the exact and approximate solutions agree to 3 decimal places. But if we have Excel, why should we use an approximation? We will see in the next chapter.
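The three figures for the coin example can be reproduced without Excel. The Python sketch below assumes that "between 180 and 210 heads" includes both endpoints, and uses a correction of half a unit on each side for the continuity-corrected version.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 400, 0.5
low, high = 180, 210  # assumption: both endpoints are included

# Exact binomial probability. With p = 0.5 every outcome has weight 1 / 2**n.
exact = sum(comb(n, k) for k in range(low, high + 1)) / 2 ** n

# Normal approximation with mean np and variance np(1-p).
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

# Plain approximation, with no continuity correction.
plain = approx.cdf(high) - approx.cdf(low)

# Continuity correction: widen the interval by half a unit on each side.
corrected = approx.cdf(high + 0.5) - approx.cdf(low - 0.5)

print(f"exact binomial        = {exact:.3f}")
print(f"normal approximation  = {plain:.3f}")
print(f"continuity correction = {corrected:.3f}")
```

The conditions np > 5 and n(1-p) > 5 are comfortably met here, since both quantities equal 200.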

SLIDE 22

Example (Test)

According to the last CIS survey, the mean level of satisfaction with Mariano Rajoy is 3.09 with standard deviation 2.5. If these evaluations follow a normal distribution and a person is chosen at random, then the probability that they give Rajoy a rating of less than 3.09 is:

a) 0.5.
b) 0.
c) 1.236.
d) 1.

SLIDE 23

Example (Test)

The inflation rate follows a normal distribution with mean 1 and variance 4. Which of the following Excel formulas gives the probability that inflation will be negative?
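Whichever formula is chosen, the value it should return can be found by standardising. A variance of 4 gives a standard deviation of σ = 2, so a sketch of the calculation is:

```latex
\[
P(X < 0) = P\!\left(Z < \frac{0 - 1}{2}\right) = P(Z < -0.5) = \Phi(-0.5) \approx 0.3085
\]
```

Here Φ denotes the distribution function of the standard normal distribution, a symbol introduced only for this illustration.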

SLIDE 24

Example: (Exam)

The following table records the ratings of the government ministers in the last CIS survey:

a) Supposing that the ratings of Chacón in Spain follow a normal distribution with mean 4.55 and standard deviation 2.6, calculate the probability that a Spaniard rates her below 4.55.
b) For a set of three Spanish people, what is the probability that they all rate her below 5?*
c) The lowest mean rated minister is González Sinde. If her ratings are normally distributed, calculate the probability that a randomly chosen Spaniard rates her exactly 5.

*See the next slide.
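A possible approach to the three parts is sketched below in Python, using `statistics.NormalDist` from the standard library. Part b) assumes that the three people rate independently of one another, which the question does not state explicitly. Part c) needs no parameters at all, because a continuous variable takes any single exact value with probability 0.

```python
from statistics import NormalDist

# Chacón's ratings: mean 4.55, standard deviation 2.6 (from the question).
ratings = NormalDist(mu=4.55, sigma=2.6)

# a) The normal distribution is symmetric about its mean.
p_below_mean = ratings.cdf(4.55)

# b) Assumption: the three people rate independently.
p_one_below_5 = ratings.cdf(5)
p_all_three_below_5 = p_one_below_5 ** 3

# c) For a continuous r.v., P(X = x) = 0 for every single value x.
p_exactly_5 = 0.0

print(f"a) P(X < 4.55)          = {p_below_mean:.2f}")
print(f"b) P(all three below 5) = {p_all_three_below_5:.3f}")
print(f"c) P(X = 5)             = {p_exactly_5:.0f}")
```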

SLIDE 25