
Unit 2: Probability and distributions
Lecture 2: Binomial and Normal distribution
Statistics 101

Nicole Dalzell May 20, 2015

Announcements

Lab 2 Due today at 6 PM

Statistics 101 (Nicole Dalzell) U2 - L2: Normal distribution May 20, 2015 2 / 1

Conditional Probabilities

Swaziland has the highest HIV prevalence in the world: 25.9% of this country’s population is infected with HIV. The ELISA test is one of the finest and most accurate tests for detecting HIV. For those who carry HIV, the test is 99.7% accurate. For those who do not carry HIV, the test is 96.2% accurate. If an individual from Swaziland tested positive on the ELISA test, what is the probability that he carries HIV?
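The question is a direct application of Bayes' theorem. Below is a minimal Python sketch (standard library only; the slides themselves use R). It assumes that "99.7% accurate for carriers" means P(positive | HIV) = 0.997 and that "96.2% accurate for non-carriers" means P(negative | no HIV) = 0.962. The variable names are illustrative.

```python
# Bayes' theorem for the ELISA example (rates taken from the slide)
prevalence = 0.259    # P(HIV)
sensitivity = 0.997   # P(positive | HIV), assumed reading of "99.7% accurate"
specificity = 0.962   # P(negative | no HIV), assumed reading of "96.2% accurate"

p_pos_and_hiv = prevalence * sensitivity                  # true positives
p_pos_and_no_hiv = (1 - prevalence) * (1 - specificity)   # false positives
p_pos = p_pos_and_hiv + p_pos_and_no_hiv                  # P(positive)

p_hiv_given_pos = p_pos_and_hiv / p_pos
print(round(p_hiv_given_pos, 4))  # 0.9017
```

Roughly 90% of people who test positive carry HIV. The remaining positives are false positives from the much larger HIV-negative group.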


Milgram experiment

Stanley Milgram, a Yale University psychologist, conducted a series of experiments on obedience to authority starting in 1963. Experimenter (E) orders the teacher (T), the subject of the experiment, to give severe electric shocks to a learner (L) each time the learner answers a question incorrectly. The learner is actually an actor, and the electric shocks are not real, but a prerecorded sound is played each time the teacher administers an electric shock.


Milgram experiment (cont.)

These experiments measured the willingness of study participants to obey an authority figure who instructed them to perform acts that conflicted with their personal conscience. Milgram found that about 65% of people would obey authority and give such shocks, and only 35% refused. Over the years, additional research suggested this number is approximately consistent across communities and time.


Binary outcomes

Each person in Milgram’s experiment can be thought of as a trial. A person is labeled a success if she refuses to administer a severe shock, and a failure if she administers such a shock. Since only 35% of people refused to administer a shock, the probability of success is p = 0.35. A random variable describing a single trial with only two possible outcomes is called a Bernoulli random variable.


Suppose we randomly select four individuals to participate in this experiment. What is the probability that exactly 1 of them will refuse to administer the shock?

Let’s call these people Allen (A), Brittany (B), Caroline (C), and Damian (D). Each one of the four scenarios below satisfies the condition “exactly 1 of them refuses to administer the shock”:

Scenario 1: (A) refuse, (B) shock, (C) shock, (D) shock: 0.35 × 0.65 × 0.65 × 0.65 = 0.0961
Scenario 2: (A) shock, (B) refuse, (C) shock, (D) shock: 0.65 × 0.35 × 0.65 × 0.65 = 0.0961
Scenario 3: (A) shock, (B) shock, (C) refuse, (D) shock: 0.65 × 0.65 × 0.35 × 0.65 = 0.0961
Scenario 4: (A) shock, (B) shock, (C) shock, (D) refuse: 0.65 × 0.65 × 0.65 × 0.35 = 0.0961

The probability of exactly 1 of 4 people refusing to administer the shock is the sum of all of these probabilities:

0.0961 + 0.0961 + 0.0961 + 0.0961 = 4 × 0.0961 = 0.3844
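The scenario-by-scenario reasoning can be checked by brute force. The sketch below is in Python, although the slides use R. It lists all 2^4 = 16 refuse/shock sequences for the four people, keeps those with exactly one refusal, and adds up their probabilities. It assumes the four people act independently, each refusing with p = 0.35.

```python
from itertools import product

p_refuse = 0.35                  # probability of success (refusing), from the slide
outcomes = ("refuse", "shock")

total = 0.0
for scenario in product(outcomes, repeat=4):   # every sequence for A, B, C, D
    if scenario.count("refuse") == 1:          # keep only "exactly 1 refuses"
        prob = 1.0
        for outcome in scenario:               # independent trials: multiply
            prob *= p_refuse if outcome == "refuse" else 1 - p_refuse
        print(scenario, round(prob, 4))        # each scenario prints 0.0961
        total += prob

print(round(total, 4))  # 0.3845
```

The unrounded total is 0.384475. The slide's 0.3844 comes from rounding each scenario to 0.0961 before adding.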


Binomial distribution

The question from the prior slide asked for the probability of a given number of successes, k, in a given number of trials, n (k = 1 success in n = 4 trials), and we calculated this probability as

# of scenarios × P(single scenario)

• # of scenarios: there is a less tedious way to figure this out; we’ll get to that shortly...
• P(single scenario) = $p^k (1-p)^{n-k}$: the probability of success raised to the number of successes, times the probability of failure raised to the number of failures.

The Binomial distribution describes the probability of having exactly k successes in n independent Bernoulli trials with probability of success p.


Counting the # of scenarios

Earlier we wrote out all possible scenarios that fit the condition of exactly one person refusing to administer the shock. Suppose n were larger and/or k were different from 1, for example n = 9 and k = 2 (R = refuses, S = administers the shock):

RRSSSSSSS
SRRSSSSSS
SSRRSSSSS
· · ·
SSRSSRSSS
· · ·
SSSSSSSRR

Writing out all possible scenarios would then be incredibly tedious and prone to errors.
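A computer does not mind the tedium. This Python sketch (the slides use R) generates every R/S string for n = 9 and k = 2 by choosing the two positions of the people who refuse. It then counts the strings. The ordering differs from the slide's listing, but the set of scenarios is the same.

```python
from itertools import combinations

n, k = 9, 2
scenarios = []
for refusers in combinations(range(n), k):   # positions of the k people who refuse
    scenarios.append("".join("R" if i in refusers else "S" for i in range(n)))

print(scenarios[:3])   # ['RRSSSSSSS', 'RSRSSSSSS', 'RSSRSSSSS']
print(len(scenarios))  # 36
```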


Calculating the # of scenarios

Choose function: The choose function is useful for calculating the number of ways to choose k successes in n trials.

$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$

k = 1, n = 4: $\binom{4}{1} = \frac{4!}{1!\,(4-1)!} = \frac{4 \times 3 \times 2 \times 1}{1 \times (3 \times 2 \times 1)} = 4$

k = 2, n = 9: $\binom{9}{2} = \frac{9!}{2!\,(9-2)!} = \frac{9 \times 8 \times 7!}{2 \times 1 \times 7!} = \frac{72}{2} = 36$

Note: You can also use R for these calculations:

> choose(9,2)
[1] 36
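For readers working in Python rather than R, here is a sketch of the factorial formula written out directly. It is checked against `math.comb` (Python 3.8+), which plays the role of R's `choose`. The helper name `choose` is illustrative.

```python
from math import comb, factorial

def choose(n: int, k: int) -> int:
    """Number of ways to pick k successes among n trials: n! / (k! (n - k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))

print(choose(4, 1))  # 4
print(choose(9, 2))  # 36
print(comb(9, 2))    # 36, from the standard library
```

Integer division (`//`) keeps the result exact, because the factorial ratio is always a whole number.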


Example

At Duke University, 82% of students live in university owned or affiliated housing. A group of 12 students was chosen to speak to incoming students at orientation. What is the probability that zero, one, four, or more than four of these students live in student housing?

Step 1: What is our response variable in this situation?
Step 2: What is the probability that a student lives in university housing?
We ask one student, and then a second, to come to the podium to speak. What is the probability that both of these students live in university housing?


Example

At Duke University, 82% of students live in university owned or affiliated housing. A group of 12 students was chosen to speak at orientation. What is the probability that (1) zero, (2) one, (3) two of these students live in student housing?


Conditions for the Binomial Distribution

1. The trials are independent.
2. The number of trials, n, is fixed.
3. Each trial outcome can be classified as a success or failure.
4. The probability of a success, p, is the same for each trial.


Binomial distribution (cont.)

Binomial probabilities: If p represents the probability of success, (1 − p) represents the probability of failure, n represents the number of independent trials, and k represents the number of successes, then

$P(k \text{ successes in } n \text{ trials}) = \binom{n}{k}\, p^k (1-p)^{n-k}$

that is, # of scenarios × P(single scenario).
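The formula translates almost word for word into code. This is a minimal Python sketch (the slides use R). The function name `binom_pmf` is illustrative. Each factor is labeled with the part of "# of scenarios × P(single scenario)" it computes.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials with success probability p)."""
    n_scenarios = comb(n, k)                 # number of scenarios
    p_single = p**k * (1 - p) ** (n - k)     # probability of any one scenario
    return n_scenarios * p_single

# Milgram example: exactly 1 of 4 refuses, p = 0.35
print(round(binom_pmf(1, 4, 0.35), 4))  # 0.3845
```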


Example

At Duke University, 82% of students live in university owned or affiliated housing. A group of 12 students was chosen to speak to incoming students at orientation. What is the probability that (1) zero, (2) one, (3) two of these students live in student housing?
P(Zero UH) =
P(One UH) =
P(Two UH) =
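The three blanks can be filled with the binomial formula, assuming the 12 students are independent trials with p = 0.82 (the binomial conditions listed above). Here is a Python sketch (the slides use R):

```python
from math import comb

n, p = 12, 0.82  # 12 students; P(lives in university housing) = 0.82

probs = {}
for k in (0, 1, 2):
    probs[k] = comb(n, k) * p**k * (1 - p) ** (n - k)  # P(exactly k of 12 live in UH)
    print(f"P({k} UH) = {probs[k]:.3e}")
```

All three probabilities are tiny, between about 10^-9 and 10^-6. That makes sense: with p = 0.82 we expect about 12 × 0.82 = 9.84 of the 12 students to live in university housing.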


Example

Of the 12 students, we are going to randomly select 5 to be on a panel to speak to the parents. What is the probability that at least four (i.e., four or more) of these selected students live in student housing?
P(At least four) =
P(At least one) =
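Here is a sketch of both answers in Python (the slides use R). It keeps the binomial setup from the previous slides: each of the 5 panelists is treated as an independent trial with p = 0.82. "At least four" adds the k = 4 and k = 5 terms. "At least one" uses the complement of "none".

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 5, 0.82
p_at_least_four = binom_pmf(4, n, p) + binom_pmf(5, n, p)
p_at_least_one = 1 - binom_pmf(0, n, p)   # complement of "none live in UH"
print(round(p_at_least_four, 4))  # 0.7776
print(round(p_at_least_one, 4))   # 0.9998
```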


Participation question
Which of the following is not a condition that needs to be met for the binomial distribution to be applicable?
(a) the trials must be independent
(b) the number of trials, n, must be fixed
(c) each trial outcome must be classified as a success or a failure
(d) the number of desired successes, k, must be greater than the number of trials
(e) the probability of success, p, must be the same for each trial


Participation question
A 2012 Gallup survey suggests that 26.2% of Americans are obese. Among a random sample of 10 Americans, the probability that exactly 8 are obese is
(a) pretty high
(b) pretty low

Gallup: http://www.gallup.com/poll/160061/obesity-rate-stable-2012.aspx, January 23, 2013.

Participation question
A 2012 Gallup survey suggests that 26.2% of Americans are obese. Among a random sample of 10 Americans, what is the probability that exactly 8 are obese?
(a) $0.262^8 \times 0.738^2$
(b) $\binom{8}{10} \times 0.262^8 \times 0.738^2$
(c) $\binom{10}{8} \times 0.262^8 \times 0.738^2$
(d) $\binom{10}{8} \times 0.262^2 \times 0.738^8$
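Evaluating the binomial expression numerically shows why "pretty low" was the right intuition. Here is a Python sketch (the slides use R) with n = 10, k = 8 and p = 0.262:

```python
from math import comb

n, k, p = 10, 8, 0.262   # sample size, number obese, obesity rate (Gallup 2012)

p_exactly_8 = comb(n, k) * p**k * (1 - p) ** (n - k)
print(comb(n, k))              # 45
print(round(p_exactly_8, 5))   # 0.00054
```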


Expected value

A 2012 Gallup survey suggests that 26.2% of Americans are obese. Among a random sample of 100 Americans, how many would you expect to be obese?
Easy enough: 100 × 0.262 = 26.2. Or more formally, µ = np = 100 × 0.262 = 26.2.
But this doesn’t mean that in every random sample of 100 people exactly 26.2 will be obese. In fact, that’s not even possible. In some samples this value will be less, and in others more. How much would we expect this value to vary?


Expected value and its variability

Mean and standard deviation of binomial distribution

$\mu = np \qquad \sigma = \sqrt{np(1-p)}$

Going back to the obesity rate:

$\sigma = \sqrt{np(1-p)} = \sqrt{100 \times 0.262 \times 0.738} \approx 4.4$

We would expect 26.2 out of 100 randomly sampled Americans to be obese, give or take 4.4.

Note: The mean and standard deviation of a binomial might not always be whole numbers, and that is fine: these values represent what we would expect to see on average.
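Here are the same two formulas as a Python sketch (the slides use R), using the obesity numbers n = 100 and p = 0.262:

```python
from math import sqrt

n, p = 100, 0.262
mu = n * p                     # expected number of obese people in the sample
sigma = sqrt(n * p * (1 - p))  # standard deviation of that count
print(round(mu, 1), round(sigma, 1))  # 26.2 4.4
```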


Unusual observations

Observations more than 2 standard deviations away from the mean are considered unusual. Combining this rule with the mean and standard deviation we just computed, we can calculate a range for the plausible number of obese Americans in random samples of 100:
26.2 ± (2 × 4.4) = (17.4, 35.0)


Participation question
An August 2012 Gallup poll suggests that 13% of Americans think home schooling provides an excellent education for children. Would a random sample of 1,000 Americans where only 100 share this opinion be considered unusual?
(a) No
(b) Yes

http://www.gallup.com/poll/156974/private-schools-top-marks-educating-children.aspx
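The "more than 2 SDs from the mean" rule can be wrapped in a small helper. The function `is_unusual` below is a hypothetical name used only for illustration, and the slides use R rather than Python. It computes the binomial mean and SD, standardizes the observed count, and flags it if |z| > 2.

```python
from math import sqrt

def is_unusual(observed, n, p, cutoff=2.0):
    """Return (z, flag): flag is True if observed is more than `cutoff` SDs from np."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    z = (observed - mu) / sigma
    return z, abs(z) > cutoff

z, unusual = is_unusual(100, n=1000, p=0.13)   # home schooling question
print(round(z, 2), unusual)  # -2.82 True
```

With µ = 130 and σ ≈ 10.6, a count of 100 sits about 2.8 SDs below the mean, so by this rule it would be considered unusual.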

An analysis of Facebook users

A recent study found that “Facebook users get more than they give”. For example:
• 40% of Facebook users in our sample made a friend request, but 63% received at least one request
• Users in our sample pressed the like button next to friends’ content an average of 14 times, but had their content “liked” an average of 20 times
• Users sent 9 personal messages, but received 12
• 12% of users tagged a friend in a photo, but 35% were themselves tagged in a photo
Any guesses for how this pattern can be explained?

http://www.pewinternet.org/Reports/2012/Facebook-users/Summary.aspx


This study also found that approximately 25% of Facebook users are considered power users. The same study found that the average Facebook user has 245 friends. What is the probability that the average Facebook user with 245 friends has 70 or more friends who would be considered power users?
We are given that n = 245, p = 0.25, and we are asked for the probability P(K ≥ 70).
P(K ≥ 70) = P(K = 70 or K = 71 or K = 72 or · · · or K = 245)
= P(K = 70) + P(K = 71) + P(K = 72) + · · · + P(K = 245)

This seems like an awful lot of work...
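By hand it is a lot of work, but a computer can add up all 176 binomial terms directly. Here is a Python sketch (the slides use R). It assumes the 245 friends are independent trials, each a power user with p = 0.25.

```python
from math import comb

n, p = 245, 0.25  # number of friends, share of power users (from the study)

# Add up P(K = k) for every k from 70 through 245
p_at_least_70 = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(70, n + 1))
print(round(p_at_least_70, 4))
```

This exact sum is a useful benchmark for judging how well an approximation performs on the same question.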


Histograms of number of successes

Hollow histograms of samples from the binomial model where p = 0.10 and n = 10, 30, 100, and 300. What happens as n increases?

[Figure: hollow histograms of the number of successes for n = 10, 30, 100, and 300, with p = 0.10]


Density Curves

A Density Curve is a smoothed density histogram where the area under the curve is 1. To draw a density curve from a histogram simply connect the peaks of a histogram with a smooth line, and normalize the values of the y-axis such that the area under the curve is 1.


Heights of males

“The male heights on OkCupid very nearly follow the expected normal distribution – except the whole thing is shifted to the right of where it should be. Almost universally guys like to add a couple inches.” “You can also see a more subtle vanity at work: starting at roughly 5’ 8”, the top of the dotted curve tilts even further rightward. This means that guys as they get closer to six feet round up a bit more than usual, stretching for that coveted psychological benchmark.”

http://blog.okcupid.com/index.php/the-biggest-lies-in-online-dating/


Heights of females

“When we looked into the data for women, we were surprised to see height exaggeration was just as widespread, though without the lurch towards a benchmark height.”

http://blog.okcupid.com/index.php/the-biggest-lies-in-online-dating/

Normal distribution

• Denoted as N(µ, σ) → Normal with mean µ and standard deviation σ
• Unimodal and symmetric, bell-shaped curve that also follows very strict guidelines about how variably the data are distributed around the mean
• Therefore, many variables are nearly normal, but none are exactly normal


68-95-99.7 Rule

[Figure: normal curve with µ − 3σ, µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ, µ + 3σ marked; the intervals µ ± σ, µ ± 2σ, µ ± 3σ are labeled 68%, 95%, 99.7%]

For nearly normally distributed data,
• about 68% falls within 1 SD of the mean,
• about 95% falls within 2 SD of the mean,
• about 99.7% falls within 3 SD of the mean.

It is possible for observations to fall 4, 5, or more standard deviations away from the mean, but these occurrences are very rare if the data are nearly normal.
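The three percentages come straight from areas under the standard normal curve. This Python sketch recovers them with `statistics.NormalDist` (Python 3.8+), whose `cdf` method is the counterpart of R's `pnorm`.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
coverage = {}
for k in (1, 2, 3):
    coverage[k] = std_normal.cdf(k) - std_normal.cdf(-k)  # area within k SDs of the mean
    print(f"within {k} SD: {coverage[k]:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```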


Describing variability using the 68-95-99.7 Rule

SAT scores are distributed nearly normally with mean 1500 and standard deviation 300.

[Figure: SAT score axis from 600 to 2400 in steps of 300, with the 68%, 95%, and 99.7% ranges marked]

• ∼68% of students score between 1200 and 1800 on the SAT.
• ∼95% of students score between 900 and 2100 on the SAT.
• ∼99.7% of students score between 600 and 2400 on the SAT.


SAT scores are distributed nearly normally with mean 1500 and standard deviation 300. ACT scores are distributed nearly normally with mean 21 and standard deviation 5. A college admissions officer wants to determine which of the two applicants scored better on their standardized test with respect to the other test takers: Pam, who earned an 1800 on her SAT, or Jim, who scored a 24 on his ACT?

[Figure: Pam's 1800 marked on the SAT scale (600 to 2400) and Jim's 24 marked on the ACT scale (6 to 36)]


Standardizing with Z scores

Since we cannot just compare these two raw scores, we instead compare how many standard deviations beyond the mean each observation is.

• Pam’s score is $\frac{1800 - 1500}{300} = 1$ standard deviation above the mean.
• Jim’s score is $\frac{24 - 21}{5} = 0.6$ standard deviations above the mean.

[Figure: standard normal axis from −2 to 2 with Pam (Z = 1) and Jim (Z = 0.6) marked]
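The comparison reduces to one small function. Here is a Python sketch (the slides use R). The helper name `z_score` is illustrative.

```python
def z_score(observation: float, mean: float, sd: float) -> float:
    """Number of standard deviations an observation lies above (+) or below (-) the mean."""
    return (observation - mean) / sd

pam = z_score(1800, mean=1500, sd=300)  # SAT
jim = z_score(24, mean=21, sd=5)        # ACT
print(pam, jim)                         # 1.0 0.6
print("Pam" if pam > jim else "Jim")    # Pam
```

Pam's score is further above the mean of her test's distribution, so relative to the other test takers she did better.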


Standardizing with Z scores (cont.)

These are called standardized scores, or Z scores.
• The Z score of an observation is the number of standard deviations it falls above or below the mean.

Z scores: $Z = \frac{\text{observation} - \text{mean}}{\text{SD}}$

• Z scores are defined for distributions of any shape, but only when the distribution is normal can we use Z scores to calculate percentiles.
• Observations that are more than 2 SD away from the mean (|Z| > 2) are usually considered unusual.


Participation question
Scores on a standardized test are normally distributed with a mean of 100 and a standard deviation of 20. If these scores are converted to standard normal Z scores, which of the following statements will be correct?
(a) Both the mean and median score will equal 0.
(b) The mean will equal 0, but the median cannot be determined.
(c) The mean of the z-scores will equal 100.
(d) The mean of the z-scores will equal 5.


Approximating percentiles

Approximately what percent of students score below 1800 on the SAT? The mean SAT score is 1500, with a standard deviation of 300 (Hint: Use the 68-95-99.7% rule.)


Percentiles

Percentile is the percentage of observations that fall below a given data point. Graphically, percentile is the area below the probability distribution curve to the left of that observation.


Calculating percentiles - using computation

There are many ways to compute percentiles/areas under the curve:

R:
> pnorm(1800, mean = 1500, sd = 300)
[1] 0.8413447

Applet: http://www.socr.ucla.edu/htmls/SOCR_Distributions.html
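Python offers another route. In its standard library, `statistics.NormalDist` (Python 3.8+) provides a `cdf` method that matches R's `pnorm`. This sketch reproduces the R result above:

```python
from statistics import NormalDist

sat = NormalDist(mu=1500, sigma=300)
print(round(sat.cdf(1800), 7))  # 0.8413447, same as pnorm(1800, mean = 1500, sd = 300)
```

This agrees with the 68-95-99.7 estimate: 50% below the mean plus about 34% between the mean and 1 SD above it gives roughly 84%.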


Z-Scores

Z-Score: The z-score for a data value, $x_i$, is

$z = \frac{x_i - \bar{x}}{s}$

• Values farther from 0 are more extreme.
• A z-score puts values on a common scale.
• A z-score is the number of standard deviations a value falls from the mean.
• For roughly normal data, about 95% of all z-scores fall between −2 and 2; z-scores beyond −2 or 2 can be considered extreme.
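With sample data, the formula uses the sample mean $\bar{x}$ and the sample standard deviation s. Below is a Python sketch (the slides use R). The data values are hypothetical and serve only to illustrate. After standardizing, the z-scores always have mean 0 and standard deviation 1.

```python
from statistics import mean, stdev

data = [62, 65, 67, 70, 71, 74, 80]  # hypothetical sample, for illustration only
x_bar = mean(data)
s = stdev(data)                       # sample standard deviation (n - 1 in the denominator)
z_scores = [(x - x_bar) / s for x in data]
print([round(z, 2) for z in z_scores])
```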


Participation question
Which of the following is false?
(a) Z scores are helpful for determining how unusual a data point is compared to the rest of the data in the distribution.
(b) The majority of Z scores in a right skewed distribution are negative.
(c) Regardless of the shape of the distribution (symmetric vs. skewed), the Z score of the mean is always 0.
(d) In a normal distribution, Q1 and Q3 are more than one SD away from the mean.


Calculating percentiles - using tables

Second decimal place of Z (columns); table entries are the area to the left of Z

Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0    0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1    0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2    0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3    0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4    0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5    0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6    0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7    0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8    0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9    0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0    0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1    0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2    0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015

What percent of the standard normal distribution is above Z = 0.82? Choose the closest answer.
(a) 79.4%
(b) 20.6%
(c) 82%
(d) 18%
(e) Need to be provided the mean and the standard deviation of the distribution in order to be able to solve this problem.
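The table is just the standard normal CDF evaluated on a grid. This Python sketch (the slides use R) rebuilds one row of the table and answers the question. Because the table lists areas below Z, the area above Z = 0.82 is one minus the table entry.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # standard normal CDF: area to the left of z

# Rebuild the Z = 0.8 row of the table (second decimal place 0.00 to 0.09)
row_08 = [round(phi(0.8 + j / 100), 4) for j in range(10)]
print(row_08)

# The table gives the area below Z; the area above is its complement
area_above = 1 - phi(0.82)
print(f"{area_above:.1%}")  # 20.6%
```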


Example

The average daily temperature in June in LA is 77°F, with a standard deviation of 5°F. Suppose the temperatures in June closely follow a normal distribution. What is the probability of observing a temperature of at most 83°F on a randomly chosen day in June?

T ∼ N(mean = 77, sd = 5)

$P(T \le 83) = P\left(Z \le \frac{83 - 77}{5}\right) = P(Z \le 1.2) \approx 0.885$

The probability of observing a temperature of at most 83°F on a randomly chosen day in June is approximately 0.885, or 88.5%.
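Here is the same calculation as a Python sketch (the slides use R). It also shows the complementary "at least 83°F" probability, which is one minus the "at most" area for a continuous distribution.

```python
from statistics import NormalDist

temps = NormalDist(mu=77, sigma=5)   # June daily temperature in LA, °F
z = (83 - 77) / 5                    # standardize
print(z, round(temps.cdf(83), 3))    # 1.2 0.885
print(round(1 - temps.cdf(83), 3))   # 0.115
```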


Example (cont)

The average daily temperature in June in LA is 77°F, with a standard deviation of 5°F. Suppose the temperatures in June closely follow a normal distribution. What is the probability of observing a temperature of at least 83°F on a randomly chosen day in June?

T ∼ N(mean = 77, sd = 5)

P(T ≥ 83) = 1 − P(T ≤ 83) ≈ 1 − 0.885 = 0.115

The probability of observing a temperature of at least 83°F on a randomly chosen day in June is approximately 0.115, or 11.5%.


Normal approximation to the binomial

When the sample size is large enough, the binomial distribution with parameters n and p can be approximated by the normal model with parameters µ = np and $\sigma = \sqrt{np(1-p)}$.

In the case of the Facebook power users, n = 245 and p = 0.25.

µ = 245 × 0.25 = 61.25     $\sigma = \sqrt{245 \times 0.25 \times 0.75} \approx 6.78$

Bin(n = 245, p = 0.25) ≈ N(µ = 61.25, σ = 6.78)

[Figure: Bin(245, 0.25) probabilities for k from 20 to 100, overlaid with the N(61.25, 6.78) density curve]
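To see how close the overlay in the figure is, you can compare the exact binomial probability P(K = k) with the normal density at the same k for a few values. Here is a Python sketch (the slides use R). The chosen k values are arbitrary.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 245, 0.25
mu, sigma = n * p, sqrt(n * p * (1 - p))   # 61.25 and about 6.78
approx = NormalDist(mu, sigma)

for k in (50, 61, 70):
    exact = comb(n, k) * p**k * (1 - p) ** (n - k)      # binomial P(K = k)
    print(k, round(exact, 4), round(approx.pdf(k), 4))  # normal density at k
```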


How large is large enough?

The sample size is considered large enough if the expected numbers of successes and failures are both at least 10.

np ≥ 10 and n(1 − p) ≥ 10


Participation question
Below are four pairs of Binomial distribution parameters. Which distribution can be approximated by the normal distribution?
(a) n = 100, p = 0.95
(b) n = 25, p = 0.45
(c) n = 150, p = 0.05
(d) n = 500, p = 0.015
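The success-failure condition np ≥ 10 and n(1 − p) ≥ 10 is easy to check in code. Here is a Python sketch (the slides use R). The helper name `success_failure_ok` is illustrative.

```python
def success_failure_ok(n: int, p: float) -> bool:
    """Success-failure condition: at least 10 expected successes and 10 expected failures."""
    return n * p >= 10 and n * (1 - p) >= 10

candidates = {"a": (100, 0.95), "b": (25, 0.45), "c": (150, 0.05), "d": (500, 0.015)}
for label, (n, p) in candidates.items():
    print(label, round(n * p, 2), round(n * (1 - p), 2), success_failure_ok(n, p))
```

Only (b) has both expected counts at or above 10 (11.25 successes and 13.75 failures).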


What is the probability that the average Facebook user with 245 friends has 70 or more friends who would be considered power users?
(a) 0.0251
(b) 0.0985
(c) 0.1128
(d) 0.9015
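Here is a sketch of the normal-approximation route in Python (the slides use R). It uses N(µ = 61.25, σ ≈ 6.78), standardizes 70, and takes the area to the right, without any continuity correction.

```python
from math import sqrt
from statistics import NormalDist

n, p = 245, 0.25
mu = n * p                     # 61.25
sigma = sqrt(n * p * (1 - p))  # about 6.78

z = (70 - mu) / sigma
p_normal = 1 - NormalDist().cdf(z)   # area to the right of z
print(round(z, 2), round(p_normal, 4))
```

The result agrees with 1 − 0.9015 = 0.0985 (the table value for Z = 1.29) up to rounding. The tiny difference arises because the code does not round z to two decimals first.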


Six sigma

“The term “six sigma process” comes from the notion that if one has six standard deviations between the process mean and the nearest specification limit, as shown in the graph, practically no items will fail to meet specifications.”

http://en.wikipedia.org/wiki/Six_Sigma


At the Heinz ketchup factory, the amounts that go into bottles of ketchup are supposed to be normally distributed with mean 36 oz. and standard deviation 0.11 oz. Once every 30 minutes a bottle is selected from the production line, and its contents are noted precisely. If the amount in the bottle goes below 35.8 oz. or above 36.2 oz., then the bottle fails the quality control inspection. What percent of bottles pass the quality control inspection?
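A bottle passes when its fill lands between the two limits. That probability is the area under N(36, 0.11) between 35.8 and 36.2, which is the difference of two CDF values. Here is a Python sketch (the slides use R):

```python
from statistics import NormalDist

fill = NormalDist(mu=36, sigma=0.11)          # ounces per bottle
p_pass = fill.cdf(36.2) - fill.cdf(35.8)      # area between the two spec limits
print(round(p_pass, 4))
```

Both limits sit about 1.82 SDs from the mean, so roughly 93% of bottles pass.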


Body temperatures of healthy humans are distributed nearly normally with mean 98.2°F and standard deviation 0.73°F. What is the cutoff for the highest 10% of human body temperatures?

Mackowiak, Wasserman, and Levine (1992), A Critical Appraisal of 98.6 Degrees F, the Upper Limit of the Normal Body Temperature, and Other Legacies of Carl Reinhold August Wunderlich.
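Finding a cutoff reverses the percentile calculation. The highest 10% lie above the 90th percentile, so we need the inverse CDF at 0.90. In Python, `NormalDist.inv_cdf` (Python 3.8+) is the counterpart of R's `qnorm`. Here is a sketch:

```python
from statistics import NormalDist

body_temp = NormalDist(mu=98.2, sigma=0.73)   # °F
cutoff = body_temp.inv_cdf(0.90)              # 90th percentile: top 10% lie above it
print(round(cutoff, 1))                       # 99.1
```

This is the same as 98.2 + 1.28 × 0.73, where 1.28 is the Z score with 90% of the area to its left.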

SAT scores (out of 2400) are distributed normally with mean 1500 and standard deviation 300. Suppose a school council awards a certificate of excellence to all students who score at least 1900 on the SAT. What percent of the students who received this certificate scored above 2100?

Every score above 2100 is also above 1900, so the "and" event reduces to SAT > 2100:

$P(\text{SAT} > 2100 \mid \text{SAT} > 1900) = \frac{P(\text{SAT} > 2100 \text{ and } \text{SAT} > 1900)}{P(\text{SAT} > 1900)} = \frac{P(\text{SAT} > 2100)}{P(\text{SAT} > 1900)}$

$P(\text{SAT} > 2100) = P\left(Z > \frac{2100 - 1500}{300}\right) = P(Z > 2) = 1 - 0.9772 = 0.0228$

$P(\text{SAT} > 1900) = P\left(Z > \frac{1900 - 1500}{300}\right) = P(Z > 1.33) = 1 - 0.9082 = 0.0918$

$P(\text{SAT} > 2100 \mid \text{SAT} > 1900) = \frac{0.0228}{0.0918} \approx 0.25$ → 25% of students
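Here is the same conditional probability computed in Python (the slides use R). The code keeps z = 1.333... unrounded, so it gives about 0.249 rather than the table-based 0.0228/0.0918. Both round to 25%.

```python
from statistics import NormalDist

sat = NormalDist(mu=1500, sigma=300)
p_above_2100 = 1 - sat.cdf(2100)
p_above_1900 = 1 - sat.cdf(1900)
# Every score above 2100 is also above 1900, so the joint event is just SAT > 2100
p_conditional = p_above_2100 / p_above_1900
print(round(p_conditional, 3))
```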


Suppose a newspaper article states that the distribution of auto insurance premiums for residents of California is approximately normal with a mean of $1,650. The article also states that 25% of California residents pay more than $1,800.

  • 1. What is the standard deviation of this distribution?
  • 2. What is the IQR of this distribution?
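A sketch of one way to work backward from the percentile to the missing parameter, in Python (the slides use R). If 25% pay more than $1,800, then $1,800 is the 75th percentile (Q3). Solving Q3 = mean + z × SD with z = Φ⁻¹(0.75) gives the SD. Normal distributions are symmetric, so Q1 sits the same distance below the mean.

```python
from statistics import NormalDist

mean_premium = 1650
q3 = 1800                                  # 25% pay more, so this is the 75th percentile
z_q3 = NormalDist().inv_cdf(0.75)          # about 0.674
sd = (q3 - mean_premium) / z_q3            # solve q3 = mean + z * sd for sd
q1 = mean_premium - (q3 - mean_premium)    # symmetry about the mean
iqr = q3 - q1
print(round(sd, 1), iqr)                   # 222.4 300
```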


To Do

PS 3 due tomorrow in class

Reading Assignment for Friday:
Chapter 4, Sections 4.1 - 4.2.3 (A sampling distribution for the mean)