SLIDE 1
BASIC PROBABILITY
Business Statistics
SLIDE 2
CONTENTS
▪ Data as a random sample ▪ Probability theory ▪ Be careful with probability ▪ Old exam question ▪ Further study
SLIDE 3
In business and economics, the data represents a small sample of the phenomenon of interest
▪ poll with 100 random telephone calls ▪ interviews with 25 random customers ▪ batch of 50 random meals for food quality ▪ car accidents on 5 random days in a year ▪ etc.
All are supposed to represent a bigger population
▪ the entire population of the US (or only those that have a telephone?) ▪ all of your customers (or all of your customers that have a phone, e-mail, or that visit your shop) ▪ all meals produced in some factory or some restaurant ▪ all accidents in a particular year, particular country, ... ▪ etc.
DATA AS A RANDOM SAMPLE
SLIDE 4
Task: to infer properties of the population from the sample
▪ Inferential statistics ▪ example: 12 out of 100 random clients liked our new product – what proportion of the total population will like it?
In order to get an understanding: study the variation of samples
▪ Probability theory ▪ example: if 15% of the population likes our new product, what is the probability that we find 12 in a random sample of 100?
DATA AS A RANDOM SAMPLE
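The probability question on this slide can be checked numerically. The following is a minimal sketch, assuming each of the 100 clients independently likes the product with probability 0.15, so that the count follows a binomial distribution (the slide does not name the distribution; `binomial_pmf` is a helper introduced here for illustration only).

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Assumption: clients respond independently, each liking the product with p = 0.15.
prob_12 = binomial_pmf(12, 100, 0.15)
print(f"Probability of exactly 12 out of 100: {prob_12:.4f}")
```

Repeating the calculation for other counts (10, 15, 20, ...) shows how much samples vary around the population proportion.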
SLIDE 5
Sample space 𝑇: all possible outcomes of a random experiment
▪ coin: 𝑇 = {heads, tails} ▪ die: 𝑇 = {1,2,3,4,5,6} ▪ number of accidents on a day: 𝑇 ⊂ ℕ₀ ▪ body height: 𝑇 ⊂ ℝ+ ▪ two dice: 𝑇 = {(1,1), (1,2), …, (6,6)}
PROBABILITY THEORY
⊂ means “is a subset of”
SLIDE 6
Probability 𝑄: likelihood of a particular outcome
▪ coin: 𝑄(heads) = 1/2 ▪ die: 𝑄(3) = 1/6 ▪ number of accidents on a day: 𝑄(1) = 0.19 (for example) ▪ body height: 𝑄(1.8304678) = 0 (why?)
Types of probability
▪ a priori (classical theory) ▪ 𝑄(head) = 1/2 or 𝑄(event) = (number of outcomes in event) / (number of possible outcomes)
▪ empirical ▪ 𝑄(event) = (number of occurrences of event) / (number of observations)
▪ subjective ▪ 𝑄(Italy will win next World Cup) = ⋯
PROBABILITY THEORY
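The difference between an a priori and an empirical probability can be illustrated with a small simulation. This is a sketch under stated assumptions: a fair six-sided die, Python's pseudo-random generator standing in for real rolls, and an arbitrary number of rolls chosen for the example.

```python
import random
from fractions import Fraction

random.seed(0)  # fixed seed so the run is repeatable

# A priori: one favourable outcome out of six equally likely outcomes.
a_priori = Fraction(1, 6)

# Empirical: relative frequency of a 3 in simulated rolls.
n_rolls = 100_000
hits = sum(1 for _ in range(n_rolls) if random.randint(1, 6) == 3)
empirical = hits / n_rolls

print(f"a priori:  {float(a_priori):.4f}")
print(f"empirical: {empirical:.4f}")
```

With more rolls the empirical value tends to settle closer to the a priori value.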
SLIDE 7
Probability function 𝑄(event)
▪ for every event 𝐵 ⊂ 𝑇: 𝑄(𝐵) ≥ 0 ▪ for entire sample space 𝑇: 𝑄(𝑇) = 1 ▪ for 𝐵 = ∅: 𝑄(𝐵) = 0
Events:
▪ elementary (e.g. for a die, outcome = 3) ▪ compound (e.g. for a die, outcome = even)
Example: die with 𝑇 = {1,2,3,4,5,6}
▪ 𝑄(3) = 1/6 ▪ 𝑄(even) = 𝑄({2,4,6}) = 3/6 = 1/2 ▪ 𝑄(even or odd) = 𝑄(𝑇) = 1 ▪ 𝑄(−1) = 𝑄(7) = 𝑄(2.43) = 𝑄(∅) = 0
PROBABILITY THEORY
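The die example can be written as a short function. This is a minimal sketch that assumes all six outcomes are equally likely; the names `SAMPLE_SPACE` and `prob` are introduced here for illustration.

```python
from fractions import Fraction

SAMPLE_SPACE = frozenset({1, 2, 3, 4, 5, 6})


def prob(event) -> Fraction:
    """Share of the equally likely outcomes in the sample space that lie in the event."""
    return Fraction(len(SAMPLE_SPACE & set(event)), len(SAMPLE_SPACE))


print(prob({3}))            # 1/6
print(prob({2, 4, 6}))      # 1/2
print(prob(SAMPLE_SPACE))   # 1
print(prob({-1, 7, 2.43}))  # 0, none of these outcomes is in the sample space
```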
∅ is the empty set
SLIDE 8
The complement of an event 𝐵 is denoted by 𝐵′ and consists of everything in the sample space 𝑇 except event 𝐵
Since 𝐵 and 𝐵′ together comprise the entire sample space 𝑇, 𝑄(𝐵) + 𝑄(𝐵′) = 1 or 𝑄(𝐵′) = 1 − 𝑄(𝐵)
PROBABILITY THEORY
SLIDE 9
The union of two events consists of all outcomes in the sample space 𝑇 that are contained either in compound event 𝐵 (e.g., “≤ 3”) or in compound event 𝐶 (e.g., “even”) or both
▪ denoted by 𝐵 ∪ 𝐶 ▪ pronounced “𝐵 or 𝐶”
PROBABILITY THEORY
“or” means “and/or”, so not the exclusive or (as in “either 𝐵 or 𝐶”)
[Venn diagram for a die: 1 and 3 in 𝐵 only, 2 in both 𝐵 and 𝐶, 4 and 6 in 𝐶 only]
SLIDE 10
The intersection of two events 𝐵 and 𝐶 is the event consisting of all outcomes in the sample space 𝑇 that are contained in both event 𝐵 and event 𝐶
▪ denoted by 𝐵 ∩ 𝐶 ▪ pronounced “𝐵 and 𝐶” ▪ its probability 𝑄(𝐵 ∩ 𝐶) is also known as the joint probability
PROBABILITY THEORY
[Venn diagram for a die: only outcome 2 lies in both 𝐵 and 𝐶]
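Complement, union and intersection map directly onto set operations. The sketch below uses the events from the slides (𝐵 = “≤ 3” and 𝐶 = “even” on one die) and assumes equally likely outcomes; the variable names are chosen here for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
event_b = {x for x in sample_space if x <= 3}      # "≤ 3"
event_c = {x for x in sample_space if x % 2 == 0}  # "even"


def prob(event: set) -> Fraction:
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


complement_b = sample_space - event_b  # everything except B
union = event_b | event_c              # B or C (or both)
intersection = event_b & event_c       # B and C

print(sorted(complement_b), prob(complement_b))  # [4, 5, 6] 1/2
print(sorted(union), prob(union))                # [1, 2, 3, 4, 6] 5/6
print(sorted(intersection), prob(intersection))  # [2] 1/6
```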
SLIDE 11
Given are two dice; their random outcomes are 𝑌 and 𝑍.
- a. Find 𝑄((𝑌 = 2 ∩ 𝑍 = 3) ∪ (𝑌 = 3 ∩ 𝑍 = 2))
- b. Find 𝑄(𝑌 + 𝑍 = 5)
EXERCISE 1
SLIDE 12
The general law of addition states that 𝑄(𝐵 ∪ 𝐶) = 𝑄(𝐵) + 𝑄(𝐶) − 𝑄(𝐵 ∩ 𝐶)
▪ when you add 𝑄(𝐵) and 𝑄(𝐶), you count 𝑄(𝐵 ∩ 𝐶) twice ▪ so, you have to subtract 𝑄(𝐵 ∩ 𝐶) once to avoid overstating the probability ▪ often, the right-hand side is easier to find
PROBABILITY THEORY
SLIDE 13
Example: standard deck of cards
▪ 52 cards ▪ 4 queens: 𝑄(𝑅) = 4/52 ▪ 26 red cards: 𝑄(𝑆) = 26/52 ▪ 2 red queens: 𝑄(𝑅 ∩ 𝑆) = 2/52 ▪ the probability that a random card is red or a queen: 𝑄(𝑅 ∪ 𝑆) = 𝑄(𝑅) + 𝑄(𝑆) − 𝑄(𝑅 ∩ 𝑆) = (4 + 26 − 2)/52 = 28/52 ▪ so 53.85%
PROBABILITY THEORY
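The addition law for the card example can be verified by counting. This is a minimal sketch that builds a standard 52-card deck and assumes every card is equally likely to be drawn; the rank and suit labels are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))  # 52 (rank, suit) pairs

queens = {card for card in deck if card[0] == "Q"}
reds = {card for card in deck if card[1] in {"hearts", "diamonds"}}


def prob(event: set) -> Fraction:
    """Probability of drawing a card from the event, all cards equally likely."""
    return Fraction(len(event), len(deck))


direct = prob(queens | reds)                                  # count the union directly
by_law = prob(queens) + prob(reds) - prob(queens & reds)      # addition law

print(direct, by_law)            # 7/13 7/13, which equals 28/52
print(f"{float(direct):.2%}")    # 53.85%
```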
SLIDE 14
Events 𝐵 and 𝐶 are mutually exclusive (or disjoint) if their intersection is the null set (∅) that contains no elements
▪ if 𝐵 ∩ 𝐶 = ∅, then 𝑄(𝐵 ∩ 𝐶) = 0
In the case of mutually exclusive events, the addition law reduces to:
▪ 𝑄(𝐵 ∪ 𝐶) = 𝑄(𝐵) + 𝑄(𝐶)
Example:
▪ 𝑄(𝑅 ∪ 𝐿) = 𝑄(𝑅) + 𝑄(𝐿) because 𝑅 ∩ 𝐿 = ∅ ▪ more general: 𝑄(𝑅 ∪ 𝑅′) = 𝑄(𝑅) + 𝑄(𝑅′) because 𝑅 ∩ 𝑅′ = ∅
PROBABILITY THEORY
SLIDE 15
Events are collectively exhaustive if their union is the entire sample space 𝑇
Two mutually exclusive, collectively exhaustive events are dichotomous (or binary) events
▪ for example, a car repair is either covered by the warranty (𝐵) or not (𝐵′)
Mutually exclusive, collectively exhaustive events are a partition of the sample space
PROBABILITY THEORY
SLIDE 16
The probability of event 𝐵 given that event 𝐶 has occurred
▪ written as 𝑄(𝐵 | 𝐶) ▪ pronounced as “probability of 𝐵 given 𝐶” ▪ for example, the probability of passing the statistics exam, given that you have passed the mathematics exam
𝑄(𝐵 | 𝐶) = 𝑄(𝐵 ∩ 𝐶) / 𝑄(𝐶)
▪ only defined when 𝑄(𝐶) > 0 ▪ this is a conditional probability
PROBABILITY THEORY
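The definition of conditional probability can be tried out on a single die. The sketch below reuses the events “≤ 3” and “even” from the earlier slides and assumes equally likely outcomes; `cond_prob` is a helper name introduced here, not part of the course material.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
at_most_3 = {1, 2, 3}
even = {2, 4, 6}


def prob(event: set) -> Fraction:
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


def cond_prob(a: set, b: set) -> Fraction:
    """Probability of a given b: joint probability divided by the probability of b."""
    if prob(b) == 0:
        raise ValueError("conditional probability is undefined when the condition has probability 0")
    return prob(a & b) / prob(b)


print(prob(even))                  # 1/2
print(cond_prob(even, at_most_3))  # 1/3, only the outcome 2 is even among {1, 2, 3}
```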
SLIDE 17
Example: high school drop-outs
Facts about population aged 16-21 and not in college:
▪ unemployed (U): 12% ▪ high school drop-out (D): 30% ▪ unemployed high school drop-out: 4%
What is the conditional probability that a member of this population is unemployed, given that the person is a high school drop-out?
▪ 𝑄(𝑈 | 𝐷) = 𝑄(𝑈 ∩ 𝐷) / 𝑄(𝐷) = 0.04 / 0.30 = 0.133
Compare to 𝑄(𝑈) = 0.12
▪ so, being a high school drop-out is related to being unemployed
PROBABILITY THEORY
Here is a first example of learning from data, although it is not yet inferential statistics (why not?)
SLIDE 18
Given are two dice; their random outcomes are 𝑌 and 𝑍.
- a. Find 𝑄(𝑌 + 𝑍 > 4 | 𝑌 < 2)
- b. Find 𝑄(𝑌 = 3 | 𝑍 ≤ 2)
EXERCISE 2
SLIDE 19
Events 𝐵 and 𝐶 are independent if 𝑄(𝐵 | 𝐶) = 𝑄(𝐵) (for 𝑄(𝐶) > 0)
▪ 𝑄(𝑈 | 𝐷) = 0.133 ≠ 𝑄(𝑈) = 0.12 ▪ so 𝑈 and 𝐷 are not independent ▪ that is, they are dependent
Another way to check for independence (for 𝑄(𝐶) > 0):
𝑄(𝐵 ∩ 𝐶) = 𝑄(𝐵) 𝑄(𝐶) ⇔ 𝑄(𝐵) = 𝑄(𝐵) × 𝑄(𝐶) / 𝑄(𝐶) = 𝑄(𝐵 ∩ 𝐶) / 𝑄(𝐶) = 𝑄(𝐵 | 𝐶)
so 𝑄(𝐵 ∩ 𝐶) = 𝑄(𝐵) 𝑄(𝐶) ⇔ 𝐵 and 𝐶 are independent (even when 𝑄(𝐵) = 0 or 𝑄(𝐶) = 0)
PROBABILITY THEORY
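The product rule gives a direct independence check. The sketch below applies it to events on one fair die; the event “≤ 2” is an extra example chosen here, not taken from the slides, and `is_independent` is a hypothetical helper name.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
at_most_2 = {1, 2}
at_most_3 = {1, 2, 3}


def prob(event: set) -> Fraction:
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


def is_independent(a: set, b: set) -> bool:
    """Product rule: joint probability equals the product of the two probabilities."""
    return prob(a & b) == prob(a) * prob(b)


print(is_independent(at_most_2, even))  # True:  1/6 equals 1/3 × 1/2
print(is_independent(at_most_3, even))  # False: 1/6 differs from 1/2 × 1/2
```

Exact fractions are used so that the equality test is not disturbed by floating-point rounding.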
SLIDE 20
Contingency tables (see also summarizing data)
▪ from frequencies ... ▪ to proportions
PROBABILITY THEORY
SLIDE 21
Table contains
▪ joint probabilities, like 𝑄(Christ ∩ GATT) = 0.36 ▪ marginal (simple) probabilities, like 𝑄(Christ) = 0.49
Table can be used to find
▪ conditional probabilities, like 𝑄(Christ | GATT) = 0.36 / 0.78 = 0.46 ▪ dependence: 𝑄(Christ | GATT) ≠ 𝑄(Christ) ▪ later, we will develop a statistical test for assessing this for inference from the sample to the population
PROBABILITY THEORY
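The table calculations on this slide can be reproduced from the three proportions it quotes. This is a small sketch using only those numbers (0.36, 0.49 and 0.78); the variable names are introduced here for illustration.

```python
# Proportions quoted on the slide
p_christ_and_gatt = 0.36  # joint probability
p_christ = 0.49           # marginal probability
p_gatt = 0.78             # marginal probability (the denominator used on the slide)

# Conditional probability: joint divided by the marginal of the condition
p_christ_given_gatt = p_christ_and_gatt / p_gatt
print(round(p_christ_given_gatt, 2))  # 0.46

# Under independence the joint would equal the product of the marginals
product_of_marginals = p_christ * p_gatt
print(round(product_of_marginals, 4))  # 0.3822, not 0.36, so the two are dependent in this table
```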
SLIDE 22
Independence
▪ flipping a fair coin 4 times, what is the probability of obtaining 4 heads? ▪ you are given that the outcomes are independent ▪ 𝐼₁ means “heads” in experiment 1, 𝑈₁ “tails”, etc. ▪ 𝑄(4 heads) = 𝑄(𝐼₁𝐼₂𝐼₃𝐼₄) = 𝑄(𝐼₁) × 𝑄(𝐼₂) × 𝑄(𝐼₃) × 𝑄(𝐼₄) = 1/2 × 1/2 × 1/2 × 1/2 = 1/16 ≈ 6%
Independence is often not realistic in business problems
▪ fashion: 𝑄(I buy a red shirt) depends on what others do ▪ supply: 𝑄(I buy a red shirt) depends on what is available in the shops ▪ combination: 𝑄(I buy a red shirt) depends on whether I also buy a red coat ▪ etc.
PROBABILITY THEORY
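The four-heads result can be confirmed by listing all sequences of four flips. This is a minimal sketch assuming a fair coin, so that all 16 sequences are equally likely; the labels "H" and "T" are chosen here for the code.

```python
from fractions import Fraction
from itertools import product

# All sequences of four flips, e.g. ('H', 'T', 'H', 'H')
outcomes = list(product("HT", repeat=4))
all_heads = [seq for seq in outcomes if all(flip == "H" for flip in seq)]

p_enumerated = Fraction(len(all_heads), len(outcomes))  # counting sequences
p_multiplied = Fraction(1, 2) ** 4                      # product rule for independent flips

print(len(outcomes))               # 16
print(p_enumerated, p_multiplied)  # 1/16 1/16
```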
SLIDE 23
- Problem: aggregation
- Context: which of two medicines (A and B) is best?
- Facts:
▪ medicine A is better than B for treating a cold ▪ medicine A is better than B for treating flu ▪ but altogether B looks better
BE CAREFUL WITH PROBABILITY
The most effective medicine

Recovery   Treatment A   Treatment B
Cold       93%           87%
Flu        73%           69%
Both       78%           83%

           Treatment A          Treatment B
           Yes    No    Tot     Yes    No    Tot
Cold        81     6     87     234    36    270
Flu        192    71    263      55    25     80
Both       273          350     289          350

81/87 = 0.93
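The reversal in the table above can be recomputed from the counts. This sketch uses only the recovered and treated counts shown in the table; the dictionary layout and function names are choices made here for illustration.

```python
from fractions import Fraction

# (recovered, treated) counts per treatment and illness, taken from the table
counts = {
    "A": {"cold": (81, 87), "flu": (192, 263)},
    "B": {"cold": (234, 270), "flu": (55, 80)},
}


def rate(treatment: str, illness: str) -> Fraction:
    """Recovery rate for one treatment within one illness group."""
    recovered, treated = counts[treatment][illness]
    return Fraction(recovered, treated)


def pooled_rate(treatment: str) -> Fraction:
    """Recovery rate for one treatment with both illness groups added together."""
    recovered = sum(r for r, _ in counts[treatment].values())
    treated = sum(t for _, t in counts[treatment].values())
    return Fraction(recovered, treated)


for illness in ("cold", "flu"):
    print(illness, f"A: {float(rate('A', illness)):.0%}", f"B: {float(rate('B', illness)):.0%}")
print("both", f"A: {float(pooled_rate('A')):.0%}", f"B: {float(pooled_rate('B')):.0%}")
```

Treatment A was given mostly to flu patients (263 of 350), who recover less often, while B was given mostly to cold patients (270 of 350); that difference in group sizes drives the pooled result.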
SLIDE 24
- Problem: aggregation
- Context: which of two football players (A and B) should take a penalty?
- Facts:
▪ in 2012 A scored 8/12=0.67 and B scored 3/4=0.75 ▪ in 2013 A scored 1/6=0.17 and B scored 1/5=0.20 ▪ so B was best in both years separately ▪ but in 2012-2013 A scored 9/18=0.50 and B scored 4/9=0.44 ▪ so A was best in the two combined years
BE CAREFUL WITH PROBABILITY
SLIDE 25
- Problem: difference in point of view
- Context: what is the class size of an elementary school?
- Facts:
▪ there are 3 classes with 30 pupils and 3 classes with 10 pupils ▪ the director says that the average class size is (3×30 + 3×10) / (3 + 3) = 20 ▪ but the pupils say that the average class size is (90×30 + 30×10) / (90 + 30) = 25
- What do we mean by “mean”?
BE CAREFUL WITH PROBABILITY
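The two averages differ only in what is being averaged over: classes or pupils. A minimal sketch with the class sizes from the slide (variable names chosen here for illustration):

```python
class_sizes = [30, 30, 30, 10, 10, 10]

# Director's view: every class counts once.
mean_per_class = sum(class_sizes) / len(class_sizes)

# Pupils' view: every pupil reports the size of his or her own class,
# so a class of size s is reported s times.
mean_per_pupil = sum(size * size for size in class_sizes) / sum(class_sizes)

print(mean_per_class)  # 20.0
print(mean_per_pupil)  # 25.0
```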
SLIDE 26
- Problem: unrepresentative means
- Context: if events are unlikely, the statistics are unreliable
- Facts:
▪ before July 2000, Concorde was one of the safest aircraft ▪ after July 2000, it was one of the unsafest ever
BE CAREFUL WITH PROBABILITY
SLIDE 27
30 June 2014, Q1a-c
OLD EXAM QUESTION
SLIDE 28