VIRTUAL CONFERENCE ictcm.com | #ICTCM ENHANCING A PROBABILITY - - PowerPoint PPT Presentation


32 nd International Conference on Technology in Collegiate Mathematics VIRTUAL CONFERENCE ictcm.com | #ICTCM ENHANCING A PROBABILITY THEORY COURSE USING R RYAN RAHRIG, PH.D. ASSOCIATE PROFESSOR OF STATISTICS OHIO NORTHERN UNIVERSITY


slide-1
SLIDE 1

32nd International Conference on Technology in Collegiate Mathematics

ictcm.com | #ICTCM

VIRTUAL CONFERENCE

slide-2
SLIDE 2

ENHANCING A PROBABILITY THEORY COURSE USING R

RYAN RAHRIG, PH.D. ASSOCIATE PROFESSOR OF STATISTICS OHIO NORTHERN UNIVERSITY

slide-3
SLIDE 3

BACKGROUND

  • Increased effort to use R in the classroom for our STAT majors
slide-4
SLIDE 4

GETTING R INTO MATH 4651

  • Is there any benefit to incorporating R into a Probability Theory course?
  • Reinforce basic concept of probability
  • Provide check for analytical solutions
  • Foster students’ curiosity
  • Explore probing, follow-up questions to standard problems
slide-5
SLIDE 5

BASIC DEFINITION OF PROBABILITY

  • Introduction to the Practice of Statistics (Moore, et al.)

… the proportion of times the outcome would occur in a very long series of repetitions

  • Fundamentals of Statistics (Sullivan)

…the long term proportion in which a certain outcome is observed

  • Mathematical Statistics (Wackerly, et al.)

…the stable long-term relative frequency
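All three definitions describe probability as a long-run relative frequency. A minimal sketch of that idea in R, assuming a fair six-sided die as the experiment (the die, the seed, and the variable names are illustrative and not from the slides):

```r
set.seed(1)  # assumed seed, used only so the run is reproducible

# Roll a fair die many times and track how often a six has appeared so far
rolls  <- sample(1:6, size = 100000, replace = TRUE)
is_six <- rolls == 6
running_prop <- cumsum(is_six) / seq_along(is_six)  # relative frequency after each roll

# The relative frequency settles near 1/6 as the number of rolls grows
running_prop[c(10, 100, 1000, 10000, 100000)]
1 / 6
```

Plotting `running_prop` against the roll number gives students a picture of the "stable long-term relative frequency" in the third definition.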

slide-6
SLIDE 6

THE CLASSIC BIRTHDAY PROBLEM!

  • In a set of n randomly selected people, what is the probability that at least two people share the same birthday?

  • Surprising result: probability is 50% or more if n ≥ 23
slide-7
SLIDE 7

THE CLASSIC BIRTHDAY PROBLEM!

  • Standard solution:

\[
P(X \ge 1) = 1 - P(X = 0)
= 1 - \frac{365}{365} \times \frac{364}{365} \times \frac{363}{365} \times \cdots \times \frac{343}{365}
= 1 - \frac{{}_{365}P_{23}}{365^{23}}
\approx 0.507297
\]
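The standard solution for n = 23 can be evaluated directly in R. This is a small sketch; the function name `birthday_exact` and its `days` argument are hypothetical helpers introduced here for illustration.

```r
# Exact probability that at least two of n people share a birthday,
# assuming `days` equally likely birthdays (hypothetical helper)
birthday_exact <- function(n, days = 365) {
  1 - prod((days - seq_len(n) + 1) / days)  # 1 - (365/365)(364/365)...((365-n+1)/365)
}

birthday_exact(23)  # the n = 23 case from the slide
birthday_exact(22)  # one person fewer, for comparison with the 50% threshold
```

Base R also provides `pbirthday(23)` in the stats package, which can serve as an independent check on the same quantity.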

slide-8
SLIDE 8

LET R HAVE A TURN!

  • Simulations in R not only provide a check on the answer, but can help students “see” the concept of probability.

  • What happens in the long run when the birthday experiment is conducted many, many times?

slide-9
SLIDE 9

BUILDING THE R CODE

  • Perform the experiment once (get n birthdays):

Bdays <- sample(1:365, size=n, replace=TRUE)

  • Put birthdays in ascending order and then find differences between consecutive elements:

D <- diff(sort(Bdays))

  • Save whether there’s a match (if any differences are 0):

Results[i] <- any(D==0)

  • Repeat many times!
slide-10
SLIDE 10

R CODE FOR SIMULATION

n <- 23
Results <- numeric(1000000)
for(i in 1:1000000) {
  Bdays <- sample(1:365, size=n, replace=TRUE)  # Get n birthdays
  D <- diff(sort(Bdays))
  Results[i] <- any(D==0)
}
mean(Results)  # proportion in Results that are TRUE

slide-11
SLIDE 11

SIMULATION RESULTS

First execution: [1] 0.507737

Second execution: [1] 0.507253

***Recall exact value: 0.507297
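One way to judge how close these simulated values are to the exact one is to compare the differences with the Monte Carlo standard error of a proportion based on 1,000,000 repetitions. The sketch below uses only the numbers quoted on the slides; the variable names are illustrative.

```r
p_exact <- 0.507297             # exact value quoted on the slides
N       <- 1000000              # number of simulated experiments
runs    <- c(0.507737, 0.507253)  # the two simulation results above

# Standard error of a simulated proportion based on N independent experiments
se <- sqrt(p_exact * (1 - p_exact) / N)
se

# Distance of each run from the exact value, in standard-error units
z <- (runs - p_exact) / se
z
```

Both runs fall within one standard error of the exact value, which is the kind of agreement expected from a simulation of this size.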

slide-12
SLIDE 12

FURTHER EXPLORATIONS

  • The real beauty of using this approach is that the problem doesn’t have to end there.

  • Slight variations of the birthday problem

      ◦ Difficult to solve analytically
      ◦ Simple to simulate using R

slide-13
SLIDE 13

FURTHER EXPLORATIONS

  • What about probability of near birthdays (e.g. birthdays within a day of each other)?

  • Modify

Results[i] <- any(D==0)

to

Results[i] <- any(D<=1)

Note: also need to handle the case of Dec. 31 and Jan. 1, which are adjacent days but differ by 364 after sorting.
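One possible way to handle the Dec. 31 / Jan. 1 case is to add a wrap-around gap between the latest and earliest birthday in the sorted list. The helper `near_match` below is a hypothetical sketch, not code from the slides, and the simulation size is reduced so it runs quickly.

```r
# TRUE if any two birthdays are the same or on adjacent days,
# treating day 365 and day 1 as adjacent (hypothetical helper)
near_match <- function(Bdays, days = 365) {
  s    <- sort(Bdays)
  D    <- diff(s)                       # gaps between consecutive birthdays
  wrap <- s[1] + days - s[length(s)]    # gap from the latest birthday around to the earliest
  any(c(D, wrap) <= 1)
}

near_match(c(365, 1, 100))   # Dec. 31 and Jan. 1 count as near birthdays
near_match(c(10, 50, 200))   # no two birthdays within a day of each other

set.seed(2)  # assumed seed, for reproducibility only
n <- 23
Results <- replicate(20000, near_match(sample(1:365, size = n, replace = TRUE)))
mean(Results)  # estimated probability of at least one near match
```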

slide-14
SLIDE 14

FURTHER EXPLORATIONS

  • What is the probability that 3 (or more) in the room have a common birthday?

  • Modify

D <- diff(sort(Bdays))
Results[i] <- any(D==0)

to

D <- diff(sort(Bdays),lag = 2)
Results[i] <- any(D==0)
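The lag-2 trick works because, in a sorted vector, three equal birthdays sit in consecutive positions, so the element two places ahead differs by 0. As a hedged cross-check, the sketch below compares that test with a direct count from `table()` on the same simulated rooms; the seed, the number of repetitions, and the variable names are assumptions made for illustration.

```r
set.seed(3)  # assumed seed, for reproducibility only
n    <- 23
nsim <- 5000
agree  <- logical(nsim)
triple <- logical(nsim)

for (i in 1:nsim) {
  Bdays    <- sample(1:365, size = n, replace = TRUE)
  by_diff  <- any(diff(sort(Bdays), lag = 2) == 0)  # approach from the slide
  by_table <- max(table(Bdays)) >= 3                 # direct count of the most common birthday
  agree[i]  <- by_diff == by_table
  triple[i] <- by_diff
}

all(agree)     # the two tests should give the same answer in every simulated room
mean(triple)   # estimated probability that 3 or more share a birthday
```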

slide-15
SLIDE 15

ASSIGNMENT

  • As capstone advisor, noticed a trend of students struggling to come up with ideas for their projects.

  • Need more opportunities for students to be the ones to come up with the questions.

  • Using R allows students to come up with additional variations and actually solve them.

slide-16
SLIDE 16

OTHER VARIATIONS

  • Average number in the room that share a birthday?
  • Average number that walk in before first match?
  • Non-uniform birthdays
  • Generalized problem (number of possible birthdays something other than 365)
  • And many more…
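As an illustration of how little code such a variation needs, here is a sketch for one of them: the average number of people who have walked in when the first match occurs, counting the person who completes the match. That reading of the question, the seed, and the simulation size are assumptions made here.

```r
set.seed(4)  # assumed seed, for reproducibility only
nsim <- 20000
first_match <- numeric(nsim)

for (k in 1:nsim) {
  # 366 arrivals guarantee at least one repeated birthday
  Bdays <- sample(1:365, size = 366, replace = TRUE)
  # duplicated() flags each birthday that has already been seen;
  # the first flagged position is the arrival that completes the first match
  first_match[k] <- which(duplicated(Bdays))[1]
}

mean(first_match)  # estimated average number of arrivals up to the first match
```

Non-uniform birthdays could be explored in the same loop by passing a vector of weights to the `prob` argument of `sample()`.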
slide-17
SLIDE 17

MORE ADVANCED EXAMPLE

Engineer at Marathon Petroleum wanted this problem solved:

  • Say you have a population set of 200 people.
  • Of those 200 people, 125 names are drawn as winners
  • 9 consecutive drawings in all
  • For x = 0, 1,…, 125, what is the probability that exactly x persons win in all 9 drawings?

slide-18
SLIDE 18

MORE ADVANCED EXAMPLE

Context: There are certain pipeline systems that administer a ‘lottery’ system where 125 shippers out of 200 are randomly selected each month.

If a shipper wins 9 months in a row → good for shipper, bad for Marathon…

For x = 1, 2,…, 125, by knowing the probability that x shippers could graduate (win 9 months in a row) with 200 new shippers, we can use those probabilities to manage the risk.

slide-19
SLIDE 19

STUDENT APPROACH

When given this problem, students tend to go with their first inclination: ENUMERATE.

Suppose x = 5. There are

\[
\binom{200}{125}^{9} \approx \left(1.6885 \times 10^{56}\right)^{9}
\]

total possible outcomes, and determining how many of these result in exactly 5 winning all 9 months gets out of hand very quickly.

Before students give up, get them to think about whether the Marathon engineer needs a nice exact answer or if a very good approximation will do the job.
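The size of the outcome space on this slide can be checked in R. The sketch below works on the log scale because the ninth power is too large for double-precision arithmetic; `log10_total` is an illustrative name.

```r
# Number of ways to draw 125 winners out of 200 in a single drawing
choose(200, 125)

# Nine independent drawings: log10 of choose(200, 125)^9,
# computed with lchoose() to avoid overflow
log10_total <- 9 * lchoose(200, 125) / log(10)
log10_total

# Computing the power directly overflows to Inf
choose(200, 125)^9
```

A count with more than 500 digits makes the case that enumeration is not a practical route here.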

slide-20
SLIDE 20

USING R

Using R to approximate the probability again allows the student to more literally follow the definition of probability. Coding the simulation forces the student to articulate the experiment in detail since the idea is to repeat the experiment many, many times.

  • How to simulate doing the experiment over and over?
  • How to determine proportion for each possibility?
slide-21
SLIDE 21

USING R

  • Run the entire lottery one time (pick 125 #’s out of 200 nine different times):

L <- matrix(, nrow = 9, ncol = 125)
for (i in 1:9) {
  L[i,] <- sample(1:200,size=125)
}

[Snippet of a possible L]

slide-22
SLIDE 22

USING R

Count how many times each number appears:

T<-table(L)

Count how many were picked all 9 times (and save that for later):

numWon9[k] <- sum(T==9)

slide-23
SLIDE 23

USING R

Repeat the procedure many, many (nsim) times by wrapping it in a for loop:

for (k in 1:nsim) {
  L <- matrix(, nrow = 9, ncol = 125)
  for (i in 1:9) {
    L[i,] <- sample(1:200,size=125)
  }
  T<-table(L)
  numWon9[k] <- sum(T==9)
}

slide-24
SLIDE 24

USING R

For x = 0, 1,…, 125, compute proportion where x were selected all 9 times:

Final <- data.frame(num=0:125,Probs=0)
for (x in 0:125) {
  Final[x+1,2] <- sum(numWon9==x)/nsim
}

slide-25
SLIDE 25

FULL R SCRIPT RESULTS

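nsim <- 1000000
num9 <- numeric(nsim)   # number winning all 9 drawings, one entry per simulated lottery
for (k in 1:nsim) {
  L <- matrix(, nrow = 9, ncol = 125)   # one row per drawing
  for (i in 1:9) {
    L[i,] <- sample(1:200,size=125)     # 125 winners drawn without replacement
  }
  T<-table(L)                           # how many times each number was drawn
  num9[k] <-sum(T==9)                   # how many numbers won all 9 drawings
}
Final <- data.frame(num=0:125,Probs=0)
for (i in 0:125) {
  Final[i+1,2] <- sum(num9==i)/nsim     # proportion of lotteries with exactly i nine-time winners
}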
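A quick sanity check on a simulation like this is to compare the average number of nine-time winners with its theoretical mean. Each of the 200 shippers wins a given drawing with probability 125/200, so by linearity of expectation the mean is 200 × (125/200)^9. The sketch below uses a much smaller `nsim` and `tabulate()` in place of `table()`; both choices, and the seed, are assumptions made to keep the example fast.

```r
set.seed(5)   # assumed seed, for reproducibility only
nsim <- 2000  # far fewer repetitions than the full script, to keep this quick
num9 <- numeric(nsim)

for (k in 1:nsim) {
  # 125 x 9 matrix: each column is one drawing of 125 winners out of 200
  L <- replicate(9, sample(1:200, size = 125))
  # count appearances of each of the 200 numbers, then count those drawn all 9 times
  num9[k] <- sum(tabulate(as.vector(L), nbins = 200) == 9)
}

mean(num9)              # simulated average number of nine-time winners
200 * (125 / 200)^9     # theoretical mean by linearity of expectation
```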

slide-26
SLIDE 26

EXACT SOLUTION – A MARKOV CHAIN!

Having successfully found an approximate answer, students are motivated to find the exact answer.

  • Random process turns out to be a discrete-time homogeneous Markov chain, with the matrix M being the transition matrix.

  • If students have already learned Markov Chains, can go over solution.

  • If not, can point out how using R allowed them to solve a complicated probability problem simply using the basic concept of probability and a little bit of programming.
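The matrix M itself is not reproduced in this text. One natural construction, assumed here, takes the state to be the number of shippers who have won every drawing so far; given m such shippers, the number of them who also win the next drawing is hypergeometric. A sketch under that assumption (all variable names are illustrative):

```r
N_pop    <- 200   # shippers in the lottery
n_draw   <- 125   # winners per drawing
n_rounds <- 9     # consecutive drawings
states   <- 0:n_draw

# Assumed transition matrix: M[m + 1, j + 1] = P(j of the m shippers who have
# won every drawing so far are among the 125 winners of the next drawing)
M <- t(sapply(states, function(m) dhyper(states, m, N_pop - m, n_draw)))

# After the first drawing, exactly 125 shippers have won every drawing so far
dist <- as.numeric(states == n_draw)

# Apply the remaining 8 drawings
for (r in 2:n_rounds) dist <- as.vector(dist %*% M)

Exact <- data.frame(num = states, Probs = dist)
sum(Exact$Probs)               # probabilities should sum to 1
sum(Exact$num * Exact$Probs)   # mean, comparable to 200 * (125/200)^9
```

The resulting `Exact` table has the same layout as `Final` from the simulation, so the two can be compared row by row.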

slide-27
SLIDE 27

32nd International Conference on Technology in Collegiate Mathematics VIRTUAL CONFERENCE

#ICTCM

Contact Information

Ryan Rahrig
Associate Professor of Statistics
Ohio Northern University
r-rahrig@onu.edu