SLIDE 1

MAU22S06 Numerical and data analysis techniques

Slides by Mike Peardon (2011) minor modifs. by S. Sint

School of Mathematics, Trinity College Dublin

Hilary Term 2019

SLIDE 2

Probability

SLIDE 3

Sample space

Consider performing an experiment whose outcome is purely random and which has a set of possible outcomes.

Sample Space

A sample space S associated with an experiment is a set such that:

1. each element of S denotes a possible outcome O of the experiment, and

2. performing the experiment leads to a result corresponding to one element of S in a unique way.

Example: flipping a coin - choose the sample space S = {H, T} corresponding to the coin landing heads or tails. Not unique: choose the sample space S = {L} corresponding to the coin just landing. Not very useful!

SLIDE 4

Events

Events

An event E can be defined for a sample space S if a question can be put that has an unambiguous answer for all outcomes in S. E is the subset of S for which the question is true.

Example 1: Two coin flips, with S = {HH, HT, TH, TT}. Define the event E1T = {HT, TH}, which corresponds to one and only one tail landing.

Example 2: Two coin flips, with S = {HH, HT, TH, TT}. Define the event E≥1T = {HT, TH, TT}, which corresponds to at least one tail landing.

SLIDE 5

Probability measure

We can now define a probability model, which consists of a sample space S, a collection of events (which are all subsets of S) and a probability measure.

Probability measure

The probability measure assigns to each event E a probability P(E), with the following properties:

1. P(E) is a non-negative real number with 0 ≤ P(E) ≤ 1.

2. P(∅) = 0 (∅ is the empty set event).

3. P(S) = 1, and

4. P is additive, meaning that if E1, E2, . . . is a sequence of disjoint events then P(E1 ∪ E2 ∪ . . . ) = P(E1) + P(E2) + . . .

Two events are disjoint if they have no common outcomes.

SLIDE 6

Probability measure (2)

Venn diagrams give a very useful way of visualising probability models. Example: E^c ⊂ S is the complement of event E, and is the set of all outcomes NOT in E (i.e. E^c = {x : x ∉ E}). The probability of an event is visualised as the area of the region in the Venn diagram.

[Venn diagram: event E and its complement E^c inside the sample space S]

The intersection A ∩ B and union A ∪ B of two events can be depicted ...

SLIDE 7

Probability measure (3)

The intersection of two subsets A ⊂ S and B ⊂ S: A ∩ B = {x : x ∈ A and x ∈ B}

The union of two subsets A ⊂ S and B ⊂ S: A ∪ B = {x : x ∈ A or x ∈ B}

[Venn diagrams: shaded regions depicting A ∩ B and A ∪ B inside S]

SLIDE 8

Probability measure (4)

The Venn diagram approach makes it easy to remember:

P(E^c) = 1 − P(E)

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

We also define the conditional probability P(A|B), which is the probability event A occurs, given event B has occurred. Since event B occurs with probability P(B), and both events A and B occur with probability P(A ∩ B), the conditional probability P(A|B) can be computed from

Conditional probability

P(A|B) = P(A ∩ B) / P(B)

SLIDE 9

Conditional probability (1)

Conditional probability describes situations when partial information about outcomes is given

Example: coin tossing

Three fair coins are flipped. What is the probability that the first coin landed heads given exactly two coins landed heads?

S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

A = {HHH, HHT, HTH, HTT} and B = {HHT, HTH, THH}

A ∩ B = {HHT, HTH}

P(A|B) = P(A ∩ B)/P(B) = (2/8)/(3/8) = 2/3

Answer: 2/3
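A minimal Python sketch of this counting argument (illustrative; it assumes all eight outcomes are equally likely, as for fair coins):

```python
from itertools import product

# Enumerate the 8 equally likely outcomes of three coin flips.
outcomes = list(product("HT", repeat=3))

A = [o for o in outcomes if o[0] == "H"]          # first coin lands heads
B = [o for o in outcomes if o.count("H") == 2]    # exactly two coins land heads
AB = [o for o in A if o in B]                     # both A and B occur

# P(A|B) = P(A ∩ B) / P(B); with equally likely outcomes this is a ratio of counts.
print(len(AB) / len(B))   # 2/3 = 0.666...
```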

SLIDE 10

Conditional probability (2)

Bayes’ theorem

For two events A and B with P(A) > 0 and P(B) > 0 we have P(A|B) = P(B|A) P(A) / P(B)

Thomas Bayes (1702-1761)

Since P(A|B) = P(A ∩ B)/P(B) from the conditional probability result, we see P(A ∩ B) = P(B)P(A|B). Switching A and B also gives P(B ∩ A) = P(A)P(B|A). A ∩ B is the same as B ∩ A, so we get P(A)P(B|A) = P(B)P(A|B) and Bayes’ theorem follows.

SLIDE 11

Partitions of state spaces

Suppose we can completely partition S into n disjoint events A1, A2, . . . , An, so S = A1 ∪ A2 ∪ · · · ∪ An. Now for any event E, we find

P(E) = P(E|A1)P(A1) + P(E|A2)P(A2) + · · · + P(E|An)P(An)

This result is seen by using the conditional probability theorem and the additivity property of the probability measure. It can be remembered with the Venn diagram:

[Venn diagram: S partitioned into A1, . . . , A5 with event E overlapping several of the Ai]

SLIDE 12

A sobering example

With the framework built up so far, we can make powerful (and sometimes surprising) predictions...

Diagnostic accuracy

A new clinical test for swine flu has been devised that has a 95% chance of finding the virus in an infected patient. Unfortunately, it has a 1% chance of indicating the disease in a healthy patient (false positive). One person per 1, 000 in the population is infected with swine flu. What is the probability that an individual patient diagnosed with swine flu by this method actually has the disease?

SLIDE 13

A sobering example (answer)

Answer: about 8.7%. With B = “patient is infected” and A = “test is positive”, Bayes’ theorem and the partition {B, B^c} give P(B|A) = P(A|B)P(B) / (P(A|B)P(B) + P(A|B^c)P(B^c)) = (0.95 × 0.001) / (0.95 × 0.001 + 0.01 × 0.999) ≈ 0.087.
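A short Python sketch of the calculation behind this answer, using the numbers quoted above:

```python
# Numbers from the slide: sensitivity 95%, false-positive rate 1%,
# prevalence 1 in 1,000.
p_pos_given_flu = 0.95
p_pos_given_healthy = 0.01
p_flu = 0.001

# Law of total probability for a positive test, then Bayes' theorem.
p_pos = p_pos_given_flu * p_flu + p_pos_given_healthy * (1 - p_flu)
p_flu_given_pos = p_pos_given_flu * p_flu / p_pos
print(f"{p_flu_given_pos:.3f}")   # ~0.087, i.e. about 8.7%
```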

SLIDE 14

The Monty Hall problem

When it comes to probability, intuition is often not very helpful...

The Monty Hall problem

In a gameshow, a contestant is shown three doors and asked to select one. Hidden behind one door is a prize and the contestant wins the prize if it is behind their chosen door at the end of the game. The contestant picks one of the three doors to start. The host then opens at random one of the remaining two doors that does not contain the prize. Now the contestant is asked if they want to change their mind and switch to the other, unopened door. Should they? Does it make any difference?

SLIDE 15

The Monty Hall problem (answer)

P(Win) = 2/3 when switching, P(Win) = 1/3 otherwise.

SLIDE 16

Solution to the Monty Hall problem

Assume the prize is behind door A and look at the two strategies (stay with the initial choice vs. switching) separately:

First strategy, stay with the decision: the conditional winning probabilities when choosing doors A, B, or C are P(A|A) = 1, P(B|A) = 0, P(C|A) = 0, and one wins in 1 out of 3 cases.

Second strategy, switch: the conditional winning probabilities are P(A|A) = 0, P(B|A) = 1, P(C|A) = 1, i.e. one wins in 2 out of 3 cases.
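A quick simulation sketch of the game (the door numbering and the trial count are illustrative choices, not from the slides):

```python
import random

def play(switch: bool, trials: int = 100_000) -> float:
    wins = 0
    for _ in range(trials):
        prize = random.randrange(3)
        choice = random.randrange(3)
        # Host opens a door that is neither the contestant's choice nor the prize.
        opened = next(d for d in range(3) if d != choice and d != prize)
        if switch:
            # Move to the remaining unopened door.
            choice = next(d for d in range(3) if d != choice and d != opened)
        wins += (choice == prize)
    return wins / trials

print(play(switch=True))    # ~2/3
print(play(switch=False))   # ~1/3
```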

SLIDE 17

Independent events

Independent events

Events A and B are said to be independent if P(B ∩ A) = P(A) × P(B). If P(A) > 0 and P(B) > 0, then independence implies both:

P(B|A) = P(B) and P(A|B) = P(A).

These results can be seen using the conditional probability result. Example: Two coins are flipped, where the probability the first lands on heads is 1/2 and similarly for the second. If these events are independent we can now show that all outcomes in S = {HH, HT, TH, TT} have probability 1/4.

SLIDE 18

Summary

Defining a probability model means choosing a good sample space S, a collection of events (which all correspond to subsets of S) and a probability measure defined on all the events.

Events are called disjoint if they have no common outcomes.

Understanding and remembering probability calculations or results is often made easier by visualising with Venn diagrams.

The conditional probability P(A|B) is the probability event A occurs given event B also occurred.

Bayes’ theorem relates P(A|B) to P(B|A).

Calculations are often made easier by partitioning state spaces - i.e. finding disjoint A1, A2, . . . , An such that S = A1 ∪ A2 ∪ . . . ∪ An.

Events are called independent if P(A ∩ B) = P(A) × P(B).

SLIDE 19

Binomial experiments

A binomial experiment

Binomial experiments are defined by a sequence of probabilistic trials where:

1. each trial returns a true/false result,

2. different trials in the sequence are independent,

3. the number of trials is fixed, and

4. the probability of a true/false result is constant.

Usual question to ask - what is the probability the trial result is true k times out of n, given the probability of each trial being true is p?

SLIDE 20

Examples of binomial experiments

Examples and counter-examples

These examples are binomial experiments:

1. Flip a coin 10 times; does the coin land heads?

2. Ask the next ten people you meet if they like pizza.

3. Screen 1000 patients for a virus.

... and these are not:

Flip a coin until it lands heads (not a fixed number of trials)

Ask the next ten people you meet their age (not true/false)

Is it raining on the first Monday of each month? (not a constant probability)

SLIDE 21

Number of experiments with k true outcomes

Number of selections

There are (n choose k) ≡ nCk = n! / (k!(n − k)!) ways of having k out of n selections.

Coin flip outcomes

Example: how many outcomes of five coin flips result in the coin landing heads three times? Answer: (5 choose 3) = 5! / (3!(5−3)!) = 10

They are: {HHHTT, HHTHT, HHTTH, HTHHT, HTHTH, HTTHH, THHHT, THHTH, THTHH, TTHHH}
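A small Python check of this counting, using the standard library's comb (assumes Python 3.8+):

```python
from itertools import product
from math import comb

print(comb(5, 3))   # 10 ways of choosing which 3 of 5 flips land heads

# Cross-check by brute-force enumeration of all 2^5 flip sequences:
print(sum(1 for seq in product("HT", repeat=5) if seq.count("H") == 3))   # also 10
```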

SLIDE 22

Probability of k out of n true trials

If the probability of each trial being true is p (and so the probability of it being false is q = 1 − p) ... and the selection trials are independent then...

Probability of k out of n true outcomes

P(k, n) = (n choose k) p^k q^(n−k) ≡ (n choose k) p^k (1 − p)^(n−k)

We can compute this probability since we can count the number of cases where there are k true trials, and each case has the same probability.
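A minimal sketch of this formula as a function (the function name is an illustrative choice):

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k true results out of n independent trials, each true with probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# e.g. probability of exactly 3 heads in 5 fair coin flips:
print(binomial_pmf(3, 5, 0.5))                          # 10 * 0.5^5 = 0.3125
print(sum(binomial_pmf(k, 5, 0.5) for k in range(6)))   # the pmf sums to 1
```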

SLIDE 23

Infinite state spaces

The set of outcomes of a probabilistic experiment may be an uncountably infinite set. Here, the distinction between outcomes and events is more important: events can be assigned probabilities, outcomes can’t.

Outcomes described by a continuous variable

1. If I throw a coin and measure how far away it lands, the state space is described by the set of real numbers ≥ 0, Ω = [0, ∞).

2. I could also simultaneously see if it lands heads or tails. This set of outcomes is still “uncountably infinite”. The state space is now Ω = {H, T} × [0, ∞).

It is impossible to define the probability the coin lands exactly 1m away, but events can be defined - for example, an event might be “the coin lands heads more than 1m away.”

SLIDE 24

Random variables

To be mathematically correct, random variables (or random numbers) are neither variables nor numbers! They are functions taking an outcome and returning a number. Depending on the nature of the state-space, they can be discrete or continuous.

Random variables

A random variable X is a function that converts outcomes on a sample space Ω = {O1, O2, O3 . . . } to a number in {x1, x2, x3, . . . } so X(Oi) = xi

Example - heads you win ...

If I flip a coin and pay you €1 if it lands heads and you pay me €2 if it lands tails, then the money you get after playing this game is a random variable: Ω = {H, T}, X(H) = 1, X(T) = −2

SLIDE 25

Random variables

A discrete random variable only takes a countable set of non-zero values x1, x2, . . . A discrete random variable defines a decomposition of the sample space Ω into mutually exclusive events: Ω = E1 ∪ E2 ∪ . . . , with Ei = {O ∈ Ω : X(O) = xi} ≡ {X = xi}.

Notation for the probabilities of these events: P({O ∈ Ω : X(O) = xi}) ≡ P(X = xi) ≡ P(xi).

Completeness then entails 1 = P(Ω) = Σᵢ P(Ei) ≡ Σᵢ P(X = xi) ≡ Σᵢ P(xi)

SLIDE 26

Expected value of a random variable

Imagine we sample a random variable X lots of times and we know the probability different values will occur. We can guess what the average of all these samples will be: P(X = x1)x1 + P(X = x2)x2 + P(X = x3)x3 + . . .

Expected value

The expected value of a discrete random variable which can take any of N possible values is defined as

E[X] = Σᵢ₌₁ᴺ X(Oi) P(X = X(Oi)) = Σᵢ₌₁ᴺ xi P(X = xi)

It gives the average of n samples of the random variable as n gets large.
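A small Python sketch comparing this definition with a long-run average, using the coin game from the following slide (X(H) = 1, X(T) = −2):

```python
import random

values = [1, -2]
probs = [0.5, 0.5]

# Expected value from the definition: a probability-weighted sum.
expected = sum(x * p for x, p in zip(values, probs))
print(expected)   # -0.5

# Compare with the average of many samples:
n = 100_000
print(sum(random.choices(values, probs)[0] for _ in range(n)) / n)   # close to -0.5
```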

SLIDE 27

Expected value (2)

Back to our example:

Heads you win ...

Before, we had X(H) = 1 and X(T) = −2. If both are equally likely (fair coin) then the expected value is

E[X] = P(X = 1) × X(H) + P(X = −2) × X(T) = (1/2) × 1 + (1/2) × (−2) = −1/2

So playing n times you should expect to lose €n/2. Not a good idea to play this game!

SLIDE 28

Expected value (3)

The expected value of a function f : R → R applied to our random variable can be defined easily too.

Expected value of a function

E[f(X)] = Σᵢ₌₁ᴺ f(xi) P(X = xi)

Taking the expected values of two different random variables X and Y is linear, i.e. for constant numbers α, β we see E[αX + βY] = αE[X] + βE[Y]

SLIDE 29

Variance and standard deviation

Variance

The variance of X is defined as σX² = E[(X − μX)²] ≡ E[X²] − E[X]²

Standard deviation

The standard deviation of X, σX, is the square root of the variance. If X has units, σX has the same units.

The variance and standard deviation are non-negative: σX ≥ 0. They measure the amount a random variable fluctuates from sample-to-sample.

SLIDE 30

Variance (2)

Returning again to our game:

Heads you win ...

The variance of X can be computed. Recall that μX = −1/2. The variance is then

σX² = (1/2) × (1 + 1/2)² + (1/2) × (−2 + 1/2)² = (1/2) × 9/4 + (1/2) × 9/4 = 9/4

and the standard deviation of X is 3/2.

SLIDE 31

The expected number of successful trials

Consider the binomial experiment where n trials are performed with probability of success p. Recall

P(k) = (n choose k) p^k q^(n−k) ≡ (n! / (k!(n−k)!)) p^k q^(n−k), with q = 1 − p

So the expected value of k is

μX = Σₖ₌₀ⁿ k P(k) = Σₖ₌₀ⁿ k (n! / (k!(n − k)!)) p^k q^(n−k) = np

A bit more work gives σX² = npq

SLIDE 32

Poisson distribution

A limiting case of the binomial experiment can be considered by taking n → ∞ while keeping μ = n × p fixed. This models the number of times a random occurrence happens in an interval (radioactive decay, for example). Now k, the number of times the event occurs, is described by:

The Poisson distribution

For integer k, P(k) = μᵏ e^(−μ) / k!

Check that Σₖ₌₀^∞ P(k) = 1, i.e. the probability is properly normalised. Also find that the expected value of X is just μ.

SLIDE 33

Poisson distribution (2)

Example: chirping crickets

A field full of crickets is chirping at random, with on average 0.6 chirps per second. Assuming the chirps obey the Poisson distribution, what is the probability we hear at most 2 chirps in one second?

Answer: P(0) + P(1) + P(2). P(0) = 0.6⁰e^(−0.6)/0! = e^(−0.6) (NB remember 0! = 1), P(1) = 0.6¹e^(−0.6)/1! = 0.6e^(−0.6) and P(2) = 0.6²e^(−0.6)/2! = 0.18e^(−0.6), so P(0) + P(1) + P(2) = 1.78 e^(−0.6) ≈ 0.9769
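The same calculation as a short Python sketch:

```python
from math import exp, factorial

def poisson_pmf(k: int, mu: float) -> float:
    """P(k occurrences when mu are expected)."""
    return mu**k * exp(-mu) / factorial(k)

# Probability of at most 2 chirps in one second when mu = 0.6:
print(sum(poisson_pmf(k, 0.6) for k in range(3)))   # ~0.9769
```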

SLIDE 34

Continuous random variables (1)

For a continuous random variable X (one that can take any value in some range [a, b]), the sample space is (uncountably) infinite. Consider the event E which occurs when the random variable X ≤ x. NB: big X ≡ the random variable, little x ≡ the reference point for E.

Cumulative distribution function

The cumulative distribution function (cdf) FX(x) of a continuous random variable X is the probability of the event E : X ≤ x; FX(x) = P(X ≤ x). Since it is a probability, 0 ≤ FX(x) ≤ 1. If X is in the range [a, b] then FX(a) = 0 and FX(b) = 1. FX is monotonically increasing, which means that if q > p then FX(q) ≥ FX(p).

SLIDE 35

Continuous random variables (2)

Since E occurs when X ≤ x, Ec occurs when X > x, and so P(X ≤ x) + P(X > x) = 1 and P(X > x) = 1 − FX(x).

Take two events, A which occurs when X ≤ q and B which occurs when X > p, and assume q > p.

[Diagram: number line with p < q; A covers X ≤ q and B covers X > p]

The event A ∪ B always occurs (so P(A ∪ B) = 1) and A ∩ B occurs when p < X ≤ q.

Since P(A ∪ B) = P(A) + P(B) − P(A ∩ B) we have P(p < X ≤ q) = FX(q) − FX(p)

SLIDE 36

Continuous random variables (3)

Example: exponential distribution

FX(x) = 1 − e^(−2x) when x ≥ 0, and FX(x) = 0 when x < 0. This describes a random variable X in the range [0, ∞).

[Plot: FX(x) against x, rising from 0 towards 1]

What is the probability X < 1? What is the probability X ∈ [1/2, 1]?

SLIDE 38

Continuous random variables (3)

Example: exponential distribution (answers)

What is the probability X < 1? P(X ≤ 1) = FX(1) = 1 − e^(−2) = 0.864664 . . .

What is the probability X ∈ [1/2, 1]? P(1/2 < X ≤ 1) = FX(1) − FX(1/2) = 1 − e^(−2) − (1 − e^(−1)) = 0.2325442 . . .
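A minimal sketch of these two answers in Python (the names F and lam are illustrative):

```python
from math import exp

def F(x: float, lam: float = 2.0) -> float:
    """cdf of the exponential distribution with rate lam."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0

print(F(1.0))            # P(X <= 1)       = 0.864664...
print(F(1.0) - F(0.5))   # P(1/2 < X <= 1) = 0.232544...
```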

SLIDE 39

Probability density function

If p and q are brought closer together so that q = p + dp, then P(p < X ≤ p + dp) = FX(p + dp) − FX(p) ≈ FX(p) + dp (dFX/dx) − FX(p) = dp (dFX/dx)

Probability density function

The probability density function gives the probability a random variable falls in an infinitesimally small interval, scaled by the size of the interval:

fX(x) = lim_{dx→0} P(x ≤ X < x + dx) / dx

For a random variable X in the range [a, b], FX(x) = ∫ₐˣ fX(z) dz

SLIDE 40

Probability density function (2)

fX (the pdf) is not a probability; FX (the cdf) is. While fX is still non-negative, it can be bigger than one. For X in the range [a, b], FX(b) = 1, so fX ≥ 0 and ∫ₐᵇ fX(z) dz = 1

SLIDE 41

The uniform distribution

A random variable U in the range [a, b] is uniformly distributed if all values in that range are equally likely. This implies the pdf is a constant, fU(u) = α. Normalising means ensuring ∫ₐᵇ fU(u) du = 1, so

fU(u) = 1/(b − a) and FU(u) = (u − a)/(b − a)

[Plots: pdf and cdf of the uniform distribution U[1/4, 3/2]]

SLIDE 42

The exponential distribution

For a positive parameter λ > 0, a random variable W in the range [0, ∞) is called exponentially distributed if the density function falls exponentially. The pdf is proportional to e^(−λw). Normalising again means ensuring ∫₀^∞ fW(w) dw = 1. So

fW(w) = λe^(−λw) for w ≥ 0, and 0 for w < 0

FW(w) = 1 − e^(−λw) for w ≥ 0, and 0 for w < 0

[Plots: pdf and cdf of the exponential(2) distribution]

SLIDE 43

The normal distribution

The normal distribution N(μ, σ²) is parameterised by two numbers, μ and σ. The pdf is the “bell curve”. The cdf doesn’t have a nice closed-form expression (but it is sufficiently important that the related integral gets its own name, erf(x)):

fX(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²))

[Plots: pdf and cdf of N(0.75, 0.4)]
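A sketch of the pdf and cdf in Python; the cdf is expressed through erf as indicated above (reading the slide's N(0.75, 0.4) as μ = 0.75, σ = 0.4 is an assumption):

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    # No elementary closed form; written in terms of erf.
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

print(normal_pdf(0.75, 0.75, 0.4))   # peak height, ~0.997
print(normal_cdf(0.75, 0.75, 0.4))   # 0.5: half the mass lies below the mean
```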

SLIDE 44

Continuous random variables (4)

An expected value of a continuous random variable can be defined, in analogy to that of the discrete random variable

Expected value

For a random variable X taking a value in [a, b], the expected value is defined as

E[X] = ∫ₐᵇ z fX(z) dz

As with discrete random variables, the easiest way to think of this is as the running average of n samples of X as n gets very large. Can show E[αX + βY] = αE[X] + βE[Y]

SLIDE 45

Continuous random variables (5)

An expected value of a continuous random variable can be defined, in analogy to that of the discrete random variable

Variance

The variance of a continuous random variable X has the same definition: σX² = E[X²] − E[X]²

Again, like discrete random variables, the standard deviation is the square root of the variance. Both the variance and standard deviation are non-negative.

SLIDE 46

Example: E[U] and σU² of uniform U[a, b]

For U a uniform random variable on [a, b], what is

1. E[U]?

2. σU²?

Using the definitions, E[U] = (1/(b − a)) ∫ₐᵇ z dz = (a + b)/2

The mean is (as might be guessed) the mid-point of [a, b]. Similarly, substituting to find E[U²] gives

σU² = (b − a)²/12

which depends only on b − a, the width of the range.

SLIDE 47

Example (2): E[W] and σW² of exponential(λ)

For W an exponentially distributed variable with parameter λ, what is

1. E[W]?

2. σW²?

Again using the definitions, E[W] = ∫₀^∞ w · λe^(−λw) dw = 1/λ

From the definition of E[W²], we get σW² = 1/λ², so the expected value and standard deviation of exponentially distributed random variables are both given by λ⁻¹.

SLIDE 48

Visualising a probability density function

Given some sample data, it is useful to plot a pdf. This can be done by binning the data and plotting a histogram. Divide the range [a, b] of possible values of X into N bins. Count mi, the number of times X lies in ri = [a + i(b−a)/N, a + (i+1)(b−a)/N). Plot mi vs x. Care must be taken choosing the bin-size: too big and structure will be lost; too small and fluctuations will add spurious features.

[Plots: visualising the exponential distribution with 10,000 samples, using 10, 100 and 1000 bins in the range 0 to 10]
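A minimal binning sketch along these lines (the sample size and bin count follow the plots above):

```python
import random

samples = [random.expovariate(1.0) for _ in range(10_000)]

a, b, nbins = 0.0, 10.0, 10
counts = [0] * nbins
for x in samples:
    i = int((x - a) / (b - a) * nbins)   # which bin x falls into
    if 0 <= i < nbins:
        counts[i] += 1

# Normalise counts to estimate the pdf: divide by (number of samples * bin width).
width = (b - a) / nbins
for i, m in enumerate(counts):
    print(f"[{a + i*width:4.1f}, {a + (i+1)*width:4.1f})  {m / (len(samples) * width):.3f}")
```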

SLIDE 49

Joint probability distributions

Sometimes in an experiment we measure two or more random variables. Now the sample space is more complicated, but it is still possible to define events usefully. The cumulative distribution function is defined as a probability: FX,Y(x, y) = P(X ≤ x and Y ≤ y)

[Scatter plot: the probability that (X, Y) lies inside the lower-left quadrant defined by X ≤ x and Y ≤ y; in the example it is approximated by the fraction of red dots to the total number of red and green dots]

SLIDE 50

Joint probability distributions (2)

We can write expressions for P(x0 < X ≤ x1 and y0 < Y ≤ y1) in terms of FX,Y:

P(x0 < X ≤ x1 and y0 < Y ≤ y1) = FX,Y(x1, y1) − FX,Y(x0, y1) − FX,Y(x1, y0) + FX,Y(x0, y0)

A joint probability density can be defined too: it is the ratio of the probability a point (X, Y) lands inside an infinitesimally small area dxdy located at (x, y) to the area dxdy:

fX,Y(x, y) = lim_{dx→0, dy→0} P(X ∈ [x, x + dx] and Y ∈ [y, y + dy]) / (dx dy)

SLIDE 51

Joint probability distributions (3)

Independent random variables

Two random variables X and Y are said to be independent if for all x and y, FX,Y(x, y) = FX(x) × FY(y); this is equivalent to fX,Y(x, y) = fX(x) × fY(y). As with independent events, if two random variables are independent, knowing something about one doesn’t allow us to infer anything about the other.

SLIDE 52

Summary (1)

Mathematically, a random variable is a function taking an outcome and returning a number.

They can be discrete or continuous.

Their expected value is the sum of all possible values assigned to outcomes, weighted by the probability of each outcome.

The variance (and standard deviation) of a random variable quantifies how much they fluctuate.

In a binomial experiment, the random variable X that counts the number of successes out of n trials has probability P(X = k) = (n choose k) p^k (1 − p)^(n−k), where p is the probability a single trial is successful.

Random occurrences can be modelled by the Poisson distribution. The probability there will be k occurrences if μ are expected is P(X = k) = μᵏ e^(−μ) / k!

SLIDE 53

Summary (2)

Continuous random variables can be described by a cumulative distribution function (cdf). It gives the probability X will be smaller than some reference value x. The probability density function (pdf) is the ratio of the probability a random variable will fall in an infinitesimally small range to the size of that range. Given the pdf, the expected value and variance of a continuous random variable can be computed by integration. If a random variable is sampled many times, an approximation to its pdf can be visualised by binning and plotting a histogram. If more than one random variable is measured, probabilities are described by joint distributions. Two random variables are independent if their joint distribution is separable.

SLIDE 54

Sampling

SLIDE 55

Sample mean

Sample mean

For a sequence of n random numbers {X1, X2, X3, . . . , Xn}, the sample mean is

X̄(n) = (1/n) Σᵢ₌₁ⁿ Xi

X̄(n) is also a random number. If all entries have the same mean μX, then

E[X̄(n)] = (1/n) Σᵢ₌₁ⁿ E[Xi] = μX

If all entries are independent and identically distributed, then σX̄(n)² = σX²/n

SLIDE 56

The law of large numbers

Jakob Bernoulli: “Even the stupidest man — by some instinct of nature per se and by no previous instruction (this is truly amazing) — knows for sure that the more observations that are taken, the less the danger will be of straying from the mark” (Ars Conjectandi, 1713). But the strong law of large numbers was only proved in the 20th century (Kolmogorov, Chebyshev, Markov, Borel, Cantelli, . . . ).

The strong law of large numbers

If X̄(n) is the sample mean of n independent, identically distributed random numbers with well-defined expected value μX and variance, then X̄(n) converges almost surely to μX:

P(lim_{n→∞} X̄(n) = μX) = 1
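A small sketch of the law at work, sampling exponential(1) random numbers (whose expected value is 1):

```python
import random

def sample_mean(n: int) -> float:
    # Exponential(1) random numbers have expected value 1.
    return sum(random.expovariate(1.0) for _ in range(n)) / n

for n in (2, 4, 8, 16, 1000, 100_000):
    print(n, sample_mean(n))   # wanders for small n, settles near 1 as n grows
```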

SLIDE 57

Example: exponential random numbers

[Table: 16 samples X of an exponential random number, together with the corresponding sample means X̄(2), X̄(4), X̄(8), X̄(16), which scatter less and less about the expected value]

SLIDE 58

The central limit theorem

As the sample size n grows, the sample mean looks more and more like a normally distributed random number with mean μX and standard deviation σX/√n

The central limit theorem (de Moivre, Laplace, Lyapunov,. . . )

The sample mean of n independent, identically distributed random numbers, each drawn from a distribution with expected value μX and standard deviation σX, obeys

lim_{n→∞} P(−aσX/√n < X̄(n) − μX < +aσX/√n) = (1/√(2π)) ∫₋ₐᵃ e^(−x²/2) dx

SLIDE 59

The central limit theorem (2)

The law of large numbers tells us we can find the expected value of a random number by repeated sampling. The central limit theorem tells us how to estimate the uncertainty in our determination when we use a finite (but large) sample. The uncertainty falls with increasing sample size like 1/√n

Slides by Mike Peardon (2011) minor modifs. by S. Sint (TCD) MAU22S06 - Data analysis Hilary T erm 2019 55 / 96

slide-60
SLIDE 60

The central limit theorem

An example: distributions of sample averages of a random number X with n = 1, 2, 5, 50

[Plots: histograms of the sample mean for n = 1, 2, 5 and 50, narrowing as n grows]

SLIDE 61

Confidence intervals

The central limit theorem tells us that for sufficiently large sample sizes, all sample means are normally distributed. We can use this to estimate probabilities that the true expected value of a random number lies in a range.

One sigma

What is the probability a sample mean X̄ is within one standard deviation σX̄ = σX/√n of the expected value μX? If n is large, we have

P(−σX̄ < X̄ − μX < σX̄) = (1/√(2π)) ∫₋₁¹ e^(−x²/2) dx = 68.3%

These ranges define confidence intervals. Most commonly seen are the 95% and 99% intervals.
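The quoted percentages follow from the Gaussian integral above, which equals erf(a/√2); a one-line check in Python:

```python
from math import erf, sqrt

# P(-a*sigma < Xbar - mu < +a*sigma) for a normal sampling distribution
# equals erf(a / sqrt(2)).
for a in (1, 2, 3, 4, 5):
    print(f"{a} sigma: {erf(a / sqrt(2)):.7%}")
# 1 sigma: ~68.3%, 2 sigma: ~95.4%, 3 sigma: ~99.7%, ...
```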

SLIDE 62

Confidence intervals (2)

Most commonly seen are the 95% (2σ) and 99% (3σ) intervals.

P(−σX̄ < X̄ − μX < σX̄) ≈ 68.2%
P(−2σX̄ < X̄ − μX < 2σX̄) ≈ 95.4%
P(−3σX̄ < X̄ − μX < 3σX̄) ≈ 99.7%
P(−4σX̄ < X̄ − μX < 4σX̄) ≈ 99.994%
P(−5σX̄ < X̄ − μX < 5σX̄) ≈ 99.99994%
P(−10σX̄ < X̄ − μX < 10σX̄) ≈ 99.9999999999999999999985%

The standard deviation is usually measured from the sample variance. Beware - the “variance of the variance” is usually large. Five-sigma events have been known ...

SLIDE 63

Sample variance

With data alone, we need a way to estimate the variance of a distribution. This can be done by measuring the sample variance:

Sample variance

For n > 1 independent, identically distributed samples of a random number X, with sample mean X̄, the sample variance is

σ̄X² = (1/(n − 1)) Σᵢ₌₁ⁿ (Xi − X̄)²

Now we quantify fluctuations without reference to (or without knowing) the expected value μX. Note the n − 1 factor: one “degree of freedom” is absorbed into “guessing” the expected value of X.
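A minimal sketch of the estimator (note the n − 1 divisor):

```python
def sample_variance(xs: list[float]) -> float:
    n = len(xs)
    xbar = sum(xs) / n
    # The n - 1 (Bessel) factor: one degree of freedom is used
    # estimating the mean from the same data.
    return sum((x - xbar)**2 for x in xs) / (n - 1)

print(sample_variance([0.25, 0.90, 1.20, 1.70, 2.20]))
```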

SLIDE 64

Student’s t-distribution

In 1908, William Gosset, while working for Guinness at St. James's Gate, published under the pseudonym “Student”. He computed the scaling needed to define a confidence interval when the variance and mean of the underlying distribution are unknown and have been estimated from the data.

Student’s t-distribution

fT(t) = Γ(n/2) / (√(π(n − 1)) Γ((n − 1)/2)) × (1 + t²/(n − 1))^(−n/2)

This is used to find the scaling factor c(γ, n) to compute the γ confidence interval for the sample mean:

P(−c σ̄X < X̄ − μX < c σ̄X) = γ

For n > 10, the t-distribution looks very similar to the normal distribution.
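A sketch of looking up c(γ, n); this assumes SciPy is available, which the slides do not mention:

```python
from scipy import stats   # assumption: SciPy is installed

# Scaling factor c(gamma, n): for n samples the sample mean has
# n - 1 degrees of freedom.
for n in (2, 5, 10, 30):
    c = stats.t.ppf(0.975, df=n - 1)   # 95% two-sided interval
    print(n, round(c, 3))   # 12.706, 2.776, 2.262, 2.045 - approaching the normal 1.96
```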

SLIDE 65

Student’s t-distribution (2)

[Plot: blue - normal distribution; red - Student t with n = 2]

SLIDE 66

Student’s t-distribution (3)

For example, with just 2 samples the sample mean and variance can be computed, but now the confidence levels are:

P(−σ̄X < X̄ − μX < σ̄X) ≈ 50%
P(−2σ̄X < X̄ − μX < 2σ̄X) ≈ 70.5%
P(−3σ̄X < X̄ − μX < 3σ̄X) ≈ 79.5%
P(−4σ̄X < X̄ − μX < 4σ̄X) ≈ 84.4%
P(−5σ̄X < X̄ − μX < 5σ̄X) ≈ 87.4%
P(−10σ̄X < X̄ − μX < 10σ̄X) ≈ 93.7%

“Confidences” are much lower because the variance is very poorly determined with only two samples.

SLIDE 67

Modelling statistical (Monte Carlo) data

Often, we carry out experiments to test a hypothesis. Since the result is a stochastic variable, the hypothesis can never be proved or disproved. We need a way to assign a probability that the hypothesis is false. One place to begin: the χ² statistic.

Suppose we have n measurements Ȳi, i = 1..n, each with standard deviation σi. Also, we have a model which predicts each measurement, giving yi.

The χ² statistic

χ² = Σᵢ₌₁ⁿ (Ȳi − yi)² / σi²

SLIDE 68

Goodness of fit

χ² ≥ 0, and χ² = 0 implies Ȳi = yi for all i = 1..n (i.e. the model and the data agree perfectly). Bigger values of χ² imply the model is less likely to be true. Note χ² is itself a stochastic variable.

Rule-of-thumb

χ² ≈ n for a good model

SLIDE 69

Models with unknown parameters - fitting

The model may depend on parameters αp, p = 1 . . . m. Now χ² is a function of these parameters, χ²(α). If the parameters are not known a priori, the “best fit” model is described by the set of parameters α* that minimise χ²(α), so

∂χ²(α)/∂αp |_{α=α*} = 0

For linear models, yi = Σₚ₌₁ᵐ αp q^p_i (with fixed basis values q^p_i), finding α* is equivalent to solving a linear system. For more general models, finding minima of χ² can be a challenge. . .

SLIDE 70

Example - one parameter fit

Fit a straight line through the origin

Consider the following measured data Yi ± σi, i = 1..5, for inputs xi:

i    xi    Yi     σi
1    0.1   0.25   0.05
2    0.5   0.90   0.10
3    0.7   1.20   0.05
4    0.9   1.70   0.10
5    1.0   2.20   0.20

Fit this to a straight line through the origin, so our model is y(x) = αx with α an unknown parameter we want to determine. Result: α = 1.8097 and χ² = 8.0.
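A sketch reproducing this fit numerically; the closed form α* = b/A used here is the one derived later in these slides:

```python
xs = [0.1, 0.5, 0.7, 0.9, 1.0]
Ys = [0.25, 0.90, 1.20, 1.70, 2.20]
sigmas = [0.05, 0.10, 0.05, 0.10, 0.20]

# For y = alpha*x, minimising chi^2 gives alpha* = b/A with
# A = sum x_i^2 / sigma_i^2 and b = sum x_i*Y_i / sigma_i^2.
A = sum(x**2 / s**2 for x, s in zip(xs, sigmas))
b = sum(x * Y / s**2 for x, Y, s in zip(xs, Ys, sigmas))
alpha = b / A
chi2 = sum((Y - alpha * x)**2 / s**2 for x, Y, s in zip(xs, Ys, sigmas))
print(alpha, chi2)   # 1.8097... and ~8.0
```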

SLIDE 71

Example - one parameter fit (2)

[Plot: the measured data with error bars and the best-fit line y = 1.8097x]

SLIDE 72

Models with unknown parameters - fitting (2)

Example: fitting data to a straight line

Suppose for a set of inputs xi, i = 1..n we measure outputs Ȳi ± σi. If Y is modelled by a simple straight-line function, yi = α1 + α2 xi, what values of {α1, α2} minimise χ²? χ²(α1, α2) is given by

χ²(α1, α2) = Σᵢ₌₁ⁿ (Ȳi − α1 − α2 xi)² / σi²

The minimum is at

α1* = (A22 b1 − A12 b2) / (A11 A22 − A12²)

α2* = (A11 b2 − A12 b1) / (A11 A22 − A12²)

SLIDE 73

Models with unknown parameters - fitting (3)

Example: fitting data to a straight line

A11 = Σᵢ₌₁ⁿ 1/σi²    A12 = Σᵢ₌₁ⁿ xi/σi²    A22 = Σᵢ₌₁ⁿ xi²/σi²

b1 = Σᵢ₌₁ⁿ Ȳi/σi²    b2 = Σᵢ₌₁ⁿ xi Ȳi/σi²

The best-fit parameters α1,2* are themselves stochastic variables, and so have a probabilistic distribution. A range of likely values must be given; the width is approximated by

σα1 = √(A22 / (A11 A22 − A12²)),    σα2 = √(A11 / (A11 A22 − A12²))
SLIDE 74

Example - two parameter fit (2)

[Plot: the same data with the two-parameter best-fit line y = α1 + α2 x]

Now χ² goes down from 8.0 → 7.1.

SLIDE 75

Example - try both fits again ...

[Plot: a second data set with both fitted models]

Now χ² is 357 for the y = αx model, but still 7.1 for the y = α1 + α2 x model. The first model should be ruled out.

SLIDE 76

Uncertainty propagates

The best fit parameter(s) α* have been determined from statistical data - so we must quote an uncertainty. How precisely have they been determined?

α* is a function of the statistical data Ȳ. A statistical fluctuation in Ȳ of dȲ would result in a fluctuation in α* of (dα*/dȲ) dȲ. All the measured Y values fluctuate, but if they are independent the fluctuations only add in quadrature, so:

Error in the best fit parameters:

σ²_α* = Σᵢ₌₁ⁿ (dα*/dYi)² σi²

SLIDE 77

Uncertainty propagates (2)

Back to our example:

One-parameter fit

We found α* = b/A with

A = Σᵢ₌₁ⁿ xi²/σi² and b = Σᵢ₌₁ⁿ xi Yi/σi²

So dα*/dyi = (1/A) db/dyi since A is fixed. We get

σ²_α* = (1/A²) Σᵢ₌₁ⁿ (xi/σi²)² σi² = 1/A

Back to our first example: we quote α* = 1.81 ± 0.05.
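The numerical value of that uncertainty, as a two-line check with the same data:

```python
from math import sqrt

xs = [0.1, 0.5, 0.7, 0.9, 1.0]
sigmas = [0.05, 0.10, 0.05, 0.10, 0.20]

# sigma_alpha^2 = 1/A with A = sum x_i^2 / sigma_i^2.
A = sum(x**2 / s**2 for x, s in zip(xs, sigmas))
print(sqrt(1 / A))   # ~0.055, so quote alpha* = 1.81 +/- 0.05
```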

SLIDE 78

Introduction to Markov processes

SLIDE 79

Markov chains

In 1906, Markov was interested in demonstrating that independence was not necessary to prove the (weak) law of large numbers. He analysed the alternating patterns of vowels and consonants in Pushkin’s novel “Eugene Onegin”.

In a Markov process, a system makes stochastic transitions such that the probability of a transition occurring depends only on the start and end states. The system retains no memory of how it came to be in the current state. The resulting sequence of states of the system is called a Markov chain.

SLIDE 80

Markov chains

A Markov chain

A system can be in any one of n distinct states, denoted {χ1, χ2, . . . , χn}. If ψ(0), ψ(1), ψ(2), . . . is a sequence of these states observed at times t = 0, 1, 2, . . . and generated by the system making random jumps between states so that the conditional probability satisfies

P(ψ(t) = χi | ψ(t − 1) = χj, ψ(t − 2) = χk, . . . , ψ(0) = χz) = P(ψ(t) = χi | ψ(t − 1) = χj)

then the sequence {ψ} is called a Markov chain. If P(ψ(t) = χi | ψ(t − 1) = χj) does not depend on t, the sequence is called a homogeneous Markov chain - we will consider only homogeneous Markov chains.

SLIDE 81

Markov matrix

The conditional probabilities describe the probability of the system jumping from state χj to state χi. There are n × n possible jumps and these probabilities can be packed into a matrix, with elements Mij being the probability of jumping from j to i.

The Markov matrix

A matrix containing the n × n probabilities of the system making a random jump from j → i is called a Markov matrix: Mij = P(ψ(t + 1) = χi | ψ(t) = χj). Since the entries are probabilities, and the system is always in a well-defined state, a couple of properties follow. . .

0 ≤ Mij ≤ 1 and Σᵢ₌₁ⁿ Mij = 1

SLIDE 82

Markov Processes (4)

Dublin’s weather

An example: on a rainy day in Dublin, the probability tomorrow is rainy is 80%. Similarly, on a sunny day the probability tomorrow is sunny is 40%. This suggests Dublin’s weather can be described by a (homogeneous) Markov process. Can we compute the probability any given day is sunny or rainy? For this system, the Markov matrix (columns labelled by today’s weather, rows by tomorrow’s) is

         Sunny  Rainy
Sunny  (  0.4    0.2  )
Rainy  (  0.6    0.8  )

SLIDE 83

Dublin’s weather (2)

If today is sunny we can write the state as ψ(0) = (1, 0)ᵀ; the state tomorrow is then ψ(1) = (0.4, 0.6)ᵀ, and ψ(2) = (0.28, 0.72)ᵀ, ψ(3) = (0.256, 0.744)ᵀ, . . .

If today is rainy we can write the state as ψ(0) = (0, 1)ᵀ; the state tomorrow is then ψ(1) = (0.2, 0.8)ᵀ, and ψ(2) = (0.24, 0.76)ᵀ, ψ(3) = (0.248, 0.752)ᵀ, . . .

The vector ψ quickly collapses to a fixed-point, which must be π, the eigenvector of M with eigenvalue 1, normalised such that Σᵢ₌₁² πi = 1.

SLIDE 84

Dublin’s weather (3)

Finding the probability of sun or rain a long time in the future is equivalent to solving

( 0.4  0.2 ) ( π1 )   ( π1 )
( 0.6  0.8 ) ( π2 ) = ( π2 )

with the normalising condition for the probabilities, π1 + π2 = 1. We find π = (0.25, 0.75)ᵀ. This is the invariant probability distribution of the process; with no prior information these are the probabilities any given day is sunny (25%) or rainy (75%).
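A sketch of the fixed-point iteration for this chain (convention as above: columns are "today", rows are "tomorrow"):

```python
# M[i][j] = P(tomorrow is state i | today is state j); states 0 = sunny, 1 = rainy.
M = [[0.4, 0.2],
     [0.6, 0.8]]

psi = [1.0, 0.0]   # today is sunny
for _ in range(20):
    psi = [sum(M[i][j] * psi[j] for j in range(2)) for i in range(2)]
print(psi)   # converges to the fixed point pi = [0.25, 0.75]
```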

SLIDE 85

Population migrations

In a year, the fraction of the population of three provinces A, B and C who move between provinces is given by:

From\To   A    B    C
A         -    1%   1%
B         3%   -    2%
C         7%   7%   -

Show that the stable populations of the three provinces are in the proportions 8 : 3 : 1.
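One way to check the 8 : 3 : 1 claim numerically; the diagonal "stay" fractions below are inferred by assuming each row of the table lists the only moves out of that province:

```python
# M[i][j] = fraction of province j's population found in province i a year later.
M = [[0.98, 0.03, 0.07],    # stays in A: 1 - 1% - 1%
     [0.01, 0.95, 0.07],    # stays in B: 1 - 3% - 2%
     [0.01, 0.02, 0.86]]    # stays in C: 1 - 7% - 7%

p = [1/3, 1/3, 1/3]          # any starting split works
for _ in range(1000):
    p = [sum(M[i][j] * p[j] for j in range(3)) for i in range(3)]
print([x / p[2] for x in p])   # proportions converge to [8.0, 3.0, 1.0]
```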

SLIDE 86

Winning a Tennis game at Deuce

Two tennis players, Alice and Bob, have reached Deuce. The probability Alice wins a point is p, while the probability Bob wins is q = 1 − p. Write a Markov matrix describing the transitions this system can make. Answer:

      ( 1  p  0  0  0 )
      ( 0  0  p  0  0 )
M =   ( 0  q  0  p  0 )
      ( 0  0  q  0  0 )
      ( 0  0  0  q  1 )

with states given by χ = {A wins, Adv A, Deuce, Adv B, B wins}

SLIDE 87

Winning a Tennis game at Deuce (2)

Remember: entry Mij = P(ψ(t + 1) = χi | ψ(t) = χj).

Some transitions are forbidden (like Adv A → Adv B). Some states are “absorbing” - once in that state, the system never moves away.

With a bit of work, it is possible to see the long-time limit after starting in state χ3 ≡ Deuce is

π3 = ( p²/(1 − 2pq), 0, 0, 0, q²/(1 − 2pq) )ᵀ

The tennis game ends with probability 1. Alice wins with probability p²/(1 − 2pq).
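A sketch verifying the absorption probability against the closed form, for an illustrative choice p = 0.6:

```python
# States: 0 = A wins, 1 = Adv A, 2 = Deuce, 3 = Adv B, 4 = B wins.
p = 0.6   # illustrative value
q = 1 - p

M = [[1, p, 0, 0, 0],
     [0, 0, p, 0, 0],
     [0, q, 0, p, 0],
     [0, 0, q, 0, 0],
     [0, 0, 0, q, 1]]

psi = [0, 0, 1, 0, 0]   # start at Deuce
for _ in range(200):    # transient states decay; only the absorbing ones survive
    psi = [sum(M[i][j] * psi[j] for j in range(5)) for i in range(5)]

print(psi[0], p**2 / (1 - 2*p*q))   # both ~0.6923: probability Alice wins
```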
